🇺🇸 TRAIN Act: U.S. Congress Moves Toward Mandatory AI Training Transparency
Bipartisan lawmakers have introduced the Transparency and Responsibility for Artificial Intelligence Networks (TRAIN) Act in the U.S. House, aiming to give copyright holders access to AI training records to determine whether their works were used to train generative AI models without consent or compensation. The bill, led by Rep. Madeleine Dean (PA-04) and Rep. Nathaniel Moran (TX-01), follows a Senate version reintroduced by Senators Peter Welch, Marsha Blackburn, Adam Schiff, and Josh Hawley. This is the first time the TRAIN Act has been introduced in the House. The proposal is modeled on enforcement mechanisms used in online piracy cases and responds to the current lack of any clear process for creators to verify whether their content was ingested into training datasets. The bill has support from major creator and rights-holder organizations, including the Recording Industry Association of America (RIAA) and SAG-AFTRA, alongside groups representing musicians, publishers, and copyright licensing. If enacted, the TRAIN Act would shift AI copyright disputes from speculation to evidence by establishing a legal path to training-data disclosure. It would also add pressure on AI companies that do not currently reveal how their models are trained.
#AIandLaw #Copyright #TrainingData #Transparency
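The Static Site Paradox
There are two different ways to build a personal website: a complex content management system (CMS), or simple static HTML files. The article points out that although most ordinary users gravitate toward complex solutions such as WordPress, in practice only a small number of professional software engineers are able to choose the simpler static site. via HackerNews 2024 10 09
Just a couple of days ago a friend told me that Squarespace has gone up to an almost comical $25 a month, which is really hard to justify for a personal blog that isn't meant to make money. The article's rant is right on target: "normal users are stuck with a bunch of greedy clowns that make them pay for every little thing, all while wasting ungodly amounts of computational power to render what could have been a static website in 99% of cases."
That said, I don't agree with the article's claim that "only a small number of professional software engineers can choose the simpler static site". It is somewhat exaggerated, because a static site at the very least needs far less maintenance than a self-hosted dynamic CMS. I've also long had a post sitting in my backlog that recommends static sites to beginners and takes a dig at WP. But there are already countless posts like that online, and they still can't stop wave after wave of beginners from rushing into one CMS pit after another, so I wondered what the point of writing it would be and have left it unwritten. (Of course, sooner or later the urge to ramble will win, just as it did with the countless wheels I've reinvented before, but not today.)
#indieblog #newletter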
🇪🇺📖 Study Finds Limited Availability of AI Training Data Disclosures Under EU AI Act
Researchers from Trinity College Dublin report that information about AI training data required under the AI Act is often missing and difficult to locate. The law requires developers to publish summaries explaining how their models were trained, using a disclosure template designed to help copyright holders enforce their rights regarding the use of copyrighted material in AI training. A pre-print study funded by Mozilla found that only a small number of such summaries could be identified. The researchers also found structural issues in accessing the disclosures. The AI Act does not specify where companies must publish the summaries, leaving the decision to developers. As a result, no common publication mechanism exists and practices vary widely. The template created by the European Commission AI Office has led to heterogeneous implementations, making it difficult to determine whether the available documents meet EU transparency requirements. Most of the identified disclosures were produced by smaller organizations, including documentation for Switzerland’s Apertus national model. A document published by Microsoft for one of its open-source models was also reviewed, but the study found that it lacked several required details. Researchers recommend creating a centralized portal for publishing transparency summaries to improve accessibility and support enforcement once the AI Act obligations become applicable in August.
#AIAct #AITransparency #TrainingData #Copyright #AIGovernance #AIRegulation #EULaw
@venturevillagewall · Post #3551 · 2024/12/20 09:32
Fraction AI Raises $6M
Fraction AI has secured $6M in funding for its project, which aims to democratize access to high-quality training data for artificial intelligence using Web3 technology. The funding round concluded on December 18, 2024.
#FractionAI #Funding #AI #Web3 #TrainingData #TechInvestment #Innovation #DataDemocratization
🇺🇸 Court Allows Enforcement of California AI Training Data Disclosure Law
A US federal court has denied a request by Elon Musk’s AI company xAI to block enforcement of California Assembly Bill 2013. The law requires AI developers whose models are accessible in California to publicly disclose key information about training datasets, including dataset sources, collection timelines, whether collection is ongoing, and whether datasets contain copyrighted, trademarked, patented, or personal data. Companies must also indicate whether training data was licensed or purchased and the extent of synthetic data used. xAI argued the law would force disclosure of trade secrets, including dataset sources, dataset sizes, and data-cleaning methods. According to the company, such transparency could allow competitors to infer what datasets it uses and replicate its approach. The company warned that compliance could be “economically devastating” and reduce the value of its proprietary data practices. However, US District Judge Jesus Bernal ruled that xAI failed to demonstrate that the law requires disclosure of protected trade secrets. The court found the company’s claims too general and based largely on hypotheticals. The motion for a preliminary injunction was denied, allowing the law, which took effect in January, to remain in force while the lawsuit continues.
#AIRegulation #AITransparency #TrainingData #TradeSecrets #AIAct #AIGovernance #TechLaw