The way technology companies scrape and use copyrighted material to train generative AI tools could be in for a significant change.
The US Copyright Office wrote in a late February letter to Congress that it intends to finalize a full report on generative AI by the end of this year, with “recommendations as to any appropriate legislative or regulatory action” regarding AI. The report will be published in “several sections,” according to a Tuesday blog post from the USCO, all of which will “analyze the impact of AI on copyright.”
The first section will be published this spring and focus on “the use of AI to digitally replicate human artists’ appearances, voices, or other aspects of their identities.” The second section will be released in the summer and “address the copyrightability of works incorporating AI-generated material.”
The USCO will then release sections on two additional areas of concern that have proven most contentious: the “training of AI models on copyrighted works” and “any licensing considerations and liability issues” for AI models that produce copyrighted material. Both areas raise the question of whether and how creators and rights holders should be compensated for content used to train the large language models, or LLMs, that make generative AI tools function.
By the time it releases these recommendations, the USCO will have been considering how to act on generative AI for roughly two years. It has also spent months weighing possible changes to US copyright laws and rules, which make no specific mention of generative AI or related use cases.
The USCO opened a public comment period last summer to seek input on generative AI and its implications for creative works and copyright. The period was extended given the level of interest, and the Office eventually received nearly 11,000 comments from people in every state and 66 other countries, along with every major tech company, many large investment firms, authors, actors, video game creators, professional sports leagues, movie studios, and “even a class of middle school students,” according to the letter to Congress.
President Joe Biden’s administration has become more outspoken on generative AI. The administration released an executive order on AI with a number of directives to federal offices, and Biden’s AI advisor Ben Buchanan last month told Business Insider the White House wants to see that “people who create meaningful content are appropriately compensated for it.”
Although generative AI has been around for years, the explosive popularity of OpenAI’s ChatGPT, launched in late 2022, led to a greater public understanding of how generative AI models are developed through the mass scraping of data from across the web. Regardless of ownership, scraped content is almost never licensed or paid for, and it’s next to impossible for creators and owners to prevent their work from becoming part of massive data sets used for training.
Several tech companies and investors involved in AI, including Meta and Andreessen Horowitz, told the USCO last year that being required to pay for the huge amount of copyrighted content AI models require would be so expensive that it would make the development of the technology unfeasible.
Warring interests and goals have opened up a growing fight between content creators and tech companies building generative AI. At least six federal lawsuits filed last year are active, with creators and rights holders, from book authors to The New York Times, accusing tech companies of wrongfully using their work. Generally, the tech companies claim they are within their legal rights to use any web data they can find, copyrighted or not, arguing it falls under copyright law’s “fair use” doctrine. Creators and owners largely disagree.
Are you a tech employee or someone with a tip or insight to share? Contact Kali Hays at khays@insider.com or on secure messaging app Signal at 949-280-0267. Reach out using a non-work device.