As conversational AI like ChatGPT becomes more advanced and widely used, an important question arises: Who owns the text and other content these systems generate? This issue is complex, with reasonable arguments on both sides.
On one hand, the text generated by these systems is original work produced autonomously by the AI. The AI does the creative work, so it could be argued that the AI system itself owns the output. However, under current law AIs lack personhood and cannot own property, so ownership must reside with some legal entity.
This leads to the argument that the company that created the AI owns its output. Companies like OpenAI, Google, Anthropic and others invest significant resources into developing, training and running these models. Without their efforts, the AI could not function or produce any output. Under this view, the companies own the economic rights to the AI's output because they own the means of production.
(Recently, for example, Reddit licensed its user content to an AI company for model training.)
However, language models are often trained on massive data sets scraped from the internet, and most of that training data was written by humans who do not work for the AI companies. This suggests the output incorporates ideas and expressions from a diverse group, and perhaps the public should maintain some ownership stake. There are also concerns that over-privatization of AI-generated content could limit public access to beneficial technology.
Overall, there are good-faith positions on multiple sides of this issue, and reasonable people can disagree on what the ideal system looks like. As AI capabilities advance, we as a society must reflect carefully on how to allocate rights in a way that both fuels innovation and serves the public interest. The optimal system will likely recognize the interests of AI developers, training data creators, and the general public. But finding the right balance is challenging. Constructive, nuanced policy is needed to establish ownership frameworks that are fair and benefit society.
Potential Approaches
There are a few potential approaches that could balance the various stakeholder interests regarding AI output ownership:
- A royalty system that compensates the creators of source training material when their work is used to generate profits. This would let creators share in the gains while the public still benefits from commercial applications.
- Open sourcing certain foundational models to allow public usage while companies can still monetize proprietary improvements and custom models.
- Government licensing models that enable companies to operate AI services in exchange for obligations to act responsibly and serve the public interest.
- Limiting copyright over any raw AI output that is highly derivative of source training material, while protecting company interests regarding underlying software.
- Shared public-private data infrastructure that allows companies to build innovations with publicly-owned open data.
Each model has merits and drawbacks. Hybrid approaches may emerge combining elements of public and private ownership. As we determine how to regulate this technology, we should remain cognizant of impacts on both innovation and the common good. Responsible governance of AI will likely require compromise and creativity.
There are still more facets to this issue. As we navigate this complex terrain, we must proceed with care, wisdom and concern for how these systems will shape our collective future. An open and thoughtful discourse on these questions is critical.