Difference between revisions of "Copyright issues behind ChatGPT's creation"

From SI410
Revision as of 19:43, 29 January 2023

ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot released by OpenAI, an artificial intelligence research lab, on November 30, 2022. The model is built on natural language processing techniques. ChatGPT can hold conversations in human language, mainly English, and can take the context of the chat into account when it responds. Its conversational behavior closely resembles a human's, and in certain scenarios it can even complete tasks such as writing emails, video scripts, translations, and code.[1]

To train the model behind ChatGPT, a huge amount of data was collected from the Internet and used with both supervised learning and reinforcement learning techniques. The answers ChatGPT delivers are sometimes highly similar to answers that human authors have posted online. Other times, it summarizes multiple human-written answers from its training dataset. Whether ChatGPT's output has originality is highly debated, and ethical issues such as copyright are drawing more and more attention from the general public.


Copyright

Copyright refers to the ownership of a creative work. Copyright issues mainly concern the use, distribution, and protection of creative works, which can be literary, artistic, educational, or musical in form. Copyright is intended to protect the original expression of an idea in the form of a creative work, not the idea itself.[2]


History

In the past, generative AI did not raise copyright issues. Back in the 2010s, most AI models were still under development and had many problems generating works. Their output was far below the human level in both complexity and aesthetics. Models could only generate blurry artworks with black-and-white faces, and chatbots were far from mature enough to hold a regular conversation.

However, responses deliberately picked from the best outputs of generative AI gave the general public an inflated impression of what AI models could do. Fueled by modern science fiction and other media, rumors that AI threatens human beings soon caught people's attention. Even so, generative AI was still harmless to human content creators, although it could produce some results on narrow and well-defined tasks. [3]


ChatGPT's Limitation

Although ChatGPT appears quite remarkable, it still has limitations. One is that it can fail to answer questions phrased in a particular way, and the user must rephrase the question before ChatGPT can understand it from the conversational context.[4] Although ChatGPT can tell the difference between "appropriate" and "inappropriate" requests, it can still process "inappropriate" requests, which is not how OpenAI designed it to behave. Users have found ways around its preset rules for handling requests: "inappropriate" requests, such as generating instructions for illegal activities, can still get through when rephrased as a hypothetical thought experiment.

Another significant drawback is the poor quality of some of its replies, which occasionally seem reasonable but are overly vague and impractical. When it encounters ambiguous wording, ChatGPT tends to guess at an interpretation instead of asking the user for clarification. This behavior often confuses its users.

Polarized views

With all of its impressive creations and limitations, ChatGPT has received a great deal of attention. After it was released for public testing on November 30, 2022, ChatGPT reached 1 million users within the first week of its launch.[5] On one side, people think these technologies are undoubtedly capable of violating copyright laws and will soon be subject to major legal repercussions. Others say the reverse, with similar assurance: that everything taking place in the realm of generative AI is legal and above board, and that any legal action is bound to fail.

Questions that need to be answered

1. Can you copyright the output of a generative AI model, and if so, who owns it?

Regarding intellectual property, Bern Elliot, analyst at Gartner, states that the model for ChatGPT "is trained on a corpus of creative works and it is yet unknown what the legal precedent may be for reuse of this content, assuming it was formed from the intellectual property of other human creators."[6]

Authorship belongs to non-humans (ChatGPT)

In general, non-humans such as ChatGPT cannot claim authorship. In the US, there is no copyright protection for works generated solely by a machine.[7] For a work to enjoy copyright protection under current U.S. law, “the work must be the result of original and creative authorship by a human author.”[8] In a copyright dispute over AI-generated content, one way to challenge the human-authorship requirement is either to appeal a Copyright Office registration denial or to pursue an infringer after failing to register the copyright with the Copyright Office.

In either case, the legislative history of the necessity for human authorship and later legal decisions upholding the requirement will be heavily debated.


Authorship belongs to humans

For generative AI in general, ownership of the generated work is likely to fall into one of three categories:

(1) a work that enters the public domain as soon as it is created

(2) a work that is derived from the resources the AI tool was trained on. Who owns the dataset used to train the AI tool and the degree of similarity between any given work in the training dataset and the AI work are two common factors that affect the ownership of the derived work.

(3) a work considered an innovative creation of the human who is directing the AI.

The 1st and 2nd approaches can be applied to the copyright issues of ChatGPT's output. The 3rd, however, requires a clear measure of how much the human contributed to the work relative to the AI. In the case of ChatGPT, the human operator typically contributes only a limited amount to the creation process, so the 3rd approach is usually not applicable.

2. If you own the copyright to the input used to train an AI, does that give you any legal claim over the model or the content it creates?

When deciding if something is fair use, there are a number of considerations, explains Daniel Gervais, a professor at Vanderbilt Law School who specializes in intellectual property law and has written extensively on how this intersects with AI. Two factors, though, have “much, much more prominence,” he says. “What’s the purpose or nature of the use and what’s the impact on the market.” In other words: does the use-case change the nature of the material in some way (usually described as a “transformative” use), and does it threaten the livelihood of the original creator by competing with their works?

3. What kind of legal restraints could — or should — be put in place on data collection? In other words, can there be peace between the people building these systems and those whose data is needed to create them?

Potential Solution

For many creators, it seems the damage has already been done. But AI startups are at least suggesting new approaches for the future. One obvious step forward is for AI researchers to simply create databases where there is no possibility of copyright infringement — either because the material has been properly licensed or because it’s been created for the specific purpose of AI training. One such example is “The Stack” — a dataset for training AI designed to specifically avoid accusations of copyright infringement. It includes only code with the most permissive possible open-source licensing and offers developers an easy way to remove their data on request. Its creators say their model could be used throughout the industry.[9]


Still working on it (from Daniel Wang)

References

  1. What is ChatGPT and why does it matter? Here's everything you need to know. ZDNET. (n.d.). Retrieved January 27, 2023, from https://www.zdnet.com/article/what-is-chatgpt-and-why-does-it-matter-heres-everything-you-need-to-know/
  2. Stim, R. (2013, March 27). Copyright Basics FAQ. The Center for Internet and Society Fair Use Project, Stanford University. Retrieved July 21, 2019, from https://fairuse.stanford.edu/overview/faqs/copyright-basics/
  3. Roose, K. (2022, December 5). The brilliance and weirdness of ChatGPT. The New York Times. Retrieved January 27, 2023, from https://www.nytimes.com/2022/12/05/technology/chatgpt-ai-twitter.html
  4. What is ChatGPT and why does it matter? Here's everything you need to know. ZDNET. (n.d.). Retrieved January 27, 2023, from https://www.zdnet.com/article/what-is-chatgpt-and-why-does-it-matter-heres-everything-you-need-to-know/
  5. Ruby, D. (2023, January 2). ChatGPT statistics for 2023: Comprehensive facts and data. Demand Sage. Retrieved January 27, 2023, from https://www.demandsage.com/chatgpt-statistics/
  6. Why is ChatGPT making waves in the AI market? Gartner. (n.d.). Retrieved January 29, 2023, from https://www.gartner.com/en/newsroom/press-releases/2022-12-08-why-is-chatgpt-making-waves-in-the-ai-market
  7. Vincent, J. (2022, November 15). The scary truth about AI copyright is nobody knows what will happen next. The Verge. Retrieved January 27, 2023, from https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data
  8. McKendrick, J. (2022, December 26). Who ultimately owns content generated by ChatGPT and other AI platforms? Forbes. Retrieved January 29, 2023, from https://www.forbes.com/sites/joemckendrick/2022/12/21/who-ultimately-owns-content-generated-by-chatgpt-and-other-ai-platforms/?sh=7205359e5423
  9. Vincent, J. (2022, November 15). The scary truth about AI copyright is nobody knows what will happen next. The Verge. Retrieved January 27, 2023, from https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data