Copyright issues behind ChatGPT's creation

From SI410

ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot released by OpenAI, an artificial intelligence research lab, on November 30, 2022. The model is built on natural language processing techniques. ChatGPT conducts conversations in modern human language, mainly English, and takes the context of the chat into account when it responds. Its conversational behavior closely resembles that of a human, and it can complete tasks such as writing emails, video scripts, translations, and code in certain scenarios.[1]

To train the model behind ChatGPT, a huge amount of data was collected from the Internet and used with both supervised and reinforcement learning techniques. The answers delivered by ChatGPT are sometimes highly similar to answers written online by human authors. Other times, it summarizes multiple human-written answers from its training dataset. Whether ChatGPT's output can be considered original is highly debated, and ethical issues such as copyright are receiving more and more attention from the general public.


engadget - OpenAI will soon test a paid version of its hit ChatGPT bot[2]


Copyright

Copyright is a legal framework that grants creators of original works the exclusive right to control the use and distribution of those works. The aim of copyright is to foster creativity by giving authors, artists, and other creators incentives to produce new works. Copyright law protects a broad spectrum of works, including literature, music, software, film, photography, and architecture.[3]

Under copyright law, the owner of an original work is granted exclusive control over the following aspects of the work:[4]

  • Reproduction: The right to make copies of the work.
  • Distribution: The right to sell, rent, or otherwise distribute copies of the work.
  • Display: The right to show the work in public.
  • Performance: The right to perform the work in public, such as a play or musical composition.
  • Derivative Works: The right to make adaptations or alterations of the original work.

In many jurisdictions, a work is automatically protected by copyright law the moment it is created and fixed in a tangible medium, for example when a book is written down or a song is notated or recorded. Registering the work with the relevant copyright office provides evidence of ownership and can assist in the enforcement of rights in legal proceedings.[5]

Additionally, copyright law acknowledges certain exceptions and limitations to the copyright owner's exclusive rights. In the United States, the principle of fair use permits limited utilization of copyrighted works without the copyright owner's authorization for specific purposes, such as criticism, commentary, news reporting, education, scholarship, or research.

A fundamental aspect of copyright law is the granting of exclusive control over the use of a work to the copyright owner. Additionally, the law allows for the transfer of these rights, enabling the copyright owner to sell, license, or otherwise transfer the right to utilize the work to a third party. Such transfers are frequently executed through agreements such as publishing or licensing contracts.[6]

International treaties and agreements, including the Berne Convention and the World Intellectual Property Organization (WIPO) Copyright Treaty, also extend copyright protection across multiple countries. This enables creators to secure protection for their works on a worldwide basis.[7]

History

The examination of the relationship between copyright and AI technology is a continuously developing field and it is anticipated that it will continue to evolve as AI technology becomes increasingly prevalent in society.

Mid-20th century

As advancements were made in computer and AI technology, questions were raised about the potential effect on intellectual property laws. Despite the technology being in its early stages, these discussions centered primarily on theoretical concerns.

1980s and 1990s

With the rise in the usage of personal computers and the internet, early AI systems were developed that were capable of producing original works, such as music and poetry. This led to the examination of the question of whether AI-generated content can be considered original and qualify for copyright protection.

Late 1990s to early 2000s

As the advancement and usage of AI systems increased, attention was given to the topic of AI and copyright. Views were divided with some experts considering AI-generated content as not being original and thus not deserving of copyright protection, while others believed AI systems should be recognized as the creators of the works they produced.

Mid-2010s

As advancements in AI technology persisted, several lawsuits relating to copyright and AI were initiated. One such example was the lawsuit filed by a group of photographers against Google in 2014, regarding the use of their images in the company's street view mapping service. Another case was a lawsuit brought by a musician against a music streaming service, Spotify, for the use of his compositions in its playlist recommendations.

Late 2010s to present

The appropriate legal framework for AI and copyright remains a topic of ongoing discussion and debate. While some countries have enacted specific laws to address the issue, a clear consensus has yet to be reached. Meanwhile, organizations and experts are calling for a more nuanced approach that takes into account the unique characteristics of AI and its various applications in generating original works. [8]


The model behind ChatGPT

ChatGPT is a transformer-based language model, a type of artificial intelligence. The model is trained on a vast amount of text data, utilizing artificial neural networks to generate text based on the patterns learned from the text data during its training.[9]

When a prompt or question is provided to ChatGPT, the model processes the text and generates a response. It does so one word at a time, predicting the next word in the sequence from the preceding words. The model uses a complex mathematical procedure to estimate the probability of each candidate word being next in the sequence, and then selects a word with a high probability, often the most probable one.
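The next-word step described above can be illustrated with a minimal sketch. It is not OpenAI's implementation: the candidate words and their scores are made up for illustration, and the helper names (`softmax`, `pick_greedy`, `pick_sampled`) are hypothetical. The sketch turns raw scores into probabilities and then either takes the most probable word or draws one at random in proportion to its probability.

```python
import math
import random

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the peak for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def pick_greedy(probs):
    """Return the single most probable word."""
    return max(probs, key=probs.get)

def pick_sampled(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Made-up scores for the word that follows
# "The first president of the United States was George"
scores = {"Washington": 6.0, "Bush": 3.5, "Clooney": 1.0}
probs = softmax(scores)

greedy_word = pick_greedy(probs)
sampled_word = pick_sampled(probs, random.Random(0))
```

Greedy selection always returns the same word for the same scores, while sampling can return a less likely word on some runs, which is one reason a chatbot can answer the same prompt differently on different occasions.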

The "transformer" aspect of the model's name refers to a specific type of neural network architecture that is utilized to process the text data. The transformer architecture is designed to handle sequences of data, making it appropriate for language modeling.


ChatGPT's Strengths

ChatGPT is capable of generating text with a high level of fluency and coherence that is similar to human language. It is a useful resource for a wide range of NLP applications.

Question Answering

ChatGPT has the ability to comprehend and provide answers to questions on a diverse array of subjects, such as history, science, and current events. For instance, if a query of "Who was the first president of the United States?" is made, ChatGPT would provide the response of "The first president of the United States was George Washington." [10]

Text Summarization

ChatGPT is capable of generating a brief summary of a longer piece of text, such as an article, a news story, or a research paper. The model can analyze the input text, extract the most relevant information, and condense it into a shortened form that retains the core meaning of the original text.

Conversational Modeling

ChatGPT is capable of generating responses in a conversational manner, making it well-suited for the development of chatbots. The model can understand the context and intent behind user inputs, and generate appropriate and coherent responses. For instance, if the query "How are you today?" is made, ChatGPT could respond with "I am functioning well, thank you for asking. How are you today?" [11]

Text Generation

ChatGPT has the ability to produce new text based on a specified prompt. This can be applied to tasks such as language translation, story writing, or generating responses in a chatbot. The model utilizes its comprehension of language patterns and grammar to generate coherent and diverse text that is consistent with the specified prompt. As an example, if the prompt "Write a short story about a magical world" is provided, ChatGPT could generate a story describing a fantastical place filled with mythical creatures and spells. [12]


ChatGPT's Limitations

ChatGPT, despite being regarded as advanced, still faces certain constraints in its functionality.

Fact checking

ChatGPT is trained on a large amount of text data from the internet, which can include false or inaccurate information. As a result, the model may generate responses that are not entirely accurate. It is important to critically evaluate the information generated by ChatGPT and corroborate it with other sources.

Common sense reasoning

ChatGPT is not designed to have a deep understanding of common sense knowledge and may struggle with tasks that require this kind of understanding. For example, it may generate responses that are logically inconsistent or do not align with real-world expectations. [13]

Ethical considerations

Like all AI models, ChatGPT is not capable of weighing ethical concerns when generating text. It may generate responses that are insensitive, inappropriate, or offensive, and it is up to human users to intervene and prevent such responses from being used.[14]

Legal considerations

While ChatGPT is equipped to differentiate between requests that are appropriate and those that are not, it still has the ability to process requests that fall outside of the parameters set by OpenAI. Some users have found ways to circumvent the established principles for processing requests.[15]

Question Answering

While ChatGPT can generate coherent and consistent responses for general conversations, it may still face difficulty in comprehending questions that are expressed in specific ways, which necessitates rephrasing for accurate understanding.

Polarized views

With its impressive creations and its limitations, ChatGPT has received much attention. It was opened for public testing on November 30, 2022, and reached 1 million users within the first week of its launch.[16]

The question of who holds the copyright for the output produced by AI systems like ChatGPT is a complex matter. Opinions differ on the legality of the technology and on its potential for violating copyright laws.

One perspective is that AI systems like ChatGPT lack the capacity to hold a copyright, as they are not human and therefore do not possess the legal right to intellectual property. According to this viewpoint, the copyright for the AI's output would belong to the human creator or owner of the system, such as OpenAI in the case of ChatGPT.

Alternatively, there are those who argue that the output generated by AI systems like ChatGPT can be seen as a form of original expression and that the AI system should be granted copyright protection. Those in favor of this viewpoint argue that AI systems like ChatGPT have the ability to generate distinctive and innovative output that is not merely a reflection of the training data. Thus, this output should be protected by copyright laws.

The issue of copyright ownership for outputs created by AI remains unresolved in many countries and continues to be a subject of discussion. Laws and regulations related to AI and copyright can differ depending on the jurisdiction and the specific circumstances. Consulting a legal expert in accordance with specific needs and circumstances is a common course of action. [17]


Issues behind copyright

1. Can you copyright the output of a generative AI model, and if so, who owns it?

Regarding intellectual property, Bern Elliot, analyst at Gartner, states that the model for ChatGPT "is trained on a corpus of creative works and it is yet unknown what the legal precedent may be for reuse of this content, assuming it was formed from the intellectual property of other human creators."[18]

Authorship belongs to non-humans (ChatGPT)

In the current legal framework of the United States, it is generally accepted that non-human entities, such as ChatGPT, cannot hold authorship rights for works they generate.[19] Copyright protection under current U.S. law requires that a work be the result of original and creative authorship by a human author. However, the question of authorship for AI-generated content may still arise, and such cases may be addressed through an appeal of a Copyright Office registration denial or through legal action after a failed attempt to register a copyright with the Copyright Office.

In either case, the legislative history of the necessity for human authorship and later legal decisions upholding the requirement will be heavily debated.

Authorship belongs to humans

For generative AI in general, the ownership of a generated work is likely to fall under one of three outcomes:[20]

  1. a work that became public domain as soon as it was created
  2. a work that is derived from the resources the AI tool was trained on. Who owns the dataset used to train the AI tool and the degree of similarity between any given work in the training dataset and the AI work are two common factors that affect the ownership of the derived work.
  3. a work considered as an innovative creation of the human who is directing the AI.

The first and second outcomes can be applied to the copyright issues of ChatGPT's creations. The third, however, also requires a clear measure of how much the human contributed to the work relative to the AI. In the case of ChatGPT, the human operator makes only a limited contribution to the creation process, so the third outcome is usually not applicable.


2. Commercial use of the output of a generative AI model

The utilization of content produced by ChatGPT for commercial purposes requires obtaining the necessary permissions and licenses. ChatGPT, a large language model developed by OpenAI, generates text based on the context of an interaction and the responses generated may vary in accordance with the input received. [21] A license from OpenAI or the relevant rights holders may be required to utilize the content generated by ChatGPT for commercial purposes, which can depend on the specific circumstances of the use case. Obtaining the necessary permissions and licenses prior to utilizing any content for commercial purposes is the standard procedure in such cases.

The responsibility for obtaining a license from OpenAI for commercial use of content generated by ChatGPT appears to rest with the user, although this remains in question. Additionally, when ChatGPT is used to condense or adapt a copyrighted work (such as translating an English book into another language), it is unclear whether paid permission from the author or publisher is needed. How OpenAI and relevant third parties will react to commercial use of ChatGPT remains uncertain.[22]

Potential Solution

It appears that copyright infringement has already occurred for many creators. However, companies that develop generative AI are proposing new strategies to address copyright issues in the future. In response to infringement concerns, datasets in which every item belongs to the public domain have been created and used for AI training.

"The Stack," a dataset for AI training created explicitly to avoid claims of copyright infringement, is an example of that approach. Wherever possible, the dataset includes only sources under open-source licenses. Where ownership is not explicitly stated, its creators trace the source back to its issuer and ask for permission before using it. If the ownership of a source changes after "The Stack" has included it, developers can easily have that source removed on request.[23] According to its creators, this approach could be used throughout the industry as a solid way to address copyright issues related to generative AI.

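The curation approach described for "The Stack" can be pictured with a small sketch. It is not the dataset's actual pipeline: the `SourceFile` record, the `build_training_set` function, and the license allow-list are all hypothetical. For simplicity, the sketch leaves out any source without a stated license instead of asking its issuer for permission.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list; the licenses actually accepted by "The Stack"
# are not given in this article.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}

@dataclass(frozen=True)
class SourceFile:
    repo: str
    license: Optional[str]  # None means no license was stated
    text: str

def build_training_set(files, opted_out_repos):
    """Keep only allow-listed sources whose owners have not opted out."""
    kept = []
    for f in files:
        if f.license not in ALLOWED_LICENSES:
            continue  # unstated or non-allow-listed license: leave it out
        if f.repo in opted_out_repos:
            continue  # the owner asked for removal
        kept.append(f)
    return kept

files = [
    SourceFile("a/x", "MIT", "print('hello')"),
    SourceFile("b/y", None, "print('no license stated')"),
    SourceFile("c/z", "GPL-3.0", "print('copyleft')"),
    SourceFile("d/w", "Apache-2.0", "print('opted out')"),
]
kept = build_training_set(files, opted_out_repos={"d/w"})
kept_repos = [f.repo for f in kept]
```

Checking the opt-out list every time the training set is rebuilt means a removal request takes effect on the next build without touching the original collection.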

References

  1. What is CHATGPT and why does it matter? here's everything you need to know. ZDNET. (n.d.). Retrieved January 27, 2023, from https://www.zdnet.com/article/what-is-chatgpt-and-why-does-it-matter-heres-everything-you-need-to-know/
  2. Fingas, J. (2023, January 11). OpenAI will soon test a paid version of its hit Chatgpt Bot. Engadget. Retrieved February 11, 2023, from https://www.engadget.com/openai-chatgpt-professional-paid-chatbot-143004442.html
  3. Stim, R. (2013, March 27). Copyright basics FAQ. The Center for Internet and Society Fair Use Project, Stanford University. Retrieved July 21, 2019, from https://fairuse.stanford.edu/overview/faqs/copyright-basics/
  4. Stokes, S. (n.d.). Art and copyright. Google Books. Retrieved February 11, 2023, from https://books.google.com/books?id=h-XBqKIryaQC&as_brr=3
  5. Service unavailable. GOV.UK. (n.d.). Retrieved February 11, 2023, from https://www.ipo.gov.uk/copy/c-claim/c-register.htm
  6. Yu, P. K. (n.d.). Intellectual property and information wealth: Issues and practices in the Digital age, volume 1. Google Books. Retrieved February 11, 2023, from https://books.google.com/books/about/Intellectual_Property_and_Information_We.html?id=bnW8ypT9_pIC
  7. MacQueen, H. L. (n.d.). Contemporary intellectual property: Law and policy. Google Books. Retrieved February 11, 2023, from https://books.google.com/books?id=_Iwcn4pT0OoC
  8. Roose, K. (2022, December 5). The brilliance and weirdness of chatgpt. The New York Times. Retrieved January 27, 2023, from https://www.nytimes.com/2022/12/05/technology/chatgpt-ai-twitter.html
  9. Fingas, J. (2023, January 11). OpenAI will soon test a paid version of its hit Chatgpt Bot. Engadget. Retrieved February 11, 2023, from https://www.engadget.com/openai-chatgpt-professional-paid-chatbot-143004442.html
  10. Fingas, J. (2023, January 11). OpenAI will soon test a paid version of its hit Chatgpt Bot. Engadget. Retrieved February 11, 2023, from https://www.engadget.com/openai-chatgpt-professional-paid-chatbot-143004442.html
  11. OpenAI's CHATGPT is scary good at my job, but it can't replace me (yet). ZDNET. (n.d.). Retrieved February 11, 2023, from https://www.zdnet.com/article/openais-chatgpt-is-scary-good-at-my-job-but-it-cant-replace-me-yet/
  12. OpenAI's CHATGPT is scary good at my job, but it can't replace me (yet). ZDNET. (n.d.). Retrieved February 11, 2023, from https://www.zdnet.com/article/openais-chatgpt-is-scary-good-at-my-job-but-it-cant-replace-me-yet/
  13. Chatgpt: Threat or menace?: Inside higher ed. Higher Ed Gamma. (n.d.). Retrieved February 11, 2023, from https://www.insidehighered.com/blogs/higher-ed-gamma/chatgpt-threat-or-menace
  14. Bogost, I. (2022, December 16). CHATGPT is dumber than you think. The Atlantic. Retrieved February 11, 2023, from https://www.theatlantic.com/technology/archive/2022/12/chatgpt-openai-artificial-intelligence-writing-ethics/672386/
  15. What is CHATGPT and why does it matter? here's everything you need to know. ZDNET. (n.d.). Retrieved January 27, 2023, from https://www.zdnet.com/article/what-is-chatgpt-and-why-does-it-matter-heres-everything-you-need-to-know/
  16. Ruby, D. (2023, January 2). CHATGPT statistics for 2023: Comprehensive facts and data. Demand Sage. Retrieved January 27, 2023, from https://www.demandsage.com/chatgpt-statistics/
  17. Hillemann, D., & Zimprich, S. (2022, December 9). Chatgpt - legal challenges, legal opportunities. Fieldfisher. Retrieved February 11, 2023, from https://www.fieldfisher.com/en/insights/chatgpt-legal-challenges-legal-opportunities
  18. Why is ChatGPT making waves in the AI market? Gartner. (n.d.). Retrieved January 29, 2023, from https://www.gartner.com/en/newsroom/press-releases/2022-12-08-why-is-chatgpt-making-waves-in-the-ai-market
  19. Vincent, J. (2022, November 15). The scary truth about AI copyright is nobody knows what will happen next. The Verge. Retrieved January 27, 2023, from https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data
  20. McKendrick, J. (2022, December 26). Who ultimately owns content generated by CHATGPT and other AI platforms? Forbes. Retrieved January 29, 2023, from https://www.forbes.com/sites/joemckendrick/2022/12/21/who-ultimately-owns-content-generated-by-chatgpt-and-other-ai-platforms/?sh=7205359e5423
  21. Loafars. (2023, January 27). Is chat GPT free for commercial use? Chat GPT Pro. Retrieved February 11, 2023, from https://opchatgptai.com/is-chat-gpt-free-for-commercial-use/
  22. Hillemann, D., & Zimprich, S. (2022, December 9). Chatgpt - legal challenges, legal opportunities. Fieldfisher. Retrieved February 11, 2023, from https://www.fieldfisher.com/en/insights/chatgpt-legal-challenges-legal-opportunities
  23. Vincent, J. (2022, November 15). The scary truth about AI copyright is nobody knows what will happen next. The Verge. Retrieved January 27, 2023, from https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data