GitHub Copilot

From SI410

Revision as of 06:17, 28 January 2022

GitHub Copilot is an AI pair programmer developed by GitHub and OpenAI.[1] It was designed to help users by autocompleting code.[3] Copilot draws context from comments and code, and suggests individual lines and whole functions.[1] It is powered by OpenAI Codex, an AI system created by OpenAI.[1] The GitHub Copilot technical preview is available as an extension for Visual Studio Code, Neovim, and the JetBrains suite of IDEs.[1] GitHub announced Copilot on 29 June 2021.[4] The programming languages Copilot currently supports include Python, JavaScript, TypeScript, Ruby, Java, and Go, although it also provides autocompletion for languages it does not explicitly aim to support.[5]

Although GitHub has claimed that its use of public data in Copilot's training set is "fair use," there is no settled law that directly allows or forbids such use.[6]

Technology

GitHub Copilot is powered by a distinct production version of Codex, a GPT language model fine-tuned on publicly available code from GitHub, whose Python code-writing capabilities were studied by Chen et al.[7] The idea arose from the observation that GPT-3, a language model not explicitly trained for code generation, can generate simple programs from Python docstrings.[7] Codex was first applied to coding, but was expected to be adopted in other fields as well.
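A minimal sketch of the docstring-to-program setting, assuming a Python function whose signature and docstring serve as the prompt. The name `running_max` is hypothetical, and the body is hand-written to show the kind of completion a model is asked to produce; it is not actual Codex or Copilot output.

```python
from typing import List, Optional


# Prompt: a signature plus docstring, as in docstring-to-code evaluations.
def running_max(values: List[int]) -> List[int]:
    """Return a list whose i-th element is the largest of values[0..i].

    >>> running_max([3, 1, 4, 1, 5])
    [3, 3, 4, 4, 5]
    """
    # Completion: a hand-written body of the kind the model is asked to supply.
    result: List[int] = []
    current: Optional[int] = None
    for v in values:
        current = v if current is None else max(current, v)
        result.append(current)
    return result
```

Running `python -m doctest` on a file containing this function checks the body against the example in its docstring, which is the same idea (unit tests deciding whether a completion is correct) that underlies the accuracy figures in the Accuracy section.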

Copilot was trained on public code contributed by developers on the GitHub platform.[5]

GitHub has put a few filters in place to prevent Copilot from generating offensive language, but the possibility of undesired output, including biased, discriminatory, abusive, or offensive output, still remains.[1][3]

Origin

This project is a result of Microsoft's $1 billion investment in OpenAI, the research firm now led by former Y Combinator president Sam Altman.[3]

Accuracy

GitHub benchmarked Copilot against a set of Python functions that have test coverage in open-source repositories. It blanked out the function bodies and asked GitHub Copilot to fill them in. The model got them right 43% of the time on the first try, and 57% of the time when allowed 10 attempts.[1]
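The benchmark procedure can be sketched in Python under a few assumptions. `blank_function_bodies` is a hypothetical helper that strips each function down to its signature, docstring, and a `...` placeholder, and `pass_at_k` implements the unbiased pass@k estimator that Chen et al. define for evaluating Codex.[7] GitHub does not say how its 43% and 57% figures were scored, so reading them as pass@1 and pass@10 is an interpretation, not a documented fact.

```python
import ast
from math import comb


def blank_function_bodies(source: str) -> str:
    """Hypothetical helper: replace each function body with `...`, keeping the
    signature and any docstring, so a model can be asked to fill it back in."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            if ast.get_docstring(node) is not None:
                new_body.append(node.body[0])  # keep the docstring expression
            new_body.append(ast.Expr(value=ast.Constant(value=...)))
            node.body = new_body
    return ast.unparse(ast.fix_missing_locations(tree))


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the chance that at least one of k samples passes.

    n: completions sampled for one problem
    c: how many of those completions passed the unit tests
    k: number of attempts being scored
    """
    if n - c < k:
        return 1.0  # every choice of k samples includes a passing one
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For one problem with 10 sampled completions of which 3 pass the tests, `pass_at_k(10, 3, 1)` is about 0.3 and `pass_at_k(10, 3, 10)` is 1.0; a benchmark score would average such values over all problems.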

Achievements

As of October 2021, GitHub said that about 30 percent of new code on its platform had been written with the help of GitHub Copilot.[8]

Comparison with Related Products and Models

Kite

Kite is an AI programming assistant that provides code completions for developers.[9] Kite and GitHub Copilot have been regarded as alternatives to each other.[10] Compared with GitHub Copilot, Kite currently integrates with a wider range of code editors.[11] The model used by GitHub Copilot is a modified version of GPT-3, while the model used by Kite is GPT-2.[1][9] The training set used by GitHub Copilot contains more lines of code than Kite's.[1][9]

To generate suggestions, GitHub Copilot has to upload parts of the code file the user is editing, although GitHub has stated that it does not collect private code.[1] Kite has stated that it is "fully functional for the most part without an internet connection," and that it does not send any code, or any byproducts of the code being edited, to the cloud.[12] Both GitHub Copilot and Kite send some information about the interaction between the user and the product in order to improve the product.[1][12]
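The statement that Copilot uploads "parts of the code file" can be illustrated with a simple context-window sketch: take the text immediately before and after the cursor, trimmed to a character budget. The names `CompletionContext` and `extract_context` and the 2,000/500-character budgets are assumptions for illustration; GitHub's page does not document how much of the file is sent or how it is selected.

```python
from dataclasses import dataclass


@dataclass
class CompletionContext:
    prefix: str  # text just before the cursor (the part nearest the cursor)
    suffix: str  # text just after the cursor (the part nearest the cursor)


def extract_context(
    file_text: str, cursor: int, max_prefix: int = 2000, max_suffix: int = 500
) -> CompletionContext:
    """Hypothetical: keep only the text nearest the cursor, within fixed budgets."""
    if not 0 <= cursor <= len(file_text):
        raise ValueError("cursor must lie within the file")
    prefix = file_text[max(0, cursor - max_prefix):cursor]
    suffix = file_text[cursor:cursor + max_suffix]
    return CompletionContext(prefix=prefix, suffix=suffix)
```

In this sketch, text outside the two windows is never transmitted; a tool that runs its model locally would not need to transmit either window.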

Tabnine

Tabnine is an AI programming assistant that provides code completions for developers.[13] Tabnine and GitHub Copilot have been regarded as alternatives to each other.[10] Compared with GitHub Copilot, Tabnine currently integrates with a wider range of code editors.[11] The model used by GitHub Copilot is a modified version of GPT-3, while the model used by Tabnine is GPT-2.[1][13]

GPT-3

GPT-3, or the third-generation Generative Pre-trained Transformer, is a neural network machine learning model trained on internet data to generate any type of text.[14] GPT-3 has been used to produce and classify text that resembles natural-language text written by humans.[15] OpenAI developed both GPT-3 and Codex, the model that powers GitHub Copilot.[1][14]
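GPT-3 and Codex generate text autoregressively: at each step the model scores possible next tokens given the text so far, one token is chosen, and it is appended before the next step. The toy below swaps the transformer for a hypothetical hand-written bigram table (`BIGRAMS`), purely to show that loop and why sampled outputs can differ from run to run; it is not how GPT-3 computes its probabilities.

```python
import random
from typing import Dict, List, Optional

# Hypothetical next-token probabilities. A real model computes these with a
# transformer conditioned on the whole context, not just the last token.
BIGRAMS: Dict[str, Dict[str, float]] = {
    "<s>": {"def": 0.9, "class": 0.1},
    "def": {"add": 0.6, "main": 0.4},
    "class": {"Point": 1.0},
    "add": {"(": 1.0},
    "main": {"(": 1.0},
    "Point": {":": 1.0},
    "(": {")": 1.0},
    ")": {":": 1.0},
    ":": {"</s>": 1.0},
}


def generate(max_tokens: int = 10, rng: Optional[random.Random] = None) -> List[str]:
    """Greedy decoding when rng is None; otherwise sample each next token."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        options = BIGRAMS[tokens[-1]]
        if rng is None:
            nxt = max(options, key=options.get)  # most likely next token
        else:
            nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == "</s>":
            break  # end-of-sequence token stops generation
        tokens.append(nxt)
    return tokens[1:]
```

Greedy decoding always returns `["def", "add", "(", ")", ":"]`, while sampling can also return the `main` or `class Point` variants. Seeding the random generator makes a sampled run repeatable; without control over such randomness, the same prompt can yield different completions.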

Ethical Issues

Copyright

Training an artificial intelligence model always involves collecting examples of the corresponding type.[16] Copilot was trained on public GitHub repositories under any license, containing billions of lines of public code contributed by more than 73 million developers on the GitHub platform.[4][5] GitHub has described the model as generating code from what it learned in the training set, rather than searching the training set.[1] GitHub has also acknowledged that Copilot sometimes reproduces code from the training set verbatim, but says this happens mostly when the user has not provided enough unique code of their own in the file.[17]

GitHub's then-CEO Nat Friedman stated: "In general: (1) training ML systems on public data is fair use (2) the output belongs to the operator, just like with a compiler."[18]

Legality of Using Public Data to Train Machine Learning Systems

As of June 2021, the US government had not published any official document that directly addresses the legality of using publicly available data to train artificial intelligence algorithms in this way, and the question had not been tested in court.[4] The Free Software Foundation (FSF) has funded a call for papers examining the legal issues raised by GitHub Copilot.[19] The questions posed by the FSF include whether such use can be regarded as "fair use," whether code produced by the program could amount to copyright infringement, and whether GitHub Copilot is able to detect license violations, among others.

Open Source Code Protecting Mechanisms

Prompted by the discussion around GitHub Copilot and other similar products that take advantage of public data, researchers have built a prototype called CoProtector to keep open-source code from being exploited by deep learning models.[20] The researchers state that CoProtector "utilizes data poisoning techniques to arm source code repositories for defending against such exploits."[20] They argue that, depending on the license an open-source project uses, its code may not be freely usable in this way, and that such use could amount to copyright infringement.[20]
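The general idea of arming a repository against unauthorized training can be illustrated with a toy watermark: embed a rare, secret-derived marker in the code, then check whether a model trained on it reproduces the marker. This is a hypothetical sketch in the spirit of CoProtector, not its actual technique; the names `watermark_token`, `embed_watermark`, and `shows_watermark`, the comment-based marker, the 4-space body indentation, and the 5% threshold are all assumptions.

```python
import hashlib
from typing import List


def watermark_token(owner: str, secret: str) -> str:
    """Hypothetical scheme: derive a rare, repository-specific marker from a secret."""
    digest = hashlib.sha256(f"{owner}:{secret}".encode()).hexdigest()[:12]
    return f"wm_{digest}"


def embed_watermark(source: str, token: str) -> str:
    """Insert a harmless comment carrying the marker after each one-line `def`.

    Assumes function bodies are indented by 4 spaces relative to the `def`.
    """
    lines: List[str] = []
    for line in source.splitlines():
        lines.append(line)
        stripped = line.lstrip()
        if stripped.startswith("def ") and stripped.rstrip().endswith(":"):
            indent = line[: len(line) - len(stripped)] + "    "
            lines.append(f"{indent}# {token}")
    return "\n".join(lines) + "\n"


def shows_watermark(model_outputs: List[str], token: str, threshold: float = 0.05) -> bool:
    """Flag a model whose completions reproduce the marker more often than threshold."""
    if not model_outputs:
        return False
    hits = sum(token in out for out in model_outputs)
    return hits / len(model_outputs) > threshold
```

A comment-based marker is easy to strip during data cleaning, so a practical scheme would need markers that survive such preprocessing; the sketch only shows the embed-then-verify structure.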

Commercial Product

GitHub Copilot is currently free for a limited number of users.[1] GitHub states: "If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future."[1]

Under 17 U.S.C. § 107 (Limitations on exclusive rights: Fair use), one of the factors in determining fair use is "the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes."[6] Although users may use GitHub Copilot to produce nonprofit work, GitHub Copilot also lets users apply its functionality, and its output derived from a training set that includes publicly accessible code, in commercial settings.[6]

Threat to Originality

GitHub stated that for Copilot, "the vast majority of the code that it suggests is uniquely generated and has never been seen before."[1] GitHub also stated that it is working on a filter to keep track of cases of replication and reduce how often they occur.[1]
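GitHub does not describe how its planned replication filter works. A minimal sketch of one common approach, assuming token-level n-gram matching against an index of public code, is shown below; the function names and the 8-token window are hypothetical choices for illustration.

```python
import re
from typing import Iterable, List, Set, Tuple

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(code: str) -> List[str]:
    """Split code into identifiers, numbers, and single punctuation characters."""
    return TOKEN_RE.findall(code)


def ngrams(toks: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous windows of n tokens."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def build_index(corpus: Iterable[str], n: int = 8) -> Set[Tuple[str, ...]]:
    """Collect every n-token window from the (public) training code."""
    index: Set[Tuple[str, ...]] = set()
    for doc in corpus:
        index |= ngrams(tokens(doc), n)
    return index


def replication_ratio(suggestion: str, index: Set[Tuple[str, ...]], n: int = 8) -> float:
    """Fraction of the suggestion's n-token windows that also occur in the index."""
    grams = ngrams(tokens(suggestion), n)
    if not grams:
        return 0.0  # too short to judge
    return len(grams & index) / len(grams)
```

A suggestion with a high ratio could be suppressed or shown together with its source; the window size trades false positives on short, idiomatic snippets against missed copies.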

However, it remains unclear whether GitHub Copilot, or any other AI programming assistant based on natural language processing models, is producing new code.[21] GitHub has not shown that Copilot generates new code, rather than recombining existing code in ways that avoid copyright restrictions.[21]

Ownership and Responsibility

According to GitHub, the suggestions Copilot generates and "the code you write with its help, belong to you, and you are responsible for it."[1] Without this statement, Copilot's output could be interpreted as belonging to GitHub, which would not only reduce the product's value to users but also expose GitHub to lawsuits over output that is outside its control.[21]

However, GitHub Copilot could produce code that carries one license and insert it into a project where that license does not permit it, without the user noticing.[22] If the user owns the produced code, this could create a legal problem for the user.[21]
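One partial safeguard a user could apply is to scan suggestions for explicit license markers before accepting them. The sketch below looks for SPDX license identifiers (the `SPDX-License-Identifier:` comment convention) and a few common license phrases; the phrase list and the function name `license_markers` are assumptions, and passing this check says nothing about unmarked code copied from a licensed project.

```python
import re
from typing import List

# Captures the license expression on the same line, e.g. "MIT OR Apache-2.0".
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+\-() ]+)")

# Hypothetical, deliberately short list of well-known license phrases.
LICENSE_PHRASES = (
    "GNU General Public License",
    "Licensed under the Apache License",
    "Permission is hereby granted, free of charge",
)


def license_markers(suggestion: str) -> List[str]:
    """Return SPDX expressions and known license phrases found in a suggestion."""
    found = [m.group(1).strip() for m in SPDX_RE.finditer(suggestion)]
    found += [p for p in LICENSE_PHRASES if p in suggestion]
    return found
```

A non-empty result is a signal to check the license before accepting the suggestion; an empty result is not evidence that the code is free of license obligations.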

Moreover, if the user owns the output of GitHub Copilot, then because GitHub uses at least some information about the interaction between the user and Copilot's suggested code to train Copilot's model, GitHub may be infringing rights reserved to the user under copyright law.[23]

In addition, the training set used by the model behind GitHub Copilot may itself contain code whose use infringes copyright, so it is unclear whether GitHub Copilot, which benefits from that problematic code, should also be regarded as taking part in the infringement.[23]

Concerns From Developers

Within a week of GitHub's announcement of Copilot, several developers had voiced concerns about the copyright issue on Twitter.[24] One of the posts, which had earned more than 3,000 likes at the time, stated: "GitHub scraped your code. And they plan to charge you for copilot after you help train it further."[24]

Some developers are concerned about who is responsible when a license is violated. Although GitHub has stated that only 0.1% of the code generated by Copilot is recited verbatim, developers are not sure whether the remaining 99.9% can be regarded as combinations of existing programming projects.[25] Moreover, if a coding license is violated, it is unclear whether the company that developed Copilot, the developer who used it, or the company or organization benefiting from it should face the legal consequences.[25]

The Privacy of Projects After Using Copilot

GitHub stated: "In order to generate suggestions, GitHub Copilot transmits part of the file you are editing to the service."[1] GitHub also denied that private code is collected by Copilot, but acknowledged that it collects whether users accept or reject each suggestion.[1] How much the collected information can reveal about what a developer is doing inside a private project remains unclear, and the impact may differ for projects with different security levels: developers of open-source projects can tolerate a relatively high level of information exposure, whereas developers of military projects must be wary of any external data collection.[25]

Reliability of Code

On Copilot's official website, GitHub states that Copilot does not produce perfect code: "GitHub Copilot tries to understand your intent and to generate the best code it can, but the code it suggests may not always work, or even make sense."[1] GitHub also states that users are in charge of the code, so they are responsible for testing, reviewing, and checking Copilot's suggestions.[1]

GitHub Copilot can produce untrustworthy code.[26] The model that powers it was trained on a set of code projects that included unvetted ones, so it may produce code that runs but is buggy.[26] Researchers built 89 test scenarios "relevant to high-risk cybersecurity weaknesses," drawing on MITRE's "Top 25" Common Weakness Enumeration list; of the 1,689 programs GitHub Copilot completed for these scenarios, around 40% were found to be vulnerable.[26] However, because GitHub Copilot is a closed-source system that behaves like a black box, and because it is powered by a generative model whose output for the same input can differ each time, the experimental results may not be reproducible.
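As an illustration of the kind of weakness the study [26] looked for, the sketch below contrasts an SQL query built by string formatting, which is open to SQL injection (CWE-89, an entry on MITRE's Top 25 list), with a parameterized query. The table and function names are hypothetical, and the example is not taken from the study's scenarios.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    """In-memory demo database with one admin and one ordinary user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> List[Tuple[str]]:
    # Vulnerable (CWE-89): user input is pasted directly into the SQL text.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> List[Tuple[str]]:
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version returns every user, while the parameterized version returns none. Both versions run without error, which is why such code can look correct in review.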

Replacing Developers

In June 2021, after Copilot was first announced, some people worried that GitHub Copilot, or other AI programming assistants, would gradually replace developers.[27] The model behind GitHub Copilot takes the prompts given by users as input and predicts the coding goal, while remaining under the users' control.[28] In LSE Business Review, Ravi Sawhney wrote that examples have shown the current version of GitHub Copilot can generate executable code that matches its surrounding context, but whether it is a useful tool for improving programmers' productivity remains uncertain.[28] Although GitHub Copilot appeared to lower the bar for becoming a programmer when it was first released, Sawhney believes there is not yet evidence that it could replace current programmers.[28]

References

  1. GitHub copilot · your AI pair programmer. GitHub Copilot. (n.d.). Retrieved January 27, 2022, from https://copilot.github.com/
  2. Sawers, P. (2021, June 29). GitHub launches copilot to power pair programming with ai. VentureBeat. Retrieved January 27, 2022, from https://venturebeat.com/2021/06/29/github-launches-copilot-to-power-pair-programming-with-ai/
  3. Gershgorn, D. (2021, June 29). GitHub and OpenAI launch a new AI tool that generates its own code. The Verge. Retrieved January 27, 2022, from https://www.theverge.com/2021/6/29/22555777/github-openai-ai-tool-autocomplete-code
  5. MoneyControl. (n.d.). Explained: Everything you need to know about github copilot. Moneycontrol. Retrieved January 27, 2022, from https://www.moneycontrol.com/news/technology/explained-everything-you-need-to-know-about-github-copilot-7920251.html
  6. Howard, G. D. (2021). GitHub Copilot: Copyright, Fair Use, Creativity, Transformativity, and Algorithms.
  7. Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. D. O., Kaplan, J., ... & Zaremba, W. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
  8. Yahoo! (n.d.). Ai programming tool copilot helps write up to 30% of code on github. Yahoo! News. Retrieved January 27, 2022, from https://news.yahoo.com/ai-programming-tool-copilot-helps-153003394.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAALpM1T7xnc9DESuPjGd-jHOy8tJZPKl0NXQfgqmWbutVv5qspfKUJjgreY3hcRCinYM5yz3NOW7syx4v1pImWFuUhA99mTeb3AUBvWiwChbN9mIbqdl3X_cHBA1BikviAMQAn07FaCm6NbAhu7rakAf8HWSTN1Q46wjMZnUYS8bN
  9. Kite - free AI coding assistant and code auto-complete plugin. Code Faster with Kite. (n.d.). Retrieved January 28, 2022, from https://www.kite.com/
  10. Ramnani, M. (2022, January 1). Top 8 alternatives to github copilot. Analytics India Magazine. Retrieved January 28, 2022, from https://analyticsindiamag.com/top-8-alternatives-to-github-copilot/
  11. Software. in 2022. (n.d.). Retrieved January 28, 2022, from https://slashdot.org/software/comparison/GitHub-Copilot-vs-Kite-vs-Tabnine/
  12. Kite. (n.d.). FAQ. Kite Help Desk. Retrieved January 28, 2022, from https://help.kite.com/article/105-faq
  13. Code faster with AI code completions. Code Faster with AI Code Completions. (n.d.). Retrieved January 28, 2022, from https://www.tabnine.com/
  14. Schmelzer, R. (2021, June 11). What is GPT-3? everything you need to know. SearchEnterpriseAI. Retrieved January 28, 2022, from https://www.techtarget.com/searchenterpriseai/definition/GPT-3
  15. Wikimedia Foundation. (2022, January 27). GPT-3. Wikipedia. Retrieved January 28, 2022, from https://en.wikipedia.org/wiki/GPT-3
  16. Lemley, M. A., & Casey, B. (2021, March 20). Fair learning. Texas Law Review. Retrieved January 27, 2022, from https://texaslawreview.org/fair-learning/
  17. Research recitation. GitHub Docs. (n.d.). Retrieved January 27, 2022, from https://docs.github.com/en/github/copilot/research-recitation
  18. In general: (1) training ML systems on public data is fair use (2) the output be... | Hacker News. (n.d.). Retrieved January 27, 2022, from https://news.ycombinator.com/item?id=27678354
  19. Krill, P. (2021, August 2). GitHub copilot is 'unacceptable and unjust,' says Free Software Foundation. InfoWorld. Retrieved January 27, 2022, from https://www.infoworld.com/article/3627319/github-copilot-is-unacceptable-and-unjust-says-free-software-foundation.html
  20. Sun, Z., Du, X., Song, F., Ni, M., & Li, L. (2021). CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning. arXiv preprint arXiv:2110.12925.
  21. GitHub copilot and license restrictions. zephyrtronium. (n.d.). Retrieved January 28, 2022, from https://zephyrtronium.github.io/articles/copilot.html
  23. Neil. (2021, June 30). Internet, Telecoms and tech law decoded. decodedlegal Internet telecoms and tech law decoded. Retrieved January 28, 2022, from https://decoded.legal/blog/2021/06/github-copilot-initial-thoughts-from-an-english-law-perspective
  25. Martins, S. (2021, July 16). 4 concerns I have about github copilot. Medium. Retrieved January 27, 2022, from https://betterprogramming.pub/4-concerns-about-github-copilot-b9214d5416fa
  26. Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., & Karri, R. (2021). An Empirical Cybersecurity Evaluation of GitHub Copilot's Code Contributions. arXiv preprint arXiv:2108.09293.
  27. Ramel, D. (2021, June 30). Will AI replace developers? github copilot revives existential threat angst. Visual Studio Magazine. Retrieved January 27, 2022, from https://visualstudiomagazine.com/articles/2021/06/30/github-copilot-comments.aspx
  28. Sawhney, R. (2021). Can artificial intelligence make software development more productive?. LSE Business Review.