"Please don't upload my code on GitHub!"


Compiled by | Zheng Liyuan

Produced by | CSDN (ID: CSDNnews)

For most programmers, GitHub is a magical open source community: it offers rich learning materials and the code of famous projects, newcomers can talk directly with programming masters, and helping others fix their bugs is also a way to improve yourself...

So it may be hard to imagine that GitHub could one day become something developers push back against: an article written by a developer (hereinafter referred to as "T") made the Hacker News front page today, under the title "Please don't upload my code on GitHub!"


 "Culprit": Copilot

To be honest, many people's first reaction to this title is probably: what's wrong with GitHub, is there a problem with it? On this point, T gets straight to the point in the article: "GitHub has many problems, the most noteworthy of which is a feature called Copilot."

That's right: Copilot, hailed by programmers as an "AI coding wonder tool", is the "culprit".

According to the official introduction, GitHub Copilot is an AI pair programmer powered by Codex, a generative pre-trained AI model created by OpenAI: "It helps you write code faster with less work, drawing context from comments and code to instantly suggest individual lines of code and entire functions."
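
To make that comment-to-code workflow concrete, here is a minimal, purely hypothetical sketch (not actual Copilot output): the developer types only the comment and the function signature, and an assistant like Copilot proposes a plausible body.

```c
#include <stddef.h>

/* Return the index of the largest element in arr, or -1 if len is 0.
   (Hypothetical illustration: only this comment and the signature are
   written by the developer; the body is the kind of completion an AI
   assistant such as Copilot might suggest.) */
int index_of_max(const int *arr, size_t len)
{
    if (len == 0)
        return -1;

    size_t best = 0;
    for (size_t i = 1; i < len; i++) {
        if (arr[i] > arr[best])
            best = i;   /* remember the position of the running maximum */
    }
    return (int)best;
}
```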

Sounds smart and efficient: the AI helps you generate code. But the question is, how did Copilot learn to write code?

▶ “GitHub Copilot is trained on billions of lines of code to turn natural language prompts into coding suggestions in dozens of languages.”

▶ “The OpenAI Codex was trained on open-source code and natural language, so it’s applicable to both programming and human language... trained on public natural language text and source code, including code in public repositories on GitHub.”

Put simply, GitHub Copilot can indeed generate suitable code, but at its root that code may come from code written by other people, and this obviously raises code copyright issues.


Bypassing the GPL license?

One thing needs to be made clear: open source does not mean we can freely use the source code to do whatever we want. That is exactly why there are many different open source licenses (such as the GPL and LGPL), which stipulate the scope of free use of open source code and the rights that come with it.

Take the most famous open source license, the GPL, as an example. Its core requirement: as long as GPL-licensed code is used in a piece of software, that software must also be released under the GPL, meaning it must be open source and free.

Here is the problem: many of the billions of lines of open source code that Copilot learned from, including code in public repositories on GitHub, are under the GPL. Yet when Copilot generates code snippets, it displays neither the original authors' information nor any reminder of the relevant open source license.

As a result, GPL-licensed code can end up written into proprietary or commercial projects that are not open source, which not only violates the terms of the license but also infringes the intellectual property rights of the original authors.

Therefore, as an open source developer, T appeals in the article: "We are tired of this legal abuse, and we want it to stop now! That is why we ask you, our fellow developers in the open source community, not to upload our code to GitHub. In short, we want to protect our work."


The Copilot Controversy Continues

With the AIGC boom of recent months, generative AI tools such as Midjourney, Stable Diffusion, and Copilot have drawn wide attention, but the copyright status of what these AIs produce has also been pushed into the spotlight. In fact, controversy over Copilot has continued ever since its release, and it mostly revolves around one question: is the code Copilot generates original, or copied?

In fact, less than a week after Copilot was released, some developers found hard evidence that Copilot "copies code": a snippet recommended by Copilot carried the original code's "WTF" comment verbatim.
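
The screenshot circulating at the time is widely understood to show Copilot reproducing the famous fast inverse square root routine from Quake III Arena, whose source code was released under the GPL-2.0, original comments included. For reference, that routine reads roughly as follows (quoted here from the well-known original, not generated by Copilot):

```c
/* Fast inverse square root as found in the GPL-2.0-licensed Quake III Arena
   source; Copilot was reported to suggest this snippet essentially verbatim,
   infamous comments and all. */
float Q_rsqrt(float number)
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = *(long *)&y;                      /* evil floating point bit level hacking */
    i  = 0x5f3759df - (i >> 1);            /* what the fuck? */
    y  = *(float *)&i;
    y  = y * (threehalfs - (x2 * y * y));  /* 1st iteration */

    return y;
}
```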


The incident caused quite a stir in the community at the time. Some developers said that, because Copilot seriously violated the rights of copyright holders, they would no longer use GitHub:


As the controversy kept growing, Nat Friedman, then CEO of GitHub, responded on Twitter in 2021:

In general: (1) training ML systems on public data is fair use (2) the output belongs to the operator, just like with a compiler.

We expect IP and AI to be an interesting policy discussion around the world in the coming years, and we're eager to be a part of it!


Since then, however, Microsoft and GitHub have taken no relevant action and made no statement regarding the copyright and open source license disputes over the code Copilot generates.

Now, claims that "Copilot bypasses the GPL" and "Copilot turns open source code into commercial works" are only growing louder. So, as a developer, what do you think about all this?

Reference link:

https://nogithub.codeberg.page/

https://news.ycombinator.com/item?id=35859142

https://twitter.com/natfriedman/status/1409914420579344385

