OpenAI hit with a class action lawsuit! Did it steal millions of users' personal information? The star model becomes a "data thief"!

   "Despite having an agreement to purchase and use personal information, the defendants took a different approach: stealing." Recently, a law firm took OpenAI to court in a 157-page lawsuit, alleging that it was motivated by profit. Under this circumstance, a large amount of personal information is stolen to train artificial intelligence models.

   The complaint says that the scale of OpenAI's data scraping was unprecedented. The company allegedly scraped roughly 300 billion words from the Internet, including books, articles, websites, and posts, among them personal information taken without consent. This theft, with an estimated millions of victims and potential damages of $3 billion, allegedly violated terms-of-service agreements as well as state and federal privacy and property laws.

   "By harvesting the previously obscure personal data of millions of people and appropriating it to develop an unstable, untested technology, OpenAI put everyone in a zone of immeasurable risk that is unacceptable by any measure of responsible data protection and use," said Timothy K. Giordano, a partner at the law firm.

   The plaintiffs are therefore asking the court to temporarily freeze commercial access to, and further development of, OpenAI's products. Their demands include allowing people to opt out of data collection and preventing the products from surpassing human intelligence and harming others. In addition to OpenAI, Microsoft, its main backer, is also named as a defendant.

   OpenAI is not the only company scraping the Internet for vast amounts of data to train AI models; Google, Meta, Microsoft, and a growing number of others are doing the same. But a partner at the law firm said they decided to pursue OpenAI because ChatGPT's launch last year spurred even bigger competitors to release their own AI products, which made OpenAI the natural first target.

   As large models built on data flourish, data security is becoming increasingly important. Whether OpenAI collects and uses personal information legally and reasonably in accordance with its privacy policy, and whether it can effectively identify and remove personal information "accidentally" included in its training data sources, is therefore likely to be the central point of dispute in this lawsuit.
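   To make concrete what "identifying and removing personal information" from a training corpus involves, here is a minimal, purely illustrative Python sketch of a regex-based PII scrubber. Everything in it (the patterns, the placeholder tokens, the function name scrub_pii) is a hypothetical assumption for illustration, not OpenAI's actual pipeline; real compliance pipelines typically rely on far more sophisticated, often ML-based, detection.

```python
import re

# Hypothetical, minimal PII scrubber: masks e-mail addresses and
# US-style phone numbers in raw training text. Purely illustrative;
# production pipelines use much more thorough detection.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub_pii(text: str) -> str:
    """Replace detected PII with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

if __name__ == "__main__":
    sample = "Reach Jane at jane.doe@example.com or 555-123-4567."
    print(scrub_pii(sample))  # Reach Jane at [EMAIL] or [PHONE].
```

   Even a sketch like this hints at the difficulty: names, addresses, and context-dependent identifiers resist simple pattern matching, which is precisely why "accidental" inclusion of personal data is so hard to rule out.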

   One wave has barely subsided when another rises. According to Reuters, two more authors have sued OpenAI in federal court in San Francisco, claiming that OpenAI misused their works to train ChatGPT, mining data from thousands of books without permission and infringing their copyrights.

   According to public reports, after ChatGPT was found to have accidentally leaked users' chat histories in March this year, the Italian data protection authority announced at the end of that month that it would temporarily ban ChatGPT while investigating the tool's alleged violation of privacy rules. Canada is also investigating complaints that OpenAI "collected, used and disclosed personal information without consent".

   In April this year, Reddit announced that it would charge companies to call its API, because companies such as OpenAI and Google were using data on the platform to train their models. All at once, problems surrounding OpenAI's training data were being exposed one after another.

   Generative AI products built on large models are an exercise in "brute-force aesthetics", powered by computing and data. Data is the barrier to entry, and the massive corpora behind these models carry a high degree of compliance risk. With 100 million users and billions of visits, ChatGPT, like the tall tree that catches the wind, bears the brunt of these problems.

   However, this is not a problem unique to OpenAI or ChatGPT. The data security issues they have exposed, such as privacy leaks, storage of sensitive information, and unauthorized access, are common risks that any large-model product may face once deployed. Since ChatGPT's release, Chinese companies have launched more than 70 foundation models. With large models mushrooming, achieving data compliance in the commercialization process ahead has become a "must-answer question" for every product.

Summary

   The wave of AI will not stop. How to steer well and strike a balance between business survival and compliant operation has become a defining question of the fourth industrial revolution. For enterprises that have released, or are about to release, foundation models, ensuring data compliance will be one of the issues they must confront.
