
Lawsuit Alleges OpenAI Stole Personal Data to Train ChatGPT


KEYPOINTS

  • OpenAI is being sued for allegedly stealing massive amounts of personal data, including medical records and information about children, to train its language model, ChatGPT.
  • The class-action suit claims OpenAI crawled the web without people’s permission, amassing a large dataset for training purposes.
  • The lawsuit seeks a temporary freeze on commercial access to and development of OpenAI’s products until the company implements stricter regulations and safeguards.
  • It also requests financial compensation for individuals whose data was used to train ChatGPT.

A new lawsuit has been filed against OpenAI, accusing the company of stealing vast quantities of personal data, including medical records and information about children, in order to train its language model, ChatGPT. The class-action lawsuit claims that OpenAI engaged in web crawling activities without obtaining people’s permission, amassing a large dataset for training purposes.

The lawsuit, which was filed in the US District Court for the Northern District of California, alleges that OpenAI, under the leadership of Sam Altman, conducted secret data harvesting to enable ChatGPT to replicate human language. The plaintiffs’ lawyers argue that OpenAI chose to steal personal data instead of following established protocols for the acquisition and use of such information.

According to the lawsuit, OpenAI used web crawling techniques to amass extensive amounts of data, assembling a proprietary dataset called WebText2 that was reportedly scraped from Reddit posts and the websites they link to. The lawsuit asserts that these practices gave the company access to private conversations, medical data, and various other types of personal information shared on the internet, all without the knowledge or consent of the data owners.

The lawsuit claims that the alleged data theft affected millions of Americans, including those who do not even use AI tools. The plaintiffs argue that OpenAI’s actions constitute negligence and illegal theft of personal data on a massive scale.

OpenAI has not yet responded to requests for comment regarding the lawsuit.

Additionally, the lawsuit alleges that OpenAI not only collected data from the general public’s online activities but also stored and disclosed users’ private information. This includes details provided during the creation of OpenAI accounts, chat logs, and social media information. The plaintiffs claim that data from users of integrated applications such as Snapchat, Stripe, Spotify, Microsoft Teams, and Slack was also collected without proper consent. However, none of the mentioned companies have commented on the allegations.

The lawsuit seeks a temporary freeze on commercial access to and development of OpenAI’s products until the company implements stricter regulations and safeguards. The plaintiffs are also pushing for the ability to opt out of data collection and for measures to prevent OpenAI’s products from surpassing human intelligence and causing harm. Furthermore, the lawsuit requests financial compensation for individuals whose data was used to train ChatGPT.

In addition to OpenAI, major backer Microsoft has been named as a co-defendant in the lawsuit.

To protect their privacy, the plaintiffs have been identified in the lawsuit only by their initials, occupations, and states of residence.

Generative AI, which can produce text, audio, images, and video, has surged in popularity since OpenAI’s release of ChatGPT. While the technology has been adopted for personal, professional, and academic purposes, concerns have been raised about potential misuse and access to personal data.

Earlier this year, Italy implemented a temporary ban on ChatGPT, citing privacy concerns and the lack of a legal basis for the mass collection and storage of personal data used to train the algorithms behind the language model. Several companies, including Amazon and Microsoft, have cautioned employees against entering confidential information into the chatbot. Samsung has gone a step further by banning the use of generative AI tools by its staff.

The lawsuit underscores the potential risks associated with AI platforms. While AI has the potential to bring about positive advancements, there are concerns about its impact on job markets, the dissemination of false information, and the potential for malicious use. OpenAI’s creators have suggested that AI could surpass human expertise in many domains within the next decade, leading some critics to raise concerns about the technology’s potential existential risks.

The outcome of the lawsuit, and its implications for OpenAI’s data practices and the development of AI technology, remains to be seen.

Steve Rick

Steve Rick is an AI researcher and author specializing in natural language processing (NLP). He has published articles on the transformative power of AI.