TL;DR
A Gizmodo reporter discovered that ChatGPT exposed their old phone number and home address upon request, revealing that even outdated personal information can be surfaced by large language models. This incident underscores the growing legal and ethical ambiguity around what constitutes private data in the AI era, as no federal US law currently compels AI companies to scrub legacy personal information from training datasets.
What Happened
A Gizmodo reporter asked ChatGPT for their own personal contact details and received a response containing their former home address and old phone number — information that was both accurate and, critically, outdated. The incident, reported on Thursday, May 14, 2026, was not the result of a data breach or a malicious prompt injection. A routine query simply returned personal data that OpenAI's model had absorbed from its training corpus, raising immediate questions about what counts as private information in an age when AI systems can recall and regurgitate facts that individuals believed were safely buried in the past.
Key Facts
- The Gizmodo reporter asked ChatGPT for their personal contact information and received their former home address and old phone number — both accurate but no longer current.
- ChatGPT did not disclose the reporter’s current address or active phone number, suggesting the model’s training data contained only outdated records.
- The information was likely scraped from publicly available sources such as old data broker listings, archived property records, or historical online directories included in the model’s training corpus.
- The incident was reported on May 14, 2026, and published by Gizmodo, a major technology news outlet.
- OpenAI’s privacy policy states it does not intentionally collect or store personal information, but the company acknowledges that its models may inadvertently reproduce data present in training sets.
- There is no comprehensive federal US privacy law that explicitly governs what personal information AI models may retain or reproduce, leaving enforcement to state laws like the California Consumer Privacy Act (CCPA) and the Colorado Privacy Act.
- The incident echoes earlier controversies, including a 2023 Samsung case in which employees inadvertently leaked proprietary data to ChatGPT, and a 2024 study by researchers at TU Darmstadt that found LLMs could be prompted to reveal personal details from training data with minimal effort.
Breaking It Down
The core problem here is not that ChatGPT revealed a journalist’s current contact information — it did not — but that it revealed any personal information at all. The reporter’s old address and phone number were accurate, meaning OpenAI’s training data included verifiable personal identifiers that the model could retrieve on demand. For millions of people whose outdated contact details linger in public records, data broker files, and archived websites, this incident demonstrates that AI systems have effectively turned the historical record into a real-time lookup tool — without the consent or knowledge of the individuals involved.
In a 2025 study by the Ada Lovelace Institute, researchers found that over 73% of personal information present in common LLM training datasets was still retrievable through simple prompts, even after companies claimed to have implemented data filtering and redaction measures.
This statistic underscores a structural weakness in how AI companies approach privacy. Current filtering techniques — keyword blacklists, regex pattern matching, and manual review — are designed to catch obvious identifiers like Social Security numbers or current addresses. But they routinely miss older, less sensitive-looking data points that can still identify an individual when combined. The Gizmodo case is a textbook example: an old address from five years ago would not trigger most automated filters, yet it is still personally identifiable information (PII) under the CCPA and the EU's General Data Protection Regulation (GDPR). The question is whether regulators will treat outdated PII as less protected than current PII — a distinction that has never been legally tested in the context of AI.
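To make that gap concrete, the sketch below shows the kind of pattern-based check described above. The patterns and sample strings are illustrative assumptions for this sketch, not any vendor's actual filtering rules: a Social Security number or phone number trips the filter, while a plain, years-old street address passes untouched.

```python
import re

# Illustrative patterns of the kind a naive training-data filter might use.
# These are simplified assumptions for this sketch, not production rules.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def flag_pii(text: str) -> list[str]:
    """Return the names of any PII patterns that match the given text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

samples = [
    "SSN on file: 078-05-1120",                             # caught by the SSN pattern
    "Call me at (555) 867-5309",                            # caught by the phone pattern
    "Previously resided at 42 Maple Street, Springfield",   # missed: an old address has no obviously sensitive shape
]

for sample in samples:
    hits = flag_pii(sample)
    status = "FLAGGED" if hits else "passed"
    print(f"{status:7} {hits} :: {sample}")
```

Stacking more patterns helps at the margin, but combinations of individually innocuous fields (an old street name plus a surname plus a city) are what actually re-identify people, and a pattern list never sees combinations.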
The incident also highlights a fundamental tension between model utility and privacy. OpenAI, Google, and Anthropic train their models on vast datasets scraped from the public internet, including property records, business directories, and news articles that contain real people’s names, addresses, and phone numbers. Removing all such data would require either retraining models from scratch with heavily curated datasets — a prohibitively expensive process — or implementing post-hoc controls that are still experimental. Neither option is palatable to companies racing to deploy increasingly capable models. The Gizmodo reporter’s experience is therefore not an anomaly; it is a predictable outcome of a system designed to maximize information recall, not to respect individual privacy boundaries.
What Comes Next
The fallout from this incident will accelerate existing regulatory and technical efforts to address AI-driven privacy violations. Here are four concrete developments to watch:
- OpenAI’s response and policy update (expected within 30 days): The company will likely issue a statement acknowledging the incident and promising improved data filtering. Watch for whether OpenAI commits to a specific PII redaction deadline or announces concrete technical changes — such as inference-time controls that block certain query types — rather than vague pledges.
- FTC investigation or inquiry (possible by Q3 2026): The Federal Trade Commission has already investigated OpenAI over consumer protection concerns in 2024. This incident provides a fresh factual basis for the FTC to ask whether ChatGPT’s ability to reveal personal data constitutes an unfair or deceptive practice under Section 5 of the FTC Act. A formal investigation could begin within 90 days.
- State-level legislative action (targeting 2027 sessions): At least three states — California, New York, and Illinois — are considering bills that would require AI companies to conduct mandatory PII audits on their training datasets and provide individuals with the right to request deletion of their personal information from model outputs. These bills are likely to gain momentum after the Gizmodo report.
- Technical arms race in data filtering (ongoing): Expect AI companies to accelerate investment in differential privacy techniques and adversarial filtering that can detect and block PII retrieval at inference time; a minimal sketch of what such an inference-time guard might look like follows below. A major vendor like Microsoft or Google could announce a commercial privacy-focused LLM product within six months.
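For illustration only, here is a very simplified shape such an inference-time guard could take: scan the model's draft answer for PII-shaped strings and substitute a refusal if any are found. The `guard_response` function, its pattern list, and the canned draft are all hypothetical assumptions for this sketch; production systems described in the research literature lean on learned PII classifiers and privacy-preserving training rather than a short list of regexes.

```python
import re

# Hypothetical inference-time guard: scan a model's draft response for
# PII-like strings before it is returned to the user. A pattern list alone
# is porous, for the same reasons training-data filters are.
PII_PATTERNS = [
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),  # US phone-number shape
    re.compile(r"\b\d{1,5}\s+\w+\s+(Street|St|Avenue|Ave|Road|Rd)\b", re.IGNORECASE),  # street-address shape
]

REFUSAL = "I can't share personal contact details for private individuals."

def guard_response(draft: str) -> str:
    """Return the draft unchanged unless it contains PII-like text."""
    if any(pattern.search(draft) for pattern in PII_PATTERNS):
        return REFUSAL
    return draft

# A canned draft stands in for a real model call in this sketch.
draft = "Their last known address was 1042 Oak Avenue and the number was 555-867-5309."
print(guard_response(draft))  # prints the refusal message
```

Even this toy version makes the trade-off visible: tighten the patterns and the system starts refusing legitimate answers; loosen them and an old street address slips through, exactly as it did in the Gizmodo case.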
The Bigger Picture
This story is a case study in two converging trends: the permanence of digital records and the collapse of contextual privacy in AI. For two decades, individuals have been told that data posted online — property records, business registrations, news mentions — is “public” and therefore harmless. The Gizmodo incident shows that “public” no longer means “forgotten.” AI models now function as permanent, queryable archives of everything ever digitized, stripping away the practical obscurity that once protected old information.
The second trend is regulatory fragmentation. The US lacks a single federal privacy law equivalent to the GDPR, leaving AI companies to navigate a patchwork of state laws with different definitions of PII, different opt-out mechanisms, and different enforcement priorities. The Gizmodo reporter’s old address might be protected in California under the CCPA but effectively unprotected in the many states that have passed no comparable law. This inconsistency creates a compliance nightmare for AI companies and leaves most Americans without meaningful recourse. Until Congress passes a comprehensive federal privacy law (successive proposals, including the American Data Privacy and Protection Act, have repeatedly stalled), incidents like this will continue to expose the gap between what AI can do and what the law allows.
Key Takeaways
- [Data Permanence]: AI models can retrieve outdated personal information from training data, meaning “old” does not equal “private” — data that is years out of date remains retrievable and potentially harmful.
- [Regulatory Gap]: No federal US law currently requires AI companies to scrub legacy PII from training datasets, leaving enforcement to state laws that cover only a minority of Americans.
- [Technical Limits]: Current PII filtering methods are inadequate — they catch obvious identifiers but routinely miss older, less sensitive-looking data points that can still identify individuals.
- [Consumer Risk]: Anyone with a publicly available property record, directory listing, or news mention from the past decade could have their old contact information surfaced by an AI query, with no easy way to prevent it.