Deepak Gupta

The Long Tail of Data Breaches

The Data That Wouldn't Die

In January 2026, AT&T finalized a $177 million settlement with the FCC - the largest telecommunications data breach settlement in U.S. history. The settlement resolved investigations into breaches that had originally occurred in 2019 and 2022. But the story of how that data traveled from AT&T's systems to a payout seven years later reveals something critical about modern breach economics that most organizations fundamentally misunderstand.

The original breach involved approximately 8.9 million AT&T wireless customer records, including Social Security numbers that had been provided to AT&T by customers and shared with a marketing vendor. The vendor was supposed to use the data for personalized campaign generation. Instead, the vendor's systems were compromised, and the data ended up on dark web marketplaces.

Here's what made this case exceptional: the Social Security numbers had been hashed - not stored in plaintext. AT&T's position was that hashed SSNs were unusable. But by 2024, security researchers demonstrated that the hashing scheme was weak enough that every single SSN could be reversed in a matter of hours on commodity hardware. An SSN is nine digits drawn from only ten possible characters, so there are at most one billion (10^9) candidates, and the predictable format narrows that further. Reversing the hashes was a brute-force exercise, not a cryptographic challenge.
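
To make the "brute-force exercise" concrete, here is a minimal sketch. The article doesn't say which hash AT&T used, so this assumes an unsalted, unkeyed SHA-256 of the nine digits; `hash_ssn` and `crack_ssn_hash` are hypothetical names. The attack simply hashes every possible SSN and compares. To keep the demo fast it fixes the first five digits, but an empty prefix runs the same loop over the full 10^9 space.

```python
import hashlib
from typing import Optional


def hash_ssn(ssn: str) -> str:
    """Assumed scheme for illustration: unsalted, unkeyed SHA-256 of the 9 digits."""
    return hashlib.sha256(ssn.encode("ascii")).hexdigest()


def crack_ssn_hash(target_hex: str, known_prefix: str = "") -> Optional[str]:
    """Hash every 9-digit SSN that starts with known_prefix; return the match or None."""
    remaining = 9 - len(known_prefix)
    if remaining < 0:
        raise ValueError("prefix longer than an SSN")
    if remaining == 0:
        return known_prefix if hash_ssn(known_prefix) == target_hex else None
    for n in range(10 ** remaining):
        candidate = known_prefix + str(n).zfill(remaining)
        if hash_ssn(candidate) == target_hex:
            return candidate
    return None


if __name__ == "__main__":
    leaked = hash_ssn("123456789")
    # Fixing five digits leaves 10**4 candidates so the demo finishes instantly;
    # known_prefix="" searches all 10**9 SSNs with exactly the same loop.
    print(crack_ssn_hash(leaked, known_prefix="12345"))  # prints 123456789
```

Nothing in the loop depends on breaking SHA-256 itself. The weakness is the tiny input space, which is why swapping in a "stronger" unkeyed hash would not have helped.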

Data that AT&T considered "protected" in 2019 was fully exposed by 2024. The $177 million settlement reflected not just the original failure to secure the data, but the compounding failure of relying on inadequate protection that degraded over time.

This is the long tail of data breaches. The incident doesn't end when you patch the vulnerability or file the breach notification. It continues for years - sometimes decades - as stolen data circulates, gets combined with other data sets, and becomes more dangerous over time.

The Breach Lifecycle

Most organizations think of a breach as an event - a point in time when data is stolen. In reality, a breach is a process that unfolds over years.

Phase 1: Initial Compromise (Months 0-3)

The data is stolen. It may be exfiltrated in bulk or siphoned slowly over weeks. The organization may or may not detect it. According to IBM's research, the average time to detect a data breach in 2025 was 194 days, meaning a typical breach goes undetected for more than six months.

Phase 2: First-Sale Market (Months 3-12)

The stolen data enters the dark web marketplace. Fresh, complete data sets command premium prices:

Data Type	First-Sale Price (per record)	Volume Required for Significant Payout
Complete identity (SSN + DOB + address)	$10-$50	10,000+
Healthcare records	$250-$1,000	1,000+
Credit card with CVV	$15-$40	5,000+
Bank account credentials	$50-$200	2,000+
Corporate email credentials	$5-$20	50,000+
Social media accounts	$1-$10	100,000+

The initial buyer is typically a criminal group that specializes in monetization - identity fraud, financial theft, or further targeted attacks.

Phase 3: Data Aggregation (Months 12-36)

This is where things get interesting - and dangerous. The stolen data begins to be combined with data from other breaches. A name and email from one breach gets merged with an SSN from another and a phone number from a third. These aggregated profiles are far more valuable than any individual data set.

Criminal data brokers operate sophisticated matching algorithms that can correlate records across hundreds of breach data sets using common identifiers - email addresses, phone numbers, and names combined with other context. The result is comprehensive identity profiles that enable highly targeted fraud.
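
The core of this kind of matching can be sketched in a few lines: normalize each identifier so trivially different spellings collide, then index every record by its normalized identifiers. This is a toy illustration, not a description of any real broker's tooling. The function names, the field names (`email`, `phone`), and the US-only phone rule are assumptions, and fuzzy matching on names is left out.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def normalize_email(email: str) -> str:
    # Case and stray whitespace are the most common mismatches.
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    digits = re.sub(r"\D", "", phone)  # keep digits only
    # Assumption: US numbers; drop a leading "1" country code from 11-digit numbers.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


NORMALIZERS = {"email": normalize_email, "phone": normalize_phone}


def build_index(datasets: Dict[str, List[Record]]) -> Dict[Tuple[str, str], List[Tuple[str, Record]]]:
    """Map each normalized (identifier type, value) to every (source, record) containing it."""
    index: Dict[Tuple[str, str], List[Tuple[str, Record]]] = defaultdict(list)
    for source, records in datasets.items():
        for rec in records:
            for field, normalize in NORMALIZERS.items():
                if rec.get(field):
                    index[(field, normalize(rec[field]))].append((source, rec))
    return index


def merged_profile(index, id_type: str, value: str) -> Record:
    """Combine every field from every record that shares one identifier."""
    profile: Record = {}
    for _source, rec in index.get((id_type, value), []):
        for field, v in rec.items():
            profile.setdefault(field, v)  # keep the first value seen per field
    return profile
```

A single lookup like this only joins records one hop apart (same email). Chaining hops, email to phone to address, is what turns scattered records into the full profiles described below.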

Phase 4: Long-Tail Exploitation (Years 2-10+)

The data continues to be valuable and often appreciates over time for several reasons:

New exploitation techniques emerge: Hashed data that was considered protected becomes crackable as computing power increases (as in the AT&T case)
Regulatory penalties compound: Class action lawsuits, FCC fines, and state attorney general actions can take years to resolve, with penalties increasing as the scope of harm becomes clear
Re-identification advances: Data that was "anonymized" becomes identifiable when combined with new data sources
Victim awareness decreases: People change passwords and freeze credit after a breach, but these protections lapse over time. Two years later, the credit freeze is lifted, the new password has been reused elsewhere, and the stolen data is usable again

Warning

Much stolen data does not decay. It appreciates. A Social Security number stolen in 2020 is just as valid in 2026. The protective measures victims take are temporary, but the stolen data is permanent.

Instagram's 17.5 Million Account Crisis

In mid-2025, security researchers at a European data protection firm disclosed that approximately 17.5 million Instagram accounts had been scraped through an API vulnerability. The data included:

Public profile information (usernames, bios, follower counts)
Email addresses associated with business accounts
Phone numbers from contact information
Location data from geotagged posts
Engagement patterns and posting schedules

Meta initially downplayed the incident, characterizing it as "scraping of publicly available information" rather than a breach. This framing was technically defensible but practically misleading. While individual data points may have been publicly visible, the aggregation of 17.5 million profiles into a searchable, downloadable database created something qualitatively different from what was publicly accessible.

The Scraping vs. Breach Distinction

This incident highlights a growing gray area in data security. When an API allows mass extraction of data that is individually public but not collectively accessible, is that a breach?

Regulators are increasingly saying yes. The European Data Protection Board ruled in 2024 that mass scraping of public data violates GDPR when it results in the creation of new databases that the data subjects didn't consent to. Several U.S. states have adopted similar positions under their consumer privacy laws.

For organizations operating APIs, the lesson is clear: rate limiting, authentication, and anomaly detection on APIs are not optional. If your API allows an automated tool to extract millions of records - regardless of whether those records are individually "public" - you have a security problem that regulators will hold you accountable for.
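
A token bucket is one common way to implement per-client rate limiting. The sketch below is a minimal in-memory version. The class names and the injectable `clock` (there so the behavior can be tested deterministically) are illustrative. A production API would typically enforce limits at a gateway or in a shared store so they hold across servers.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """One bucket per API key (or client IP), created on first use."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = TokenBucket(self.capacity, self.rate, self.clock)
        return bucket.allow()
```

`capacity` sets the tolerated burst and `rate` the sustained throughput. A tool trying to pull millions of records is held to roughly `rate` requests per second per key, which is why requiring authentication (so every request maps to a key) matters as much as the limit itself.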

The Real-World Impact

The Instagram scrape data was combined with data from previous LinkedIn and Facebook scraping incidents to create comprehensive social engineering profiles. These profiles were used to:

Craft highly convincing spear-phishing emails referencing victims' real interests, locations, and professional connections
Execute business email compromise (BEC) attacks targeting accounts with large followings
Launch credential stuffing attacks using email-password combinations from other breaches
Enable physical stalking by correlating location data with real-time social media activity

One documented case involved a series of targeted attacks against Instagram influencers with business accounts. Attackers used scraped email addresses and phone numbers to reset account passwords, then held the accounts for ransom. Over 400 accounts with follower counts exceeding 100,000 were compromised in a two-month period, with ransom demands averaging $5,000 per account.

TikTok's Privacy Crisis: 150% Surge in Deletions

In the first quarter of 2026, TikTok experienced a 150% surge in account deletions in the U.S. and European markets. The immediate trigger was a series of investigative reports revealing the extent of data collection by the platform, but the underlying cause was a cumulative loss of user trust driven by multiple incidents.

The Compounding Trust Problem

TikTok's situation illustrates how data privacy concerns compound over time:

2020: Reports emerged that TikTok's iOS app was reading clipboard contents even when the app wasn't in active use. A transparency feature in Apple's iOS 14 exposed this behavior.

2022: BuzzFeed News obtained leaked audio from internal TikTok meetings confirming that U.S. user data had been accessed by employees in China, contradicting public assurances.

2023: The Montana state legislature passed a complete ban on TikTok (later blocked by courts). Congressional hearings drew intense scrutiny to data practices.

2024: The Protecting Americans from Foreign Adversary Controlled Applications Act was signed into law, requiring ByteDance to divest TikTok or face a ban.

2025: Additional reporting revealed that TikTok's algorithmic recommendations used biometric data (face geometry, voice patterns) more extensively than previously disclosed. Privacy researchers demonstrated that TikTok's in-app browser injected JavaScript that could monitor all user interactions on external websites opened through the app.

Early 2026: The 150% deletion surge, driven by cumulative trust erosion.

The Business Impact of Privacy Failure

TikTok's advertising revenue in affected markets dropped an estimated 23% in Q1 2026 compared to the same period in 2025. Brands that had built significant marketing presences on the platform scrambled to diversify. Some major advertisers paused campaigns entirely pending clarity on data practices.

For any organization that collects user data, TikTok's trajectory is instructive. Users didn't leave because of a single incident. They left because of a pattern: each new revelation confirmed suspicions and reduced the benefit of the doubt that users were willing to extend.

Trust Event	User Response	Recovery Difficulty
First incident	Concern, media coverage, minimal behavior change	Low - "everyone makes mistakes"
Second incident	Skepticism, some users adjust privacy settings	Moderate - pattern begins to emerge
Third incident	Active distrust, privacy-conscious users leave	High - narrative is established
Fourth+ incidents	Mass action, regulatory intervention, deletion surges	Very high - trust rebuilding takes years

Note

For a detailed analysis of how consumer trust erosion translates to business impact, see my article: The Business Cost of Privacy Failures

Data Decay vs. Data Appreciation

One of the most dangerous misconceptions in data security is that stolen data becomes less useful over time. The opposite is often true.

Data That Decays

Some categories of stolen data do lose value:

Session tokens and temporary credentials: Expire within hours or days
Credit card numbers: Useful until the card is reported and reissued (typically 30-90 days)
Short-lived API keys: Rotate on schedules (hours to months)
One-time passwords: Useless after a single use window

Data That Appreciates

Other categories become more valuable over time:

Social Security numbers: Never change (in practice). Useful for identity fraud indefinitely
Medical records: Permanent. Value increases as healthcare fraud techniques mature
Biometric data: Cannot be changed. Fingerprints, face geometry, and voice patterns are permanent identifiers
Personal history and relationships: Useful for social engineering long after collection
Email addresses: Stable identifiers that persist for years, useful for account correlation across platforms
Education and employment history: Enables targeted social engineering and credential fraud

The Aggregation Multiplier

The most important factor in data appreciation is aggregation. Individual data points may have limited value. But when combined across multiple breaches, they create something far more dangerous than the sum of their parts.

Consider this example:

Breach A (2020): Email address + hashed password leaked from a retail site
Breach B (2022): Same email address + phone number leaked from a social media platform
Breach C (2023): Phone number + home address leaked from a food delivery service
Breach D (2024): Home address + SSN leaked from an insurance company

No single breach exposed enough to enable identity theft. But together, a criminal now has: email, password (cracked from the hash), phone number, home address, and SSN. That's a complete identity package - assembled from four different breaches over four years, none of which individually seemed catastrophic.
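
The chaining in this example is a connected-components problem: records are linked whenever they share any identifier, directly or through intermediate records. A union-find structure is one standard way to compute that. In this sketch the record layout, the `LINK_FIELDS` set, and the function names are assumptions, and identifier values are assumed to be normalized already.

```python
from typing import Dict, List

LINK_FIELDS = {"email", "phone", "address"}  # identifiers used to join records (assumed)


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def link_breaches(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge records that share any identifier value, directly or through a chain."""
    uf = UnionFind()
    for i, rec in enumerate(records):
        node = f"rec:{i}"
        uf.find(node)  # register records that have no link fields
        for field, value in rec.items():
            if field in LINK_FIELDS:
                uf.union(node, f"{field}:{value}")
    profiles: Dict[str, Dict[str, str]] = {}
    for i, rec in enumerate(records):
        profiles.setdefault(uf.find(f"rec:{i}"), {}).update(rec)
    return list(profiles.values())


if __name__ == "__main__":
    breaches = [
        {"email": "jdoe@example.com", "password_hash": "<hash>"},  # Breach A
        {"email": "jdoe@example.com", "phone": "5550100199"},      # Breach B
        {"phone": "5550100199", "address": "1 Main St"},           # Breach C
        {"address": "1 Main St", "ssn": "000-00-0000"},            # Breach D
    ]
    print(link_breaches(breaches))  # one profile holding all five fields
```

No pair of records shares more than one field, yet all four collapse into a single profile because each breach overlaps with the next.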

This is why "we only leaked email addresses" is never an adequate response to a breach. Those email addresses will be correlated with data from other breaches to build profiles that enable significant harm.

The Financial Timeline of a Breach

Organizations consistently underestimate how long a breach continues to cost money. Here's a realistic timeline based on aggregate data from major incidents:

Time Period	Cost Category	Typical Range
Month 1-3	Incident response, forensics, initial remediation	$500K - $5M
Month 3-6	System rebuilding, security upgrades, customer notification	$1M - $10M
Month 6-12	Regulatory investigations begin, legal defense preparation	$500K - $3M
Year 1-2	Class action lawsuits filed, regulatory fines assessed	$2M - $50M
Year 2-3	Settlements negotiated, ongoing legal costs	$5M - $100M
Year 3-5	Insurance premium increases, ongoing monitoring obligations	$1M - $10M
Year 5-7	Residual legal costs, ongoing breach monitoring for affected individuals	$500K - $5M
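
Summing the table row by row shows how the cost accumulates. The figures below are the table's illustrative ranges, not data from a specific incident. Summing all lows (or all highs) assumes every category lands at the same end of its range.

```python
from typing import Iterable, Iterator, List, Tuple

# (period, low, high) in US dollars, taken from the table above
COST_TIMELINE: List[Tuple[str, int, int]] = [
    ("Month 1-3", 500_000, 5_000_000),
    ("Month 3-6", 1_000_000, 10_000_000),
    ("Month 6-12", 500_000, 3_000_000),
    ("Year 1-2", 2_000_000, 50_000_000),
    ("Year 2-3", 5_000_000, 100_000_000),
    ("Year 3-5", 1_000_000, 10_000_000),
    ("Year 5-7", 500_000, 5_000_000),
]


def running_totals(rows: Iterable[Tuple[str, int, int]]) -> Iterator[Tuple[str, int, int]]:
    """Yield (period, cumulative low, cumulative high) after each period."""
    low = high = 0
    for period, lo, hi in rows:
        low += lo
        high += hi
        yield period, low, high


if __name__ == "__main__":
    for period, low, high in running_totals(COST_TIMELINE):
        print(f"through {period:<10} ${low / 1e6:>5.1f}M - ${high / 1e6:>6.1f}M")
```

By the end of the first year the cumulative range is $2M-$18M. By year seven it is $10.5M-$183M, so roughly 80-90% of the eventual cost lands after the first year.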

AT&T's timeline spanned from 2019 to 2026 - seven years from initial breach to final settlement. And the $177 million represented only the FCC portion. Class action settlements, state-level penalties, and internal remediation costs pushed the total far higher.

Lessons for CISOs

1. Plan for Data Longevity

When you assess the risk of a potential breach, don't evaluate it based on the data's value today. Evaluate it based on the data's value in five years, after it has been aggregated with other breaches and processing power has increased.

This means:

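Encryption standards matter more than you think: AES-256 is safe today. MD5-hashed SSNs are not, and the weakness isn't only MD5: any fast, unkeyed hash of a nine-digit value can be reversed by hashing every possible value. Choose algorithms and constructions that will withstand decades of computing advances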
Data minimization is your best protection: Data you don't collect can't be breached. Challenge every data collection practice - do you really need Social Security numbers? Do you really need to retain data for seven years?
Tokenization over encryption for identifiers: Where possible, use tokenization rather than encryption for sensitive identifiers. Vault-based tokens are random values with no mathematical relationship to the original data, so they can't be reversed regardless of computing power. The mapping lives only in the token vault, which must be protected accordingly
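
The two protections above can be sketched with the Python standard library. `pseudonymize_ssn` replaces a plain hash with a keyed HMAC-SHA256, so enumerating the one billion possible SSNs is useless without the secret key. This only holds while the key is stored separately from the data (for example in a KMS or HSM). `TokenVault` is a toy, in-memory stand-in for a tokenization service: tokens are random, so the only way back to the SSN is through the vault. Both names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


def pseudonymize_ssn(ssn: str, key: bytes) -> str:
    """Keyed HMAC-SHA256: deterministic for joins, but not enumerable without `key`."""
    return hmac.new(key, ssn.encode("ascii"), hashlib.sha256).hexdigest()


class TokenVault:
    """Toy vault: each token is random, so it carries no information about the value."""

    def __init__(self) -> None:
        self._by_token: Dict[str, str] = {}
        self._by_value: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same SSN always maps to one token.
        if value in self._by_value:
            return self._by_value[value]
        token = "tok_" + secrets.token_urlsafe(16)
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenize(self, token: str) -> Optional[str]:
        return self._by_token.get(token)
```

The trade-off: the HMAC approach needs no lookup service but fails completely if the key leaks alongside the data; the vault approach has no key to steal, but the vault itself becomes the most sensitive system you run.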

2. Treat API Scraping as a Security Event

The Instagram incident demonstrates that API abuse is a data breach by another name. Ensure your security program includes:

Rate limiting on all APIs, especially those returning user data
Anomaly detection for unusual API access patterns
Authentication requirements for bulk data access
Regular API security assessments
Monitoring for your organization's data appearing in scraped datasets
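
Rate limits cap volume, but a patient scraper can stay under them. One simple anomaly signal is how many distinct user records a single client touches within a time window, since scraping means enumerating records rather than revisiting a few. The sketch below assumes access events arrive in timestamp order per client. `ScrapeDetector`, the window, and the threshold are hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class ScrapeDetector:
    """Flag clients that access too many distinct records within a sliding window."""

    def __init__(self, window_seconds: float = 3600.0, max_distinct_records: int = 500):
        self.window = window_seconds
        self.max_distinct = max_distinct_records
        self.events: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record(self, client_id: str, record_id: str, ts: float) -> bool:
        """Log one access; return True if the client now exceeds the threshold."""
        q, counts = self.events[client_id], self.counts[client_id]
        q.append((ts, record_id))
        counts[record_id] += 1
        # Evict accesses that have fallen out of the window.
        while q and q[0][0] <= ts - self.window:
            _, old = q.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        return len(counts) > self.max_distinct
```

Counting distinct records rather than raw requests means a client that polls the same profile repeatedly stays quiet, while one walking through user IDs trips the alert even at a modest request rate.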

3. Model Your Aggregation Exposure

Map the data you hold against known breach datasets. If your customers' email addresses appear in other breaches, the data you hold becomes more valuable to attackers because it completes the profile. Services like Have I Been Pwned provide APIs that can help assess this exposure.
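
As a sketch of what such a check might look like, the code below builds (but does not send) a lookup against the Have I Been Pwned v3 `breachedaccount` endpoint and summarizes which data classes have leaked for an account. The endpoint path, the `hibp-api-key` and `user-agent` headers, the `truncateResponse=false` parameter, and the `DataClasses` field reflect our reading of the HIBP v3 documentation; verify them, the rate limits, and the terms of use against the current docs. The function names and user-agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Set

HIBP_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_breach_request(email: str, api_key: str) -> urllib.request.Request:
    """Build (but don't send) a HIBP v3 breachedaccount lookup for one address."""
    url = HIBP_BASE + urllib.parse.quote(email, safe="") + "?truncateResponse=false"
    return urllib.request.Request(url, headers={
        "hibp-api-key": api_key,
        "user-agent": "aggregation-exposure-check",  # HIBP rejects requests without one
    })


def exposed_data_classes(breaches_json: str) -> Set[str]:
    """Union of the DataClasses across every breach returned for one account."""
    return {dc for breach in json.loads(breaches_json) for dc in breach.get("DataClasses", [])}
```

Sending the request with `urllib.request.urlopen` should return a JSON list of breaches; an address that appears in no breach comes back as a 404, which `urlopen` raises as an `HTTPError`. Aggregating `exposed_data_classes` across your customer base shows which fields attackers already hold, and therefore which of your fields would complete their profiles.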

4. Build for Regulatory Endurance

The AT&T settlement came seven years after the breach. Your legal and compliance posture needs to withstand that timeline. This means:

Preserving all incident response documentation indefinitely
Maintaining relationships with outside legal counsel who specialize in breach litigation
Budgeting for multi-year legal defense costs in breach response planning
Carrying cyber insurance with adequate limits and multi-year coverage terms

Tip

For a detailed framework on calculating the long-term financial impact of data breaches and building appropriate reserves, see my analysis: The True Cost of Data Breaches

The Uncomfortable Truth

Every organization that has ever been breached - and many that haven't yet discovered they've been breached - is sitting on a ticking financial and legal liability. The data stolen from your systems in 2022 is still out there. It's being aggregated, correlated, and weaponized. The regulatory consequences haven't fully materialized yet. The class action attorneys haven't finished filing.

The long tail of a data breach is measured in years and hundreds of millions of dollars. The organizations that understand this invest differently. They invest in data minimization, in strong encryption, in breach preparedness that spans years rather than months. They treat data as a liability as much as an asset.

Because data stolen today will still be causing damage in 2033. The question is whether you'll be prepared for that reality - or whether you'll be the next organization writing a nine-figure settlement check for data you thought was protected seven years ago.