Deepak Gupta

DDoS and Rate-Limiting for Authentication Endpoints

Updated 2026-05-15 · 11 min read · By @guptadeepak

Key takeaways

Authentication endpoints are asymmetric targets: each request is small, but the server work it triggers is large (password hashing, MFA dispatch, session creation), so the cost of a flood falls disproportionately on the defender.
Rate limiting must be composite: per-IP, per-IP-range, per-ASN, per-user, per-device, per-geo, each with its own thresholds.
Per-user lockout alone is defeated by credential stuffing (the attacker spreads attempts across many users). The defense requires the broader composite.
Edge defense (CDN, WAF, dedicated DDoS protection) handles volumetric layers; application defense handles per-user and behavioral tiers. Both are required.
The hardest tuning problem: tight rate limits also lock out legitimate users when an attacker stuffs many usernames. Score-based gates outperform binary thresholds.

Why auth endpoints are prime targets

The threat model decomposes into three layers, each requiring its own defense:

Volumetric DDoS: high-rate floods (HTTP, SYN, UDP amplification) aimed at exhausting capacity. Edge defense handles this.
Application-layer DDoS: targeted requests that look legitimate individually but overwhelm the application logic in aggregate. The defense is application rate limiting plus bot defense.
Credential stuffing under cover: account-takeover (ATO) campaigns that run at moderate volume during a volumetric event, so the noise hides them and the eventual takeover gets attributed to the DDoS. The defense is composite rate limiting plus credential monitoring.

The composite rate-limiting model

Single-dimension rate limiting is defeated by attackers who control the dimension you're not limiting on. Composite rate limiting layers thresholds across many dimensions:

Per-IP. Simplest, cheapest, defeats unsophisticated single-source attacks. Defeated by distributed attacks from many IPs.

Per-IP-range / per-CIDR. Catches attacks from cloud-provider IP ranges or residential-proxy networks where individual IPs are distinct but ranges concentrate.

Per-ASN (autonomous system). Caps traffic from entire networks. Effective against ASNs that legitimate traffic doesn't typically originate from (e.g., a sudden spike from a hosting-provider ASN for a B2C product).

Per-user / per-username. Caps failed attempts against one account. Defeats targeted-account attacks; defeated by credential stuffing across many usernames.

Per-device fingerprint. Caps attempts from one device across multiple usernames. Effective against the credential-stuffing pattern where one tool tries many accounts.

Per-geography. Tighter thresholds for high-risk geographic origins; permissive for the typical user distribution.

Per-session / per-cookie. Caps abuse within an authenticated session.

Behavioral. Caps based on velocity changes — sudden spike in any dimension is throttled regardless of absolute threshold.

The thresholds for each are tuned against legitimate traffic patterns. Setting all of them tight enough to stop sophisticated attacks also breaks legitimate users; score-based gates that consider the composite picture outperform individual binary thresholds.
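As a minimal sketch of the composite model, the following tracks four of the dimensions above with in-memory sliding windows and reports how much of each limit an attempt has used. The class names, the `LIMITS` values, and the /24 and /48 grouping are illustrative assumptions, not recommended settings; a production deployment would typically keep these counters in a shared store.

```python
import ipaddress
import time
from collections import defaultdict, deque

# Hypothetical limits per dimension: (max events, window in seconds).
LIMITS = {
    "ip": (20, 60),
    "cidr": (200, 60),
    "user": (5, 900),
    "device": (10, 300),
}


class SlidingWindowCounter:
    """Counts events per key within a trailing time window."""

    def __init__(self):
        self._events = defaultdict(deque)

    def hit(self, key, window, now):
        q = self._events[key]
        q.append(now)
        # Drop timestamps that have aged out of the window.
        while q and q[0] <= now - window:
            q.popleft()
        return len(q)


def cidr_key(ip):
    """Collapse an address to its /24 (IPv4) or /48 (IPv6) network."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


class CompositeLimiter:
    def __init__(self, limits=LIMITS):
        self.limits = limits
        self.counters = {dim: SlidingWindowCounter() for dim in limits}

    def record(self, ip, username, device_id, now=None):
        """Record one attempt; return each dimension's usage as count / limit."""
        now = time.monotonic() if now is None else now
        keys = {
            "ip": ip,
            "cidr": cidr_key(ip),
            "user": username.lower(),
            "device": device_id,
        }
        usage = {}
        for dim, (max_events, window) in self.limits.items():
            count = self.counters[dim].hit(keys[dim], window, now)
            usage[dim] = count / max_events
        return usage


# One device trying six different usernames from one address.
limiter = CompositeLimiter()
for i in range(6):
    usage = limiter.record("203.0.113.7", f"user{i}", "device-abc", now=float(i))
print(usage)
```

In this run the per-user usage stays at 0.2 for every attempt because each username is tried once, while the per-device usage climbs to 0.6. That is the signal a per-user threshold alone would never surface.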

Edge defense vs application defense

The clean split:

Edge layer (CDN, WAF, dedicated DDoS protection) handles:

Volumetric attacks (Gbps-scale floods).
IP reputation filtering.
ASN and geographic filtering.
Initial bot fingerprinting (TLS JA3/JA4, HTTP header anomalies).
Pre-application rate limiting on raw request volume.

Vendors: Cloudflare, Fastly, AWS Shield, Akamai, Imperva, Radware.

Application layer handles:

Per-username, per-account thresholds.
Behavioral signals invisible to edge (mouse patterns, typing cadence).
Credential-stuffing detection (which usernames are being tried, which are succeeding).
MFA challenge rate limits.
Account-takeover scoring across signals.

Both are required. Edge alone misses the per-account view; application alone gets crushed by volume before it can do anything intelligent. The composite is the production target.

The credential-stuffing problem specifically

Credential stuffing is the recurring rate-limiting bypass. The attack pattern:

The attacker has a credential dump from a breach of some other service: millions of username/password pairs.
The attacker spreads attempts across many target accounts at a moderate per-account rate (1-3 attempts per username over an hour).
Per-user lockout never activates (each username sees few attempts).
Per-IP rate limits are also bypassed because the attacker uses distributed proxies.
The few percent of credentials that work produce account takeovers.
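The arithmetic behind that pattern can be illustrated with a small simulation. The numbers here (10,000 accounts, 2 attempts each, 5,000 proxies, a lockout threshold of 5) are assumptions chosen for illustration, not measurements from a real campaign.

```python
from collections import Counter

LOCKOUT_THRESHOLD = 5  # hypothetical per-username failure limit


def simulate_stuffing(num_accounts, attempts_per_account, num_proxies):
    """Spread failed logins across accounts and proxy IPs, round-robin."""
    per_user = Counter()
    per_ip = Counter()
    attempt = 0
    for _ in range(attempts_per_account):
        for account in range(num_accounts):
            per_user[f"user{account}"] += 1
            per_ip[f"proxy{attempt % num_proxies}"] += 1
            attempt += 1
    return per_user, per_ip


per_user, per_ip = simulate_stuffing(
    num_accounts=10_000, attempts_per_account=2, num_proxies=5_000
)
locked_out = sum(1 for n in per_user.values() if n >= LOCKOUT_THRESHOLD)

print("total failed logins:", sum(per_user.values()))             # 20000
print("max failures on any one username:", max(per_user.values()))  # 2
print("max failures from any one IP:", max(per_ip.values()))        # 4
print("accounts locked out:", locked_out)                          # 0
```

Twenty thousand failed logins pass through without a single lockout, and no proxy exceeds four requests. Only an aggregate view (total failure volume, failure ratio, distinct usernames per device or range) sees the campaign.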

The defenses that hold against this pattern:

Credential monitoring: check incoming login credentials against breach databases via the k-anonymity protocol. Force step-up MFA or password reset for credentials known to be breached. See Credential Monitoring.
Adaptive auth: even when the credential is correct, require step-up MFA for logins with anomalous signals (new device, new geography, behavioral mismatch). See Adaptive Risk-Based Authentication.
Pattern detection at scale: cross-account analysis to detect coordinated low-rate attacks that no single-account threshold catches. CIAM platforms with strong threat intelligence (Auth0 Bot Detection, Microsoft Entra Identity Protection, Okta ITP) ship this.
Phishing-resistant primary: passkeys eliminate the credential-stuffing surface entirely — no shared secret to stuff.
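For the credential-monitoring defense, here is a minimal sketch of a k-anonymity lookup against the Have I Been Pwned Pwned Passwords range endpoint: only the first five hex characters of the SHA-1 hash leave the server, and the match is done locally against the returned suffixes. The function names and the `User-Agent` string are illustrative assumptions; error handling, caching, and the policy applied to a hit are left out.

```python
import hashlib
import urllib.request

# Pwned Passwords range endpoint; responses are lines of "HASH_SUFFIX:COUNT".
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password):
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def fetch_range(prefix):
    """Fetch all known hash suffixes that share the 5-character prefix."""
    req = urllib.request.Request(
        RANGE_URL + prefix, headers={"User-Agent": "breach-check-sketch"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


def breach_count(password, fetch=fetch_range):
    """Return how many times the password appears in the breach corpus (0 if none)."""
    prefix, suffix = split_hash(password)
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0
```

A login handler could call `breach_count(submitted_password)` after a successful verification and route any non-zero result to step-up MFA or a forced reset. The `fetch` parameter exists so the lookup can be stubbed in tests or pointed at a local mirror.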

Hashing-cost throttling

Specific to KDF-based password verification: every login attempt runs Argon2id (or bcrypt or scrypt), which deliberately costs 100-500ms of CPU. An attacker who floods the login endpoint with username/password pairs exhausts application CPU long before the database becomes the bottleneck.
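A back-of-the-envelope calculation shows how little traffic that takes. The 8-core server is an assumption for illustration, and the function gives an upper bound that ignores everything else the host has to do.

```python
def max_logins_per_second(cores, hash_ms):
    """Upper bound on password verifications per second when each hash occupies one core."""
    return cores * 1000 / hash_ms


for hash_ms in (100, 250, 500):
    rate = max_logins_per_second(cores=8, hash_ms=hash_ms)
    print(f"{hash_ms} ms per hash -> at most {rate:.0f} logins/s on 8 cores")
```

At 100, 250, and 500 ms per hash, the ceiling is 80, 32, and 16 verifications per second respectively, so a flood of a few dozen requests per second is enough to saturate the hashing tier on such a host.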

Defenses:

Pre-hash rate limiting: throttle requests at the edge or in application middleware before invoking the KDF. Reject obviously-bogus traffic without paying the hashing cost.
Failed-attempt short-circuit: track failed attempts and skip the hash entirely for usernames that have already exceeded the failure threshold.
Username-existence check timing: ensure the timing of failed-on-known-username vs failed-on-unknown-username is constant, so attackers can't enumerate users by timing the auth response.
Async hashing with timeout: if the hashing pipeline is over capacity, fail fast with a "try again" rather than queueing requests indefinitely.

Modern CIAM platforms handle most of this automatically; self-hosted deployments need explicit attention.
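For self-hosted deployments, the defenses above might be combined along these lines. This is a sketch under stated assumptions: it uses scrypt from the standard library with deliberately small parameters so it runs quickly, keeps users and failure counts in plain dictionaries, and uses a semaphore of 4 as a hypothetical hashing-capacity cap. All names and thresholds are illustrative.

```python
import hashlib
import hmac
import os
import threading

# Small scrypt parameters so the sketch runs fast; production values are much higher.
SCRYPT = dict(n=2**12, r=8, p=1, dklen=32)
MAX_FAILURES = 5                            # hypothetical per-username threshold
HASH_SLOTS = threading.BoundedSemaphore(4)  # hypothetical cap on concurrent hashes


def kdf(password, salt):
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT)


def make_record(password):
    salt = os.urandom(16)
    return salt, kdf(password, salt)


# Unknown usernames are hashed against this record so timing stays comparable.
DUMMY_RECORD = make_record("dummy-password-never-matches")


def verify_login(users, failures, username, password):
    """Return 'ok', 'denied', 'locked', or 'busy'."""
    if failures.get(username, 0) >= MAX_FAILURES:
        return "locked"  # threshold already exceeded: skip the KDF entirely
    if not HASH_SLOTS.acquire(blocking=False):
        return "busy"    # over capacity: fail fast instead of queueing
    try:
        salt, expected = users.get(username, DUMMY_RECORD)
        candidate = kdf(password, salt)  # same work whether or not the user exists
    finally:
        HASH_SLOTS.release()
    if username in users and hmac.compare_digest(candidate, expected):
        failures.pop(username, None)
        return "ok"
    failures[username] = failures.get(username, 0) + 1
    return "denied"


users = {"alice": make_record("correct horse")}
failures = {}
print(verify_login(users, failures, "alice", "correct horse"))
print(verify_login(users, failures, "nobody", "guess"))
```

Failures are counted for unknown usernames as well, so the "locked" response does not reveal which accounts exist. In a real system the failure counts would need expiry and a bounded store, which this sketch omits.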

The MFA channel defense

MFA endpoints are their own attack surface. Push notifications, SMS, email all have channel-specific costs (SMS especially — each message costs real money). Attackers can:

Trigger MFA-fatigue spam (push notifications until the user approves out of habit).
Exhaust SMS budget by triggering many MFA challenges (SMS-pumping fraud is a multi-billion-dollar industry).
Lock users out by triggering rate limits on the legitimate MFA channels.

Defenses:

Strict per-account, per-channel rate limits on MFA challenge issuance.
Number-matching push to defeat MFA-fatigue.
SMS pumping fraud defense: country-level send caps, premium-rate-number filtering, behavioral anomaly detection on registered phone numbers.
Alternative-channel offering: when one channel is rate-limited, surface an alternative rather than blocking entirely.
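The per-account, per-channel limit and the alternative-channel fallback can be sketched together. The limits, the fallback order, and the class name below are assumptions for illustration; country-level SMS caps and number-matching are separate controls not shown here.

```python
import time
from collections import defaultdict, deque

# Hypothetical per-account limits: channel -> (max challenges, window in seconds).
CHANNEL_LIMITS = {"push": (5, 600), "sms": (3, 3600), "email": (5, 3600)}
FALLBACK_ORDER = ["push", "sms", "email"]


class MfaChallengeLimiter:
    def __init__(self, limits=CHANNEL_LIMITS):
        self.limits = limits
        self.sent = defaultdict(deque)  # (account, channel) -> send timestamps

    def _available(self, account, channel, now):
        max_sends, window = self.limits[channel]
        q = self.sent[(account, channel)]
        # Forget sends that have aged out of the window.
        while q and q[0] <= now - window:
            q.popleft()
        return len(q) < max_sends

    def issue(self, account, preferred, now=None):
        """Return the channel to offer, or None if every channel is exhausted."""
        now = time.monotonic() if now is None else now
        candidates = [preferred] + [c for c in FALLBACK_ORDER if c != preferred]
        for channel in candidates:
            if self._available(account, channel, now):
                self.sent[(account, channel)].append(now)
                return channel
        return None


limiter = MfaChallengeLimiter()
for second in range(4):
    print(limiter.issue("alice", "sms", now=float(second)))
```

The first three calls return `sms`; the fourth returns `push` because the SMS budget for that account is spent. Because each channel keeps its own counter, an attacker who exhausts one channel cannot spend another channel's budget or lock the user out of it.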

Tuning friction vs availability

The recurring operational challenge: rate limits tight enough to stop attacks also lock out legitimate users during incidents. The patterns that balance the two:

Score-based gates: produce a composite risk score per request; use the score to gate friction rather than binary thresholds.
Soft-fail with retry guidance: an exceeded limit surfaces a CAPTCHA or step-up MFA instead of a hard rejection.
Per-customer tuning (for B2B SaaS): some customers see more traffic than others; one-size-fits-all thresholds create false positives for high-volume customers.
Active incident-response runbook: when an attack triggers, adjust the thresholds dynamically rather than waiting for the next deployment. Document who can do this and how.
False-positive rate as a primary metric: track legitimate-user lockouts as carefully as attacker blocks. The right defense protects users from attacks; the wrong defense protects users from themselves.
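A score-based gate with soft-fail tiers might look like the following sketch. The signal names, weights, and cut-offs are hypothetical and would need tuning against production traffic; each signal is assumed to be pre-normalized so that 1.0 means its own limit has been reached.

```python
# Hypothetical weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {
    "ip": 0.20,
    "cidr": 0.15,
    "user": 0.25,
    "device": 0.25,
    "new_device": 0.15,
}


def risk_score(signals):
    """Weighted sum of signals, each clamped to [0, 1]; missing signals count as 0."""
    return sum(
        weight * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )


def gate(score):
    """Map a score to escalating friction instead of a single allow/block cut-off."""
    if score < 0.3:
        return "allow"
    if score < 0.6:
        return "captcha"
    if score < 0.85:
        return "step_up_mfa"
    return "block"


print(gate(risk_score({"ip": 0.2})))
print(gate(risk_score({"device": 1.0, "new_device": 1.0})))
print(gate(risk_score({"user": 1.0, "device": 1.0, "new_device": 1.0})))
print(gate(risk_score({name: 1.0 for name in WEIGHTS})))
```

The four calls print `allow`, `captcha`, `step_up_mfa`, and `block`. A binary rule would reject as soon as any one dimension hit its limit; here a single saturated dimension adds friction, and outright rejection is reserved for requests that look bad on several dimensions at once.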

Implementation guidance

Layer edge + application defense from day one. CDN-native DDoS protection (Cloudflare, Fastly, AWS Shield) plus CIAM-native adaptive auth plus rate limiting at multiple dimensions.
Add credential monitoring — k-anonymity check on every login against Have I Been Pwned or equivalent.
Pre-hash rate limiting — don't run Argon2id on every incoming request; throttle the obvious noise first.
MFA channel rate limits per user, per channel. Strict; SMS pumping fraud is real.
Number-matching push — non-negotiable in 2026 for any push-based MFA.
Composite scoring over binary thresholds — score-based gates outperform single-dimension limits.
Track false-positive rate explicitly. Block-everything looks good in dashboards but kills conversion silently.
For high-target verticals (fintech, gaming, crypto), add dedicated bot defense (DataDome, HUMAN, Kasada). Reference Bot Defense for the broader picture.
Move to passkeys as primary auth. The bot-defense and rate-limiting surface shrinks dramatically when the credential isn't phishable or stuffable.

FAQ

Why are authentication endpoints prime targets for DDoS?: Three reasons. First, asymmetry — a small login request (a few hundred bytes) triggers expensive server work (Argon2id hashing taking 100-500ms, database lookups, MFA dispatch, session creation). The cost ratio favors the attacker. Second, business impact — if login is down, the whole product is down for new sessions, even if other endpoints work. Third, identity-system disruption can mask account-takeover campaigns running underneath the noise.
What's the difference between rate limiting and DDoS protection?: Rate limiting is the application-layer control that caps requests per identifier (IP, user, session) per time window. DDoS protection is the network/edge layer that handles volumetric attacks before they reach the application, typically via a CDN or dedicated DDoS service that scrubs traffic. Both are required: edge defense handles the volumetric tier; application rate limiting handles the per-identifier tier that the edge can't see, because only the application can tell authenticated traffic from anonymous traffic and knows which account each request targets.
Why doesn't per-user lockout stop credential stuffing?: Per-user lockout activates when one username sees too many failures. Credential stuffing spreads attempts across thousands of usernames (each attempted once or twice with a different breached password), so no single username hits the lockout threshold. The aggregate volume is enormous but no per-user threshold detects it. The defense requires composite rate limiting across IPs, IP ranges, ASNs, geographies, and device fingerprints in addition to per-user thresholds.
How tight should login rate limits be?: Depends on the legitimate baseline. Most production deployments target: 5-10 failed attempts per username per 15 minutes before lockout; 100-500 requests per IP per 5 minutes before throttling; broader IP-range and ASN thresholds set by observed legitimate traffic patterns. The exact numbers vary by user demographics, geographic distribution, and traffic patterns. Tune against actual production data; instrument false-positive rate alongside true-positive rate.
Does dedicated DDoS protection matter for login endpoints specifically?: Yes, increasingly. Cloudflare DDoS Protection, AWS Shield Advanced, Akamai Prolexic, F5 Silverline all explicitly target the auth-endpoint use case. The 2023-2026 attack data shows targeted DDoS against login endpoints (not just generic web traffic) as a deliberate ATO-adjacent tactic — DDoS the legitimate user out, run credential stuffing under the noise, attribute the eventual takeover to the volumetric event. Composite defense at edge plus application is the production target.

Sources

OWASP Authentication Cheat Sheet (current)
NIST SP 800-63B-4: Rate Limiting (Throttling) guidance (Section 3.2.2; numbered 5.2.2 in SP 800-63B Revision 3)
Cloudflare DDoS Threat Report (quarterly)
US Patent — DDoS protection on authentication endpoints (Gupta et al.)