Operation · scale · Last updated 2026-06-09

Scaling the user directory itself.

Who feels it

engineering

What triggers the evaluation

a login outage during a peak event · a nightly sync that takes days · a rigid-schema workaround database

The pain starts when the user store stops being a table and becomes a system. A few million rows in Postgres work fine until you need sub-100ms authentication globally, and then everything breaks at once: profile lookups slow down, admin search across users times out, bulk operations lock tables, and peak events (Black Friday, a product drop, a streaming TV moment) produce 50 to 100x baseline login traffic that the auth path was never provisioned for. Login cannot be cached and sits on the critical path of every session, so unlike static content you cannot CDN your way out.
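The 50 to 100x multiplier is easier to reason about as arithmetic. The following is a minimal sketch, not a sizing method from any vendor: the baseline rate, the per-instance capacity, and the 30% headroom are all assumed numbers chosen for illustration, and both function names are hypothetical.

```python
import math


def peak_login_rate(baseline_per_sec: float, multiplier: float) -> float:
    """Login rate during a peak event, given the everyday baseline."""
    return baseline_per_sec * multiplier


def instances_needed(peak_per_sec: float,
                     per_instance_capacity: float,
                     headroom: float = 0.3) -> int:
    """Auth instances required to serve the peak plus a safety margin.

    per_instance_capacity and headroom are assumptions; measure the first
    under realistic load and pick the second to match your risk tolerance.
    """
    return math.ceil(peak_per_sec * (1 + headroom) / per_instance_capacity)


if __name__ == "__main__":
    baseline = 200.0   # assumed: logins per second on an ordinary day
    capacity = 600.0   # assumed: logins per second one instance sustains
    for multiplier in (1, 50, 100):
        peak = peak_login_rate(baseline, multiplier)
        print(f"{multiplier:>3}x -> {peak:>8.0f} logins/s, "
              f"{instances_needed(peak, capacity)} instances")
```

The point of the exercise is the gap between the first row and the last: an auth path sized for the ordinary day is short by roughly two orders of magnitude at the peak.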

The evaluation question enterprises land on is whether the vendor can prove behavior at peak load, not just on average. Teams that have been burned ask for rate-limit specifics (per-endpoint, per-tenant), token issuance throughput, and what degrades first under load. Rate limits are a notorious hidden constraint: a platform that caps management-API calls at a few hundred per second turns a nightly CRM sync or a mass credential reset after a breach into a days-long job.
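To see how a management-API cap turns into days, it helps to run the numbers before signing. This is a back-of-the-envelope sketch under stated assumptions: the user count, the calls per user, and the 300 calls/second cap are illustrative values, not figures for any particular platform, and `bulk_job_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def bulk_job_days(records: int,
                  calls_per_record: int,
                  rate_limit_per_sec: float) -> float:
    """Best-case wall-clock days for a bulk job bounded only by a rate limit.

    Ignores retries, pagination overhead, and throttling back-off, so real
    jobs will take longer than this lower bound.
    """
    total_calls = records * calls_per_record
    return total_calls / rate_limit_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed scenario: 50 million users, one read and one write per user,
    # against a management API capped at 300 calls per second.
    days = bulk_job_days(records=50_000_000,
                         calls_per_record=2,
                         rate_limit_per_sec=300)
    print(f"Lower bound: {days:.1f} days")
```

Because the result is a lower bound, a job that already exceeds its window on paper will not fit in practice, which is why the per-endpoint limit belongs in the evaluation rather than in a post-launch incident review.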

Schema flexibility is the quieter scaling problem. Consumer profiles are not static; loyalty tier, marketing preferences, device history, KYC status, and per-brand consents all accrete. Rigid schemas force data into a parallel database, which immediately recreates the fragmentation CIAM was supposed to solve. So buyers now ask: custom attributes, searchable or not, indexed how, queryable at what latency, and who can read and write them at the API level. See CIAM at high scale and multi-region CIAM.
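The buyer questions about custom attributes can be written down as a small data structure, which makes it easier to compare vendor answers side by side. This is an illustrative sketch only: `AttributeDef`, its fields, and the scope names are hypothetical and do not correspond to any vendor's schema API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeDef:
    """One custom profile attribute and the properties a buyer should ask about."""
    name: str
    type: str                                   # e.g. "string", "enum", "json"
    indexed: bool = False                       # backed by an index?
    searchable: bool = False                    # usable in admin/user search?
    read_scopes: frozenset[str] = frozenset()   # API scopes allowed to read
    write_scopes: frozenset[str] = frozenset()  # API scopes allowed to write


def can_read(attr: AttributeDef, caller_scopes: set[str]) -> bool:
    """True if the caller holds at least one scope permitted to read attr."""
    return bool(attr.read_scopes & caller_scopes)


def can_write(attr: AttributeDef, caller_scopes: set[str]) -> bool:
    """True if the caller holds at least one scope permitted to write attr."""
    return bool(attr.write_scopes & caller_scopes)


# Hypothetical attributes drawn from the examples in the text.
LOYALTY_TIER = AttributeDef(
    name="loyalty_tier", type="enum", indexed=True, searchable=True,
    read_scopes=frozenset({"profile:read", "marketing:read"}),
    write_scopes=frozenset({"loyalty:write"}),
)
KYC_STATUS = AttributeDef(
    name="kyc_status", type="enum", indexed=True, searchable=False,
    read_scopes=frozenset({"compliance:read"}),
    write_scopes=frozenset({"compliance:write"}),
)

if __name__ == "__main__":
    marketing_app = {"marketing:read"}
    print(can_read(LOYALTY_TIER, marketing_app))   # marketing may read the tier
    print(can_read(KYC_STATUS, marketing_app))     # but not the KYC status
```

If a platform cannot express the equivalent of `indexed`, `searchable`, and per-attribute read and write scopes, the missing piece tends to end up in the parallel database the text warns about.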

How teams recognize it

  • Profile lookups and admin search slow down as the store grows
  • Bulk operations (export, backfill, forced reset) lock tables
  • Peak events produce 50-100x baseline login traffic
  • A management-API rate limit turns a nightly sync or mass reset into days

How to evaluate vendors for this

The exact questions to put to vendors. Match each answer against the capabilities in the comparison below.

  1. Can you prove peak-load behavior, not average, and what degrades first under load?
  2. What are the per-endpoint and per-tenant rate limits, and token issuance throughput?
  3. Can custom attributes be indexed and queried, and at what latency?
  4. Who can read and write custom attributes at the API level?
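When a vendor answers the rate-limit question, it helps to have a mental model of what "per-endpoint, per-tenant" means in practice. The sketch below uses a token bucket, a common rate-limiting technique, keyed by tenant and endpoint. It is an assumption about how such limits are often implemented, not a description of any vendor's system; the class names, the endpoint path, and the limit values are all hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `burst` tokens."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = burst
        self.tokens = burst          # start full
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return whether the call passes."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerTenantLimiter:
    """One bucket per (tenant, endpoint); limits are configured per endpoint."""

    def __init__(self, limits: dict[str, tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits         # endpoint -> (rate per second, burst)
        self.clock = clock
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, tenant: str, endpoint: str) -> bool:
        key = (tenant, endpoint)
        if key not in self.buckets:
            rate, burst = self.limits[endpoint]
            self.buckets[key] = TokenBucket(rate, burst, self.clock)
        return self.buckets[key].allow()


if __name__ == "__main__":
    # Hypothetical limit: 2 calls/second with a burst of 2 on a user endpoint.
    limiter = PerTenantLimiter({"/api/users": (2.0, 2.0)})
    results = [limiter.allow("tenant-a", "/api/users") for _ in range(3)]
    print(results)  # the third back-to-back call is expected to be rejected
```

The two numbers worth extracting from a vendor are the sustained rate and the burst size, per endpoint, and whether the bucket is shared across tenants or isolated as it is here.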

Capabilities that solve this

The vendors that cover the capabilities this pain maps to, scored on just those axes. See the full matrix on each vendor profile.

Vendor | Coverage | Proven at high scale (1M+ MAU) | Multi-region deployment | Documented rate limits | Custom user metadata
Akamai Identity Cloud | 100% covered | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes
Amazon Cognito | 100% covered | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes
Auth0 | 100% covered | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes
CyberArk Identity | 100% covered | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes
Firebase Authentication | 100% covered | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes
ForgeRock | 100% covered | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes
IBM Verify | 100% covered | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes
Microsoft Entra External ID | 100% covered | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes
