Deepak Gupta

Scaling to a Billion Users

Crossing the first million in ARR felt like reaching the summit. Then we looked up and realized the mountain was much taller. Scaling LoginRadius from a promising CIAM platform to a system handling over one billion identities across 180+ countries introduced an entirely new set of challenges - architectural, organizational, and operational.

This chapter covers the decisions, trade-offs, and hard lessons of growing a security platform to that scale.

Architecture Decisions That Defined Scale

When you are managing identities for over a billion users, architectural decisions you made at 10,000 users can either enable or destroy you. Several early architectural choices proved to be the difference between scaling gracefully and hitting walls.

Multi-Tenant Architecture

The first critical decision was multi-tenancy. We chose a shared infrastructure with isolated data stores per customer - not a separate deployment per customer.

LoginRadius Multi-Tenant Architecture
========================================

  Customer A    Customer B    Customer C
     |              |              |
     v              v              v
  +--------------------------------------+
  |        Shared Application Layer       |
  |   (Authentication, Authorization,     |
  |    Session Management, MFA Engine)    |
  +--------------------------------------+
     |              |              |
     v              v              v
  +----------+ +----------+ +----------+
  | Data     | | Data     | | Data     |
  | Store A  | | Store B  | | Store C  |
  | (Region) | | (Region) | | (Region) |
  +----------+ +----------+ +----------+

Why multi-tenant: Single-tenant deployments are simpler to reason about but impossible to scale economically to hundreds of customers. Multi-tenancy let us amortize infrastructure costs, deploy updates once instead of per-customer, and maintain a single codebase.

The trade-off: Multi-tenancy in security is harder than in other SaaS categories. A vulnerability in the shared layer affects every customer. Data isolation must be absolute - a bug that leaks one customer's user data to another customer is a career-ending event. We invested heavily in tenant isolation testing, including automated tests that attempted cross-tenant data access on every deployment.
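The cross-tenant tests mentioned above can be sketched roughly as follows. This is a minimal illustration, not the production test suite: `TenantStore`, `CrossTenantAccessError`, and `probe_cross_tenant_access` are hypothetical names, and an in-memory dict stands in for the isolated per-customer data stores.

```python
from dataclasses import dataclass, field


class CrossTenantAccessError(Exception):
    """Raised when a caller asks for data outside its own tenant."""


@dataclass
class TenantStore:
    """Hypothetical in-memory stand-in for isolated per-tenant data stores."""
    _data: dict = field(default_factory=dict)  # tenant_id -> {user_id: profile}

    def put(self, tenant_id: str, user_id: str, profile: dict) -> None:
        self._data.setdefault(tenant_id, {})[user_id] = profile

    def get(self, caller_tenant: str, tenant_id: str, user_id: str) -> dict:
        # Every read carries the caller's tenant and is checked before lookup.
        if caller_tenant != tenant_id:
            raise CrossTenantAccessError(f"{caller_tenant} -> {tenant_id}")
        return self._data[tenant_id][user_id]


def probe_cross_tenant_access(store, tenants, user_id):
    """Try every (caller, target) pair and return the pairs that leaked."""
    leaks = []
    for caller in tenants:
        for target in tenants:
            if caller == target:
                continue
            try:
                store.get(caller, target, user_id)
                leaks.append((caller, target))  # read succeeded: isolation failed
            except CrossTenantAccessError:
                pass  # expected outcome: access denied
    return leaks


# Seed three illustrative tenants, then attempt every cross-tenant read.
store = TenantStore()
tenants = ["tenant-a", "tenant-b", "tenant-c"]
for t in tenants:
    store.put(t, "user-1", {"email": f"user@{t}.example"})

leaks = probe_cross_tenant_access(store, tenants, "user-1")
```

A deployment gate built on this idea would fail the release whenever `leaks` is non-empty. The point of probing every pair is that the test keeps covering new tenants without anyone remembering to add cases.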

Global Data Residency

As we expanded internationally, data residency became non-negotiable. GDPR restricted transfers of EU personal data outside Europe, which in practice meant keeping European data in European data centers. Customer contracts specified data processing locations. Some countries had outright data localization laws.

Region	Data Residency Requirement	Our Solution
European Union	GDPR - data processing in EU	Frankfurt and Dublin data centers
United States	Various state laws, customer preference	US East and US West data centers
India	Data localization for certain data types	Mumbai data center
Canada	PIPEDA, provincial requirements	Toronto data center
Australia	Privacy Act, data localization	Sydney data center

We deployed in multiple regions early - this was one of the best early decisions we made. Companies that wait until a customer demands data residency find themselves in a months-long infrastructure project when they should be closing a deal.
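One way to make residency enforceable in code is to resolve every tenant's storage region from its jurisdiction and refuse anything outside the allowed set. The sketch below assumes a simple lookup table derived from the residency table above. The jurisdiction codes, region identifiers, and the `resolve_region` function are illustrative names, not actual configuration.

```python
from typing import Optional

# Hypothetical mapping from jurisdiction to permitted deployment regions,
# following the residency table. Identifiers are illustrative only.
RESIDENCY_REGIONS = {
    "EU": ["frankfurt", "dublin"],
    "US": ["us-east", "us-west"],
    "IN": ["mumbai"],
    "CA": ["toronto"],
    "AU": ["sydney"],
}


class ResidencyViolation(Exception):
    """Raised when a request would place data outside its jurisdiction."""


def resolve_region(jurisdiction: str, preferred: Optional[str] = None) -> str:
    """Return a storage region that satisfies the tenant's jurisdiction."""
    allowed = RESIDENCY_REGIONS.get(jurisdiction)
    if allowed is None:
        raise ResidencyViolation(f"no deployment for jurisdiction {jurisdiction!r}")
    if preferred is None:
        return allowed[0]  # default to the first listed region
    if preferred not in allowed:
        raise ResidencyViolation(
            f"{preferred!r} is not permitted for jurisdiction {jurisdiction!r}"
        )
    return preferred
```

Failing closed on an unknown jurisdiction is deliberate in this sketch: a missing entry should block provisioning rather than silently fall back to a default region.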

Tip

Build for data residency before your customers demand it. The cost of deploying in additional regions proactively is a fraction of the cost of emergency deployment under customer pressure. In security, the question is not whether you will need regional data residency - it is when.

Performance at Scale

At one billion identities, every millisecond matters. Our authentication system processes up to 150,000 login requests per second at peak. The performance architecture that enabled this:

Performance Architecture
=========================

Request Flow:
  User login request
       |
       v
  Global CDN / Edge Network
  (TLS termination, DDoS mitigation)
       |
       v
  Load Balancer (regional)
       |
       v
  Authentication Service Cluster
  (horizontally scaled, stateless)
       |
       v
  Cache Layer (distributed)
  (session data, user profiles,
   configuration, rate limits)
       |
       v
  Database Cluster (regional)
  (only on cache miss or write)

Key Metrics:
  - P50 latency: <50ms
  - P99 latency: <200ms
  - Peak throughput: 150,000 req/sec
  - Uptime SLA: 99.99%
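
To put the uptime and throughput figures in perspective, the arithmetic below converts an SLA percentage into a yearly downtime budget and estimates how many requests fall inside an outage window at peak load. The function names are illustrative, the five-minute window is just an example duration, and the request count is an upper bound because it assumes peak traffic for the whole window.

```python
def downtime_budget_minutes(sla_percent: float, days: int = 365) -> float:
    """Minutes of downtime allowed per period at the given uptime SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def requests_during_outage(req_per_sec: int, outage_seconds: int) -> int:
    """Requests that arrive during an outage, assuming a constant rate."""
    return req_per_sec * outage_seconds


# 99.99% uptime leaves roughly 52.6 minutes of downtime per 365-day year.
yearly_budget = downtime_budget_minutes(99.99)

# A five-minute outage at the 150,000 req/sec peak: 150,000 * 300 requests.
peak_requests_hit = requests_during_outage(150_000, 5 * 60)
```

Under these assumptions, a single five-minute outage at peak consumes about a tenth of the annual budget and touches 45 million requests.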

Stateless services. Every authentication service instance is stateless. User sessions and state live in a distributed cache layer. This lets us scale horizontally by adding instances without coordination.

Aggressive caching. User profile data, configuration, and session data are cached with carefully tuned TTLs. At our scale, the difference between a cache hit and a database read is the difference between 5ms and 50ms - and that compounds across millions of requests.
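
The caching pattern described here is commonly called cache-aside with TTL expiry. The sketch below is a single-process illustration under stated assumptions: `TTLCache`, `load_profile`, and `expected_latency_ms` are hypothetical names, a dict stands in for the distributed cache, and the 95% hit ratio in the comment is an assumed figure, not a measured one.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for a distributed cache)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def load_profile(user_id, cache, db_lookup):
    """Cache-aside read: serve from cache, fall back to the database on a miss."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    profile = db_lookup(user_id)     # slow path, only on a miss
    cache.set(user_id, profile)
    return profile


def expected_latency_ms(hit_ratio, hit_ms=5.0, miss_ms=50.0):
    """Average read latency for a given hit ratio, using the 5ms/50ms figures."""
    return hit_ratio * hit_ms + (1 - hit_ratio) * miss_ms

# At an assumed 95% hit ratio the average is about 7.25 ms, versus 50 ms uncached.
```

The TTL is the knob that trades freshness for load: a longer TTL raises the hit ratio but lengthens the window in which a stale profile or configuration value can be served.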

Read-heavy optimization. Authentication traffic is overwhelmingly reads (logins) rather than writes (registrations, profile updates). We optimized the read path aggressively with read replicas, caching, and denormalized data models.
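
Routing reads to replicas and writes to a primary can be sketched as below. This is a simplified illustration with hypothetical names (`ReadWriteRouter`, `for_read`, `for_write`). Plain strings stand in for database connections, and round-robin is just one possible replica selection policy.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self._primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_read(self):
        return next(self._replicas) if self._replicas else self._primary

    def for_write(self):
        return self._primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
reads = [router.for_read() for _ in range(3)]  # replica-1, replica-2, replica-1
write_target = router.for_write()              # primary
```

One caveat with any scheme like this: replicas that replicate asynchronously can lag the primary, so a read that must reflect a write just made (for example, a login immediately after a password change) may need to go to the primary instead.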

Building the Team

Scaling from a small founding team to an organization capable of managing a billion-user platform required deliberate team building. The skills needed at 100 customers are different from the skills needed at 10,000.

The Hiring Evolution

Stage	Team Size	Key Hires	Why
0-100 customers	5-10	Generalist engineers	Need people who can build anything
100-500 customers	15-30	SRE, Security engineer, first AE	Reliability and sales become critical
500-2000 customers	30-60	VP Engineering, CS team, DevOps	Need leadership and customer operations
2000+ customers	60-100+	CISO, compliance team, regional leads	Governance, global operations

The Hardest Hires in Security

Security engineers who can also build product. The intersection of security expertise and product engineering skills is tiny. Most security engineers want to break things, not build products. Finding people who can do both is one of the hardest hiring challenges in the industry.

Enterprise account executives who understand security. Enterprise sales in security requires AEs who can hold technical conversations with CISOs, navigate security evaluations, and speak the language of risk. These people are rare and expensive.

Customer success managers with security domain knowledge. A CSM who does not understand authentication, identity federation, or compliance requirements cannot effectively support security customers. We had to train CSMs extensively on our domain.

Note

The biggest hiring mistake we made was hiring for generic SaaS experience instead of security domain expertise. A VP of Sales who crushed it at a marketing SaaS company struggled in security because the buying process, the stakeholder map, and the value proposition are fundamentally different. Domain expertise trumps general experience in security.

Operations at 1B+ Identities

Running a platform that manages over a billion identities is an operational discipline as much as a technical one. Several operational capabilities became critical as we scaled.

Incident Response

At billion-user scale, every incident is amplified. A 5-minute outage affects millions of login attempts. A security vulnerability could expose billions of records. We built incident response as a core competency:

Incident Response Framework
==============================

Severity Levels:
  SEV 1: Platform-wide outage or security breach
         Response: Immediate, all hands
         Comms: Customer notification within 1 hour
         Postmortem: Mandatory within 24 hours

  SEV 2: Degraded performance or partial outage
         Response: On-call team, escalation to leads
         Comms: Status page update within 30 minutes
         Postmortem: Mandatory within 48 hours

  SEV 3: Minor issue, no customer impact
         Response: On-call team, fix during business hours
         Comms: Internal only
         Postmortem: Optional

Key Metrics:
  - MTTD (Mean Time to Detect): <5 minutes
  - Mean Time to Respond: <15 minutes
  - MTTR (Mean Time to Resolve): <2 hours (SEV 1)
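
A framework like this is easier to enforce when the severity rules live in code rather than in a wiki page. The sketch below encodes the three levels as data and derives deadlines for an incident. `SeverityPolicy`, `POLICIES`, and `deadlines` are hypothetical names, and measuring every deadline from the detection time is an assumption, since the framework does not say when each clock starts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    response: str
    comms_within: Optional[timedelta]       # None means internal comms only
    postmortem_within: Optional[timedelta]  # None means postmortem is optional


# Values taken from the severity table above.
POLICIES = {
    1: SeverityPolicy("immediate, all hands", timedelta(hours=1), timedelta(hours=24)),
    2: SeverityPolicy("on-call team, escalate to leads", timedelta(minutes=30), timedelta(hours=48)),
    3: SeverityPolicy("on-call team, business hours", None, None),
}


def _due(start: datetime, window: Optional[timedelta]) -> Optional[datetime]:
    return None if window is None else start + window


def deadlines(severity: int, detected_at: datetime) -> dict:
    """Compute comms and postmortem deadlines, measured from detection (assumed)."""
    policy = POLICIES[severity]
    return {
        "response": policy.response,
        "comms_by": _due(detected_at, policy.comms_within),
        "postmortem_by": _due(detected_at, policy.postmortem_within),
    }


sev1 = deadlines(1, datetime(2024, 1, 1, 12, 0))
```

Here `sev1["comms_by"]` is one hour after detection and `sev1["postmortem_by"]` is 24 hours after. Paging and ticketing tools can consume the same table, so the policy is defined in one place.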

Compliance at Scale

As our customer base grew, so did our compliance requirements. Different customers required different certifications and audit evidence:

Certification	What It Covers	Effort to Maintain
SOC 2 Type II	Security, availability, processing integrity	Annual audit, continuous controls
ISO 27001	Information security management	Annual audit, management review
GDPR compliance	EU data protection	DPO, data processing agreements, privacy impact assessments
HIPAA	Healthcare data protection	Annual risk assessment, BAA with customers
PCI DSS	Payment card data security	Quarterly scans, annual assessment

The compliance burden grows non-linearly. Each new certification adds ongoing maintenance, audit preparation, and documentation requirements. At some point, you need a dedicated compliance team - not security engineers doubling as compliance managers.

The CTO-to-CISO Dual Role

Technical founders of security companies often find themselves wearing two hats: CTO (building the product) and de facto CISO (ensuring the security of their own infrastructure). These roles have fundamentally different objectives:

Dimension	CTO Perspective	CISO Perspective
Speed	Ship fast, iterate quickly	Move carefully, evaluate risks
Features	Add capabilities	Minimize attack surface
Architecture	Optimize for performance	Optimize for security
Access	Enable developer productivity	Restrict access to need-only
Dependencies	Use best tools available	Minimize third-party risk

These perspectives often conflict. The CTO wants to adopt the latest database technology. The CISO wants to use battle-tested, audited solutions. The CTO wants developers to have broad access for debugging. The CISO wants least-privilege access with full audit trails.
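
The least-privilege and audit-trail requirement on the CISO side can be illustrated with a small sketch. The roles, actions, and the `authorize` function are hypothetical, and an in-memory list stands in for a tamper-resistant audit log.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission map: each role gets only what it needs.
ROLE_PERMISSIONS = {
    "developer": {"read_logs"},
    "sre": {"read_logs", "restart_service"},
    "dba": {"read_logs", "query_production_db"},
}

audit_log = []  # stand-in for an append-only audit store


def authorize(user: str, role: str, action: str) -> bool:
    """Allow the action only if the role grants it, and record every attempt."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "allowed": allowed,  # denied attempts are logged too
    })
    return allowed


denied = authorize("dev-1", "developer", "query_production_db")
granted = authorize("dba-1", "dba", "query_production_db")
```

In this sketch the developer's request is denied and the DBA's is granted, and both attempts land in the audit log. The CTO-side compromise is usually a time-boxed, logged elevation path rather than standing broad access.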

Warning

If you are a technical founder running a security company, eventually you must split the CTO and CISO roles. One person cannot effectively optimize for both speed and security. The conflicts between these roles require separate decision-makers with equal organizational authority. The longer this split is delayed, the more likely it is to create tension across the engineering organization.

International Expansion Challenges

Expanding LoginRadius internationally introduced challenges beyond data residency:

Regulatory fragmentation. Every country has different privacy laws, data protection requirements, and authentication standards. What is compliant in the US may be illegal in Germany. What is acceptable in India may not satisfy Canadian requirements.

Localization of security features. Authentication experiences need localization beyond language translation. Phone number formats for SMS OTP, national ID verification requirements, and regional social login preferences (WeChat in China, LINE in Japan) all require country-specific implementation.

Support across time zones. Enterprise security customers expect responsive support. A customer experiencing an authentication outage at 3 AM their time cannot wait until your engineering team wakes up. We built follow-the-sun support before we could comfortably afford it.

Currency and pricing. Enterprise pricing in different markets requires understanding local purchasing power, competitive dynamics, and procurement norms. A pricing structure that works in the US may be non-competitive in India or over-discounted in Northern Europe.

Expansion Challenge	Our Approach
Regulatory compliance	Hired regional compliance advisors in EU, India, and Australia
Data residency	Pre-deployed infrastructure in 5 regions
Localization	Built extensible authentication UX supporting 30+ languages
Time zone support	Follow-the-sun support team across US, India, and Australia
Pricing	Regional pricing tiers based on market analysis

The Lessons of Scale

Scaling to a billion users taught us lessons that could not be learned at smaller scale:

Reliability is a feature. At small scale, an outage is an inconvenience. At billion-user scale, an outage is a crisis that makes headlines. We invested more in reliability engineering than in new features for extended periods. That felt wrong at the time but was exactly right.

Compliance is a competitive advantage. Companies often view compliance as a cost center. At scale, our comprehensive compliance portfolio became one of our strongest sales differentiators. Enterprise customers chose us specifically because we had the certifications their compliance teams required.

Simplicity scales, complexity breaks. The architectural decisions that scaled best were the simplest ones. Stateless services, clean data isolation, and aggressive caching are not clever - they are boring and reliable. The clever architectural decisions were the ones that caused the most incidents.

Culture determines operational quality. We could not hire enough engineers to personally monitor every system. Culture - specifically a culture where every engineer felt personally responsible for the reliability and security of the platform - was what kept the system running. Blameless postmortems, shared on-call responsibilities, and celebrating reliability metrics built that culture.

For a deep dive into authentication architecture at scale, see The Complete Guide to Authentication Implementation for Modern Applications.

The next chapter covers the unique dynamics of selling security to enterprises - where your product IS the thing they are evaluating for security.