Deepak Gupta

Building Identity Infrastructure for Autonomous Enterprises

Why Existing Identity Infrastructure Can't Scale

Every enterprise has identity infrastructure. Active Directory, Okta, Microsoft Entra ID, AWS IAM - these systems manage who can access what. They were designed for a world where "who" meant a human employee and "what" meant an application or a file share.

That world is gone.

In the autonomous enterprise, "who" includes AI agents, robotic process automation bots, CI/CD pipelines, IoT devices, and multi-agent orchestration systems. "What" includes APIs, MCP tool servers, agent-to-agent delegations, and dynamically provisioned resources that didn't exist five minutes ago.

Bolting AI agent identity onto traditional IAM is like bolting a jet engine onto a bicycle. The bicycle wasn't designed for that kind of power, and the results are predictable.

This chapter lays out the architecture for identity infrastructure that can handle both human and autonomous actors at enterprise scale.

The Agent Identity Architecture

Here's the target state - an identity architecture that treats agents as first-class identity principals alongside humans and traditional services.

+-------------------------------------------------------------------+
|                      IDENTITY CONTROL PLANE                       |
|                                                                   |
|  +-------------------+  +------------------+  +----------------+  |
|  | Human Identity    |  | Agent Identity   |  | Service        |  |
|  | Provider          |  | Registry         |  | Identity       |  |
|  | (Okta, Entra ID)  |  | (NEW)            |  | Manager        |  |
|  +-------------------+  +------------------+  +----------------+  |
|         |                       |                     |           |
|         +-----------+-----------+---------------------+           |
|                     |                                             |
|              +------v-------+                                     |
|              | Unified      |                                     |
|              | Identity     |  <-- Single source of truth         |
|              | Store        |      for all identity types         |
|              +--------------+                                     |
|                     |                                             |
|         +-----------+-----------+---------------------+           |
|         |                       |                     |           |
|  +------v---------+  +----------v-----+  +------------v--------+  |
|  | Token Service  |  | Policy Engine  |  | Audit & Monitoring  |  |
|  | (JIT tokens)   |  | (OPA/Cedar)    |  | (SIEM integration)  |  |
|  +----------------+  +----------------+  +---------------------+  |
|                                                                   |
+-------------------------------------------------------------------+
          |                       |                     |
          v                       v                     v
    +-----------+           +-----------+         +-----------+
    | Human     |           | AI Agent  |         | Service   |
    | Users     |           | Fleet     |         | Mesh      |
    +-----------+           +-----------+         +-----------+

Let's break down each component.

Component 1: The Agent Identity Registry

The Agent Identity Registry is the new component most organizations don't have. It's the authoritative source for all AI agent identities in the enterprise.

What the Registry Stores

For each agent, the registry maintains:

Agent Identity Record
=====================
Agent ID:           agent-planning-prod-001
Agent Type:         planning-orchestrator
Agent Version:      v2.4.1
Model:              claude-opus-4-20250514
Deployment:         kubernetes/prod/ns-agents
Owner (Human):      alice@company.com
Team:               platform-engineering
Created:            2026-01-15T10:00:00Z
Last Active:        2026-03-27T14:30:00Z
Status:             active

Authorized Capabilities:
  - cloud-inventory:read
  - reports:write
  - delegate-to: [cloud-agent, cost-agent, report-agent]

Delegation Constraints:
  - max-chain-depth: 3
  - max-scope-per-delegation: read-only
  - requires-human-approval-for: [write, delete, admin]

Credential Info:
  - type: SPIFFE SVID
  - rotation: hourly
  - last-rotated: 2026-03-27T14:00:00Z

Registry Operations

The registry supports four core operations:

Register: When a new agent is deployed, it must be registered before it can receive credentials. Registration requires a human sponsor (the owner), a defined capability set, and delegation constraints.

Attest: When an agent starts, it proves its identity to the registry through workload attestation - verifying that the agent process is running on the expected infrastructure, from the expected container image, with the expected configuration.

Query: Systems that receive requests from agents can query the registry to verify the agent's identity, capabilities, and constraints in real time.

Decommission: When an agent is retired, its registry entry is deactivated, all active credentials are revoked, and a decommissioning audit trail is created.
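The four operations can be sketched in a few dozen lines. This is a minimal in-memory illustration, not a reference implementation: the class and field names (AgentRegistry, AgentRecord, attestation_ok) are assumptions made for this example, the attestation result is passed in as a boolean rather than computed from real workload evidence, and credential revocation on decommission is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AgentRecord:
    agent_id: str
    owner: str                       # human sponsor
    capabilities: frozenset
    status: str = "registered"       # registered -> active -> decommissioned
    audit: list = field(default_factory=list)


class AgentRegistry:
    """In-memory stand-in for the registry database and its API."""

    def __init__(self) -> None:
        self._records = {}

    def register(self, agent_id: str, owner: str, capabilities: set) -> AgentRecord:
        # Registration is refused without a human sponsor.
        if not owner:
            raise ValueError("registration requires a human sponsor")
        if agent_id in self._records:
            raise ValueError(f"{agent_id} is already registered")
        record = AgentRecord(agent_id, owner, frozenset(capabilities))
        self._log(record, "registered")
        self._records[agent_id] = record
        return record

    def attest(self, agent_id: str, attestation_ok: bool) -> bool:
        # attestation_ok stands in for a real workload attestation result.
        record = self._records[agent_id]
        if record.status == "decommissioned" or not attestation_ok:
            self._log(record, "attestation rejected")
            return False
        record.status = "active"
        self._log(record, "attested")
        return True

    def query(self, agent_id: str) -> Optional[AgentRecord]:
        return self._records.get(agent_id)

    def decommission(self, agent_id: str) -> None:
        record = self._records[agent_id]
        record.status = "decommissioned"
        self._log(record, "decommissioned")

    @staticmethod
    def _log(record: AgentRecord, event: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        record.audit.append(f"{stamp} {event}")
```

Keeping the audit trail on the record itself is a simplification for the sketch; a real registry would more likely write these events to the audit layer described later in this chapter.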

Tip

Start your agent identity registry as a simple database with an API. You don't need a full commercial solution on day one. A PostgreSQL table with the fields above, an API for registration and querying, and a cron job for cleanup will handle most organizations' needs for the first year.

Component 2: Scoped Token Service

The token service issues short-lived, narrowly scoped credentials to agents. This replaces the current practice of long-lived API keys and over-privileged service accounts.

Just-in-Time Token Issuance

The token flow works like this:

1. Agent receives task from human or upstream agent
         |
         v
2. Agent determines required capabilities for next action
         |
         v
3. Agent requests token from Token Service
   Request includes:
     - Agent ID (from registry)
     - Required scope (minimal for this action)
     - Task context (what and why)
     - Delegation chain (if delegated)
         |
         v
4. Token Service validates:
     - Agent is registered and active
     - Requested scope is within agent's authorized capabilities
     - Delegation chain is valid (if delegated)
     - Policy engine approves the request
         |
         v
5. Token Service issues scoped, time-limited token
   Token contains:
     - Agent identity
     - Granted scope (may be narrower than requested)
     - Expiration (minutes, not days)
     - Task correlation ID
     - Delegation chain reference
         |
         v
6. Agent uses token for the specific action
         |
         v
7. Token expires automatically
   Agent must request a new token for next action
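Steps 3 through 5 of this flow can be sketched as a single function. The sketch below is illustrative only: REGISTRY is a hypothetical in-memory stand-in for the Agent Identity Registry, issue_token and its parameters are names invented for this example, and the delegation-chain and policy-engine checks from step 4 are omitted. It rejects out-of-bounds requests outright rather than narrowing them.

```python
import time
import uuid

# Hypothetical registry view: agent ID -> (status, authorized capabilities)
REGISTRY = {
    "agent-planning-prod-001": ("active", {"cloud-inventory:read", "reports:write"}),
}
MAX_TTL_SECONDS = 15 * 60  # lifetime cap in seconds (minutes, not days)


def issue_token(agent_id, requested_scope, task_id, audience,
                ttl_seconds=MAX_TTL_SECONDS, now=None):
    """Return the claims for a scoped, time-limited token, or raise PermissionError."""
    status, capabilities = REGISTRY.get(agent_id, ("unknown", set()))
    if status != "active":
        raise PermissionError(f"{agent_id} is not registered and active")

    requested = set(requested_scope)
    if not requested <= capabilities:
        extra = sorted(requested - capabilities)
        raise PermissionError(f"scope outside authorized capabilities: {extra}")

    issued_at = int(time.time() if now is None else now)
    return {
        "jti": str(uuid.uuid4()),                             # unique token ID
        "sub": agent_id,
        "aud": audience,
        "iat": issued_at,
        "exp": issued_at + min(ttl_seconds, MAX_TTL_SECONDS),  # never above the cap
        "scope": sorted(requested),
        "task_id": task_id,                                    # task correlation ID
    }
```

Capping the lifetime inside the service, rather than trusting the caller's ttl_seconds, means an agent cannot ask for a longer-lived token than policy allows.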

Token Design

A well-designed agent token contains everything a downstream system needs to make an authorization decision:

{
  "sub": "agent-planning-prod-001",
  "iss": "token-service.internal",
  "aud": "cloud-inventory-api",
  "iat": 1711547400,
  "exp": 1711548300,
  "scope": ["ec2:Describe*", "s3:ListAllMyBuckets"],
  "task_id": "infra-review-q1-2026",
  "human_principal": "alice@company.com",
  "delegation_chain_hash": "sha256:abc123...",
  "agent_type": "planning-orchestrator",
  "agent_version": "v2.4.1"
}

The key properties:

Short lifetime (15 minutes): Forces re-authorization for long tasks
Narrow scope: Only the permissions needed for this specific action
Audience restriction: Token only works with the intended service
Full attribution: Human principal, agent identity, and task ID are all present
Delegation chain: Cryptographic hash of the delegation chain for verification
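To show how a downstream service might enforce these properties, here is a minimal sketch using only the Python standard library. It is an assumption-laden stand-in, not a production token format: sign and verify are hypothetical helpers, the HMAC shared secret replaces the asymmetric signatures a real token service would normally use, and wildcard scopes are matched with simple shell-style patterns.

```python
import base64
import fnmatch
import hashlib
import hmac
import json
import time

SECRET = b"demo-only-shared-secret"  # assumption: real systems would use asymmetric keys


def sign(claims: dict) -> str:
    """Encode claims and append an HMAC-SHA256 tag."""
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    tag = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{tag}"


def verify(token: str, audience: str, required_scope: str, now=None) -> dict:
    """Return the claims if the token is authentic, unexpired, and sufficient."""
    body, _, tag = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("bad signature")

    claims = json.loads(base64.urlsafe_b64decode(body))
    now = time.time() if now is None else now
    if now >= claims["exp"]:
        raise PermissionError("token expired")       # short lifetime
    if claims["aud"] != audience:
        raise PermissionError("wrong audience")      # audience restriction
    if not any(fnmatch.fnmatchcase(required_scope, pattern)
               for pattern in claims["scope"]):
        raise PermissionError("scope not granted")   # narrow scope
    return claims
```

The signature is checked before any claim is read, so a tampered token is rejected without its contents ever being trusted.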

Component 3: The Policy Engine

The policy engine makes authorization decisions for every agent action. It evaluates requests against organizational policies in real time.

Policy Architecture

+------------------------------------------------------------------+
|                          POLICY ENGINE                           |
|                                                                  |
|  Inputs:                          Outputs:                       |
|  - Agent identity                 - ALLOW                        |
|  - Requested action               - DENY (with reason)           |
|  - Target resource                - REQUIRE_APPROVAL             |
|  - Delegation chain               - ALLOW_WITH_CONDITIONS        |
|  - Task context                                                  |
|  - Time of day                                                   |
|  - Risk score                                                    |
|                                                                  |
|  +-------------------+   +--------------------+                  |
|  | Static Policies   |   | Dynamic Policies   |                  |
|  | (Rego/Cedar)      |   | (ML-based risk     |                  |
|  |                   |   |  scoring)          |                  |
|  | - Role mappings   |   | - Behavioral       |                  |
|  | - Scope rules     |   |   baselines        |                  |
|  | - Time windows    |   | - Anomaly          |                  |
|  | - Delegation      |   |   detection        |                  |
|  |   constraints     |   | - Context-based    |                  |
|  +-------------------+   |   risk             |                  |
|                          +--------------------+                  |
+------------------------------------------------------------------+

Sample Policies

Here are practical policies organizations should implement:

Scope boundary policy: Agents cannot request permissions outside their registered capabilities.

Time-of-day policy: Agents performing destructive operations (delete, modify) are restricted to business hours unless explicitly approved.

Volume policy: Agents that make more than N API calls per minute to sensitive systems trigger additional verification.

Cross-domain policy: Agents cannot access resources in domains they're not registered for (e.g., a development agent cannot access production databases).

Delegation depth policy: Delegation chains deeper than N levels require human approval for each additional level.

Sensitive data policy: Agents accessing PII, financial data, or other sensitive classifications require elevated authorization and enhanced logging.
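In production these rules would normally be written in Rego or Cedar. To keep the examples in one language, here is a rough Python sketch of four of them (scope boundary, time-of-day, delegation depth, volume). The Request fields, the thresholds, and the rule names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    REQUIRE_APPROVAL = "REQUIRE_APPROVAL"


@dataclass(frozen=True)
class Request:
    capabilities: frozenset   # from the agent's registry entry
    scope: str                # permission being requested, e.g. "reports:write"
    action: str               # "read", "modify", "delete", ...
    hour: int                 # local hour of day, 0-23
    chain_depth: int          # number of delegation hops
    calls_last_minute: int


# Illustrative thresholds, not recommendations
BUSINESS_HOURS = range(9, 18)
MAX_CHAIN_DEPTH = 3
MAX_CALLS_PER_MINUTE = 100
DESTRUCTIVE = {"delete", "modify"}


def evaluate(req: Request):
    """Return (decision, name of the rule that produced it)."""
    if req.scope not in req.capabilities:
        return Decision.DENY, "scope-boundary"
    if req.action in DESTRUCTIVE and req.hour not in BUSINESS_HOURS:
        return Decision.REQUIRE_APPROVAL, "time-of-day"
    if req.chain_depth > MAX_CHAIN_DEPTH:
        return Decision.REQUIRE_APPROVAL, "delegation-depth"
    if req.calls_last_minute > MAX_CALLS_PER_MINUTE:
        return Decision.REQUIRE_APPROVAL, "volume"
    return Decision.ALLOW, "within-registered-scope"
```

Returning the rule name alongside the decision gives the audit layer something concrete to record for each authorization.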

Component 4: The Audit Layer

Every action by every agent must be logged with enough context to reconstruct what happened, why it happened, and who authorized it.

What to Log

Agent Audit Log Entry
=====================
Timestamp:        2026-03-27T14:15:23.456Z
Event Type:       tool_invocation
Agent ID:         agent-planning-prod-001
Agent Type:       planning-orchestrator
Human Principal:  alice@company.com
Task ID:          infra-review-q1-2026
Delegation Chain: [alice -> planning-agent -> inventory-agent]

Action:
  Tool:           aws_ec2_describe_instances
  Parameters:     {region: "us-east-1", filters: [...]}
  Result:         success (247 instances returned)

Authorization:
  Token ID:       tok_abc123
  Granted Scope:  ec2:DescribeInstances
  Policy Decision: ALLOW
  Policy Rule:    agent-cloud-read-v2

Context:
  Step:           3 of 12 in infrastructure review
  Previous Step:  Listed S3 buckets (success)
  Next Step:      Query cost data (pending)
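An entry like the one above is easier to ship to a log pipeline as one JSON object per line. The following sketch shows one possible shape; the function name audit_entry and the key names are assumptions for this example, not a standard schema, and the Context block is omitted for brevity.

```python
import json
from datetime import datetime, timezone


def audit_entry(agent_id, human_principal, task_id, delegation_chain,
                tool, parameters, result, authorization, timestamp=None):
    """Serialize one agent action as a single JSON line."""
    ts = timestamp or datetime.now(timezone.utc)
    record = {
        "timestamp": ts.isoformat(timespec="milliseconds"),
        "event_type": "tool_invocation",
        "agent_id": agent_id,
        "human_principal": human_principal,
        "task_id": task_id,                          # correlation key
        "delegation_chain": list(delegation_chain),
        "action": {"tool": tool, "parameters": parameters, "result": result},
        "authorization": authorization,              # token ID, scope, decision, rule
    }
    return json.dumps(record, sort_keys=True)


line = audit_entry(
    agent_id="agent-planning-prod-001",
    human_principal="alice@company.com",
    task_id="infra-review-q1-2026",
    delegation_chain=["alice", "planning-agent", "inventory-agent"],
    tool="aws_ec2_describe_instances",
    parameters={"region": "us-east-1"},
    result="success",
    authorization={"token_id": "tok_abc123", "policy_decision": "ALLOW"},
)
```

Sorting the keys keeps the output stable, which makes entries easier to diff and to test.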

Integration with Enterprise SIEM

Agent audit logs should flow into your existing SIEM alongside human activity logs, network logs, and application logs. The correlation is critical - you need to see agent activity in the same view as everything else.

Key SIEM integration requirements:

Requirement	Implementation
Common format	Structure agent logs in your SIEM's expected format (CEF, LEEF, JSON)
Correlation IDs	Use task IDs and delegation chain IDs as correlation keys
Agent-specific alerts	Create detection rules for agent-specific risks (unusual tool usage, delegation chain violations, scope escalation attempts)
Dashboard visibility	Add agent activity panels to security operations dashboards
Incident playbooks	Create runbooks for agent-related security incidents

Putting It All Together: A Day in the Life

Let's walk through how this architecture works in a real scenario.

Scenario: Alice Asks for an Infrastructure Review

9:00 AM - Alice opens the internal agent portal and requests a quarterly infrastructure review. The portal authenticates Alice via SSO and creates a task record.

9:00:05 AM - The portal invokes the Planning Agent. The Planning Agent's identity is verified against the Agent Registry. A task-scoped delegation token is created, signed by Alice's session, granting the Planning Agent read access to cloud resources and write access to the reporting system.

9:00:10 AM - The Planning Agent decomposes the task into subtasks and identifies which specialized agents it needs. It requests delegation tokens for each subordinate agent from the Token Service. Each token is scoped narrower than the Planning Agent's own scope (permission diminishment). Each token expires in 1 hour.

9:00:15 AM - The Cloud Inventory Agent starts its work. It requests a just-in-time token for AWS API access. The Policy Engine verifies that the agent is registered, the delegation chain is valid, the requested scope is within bounds, and the current time is within allowed operation hours. Token granted for 15 minutes.

9:15 AM - The Cloud Inventory Agent's token expires. It requests a new one for the next batch of API calls. The Policy Engine re-evaluates - nothing has changed, token granted.

9:45 AM - The Cost Analysis Agent attempts to access the billing API with write permissions (to create a cost report tag). The Policy Engine denies this - the agent's delegation only allows read access to cost data. The agent logs the denial and uses read-only data instead.

10:30 AM - All subordinate agents have completed their tasks. The Report Generator Agent creates a report in Confluence. The Planning Agent aggregates results and marks the task complete. All tokens expire. All agent credentials from this task are invalidated.

10:31 AM - Alice receives a notification that the review is complete. She reviews the report. Every action taken by every agent is available in the audit log, traceable back to her original request.
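The permission diminishment and chain-depth limit that drive this scenario can be illustrated with a small sketch. The Delegation class, its method names, and the scope strings are hypothetical, chosen only to mirror the walkthrough. The chain hash uses SHA-256 over the JSON-encoded chain as one plausible way to produce a value like the token's delegation_chain_hash.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    chain: tuple          # principals in order, human first
    scope: frozenset      # permissions held at this point in the chain

    def delegate(self, to, scope, max_agents=3):
        """Return a new Delegation for `to`, or raise PermissionError."""
        scope = frozenset(scope)
        # Permission diminishment: a delegate never gets more than the delegator holds.
        if not scope <= self.scope:
            raise PermissionError(f"{to} cannot receive {sorted(scope - self.scope)}")
        # chain[0] is the human, so len(chain) is the agent count after adding `to`.
        if len(self.chain) > max_agents:
            raise PermissionError("delegation chain too deep")
        return Delegation(self.chain + (to,), scope)

    def chain_hash(self):
        digest = hashlib.sha256(json.dumps(list(self.chain)).encode()).hexdigest()
        return f"sha256:{digest}"


alice = Delegation(("alice@company.com",),
                   frozenset({"cloud:read", "cost:read", "reports:write"}))
planning = alice.delegate("planning-agent", alice.scope)
cost = planning.delegate("cost-agent", {"cost:read"})
```

With this setup, any attempt by the cost agent to pass on or obtain "cost:write" raises PermissionError, which corresponds to the 9:45 AM denial in the walkthrough.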

Note

This architecture may seem heavyweight for simple agent tasks. That's intentional. In a well-designed system the runtime overhead is small (token issuance typically takes milliseconds, and in-process policy evaluation is usually faster still), while the security guarantees are substantial. The alternative - agents running with permanent credentials and no audit trail - is a risk no enterprise should accept.

Implementation Roadmap

Building this infrastructure doesn't happen overnight. Here's a practical roadmap.

Phase 1: Foundation (Months 1-3)

Deploy the Agent Identity Registry (start with a database and API)
Implement basic agent registration for all existing agents
Assign human owners to every agent
Begin logging all agent actions with agent IDs and task IDs

Phase 2: Token Service (Months 4-6)

Deploy the Token Service
Migrate agents from static credentials to short-lived tokens
Implement basic scope restrictions (read vs. write, per-system)
Set maximum token lifetimes (start with 1 hour, reduce over time)

Phase 3: Policy Engine (Months 7-9)

Deploy OPA or Cedar as the policy engine
Implement core policies (scope boundaries, time-of-day, volume limits)
Enable policy-based delegation constraints
Set up policy violation alerting

Phase 4: Full Integration (Months 10-12)

Integrate agent audit logs with enterprise SIEM
Implement delegation chain verification
Deploy behavioral monitoring and anomaly detection
Create incident response playbooks for agent-related events

Phase 5: Continuous Improvement (Ongoing)

Tighten policies based on observed agent behavior
Reduce token lifetimes as infrastructure matures
Add dynamic, risk-based policy evaluation
Automate agent lifecycle management (registration, rotation, decommissioning)

Technology Choices

You don't need to build everything from scratch. Here's where existing technology fits.

Component	Approach	Technology
Agent Registry	Build (simple API + database)	Custom - no mature products exist yet
Token Service	Build on existing standards	Extend your OAuth/OIDC provider (Keycloak, Auth0)
Policy Engine	Configure	OPA or Cedar (both open source; Cedar originated at AWS)
Workload Identity	Deploy	SPIFFE/SPIRE (open source)
Secrets Management	Deploy	HashiCorp Vault, cloud-native options
Audit Logging	Extend	Your existing SIEM + custom agent fields
Monitoring	Extend	Your existing observability stack + agent dashboards

The Agent Registry is the only component that requires genuinely new development. Everything else is an extension or configuration of existing infrastructure.

Tip

For a comprehensive guide to CIAM and identity infrastructure at scale, including lessons from building identity systems for over a billion users, see Deepak Gupta's article on customer identity and access management.

The Cost of Not Building This

Some organizations will look at this architecture and think it's overkill. "We only have a few agents. We'll manage them manually."

Consider the trajectory. The average enterprise deployed 3-5 AI agents in 2024. That number is projected to be 50-100 by the end of 2026. By 2028, organizations will have hundreds of agents, many of them creating and managing other agents.

Manual agent identity management at that scale is impossible. Without infrastructure, you'll have agents running on forgotten credentials, with unknown permissions, performing unaudited actions on production systems. The cost of building identity infrastructure now is a fraction of the cost of cleaning up the mess later.

The autonomous enterprise is coming. The question is whether your identity infrastructure is ready for it.