Skip to content
AI Tools · AI Gateway

Top 5 AI Gateways 2026: Kong vs Portkey vs LiteLLM vs Cloudflare vs Helicone

AI gateways compared: Kong AI Gateway, Portkey, LiteLLM, Cloudflare AI Gateway, and Helicone. Routing, caching, budgeting, observability, and the security controls every production LLM application needs.

By Deepak Gupta·May 15, 2026·13 min·5 tools compared
AI GatewayLLM InfrastructureKongPortkeyLiteLLMCloudflare AI GatewayHelicone

Quick Comparison

GatewayBest ForDeploymentProvidersPricing Model
Kong AI GatewayEnterprise API platform standardizationSelf-hosted + Kong Konnect30+Enterprise license + usage
PortkeyProduction LLM apps wanting routing + observabilityCloud + self-hosted200+Free tier + usage-based
LiteLLMOSS gateway with maximum provider compatibilitySelf-hosted (open source)100+Free; LiteLLM Enterprise tier
Cloudflare AI GatewayCloudflare-hosted apps with simple AI proxy needsCloudflare-only10+ majorGenerous free tier + Workers usage
HeliconeOpen-source observability with light gatewayCloud + self-hosted OSS100+ via proxyFree tier + per-request pricing
1

Kong AI Gateway

Best for Enterprise

Best for: Enterprise API platform standardization with AI traffic alongside traditional APIs

Kong AI Gateway extends Kong's established API gateway platform into AI traffic. For enterprises already running Kong for API management, the AI Gateway is the right path — same operational model, same observability pipeline, same security controls. For greenfield AI-only deployments without Kong commitment, Portkey or LiteLLM are usually better fits.

Pros

  • Built on the established Kong API gateway with enterprise-grade operational maturity
  • Native plugins for prompt injection defense, semantic caching, rate limiting, and cost budgets
  • Unified observability for AI and traditional API traffic in one operational pipeline
  • Strong fit for organizations standardizing API management at enterprise scale

Cons

  • Enterprise pricing model assumes organizational commitment to the Kong platform
  • More operational complexity than purpose-built AI-only gateways
  • Less LLM-provider feature parity than Portkey or LiteLLM in some edge cases
Honest Weakness: Kong AI Gateway makes sense when Kong is already the API gateway standard at the organization. For AI-first teams without existing Kong investment, the operational overhead and pricing model are mismatched to the actual need — purpose-built AI gateways (Portkey, LiteLLM, Helicone) produce better outcomes at lower cost. The Kong fit is enterprise-standardization-driven, not AI-capability-driven.

Enterprise API Platform Integration

The AI Gateway runs as a Kong plugin set on the same gateway infrastructure that handles traditional API traffic. For organizations with existing Kong deployments, the operational model — admin API, declarative configuration, observability via Kong's stack — extends to AI traffic without parallel infrastructure. The integration value is the operational consistency, not raw AI-specific features.

Security Plugins for LLM Traffic

Built-in plugins handle prompt injection scanning, semantic caching with vector embeddings, request validation against expected schemas, cost budgets per consumer, and rate limiting per token volume. The security plugin set is the most enterprise-mature in the AI gateway category, fitting compliance-conscious deployments.

Kong Konnect from approximately $500-2,500/month; Enterprise pricing on request

Visit Kong AI Gateway
2

Portkey

Best Overall

Best for: Production LLM applications wanting routing, observability, and cost control without operating infrastructure

Portkey is the most-feature-complete commercial AI gateway in 2026. The platform combines routing across 200+ LLM providers, prompt caching, fallback chains, observability, and cost budgets in a managed cloud offering. For production LLM apps that want to avoid operating gateway infrastructure, Portkey is the leading choice; for teams committed to self-hosting, LiteLLM is the open-source equivalent.

Pros

  • Broadest LLM provider coverage in the category — 200+ providers and model families
  • Production routing features including fallbacks, load balancing, and semantic caching
  • Strong observability with prompt-level traces, cost tracking, and evaluation tooling
  • Cloud-managed deployment with low operational overhead

Cons

  • Cloud-only deployment for the managed offering; self-hosted available but less seamless than LiteLLM
  • Pricing scales with request volume — meaningful line item for high-throughput applications
  • Some advanced features (guardrails, evaluations) gated to higher pricing tiers
Honest Weakness: Portkey is the right managed AI gateway for production teams that prefer not to operate infrastructure. The cloud-first deployment is the strength and the constraint — organizations with data residency requirements that need self-hosted gateways often prefer LiteLLM or Kong. Pricing for high-throughput applications can exceed Cloudflare AI Gateway by a meaningful multiple; cost modeling against expected traffic matters at scale.

Routing and Fallback Chains

Portkey's configuration model supports declarative routing rules — primary provider, fallback chain on error or latency threshold, semantic caching with embedding-based lookup, request transformation between provider schemas. The configuration is the gateway's strongest production feature; failover patterns that would be application-layer code in alternative deployments become gateway configuration in Portkey.

Observability and Evaluation

The observability layer captures every prompt, completion, latency, token count, and cost per request. Built-in evaluations (groundedness, faithfulness, custom evaluators) score completions against quality criteria; the evaluation results feed into the routing decisions. For production teams running A/B tests across model providers, the integrated evaluation-and-routing loop reduces application-layer experiment infrastructure.

Free tier with 10K requests/month; usage-based pricing from $49/month for production tier

Visit Portkey
3

LiteLLM

Best Open Source

Best for: Open-source AI gateway with maximum provider compatibility for self-hosted production

LiteLLM is the dominant open-source AI gateway in 2026. The project's provider compatibility is the broadest of any open-source option (100+ providers across OpenAI-compatible APIs and native integrations), and the self-hosted deployment is straightforward. For teams committed to self-hosting AI infrastructure, LiteLLM is the default choice.

Pros

  • Open source (MIT) with broad provider compatibility across 100+ LLM providers
  • Self-hosted deployment is operationally simple — Docker container plus configuration
  • OpenAI-compatible interface means existing OpenAI SDK code works against any provider
  • Active community and commercial backing from BerriAI with LiteLLM Enterprise tier

Cons

  • Observability and management UI less polished than Portkey or commercial alternatives
  • Enterprise features (SSO, audit logs, advanced budgets) gated to LiteLLM Enterprise tier
  • Configuration complexity grows with feature surface — production deployments often non-trivial
Honest Weakness: LiteLLM's value is provider compatibility and self-hosted deployment; teams that need polished UI, advanced observability, or commercial support beyond community-effort find the alternatives (Portkey, Helicone) more turnkey. For production teams that prefer self-hosting and have the operational capacity to manage gateway configuration, LiteLLM produces excellent outcomes; for teams that prefer managed infrastructure, the alternatives are usually better fits.

OpenAI-Compatible Universal Interface

LiteLLM exposes an OpenAI-compatible API surface for any backing provider — Anthropic Claude, Google Gemini, Cohere, Bedrock, Azure OpenAI, local models via Ollama or vLLM, dozens more. Existing application code using the OpenAI SDK works against LiteLLM as the backend without modification, which makes provider migration and multi-provider deployment materially easier than direct integration with each provider's SDK.

Self-Hosted Deployment Model

Production deployment is a Docker container plus a YAML configuration describing providers, routing rules, budgets, and rate limits. The deployment fits Kubernetes, ECS, plain Docker, or any container orchestration; the operational model is similar to any HTTP proxy. For organizations preferring self-hosted infrastructure over managed services, the deployment is materially simpler than running Kong or building equivalent capability.

Free (open source, MIT); LiteLLM Enterprise tier with commercial pricing

Visit LiteLLM
4

Cloudflare AI Gateway

Best Free Option

Best for: Cloudflare-hosted applications with simple AI proxy and caching needs

Cloudflare AI Gateway is the right choice when the application already runs on Cloudflare Workers, Pages, or other Cloudflare infrastructure. The gateway runs at the edge, integrates natively with Workers, and ships with a generous free tier. For applications outside the Cloudflare ecosystem, the value proposition is weaker — purpose-built AI gateways have richer features.

Pros

  • Generous free tier — 100K requests/day at no cost
  • Runs at Cloudflare's edge with low latency for globally-distributed clients
  • Native integration with Cloudflare Workers, Workers AI, R2, and the broader Cloudflare stack
  • Built-in caching, rate limiting, and basic observability

Cons

  • Provider coverage limited to major LLM providers (OpenAI, Anthropic, AWS Bedrock, Google, a handful more) versus 100+ on alternatives
  • Lock-in to Cloudflare infrastructure — applications outside Cloudflare derive less value
  • Advanced features (evaluations, prompt management, complex routing) are less mature than Portkey or LiteLLM
Honest Weakness: Cloudflare AI Gateway is excellent if you are already a Cloudflare customer building on Workers; the integration value is real. For applications outside the Cloudflare ecosystem, the gateway's value is just an AI proxy with caching, which the alternatives do better. The decision is almost entirely 'are you on Cloudflare or not?' — yes points to AI Gateway; no points to Portkey or LiteLLM.

Cloudflare Stack Integration

AI Gateway runs at Cloudflare's edge network alongside Workers, Pages, R2, D1, and the broader Cloudflare platform. For applications built on this stack, the integration is operationally tight — observability flows into Cloudflare's analytics, caching uses Cloudflare's KV or D1, and the gateway is configured through the same dashboard as the rest of the infrastructure. The value scales with Cloudflare commitment.

Edge Latency and Generous Free Tier

Running at the edge produces materially lower latency than going through a centralized gateway in one cloud region, particularly for globally-distributed user bases. The 100K-requests-per-day free tier is the most generous in the category; for moderate-volume applications, the gateway is functionally free.

Free tier with 100K requests/day; usage-based pricing for higher volume integrated with Workers pricing

Visit Cloudflare AI Gateway
5

Helicone

Runner Up

Best for: Open-source LLM observability with light AI gateway functionality

Helicone is primarily an LLM observability platform with AI gateway functionality added via its proxy mode. The observability is the strength — production-quality traces, cost tracking, prompt versioning. The gateway capabilities (routing, caching, rate limiting) are present but less mature than the dedicated gateway alternatives.

Pros

  • Best LLM observability of any AI gateway — production-quality traces, evaluations, cost tracking
  • Open source (Apache 2.0) with self-hosted deployment option
  • Proxy mode adds gateway functionality (caching, rate limiting, basic routing) on top of observability
  • Strong fit when observability is the primary need and gateway is secondary

Cons

  • Gateway capabilities (routing, fallbacks, complex caching) less mature than Portkey or LiteLLM
  • Proxy-based deployment adds latency that direct gateway integration avoids
  • Pricing model based on observability events rather than gateway requests — different cost profile
Honest Weakness: Helicone is excellent for the observability use case and adequate for the gateway use case. Teams that need both observability and a production-grade gateway often combine Helicone for observability with LiteLLM or Portkey for gateway capabilities; teams that need the gateway primarily find dedicated alternatives produce better outcomes. The strength of the platform is the observability depth; the gateway functionality is secondary.

LLM Observability Depth

Helicone's observability captures every LLM call with full prompt, completion, latency, token counts, and cost, indexed for searchable analysis. Production teams running thousands of LLM calls per minute use the dashboard to identify regression in prompt quality, latency spikes, cost outliers, and provider error patterns. The observability is the platform's primary value.

Proxy Mode Gateway Functionality

Routing requests through Helicone's proxy enables basic gateway functionality — caching with semantic similarity, rate limiting per user or key, basic provider fallback. For applications where observability is the primary need and gateway features are useful additions, the proxy mode delivers both in one deployment; for applications where the gateway is the primary need, dedicated alternatives are usually better fits.

Free tier with 10K requests/month; usage-based pricing from $20/month for production

Visit Helicone

Which One Should You Pick?

Use CaseOur Recommendation
Production LLM application wanting routing across multiple providers with managed observabilityPortkey for the managed-cloud option; LiteLLM for the self-hosted equivalent. Both provide broad provider coverage; the choice is managed-vs-self-hosted preference.
Application built on Cloudflare Workers with AI proxy and caching needsCloudflare AI Gateway — native integration plus generous free tier makes this the obvious choice when already on Cloudflare. Outside Cloudflare, the value proposition is weaker.
Enterprise with existing Kong API gateway standardizationKong AI Gateway to maintain operational consistency with traditional API traffic. For organizations without Kong commitment, the operational overhead is mismatched to the actual need.
Open-source self-hosted gateway with maximum provider compatibilityLiteLLM — the dominant OSS AI gateway with broad provider support and OpenAI-compatible interface. Helicone OSS as the secondary option when observability is the primary need.
LLM observability as the primary need with light gateway capabilityHelicone for the observability-first deployment. For dedicated LLM observability without gateway needs, see the Top 5 LLM Observability Platforms 2026 comparison.
High-throughput application optimizing for gateway costCloudflare AI Gateway for the generous free tier and edge-latency benefits; LiteLLM self-hosted for the no-incremental-cost-per-request model; cost modeling against expected traffic essential before committing.
Multi-provider production with prompt injection defense requirementsKong AI Gateway for the most mature security plugin set; Portkey as the managed-cloud alternative with guardrails. LiteLLM with separate guardrail integration for self-hosted with security.

Frequently Asked Questions

What does an AI gateway actually do?
An AI gateway sits between application code and LLM providers, providing five primary functions: routing requests across providers (failover, load balancing, A/B testing), caching responses (often semantic — similar prompts return cached responses), rate limiting and cost budgets per user or API key, observability of every LLM call (prompts, completions, latency, cost), and security controls (prompt injection scanning, output filtering). For production LLM applications, the gateway centralizes capabilities that would otherwise be repeated application-layer code across every team using LLMs.
Do I need an AI gateway for a small LLM application?
Not strictly — small applications can call LLM providers directly. The value of a gateway scales with: number of teams using LLMs (centralized cost control), number of providers being used (managed multi-provider routing), traffic volume (caching benefit), and operational maturity needs (observability, security). For a single team with one provider and modest traffic, direct integration plus basic logging is usually sufficient; at organizational scale, a gateway becomes table-stakes.
Portkey vs LiteLLM — what's the real decision?
Managed cloud vs self-hosted. Portkey is the managed cloud offering with polished UI, broader provider coverage, and managed operations; LiteLLM is the open-source self-hosted equivalent with similar capabilities but more operational ownership. Teams that prefer managed services pick Portkey; teams that prefer self-hosting (data residency, cost predictability, infrastructure control) pick LiteLLM. The capability gap between them is meaningful but narrowing through 2025-2026.
Why is Cloudflare AI Gateway so cheap?
Cloudflare's business model is selling infrastructure usage (Workers, R2, KV) and the AI Gateway is a feature that pulls applications into the Cloudflare ecosystem. The gateway itself is essentially free at moderate volume because Cloudflare benefits from the surrounding Workers/Pages/R2 usage. For applications already on Cloudflare, this is excellent value; for applications outside Cloudflare, the lock-in cost is meaningful.
How does an AI gateway interact with LLM observability platforms like Langfuse or LangSmith?
Two patterns. Gateway-as-observability: Portkey, Helicone, and Cloudflare AI Gateway ship observability built in. Gateway-plus-separate-observability: LiteLLM or Kong AI Gateway proxy through to a dedicated observability platform (Langfuse, LangSmith, Arize) via OpenTelemetry or platform-specific integration. The first pattern is simpler operationally; the second pattern produces deeper observability when the dedicated platform has stronger capabilities than the gateway's built-in version.
What about prompt injection defense in AI gateways?
Kong AI Gateway has the most mature security plugin set including prompt injection scanning, output filtering, and PII detection. Portkey ships guardrails on higher pricing tiers. LiteLLM supports guardrail integration via external services (Lakera, Protect AI, custom code). Cloudflare AI Gateway and Helicone have basic capabilities. For applications requiring strict prompt injection defense, Kong is the safest default; for moderate defense needs, the alternatives plus an external guardrail service work well.
Can I run multiple gateways or combine approaches?
Yes, and many production deployments do. Common pattern: LiteLLM for self-hosted gateway capability + Helicone for observability + a dedicated guardrail service (Lakera, Protect AI) for prompt injection defense. The combination produces best-of-breed capabilities at higher operational complexity than a single managed solution like Portkey. The decision is operational complexity tolerance vs feature optimization.

Related Comparisons