The API economy, despite its mature facade, remains a wild west when it comes to traffic management. Every developer worth their salt knows that a robust API gateway isn't just a reverse proxy; it's the bouncer, the accountant, and often, the first line of defense against both accidental abuse and malicious intent. At the heart of this defensive posture lies rate limiting – a seemingly simple concept that, in practice, quickly devolves into a complex interplay of algorithms, distributed state, and operational overhead.
As a seasoned observer of this arena, I’ve watched Kong and AWS API Gateway duke it out for mindshare, each peddling its own flavor of control. Recently, both platforms have continued to evolve their rate-limiting capabilities. But let's be blunt: while the marketing slides might promise a panacea, the reality on the ground often involves wrestling with obscure configurations, chasing down inconsistent behavior, and accepting a healthy dose of "best-effort" guarantees. I've just emerged from the trenches, and here's my unvarnished assessment of where things stand.
AWS Traffic Management: Throttling Layers and WAF
AWS API Gateway has always embraced a multi-layered approach to throttling, which, depending on your perspective, is either a testament to its flexibility or an exercise in Byzantine complexity. At its core, API Gateway relies on a token bucket algorithm to manage request flow, where each request consumes a token, and tokens refill at a steady rate up to a defined burst limit. This fundamental mechanism underpins all its throttling layers.
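The token bucket mechanic described above is easy to sketch. Here is a minimal, illustrative Python version (not AWS's actual implementation, which is distributed and far more involved): tokens refill continuously at a steady rate, the bucket caps out at the burst limit, and each request consumes one token or gets rejected.

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: tokens refill at `rate` per second,
    capacity is capped at `burst`, and each request consumes one token."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate            # steady-state refill rate (tokens/sec)
        self.burst = burst          # bucket capacity (max burst size)
        self.tokens = float(burst)  # start full, so a cold burst is allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should answer 429 Too Many Requests

bucket = TokenBucket(rate=10, burst=5)
# Five rapid requests drain the burst allowance; the sixth is throttled.
results = [bucket.allow() for _ in range(6)]
```

The key property to notice: the burst limit governs how deep a spike can go, while the rate governs the sustained throughput once the bucket is empty.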
Account-Level & Stage-Level: The Blunt Instruments
The most basic controls are the account-level and stage-level throttles. The account-level limit is a global safeguard, applied across all APIs in a given AWS region. It's a blunt instrument, a safety net to prevent your entire account from being overwhelmed. The default is typically 10,000 requests per second (RPS) with a burst of 5,000 requests, though this can be increased upon request to AWS Support, provided it doesn't exceed AWS's internal regional limits.
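Stage-level defaults are set via method settings on the stage. As a sketch (the API and stage IDs here are placeholders), the AWS CLI lets you override the default method throttling for every route in a stage with a `/*/*` patch path:

```shell
# Hypothetical rest-api-id and stage name; caps every method in the
# "prod" stage at 500 RPS steady-state with a burst of 1,000 requests.
aws apigateway update-stage \
  --rest-api-id a1b2c3d4e5 \
  --stage-name prod \
  --patch-operations \
    'op=replace,path=/*/*/throttling/rateLimit,value=500' \
    'op=replace,path=/*/*/throttling/burstLimit,value=1000'
```

You can also target an individual method by replacing `/*/*` with a specific resource path and HTTP verb, which is usually where the real tuning happens.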
The problem? It's a shared resource. If a runaway process hits one API, every other API in your account in that region might start seeing 429s. This isn't ideal for multi-API environments or SaaS platforms where tenant isolation is paramount.
About the Author
This article was published by the DataFormatHub Editorial Team, a group of developers and data enthusiasts dedicated to making data transformation accessible and private. Our goal is to provide high-quality technical insights alongside our suite of privacy-first developer tools.
