If you’re calling an IP geolocation API at scale — or building one — rate limiting matters. From the consumer side: you don’t want a bug or a runaway script to burn through your quota. From the provider side (which is also what you should think about if you’re building your own internal service): you want fairness, abuse prevention, and predictable load.
This post is a practical guide to rate limiting for IP lookup workloads specifically. We’ll cover the standard algorithms, the patterns that work in production, the bugs nobody mentions, and how to make sure your rate limits don’t break legitimate traffic.
Why Rate Limiting Matters for IP Lookups
A few specific scenarios:
As an API consumer (you’re calling the API):
- Runaway costs. A bug that calls convertIP() in a loop instead of batching can burn through a free tier in seconds.
- Quota protection. You want your own internal limits below the API provider’s so you fail gracefully on your side rather than getting throttled by them.
- Multi-tenant fairness. If you’re a SaaS product, you don’t want one customer’s bug to exhaust the quota for everyone.
As an API provider:
- Abuse prevention. Some users will try to extract bulk data via a free tier.
- Cost control. Compute, bandwidth, and upstream data costs need to be bounded.
- Fairness. No single user should be able to monopolize capacity at the expense of others.
The patterns and the math are the same in both directions; the policy decisions differ.
The Standard Algorithms
Token bucket
The most common pattern. A bucket holds N tokens; each request consumes one. Tokens refill at a steady rate (say, 100 tokens/second). When the bucket is empty, requests fail or are queued.
Pros: Allows bursts up to the bucket capacity; smooth refill protects against sustained overload.
Cons: Implementing the refill correctly is fiddly.
When to use: Almost always. Token bucket is the right default.
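The lazy-refill logic is easier to see in a single process. Below is a minimal in-memory sketch; the `TokenBucket` class, its `tryConsume()` method and the injectable `now` clock are illustrative names, not a library API. Because the state lives in process memory, this only works for one instance. The Redis script later in this post is the shared-state version.

```typescript
// Minimal in-memory token bucket (single process only).
// Tokens are refilled lazily from the time elapsed since the last call.
class TokenBucket {
  private tokens: number
  private lastRefillMs: number
  private readonly capacity: number
  private readonly refillPerSecond: number
  private readonly now: () => number

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity               // maximum burst size
    this.refillPerSecond = refillPerSecond // sustained rate
    this.now = now                         // clock in ms; injectable for tests
    this.tokens = capacity                 // a new bucket starts full
    this.lastRefillMs = now()
  }

  // Returns true and deducts `cost` tokens if enough are available.
  tryConsume(cost = 1): boolean {
    const t = this.now()
    const elapsedMs = Math.max(0, t - this.lastRefillMs) // ignore a clock going backwards
    this.tokens = Math.min(this.capacity, this.tokens + (elapsedMs / 1000) * this.refillPerSecond)
    this.lastRefillMs = t
    if (this.tokens < cost) return false
    this.tokens -= cost
    return true
  }
}
```

Computing the refill from elapsed time on each call means no background timer is needed: a bucket with capacity 3 and a refill of 1 token/second allows a burst of three requests, then one more per second.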
Leaky bucket
Requests are added to a queue at any rate; the queue drains at a constant rate. Requests that arrive when the queue is full are dropped.
Pros: Smooths bursts into a constant output rate.
Cons: Less flexible than token bucket; requests can be delayed instead of immediately rejected.
When to use: Backend services where smooth downstream rate matters more than fast rejection.
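A sketch of the queueing behavior, assuming a single process and an injectable clock. Instead of holding requests in a real queue, `admit()` (an illustrative name) computes when each admitted request may start, so that requests leave at a fixed interval. It returns `null` once the queue is full.

```typescript
// Leaky bucket as a queue: admitted requests leave at a constant rate
// (one every intervalMs). admit() returns how long the caller should wait
// before proceeding, or null if the queue is already full.
class LeakyBucket {
  private nextSlotMs: number // earliest start time for the next admitted request
  private readonly capacity: number
  private readonly intervalMs: number
  private readonly now: () => number

  constructor(capacity: number, ratePerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity               // max requests waiting in the queue
    this.intervalMs = 1000 / ratePerSecond // constant drain interval
    this.now = now
    this.nextSlotMs = now()
  }

  admit(): number | null {
    const t = this.now()
    // Requests scheduled to start after t are still waiting in the queue.
    const waiting = Math.max(0, Math.ceil((this.nextSlotMs - t) / this.intervalMs) - 1)
    if (waiting >= this.capacity) return null
    const start = Math.max(t, this.nextSlotMs)
    this.nextSlotMs = start + this.intervalMs
    return start - t
  }
}
```

A caller would sleep for the returned delay before making the lookup. With a queue capacity of 2 and a rate of 10/second, four simultaneous requests get delays of 0, 100 and 200 ms, and the fourth is rejected. The growing delays are the latency cost mentioned above.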
Fixed window
Count requests in fixed time windows (per second, per minute, per hour). Reset the counter at each window boundary.
Pros: Simple to implement.
Cons: Boundary problem — a user can send N requests at the very end of one window and N more at the start of the next, doubling their effective rate. Generally a worse choice than the alternatives.
When to use: Quick prototypes; situations where the boundary problem doesn’t matter.
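A minimal fixed-window counter for comparison. It is again single-process with an injectable clock, and `FixedWindowCounter` is an illustrative name:

```typescript
// Fixed-window counter: at most `limit` requests per aligned window
// ([0, windowMs), [windowMs, 2 * windowMs), ...). The count resets at each boundary.
class FixedWindowCounter {
  private windowStart = Number.NEGATIVE_INFINITY
  private count = 0
  private readonly limit: number
  private readonly windowMs: number
  private readonly now: () => number

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit
    this.windowMs = windowMs
    this.now = now
  }

  tryAcquire(): boolean {
    const t = this.now()
    const start = Math.floor(t / this.windowMs) * this.windowMs // align to the window boundary
    if (start !== this.windowStart) {
      this.windowStart = start // new window: reset the counter
      this.count = 0
    }
    if (this.count >= this.limit) return false
    this.count++
    return true
  }
}
```

With a limit of 5 per 1,000 ms window, five requests at t = 999 ms and five more at t = 1000 ms all pass. That is 10 requests in about a millisecond, which is the boundary problem in action.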
Sliding window
A more accurate version of fixed window. Instead of resetting at boundaries, the window slides continuously — you count requests in the last N seconds at any moment.
Pros: No boundary problem; more accurate.
Cons: Slightly more expensive to compute (need to track timestamps, not just counters).
When to use: When the boundary problem of fixed window matters and you want accurate rate enforcement.
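The exact form of sliding window is often called a sliding log. It keeps a timestamp per accepted request and counts only those inside the window. A minimal single-process sketch with illustrative names:

```typescript
// Sliding-window log: remember when each accepted request happened and
// count only those within the last windowMs milliseconds.
class SlidingWindowLog {
  private timestamps: number[] = []
  private readonly limit: number
  private readonly windowMs: number
  private readonly now: () => number

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit
    this.windowMs = windowMs
    this.now = now
  }

  tryAcquire(): boolean {
    const t = this.now()
    // Evict timestamps that have slid out of the window (oldest first).
    while (this.timestamps.length > 0 && this.timestamps[0] <= t - this.windowMs) {
      this.timestamps.shift()
    }
    if (this.timestamps.length >= this.limit) return false
    this.timestamps.push(t)
    return true
  }
}
```

Take five requests at t = 999 ms followed by more at t = 1000 ms. Only the first five pass, and the next request succeeds at t = 1999 ms, once those timestamps are a full window old. Memory grows with the limit per key. A common cheaper approximation keeps two fixed-window counters and weights the previous window’s count by how much of it still overlaps the sliding window.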
For IP lookup workloads specifically, token bucket with its state stored in Redis is the typical right answer.
Layered Limits
You probably need more than one limit. Common layering:
- Per-request. Single request rate (1 token per request, refill at X/second).
- Per-user. Each authenticated user has their own bucket.
- Per-API-key. Each API key has a budget.
- Per-IP. Source IP of the requester (catches anonymous abuse).
- Global. Aggregate cap for the whole service (protects against cascading failures).
Each layer triggers independently. A request might pass per-IP and per-user but fail global if the system as a whole is overloaded.
For most apps, per-user and per-API-key are the most important. Per-IP is useful for unauthenticated endpoints. Global is your last line of defense.
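To make the layering concrete, here is a sketch of deriving one bucket key per applicable layer from an already-authenticated request. The following are assumptions for illustration:

- the `Identity` shape;
- the key prefixes, modeled on the `ratelimit:${userId}` key in the Node example below;
- the rule that per-IP applies only to unauthenticated traffic.

Using an API key ID rather than the raw key keeps secrets out of Redis key names.

```typescript
// Hypothetical identity, filled in only after the request is authenticated.
interface Identity {
  userId?: string   // verified user, if any
  apiKeyId?: string // ID of a verified API key (not the secret itself)
  ip: string        // source address of the request
}

// Returns one bucket key per layer that applies to this request.
function bucketKeys(id: Identity): string[] {
  const keys: string[] = []
  if (id.userId) keys.push(`ratelimit:user:${id.userId}`)
  if (id.apiKeyId) keys.push(`ratelimit:key:${id.apiKeyId}`)
  // Per-IP only for unauthenticated traffic, so shared NAT exits
  // don't throttle logged-in users.
  if (!id.userId && !id.apiKeyId) keys.push(`ratelimit:ip:${id.ip}`)
  keys.push('ratelimit:global') // last line of defense, shared by everyone
  return keys
}
```

Each key would then be checked against its own bucket; the request proceeds only if every bucket allows it.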
Implementing Token Bucket in Redis
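Token bucket has to read, refill and write its state in one step. The standard pattern is therefore an atomic Lua script; INCR/EXPIRE combinations are enough only for simpler fixed-window counters. A token bucket script: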
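-- Token bucket, evaluated atomically by Redis (EVAL / EVALSHA).
-- KEYS[1] = bucket key
-- ARGV[1] = max tokens (bucket capacity)
-- ARGV[2] = refill rate (tokens per ms)
-- ARGV[3] = cost of this request
-- ARGV[4] = current time (ms), supplied by the caller, so app server clocks should be in sync
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local cost = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity -- a new bucket starts full
local last = tonumber(bucket[2]) or now

-- Refill for the time elapsed since the last check, capped at capacity
local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + elapsed * rate)

local allowed = tokens >= cost
if allowed then
  tokens = tokens - cost
end

-- HSET with multiple field/value pairs replaces the deprecated HMSET (Redis 4.0+)
redis.call('HSET', KEYS[1], 'tokens', tokens, 'last_refill', now)
-- Idle buckets expire; a missing key then reads as "full"
-- (assuming a full refill takes less than an hour)
redis.call('EXPIRE', KEYS[1], 3600)
return { allowed and 1 or 0, math.floor(tokens) }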
The Lua script runs atomically inside Redis, which prevents race conditions where two concurrent requests both think the bucket has enough tokens.
In Node.js with the ioredis client:
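import Redis from 'ioredis'
const redis = new Redis()
// Paste the Lua script above here (or load it from a file)
const script = `<the Lua script above>`
// Returns true if the user's bucket had enough tokens for this request
async function checkRateLimit(userId: string): Promise<boolean> {
  const key = `ratelimit:${userId}`
  const maxTokens = 100
  const refillPerMs = 100 / 1000 // 100 tokens per second
  const cost = 1
  const now = Date.now()
  // EVAL arguments: script, number of keys (1), KEYS[1], then ARGV[1..4]
  const result = await redis.eval(
    script, 1, key,
    maxTokens, refillPerMs, cost, now
  ) as [number, number] // [allowed (1 or 0), remaining tokens]
  return result[0] === 1
}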
Battle-tested libraries (@upstash/ratelimit, rate-limit-redis, Python’s slowapi, Laravel’s RateLimiter) implement these patterns correctly. Use a library; don’t roll it yourself unless you have a specific reason.
Combining Rate Limits with Caching
If you’re caching IP lookups (and you should be; see caching strategies), far fewer calls reach the upstream API than arrive at your service. A 95% cache hit rate means only 5% of requests actually hit the API.
The implication: don’t set the limit in front of your cache too aggressively if your hit rate is good. At a 95% hit rate, a 100 req/s limit on incoming requests translates to roughly 5 req/s at the upstream API, typically well below a provider’s quota.
Conversely, if you’re rate-limiting the uncached path (calls to the upstream provider), you can be much stricter — the cache absorbs the bulk of the traffic, and the rate limit protects you from cache-miss storms.
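A sketch of putting the limiter on the uncached path only. It assumes a synchronous cache and upstream for brevity; a real client would be async. `lookupGeo`, `Deps` and `allowUpstreamCall` are hypothetical names, and `allowUpstreamCall` could be any limiter check, such as a token bucket’s.

```typescript
type Geo = { country: string }

// Hypothetical dependencies: a cache, an upstream lookup, and a limiter
// that guards only the upstream (cache-miss) path.
interface Deps {
  cache: Map<string, Geo>
  upstream: (ip: string) => Geo
  allowUpstreamCall: () => boolean
}

// Cache hits never touch the limiter; only misses spend upstream quota.
// Returns null when a miss is rate limited.
function lookupGeo(ip: string, deps: Deps): Geo | null {
  const cached = deps.cache.get(ip)
  if (cached !== undefined) return cached
  if (!deps.allowUpstreamCall()) return null
  const geo = deps.upstream(ip)
  deps.cache.set(ip, geo)
  return geo
}
```

Because hits skip the limiter, the limit only constrains the traffic that actually costs upstream quota. Returning `null` on a limited miss is one choice; the options in the next section apply here too.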
What to Do When the Limit is Hit
Three options:
1. Reject with HTTP 429
Standard and unambiguous: the client gets a clear signal that it should back off.
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1729000000
Set Retry-After so the client knows when to retry. Set the X-RateLimit-* headers so clients with proper SDKs back off automatically.
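A sketch of building those headers, assuming the limiter can tell you the limit and the Unix time (in seconds) at which the client’s quota resets. The function name is hypothetical. Rounding `Retry-After` up and flooring it at 1 second is a design choice so that clients never see 0.

```typescript
// Builds the 429 headers shown above. resetAtSec is the Unix time
// (in seconds) at which the client's quota resets.
function tooManyRequestsHeaders(limit: number, resetAtSec: number, nowMs: number = Date.now()): Record<string, string> {
  // Retry-After in whole seconds, rounded up and never less than 1
  const retryAfterSec = Math.max(1, Math.ceil(resetAtSec - nowMs / 1000))
  return {
    'Retry-After': String(retryAfterSec),
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': '0',
    'X-RateLimit-Reset': String(resetAtSec),
  }
}
```

With a reset 60 seconds away, this produces exactly the example response above. In Express, for instance, you could pass the object to `res.set()` before `res.status(429).send()`.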
2. Queue the request
Hold the request until tokens are available. Adds latency but doesn’t fail.
Pro: Best UX for legitimate traffic that’s just slightly above the limit. Con: Can hide problems; queues can grow unbounded if you’re not careful.
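One way to queue without letting the queue grow unbounded, sketched for a single process:

- let the token count go negative, where each unit of debt is a request waiting for a refill;
- tell the caller how long to wait;
- reject once that wait would exceed a cap.

`ReservingBucket` and `maxWaitMs` are illustrative names.

```typescript
// Token bucket that lets callers reserve a future token instead of being
// rejected outright. A negative token count acts as the queue: each unit
// of debt is one request waiting for a refill.
class ReservingBucket {
  private tokens: number
  private lastMs: number
  private readonly capacity: number
  private readonly ratePerSecond: number
  private readonly now: () => number

  constructor(capacity: number, ratePerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity
    this.ratePerSecond = ratePerSecond
    this.now = now
    this.tokens = capacity
    this.lastMs = now()
  }

  // Reserves one token. Returns how many ms the caller should wait before
  // proceeding, or null if that wait would exceed maxWaitMs.
  reserve(maxWaitMs: number): number | null {
    const t = this.now()
    const elapsedMs = Math.max(0, t - this.lastMs)
    this.tokens = Math.min(this.capacity, this.tokens + (elapsedMs / 1000) * this.ratePerSecond)
    this.lastMs = t
    const after = this.tokens - 1
    const waitMs = after >= 0 ? 0 : (-after * 1000) / this.ratePerSecond
    if (waitMs > maxWaitMs) return null // bounded: reject rather than queue forever
    this.tokens = after
    return waitMs
  }
}
```

The `maxWaitMs` cap is what keeps queueing from hiding problems. Once the backlog implies a longer wait than you’re willing to add, requests are rejected instead.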
3. Degrade gracefully
Skip the geo lookup for this request and return a response without geo data. The application functions, just with less context.
Pro: Application keeps working even under pressure. Con: Some features may behave differently for users hitting limits.
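For option 3, the pattern is to treat the rate-limit error as a normal outcome of the enrichment step. In this synchronous sketch, `RateLimitedError` is a hypothetical error that a geo client might throw on HTTP 429. `lookupCountry` stands in for whatever lookup you call.

```typescript
// Hypothetical error a geo client might throw when the API returns 429.
class RateLimitedError extends Error {
  constructor(message = 'rate limited') {
    super(message)
    this.name = 'RateLimitedError'
    Object.setPrototypeOf(this, RateLimitedError.prototype) // keeps instanceof working on ES5 targets
  }
}

interface LookupRecord {
  ip: string
  country?: string // absent when the lookup was skipped
}

// Geo data is an enrichment: on a rate-limit error, return the record
// without it instead of failing the whole request. Other errors propagate.
function enrichWithGeo(ip: string, lookupCountry: (ip: string) => string): LookupRecord {
  try {
    return { ip, country: lookupCountry(ip) }
  } catch (err) {
    if (err instanceof RateLimitedError) return { ip }
    throw err
  }
}
```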
For consumer-side rate limits (protecting your own API spend), option 3 is often the right call — the geo data is an enrichment, not a hard requirement.
For provider-side rate limits (protecting your service from abuse), options 1 and 2 are more common.
Headers and Standards
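The emerging standard for rate-limit headers is the IETF draft “RateLimit header fields for HTTP” (draft-ietf-httpapi-ratelimit-headers). Earlier revisions of the draft use three headers, with RateLimit-Reset given as seconds until the reset. Later revisions consolidate these into RateLimit and RateLimit-Policy fields.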
RateLimit-Limit: 100
RateLimit-Remaining: 42
RateLimit-Reset: 30
Or the older convention:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1729000000
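Either works, but the reset values mean different things. RateLimit-Reset is the number of seconds until the reset, while X-RateLimit-Reset is conventionally a Unix timestamp; some providers differ, so check their docs. Set the headers in your responses; respect them in your code.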
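Because the two conventions encode the reset differently, a client that reads either one needs to normalize them. This sketch makes two assumptions:

- `RateLimit-Reset` carries delta-seconds, as in the draft’s three-header form;
- `X-RateLimit-Reset` carries a Unix timestamp in seconds.

`HeaderSource` is an illustrative interface that fetch’s `Headers` satisfies.

```typescript
interface HeaderSource {
  get(name: string): string | null // e.g. fetch's Headers
}

// Parses a numeric header value; empty or non-numeric values yield null.
function toNumber(value: string | null): number | null {
  if (value === null || value.trim() === '') return null
  const n = Number(value)
  return Number.isFinite(n) ? n : null
}

// Milliseconds until the quota resets, or null if neither header is usable.
function msUntilReset(headers: HeaderSource, nowMs: number = Date.now()): number | null {
  const deltaSec = toNumber(headers.get('ratelimit-reset')) // draft: seconds from now
  if (deltaSec !== null) return Math.max(0, deltaSec * 1000)
  const epochSec = toNumber(headers.get('x-ratelimit-reset')) // convention: Unix seconds
  if (epochSec !== null) return Math.max(0, epochSec * 1000 - nowMs)
  return null
}
```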
When Retry-After is present (typically on a 429 response), respect it:
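// Retry-After is either delay-seconds ("60") or an HTTP-date; fall back to exponential backoff.
function retryDelayMs(retryAfter: string | null, attempt: number): number {
  if (retryAfter !== null && retryAfter.trim() !== '') {
    const seconds = Number(retryAfter)
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000)
    const date = Date.parse(retryAfter)
    if (!Number.isNaN(date)) return Math.max(0, date - Date.now())
  }
  return 1000 * 2 ** attempt
}
async function callWithBackoff(url: string, opts: RequestInit = {}, maxAttempts = 3): Promise<Response> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const resp = await fetch(url, opts)
    if (resp.status !== 429) return resp
    if (attempt === maxAttempts - 1) break // no point sleeping after the last attempt
    const waitMs = retryDelayMs(resp.headers.get('retry-after'), attempt)
    await new Promise(r => setTimeout(r, waitMs))
  }
  throw new Error('rate limited after retries')
}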
Don’t just hammer the API after a 429. The provider gave you a signal; respect it.
Common Pitfalls
1. Per-IP only, no per-user
If you only rate-limit by source IP, you’ll incorrectly throttle large shared NAT exits (corporate offices, universities, mobile carriers — see NAT and CGNAT). Authentication-based limits are fairer.
2. Window boundary problem
Fixed-window limits let users burst at the boundary. If that matters for your workload, use a sliding window or a token bucket instead.
3. Counting before authentication
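Suppose you count requests against an API key or user ID before verifying it. An attacker can then send requests carrying someone else’s key and exhaust that customer’s bucket without ever holding a valid credential. Authenticate first, then count against the verified identity; unauthenticated traffic falls under the per-IP limit.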
4. Race conditions in pure-PHP / no-Lua implementations
Naive read-then-write sequences in non-atomic environments (GET the counter, compare, then SET) can let two concurrent requests both pass when only one should. Use atomic operations: a Lua script, an INCR whose returned value you compare against the limit, or database row locks.
5. Local-only counters in multi-instance services
A counter in process memory on each app server means each instance has its own limit. Use Redis (or another shared store) for cross-instance consistency.
6. Forgetting health checks
Internal health checks (“is the API up?”) shouldn’t count against rate limits. Either exempt them by path or by source.
7. Not communicating the limit
Without clear RateLimit-* headers and a meaningful 429 response, clients can’t back off correctly. They just keep hitting the wall.
Rate Limiting in Common Frameworks
Express (Node.js)
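import express from 'express'
import { rateLimit } from 'express-rate-limit'
import { RedisStore } from 'rate-limit-redis' // named export in rate-limit-redis v4+
import Redis from 'ioredis'
const app = express()
const redis = new Redis()
const limiter = rateLimit({
  store: new RedisStore({
    // Forward raw Redis commands through ioredis so every instance shares the counters
    sendCommand: (command: string, ...args: string[]) => redis.call(command, ...args) as any,
  }),
  windowMs: 60 * 1000, // 1 minute window
  limit: 100, // 100 requests per window per IP (this option was named `max` before v7)
  standardHeaders: true, // send RateLimit-* headers
  legacyHeaders: false, // omit X-RateLimit-* headers
})
app.use('/api', limiter)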
FastAPI / Starlette (Python)
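from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
# Counts live in process memory by default; pass storage_uri (e.g. "redis://localhost:6379")
# so that all instances share them.
limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
@app.get('/api/lookup/{ip}')
@limiter.limit('100/minute')  # the route decorator must sit above the limiter decorator
async def lookup(request: Request, ip: str):  # slowapi needs the `request` argument
    ...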
Laravel (PHP)
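// In app/Providers/RouteServiceProvider.php (Laravel 10 and earlier)
// or AppServiceProvider::boot() (Laravel 11+)
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;
RateLimiter::for('api', function (Request $request) {
    // Key by authenticated user where possible, falling back to the client IP
    return Limit::perMinute(100)->by($request->user()?->id ?: $request->ip());
});
// In your routes file
Route::middleware('throttle:api')->group(function () {
    Route::get('/lookup/{ip}', [LookupController::class, 'show']);
});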
Combining Limits
For a real-world API, you often want multiple limits stacked:
- Per-API-key: 10,000 req/hour (subscription tier)
- Per-IP (unauthenticated): 60 req/minute
- Per-endpoint: 1,000 req/minute on /bulk
- Global: 100,000 req/second across all users
Each limit is checked independently, and a request must pass all of them to proceed. If any fails, return 429 with headers describing the layer that triggered the limit.
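A sketch of stacking the four limits above. It assumes in-memory state and a caller that has already chosen which layers apply: API key or IP, plus the /bulk layer only on that endpoint. The quotas are converted to token-bucket parameters, and setting capacity to the full quota is an assumption. The limiter checks every layer before consuming from any, so a request rejected by one layer doesn’t drain the others. The returned layer name is what you would use to pick the 429 headers. In production this state would live in Redis, and the check-then-consume step would need to be atomic (e.g. one Lua script covering all the keys).

```typescript
interface LayerLimit {
  name: string
  capacity: number        // burst size
  refillPerSecond: number // sustained rate
}

// The example quotas above, expressed as token buckets.
const LAYERS: LayerLimit[] = [
  { name: 'api-key', capacity: 10_000, refillPerSecond: 10_000 / 3600 }, // 10,000/hour
  { name: 'ip', capacity: 60, refillPerSecond: 60 / 60 },                 // 60/minute
  { name: 'bulk-endpoint', capacity: 1_000, refillPerSecond: 1_000 / 60 }, // 1,000/minute
  { name: 'global', capacity: 100_000, refillPerSecond: 100_000 },         // 100,000/second
]

class LayeredLimiter {
  private state = new Map<string, { tokens: number; lastMs: number }>()
  private readonly now: () => number

  constructor(now: () => number = Date.now) {
    this.now = now
  }

  // Returns the name of the first layer that rejects, or null if all allow.
  // Tokens are consumed only when every layer allows the request.
  check(checks: Array<{ layer: LayerLimit; key: string }>): string | null {
    const t = this.now()
    const buckets: Array<{ tokens: number; lastMs: number }> = []
    for (const { layer, key } of checks) {
      const id = `${layer.name}:${key}`
      const b = this.state.get(id) ?? { tokens: layer.capacity, lastMs: t }
      const elapsedMs = Math.max(0, t - b.lastMs)
      b.tokens = Math.min(layer.capacity, b.tokens + (elapsedMs / 1000) * layer.refillPerSecond)
      b.lastMs = t
      this.state.set(id, b)
      if (b.tokens < 1) return layer.name // reject without consuming anywhere
      buckets.push(b)
    }
    for (const b of buckets) b.tokens -= 1 // every layer passed: consume from each
    return null
  }
}
```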
Monitoring Rate Limits
Three metrics:
Rejection rate
% of requests rejected due to rate limits. Some non-zero rate is fine (abuse, misconfigured clients). A spike is a signal: either a legitimate traffic surge or someone’s runaway loop.
Queue wait time (queue depth)
If you’re queueing rather than rejecting, how long are requests waiting? If this grows unbounded, your incoming rate exceeds your processing rate.
Per-user / per-tenant rate limit hits
Who’s hitting their limits? If one customer is consistently rate-limited, they need a higher tier or you need to investigate what they’re doing.
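A minimal sketch of the counters behind the first and third metrics, kept in process memory for illustration. In practice you would export them to whatever metrics system you already use. `RateLimitMetrics` and its methods are hypothetical names.

```typescript
// Counts every limiter decision, plus rejections per tenant.
class RateLimitMetrics {
  private total = 0
  private rejected = 0
  private rejectionsByTenant = new Map<string, number>()

  record(tenant: string, allowed: boolean): void {
    this.total++
    if (!allowed) {
      this.rejected++
      this.rejectionsByTenant.set(tenant, (this.rejectionsByTenant.get(tenant) ?? 0) + 1)
    }
  }

  // Fraction of all requests that were rejected (0 when nothing recorded).
  rejectionRate(): number {
    return this.total === 0 ? 0 : this.rejected / this.total
  }

  // Tenants sorted by rejection count, highest first.
  topRejectedTenants(n: number): Array<[string, number]> {
    return Array.from(this.rejectionsByTenant.entries())
      .sort((a, b) => b[1] - a[1])
      .slice(0, n)
  }
}
```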
TL;DR
- Use token bucket with Redis as your default. Use a library; don’t roll your own.
- Layer limits: per-user, per-API-key, per-IP, global. Each catches different abuse patterns.
- Authenticate before counting against a user or API key. Charging a bucket for an identity you haven’t verified is exploitable.
- Cache + rate limit work together. A high cache hit rate means low actual API pressure.
- Set proper Retry-After and RateLimit-* headers so well-behaved clients can back off.
- Respect headers on the consumer side. Don’t hammer past 429 responses.
- Monitor rejection rates and per-tenant hits — they tell you what’s happening at the edges.
For deeper performance discussion, see caching strategies. For specific framework implementations, the Node.js, Python, and PHP guides include rate-limiting hooks alongside the geo enrichment code.
The right rate limit is the one your real users never notice while still protecting your service from runaway scripts and abuse. Tune from there.