API Rate Limiting: protecting your backend from overload with Redis

Your API is running fine at 100 requests/second. Then one day, a single client sends 10,000 requests in 5 seconds. Maybe it's a bug, a crawler, or simply someone running a script and forgetting to add a delay. The database connection pool runs dry, response times shoot up, and every other user is affected.

That's when you need rate limiting.

In this post I share how I implement rate limiting for APIs in production. This isn't abstract theory: these are patterns that have actually run in production, plus the mistakes I've made along the way.

What is rate limiting and why do you need it?

Rate limiting simply caps the number of requests a client may send within a period of time. For example: at most 100 requests/minute per user.

The goal isn't only to block attackers. It also helps with:

Protecting resources: no single client can hog the database, CPU, or memory
Fairness: every user gets a reasonable share of capacity
Lower costs: fewer wasted requests = a smaller cloud bill
Stability: the system absorbs traffic spikes better

If your API is public or serves many different clients, rate limiting isn't a “nice-to-have”; it's mandatory.

3 common algorithms

1. Fixed Window Counter

The simplest option: divide time into fixed windows (e.g. one per minute) and count the requests in each window.

Window 1 (10:00-10:01): 87 requests ✅
Window 2 (10:01-10:02): 103 requests ❌ (limit 100)

Pros: easy to implement, little memory.

Cons: bursts at the window boundary. A client can send 99 requests at the end of window 1 plus 100 requests at the start of window 2 = 199 requests within 2 seconds, nearly double the limit.
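To make the boundary problem concrete, here is a minimal in-memory sketch of a fixed window counter. It is single-process and for illustration only; `FixedWindowCounter` and its methods are hypothetical names, and the clock is passed in as `now` so the example is deterministic.

```python
from collections import defaultdict


class FixedWindowCounter:
    """In-memory fixed window counter (single process, for illustration only)."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # (client key, window index) -> requests counted in that window.
        # Old windows are never evicted here; in Redis a TTL would do that.
        self.counts = defaultdict(int)

    def is_allowed(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)  # index of the fixed window containing `now`
        if self.counts[(key, window)] >= self.limit:
            return False
        self.counts[(key, window)] += 1
        return True


limiter = FixedWindowCounter(limit=100, window_seconds=60)
# 100 requests half a second before the minute boundary...
before = sum(limiter.is_allowed("client-a", now=59.5) for _ in range(100))
# ...and 100 more half a second after it
after = sum(limiter.is_allowed("client-a", now=60.5) for _ in range(100))
print(before + after)  # 200
```

All 200 requests land within about one second, yet each window individually stays at its limit of 100.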

2. Sliding Window Log

Record a timestamp for every request and count the requests in the last N seconds.

Requests at: [10:00:45, 10:00:47, 10:00:52, 10:01:03, ...]
Sliding 60s window ending now: count every timestamp inside that range

Pros: accurate, no burst at window boundaries.

Cons: memory-hungry, because every timestamp must be stored. With 10K users × 100 requests = 1M timestamps.
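Here is the same boundary scenario against a sliding window log, again as a minimal in-memory sketch with a hypothetical class name (`SlidingWindowLog`). Because every accepted timestamp is kept, the second batch right after the minute boundary is rejected; that accuracy is what you pay the memory for.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding window log: one timestamp stored per accepted request."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.logs = {}  # client key -> deque of accepted timestamps, oldest first

    def is_allowed(self, key: str, now: float) -> bool:
        log = self.logs.setdefault(key, deque())
        # Drop timestamps that are no longer inside the last `window_seconds`
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)  # memory grows with the number of requests in the window
        return True


limiter = SlidingWindowLog(limit=100, window_seconds=60)
before = sum(limiter.is_allowed("client-a", now=59.5) for _ in range(100))
after = sum(limiter.is_allowed("client-a", now=60.5) for _ in range(100))
print(before, after)  # 100 0
```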

3. Sliding Window Counter (recommended)

This combines the strengths of both: use the counter of the current window plus a weighted share of the previous window.

Previous window (10:00-10:01): 80 requests
Current window (10:01-10:02): 30 requests
At second 15 of the current window → previous window weight = 1 - 15/60 = 75%

Estimated count = 30 + (80 × 0.75) = 90 → allowed ✅

Pros: nearly as accurate as the sliding log (it assumes the previous window's requests were spread evenly) but needs only 2 counters, which saves a lot of memory.

This is the algorithm I use most in production.
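The estimate is a single line of arithmetic. A small helper (the name `sliding_window_estimate` is only for illustration) reproduces the worked example, using the same weighting as the Redis code in the next section.

```python
def sliding_window_estimate(current_count: int, previous_count: int,
                            elapsed: float, window_seconds: float) -> float:
    """Estimate how many requests fall in the last `window_seconds`.

    Assumes the previous window's requests were spread evenly, so the share of
    them still inside the sliding window is the part of that window not yet passed.
    """
    weight_previous = 1 - elapsed / window_seconds
    return current_count + previous_count * weight_previous


# The example above: 30 requests now, 80 in the previous window, 15s into a 60s window
estimate = sliding_window_estimate(30, 80, elapsed=15, window_seconds=60)
print(estimate)  # 90.0
```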

Implementation with Redis + FastAPI

Redis is a natural choice for rate limiting because it is fast (in-memory), offers atomic operations, and supports TTLs that automatically delete expired keys.

Sliding Window Counter with Redis

import time
import redis.asyncio as redis

class RateLimiter:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client

    async def is_allowed(
        self, key: str, limit: int, window_seconds: int
    ) -> tuple[bool, dict]:
        now = time.time()
        current_window = int(now // window_seconds)
        previous_window = current_window - 1
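        elapsed = now % window_seconds  # seconds already spent in the current window
        weight_previous = 1 - (elapsed / window_seconds)

        current_key = f"rl:{key}:{current_window}"
        previous_key = f"rl:{key}:{previous_window}"

        # Read both counters in a single round trip
        pipe = self.redis.pipeline()
        pipe.get(current_key)
        pipe.get(previous_key)
        results = await pipe.execute()

        current_count = int(results[0] or 0)
        previous_count = int(results[1] or 0)

        estimated = current_count + (previous_count * weight_previous)

        if estimated >= limit:
            return False, {
                "limit": limit,
                "remaining": 0,
                # Approximation: seconds until the current window ends
                "retry_after": window_seconds - int(elapsed),
                "estimated": estimated,  # handy for logging rejections
            }

        # Increment the counter and set a TTL (2 windows, so it survives as the
        # "previous" window). The GET above and this INCR are separate round trips,
        # so under heavy concurrency a few extra requests can slip through; a Lua
        # script (EVAL) can make check + increment atomic.
        pipe = self.redis.pipeline()
        pipe.incr(current_key)
        pipe.expire(current_key, window_seconds * 2)
        await pipe.execute()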

        return True, {
            "limit": limit,
            "remaining": max(0, int(limit - estimated - 1)),
            "retry_after": 0,
        }

The code above uses 2 Redis keys per client: one for the current window and one for the previous window. TTLs delete old keys automatically, so no manual cleanup is needed.

FastAPI Middleware

from fastapi import Request, Response
from starlette.middleware.base import BaseHTTPMiddleware

class RateLimitMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, redis_client: redis.Redis):
        super().__init__(app)
        self.limiter = RateLimiter(redis_client)

    async def dispatch(self, request: Request, call_next):
        # Identify the client: prefer user ID, fall back to IP.
        # request.state.user_id is only set if your auth code runs before this middleware.
        client_id = getattr(request.state, "user_id", None)
        if not client_id:
            client_id = request.client.host if request.client else "unknown"

        allowed, info = await self.limiter.is_allowed(
            key=client_id,
            limit=100,          # 100 requests
            window_seconds=60,  # per minute
        )

        if not allowed:
            return Response(
                content='{"detail": "Too many requests"}',
                status_code=429,
                headers={
                    "Retry-After": str(info["retry_after"]),
                    "X-RateLimit-Limit": str(info["limit"]),
                    "X-RateLimit-Remaining": "0",
                },
                media_type="application/json",
            )

        response = await call_next(request)
        response.headers["X-RateLimit-Limit"] = str(info["limit"])
        response.headers["X-RateLimit-Remaining"] = str(info["remaining"])
        return response

A few important points:

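Return a Retry-After header on rejection, so the client knows how long to wait before retrying
Use X-RateLimit-* headers, so the client knows how much quota it has left (a widespread convention rather than a formal standard)
Status 429 Too Many Requests: defined by the HTTP spec (RFC 6585) and recognized by virtually every client library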
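Retry-After only helps if the client honors it. As a client-side sketch (the helper name `retry_after_seconds` is hypothetical), this converts the header into a wait time. HTTP allows the value to be either a number of seconds or an HTTP-date, so the sketch handles both, even though the server code above always sends seconds.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value into seconds to wait.

    HTTP allows either delay-seconds ("30") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns None if missing or unparseable.
    """
    if not value:
        return None
    value = value.strip()
    if value.isdecimal():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ones ValueError
        return None
    if when.tzinfo is None:  # HTTP-dates are always in GMT
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds("30"))  # 30.0
```

A client would then wait that long (ideally plus a little random jitter) before retrying the rejected request.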

Wiring it into FastAPI

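from contextlib import asynccontextmanager
from fastapi import FastAPI
import redis.asyncio as redis

# Create the client at import time: add_middleware() below runs before the lifespan
# starts, so app.state.redis would not exist yet. from_url() does not open a
# connection until the first command is sent.
redis_client = redis.from_url("redis://localhost:6379/0")

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.redis = redis_client  # also used by the @rate_limit decorator
    yield
    await redis_client.close()

app = FastAPI(lifespan=lifespan)
app.add_middleware(RateLimitMiddleware, redis_client=redis_client)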

Tiered rate limits

In production, I usually don't rely on a single limit. Instead, I split it into several tiers:

Tier	Limit	Purpose
Global per IP	1000 req/min	Block DDoS-style floods, bots
Per user	200 req/min	Fairness between users
Per endpoint	20 req/min	Protect heavy endpoints
Per action	5 req/min	Login, OTP, payment

Heavy endpoints (report generation, CSV export, AI inference) need their own limit because a single request can consume more resources than 100 ordinary ones.

# Decorator for endpoint-level limiting
from functools import wraps
from fastapi import HTTPException, Request

def rate_limit(limit: int, window: int = 60):
    def decorator(func):
        @wraps(func)
        async def wrapper(request: Request, *args, **kwargs):
            # Per user and path; fall back to the client IP for anonymous requests
            user = getattr(request.state, "user_id", None) or (
                request.client.host if request.client else "unknown"
            )
            key = f"{user}:{request.url.path}"
            limiter = RateLimiter(request.app.state.redis)
            allowed, info = await limiter.is_allowed(key, limit, window)
            if not allowed:
                raise HTTPException(
                    status_code=429,
                    detail="Too many requests",
                    headers={"Retry-After": str(info["retry_after"])},
                )
            return await func(request, *args, **kwargs)
        return wrapper
    return decorator

# Usage: the endpoint must take `request: Request` as its first parameter
@app.post("/api/reports/export")
@rate_limit(limit=5, window=300)  # 5 calls / 5 minutes
async def export_report(request: Request):
    ...
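The tier table implies that a single request may be checked against several limits at once. Here is a rough in-memory sketch of how the tiers could be combined, assuming one sliding window counter per tier. `Tier`, `check_tiers` and the tier names are hypothetical, and the limits come from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Tier:
    """One rate-limit tier using the sliding window counter, kept in memory."""

    name: str
    limit: int
    window_seconds: int
    # (client key, window index) -> count; stands in for the Redis counters
    counts: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def estimate(self, key: str, now: float) -> float:
        window = int(now // self.window_seconds)
        elapsed = now % self.window_seconds
        current = self.counts.get((key, window), 0)
        previous = self.counts.get((key, window - 1), 0)
        return current + previous * (1 - elapsed / self.window_seconds)

    def record(self, key: str, now: float) -> None:
        window = int(now // self.window_seconds)
        self.counts[(key, window)] = self.counts.get((key, window), 0) + 1


def check_tiers(checks: List[Tuple[Tier, str]], now: float) -> Optional[str]:
    """Return the name of the first tier that rejects, or None if every tier allows.

    All tiers are checked before any counter is incremented, so a request
    rejected by a strict tier does not use up quota in the looser ones.
    """
    for tier, key in checks:
        if tier.estimate(key, now) >= tier.limit:
            return tier.name
    for tier, key in checks:
        tier.record(key, now)
    return None


per_ip = Tier("global-ip", limit=1000, window_seconds=60)
per_user = Tier("user", limit=200, window_seconds=60)
login = Tier("action-login", limit=5, window_seconds=60)


def login_checks(ip: str, user_id: str) -> List[Tuple[Tier, str]]:
    return [(per_ip, ip), (per_user, user_id), (login, f"{user_id}:login")]


results = [check_tiers(login_checks("203.0.113.7", "u42"), now=10.0) for _ in range(6)]
print(results)  # [None, None, None, None, None, 'action-login']
```

Checking every tier before incrementing any of them is a deliberate choice: otherwise a user who keeps hitting the login limit would also burn through their per-user quota. With Redis, this multi-key check-then-increment would have to run atomically (for example in a Lua script) to keep that guarantee under concurrency.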

5 lessons from production

1. Don't rate limit health checks

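Load balancers call /health constantly. If those calls get rate limited, the load balancer (or orchestrator) concludes the service is down, pulls it out of rotation or restarts the container, and you get downtime for no reason.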

# Skip rate limiting for internal paths
if request.url.path in ["/health", "/ready", "/metrics"]:
    return await call_next(request)

2. Be careful with IP-based limiting behind a reverse proxy

If the app sits behind Nginx or an AWS ALB, request.client.host will be the proxy's IP, not the real client's. Every user then shares the same limit.

# Get the real client IP from the header
client_ip = request.headers.get("X-Forwarded-For", "").split(",")[0].strip()
if not client_ip:
    client_ip = request.client.host

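But be careful here too: X-Forwarded-For can be spoofed if the proxy doesn't strip the incoming header. Make sure your reverse proxy overwrites the header rather than appending to it; if it appends, the first entry is whatever value the client sent.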
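If your proxy does append (AWS ALB does by default), one way to avoid trusting client-supplied entries is to count from the right: take the entry added by your outermost proxy. This sketch assumes you know exactly how many reverse proxies sit in front of the app and that each one appends; the function name and the `trusted_hops` parameter are hypothetical, and it is not a replacement for your server's own proxy-header settings.

```python
import ipaddress
from typing import Optional


def client_ip_from_xff(xff: Optional[str], peer_ip: str, trusted_hops: int = 1) -> str:
    """Pick the client IP from X-Forwarded-For, trusting only our own proxies.

    Assumes exactly `trusted_hops` reverse proxies in front of the app, each
    appending the address it received the connection from. Entries further
    left can be written by the client and are ignored.
    """
    if not xff or trusted_hops < 1:
        return peer_ip
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    if len(hops) < trusted_hops:
        return peer_ip  # header shorter than our proxy chain: don't trust it
    candidate = hops[-trusted_hops]
    try:
        ipaddress.ip_address(candidate)  # reject anything that isn't an IP literal
    except ValueError:
        return peer_ip
    return candidate


# The client prepended a fake "6.6.6.6"; the ALB appended the real address
print(client_ip_from_xff("6.6.6.6, 203.0.113.7", peer_ip="10.0.0.5"))  # 203.0.113.7
```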

3. Handling Redis outages

If Redis goes down, the rate limiter goes down with it. There are two strategies:

Fail open: let requests through when Redis is unavailable, prioritizing availability
Fail closed: reject everything, prioritizing protection

I choose fail open for most APIs and fail closed for sensitive endpoints (login, payment):

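try:
    allowed, info = await self.limiter.is_allowed(key, limit, window)
except (redis.ConnectionError, redis.TimeoutError):
    # is_sensitive_endpoint() is your own helper, e.g. a set of paths like /login, /payment
    if is_sensitive_endpoint(request.url.path):
        return JSONResponse(  # from fastapi.responses import JSONResponse
            status_code=503,
            content={"detail": "Service temporarily unavailable"},
        )  # fail closed
    return await call_next(request)  # fail open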
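The fallback policy can also cover a Redis that is reachable but hanging, which is often worse than a clean connection error because every request waits on it. A minimal sketch, assuming an `asyncio.wait_for` timeout around the check. `LimiterUnavailable` is a hypothetical stand-in for `redis.ConnectionError`/`redis.TimeoutError` so the example runs without a Redis server, and the 50 ms default timeout is an arbitrary value.

```python
import asyncio
from typing import Awaitable, Callable


class LimiterUnavailable(Exception):
    """Hypothetical stand-in for redis.ConnectionError / redis.TimeoutError."""


async def allow_request(
    check: Callable[[], Awaitable[bool]],
    fail_closed: bool,
    timeout: float = 0.05,
) -> bool:
    """Run the rate-limit check; if the backend is down or too slow, apply the fallback policy."""
    try:
        # Bound how much latency the limiter may add to each request
        return await asyncio.wait_for(check(), timeout)
    except (LimiterUnavailable, asyncio.TimeoutError):
        return not fail_closed  # fail open -> True, fail closed -> False


async def redis_down() -> bool:
    raise LimiterUnavailable("connection refused")


async def redis_hanging() -> bool:
    await asyncio.sleep(1)
    return True


async def main() -> list:
    return [
        await allow_request(redis_down, fail_closed=False),  # ordinary endpoint
        await allow_request(redis_down, fail_closed=True),   # login / payment
        await allow_request(redis_hanging, fail_closed=False, timeout=0.01),
    ]


print(asyncio.run(main()))  # [True, False, True]
```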

4. Log rejections; don't reject silently

Every time you reject a request, log the client ID, the endpoint, and the current count. That helps you tell whether it's a normal user hitting the limit or a bot attacking you.

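if not allowed:
    logger.warning(
        "Rate limit exceeded",
        extra={
            "client": client_id,
            "path": request.url.path,
            # `estimated` is local to is_allowed(); read it from `info` if the limiter returns it
            "count": info.get("estimated"),
        },
    )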

5. Allow reasonable bursts

A limit of 100 req/minute doesn't mean a user may only send ~1.67 req/second. Sometimes a user loads a page and the browser fires 15 requests at once; that's normal behavior.

The sliding window counter naturally tolerates reasonable bursts. If you need explicit control over burst size, consider a token bucket, which sets the burst size separately from the average rate. For most APIs, though, a sliding window is enough.
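For comparison, a token bucket has two separate knobs: `capacity` is the largest burst accepted at once, and `refill_rate` is the sustained rate. A minimal in-memory sketch with hypothetical names and numbers (a capacity of 20 leaves room for the 15-request page load above):

```python
class TokenBucket:
    """In-memory token bucket: `capacity` caps bursts, `refill_rate` caps the average rate."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity        # max tokens = largest burst accepted at once
        self.refill_rate = refill_rate  # tokens added back per second
        self.tokens = capacity          # start full so a new client can burst immediately
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, never above capacity
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# 100 requests/minute on average, at most 20 back to back
bucket = TokenBucket(capacity=20, refill_rate=100 / 60)
burst = [bucket.allow(now=0.0) for _ in range(25)]
print(burst.count(True))      # 20
print(bucket.allow(now=1.0))  # True: one second refills about 1.67 tokens
```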

Summary

Rate limiting sounds simple, but implementing it correctly takes careful thought:

The sliding window counter strikes the best balance between accuracy and efficiency
Redis is the natural backend: fast, atomic, with TTLs
Split limits into tiers: global, per-user, per-endpoint, per-action
Return the right headers: Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining
Handle edge cases: health checks, proxy IPs, Redis outages

Don't wait until your API is overloaded to think about rate limiting. Implement it early, start with generous limits, then tighten them gradually based on real monitoring data. Prevention is better than cure, especially when the “disease” here can take your whole system down at 2 a.m.

Which project are you implementing rate limiting for, and which algorithm are you using? Share it with me by email!