Nginx
Overview
Nginx (OpenResty) acts as the reverse proxy in front of vLLM. It handles Bearer token authentication, per-client quota, endpoint allowlisting, rate limiting, IP whitelisting, and async request tracing to Langfuse.
    Client → nginx (port 8100) → vLLM (internal, port 8000)
                  ↓ (async log phase)
             Langfuse (port 3010)
All access control logic is in nginx/lua/access.lua, loaded via access_by_lua_file.
Allowlisted Endpoints
Only the following paths are reachable. Everything else returns 404.
| Method | Path | Auth required | Rate limit burst |
|---|---|---|---|
| GET | / | No | — |
| GET | /health | No | — |
| GET | /v1/models | No | 50 |
| POST | /v1/chat/completions | Yes | 200 |
| POST | /v1/completions | Yes | 200 |
| GET | /vllm/* | No | — |
| GET | /api/documentation | No | — |
| GET | /api/langfuse-tracing | No | — |
The /vllm/* prefix is a full pass-through to vLLM (all routes, including /vllm/docs and /vllm/openapi.json). /vllm and /vllm/ redirect to /vllm/docs.
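The allowlist table can be modelled as a small lookup. This is a minimal sketch for illustration, not the logic in nginx/lua/access.lua: the function name `classify` and its return labels are hypothetical, and it assumes that a listed path requested with an unlisted method is treated like any other unlisted route.

```python
# Illustrative model of the allowlist table; names and return labels are hypothetical.
EXACT_ROUTES = {
    ("GET", "/"),
    ("GET", "/health"),
    ("GET", "/v1/models"),
    ("POST", "/v1/chat/completions"),
    ("POST", "/v1/completions"),
    ("GET", "/api/documentation"),
    ("GET", "/api/langfuse-tracing"),
}


def classify(method: str, path: str) -> str:
    """Return 'redirect', 'proxy', or 'not_found' for a request."""
    if method == "GET" and path in ("/vllm", "/vllm/"):
        return "redirect"  # documented redirect to /vllm/docs
    if method == "GET" and path.startswith("/vllm/"):
        return "proxy"  # pass-through prefix
    if (method, path) in EXACT_ROUTES:
        return "proxy"
    return "not_found"  # everything else returns 404
```

For example, `classify("GET", "/metrics")` falls through every rule and is reported as not found, matching the "everything else returns 404" behaviour.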
Authentication
Bearer token authentication is enforced only on POST, PUT, PATCH, and DELETE requests. It is controlled via .env:
NGINX_AUTH_ENABLED=true # enforces token on write methods (default)
NGINX_AUTH_ENABLED=false # disables client auth
Clients must send:
Authorization: Bearer <VLLM_API_KEY>
See docs/security.md for full details.
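The rule "token required on write methods only" can be sketched as follows. This is an assumption-laden illustration rather than the project's Lua code: `is_authorized` and its parameters are hypothetical names, the scheme is matched case-sensitively, and the constant-time comparison is a general good practice, not something the source confirms.

```python
import hmac

# Methods on which the token is enforced, per the description above.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def is_authorized(method: str, headers: dict, api_key: str,
                  auth_enabled: bool = True) -> bool:
    """Hypothetical check mirroring NGINX_AUTH_ENABLED and the Bearer header."""
    if not auth_enabled or method not in WRITE_METHODS:
        return True  # read-only methods are never challenged
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not token:
        return False
    # Constant-time comparison avoids leaking the key through timing differences.
    return hmac.compare_digest(token.encode(), api_key.encode())
```

A GET request passes regardless of headers, while a POST needs an `Authorization` header whose token equals the configured key.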
Per-client Quota
Quota is enforced in Lua via lua-resty-limit-traffic, keyed on the X-Client-Name header (falling back to the client IP). It applies to the POST inference endpoints only.
NGINX_QUOTA_ENABLED=true # enables quota (default: true)
NGINX_QUOTA_RATE=10 # requests per second per client
NGINX_QUOTA_BURST=50 # burst size before 429
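To see how a rate and a burst interact, here is a simplified leaky-bucket model in the style of the request limiter in lua-resty-limit-traffic. It is a sketch under assumptions: the source does not say which limiter class is used, the class and function names below are hypothetical, and any delaying of requests that are within the burst is ignored.

```python
class LeakyBucket:
    """Simplified per-key leaky bucket; rate in requests/second, burst in requests."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate      # corresponds to NGINX_QUOTA_RATE
        self.burst = burst    # corresponds to NGINX_QUOTA_BURST
        self._state = {}      # key -> (excess, last_seen)

    def allow(self, key: str, now: float) -> bool:
        if key not in self._state:
            excess = 0.0  # first request from this key starts with no excess
        else:
            prev_excess, last = self._state[key]
            drained = self.rate * (now - last)
            excess = max(prev_excess - drained + 1.0, 0.0)
            if excess > self.burst:
                return False  # over quota: the proxy would answer 429
        self._state[key] = (excess, now)
        return True


def client_key(headers: dict, remote_addr: str) -> str:
    """Key on X-Client-Name, falling back to the client IP."""
    return headers.get("X-Client-Name") or remote_addr
```

With rate 10 and burst 50, a client that sends a large batch at the same instant gets its first requests through until the excess exceeds the burst, after which requests are rejected until enough time has passed for the bucket to drain.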
IP Whitelisting
IP whitelisting is optional and controlled via .env:
NGINX_IP_WHITELIST_ENABLED=true
NGINX_ALLOWED_IPS=192.168.14.4,10.0.0.0/24,127.0.0.1
The list accepts individual IPs and CIDR ranges. Requests from unlisted IPs receive 403. /health always bypasses this check.
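The matching of a client address against a mixed list of single IPs and CIDR ranges can be illustrated with Python's standard `ipaddress` module. This is a sketch of the idea only; the function names are hypothetical and the real check lives in the Lua access layer.

```python
import ipaddress


def parse_allowed(raw: str) -> list:
    """Parse a comma-separated NGINX_ALLOWED_IPS value into network objects."""
    # A bare IP such as 127.0.0.1 becomes a single-host network (/32).
    return [ipaddress.ip_network(item.strip(), strict=False)
            for item in raw.split(",") if item.strip()]


def ip_allowed(client_ip: str, path: str, networks: list,
               enabled: bool = True) -> bool:
    """Return True if the request may proceed; False corresponds to a 403."""
    if not enabled or path == "/health":
        return True  # /health always bypasses the whitelist
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```

Using the example value above, `10.0.0.77` is inside `10.0.0.0/24` and is allowed, while `10.0.1.1` is rejected everywhere except on /health.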
Request/Response Capture
nginx captures request and response bodies (capped at 64 KB) for all endpoints via body_filter_by_lua_block. These are passed to the Langfuse tracing script in the log phase. See docs/tracing.md for details.
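Because bodies arrive in chunks, a capped capture has to accumulate data and stop once the limit is reached. The sketch below shows that pattern in Python. The class name is hypothetical, 64 KB is assumed to mean 65536 bytes, and the `truncated` flag is an illustrative addition rather than something the source describes.

```python
MAX_CAPTURE_BYTES = 64 * 1024  # assumed interpretation of "64 KB"


class BodyCapture:
    """Accumulate streamed chunks, keeping at most `limit` bytes."""

    def __init__(self, limit: int = MAX_CAPTURE_BYTES):
        self.limit = limit
        self.buf = bytearray()
        self.truncated = False

    def feed(self, chunk: bytes) -> None:
        room = self.limit - len(self.buf)
        if len(chunk) > room:
            self.truncated = True  # part of this chunk will be dropped
        self.buf += chunk[:room]   # slice is empty once the cap is reached
```

Each chunk is still forwarded to the client unchanged in the real proxy; only the copy kept for tracing is capped.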
Proxy Timeouts
| Endpoint | Read timeout | Send timeout |
|---|---|---|
| /v1/chat/completions | 600s | 600s |
| /v1/completions | 600s | 600s |
| /vllm/* | 600s | 600s |
Streaming responses use proxy_buffering off and chunked_transfer_encoding on.
Status Page
GET / serves a status page (nginx/html/index.html) listing all endpoints and auth requirements. The API documentation section is conditionally injected based on NGINX_API_DOCS_ENABLED.
Build
The nginx image is built from nginx/Dockerfile using OpenResty Alpine. It includes:
- lua-cjson: JSON encoding/decoding
- lua-resty-http: HTTP client for Lua
- lua-resty-redis: Redis client for Lua
- lua-resty-limit-traffic: per-client quota (bundled with OpenResty)
Lua scripts are copied from nginx/lua/ into /etc/nginx/lua/. Static HTML is copied from nginx/html/ into /etc/nginx/html/.