Nginx

Overview

Nginx (OpenResty) acts as the reverse proxy in front of vLLM. It handles Bearer token authentication, per-client quota, endpoint allowlisting, rate limiting, IP whitelisting, and async request tracing to Langfuse.


Client → nginx (port 8100) → vLLM (internal, port 8000)
                ↓ (async log phase)
           Langfuse (port 3010)

All access control logic is in nginx/lua/access.lua, loaded via access_by_lua_file.


Allowlisted Endpoints

Only the following paths are reachable. Everything else returns 404.

MethodPathAuth requiredRate limit burst
GET/No
GET/healthNo
GET/v1/modelsNo50
POST/v1/chat/completionsYes200
POST/v1/completionsYes200
GET/vllm/*No
GET/api/documentationNo
GET/api/langfuse-tracingNo

The /vllm/* prefix is a full pass-through to vLLM (all routes, including /vllm/docs and /vllm/openapi.json). /vllm and /vllm/ redirect to /vllm/docs.


Authentication

Bearer token auth is enforced on POST/PUT/PATCH/DELETE methods only. Controlled via .env:


NGINX_AUTH_ENABLED=true    # enforces token on write methods (default)
NGINX_AUTH_ENABLED=false   # disables client auth

Clients must send:


Authorization: Bearer <VLLM_API_KEY>

See docs/security.md for full details.


Per-client Quota

Quota is enforced in Lua via lua-resty-limit-traffic, keyed on X-Client-Name header (falls back to IP). Applied to POST inference endpoints only.


NGINX_QUOTA_ENABLED=true   # enables quota (default: true)
NGINX_QUOTA_RATE=10        # requests per second per client
NGINX_QUOTA_BURST=50       # burst size before 429

IP Whitelisting

Optional. Controlled via .env:


NGINX_IP_WHITELIST_ENABLED=true
NGINX_ALLOWED_IPS=192.168.14.4,10.0.0.0/24,127.0.0.1

Supports individual IPs and CIDR ranges. /health always bypasses this check. Returns 403 for unlisted IPs.


Request/Response Capture

nginx captures request and response bodies (capped at 64 KB) for all endpoints via body_filter_by_lua_block. These are passed to the Langfuse tracing script in the log phase. See docs/tracing.md for details.


Proxy Timeouts

EndpointRead timeoutSend timeout
/v1/chat/completions600s600s
/v1/completions600s600s
/vllm/*600s600s

Streaming responses use proxy_buffering off and chunked_transfer_encoding on.


Status Page

GET / serves a status page (nginx/html/index.html) listing all endpoints and auth requirements. The API documentation section is conditionally injected based on NGINX_API_DOCS_ENABLED.


Build

The nginx image is built from nginx/Dockerfile using OpenResty Alpine. It includes:

Lua scripts are copied from nginx/lua/ into /etc/nginx/lua/. Static HTML is copied from nginx/html/ into /etc/nginx/html/.