vLLM API Usage

Base URL: https://llm.kaisens.fr

All POST endpoints require a Bearer token:


Authorization: Bearer <VLLM_API_KEY>
X-Client-Name: <your-app-name>   # recommended — used for per-client quota

Chat Completions

POST /v1/chat/completions


curl https://llm.kaisens.fr/v1/chat/completions \
  -H "Authorization: Bearer <VLLM_API_KEY>" \
  -H "Content-Type: application/json" \
  -H "X-Client-Name: my-app" \
  -d '{
    "model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Once upon a time" }
    ],
    "max_tokens": 20
  }'

Example response:


{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
  "choices": [
    {
      "message": { "role": "assistant", "content": "there was a brave knight who..." },
      "index": 0,
      "finish_reason": "length"
    }
  ]
}
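In Python, the assistant's reply can be pulled out of this response shape with the standard library alone. A minimal sketch; the JSON string below simply mirrors the example response above (in practice it would come from the HTTP response body):

```python
import json

# Example response body (as shown above); in practice this comes
# from the HTTP response.
raw = '''{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
  "choices": [
    {
      "message": { "role": "assistant", "content": "there was a brave knight who..." },
      "index": 0,
      "finish_reason": "length"
    }
  ]
}'''

resp = json.loads(raw)
# The generated text lives at choices[0].message.content.
reply = resp["choices"][0]["message"]["content"]
print(reply)  # there was a brave knight who...
```

A `finish_reason` of "length" means generation stopped because `max_tokens` was reached, not because the model finished its answer.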

Text Completions

POST /v1/completions


curl https://llm.kaisens.fr/v1/completions \
  -H "Authorization: Bearer <VLLM_API_KEY>" \
  -H "Content-Type: application/json" \
  -H "X-Client-Name: my-app" \
  -d '{
    "model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
    "prompt": "Once upon a time",
    "max_tokens": 20
  }'
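The same call can be assembled from Python. A sketch of a helper that builds the URL, headers, and payload (the API key and client name are placeholders you must supply; the helper name is not part of the API):

```python
# Sketch: assemble the pieces of a POST /v1/completions request.
def build_completion_request(api_key: str, client_name: str,
                             prompt: str, max_tokens: int = 20):
    """Return (url, headers, payload) for a text-completion call."""
    url = "https://llm.kaisens.fr/v1/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-Client-Name": client_name,  # recommended: enables per-client quota
    }
    payload = {
        "model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
    return url, headers, payload

url, headers, payload = build_completion_request(
    "<VLLM_API_KEY>", "my-app", "Once upon a time")
# Send with any HTTP client, e.g.:
#   requests.post(url, headers=headers, json=payload)
```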

Models

GET /v1/models — no auth required


curl https://llm.kaisens.fr/v1/models
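The model IDs can be extracted from the listing in Python. This sketch assumes the usual OpenAI-compatible list shape (`{"object": "list", "data": [...]}`), which is an assumption, not something shown above; the sample body is illustrative:

```python
import json

def model_ids(body: str) -> list:
    """Extract model IDs from a /v1/models response body
    (assumed OpenAI-compatible list shape)."""
    return [m["id"] for m in json.loads(body)["data"]]

# Illustrative body in the assumed shape:
sample = ('{"object": "list", "data": '
          '[{"id": "solidrust/Mistral-7B-Instruct-v0.3-AWQ", "object": "model"}]}')
print(model_ids(sample))  # ['solidrust/Mistral-7B-Instruct-v0.3-AWQ']
```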

Common Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| model | string | Model name to use |
| prompt | string | Input text (completions only) |
| messages | array | Conversation history (chat only) |
| max_tokens | int | Maximum tokens to generate |
| temperature | float | Sampling temperature (0.0–2.0, default 1.0) |
| top_p | float | Nucleus sampling threshold (default 1.0) |
| stream | bool | Stream response chunks (default false) |
| stop | string/array | Stop sequence(s) |
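These parameters combine into a single request body. A sketch of a chat payload using several of them at once (all values here are illustrative choices, not recommended defaults):

```python
payload = {
    "model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "max_tokens": 100,
    "temperature": 0.7,   # lower than 1.0 = less random sampling
    "top_p": 0.9,         # nucleus sampling cutoff
    "stop": ["\n\n"],     # stop at the first blank line
    "stream": False,
}
```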

Streaming

Set "stream": true to receive the response as server-sent events (SSE):


curl https://llm.kaisens.fr/v1/chat/completions \
  -H "Authorization: Bearer <VLLM_API_KEY>" \
  -H "Content-Type: application/json" \
  -H "X-Client-Name: my-app" \
  -d '{
    "model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
    "messages": [{ "role": "user", "content": "Tell me a story" }],
    "max_tokens": 100,
    "stream": true
  }'
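Each SSE event arrives as a `data: ` line carrying a JSON chunk, terminated by a final `data: [DONE]` line. A sketch of accumulating the streamed text from those lines (the sample chunks below follow the OpenAI-compatible streaming shape and are illustrative, not captured from this server):

```python
import json

def collect_stream(lines):
    """Accumulate assistant text from chat-completion SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines / comments between events
        data = line[len("data: "):]
        if data.strip() == "[DONE]":
            break  # sentinel: stream is finished
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))  # role-only deltas add nothing
    return "".join(parts)

# Illustrative chunks in the OpenAI-compatible streaming shape:
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"Once"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":" upon"},"index":0}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # Once upon
```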

Error Responses

| Status | Meaning |
|--------|---------|
| 401 | Missing or invalid Bearer token |
| 403 | IP not in whitelist |
| 404 | Endpoint not allowlisted |
| 429 | Per-client quota exceeded (X-Client-Name or IP rate limit) |
| 503 | vLLM backend is down |
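A client can branch on these codes: 429 and 503 are transient (quota or backend), while 401/403/404 indicate a configuration problem that retrying will not fix. A sketch of a retry policy (the backoff values are arbitrary choices, not server requirements):

```python
RETRYABLE = {429, 503}  # quota exceeded, backend down

def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Retry transient failures only; auth/allowlist errors need a config fix."""
    return status in RETRYABLE and attempt < max_attempts

def backoff_seconds(attempt: int) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... (arbitrary base)."""
    return 2.0 ** attempt
```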