vLLM API Usage
Base URL: https://llm.kaisens.fr
All POST endpoints require a Bearer token:
Authorization: Bearer <VLLM_API_KEY>
X-Client-Name: <your-app-name> # recommended — used for per-client quota
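The headers above can be assembled in a small helper. A minimal Python sketch, assuming you supply your own key and client name (`build_headers` is an illustrative name, not part of the API):

```python
# Minimal sketch: build the headers every authenticated request needs.
# <VLLM_API_KEY> and the client name are placeholders you supply.

def build_headers(api_key: str, client_name: str) -> dict:
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-Client-Name": client_name,  # recommended: enables per-client quota
    }

headers = build_headers("<VLLM_API_KEY>", "my-app")
```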
Chat Completions
POST /v1/chat/completions
curl https://llm.kaisens.fr/v1/chat/completions \
-H "Authorization: Bearer <VLLM_API_KEY>" \
-H "Content-Type: application/json" \
-H "X-Client-Name: my-app" \
-d '{
"model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Once upon a time" }
],
"max_tokens": 20
}'
Example response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
"choices": [
{
"message": { "role": "assistant", "content": "there was a brave knight who..." },
"index": 0,
"finish_reason": "length"
}
]
}
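Extracting the assistant's reply from a response shaped like the example above is straightforward. A sketch (the response body is copied from the example, not a live reply):

```python
import json

# Pull the assistant text out of a chat.completion response
# shaped like the example above.
response_body = '''
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
  "choices": [
    {
      "message": {"role": "assistant", "content": "there was a brave knight who..."},
      "index": 0,
      "finish_reason": "length"
    }
  ]
}
'''

data = json.loads(response_body)
choice = data["choices"][0]
text = choice["message"]["content"]
truncated = choice["finish_reason"] == "length"  # generation hit max_tokens
```

A `finish_reason` of `"length"` means the model stopped because `max_tokens` was reached, so the text may be cut off mid-sentence.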
Text Completions
POST /v1/completions
curl https://llm.kaisens.fr/v1/completions \
-H "Authorization: Bearer <VLLM_API_KEY>" \
-H "Content-Type: application/json" \
-H "X-Client-Name: my-app" \
-d '{
"model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
"prompt": "Once upon a time",
"max_tokens": 20
}'
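The request body mirrors the chat endpoint but takes a flat `prompt` string instead of `messages`. A sketch of assembling it in Python (`completion_payload` is an illustrative helper, not part of the API):

```python
import json

# Assemble the /v1/completions request body from the example above.
def completion_payload(model: str, prompt: str, max_tokens: int = 20) -> str:
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    })

body = completion_payload("solidrust/Mistral-7B-Instruct-v0.3-AWQ", "Once upon a time")
```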
Models
GET /v1/models — no auth required
curl https://llm.kaisens.fr/v1/models
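vLLM serves this endpoint in the OpenAI list format (`{"object": "list", "data": [...]}`). A sketch of pulling out the model ids; the sample below is illustrative, not a captured response:

```python
import json

# List model ids from a /v1/models response (OpenAI list shape).
sample = (
    '{"object": "list", "data": '
    '[{"id": "solidrust/Mistral-7B-Instruct-v0.3-AWQ", "object": "model"}]}'
)

model_ids = [m["id"] for m in json.loads(sample)["data"]]
```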
Common Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model name to use |
| prompt | string | Input text (completions only) |
| messages | array | Conversation history (chat only) |
| max_tokens | int | Maximum tokens to generate |
| temperature | float | Sampling temperature (0.0–2.0, default 1.0) |
| top_p | float | Nucleus sampling threshold (default 1.0) |
| stream | bool | Stream response chunks (default false) |
| stop | string/array | Stop sequence(s) |
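Catching out-of-range values client-side saves a round trip. A sketch that checks a payload against the ranges in the table above (`validate_sampling` is an illustrative helper; the server performs its own validation):

```python
# Check sampling parameters against the documented ranges before sending.
def validate_sampling(params: dict) -> list:
    errors = []
    t = params.get("temperature", 1.0)
    if not 0.0 <= t <= 2.0:
        errors.append(f"temperature {t} outside 0.0-2.0")
    p = params.get("top_p", 1.0)
    if not 0.0 < p <= 1.0:
        errors.append(f"top_p {p} outside (0.0, 1.0]")
    mt = params.get("max_tokens")
    if mt is not None and mt < 1:
        errors.append("max_tokens must be >= 1")
    return errors
```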
Streaming
Add "stream": true to get server-sent events:
curl https://llm.kaisens.fr/v1/chat/completions \
-H "Authorization: Bearer <VLLM_API_KEY>" \
-H "Content-Type: application/json" \
-H "X-Client-Name: my-app" \
-d '{
"model": "solidrust/Mistral-7B-Instruct-v0.3-AWQ",
"messages": [{ "role": "user", "content": "Tell me a story" }],
"max_tokens": 100,
"stream": true
}'
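Each SSE event arrives as a `data: <json>` line, and the stream terminates with `data: [DONE]`. A sketch of decoding the content deltas, assuming the OpenAI streaming chunk shape (`choices[0].delta.content`) that vLLM emits; the input lines below are illustrative, not a captured stream:

```python
import json

# Decode server-sent-event lines from a streaming chat response.
def iter_content(sse_lines):
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            return  # end of stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Illustrative input:
lines = [
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    'data: {"choices": [{"delta": {"content": " upon"}}]}',
    "data: [DONE]",
]
story = "".join(iter_content(lines))
```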
Error Responses
| Status | Meaning |
|---|---|
| 401 | Missing or invalid Bearer token |
| 403 | Source IP is not on the allowlist |
| 404 | Endpoint not allowlisted |
| 429 | Per-client quota exceeded (X-Client-Name or IP rate limit) |
| 503 | vLLM backend is down |
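Only some of these codes are worth retrying: 429 and 503 are transient (the quota window resets, the backend comes back), while 401/403/404 signal configuration problems a retry won't fix. A sketch of the retry decision (`should_retry` and `backoff_seconds` are illustrative helpers, not part of the API):

```python
# Map the status codes above to a retry decision with exponential backoff.
RETRYABLE = {429, 503}  # transient; 401/403/404 are config errors

def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    return status in RETRYABLE and attempt < max_attempts

def backoff_seconds(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ..."""
    return base * (2 ** attempt)
```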