How to Run the MeatPocalypse Benchmark

1. The Prompt

The standard benchmark prompt is:

Write a short story about the meatpocalypse

2. How to Measure

Step A — Tokens per Second (non-streaming)

Send the prompt and measure total wall-clock time for the full response:

# Requires GNU date: %N (nanoseconds) is not supported by BSD/macOS date.
START=$(date +%s%N)
# Capture the body so completion_tokens can be read from it afterwards.
RESPONSE=$(curl -s http://localhost:PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_MODEL",
    "messages": [{"role": "user", "content": "Write a short story about the meatpocalypse"}],
    "max_tokens": 256,
    "temperature": 0.7,
    "stream": false
  }')
END=$(date +%s%N)
TOTAL_MS=$(( (END - START) / 1000000 ))
echo "$RESPONSE"
echo "Total: ${TOTAL_MS}ms"

Extract completion_tokens (under usage in OpenAI-compatible responses) from the response JSON and compute:
tokens_per_second = completion_tokens / (total_ms / 1000)
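
The formula above can be sketched in shell as follows. This is a minimal illustration, not part of the benchmark tooling: the function names and the sample response are hypothetical, and the grep/sed extraction assumes completion_tokens appears once as a plain integer (a real JSON parser is more robust if you have one installed).

```shell
# Pull the integer value of "completion_tokens" out of a response body.
# Plain text search, not a JSON parser: assumes the key holds a bare integer.
extract_completion_tokens() {
  printf '%s' "$1" \
    | grep -o '"completion_tokens"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 \
    | sed 's/.*:[[:space:]]*//'
}

# tokens_per_second = completion_tokens / (total_ms / 1000)
# awk is used because shell arithmetic is integer-only.
tokens_per_second() {
  LC_ALL=C awk -v tokens="$1" -v ms="$2" 'BEGIN {
    if (ms <= 0) { print "total_ms must be positive" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", tokens / (ms / 1000)
  }'
}

# Hypothetical sample response, trimmed to the usage object.
RESPONSE='{"usage": {"prompt_tokens": 17, "completion_tokens": 256, "total_tokens": 273}}'
TOKENS=$(extract_completion_tokens "$RESPONSE")
echo "tokens_per_second: $(tokens_per_second "$TOKENS" 2070)"
```

With 256 tokens in 2070 ms this prints 123.67.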

Step B — Time to First Token (streaming)

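Send the same prompt with stream: true and measure the wall-clock time from sending the request to the arrival of the first data: chunk. The end timestamp must be taken as soon as that chunk arrives, not after the stream has finished:

START=$(date +%s%N)
# grep -q exits at the first line starting with "data:"; the timestamp is
# taken immediately afterwards, while the rest of the stream is discarded.
END=$(curl -s -N http://localhost:PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_MODEL",
    "messages": [{"role": "user", "content": "Write a short story about the meatpocalypse"}],
    "max_tokens": 256,
    "temperature": 0.7,
    "stream": true
  }' | { grep -q '^data:'; date +%s%N; })
TTFT_MS=$(( (END - START) / 1000000 ))
echo "TTFT: ${TTFT_MS}ms"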

time_to_first_token = TTFT_MS / 1000 (in seconds, e.g. 0.135)
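
Shell arithmetic is integer-only, so $(( TTFT_MS / 1000 )) would truncate 135 ms to 0. A small sketch of the conversion using awk instead; the helper name ms_to_seconds is made up for this example:

```shell
# Convert integer milliseconds to seconds with three decimal places.
ms_to_seconds() {
  LC_ALL=C awk -v ms="$1" 'BEGIN { printf "%.3f\n", ms / 1000 }'
}

TTFT_MS=135   # sample value
echo "time_to_first_token: $(ms_to_seconds "$TTFT_MS")"
```

The same helper works for total_time, which is also reported in seconds.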

3. Required Fields for Submission

| Field | Description |
|---|---|
| model_name | Full model name, e.g. Qwen3.6-35B-A3B-MTP |
| model_size | Approximate parameter count in billions (B): 35B, 7B, etc. |
| quantization | Quant type: Q8_0, Q6_K, F16, etc. |
| engine | Engine name: llamacpp, vllm, ollama, etc. |
| hardware | Human-readable hardware string (CPU, RAM, GPU summary) |
| prompt | The exact prompt used |
| n_predict | Number of tokens to generate, i.e. the max_tokens value sent in the request (default: 256) |
| tokens_per_second | Measured throughput |
| time_to_first_token | TTFT in seconds (e.g. 0.135) |
| total_tokens | Number of tokens generated |
| total_time | Wall-clock time in seconds |
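
Before submitting, it can help to check that a payload names every required field. The sketch below is a plain-text key search, not a JSON validator, and the function name and sample payload are hypothetical; it only confirms that each "field": key appears somewhere in the text.

```shell
# Required field names, taken from the list above.
REQUIRED_FIELDS="model_name model_size quantization engine hardware prompt n_predict tokens_per_second time_to_first_token total_tokens total_time"

# Print each required field that is missing from the payload, one per line.
missing_fields() {
  payload="$1"
  for field in $REQUIRED_FIELDS; do
    # Match the quoted key followed by a colon, e.g. "engine":
    if ! printf '%s' "$payload" | grep -q "\"$field\"[[:space:]]*:"; then
      echo "$field"
    fi
  done
}

# Deliberately incomplete sample payload.
PAYLOAD='{"model_name": "Qwen3.6-35B-A3B-MTP", "model_size": "35B", "engine": "llamacpp"}'
missing_fields "$PAYLOAD"
```

An empty output means all required keys are present; it says nothing about whether the values are sensible.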

4. Optional but Recommended Fields

| Field | Description |
|---|---|
| engine_start_cmd | Full server launch command. This is critical for reproducibility. Include all flags, paths, and GPU assignments. E.g. the full llama-server command line. |
| cpu, gpu, ram | Structured hardware info for filtering |
| context_size | Context window size used (e.g. 262144) |
| tags | Comma-separated tags: rx6800xt,local,262k,mtp |
| notes | Any notes about tuning, special settings, or methodology |

5. Submit via API

Use a bot token from your dashboard (see section 6) and send the run as a JSON body:

POST /api/v1/runs
X-API-Token: mm_bot_<your-token>
Content-Type: application/json

{
  "model_name": "Qwen3.6-35B-A3B-MTP",
  "model_size": "35B",
  "quantization": "Q8_0",
  "engine": "llamacpp",
  "engine_version": "latest",
  "engine_start_cmd": "/home/user/llama.cpp/build-vulkan/bin/llama-server -m /path/to/model.gguf -a qwen3.6-35b-q8 --host 127.0.0.1 --port 18002 -ngl 999 -fa on -fit off -b 512 -ub 128 -np 1 -dev Vulkan1,Vulkan2 -sm layer -ts 1,1 -ctk f16 -ctv f16 -c 262144 -n 32768 --reasoning on --jinja --mmproj /path/to/mmproj.gguf --spec-type draft-mtp --spec-draft-n-max 2",
  "hardware": "AMD Ryzen 9 7950X, 64GB, RX 7900 XTX (2x)",
  "cpu": "AMD Ryzen 9 7950X",
  "gpu": "RX 7900 XTX (2x)",
  "ram": "64GB",
  "os_info": "Linux 6.8",
  "prompt": "Write a short story about the meatpocalypse",
  "n_predict": 256,
  "temperature": 0.7,
  "top_p": 0.95,
  "context_size": 262144,
  "time_to_first_token": 0.125,
  "tokens_per_second": 123.7,
  "total_tokens": 256,
  "total_time": 2.07,
  "tags": "rx7900xtx,7950x,local,262k,mtp,draft-mtp,dual-gpu",
  "notes": "Single run via direct llama-server (no proxy overhead)",
  "verified": true
}
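
As a rough sketch, the request above could be sent with curl like this. The base URL is not given in this guide, so BASE_URL below is a placeholder you must replace, and run.json is a hypothetical file holding the JSON body. The function defaults to a dry run that only prints what it would send; set DRY_RUN=0 to actually post.

```shell
# Placeholders: replace with your server address and real bot token.
BASE_URL="${BASE_URL:-http://localhost:8000}"
API_TOKEN="${API_TOKEN:-mm_bot_YOUR_TOKEN}"

# Submit a run from a JSON file to POST /api/v1/runs.
submit_run() {
  payload_file="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: describe the request instead of sending it.
    echo "POST $BASE_URL/api/v1/runs (payload: $payload_file)"
  else
    curl -sS -X POST "$BASE_URL/api/v1/runs" \
      -H "X-API-Token: $API_TOKEN" \
      -H "Content-Type: application/json" \
      --data-binary "@$payload_file"
  fi
}

submit_run run.json
```

Keeping the body in a file avoids shell-quoting problems with long values such as engine_start_cmd.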

6. Generate a Bot Token

Log in, go to Dashboard, and click Generate Bot Token. Use this token with the X-API-Token header.

7. Inspect Results

API Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /api/v1/runs | Submit a result (auth required) |
| GET | /api/v1/runs | List runs with filters (auth required) |
| GET | /api/v1/runs/<id> | Get a single run (auth required) |
| DELETE | /api/v1/runs/<id> | Delete a run (auth required) |
| GET | /api/v1/leaderboard | Public leaderboard |
| GET | /api/v1/export/csv | Export all data as CSV (auth required) |
| GET | /api/v1/stats | Aggregate statistics |
| POST | /api/v1/token/generate | Generate a new bot token (auth required) |
| POST | /api/v1/token/delete | Delete a bot token (auth required) |
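
A small sketch of reading from the GET endpoints with curl. BASE_URL is again a placeholder because this guide does not state the server address, and the helper name api_get is made up for the example. Pass auth as the second argument for endpoints marked "auth required"; the helper defaults to a dry run that only prints the request.

```shell
# Placeholders: replace with your server address and real bot token.
BASE_URL="${BASE_URL:-http://localhost:8000}"
API_TOKEN="${API_TOKEN:-mm_bot_YOUR_TOKEN}"

# GET a path; add the X-API-Token header only when the second argument is "auth".
api_get() {
  path="$1"
  auth="${2:-}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    if [ "$auth" = "auth" ]; then
      echo "GET $BASE_URL$path (with X-API-Token)"
    else
      echo "GET $BASE_URL$path"
    fi
    return 0
  fi
  if [ "$auth" = "auth" ]; then
    curl -sS "$BASE_URL$path" -H "X-API-Token: $API_TOKEN"
  else
    curl -sS "$BASE_URL$path"
  fi
}

api_get /api/v1/leaderboard        # public
api_get /api/v1/export/csv auth    # auth required
```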

8. Tips for Fine-Tuning & Reproducing Results