MeatMark - MeatPocalypse Benchmark

How to Run the MeatPocalypse Benchmark

1. The Prompt

The standard benchmark prompt is:

Write a short story about the meatpocalypse

2. How to Measure

Step A — Tokens per Second (non-streaming)

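Send the prompt and measure the total wall-clock time for the full response. The nanosecond timestamps from `date +%s%N` require GNU date, because BSD/macOS `date` does not support `%N`: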

```shell
START=$(date +%s%N)
RESPONSE=$(curl -s http://localhost:PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_MODEL",
    "messages": [{"role": "user", "content": "Write a short story about the meatpocalypse"}],
    "max_tokens": 256,
    "temperature": 0.7,
    "stream": false
  }')
END=$(date +%s%N)
TOTAL_MS=$(( (END - START) / 1000000 ))
echo "Total: ${TOTAL_MS}ms"
# Keep the response body: usage.completion_tokens is needed for the throughput calculation.
printf '%s\n' "$RESPONSE" > response.json
```

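Read `usage.completion_tokens` from the response JSON and compute:
tokens_per_second = completion_tokens / (TOTAL_MS / 1000)
Because the timer covers the whole request, this end-to-end figure includes prompt processing as well as token generation.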
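
A small sketch of the throughput calculation, assuming the response follows the OpenAI chat-completions shape with an integer `usage.completion_tokens`. It uses grep and awk rather than a JSON parser, so treat it as a convenience, not a robust parser; `tokens_per_second` is a hypothetical helper name and the response in the example call is illustrative:

```shell
#!/usr/bin/env bash
# tokens_per_second (hypothetical helper).
# $1: response body from the non-streaming request; $2: TOTAL_MS.
tokens_per_second() {
  local response_json=$1 total_ms=$2 completion_tokens
  # Grab the integer that follows "completion_tokens": (spaces allowed).
  completion_tokens=$(printf '%s' "$response_json" \
    | grep -Eo '"completion_tokens"[[:space:]]*:[[:space:]]*[0-9]+' \
    | grep -Eo '[0-9]+$')
  # awk does the floating-point division: tokens / (ms / 1000).
  awk -v t="$completion_tokens" -v ms="$total_ms" \
    'BEGIN { printf "%.1f\n", t / (ms / 1000) }'
}

# Illustrative response: 256 completion tokens in 2070 ms.
tokens_per_second '{"usage":{"prompt_tokens":18,"completion_tokens":256,"total_tokens":274}}' 2070
# prints 123.7
```

With a real run, pass the response body captured in Step A and its TOTAL_MS.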

Step B — Time to First Token (streaming)

Send the same prompt with stream: true and measure the wall-clock time from sending the request to the first data: chunk arriving:

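```shell
START=$(date +%s%N)
# Read the SSE stream line by line and stop the clock at the first "data:" line;
# the rest of the stream is not needed. Process substitution < <(...) requires bash.
while IFS= read -r line; do
  case $line in
    data:*) END=$(date +%s%N); break ;;
  esac
done < <(curl -s -N http://localhost:PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_MODEL",
    "messages": [{"role": "user", "content": "Write a short story about the meatpocalypse"}],
    "max_tokens": 256,
    "temperature": 0.7,
    "stream": true
  }')
TTFT_MS=$(( (END - START) / 1000000 ))
echo "TTFT: ${TTFT_MS}ms"
```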

time_to_first_token = TTFT_MS / 1000 (in seconds, e.g. 0.135)
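
Shell `$(( ))` arithmetic is integer-only, so `$(( TTFT_MS / 1000 ))` would truncate 135 ms to 0. A minimal sketch that keeps the fractional part with awk; `ms_to_s` is a hypothetical helper name, and the same conversion works for `total_time`:

```shell
#!/usr/bin/env bash
# ms_to_s (hypothetical helper): integer milliseconds -> seconds, 3 decimals.
ms_to_s() {
  awk -v ms="$1" 'BEGIN { printf "%.3f\n", ms / 1000 }'
}

ms_to_s 135    # time_to_first_token: prints 0.135
ms_to_s 2070   # total_time: prints 2.070
```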

3. Required Fields for Submission

Field	Description
`model_name`	Full model name, e.g. `Qwen3.6-35B-A3B-MTP`
`model_size`	Approximate parameter count in billions: `35B`, `7B`, etc.
`quantization`	Quant type: `Q8_0`, `Q6_K`, `F16`, etc.
`engine`	Engine name: `llamacpp`, `vllm`, `ollama`, etc.
`hardware`	Human-readable hardware summary (CPU, RAM, GPU)
`prompt`	The exact prompt used
`n_predict`	Number of tokens to generate, i.e. the `max_tokens` sent in the request (default: 256)
`tokens_per_second`	Measured generation throughput (Step A)
`time_to_first_token`	TTFT in seconds (Step B, e.g. 0.135)
`total_tokens`	Number of tokens generated: the response's `usage.completion_tokens`, not `usage.total_tokens` (which also counts prompt tokens)
`total_time`	Wall-clock time for the full non-streaming response, in seconds (Step A)
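
Before submitting, it can help to confirm that a payload contains every required field. This sketch is a plain text search for each key (`check_required_fields` is a hypothetical helper); it does not check JSON syntax or value types, so a JSON tool such as jq is still worth using for full validation:

```shell
#!/usr/bin/env bash
# check_required_fields (hypothetical helper): report any required field
# whose "key": is missing from a payload file. Returns 1 if any are missing.
check_required_fields() {
  local file=$1 missing=0 field
  for field in model_name model_size quantization engine hardware prompt \
               n_predict tokens_per_second time_to_first_token \
               total_tokens total_time; do
    # Match the quoted key followed by a colon, so "prompt" != "prompt_tokens".
    if ! grep -q "\"${field}\"[[:space:]]*:" "$file"; then
      echo "missing: ${field}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_required_fields run.json && echo "all required fields present"`.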

4. Optional but Recommended Fields

Field	Description
`engine_start_cmd`	Full server launch command, including all flags, paths, and GPU assignments (for example, the complete `llama-server` command line). This is critical for reproducibility.
`cpu`, `gpu`, `ram`	Structured hardware info for filtering
`context_size`	Context window size used (e.g. 262144)
`tags`	Comma-separated tags: `rx6800xt,local,262k,mtp`
`notes`	Any notes about tuning, special settings, or methodology
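
One way to make sure `engine_start_cmd` matches what actually ran is to keep the launch command in a single shell variable, start the server from it (for example `sh -c "$ENGINE_START_CMD" &`), and escape the same string into the payload. A minimal sketch; `ENGINE_START_CMD` and `json_escape` are illustrative names, and the escaping only covers backslashes and double quotes, not tabs, newlines, or other control characters:

```shell
#!/usr/bin/env bash
# json_escape (hypothetical helper): escape \ and " for a JSON string value.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Example command (a subset of the flags used in this guide); adjust to your setup.
ENGINE_START_CMD='llama-server -m /path/to/model.gguf --host 127.0.0.1 --port 18002 -c 262144'

# Emit the payload line from the same string used to launch the server.
printf '"engine_start_cmd": "%s"\n' "$(json_escape "$ENGINE_START_CMD")"
```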

5. Submit via API

Use a bot token from your dashboard:

```http
POST /api/v1/runs
X-API-Token: mm_bot_<your-token>
Content-Type: application/json

{
  "model_name": "Qwen3.6-35B-A3B-MTP",
  "model_size": "35B",
  "quantization": "Q8_0",
  "engine": "llamacpp",
  "engine_version": "latest",
  "engine_start_cmd": "/home/user/llama.cpp/build-vulkan/bin/llama-server -m /path/to/model.gguf -a qwen3.6-35b-q8 --host 127.0.0.1 --port 18002 -ngl 999 -fa on -fit off -b 512 -ub 128 -np 1 -dev Vulkan1,Vulkan2 -sm layer -ts 1,1 -ctk f16 -ctv f16 -c 262144 -n 32768 --reasoning on --jinja --mmproj /path/to/mmproj.gguf --spec-type draft-mtp --spec-draft-n-max 2",
  "hardware": "AMD Ryzen 9 7950X, 64GB, RX 7900 XTX (2x)",
  "cpu": "AMD Ryzen 9 7950X",
  "gpu": "RX 7900 XTX (2x)",
  "ram": "64GB",
  "os_info": "Linux 6.8",
  "prompt": "Write a short story about the meatpocalypse",
  "n_predict": 256,
  "temperature": 0.7,
  "top_p": 0.95,
  "context_size": 262144,
  "time_to_first_token": 0.125,
  "tokens_per_second": 123.7,
  "total_tokens": 256,
  "total_time": 2.07,
  "tags": "rx7900xtx,7950x,local,262k,mtp,draft-mtp,dual-gpu",
  "notes": "Single run via direct llama-server (no proxy overhead)",
  "verified": true
}
```
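
The submission request can be sent with curl. This sketch assumes the payload is saved as a file and that MEATMARK_URL (the server's base URL, which this guide does not specify) and MEATMARK_TOKEN are set as environment variables; both names and `submit_run` are hypothetical. `--data-binary` is used because `-d @file` strips newlines from the file:

```shell
#!/usr/bin/env bash
# submit_run (hypothetical helper): POST a saved payload file to /api/v1/runs.
submit_run() {
  local payload_file=$1
  # Fail early with a clear message if the assumed variables are unset.
  : "${MEATMARK_URL:?set MEATMARK_URL to the MeatMark base URL}"
  : "${MEATMARK_TOKEN:?set MEATMARK_TOKEN to your mm_bot_ token}"
  # -w appends the HTTP status so a rejected submission is easy to spot.
  curl -sS -X POST "${MEATMARK_URL%/}/api/v1/runs" \
    -H "X-API-Token: ${MEATMARK_TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary "@${payload_file}" \
    -w '\nHTTP %{http_code}\n'
}
```

Usage: export MEATMARK_URL and MEATMARK_TOKEN, then run `submit_run run.json`.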

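6. Generate a Bot Token

Create a bot token from your dashboard, or call POST /api/v1/token/generate (authentication required). Tokens have the form mm_bot_<your-token> and are sent in the X-API-Token header; revoke one with POST /api/v1/token/delete.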

7. Inspect Results

Leaderboard — Browse all results, filter by model/engine/quant
Run detail — Click any row to see the full breakdown including engine start command
CSV export — GET /api/v1/export/csv with your API token

API Endpoints

Method	Path	Description
POST	/api/v1/runs	Submit a result (auth required)
GET	/api/v1/runs	List runs with filters (auth required)
GET	/api/v1/runs/<id>	Get a single run (auth required)
DELETE	/api/v1/runs/<id>	Delete a run (auth required)
GET	/api/v1/leaderboard	Public leaderboard
GET	/api/v1/export/csv	Export all data as CSV (auth required)
GET	/api/v1/stats	Aggregate statistics
POST	/api/v1/token/generate	Generate a new bot token (auth required)
POST	/api/v1/token/delete	Delete a bot token (auth required)
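
The authenticated endpoints above can share one small curl wrapper. A sketch only: it assumes the X-API-Token header used for submissions in section 5 also authenticates the other endpoints, and `mm_api`, `MEATMARK_URL`, and `MEATMARK_TOKEN` are hypothetical names:

```shell
#!/usr/bin/env bash
# mm_api (hypothetical wrapper): mm_api METHOD PATH [extra curl args...]
mm_api() {
  local method=$1 path=$2
  shift 2
  : "${MEATMARK_URL:?set MEATMARK_URL to the MeatMark base URL}"
  : "${MEATMARK_TOKEN:?set MEATMARK_TOKEN to your mm_bot_ token}"
  # The token header is sent on every call; public endpoints such as
  # /api/v1/leaderboard are assumed to ignore it.
  curl -sS -X "$method" "${MEATMARK_URL%/}${path}" \
    -H "X-API-Token: ${MEATMARK_TOKEN}" "$@"
}
```

For example: `mm_api GET /api/v1/runs`, `mm_api GET /api/v1/export/csv -o meatmark.csv`, or `mm_api DELETE /api/v1/runs/<id>`. The filter parameters for GET /api/v1/runs are not documented here, so none are shown.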

8. Tips for Tuning & Reproducing Results

Always include engine_start_cmd — this is the single most important field for reproducibility. Others need to see exactly what flags, GPU splits, and context sizes you used.
Run the benchmark several times and submit either the best or the average result. State which one in notes, along with any other methodology details.
Set verified: true if you can reproduce the result consistently.
Use tags to describe your setup: dual-gpu, nvme, 262k, draft-mtp, no-ctx-prefill, etc.
Compare on the leaderboard — filter by model, engine, and quantization to see what works on similar hardware.
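
For the tip about repeated runs, a sketch that reduces several tokens_per_second measurements to a best and a mean. `summarize_tps` is a hypothetical helper, and the three values in the example call are illustrative, not real results:

```shell
#!/usr/bin/env bash
# summarize_tps (hypothetical helper): print run count, best, and mean of the
# tokens_per_second values passed as arguments.
summarize_tps() {
  printf '%s\n' "$@" | awk '
    NR == 1 || $1 > best { best = $1 }
    { sum += $1 }
    END { printf "runs=%d best=%.1f mean=%.1f\n", NR, best, sum / NR }'
}

summarize_tps 121.9 123.7 122.4
# prints runs=3 best=123.7 mean=122.7
```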