PICOCLUSTER CLAW // desktop AI datacenter

An experimentation platform for small models. Yours, on your network.

PicoCluster Claw is a two-node appliance for running, testing, and benchmarking small AI models — a dedicated CUDA inference node paired with OpenClaw and ThreadWeaver. It's the reference platform for SMEEP — an open notebook of what small models can actually do in a real agent context.

14W idle. ~$2/month to keep running. Boots in seconds, stays up 24/7, and answers on your LAN — no API keys, no metered tokens, no traffic off the network.

SMEEP reference platform Dedicated CUDA node No API keys OpenClaw runtime Open source

THE PROBLEM //

A private AI stack is the right call. Building one is the hard part.

Off-cluster inference, fragile orchestration, and shared workstation silicon all push private AI into the "later" pile. Claw collapses that work into one boxed appliance.

$

Metered tokens add up fast

Frontier APIs run $15–60 per million tokens. A working agent burns millions a week, and every token you ship is data on someone else's storage policy.

Hand-rolled stacks drift

Ollama, a web UI, a router, a fistful of scripts — any one update can break the pipeline. You end up maintaining plumbing instead of shipping work.

🔒

The network boundary is the policy

Once a request crosses your LAN, your code, designs, and customer data are governed by someone else's terms. Workstation-class hardware can't rewrite that contract.

Shared silicon throttles

Your laptop GPU is already pushing your IDE and browser. Phone-class SoCs thermal-limit under sustained load. You need dedicated execution silicon, not a background process fighting for cycles.

HARDWARE //

Two nodes. One private execution surface.

ClusterClaw hosts the OpenClaw automation runtime. ClusterCrush handles inference on dedicated CUDA silicon. Together they form an always-on AI execution node that lives on your desk and answers on your LAN.

01 // CLUSTERCLAW

Automation runtime node

Raspberry Pi 5 (8 GB) hosting OpenClaw — the local automation runtime. Handles planning, memory, tool dispatch, and MCP server coordination. Ships with 28 built-in tools across five servers: LEDs, system, clustercrush bridge, time, and files.

SoCRPi 5 BCM2712
RAM8 GB LPDDR4X
RoleRuntime / MCP
02 // CLUSTERCRUSH

Dedicated inference node

Jetson Orin Nano Super (8 GB) with 1024 CUDA cores serving OpenAI-compatible inference over the LAN. Reserved silicon — nothing else competes for the GPU, and your workstation stays free for its own work.

SoCOrin Nano Super
RAM8 GB unified
RoleInference / CUDA
03 // LED FEEDBACK

Out-of-band status surface

A Pimoroni Blinkt! strip driven by its own MCP server — an MCP for physical feedback. The runtime drives the strip to surface inference activity, tool calls, and alerts without a screen. The cluster's state is visible at a glance.

TypeBlinkt! RGB
MCP8 tools
ServerLED MCP
04 // 28 MCP TOOLS

Tool-calling against the appliance itself

Five MCP servers expose the cluster as callable surface area: LED control, system metrics, the inference bridge to Crush, time and scheduling, and sandboxed file access. The runtime drives the box through its tools, not through config files.

Servers5
Tools28
ProtoMCP / stdio
CHAT INTERFACE //

ThreadWeaver — the operator console for the cluster.

ThreadWeaver is the chat front-end that ships with the appliance. It runs on the AgentStateGraph (ASG) substrate and can route any thread to on-cluster inference, an external provider, or both — selected per session, not system-wide.

5+ providers, one router

On-cluster inference (Ollama), OpenAI, Anthropic, Mistral, and Groq all sit behind the same interface. Credentials live in one config, routing happens per-thread, and the active provider is always visible.

🧠

ASG substrate

Threads are backed by the AgentStateGraph — not a flat JSON log. Conversations are stateful, resumable, and searchable without any extra setup.

🔧

Native Python hooks + filter

Drop a Python function into the hook directory and it runs on every message — pre-send, post-receive, or both. ThreadWeaver's MCP + ASG + filter pipeline means you control what the model sees.

🔒

Encryption at rest

Thread history is encrypted on disk. On-cluster inference never leaves the appliance, and any external provider call is opt-in per thread — not a system-wide toggle.

ThreadWeaver pipeline
MCP tools
ASG state
filter hooks
model
SPECS //

Numbers worth knowing.

Measured on the appliance, not marketing estimates.

14W
idle power draw
Both nodes, typical idle
20W
typical active
Agent running, model idle
35W
peak inference
Full CUDA load on Crush
$2/mo
electricity cost
At US average $0.13/kWh
18tok/s
inference throughput
gemma3:3b on Crush, CUDA — real-time chat
28
MCP tools
Across 5 servers, built-in
Full hardware spec
Component Node Detail
ClusterClaw Raspberry Pi 5 8 GB LPDDR4X, BCM2712 quad-core A76
ClusterCrush Jetson Orin Nano Super 8 GB unified memory, 1024-core Ampere GPU
Storage Both nodes NVMe SSD (size configurable)
Network Both nodes Gigabit Ethernet + Wi-Fi 5
OS ClusterClaw Raspberry Pi OS (64-bit)
OS ClusterCrush JetPack 6 (Ubuntu 22.04 + CUDA)
LED strip ClusterClaw Pimoroni Blinkt! (8× APA102)
PLATFORMS //

Boxed appliance or workstation install — same stack.

The Claw appliance is the flagship deployment: plug in, power on, runtime up in minutes. The same OpenClaw + ThreadWeaver stack also installs on Mac, Linux, and Windows when you'd rather run it on the workstation you already have.

🍎

macOS

Docker Compose overlay for Apple Silicon and Intel Macs. OpenClaw runtime, ThreadWeaver, and the Ollama bridge come up with one command — Metal GPU handles inference.

macOS install →
🐧

Linux

Native install or Docker Compose. Full MCP tool surface, Ollama bridge, and operator console — identical behavior to the appliance.

Linux install →
🪟

Windows

WSL2 + Docker Compose overlay. Bring the OpenClaw runtime and ThreadWeaver up from Windows Terminal — same stack, same tools, same behavior.

Windows install →
All three targets share a single Docker Compose base with platform-specific overlays — one codebase, three substrates. Upgrade with a single git pull.
SMEEP //

Small Model Experimentation and Evaluation Project

PicoCluster Claw is the reference platform for SMEEP — a living record of what small models can actually do in a real agent context. Every result is reproducible. Every script is open source.

10/10
granite4.1:8b
T1–T4 tool-call ladder. The only model in our dataset to pass every tier.
19×
Q4_K_M vs Q4_0
CPU inference speed advantage for K-quant format on x86 AVX-512. Same model, same hardware.
10/30
granite4.1:8b
Best cap+workflow score on Orin Nano. 33% task completion — measured with KEEP_ALIVE=0, so treat as a lower bound.

What we've published

  • T1–T4 tool-call ladder — 5 models, 10 tasks each. Granite 10/10. Nemotron 9/10. Where the 4B tier hits the wall.
  • 30-task cap+workflow suite — natural language commands a user would actually type. Exec, web, scheduling, LED, multi-tool chains.
  • Quantization explainer — the Q4_0 vs Q4_K_M performance cliff on CPU, why it's 19×, and which format to always pull.
  • H4 Ultra CPU results — why 48 GB of RAM without a GPU can't handle agent workloads above T1. The CPU inference wall, measured.
Read the benchmark results → Raw data & scripts on GitHub →
Run it yourself
# Clone and run on any Ollama host
git clone https://github.com/picocluster/PicoCluster-Claw
cd PicoCluster-Claw

# T1–T4 tool-call ladder
python3 scripts/bench-openclaw.py \
  --models granite4.1:8b,nemotron-3-nano:4b

# 30-task capability suite
python3 scripts/bench-claw.py \
  --models qwen2.5:7b,granite4.1:8b \
  --output results/my-bench.json
CONTRIBUTE // Run this on your hardware and share the results. Any Ollama host works. Minimum viable contribution: hardware spec + exact command + results table. How to contribute →

A desktop AI datacenter.
Boxed, boot-ready, on your LAN.

PicoCluster Claw ships pre-configured — OpenClaw automation runtime, dedicated inference node, ThreadWeaver console, 28 MCP tools. Plug in, power on, runtime up in minutes. No subscription, no external dependency, under $2/month to keep online.

Not ready for the appliance? Install the same stack on Mac, Linux, or Windows — your machine, identical behavior.