PICOCLUSTER CLAW // desktop AI datacenter

An experimentation platform for small models. Yours, on your network.

PicoCluster Claw is a two-node appliance for running, testing, and benchmarking small AI models — a dedicated CUDA inference node paired with OpenClaw and ThreadWeaver. It's the reference platform for SMEEP — an open notebook of what small models can actually do in a real agent context.

14W idle. ~$2/month to keep running. Boots in seconds, stays up 24/7, and answers on your LAN — no API keys, no metered tokens, no traffic off the network.

Order yours → Read the docs

SMEEP reference platform Dedicated CUDA node No API keys OpenClaw runtime Open source

THE PROBLEM //

A private AI stack is the right call. Building one is the hard part.

Off-cluster inference, fragile orchestration, and shared workstation silicon all push private AI into the "later" pile. Claw collapses that work into one boxed appliance.

Metered tokens add up fast

Frontier APIs run $15–60 per million tokens. A working agent burns millions a week, and every token you ship is data on someone else's storage policy.

⚠

Hand-rolled stacks drift

Ollama, a web UI, a router, a fistful of scripts — any one update can break the pipeline. You end up maintaining plumbing instead of shipping work.

🔒

The network boundary is the policy

Once a request crosses your LAN, your code, designs, and customer data are governed by someone else's terms. Workstation-class hardware can't rewrite that contract.

⚡

Shared silicon throttles

Your laptop GPU is already pushing your IDE and browser. Phone-class SoCs thermal-limit under sustained load. You need dedicated execution silicon, not a background process fighting for cycles.

HARDWARE //

Two nodes. One private execution surface.

ClusterClaw hosts the OpenClaw automation runtime. ClusterCrush handles inference on dedicated CUDA silicon. Together they form an always-on AI execution node that lives on your desk and answers on your LAN.

01 // CLUSTERCLAW

Automation runtime node

Raspberry Pi 5 (8 GB) hosting OpenClaw — the local automation runtime. Handles planning, memory, tool dispatch, and MCP server coordination. Ships with 28 built-in tools across five servers: LEDs, system, clustercrush bridge, time, and files.

SoCRPi 5 BCM2712

RAM8 GB LPDDR4X

RoleRuntime / MCP

02 // CLUSTERCRUSH

Dedicated inference node

Jetson Orin Nano Super (8 GB) with 1024 CUDA cores serving OpenAI-compatible inference over the LAN. Reserved silicon — nothing else competes for the GPU, and your workstation stays free for its own work.

SoCOrin Nano Super

RAM8 GB unified

RoleInference / CUDA

03 // LED FEEDBACK

Out-of-band status surface

A Pimoroni Blinkt! strip driven by its own MCP server — an MCP for physical feedback. The runtime drives the strip to surface inference activity, tool calls, and alerts without a screen. The cluster's state is visible at a glance.

TypeBlinkt! RGB

MCP8 tools

ServerLED MCP

04 // 28 MCP TOOLS

Tool-calling against the appliance itself

Five MCP servers expose the cluster as callable surface area: LED control, system metrics, the inference bridge to Crush, time and scheduling, and sandboxed file access. The runtime drives the box through its tools, not through config files.

Servers5

Tools28

ProtoMCP / stdio

CHAT INTERFACE //

ThreadWeaver — the operator console for the cluster.

ThreadWeaver is the chat front-end that ships with the appliance. It runs on the AgentStateGraph (ASG) substrate and can route any thread to on-cluster inference, an external provider, or both — selected per session, not system-wide.

⚡

5+ providers, one router

On-cluster inference (Ollama), OpenAI, Anthropic, Mistral, and Groq all sit behind the same interface. Credentials live in one config, routing happens per-thread, and the active provider is always visible.

🧠

ASG substrate

Threads are backed by the AgentStateGraph — not a flat JSON log. Conversations are stateful, resumable, and searchable without any extra setup.

🔧

Native Python hooks + filter

Drop a Python function into the hook directory and it runs on every message — pre-send, post-receive, or both. ThreadWeaver's MCP + ASG + filter pipeline means you control what the model sees.

🔒

Encryption at rest

Thread history is encrypted on disk. On-cluster inference never leaves the appliance, and any external provider call is opt-in per thread — not a system-wide toggle.

ThreadWeaver pipeline

MCP tools

→

ASG state

→

filter hooks

→

model

SPECS //

Numbers worth knowing.

Measured on the appliance, not marketing estimates.

14W

idle power draw

Both nodes, typical idle

20W

typical active

Agent running, model idle

35W

peak inference

Full CUDA load on Crush

$2/mo

electricity cost

At US average $0.13/kWh

18tok/s

inference throughput

gemma3:3b on Crush, CUDA — real-time chat

MCP tools

Across 5 servers, built-in

Full hardware spec

Component	Node	Detail
ClusterClaw	Raspberry Pi 5	8 GB LPDDR4X, BCM2712 quad-core A76
ClusterCrush	Jetson Orin Nano Super	8 GB unified memory, 1024-core Ampere GPU
Storage	Both nodes	NVMe SSD (size configurable)
Network	Both nodes	Gigabit Ethernet + Wi-Fi 5
OS	ClusterClaw	Raspberry Pi OS (64-bit)
OS	ClusterCrush	JetPack 6 (Ubuntu 22.04 + CUDA)
LED strip	ClusterClaw	Pimoroni Blinkt! (8× APA102)

PLATFORMS //

Boxed appliance or workstation install — same stack.

The Claw appliance is the flagship deployment: plug in, power on, runtime up in minutes. The same OpenClaw + ThreadWeaver stack also installs on Mac, Linux, and Windows when you'd rather run it on the workstation you already have.

FLAGSHIP

🖥

PicoCluster Claw

The two-node appliance. ClusterClaw hosts the OpenClaw automation runtime. ClusterCrush serves inference on dedicated CUDA silicon. Ships pre-built and pre-configured — no assembly, no provisioning.

OpenClaw runtime pre-installed
ThreadWeaver operator console
28 MCP tools built-in
LED status surface
Dedicated inference silicon
Always-on, 14W idle

Order now →

🍎

macOS

Docker Compose overlay for Apple Silicon and Intel Macs. OpenClaw runtime, ThreadWeaver, and the Ollama bridge come up with one command — Metal GPU handles inference.

macOS install →

🐧

Linux

Native install or Docker Compose. Full MCP tool surface, Ollama bridge, and operator console — identical behavior to the appliance.

Linux install →

🪟

Windows

WSL2 + Docker Compose overlay. Bring the OpenClaw runtime and ThreadWeaver up from Windows Terminal — same stack, same tools, same behavior.

Windows install →

All three targets share a single Docker Compose base with platform-specific overlays — one codebase, three substrates. Upgrade with a single git pull.

SMEEP //

Small Model Experimentation and Evaluation Project

PicoCluster Claw is the reference platform for SMEEP — a living record of what small models can actually do in a real agent context. Every result is reproducible. Every script is open source.

10/10

granite4.1:8b

T1–T4 tool-call ladder. The only model in our dataset to pass every tier.

19×

Q4_K_M vs Q4_0

CPU inference speed advantage for K-quant format on x86 AVX-512. Same model, same hardware.

10/30

granite4.1:8b

Best cap+workflow score on Orin Nano. 33% task completion — measured with KEEP_ALIVE=0, so treat as a lower bound.

What we've published

T1–T4 tool-call ladder — 5 models, 10 tasks each. Granite 10/10. Nemotron 9/10. Where the 4B tier hits the wall.
30-task cap+workflow suite — natural language commands a user would actually type. Exec, web, scheduling, LED, multi-tool chains.
Quantization explainer — the Q4_0 vs Q4_K_M performance cliff on CPU, why it's 19×, and which format to always pull.
H4 Ultra CPU results — why 48 GB of RAM without a GPU can't handle agent workloads above T1. The CPU inference wall, measured.

Read the benchmark results → Raw data & scripts on GitHub →

Run it yourself

# Clone and run on any Ollama host
git clone https://github.com/picocluster/PicoCluster-Claw
cd PicoCluster-Claw

# T1–T4 tool-call ladder
python3 scripts/bench-openclaw.py \
  --models granite4.1:8b,nemotron-3-nano:4b

# 30-task capability suite
python3 scripts/bench-claw.py \
  --models qwen2.5:7b,granite4.1:8b \
  --output results/my-bench.json

CONTRIBUTE // Run this on your hardware and share the results. Any Ollama host works. Minimum viable contribution: hardware spec + exact command + results table. How to contribute →

A desktop AI datacenter.
Boxed, boot-ready, on your LAN.

PicoCluster Claw ships pre-configured — OpenClaw automation runtime, dedicated inference node, ThreadWeaver console, 28 MCP tools. Plug in, power on, runtime up in minutes. No subscription, no external dependency, under $2/month to keep online.

Order from PicoCluster → Read the docs

Not ready for the appliance? Install the same stack on Mac, Linux, or Windows — your machine, identical behavior.

An experimentation platform for small models. Yours, on your network.

A private AI stack is the right call. Building one is the hard part.

Metered tokens add up fast

Hand-rolled stacks drift

The network boundary is the policy

Shared silicon throttles

Two nodes. One private execution surface.

Automation runtime node

Dedicated inference node

Out-of-band status surface

Tool-calling against the appliance itself

ThreadWeaver — the operator console for the cluster.

5+ providers, one router

ASG substrate

Native Python hooks + filter

Encryption at rest

Numbers worth knowing.

Boxed appliance or workstation install — same stack.

PicoCluster Claw

macOS

Linux

Windows

Small Model Experimentation and Evaluation Project

What we've published

A desktop AI datacenter.Boxed, boot-ready, on your LAN.

A desktop AI datacenter.
Boxed, boot-ready, on your LAN.