I've been in IT for thirty years. I've built data centers, managed enterprise infrastructure, led teams. But nothing I'd done professionally quite prepared me for the experience of building a distributed AI agent network out of secondhand Dell Optiplexes and a family media server at 1 AM while my wife is trying to sleep and I'm quietly celebrating because a language model just sent itself a Discord message unprompted.
This is that story.
It Started With One Machine
The Legion 5i was already in the house — my main development machine. i7-14700HX, 64GB RAM, RTX 5070. When I started running local AI models through Ollama and playing with agent frameworks, it was the obvious starting point. I installed OpenClaw (the agent orchestration framework we'd been building) and stood up the first agent: Calder.
Calder was just supposed to be a coding assistant with some extra skills bolted on. A smarter terminal, basically. But once I had a gateway running and the skill system working, I started thinking about what else could be automated. What if I had a dedicated machine for media management? What if one of the nodes watched weather markets? What if there was a security scanner?
That's how you end up with seven machines.
The Fleet Takes Shape
The second node was the T3600 — an old Xeon workstation that had been sitting in the corner running Plex. We call that one Hollywood. E5-1620, 24GB RAM, a GT 710 (which does no real work; it exists purely for display output), and a Storage Spaces array that eventually grew to 36 terabytes across spinning drives and SSDs. Hollywood runs Sonarr, Prowlarr, and Plex, and the agent monitors all of them and pings me on Discord when something breaks or a new season drops.
Then I found a deal on a pair of Optiplex 7080s and a pair of 7070s. Four machines for cheap, all NVMe-capable. Each one became a node:
- Johnny-5 (Optiplex 7080) — outreach and business intelligence. Runs the SiteLens email pipeline, the contact database, HubSpot sync, all of it.
- Njord (Optiplex 7070) — weather signals and prediction markets. More on this one in a separate post.
- Tiki (Optiplex 7070 #2) — patio guardian. Monitors the outdoor setup, handles whatever ambient home automation I point at it.
- Astrid (Optiplex 7080 #2) — bedroom TV node, still being stood up at the time of writing.
My wife's gaming PC (a Ryzen 7 2700X with an RTX 4070, running inside a Thermaltake Armory full tower) became node seven. We call that one Fenrir. It runs in WSL2 alongside her gaming sessions and handles GPU inference when the big models need real compute. That machine has to coexist with someone's actual gaming — which means resource scheduling matters and I can't just let the agent eat all 12GB of VRAM whenever it feels like it.
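A minimal sketch of what that gating can look like, assuming `nvidia-smi` is on the PATH. The function names and thresholds here are mine for illustration, not anything from OpenClaw:

```python
import subprocess

def parse_free_mb(nvidia_smi_output: str) -> int:
    """Parse the output of `nvidia-smi --query-gpu=memory.free
    --format=csv,noheader,nounits` (one integer MiB per GPU per line)."""
    return int(nvidia_smi_output.strip().splitlines()[0])

def gpu_memory_free_mb() -> int:
    """Query free VRAM on the first GPU. These nodes are single-GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_free_mb(out)

def can_run_inference(free_mb: int, required_mb: int = 8000,
                      headroom_mb: int = 2000) -> bool:
    # Leave headroom so an active gaming session never gets starved of VRAM.
    return free_mb >= required_mb + headroom_mb
```

Polling before each inference job is crude compared to real GPU scheduling, but on a shared machine it is enough to keep the agent from stepping on a game mid-session.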
The Architecture: OpenClaw and the Gateway Mesh
Every node runs an OpenClaw gateway — a WebSocket server that handles agent communication, skill execution, and proactive delivery (the ability for an agent to reach out to you without being asked). Calder on the Legion is the primary orchestrator. The other nodes have their own gateways that Calder can talk to, but they also operate independently.
Skills are the key unit of capability. An agent without skills is just a chat interface. A skill is a bundled piece of logic — a Python script, a set of tools, a defined workflow — that the agent can invoke. Hollywood has a clawarr-suite skill that talks to the Arr stack APIs. Johnny-5 has the SiteLens outreach pipeline as a skill. Njord has weather signal scanning and prediction market analysis.
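The skill idea can be sketched as a small registry mapping names to callable bundles of logic. This is a hypothetical shape to make the concept concrete, not the actual OpenClaw API:

```python
from typing import Callable

class SkillRegistry:
    """Illustrative sketch: a skill is a named, invokable unit of
    capability. The real OpenClaw skill system will differ in detail."""

    def __init__(self) -> None:
        self._skills: dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._skills[name] = fn

    def invoke(self, name: str, **kwargs) -> str:
        if name not in self._skills:
            return f"unknown skill: {name}"
        return self._skills[name](**kwargs)

registry = SkillRegistry()
# A toy stand-in for something like Hollywood's clawarr-suite skill.
registry.register("media-status", lambda service: f"{service}: up")
```

The point of the abstraction is the boundary: the agent decides *when* to invoke a skill, but the skill owns *how* the work gets done.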
Each agent also has a personality file — what we call a SOUL.md. Hollywood talks like a grizzled film-set fixer. Njord uses Norse nautical metaphors. Johnny-5 quotes movies. This probably sounds frivolous, but it's actually useful: when an agent sends you a message at 6 AM to tell you there's a strong weather signal in Los Angeles, having a distinct voice makes it immediately clear which node sent it and what context you're in.
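To make the idea concrete, here is what a SOUL.md might look like for Njord. This is a hypothetical example; the post doesn't show the real file format:

```markdown
# SOUL.md — Njord (hypothetical example)

## Voice
Speak in Norse nautical metaphors. A strong weather signal is
"a following sea"; a quiet day is "dead calm."

## Proactive delivery
When a signal crosses the alert threshold, message Discord directly.
Lead with the market and the signal strength so a 6 AM ping is
readable at a glance.
```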
What Actually Broke
A lot. In roughly chronological order:
WSL2 is a different animal than bare metal. Hollywood and Fenrir both run in WSL2 on Windows hosts. Password auth over SSH doesn't work reliably in non-interactive contexts on WSL2 — you need Ed25519 keys and you need to convert CRLF line endings in your key file or it silently fails. That one cost me an evening.
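The fix amounts to rewriting the key file with LF-only line endings (`dos2unix` does the same job from the shell). A small stdlib-only sketch:

```python
from pathlib import Path

def strip_crlf(key_path: str) -> bool:
    """Rewrite a file with LF-only line endings. OpenSSH rejects key
    files with CRLF endings, and on WSL2 the non-interactive failure
    is silent. Returns True if the file was changed."""
    p = Path(key_path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed == data:
        return False
    p.write_bytes(fixed)
    return True
```

Working on bytes rather than text avoids the trap where Python's universal-newline text mode hides the CRLFs you are trying to remove.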
Systemd user services and CLI daemons fight for the same port. On Johnny-5, I had both a systemd user service and the OpenClaw CLI daemon trying to bind port 18790 at startup. The result was 1,429 crash-loop restarts and 40-160% CPU usage that I didn't notice for two days. The fix was disabling the systemd service and letting the CLI daemon be the sole gateway.
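A pre-flight bind check would have surfaced that conflict on day one instead of day two. A sketch using only the standard library (the port number is from the incident above; everything else is mine):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind a TCP port before starting a daemon on it.
    If another process is already listening, bind() raises EADDRINUSE
    and we report the port as taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR lets us rebind past sockets lingering in
        # TIME_WAIT, but not past a live listener.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

if __name__ == "__main__":
    if not port_is_free(18790):
        raise SystemExit("port 18790 already taken; is another gateway running?")
```

Failing loudly at startup beats a crash loop that only shows up as mystery CPU load two days later.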
Drives die. Njord's original 1TB Seagate had 1,192 bad sectors — genuine media failure. Replaced it with a WD Black 500GB that turned out to have 7+ years on it and ATA link drops. Ended up pulling both drives and running NVMe-only on that node. The SATA port tested clean; both drives were just dying. You never fully trust secondhand hardware until it's had a few weeks on the bench.
The T3600's SSD has 79,000 hours on it. It's at 100% wear but still reporting healthy. Storage Spaces on that box is configured as Simple — no redundancy. One drive failure loses data. I know this. It's on the list.
The thing nobody tells you about running a home fleet is that "it's working" and "it's stable" are two different things. Most of the time you're operating in the space between them.
The Naming Convention
This gets its own section because people always ask. Every agent gets a name before it gets a purpose. The name shapes how you think about it, and how you think about it shapes how you build it.
Calder is named for the sculptor Alexander Calder — mobile constructions, dynamic balance, things that move and respond to their environment. That felt right for an orchestrator. Hollywood is self-explanatory. Fenrir is the Norse wolf who swallows the sun — seemed appropriate for a GPU node running security scans. Njord is the Norse god of wind and sea, which maps to weather trading better than anything else I could think of. Johnny-5 is from Short Circuit — he just wanted to be alive and learn things, which is weirdly apt for a machine that's been running outreach campaigns for months.
What I'd Do Differently
Drive selection is the big one. Every time I've had a node go sideways, it's been a drive. Buy new NVMe storage for every node from the start. The compute hardware (especially Optiplexes) is fine secondhand. The storage is not worth gambling on.
I'd also standardize on a single OS from day one. The fleet is a mix of Windows 11 (Legion, Fenrir), Windows Server 2025 (T3600), and Ubuntu 24.04 bare metal (the Optiplexes). Each OS has its own quirks for service management, SSH behavior, and Python environment setup. Pick one and stick with it for the fleet nodes.
And I'd give every node its personality file before it goes live, not after. The nodes that shipped with their identity defined from the beginning are noticeably more coherent in how they communicate. The ones where personality was retrofitted feel a little stitched together.
Where It's Going
The current plan is a hardware swap — Fenrir (the RTX 4070 machine) moves to the media role and takes Hollywood's Plex/Arr workload, because NVENC hardware transcoding on a modern GPU is dramatically better than the GT 710. The T3600 retires to security scanning and gets a GTX 1070. Every node eventually runs a full skills suite and the fleet operates as a coherent whole rather than a collection of independent agents.
The longer-term goal is to package this as a service. Not the hardware — the architecture, the setup, the ongoing management. If you've ever wanted an AI agent running in your environment and haven't wanted to deal with the infrastructure side, that's the problem we're building toward solving.
I've spent thirty years building other people's infrastructure. This is the first time I've built something that actually feels like mine.