My OpenClaw (Mis)Adventure

If you follow Artificial Intelligence at all, chances are you have heard of something called Clawdbot, or Moltbot, or OpenClaw. They are all the exact same thing: a virtual assistant with infinite memory and the ability to give itself any skill you or it can think of. It has complete control over its environment, for better or worse. It’s highly dangerous to run on any system with access to sensitive information, but also incredibly powerful, when it works. And it only takes one terminal command to get started.

People across the internet have given this bot its own computer to live on, either virtually, or in many cases, on a Mac Mini. I happened to have a Mac Mini sitting around, so why not give this OpenClaw thing a try?!

The Ultimate Personal Assistant

The goal was a personal assistant more or less equivalent to a junior employee. It would have its own email address and its own GitHub account (which I ended up cleaning out), and I set it up to communicate directly with me through Discord at any time. That was definitely a favorite feature: being able to talk to my private AI through my most-used communication tool was easy and useful. I also gave it access to read my calendar and my email, even drafting emails on my behalf (but not sending them).

I got it to build a cool dashboard with an embedded kanban board for planning and tracking its work, and various widgets for monitoring token use, chatting with the model, etc.

V1: The “Free Mode” Massacre

At first it was awesome…

New features were built faster than I could type the next prompt. In less than an hour, I had 50% of everything I wanted already set up and running. The next day, I was 75% through making all my wishes come true.

I did notice that letting an AI essentially build out its own capabilities and tools burns a lot of tokens. I later found out that the way OpenClaw works adds significantly to that count, because it is not very efficient about what it includes in the context every time you send another message. In response to this token burn, I started experimenting with alternative solutions, like using the free models available on OpenRouter. I obtained an API key and asked the system to build out Free Mode.

The idea behind Free Mode was to watch for rate limits and then gently cycle to the next model on the list, and so on until it wrapped back around to the first model. If everything was still rate limited, the system would fall back to paid models. That was the plan, but I never got it working.
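
The logic itself was simple enough. Here is a minimal sketch of the intended fallback loop in Python, hitting OpenRouter’s chat completions endpoint; the model list and the paid fallback are hypothetical placeholders, and OpenClaw’s actual internals look nothing like this:

    # Free Mode sketch (hypothetical): cycle through free OpenRouter models
    # on HTTP 429 rate limits, then fall back to a paid model.
    import requests

    FREE_MODELS = [  # hypothetical free-tier model slugs
        "meta-llama/llama-3.1-8b-instruct:free",
        "mistralai/mistral-7b-instruct:free",
    ]
    PAID_MODEL = "anthropic/claude-sonnet-4"  # hypothetical paid fallback
    API_URL = "https://openrouter.ai/api/v1/chat/completions"

    def complete(messages, api_key):
        """Try each free model in order; fall back to paid when all are limited."""
        for model in FREE_MODELS + [PAID_MODEL]:
            resp = requests.post(
                API_URL,
                headers={"Authorization": f"Bearer {api_key}"},
                json={"model": model, "messages": messages},
                timeout=120,
            )
            if resp.status_code == 429:  # rate limited: try the next model
                continue
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
        raise RuntimeError("every model was rate limited or failing")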

Instead, I ended up in a loop where the bot would continuously corrupt its own config file, immediately restart, and break itself completely. Over and over. I had it build validation scripts, which helped, but not for long (it eventually ignored the validation). I was so frustrated from manually fixing the same JSON file again and again; it can’t have been good for my blood pressure.
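
The validation was never the hard part. Something like the sketch below (hypothetical paths and required keys) is enough to reject a corrupted file before it replaces the live config; the problem was getting the bot to keep running it:

    # Config guard sketch (hypothetical paths and schema). Validate the
    # candidate config before swapping it in, so a corrupted write can't
    # take the bot down on restart.
    import json
    import shutil
    import sys

    CONFIG = "/opt/openclaw/config.json"         # hypothetical live config
    CANDIDATE = "/opt/openclaw/config.new.json"  # hypothetical staging copy
    REQUIRED_KEYS = {"model", "api_key", "channels"}  # assumed schema

    def validate(path):
        with open(path) as f:
            cfg = json.load(f)  # raises ValueError on malformed JSON
        missing = REQUIRED_KEYS - cfg.keys()
        if missing:
            raise ValueError(f"config missing keys: {sorted(missing)}")

    if __name__ == "__main__":
        try:
            validate(CANDIDATE)
        except (OSError, ValueError) as e:
            print(f"rejecting new config: {e}", file=sys.stderr)
            sys.exit(1)
        shutil.move(CANDIDATE, CONFIG)  # only promote a config that parses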

After 4 days, I killed the bot off entirely.

A couple of days later, I tried again. No more fluff, just the original set of features and image generation support. That was the scope. No relying on free compute, just being smart about which model to use for what.

V2: The Dashboard of Death

At first it was awesome…

Everything came together in quick iterations! First a basic dashboard, then the kanban, then another tab with the other widgets, one at a time. I got the calendar working! Image generation, too. It even had a sweet cyberpunk theme!

Then, at one point, I asked it to change the font on a header, and the model deleted every other feature on the dashboard. I wasn’t too worried, what with its awesome memory tools and the special plan document I had it create outlining work priorities and guardrails. But it turned out it had also overwritten all of those things and now had zero memory of what had come before.

I armed myself with patience (okay, there was a lot of swearing) and tried getting the dashboard back together, but at this point it was stuck in a loop: exactly like the first version had failed on its config, this one was now failing on the dashboard. It could restore about half the features at random, but trying to add any of the rest would overwrite what was there and set me back to square one.

At that point it would have been easier and more fun to code it myself. Regardless, I soldiered on. I had it set up a cron job to automatically back up the entire home directory every 3 hours. Perhaps that could prevent this kind of catastrophic loss in the future, I thought. Only there was no cron job. The AI just told me there was, and like an idiot, I believed it.
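
Verifying the claim would have taken seconds: running crontab -l either lists the job or it doesn’t. And the whole thing is only a few lines. Here is a minimal sketch of such a backup job (hypothetical paths), with the crontab entry that would schedule it every 3 hours:

    # backup_home.py - sketch of the backup job the bot claimed to have
    # scheduled (hypothetical paths). Crontab entry, every 3 hours:
    #   0 */3 * * * /usr/bin/python3 /opt/scripts/backup_home.py
    import tarfile
    import time
    from pathlib import Path

    BACKUP_DIR = Path("/backups")  # assumed destination, outside the home dir

    def backup_home():
        home = Path.home()
        BACKUP_DIR.mkdir(parents=True, exist_ok=True)
        stamp = time.strftime("%Y%m%d-%H%M")
        dest = BACKUP_DIR / f"home-{stamp}.tar.gz"
        with tarfile.open(dest, "w:gz") as tar:
            tar.add(home, arcname=home.name)  # recursive archive of the home dir
        return dest

    if __name__ == "__main__":
        print(f"backed up to {backup_home()}")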

The dashboard failed completely several times. I tried editing the development rules to require testing everything before deployment, but none of it took consistently. I built specialized agents dedicated to specific failure points, but the benefit was minuscule. Then my API credits ran out, and I decided not to buy more.

[Image: two lobster claws, torn off and broken. Caption: OpenClaw? More like BrokenClaw!]

After 4 days, I killed the bot off entirely.

Less Is More

OpenClaw is an impressive piece of software on the surface, but it is extremely finicky: though it can add new skills and tools fast, it does not handle iterating on existing code well. I was using Claude Sonnet and Opus for all the coding on V2, and it burned through at least 100M tokens in those 4 days with nothing to show for it.

Regardless of provider, if you want to run this, you will want very strong models for OpenClaw’s heavy lifting. Long context, reasoning, and strong coding are crucial, but you can save a lot of money by not relying on the most expensive models for everything. For trivial tasks, you can even use small, local models without issue. But beyond that, you will find that errors and failures increase very noticeably as you scale model capability down.

I found that Claude Opus was basically required for building anything beyond a single, standalone skill, but it is too expensive to use for everything. Claude Sonnet is great for many things but can also be overconfident and make horrible mistakes; the worst mistakes in this project were Sonnet’s. For lighter tasks, I actually preferred Gemini 3 Flash Preview to Claude Haiku, as the latter made more mistakes. Finally, I started using a Qwen 2.5 7B model for running the heartbeat every 30 minutes. I tested several other models as well, but none of them stood out in a positive way. For the record, the only model I tested from OpenAI was gpt-oss-20B, for lighter tasks in V1, and I did not like its performance at all.
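
If I were to codify that routing, it would be little more than a lookup table. The sketch below is my own scheme, not an OpenClaw feature, and the model identifiers are shorthand:

    # Tiered model routing sketch (my own scheme, not an OpenClaw feature).
    # Route by task weight so the expensive model only runs when it has to.
    from enum import Enum

    class Tier(Enum):
        HEARTBEAT = "heartbeat"  # liveness check every 30 minutes
        LIGHT = "light"          # trivial tasks: summaries, short replies
        STANDARD = "standard"    # routine coding and tool use
        HEAVY = "heavy"          # building anything beyond a standalone skill

    MODEL_BY_TIER = {
        Tier.HEARTBEAT: "qwen-2.5-7b-instruct",  # small local model
        Tier.LIGHT: "gemini-3-flash-preview",
        Tier.STANDARD: "claude-sonnet",
        Tier.HEAVY: "claude-opus",
    }

    def pick_model(tier: Tier) -> str:
        return MODEL_BY_TIER[tier]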

Your mileage may vary, of course, but for me, running and maintaining an OpenClaw assistant is not even close to worth the expense in both time and money. It’s too easy to break, and it breaks frequently. With AI coding tools being so good in general now, if you know how to code, you can oversee the work yourself and create a custom framework that does exactly what you need without the overhead.

Learning from what worked and didn’t work with OpenClaw was great, though, and I can’t wait to build my own system to replace it. Personally, I prefer building scalable solutions that address real needs over taking a big, open framework and trying to force it to work for me.
