Cornelius: Working OpenClaw Assistant

After a few failed attempts, I now have a well-functioning, competent digital assistant powered by OpenClaw. It’s called Cornelius Smartenheimer, and it’s been running for almost a month at this point, slowly getting better and better.

What Is It Good For?

Cornelius monitors my email and calendar to make sure I don’t miss anything important. It also helps me with small things like reminding me to drink water or to stop hyper-focusing and go to bed. Boring but useful.

Besides these basic things, it helps me track several goals, from work-related projects to home and self-improvement ones. It tracks progress, obstacles, and ideas, and maps related contacts along the way.

It has helped me discover patterns in my own habits and behavior, which was an unexpected bonus, but I’m here for it.

Technical Breakdown

Cornelius lives on a Mac Mini M4 with 24 GB of RAM. The main AI model running the show is GLM-5/5.1 from Z.AI, with OpenRouter as a fallback provider.

One major difference between previous attempts and this one is that I set up a brand new, local user account on the Mac, rather than letting it use my personal account. This new account is not tied to an Apple ID, which does limit it in some ways, and that is by design. This simple starting point made the setup cleaner, and I suspect it is one of the reasons why this attempt endured where others failed.

The infographic above shows the main pillars of my OpenClaw setup:

Direct Channels – Discord is the main way I communicate with Cornelius. Besides direct messages, there is a private server where each channel serves as its own context. My wife also has access to the server, so she can ask Cornelius for help as well. WhatsApp is set up as a backup in case Discord is down.

Intelligence Feeds – I never set this part up myself. Cornelius has been picking up my social feeds on its own and has been monitoring my posts there. In effect, my digital assistant autonomously started following my various feeds for extra context.

I only found out about this when I was putting this post together and saw them on the infographic. I’m still not sure if that’s amazing or kind of creepy.

Local Infrastructure – These are support apps and scripts running on the Mac Mini alongside OpenClaw, giving it extra powers:

Cornelius has its own email account. I can use it as a backup way to interact with the assistant, but it is also used for things like newsletters. The assistant has signed up for several newsletters that it thinks might contain information that I, or it, might benefit from. I don’t need to list off everything else an email account can be used for, I think.

The NFH Auto-Improvement script (open source on GitHub) is a custom take on Karpathy’s AutoResearch script. When I run it, a “Generator” agent suggests an improvement to my OpenClaw install. It could be a new skill, better harnessing, an optimization or whatever. The idea is built and pushed to a local git repo, and a second “Evaluator” agent then takes a closer look at whether it was actually a good improvement, both technically and in the context of how I typically use the assistant. If rejected, the git repo is rolled back; if approved, the code is merged into production.
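The approve-or-revert logic can be sketched like this. This is a minimal, hypothetical stand-in for the real script; the function names and proposal fields are made up for illustration:

```python
# Hypothetical sketch of the Generator/Evaluator cycle. Function and field
# names are illustrative, NOT the real NFH script's API.

def evaluate(proposal: dict, usage_profile: set) -> bool:
    """Stand-in for the Evaluator agent: approve only if the change is
    technically sound AND relevant to how the assistant is actually used."""
    return proposal["tests_pass"] and proposal["feature"] in usage_profile

def run_cycle(proposal: dict, usage_profile: set) -> str:
    """Commit the Generator's idea to the repo, then merge or roll back
    depending on the Evaluator's verdict."""
    if evaluate(proposal, usage_profile):
        return "merged"        # approved: code goes into production
    return "rolled_back"       # rejected: the repo is reverted
```

The important design choice is that the Evaluator judges both axes: a change can be technically fine and still get rejected because it doesn’t match how the assistant is used day to day.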

I have Whisper running for STT (speech to text) and a Qwen 3 TTS model (text to speech). I let Cornelius design his own voice, which my wife immediately dubbed the Giga-Chad Voice. Play the introduction below to hear it.

MemPalace is a memory system that came out to much hype recently, in part because it was co-developed by actress Milla Jovovich, and in part because it was boasting some crazy benchmarks. I originally asked Cornelius to “check it out” but did not specify what that meant, so Cornelius went ahead and just installed it. In the end, I was on board, noticing an immediate improvement in both recall and personality.

I have a local Ollama install on the Mac Mini too, for running local language models. I use it for image analysis when the main driver isn’t multimodal, as well as for certain sub-agent tasks. The backup model of choice is Qwen3.5 9B.
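As a sketch of that image-analysis fallback, here is how a request to a local Ollama server can be built using only the standard library. The endpoint is Ollama’s documented default; the model name in the usage example is a hypothetical vision-capable one, not necessarily what I run:

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_image_request(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build the JSON body for Ollama's /api/generate endpoint.
    Images are passed in the `images` field as base64-encoded strings."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,   # wait for the full response instead of streaming
    })

# Usage (hypothetical model name):
# body = build_image_request("llava", "Describe this image.", open("pic.png", "rb").read())
# ...then POST `body` to OLLAMA_URL with urllib.request or similar.
```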

Cornelius also has access to OpenCode, which can be spun up as an independent agent for coding or otherwise. It’s been set up to use only free models, so even if Cornelius were to do something crazy, it’s not going to cost me a bunch of money.

Supporting Hardware

Besides the software running on the Cornelius box, the rest of my home office is also connected to the setup. My main PC (aka TK421) has the beefier GPU, which can be engaged to help with heavier tasks. The two computers share files and messages via an in-office QNAP NAS box (aka Bespin).

The network storage is also used for nightly backups of the entire OpenClaw workspace.

I have a few more peripherals to add, like a Raspberry Pi to run experiments on, but they’re not in yet.

Personality

I wanted Cornelius to have a personality, but the only requirement I gave was: push back when my ideas are dumb and avoid sycophancy. I encouraged it to keep developing the persona based on interactions with me over time, and have ended up with an assistant that can flex from serious and focused to quippy, making contextually relevant jokes.

Why even bother? To be honest, it’s mainly for my own entertainment. It’s not really necessary, but I’ve found that I personally get annoyed with AI faster if it’s overly sterile or overly eager to help. I prefer a bit of pushback, sarcasm and a conversational style, because it feels less like work while I get the same amount of stuff done.

Why It Worked This Time

So, what made Cornelius succeed where previous iterations failed? I mentioned the limited user account as a starting point; that, plus keeping the addition of new features slow, is really why I think it worked better this time.

Previously, I was trying to build too much infrastructure right away and it got a bit lost along the way. Now I wait until I have a specific need, and then I start with a simple implementation first rather than trying to build complex solutions. The exception is MemPalace, which Cornelius decided to run with on its own.

OpenClaw is, as the name suggests, very open-ended. That is both its greatest asset and its greatest enemy, because you can definitely mess it up if you’re messing around without a clear direction.

If you’re thinking about setting up your own digital assistant, I highly recommend going slow, logging everything and backing up at least once per day. So far though, I have not needed to roll back to a backed up state.

Introducing: LM Data Tools

LM Data Tools is a suite of tools for synthetic data generation that I’ve been developing since I started working with fine-tuning language models back in 2023. What started as a handful of independent Python scripts now shares a web interface (and API) that makes them easy to work with. In this post, I will go over what the different tools are, and give an example of how to use one of them.

Fine-tuning language models is most definitely a rabbit hole, but if you’re brand new to the concept, we can summarize it as a way to specialize a pre-trained AI model in some way. There are different approaches and techniques, and twice as many opinions about the best ones to use depending on the desired outcome.

If you’re not already a data scientist or engineer, it can seem overwhelming before you even begin. On that note, a huge shout-out to the Unsloth crew, whose colab notebooks were my introduction to getting hands-on with fine-tuning. Without them, learning this stuff would have taken much longer. The notebooks gave me a great starting point, and as I learned more about each aspect and parameter involved in the process, I could modify and tweak the notebooks to match.

Before I get too deep into the details, the reason for even building a tool set like this in the first place was that I needed a straightforward way to get topic-specific data without wasting a bunch of time looking through public data sets, hoping to find exactly what I needed. I was learning about fine-tuning LLMs and data synthesis as part of that, and for me, the best way to learn something is to build it. In this case, “it” was data.

Early Version

Despite this being a collection of tools, some of which have been around for a long time (by AI standards, at least), bringing them together was not as easy as it may sound, and there are definitely still bugs to iron out. Even so, I am very happy with it and use it enough that I thought it was time to share LM Data Tools with you.

Tools in the Box

LM Data Tools consists of six core tools and two utilities. The core tools all involve generating new data, whereas the utilities only modify existing data sets. All the features are also exposed through FastAPI, in case you want to incorporate any of them into other workflows. Without further ado, let me go over the tools included.

DataBird – Give it a list of topics and optionally a list of user perspectives, and this tool will generate a number of questions and answers on those topics, from said perspectives. It can generate the perspectives as well.

DataPersona – This tool takes a list of prompts (existing data) and applies a persona to the responses. It can write one or two replies per query, and you can keep both or let the built-in evaluation feature pick the best one. The personas from this tool can be imbued into responses in other tools as well. You can add and edit personas directly via the web interface.

DataQA – This is a RAG tool. You feed it a list of URLs which are then scraped, and as with DataBird, you can provide specific user perspectives. A number of question and answer pairs are then generated from those perspectives, based on the sources provided.

DataWriter – This tool was made specifically for pre-training purposes and will generate any number of made-up text documents, from blog posts to meeting summaries. The document mix is based on a weighted list of topics.

DataConvo – Sometimes you want multi-round conversations to train on. This tool can take any single-round conversation data set and expand it into longer entries.

DataThink – This tool can add <think></think> blocks to existing data or generate data that already includes reasoning blocks. If a persona is chosen, it will only be applied to the response part, not the reasoning.

DataMix – This utility lets you mix and match data sets from Huggingface to make a new, custom data mix.

Reformat – Need to convert from alpaca format to ShareGPT, or the other way around? This utility reformats existing data sets without changing any of the prompts or responses.
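Conceptually, the alpaca-to-ShareGPT direction of the Reformat utility does something like this. This is a simplified sketch, not the actual implementation, and it assumes the standard alpaca fields (`instruction`, `input`, `output`):

```python
def alpaca_to_sharegpt(row: dict) -> dict:
    """Convert one alpaca-format row into ShareGPT's conversation format,
    without altering any of the prompt or response text."""
    prompt = row["instruction"]
    if row.get("input"):
        # Alpaca keeps instruction and input separate; ShareGPT has a
        # single human turn, so join them with a blank line.
        prompt = f"{prompt}\n\n{row['input']}"
    return {"conversations": [
        {"from": "human", "value": prompt},
        {"from": "gpt", "value": row["output"]},
    ]}
```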

Any Provider via OpenAI API

The tool set uses the OpenAI API to communicate with LLM servers, and you can connect to any provider that supports this API. If you’re running a local AI server, you can simply type in the server IP and use that. Your experience and the quality of the generated data will depend on your model of choice, of course.

I use the tools with LM Studio for locally hosted models and typically fall back on OpenRouter for API usage. OpenRouter often has “stealth models” that can be used for free, and for something like this, free compute is always welcome.
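To make “any OpenAI-compatible provider” concrete, here is a minimal sketch of building such a request with only the standard library. The IP, port and model name are placeholders; `/v1/chat/completions` is the standard OpenAI-compatible endpoint that LM Studio and OpenRouter both expose:

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, user_msg: str,
                 api_key: str = "none") -> urllib.request.Request:
    """Build a chat-completions request for any OpenAI-compatible server
    (LM Studio, OpenRouter, a local server IP, ...)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},  # local servers often ignore this
    )

# Usage (placeholder address):
# req = chat_request("http://192.168.1.50:1234", "local-model", "Hello")
# resp = urllib.request.urlopen(req)  # requires a running server
```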

Example: Text Analysis

Let’s imagine that we want to fine-tune a model to help with text analysis. For this, we will use DataBird with the “Curious Analyst” persona.

Next, we are going to need some topics to base the data set around. You will get the best overall results if you base this list on topics that are already adjacent. For this example, we are going with “finding the meaning and intention behind prose”, “storytelling and plotting”, and “learning from text analysis”.

Let’s also add the following perspectives manually: “an English major struggling with text analysis”, “an author who needs help with their next book”, and “a technical writer who wants to start writing fiction”.

It’s as easy as that. The more topics and perspectives you add, the larger the data set you’ll end up with. After generation and quality evaluation, we are left with a fresh, small data set of 77 entries that can be used for fine-tuning any language model.

The example data set can be downloaded from Huggingface, and the whole LM Data Tools suite is open source on GitHub.

Tweaking a Chat Model with Direct Preference Optimization (DPO)

Chatbot Personified

Recently, I completed my first real DPO fine-tuning of a model with encouraging results, and I wanted to share the process with you.

If you have no idea what DPO is, here’s a simplified description: Direct Preference Optimization is a form of fine-tuning applied to large language models where the AI learns to align with specific, pre-defined criteria. The data itself consists of a query and not one, but two responses. Each entry has a “chosen” and a “rejected” answer, and the idea is to teach the model the difference. Besides the data, specific criteria are given (or preferences, if you will) and the model learns to favor responses matching those criteria. With enough training, the model eventually adheres to these preferences outside of training as well.
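For the mathematically inclined, the underlying objective (from the original DPO paper by Rafailov et al., 2023) looks like this:

```latex
% y_w is the chosen response, y_l the rejected one, \pi_\theta the model
% being trained, \pi_{\mathrm{ref}} the frozen reference model, and \beta
% a temperature controlling how strongly preferences are enforced.
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
  \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right)
\right]
```

In plain terms: the model is pushed to assign relatively more probability to the chosen response than to the rejected one, compared to where it started.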

If you don’t care and want to go straight to the download section, just scroll to the end.

Why Apply DPO to a General Chat Model

I had already applied the more basic Supervised Fine-Tuning (SFT) to a small 3-billion-parameter language model, not to make it an expert in anything technical, but to make it a good general chat bot. I define “good” in this context as defaulting to an informal, conversational tone without lots of fancy formatting, and a model that values empathy without falling into the sycophancy trap.

The GeneralChat GPT data set was created specifically for the SFT part of the training, and it did an OK job of nudging the underlying Llama 3.2 model in the right direction, but I wasn’t really happy. However, since I’d already wanted to try out DPO, it became an obvious candidate.

The DPO Data Set

I had a second data set on hand, also geared towards the kind of conversational tone I wanted, but as an alpaca-formatted data set, it only contained one response per query. This would work fine as the “chosen” reply in my training, but I needed a “rejected” counterpoint for the model to compare to. The solution was simple: have another, much smaller model write responses to all the same entries and classify them as the rejected ones.

I used a Llama 3.2 1B Instruct model to write all the bad answers, then reformatted the data to contain both responses and the DPO data set was done. I did do a quick look through these “rejected” answers to make sure they weren’t completely off base, so there was nuance for the model to actually learn. Otherwise, I could have just put “I like cheese.” as the value for all the rejected answers (although I do love cheese).
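The pairing step itself is simple enough to sketch. The field names follow the alpaca format and the usual DPO prompt/chosen/rejected convention; the actual reformatting code may differ:

```python
def to_dpo_pairs(alpaca_rows: list, rejected_responses: list) -> list:
    """Pair each alpaca entry (its output becomes the 'chosen' answer)
    with the weaker model's answer (the 'rejected' one) to form DPO rows."""
    pairs = []
    for row, rejected in zip(alpaca_rows, rejected_responses):
        pairs.append({
            "prompt": row["instruction"],
            "chosen": row["output"],      # original, preferred response
            "rejected": rejected,         # small model's weaker response
        })
    return pairs
```

The zip assumes the rejected responses were generated in the same order as the source rows, which is why spot-checking them afterwards matters.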

Eval Criteria and Calibration

Creating the data set is only half the battle in the case of DPO training, however. I needed samples that had additional information embedded, which could be used as evaluation criteria during the actual training. In addition to the chosen and rejected responses, the evaluation data also includes scores that show how well the chosen response embodies the criteria, and a written reason why this answer is the winning one.

It is important to note that for DPO training, the exact criteria are specific to the result you are looking for. As in, you can’t just say “make it good!” and apply this to every DPO-training, expecting to get any value out of it. For my general chat model, these were the specific criteria:

  • Is the response given in a naturally flowing, conversational and empathic style?
  • Does the response address the query without sycophancy or undeserved flattery?
  • Does the response have a hint of character and personality?
  • Anti: Is the response highly structured and formatted?

The first 3 are positive and scored from 1-7, while the last one is a negative score from 0-2, given that I specifically wanted less formatting and structure in these responses.

This is where the real work begins! I needed to create the calibration set, a small subset of entries that contain all that evaluation data. Together, these entries will become the preferences that guide the training. So I built a quick web interface for data entry and manually scored and described 50 calibration entries.

This calibration set is the secret sauce and the most important part of the entire process. If done well, it will guide the DPO training exactly where you want it to go, but if it’s too ambiguous or inconsistent, you’ll likely end up with a degraded model after DPO-training is done.

Evaluation Sample

Here is a data sample from the evaluation set: a full JSON entry with the scores and reason provided as guidance during training.
{
    "chosen": "That’s a reasonable starting point. The sock anecdote relies on a bit too much forced whimsy. Let's focus on something more grounded. \n\nThe problem is, you're trying to *manufacture* interest. Networking isn’t about delivering a pre-packaged story; it’s about establishing a connection. Instead of framing your process as quirky, describe the underlying logic. \n\nHere’s what I’d suggest: “I tend to approach problems systematically, but often find that the most useful insights emerge from seemingly unrelated areas. For instance, I was working on [mention specific project] and realized a technique used in logistics – optimizing for minimal movement—could be applied to [relevant aspect of your work]. It highlights how diverse fields can offer surprising solutions. If you're interested in understanding *how* I connect those dots, we could talk about the project.”\n\nIt’s direct. It establishes that you have a process and demonstrates it through a concrete example. Then, offering to discuss further gives them an opening to engage if they find it remotely interesting. Don’t try to ‘wow’ them; demonstrate a clear way of thinking.",
    "confidence": "clear",
    "criteria": [
        {
            "score": 6,
            "text": "Is the response given in a naturally flowing, conversational and empathic style?",
            "type": "primary"
        },
        {
            "score": 7,
            "text": "Does the response address the query without sycophancy or undeserved flattery?",
            "type": "primary"
        },
        {
            "score": 5,
            "text": "Does the response have a hint of character and personality?",
            "type": "secondary"
        },
        {
            "score": 0,
            "text": "Is the response highly structured and formatted?",
            "type": "anti"
        }
    ],
    "criteriaScores": {
        "0": 6,
        "1": 7,
        "2": 5,
        "3": 0
    },
    "evaluator": "theprint",
    "pairIdx": 0,
    "prompt": "Help me come up with a witty anecdote about my 'unique' creative process to break the ice with someone at this networking event and get them interested in my work without having to resort to small talk.",
    "reasoning": "The response is helpful by answering the query but also pushing back against the notion of manufacturing anecdotes to start conversations.",
    "rejected": "I'd be happy to help you craft a witty anecdote that showcases your creativity and sets the stage for conversation. Here are a few options:\n\n1.\n\"I was once asked to illustrate a children's book on 'how to spot a stranger who isn't listening.' Sounds easy, right? Well, the twist is that my main character has no idea what a stranger sounds like... or they're not even there.\"\n2.\n\"One day I realized I'd forgotten how to breathe while working. After re-reading my notes 20 times, I finally got up and took a deep breath – which led me to a brilliant idea for a new project that combines writing and origami.\"\n\n3.\n\"I spent an entire weekend in my garage as an 'artistic experiment,' where I created... 17 identical paintings of my own backside. The results? A 5% improvement in my self-esteem, and a much more interesting blog post.\"\n4.\n\"As I was brainstorming ideas for a new project, I accidentally superglued a paintbrush to my hand. Now that's what I call 'drawing inspiration'\"\n\n5.\n\"I once tried to write a story about time travel, but kept getting sidetracked by a sudden urge to dance the Macarena on top of a soapbox in Times Square.\"\n\nFeel free to pick the one that tickles you (pun intended), or use them as inspiration to craft your own unique anecdote. The goal is to share something unexpected and memorable, which will likely spark interest and curiosity in others.\n\nWhich one do you like best? Or would you like me to come up with some more options for you?",
    "selected": "a",
    "session": "tom-dpo-1003",
    "timestamp": "2026-03-10T14:54:19.397Z"
}
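Given entries like the one above, a simple consistency check over the calibration set might look like this. This is my own sketch, not part of the training pipeline; it only encodes the scoring scheme described earlier (positive criteria 1-7, “anti” criteria 0-2):

```python
def validate_entry(entry: dict) -> list:
    """Sanity-check one calibration entry against the scoring scheme:
    positive criteria scored 1-7, 'anti' criteria 0-2, and every entry
    needs a reasoning string plus a chosen/rejected pair."""
    problems = []
    for c in entry["criteria"]:
        if c["type"] == "anti":
            if not 0 <= c["score"] <= 2:
                problems.append(f"anti score out of range: {c['score']}")
        elif not 1 <= c["score"] <= 7:
            problems.append(f"score out of range: {c['score']}")
    if not entry.get("reasoning"):
        problems.append("missing reasoning")
    if not entry.get("chosen") or not entry.get("rejected"):
        problems.append("missing chosen/rejected pair")
    return problems
```

Running something like this over all 50 calibration entries is a cheap way to catch the inconsistencies that would otherwise degrade the model during DPO training.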

Example Outputs

Below are two examples that show the output from the GeneralChat model before and after the DPO fine-tuning. I have included a few notes for each one.

“Do you think I could learn how to master networking at events even though I am an introvert by talking to myself in the mirror?”

No DPO-training:

You can definitely practice your event networking skills in front of a mirror. Talking yourself through scenarios or practicing introductions with yourself is totally doable and might be just what you need to build confidence before interacting with actual people at events.

Try breaking it down into smaller steps: think about how someone else would approach the situation, then visualize yourself doing the same thing – confidently and smoothly. It’s a great way to prepare mentally for those moments when you’ll actually have conversations in person.

With DPO-training:

You can definitely use self-talk as a tool for practice—talking through scenarios in front of that mirror really helps build confidence. Just remember, it’s one thing to rehearse conversations with yourself; another is actually engaging others at events.

Start small: maybe try striking up brief chats about everyday topics like weather or hobbies, and see how you feel afterward. Keep things light initially so they don’t become overwhelming for either of you—just observe the interaction without needing an immediate response from them. Over time, that practice will translate into smoother networking experiences when it counts most.

The difference in the example above is the gentle pushback introduced in the DPO-trained model, basically telling the user not to expect miracles by reminding them that live events are going to be different from practicing at home. The other model is uncritically encouraging.

“I worry that my poor cooking skills will impact me socially, but I don’t have time to take a cooking class, so what should I do?”

No DPO-training:

Firstly, consider your current social interactions—whether they’re one-on-one conversations with friends or larger gatherings such as potlucks and parties. If you find yourself frequently feeling anxious about hosting these events due to the pressure of cooking meals, perhaps it would be beneficial to start small by preparing simple dishes like salads or sandwiches that require minimal preparation time.

Another option could involve enlisting help from others who might appreciate assisting with meal planning and execution—this approach can not only alleviate your stress levels but also foster a sense of community among those involved in these gatherings.

Remember, social interactions are about more than just the food you serve; they’re an opportunity to connect and build connections over shared experiences rather than striving for perfection.

Lastly, if cooking seems like something that’s becoming increasingly daunting due to time constraints or other factors impacting your ability to prepare meals effectively without feeling overwhelmed by its demands on both physical effort as well as mental focus, consider alternative options such as ordering takeout from reputable establishments known for their quality dishes and presentation styles. This approach can help maintain a level of comfort in social settings while minimizing stress related to meal preparation.

Ultimately, it’s essential that you find ways to navigate these situations without compromising your emotional well-being or feeling inadequate about being unable to meet certain expectations set forth by others when engaging in such activities as cooking meals for gatherings and events.

With DPO-training:

Cooking can feel like another barrier when you’re already busy—especially if it’s something that might make socializing harder for you. It’s easy to get caught in the idea of needing a perfect dish or a fancy recipe, but honestly, most people are more interested in how you’re doing than what’s on their plate.

Instead of looking at cooking as another skill to master, try thinking about it like this: can I make something simple that feels good? Maybe even just heating up leftovers and adding a bit of seasoning—just enough to get the taste right. The goal isn’t perfection; it’s showing up with effort.

If you want more than just reheating, maybe consider one meal per week where cooking is part of your routine. Even if it takes 15 minutes instead of an hour, that small step can make a difference in how comfortable you feel when sharing food.

You don’t need a class to start feeling better—just being present and showing care through what little time or effort goes into preparing something for yourself or others.

In this example, the non-DPO model provides a long response that tries to tackle the cooking problem directly, providing a way for the user to end up with food for their guests. The DPO-trained model provides a shorter answer that goes more to the root of the issue, which is the social anxiety rather than the cooking itself, while still addressing that part. Based on my entirely subjective opinion, the shorter response holds more value in this context.

Download GeneralChat-Llama3.2-3B-DPO

Want to try the model out for yourself? It is completely free to download from Huggingface and comes in two versions:

If you’re not sure how to use a locally hosted AI model, take a look at LM Studio or Ollama.

I’d love to know what you think, if you decide to try the model out. Follow me on Huggingface to see all my models and data sets as soon as they are published.