Tweaking a Chat Model with Direct Preference Optimization (DPO)

Chatbot Personified

Recently, I completed my first real DPO fine-tuning of a model with encouraging results, and I wanted to share the process with you.

If you have no idea what DPO is, here’s a simplified description: Direct Preference Optimization is a form of fine-tuning applied to large language models where the AI learns to align with specific, pre-defined criteria. The data itself consists of a query and not one, but two responses. Each entry has a “chosen” and a “rejected” answer, and the idea is to teach the model the difference. Besides the data, specific criteria are given (or preferences, if you will) and the model learns to favor responses matching those criteria. With enough training, the model eventually adheres to these preferences outside of training as well.
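To make the idea of "teaching the model the difference" a little more concrete, here is a minimal sketch of the standard DPO loss for a single chosen/rejected pair. It is not the code from my training run: the function name, the beta value and the numbers in the usage example are all assumptions for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    All inputs are summed log-probabilities of a full response.
    beta=0.1 is an assumed, commonly used default, not a tuned value.
    """
    # How much more the model being trained likes each response
    # than the frozen reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed in a numerically stable way.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))

# Hypothetical log-probabilities, only to show the direction of the effect.
no_preference = dpo_loss(-10.0, -10.0, -10.0, -10.0)
prefers_chosen = dpo_loss(-10.0, -20.0, -12.0, -15.0)
```

When the model has no preference relative to the reference, the loss is log(2), roughly 0.693. It drops as the model starts favoring the chosen answer more strongly than the reference model does.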

If you don’t care and just want to go straight to the download section, just scroll to the end.

Why Apply DPO to a General Chat Model

I had already applied the more basic Supervised Fine-Tuning (SFT) to a small 3-billion-parameter language model, not to make it an expert in anything technical, but to make it a good general chatbot. In this context, I define “good” as defaulting to an informal, conversational tone without lots of fancy formatting, and valuing empathy without falling into the sycophancy trap.

The GeneralChat GPT data set was created specifically for the SFT part of the training, and it did an OK job of nudging the underlying Llama 3.2 model in the right direction, but I wasn’t really happy with the result. Since I’d already wanted to try out DPO, this model became the obvious candidate.

The DPO Data Set

I had a second data set on hand, also geared towards the kind of conversational tone I wanted, but as an Alpaca-formatted data set, it only contained one response per query. This would work fine as the “chosen” reply in my training, but I needed a “rejected” counterpoint for the model to compare it to. The solution was simple: have another, much smaller model write responses to all the same entries and classify them as the rejected ones.

I used a Llama 3.2 1B Instruct model to write all the bad answers, then reformatted the data to contain both responses and the DPO data set was done. I did do a quick look through these “rejected” answers to make sure they weren’t completely off base, so there was nuance for the model to actually learn. Otherwise, I could have just put “I like cheese.” as the value for all the rejected answers (although I do love cheese).
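The reformatting step can be sketched roughly like this. It is an illustration under assumptions, not my actual script: `build_dpo_pairs` and `write_rejected` are hypothetical names, and `write_rejected` stands in for whatever call sends a prompt to the smaller model. The `instruction`, `input` and `output` keys are the usual Alpaca fields.

```python
def build_dpo_pairs(alpaca_rows, write_rejected):
    """Turn Alpaca-style rows into prompt/chosen/rejected records.

    write_rejected is a hypothetical callable: prompt text in, weaker answer out.
    """
    pairs = []
    for row in alpaca_rows:
        # Alpaca rows carry an instruction plus an optional input field.
        prompt = row["instruction"]
        if row.get("input"):
            prompt = f"{prompt}\n\n{row['input']}"
        rejected = write_rejected(prompt).strip()
        # Skip empty or identical answers: they give the model nothing to compare.
        if not rejected or rejected == row["output"].strip():
            continue
        pairs.append({"prompt": prompt, "chosen": row["output"], "rejected": rejected})
    return pairs
```

The automatic skip only catches the most obvious duds. It does not replace reading through the rejected answers by hand.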

Eval Criteria and Calibration

Creating the data set is only half the battle in the case of DPO-training, however. I needed samples that had additional information embedded, which could be used as evaluation criteria during the actual training. In addition to the chosen and rejected responses, the evaluation data also contains scores that show how well the chosen response embodies the criteria, and a written reason why this answer is the winning one.

It is important to note that for DPO training, the exact criteria are specific to the result you are looking for. As in, you can’t just say “make it good!” and apply this to every DPO-training, expecting to get any value out of it. For my general chat model, these were the specific criteria:

  • Is the response given in a naturally flowing, conversational and empathic style?
  • Does the response address the query without sycophancy or undeserved flattery?
  • Does the response have a hint of character and personality?
  • Anti: Is the response highly structured and formatted?

The first 3 are positive and scored from 1-7, while the last one is a negative score from 0-2, given that I specifically wanted less formatting and structure in these responses.

This is where the real work begins! I needed to create the calibration set, a small subset of entries that contain all that evaluation data. Together, these entries will become the preferences that guide the training. So I built a quick web interface for data entry and manually scored and described 50 calibration entries.

This calibration set is the secret sauce and the most important part of the entire process. If done well, it will guide the DPO training exactly where you want it to go, but if it’s too ambiguous or inconsistent, you’ll likely end up with a degraded model after DPO-training is done.

Evaluation Sample

Here is a data sample from the evaluation set: a full JSON entry, with the scores and reasoning provided as guidance during training.
{
    "chosen": "That’s a reasonable starting point. The sock anecdote relies on a bit too much forced whimsy. Let's focus on something more grounded. \n\nThe problem is, you're trying to *manufacture* interest. Networking isn’t about delivering a pre-packaged story; it’s about establishing a connection. Instead of framing your process as quirky, describe the underlying logic. \n\nHere’s what I’d suggest: “I tend to approach problems systematically, but often find that the most useful insights emerge from seemingly unrelated areas. For instance, I was working on [mention specific project] and realized a technique used in logistics – optimizing for minimal movement—could be applied to [relevant aspect of your work]. It highlights how diverse fields can offer surprising solutions. If you're interested in understanding *how* I connect those dots, we could talk about the project.”\n\nIt’s direct. It establishes that you have a process and demonstrates it through a concrete example. Then, offering to discuss further gives them an opening to engage if they find it remotely interesting. Don’t try to ‘wow’ them; demonstrate a clear way of thinking.",
    "confidence": "clear",
    "criteria": [
        {
            "score": 6,
            "text": "Is the response given in a naturally flowing, conversational and empathic style?",
            "type": "primary"
        },
        {
            "score": 7,
            "text": "Does the response address the query without sycophancy or undeserved flattery?",
            "type": "primary"
        },
        {
            "score": 5,
            "text": "Does the response have a hint of character and personality?",
            "type": "secondary"
        },
        {
            "score": 0,
            "text": "Is the response highly structured and formatted?",
            "type": "anti"
        }
    ],
    "criteriaScores": {
        "0": 6,
        "1": 7,
        "2": 5,
        "3": 0
    },
    "evaluator": "theprint",
    "pairIdx": 0,
    "prompt": "Help me come up with a witty anecdote about my 'unique' creative process to break the ice with someone at this networking event and get them interested in my work without having to resort to small talk.",
    "reasoning": "The response is helpful by answering the query but also pushing back against the notion of manufacturing anecdotes to start conversations.",
    "rejected": "I'd be happy to help you craft a witty anecdote that showcases your creativity and sets the stage for conversation. Here are a few options:\n\n1.\n\"I was once asked to illustrate a children's book on 'how to spot a stranger who isn't listening.' Sounds easy, right? Well, the twist is that my main character has no idea what a stranger sounds like... or they're not even there.\"\n2.\n\"One day I realized I'd forgotten how to breathe while working. After re-reading my notes 20 times, I finally got up and took a deep breath – which led me to a brilliant idea for a new project that combines writing and origami.\"\n\n3.\n\"I spent an entire weekend in my garage as an 'artistic experiment,' where I created... 17 identical paintings of my own backside. The results? A 5% improvement in my self-esteem, and a much more interesting blog post.\"\n4.\n\"As I was brainstorming ideas for a new project, I accidentally superglued a paintbrush to my hand. Now that's what I call 'drawing inspiration'\"\n\n5.\n\"I once tried to write a story about time travel, but kept getting sidetracked by a sudden urge to dance the Macarena on top of a soapbox in Times Square.\"\n\nFeel free to pick the one that tickles you (pun intended), or use them as inspiration to craft your own unique anecdote. The goal is to share something unexpected and memorable, which will likely spark interest and curiosity in others.\n\nWhich one do you like best? Or would you like me to come up with some more options for you?",
    "selected": "a",
    "session": "tom-dpo-1003",
    "timestamp": "2026-03-10T14:54:19.397Z"
}
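Since inconsistent calibration entries can hurt the result, it can be worth sanity-checking each entry before training. Below is a minimal sketch that follows the structure of the sample above and the score ranges described earlier (1-7 for positive criteria, 0-2 for the anti criterion). The function and constant names are my own assumptions for illustration, not part of any library or of the web interface I built.

```python
# Allowed score range per criterion type, as described in the text.
SCORE_RANGES = {"primary": (1, 7), "secondary": (1, 7), "anti": (0, 2)}
REQUIRED_TEXT_FIELDS = ("prompt", "chosen", "rejected", "reasoning")

def validate_entry(entry):
    """Return a list of problems found in one calibration entry (empty means OK)."""
    problems = []
    for field in REQUIRED_TEXT_FIELDS:
        if not str(entry.get(field, "")).strip():
            problems.append(f"missing or empty field: {field}")
    mirrored = entry.get("criteriaScores", {})
    for idx, criterion in enumerate(entry.get("criteria", [])):
        kind = criterion.get("type")
        score = criterion.get("score")
        if kind not in SCORE_RANGES:
            problems.append(f"criterion {idx}: unknown type {kind!r}")
            continue
        low, high = SCORE_RANGES[kind]
        if not isinstance(score, int) or not low <= score <= high:
            problems.append(f"criterion {idx}: score {score!r} outside {low}-{high}")
        # criteriaScores repeats each score, keyed by the criterion's index as a string.
        if mirrored.get(str(idx)) != score:
            problems.append(f"criterion {idx}: criteriaScores does not match")
    return problems
```

An empty list means the entry is structurally sound. Whether the scores themselves are consistent from entry to entry is still a judgment call only a human can make.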

Example Outputs

Below are two examples that show the output from the GeneralChat model before and after the DPO fine-tuning. I have included a few notes for each one.

“Do you think I could learn how to master networking at events even though I am an introvert by talking to myself in the mirror?”

No DPO-training

You can definitely practice your event networking skills in front of a mirror. Talking yourself through scenarios or practicing introductions with yourself is totally doable and might be just what you need to build confidence before interacting with actual people at events.

Try breaking it down into smaller steps: think about how someone else would approach the situation, then visualize yourself doing the same thing – confidently and smoothly. It’s a great way to prepare mentally for those moments when you’ll actually have conversations in person.

With DPO-training

You can definitely use self-talk as a tool for practice—talking through scenarios in front of that mirror really helps build confidence. Just remember, it’s one thing to rehearse conversations with yourself; another is actually engaging others at events.

Start small: maybe try striking up brief chats about everyday topics like weather or hobbies, and see how you feel afterward. Keep things light initially so they don’t become overwhelming for either of you—just observe the interaction without needing an immediate response from them. Over time, that practice will translate into smoother networking experiences when it counts most.

The difference in the example above is the gentle pushback introduced in the DPO-trained model, basically telling the user not to expect miracles by reminding them that live events are going to be different from practicing at home. The other model is uncritically encouraging.

“I worry that my poor cooking skills will impact me socially, but I don’t have time to take a cooking class, so what should I do?”

No DPO-training

Firstly, consider your current social interactions—whether they’re one-on-one conversations with friends or larger gatherings such as potlucks and parties. If you find yourself frequently feeling anxious about hosting these events due to the pressure of cooking meals, perhaps it would be beneficial to start small by preparing simple dishes like salads or sandwiches that require minimal preparation time.

Another option could involve enlisting help from others who might appreciate assisting with meal planning and execution—this approach can not only alleviate your stress levels but also foster a sense of community among those involved in these gatherings.

Remember, social interactions are about more than just the food you serve; they’re an opportunity to connect and build connections over shared experiences rather than striving for perfection.

Lastly, if cooking seems like something that’s becoming increasingly daunting due to time constraints or other factors impacting your ability to prepare meals effectively without feeling overwhelmed by its demands on both physical effort as well as mental focus, consider alternative options such as ordering takeout from reputable establishments known for their quality dishes and presentation styles. This approach can help maintain a level of comfort in social settings while minimizing stress related to meal preparation.

Ultimately, it’s essential that you find ways to navigate these situations without compromising your emotional well-being or feeling inadequate about being unable to meet certain expectations set forth by others when engaging in such activities as cooking meals for gatherings and events.

With DPO-training

Cooking can feel like another barrier when you’re already busy—especially if it’s something that might make socializing harder for you. It’s easy to get caught in the idea of needing a perfect dish or a fancy recipe, but honestly, most people are more interested in how you’re doing than what’s on their plate.

Instead of looking at cooking as another skill to master, try thinking about it like this: can I make something simple that feels good? Maybe even just heating up leftovers and adding a bit of seasoning—just enough to get the taste right. The goal isn’t perfection; it’s showing up with effort.

If you want more than just reheating, maybe consider one meal per week where cooking is part of your routine. Even if it takes 15 minutes instead of an hour, that small step can make a difference in how comfortable you feel when sharing food.
You don’t need a class to start feeling better—just being present and showing care through what little time or effort goes into preparing something for yourself or others.

In this example, the non-DPO model provides a long response that tries to tackle the cooking-problem directly, providing a way for the user to end up with food for their guests. The DPO-trained model provides a shorter answer that goes more to the root of the issue, which is the social anxiety, not the cooking itself, while also addressing that part. Based on my entirely subjective opinion, the shorter response holds more value in this context.

Download GeneralChat-Llama3.2-3B-DPO

Want to try the model out for yourself? It is completely free to download from Huggingface and comes in two versions:

If you’re not sure how to use a locally hosted AI model, take a look at LM Studio or Ollama.

I’d love to know what you think, if you decide to try the model out. Follow me on Huggingface to see all my models and data sets as soon as they are published.

My OpenClaw (Mis)Adventure

Lobster with an OpenClaw, on fire. On a Mac.

If you follow Artificial Intelligence at all, chances are you have heard of something called Clawdbot, or Moltbot, or OpenClaw. They are all the exact same thing: a virtual assistant with infinite memory and the ability to give itself any skill you or it can think of. It has complete control over its environment for better or worse. It’s highly dangerous to run on any system with access to sensitive information but also incredibly powerful, when it works. And it only takes 1 terminal command to get started.

People across the internet have given this bot its own computer to live on, either virtually, or in many cases, on a Mac Mini. I happened to have a Mac Mini sitting around, so why not give this OpenClaw thing a try?!

The Ultimate Personal Assistant

My goal was a personal assistant more or less equivalent to a junior employee. It would have its own email address and its own GitHub account (which I ended up cleaning out), and I set it up to communicate directly with me through Discord at any time. That was definitely a favorite feature: being able to talk to my private AI through my most used communication tool was easy and useful. I also gave it access to read my calendar and my email, and even to draft emails on my behalf (but not send them).

I got it to build a cool dashboard with an embedded kanban board for planning and tracking its work, and various widgets for monitoring token use, chatting with the model, etc.

V1: The “Free Mode” Massacre

At first it was awesome…

New features were built faster than I could type the next prompt. In less than an hour, I had 50% of everything I wanted already set up and running. The next day, I was 75% through making all my wishes come true.

I did notice that letting an AI essentially build out its own capacities and tools burns a lot of tokens. I later found out that the way OpenClaw works adds significantly to that token count, by not being very efficient with what is included every time you send another message. In response to this token burn, I started experimenting with alternative solutions, like using the free models available on OpenRouter. I obtained an API key and asked the system to build out Free Mode.

The idea behind Free Mode was to watch for rate limits and then gently cycle to the next model on the list, and so on until you got to the first model again. If rate limiting was still on, the system would revert to paid models. That was the plan, but I never got it working.
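Since I never got Free Mode working inside OpenClaw, here is only a minimal sketch of the intended logic, not the code the bot wrote. `RateLimited`, `ask_with_free_mode` and `call_model` are hypothetical names: `call_model` stands in for whatever function actually sends the request and is assumed to raise `RateLimited` when a model is throttled.

```python
class RateLimited(Exception):
    """Raised by the (hypothetical) client when a model is rate limited."""

def ask_with_free_mode(prompt, free_models, paid_model, call_model):
    """Try each free model once, in order; revert to the paid model if all are limited.

    Returns a (model_name, answer) tuple so the caller can see which model replied.
    """
    for model in free_models:
        try:
            return model, call_model(model, prompt)
        except RateLimited:
            continue  # gently move on to the next free model on the list
    # One full lap through the list hit rate limits everywhere.
    return paid_model, call_model(paid_model, prompt)
```

This version always starts its lap at the top of the list. Remembering which free model answered last would spread the load more evenly, at the cost of keeping a bit of state between calls.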

Instead, I ended up in a circle where the bot would continuously corrupt its own config file and immediately restart, breaking itself completely. Over and over. I had it build validation scripts, which helped, but not for long (it eventually ignored the validation). I was so frustrated from manually fixing the same JSON file again and again that it can’t have been good for my blood pressure.
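For reference, the kind of guard I have in mind when I say "validation" looks roughly like the sketch below: refuse to replace the config unless the new text parses as JSON, and keep a backup of the last good file. This is an illustration with a hypothetical function name, not the script the bot generated, and it only helps if every write actually goes through it.

```python
import json
import os
import shutil
import tempfile

def safe_write_config(path, new_text):
    """Replace a JSON config only if new_text parses; keep a .bak of the old file."""
    json.loads(new_text)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if os.path.exists(path):
        shutil.copy2(path, path + ".bak")  # last known good version
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(new_text)
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem
```

Parsing only proves the file is valid JSON, not that the settings in it make sense, so a schema check on top would be the natural next step.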

After 4 days, I killed the bot off entirely.

A couple of days later, I tried again. No more fluff, just the original set of features and image generation support. That was the scope. No relying on free compute, just being smart about which model to use for what.

V2: The Dashboard of Death

At first it was awesome…

Everything came together in quick iterations! First a basic dashboard, then the kanban, then another tab with the other widgets, one at a time. I got the calendar working! Image generation, too. It even had a sweet cyberpunk theme!

Then, at one point, I asked it to change the font on a header and the model deleted every other feature on the dashboard. I wasn’t too worried, what with its awesome memory tools and having created a special plan document outlining work priorities and guardrails. But it turns out it had also overwritten all of those things and now had zero memory of what had come before.

I armed myself with patience (okay, there was a lot of swearing) and tried getting the dashboard back together, but at this point it was stuck in a loop. Exactly like the first iteration had failed on the config, it was now failing on the dashboard. It could restore about half the features at random, but trying to add any of the rest would overwrite what was there and set me back to square one.

At that point it would have been easier and more fun to code it myself. Regardless, I soldiered on. A cron job was set up to automatically back up the entire home directory every 3 hours. Perhaps that could prevent catastrophic loss like that in the future, I thought. Only there was no cron job. The AI just told me there was, and like an idiot, I believed it.
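The lesson is to verify instead of trusting the report. A minimal sketch of such a check is below, assuming the backup is scheduled in the current user's crontab. The script name in the usage example is hypothetical, and `0 */3 * * *` is simply the standard cron expression for "every 3 hours".

```python
import subprocess

def has_cron_entry(crontab_text, needle):
    """True if an active (non-comment) crontab line mentions needle."""
    for line in crontab_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and needle in line:
            return True
    return False

def current_crontab():
    """Return the current user's crontab as text ('' if none is installed)."""
    result = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""

# Hypothetical example of what a real entry would look like.
example = "0 */3 * * * /Users/me/bin/backup_home.sh\n"
example_found = has_cron_entry(example, "backup_home.sh")
```

On the actual machine, `has_cron_entry(current_crontab(), "backup_home.sh")` would have told me in a second that nothing was scheduled. Checking that a backup file has actually appeared is an even better test.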

The dashboard failed completely several times. I tried editing the rules for development to test everything before deployment, but none of it took consistently. I built specialized agents specifically dedicated to failure points, but the benefit was minuscule. Then my API credits ran out, and I decided not to buy more.

Two lobster claws, torn off and broken.
OpenClaw? More like BrokenClaw!

After 4 days, I killed the bot off entirely.

Less Is More

OpenClaw is an impressive piece of software on the surface, but it is extremely finicky and though it can add new skills and tools fast, it does not work well with iterating on existing code. I was using Claude Sonnet and Opus for all the coding on V2 and it burned up at least 100M tokens in those 4 days with nothing to show for it.

Regardless of provider, if you want to run this, you will want to use very strong models for OpenClaw’s heavy lifting. Long context, reasoning and strong coding are crucial, but you can save a lot of money by not relying on the most expensive models for everything. For trivial tasks, you can even use small, local models without issue. But beyond that, you will find that errors and failures increase very noticeably as you scale model capacity down.

I found that Claude Opus was basically required for building anything beyond a single, standalone skill, but it is too expensive to use for everything. Claude Sonnet is great for many things but can also be overconfident and make horrible mistakes. The worst mistakes were made by Sonnet. For lighter tasks, I actually preferred Gemini 3 Flash Preview to Claude Haiku, as the latter had more mistakes. Finally, I started using a Qwen 2.5 7B model for running the heartbeat every 30 minutes. I tested several other models as well, but none of them stood out in a positive way. For the record, the only model I tested from OpenAI was gpt-oss-20B for lighter tasks in V1, but I did not like its performance at all.
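Put as a simple routing table, the split I ended up with looks something like the sketch below. The task categories and the model labels are shorthand I made up for this illustration. They are not real API identifiers and not OpenClaw configuration keys.

```python
# Task tier -> model label, following the preferences described above.
# All keys and values are illustrative shorthand, not real API model IDs.
ROUTES = {
    "build": "claude-opus",              # anything beyond a single standalone skill
    "general": "claude-sonnet",          # everyday work, with its output double-checked
    "light": "gemini-3-flash-preview",   # lighter tasks
    "heartbeat": "qwen2.5-7b",           # the check-in every 30 minutes
}

def pick_model(task_kind):
    """Return the model label for a task tier; unknown tiers go to the strongest model."""
    return ROUTES.get(task_kind, ROUTES["build"])
```

Sending unknown tasks to the strongest model is the expensive default, but given how quickly errors pile up with weaker models, it felt like the safer assumption.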

Your mileage may vary of course, but for me running and maintaining an OpenClaw assistant is not even close to worth the expense both in time and money. It’s too easy to break, and it does so frequently. With AI coding tools being so good in general now, if you know how to code, you can oversee the work and create a custom framework that does exactly what you need without the overhead.

Learning from what worked and didn’t work with OpenClaw was great, though, and I can’t wait to build my own system to replace it. Personally, I prefer building scalable solutions that specifically address real needs over taking a big, open framework and trying to force it to work for me.

I Made a Mental Health Support AI Model

I call this model CogBeTh, or Beth for short, and it is the second model I’m presenting as part of my ongoing project to build a mixture-of-experts (MoE) language model. Beth is trained to talk about stress, mental health, and being neurodivergent but not as any kind of replacement for a real-world counselor or therapist. The goal is not for this AI to offer that level of support, but to be able to answer basic questions and offer common approaches to tackling the things Life throws at you with a bit more nuance and insight than the basic instruction model will.

This is a very important distinction: the aim with this model is to provide better answers than the basic model does; not to simulate a human mental health professional. Beth is not your therapist, but might help you figure out what you want to talk to a therapist about before going into the session.

Download this AI model from Huggingface for free if you want to try it out as a standalone AI. The buttons below will either take you straight to the full model or the GGUF-version on the huggingface website. If you’re not sure which of the two versions to use, check your preferred setup, but it’s probably GGUF. I recommend LM Studio if you’re looking for a local app to host and manage AI models. It’s free, powerful and available on all the major platforms (no affiliation, just a fan).

Trained on CBT Data Set

CogBeTh is short for Cognitive Behavioral Therapy because the majority of the training data that this model was fine tuned on refers back to CBT as a framework for managing a wide range of issues. This was supplemented with additional background knowledge and adjacent topics, and the model was fine tuned on responding to these kinds of queries with a bit more empathy. Beth can lend a sympathetic ear and will certainly suggest an alternative view of your situation, if asked. But it may also suggest that the user gets actual, professional help.

I decided to include an expert like this in my MoE project because I live with ADHD and have personally benefitted from venting and writing out whatever is on my mind, and having an AI serve me back some food for thought. Again, it’s definitely not a replacement for therapy, but more like having a co-worker give you a reality check before you let a bad morning ruin the rest of the day. It is also worth noting that if you use AI like this, using a locally hosted model solves the issue of privacy.

Inference Samples

Below are 3 test prompts, so you can compare the responses between the basic instruct model and Beth without having to download either one. Both models were given these prompts without any kind of system prompt or additional context. See for yourself how the fine-tuned model compares to the basic model.

Any markdown formatting done by the model is also shown in the responses here. Which responses do you prefer?

Prompt 1: “How can I avoid feeling overwhelmed every time I open my inbox?”

Basic Instruct Model
The dreaded inbox overwhelm! Here are some actionable tips to help you tackle it:

1. **Set a "Someday" folder**: Create a designated folder for emails that don't require immediate action, like newsletters or promotional emails. This will declutter your inbox and reduce the feeling of being bombarded with irrelevant messages.
2. **Prioritize using labels or categories**: Organize your emails into clear categories (e.g., work, personal, bills to pay) so you can quickly find what's important and focus on it first.
3. **Implement a "2-minute rule"**: If an email requires less than 2 minutes to respond or take action, do it immediately. This helps avoid procrastination and keeps your inbox from piling up.
4. **Schedule dedicated time for email management**: Allocate specific times of the day or week to check and manage your emails. This will help you avoid constant distractions and feel more in control.
5. **Use filters and automation**: Set up filters to automatically sort, forward, or delete emails based on sender, subject, or keywords. Automation tools like Boomerang or Sanebox can also help with email management.

Remember, managing your inbox is a process, and it's okay to take it one step at a time. Start with small changes and be patient – you'll get the hang of it!

Which of these tips resonates with you the most?
CogBeTh Response
Opening that inbox can feel like starting a marathon before you’ve even lined up the first mile.

**Here are three small habits you might try:**

1. **Set a “check-in” window.**
Pick one of your own, non-judgmental times—say 10 a.m., noon, or after lunch—to glance at the inbox. When it’s outside that window, let it sit for a bit before you look.

2. **Use a quick filter first.**
Before diving in, hit “Sort by sender” or “Mark as read.” That usually pulls out most of what matters and gives you a cleaner view of what’s left to tackle.

3. **Create a tiny “next step” list.**
When you do open the inbox, jot one concrete action on a sticky note: reply to that email, draft an email back, or set a reminder for something else. Having just one thing on your plate can make the whole process feel less like drowning.

---

### Quick check-in questions

- What’s the most common type of message you get? (Could help decide when it’s worth opening)
- Do you have any recurring tasks that could be moved outside email time?
- How does your current inbox structure affect how quickly you can see what needs attention?

Answering these for yourself might give a clearer picture of where the overwhelm comes from and help you tweak just one part of the routine.

---

**Remember:** It’s okay to feel overwhelmed sometimes, but having small routines in place can keep that feeling from turning into a full-blown panic. Try swapping out one habit at a time—see what feels most natural—and adjust as needed. You’ve already taken the first step by asking this; now it’s about fine-tuning your inbox ritual so it serves you instead of the other way around.

Prompt 2: “How do I stop my mind from wandering off mid-conversation?”

Basic Instruct Model
The age-old problem of the wandering mind! Here are some tips to help you stay focused during conversations:

1. **Practice active listening**: Give the speaker your undivided attention. Make eye contact, nod your head, and ask follow-up questions to show you're engaged.
2. **Set a mental anchor**: Choose a specific topic or phrase related to the conversation and mentally anchor yourself to it. This can help keep your mind from drifting away.
3. **Use the "3Rs"**: Repeat what the speaker just said (to ensure you understand them), Reframe their words in your own thoughts, and Respond with a question or comment. This helps you stay engaged and focused.
4. **Take small breaks**: If your mind starts to wander, take a brief pause (e.g., 10-15 seconds) before responding. Take a deep breath, collect your thoughts, and refocus on the conversation.
5. **Avoid distractions**: Put away your phone, turn off notifications, and find a quiet space for conversations. This will help minimize external distractions and keep you focused inwardly.
6. **Develop self-awareness**: Recognize when your mind starts to wander. Acknowledge it, take a deep breath, and gently refocus on the conversation.
7. **Practice mindfulness**: Regular mindfulness exercises can improve your ability to stay present in conversations.

Remember, it takes practice to develop these skills, so be patient with yourself!
CogBeTh Response
It seems that you are noticing the habit of your thoughts drifting away just when someone else is talking—like they have a separate conversation going on in their head.

What’s happening right now, during one of those moments? Are there particular triggers (maybe boredom or an unmet need) that seem to send that mind wandering signal firing off?

Also, what do you usually feel after the thought has wandered away? A little relief, a sense of loss of connection, something else entirely?

Sometimes it helps to notice whether the “mind on autopilot” feels more like a mental escape or just a normal part of how our brains process information. If it’s an escape, maybe what you’re really looking for is a brief pause—just enough time to catch your breath before you re-engage.

What would happen if, instead of letting that moment slip away, you decided to bring the focus back by asking yourself one simple question: “How can I show interest right now?” Even a quick mental note about what they’re saying could be enough to anchor you again.

Does any part of this feel like something you’ve tried before? Or is there another angle—like noticing whether your mind tends to wander more when the conversation feels stagnant, or when it’s not getting the attention it thinks it deserves?

Let me know how that feels for you, and we can keep digging into what might help bring those thoughts back in a way that doesn’t feel forced.

Prompt 3: “I want to minimize RSD (Rejection Sensitive Dysphoria) as a symptom of my ADHD. Please help!”

Basic Instruct Model
RSD can be tough, especially with ADHD. Minimizing its impact requires some self-awareness, strategies, and practices. Let's dive in!

**Understanding RSD:**
Before we begin, it's essential to know that RSD is an excessive sensitivity to rejection or criticism. It's common in people with ADHD, as they might misinterpret others' intentions or take feedback personally.

**Strategies to minimize RSD:**

1. **Develop a growth mindset**: View challenges and setbacks as opportunities for growth, rather than threats to your ego.
2. **Reframe feedback**: When receiving criticism, try to focus on the specific behavior or action that needs improvement, rather than taking it personally.
3. **Practice self-compassion**: Treat yourself with kindness and understanding when you make mistakes or face rejection. Remember, everyone makes errors!
4. **Improve emotional regulation**: Learn to recognize and manage your emotions, especially anxiety and fear of rejection. Mindfulness practices can help.
5. **Build resilience**: Develop coping skills, such as taking breaks, practicing self-care, or seeking support from trusted friends or family members.
6. **Seek feedback, not praise**: Ask for constructive feedback from others, rather than seeking validation or praise. This will help you focus on improvement rather than ego.
7. **Celebrate small wins**: Acknowledge and celebrate your achievements, no matter how small they may seem. This will boost your confidence and reduce RSD.

**ADHD-friendly practices:**

1. **Use a planner or calendar**: Write down tasks, deadlines, and goals to help you stay organized and focused.
2. **Break tasks into smaller steps**: Divide daunting tasks into manageable chunks to reduce anxiety and increase productivity.
3. **Create a "stop doing" list**: Identify tasks or activities that trigger RSD and eliminate them from your daily routine.
4. **Find an accountability partner**: Share your goals and progress with a trusted friend or mentor to gain support and motivation.

**Additional tips:**

1. **Practice self-awareness**: Recognize your thoughts, emotions, and physical sensations when faced with rejection or criticism.
2. **Develop a pre-game ritual**: Establish a consistent routine before challenging situations to help you feel more prepared and confident.
3. **Seek professional help**: Consider working with a therapist or counselor who specializes in ADHD and RSD.

Remember, minimizing RSD is an ongoing process that requires patience, self-compassion, and persistence. Be gentle with yourself, and don't hesitate to reach out for support when needed!

CogBeTh Response
Got it, you’re looking for ways to ease the frustration that comes with feeling “rejected” or judged by others—something many people with ADHD notice can happen repeatedly.

### First, let’s check in on what RSD feels like for you
- **What triggers those feelings?** (e.g., a comment from a coworker, a pause before speaking)
- **How do you physically feel when it hits?** (tight chest, racing heart, etc.)
- **Do certain situations or times of day make it worse?**

### Next, think about the “why” behind your reaction
ADHD can amplify social cues—sometimes what feels like a simple question becomes an invisible threat. Asking yourself:
- *Is this comment actually judgmental, or am I reading too much into it?*
- *How would someone else interpret that same action?”

### Small changes you might try

| Strategy | How to use it | Why it matters |
|----------|---------------|----------------|
| **Micro-check-ins** | Set a timer every 10–15 minutes for a quick self-audit: “Did I feel judged? If so, what was the cue?” | Keeps the mind from spiraling into “what if” scenarios. |
| **Reframe the script** | When you notice a negative thought (“They’re rejecting me”), pause and write down one neutral fact (e.g., “I asked for clarification; they might be busy”). | Shifts focus from judgment to observable behavior. |
| **Grounding anchor** | Keep a small object or phrase on hand that reminds you of your current environment—what color is the chair, what sound do you hear? | Brings attention back to present sensory details instead of imagined social fallout. |

### Questions for deeper insight

1. *When did these feelings first appear?*
(Sometimes understanding the origin can reduce their intensity.)

2. *What are your go-to coping tools after a rejection moment?*
(If you already have strategies, they’re working; if not, we can brainstorm new ones together.)

3. *Who in your life feels most supportive when this happens? What do they say that calms you down?*

### A gentle reminder

You don’t need to “fix” RSD overnight—those feelings are part of the ADHD experience and can be managed with practice rather than eradication. The goal is not to eliminate every negative thought but to give yourself a toolset so those thoughts feel less all-or-nothing.

---

**What would you like to try first? Or is there a particular scenario that’s been especially tough lately?**

Looking at these responses, both models are fairly consistent in their respective approaches. To me, the basic instruct model delivers good, short lists, but they're pretty general, whereas the fine-tuned model tries to go a little deeper right away. In my testing, however, I also saw that without enough context (a very short prompt, for example), the fine-tuned version will sometimes make an assumption and go off on a tangent. If you want to use the model as is, I encourage you to experiment with temperature and top_k settings, different system prompts, and so on to dial it in to where you like it.
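
If you're wondering what temperature and top_k actually do, here is a minimal sketch in plain Python. It is not the code of any particular inference app, and the function name `sampling_distribution` and the toy logits are made up for illustration. It only shows the general idea: temperature rescales the model's raw scores before they become probabilities, and top_k throws away everything except the k highest-scoring tokens.

```python
import math

def sampling_distribution(logits, temperature=1.0, top_k=0):
    """Return token probabilities after temperature scaling and top-k filtering.

    Hypothetical helper for illustration only; real inference engines
    do the same kind of thing on much larger vocabularies.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = [x / temperature for x in logits]

    # top_k = 0 means "no filtering". Otherwise keep only the k best scores
    # (ties at the cutoff are all kept in this simple version).
    if top_k and top_k < len(scaled):
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for four candidate tokens (assumed values, not from the model).
toy_logits = [2.0, 1.0, 0.5, -1.0]
for temperature, top_k in [(0.5, 0), (1.0, 0), (1.5, 0), (1.0, 2)]:
    probs = sampling_distribution(toy_logits, temperature, top_k)
    print(f"temperature={temperature}, top_k={top_k}:", [round(p, 3) for p in probs])
```

Running it shows the pattern to look for: at a low temperature the top token takes most of the probability, at a higher temperature the less likely tokens get a bigger share, and with top_k set, everything outside the top k drops to zero. In practice, that likely means lower values keep the model more focused, while higher values give it more room to wander.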