Jarvis and me: 2 months in
Dealing with my OpenClaw agent feels somewhat like owning a combination of racehorse 🐎 and classic sports car 🏎️. Both are magnificent and extraordinary, and a delight when they are doing what they're supposed to, but it's sometimes tricky to cajole and wrestle them into that place where they're just right.
The snapshot
After several weeks of running this assistant most days, I've learned what works, what doesn't, and where the agent still stumbles. This isn't a final evaluation - OpenClaw is still evolving, and so is my workflow - but it's a clear mid-journey picture of the trade-offs, costs, and capabilities I'm living with. 📸
What Iâve learned
1. Model selection matters
- GLM-4.7-Flash is fine for quick, low-stakes replies but falls apart on complex, multi-step tasks. It sometimes ignores critical instructions, breaking entire interactions.
- DeepSeek-V3.2 is more expensive but reliably follows instructions, handles complex reasoning, and chains tools correctly. I've switched to using it for almost everything serious.
- GLM-5 for when you need the big guns. This is like working with a lawyer - don't waste their time on everyday tasks and you won't waste your dollars. ⚖️
2. Cost realities
Right now I'm spending about $100 a month, and most of that goes into configuration and troubleshooting - figuring out why something failed, tuning a skill, or debugging an opaque error. Once a workflow is set up with the correct tools and skills, the running cost is relatively low. For the moment, since I'm building, building, building - my costs are high.
If I were using DeepSeek for deep, parallel work every day, the cost could easily approach $200/month. In practice, my usage is concentrated in evenings and weekends, so the actual bill is lower - but the potential is there.
There's also a model cost-efficiency trade-off: using a more expensive model (like GLM-5) more often might reduce back-and-forth, be quicker, and potentially not cost much more than the current DeepSeek-centric approach with its re-prompts and debugging cycles. 💸
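To make that trade-off concrete, here's a toy calculation - every number in it is invented for illustration, not a real price list:

```python
# Hypothetical back-of-envelope comparison: a cheaper model that needs
# several re-prompts vs. a pricier model that gets it right first time.
# All prices and token counts below are made up for illustration only.

def task_cost(price_per_mtok: float, tokens_per_attempt: int, attempts: int) -> float:
    """Total cost in dollars for one task (price is $ per million tokens)."""
    return price_per_mtok * tokens_per_attempt * attempts / 1_000_000

# Cheap model: $0.50/Mtok, 30k tokens per attempt, 4 attempts (debug cycles)
cheap = task_cost(0.50, 30_000, 4)
# Pricier model: $3.00/Mtok, 30k tokens, one attempt
pricey = task_cost(3.00, 30_000, 1)

print(f"cheap-with-retries: ${cheap:.2f}  one-shot: ${pricey:.2f}")
```

Even when the cheaper model still wins on raw price, the gap narrows fast as retry counts climb - and that's before counting my own time spent re-prompting.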
3. Workflow adaptations
- Telegram topics as parallel channels - because AI responses can be slow, I run multiple conversations in separate topics. This turns latency into parallel throughput, but…
- Multitasking overhead - keeping track of 3-4 topic threads is cognitively demanding. I sometimes lose the thread of what I'm doing, and the constant context-switching reduces focus. I'm getting better, but I have to say it hasn't come naturally. No doubt the fact I have grey hairs has something to do with that.
- Control vs. autonomy - I maintain close oversight of configuration changes to retain understanding and control. That slows progress, but may prevent future mess. I'm still unsure whether this cautious, monitored approach is better than granting full autonomy and unleashing the beast.
4. Pattern: text works, visuals struggle
The assistant is most successful with purely text-related tasks - summarising, drafting, coding, researching, replicating my voice. Those are its sweet spots.
Tasks that involve layout, graphics, or visual UI interaction are less reliable:
- Diagram generation - the draw.io skill produces a useful first draft, but still requires manual editing for layout, arrows, and fine-tuning. The hope of fully automated diagrams hasn't been met.
- Browser automation - direct use of the browser tool is slow and token-heavy. For one task we ended up writing a Playwright script instead, which works well but reinforces that visual tasks aren't the assistant's forte.
Capability snapshots
✅ What works really well
- Deep-research mode - spawning multiple sub-agents to research a topic produces high-quality, structured outputs with sources and insights.
- Multi-message mode - the protocol (activate → receive messages → process on “complete”) is reliable and super useful for complex instructions.
- Blog-article workflow - from idea to published post, the process is smooth, efficient and fast - without compromising quality.
- Author-voice replication - the biggest success; the assistant captures my writing style, tone, and phrasing so well that generated content feels authentic and needs little editing.
- Work-related automation - generating case studies and first drafts of technical documentation from specifications and ETL pipeline descriptions has been a great time-saver.
- Obsidian task integration - I add tasks verbally with priorities and due dates, Jarvis stashes them in the second brain, I track them in my Obsidian dashboard, and Jarvis tells me off when we hear the noise of yet another deadline whooshing by. (All hail Douglas Adams.)
⚠️ Partial successes
- Diagram generation - as above; a time-saver for placing major elements, but not hands-off.
- Coding collaboration - the Python-project skill works well (UV-based, organised), but I haven't done enough projects to draw firm conclusions. Working with this assistant is better than ChatGPT because conversations are persistent - no need to start fresh each time.
- Second brain - functionality works, and all looks promising, but we need scale to know whether it really works. Which means either I have to fully commit to stashing interesting stuff in there, or I have to automate and let Jarvis pick content for me. 🧠
🛠️ Challenges that took iteration
- Daily briefing - combines multiple sources (calendar, weather, emails, mentions) and required several versions to get the formatting right. Still not 100% reliable.
- Weather report - free weather APIs return data in unwanted formats; custom parsing was needed to produce clean, readable forecasts. 🌤️
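The parsing itself is nothing exotic - something like this sketch, where the payload shape is invented for the example (real free APIs each have their own, usually messier, structure):

```python
# Illustrative cleanup of a raw weather-API payload into readable lines.
# The dict below is a made-up stand-in for an hourly-forecast response;
# field names are assumptions for the example, not a real API schema.

raw = {
    "hourly": {
        "time": ["2025-06-01T08:00", "2025-06-01T12:00"],
        "temperature_2m": [14.2, 19.8],
        "precipitation_probability": [10, 45],
    }
}

def forecast_lines(payload: dict) -> list[str]:
    """Zip the parallel arrays into aligned, human-friendly rows."""
    hours = payload["hourly"]
    return [
        f"{t[-5:]}  {temp:>5.1f}°C  rain {rain:>3d}%"
        for t, temp, rain in zip(
            hours["time"], hours["temperature_2m"], hours["precipitation_probability"]
        )
    ]

for line in forecast_lines(raw):
    print(line)
```

Most of the iteration went into exactly this kind of glue: flattening parallel arrays and padding columns so the briefing reads cleanly in a chat message.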
Platform maturity: not quite production-ready
OpenClaw is still a relatively new tool. We encounter timeouts, tool-limit failures, and occasional unclear errors. The agent is generally good at self-diagnosing, but sometimes the root cause is opaque, leading to lost time debugging. 🛠️
The takeaway
This isn't a review - it's a snapshot. The racehorse/sports-car metaphor holds: when everything clicks, the experience is extraordinary. When it doesn't, it's fiddly, expensive, and demands patience. ⏲️
I'm still figuring out the right balance of control, model selection, and which tasks to automate. But after a few weeks, I have a much clearer map of the terrain - and a suite of skills that already save me time, even if they don't yet deliver the fully hands-off future I sometimes imagine. I thought the future had arrived. Not quite today, but tomorrow looks hopeful… 🔮
