AI agentic programming -- where to begin?

Recently I watched the following System Crafters streams about agentic, or pair-programming, capabilities:

(oldest to newest)

With these livestreams, I could see some evolution of pair-programming, or agentic coding, workflows. Daviwil starts with the Emacs client and makes his way to the terminal and tmux.

While these are interesting to watch, I still sense that I have lots of missing pieces in my head before I can start employing this workflow, where I dictate what I want in my own hobby project to an AI, let it generate code, and then review the code it generated.

The scene is full of its own, alien, jargon and terms! Here’s a list of terms which I have no practical grasp of:

  • MCP (model context protocol?)
  • ACP (agent-client protocol?)
  • “tool use”
  • SOUL.md, AGENT.md, SKILLS.md, etc.
  • openclaw, nemoclaw
  • hermes, harness, etc.

To make things worse, these terms change, get discarded, fall behind, or are newly introduced on a near-weekly basis, giving me the sense that whatever topic I choose to read up on may already have been discarded and moved on from.

I don’t even know which term accurately represents what I am talking about: is it “agentic coding”, or, is it “pair programming”?


My experience with LLMs so far has been ChatGPT and Gemini via their web UIs (which suck immensely, hog my CPU endlessly, and drive me insane), and gptel with API keys bought on nano-gpt via crypto (Monero, specifically). gptel offers a much more pleasant experience for question-and-answer workflows, but I feel that the programming industry and the hobbyists are already moving on from Q&A-based utilization to agentic utilization of LLMs, and I want in on this train!

Anybody share my frustrations? Where did you guys start? I guess I should begin by taking a look at agent-shell.el, which I think is doing for agentic workflows what gptel did for Q&A-based workflows. However, there seem to be alternatives like claude-code.el, aidermacs, etc. – I don’t even know if I need this whole “write my own MCP servers” thingy that Daviwil seems to be doing with his own Sigil project.


This is the general view. I guess tips and tricks on these topics would be much appreciated. Feel free to share down below.

I started with gptel as a chat via Copilot. Pure chat, no code generation, no agentic workflow. That allowed me to create tools within gptel and Emacs Lisp and enhance the model. In my case, it was a way to get Jira tickets into the context without teaching the model how to use a custom script or another shell command.

Then we changed vendors, and I had to reimplement the same in Claude Code. While claude-code-ide.el would allow me to use the existing Elisp functions, I opted to rewrite everything as an MCP server to enable coworkers to do the same. This is still only text work and reviews, no code generation. I’ve tried generation, and it kinda works, but for me, personally, it’s unmaintainable. If I get a call with questions about the part of the codebase I have changed, I want to answer them without hesitation, not with a slurred “the AI did that”. If you don’t care about that, Claude Code could be nice for you.

An MCP is great if you don’t want your LLM/agent to re-read and guess all the time. That’s why Claude Code, for example, asks you to add rust-analyzer-lsp if you use it with a Rust project; it doesn’t need to grep or ls or find, it can use the results of the semantic analysis. No guesswork.

But that’s only required if your codebase is already large and you’re using too many tokens.


Personally, I will not buy into the LLM/agent driven coding hype. We’re seeing large price increases for GitHub Copilot just now. Annual subscriptions get replaced by monthly subscriptions. Monthly subscriptions get billed differently.

Using an LLM/AI to get better at coding is great. But using an AI/LLM to replace and diminish your coding skill will be… well… :poop:.


Depends. Do you switch with your AI partner? As in: does it implement some parts, you review them, and then you implement some parts, and it reviews? That’s agentic pair programming, and you can learn while doing that; even if the AI hallucinates, because you will learn from the bugs.

If only the AI/LLM implements features or fixes bugs, that’s agentic coding, if you review the changes. If you don’t review the actual changes but only the visible results from a common user’s (not an expert’s) view? That’s vibe coding.


Hey, I can absolutely share the confusion, in particular the jargon, the fact that every week there is a new cool tool/thing/revolution, etc. But when you look beyond that, it’s actually not so complex and there are only a few relevant evolutions (IMHO). I tried to write that up in a simple and comprehensive way, so you might find this useful:

AI: What You Need to Know — A Map for the Overwhelmed

Feedback is greatly appreciated.

I think it explains most of the terms/concepts you were asking about. I tried however to avoid delving into too many ephemeral things. MCPs and ACPs are tools. While what they do is needed and may stay, the actual ‘protocol’ is more of a trend thing. Initially MCPs were hyped a lot; then people realized they make things slow and inert and are trying to move on. And more importantly, with a good harness (e.g. descriptions and systems like Claude Code, the CLAUDE.md files and the like, Skills) you don’t actually need them. That’s of course just my opinion. Others may say you cannot live without MCPs.

Cheers, Marc

[… 5 lines elided]

I started with gptel as a chat via Copilot. Pure chat, no code
generation, no agentic workflow. That allowed me to create tools
within gptel and Emacs Lisp and enhance the model.

This is pretty much the road I’ve travelled, as well.

[… 4 lines elided]

Then, we changed vendors, and I had to reimplement the same in Claude
Code. While claude-code-ide.el would allow me to use the existing
Elisp functions, I opted to rewrite everything as an MCP server to enable
coworkers to do the same.

Are claude-code.el and claude-code-ide.el different? What is an
“MCP”, in this context?

[… 13 lines elided]

An MCP is great if you don’t want your LLM/agent to re-read and guess
all the time. That’s why Claude Code, for example, asks you to add
rust-analyzer-lsp if you use it with a Rust project; it doesn’t need
to grep or ls or find, it can use the results of the semantic
analysis. No guesswork.

Hmm… I gotta chew on this. I still don’t understand how Claude Code
(CC) can ask me to add rust-analyzer-lsp, or what it does with it. I
guess, as you say, it uses that to find function definitions in the
code and read their documentation?

But that’s only required if your codebase is already large and you’re
using too many tokens.

Alright.

Personally, I will not buy into the LLM/agent driven coding
hype. We’re seeing large price increases for GitHub
Copilot

just now. Annual subscriptions get replaced by monthly
subscriptions. Monthly subscriptions get billed differently.

You are right on this. However, I am hopeful and expecting open-weights
models to become on par with the current capabilities of the frontier
coding models, like CC. So the price increases of CC, Copilot, et al.
should be painful only in the short term. But the trend of AI-assisted
programming, where, as in Daviwil’s showcase of his workflow, the AI
agent writes the libraries and you review the code diffs, is expected
to be here to stay. Imo.

Using an LLM/AI to get better at coding is great. But using an
AI/LLM to replace and diminish your coding skill will
be… well… :poop:.

I agree here. In my opinion, you should know the basics of your
project’s programming language. For example, you should know the basics
of Emacs Lisp, and /then/ try working with an AI agent on a project
where you write an Emacs package in Emacs Lisp.

Without knowing the coding basics, you will not be able to judge the
diffs and code-architectural choices the AI agent makes. At that point,
you are just auto-producing slop, instead of using the tool in smart
ways as an extension of your intellect.

I don’t even know which term accurately represents what I am talking
about: is it “agentic coding”, or, is it “pair programming”?

Depends. Do you switch with your AI partner? As in: does it implement
some parts, you review them, and then you implement some parts, and it
reviews? That’s agentic pair programming, and you can learn while
doing that;

Yes, that’s exactly what I had in mind here. That’s exactly my goal
with all of this.

If only the AI/LLM implements features or fixes bugs, that’s agentic
coding, if you review the changes. If you don’t review the actual
changes but only the visible results from a common user’s (not an
expert’s) view? That’s vibe coding.

Yep, thanks for these families of definitions.

[… 4 lines elided]

I tried to write that up in a simple and comprehensive way, so you
might find this useful:

AI: What You Need to Know — A Map for the Overwhelmed


Thanks for this write-up. I will skim through it soon.

I think it explains most of the terms/concepts you were asking
about. I tried however to avoid delving into too many ephemeral
things. MCPs and ACPs are tools. While what they do is needed and may
stay, the actual ‘protocol’ is more of a trend thing.

Interesting view.

Initially MCPs were hyped a lot;

Exactly. That’s my recollection as well.

then people realized they make things slow and inert and are trying to
move on.

Interesting x2. I didn’t notice that sentiment. Why did people regard
MCPs as “slow and inert”?

And more importantly, with a good harness (e.g. descriptions and
systems like Claude Code, the CLAUDE.md files and the like, Skills)
you don’t actually need them.

Interesting x3. Now the question is what MCPs and the collections of
markdown files such as CLAUDE.md and SKILLS.md actually are. I guess
answering that requires some personal experience in messing around
with the tools on my part.

Hi, let me try to answer.

My understanding is that they are injected into the inference loop, so even if you only want a couple of tools available through MCP, it makes the inference much heavier, slower, and more token-hungry. I think the LLM first reads the full ‘offer’ an MCP server has in stock and then finds the tools it may actually need (maybe 1 in 10, or worse, 1 in 300).
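A back-of-the-envelope sketch of that overhead, assuming the common rule of thumb of roughly 4 characters per token and a made-up tool description (all numbers and names here are hypothetical, purely for illustration):

```python
# Rough illustration of why a big MCP tool catalogue is token-hungry:
# every tool description is injected into the context up front, before
# the model has chosen (or used) anything.

def rough_tokens(text: str) -> int:
    # Rule of thumb: roughly 4 characters per token for English text.
    return len(text) // 4

# One hypothetical tool description of realistic length.
one_tool = (
    "search_flights: Search for flights. Arguments: origin (string), "
    "destination (string), date (ISO 8601 string). Returns a JSON list "
    "of matching flights with prices and times."
)

for n_tools in (10, 100, 300):
    catalogue = "\n".join([one_tool] * n_tools)
    print(f"{n_tools:4d} tools -> ~{rough_tokens(catalogue)} tokens spent before any work")
```

Even if the model ends up calling only one of those tools, the whole catalogue was already paid for in context.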

MCP is really a protocol for the LLM to talk to tools that are all formalized through (coded) definitions, so it’s more deterministic. You code a tool to do task A; the MCP server then advertises a description of that tool: that it is available and how it works. The LLM talks to the MCP server and can then access the tool’s code through this interface.
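As a rough illustration of that idea in plain Python (hypothetical tool name and payload; a real MCP server exchanges JSON-RPC messages, this only models the discover-then-call shape):

```python
import json

# Simplified sketch of an MCP-style tool registry: a tool is ordinary,
# deterministic code plus a machine-readable description that the LLM
# can discover and then call by name.

TOOLS = {
    "get_ticket": {
        "description": "Fetch a ticket by ID from the issue tracker.",
        "inputSchema": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
        # The implementation itself is plain code, not a prompt.
        "fn": lambda args: {"id": args["ticket_id"], "title": "Example ticket"},
    },
}

def list_tools():
    """What the LLM reads first: every tool's name, description, schema."""
    return [
        {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
        for name, t in TOOLS.items()
    ]

def call_tool(name, arguments):
    """Once the LLM picks a tool, the result is fully determined by the input."""
    return TOOLS[name]["fn"](arguments)

print(json.dumps(list_tools(), indent=2))
print(call_tool("get_ticket", {"ticket_id": "ABC-1523"}))
```

The key point: the “guessing” happens only when the model picks the tool; the tool itself behaves the same every time.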

CLAUDE.md, SKILLS.md, etc. give a textual description of how to solve tasks, so basically they describe a tool to varying degrees. You could say there: ‘if you need to access the web, get the URL from the user, then do curl URL’, and that’s typically enough for an LLM to get it right 90% of the time (for such a simple task). Once the tasks/conditions get more complex, the descriptions in the CLAUDE.md etc. files get longer and more complex. That’s why people layer. I use the WAT framework (Workflows, Automation, Tools) a lot. It’s basically a CLAUDE.md file telling the LLM to put any tools (typically Python and bash scripts) into a tools/ folder, to describe the workflows that use them in a workflows/ folder in markdown format, and then to automate the user’s requests using these instruments.
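A sketch of what such a WAT-style layout could look like (hypothetical file names and commands, not the actual framework contents):

```markdown
<!-- CLAUDE.md -->
Reusable scripts live in tools/. Multi-step tasks are described in
workflows/ as markdown. For any user request, first look for a
matching workflow, then run the tools it names.

<!-- workflows/fetch-and-convert.md -->
1. Ask the user for the URL.
2. Download it: `tools/fetch.sh <URL>`
3. Convert to AV1: `ffmpeg -i <input> -c:v libsvtav1 out.mkv`
```

Everything here is re-interpreted by the LLM on each request, which is exactly the determinism trade-off discussed below.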

With such an approach you can get very close to how MCP servers work, but it is not deterministic, because every time the LLM re-interprets the markdown files and instructions based on the user query. With the MCP server, as long as the LLM “finds” the right tool call, from that moment on you will always get exactly the same result.

Unfortunately, they are.

MCP stands for Model Context Protocol:

MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems.

Using MCP, AI applications like Claude or ChatGPT can connect to data sources (e.g. local files, databases), tools (e.g. search engines, calculators) and workflows (e.g. specialized prompts)—enabling them to access key information and perform tasks.

Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect electronic devices, MCP provides a standardized way to connect AI applications to external systems.

If you know LSP, then it’s similar: just like a language server protocol allows you to use the same LSP server from multiple editors, the model context protocol allows you to use certain agent-related tools from multiple LLM tools (Claude Code, GitHub Copilot in VSCode, Cursor, JetBrains).

If Claude Code “recognizes” that you’re handling Rust code, it will prompt you to add rust-analyzer-lsp to your Claude Code configuration.

Yes. Which is a lot less error prone than to use grep "fn function_name".

I’m going to be a pedant here: CC is not a model. Haiku, Sonnet and Opus are models. CC is a CLI application that allows access to those models via subscription or API key.

And yes, I also hope that open weight models become better, but given the current RAM prices I don’t see that happening on my own hardware.

That’s fine and dandy. Assistance is something I can tolerate. But I have seen management forbid manually writing code. That would be the day I change my profession. Maybe become a barkeeper. Or a baker.


Sure, you could say: “do a cURL request with the following API token”. And it will leak your API token to the LLM. If you accidentally created a token with too many permissions, it might delete your database.

If you created or configured an MCP with read-only functions, then there is no way to accidentally delete your database. Likewise, the API token cannot be abused for unrelated tasks.

Determinism of results aside, that’s the biggest win of MCPs compared to SKILLS.md: determinism of possible actions.
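The contrast can be sketched in plain Python (all names here are hypothetical; a real MCP server would expose this function over the protocol):

```python
# Sketch of "determinism of possible actions": the agent only ever sees
# read-only functions, so no prompt, hallucination, or injection can
# make it destructive. TICKET_DB and fetch_ticket are made-up names.

TICKET_DB = {"ABC-1523": "Implement feature X"}

def fetch_ticket(ticket_id: str) -> str:
    """Read-only tool: the API token (if any) stays on the server side,
    and the agent can look tickets up but never modify or delete."""
    return TICKET_DB.get(ticket_id, "not found")

# Deliberately NOT exposed to the agent: delete_ticket, update_ticket.
# Compare with a SKILLS.md saying "curl with this token": there the
# token itself, and every permission granted to it, is in the agent's
# hands.

print(fetch_ticket("ABC-1523"))
```

The attack surface is whatever functions the server chooses to export, not whatever the model can be talked into running.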

I found that you get better results with… no CLAUDE.md, or a minimal one. For example, we had a topic.instructions.md for Copilot, handcrafted by a software architect. Results were OK, but… not great. Lots of tokens spent on unrelated tasks, too much code generated.

As a test, I removed that file, and instead just do my “usual” prompting:

Implement feature X as described in ticket ABC-1523.
Use @related_file.go as a template

Fewer tokens, better results. For the ticket, I obviously had a $ticketsystem MCP in this case, but it’s also possible to do the same without an MCP:

Implement feature X as described in ticket ABC-1523.
Use @related_file.go as a template

Ticket ABC-1523:
<ticket contents>

That’s enough, most of the time. Together with a minimal CLAUDE.md [1], this can be a nice workflow (see also Make your AGENTS.md a table of contents and https://arxiv.org/pdf/2602.11988).
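For reference, the kind of minimal CLAUDE.md meant here might look like this (hypothetical commands, guessing a Go project from @related_file.go above):

```markdown
<!-- CLAUDE.md: minimal, hypothetical commands -->
- Format: `gofmt -w .`
- Lint: `golangci-lint run`
- Build: `go build ./...`
- Test: `go test ./...`
```

Nothing about architecture or style lives here; that stays in the normal project docs.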


  1. My handcrafted CLAUDE.md contains only information on how to format, lint, compile and test the project. The rest is already available in other files, ready for human consumption too, because at some point a human developer may need to debug it. ↩︎


MCPs are essential to the whole process of avoiding Ctrl-C, Ctrl-V

(As I’ve come to understand them)

Honestly the whole AI thing is dumb IMO


[… 32 lines elided]

With such an approach you can get very close to how MCP servers work,
but it is not deterministic, because every time the LLM re-interprets
the markdown files and instructions based on the user query. With the
MCP server, as long as the LLM “finds” the right tool call, from that
moment on you will always get exactly the same result.

[… 11 lines elided]

Interesting. Thanks for highlighting this distinction.

[… 3 lines elided]

MCPs are essential to the whole process of avoiding Ctrl-C, Ctrl-V

[… 14 lines elided]

As in, not repeating the same instructions to the LLM over and over?
For example, “use the X binary with this Y URL and use ffmpeg to convert
the media file into av1 format[…]” going on and on, is an instruction
about an automated workflow. And, defining this thing in an MCP frees
you from copy-pasting this instruction again and again?

It just seems like MCP is a means for inserting values into templates for prompts, which is unfortunately necessary.

For the ffmpeg thing, I’d rather look at the man pages. Idk. This all feels like a giant slide backwards.

There’s a bit of irony here, because your other paragraph

indicates that you didn’t look into the MCP’s documentation and its server concepts:

Servers provide functionality through three building blocks:

  • Tools (controlled by the model): functions that your LLM can actively call, deciding when to use them based on user requests. Tools can write to databases, call external APIs, modify files, or trigger other logic. Examples: search flights, send messages, create calendar events.
  • Resources (controlled by the application): passive data sources that provide read-only access to information for context, such as file contents, database schemas, or API documentation. Examples: retrieve documents, access knowledge bases, read calendars.
  • Prompts (controlled by the user): pre-built instruction templates that tell the model to work with specific tools and resources. Examples: plan a vacation, summarize my meetings, draft an email.

So while MCP servers can only be a collection of prompts, they hopefully also contain deterministic tools. And if they do, that allows agentic usage, where the LLM agent will automatically call the tool without any user interaction as required.[1]
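That “automatically call the tool” loop can be sketched in plain Python (the message shapes and the stub model are made up; real harnesses like Claude Code speak a richer protocol, but the control flow is the same):

```python
# Hypothetical, simplified agent loop. stub_model stands in for an LLM:
# a real model decides from context; this stub always requests one tool
# call, then answers.

def stub_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "read_file", "args": {"path": "main.rs"}}
    return {"type": "answer", "text": "main.rs defines fn main()."}

# A deterministic tool the harness exposes to the model.
TOOLS = {"read_file": lambda args: 'fn main() { println!("hi"); }'}

def run_agent(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = stub_model(messages)
        if reply["type"] == "tool_call":
            # The harness executes the tool automatically, no user input,
            # and feeds the result back into the context.
            result = TOOLS[reply["name"]](reply["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return reply["text"]

print(run_agent("What does main.rs do?"))
```

The agentic part is precisely that the loop, not the user, decides when a tool result goes back into the context.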

A prompt-only serving MCP server would indeed be silly. But for some people, even ^C ^V is indistinguishable from magic.[2]

100% agree. The field is now getting filled with code that no one can explain, written with tools that go outdated faster than frameworks on npm. Best practices are replaced by a rat race over who can ship the next feature the fastest (but please don’t open the lid and look at the innards; there be goblins!)

All of this will put a big maintenance burden on developers (see James Shore: You Need AI That Reduces Maintenance Costs). And as soon as the LLM/agent/AI prices stop getting subsidized and thus skyrocket, we will have to deal with the fallout.

I personally therefore keep the use of AI/LLMs to a minimum. Enough usage to know how I would tackle issues[3] and still learn about the current SOTA[4], but not too much to start losing my actual programming or writing skills.


  1. This is why I’m a proponent of read-only tools and resources, with the sole exception of source code; I don’t want my data deleted, and I certainly don’t want all data accessed automatically ↩︎

  2. ↩︎

  3. because people apparently think I know them well enough to help them with their assistant ↩︎

  4. state of the art ↩︎


I see. You’re correct. I didn’t read the MCP docs. I still need to look into this.

These just seem like really bad programming patterns, especially the string manipulation for prompts.

That was a quick response, I’ve edited my post to add a quip about C-c C-v and magic :grin:.

String manipulation for prompts? You mean the structured input?


I set up the systemcrafters site as a PWA, so I get notifications lol, but I should probably get up to the computer to reply with more detail.

I see this LLM & agentic programming going in a direction that increasingly

  • separates the programmers & individual product owners from their actual product
  • and replaces programmatic interfaces with these loosely defined interfaces for LLM & agentic interaction

I kinda understand. Deterministic functions called with non-deterministic input are ultimately non-deterministic (not trying to be overly pedantic, as I appreciate the feedback).

Can the agents introspect about the MCP gateways they’re given access to? I’m assuming so. That would definitely help.

Yeh. Most templated input will contain repeated subsequences of tokens that surround the template variables. The free-form input just isn’t great to program with. I’m not a fan of Jinja.

This is why I write every line of code manually. I also stopped using LLMs entirely because I felt that LLM hallucinations were slowing me down. I went back to using search engines that show me a list of search results.

Hallucinations cannot be fixed with the current LLM architecture.

For me to use AI, I need a fundamentally different architecture that actually understands concepts instead of predicting the next tokens.

Not every new technology is necessarily useful.

Yes, have you ever studied Laplace transforms? Do you know about control theory?

It doesn’t matter that there are hallucinations as long as they don’t destabilize the control loop

I have tried LLMs for months. LLMs weren’t useful enough for me.

I don’t need to understand it theoretically after I conclude that it’s not useful for me.

I can’t imagine a good use case for LLM in my life.
