LLM packages: which one to choose?

Hello,

In the coming month, the customer I work for will allow the use of LLMs (aka AI) for software development. They are currently reviewing some of them.

So, I am looking for a package to talk to those beasts.

Currently, I have found:

  • copilot, which is specific to one LLM, GitHub Copilot
  • gpt.el
  • ellama

From what I have seen, copilot provides auto-completion/auto-suggestions, while gpt.el and ellama seem to be more for “chatting” with LLMs.

  1. Can either gpt.el or ellama be used for auto-completion and/or auto-suggestions?
  2. Do you know of any other packages that allow interacting with LLMs?
  3. What are the differences between gpt.el, Ellama, and any similar packages you may know of?

gptel for general LLM interactions including doc generation, summarization, and a handful of other tasks.

aider.el for spec-based coding.

claude-code.el for architect mode + small code gen. It’s damn expensive if you need to generate a lot of code. IndyDevDan on YT has a good video on how to use Claude for architect mode but shunt code gen over to aider via MCP, where you can then configure whatever LLM you want for code. In his example he shunts over to a Gemini pre-release because, at the time of recording, it was free to use apart from Google’s usual surveillance.

You didn’t ask, but I find OpenRouter to be pretty darn useful for multiplexing access to models regardless of provider as well.
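
If it helps, gptel can also reach OpenRouter through its OpenAI-compatible endpoint; here is a minimal sketch along the lines of the gptel README (the key and model slugs below are just examples, check the README for current details):

;; Register OpenRouter as an OpenAI-compatible gptel backend (sketch only).
(gptel-make-openai "OpenRouter"
  :host "openrouter.ai"
  :endpoint "/api/v1/chat/completions"
  :stream t
  :key "sk-or-..."                        ;; placeholder for your OpenRouter key
  :models '(openai/gpt-4o                 ;; example model slugs
            anthropic/claude-3.5-sonnet))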

As the author of an LLM Chat AI Assistant for Emacs I think I should mention:

I’m currently in the process of looking at some form of autocomplete but of course for local LLMs you are a little limited on speed and accuracy.

Although ollama is half of the name, it can also link up to ChatGPT, Claude, Grok, and Gemini.

So, if I understand correctly, your package is for local LLMs, i.e. ones running on the PC itself?

As far as I know, in my situation, it will be one of the numerous LLMs in the cloud.

Sorry for the late reply,

I forgot to mention that I am completely illiterate with regard to LLMs; my first interaction with ChatGPT was last Friday, I guess. So I haven’t understood a lot from your explanation :worried:.

Searching quickly for architect mode, I only find references to it for aidermacs, not on the GitHub page of claude-code.el.

No worries.

As to architect versus other modes:

  1. Pretend you are the Lead Product Engineer, talking to your team of Staff/Senior Engineers about the product roadmap goals and capturing that info as a specifications document. You are talking to one or more LLMs in “architect mode”.

  2. Once the spec has been fully captured, the Staff + Sr Engineers break out with less senior engineers and start coding and testing. That’s the code-gen/coder mode. Often a different set of LLMs.

The various model providers don’t document that flow very well at all, nor that the products have special flags and configuration to enable that sort of flow.

The mentioned IndyDevDan channel on YT goes into that and is a good starting reference.

They are not really just for chatting. Ellama has a lot of nifty
functions for many use cases, along with a nice transient-menu
interface.

The author also wants to add auto-completion functionality.

So many possibilities and so little time to give each of them a test…

Thank you for all those explanations. I will have a talk with some of my colleagues to try to get a better view of what could be expected from me and those beasts. At least I have some leads to follow.

@montaropdf , what did you end up using? And what have been your impressions so far? As soon as Guix System finishes installing I’m going to start playing around with gptel and claude-code.el.

Not OP here, but I ended up with both copilot (at work) and gptel (at work and at home):

  • copilot has copilot-mode, which allows Copilot-driven autocompletion. The experience is better than on VSCode, since you can set copilot-idle-delay to a value that does not interrupt your train of thought.
  • gptel works with local models, i.e. ollama, and can be used for rewriting as well as chatting.

Copilot (in VSCode[1]) is a bit hit-and-miss. Sometimes the network requests are fast and you get a suggestion in the middle of a sentence or a function call. The suggestion seems almost right, but in the sense that an occasional turd almost seems like some molten chocolate. For me, personally, that’s distracting. But if you use it in Emacs with copilot-idle-delay set to 2, it acts only once your train of thought has stopped:

(use-package copilot
  :custom (copilot-idle-delay 2)       ; <--- less distractions
  :bind (:map copilot-completion-map
              ("C-<tab>" . #'copilot-accept-completion)
              ("C-n"     . #'copilot-next-completion)
              ("C-p"     . #'copilot-previous-completion)))

But Copilot as a service also provides a chat. And while I could use copilot-chat for that, I dislike that it has lots of external dependencies. Oof[2].

gptel on the other hand only uses curl and can also chat with Copilot and local LLMs, which allows me to run some chats offline with small models. Given that I use chats mostly as an addition to web searches or for quick reviews, it works fine for me.
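
In case it’s useful as a reference, registering a local Ollama backend in gptel is only a few lines; a minimal sketch, assuming Ollama’s default port (the model name is just an example of something you would have pulled beforehand):

;; Register a local Ollama backend for gptel (sketch only).
(gptel-make-ollama "Ollama"              ;; name shown in gptel's menu
  :host "localhost:11434"                ;; Ollama's default local port
  :stream t
  :models '(llama3.1:8b))                ;; example; use any model you pulled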


  1. our toolchain is really messy, thanks to MSVC, and I haven’t adapted it to allow Emacs yet ↩︎

  2. By the way, copilot needs a nodejs environment to run https://www.npmjs.com/package/@github/copilot-language-server, which is only somewhat open source. Yay. ↩︎


@ashraz , thanks for mentioning copilot-mode, I’ll have to give that a try.

gptel works with local models, i.e. ollama, and can be used for rewriting as well as chatting.

Just to clarify, were you suggesting that gptel also works with local models? I’m currently using it with ChatGPT, Claude, and Gemini through their respective APIs. I’m still working out the kinks but am optimistic about incorporating gptel into my workflows. If you haven’t tried this, I would recommend it. Here’s a demo Karthink put together a little more than a year ago to showcase some of the functionality.

I’m currently using Guix System in a UTM VM so I won’t try running any local models until I can set it up on bare metal. Looking forward to trying that, though. As an aside, kinda wish I would’ve picked up a Framework instead of a MacBook last year… :man_shrugging:

That bullet point might be a bit ambiguous, but you apparently missed my last paragraph :wink::

(And yeah, that might also be misunderstood as it only mentions Copilot and local LLMs, but I meant “also”)

But karthink’s demo is great for the uninitiated :slight_smile:


Apologies – I completely missed that :sweat_smile:

I have another question for you – have you had any issues with gptel completing requests? Or do you have any monitoring set up to measure success?

If you mean Copilot requests: well, I have only had some chats/usages of Copilot via gptel so far[1], but I have noticed that local models will sometimes not respond at all. There is also no error message from gptel though, so it might be ollama-specific.

I don’t have any monitoring/statistics set up, sorry.


  1. I don’t use Copilot privately, and I only got access ~3 weeks ago at work. ↩︎


No worries at all. I think I’ll try to set up something because I’ve been getting failures (without any resultant error messages) that I’d like to better troubleshoot.
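
One thing that might help here, though I haven’t verified it against my own failures yet: gptel reportedly has a logging option that dumps requests and responses into a log buffer, which should at least show whether a request went out at all. A sketch, assuming recent gptel versions expose these knobs:

;; Assumption: current gptel exposes these logging settings.
(setq gptel-log-level 'debug)   ;; or 'info for request metadata only
;; Requests and responses then show up in the *gptel-log* buffer.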

Hello,

So far, I haven’t given any of them a try. I am in a state of mind where I have issues concentrating on things, some insomnia, a general lack of interest in most things, and difficulty defining how to approach those beasts (sometimes I just need time to let the data percolate through the filters).

However, your notification and the replies that followed give me some momentum, and I am trying to get back on track.

So, I am currently trying LSP mode, in parallel with trying to understand LLMs and their possibilities.

I am investigating LSP, because:

  • I have news on what the customer is planning. They have explicitly required the LLM to be integrated into VSCode, because it is the most widely used IDE in the company. The service will not be available through OpenRouter. So there is a possibility I could be left stranded, unable to access the new system, or I will have to switch to VSCode.
  • Besides chatting, the more I read about and watch videos of lsp-mode, the less convinced I am that an LLM could be as good as LSP.
  • I also want to have a replacement tool in case the LLM service is not accessible, due to network issues for example. I want to be able to continue my work even in “degraded” mode.

Regarding LLMs, I have questions about some information in the table displayed here:

What does “run locally” or “local llamafile server” mean? Am I to understand that a database of a potentially enormous size will be deployed on my PC?

The documentation of gptel assumes the user is already knowledgeable about the ecosystem (no blame here).

I have the same kind of question regarding Ellama: when the author says to pull an Ollama model, what does a model contain and why do I need to download one?

And finally, when the table from the gptel link states “API key” or token, does it basically mean I need an account on that service? (I know, basic question.)

Three things:

  1. And finally, when the table from the gptel link states “API key” or token, does it basically mean I need an account on that service? (I know, basic question.)

Yes. I have a Google Workspace account and the process was fairly trivial. I think I just asked ChatGPT or Claude for instructions on how to get Gemini API access and it worked flawlessly.

Getting ChatGPT API access was a little bit more involved because I had to create an organization (separate from my user account), and I believe I had to do an identity verification with my phone’s camera (it was either OpenAI or Anthropic which required that, can’t remember right now.)

Claude API access was also very straightforward.

One thing to keep in mind is that (for now) I think it’s free to access Gemini via the API, but you have to buy credits with OpenAI and Anthropic.
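
For what it’s worth, once you have the keys, wiring them into gptel is just a matter of registering backends; a rough sketch following the gptel README (the key strings are placeholders, swap in your own key handling):

;; Sketch only: register cloud backends in gptel once you have API keys.
(gptel-make-gemini "Gemini"
  :key "YOUR-GEMINI-KEY"       ;; placeholder
  :stream t)

(gptel-make-anthropic "Claude"
  :key "YOUR-ANTHROPIC-KEY"    ;; placeholder
  :stream t)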

  2. I have the same kind of question regarding Ellama: when the author says to pull an Ollama model, what does a model contain and why do I need to download one?

I don’t understand what you’re asking. Could you clarify?

  3. What does “run locally” or “local llamafile server” mean? Am I to understand that a database of a potentially enormous size will be deployed on my PC?

I’m not certain about this, so I just asked ChatGPT. Here’s my prompt and the answer it generated. Hope this helps.

Prompt:

Referencing the “Setup” table for this repository ( GitHub - karthink/gptel: A simple LLM client for Emacs ), what does it mean to “run locally“ or what is a “Local Llamafile server“? What are the relevant system requirements or any other important considerations I should make about this?

Answer: (Note: keep in mind some of this information is specific to my environment because of ChatGPT’s memory.)

Here’s what that “Setup” table is saying in practical terms:

What “run locally” means in gptel

In gptel’s context, “run locally” means you’re pointing Emacs at an LLM server process that’s running on your own machine (e.g., Ollama, GPT4All, Llama.cpp/llamafile, Open WebUI). gptel then talks to that local HTTP endpoint (usually http://localhost:…) instead of a cloud API. The manual’s setup matrix lists these options explicitly—e.g., “Ollama running locally,” “Llama.cpp running locally,” “Local Llamafile server,” etc.

What a “Local Llamafile server” is

llamafile is Mozilla’s single-file distribution of an LLM (weights + runner) built on llama.cpp and Cosmopolitan Libc. When you start a llamafile with --server, it spins up:

  • a local web UI on http://127.0.0.1:8080/, and

  • an OpenAI-compatible API (/v1/chat/completions, etc.), entirely on your machine.

Typical bring-up looks like:

chmod +x Llama-3.2-3B-Instruct.Q6_K.llamafile
./Llama-3.2-3B-Instruct.Q6_K.llamafile --server   # add --v2 for the newer server
# browse http://localhost:8080 or hit the API at http://localhost:8080/v1/...

The repo documents the OpenAI-compatible endpoint and shows curl examples targeting http://localhost:8080/v1/chat/completions.

Wiring gptel to a local llamafile

Point gptel at the local OpenAI-compatible endpoint:

(setq gptel-backend
      (gptel-make-openai "Llamafile"
        :protocol "http"
        :host "localhost:8080"
        :endpoint "/v1/chat/completions"
        :models '(LLaMA_CPP)       ;; model name as exposed by the server
        :stream t
        :key (lambda () "no-key"))) ;; llamafile doesn’t require a real key

This matches the manual’s “Llamafile — Local Llamafile server” backend entry.

System requirements & important considerations

  • OS/CPU/GPU support. llamafile bundles llama.cpp with Cosmopolitan so a single executable runs on macOS, Linux, BSD, and Windows (with caveats). It includes both AMD64 and ARM64 builds internally.

  • GPU acceleration on Apple Silicon. Since Cosmopolitan uses static linking, GPU support is compiled at runtime. On macOS you need Xcode Command Line Tools installed so it can build the Metal kernel (ggml-metal.m) the first time.

  • Model sizes & memory. You choose a specific llamafile (e.g., 1B/3B/8B/12B) and quantization level (Q4, Q6, etc.). Download size and runtime RAM/VRAM needs scale with model/quant. The project lists example files (e.g., Llama-3.1-8B Q4 ≈ 5.23 GB on disk). Plan disk + RAM accordingly.

  • Ports & binding. Default server listens on localhost:8080. Keep it on 127.0.0.1 unless you intentionally expose it. The server provides an OpenAI-compatible surface (chat completions; embeddings on the v2 server).

  • Sandboxing behavior. llamafile adds pledge()/SECCOMP sandboxing (on some platforms). It limits what the embedded HTTP server can access after start; details are documented in the repo.

  • Windows file-size limit. Windows has a 4 GB executable limit; larger models run by using external GGUF weights instead of fully self-contained executables. (Not an issue on your macOS host, but relevant cross-platform.)

  • “No cloud key” workflow. Because the API is OpenAI-compatible, most OpenAI-client code (including gptel’s OpenAI backend) works by changing base URL and using a dummy key.

TL;DR

  • “Run locally” = use a self-hosted backend on localhost (no cloud).

  • “Local Llamafile server” = start a single binary that exposes a web UI and an OpenAI-compatible API at http://127.0.0.1:8080, then point gptel at it.

Yes and no. Running locally means that you run a model, well, locally. This obviously means that you have to have the whole model on your local machine. How “enormous” that is depends completely on the model, and which model you choose depends completely on your requirements.

If all you need is some chat bot, a small model like gemma3:1b might already be good enough for you and clocks in at only ~1 GB. Or maybe tinyllm. That model needs to be interfaced by something, and that’s where applications like Ollama or llamafile come in. Both provide an HTTP-based API on some port.

That being said:

What do you even mean by that? Are you setting up the LLM for your customer? And they want to use it in VSCode? If they have the financial means, just have them use Copilot Professional via GitHub Enterprise and you’re done. If they’re telling you that you have to use an LLM via VSCode, well, then it’s probably Copilot, you will get a GitHub Enterprise account, and you can use the Emacs copilot-mode (a third-party package) for it.

I am not the one who will set up the LLM, but it has been made clear that one of the requirements is integration with VSCode.

I just fear not being able to access it with Emacs when the system becomes available, and consequently suffering a loss of productivity compared to the VSCode users, and being pushed to leave Emacs for VSCode.

As for what I want to do with LLMs, honestly, I don’t know. I feel kind of forced to use them, because everybody around me is using them or at least has some knowledge of them, and there is this project to offer an officially sanctioned LLM to the developers (at least). So I want to see how I can, and whether I should, integrate that tool into my Emacs workflow. Given that I am an “emacs years old” IT worker, learning new “tricks” becomes more and more time consuming, so I want to be sure I am not wasting my time and have enough information to understand what I am “playing” with. And frankly, I have broader questions about my way of working and which tools I use and how I use them, but let’s stay focused on the original topic, which is broad enough already.

I can’t help you with the integration question, but I suspect you just need access to the REST API and you’re set.
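
Assuming the corporate service exposes an OpenAI-compatible REST API (that part is pure speculation on my side, as are the host, path, token, and model names below), pointing gptel at it would look roughly like this:

;; Hypothetical internal endpoint; everything here is a placeholder.
(gptel-make-openai "Company-LLM"
  :host "llm.internal.example.com"
  :endpoint "/v1/chat/completions"
  :stream t
  :key (lambda () (getenv "COMPANY_LLM_TOKEN"))  ;; however you store the token
  :models '(company-model))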

From my experience, if it’s ChatGPT quality, you can use it for boilerplate, unit tests, docstrings, and other apprentice-quality code generation. Be wary, as every line of code is a liability…

I also use it as a dialogue partner to help me think, by forcing me to formulate the question and sometimes getting useful answers. Setting the personality is important, as I need a critic and not an enthusiastic flatterer for this :wink:

It’s also helpful for searches where I’m lacking the domain-specific lingo. Don’t forget to set the personality to quote sources.

It’s also good at reformulating text for smoother reading or for a different audience. You can also ask it critical questions about your text, or try to let it generate text for a given audience and usage based on a couple of catchphrases.