I defined this service in my Guix Home configuration (it's basically a wrapper for llama-server --fim-qwen-3b-default):
(simple-service 'llama-server
                home-shepherd-service-type
                (list (shepherd-service
                       (provision '(llama-server))
                       (start #~(make-forkexec-constructor
                                 (list "llama-server"
                                       "--fim-qwen-3b-default" "-v")
                                 #:log-file "/tmp/llama-server.log"))
                       (stop #~(make-kill-destructor)))))
The service works, but I get 1.5 t/s (tokens per second), whereas calling the command directly on the command line gives ~30 t/s.
Is my service definition wrong? Do I need to allocate more resources to Shepherd?
dgr
July 16, 2025, 4:25pm
Maybe the stdout/stderr logging is slowing it down.
Can you remove -v (if it stands for verbose) and the #:log-file for testing?
Thanks for the reply. It is still just as slow without the -v and #:log-file.
Adding environment variables helps a lot, but I can’t get access to the GPU from shepherd (whereas from the CLI I can):
(simple-service 'llama-server
                home-shepherd-service-type
                (list (shepherd-service
                       (provision '(llama-server))
                       (start #~(make-forkexec-constructor
                                 (list #$(file-append llama-cpp "/bin/llama-server")
                                       "--fim-qwen-3b-default"
                                       "--host" "127.0.0.1"
                                       "--port" "8012"
                                       "--verbose")
                                 #:environment-variables
                                 (list "HOME=/home/juanpablo"
                                       "XDG_CACHE_HOME=/home/juanpablo/.cache"
                                       "XDG_CONFIG_HOME=/home/juanpablo/.config"
                                       "XDG_DATA_HOME=/home/juanpablo/.local/share"
                                       "XDG_RUNTIME_DIR=/run/user/1000")
                                 #:log-file "/home/juanpablo/.local/state/log/llama-server.log"))
                       (stop #~(make-kill-destructor)))))
I'm now at approximately 10 tokens/second, but calling it from the CLI with the GPU gives 50 tokens/second.
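If the remaining gap is GPU access, one possibility (an untested sketch, not from the thread) is to start from Shepherd's inherited environment via default-environment-variables and append only the missing pieces, instead of enumerating every variable by hand. The LD_LIBRARY_PATH value below is purely an assumed placeholder; point it at wherever the GPU driver libraries actually live on your system:

```scheme
;; Hypothetical variant (untested sketch): inherit shepherd's own environment
;; and append only what is missing.
(start #~(make-forkexec-constructor
          (list #$(file-append llama-cpp "/bin/llama-server")
                "--fim-qwen-3b-default"
                "--host" "127.0.0.1"
                "--port" "8012")
          #:environment-variables
          (append (default-environment-variables)
                  ;; Assumed placeholder path; adjust to wherever your
                  ;; GPU driver libraries are found in the interactive shell.
                  (list "LD_LIBRARY_PATH=/run/current-system/profile/lib"))
          #:log-file "/home/juanpablo/.local/state/log/llama-server.log"))
```

Comparing the working CLI environment with the service's environment should reveal which variables the GPU path actually depends on.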
dgr
August 6, 2025, 7:08am
The classic: not the same env.
Maybe ask in Guix IRC?
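Before asking, it may help to see exactly how the two environments differ. A rough sketch (file paths are assumed; this compares the interactive shell against a process's environment as recorded in /proc):

```shell
# Sketch: dump the interactive shell's environment, then the environment a
# running process actually sees (via /proc), and diff the two.
env | sort > /tmp/cli.env

# $$ is this shell's own PID, used here so the sketch is self-contained;
# substitute the llama-server PID, e.g. pid=$(pgrep -f llama-server).
pid=$$
tr '\0' '\n' < "/proc/$pid/environ" | sort > /tmp/service.env

# diff exits non-zero when the environments differ, which is the
# interesting case here, so don't let it abort a `set -e` script.
diff /tmp/cli.env /tmp/service.env || true
```

Variables present in /tmp/cli.env but missing from /tmp/service.env are candidates to add to #:environment-variables in the service definition.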
Since it involves the Nvidia GPU component for acceleration, can I ask it in the Guix IRC, or should it be the non-Guix IRC?
dgr
August 8, 2025, 7:23am
If you keep it general, about how to figure out the difference in the environments, then the Guix IRC could work; but if you focus on the specifics of the Nvidia GPU, it would be the non-Guix channel.