Convenient LLMs for Home
For a while, I wanted to experiment with LLMs on my homelab but didn’t like the overhead of a GPU machine or the slowness that CPU processing brings. I also wanted to make everything convenient in the long run: updates had to be automated and if my OS died, rebuilding it would have to be quick and simple.
Running NixOS on my gaming computer with WSL seemed like the ideal solution. However, I ran into several challenges.
- Concerns about my VRAM staying locked up by loaded LLMs.
- WSL shutting down automatically; Microsoft does not support keeping WSL running when you aren't actively using it.
- NixOS on WSL did not support Nvidia out of the box.
- It was not worth it to me to manage a separate Ubuntu machine that would require reconfiguring everything from scratch.
I spent a few weeks hacking at it and have now solved these blockers.
- Ollama unloads models by default if they haven't been used in the last 5 minutes (see the sketch after this list for tuning this).
- WSL starts automatically and remains active.
- The Nvidia Container Toolkit is configured on WSL.
- The Ollama container is configured for NixOS.
- NixOS manages the configuration of the entire system, so rebuilding is easy.
- My NixOS flake is already configured to update automatically, and my WSL system will inherit that.
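If you want models held in VRAM for a different idle window, the default unload time can be tuned. Here is a minimal sketch, assuming the Ollama container defined later in this post and the standard OLLAMA_KEEP_ALIVE environment variable (the 15m value is just an illustration):

```nix
# Hypothetical tweak to the Ollama container defined later in this post:
# keep models loaded for 15 minutes of idle time instead of Ollama's default 5.
virtualisation.oci-containers.containers."ollama".environment = {
  "OLLAMA_KEEP_ALIVE" = "15m";
};
```

A negative value keeps models loaded indefinitely, which brings the VRAM concern right back, so I leave the default alone.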
Although some of the information here is general, this guide is heavily NixOS-focused. I rely heavily on Tailscale for networking convenience, so there are also some optional Tailscale steps.
Live configuration that I actively use at home, just for reference:
- Whole NixOS flake
- Nvidia module
- Ollama container
- Open-WebUI
- Tailscale sidecar container
- Tailscale ACL
Force WSL not to stop
I found an OS-independent fix for the biggest problem. This GitHub post is a good place to start. If you are using Ubuntu on WSL, running

```
wsl --exec dbus-launch true
```

will launch WSL and keep it running. You can set up a basic task in Windows Task Scheduler that runs this command automatically at startup. Set it to run whether the user is logged on or not.
I found that this didn't work for NixOS on WSL, as the --exec option seemed to have issues. So I set it up like this instead:

```
wsl.exe dbus-launch true
```

For NixOS, this means the shell runs in the background. That is less ideal than --exec on Ubuntu, but I'll take what I can get.
Installation of NixOS on WSL
NixOS meets most of my long-term convenience needs. It lets me configure the entire system, including Nvidia, networking, and containers, which makes re-deploying everything easy. My NixOS flake is already configured for automatic weekly updates via GitHub, and all of my NixOS hosts automatically pull those updates and rebuild. My WSL install will inherit these benefits.
There are alternative ways to achieve this; if you prefer, updates can also be automated on a single NixOS machine, as sketched below.
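For reference, a minimal sketch of what that could look like on a standalone machine, assuming a channel-based setup rather than my flake (the option names come from the stock NixOS system.autoUpgrade module):

```nix
# Single-machine auto-update sketch (assumes channels rather than my flake setup)
system.autoUpgrade = {
  enable = true;       # periodically runs nixos-rebuild switch --upgrade
  dates = "weekly";    # systemd calendar expression for the update timer
  allowReboot = false; # don't reboot automatically after an upgrade
};
```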
Follow the steps in the NixOS-WSL GitHub README to get started:

- Enable WSL, if you haven't already done so:

```
wsl --install --no-distribution
```

- Download nixos.wsl from the latest release.
- Double-click the file you downloaded (requires WSL >= 2.4.4).
- Now you can run NixOS. Set it as the default distribution.
Basic NixOS Configuration
Enter WSL and navigate to /etc/nixos/ to configure NixOS. There you will find a configuration.nix that holds the configuration for the entire system. It is very basic, but we will add a few essentials to make things easier. You'll have to use nano until the first rebuild has completed. Tailscale is a networking tool I use; it's not required.
```nix
environment.systemPackages = [
  pkgs.vim
  pkgs.git
  pkgs.tailscale
  pkgs.docker
];
services.tailscale.enable = true;
wsl.useWindowsDriver = true;
nixpkgs.config.allowUnfree = true;
```
Now run:

```
sudo nix-channel --update
sudo nixos-rebuild switch
```
Nvidia Container Toolkit on WSL

Next, configure the Nvidia driver and the Nvidia Container Toolkit so containers can reach the GPU:

```nix
services.xserver.videoDrivers = [ "nvidia" ];
hardware.nvidia.open = true;

environment.sessionVariables = {
  CUDA_PATH = "${pkgs.cudatoolkit}";
  EXTRA_LDFLAGS = "-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib";
  EXTRA_CCFLAGS = "-I/usr/include";
  LD_LIBRARY_PATH = [
    "/usr/lib/wsl/lib"
    "${pkgs.linuxPackages.nvidia_x11}/lib"
    "${pkgs.ncurses5}/lib"
  ];
  MESA_D3D12_DEFAULT_ADAPTER_NAME = "Nvidia";
};

hardware.nvidia-container-toolkit = {
  enable = true;
  mount-nvidia-executables = false;
};

# Generate the CDI spec so Docker can expose the GPU via --device=nvidia.com/gpu=all
systemd.services.nvidia-cdi-generator = {
  description = "Generate nvidia cdi";
  wantedBy = [ "docker.service" ];
  serviceConfig = {
    Type = "oneshot";
    ExecStart = "${pkgs.nvidia-container-toolkit}/bin/nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml";
  };
};

virtualisation.docker = {
  enable = true;
  daemon.settings = {
    features.cdi = true;
    cdi-spec-dirs = [ "/etc/cdi" ];
  };
};
```
Do another nixos-rebuild switch and restart WSL.
You should now be able to run nvidia-smi and see your GPU. You'll need to run your Docker containers with --device=nvidia.com/gpu=all for them to reach the GPU.
I did not discover these fixes on my own; I pieced this information together from these two GitHub issues:
- https://github.com/nix-community/NixOS-WSL/issues/454
- https://github.com/nix-community/NixOS-WSL/issues/578
Configure the Ollama Container
To make networking easier, I've set up an example Ollama container and an optional Tailscale Docker container to pair with it. If you use the Tailscale container, uncomment that code and the --network option, add your Tailscale domain, and comment out the ports and networking.firewall lines for the Ollama container. Tailscale Serve is already configured to serve the Ollama HTTP API at https://ollama.${YOUR_TAILSCALE_DOMAIN}.ts.net for those who use the Tailscale container.
```nix
virtualisation.oci-containers.backend = "docker";
virtualisation.docker = {
  enable = true;
  autoPrune.enable = true;
};

systemd.tmpfiles.rules = [
  "d /var/lib/ollama 0755 root root"
  #"d /var/lib/tailscale-container 0755 root root"
];

networking.firewall.allowedTCPPorts = [ 11434 ];

virtualisation.oci-containers.containers = {
  "ollama" = {
    image = "docker.io/ollama/ollama:latest";
    autoStart = true;
    environment = {
      "OLLAMA_NUM_PARALLEL" = "1";
    };
    ports = [ "11434:11434" ];
    volumes = [ "/var/lib/ollama:/root/.ollama" ];
    extraOptions = [
      "--pull=always"
      "--device=nvidia.com/gpu=all"
      #"--network=container:ollama-tailscale" # uncomment when using the Tailscale sidecar
    ];
  };
  #"ollama-tailscale" = {
  #  image = "ghcr.io/tailscale/tailscale:latest";
  #  autoStart = true;
  #  environment = {
  #    "TS_HOSTNAME" = "ollama";
  #    "TS_STATE_DIR" = "/var/lib/tailscale";
  #    "TS_SERVE_CONFIG" = "/config/tailscaleCfg.json";
  #  };
  #  volumes = [
  #    "/var/lib/tailscale-container:/var/lib"
  #    "/dev/net/tun:/dev/net/tun"
  #    "${pkgs.writeTextFile {
  #      name = "ollamaTScfg";
  #      text = ''
  #        {
  #          "TCP": {
  #            "443": {
  #              "HTTPS": true
  #            }
  #          },
  #          "Web": {
  #            # replace this with YOUR tailscale domain
  #            "ollama.${YOUR_TAILSCALE_DOMAIN}.ts.net:443": {
  #              "Handlers": {
  #                "/": {
  #                  "Proxy": "http://127.0.0.1:11434"
  #                }
  #              }
  #            }
  #          }
  #        }
  #      '';
  #    }}:/config/tailscaleCfg.json"
  #  ];
  #  extraOptions = [
  #    "--pull=always"
  #    "--cap-add=net_admin"
  #    "--cap-add=sys_module"
  #    "--device=/dev/net/tun:/dev/net/tun"
  #  ];
  #};
};
```
One more nixos-rebuild switch and your Ollama container should start.
Testing and Networking
If you are using Tailscale:

- The Tailscale container must be set up before both containers will work.
- Exec into the Tailscale container and bring Tailscale up:

```
sudo docker exec -it ollama-tailscale sh
tailscale up
```

- Use the link it prints to add the node to your Tailnet.
- Exec into the Ollama container to pull a model:

```
sudo docker exec -it ollama ollama run gemma3
```

- Run a test prompt and verify with nvidia-smi that the GPU is being used.
- Test the API from another Tailscale-connected device:
```
curl https://ollama.$YOUR_TAILSCALE_DOMAIN.ts.net/api/generate -d '{
  "model": "gemma3",
  "prompt": "test",
  "stream": false
}'
```
If NOT using Tailscale:

- Exec into the Ollama container to pull a model:

```
sudo docker exec -it ollama ollama run gemma3
```

- Run a test prompt and verify with nvidia-smi in WSL that the GPU is in use.
- Ollama listens on port 11434 inside WSL. To expose it on your network, follow this guide; the tl;dr is to add

```nix
"OLLAMA_HOST" = "0.0.0.0";
"OLLAMA_ORIGINS" = "*";
```

to the Ollama container's environment in the NixOS configuration.
- Use the ifconfig command to find your WSL IP address, which is usually listed under eth0.
- Create firewall rules on Windows using PowerShell with admin rights:
```
New-NetFireWallRule -DisplayName 'WSL firewall unlock' -Direction Outbound -LocalPort 11434 -Action Allow -Protocol TCP
New-NetFireWallRule -DisplayName 'WSL firewall unlock' -Direction Inbound -LocalPort 11434 -Action Allow -Protocol TCP
```
- Forward the port from Windows to WSL:

```
netsh interface portproxy add v4tov4 listenport=11434 listenaddress=0.0.0.0 connectport=11434 connectaddress=$WSL-IP-ADDRESS
```

Replace $WSL-IP-ADDRESS with the WSL IP address you found.
- Ollama should now be reachable on your LAN via your Windows machine's IP, for example http://192.168.1.123:11434. Test the API from another device:

```
curl http://WINDOWS-LAN-IP:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "test",
  "stream": false
}'
```
Done!
You can now connect your Ollama API to anything you like, for example Open-WebUI.
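To close the loop, here is a minimal sketch of how an Open-WebUI container could be pointed at this Ollama API on the same NixOS host; the image tag, port mapping, volume path, and OLLAMA_BASE_URL value are assumptions to adapt to your own setup:

```nix
# Hypothetical Open-WebUI container talking to the Ollama API from this post
virtualisation.oci-containers.containers."open-webui" = {
  image = "ghcr.io/open-webui/open-webui:main";
  autoStart = true;
  environment = {
    # Point this at your Ollama endpoint: the Tailscale URL from above,
    # or http://WINDOWS-LAN-IP:11434 if you exposed it without Tailscale.
    "OLLAMA_BASE_URL" = "https://ollama.YOUR_TAILSCALE_DOMAIN.ts.net";
  };
  ports = [ "3000:8080" ]; # Open-WebUI listens on 8080 inside the container
  volumes = [ "/var/lib/open-webui:/app/backend/data" ];
  extraOptions = [ "--pull=always" ];
};
```

If you go this route, you would also want a systemd.tmpfiles.rules entry for /var/lib/open-webui, mirroring the Ollama volume earlier in this post.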