If you’ve been following along, in a previous post I covered how to run Ollama locally with the Qwen2.5:1.5b model. This time we’re taking it a step further by using Ollama to run Opencode, an open-source AI coding assistant that lives entirely in your terminal.
Opencode gives you an AI pair-programming experience similar to tools like GitHub Copilot or Cursor, but running locally on your own machine with no API costs and full privacy.
Now the big question: was it worth it? Can you get useful AI-assisted coding out of the Qwen3:8b model? Well, I’ll leave that for you to try and figure out on your own. :)
What is Opencode?
Opencode is an open-source AI coding assistant that runs in your terminal. It can help with code generation, refactoring, debugging, and general programming questions, all powered by a local LLM through Ollama. Think of it as having an AI coding buddy right in your terminal session.
One important note: Opencode recommends using a model with at least a 64k token context window, which is why Qwen3:8b is a solid choice here.
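Keep in mind that Ollama’s default context window is much smaller than 64k, so it’s worth raising it on the server side. Here’s a minimal sketch, assuming a recent Ollama version that supports the OLLAMA_CONTEXT_LENGTH environment variable (older versions need num_ctx set per request instead):

# Start the Ollama server with a 64k context window
OLLAMA_CONTEXT_LENGTH=65536 ollama serve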
The Easiest Way: ollama launch
Ollama recently introduced the launch command, which makes running Opencode incredibly simple. You don’t need to install Opencode separately or configure anything manually. Just run:
ollama launch opencode
That’s it. Ollama will prompt you to choose a model. Select Qwen3:8b and you’re up and running.
If you haven’t pulled the model yet, Ollama will download it for you. The Qwen3:8b model is roughly 5GB, so the first launch will take a few minutes depending on your connection.
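If you’d rather download the model ahead of time, you can pull it explicitly before launching; both of the commands below are standard Ollama CLI:

# Pull the model first, then launch Opencode against it
ollama pull qwen3:8b
ollama launch opencode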
If you want to go straight to configuration without launching:
ollama launch opencode --config
This opens the configuration file at ~/.config/opencode/opencode.json, where you can tweak settings if needed.
Manual Setup (Alternative)
If you prefer setting things up manually, you can install Opencode directly:
curl -fsSL https://opencode.ai/install | bash
Then configure it to point to your local Ollama instance by editing ~/.config/opencode/opencode.json:
{
  "provider": {
    "id": "@ai-sdk/openai-compatible",
    "options": {
      "baseURL": "http://localhost:11434/v1"
    }
  },
  "model": "qwen3:8b"
}
Make sure Ollama is running (ollama serve) before starting Opencode.
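To sanity-check that Opencode can actually reach Ollama, you can query the OpenAI-compatible endpoint the config points at; /v1/models is part of Ollama’s OpenAI-compatible API and lists the models available to clients:

# Should return a JSON list that includes qwen3:8b
curl http://localhost:11434/v1/models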
Why Qwen3:8b?
In my previous post, I used the much smaller Qwen2.5:1.5b model which worked great for simple queries. However, for a coding assistant like Opencode that needs to understand larger chunks of code and maintain context across a conversation, the 8 billion parameter Qwen3 model is a better fit:
- Larger context window to handle real code files and project context
- Better reasoning for complex coding tasks like refactoring and debugging
- Thinking mode where the model shows its reasoning process before answering (you can see this in the verbose output from the previous post, where Qwen3:8b reasons through its answer); a quick way to try it yourself is sketched below
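Here’s a minimal way to see that thinking output outside of Opencode, assuming a recent Ollama release that supports the think option on its chat API for thinking models like Qwen3:

# Ask for the model's reasoning alongside its answer
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "think": true,
  "stream": false
}'

If your Ollama version supports it, the response message should carry a separate thinking field next to the final answer content.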
The trade-off is speed and resource usage. On my old laptop with the NVIDIA GeForce GTX 960M, Qwen3:8b ran at about 3.17 tokens/s, which is quite slow for interactive use. If you have a more modern GPU or an Apple Silicon Mac, the experience will be significantly better.
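If you want to benchmark your own setup, ollama run prints timing stats, including the eval rate in tokens/s, when you pass --verbose:

# The stats printed after the response include "eval rate" in tokens/s
ollama run qwen3:8b --verbose "Write a Python function that reverses a string"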
Quick Tips
- Check your hardware: For a usable interactive experience with Qwen3:8b, you’ll want a GPU with at least 8GB VRAM or an Apple Silicon Mac with 16GB+ unified memory
- Model management: Use ollama list to see your installed models and ollama rm <model> to free up space
- Try different models: If Qwen3:8b is too heavy for your hardware, you could try smaller models, though the coding experience may be less impressive; see the commands after this list
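For example, here’s what swapping down to a smaller Qwen3 variant might look like (qwen3:4b is one of the smaller tags in the Ollama library):

# See what's installed and how much disk it uses
ollama list
# Pull a smaller Qwen3 variant
ollama pull qwen3:4b
# Optionally remove the 8B model to reclaim space
ollama rm qwen3:8b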
The ollama launch command is a great move by the Ollama team. It removes the friction of configuring tools manually and makes it simple to get started with AI-powered development tools running entirely on your own machine.
Cheers,
Rodolfo