Ollama Local LLM Guide: Installation, Model Setup, and Editor Integration

“Why did the OpenAI API fee come out so high this month?” “Is it really safe to copy and paste my company’s internal code into ChatGPT?”

In an era where AI assistants have become a development necessity, cost and security are the twin worries lurking behind the convenience. If you have ever felt uneasy sending your company's core code over the network while watching the token fees pile up every time, it is time to look local.

As long as your MacBook or Windows PC has enough spare RAM (or VRAM), you can run a surprisingly capable personal AI entirely offline. This guide walks through standing up a local LLM environment with Ollama, a tool that makes the whole process as easy as Docker, and integrating it into a real development workflow.


💻 0. Hardware Requirements: Will It Run on My Computer?

The key to running LLMs locally is memory: system RAM, or VRAM if you have a discrete GPU. The required memory scales with the model's parameter count.

| Model Size | Minimum RAM | Recommended Specs | Example Models |
| --- | --- | --- | --- |
| 3B or less | 4GB | 8GB | Phi-3, Qwen 1.5B |
| 7B ~ 8B | 8GB | 16GB+ | Llama 3.1, Mistral, Gemma 2 |
| 14B ~ 30B | 16GB | 32GB+ | Command R, Qwen 14B |
| 70B or more | 64GB | 128GB+ | Llama 3.1 70B |

> [!TIP]
> Apple Silicon Macs (M1, M2, M3) use Unified Memory, which makes them especially well suited to running local LLMs. We strongly recommend 16GB or more of RAM.
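As a quick sanity check, the minimum-RAM column of the table above can be turned into a tiny shell helper that maps your machine's memory to a model tier. The function name `pick_model_tier` is purely illustrative, not part of Ollama:

```shell
# Hypothetical helper: suggest a model size tier for a given amount of RAM (in GB),
# based on the minimum-RAM column of the table above.
pick_model_tier() {
  ram_gb=$1
  if [ "$ram_gb" -ge 64 ]; then
    echo "70B or more"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "14B ~ 30B"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "7B ~ 8B"
  else
    echo "3B or less"
  fi
}

# Total RAM in GB: on macOS, $(( $(sysctl -n hw.memsize) / 1073741824 )); on Linux, free -g
pick_model_tier 16   # → 14B ~ 30B
```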


📦 1. Installing Ollama and Running Basic Models

Ollama ships ready-made packages for each operating system, so installation is little more than clicking a button.

  1. Download and install the installer suitable for your environment from the official website (ollama.com). (In a macOS environment, brew install ollama is also possible.)
  2. Open the terminal and try running Meta's Llama 3.1, known for its excellent quality-to-size ratio.
```shell
# Meta's 8B-parameter model (runs comfortably on a PC with about 8GB of memory)
ollama run llama3.1

# If you need a model specialized for Korean (e.g., Yanolja's EEVE)
ollama run eeve-korean
```

The first time you run the command, Ollama downloads the model weights, a file of several gigabytes (GB); as soon as the download finishes, a REPL opens where you can enter prompts.
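Beyond `run`, a few everyday commands from the standard Ollama CLI are worth knowing for managing the models that accumulate on disk:

```shell
ollama pull llama3.1      # download a model without starting a chat
ollama list               # show downloaded models and their sizes on disk
ollama ps                 # show models currently loaded in memory
ollama rm eeve-korean     # delete a model to free disk space
```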

⚙️ 2. Creating Your Own Custom Model (Modelfile)

The real power of Ollama lies in the Modelfile, which works much like a Dockerfile. You can bake your preferred defaults and system prompt into a standalone model, so you never have to paste them in at the start of every session.

Create a text file named Modelfile in your working directory with the following contents.

```
# Specify the base model
FROM llama3.1

# Lower randomness (temperature) - coding questions need consistent answers!
PARAMETER temperature 0.3

# Increase the context window (in tokens)
PARAMETER num_ctx 4096

# System prompt (persona)
SYSTEM """
You are a senior backend developer with 15 years of experience and my pair programming partner.
Skip unnecessary introductions and conclusions; answer mainly with core code and comments.
Answer in English.
"""
```

Build and run this file in the terminal.

```shell
# Build the model under the name 'my-senior-dev'
ollama create my-senior-dev -f ./Modelfile

# Run your own custom model!
ollama run my-senior-dev
```
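You can confirm that the custom settings were baked in with `ollama show`, which prints a model's parameters and system prompt (the flags below exist in recent Ollama versions):

```shell
# Inspect the assembled model (requires the model built above)
ollama show my-senior-dev

# Or print just the generated Modelfile
ollama show --modelfile my-senior-dev
```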

🔌 3. Integrating into the Development Workflow

While Ollama is running in the background, it serves a REST API at http://localhost:11434, and it also exposes OpenAI-compatible endpoints under /v1, so most OpenAI client libraries can simply be pointed at it. This opens up integration in all sorts of places.

```mermaid
sequenceDiagram
    participant User as Developer (Client)
    participant IDE as VS Code / Cursor
    participant Ollama as Ollama API Server
    participant Model as LLM Model (Llama 3.1)

    User->>IDE: Write code and ask questions
    IDE->>Ollama: HTTP POST /api/generate
    Ollama->>Model: Inference request
    Model-->>Ollama: Generated text result
    Ollama-->>IDE: Return JSON response
    IDE-->>User: Output result
```

A. Integration with Terminal CLI (Setting for CLI lovers)

It pairs well with the terminal tools covered in previous posts. You can also hit the API directly with curl.

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "my-senior-dev",
  "prompt": "Write a simple Python array sorting code",
  "stream": false
}'
```
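Because Ollama also exposes OpenAI-compatible endpoints under `/v1`, the same request can be made in the chat-completions format that most OpenAI client libraries expect (no real API key is required; any placeholder string works):

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-senior-dev",
    "messages": [
      {"role": "user", "content": "Write a simple Python array sorting code"}
    ]
  }'
```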

B. VSCode / Cursor Editor Integration (Continue Extension)

Install the Continue.dev extension, a mainstay of the free coding-assistant ecosystem, and you can run code review, refactoring, and autocompletion at no cost by selecting your Ollama model in the editor's side panel. Just set the model provider to ollama in Continue's config.json. Because requests never leave localhost, your company's code stays on your machine.
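For reference, a minimal Continue config.json entry pointing at the custom model built earlier might look like this (the `title` is arbitrary, and `apiBase` is only needed if Ollama is not on its default port; note that newer Continue versions are migrating to a config.yaml format):

```json
{
  "models": [
    {
      "title": "Local Senior Dev",
      "provider": "ollama",
      "model": "my-senior-dev",
      "apiBase": "http://localhost:11434"
    }
  ]
}
```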

C. Launching Your Own Web GUI (Open WebUI)

If you would rather chat with local models in a polished web UI like ChatGPT's instead of the terminal, we strongly recommend launching Open WebUI (formerly Ollama WebUI) with Docker.

```shell
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```

Open localhost:3000 in a browser and you get a ChatGPT-style interface driven entirely by your own PC's hardware.


💡 Tips for Getting the Most Out of a Local LLM

  • Environment Variable Management: Set OLLAMA_HOST=0.0.0.0 and other devices on the same network can use your machine's models (Ollama listens only on localhost by default).
  • Performance Monitoring: Watch memory pressure and GPU usage in macOS's Activity Monitor while adjusting model size to find the right fit.
  • Combine with Agents: Refer to the AI Agent Guide post to attach a search function to a local model and create an active assistant!
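As an example of the first tip, exposing Ollama to your LAN is just an environment variable away (shown here for a manual launch; adjust accordingly if Ollama runs as a launchd/systemd service):

```shell
# Listen on all interfaces instead of localhost only
export OLLAMA_HOST=0.0.0.0
ollama serve

# Then, from another machine on the LAN (replace the IP with your host's address):
# curl http://192.168.0.10:11434/api/tags
```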

📝 Closing

There is something liberating about coding with a reliable AI pair programmer on a plane or in a cafe with no internet, without spending a single cent on API fees. If you have an Apple Silicon MacBook (M1/M2/M3) with 16GB or more of RAM, or a Windows machine with a discrete GPU, set up Ollama right now. A new world of development productivity opens up!

