Most AI tutorials still begin in notebooks. I wanted to begin somewhere that feels more natural for a lot of .NET developers: a small console app, a local model, and one clean HTTP request you can read end to end.

That constraint is the point of the project. Before adding streaming, chat history, or SDK abstractions, it helps to see the exact contract between your C# code and a running model. This sample keeps that contract visible.

Introduction

The project described in README.md is intentionally minimal. You run Ollama locally, start a .NET console app, send a prompt to /api/generate, and print the response in the terminal.

If you are coming from regular backend or app development in C#, this is a good first checkpoint. You do not need a notebook. You do not need a large framework. You just need a running local model and a bit of HTTP.

By the end of this post, you should be able to run the sample and understand what each piece is doing.

Prerequisites

Based on README.md, make sure you have these in place before you run anything:

  • .NET 10 SDK
  • Ollama installed locally
  • A local model matching the model name configured in the app
  • Ollama running on your machine

Start the local Ollama runtime first:

ollama serve

If you want to confirm which models are available locally, check them with:

ollama list

If you do not already have a model installed, pull one first:

ollama pull gemma4:e2b-it-q4_K_M

At this point, you should have Ollama running and a model available on your machine.

What the app does

The application code lives in Program.cs. It keeps the whole flow in one file, which makes the first integration easy to follow.

In practical terms, the sample does four things:

  1. Creates an HttpClient with a base address of http://localhost:11434.
  2. Chooses a local model named gemma4:e2b-it-q4_K_M.
  3. Sends a JSON request to /api/generate.
  4. Reads the JSON response and writes the generated text to the console.

That is the entire goal here. You are proving that a C# app can talk to a local LLM and get a result back.

The complete program

Here is the current code from Program.cs:

using System.Net.Http.Json;

var client = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
var model = "gemma4:e2b-it-q4_K_M";

var request = new OllamaRequest
{
    Model = model,
    Prompt = "What is the capital of France? Answer it like a neanderthal.",
    Think = false,
    Stream = false
};

var responseJSON = await client.PostAsJsonAsync("/api/generate", request);

var ollamaResponse = await responseJSON.Content.ReadFromJsonAsync<OllamaResponse>();

Console.WriteLine(ollamaResponse?.Response);

class OllamaRequest
{
    public string Model { get; set; }
    public string Prompt { get; set; }
    public bool Think { get; set; }
    public bool Stream { get; set; }
}

class OllamaResponse
{
    public string Response { get; set; }
}

Because everything is in one place, you can read the file from top to bottom and see the whole request/response loop without jumping between folders or helper classes.

Implementation Walkthrough

Set the Ollama endpoint

The first important line is the HttpClient setup:

var client = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

This points the app at the local Ollama runtime, which listens on port 11434 by default.

You should now understand where the app is sending traffic.

Pick the model

The next visible choice is the configured model name:

var model = "gemma4:e2b-it-q4_K_M";

That value needs to match a model you already have locally. If the name does not exist on your machine, the call will fail even if Ollama itself is running.

I already had gemma4:e2b-it-q4_K_M installed locally, so Program.cs reuses that name rather than pulling a different model for this sample.

You can use any Ollama model you like here. Just make sure the name in Program.cs matches a model you actually have installed locally.

You should now know which local model this sample expects.
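If you prefer to verify this from code instead of ollama list, Ollama's GET /api/tags endpoint returns the locally installed models. A minimal sketch, assuming the same base address as the sample:

```csharp
using System.Net.Http.Json;
using System.Text.Json;

// Ask Ollama which models are installed locally via GET /api/tags.
using var client = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

var tags = await client.GetFromJsonAsync<JsonElement>("/api/tags");

// The response has the shape {"models":[{"name":"...", ...}, ...]}.
foreach (var model in tags.GetProperty("models").EnumerateArray())
{
    Console.WriteLine(model.GetProperty("name").GetString());
}
```

If the model name from Program.cs does not show up in this list, pull it before running the sample.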

Build the request payload

The request object mirrors the payload the program sends to Ollama:

var request = new OllamaRequest
{
    Model = model,
    Prompt = "What is the capital of France? Answer it like a neanderthal.",
    Think = false,
    Stream = false
};

For this first version, the settings stay simple:

  • Model tells Ollama which local model to use.
  • Prompt is the text sent to the model.
  • Think = false turns off any separate thinking output from the model, keeping the sample focused on the plain answer.
  • Stream = false asks for one full result instead of streamed chunks.

That last point matters. Streaming is intentionally left out for now so the basic contract stays easy to see.

You should now be able to identify exactly what JSON the app is preparing to send.
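For reference, PostAsJsonAsync serializes with .NET's "web" defaults, which camelCase the property names. Reproducing that with an explicit serializer call shows the JSON that actually goes over the wire:

```csharp
using System.Text.Json;

// Mirror the request object from Program.cs with an anonymous type.
var request = new
{
    Model = "gemma4:e2b-it-q4_K_M",
    Prompt = "What is the capital of France? Answer it like a neanderthal.",
    Think = false,
    Stream = false
};

// JsonSerializerDefaults.Web is what PostAsJsonAsync uses under the hood:
// PascalCase properties become camelCase keys in the payload.
var json = JsonSerializer.Serialize(request, new JsonSerializerOptions(JsonSerializerDefaults.Web));
Console.WriteLine(json);
// {"model":"gemma4:e2b-it-q4_K_M","prompt":"What is the capital of France? Answer it like a neanderthal.","think":false,"stream":false}
```

This is why the C# class can use idiomatic PascalCase property names while Ollama still receives the lowercase keys it expects.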

Send the request

The core integration point is one line:

var responseJSON = await client.PostAsJsonAsync("/api/generate", request);

This is the handoff from your .NET app to Ollama. It posts the request payload to the generate endpoint and waits for a response.

This is also why I like starting with raw HTTP in the first sample. You can see the endpoint, the payload, and the response boundary directly.

You should now know where the actual model call happens.
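One thing the sample skips is failure handling. If the configured model is missing, Ollama typically answers with an error status (such as 404) and a JSON error body rather than a result. A small hedged extension of that line, reusing the client and request from the listing above, surfaces the error instead of failing later during deserialization:

```csharp
var responseJSON = await client.PostAsJsonAsync("/api/generate", request);

// Print Ollama's error body instead of trying to deserialize it as a result.
if (!responseJSON.IsSuccessStatusCode)
{
    var error = await responseJSON.Content.ReadAsStringAsync();
    Console.Error.WriteLine($"Ollama returned {(int)responseJSON.StatusCode}: {error}");
    return;
}
```

For a first sample this is optional, but it turns the most common failure (a mistyped model name) into a readable message.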

Read the response

Once the response comes back, the program deserializes it into a small output type and prints the text:

var ollamaResponse = await responseJSON.Content.ReadFromJsonAsync<OllamaResponse>();

Console.WriteLine(ollamaResponse?.Response);

The response type only contains one property:

class OllamaResponse
{
    public string Response { get; set; }
}

That is enough for this post because the goal is not to model every field Ollama can return. The goal is to show the minimum code required to get generated text back into your application.

At this point, you should understand the entire request/response path from prompt to console output.
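If you later want more than the text, the non-streaming /api/generate response also carries metadata such as the model name, a done flag, and timing counters. Below is a sketch of a richer shape mapping a few of them. Note that Ollama names these extra fields in snake_case, which the default web deserialization will not match automatically, so they need explicit JsonPropertyName attributes:

```csharp
using System.Text.Json.Serialization;

// A richer response shape, still mapping only a handful of Ollama's fields.
class OllamaResponse
{
    public string Model { get; set; }
    public string Response { get; set; }
    public bool Done { get; set; }

    // Ollama sends these in snake_case, so map the names explicitly.
    [JsonPropertyName("total_duration")]
    public long TotalDuration { get; set; }   // total request time, in nanoseconds

    [JsonPropertyName("eval_count")]
    public int EvalCount { get; set; }        // tokens in the generated text
}
```

The simpler names like Model and Response still bind without attributes because the web defaults match property names case-insensitively.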

Run the sample

After Ollama is running, go to the app folder shown in README.md and start the project:

cd src/FirstLocalLlmApp
dotnet run

If everything is configured correctly, the terminal should print one full response from the local model.

The wording of the answer can vary, but the important thing to verify is simple: your C# app reached the local Ollama endpoint, the model processed the prompt, and the response text came back into the console.

Common issues

If the sample does not work on the first try, the cause is usually one of a few straightforward things:

  • Ollama is not running
  • The configured model name is not installed locally
  • The app cannot reach http://localhost:11434
  • The response does not deserialize into the current OllamaResponse shape

That is another benefit of keeping the sample so small. You are debugging one request, not a whole stack of abstractions.
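The first bullets can also be checked quickly from code. Ollama exposes a small GET /api/version endpoint, so a reachability probe along these lines (a sketch, separate from the sample) tells you whether the runtime is up before you debug anything else:

```csharp
// Quick reachability probe against Ollama's version endpoint.
using var client = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

try
{
    var version = await client.GetStringAsync("/api/version");
    Console.WriteLine($"Ollama is reachable: {version}");
}
catch (HttpRequestException ex)
{
    // Connection refused here almost always means `ollama serve` is not running.
    Console.WriteLine($"Could not reach Ollama: {ex.Message}");
}
```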

Result

This gives you a minimal but real AI integration in C#. You start a local model runtime, send a prompt over HTTP, deserialize the response, and print the result.

For a first step, that is enough. It proves the core loop without hiding anything important. Once that loop is clear, later posts can build on it with more interactive behavior while keeping the fundamentals visible.

If your terminal prints a response from the local model, the sample has done its job.