Want to run your own private Large Language Model (LLM), fully under your control and optimized for performance? In this tutorial, you'll learn how to deploy an LLM using Ollama on GPU-powered Civo infrastructure, giving you the freedom to run models locally with full data ownership.
This walkthrough is based on a live demo I presented during a recent webinar with Civo’s Chief Innovation Officer, Josh Mesout. If you'd prefer to watch the setup in action, here’s the demo video 👇
Alternatively, you can watch the full webinar here to learn more about GenAI and how it is the key to unlocking new opportunities and driving success in your organization.
Before we get started, it is important to understand more about the tools that we are going to be using, Ollama and Civo:
- Ollama: A lightweight, fast runtime for large language models (LLMs). It lets you run models on your own infrastructure, giving you control over both the model and your data.
- Civo: A cloud provider built around speed, simplicity, and predictable billing. It offers GPU-powered instances, managed Kubernetes, and a user-friendly interface.
There are many tools that can be used to build a private LLM; however, for this tutorial, I have decided to focus on Ollama and Civo as they offer users a range of benefits, such as:
| Feature | Description |
|---|---|
| Control over the model and data | By deploying a private LLM using Ollama on Civo, you have complete control over the model and data. This is particularly important for organizations that handle sensitive information. |
| Customization and flexibility | Ollama allows you to customize the LLM to suit your specific needs. You can choose from a range of models, fine-tune them, and deploy them on Civo. |
| Scalability and performance | Civo's GPU-powered instances provide the necessary compute resources to run LLMs efficiently. You can scale your infrastructure up or down as needed. |
| Security and compliance | By deploying a private LLM on Civo, you can ensure that your data is secure and compliant with relevant regulations. |
Prerequisites
Before getting started with this tutorial, make sure you have the following prerequisites in place:
- Sign up for a Civo account with API access
- Have Terraform installed on your local machine
When following along with this tutorial, you will need to refer to the GitHub repository found here. It provides a complete Terraform setup to deploy and run private LLMs using Ollama on Civo’s GPU instances. The repo automates everything from infrastructure provisioning to model setup and API access, making it easy to get started without manual configuration.
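If you want to follow along from the repository, a minimal local setup looks something like the sketch below. The repository URL and directory name are placeholders (use the link above), and `script.sh` is the provisioning script referenced later in this tutorial:

```bash
# Clone the tutorial repository (replace <repo-url> with the link above)
git clone <repo-url> ollama-civo-terraform
cd ollama-civo-terraform

# Review the Terraform files and the provisioning script before applying anything
ls *.tf script.sh
```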
Deployment steps
Step 1: Set up your infrastructure
Start by creating a file named `terraform.tfvars` in the root directory with your Civo API key:
civo_token = "YOUR_API_KEY"
Alternatively, you can set the API key as an environment variable:
export CIVO_TOKEN=YOUR_API_KEY
For details on how to find your API key, click here.
Once this is complete, initialize and apply the Terraform configuration:
terraform init
terraform plan
terraform apply
You’ll need to wait for the initial setup to complete (which might take 15-30 minutes). This will include: instance provisioning, CUDA installation and configuration, Ollama setup, and model downloading.
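If you want to keep an eye on the provisioning while you wait, the sketch below assumes the repository exposes the instance details as Terraform outputs and that the instance is bootstrapped with cloud-init; the exact output names and log path may differ in your setup.

```bash
# List the values exported by the Terraform configuration
# (exact output names depend on the repository's outputs.tf)
terraform output

# Assuming the instance bootstraps via cloud-init, follow its log to watch
# the CUDA installation, Ollama setup, and model download progress
ssh civo@<instance-public-ip> "tail -f /var/log/cloud-init-output.log"
```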
Step 2: Connect to your Civo instance and install Ollama
At this stage, you’ll need to connect to your Civo instance over SSH. To do this, first find the public IP address of your instance in the Civo dashboard (or via the Civo CLI, as shown below).
Open a terminal on your local machine and use the SSH command to connect to your Civo instance. For example:
ssh civo@<instance-ip-address>
Replace `<instance-ip-address>` with the actual IP address of your Civo instance. To find your `<instance-ip-address>`, run this command:
civo instance show <instance-name> --region <your-region>
Next, you’ll need to enter your password or use your SSH key to authenticate.
To authenticate with your SSH key, you first need to create the `civo-key.pem` file, which contains your OpenSSH private key. To create this file, follow the steps below:
Step 1: Generate a new key manually
In your terminal (Git Bash or WSL):
ssh-keygen -t rsa -b 4096 -f civo-key.pem
This creates two files:
- `civo-key.pem` (private key)
- `civo-key.pem.pub` (public key)
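OpenSSH refuses to use a private key whose permissions are too open, so restrict the key file before using it:

```bash
# Make the private key readable only by your user
chmod 600 civo-key.pem
```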
Step 2: Add public key to Civo
- Open `civo-key.pem.pub` in Notepad (or any text editor)
- Copy its contents
- Go to Civo Dashboard → SSH Keys
- Click “Add SSH Key”, paste the key, and save
Once this has been done, connect to the instance using the key:
ssh -i <file-location>/civo-key.pem civo@<instance-public-ip>
Once connected, install Ollama by running the following command:
curl -fsSL https://ollama.ai/install.sh | sh
Follow the installation instructions to complete the setup.
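Once the installer finishes, a quick sanity check is worth doing. The commands below assume the GPU drivers were installed during provisioning (so `nvidia-smi` is available) and use `llama2`, the default model from the configuration described later in this tutorial:

```bash
# Confirm the GPU and driver are visible on the instance
nvidia-smi

# Confirm Ollama is installed and see which models are present
ollama --version
ollama list

# Pull the default model and start an interactive session with it
ollama pull llama2
ollama run llama2
```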
Configuration options
The deployment can be customized by modifying the `script.sh` file (found here). Key configurable parameters include:
MODEL_NAME="llama2" # The Ollama model to run
CUDA_GPU="0" # GPU device to use (0, 1, etc.)
API_PORT="11434" # Port for Ollama API (default is 11434)
You can deploy any model supported by Ollama by changing the `MODEL_NAME` variable; model download times will vary with the size of the model.
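For example, to run a different model you could either change `MODEL_NAME` in `script.sh` before provisioning, or pull additional models onto an already running instance. In the sketch below, `mistral` is just an example of a model from the Ollama library, and the edit assumes GNU sed and the exact variable line shown above:

```bash
# Option 1: change the default model before provisioning (GNU sed syntax),
# then re-run terraform apply
sed -i 's/MODEL_NAME="llama2"/MODEL_NAME="mistral"/' script.sh

# Option 2: pull extra models directly on a running instance
ollama pull mistral
ollama list
```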
Accessing your model
After the deployment has completed, connection information is automatically generated and stored on the server at `/etc/ollama/server_info.txt`. This file contains information such as the public IP address, API port, default model name, and example API calls.
You’ll also be able to check the system log for this information after the instance boots.
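On the instance itself, you can print that file directly; `sudo` may or may not be required depending on how the file permissions were set, and the cloud-init log path assumes a default Ubuntu image:

```bash
# Print the connection details generated during provisioning
sudo cat /etc/ollama/server_info.txt

# The same details may also show up near the end of the boot log
sudo tail -n 50 /var/log/cloud-init-output.log
```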
Interacting with your LLM using the API
Use the Ollama API to interact with your LLM. You can send requests to the API endpoint to generate text, answer questions, or perform other tasks.
Use a tool like `curl` to test the API endpoint:
curl -X POST -H "Content-Type: application/json" -d '{"model": "llama2", "prompt": "Hello, world!"}' http://${CIVO_INSTANCE_PUBLIC_IP}:11434/api/generate
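Note that `/api/generate` streams its answer as a series of JSON objects by default. If you would rather receive a single JSON body, set `"stream": false` in the request:

```bash
# Ask for a single, non-streaming JSON response
curl -X POST -H "Content-Type: application/json" \
  -d '{"model": "llama2", "prompt": "Hello, world!", "stream": false}' \
  http://${CIVO_INSTANCE_PUBLIC_IP}:11434/api/generate
```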
Example: Using Ollama with Python
import requests

# Set the API endpoint URL (replace <CIVO_INSTANCE_PUBLIC_IP> with your instance's public IP)
url = "http://<CIVO_INSTANCE_PUBLIC_IP>:11434/api/generate"

# Set the model, prompt, and disable streaming so the API returns a single JSON object
payload = {
    "model": "llama2",
    "prompt": "Hello, world!",
    "stream": False,
}

# Send a POST request to the API endpoint
response = requests.post(url, json=payload)

# Print the response
print(response.json())
Removing the deployment
To completely remove the deployment:
terraform destroy
This will terminate the instance and clean up all associated resources.
Summary
In this tutorial, we have gone through how to deploy a private LLM using Ollama on Civo. By following these steps, you can gain control over your LLM and data, customize the model to suit your needs, and scale your infrastructure as required.
With Ollama and Civo, you can build a powerful and flexible LLM solution that meets your organization's needs. If you want to learn more about these topics, check out some of these resources: