Want to run your own private Large Language Model (LLM), fully under your control and optimized for performance? In this tutorial, you'll learn how to deploy an LLM using Ollama on GPU-powered Civo infrastructure, giving you the freedom to run models locally with full data ownership.
This walkthrough is based on a live demo I presented during a recent webinar with Civo’s Chief Innovation Officer, Josh Mesout. If you'd prefer to watch the setup in action, here’s the demo video 👇
Alternatively, you can watch the full webinar here to learn more about GenAI and how it is the key to unlocking new opportunities and driving success in your organization.
Before we get started, it is important to understand more about the tools that we are going to be using, Ollama and Civo:
- Ollama: A lightweight, fast runtime for large language models (LLMs). It lets you run models on your own infrastructure, giving you control over both the model and your data.
- Civo: A cloud provider built around speed, simplicity, and predictable billing. It offers GPU-powered instances, managed Kubernetes, and a user-friendly interface.
There are many tools that can be used to build a private LLM; however, for this tutorial, I have decided to focus on Ollama and Civo as they offer users a range of benefits, such as:
| Feature | Description |
|---|---|
| Control over the model and data | By deploying a private LLM using Ollama on Civo, you have complete control over the model and data. This is particularly important for organizations that handle sensitive information. |
| Customization and flexibility | Ollama allows you to customize the LLM to suit your specific needs. You can choose from a range of models, fine-tune them, and deploy them on Civo. |
| Scalability and performance | Civo's GPU-powered instances provide the necessary compute resources to run LLMs efficiently. You can scale your infrastructure up or down as needed. |
| Security and compliance | By deploying a private LLM on Civo, you can ensure that your data is secure and compliant with relevant regulations. |
Prerequisites
Before getting started with this tutorial, make sure you have the following prerequisites in place:
- Sign up for a Civo account with API access
- Have Terraform installed on your local machine
When following along with this tutorial, you will need to refer to the GitHub repository found here. It provides a complete Terraform setup to deploy and run private LLMs using Ollama on Civo’s GPU instances. The repo automates everything from infrastructure provisioning to model setup and API access, making it easy to get started without manual configuration.
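If you want to follow along from the repository, a minimal local setup looks something like the sketch below. The repository URL and directory name are placeholders (use the link above), and `script.sh` is the provisioning script referenced later in this tutorial:

```bash
# Clone the tutorial repository (replace <repo-url> with the link above)
git clone <repo-url> ollama-civo-terraform
cd ollama-civo-terraform

# Review the Terraform files and the provisioning script before applying anything
ls *.tf script.sh
```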
Deployment steps
Step 1: Set up your infrastructure
Start by creating a file named `terraform.tfvars` in the root directory with your Civo API key:
civo_token = "YOUR_API_KEY"
Alternatively, you can set the API key as an environment variable:
export CIVO_TOKEN=YOUR_API_KEY
For details on how to find your API key, click here.
Once this is complete, initialize and apply the Terraform configuration:
terraform init
terraform plan
terraform apply
You’ll need to wait for the initial setup to complete (which might take 15-30 minutes). This will include: instance provisioning, CUDA installation and configuration, Ollama setup, and model downloading.
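If you want to keep an eye on the provisioning while you wait, the sketch below assumes the repository exposes the instance details as Terraform outputs and that the instance is bootstrapped with cloud-init; the exact output names and log path may differ in your setup.

```bash
# List the values exported by the Terraform configuration
# (exact output names depend on the repository's outputs.tf)
terraform output

# Assuming the instance bootstraps via cloud-init, follow its log to watch
# the CUDA installation, Ollama setup, and model download progress
ssh civo@<instance-public-ip> "tail -f /var/log/cloud-init-output.log"
```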
Step 2: Connect to your Civo instance and install Ollama
At this stage, you’ll need to connect to your Civo instance over SSH. To do this, first find the public IP address of your instance in the Civo dashboard (or via the Civo CLI, as shown below).
Open a terminal on your local machine and use the SSH command to connect to your Civo instance. For example:
ssh civo@<instance-ip-address>
Replace `<instance-ip-address>` with the actual IP address of your Civo instance. To find your `<instance-ip-address>`, run this command:
civo instance show <instance-name> --region <your-region>
Next, you’ll need to enter your password or use your SSH key to authenticate.
To authenticate with your SSH key, you first need to create the `civo-key.pem` file, which contains your OpenSSH private key. To create this file, follow the steps below:
Step 1: Generate a new key manually
In your terminal (Git Bash or WSL):
ssh-keygen -t rsa -b 4096 -f civo-key.pem
This creates two files:
- `civo-key.pem` (private key)
- `civo-key.pem.pub` (public key)
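OpenSSH refuses to use a private key whose permissions are too open, so restrict the key file before using it:

```bash
# Make the private key readable only by your user
chmod 600 civo-key.pem
```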
Step 2: Add public key to Civo
- Open `civo-key.pem.pub` in Notepad (or any text editor)
- Copy its contents
- Go to Civo Dashboard → SSH Keys
- Click “Add SSH Key”, paste the key, and save
Once this has been done, connect to the instance using the key:
ssh -i <file-location>/civo-key.pem civo@<instance-public-ip>
Once connected, install Ollama by running the following command:
curl -fsSL https://ollama.ai/install.sh | sh
Follow the installation instructions to complete the setup.
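Once the installer finishes, a quick sanity check is worth doing. The commands below assume the GPU drivers were installed during provisioning (so `nvidia-smi` is available) and use `llama2`, the default model from the configuration described later in this tutorial:

```bash
# Confirm the GPU and driver are visible on the instance
nvidia-smi

# Confirm Ollama is installed and see which models are present
ollama --version
ollama list

# Pull the default model and start an interactive session with it
ollama pull llama2
ollama run llama2
```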
Configuration options
The deployment can be customized by modifying the `script.sh` file (found here). Key configurable parameters include:
MODEL_NAME="llama2" # The Ollama model to run
CUDA_GPU="0" # GPU device to use (0, 1, etc.)
API_PORT="11434" # Port for Ollama API (default is 11434)
You can deploy any model supported by Ollama by changing the `MODEL_NAME` variable; model download times will vary with the size of the model.
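For example, to run a different model you could either change `MODEL_NAME` in `script.sh` before provisioning, or pull additional models onto an already running instance. In the sketch below, `mistral` is just an example of a model from the Ollama library, and the edit assumes GNU sed and the exact variable line shown above:

```bash
# Option 1: change the default model before provisioning (GNU sed syntax),
# then re-run terraform apply
sed -i 's/MODEL_NAME="llama2"/MODEL_NAME="mistral"/' script.sh

# Option 2: pull extra models directly on a running instance
ollama pull mistral
ollama list
```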
Accessing your model
After the deployment has completed, connection information is automatically generated and stored on the server at `/etc/ollama/server_info.txt`. This file contains information such as the public IP address, API port, default model name, and example API calls.
You’ll also be able to check the system log for this information after the instance boots.
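On the instance itself, you can print that file directly; `sudo` may or may not be required depending on how the file permissions were set, and the cloud-init log path assumes a default Ubuntu image:

```bash
# Print the connection details generated during provisioning
sudo cat /etc/ollama/server_info.txt

# The same details may also show up near the end of the boot log
sudo tail -n 50 /var/log/cloud-init-output.log
```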
Interacting with your LLM using the API
Use the Ollama API to interact with your LLM. You can send requests to the API endpoint to generate text, answer questions, or perform other tasks.
Use a tool like `curl` to test the API endpoint:
curl -X POST -H "Content-Type: application/json" -d '{"model": "llama2", "prompt": "Hello, world!"}' http://${CIVO_INSTANCE_PUBLIC_IP}:11434/api/generate
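Note that `/api/generate` streams its answer as a series of JSON objects by default. If you would rather receive a single JSON body, set `"stream": false` in the request:

```bash
# Ask for a single, non-streaming JSON response
curl -X POST -H "Content-Type: application/json" \
  -d '{"model": "llama2", "prompt": "Hello, world!", "stream": false}' \
  http://${CIVO_INSTANCE_PUBLIC_IP}:11434/api/generate
```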
Example: Using Ollama with Python
import requests

# Set the API endpoint URL (replace <CIVO_INSTANCE_PUBLIC_IP> with your instance's public IP)
url = "http://<CIVO_INSTANCE_PUBLIC_IP>:11434/api/generate"

# Set the model, prompt, and disable streaming so the API returns a single JSON object
payload = {
    "model": "llama2",
    "prompt": "Hello, world!",
    "stream": False,
}

# Send a POST request to the API endpoint
response = requests.post(url, json=payload)

# Print the response
print(response.json())
Removing the deployment
To completely remove the deployment:
terraform destroy
This will terminate the instance and clean up all associated resources.
Summary
In this tutorial, we have gone through how to deploy a private LLM using Ollama on Civo. By following these steps, you can gain control over your LLM and data, customize the model to suit your needs, and scale your infrastructure as required.
With Ollama and Civo, you can build a powerful and flexible LLM solution that meets your organization's needs. If you want to learn more about these topics, check out some of these resources: