Imagine your sales team asking: “Show me last quarter’s top customers, but only from regions where revenue grew year-over-year by at least 12%, and break out their average order value by product category” and getting instant results, without writing a single line of SQL…
Suddenly, this isn’t a simple SELECT anymore. It’s joins, filters, date ranges, grouping, and conditional logic. That’s where Text-to-SQL shines, and in this tutorial, you’ll build a production-ready assistant that can handle exactly these kinds of real-world queries.
We'll create a lightweight FastAPI application that translates plain English questions into safe SQL queries using relaxAI's language models, executes them against a Civo Managed PostgreSQL database, and serves everything from a Kubernetes cluster with GPU support.
But what makes this special? Everything runs within your Civo environment, keeping your data private, latency low, and infrastructure unified.
Prerequisites
Before diving in, make sure you have the following in place:
- Civo account with GPU and Managed Database access (sign up | database quick start)
- relaxAI API key from the relaxAI dashboard
- Python & FastAPI basics (FastAPI documentation)
- Docker installed locally
- kubectl (Civo Kubernetes docs)
Project overview
Architecture Overview
Here's what we're building:
- FastAPI backend that accepts natural language queries
- relaxAI integration to generate SQL SELECT statements
- Safety layer that sanitizes and enforces read-only policies
- PostgreSQL execution against your Civo Managed Database
- Minimal frontend served as static HTML from the same container
- Kubernetes deployment on a Civo GPU cluster with LoadBalancer
The beauty of this architecture? One container, one deployment, publicly accessible via the Kubernetes Load Balancer.
Project Structure
.
├── Dockerfile
├── requirements.txt
├── main.py
├── k8s-deployment.yaml
└── static
└── index.html
Clean, simple, and production-ready.
Step 1: Set up your Civo Kubernetes cluster
First, provision a GPU-enabled cluster:
- Log in to the Civo Dashboard
- Create a new Kubernetes cluster
- Add a GPU node pool (one node is sufficient for this demo)
- Download your kubeconfig
- Connect to the cluster by running
set KUBECONFIG=path-to-kubeconfig-file in your terminal (use export instead of set on Linux or macOS).
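If you prefer the command line, the Civo CLI can fetch the kubeconfig for you (this assumes the civo CLI is installed and configured with your API key; the cluster name is a placeholder):
civo kubernetes config my-cluster > civo-kubeconfig
export KUBECONFIG=$PWD/civo-kubeconfig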
Verify connectivity:
kubectl get nodes
kubectl get pods -A

Step 2: Prepare a Civo Managed PostgreSQL database
- Navigate to Databases in the Civo Dashboard
- Launch a PostgreSQL instance
- Note your connection string (visible in connection details)
Now, load a demo schema to test with. Connect using psql, TablePlus, or your preferred client:
CREATE DATABASE app;
\c app
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT,
country TEXT
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
category TEXT,
price NUMERIC
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
created_at TIMESTAMP
);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(id),
product_id INT REFERENCES products(id),
quantity INT
);
INSERT INTO customers (name, country) VALUES
('Acme Corp', 'US'),
('Beta LLC', 'UK'),
('Cobalt Inc', 'DE');
INSERT INTO products (name, category, price) VALUES
('Laptop Pro', 'Electronics', 2200),
('Desk Chair', 'Furniture', 180),
('Noise Headset', 'Electronics', 320);
INSERT INTO orders (customer_id, created_at) VALUES
(1, '2025-01-10'),
(1, '2025-03-05'),
(2, '2025-02-11');
INSERT INTO order_items (order_id, product_id, quantity) VALUES
(1, 1, 1),
(2, 3, 2),
(3, 2, 4);
This simple schema is perfect for testing natural language queries like "Show me each customer’s total spend broken down by product category," or "Find customers whose average order value exceeds $1,000."
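For reference, the first of those questions might translate into SQL along these lines (one possible form; the model may produce an equivalent variant):
SELECT c.name AS customer,
       p.category,
       SUM(oi.quantity * p.price) AS total_spend
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
GROUP BY c.name, p.category
ORDER BY total_spend DESC
LIMIT 100;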
Step 3: Build the FastAPI backend
Create main.py with the complete application logic.
Import necessary libraries:
import os
import re
import pathlib
import json
import asyncio
from typing import List, Any, Optional, Dict, Tuple
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import FileResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
import httpx
import asyncpg
from pydantic import BaseModel
Load .env if present:
load_dotenv()
Read the required configuration from environment variables:
RELAX_API_KEY = os.getenv("RELAX_API_KEY") or os.getenv("RELAXAI_API_KEY")
RELAX_MODEL = os.getenv("RELAX_MODEL", "Llama-4-Maverick-17B-128E")
DATABASE_URL = os.getenv("DATABASE_URL")
Build DATABASE_URL from individual DB_* variables (if missing):
if not DATABASE_URL:
    db_host = os.getenv("DB_HOST")
    db_port = os.getenv("DB_PORT", "5432")
    db_name = os.getenv("DB_NAME")
    db_user = os.getenv("DB_USER")
    db_pass = os.getenv("DB_PASS")
    if db_host and db_name and db_user and db_pass:
        DATABASE_URL = f"postgresql://{db_user}:{db_pass}@{db_host}:{db_port}/{db_name}"
Allow the server to start even when the configuration is incomplete:
if not RELAX_API_KEY:
    print("Warning: RELAX_API_KEY (or RELAXAI_API_KEY) not set. /query will fail until configured.")
if not DATABASE_URL:
    print("Warning: DATABASE_URL (or DB_HOST/DB_NAME/DB_USER/DB_PASS) not set. /query will fail until configured.")
Provide environment compatibility for SDKs expecting RELAXAI_API_KEY:
if RELAX_API_KEY:
    os.environ.setdefault("RELAXAI_API_KEY", RELAX_API_KEY)
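For local development, a .env file along these lines (placeholder values shown) covers everything the configuration code above reads:
RELAX_API_KEY=your_relaxai_api_key_here
RELAX_MODEL=Llama-4-Maverick-17B-128E
DATABASE_URL=postgresql://user:password@your-civo-db-host:5432/app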
Initialize the FastAPI application:
app = FastAPI(title="Text-to-SQL Assistant (FastAPI + relaxAI)")
Mount static files and define the request model:
if pathlib.Path("static").exists():
    app.mount("/static", StaticFiles(directory="static"), name="static")

class NLQuery(BaseModel):
    nl_query: str
Call relaxAI and normalize its response:
async def call_relax_ai(messages: List[dict]) -> Dict[str, Any]:
    if not RELAX_API_KEY:
        raise HTTPException(status_code=500, detail="RELAX_API_KEY is not configured on the server.")
    url = "https://api.relax.ai/v1/chat/completions"
    payload = {
        "model": RELAX_MODEL,
        "messages": messages,
        "temperature": 0.0,
        "max_tokens": 800,
    }
    headers = {"Authorization": f"Bearer {RELAX_API_KEY}", "Content-Type": "application/json"}
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(url, json=payload, headers=headers)
        try:
            resp.raise_for_status()
        except httpx.HTTPStatusError as e:
            body_text = resp.text if resp is not None else "<no body>"
            raise HTTPException(status_code=500, detail=f"RelaxAI API error: {e}. Body: {body_text}")
        try:
            return resp.json()
        except Exception:
            return {"raw_text": resp.text}
def resp_to_text(resp: Any) -> str:
    if not resp:
        return ""
    if isinstance(resp, dict):
        choices = resp.get("choices")
        if isinstance(choices, list) and choices:
            first = choices[0]
            if isinstance(first, dict):
                msg = first.get("message") or {}
                if isinstance(msg, dict) and msg.get("content"):
                    return msg.get("content")
                if first.get("text"):
                    return first.get("text")
        for key in ("content", "text", "raw_text"):
            if isinstance(resp.get(key), str):
                return resp.get(key)
        return json.dumps(resp)
    return str(resp)
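For context, resp_to_text expects an OpenAI-style chat completion payload. An abridged, illustrative response and how it is unwrapped:
# Illustrative shape only; real relaxAI responses include additional fields.
example_resp = {
    "choices": [
        {"message": {"role": "assistant", "content": "SELECT * FROM customers LIMIT 100"}}
    ]
}
print(resp_to_text(example_resp))  # -> SELECT * FROM customers LIMIT 100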
Extract a SELECT statement and enforce a safe read-only SQL envelope:
def extract_first_select(text: str) -> Optional[str]:
    if not text:
        return None
    # strip common code fences
    text = re.sub(r"```(?:sql)?\n?", "", text, flags=re.I)
    text = text.replace("```", "").strip()
    # find first SELECT ... up to semicolon or blank line or end
    m = re.search(r"(?is)(select\b.*?)(?:;|\n\n|$)", text)
    if m:
        candidate = m.group(1).strip()
        candidate = re.sub(r"^[^a-zA-Z(]*", "", candidate)
        return candidate
    if re.search(r"(?i)\bselect\b", text):
        return text.strip()
    return None
def sanitize_sql(sql: str) -> str:
    s = sql.strip().strip("`")
    if ";" in s:
        s = s.split(";", 1)[0]
    if not re.match(r"(?i)^\s*SELECT\b", s):
        raise ValueError("Only SELECT queries are allowed.")
    forbidden = [
        "INSERT",
        "UPDATE",
        "DELETE",
        "DROP",
        "ALTER",
        "CREATE",
        "TRUNCATE",
        "GRANT",
        "REVOKE",
        "COPY",
        "EXECUTE",
        "SET",
    ]
    for w in forbidden:
        if re.search(rf"(?i)\b{w}\b", s):
            raise ValueError(f"Query contains forbidden keyword: {w}")
    if not re.search(r"(?i)\bLIMIT\b", s):
        s = s + " LIMIT 100"
    return s
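A couple of quick, illustrative checks show how sanitize_sql behaves:
# A bare SELECT gets a LIMIT appended:
print(sanitize_sql("SELECT name FROM customers"))
# -> SELECT name FROM customers LIMIT 100

# Anything that is not a SELECT is rejected:
try:
    sanitize_sql("DROP TABLE customers")
except ValueError as e:
    print(e)  # -> Only SELECT queries are allowed.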
Introspect the database schema and detect referenced tables:
async def introspect_schema(conn: asyncpg.Connection) -> Tuple[str, List[str]]:
    rows = await conn.fetch(
        """
        SELECT table_name, column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = 'public'
          AND table_name NOT IN ('pg_stat_monitor')
          AND table_name NOT LIKE 'pg_%'
          AND table_name NOT LIKE 'sql_%'
        ORDER BY table_name, ordinal_position
        """
    )
    schema: Dict[str, List[str]] = {}
    for r in rows:
        schema.setdefault(r["table_name"], []).append(f"{r['column_name']} {r['data_type']}")
    lines = [f"{t}: {', '.join(cols)}" for t, cols in schema.items()]
    # return both a human-readable schema and the allowed table list
    return "\n".join(lines), list(schema.keys())
def extract_table_names_from_sql(sql: str) -> List[str]:
    """Extract table names referenced after FROM/JOIN (basic heuristic)."""
    if not sql:
        return []
    # capture tokens after FROM or JOIN (handles quoted and schema.table)
    tokens = re.findall(r"(?i)\b(?:from|join)\s+([`\"']?[A-Za-z0-9_.]+[`\"']?)", sql)
    names = []
    for t in tokens:
        t = t.strip("`\"'")
        if "." in t:
            t = t.split(".")[-1]
        names.append(t.lower())
    return list(dict.fromkeys(names))
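For example, the heuristic picks up every table that follows FROM or JOIN:
print(extract_table_names_from_sql(
    "SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id"
))
# -> ['customers', 'orders']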
Main /query endpoint: generate, validate, and execute SQL:
@app.post("/query")
async def query(nl: NLQuery):
    # Runtime config checks
    if not RELAX_API_KEY:
        raise HTTPException(status_code=500, detail="RELAX_API_KEY (or RELAXAI_API_KEY) not configured on the server.")
    if not DATABASE_URL:
        raise HTTPException(
            status_code=500,
            detail="DATABASE_URL (or DB_HOST/DB_NAME/DB_USER/DB_PASS) not configured on the server.",
        )
    # 1) get schema and allowed tables
    conn = await asyncpg.connect(DATABASE_URL)
    try:
        schema_desc, allowed_tables = await introspect_schema(conn)
    finally:
        await conn.close()
    # debug: log schema and allowed table list
    print("--- DB schema sent to LLM (may be empty) ---")
    print(schema_desc or "<empty schema - check DATABASE_URL/permissions>")
    print("--- Allowed tables ---")
    print(allowed_tables)
    schema_description = """
Tables:
customers(id, name, country)
products(id, name, category, price)
orders(id, customer_id, created_at)
order_items(id, order_id, product_id, quantity)
"""
    # 2) prepare messages (explicitly list allowed tables and forbid others)
    system = {
        "role": "system",
        "content": f"You are a PostgreSQL SQL generator. Output only one valid SELECT query using the provided schema {schema_description}. Do not output explanations."
    }
    user_msg = {
        "role": "user",
        "content": (
            f"Available tables and columns:\n{schema_desc}\n\n"
            f"Question: {nl.nl_query}\n\n"
            "Rules:\n"
            "- Use only the tables and columns listed above. Do NOT reference any other tables (including system/monitoring tables like pg_stat_monitor).\n"
            "- Output only the SELECT statement, no explanation.\n"
            "- Add LIMIT 100 if missing."
        )
    }
    # 3) call relax.ai
    resp = await call_relax_ai([system, user_msg])
    # debug: raw model response
    try:
        print("--- RelaxAI raw response ---")
        print(resp)
    except Exception:
        pass
    raw_text = resp_to_text(resp)
    sql_generated = extract_first_select(raw_text) or (raw_text.splitlines()[-1].strip() if raw_text else "")
    print("\n--- RelaxAI Generated SQL ---")
    print(sql_generated)
    if not sql_generated:
        raise HTTPException(status_code=500, detail="Failed to extract SQL from RelaxAI response.")
    # 4) basic validation: ensure SQL only references allowed tables
    referenced = extract_table_names_from_sql(sql_generated)
    forbidden_refs = [t for t in referenced if t not in [at.lower() for at in allowed_tables]]
    if forbidden_refs:
        raise HTTPException(
            status_code=400,
            detail=f"Generated SQL references disallowed tables: {forbidden_refs}. The model must use only: {allowed_tables}"
        )
    # 5) sanitize SQL
    try:
        safe_sql = sanitize_sql(sql_generated)
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    # 6) execute SQL
    conn2 = await asyncpg.connect(DATABASE_URL)
    try:
        rows = await conn2.fetch(safe_sql)
        results = [dict(r) for r in rows]
    finally:
        await conn2.close()
    return {"sql": safe_sql, "rows": results}
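With main.py complete, you can run the API locally before containerizing (this assumes RELAX_API_KEY and DATABASE_URL are set in your shell or a .env file):
uvicorn main:app --reload
# then open http://127.0.0.1:8000/static/index.html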
Key design decisions
- relaxAI integration: Uses the /v1/chat/completions endpoint with zero temperature for deterministic SQL generation
- Safety first: The sanitize_sql function blocks all DDL/DML operations and enforces a default LIMIT 100
- Schema awareness: Dynamically introspects your database so the LLM knows exactly what tables and columns exist
- Async throughout: Uses asyncpg and httpx for non-blocking I/O
Step 4: Create the frontend
Create static/index.html for a minimal but functional UI:
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<title>Text-to-SQL Assistant</title>
<style>
body { font-family: system-ui, sans-serif; max-width: 900px; margin: 32px auto; }
textarea { width: 100%; height: 80px; }
pre { background: #f6f8fa; padding: 12px; overflow:auto; }
table { width:100%; border-collapse: collapse; margin-top:12px;}
th,td { border:1px solid #ddd; padding:6px; text-align:left; }
</style>
</head>
<body>
<h1>Text-to-SQL Assistant</h1>
<p>Type a question about the DB and press Generate.</p>
<textarea id="nl" placeholder="e.g. Top 5 customers by revenue"></textarea><br/>
<button id="go">Generate & Run</button>
<h3>Generated SQL</h3>
<pre id="sql">—</pre>
<h3>Results</h3>
<div id="results">—</div>
<script>
document.getElementById('go').onclick = async () => {
const q = document.getElementById('nl').value;
const res = await fetch('/query', {
method:'POST',
headers:{'Content-Type':'application/json'},
body: JSON.stringify({ nl_query: q })
});
const data = await res.json();
if (!res.ok) {
alert(JSON.stringify(data));
return;
}
document.getElementById('sql').innerText = data.sql;
const rows = data.rows || [];
if (rows.length === 0) {
document.getElementById('results').innerText = 'No rows';
return;
}
const cols = Object.keys(rows[0]);
let html = '<table><thead><tr>' +
cols.map(c=>`<th>${c}</th>`).join('') +
'</tr></thead><tbody>';
for (const r of rows) {
html += '<tr>' +
cols.map(c=>`<td>${(r[c]===null)?'NULL':String(r[c])}</td>`).join('') +
'</tr>';
}
html += '</tbody></table>';
document.getElementById('results').innerHTML = html;
};
</script>
</body>
</html>
This minimal interface sends your question to the /query endpoint, displays the generated SQL, and renders the results as a table, with basic error handling. Here’s what it should look like:

Step 5: Containerize with Docker
Create requirements.txt:
fastapi
uvicorn[standard]
httpx
asyncpg
pydantic
python-dotenv
Create Dockerfile:
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN apt-get update && \
apt-get install -y gcc libpq-dev curl && \
pip install --no-cache-dir -r requirements.txt && \
apt-get remove -y gcc && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/*
# Copy application code
COPY . .
EXPOSE 80
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80", "--workers", "1"]
Build and push your image:
# Build
docker build -t <your-dockerhub-username>/text-to-sql:latest .
# Push
docker push <your-dockerhub-username>/text-to-sql:latest
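To sanity-check the image locally before deploying (assuming a .env file like the one shown earlier):
docker run --rm -p 8080:80 --env-file .env <your-dockerhub-username>/text-to-sql:latest
# then open http://localhost:8080/static/index.html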
Step 6: Deploy to Kubernetes
First, create a Kubernetes secret with your credentials:
kubectl create secret generic texttosql-secrets \
--from-literal=RELAX_API_KEY='your_relaxai_api_key_here' \
--from-literal=DATABASE_URL='postgres://user:pass@your-civo-db-host:5432/dbname'
Create k8s-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: texttosql
  labels:
    app: texttosql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: texttosql
  template:
    metadata:
      labels:
        app: texttosql
    spec:
      containers:
        - name: texttosql
          image: <your-dockerhub-username>/text-to-sql:latest
          ports:
            - containerPort: 80
          envFrom:
            - secretRef:
                name: texttosql-secrets
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: texttosql-lb
spec:
  type: LoadBalancer
  selector:
    app: texttosql
  ports:
    - port: 80
      targetPort: 80
Deploy to your cluster:
kubectl apply -f k8s-deployment.yaml
# Wait for LoadBalancer IP
kubectl get svc texttosql-lb -w
Once the EXTERNAL-IP appears, open http://<EXTERNAL-IP>/static/index.html in your browser!
Step 7: Test your assistant
Navigate to your LoadBalancer's external IP and try these queries:
- "Show me each customer’s total spend broken down by product category."
- "Find customers whose average order value exceeds $1,000."
- "Show me the total revenue generated by each customer."
- "Show me each customer’s total order amount."
Watch as your natural language questions transform into SQL and return real results. Here’s what it should look like:

Monitoring & optimization
Check your deployment:
# View logs
kubectl logs -f deploy/texttosql
# Check pod status
kubectl get pods -l app=texttosql
# Check node resource usage (CPU/memory; GPU utilization is shown in the Civo Dashboard)
kubectl top nodes
Performance tips:
- For production, scale replicas: kubectl scale deployment texttosql --replicas=3
- Monitor GPU utilization in the Civo Dashboard
- Consider connection pooling for high-traffic scenarios (see the sketch below)
- Add caching for frequently-asked questions
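As an example of the pooling suggestion above, a minimal sketch with asyncpg might look like this; it would replace the per-request asyncpg.connect calls in main.py, and the names here are illustrative:
from typing import Optional

pool: Optional[asyncpg.Pool] = None

@app.on_event("startup")
async def init_pool():
    # Create one shared pool instead of connecting on every request.
    global pool
    pool = await asyncpg.create_pool(DATABASE_URL, min_size=1, max_size=10)

@app.on_event("shutdown")
async def close_pool():
    if pool:
        await pool.close()

# Inside /query, acquire a connection from the pool:
# async with pool.acquire() as conn:
#     rows = await conn.fetch(safe_sql)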
Security Best Practices
While this tutorial focuses on functionality, consider these production hardening steps:
- Database roles: Create a read-only PostgreSQL user for the application (see the example after this list)
- Rate limiting: Add request throttling to prevent abuse
- Input validation: Extend sanitize_sql with more sophisticated parsing
- Audit logging: Track all queries with user attribution
- HTTPS: Use Kubernetes Ingress with TLS certificates
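For the database-roles item, a read-only user could be created like this (role name and password are placeholders):
CREATE ROLE texttosql_ro LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE app TO texttosql_ro;
GRANT USAGE ON SCHEMA public TO texttosql_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO texttosql_ro;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO texttosql_ro;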
What's next?
You've built a working Text-to-SQL assistant, but here's where it gets interesting:
Add query history
CREATE TABLE query_history (
id SERIAL PRIMARY KEY,
nl_query TEXT,
generated_sql TEXT,
timestamp TIMESTAMP DEFAULT NOW()
);
Store successful queries in a query_history table.
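Inside the /query handler, recording each successful request could look roughly like this (a sketch; it assumes the application's database user has INSERT rights on query_history, which conflicts with a strictly read-only role):
# After fetching results and before closing conn2:
await conn2.execute(
    "INSERT INTO query_history (nl_query, generated_sql) VALUES ($1, $2)",
    nl.nl_query,
    safe_sql,
)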
Implement RAG for better context
Enhance your prompts with:
- Table and column descriptions
- Common query patterns
- Business logic documentation
Check out relaxAI's RAG documentation for integration patterns.
Key takeaways
Text-to-SQL lets anyone query data using plain language, eliminating the need for SQL expertise. With relaxAI running inside your secure Civo environment, you maintain full control while benefiting from a lightweight, single-container deployment.
The system is production-ready with built-in safety, monitoring, and error handling, and it can be easily extended with caching, authentication, or custom models. This makes it a scalable, developer-friendly foundation for both prototypes and enterprise workloads.