Imagine your sales team asking: “Show me last quarter’s top customers, but only from regions where revenue grew year-over-year by at least 12%, and break out their average order value by product category” and getting instant results, without writing a single line of SQL…

Suddenly, this isn’t a simple SELECT anymore. It’s joins, filters, date ranges, grouping, and conditional logic. That’s where Text-to-SQL shines, and in this tutorial, you’ll build a production-ready assistant that can handle exactly these kinds of real-world queries.

We'll create a lightweight FastAPI application that translates plain English questions into safe SQL queries using relaxAI's language models, executes them against a Civo Managed PostgreSQL database, and serves everything from a Kubernetes cluster with GPU support.

But what makes this special? Everything runs within your Civo environment, keeping your data private, latency low, and infrastructure unified.

Prerequisites

Before diving in, make sure you have the following in place:

  • A Civo account with access to Kubernetes clusters and Managed Databases
  • A relaxAI API key
  • Docker installed locally, plus a container registry account (e.g. Docker Hub) to push your image
  • kubectl installed on your machine
  • A PostgreSQL client such as psql or TablePlus for loading the demo schema

Pro tip: Create your Civo database and generate your relaxAI API key before starting, as you'll need both for deployment.

Project overview

(Diagram: architecture overview)

Here's what we're building:

  1. FastAPI backend that accepts natural language queries
  2. relaxAI integration to generate SQL SELECT statements
  3. Safety layer that sanitizes and enforces read-only policies
  4. PostgreSQL execution against your Civo Managed Database
  5. Minimal frontend served as static HTML from the same container
  6. Kubernetes deployment on a Civo GPU cluster with LoadBalancer

The beauty of this architecture? One container, one deployment, publicly accessible via the Kubernetes Load Balancer.

Project Structure

.
├── Dockerfile
├── requirements.txt
├── main.py
├── k8s-deployment.yaml
└── static
    └── index.html

Clean, simple, and production-ready.

Step 1: Set up your Civo Kubernetes cluster

First, provision a GPU-enabled cluster:

  1. Log in to the Civo Dashboard
  2. Create a new Kubernetes cluster
  3. Add a GPU node pool (one node is sufficient for this demo)
  4. Download your kubeconfig
  5. Connect to the cluster by pointing kubectl at the downloaded kubeconfig via the KUBECONFIG environment variable, as shown below
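
For example (adjust the path to wherever you saved the kubeconfig file):

# Linux / macOS
export KUBECONFIG=/path/to/civo-cluster-kubeconfig

# Windows (Command Prompt)
set KUBECONFIG=C:\path\to\civo-cluster-kubeconfig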

Verify connectivity:

kubectl get nodes
kubectl get pods -A

(Screenshot: Civo Kubernetes cluster setup)

Why GPU nodes? While this tutorial calls relaxAI’s cloud endpoint, deploying on GPU nodes ensures the infrastructure is ready for future on-host inference experiments.

Step 2: Prepare a Civo Managed PostgreSQL

  1. Navigate to Databases in the Civo Dashboard
  2. Launch a PostgreSQL instance
  3. Note your connection string (visible in connection details)

Now, load a demo schema to test with. Connect using psql, TablePlus, or your preferred client:

CREATE DATABASE app;

\c app

CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name TEXT,
  country TEXT
);

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT,
  category TEXT,
  price NUMERIC
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INT REFERENCES customers(id),
  created_at TIMESTAMP
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT
);

INSERT INTO customers (name, country) VALUES
  ('Acme Corp', 'US'),
  ('Beta LLC', 'UK'),
  ('Cobalt Inc', 'DE');

INSERT INTO products (name, category, price) VALUES
  ('Laptop Pro', 'Electronics', 2200),
  ('Desk Chair', 'Furniture', 180),
  ('Noise Headset', 'Electronics', 320);

INSERT INTO orders (customer_id, created_at) VALUES
  (1, '2025-01-10'),
  (1, '2025-03-05'),
  (2, '2025-02-11');

INSERT INTO order_items (order_id, product_id, quantity) VALUES
  (1, 1, 1),
  (2, 3, 2),
  (3, 2, 4);

This simple schema is perfect for testing natural language queries like "Show me each customer’s total spend broken down by product category," or "Find customers whose average order value exceeds $1,000."
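
Before moving on, you can optionally confirm the demo data loaded, for example with psql (the connection string below is a placeholder; use your Civo connection details):

psql "postgresql://user:pass@your-civo-db-host:5432/app" -c "SELECT count(*) FROM orders;"
# Should return 3 if the inserts above succeeded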

Step 3: Build the FastAPI backend

Create main.py with the complete application logic.

Import necessary libraries:

import os
import re
import pathlib
import json
import asyncio
from typing import List, Any, Optional, Dict, Tuple
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import FileResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
import httpx
import asyncpg
from pydantic import BaseModel

Load .env if present:

load_dotenv()

Read the required configuration from environment variables:

RELAX_API_KEY = os.getenv("RELAX_API_KEY") or os.getenv("RELAXAI_API_KEY")
RELAX_MODEL = os.getenv("RELAX_MODEL", "Llama-4-Maverick-17B-128E")
DATABASE_URL = os.getenv("DATABASE_URL")

Build DATABASE_URL from individual DB_* variables (if missing):

if not DATABASE_URL:
    db_host = os.getenv("DB_HOST")
    db_port = os.getenv("DB_PORT", "5432")
    db_name = os.getenv("DB_NAME")
    db_user = os.getenv("DB_USER")
    db_pass = os.getenv("DB_PASS")
    if db_host and db_name and db_user and db_pass:
        DATABASE_URL = f"postgresql://{db_user}:{db_pass}@{db_host}:{db_port}/{db_name}"

Allow the server to start even when the configuration is incomplete:

if not RELAX_API_KEY:
    print("Warning: RELAX_API_KEY (or RELAXAI_API_KEY) not set. /query will fail until configured.")
if not DATABASE_URL:
    print("Warning: DATABASE_URL (or DB_HOST/DB_NAME/DB_USER/DB_PASS) not set. /query will fail until configured.")

Provide environment compatibility for SDKs expecting RELAXAI_API_KEY:

if RELAX_API_KEY:
    os.environ.setdefault("RELAXAI_API_KEY", RELAX_API_KEY)
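
With that configuration logic in place, a minimal .env for local development might look like this (placeholder values shown; the model name is simply the default used above):

RELAX_API_KEY=your_relaxai_api_key_here
RELAX_MODEL=Llama-4-Maverick-17B-128E
DATABASE_URL=postgresql://user:pass@your-civo-db-host:5432/app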

Initialize the FastAPI application:

app = FastAPI(title="Text-to-SQL Assistant (FastAPI + relaxAI)")

Mount static files and define the request model:

if pathlib.Path("static").exists():
    app.mount("/static", StaticFiles(directory="static"), name="static")


class NLQuery(BaseModel):
    nl_query: str

Call relaxAI and normalize its response:

async def call_relax_ai(messages: List[dict]) -> Dict[str, Any]:
    if not RELAX_API_KEY:
        raise HTTPException(status_code=500, detail="RELAX_API_KEY is not configured on the server.")
    url = "https://api.relax.ai/v1/chat/completions"
    payload = {
        "model": RELAX_MODEL,
        "messages": messages,
        "temperature": 0.0,
        "max_tokens": 800,
    }
    headers = {"Authorization": f"Bearer {RELAX_API_KEY}", "Content-Type": "application/json"}
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(url, json=payload, headers=headers)
        try:
            resp.raise_for_status()
        except httpx.HTTPStatusError as e:
            body_text = resp.text if resp is not None else "<no body>"
            raise HTTPException(status_code=500, detail=f"RelaxAI API error: {e}. Body: {body_text}")
        try:
            return resp.json()
        except Exception:
            return {"raw_text": resp.text}


def resp_to_text(resp: Any) -> str:
    if not resp:
        return ""
    if isinstance(resp, dict):
        choices = resp.get("choices")
        if isinstance(choices, list) and choices:
            first = choices[0]
            if isinstance(first, dict):
                msg = first.get("message") or {}
                if isinstance(msg, dict) and msg.get("content"):
                    return msg.get("content")
                if first.get("text"):
                    return first.get("text")
        for key in ("content", "text", "raw_text"):
            if isinstance(resp.get(key), str):
                return resp.get(key)
        return json.dumps(resp)
    return str(resp)

Extract a SELECT statement and enforce a safe read-only SQL envelope:

def extract_first_select(text: str) -> Optional[str]:
    if not text:
        return None
    # strip common code fences
    text = re.sub(r"```(?:sql)?\n?", "", text, flags=re.I)
    text = text.replace("```", "").strip()
    # find first SELECT ... up to semicolon or blank line or end
    m = re.search(r"(?is)(select\b.*?)(?:;|\n\n|$)", text)
    if m:
        candidate = m.group(1).strip()
        candidate = re.sub(r"^[^a-zA-Z(]*", "", candidate)
        return candidate
    if re.search(r"(?i)\bselect\b", text):
        return text.strip()
    return None


def sanitize_sql(sql: str) -> str:
    s = sql.strip().strip("`")
    if ";" in s:
        s = s.split(";", 1)[0]
    if not re.match(r"(?i)^\s*SELECT\b", s):
        raise ValueError("Only SELECT queries are allowed.")
    forbidden = [
        "INSERT",
        "UPDATE",
        "DELETE",
        "DROP",
        "ALTER",
        "CREATE",
        "TRUNCATE",
        "GRANT",
        "REVOKE",
        "COPY",
        "EXECUTE",
        "SET",
    ]
    for w in forbidden:
        if re.search(rf"(?i)\b{w}\b", s):
            raise ValueError(f"Query contains forbidden keyword: {w}")
    if not re.search(r"(?i)\bLIMIT\b", s):
        s = s + " LIMIT 100"
    return s

Introspect the database schema and detect referenced tables:

async def introspect_schema(conn: asyncpg.Connection) -> Tuple[str, List[str]]:
    rows = await conn.fetch(
        """
        SELECT table_name, column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = 'public'
          AND table_name NOT IN ('pg_stat_monitor')
          AND table_name NOT LIKE 'pg_%'
          AND table_name NOT LIKE 'sql_%'
        ORDER BY table_name, ordinal_position
    """
    )
    schema: Dict[str, List[str]] = {}
    for r in rows:
        schema.setdefault(r["table_name"], []).append(f"{r['column_name']} {r['data_type']}")
    lines = [f"{t}: {', '.join(cols)}" for t, cols in schema.items()]
    # return both a human-readable schema and the allowed table list
    return "\n".join(lines), list(schema.keys())


def extract_table_names_from_sql(sql: str) -> List[str]:
    """Extract table names referenced after FROM/JOIN (basic heuristic)."""
    if not sql:
        return []
    # capture tokens after FROM or JOIN (handles quoted and schema.table)
    tokens = re.findall(r"(?i)\b(?:from|join)\s+([`\"']?[A-Za-z0-9_.]+[`\"']?)", sql)
    names = []
    for t in tokens:
        t = t.strip("`\"'")
        if "." in t:
            t = t.split(".")[-1]
        names.append(t.lower())
    return list(dict.fromkeys(names))

Main /query endpoint: generate, validate, and execute SQL:

@app.post("/query")
async def query(nl: NLQuery):
    # Runtime config checks
    if not RELAX_API_KEY:
        raise HTTPException(status_code=500, detail="RELAX_API_KEY (or RELAXAI_API_KEY) not configured on the server.")
    if not DATABASE_URL:
        raise HTTPException(
            status_code=500,
            detail="DATABASE_URL (or DB_HOST/DB_NAME/DB_USER/DB_PASS) not configured on the server.",
        )

    # 1) get schema and allowed tables
    conn = await asyncpg.connect(DATABASE_URL)
    try:
        schema_desc, allowed_tables = await introspect_schema(conn)
    finally:
        await conn.close()

    # debug: log schema and allowed table list
    print("--- DB schema sent to LLM (may be empty) ---")
    print(schema_desc or "<empty schema - check DATABASE_URL/permissions>")
    print("--- Allowed tables ---")
    print(allowed_tables)
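    # Static copy of the demo schema; it is embedded in the system prompt below,
    # while the live, introspected schema (schema_desc) goes into the user message.
    # Adjust or remove this if you point the app at a different database.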
    schema_description = """
Tables:
customers(id, name, country)
products(id, name, category, price)
orders(id, customer_id, created_at)
order_items(id, order_id, product_id, quantity)
"""
    # 2) prepare messages (explicitly list allowed tables and forbid others)
    system = {
        "role": "system",
        "content": f"You are a PostgreSQL SQL generator. Output only one valid SELECT query using the provided schema {schema_description}. Do not output explanations."
    }
    user_msg = {
        "role": "user",
        "content": (
            f"Available tables and columns:\n{schema_desc}\n\n"
            f"Question: {nl.nl_query}\n\n"
            "Rules:\n"
            "- Use only the tables and columns listed above. Do NOT reference any other tables (including system/monitoring tables like pg_stat_monitor).\n"
            "- Output only the SELECT statement, no explanation.\n"
            "- Add LIMIT 100 if missing."
        )
    }

    # 3) call relax.ai
    resp = await call_relax_ai([system, user_msg])

    # debug: raw model response
    try:
        print("--- RelaxAI raw response ---")
        print(resp)
    except Exception:
        pass

    raw_text = resp_to_text(resp)
    sql_generated = extract_first_select(raw_text) or (raw_text.splitlines()[-1].strip() if raw_text else "")

    print("\n--- RelaxAI Generated SQL ---")
    print(sql_generated)

    if not sql_generated:
        raise HTTPException(status_code=500, detail="Failed to extract SQL from RelaxAI response.")

    # 4) basic validation: ensure SQL only references allowed tables
    referenced = extract_table_names_from_sql(sql_generated)
    forbidden_refs = [t for t in referenced if t not in [at.lower() for at in allowed_tables]]
    if forbidden_refs:
        raise HTTPException(
            status_code=400,
            detail=f"Generated SQL references disallowed tables: {forbidden_refs}. The model must use only: {allowed_tables}"
        )

    # 5) sanitize SQL
    try:
        safe_sql = sanitize_sql(sql_generated)
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))

    # 6) execute SQL
    conn2 = await asyncpg.connect(DATABASE_URL)
    try:
        rows = await conn2.fetch(safe_sql)
        results = [dict(r) for r in rows]
    finally:
        await conn2.close()

    return {"sql": safe_sql, "rows": results}  

Key design decisions

  • relaxAI integration: Uses the /v1/chat/completions endpoint with zero temperature for deterministic SQL generation
  • Safety first: The sanitize_sql function blocks all DDL/DML operations and enforces a default LIMIT 100
  • Schema awareness: Dynamically introspects your database so the LLM knows exactly what tables and columns exist
  • Async throughout: Uses asyncpg and httpx for non-blocking I/O
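
Before moving on, you can optionally sanity-check the backend locally; a quick sketch, assuming Python 3.11+, the dependencies from Step 5, and a populated .env in the working directory:

pip install fastapi "uvicorn[standard]" httpx asyncpg pydantic python-dotenv
uvicorn main:app --port 8000

# In another terminal:
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"nl_query": "Top 5 customers by revenue"}'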

Step 4: Create the frontend

Create static/index.html for a minimal but functional UI:

<!doctype html>
<html>
<head>
  <meta charset="utf-8" />
  <title>Text-to-SQL Assistant</title>
  <style>
    body { font-family: system-ui, sans-serif; max-width: 900px; margin: 32px auto; }
    textarea { width: 100%; height: 80px; }
    pre { background: #f6f8fa; padding: 12px; overflow:auto; }
    table { width:100%; border-collapse: collapse; margin-top:12px;}
    th,td { border:1px solid #ddd; padding:6px; text-align:left; }
  </style>
</head>
<body>
  <h1>Text-to-SQL Assistant</h1>
  <p>Type a question about the DB and press Generate.</p>
  <textarea id="nl" placeholder="e.g. Top 5 customers by revenue"></textarea><br/>
  <button id="go">Generate & Run</button>

  <h3>Generated SQL</h3>
  <pre id="sql">—</pre>

  <h3>Results</h3>
  <div id="results">—</div>

  <script>
    document.getElementById('go').onclick = async () => {
      const q = document.getElementById('nl').value;
      const res = await fetch('/query', {
        method:'POST',
        headers:{'Content-Type':'application/json'},
        body: JSON.stringify({ nl_query: q })
      });
      const data = await res.json();

      if (!res.ok) {
        alert(JSON.stringify(data));
        return;
      }

      document.getElementById('sql').innerText = data.sql;

      const rows = data.rows || [];
      if (rows.length === 0) {
        document.getElementById('results').innerText = 'No rows';
        return;
      }

      const cols = Object.keys(rows[0]);
      let html = '<table><thead><tr>' +
        cols.map(c=>`<th>${c}</th>`).join('') +
        '</tr></thead><tbody>';

      for (const r of rows) {
        html += '<tr>' +
          cols.map(c=>`<td>${(r[c]===null)?'NULL':String(r[c])}</td>`).join('') +
          '</tr>';
      }

      html += '</tbody></table>';
      document.getElementById('results').innerHTML = html;
    };
  </script>
</body>
</html>

This interface provides a clean, minimal experience with basic error handling. Here’s what it should look like:

(Screenshot: the Text-to-SQL Assistant UI)

Step 5: Containerize with Docker

Create requirements.txt:

fastapi
uvicorn[standard]
httpx
asyncpg
pydantic
python-dotenv

Create Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN apt-get update && \
    apt-get install -y gcc libpq-dev curl && \
    pip install --no-cache-dir -r requirements.txt && \
    apt-get remove -y gcc && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

# Copy application code
COPY . .

EXPOSE 80

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80", "--workers", "1"]

Build and push your image:

# Build
docker build -t <your-dockerhub-username>/text-to-sql:latest .

# Push
docker push <your-dockerhub-username>/text-to-sql:latest
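
Optionally, smoke-test the image locally before deploying; a sketch assuming a .env file with your credentials in the current directory:

docker run --rm --env-file .env -p 8080:80 <your-dockerhub-username>/text-to-sql:latest
# Then open http://localhost:8080/static/index.html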

Step 6: Deploy to Kubernetes

First, create a Kubernetes secret with your credentials:

kubectl create secret generic texttosql-secrets \
  --from-literal=RELAX_API_KEY='your_relaxai_api_key_here' \
  --from-literal=DATABASE_URL='postgres://user:pass@your-civo-db-host:5432/dbname'

Create k8s-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: texttosql
  labels:
    app: texttosql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: texttosql
  template:
    metadata:
      labels:
        app: texttosql
    spec:
      containers:
      - name: texttosql
        image: <your-dockerhub-username>/text-to-sql:latest
        ports:
        - containerPort: 80
        envFrom:
        - secretRef:
            name: texttosql-secrets
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: texttosql-lb
spec:
  type: LoadBalancer
  selector:
    app: texttosql
  ports:
  - port: 80
    targetPort: 80

Deploy to your cluster:

kubectl apply -f k8s-deployment.yaml

# Wait for LoadBalancer IP
kubectl get svc texttosql-lb -w

Once the EXTERNAL-IP appears, open http://<EXTERNAL-IP>/static/index.html in your browser!
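
If the external IP is still pending, you can also port-forward to the deployment for a quick check:

kubectl port-forward deploy/texttosql 8080:80
# Then open http://localhost:8080/static/index.html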

Step 7: Test your assistant

Navigate to your LoadBalancer's external IP and try these queries:

  • "Show me each customer’s total spend broken down by product category."
  • "Find customers whose average order value exceeds $1,000."
  • "Show me the total revenue generated by each customer."
  • "Show me each customer’s total order amount."

Watch as your natural language questions transform into SQL and return real results. Here’s what it should look like:

(Screenshot: example query results in the Text-to-SQL Assistant UI)

Monitoring & optimization

Check your deployment:

# View logs
kubectl logs -f deploy/texttosql

# Check pod status
kubectl get pods -l app=texttosql

# Monitor GPU usage (if using on-host inference)
kubectl top nodes

Performance tips:

  • For production, scale replicas: kubectl scale deployment texttosql --replicas=3 (or add an autoscaler, as sketched after this list)
  • Monitor GPU utilization in the Civo Dashboard
  • Consider connection pooling for high-traffic scenarios
  • Add caching for frequently-asked questions
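
As an example of the first tip, a basic Horizontal Pod Autoscaler can manage replica counts for you (thresholds here are illustrative, and a metrics server must be running in the cluster):

kubectl autoscale deployment texttosql --min=1 --max=3 --cpu-percent=70
kubectl get hpa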

Security Best Practices

While this tutorial focuses on functionality, consider these production hardening steps:

  1. Database roles: Create a read-only PostgreSQL user for the application (see the example after this list)
  2. Rate limiting: Add request throttling to prevent abuse
  3. Input validation: Extend sanitize_sql with more sophisticated parsing
  4. Audit logging: Track all queries with user attribution
  5. HTTPS: Use Kubernetes Ingress with TLS certificates
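
For the first hardening step, a read-only database role could be created along these lines (role name and password are placeholders; adapt the grants to your schema):

psql "$DATABASE_URL" <<'SQL'
CREATE ROLE app_readonly LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE app TO app_readonly;
GRANT USAGE ON SCHEMA public TO app_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO app_readonly;
SQL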

What's next?

You've built a working Text-to-SQL assistant, but here's where it gets interesting:

Add query history


CREATE TABLE query_history (
  id SERIAL PRIMARY KEY,
  nl_query TEXT,
  generated_sql TEXT,
  timestamp TIMESTAMP DEFAULT NOW()
);

Store successful queries in a query_history table.

Implement RAG for better context

Enhance your prompts with:

  • Table and column descriptions
  • Common query patterns
  • Business logic documentation

Check out relaxAI's RAG documentation for integration patterns.

Key takeaways

Text-to-SQL lets anyone query data using plain language, eliminating the need for SQL expertise. With relaxAI running inside your secure Civo environment, you maintain full control while benefiting from a lightweight, single-container deployment.

The system ships with built-in safety checks, basic monitoring, and error handling, and it can be easily extended with caching, authentication, or custom models. Combined with the hardening steps above, this makes it a scalable, developer-friendly foundation for both prototypes and enterprise workloads.