Building a Data-Driven Recommendation System with relaxAI

Personalized recommendations have become the cornerstone of modern digital experiences. Whether you’re running an e-commerce store, a content platform, or conducting data science experiments, the ability to surface relevant products or articles at just the right moment can dramatically improve user engagement, satisfaction, and conversions.

According to McKinsey & Company, personalization in sales and marketing can reduce customer acquisition costs by up to 50%, lift revenues by 5% to 15%, and boost marketing ROI by 10% to 30%, making it one of the most financially impactful strategies businesses can implement.

In this tutorial, you’ll learn how to build a robust recommendation system by combining relaxAI, a lightweight Python model, and a clear, modular pipeline to handle data preparation, embedding, training, evaluation, and recommendations. By the end, you’ll have a fully working pipeline that can be adapted to your own datasets and use cases.

Prerequisites

Before diving in, ensure you have the following prerequisites in place:

Python 3.8+ installed on your machine
pip or conda for package management
relaxAI API key: Sign up at relaxAI to obtain your key
Familiarity with basic Python scripting and package installation
A code editor of your choice (VS Code, PyCharm, etc.)

You do not need any special hardware, as this tutorial runs comfortably on a laptop.

Pipeline overview

The recommendation pipeline consists of the following stages:

Dataset Preparation: Defining a small set of users, items (with descriptions), and interaction records (views/purchases).
Embedding Generation: Calling relaxAI to transform each item description into a high-dimensional embedding vector.
Model Training: Combining embeddings with user features to train a logistic regression model predicting interaction likelihoods.
Model Evaluation: Assessing accuracy and ranking metrics (e.g., Precision@K) using a held-out test set.
Recommendation Generation: Scoring all candidate items for a user and returning the top-K suggestions.
Extensions: Scaling to larger datasets, swapping in advanced ranking models, or integrating into a web service.

This modular approach keeps each component focused and easy to adapt. We will now explore the six stages in more detail.

Step 1: Dataset preparation

Start by creating a lightweight synthetic dataset in Python. Save the following as dataset.py:

import pandas as pd
from sklearn.model_selection import train_test_split

# Sample items with textual descriptions
items = pd.DataFrame({
    'item_id': [101, 102, 103, 104, 105, 106],
    'description': [
        "Wireless noise-cancelling headphones with 30h battery life",
        "Stainless steel travel mug, 16oz, spill-proof lid",
        "Organic dark roast coffee beans, 1kg bag",
        "Smartwatch with heart rate monitoring and GPS",
        "Yoga mat with non-slip surface, eco-friendly material",
        "Portable Bluetooth speaker, waterproof, 10W output"
    ]
})

# Simulated user–item interactions
interactions = pd.DataFrame([
    {'user_id': 1, 'item_id': 101, 'interaction': 1},
    {'user_id': 1, 'item_id': 103, 'interaction': 1},
    {'user_id': 2, 'item_id': 102, 'interaction': 1},
    {'user_id': 2, 'item_id': 105, 'interaction': 1},
    {'user_id': 3, 'item_id': 104, 'interaction': 1},
    {'user_id': 3, 'item_id': 106, 'interaction': 1},
    # Negative samples
    {'user_id': 1, 'item_id': 102, 'interaction': 0},
    {'user_id': 1, 'item_id': 104, 'interaction': 0},
    {'user_id': 2, 'item_id': 103, 'interaction': 0},
    {'user_id': 2, 'item_id': 106, 'interaction': 0},
    {'user_id': 3, 'item_id': 101, 'interaction': 0},
    {'user_id': 3, 'item_id': 105, 'interaction': 0},
])

# Split for training and testing
train_df, test_df = train_test_split(
    interactions,
    test_size=0.3,
    random_state=42
)

if __name__ == "__main__":
    print("Items:\n", items)
    print("\nTrain interactions:\n", train_df)
    print("\nTest interactions:\n", test_df)

The script above creates:

items: Six products with human-friendly descriptions
interactions: Simulated positive (1) and negative (0) interactions for three users
A 70/30 train-test split to evaluate our model fairly
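
With only 12 interaction rows, a purely random split can leave the test set with very few positives. As a minimal sketch, assuming you want both classes represented in each split, train_test_split accepts a stratify argument. The toy frame below is a stand-in for interactions, and the names toy, toy_train, and toy_test are illustrative only:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the tutorial's `interactions` frame: 6 positives, 6 negatives
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3] * 2,
    "item_id": [101, 103, 102, 105, 104, 106, 102, 104, 103, 106, 101, 105],
    "interaction": [1] * 6 + [0] * 6,
})

# stratify keeps the positive/negative ratio the same in both splits
toy_train, toy_test = train_test_split(
    toy, test_size=0.3, random_state=42, stratify=toy["interaction"]
)

print(len(toy_train), len(toy_test))
print(toy_test["interaction"].value_counts().to_dict())
```

With 12 rows and test_size=0.3, scikit-learn rounds the test set up to 4 rows and leaves 8 for training, so the split is closer to 67/33 than 70/30.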

Next, you will need to run the following to verify your data:

python dataset.py

Step 2: Generating embeddings

To help our model understand item content, we’ll convert each product description into a numerical vector (embedding) that captures its meaning in high-dimensional space. These embeddings act as learned representations that the recommendation model can reason over.
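
To illustrate what "captures its meaning" implies in practice, here is a small sketch of cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors are made up for illustration; real embeddings are much longer and their values will differ:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-dimensional vectors, for illustration only
headphones = [0.9, 0.1, 0.0]
speaker = [0.8, 0.2, 0.1]
coffee = [0.0, 0.1, 0.9]

print(cosine_similarity(headphones, speaker))  # close to 1: similar direction
print(cosine_similarity(headphones, coffee))   # close to 0: nearly orthogonal
```

The idea is that descriptions with related meaning should end up with vectors pointing in similar directions, which is what lets a downstream model generalize across items.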

relaxAI is fully compatible with OpenAI’s embedding API; it follows the same interface, authentication scheme, and response format. This means we don’t need a new SDK or custom integration. We can use LangChain’s OpenAIEmbeddings class directly by changing just two parameters:

openai_api_key → set to your RELAXAI_API_KEY
openai_api_base → set to https://api.relax.ai/v1/

This compatibility simplifies the process significantly: anywhere you would use OpenAI’s embeddings, you can swap in relaxAI with minimal changes. Start by creating embed_items.py:

import os
from langchain_openai import OpenAIEmbeddings
from dataset import items  # item descriptions defined in Step 1

# Ensure your RELAXAI_API_KEY is set in your environment
api_key = os.getenv("RELAXAI_API_KEY")

# Use the OpenAI-compatible embeddings class, pointed at relaxAI
embeddings_model = OpenAIEmbeddings(
    openai_api_key=api_key,
    openai_api_base="https://api.relax.ai/v1/",
    model="DSE-QWen2-2b-MRL-V1"
)

if __name__ == "__main__":
    embeddings = []
    for desc in items["description"]:
        print(f"Generating embedding for: '{desc[:30]}...'")
        # One API call per description; returns a list of floats
        emb = embeddings_model.embed_query(desc)
        embeddings.append(emb)

    # Store each vector alongside its item and persist for later steps
    items["embedding"] = embeddings
    items.to_pickle("items_embeddings.pkl")
    print("Saved item embeddings to items_embeddings.pkl")

The script above does the following:

API call: Sends each description to the relaxAI embeddings endpoint
Error handling: The script has no explicit error handling, so a failed request raises an exception and stops execution
Persistence: Saves the enriched DataFrame to items_embeddings.pkl

After running, you’ll have a pickled DataFrame with an embedding column containing high-dimensional vectors.
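
For more than a handful of items, one request per description becomes slow. LangChain embedding classes also expose embed_documents, which accepts a list of texts. The sketch below shows one way to batch calls; embed_in_batches, the batch size of 64, and FakeEmbedder are hypothetical names and values introduced here so the example runs without an API key:

```python
from typing import List, Sequence

def embed_in_batches(texts: Sequence[str], embedder, batch_size: int = 64) -> List[List[float]]:
    """Embed texts batch_size at a time via embedder.embed_documents."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(embedder.embed_documents(batch))
    return vectors

class FakeEmbedder:
    """Hypothetical offline stand-in for an embeddings client."""
    def __init__(self):
        self.calls = 0

    def embed_documents(self, texts):
        self.calls += 1
        # Not a real embedding: just text length plus a constant
        return [[float(len(t)), 1.0] for t in texts]

fake = FakeEmbedder()
vecs = embed_in_batches(["mug", "yoga mat", "speaker"], fake, batch_size=2)
print(len(vecs), fake.calls)
```

In the tutorial's script you would pass embeddings_model in place of the fake object; whether batching helps depends on the limits of the endpoint you are calling.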

Step 3: Model training

Next, combine embeddings with user features to train a classifier. We’ll use logistic regression, a fast, interpretable choice. Create train_model.py:

import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score
from dataset import train_df, test_df  # splits created in Step 1

# Load item embeddings, indexed by item_id for fast lookup
items_emb = pd.read_pickle("items_embeddings.pkl").set_index('item_id')

# One-hot encode user IDs; users unseen in training encode as all zeros
user_encoder = OneHotEncoder(handle_unknown='ignore')
user_encoder.fit(train_df[['user_id']])

def build_features(df):
    """Return (X, y): one-hot user columns followed by item embedding columns."""
    X_items = np.vstack([
        items_emb.loc[iid, 'embedding'] for iid in df['item_id']
    ])
    X_users = user_encoder.transform(df[['user_id']]).toarray()  # sparse -> dense
    X = np.hstack([X_users, X_items])
    y = df['interaction'].values
    return X, y

# Prepare data
X_train, y_train = build_features(train_df)
X_test, y_test = build_features(test_df)

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
print(f"Test Accuracy: {acc:.3f}")

# Save model and encoder
to_save = {'model': model, 'encoder': user_encoder}
with open("recommender.pkl", "wb") as f:
    pickle.dump(to_save, f)
print("Model saved as recommender.pkl")

Key points in this script:

One-hot vectors representing users are stacked with item embeddings to form the model’s input features
A logistic regression model learns to predict interaction probabilities
Accuracy is measured on the held-out test set to gauge performance
The trained model and encoder are persisted in recommender.pkl
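
To make the feature layout concrete, the sketch below builds the same kind of matrix on toy data. The four-dimensional vectors in toy_embeddings are made up, and all names here are illustrative rather than part of train_model.py:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up 4-dimensional "embeddings" for two items (illustration only)
toy_embeddings = {101: [0.1, 0.2, 0.3, 0.4], 102: [0.5, 0.6, 0.7, 0.8]}
toy_rows = pd.DataFrame({"user_id": [1, 2, 3], "item_id": [101, 102, 101]})

toy_encoder = OneHotEncoder()
X_users = toy_encoder.fit_transform(toy_rows[["user_id"]]).toarray()   # shape (3, 3)
X_items = np.vstack([toy_embeddings[i] for i in toy_rows["item_id"]])  # shape (3, 4)
X = np.hstack([X_users, X_items])                                      # shape (3, 7)

print(X.shape)
print(X[0])  # user 1's one-hot columns, then item 101's vector
```

Each row is one (user, item) pair: the first columns identify the user and the remaining columns describe the item, so the width is the number of known users plus the embedding dimension.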

Run the following and make a note of your test accuracy (e.g., 0.75; with only four test rows it will always be a multiple of 0.25); this will give you a baseline for further tuning:

python train_model.py

Step 4: Model evaluation

Beyond plain accuracy, recommendation systems benefit from ranking metrics like Precision@K. Let’s compute Precision@3 to see how well the top 3 recommendations match true positives. Save as evaluate.py:

import pickle
import numpy as np
import pandas as pd
from dataset import test_df  # held-out interactions from Step 1

# Load artifacts
with open("recommender.pkl", "rb") as f:
    artifacts = pickle.load(f)
model = artifacts['model']
encoder = artifacts['encoder']
items = pd.read_pickle("items_embeddings.pkl").set_index('item_id')

all_items = items.index.values

# Compute Precision@K
def precision_at_k(user_id, k=3):
    # Build candidate set: every item paired with this user
    df = pd.DataFrame({'user_id': [user_id] * len(all_items), 'item_id': all_items})
    # Features, in the same column order used for training
    X_user = encoder.transform(df[['user_id']]).toarray()  # sparse -> dense
    X_item = np.vstack([items.loc[i, 'embedding'] for i in df['item_id']])
    X = np.hstack([X_user, X_item])
    # Score each candidate with the probability of a positive interaction
    probs = model.predict_proba(X)[:, 1]
    df['score'] = probs
    top_k = df.nlargest(k, 'score')['item_id'].values

    # Ground truth for this user
    true_pos = set(
        test_df[(test_df['user_id'] == user_id) &
                (test_df['interaction'] == 1)]['item_id']
    )
    hits = sum(1 for i in top_k if i in true_pos)
    return hits / k

# Evaluate for each user in test set
precisions = []
for uid in test_df['user_id'].unique():
    p = precision_at_k(uid)
    print(f"User {uid} Precision@3: {p:.2f}")
    precisions.append(p)

print(f"Mean Precision@3: {np.mean(precisions):.2f}")

When you run the following, you’ll see per-user Precision@3 scores and the mean. This metric tells you how many of your top 3 suggestions were actually items the user engaged with in the test set:

python evaluate.py
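
To see the metric in isolation from the model, here is a small standalone sketch. This version takes an already-ranked list and a set of relevant items, which is a simplification of the function in evaluate.py; the item IDs are chosen only for illustration:

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that appear in relevant_items."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(ranked_items)[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k

ranked = [101, 104, 103, 102]   # best-scored item first
relevant = {101, 103}           # items the user actually engaged with
example = precision_at_k(ranked, relevant, k=3)
print(f"{example:.2f}")
```

Two of the top three items are relevant, so the result is 2/3. Note that in the synthetic dataset each user has only two positive interactions in total, so Precision@3 measured against test-set positives can never exceed 2/3 for any user.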

Step 5: Generating recommendations

Finally, let’s write a reusable function to output the top K recommendations for any user. Create recommend.py:

import pickle
import numpy as np
import pandas as pd

# Load model and encoder
with open("recommender.pkl", "rb") as f:
    artifacts = pickle.load(f)
model = artifacts['model']
encoder = artifacts['encoder']
items = pd.read_pickle("items_embeddings.pkl").set_index('item_id')

# Recommendation function
def recommend(user_id, k=5):
    # Pair the user with every item in the catalogue
    df = pd.DataFrame({'user_id': [user_id] * len(items), 'item_id': items.index})
    X_user = encoder.transform(df[['user_id']]).toarray()  # sparse -> dense
    X_item = np.vstack([items.loc[i, 'embedding'] for i in df['item_id']])
    X = np.hstack([X_user, X_item])

    # Score and sort
    probs = model.predict_proba(X)[:, 1]
    df['score'] = probs
    top = df.nlargest(k, 'score')
    # Attach human-readable descriptions to the top-k rows
    top = top.merge(
        items[['description']],
        left_on='item_id',
        right_index=True
    )
    return top[['item_id', 'description', 'score']]

# Example usage
if __name__ == "__main__":
    user = 1
    recs = recommend(user, k=5)
    print(f"Top 5 recommendations for user {user}:\n", recs)

Next, run the following:

python recommend.py

You’ll see a neat table of item IDs, descriptions, and predicted scores. These are your personalized suggestions!
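
Because recommend scores every item, the output can include items the user has already interacted with. If that is not what you want, one possible approach is to filter them out afterwards. The helper drop_seen and the toy frames below are hypothetical and not part of recommend.py:

```python
import pandas as pd

def drop_seen(recs: pd.DataFrame, interactions: pd.DataFrame, user_id: int) -> pd.DataFrame:
    """Remove items the user already interacted with positively."""
    seen = set(
        interactions.loc[
            (interactions["user_id"] == user_id) & (interactions["interaction"] == 1),
            "item_id",
        ]
    )
    return recs[~recs["item_id"].isin(seen)].reset_index(drop=True)

# Toy data for illustration only
toy_recs = pd.DataFrame({"item_id": [101, 104, 103], "score": [0.9, 0.6, 0.5]})
toy_log = pd.DataFrame({
    "user_id": [1, 1, 2],
    "item_id": [101, 103, 104],
    "interaction": [1, 1, 1],
})
filtered = drop_seen(toy_recs, toy_log, user_id=1)
print(filtered)
```

Whether to hide seen items is a product decision: repeat purchases (such as coffee beans) may be worth recommending again, while one-off purchases usually are not.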

Step 6: Real-world extensions

This foundational pipeline is highly adaptable. Some examples of how it can be used include:

Larger datasets: Swap synthetic data for a real interaction log (CSV, database, etc.). Use batch processing for embeddings and model training.
Advanced models: Replace logistic regression with LightFM, XGBoost, or shallow neural networks to capture complex patterns.
Online serving: Package the recommendation logic in a FastAPI or Flask endpoint, serving JSON recommendations on demand.
A/B testing: Compare different embedding models or ranking strategies to optimize for CTR or conversion.
Cold-start handling: Incorporate content-based filtering (item metadata) and user profiling to recommend to new users with little history.

Because the pipeline is modular, each extension requires only small, localized changes to the relevant step.
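
As one example, the cold-start idea can be sketched with embeddings alone: given an item a new user has just viewed, suggest the items whose vectors are closest to it. The function most_similar and the toy vectors are assumptions made for illustration, not output from relaxAI:

```python
import numpy as np

def most_similar(item_id, embeddings, k=2):
    """Return the k item IDs whose embeddings are closest (cosine) to item_id."""
    ids = [i for i in embeddings if i != item_id]
    query = np.asarray(embeddings[item_id], dtype=float)
    matrix = np.asarray([embeddings[i] for i in ids], dtype=float)
    # Cosine similarity of every other item against the query item
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:k]
    return [ids[j] for j in order]

# Made-up 3-dimensional vectors keyed by item_id (illustration only)
toy_vectors = {
    101: [0.9, 0.1, 0.0],
    106: [0.8, 0.2, 0.1],
    103: [0.0, 0.1, 0.9],
    102: [0.1, 0.3, 0.8],
}
similar = most_similar(101, toy_vectors, k=2)
print(similar)
```

This needs no interaction history at all, which is why content-based similarity is a common fallback for new users or new items.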

Troubleshooting tips

If something doesn’t work as expected, here are some common issues and how to fix them:

Items or interaction data are missing: Ensure dataset.py has run successfully and that the from dataset import ... lines are present and not commented out in the scripts that use items, train_df, or test_df.
API key issues: Confirm your RELAXAI_API_KEY is set in your environment. You can print os.getenv("RELAXAI_API_KEY") to double-check that it's loading properly.
Embedding errors: If relaxAI returns an error, verify that the endpoint and model name in OpenAIEmbeddings are correct.
Pickle file not found: Make sure embed_items.py ran and successfully created items_embeddings.pkl. Check the working directory if the file isn’t found.
Shape mismatch or key errors during training: This often happens if item_ids in interactions don’t align with those in the items dataframe. Use .set_index("item_id") carefully and verify alignment.
Low accuracy or precision scores: That’s expected in small synthetic datasets. For real use cases, try tuning the model or exploring alternatives like XGBoost or LightFM.

Print intermediate outputs or use a debugger if needed; each step is modular and can be tested in isolation.
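
For the alignment issue in particular, a quick check is to compare the two sets of IDs before training. The helper missing_item_ids below is a hypothetical name, shown on toy frames:

```python
import pandas as pd

def missing_item_ids(interactions: pd.DataFrame, items: pd.DataFrame) -> set:
    """Item IDs referenced by interactions but absent from items."""
    return set(interactions["item_id"]) - set(items["item_id"])

# Toy frames for illustration: item 999 has no matching row in toy_items
toy_items = pd.DataFrame({"item_id": [101, 102]})
toy_interactions = pd.DataFrame({"user_id": [1, 1], "item_id": [101, 999]})

missing = missing_item_ids(toy_interactions, toy_items)
print(missing)
```

An empty set means every interaction can be joined to an embedding; anything else points to the rows that would raise a KeyError during feature building.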

Summary

Throughout this tutorial, you’ve learned how to build a data-driven recommendation system using relaxAI. You now have a template to deploy personalized suggestions that enhance user engagement, whether for products, articles, or any content type.

If you want to keep learning more about the capabilities of relaxAI, check out some of these additional resources:

Deploy a lightweight AI utility on Civo using relaxAI and Civo Database

LLM-Powered architecture diagram generator

LLM-powered incident analysis dashboard on Civo
