Building a data-driven recommendation system with relaxAI
In this tutorial, you’ll learn how to build a robust recommendation system by combining relaxAI, a lightweight Python model, and a clear, modular pipeline.
Written by
Software Engineer @ GoCardless
Personalized recommendations have become the cornerstone of modern digital experiences. Whether you’re running an e-commerce store, a content platform, or conducting data science experiments, the ability to surface relevant products or articles at just the right moment can dramatically improve user engagement, satisfaction, and conversions.
According to McKinsey & Company, personalization in sales and marketing can reduce customer acquisition costs by up to 50%, lift revenues by 5% to 15%, and boost marketing ROI by 10% to 30%, making it one of the most financially impactful strategies businesses can implement.
In this tutorial, you’ll learn how to build a robust recommendation system by combining relaxAI, a lightweight Python model, and a clear, modular pipeline to handle data preparation, embedding, training, evaluation, and recommendations. By the end, you’ll have a fully working pipeline that can be adapted to your own datasets and use cases.
Prerequisites
Before diving in, ensure you have the following prerequisites in place:
- Python 3.8+ installed on your machine
- pip or conda for package management
- relaxAI API key: Sign up at relaxAI to obtain your key
- Familiarity with basic Python scripting and package installation
- A code editor of your choice (VS Code, PyCharm, etc.)
You do not need any special hardware, as this tutorial runs comfortably on a laptop.
Pipeline overview
The recommendation pipeline consists of the following stages:
- Dataset Preparation: Defining a small set of users, items (with descriptions), and interaction records (views/purchases).
- Embedding Generation: Calling relaxAI to transform each item description into a high-dimensional embedding vector.
- Model Training: Combining embeddings with user features to train a logistic regression model predicting interaction likelihoods.
- Model Evaluation: Assessing accuracy and ranking metrics (e.g., Precision@K) using a held-out test set.
- Recommendation Generation: Scoring all candidate items for a user and returning the top-K suggestions.
- Extensions: Scaling to larger datasets, swapping in advanced ranking models, or integrating into a web service.

This modular approach keeps each component focused and easy to adapt. We will now explore the six stages in more detail.
Step 1: Dataset preparation
Start by creating a lightweight synthetic dataset in Python. Save the following as dataset.py:
import pandas as pd
from sklearn.model_selection import train_test_split

# Sample items with textual descriptions
items = pd.DataFrame({
    'item_id': [101, 102, 103, 104, 105, 106],
    'description': [
        "Wireless noise-cancelling headphones with 30h battery life",
        "Stainless steel travel mug, 16oz, spill-proof lid",
        "Organic dark roast coffee beans, 1kg bag",
        "Smartwatch with heart rate monitoring and GPS",
        "Yoga mat with non-slip surface, eco-friendly material",
        "Portable Bluetooth speaker, waterproof, 10W output"
    ]
})

# Simulated user–item interactions
interactions = pd.DataFrame([
    {'user_id': 1, 'item_id': 101, 'interaction': 1},
    {'user_id': 1, 'item_id': 103, 'interaction': 1},
    {'user_id': 2, 'item_id': 102, 'interaction': 1},
    {'user_id': 2, 'item_id': 105, 'interaction': 1},
    {'user_id': 3, 'item_id': 104, 'interaction': 1},
    {'user_id': 3, 'item_id': 106, 'interaction': 1},
    # Negative samples
    {'user_id': 1, 'item_id': 102, 'interaction': 0},
    {'user_id': 1, 'item_id': 104, 'interaction': 0},
    {'user_id': 2, 'item_id': 103, 'interaction': 0},
    {'user_id': 2, 'item_id': 106, 'interaction': 0},
    {'user_id': 3, 'item_id': 101, 'interaction': 0},
    {'user_id': 3, 'item_id': 105, 'interaction': 0},
])

# Split for training and testing
train_df, test_df = train_test_split(
    interactions,
    test_size=0.3,
    random_state=42
)

if __name__ == "__main__":
    print("Items:\n", items)
    print("\nTrain interactions:\n", train_df)
    print("\nTest interactions:\n", test_df)
The script above creates:
- items: Six products with human-friendly descriptions
- interactions: Simulated positive (1) and negative (0) interactions for three users
- A 70/30 train-test split to evaluate our model fairly
Next, you will need to run the following to verify your data:
python dataset.py
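One property worth checking at this point is that every user in the test split also appears in the training split, because Step 3 fits its one-hot encoder on the training users only, and scikit-learn's OneHotEncoder raises an error on unseen categories by default. The following is a minimal standard-library sketch of that check; the helper name `users_missing_from_train` and the toy rows are illustrative assumptions, not part of the tutorial's scripts.

```python
def users_missing_from_train(train_rows, test_rows):
    """Return the sorted user IDs that occur in test_rows but not in train_rows."""
    train_users = {row["user_id"] for row in train_rows}
    test_users = {row["user_id"] for row in test_rows}
    return sorted(test_users - train_users)


# Toy rows in the same shape as the interactions table (hypothetical split).
train_rows = [
    {"user_id": 1, "item_id": 101, "interaction": 1},
    {"user_id": 2, "item_id": 102, "interaction": 1},
]
test_rows = [
    {"user_id": 2, "item_id": 105, "interaction": 1},
    {"user_id": 3, "item_id": 104, "interaction": 1},
]

missing = users_missing_from_train(train_rows, test_rows)
print("Users only in test:", missing)
```

With the DataFrames from dataset.py, the same check can be run on `train_df.to_dict("records")` and `test_df.to_dict("records")`. An empty result means the encoder will recognise every test user.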
Step 2: Generating embeddings
To help our model understand item content, we’ll convert each product description into a numerical vector (embedding) that captures its meaning in high-dimensional space. These embeddings act as learned representations that the recommendation model can reason over.
relaxAI is fully compatible with OpenAI’s embedding API; it follows the same interface, authentication scheme, and response format. This means we don’t need a new SDK or custom integration. We can use LangChain’s OpenAIEmbeddings class directly by changing just two parameters:
- openai_api_key → set to your RELAXAI_API_KEY
- openai_api_base → set to https://api.relax.ai/v1/
This compatibility simplifies the process significantly: anywhere you would use OpenAI’s embeddings, you can swap in relaxAI with minimal changes. Start by creating embed_items.py:
import os
import pandas as pd
from langchain_openai import OpenAIEmbeddings

# Reuse the items DataFrame defined in dataset.py
from dataset import items

# Ensure your RELAXAI_API_KEY is set in your environment
api_key = os.getenv("RELAXAI_API_KEY")

# Use the relaxAI-compatible embeddings class
embeddings_model = OpenAIEmbeddings(
    openai_api_key=api_key,
    openai_api_base="https://api.relax.ai/v1/",
    model="DSE-QWen2-2b-MRL-V1"
)

if __name__ == "__main__":
    embeddings = []
    for desc in items["description"]:
        print(f"Generating embedding for: '{desc[:30]}...'")
        emb = embeddings_model.embed_query(desc)
        embeddings.append(emb)

    # Store one embedding vector per item and persist the result
    items["embedding"] = embeddings
    items.to_pickle("items_embeddings.pkl")
    print("Saved item embeddings to items_embeddings.pkl")
The script above does the following:
- API call: Sends each description to the relaxAI embeddings endpoint
- Error handling: The script has no explicit error handling, so any exception raised by the embeddings client stops execution
- Persistence: Saves the enriched DataFrame to items_embeddings.pkl
After running, you’ll have a pickled DataFrame with an embedding column containing high-dimensional vectors.
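To build some intuition for what "captures its meaning" implies, a common way to compare two embeddings is cosine similarity: vectors for related descriptions tend to point in similar directions. The sketch below uses made-up three-dimensional vectors rather than real relaxAI output, and the function name `cosine_similarity` is an illustrative helper, not something the tutorial's scripts define.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors; real embeddings have many more dimensions.
headphones = [0.9, 0.1, 0.0]
speaker = [0.8, 0.2, 0.1]
coffee = [0.0, 0.1, 0.9]

print("headphones vs speaker:", cosine_similarity(headphones, speaker))
print("headphones vs coffee: ", cosine_similarity(headphones, coffee))
```

In this toy setup the two audio products score much closer to 1 than the headphones and coffee pair. Running the same function on rows of the `embedding` column is a quick sanity check that the saved vectors behave sensibly.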
Step 3: Model training
Next, combine embeddings with user features to train a classifier. We’ll use logistic regression, a fast, interpretable choice. Create train_model.py:
import pickle
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score

# Reuse the train/test split defined in dataset.py
from dataset import train_df, test_df

# Load item embeddings
items_emb = pd.read_pickle("items_embeddings.pkl").set_index('item_id')

# One-hot encode user IDs
user_encoder = OneHotEncoder()
user_encoder.fit(train_df[['user_id']])

def build_features(df):
    X_items = np.vstack([items_emb.loc[iid, 'embedding'] for iid in df['item_id']])
    X_users = user_encoder.transform(df[['user_id']]).toarray()  # Convert sparse matrix to dense array
    X = np.hstack([X_users, X_items])
    y = df['interaction'].values
    return X, y

# Prepare data
X_train, y_train = build_features(train_df)
X_test, y_test = build_features(test_df)

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
print(f"Test Accuracy: {acc:.3f}")

# Save model and encoder
to_save = {'model': model, 'encoder': user_encoder}
with open("recommender.pkl", "wb") as f:
    pickle.dump(to_save, f)
print("Model saved as recommender.pkl")
Key points:
- One-hot vectors representing users are stacked with item embeddings to form the model’s input features
- A logistic regression model learns to predict interaction probabilities
- Accuracy is measured on the held-out test set to gauge performance
- The trained model and encoder are persisted in recommender.pkl
Run the following and note your test accuracy (e.g., 0.75; with only four test rows, accuracy can only move in steps of 0.25). This gives you a baseline for further tuning:
python train_model.py
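The feature layout used in training can be illustrated without any libraries: each row is the user's one-hot vector followed by the item's embedding, and logistic regression turns a weighted sum of that row into a probability via the sigmoid function. The sketch below uses made-up weights and a two-dimensional embedding; the helper names are illustrative assumptions, and the numbers are not what the trained model would learn.

```python
import math


def one_hot(user_id, known_users):
    """Return a list with 1.0 at the position of user_id and 0.0 elsewhere."""
    return [1.0 if user_id == known else 0.0 for known in known_users]


def build_feature_row(user_id, item_embedding, known_users):
    """Concatenate the user's one-hot vector with the item embedding."""
    return one_hot(user_id, known_users) + list(item_embedding)


def logistic_score(features, weights, bias):
    """Sigmoid of the weighted sum, the form of probability logistic regression outputs."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


known_users = [1, 2, 3]
item_embedding = [0.5, -0.2]          # made-up 2-dimensional embedding
weights = [0.4, -0.1, 0.0, 1.0, 2.0]  # made-up weights, one per feature
bias = 0.0

row = build_feature_row(2, item_embedding, known_users)
print("feature row:", row)
print("score:", logistic_score(row, weights, bias))
```

Because the user part of the row is one-hot, the model effectively learns one offset per user plus a shared set of weights over the embedding dimensions.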
Step 4: Model evaluation
Beyond plain accuracy, recommendation systems benefit from ranking metrics like Precision@K. Let’s compute Precision@3 to see how well the top 3 recommendations match true positives. Save as evaluate.py:
import pickle
import numpy as np
import pandas as pd

# Reuse the held-out interactions defined in dataset.py
from dataset import test_df

# Load artifacts
with open("recommender.pkl", "rb") as f:
    artifacts = pickle.load(f)
model = artifacts['model']
encoder = artifacts['encoder']
items = pd.read_pickle("items_embeddings.pkl").set_index('item_id')
all_items = items.index.values

# Compute Precision@K
def precision_at_k(user_id, k=3):
    # Build candidate set
    df = pd.DataFrame({'user_id': [user_id] * len(all_items), 'item_id': all_items})

    # Features
    X_user = encoder.transform(df[['user_id']]).toarray()  # Convert sparse matrix to dense array
    X_item = np.vstack([items.loc[i, 'embedding'] for i in df['item_id']])
    X = np.hstack([X_user, X_item])

    # Score
    probs = model.predict_proba(X)[:, 1]
    df['score'] = probs
    top_k = df.nlargest(k, 'score')['item_id'].values

    # Ground truth for this user
    true_pos = set(test_df[(test_df['user_id'] == user_id) &
                           (test_df['interaction'] == 1)]['item_id'])
    hits = sum(1 for i in top_k if i in true_pos)
    return hits / k

# Evaluate for each user in test set
precisions = []
for uid in test_df['user_id'].unique():
    p = precision_at_k(uid)
    print(f"User {uid} Precision@3: {p:.2f}")
    precisions.append(p)

print(f"Mean Precision@3: {np.mean(precisions):.2f}")
When you run the following, you’ll see per-user Precision@3 scores and the mean. This metric tells you how many of your top 3 suggestions were actually items the user engaged with in the test set:
python evaluate.py
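As a worked example of the metric itself, Precision@K can be computed from nothing more than a ranked list and a set of relevant items. This is a minimal standalone sketch with made-up rankings; the two-argument signature is an illustrative choice that differs from the `precision_at_k(user_id, k)` function in evaluate.py, which also handles scoring.

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the first k ranked items that are in relevant_items."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k


ranked = [101, 104, 103, 102, 105, 106]  # made-up ranking, best first
relevant = {101, 103}                    # made-up ground truth

p3 = precision_at_k(ranked, relevant, k=3)
print(f"Precision@3: {p3:.2f}")
```

Note that the denominator is always k, as in evaluate.py. In this tutorial's dataset each user has only two positive interactions in total, so Precision@3 for a single user cannot exceed 2/3 even with a perfect ranking.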
Step 5: Generating recommendations
Finally, let’s write a reusable function to output the top K recommendations for any user. Create recommend.py:
import pickle
import numpy as np
import pandas as pd

# Load model and encoder
with open("recommender.pkl", "rb") as f:
    artifacts = pickle.load(f)
model = artifacts['model']
encoder = artifacts['encoder']
items = pd.read_pickle("items_embeddings.pkl").set_index('item_id')

# Recommendation function
def recommend(user_id, k=5):
    df = pd.DataFrame({'user_id': [user_id] * len(items), 'item_id': items.index})
    X_user = encoder.transform(df[['user_id']]).toarray()  # Convert sparse matrix to dense array
    X_item = np.vstack([items.loc[i, 'embedding'] for i in df['item_id']])
    X = np.hstack([X_user, X_item])

    # Score and sort
    probs = model.predict_proba(X)[:, 1]
    df['score'] = probs
    top = df.nlargest(k, 'score')
    top = top.merge(
        items[['description']],
        left_on='item_id',
        right_index=True
    )
    return top[['item_id', 'description', 'score']]

# Example usage
if __name__ == "__main__":
    user = 1
    recs = recommend(user, k=5)
    print(f"Top 5 recommendations for user {user}:\n", recs)
Next, run the following:
python recommend.py
You’ll see a neat table of item IDs, descriptions, and predicted scores. These are your personalized suggestions!
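The ranking step can also be sketched on its own: given a score per item, pick the K highest. The example below uses the standard library's `heapq.nlargest` with made-up scores. The `exclude` argument, which drops items a user has already interacted with, is an assumption added for illustration; recommend.py as written scores every item.

```python
import heapq


def top_k_items(scores, k, exclude=()):
    """Return the k highest-scoring (item_id, score) pairs, best first.

    scores  -- dict mapping item_id to a predicted score
    exclude -- item IDs to leave out (optional, hypothetical extra)
    """
    excluded = set(exclude)
    candidates = [(item, s) for item, s in scores.items() if item not in excluded]
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


scores = {101: 0.81, 102: 0.12, 103: 0.77, 104: 0.35}  # made-up scores

print(top_k_items(scores, k=2))
print(top_k_items(scores, k=2, exclude=[101]))
```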

Step 6: Real-world extensions
This foundational pipeline is highly adaptable. Some examples of how it can be used include:
- Larger datasets: Swap synthetic data for a real interaction log (CSV, database, etc.). Use batch processing for embeddings and model training.
- Advanced models: Replace logistic regression with LightFM, XGBoost, or shallow neural networks to capture complex patterns.
- Online serving: Package the recommendation logic in a FastAPI or Flask endpoint, serving JSON recommendations on demand.
- A/B testing: Compare different embedding models or ranking strategies to optimize for CTR or conversion.
- Cold-start handling: Incorporate content-based filtering (item metadata) and user profiling to recommend to new users with little history.
Because the code is well-structured and modular, each extension requires only minimal changes, which keeps experimentation straightforward.
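For the larger-datasets case, one way to batch the embedding step is sketched below. It assumes an embedder object that exposes `embed_documents(texts)`, the batch method in LangChain's embeddings interface. `FakeEmbedder` is a stand-in so the example runs without network access, and `embed_in_batches` and its `batch_size` default are hypothetical choices, not relaxAI recommendations.

```python
def embed_in_batches(texts, embedder, batch_size=32):
    """Embed texts in fixed-size batches and return one vector per text, in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(embedder.embed_documents(batch))
    return vectors


class FakeEmbedder:
    """Stand-in for a real embeddings client; counts how many calls were made."""

    def __init__(self):
        self.calls = 0

    def embed_documents(self, texts):
        self.calls += 1
        # Toy "embedding": the text length as a one-element vector.
        return [[float(len(t))] for t in texts]


fake = FakeEmbedder()
result = embed_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake, batch_size=2)
print(result)
print("calls made:", fake.calls)
```

Passing the `embeddings_model` from embed_items.py in place of `FakeEmbedder` would send one request per batch instead of one per description. A suitable batch size depends on the provider's limits, which this sketch does not know.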
Troubleshooting tips
If something doesn’t work as expected, here are some common issues and how to fix them:
- Items or interaction data are missing: Ensure dataset.py has run successfully and that items and interactions are imported where needed.
- API key issues: Confirm your RELAXAI_API_KEY is set in your environment. You can print os.getenv("RELAXAI_API_KEY") to double-check that it is loading properly.
- Embedding errors: If relaxAI returns an error, verify that the endpoint and model name in OpenAIEmbeddings are correct.
- Pickle file not found: Make sure embed_items.py ran and successfully created items_embeddings.pkl. Check the working directory if the file isn’t found.
- Shape mismatch or key errors during training: This often happens if item_ids in interactions don’t align with those in the items dataframe. Use .set_index("item_id") carefully and verify alignment.
- Low accuracy or precision scores: That’s expected in small synthetic datasets. For real use cases, try tuning the model or exploring alternatives like XGBoost or LightFM.
Print intermediate outputs or use a debugger if needed; each step is modular and can be tested in isolation.
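For the API key check in particular, a small guard that fails early with a clear message can save a confusing error later in the pipeline. The following is a minimal sketch; `require_env` and `mask` are hypothetical helper names, and the demo uses a placeholder variable rather than a real key.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set or is empty")
    return value


def mask(secret, visible=4):
    """Show only the last few characters so the key is not printed in full."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo with a placeholder variable so the sketch runs anywhere.
os.environ["DEMO_API_KEY"] = "sk-demo-1234"
print(mask(require_env("DEMO_API_KEY")))
```

In embed_items.py, calling `require_env("RELAXAI_API_KEY")` in place of `os.getenv("RELAXAI_API_KEY")` would stop the script before any request is made if the key is missing. Masking avoids leaving the full key in terminal history or logs.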
Summary
Throughout this tutorial, you’ve learned how to build a data-driven recommendation system using relaxAI. You now have a template to deploy personalized suggestions that enhance user engagement, whether for products, articles, or any content type.

Mostafa Ibrahim is a software engineer and technical writer specializing in developer-focused content for SaaS and AI platforms. He currently works as a Software Engineer at GoCardless, contributing to production systems and scalable payment infrastructure.
Alongside his engineering work, Mostafa has written more than 200 technical articles reaching over 500,000 readers. His content covers topics including Kubernetes deployments, AI infrastructure, authentication systems, and retrieval-augmented generation (RAG) architectures.