How to Develop a Recommendation Engine to Predict Future Customer Purchases


How often do you challenge yourself, push your boundaries, and leave your comfort zone? At Aarki, we believe striving for professional excellence is the key to success and growth, and we’re extremely excited by our Data Scientist Anton Protopopov’s win at the Retail Hero competition.

Retail Hero is a competition for Data Science and Machine Learning enthusiasts who compete for the champion title. Here’s the challenge and Anton’s winning solution:

The Challenge

The challenge was to build a complete recommendation system. Competitors had to develop a service that could respond to requests with predictions of future customer purchases while sustaining a high request load. Based on information about the customer and their purchase history, the service had to return a ranked list of products that the customer would be likely to purchase next.

The Solution

The common way to build a recommendation engine is to use a two-stage architecture:

  1. Candidate generation: light models, such as collaborative filtering or factorization machines, that can quickly select the most relevant candidates
  2. Ranking: heavy models, such as gradient boosting or neural networks, that use the available information about the user, the item, and their interactions


The two-stage approach allowed us to make recommendations from a large corpus of items while still ensuring that the small number of items shown to the user were personalized and engaging. This design also made it possible to blend in candidates generated by other sources.

In this particular competition, there were 43,000 items and 400,000 users. None of the test users appeared in the training set, so predictions had to be calculated on the fly. At the first level, item2item models based on cosine similarity worked best. The user's purchase history and popular items were added as further candidate sources to improve coverage.
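
As an illustration of the first level, here is a minimal sketch of item2item cosine similarity over binary purchase data, with the user's history and popular items appended as fallback candidates. It is not the code from the winning solution: the function names `build_item_similarity` and `recommend`, the toy data, and the blending order are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def build_item_similarity(baskets):
    """Cosine similarity between items over binary user-item purchase data.

    baskets: dict mapping a user id to the set of items that user bought.
    Returns {item: {other_item: similarity}}.
    """
    item_users = Counter()             # number of users who bought each item
    pair_users = defaultdict(Counter)  # number of users who bought both items
    for items in baskets.values():
        for i in items:
            item_users[i] += 1
            for j in items:
                if i != j:
                    pair_users[i][j] += 1
    return {
        i: {j: c / math.sqrt(item_users[i] * item_users[j]) for j, c in row.items()}
        for i, row in pair_users.items()
    }


def recommend(history, similarity, popular, k=5):
    """Rank candidates for an unseen user using only their purchase history."""
    scores = Counter()
    for i in history:
        for j, s in similarity.get(i, {}).items():
            scores[j] += s
    # Highest score first; ties broken alphabetically to keep output stable.
    ranked = [item for item, _ in sorted(scores.items(), key=lambda x: (-x[1], x[0]))]
    # Assumed blending order: item2item candidates, then history, then popular items.
    for item in list(history) + list(popular):
        if item not in ranked:
            ranked.append(item)
    return ranked[:k]


# Toy training data (hypothetical).
train = {
    "u1": {"milk", "bread", "butter"},
    "u2": {"milk", "bread"},
    "u3": {"bread", "butter"},
    "u4": {"milk", "eggs"},
}
similarity = build_item_similarity(train)
popular = ["bread", "milk", "butter", "eggs"]
print(recommend(["milk"], similarity, popular, k=3))
```

Because the similarity table depends only on the training transactions, it can be precomputed once, and a new user can be scored at request time from their history alone, which suits the setting where test users are unseen.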

Because of performance constraints and the absence of a GPU, gradient boosting was chosen for the second level.

In addition to the handcrafted features and the item2item models from the first level, the second level used UMAP embeddings built from transaction activity (similar to item2item) and co-occurrence scores. The co-occurrence scores were calculated over all users' transactions and then aggregated for each item with functions such as sum, mean, and max.

Co-occurrence scores calculation:

$n_{i,j}$ - number of transactions containing items $i$ and $j$ together  
$n_i$ - number of transactions containing item $i$  
$w_i$ - normalized weight of item $i$ in the user's transactions  

Transaction $t$ consists of items $p_1$, $p_2$, ... $p_n$  

For each item $p$, the following scores are calculated: $n_{p,p_1}$, ... $n_{p,p_n}$  
The normalized values $\frac{n_{p,p_k}}{n_{p_k}}$, $k = 1, \dots, n$, are used as features, together with aggregations over them: min, max, sum  
Furthermore, those scores were weighted by $w_{p_k}$: $w_{p_k} \frac{n_{p,p_k}}{n_{p_k}}$
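
The calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the competition code: the helper names `count_cooccurrences` and `cooccurrence_features` are hypothetical, $w_i$ is assumed to be the share of the user's purchases that are item $i$, and the candidate item itself is skipped if it appears in the user's history.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(transactions):
    """Count n_i and n_{i,j} over all transactions (each a list of items)."""
    n_item = Counter()
    n_pair = Counter()
    for t in transactions:
        items = sorted(set(t))  # count each item once per transaction
        n_item.update(items)
        for i, j in combinations(items, 2):
            n_pair[(i, j)] += 1
            n_pair[(j, i)] += 1
    return n_item, n_pair


def cooccurrence_features(candidate, user_items, n_item, n_pair):
    """Aggregate n_{p,p_k} / n_{p_k} over the items p_k in a user's history."""
    counts = Counter(user_items)
    total = sum(counts.values())
    scores, weighted = [], []
    for item, c in counts.items():
        # Assumption: skip the candidate itself and items never seen in training.
        if item == candidate or n_item[item] == 0:
            continue
        score = n_pair[(candidate, item)] / n_item[item]
        scores.append(score)
        # Assumption: w_i is the item's share of the user's purchases.
        weighted.append(c / total * score)
    if not scores:
        return {"min": 0.0, "max": 0.0, "sum": 0.0, "weighted_sum": 0.0}
    return {
        "min": min(scores),
        "max": max(scores),
        "sum": sum(scores),
        "weighted_sum": sum(weighted),
    }


# Toy data (hypothetical).
transactions = [
    ["milk", "bread"],
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
n_item, n_pair = count_cooccurrences(transactions)
features = cooccurrence_features("butter", ["milk", "milk", "bread"], n_item, n_pair)
print(features)
```

Each candidate item gets one such feature vector per user, which can then be passed to the ranking model alongside the other features.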

LightGBM was chosen because it outperformed XGBoost and CatBoost in the offline tests. The newer XE_NDCG_MART objective didn't offer any improvement over LambdaMART, so the latter was used in the final submission.


Take a close look at the code in the GitHub repository.  

At Aarki, we are constantly doing recommendation engine research for lookalike modeling to discover new users who have similar traits and behaviors as the advertisers’ current high lifetime value (LTV) users. Find out more on the process and strategy in our article.

Challenge yourself, develop your skills, and take steps towards more relevant and non-disruptive advertising. Contact us now to learn more about how programmatic and AI can deliver results. 

