Predicting Ad Click-Through Rate: Strategies For Incorporating State Dependence and Random Effects


An ad can be deemed successful if it piques audience interest enough to incite them to interact with the call-to-action. To measure how well the ad does in capturing interest, ad click-through rate (CTR) is typically used. The higher the CTR, the more successful the ad is in generating interest amongst the target audience. In addition, predicting the CTR can be helpful in setting campaign goals. The more accurate the prediction is, the better it can help advertisers set realistic expectations. This prediction can also be used to make better media buying decisions. Thus, the ability to accurately predict ad CTR is essential in mobile app advertising.

Benefits of Accurate Click Prediction

Accurate prediction matters most in real-time bidding (RTB), where the bid amount is derived from the predicted probability of a click, so ranking ads ordinally is not enough. The predicted click probability determines both which ad impressions to bid on and how much to bid. Prediction accuracy therefore affects not only where the ad is placed but also how the campaign performs.
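To make the point about ordinal ranking concrete, here is a minimal sketch. The bid rule (expected value per thousand impressions, less a margin), the function name `bid_cpm`, and all numbers are illustrative assumptions, not the bidding logic described in this article.

```python
def bid_cpm(p_click: float, value_per_click: float, margin: float = 0.0) -> float:
    """Hypothetical bid rule: expected value of 1,000 impressions, less a margin."""
    if not 0.0 <= p_click <= 1.0:
        raise ValueError("p_click must be a probability")
    return 1000.0 * p_click * value_per_click * (1.0 - margin)

# Two made-up models that rank the same pair of impressions identically...
model_a = {"impression_1": 0.10, "impression_2": 0.05}
model_b = {"impression_1": 0.20, "impression_2": 0.10}

# ...still produce bids that differ by a factor of two.
bids_a = {k: bid_cpm(p, value_per_click=0.50) for k, p in model_a.items()}
bids_b = {k: bid_cpm(p, value_per_click=0.50) for k, p in model_b.items()}
print(bids_a)
print(bids_b)
```

Both models agree on which impression is better, yet one of them would systematically overpay or underpay. That is why the probability itself, not just the ordering, has to be accurate.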

There are several methods that can be used to predict click probability.

Logistic Regression

Logistic regression, along with its variants, is commonly used to analyze the performance of ad campaigns. It is a standard choice for predicting the probability of a binary (yes/no) outcome from a set of independent variables \( X \).

This model assumes that, for a vector of coefficients \( \beta \), the dependent variable \( y \) takes the value 1 with probability \( \sigma(\beta^T X) \), where \( \sigma(z) = 1 / (1 + e^{-z}) \) is the logistic function:

\( p(y = 1 \mid \beta) = \sigma(\beta^T X) \)

Represented differently, \( y \) can be specified as a Bernoulli random variable with distribution

\( p(y \mid \beta) = \text{Bernoulli}(\sigma(\beta^T X)) \)
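As a small illustration of the formula above, the sketch below evaluates \( \sigma(\beta^T X) \) for a single impression. The feature names and coefficient values are hypothetical and chosen only to show the mechanics.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def click_probability(beta, x):
    """p(y = 1 | beta) = sigma(beta^T x) for one impression."""
    return sigmoid(sum(b * xi for b, xi in zip(beta, x)))

# Hypothetical coefficients: intercept, "is_rewarded_video", "is_evening".
beta = [-2.3, 0.8, 0.4]
x = [1.0, 1.0, 0.0]  # intercept term, rewarded video, not evening

p = click_probability(beta, x)
print(f"predicted click probability: {p:.4f}")
```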

Naive Maximum Likelihood Estimation

In the simplest case, \( \beta \) is chosen to maximize \( p(y \mid \beta) \), the probability of observing the training data; this is the maximum likelihood estimate (MLE). This approach treats each observation as an independent event and the model parameters as fixed constants.
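The following is a minimal sketch of maximum likelihood estimation by plain gradient ascent on the mean Bernoulli log-likelihood. The optimizer, learning rate, and toy data are assumptions made for illustration; a production system would more likely use a library solver.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_mle(X, y, lr=0.5, steps=2000):
    """Gradient ascent on the mean log-likelihood of a logistic model."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            # Gradient of the log-likelihood is sum of (y - p) * x.
            err = yi - sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(d):
                grad[j] += err * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Toy data (made up): intercept-only model, 1 click in 10 impressions.
X = [[1.0]] * 10
y = [1] + [0] * 9

beta_hat = fit_mle(X, y)
ctr_hat = sigmoid(beta_hat[0])
print(f"estimated CTR: {ctr_hat:.4f}")
```

With only an intercept, the MLE simply recovers the observed click rate of the training data. Nothing in the objective expresses prior belief about \( \beta \).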

Anything we know about the model parameters a priori is ignored. While it is tempting to assume that we know nothing and let the model do the work, feeding prior knowledge into the model helps to minimize the generalization error.

One consequence of naive MLE is a tendency to overfit, i.e., to exaggerate relatively small fluctuations in the observed data. Techniques such as regularization can mitigate this, but they can also produce miscalibrated probabilities, which then require additional adjustment.

It is also important to remember that the probability of success is a random variable and is influenced by various exogenous factors not included in the model. For example, the basic model excludes randomness in usage patterns that may have occurred over time. MLE only gives an average point estimate of this random variable. While this estimate is a useful measure of the central tendency, we cannot be certain that it is representative of the entire distribution for prediction purposes.

Maximum A Posteriori Estimation

A somewhat more sophisticated approach is to choose the coefficient vector \( \beta \) that maximizes the posterior probability \( p(\beta \mid y) \); this is the maximum a posteriori (MAP) estimate. The posterior is proportional to \( p(y \mid \beta) p(\beta) \).
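Taking logarithms of this proportionality shows how the prior enters as a penalty on the likelihood:

\( \hat{\beta}_{\text{MAP}} = \arg\max_{\beta} \left[ \log p(y \mid \beta) + \log p(\beta) \right] \)

As one illustrative assumption (the prior family is a modeling choice, and \( \tau \) is a symbol introduced here), a zero-mean Gaussian prior with variance \( \tau^2 \) on each coefficient gives

\( \log p(\beta) = -\displaystyle\frac{1}{2\tau^2} \lVert \beta \rVert^2 + \text{const} \)

so the MAP estimate coincides with an L2-penalized maximum likelihood estimate, with a smaller \( \tau \) shrinking the coefficients more strongly toward zero.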

This approach does incorporate our prior knowledge about the model parameters \( p(\beta) \) and mitigates the overfitting issue. The result, however, is still a point estimate of the probability, albeit a more informed one that includes state dependence.
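The sketch below shows the effect of a prior on a tiny, extreme sample. It assumes a zero-mean Gaussian prior (one possible shrinkage prior, chosen here for illustration) and uses made-up data in which every impression was clicked, a case where the unpenalized likelihood has no finite maximizer.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_map(X, y, prior_var=1.0, lr=0.1, steps=2000):
    """Gradient ascent on log p(y | beta) + log p(beta), Gaussian prior assumed."""
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(steps):
        # Gradient of the Gaussian log-prior pulls each coefficient toward zero.
        grad = [-b / prior_var for b in beta]
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(d):
                grad[j] += err * xi[j]
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

# Toy data (made up): 3 impressions, all clicked, intercept-only model.
X = [[1.0]] * 3
y = [1, 1, 1]

beta_map = fit_map(X, y)
ctr_map = sigmoid(beta_map[0])
print(f"MAP coefficient: {beta_map[0]:.3f}, implied CTR: {ctr_map:.3f}")
```

On this sample the likelihood alone keeps increasing as the coefficient grows, pushing the predicted CTR toward 1. The prior term keeps the estimate finite, which is the regularizing effect described above.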

Bayesian Inversion Estimation

Ideally, we would like to get a picture of the full posterior distribution, one that is aware of both state dependence and random effects. This posterior distribution is given by Bayes’ theorem

\( p(\beta \mid y) = \displaystyle\frac {p(y \mid \beta) p(\beta)}{p(y)} \)

However, in all but the simplest of cases, this posterior cannot be computed in closed form and must be approximated numerically, for example with Markov chain Monte Carlo (MCMC) sampling. Estimating the model this way captures both prior knowledge and random changes to the system in a robust manner. As a result, the team can make campaign decisions that are statistically sound and more trackable.
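As a rough sketch of what MCMC estimation looks like, the code below runs a random-walk Metropolis sampler on an intercept-only logistic model. The sampler choice, the Gaussian prior, the tuning constants, and the data (2 clicks in 20 impressions) are all assumptions for illustration, not the setup used in the experiment that follows.

```python
import math
import random

def log_sigmoid(z: float) -> float:
    """log(sigma(z)), computed stably for large |z|."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def log_posterior(b, clicks, impressions, prior_sd):
    """Unnormalized log p(beta | y): Bernoulli log-likelihood plus Gaussian log-prior."""
    log_lik = clicks * log_sigmoid(b) + (impressions - clicks) * log_sigmoid(-b)
    log_prior = -0.5 * (b / prior_sd) ** 2
    return log_lik + log_prior

def metropolis(clicks, impressions, prior_sd=2.0, steps=20000,
               burn_in=2000, step_sd=0.5, seed=0):
    """Random-walk Metropolis; returns posterior draws of the CTR."""
    rng = random.Random(seed)
    b = 0.0
    lp = log_posterior(b, clicks, impressions, prior_sd)
    draws = []
    for i in range(steps):
        proposal = b + rng.gauss(0.0, step_sd)
        lp_new = log_posterior(proposal, clicks, impressions, prior_sd)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            b, lp = proposal, lp_new
        if i >= burn_in:
            draws.append(math.exp(log_sigmoid(b)))  # store CTR, not beta
    return draws

draws = sorted(metropolis(clicks=2, impressions=20))
mean_ctr = sum(draws) / len(draws)
lo = draws[int(0.05 * len(draws))]
hi = draws[int(0.95 * len(draws))]
print(f"posterior mean CTR: {mean_ctr:.3f}, 90% interval: ({lo:.3f}, {hi:.3f})")
```

Unlike MLE or MAP, the output is a set of draws rather than a single number, so the spread of plausible CTR values can be read off directly along with the mean.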

Experiment

To illustrate the differences between the three approaches outlined above, we trained and tested three model specifications on a set of 2.5 million historical impressions spanning 23 campaigns and over 10,000 publishers.

Each model was trained on a random 75% subsample of the dataset and then tested on the remaining 25%. We evaluated each model using the average per-impression log-loss and an R-squared statistic computed from the average true and predicted CTR of each campaign. The baseline case represents a “market share” model that predicts the average CTR from the training set for every impression in the test set.
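The two evaluation metrics can be sketched as follows. The exact definitions used in the experiment are not spelled out here, so this version assumes an unweighted R-squared across campaigns and clips probabilities before taking logs. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict

def log_loss(y_true, p_pred, eps=1e-15):
    """Average per-impression negative Bernoulli log-likelihood."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def campaign_r_squared(campaign_ids, y_true, p_pred):
    """R-squared between per-campaign actual and predicted CTR (unweighted)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # clicks, predicted, impressions
    for c, y, p in zip(campaign_ids, y_true, p_pred):
        s = sums[c]
        s[0] += y
        s[1] += p
        s[2] += 1
    actual = [s[0] / s[2] for s in sums.values()]
    predicted = [s[1] / s[2] for s in sums.values()]
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Toy test set (made up): two campaigns, two impressions each.
campaigns = ["a", "a", "b", "b"]
clicks = [1, 0, 0, 0]
preds = [0.6, 0.4, 0.1, 0.1]

ll = log_loss(clicks, preds)
r2 = campaign_r_squared(campaigns, clicks, preds)
print(f"log-loss: {ll:.4f}, campaign R-squared: {r2:.2f}")
```

Under this definition, a model that predicts the same CTR for every campaign can never score above zero, since its residual sum of squares is at least the total sum of squares.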

Results

The results of the analysis are given below.

Model               Avg. Predicted CTR  Avg. Error  R-squared  Log-loss
Baseline            0.099200             5.01       -0.02      0.312732
Naive MLE           0.098258             4.01        0.53      0.298476
MAP                 0.095153             0.71        0.67      0.221826
Bayesian Inversion  0.094219            -0.26        0.81      0.211702
Actual              0.094467             0.00        1.00      0.000000

Summary

Of the three estimation approaches, naive MLE produces the least accurate predictions, although it still improves on the baseline. Model fit metrics improve significantly when we include state dependence in model estimation. In particular, the MAP estimate, which incorporates a shrinkage prior on the regression coefficients as a form of regularization, generalizes to the test set significantly better than MLE.

The most accurate estimate, however, is the Bayesian Inversion model. The high accuracy of this estimate can be credited to the fact that the model is aware of both state dependence and random effects.

This analysis indicates that the choice of model specification and estimation methodology can have a significant impact on the accuracy and robustness of the prediction. This, in turn, affects campaign performance and how quickly optimal results are achieved. At Aarki, our data scientists are constantly experimenting to develop sophisticated machine learning algorithms that deliver the greatest ROI to our clients. Contact us at contact@aarki.com to learn how machine learning can help you today!

Igor Raush
Software Engineer



Topics: Machine Learning