Audience preferences are constantly changing. The more users are swamped with digital content, the more their expectations increase when choosing brands to engage with. Mobile marketers are constantly challenged to fully understand, test, and optimize creative ads to ensure that targeted users see the best possible variant. Champion/challenger testing is a proven method that allows marketers to target users with ads that align with their interests.

A basic champion/challenger strategy for optimizing and iterating on creative variants is as follows:

- Allocate a small percentage of traffic (1-2%) to a new *challenger* creative, leaving the rest to the *champion* creative.
- Collect *N* impressions for the challenger creative, where the budget *N* is determined ahead of time.
- After the test budget is exhausted, observe the KPIs for the champion and challenger creatives. If the challenger out-performs the champion, the challenger replaces the champion.

This approach has a predictable impact on top-line performance, and delivers fairly interpretable results. However, it also suffers from several disadvantages:

- It may take a long time to spend the test budget, depending on overall campaign volume.
- The challenger may out-perform the champion during the test phase, but under-perform in the future. For example, this may be true for seasonal creatives. For this reason, we must maintain some degree of exploration for the lifetime of the campaign.
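The basic champion/challenger procedure above can be sketched as a small simulation. The CTR values, traffic split, and function name here are illustrative assumptions, not a production implementation:

```python
import random

def champion_challenger_test(champion_ctr, challenger_ctr, n_budget=10_000,
                             challenger_share=0.02, seed=0):
    """Simulate a champion/challenger test with hidden CTRs (illustrative)."""
    rng = random.Random(seed)
    clicks = {"champion": 0, "challenger": 0}
    impressions = {"champion": 0, "challenger": 0}

    # Serve traffic until the challenger's test budget of n_budget impressions is spent.
    while impressions["challenger"] < n_budget:
        arm = "challenger" if rng.random() < challenger_share else "champion"
        ctr = challenger_ctr if arm == "challenger" else champion_ctr
        impressions[arm] += 1
        clicks[arm] += 1 if rng.random() < ctr else 0

    # Compare observed KPIs; promote the challenger if it wins.
    champ_kpi = clicks["champion"] / impressions["champion"]
    chall_kpi = clicks["challenger"] / impressions["challenger"]
    return "challenger" if chall_kpi > champ_kpi else "champion"
```

Note how slowly the budget is spent: at a 2% traffic share, collecting 10,000 challenger impressions requires roughly 500,000 total impressions, which illustrates the first disadvantage above.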

### Multi-armed bandits

It is useful to frame creative A/B testing as a **multi-armed bandit** problem, a classic reinforcement learning problem which exemplifies the explore/exploit tradeoff.

The problem formulation is as follows:

- We are given *K* **arms** (i.e., creative variants), each providing a random **reward** (i.e., some advertiser KPI) from some hidden probability distribution. For simplicity, we will consider the **Bernoulli bandit**, which issues a reward of 1 with probability *θ* and a reward of 0 with probability (1 - *θ*), where *θ* may be context- and time-dependent. Here, the binary reward could indicate a click or conversion.
- We must choose a **strategy** to learn the hidden reward distributions while maximizing total reward under a budget constraint. With unlimited budget, we could randomly choose a creative until we have a sufficiently representative sample of the reward distributions, and then just exploit these distributions.
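A minimal sketch of a Bernoulli bandit environment, with hidden per-arm reward probabilities chosen purely for illustration:

```python
import random

class BernoulliBandit:
    """A K-armed Bernoulli bandit: arm k pays reward 1 with probability thetas[k]."""

    def __init__(self, thetas, seed=None):
        self.thetas = thetas          # hidden reward probabilities, one per arm
        self.rng = random.Random(seed)

    @property
    def k(self):
        return len(self.thetas)

    def pull(self, arm):
        """Return 1 with probability thetas[arm], else 0 (a click, say)."""
        return 1 if self.rng.random() < self.thetas[arm] else 0

# Example: three creative variants with hidden CTRs of 2%, 5%, and 10%.
bandit = BernoulliBandit([0.02, 0.05, 0.10], seed=42)
rewards = [bandit.pull(2) for _ in range(10_000)]
# The empirical mean of `rewards` approaches the hidden theta for arm 2.
```

The strategies below only ever see the 0/1 rewards returned by `pull`, never the hidden `thetas`.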

### ε-greedy strategy

An easy and interpretable strategy is to explore creatives randomly during some burn-in period, and for some fraction ε of trials afterwards. This strategy is similar to the champion/challenger approach, but allocates some budget to exploration for the duration of the campaign.

Increasing ε makes the system respond more quickly to changes in the reward distributions; however, under constant distributional assumptions, a higher ε leads to a lower total reward, as the optimal arm is chosen less frequently.
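The ε-greedy strategy can be sketched as follows; the burn-in length, ε, and reward probabilities are illustrative assumptions:

```python
import random

def epsilon_greedy(thetas, n_steps=50_000, epsilon=0.1, burn_in=1_000, seed=0):
    """ε-greedy on a Bernoulli bandit with hidden reward probabilities `thetas`.

    Explores uniformly at random during the burn-in period and with
    probability ε afterwards; otherwise exploits the best empirical arm.
    """
    rng = random.Random(seed)
    k = len(thetas)
    pulls = [0] * k
    wins = [0] * k

    for step in range(n_steps):
        if step < burn_in or rng.random() < epsilon:
            arm = rng.randrange(k)  # explore
        else:
            # Exploit: pick the arm with the highest empirical reward rate.
            arm = max(range(k),
                      key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0)
        pulls[arm] += 1
        wins[arm] += 1 if rng.random() < thetas[arm] else 0

    return pulls, wins

pulls, wins = epsilon_greedy([0.02, 0.05, 0.10])
# With these settings, most traffic typically concentrates on the best arm.
```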

### Bayesian strategies

We can impose some Bayesian formalism on the problem and strategy. We have assumed that rewards are drawn from a Bernoulli distribution with parameter *θ*.

$$

p(y) = \text{Bernoulli}(y \mid \theta)

$$

Assuming a constant *θ*, we can place a beta prior on *θ*. This is a convenient choice, since the beta distribution is a conjugate prior of the Bernoulli distribution.

$$

p(\theta) = \text{Beta}(\theta \mid \alpha, \beta)

$$

In particular, after observing a reward $y \in \{0, 1\}$, the posterior distribution on *θ* is also beta, and its parameters can be updated according to a simple rule.

$$

p(\theta \mid y) = \text{Beta}(\theta \mid \alpha + y, \beta + 1 - y)

$$

It’s also convenient that Beta(1, 1) is uniform over [0, 1], and so is a natural choice of a non-informative prior on *θ* before we see any data.
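The conjugate update above is one line of code. A worked sketch, starting from the non-informative Beta(1, 1) prior:

```python
def update_beta(alpha, beta, y):
    """Conjugate posterior update for a Bernoulli reward y in {0, 1}:
    Beta(α, β) → Beta(α + y, β + 1 − y)."""
    return alpha + y, beta + 1 - y

# Starting from Beta(1, 1) and observing 3 clicks and 7 non-clicks:
alpha, beta = 1, 1
for y in [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]:
    alpha, beta = update_beta(alpha, beta, y)

# Posterior is Beta(4, 8); its mean α / (α + β) = 1/3 shrinks the
# empirical rate 3/10 toward the prior mean 1/2.
assert (alpha, beta) == (4, 8)
```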

This formalism gives rise to two strategies that, at each step, act on the most current belief about *θ*.

**Upper confidence bound (UCB) strategy.** At each step, we estimate *θ* as a high quantile of the prior distribution. This strategy encourages exploring distributions with high uncertainty (i.e., wide distributions), and has a mathematically provable upper bound on total regret.

**Thompson sampling.** At each step, sample *θ* from the prior distribution. This strategy does not have the same theoretical guarantees as UCB, but it is very easy to implement, and it works well in practice.
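Thompson sampling can be sketched in a few lines (the reward probabilities and step count are illustrative; a UCB variant would replace the posterior sample with a high posterior quantile):

```python
import random

def thompson_sampling(thetas, n_steps=20_000, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors.

    At each step, sample a θ estimate from each arm's current beta
    posterior, play the arm with the largest sample, then apply the
    conjugate update to that arm's posterior.
    """
    rng = random.Random(seed)
    k = len(thetas)
    alphas = [1] * k  # Beta(1, 1) prior for every arm
    betas = [1] * k

    pulls = [0] * k
    for _ in range(n_steps):
        samples = [rng.betavariate(alphas[a], betas[a]) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < thetas[arm] else 0
        alphas[arm] += reward
        betas[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.02, 0.05, 0.10])
# As the best arm's posterior sharpens, it wins the sampled comparison
# more and more often, so traffic concentrates on it automatically.
```

Note that exploration falls out of the posterior width itself: wide (uncertain) posteriors occasionally produce large samples and get played, which is exactly the lifetime exploration the champion/challenger approach lacks.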

### A note on priors

In the Bayesian formulation, in the absence of data, we have chosen a uniform Beta(1, 1) prior over the reward probability, indicating our belief that all reward probabilities are equally likely.

In reality, of course, this is not the case. A Beta(1, 1) prior leads to very aggressive exploration when a new creative variant is introduced into the system. After some exploration, the distribution converges to a representative posterior fairly quickly. Nevertheless, we can imbue the prior with the domain knowledge that creative click-through rates generally vary from 1% to 20%, depending on the ad format and the creative quality. For example, choosing a Beta(1, 19) prior results in much less erratic CTR predictions for new creative variants.
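The effect of the informed prior can be seen by comparing the *θ* draws a Thompson sampler would make for a brand-new creative under each prior (a sketch; the sample sizes are arbitrary):

```python
import random
import statistics

rng = random.Random(0)

def draw_thetas(alpha, beta, n=10_000):
    """θ values a Thompson sampler would draw for a brand-new creative."""
    return [rng.betavariate(alpha, beta) for _ in range(n)]

uniform = draw_thetas(1, 1)    # non-informative prior: mean 0.5, very wide
informed = draw_thetas(1, 19)  # informed prior: mean 0.05, concentrated low

# The informed prior keeps early CTR predictions in a realistic range
# instead of scattering them across all of [0, 1].
assert statistics.pvariance(informed) < statistics.pvariance(uniform)
```

Beta(1, 19) has mean 1/20 = 5%, squarely inside the 1%-20% range that creative CTRs typically occupy, so a new variant starts from a plausible estimate rather than 50%.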

*If your ad creative is catchy, relevant, and based on your user’s experience, it will grab the user's attention and drive installs. Make sure that your app marketing budget is spent on the right creative. Keep testing!*