I recently went to a NY-style pizza shop near my house and saw that the owner keeps about five pizza pies in a showcase at the front of the shop. My first thought was that a showcase like this is only useful if the shop sells its pizzas quickly (high turnover), which did not look like the case here. My hypothesis is that the pie showcase is not beneficial to the shop, with the total amount of pizza sold as the success metric, since most of the times I have been to the shop there were stale pies in the showcase. I wanted to devise a way to prove (or disprove) my hypothesis.
I had a conversation with the pizza shop owner, and he said that this is the only shop he owns. With a single shop, a traditional A/B test is not an option: every customer who walks in sees the same storefront, so we cannot show the showcase to some customers and hide it from others without violating SUTVA (the stable unit treatment value assumption). Hence, we will use a switchback design. As mentioned above, I wanted a non-average success metric, so I chose the total dollar amount of pizza sold (we will call this the north-star metric). We will also define a few auxiliary metrics to track the progress of the experiment live, since a north-star metric can take a significant amount of time before we can infer anything statistically significant from it. In our case, these are the percentage of visitors who enter the shop and buy pizza, and the average dollar amount per customer. Finally, we will define a guardrail metric that tells us to stop the experiment if it starts hurting the owner's bottom line. In our case, it is the percentage of daily customers who are returning customers. If that percentage starts falling steeply, we will know it is time to stop the experiment before it does any permanent damage to the business.
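To make the metric definitions concrete, here is a minimal sketch of how one day of data could be turned into these numbers. The `Visit` record and the `daily_metrics` helper are hypothetical names I made up for illustration, and I am assuming that "per customer" means per paying customer and that the returning share is measured among buyers; other definitions are just as defensible.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One person entering the shop (hypothetical record layout)."""
    amount: float       # dollars spent; 0.0 if the visitor left without buying
    is_returning: bool  # True if this person has bought here before


def daily_metrics(visits):
    """Compute the north-star, auxiliary, and guardrail metrics for one day."""
    buyers = [v for v in visits if v.amount > 0]
    total = sum(v.amount for v in buyers)
    return {
        # North-star metric: total dollar amount of pizza sold.
        "total_revenue": total,
        # Auxiliary: share of visitors who bought something.
        "conversion_rate": len(buyers) / len(visits) if visits else 0.0,
        # Auxiliary: average dollars per paying customer (assumed definition).
        "avg_spend_per_buyer": total / len(buyers) if buyers else 0.0,
        # Guardrail: share of buyers who are returning customers.
        "returning_share": (
            sum(v.is_returning for v in buyers) / len(buyers) if buyers else 0.0
        ),
    }


if __name__ == "__main__":
    day = [Visit(10.0, True), Visit(0.0, False), Visit(20.0, False)]
    print(daily_metrics(day))
```

In practice the guardrail would be watched as a trend over several days rather than as a single day's value, since one slow day says very little on its own.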
Now that we have our non-average north-star metric in the form of the total dollar amount of pizza sold, we can define the experimentation process. We will run the experiment by randomly switching between showcase (control) and no showcase (treatment). Ideally, each control or treatment period would be shorter than a day, so that we switch multiple times a day and reduce the variance of our estimators, but due to practical constraints we will switch at the start of each day, at random. This makes one day our time period. It also means the experiment must run for more days before we can get a statistically significant result. We have to let the experiment run its full course so that we average out the effect of seasonal factors and of any one-off events that happen while it is running. Another benefit of switching back and forth on the same experimental unit is that it should compensate for the effects of interference (spillovers).
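The process above can be sketched in a few lines. This is only an illustration under my own assumptions: I use a balanced random schedule (half the days in each arm) and a simple permutation test on daily revenue, which are choices I am making here rather than the only valid way to analyse a switchback experiment. The function names are hypothetical, and the test ignores carryover between consecutive days.

```python
import random
from statistics import mean


def assign_days(n_days, seed=0):
    """Randomly assign each day to 'showcase' (control) or 'no_showcase'
    (treatment), keeping the two arms balanced (an assumption of this sketch)."""
    arms = ["showcase"] * (n_days // 2) + ["no_showcase"] * (n_days - n_days // 2)
    random.Random(seed).shuffle(arms)
    return arms


def diff_in_means(revenue, arms):
    """Mean daily revenue without the showcase minus mean daily revenue with it."""
    treat = [r for r, a in zip(revenue, arms) if a == "no_showcase"]
    ctrl = [r for r, a in zip(revenue, arms) if a == "showcase"]
    return mean(treat) - mean(ctrl)


def permutation_p_value(revenue, arms, n_perm=5000, seed=1):
    """Two-sided permutation test: reshuffle the day labels and count how
    often the shuffled difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = diff_in_means(revenue, arms)
    shuffled = list(arms)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(diff_in_means(revenue, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    schedule = assign_days(14)
    print(schedule)  # the random day-by-day plan to hand to the owner
```

Once the daily revenue figures are collected, calling `permutation_p_value(revenue, schedule)` gives the estimated effect of removing the showcase and a p-value for it. With only one switch per day, the number of days is the sample size, which is why the experiment has to run for a while.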
I believe the experimental design above remedies most of the issues that would have affected the validity of a standard A/B test in this setting, and that it will help me prove or disprove my hypothesis.