The basic form of a model in an associational analysis will look
something like this.
You'll have an outcome, y, and you'll have a key predictor, x.
And then you have maybe some potential confounder that we'll call z.
And then you have some independent random error that we call epsilon.
In addition to those factors,
we have a number of parameters that we wanna estimate.
So alpha here is the intercept.
It's the value of y when x and z are zero.
We have beta, which is the change in y associated with a one-unit increase in x,
adjusted for z.
And then gamma is the change in y associated with a one-unit increase in z,
adjusting for x.
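The slide itself is not reproduced in this transcript, so the following is a reconstruction from the spoken definitions of y, x, z, epsilon, alpha, beta and gamma, and it assumes the terms combine additively:

```latex
y = \alpha + \beta x + \gamma z + \varepsilon
```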
So, this is a linear model, and
the parameter that we're trying to estimate here is beta.
That's what tells us how our outcome changes along with our key predictor.
Now, there are other parameters, gamma and alpha, that are in the model and
we need to have them in the model for the model to work.
But we're actually not interested in those parameters.
So we sometimes call them nuisance parameters,
because we have to estimate them, but we don't actually care about their value.
So this is what I might consider to be a primary model.
It's very simple, there's a key predictor and there's only one confounder.
And so you may need to consider other things later on, but
this is sometimes good as a primary model.
On the other hand,
sometimes we'll use a primary model that doesn't have any confounders,
and then slowly add things to the model to see how our results change.
So the example I'm going to use here is going to be an advertising campaign for
a new product.
Imagine you're selling something and you're thinking of buying ads on
Facebook, and you wanna know how effective those ads are gonna be.
So ultimately what you wanna do is sell more products and
make more money from this.
And so one thing you might do is try to pilot a one week experiment where you buy
Facebook ads for a week and see how it does.
So this is a very simple design, you might say look at the one week before
the ad campaign, the one week during the ad campaign.
And then the one week after the ad campaign just to see
how the sales numbers change while you're running the ads.
Then you could compare the total sales for the week during,
the week before and the week after,
and see if there is any reasonable increase in the total sales.
So using this type of design and
this kind of experiment what would you expect to see?
So here's a data set that's not real, but it kind of represents the ideal scenario
for what you might see in an experiment like this.
So in the first seven days you have an average of about $200 per day.
Then the next 7 days,
this is during the campaign, you have an average of about $300.
And then after the campaign finishes you have an average of, again, about $200.
So it's possible to tell just from this graph, without doing anything fancy,
that the ad campaign seems to add about $100 per day to the total daily sales.
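As a rough sketch of that eyeball comparison, the snippet below builds a made-up, noise-free version of the ideal data (exactly $200 per day outside the campaign and $300 per day during it) and takes the difference in average daily sales. The variable names and the exact values are assumptions for illustration, not the data from the slide.

```python
import numpy as np

# Hypothetical, noise-free daily sales in dollars (not real data):
# 7 days before the campaign, 7 days during, 7 days after.
daily_sales = np.array([200.0] * 7 + [300.0] * 7 + [200.0] * 7)
in_campaign = np.array([0] * 7 + [1] * 7 + [0] * 7)

mean_during = daily_sales[in_campaign == 1].mean()
mean_outside = daily_sales[in_campaign == 0].mean()

# Difference in average daily sales between campaign and non-campaign days.
campaign_lift = mean_during - mean_outside
print(f"during: {mean_during:.2f}, outside: {mean_outside:.2f}, lift: {campaign_lift:.2f}")
```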
So, in this case, your primary model might be very simple,
it might look something like this.
Where you have y as the outcome, which would be the daily sales.
And then x is just an indicator of whether a given day fell during
the ad campaign or not.
And then still your primary interest is in the coefficient beta,
which tells you how your total daily sales increase with the ad campaign in place.
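The slide is not included here, but from the description this simpler model presumably drops the confounder term. The day index t is added only for illustration:

```latex
y_t = \alpha + \beta x_t + \varepsilon_t,
\qquad
x_t =
\begin{cases}
1 & \text{if day } t \text{ falls during the ad campaign} \\
0 & \text{otherwise}
\end{cases}
```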
So for example, the data for
this might look something like this in this table here where you have 21 days.
And you have seven days without the campaign, seven days with the campaign,
and seven days without.
And you can see that the daily sales change as you go in and out of the campaign.
So it's a very basic setup, very simple, and
this is kinda what you would love to see in your data in an ideal world.
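To make the setup concrete, here is a minimal sketch that builds a 21-day table like the one described and fits the simple model by ordinary least squares. The sales values are idealized and hypothetical (exactly $200 outside the campaign and $300 during it), and using NumPy's `lstsq` is just one convenient way to do the fit, not necessarily the one used in the lecture.

```python
import numpy as np

# Hypothetical 21-day table (not real data): day number, campaign indicator, sales.
day = np.arange(1, 22)
x = np.where((day >= 8) & (day <= 14), 1.0, 0.0)  # 1 during the campaign week
y = 200.0 + 100.0 * x                             # idealized daily sales in dollars

# Design matrix with an intercept column and the campaign indicator.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares fit of y = alpha + beta * x.
coef = np.linalg.lstsq(X, y, rcond=None)[0]
alpha_hat, beta_hat = coef
print(f"alpha = {alpha_hat:.2f}, beta = {beta_hat:.2f}")
```

With only an indicator in the model, the estimate of beta is just the difference between the average sales on campaign days and on non-campaign days.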
Of course, in reality, you will never see data like this.
There will always be something more complicated.
So here is a picture of what more realistic
data might look like from an experiment like this.
Typically real world data are more noisy.
There are other trends in the background that are kind of messing up your
relationship.
So it makes it harder to analyze the data.
So you'll notice that in this picture there does appear
to be an increase in sales during the campaign period.
But the problem is that it seems like the increase actually started
before the campaign even began; the sales were already kind of going up.
And so you might wanna ask, well are there background trends that for
some reason increase sales over a three week period?
So it's possible that we would've seen higher sales in the product,
even without the ad campaign, just because of these background trends.
And so the question you really wanna answer is,
did the ads cause an increase in sales
over and above whatever background trends might have been going on
that you're not aware of?
So let's take our primary model,
which is just gonna be a simple model with the outcome and an indicator of the campaign.
If we use that model, what you'll see is that we'll estimate beta,
the increase in the daily sales due to the ad campaign, to be $44.75, okay?
Now let's suppose we add a background trend into our model.
So instead of the primary model where we just had the key predictor and
the outcome, we fit the following model,
which has a quadratic trend for time. So this allows for a little curvature,
and allows for kind of a rising and falling of the daily total sales.
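The exact parametrization on the slide is not shown in this transcript. One plausible way to write a model with a quadratic time trend, where t is the day number and the names gamma_1 and gamma_2 for the trend coefficients are assumptions, is:

```latex
y_t = \alpha + \beta x_t + \gamma_1 t + \gamma_2 t^2 + \varepsilon_t
```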
Okay, so if we use this model, and we still try to estimate beta,
what we get is that our estimate of beta is $39.86.
So that's somewhat less than the beta that we estimated for the primary model.
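The lecture's data are not available here, so the following is only a sketch on made-up, noise-free data of how adding a quadratic time trend can change the estimate of beta. The assumed campaign effect (`true_beta`), the shape of the background trend and the helper `fit_ols` are all hypothetical choices for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares coefficients for y = X @ coef."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data (not the data from the lecture): 21 days, campaign in week 2.
t = np.arange(21, dtype=float)
x = ((t >= 7) & (t < 14)).astype(float)
tc = t - t.mean()                  # centered time, for numerical stability

true_beta = 40.0                   # assumed campaign effect, dollars per day
trend = -0.5 * tc**2               # assumed background trend, peaks mid-period
y = 250.0 + true_beta * x + trend  # noise-free so the fits are easy to check

ones = np.ones_like(t)

# Primary model: y = alpha + beta * x
beta_primary = fit_ols(np.column_stack([ones, x]), y)[1]

# Adjusted model: y = alpha + beta * x + gamma1 * t + gamma2 * t^2
beta_adjusted = fit_ols(np.column_stack([ones, x, tc, tc**2]), y)[1]

print(f"primary: {beta_primary:.2f}, adjusted: {beta_adjusted:.2f}")
```

In this constructed example the background trend happens to peak during the campaign week, so the primary model attributes part of that trend to the ads and gives a larger beta than the adjusted model. That is the same direction as the drop from $44.75 to $39.86 described above, though the sizes here mean nothing beyond the assumptions built into the simulation.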