0:00

In this lesson, we will illustrate why certain estimates minimize certain loss functions.

You work at a car dealership. Your boss wants to know how many cars the dealership will sell per month. An analyst who has worked with past data from your company has provided you with a distribution that shows the probability of the number of cars the dealership will sell per month. In Bayesian lingo, this is called the posterior distribution. Here's a dot plot of that posterior. Also marked on the plot are the mean, median, and mode of the distribution. Your boss doesn't know any Bayesian statistics, though, so he wants you to report a single number for the number of cars the dealership will sell per month.

Suppose your single guess is 30. We'll call this g in the following calculations. If your loss function is L0, that is, the 0/1 loss, then you lose a point for each value in your posterior that differs from your guess and lose no points for values that exactly equal your guess.

Let's calculate what the total loss would be if your guess is 30. Here are the values in the posterior distribution, sorted in ascending order. The first value is 4, which is not equal to your guess of 30, so the loss for that value is 1. The second value is 19, also not equal to your guess of 30, and the loss for that value is also 1. The third value is 20, also not equal to your guess of 30, and the loss for this value is also 1. There's only one 30 in your posterior, and the loss for this value is 0 since it's equal to your guess. The remaining values in the posterior are all different from 30; hence, their losses are all 1 as well. To find the total loss, we simply sum these individual losses over the posterior distribution of 51 observations, where only one observation equals our guess and the remainder are different. Hence, the total loss is 50.
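
The tally above can be sketched in a few lines of Python. The sample below is a small hypothetical stand-in, since the lesson's actual 51 posterior draws are not reproduced here:

```python
# Hypothetical stand-in for the posterior draws (the lesson's actual
# 51 observations are not reproduced here).
posterior = [4, 19, 20, 30, 30, 35, 42]

def l0_loss(guess, draws):
    """Total 0/1 loss: one point for every draw that differs from the guess."""
    return sum(1 for d in draws if d != guess)

print(l0_loss(30, posterior))  # 5 of the 7 hypothetical draws differ from 30
```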

Here's a visualization of the posterior distribution along with the 0/1 loss calculated for a series of possible guesses within the range of the posterior distribution. To create this visualization of the loss function, we went through the process we described earlier for a guess of 30 for all guesses considered, and we recorded the total loss. We can see that the loss function has its lowest value when x, our guess, is equal to the most frequent observation in the posterior. Hence, L0 is minimized at the mode of the posterior, which means that the best point estimate under the 0/1 loss is the mode of the posterior.
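
We can check this numerically: sweeping the guess over the range of the sample and keeping the one with the smallest 0/1 loss recovers the mode. The data below is again hypothetical:

```python
from collections import Counter

posterior = [4, 19, 20, 30, 30, 30, 35]  # hypothetical draws; 30 is the mode

def l0_loss(guess, draws):
    # 0/1 loss: one point per draw that differs from the guess
    return sum(1 for d in draws if d != guess)

# Sweep every candidate guess in the range of the posterior sample
best = min(range(min(posterior), max(posterior) + 1),
           key=lambda g: l0_loss(g, posterior))
mode = Counter(posterior).most_common(1)[0][0]
print(best, mode)  # both are 30
```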

Â 2:44

Let's consider another loss function. If your loss function is L1, that is, linear loss, then the total loss for a guess is the sum of the absolute values of the differences between that guess and each value in the posterior. We can once again calculate the total loss under L1 if your guess is 30.

Here are the values in the posterior distribution, again sorted in ascending order. The first value is 4, and the absolute value of the difference between 4 and 30 is 26. The second value is 19, and the absolute value of the difference between 19 and 30 is 11. The third value is 20, and the absolute value of the difference between 20 and 30 is 10. There's only one 30 in your posterior, and the loss for this value is 0 since it's equal to your guess. The remaining values in the posterior are all different from 30; hence, their losses are different from 0. To find the total loss, we again simply sum these individual losses, and the total comes out to 346.
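
The same running tally under L1 can be sketched as follows. The four-draw sample is a hypothetical stand-in, so its total is 47 rather than the lesson's 346, but the first three terms match the ones worked above:

```python
posterior = [4, 19, 20, 30]  # hypothetical stand-in for the 51 draws

def l1_loss(guess, draws):
    """Total linear (L1) loss: sum of absolute differences from the guess."""
    return sum(abs(d - guess) for d in draws)

print(l1_loss(30, posterior))  # 26 + 11 + 10 + 0 = 47
```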

Here's again a visualization of the posterior distribution, along with the linear loss function calculated for a series of possible guesses within the range of the posterior distribution. To create this visualization of the loss function, we again went through the same process we described earlier for all of the guesses considered. This time, the function has its lowest value when x is equal to the median of the posterior. Hence, L1 is minimized at the median of the posterior.

Let's consider one other loss function.
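
First, though, a quick numerical check that the median is indeed the L1 minimizer, again on a hypothetical sample:

```python
import statistics

posterior = [4, 19, 20, 30, 35]  # hypothetical draws; the median is 20

def l1_loss(guess, draws):
    # Linear loss: sum of absolute differences from the guess
    return sum(abs(d - guess) for d in draws)

best = min(range(min(posterior), max(posterior) + 1),
           key=lambda g: l1_loss(g, posterior))
print(best, statistics.median(posterior))  # both are 20
```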

If your loss function is L2, that is, squared loss, then the total loss for a guess is the sum of the squared differences between that guess and each value in the posterior. We can once again calculate the total loss under L2 if your guess is 30.

We have the posterior distribution again, sorted in ascending order. The first value is 4, and the squared difference between 4 and 30 is 676. The second value is 19, and the squared difference between 19 and 30 is 121. The third value is 20, and the squared difference between 20 and 30 is 100. There's only one 30 in your posterior, and the loss for this value is 0 since it's equal to your guess. The remaining values in the posterior are again all different from 30; hence, their losses are all different from 0. To find the total loss, we simply sum these individual losses again, and the total loss comes out to 3,732.
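
The L2 tally follows the same pattern; on the hypothetical four-draw sample the total is 897 rather than the lesson's 3,732, but the individual terms match the ones worked above:

```python
posterior = [4, 19, 20, 30]  # hypothetical stand-in for the 51 draws

def l2_loss(guess, draws):
    """Total squared (L2) loss: sum of squared differences from the guess."""
    return sum((d - guess) ** 2 for d in draws)

print(l2_loss(30, posterior))  # 676 + 121 + 100 + 0 = 897
```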

We have the visualization of the posterior distribution again, this time along with the squared loss function calculated for a series of possible guesses within the range of the posterior distribution. Creating this visualization had the same steps: go through the same process described earlier for a guess of 30, for all guesses considered, and record the total loss. This time, the function has its lowest value when x is equal to the mean of the posterior. Hence, L2 is minimized at the mean of the posterior distribution.
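
As a quick check on a hypothetical sample, the squared loss evaluated at the sample mean is no larger than at nearby guesses:

```python
import statistics

posterior = [4, 19, 20, 30, 32]  # hypothetical draws; the mean is 21.0

def l2_loss(guess, draws):
    # Squared loss: sum of squared differences from the guess
    return sum((d - guess) ** 2 for d in draws)

mean = statistics.fmean(posterior)
# The squared loss at the mean is no larger than at neighboring guesses
print(l2_loss(mean, posterior) <= l2_loss(mean - 1, posterior))  # True
print(l2_loss(mean, posterior) <= l2_loss(mean + 1, posterior))  # True
```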

In summary, in this lesson we illustrated that the 0/1 loss, L0, is minimized at the mode of the posterior distribution; the linear loss, L1, is minimized at the median of the posterior distribution; and the squared loss, L2, is minimized at the mean of the posterior distribution.

Going back to the original question: the point estimate to report to your boss about the number of cars the dealership will sell per month depends on your loss function. In any case, you would choose to report the estimate that minimizes the loss.
