And a times b times c sub b.

Now let's deal with the first one.

a times c sub a.

a is the number of clusters that we're going to select.

c sub a is the cost per cluster.

That cost component for clusters primarily consists of travel,

if we do indeed have widespread units that we've got to go to,

and preparation costs for the sample.

Now the travel, by the way, includes not only the cost of transportation,

but also staff time while they're traveling.

And it can involve multiple visits to a cluster.

So, if we were sampling school children and sampling them in schools,

we're going to have to figure in multiple visits to the school for

such things as contacting the principal.

To see whether or not they're willing to allow us to come into the school and

do our data collection.

Possibly having to talk to the superintendent for

the school district to make a decision about the process.

Going back to the principal and identifying a list of the classrooms that are there,

and making a selection of classrooms.

Going to each of the classrooms and visiting with the teachers,

which might require a couple of visits.

Also going back and collecting data from the children which might involve,

because of illness and absences, and other kinds of things,

multiple visits to the children.

All that's buried in there.

And those costs per cluster can be considerably larger than the cost

per observation within a cluster.

That's what we're worried about.

That's a big component that can easily inflate our costs substantially.

But we've got to keep it within the constraint.

To the extent that we do more clusters,

we have less money to do things within clusters.

The second component, b times c sub b.

c sub b is the cost per observation within a cluster.

It's dominated by interviewing cost,

whether that happens to be asking questions or

providing a self administered form, or some other way of collecting the data.

And c sub b is multiplied by the number of observations,

b, that we're going to use in our selection.

And then that's multiplied across the clusters that we have.

All right, that's our cost model.
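
As a minimal sketch of the cost model just described, the total cost can be written as a times c sub a plus a times b times c sub b. The function name and the dollar figures below are hypothetical, chosen only for illustration.

```python
def total_cost(a: int, b: int, c_a: float, c_b: float) -> float:
    """Total cost under the two-component model.

    a   : number of clusters selected
    b   : number of observations per cluster
    c_a : cost per cluster (travel, repeat visits, sample preparation)
    c_b : cost per observation within a cluster (mostly interviewing)
    """
    cluster_part = a * c_a        # a times c sub a
    element_part = a * b * c_b    # a times b times c sub b
    return cluster_part + element_part


# Hypothetical figures: 50 schools at 400 each, 20 children per school at 10 each.
print(total_cost(a=50, b=20, c_a=400.0, c_b=10.0))
```

With the same budget, raising a leaves less money for b, which is the tradeoff described above.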

It's not actually the way costs are recorded.

We don't actually record these.

But it's a style,

an approach to cost modeling that fits in nicely with the variance model.

Because the sampling variance that we've got

has two components, a simple random sampling variance and a design effect.

Now you'll recognize in this red formula here, in square brackets

at the end of the right-hand side, our design effect.

1 + (b-1)roh.

So there our variance depends on b.

But the first term, which has (1 - f)p(1 - p)/(n - 1),

is written with ab to represent n, where n

is the number of clusters times the number of elements per cluster.

And so here now, we see our variance

model includes not only the subsample size but also the number of clusters.

We know that as a goes up that variance is going to go down.

If a goes down, we do fewer clusters, then that sampling variance could go up.

But it's more complicated than that,

because we also have b in that denominator and we have b in the design effect.
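
A minimal sketch of this variance model, assuming the form described above: the simple random sampling term (1 - f)p(1 - p)/(ab - 1) multiplied by the design effect 1 + (b - 1)roh. The function name and the example values of p and roh are hypothetical.

```python
def sampling_variance(a: int, b: int, p: float, roh: float, f: float = 0.0) -> float:
    """Sampling variance of a proportion under cluster sampling.

    a   : number of clusters
    b   : number of elements per cluster
    p   : proportion being estimated
    roh : rate of homogeneity within clusters
    f   : sampling fraction (assumed negligible by default)
    """
    n = a * b                                       # total sample size, written as ab
    srs_var = (1.0 - f) * p * (1.0 - p) / (n - 1)   # simple random sampling variance
    deff = 1.0 + (b - 1) * roh                      # design effect
    return srs_var * deff


# More clusters with the same b: variance goes down.
print(sampling_variance(a=50, b=20, p=0.5, roh=0.05))
print(sampling_variance(a=100, b=20, p=0.5, roh=0.05))
```

Note that b appears twice: in the denominator through ab, and in the design effect.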

So we need some way of combining considerations for

this variance with the cost model.

Now one way we could do this is very mechanically.

We could choose alternative values of a and b.

We could calculate the variance that goes with each combination.

And calculate the cost model and build a spreadsheet where we have a column of

costs and a column of the variances that go with them, with the a and

b levels driving each row, each alternative.

Simulate what's going to happen.

And then we could watch and see, as cost and

variance go in different directions, whether there is a balance point

where we've got the smallest variance among all alternatives.

That would be fine to do.

As a matter of fact, many people do that kind of thing today because spreadsheets

are so easy to work with.
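
The spreadsheet approach can be sketched in a few lines of code. This is only an illustration under assumptions: the cost model is a times c sub a plus a times b times c sub b, the variance model is the simple random sampling variance times 1 + (b - 1)roh with the sampling fraction ignored, and every number and function name below is hypothetical.

```python
from itertools import product


def total_cost(a, b, c_a, c_b):
    # Cluster cost plus within-cluster observation cost.
    return a * c_a + a * b * c_b


def sampling_variance(a, b, p, roh):
    # SRS variance with n = ab, times the design effect (f assumed negligible).
    return p * (1.0 - p) / (a * b - 1) * (1.0 + (b - 1) * roh)


def build_table(a_values, b_values, c_a, c_b, p, roh):
    """One row per (a, b) alternative, with its cost and its variance."""
    rows = []
    for a, b in product(a_values, b_values):
        rows.append({
            "a": a,
            "b": b,
            "cost": total_cost(a, b, c_a, c_b),
            "variance": sampling_variance(a, b, p, roh),
        })
    return rows


def best_within_budget(rows, budget):
    """Smallest-variance row among those that fit the budget, or None."""
    affordable = [r for r in rows if r["cost"] <= budget]
    return min(affordable, key=lambda r: r["variance"]) if affordable else None


# Hypothetical inputs: c_a = 400, c_b = 10, p = 0.5, roh = 0.05, budget = 30000.
rows = build_table(range(10, 101, 10), range(5, 41, 5),
                   c_a=400.0, c_b=10.0, p=0.5, roh=0.05)
best = best_within_budget(rows, budget=30000.0)
print(best)
```

The grid here is coarse, so the row it picks is only the best of the alternatives listed, not necessarily the true optimum.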

But really there was an earlier day when spreadsheets weren't as available,

or weren't available at all, and one just simply solved

for the balance between cost and error to find an optimum.

And there is an approach here that can take a fixed cost.

Let's start with a budget.

We have so much money.

And we're now going to take that fixed money and

see what's the smallest sampling variance we can get by varying a and b.
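
One way to picture the fixed-budget idea, as a rough sketch rather than the analytic solution: for each candidate b, spend the whole budget on as many clusters a as it will buy, then compare the variances. The function names and numbers are hypothetical, and the sampling fraction is assumed to be negligible.

```python
def clusters_affordable(budget, b, c_a, c_b):
    # Each cluster costs c_a plus b observations at c_b each.
    return int(budget // (c_a + b * c_b))


def best_subsample_size(budget, c_a, c_b, p, roh, max_b=200):
    """Scan b = 1..max_b and return (a, b, variance) with the smallest variance."""
    best = None
    for b in range(1, max_b + 1):
        a = clusters_affordable(budget, b, c_a, c_b)
        if a < 1 or a * b < 2:
            continue  # cannot afford a usable sample at this b
        # SRS variance with n = ab, times the design effect 1 + (b - 1)roh.
        variance = p * (1.0 - p) / (a * b - 1) * (1.0 + (b - 1) * roh)
        if best is None or variance < best[2]:
            best = (a, b, variance)
    return best


# Hypothetical inputs: budget 30000, c_a = 400, c_b = 10, p = 0.5, roh = 0.05.
a_opt, b_opt, var_opt = best_subsample_size(30000.0, 400.0, 10.0, p=0.5, roh=0.05)
print(a_opt, b_opt, var_opt)
```

Because a has to be a whole number of clusters, the variance does not change smoothly from one b to the next, so the scan may land a step or two away from where a smooth curve would bottom out.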