Okay, so I want to take a look at a structural model, and at fitting structural

models of network formation that combine aspects of both strategic

formation and chance meetings. And the idea here is that we

can build these models to explore the fact that in a lot of settings there's

going to be some choice involved but also some chance involved and we might want to

estimate some things like relative roles...

And the random models can be too extreme, and the strategic models can

be too extreme. We've seen the beginnings, in terms of the

exponential random graph models of ways to combine some of these things but we

can also in particular instances fit models that are more precise to the

setting involved and more directed at asking a very specific question.

And so for instance, let's ask a question of, when we see homophily how much of

that was due to the choices of the individuals and how much of that was due

to the fact that you're more likely just to be meeting individuals of your own

type rather than choosing to interact with individuals of your own type...

So if we want to ask a question like that,

can we build a simple model to address it?

And so here, what I want to emphasize is really the techniques for doing this

rather than a specific model. So this is going to be a very specific

and stylized model. But what I want to do is just

illustrate that you can do similar things, where you build what

you think is the right model for a particular application.

And then use that to generate networks. Look at the networks that come out.

Try and match them up with the data and that will allow you to fit parameters to

the model that best match the data and then do statistical tests to see whether,

you know, certain things are really going on.

How much choice is really going on? How much chance is really there?

How much noise is in the data and so on, so forth.

So that's the idea here. And so I want to emphasize basically an

approach, rather than taking so seriously the specifics of this particular model.

It's more an illustration or an example than something to be taken seriously

as the model. So, in terms of an application to homophily:

Let's suppose that we've got two groups, Group A and Group B, and

they form fewer cross-group, say cross-race, relationships than would be expected

given their population mix. So if we go back to our Add Health data

and look at one of those high schools, we see

segregation by race. We could ask, is this due to

structure? So maybe they just don't meet each other

very often. They don't meet each other very often

because in the school there are certain kinds of structural patterns, in terms of

the way the courses are organized or the extracurriculars that people take,

that don't allow for many meetings between different races. Or is it due

maybe to the preferences of group A, or the preferences of group B, or both of

their preferences, and so forth? So can we begin to sort these things out?

So I'm going to just take a look at the techniques from a

couple of papers that are with Sergio Currarini and Paolo Pin, from 2009 and

2010. And what we'll do is, we'll specify

how much utility a given individual gets as a function of the friendships they

have. And then we'll allow a meeting process

that has randomness in terms of who you're going to meet.

And we'll allow both the utilities and the meeting process to depend on your

type, so in this case, say, your race, or your gender, or your age, or your

profession, whatever it might be.

And then we begin to see what comes out of that, and try to match up the

parameters to the data. Okay, so let me say a little bit about

the idea here. When we're thinking about trying

to estimate strategic formation models, generally what we end up seeing is

the result of some choices that were made.

And there's something that's known as revealed preference theory in economics.

This refers to the fact that we might see, say, a consumer buying

certain products. And then, based on the fact that they

bought one product at a given price and not another product at a given price.

We begin to try and infer what their preferences over different product

attributes are. So what do they really want if they ended

up buying something and not buying something else?

Okay, and so here what we'll do is we'll be basically inferring preferences by

saying, okay, this person formed these friendships and not another

set of friendships. That gives us some insight into what

their preferences might be. Why did they form these friendships and

not those? Well, it tells us something about

what they preferred to form in terms of friendships. Now again, that could be due

to what they have available. And just as in consumer theory you might

have a budget which says, okay, look, these were the things I could've afforded, and

I bought this and not that, here what we're going to have to do is

infer the rate at which you had opportunities

to form different types of friendships. And so the chance part is going to be

fitting what the opportunities were that were coming along, and then what choices

were made as a function of those, and that'll give us information about what

the preferences actually were, and what the relative

opportunities were that they had. So that's the idea.

One thing to emphasize here is this gives us, say, a different kind of look at

things than just direct surveys. So you might, for instance, ask people,

what's your attitude on race or would you like to form friendships across races and

so forth. And the difficulty with asking people

directly is that people often answer in ways that aren't necessarily congruent

with the choices that they make. So this takes seriously what did you

actually do, not what you would say on a survey.

And sometimes there can be differences between these, and so this is a different way

of measuring attitudes towards things like race, or gender, or

age, or whatever it might be in that particular context.

Okay, so a simple model. What we're going to have is some set of

types 1 through k, so this might be ethnicity, it might be the age of

the individual, it might be a combination of their age, their religion, their

gender, etc. And what we'll have is a very simple

model in terms of the preferences that people have.

So, this is going to be a simple independent link formation model.

So, it's going to be simple in that dimension.

It's not going to be trying to recreate richer parts of the network, but it's

going to allow us to separate out some of the preference aspects from some

other aspects. And so what people value is, they care

about how many same-type friendships they have, and how many different-type

friendships they have. So, really simple model.

You just care about how many friendships do I have with people that look like

me, how many friendships do I have with people that are of a different type, and

I get some benefit from just that. Okay, so the simplest possible

formulation you can imagine. And in particular, what you get in terms

of utility is then some function of the numbers of same- and different-type friendships,

weighted by a parameter, gamma i, where gamma i is capturing how much you

weight a different-type friendship compared to a same-type friendship, okay?

So, it's a preference bias. If this was 1, then all I care about is

the total number of friendships. I don't care what their mix is.

If this is bigger than 1, then I actually care for diversity.

I care more to have friendships with other types than same types.

If it's less than 1, then I get a higher benefit from same-type friendships than

different-type friendships, right? So

gamma i is going to be the critical parameter

in terms of representing preference bias. And then we also have this other

parameter, alpha.

And what is alpha going to keep track of? Alpha is going to be generally less than

one. It's going to capture some diminishing returns

to friendships. So my first friendship might be very

valuable to me, my second one of some additional value, and so

forth. By the time I get to my 10th, 12th,

et cetera, these friendships are becoming less valuable, and so the fact that alpha

might be less than one would give a concave function.

So as you look at the utility as a function of the total number of friendships, the utility's

going to tend to be concave if alpha is less than 1.

So we've got a situation where, if alpha's less than 1, then we've got

curvature in that utility function. Okay.
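Written out, the benefit part of the utility described here looks roughly like the following. This is a sketch reconstructed from the verbal description, not copied from the slide: s i is the number of same-type friendships, d i the number of different-type friendships, and the exact functional form (a power alpha applied to the weighted count) is an assumption consistent with what is said about gamma i and alpha.

```latex
B_i(s_i, d_i) = \left( s_i + \gamma_i \, d_i \right)^{\alpha},
\qquad 0 < \alpha < 1, \quad \gamma_i > 0 .
```

With gamma i equal to 1 only the total s i + d i matters, with gamma i below 1 a different-type friendship counts for less than a same-type one, and alpha below 1 makes the benefit concave in the number of friendships.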

So let's let t i be the total number of friendships that we're forming.

And so basically, people are socializing, they have an opportunity to

form friendships. They meet people of different types, and

in this model let's let q i be the fraction of own types that you're going

to meet and be able to form friends with. And then 1 minus q i is the relative

number of other types that you're going to meet.

And so if t i is the total number of friends that you form, then

your s i is going to be the fraction that

were of same type, and your d i is going to be the fraction that were of different type,

times your total friendships. And so this is a very

simple model. I just have opportunities coming, and

it's going to be costly for me to form some number of friendships. I'll cut off

that total number, but then the mix I get is just going to depend on the relative

meeting rate. So I meet people at some rates, and I

take whatever friendships come, but it's expensive for me to form friendships, and

so after some time period I stop socializing or trying to find new

friends. Okay?

So t i is just going to be a maximizer of this overall

utility function, where you've got same-type friends,

different-type friends, and so forth. And the rate at which they come is q i for

same type, 1 minus q i for different types.

And that's going to be coming out of the random part of the process.

And right now, what we could do is say, if we knew what

gamma was, and what alpha was, and what q is, and c, and so forth, we could solve

this problem and say, how many total friendships would a given individual like

to have? And how would that depend on those

relative parameters, okay? Okay, so we maximize this function.

If you solve that, you can get an expression for what t i is in terms of

the other parameters. So, maximizing that function,

take the derivative with respect to t i, set it equal to zero.

Alpha's less than one, so the function is concave and this condition is necessary and sufficient for the

solution. And then we'll also add some noise to the

given decision. So it might be that a given individual,

for whatever reason, has more or fewer opportunities, or values friendships more or less.

So we're just going to add noise in terms of the

friendships that a given individual forms. A person a of type i is going to

have an extra error term, epsilon a, and so the total number of friendships that any

given individual forms is just going to be some noisy thing about this solution.

Okay. So a very simple model in terms of the

formation. But now we can see, if we write down a

simple model of how much utility you get from some aspect of the network,

we maximize that, and we get a solution for,

in this case, what degree.

So we can think of this really as the degree of agent i.

This is the degree that they would like to have

in terms of this model. And then what they end up with is some

noisy variation on what they would like to have,

given the parameters of the model. Okay, so how do we actually identify the

parameters from the data? What we can do is, in the data we'll

actually observe the t i's, the t a i's. We'll use the Add

Health data. So when we look at the actual networks of

friendships in these high schools, we can see how many friendships each

individual formed. And so we observe this directly in the

data, and that's going to vary with the q i's, so as a function of q i,

alpha, gamma i, and so forth.

That gives us a t a i. And so one thing to notice is that when

we look at this expression for t a i, this is increasing in q i if gamma is less

than one. Right?

So, if gamma is less than one, then you've got a plus-one q i, and then you've

got minus gamma q i. So you've got q i times 1 minus gamma i

in here. And so if gamma i is less than 1, then

you've got a positive slope for t as a function of q.

So if the fraction of people I'm meeting is more of my own type, I

should form more friendships. And so that's what's going to allow us to begin

to fit what gamma i is, right? So the idea is t i should be a function

of q i, and how quickly it varies with q i is going to be dependent on what gamma

i is. Okay?
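The comparative static can be written out as follows. This is a sketch that assumes the utility form (s i + gamma i d i)^alpha minus c t i, with s i = q i t i and d i = (1 - q i) t i; the closed form for t i is what the first-order condition gives under that assumption.

```latex
t_i = \left(\frac{\alpha}{c}\right)^{\frac{1}{1-\alpha}}
      \bigl(\gamma_i + (1-\gamma_i)\, q_i\bigr)^{\frac{\alpha}{1-\alpha}},
\qquad
\frac{\partial t_i}{\partial q_i}
  = \frac{\alpha}{1-\alpha}\,
    \frac{(1-\gamma_i)\, t_i}{\gamma_i + (1-\gamma_i)\, q_i} .
```

Since t i and the denominator are positive and alpha is between 0 and 1, the sign of the slope is the sign of 1 minus gamma i: positive when gamma i is below 1, zero when it equals 1, and negative when it is above 1.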

And in particular, if you actually look, this is a picture of the Add Health data.

So for these high schools, there are 84 schools,

and each dot here represents a certain race

group within a particular school. So for instance, this dot here is a group

of white students. So it was a particular school, and in

that school the white students formed between 60 and 65% of

the population. This one right here is a

group of black students in a high school where their group's size as a fraction of

the school was between 20 and 30 percent, a little closer to 30 percent,

and this then tells us, on average, how many friendships did they form?

They formed on average about eight friendships.

This group formed on average about three and a little bit of change,

and so forth. What we see here is that, indeed, if you

just fit the slope here, you see that the slope is 2.3.

So there's an increase as a function of your group size.

So the more prevalent your group is in the population, that's going to lead to

higher q i's. And indeed, we see that there's a higher

number of friendships as a function of the size of the

group. So the easier it is to meet your own type,

the more friends different groups are forming, and so we will be able to

actually identify that gamma parameter from this data. And in particular, when

you look at this thing, the slope here is 2.3, and the t-statistic on

that is 7.3. So you're quite a number of standard

deviations away from zero, so we're actually seeing a highly

significant slope here. So we will be able to identify the fact

that groups that have higher proportions are forming more friendships, which would

indicate that they're getting higher utility under this model,

and we can then estimate what the gammas are based on that.

And in particular, I'll just show you the best-fit lines.

If you look at the best-fit lines for

different races, you'll end up seeing different slopes, and that'll allow us to

back out the gammas for different races, because each one of them

has a different relationship between how big their

group size is and how many friendships they are forming.

Okay. So the last part of the puzzle in terms

of figuring out the randomness in this kind of model is, where do the q i's come

from? So we've got how many friendships

each person would want to form as a function

of the parameters of the utility function and the rate at which they meet different

individuals, but now we want to ask, what's the rate at which they meet

different individuals? Okay, and the important thing here is

that the rate at which they're going to meet different individuals is going to

depend on the decisions of the other

agents, okay? So if everybody was trying to form the

same number of friendships, and we're just sort of mixing in the population,

then if my group formed 30% of the population and some other group formed

70% of the population, then I would meet my own group at a rate of 30%,

and I would meet agents of different types at a rate of 70%.

But imagine if the other types are actually trying to form

many more friendships. They're spending more time circulating

and mixing; then they're going to be easier to meet, and my type is going to be

relatively harder to meet. And so it's not just

a function of the relative sizes, it's also a function of how many friendships

different groups are trying to form. And so we need to solve this overall as

an equilibrium, given that the t's are going to be determined by

these relative meeting rates, the q's, and the q's are going to be determined also by

the actual decisions of the agents. So in particular let's think of the

meeting process, and we'll think of this as a giant party.

So we can think of this like a cocktail party.

So let's think of a given individual who, say, is a green agent.

And this green agent is bouncing around in a party where there are green agents

and red agents. And so what's going to happen?

Imagine that the incoming proportion of reds is 80%,

and of greens it is 20%. But if the reds spend more time trying

to form friendships, and are generally forming more

friendships, it's going to be easier to form

friendships with reds than greens. And so even though it's, let's say, 0.8 and

0.2 coming in, the mixture in here could be, say, 90% and

10%, or even more skewed than that, if the reds are spending, say, twice as much

time in this party as the greens are.

So the rate at which they come in and go out is not necessarily going to

be the same as what the relative stock of people is, if the greens are exiting much

more rapidly than the reds are. So, as we go through this process then,

a given green node bounces into somebody, forms one

friendship, forms two, three, four. So it's got three red

friends and one green friend, and it decides, okay, that's enough, I'm

satiated. I formed four friendships, and

that's enough for me. And then it decides to exit.

A red, on the other hand, if gamma i is less than 1 and reds are meeting

reds at a higher rate, might want to stay longer.

And that's basically the idea of the model.
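The gap between who comes in and who is in the room can be illustrated with a few lines of arithmetic. This is a minimal sketch, assuming the stock of each group is proportional to its inflow share times the time its members spend at the party; the function name is just for illustration.

```python
def party_stock_shares(inflow_shares, time_in_party):
    """Share of each group among the people currently at the party.

    Assumes stock is proportional to (inflow share) x (time spent in the party).
    """
    weights = [w * t for w, t in zip(inflow_shares, time_in_party)]
    total = sum(weights)
    return [x / total for x in weights]


# Reds arrive as 80% of entrants and greens as 20%; reds stay twice as long.
red_share, green_share = party_stock_shares([0.8, 0.2], [2.0, 1.0])
```

Here the reds make up 1.6 / 1.8 of the room, a bit under 89%, which is close to the 90/10 mix mentioned above even though the inflow is 80/20.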

Okay? So we've got q i, the rate at

which i meets i, and 1 minus q i, the rate at which you meet

the different types. And the way in which this is going to be modeled is that q i,

the rate at which you meet your own type in this process, is going to be

dependent on the stock, how many of those individuals are actually in the room.

But we'll also allow this to be biased, so that even when I'm in the room, it might

be that I'm biased in terms of meeting my own type.

So maybe I'm in this large room, but I actually look for greens and try to find

greens, in which case I'm going to meet greens at

a faster rate than their share of the room.

And so, if beta i is exactly equal to 1, then the rate at which I meet people is

just the stock of these people in this party.

If beta is greater than 1, then you're going to meet your own types at a rate

faster than you would just milling around.

You're actually going to meet your own types

at a faster rate. So this particular formulation says the rate at

which you're going to meet people is dependent, first of all, on how many people

are in this party, and then it also can be skewed by this extra parameter, which

represents some viscosity in this meeting process, so own types are going to tend

to meet own types. So this is going to be the bias in

meetings, right? This is going to be the parameter beta i,

where beta i greater than one means that you're meeting your own type at a rate

above what you should be meeting them relative to how they're mixing in this

population setting, okay? So we've got q i equal to

this stock thing raised to the power 1 over beta i. So if I was 50% of the population and beta was 1,

then I would meet my own type at a rate of 1 out of 2.

If we set beta to 2, then the chance I would meet my own type would be about 71%.

And if beta was as high as seven, then the chance that I would meet my

own type would be about 91%. So this would

all be with a half. Sticking in whatever my relative size is,

what this does is bias the relative rate at which I'm going to make

own-type friendships compared to other-type friendships,

relative to the mixing, where the

stock is going to be based on the sum of the t i's from my type

compared to the sum of the t j's over all the j's, right?

So it's keeping track of the relative size of my group in the meeting

population compared to the others, and then we raise that to some power.
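The numbers quoted for a group that makes up half of the stock can be checked directly. This is a small sketch that assumes the own-type meeting rate is the stock share raised to the power 1 / beta, which is the form that reproduces the 50%, 71%, and 91% figures; the function name is hypothetical.

```python
def own_type_meeting_rate(stock_share, beta):
    """Own-type meeting rate q = stock_share ** (1 / beta).

    beta = 1 means unbiased mixing; beta > 1 tilts meetings toward own type.
    """
    if not (0.0 < stock_share <= 1.0) or beta <= 0.0:
        raise ValueError("need 0 < stock_share <= 1 and beta > 0")
    return stock_share ** (1.0 / beta)


rates = {beta: own_type_meeting_rate(0.5, beta) for beta in (1, 2, 7)}
```

Because the stock share is below 1, raising it to a power smaller than 1 always pushes the own-type rate up, and more so the larger beta is.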

Okay, so what does this all work out to be?

Then we've got the ti maximizing this function.

The stocks are going to be determined by the relative number of meetings that

different groups want, weighted by their relative sizes.

And then the meetings are going to be determined by what the stock is, raised to

these bias parameters. And so the

fact that the stocks have to add up to one

tells us that we have a balance equation

in terms of what these q i's have to look

like. When you sum across the i's,

the q i's raised to the beta i's have to equal one.
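In symbols, the balance condition follows in two steps. This is a sketch under stated assumptions: w i denotes the population share of group i (a symbol introduced here for illustration), the stock of group i is its share of all desired friendships, and meetings are the stock raised to the power 1 over beta i.

```latex
\text{stock}_i = \frac{w_i \, t_i}{\sum_j w_j \, t_j},
\qquad
q_i = \left(\text{stock}_i\right)^{1/\beta_i}
\;\;\Longrightarrow\;\;
\sum_i q_i^{\beta_i} = \sum_i \text{stock}_i = 1 .
```

Since the q i's are observed in the data, this equation restricts which beta i's are compatible with them.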

Okay, so what we end up with, in terms of having balance on the meetings, is that

if one group's meeting the other groups at a certain rate, they have

to match up, and that's going to give us an equation which will help us solve for

this beta i parameter. So we're going to be able to solve for

the beta i parameter from this. Okay, so a simple model where we maximize

utilities, we have a meeting process, we estimate the meeting process, we put all

these pieces together, and we'll be able to estimate both the beta i's from here

and the gamma i's from here, and then see what it looks like in the data.

Okay, so we've got these two conditions. Maximizing this will help us

identify these parameters, and we've got this, which will help us

identify these parameters. The q i's we'll actually observe in the

data, so what's the relative proportion of own-type friendships to other-type

friendships for each group? So we basically can identify these

parameters by fitting this model to data. The only parameter we've got left is, we've

still got this cost of forming friendships,

and we don't know exactly what that is. And so when we look at the equations that

we have for the t's and the betas, and put in some errors,

then what do we end up with? We end up with these two equations we

have to fit. We've still got this c out here.

And so what we can do, when we look at solving this out for two

different groups, is say that the relative rate at which i's should be

forming friendships compared to j's forming friendships, including the errors,

should be a ratio here, where now this ratio is going to divide the c out.

So the c we can factor out by just looking at relative numbers of

friendships, because the c scales everybody's friendships up or down.

And so if we look at relative numbers of friendships formed by one group compared

to another, then that factors out the c, and then we don't have to estimate the c

directly; we can just estimate the alphas and gammas.

Right. So basically what that tells us is that

we're going to end up with a t i term minus a t j term equaling some error, and

this is from cross-multiplying. So we end up with an expression which no

longer has a c in it, because we're comparing relative t's to each other

rather than absolute t i's. So that is one way of just factoring out

one of the parameters. Okay, so that's a technical detail in

terms of estimation, which will make our life a little easier.
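The step that removes c can be written out as follows. This is a sketch that assumes the closed form t i = (alpha / c)^(1 / (1 - alpha)) times (gamma i + (1 - gamma i) q i)^(alpha / (1 - alpha)), which is what the first-order condition gives under the assumed utility form; the error term is left out of the first line for readability.

```latex
\frac{t_i}{t_j}
  = \left(
      \frac{\gamma_i + (1-\gamma_i)\, q_i}{\gamma_j + (1-\gamma_j)\, q_j}
    \right)^{\frac{\alpha}{1-\alpha}},
\qquad
t_i \bigl(\gamma_j + (1-\gamma_j)\, q_j\bigr)^{\frac{\alpha}{1-\alpha}}
  - t_j \bigl(\gamma_i + (1-\gamma_i)\, q_i\bigr)^{\frac{\alpha}{1-\alpha}}
  = \text{error} .
```

The common factor (alpha / c)^(1 / (1 - alpha)) appears in both t i and t j, so it cancels in the ratio, and cross-multiplying gives the difference form on the right.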

Now we just have three sets of parameters to estimate.

We estimate alpha, the gamma i's, and the beta i's.

All right. So these are the parameters that are

left, and we factored out that c parameter.

Okay. The fitting technique is very simple.

What we'll do is, we'll just build a grid of alphas, beta i's, gamma i's.

So we've got a grid over all these things. For each network, each school,

and each specification of biases, we can see the number of

total friendships that would be predicted for each group, and the realized number,

and so we can calculate how big the error is in total

friendships compared to what it was, and the error in terms of

relative group meeting rates, the q i's.

So these predict t i's and q i's, right?

So for each one of these it predicts t i's and q i's, and then we can look at what the

actual ones in the data are, and sum the squared errors across all

the networks. So for each one of these, we're going to

have, say if we have four races, four sets of t i's and four sets of q i's.

And we can sum the squared errors: for each school we'll have a set of eight

errors. Sum all those up, and then choose the biases to minimize the weighted sum

of the squared errors, okay? So we just choose those things to

minimize these. Okay, so what do you get when you fit

this?
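The grid-search idea just described can be sketched in a few lines. This is a deliberately simplified illustration, not the estimation used in the papers: it fits only alpha and a single common gamma, uses only the friendship counts (not the meeting rates or the betas), works with log ratios of t's so that c drops out, and runs on synthetic noiseless data generated from made-up parameter values.

```python
import itertools
import math


def predicted_log_ratio(q_i, q_j, alpha, gamma):
    """log(t_i / t_j) implied by the model once the cost c has cancelled."""
    k = alpha / (1.0 - alpha)
    a_i = gamma + (1.0 - gamma) * q_i
    a_j = gamma + (1.0 - gamma) * q_j
    return k * (math.log(a_i) - math.log(a_j))


def sum_squared_errors(groups, alpha, gamma):
    """groups: list of (q, t) pairs, one per group; errors are over all pairs."""
    sse = 0.0
    for (q_i, t_i), (q_j, t_j) in itertools.combinations(groups, 2):
        err = math.log(t_i / t_j) - predicted_log_ratio(q_i, q_j, alpha, gamma)
        sse += err * err
    return sse


def grid_search(groups, alphas, gammas):
    """Return (sse, alpha, gamma) with the smallest sum of squared errors."""
    return min(
        (sum_squared_errors(groups, a, g), a, g)
        for a in alphas
        for g in gammas
    )


# Synthetic data from alpha = 0.5, gamma = 0.6 (made-up values, no noise).
true_alpha, true_gamma = 0.5, 0.6
k_true = true_alpha / (1.0 - true_alpha)
qs = [0.2, 0.45, 0.7, 0.9]
groups = [(q, (true_gamma + (1.0 - true_gamma) * q) ** k_true) for q in qs]

alphas = [x / 20 for x in range(1, 20)]   # 0.05, 0.10, ..., 0.95
gammas = [x / 20 for x in range(1, 21)]   # 0.05, 0.10, ..., 1.00
best_sse, best_alpha, best_gamma = grid_search(groups, alphas, gammas)
```

On this noiseless example the search lands back on the grid point that generated the data. With real data one would add the q errors, group-specific gammas and betas, and the sum over schools.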

five categories of students because there's the Asians, blacks, Hispanics,

whites, and there's also some that are miscoded or, or didn't indicate race.

Okay, so we have some others, and then we have the fit.

Alpha comes out to be about 0.55, so roughly like a square root in terms of

diminishing returns. When we look at gammas, what do we get?

We get Asian, they get different type friendships are worth about 0.9 of same

type friendship. Blacks 0.55, Hispanics 0.65, white's

0.75. So we get different fits of that

parameter, all of them are less than 1, but they're varying in terms of at what

rate they would like to form or they get a value of different type friendship

compared to same type friendship. And then the second thing that we have

are these beta parameters. And the beta parameters indicate that for

asians and blacks we're seeing a high rate of bias towards meeting owned types.

so a good portion of the bias that they actually observe is actually due to the

fact taht they're meeting themselves at a much higher rate.

whereas for Hispanics and whites, these parameters are much lower and in fact the

Whites see a mixing rate which is roughly 1, Hispanics about 2.5 and then Asians

and blacks a factor of 7 higher. Now, one thing we can do is then ask you

know, are these statistically significant numbers?

Do we have any idea whether maybe all these numbers are

just noisily different from one, and in fact the model isn't all that

different? So are these truly different in

some statistical sense? And what we can do is test a

hypothesis. So what we could do, for instance, is look

at the sum of squared errors. So this is the residual sum of squared

errors. This is the sum of squared errors that

we get by looking, say, just at the preference biases.

So look at all of the gammas, look at the t i's that are generated, and see

what errors you actually see in the data.

And then say, let's suppose that we restricted all of these to be equal to 1.

So we've forced all of the gamma parameters to be equal to 1.

Okay, so you force those to be equal to 1, and then you do the best fit of the

model. What you'd end up

with is that alpha would drop to 0.2, but the error would go up to 17000,

compared to 4000 when you allow these parameters to vary.

Then you can do an F test. And what this says is that the F

value here is 42. The F threshold for even a 99% confidence

level is 3.3. The size

of the squared errors you're getting is so much larger, it's a factor of four larger,

so a huge amount of the error is actually

being explained by allowing these gammas to differ.

So if you allow the gammas to differ across race, you are actually explaining a

huge amount of the error. The error blows up by a factor of four

when you force all of these gamma parameters to be equal to 1.
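The comparison of restricted and unrestricted errors is the standard F statistic for nested least-squares fits. Here is a minimal sketch; the degrees of freedom behind the lecture's value of 42 are not given in the transcript, so the numbers in the example below are made up purely to show the arithmetic.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, num_restrictions, resid_dof):
    """F statistic comparing a restricted fit to an unrestricted fit.

    ssr_restricted:   sum of squared errors with the restriction imposed
    ssr_unrestricted: sum of squared errors with all parameters free
    num_restrictions: number of parameters pinned down by the hypothesis
    resid_dof:        observations minus free parameters in the unrestricted fit
    """
    if ssr_unrestricted <= 0 or num_restrictions <= 0 or resid_dof <= 0:
        raise ValueError("inputs must be positive")
    numerator = (ssr_restricted - ssr_unrestricted) / num_restrictions
    denominator = ssr_unrestricted / resid_dof
    return numerator / denominator


# Hypothetical numbers: the error doubles when 2 restrictions are imposed,
# with 50 residual degrees of freedom.
example_f = f_statistic(200.0, 100.0, 2, 50)
```

The statistic is then compared with the critical value of the F distribution for the chosen confidence level, which is the 3.3 threshold mentioned above for that particular test.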

So you can reject. The ones in red here indicate that you're

rejecting this particular hypothesis.

So they're certainly not all equal to one statistically, under this particular

model. Are they all equal? Well, the error goes up

to 61.75 if you force them all to be equal; the best guess would be that

they're all 0.8. Okay.

And then you can ask, okay, is it true that Asians and blacks have the same

preference bias parameter? If you fit a model where you force those

two things to be the same and re-estimate the model, you'd end up with

an estimate of alpha of 0.7. The gammas for those two races that are

forced to be the same, the Asians and blacks, will be 0.8, and so forth.

What would the error be there? Well, it would go from 4700 to 5300.

It actually has an F value of 9.93, still highly significant.

So it looks like Asians and blacks have different parameters.

The reduction in the error is not just due to randomness.

So using these kinds of models, you can go through and do F tests and other kinds

of statistical tests, by looking at the errors you observe under the model and

the errors that you would observe if you forced some restriction.

So you work with some null hypothesis, and compare it to

the alternative, the one that allows all of

the parameters to be fit. That gives you

a new set of estimates. You can compare the errors that you get

under the two and then ask whether that reduction in error came up at random or

not, with a standard statistical test.

In this case an F test tells you which ones.

So you can't reject the hypothesis that Asians and whites have the same

preference bias. You can reject the hypothesis that Asians

and blacks have the same, and so forth. So you can go through and see that

blacks and Hispanics are not distinguishable here,

but blacks and whites are distinguishable, right?

So when you look at these F tests and which ones are statistically significant,

you get certain differences you can say are statistically significant, and other

ones are not. Okay.

You can do the same thing for the meeting bias; you can go through and do the

same kind of tests. And indeed, the meeting biases are also

highly significant, so it really appears that there's bias both in preferences and

in meetings. And again,

what I want to emphasize here is not this particular model, but this

approach: if you're careful about writing down a

structural model, you can begin then to derive

implications of that model. That model then generates certain

observed patterns. Match those patterns up with the data. So

in this case what it was generating was the degrees of all the

agents and the relative fractions of friends of the different types they

should have. And then we can look at the degrees and

fractions of different friends that they have in the data, and try to best match

those up. That gives us estimates for preference parameters and so

forth, and then we can test whether they're significant and learn something

about the relative choices that were made.

Here it appears that both choice and chance were present. If you believe the

model, then it looks like people have biased preferences towards own type, and

that's accounting for the fact that you're forming more friendships when

you're put in a school that has more of your own type.

And so we end up with estimates there.

And the kind of thing that

that allows one to do is then do analysis

where you can look at, say, counterfactuals.

What would happen in a school if we changed the way in which people meet?

Say we try to eliminate that beta bias and move that parameter towards one,

so we want to make sure that everybody meets each other.

How much of an impact is that going to have on friendship formation?

Using this model, you could begin to estimate something like that.

So it allows you to look at different policies, as opposed to, say, a policy that

tries to influence the preference parameters, which would have a different

impact on what would happen. And so using a model like this, you can

begin to sort those things out. And so this is just an idea of one

particular model that marries strategic formation with some randomness.

A very specific model, but it's a technique that can be used much more generally.
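A counterfactual of the kind just mentioned can be sketched by solving the model's equilibrium twice, once with a meeting bias and once with beta set to 1. This is an illustrative sketch only: it assumes the functional forms used in this kind of model (friendship counts from the first-order condition, stocks proportional to population share times friendships, meeting rates equal to the stock raised to 1 / beta), solves by damped fixed-point iteration, and uses made-up parameter values rather than the estimates from the lecture.

```python
def equilibrium(pop_shares, gammas, betas, alpha, c, iterations=500, damping=0.5):
    """Damped fixed-point iteration for meeting rates q and friendship counts t."""
    q = list(pop_shares)                        # start from unbiased mixing
    scale = (alpha / c) ** (1.0 / (1.0 - alpha))
    k = alpha / (1.0 - alpha)
    t = [0.0] * len(pop_shares)
    for _ in range(iterations):
        # Desired number of friendships given current own-type meeting rates.
        t = [scale * (g + (1.0 - g) * qi) ** k for g, qi in zip(gammas, q)]
        # Share of each group in the pool of people looking for friends.
        total = sum(w * ti for w, ti in zip(pop_shares, t))
        stock = [w * ti / total for w, ti in zip(pop_shares, t)]
        # Biased own-type meeting rates implied by those stocks.
        q_new = [s ** (1.0 / b) for s, b in zip(stock, betas)]
        q = [damping * qn + (1.0 - damping) * qo for qn, qo in zip(q_new, q)]
    return q, t


# Two hypothetical groups: a 70% majority and a 30% minority.
shares = [0.7, 0.3]
prefs = [0.75, 0.55]                            # made-up gamma values
q_base, t_base = equilibrium(shares, prefs, [1.0, 3.0], alpha=0.55, c=0.1)
q_cf, t_cf = equilibrium(shares, prefs, [1.0, 1.0], alpha=0.55, c=0.1)
```

Comparing `q_base` with `q_cf` shows how much of the minority group's own-type meeting rate comes from the meeting bias rather than from group sizes and preferences, which is the kind of question the counterfactual is meant to answer.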