0:00

Let's answer the first question first, about Structural small worlds.

Â Let's use symbol L to denote the median of shortest path between node pairs.

Â So, if I got node pair i,j and the shortest path distance between them is dij

Â and then I look at all the node pairs and the corresponding shortest path distance.

Â Histogram it and then I get L as the median.

Â And I want L to be small. And of course, L will be small if there

Â are only a few nodes, So really, I want to say L is small

Â relative to the number of nodes, say n, And, more precisely, I want L to grow like

Â a slow function of a number of nodes. How slow?

Â For example, logarithmic function, Then even for exponential growth after

Â taking the log, it become just linear. So what we now want to do is to reverse

Â engineer and we'll comment on this, the utility and pitfalls of reverse

Â engineering, topology or functionality of a network towards the end of the lecture.

Â Want to reverse engineer and build an explanatory model to explain the

Â observation like what has do. To explain that observation this time

Â about the social network that L is often a slow function the number of nodes.

Â But whatever explanatory of model we have, it can be just a tree growing like this

Â twenty to the power of six, cuz we know that violates another observation

Â empirically made in social network, That is there's a lot of transitivity

Â quantified for example by a clustering coefficient of a graph quantifying the

Â degree of existence of triad closures. And we're going to use C to denote this

Â metric clustering coefficient. I just want to highlight C clustering

Â coefficient of a graph has nothing to do with the clustered density p that we

Â talked about in the last lecture, In contagion model.

Â Okay? Clustering coefficient of a graph is not

Â the cluster, the density of a subset of nodes.

Â So what is clustering coefficient? Remember, we're trying to quantify the

Â following observation that in many social networks, if a, b, c forms a connected

Â triple like this, then there's a good chance that it also closes the triangle.

Â Okay, that's why it's called triad closure,

Â So, we want to look at the number of triangles versus the number of connected

Â triples. And, we want this metric to be normalized,

Â so it always lies between zero and one for all graphs.

Â In particular, if it is a triangle, then we want the C to be one.

Â In order to accomplish this normalization we're going to divide the number of

Â connect triples by three. Why? Because of different ways we count,

Â connected triples versus the count of number of triangles.

Â If there is a, A link here, then the triangle exists, we

Â call that triangle a, B, c. We count it only once.

Â But when we count a number of connected triples, we actually have to count a, b

Â and b, c, two links coexist, then there's this connected triple whether there's a

Â link a, c or not. If there is one, we'll also have a

Â triangle. If not, we don't.

Â But, to connect a triple, this is one connected triple.

Â And then, we can also write a, c and c, b, two links that is another connector

Â triple. And b, a link and a, c link, that's

Â another connector triple. So, just because when we count connected

Â triples, we count basically three times, so we divide it by three to take into

Â account that over-counting. And therefore now, c, d will be guaranteed

Â to lie within zero and one. For just a triangle, it will be one.

Â For just a connected triple, which is a line here, it will be zero.

Â Now the question for a bigger general graph,

Â What will C look like? We want C to be big, and L to be small for

Â our explanatory model. So, here's another graph, slightly bigger,

Â four nodes and four links. Let's see what is the clustering

Â coefficient for tri closuring this graph. Well, for this graph, this number of c is

Â actually easy to compute. Alright, C equals the number of triangle

Â over the number of connected triples over three.

Â How many triangles are there? Whether there's only one triangle, okay

Â one, two, three nodes, triangle. What about connector triples here?.

Â Well, there are let's see, how many connect triples.

Â Two, one, three, that's a connected triple.

Â Okay? One, two, three, that's another connector

Â triple. Okay? one, three, two, that's another

Â connector triple. One, two, four, That's another connector

Â triple. And three, two, four, that's another

Â connector triple. So, there are actually five connectoe

Â triples. Of course, some of them we basically

Â counted more than once, So, we have the normalization divided by

Â three. And that means the clustering coefficient

Â of this graph is three-fifths, which is a pretty big number considering that C must

Â lie between zero and one. Okay?

Â 7:26

And let's look at, if you indeed construct such a graph what would be the l and c.

Â Now l is actually small as we can intuitively see okay, because between any

Â node pairs, there is a chance of p that they are directly connected, but the

Â clustering coefficient C is also small. Okay? When we look at c now we're talking

Â about expected c and expected l because this is a probabilistic object.

Â And without going to the details there, sufficed us to show the intuition that c

Â for our, plus our random graph is the following.

Â Small c over n minus one. N

Â Is the number of nodes. Small C is the expected degree of the

Â node. Why is that?

Â Why is this the expression for the expected clustering coefficient big C for

Â a random graph? Well, because the chance, the probability

Â that two nodes are connected is C over n minus one in a random graph.

Â Okay? This is how many connections you have and this is how many nodes are out

Â there other than yourself and, therefore, the ratio is the expected number of the

Â probability of the two nodes been connected.

Â Independent of whether these two nodes are already indirectly connected by another

Â node or not, doesn't matter. That's not how the graph was constructed. So the

Â chance that these two nodes are connected is exactly just c over n minus one.

Â So, suppose we look at a graph, let's say 100 million people, and each has 1,000

Â friends. That's a lot already, much bigger than

Â Dunbar's number. Still, the clustering coefficient is only

Â on the order of ten minus three, which is way too small.

Â We know C will be much bigger than that. Orders of magnitude bigger than that in

Â the real social network. So, random graph can explain small l, but

Â not big C. What about the extreme end on the other

Â hand? A regular graph, let's say, a regular ring

Â graph. What's a regular ring graph?

Â Let's draw a ring here. Okay?

Â It's just a line with two ends connected. And, let's say, there are nodes living

Â here. Okay? In this case, there are eight nodes,

Â so n equals eight. Just like random graph is parametrized by

Â number p, that's the probability of a link appearing or not.

Â A regular ring graph is parametrized by a number of c, a small c.

Â Now this small c is not exactly what we just used expected degree.

Â Here is the actual deterministic degree c, here.

Â Okay. That's the number of neighbors each node has,

Â It's the degree of each node. Given this is a regular graph, you can

Â rotate however you want and it still looks the same.

Â This is basically the degree for all the nodes.

Â 10:53

Now, in this particular ring graph c is two because for each node there are two

Â neighbors. The degree is two for everyone, I can add

Â a few more. Say, let's make a c and even number.

Â Okay? I can also connect to this node with two hop neighbors now.

Â So there's a direct link going out over there now.

Â Okay? Similarly,

Â I'll add all these. Now in this case c is no longer two, c is

Â four, because there's one more missing, because every node has actually four

Â neighbours. I pick this node, one neighbour, two

Â neighbour, three neighbour, four neighbour.

Â Okay, don't worry too much about exactly the way the lines are drawn.

Â You can easily count that, one, two, three, four.

Â This node has four neighbours and its symmetric is regular, so every node has

Â four neighbours, degree four and eight nodes.

Â 11:58

Of course, you can make c even bigger now. If we insist the c to be even then, you

Â can make c to be six and eventually six be eight and that would also be called full

Â mesh network and every node is connected to every other node in fact in this case

Â because of even number of nodes here, you can even do that.

Â The largest and even c can get is just the six.

Â Okay? Well, so in this case, what would be c and

Â what would be l? First of all, c is now much better.

Â Hm, okay? Let's compute the clustering coefficient

Â here, c. Number of triangles and number of

Â connected triples. Turns out number of connected triples is

Â easy to compute. If you just look at each node, because

Â it's symmetric, We can just count those centered around

Â each individual node. The number of connected triples is easy to

Â choose, it's count is just c choose two. You've got c neighbors and you choose two

Â of them then you form a connected triple. It might be a triangle, might not be a

Â triangle, But still, a number of connected triples

Â centered around a node is just c over c choose two.

Â Of course, don't forget you have to divide by three for normalization.

Â Now, what about the number of triangles centered around a given node?

Â Let's think about that. Well, first to form a triangle, you need

Â to go up, okay, one step, then need to go up another step, and then you have to come

Â back, Otherwise, you don't close it and you

Â don't have a triangle. So you have to two go steps out and then

Â be able to come back in one shot. So there, how many such choices there are?

Â Well, in order to be able to come back in one hop, you can only go out c over two,

Â that far. Farther than that you won't be able to

Â come back. And out of those, you can choose two

Â because you'll be making two steps. So the number of triangles is c over two

Â choose two and the number of connect triple is c over two.

Â What is this expression? It is one-half, c over two, times c over

Â two minus one, divided by one-half, c, c minus one, over three, Which simplifies to

Â three times c minus two over four times c minus one.

Â This is the clustering coefficient quantifying triad closure, and closure, on

Â a regular ring graph. You can see that a number of nodes doesn't

Â come into play here, because this is entirely symmetric.

Â So only c matters the deterministic number of the degree here for each node matters.

Â Well, for sanity check, let's see what happens if c is two.

Â Well, we know if c is two, we're just talking about a ring.

Â There is no triangle. There's only connected triple.

Â Indeed when c is two, this becomes zero as you can see.

Â 15:18

Okay? So triad closure, doesn't exist,

Â Clustering coefficient big c is zero. What if c is four now,

Â Like this graph? Actually, the way we draw this, it's hard

Â to see, intuitively, what should a clustering coefficient be.

Â There's another way to draw the same graph,

Â Which is, you can easily verify for yourself like this.

Â These are the nodes. .

Â And now, you can clearly see the clustering coefficient should be half,

Â Because had I also joined these dotted lines, which don't actually exist here, I

Â would have actually tri closure all the tri closures. But they don't exist,

Â Only these parts exist. So, intuitively this way I've joined this

Â same picture, shows that clustering coefficients should be half.

Â And indeed, if you'll plug in c equals four here, what you'll get is that

Â clustering coefficient is one over two. Okay.

Â Now, what if c becomes real big like this small c. Okay. When this little c is like

Â infinite, then what would happen? Well then, in that case this simplifies us to

Â three over four. In other words regular ring graph, the

Â clustering coefficient can only go up to three quarters, but that's already very,

Â very big. Compared to say ten to the minus three for

Â our Poisson random graph, this is way bigger.

Â And, indeed, most likely it would be, at least c, at least four and therefore,

Â clustering coefficient is between one-half and three-fourths.

Â This is a small dynamic range, but it is of the right magnitude.

Â Good. We got c checked off, okay?

Â We got large c on the regular ring graph, Which is not too surprising, cuz with

Â enough C, you quickly start closing the triangles.

Â Now the question is, what about these l's? That's where the problem is with a regular

Â ring graph to explain small row.. It doesn't have a very small l.

Â Indeed, if you want to go from this node to that node, you actually have to at

Â least hop through here and here. But remember, this is only enabled with

Â eight nodes. When you have many more nodes ends big on

Â the regular ring, then no matter how many c you add for quite a large range of

Â degrees, you still need to hop slowly, locally.

Â You do not have a direct long range link anywhere in this graph.

Â This graph only got short range links. Hm, okay?

Â So, with only short range links, you cannot possibly have a small l.

Â