0:00

[MUSIC]

Â Hi there.

Â In this lecture, we're going to see some very interesting and

Â cool properties of networks or graphs as you have seen them.

Â So before we get started, a few definitions again.

Â What is a network or a graph?

Â A network or a graph are synonyms by the way.

Â So a network or a graph has vertices or nodes.

Â For instance, in the Facebook graph, each user is a vertex.

Â In the Internet graph, each router or switch is a vertex.

Â There are edges that connect pairs of vertices.

Â In the Facebook graph, a friend relationship between you and

Â one of your friends is an edge.

Â Similarly, in the Internet graph, the link joining to

Â switches in the Internet constitute an edge in that graph.

Â 0:50

So there are networks of graphs all around us.

Â The Internet of course is one of the largest networks that we use.

Â The World Wide Web is a network where the vertices are webpages and

Â the edges joining them are links that point from one web page to another.

Â This is often called the directed graph as the edges are uni directional.

Â They're also social networks such as Facebook, Twitter and LinkedIn,

Â which are all networks where the users are vertices, and the friend relationships or

Â the connections are edges.

Â And of course a biological graph such as DNA interaction graphs,

Â ecosystem graphs, and of course all of these are networks, as well.

Â So there's a variety of networks and all these networks are fairly complex.

Â They have different kinds of complexities.

Â The first kind of complexity is structural complexity.

Â For instance, the human social network has about 7 billion.

Â That's a number of people alive in the world as of this year 2014.

Â There are millions of computers on the Internet.

Â The graphs, or the networks, as a result of

Â these large numbers of vertices are also structurally very, very complex.

Â 2:25

There's a lot of diversity even inside any one of these graphs.

Â Inside the human social network for instance, some folks are more popular than

Â others, so they might have more friends than others.

Â In the World Wide Web, some websites may have many more links than others,

Â they have many more links pointing to them than other websites do.

Â For instance, the BBC website has many more links pointing

Â to it than your or my personal homepage does, likely.

Â And of course, the vertices in these networks are fairly complex.

Â In the Internet graph, the endpoints, the routers and

Â switches have different kinds of CPUs, they're running different kinds of OSs.

Â And these OSs themselves have a lot of complexity.

Â And of course human beings in the human social network are very

Â complex individuals as well.

Â 3:14

And even though you might be able to represent each of these vertices by a few

Â simple rules, when you put a large number of these vertices together in a network,

Â you get some very complex, global behaviors.

Â These are often called as emergent phenomena.

Â So, even from simple end vertex behavior, and end edge behavior, which

Â are governed by simple rules, you might get very complex system-wide behavior.

Â One of the best examples of this is climate change.

Â So, you could represent climate change as a network,

Â where the weather is different points on the globe.

Â And the edges are causatical relationships between them.

Â And really we do understand all the basics of climate change, for

Â instance, wind flows from high pressure to low pressure.

Â The rainfall happens when there's lower pressure and so on and so forth.

Â And yet, weather, it's kind of hard to predict, and that's precisely because it's

Â an example of an emergent phenomenon where even though you know the simple rules,

Â you aren't necessarily able to predict the emergent global behaviors when

Â you put all of these tools together in a very large complex system.

Â 4:18

So, these networks have a lot of structure in the sense that many of these networks,

Â you are able to reach one node from another in just a few hops.

Â For instance, there is this well known myth, actually almost a fact,

Â known as six degrees of Kevin Bacon.

Â In this case, if you draw the actor graph, if you use the Internet Movie Database,

Â IMDB, and if you represent each actor or actress by a vertex in the graph, and

Â an edge joining two actors if those actors have acted together in at least one movie,

Â then it can be shown that any actor can be reached from Kevin Bacon,

Â the word extrapolated into Kevin Bacon within six hops.

Â You can take just six jumps, one over each edge,

Â and reach any actor from Kevin Bacon.

Â Milgram, a researcher assigned to research at Stanford,

Â did an experiment in the 1970s which showed that the human social

Â network separates any pair of human beings with just a few edges.

Â Typically, it's, again, only six degrees of separation,

Â where there are an expected number of six hops between any two

Â individuals in the world, no matter how far or how unrelated they happen to be.

Â And so the actor network, the human network, and

Â also other networks such as the World Wide Web, the human social networks that

Â are in real online social networks such as Facebook, Twitter, and LinkedIn,

Â peer-to-peer overlays, electric power grids, protein networks.

Â A lot of these shared this common characteristics,

Â where you can reach any node from any other node within just a few hops.

Â Just six hops or so.

Â So, why is this?

Â 5:53

We should be wondering why this is.

Â And one of the common characteristics among all these networks is that they

Â evolve naturally.

Â It's not been the case that someone has come along and said,

Â hey here's how the network is going to be, now and forever.

Â Instead these networks started small and then they evolved over time.

Â All of these, whether it's the World Wide Web,

Â whether it's a human social network or whether it's the electric power grid,

Â they evolved, whether it's evolution over a few years or over a few decades.

Â And in the case of Facebook or in the case of the electric power grid, or

Â as in the case of human social network, evolution over many, many centuries and

Â millennia.

Â A lot of these are small world networks.

Â Here I'm introducing a new term, a very interesting and

Â very popular term called small world networks.

Â We need to see exactly what they are.

Â 6:37

Before we see small world networks,

Â let's see two important matrix that we can use to characterize networks.

Â They are called clustering coefficient and path length.

Â Let's do the path length first.

Â The path length is something that you've already seen so far in our discussion.

Â Path length essentially is, if you take any pair of vertices in the graph,

Â you take the shortest path in between them,

Â the shortest number of vertices you need to jump from one to reach the other.

Â That's the path length of that vertex pair.

Â You calculate this path length for every pair of vertices and

Â you take the average among them, that's the path length of the entire graph, or

Â the average path length in that graph or network.

Â 7:11

So the longer the path length is, the more spread out the network is, in a sense.

Â And the shorter the path length is, the more clustered the network is.

Â But clustering is captured by a different metric, known as a clustering coefficient,

Â also abbreviated as CC.

Â This defines something like the following, given three word, this is A, B, and C.

Â If there's an edge between A and C, given an edge between A and C, and

Â given an edge between C and B.

Â That is A and B are common neighbors at C.

Â What is the probability that A and B are also connected to each other?

Â In other words, C has two friends, A and B.

Â What is the probability that C's two friends are also friends of each other?

Â That is a clustering coefficient.

Â This is a probability, so it has values between zero and one.

Â If the values is zero,

Â then it means essentially this is a very highly unclustered graph.

Â For instance, tree networks have a clustering coefficient of zero.

Â However, if it has a clustering coefficient of 1.0,

Â the highest value possible, then it means that this is a highly clustered network.

Â For instance, a complete graph where every vertex is joined every other

Â vertex is an example of a graph that has a 1.0 clustering coefficient.

Â But different graphs lie different points on the spectrum of clustering coefficient

Â and path length.

Â First, if you take the ring graph, the graph that you have seen when we

Â discussed a peer to peer systems and distributed hash tables before.

Â You did the ring graph and you add a few successors and predecessors to each

Â vortex, you got a graph that has a very high clustering coefficient.

Â Because if you have two neighbors, they're likely also to be neighbors of each other

Â because they're likely to be successor and predecessor as well, but

Â it has very long paths.

Â If you have n vertices in your network, then the paths are on average n in length.

Â On the other hand, if you draw a random graph where the edges are added

Â between random pairs of vertices, this can be shown to have a very low clustering

Â coefficient, but it has really short paths.

Â So the paths in between any pair of orders are very short in general.

Â So ideally, you know what we are used to in the human social network is high

Â clustering coefficient.

Â The fact that two of your friends are also likely to know each other, but also very

Â short paths as we have discussed from Milgram's experiments and the 60 degrees

Â of separation that you can reach any other human being within just a few hops.

Â That is what is characterized by the small world networks,

Â which have a high clustering coefficient and short paths.

Â 9:27

So, the reason I discuss the extended ring graph and

Â the random graph is that you can actually show how to read some types of small world

Â networks by using these two types of graphs.

Â Specifically, you take an extended ring graph which has very structured edges and

Â you take one edge at a time and you make their edge point to random notes.

Â Okay, so you don't change the number of edges, but you take one edge at a time and

Â make it point across random pairs of notes.

Â What happens as a result of doing that is that, eventually,

Â you have changed all the edges to being random, you get to a random graph, right?

Â So, you go from an extended ring graph to a random graph, and a result,

Â the clustering coefficient, which is very high for extended ring graphs,

Â falls to very low for random graphs, and the path length also reaches very high for

Â extended ring graphs also falls to be very low for random graphs.

Â But these two things, the clustering coefficient and path length,

Â don't fall at the same rate.

Â In fact, the clustering coefficient falls much slower.

Â It stays stable for much longer as you move from the ring graph to

Â random graph while the path length falls very quickly.

Â In other words,

Â if you change just a few edges in your extended ring graph to be random edges,

Â the path length falls very quickly because now a lot of the sharp paths will grow by

Â these random edges because they join very disparate portions of the graph.

Â 10:34

So you get these region of networks in between the ring graphs and the random

Â graphs which have a very high clustering coefficient as high as an extended ring

Â graphs but have a very short path length almost has low as random graphs.

Â And that is the region of small world networks.

Â Once again, this is not all the small world networks that exist.

Â This is one particular class of small world networks that exist.

Â But essentially, this is a way of deriving small world networks which have this

Â high clustering coefficient and short path length characteristic as we are used to

Â in the separation and in the human social network.

Â So small world networks are all around us.

Â The network of actors is a small world network.

Â The network of humans is a small world network.

Â All online social networks such as Facebook, Twitter,

Â LinkedIn are small world networks.

Â Many of them have been shown by measurements to be small world networks.

Â And the co-authorship network where vertices are researchers and

Â edges between researchers, joint researchers have co-authored at least one

Â paper together is an example of a small world network.

Â In fact, there is number where that between

Â a famous scientist and you in the correlation network.

Â The worldwide Internet, all of these are examples of small word

Â which have a very high coefficient, and yet have a very short.

Â And many of these networks have grown incrementally, and in fact,

Â they can be modeled by using a preferential model of growth where you

Â add vertices sequentially or.

Â When you add a vertex to a graph, you add some edges and you add from an existing

Â vertex v with a probability proportional to the number of neighbors of v.

Â So the number of neighbors of v already has is proportional to

Â the probability with which v will be added to the newly added vertex.

Â So if you use the you can actually show that you end up with small word networks.

Â 12:30

So we've discussed small world networks and we discussed it a little bit.

Â Let's change that a little bit and we'll come back to small world networks.

Â Let's change that a little bit and discuss degree.

Â The degree of a given vertex in a graph or

Â a network is essentially the number immediate one hop neighbors.

Â So if a vertex has three edges, then its degree is three.

Â So the degree distribution in a network or

Â a graph is the probability that a given node has k edges.

Â There are different kinds of degree distributions.

Â You can have a regular graph where all the nodes have exactly identical degrees or

Â Gaussian distribution.

Â Or you can have a random graph with an exponential

Â distribution where the probability that

Â the vertex have k edges is e power minus k x c where c is a constant, some constant.

Â And, of course, you have a power log graph which says that the probability that you

Â have a k edges incident on you as a vertex is proportional to k power

Â minus alpha where alpha is a constant.

Â This is a very interesting distribution, because when k is pretty small, like one,

Â two or three, this probability is very high.

Â But as k increases, this probability drops off very quickly.

Â So there's a very large number of vertices that have very low degrees, but

Â there are a few vertices that have very high degrees as well.

Â So how do you represent this?

Â Well, this can be represented by what is known as the log plot.

Â This is a well-known plot.

Â Here, you represent the no degree on the exact sys, instead,

Â what you plug is the logarithm of the node degree.

Â So you notice that the values of node degree go exponentially up, 1, 10,

Â 100, 1000, rather than 1, 10, 20, and 30.

Â And on the Y axis, you plug the number of nodes that actually have that node degree.

Â For instance, if the point is X axis equals 1 and Y axis equals 1M,

Â it means 1M vertices in this graph.

Â 14:32

Okay, so a very large numbers.

Â We have very small degrees in the top left portion of the curve but

Â some also have very high degree and

Â these are the high degree work presents in any power law graphs.

Â The exponential graphs is not a straight line.

Â It's curve at its head, the top left portion of the plot.

Â And it's also curved at its tail, the bottom right portion of the plot.

Â And also some distributions which are heavy tailed distributions

Â which are straight downward sloping at their tail, the bottom right portion, but

Â they are not necessarily heavy headed which is top left portion.

Â Okay, so they're almost like power law graphs in the sense that they have some

Â vertices that have a very high degree, but

Â they don't have that many vertices with a low enough degree as a power law graphs.

Â 15:19

So this log node plot is the best way for you to distinguish when a particular

Â graph, or when a particular network is either power law, or heavy tailed, or

Â exponential, or something else.

Â You draw the log node plot.

Â If we'd just trade down your sloping line, it's the power law graph.

Â If it has a heavy tail, if it has straight tail, that's heavy tailed.

Â And otherwise, it is some other distribution.

Â So a lot of small world networks are also power law graphs.

Â For instance, the internet backbone graph, the telephone call graph,

Â protein networks, the worldwide world graph is, in fact, a small-world graph.

Â And also, power law graph has a value of alpha equals somewhere between 2.1 and

Â 2.4, remember that.

Â You have in our power law graph a degree of

Â with probable deeper version of they key minus alpha.

Â The Gnutella p2p system has a heavy-tailed degree distribution.

Â It's not necessarily power law but it is heavy tail, so

Â it has a straight downward sloping tail in the log log point.

Â Power law networks are often called a scale-free networks, for instance,

Â in Gnutella, you have about 3.4 edges per vertex independent of

Â the number of vertices that are present in your network or in your graph.

Â But then, not all small world networks are power-law graphs and vice versa.

Â For instance, co-authorship networks are small world but

Â they're not necessarily power-law.

Â And also, not all power-law graphs are small-world.

Â You can generate power-law networks using the data distribution,

Â where you have disconnected components of the graph, where they're not necessarily

Â reachable from each other and so the graph is not a small-world network.

Â But many small world networks are in fact power-law graphs as far as you find them

Â in reality, and because a lot of these graphs,

Â a small world and power-law, their resilience comes into question.

Â Most nodes have a small degree in these graphs, but

Â a few nodes have a very high degree in these graphs.

Â So if you launch an attack on one of these graphs, if the attack is random, for

Â instance, you kill a very large number of randomly chosen nodes,

Â this will not disconnect the graph, okay?

Â This is why attacking a power-law small world graph randomly is not really

Â effective at all.

Â But if you target some of these very high degree nodes in the power-law

Â small world graph, this has a much higher chance of disconnecting the graph, okay?

Â And this is essentially why, in the human body for

Â instance, because the graph of proteins in the human body or chemicals in

Â the human body is a small world network and also a power-law network,

Â a few of the nutrients are very high degree nodes in the graph.

Â So, essentially edges in this protein graph,

Â edges are joined to chemicals that react with each other.

Â And so some nutrients happen to have a much higher degree than

Â other nutrients because they react with a lot of other chemicals in the body.

Â And these nutrients are essentially, the vitamins in the human body,

Â the minerals that the human body acquires like calcium and potassium.

Â So, if you have a shortage of any of these vitamins or nutrients in your body,

Â or minerals, your likely to be in trouble.

Â But if you have a shortage of some of these other random chemicals that

Â are there in your human body,

Â then it's not likely that you're going to be affected all that much.

Â Similarly, networks like the internet as well as the electric power grid,

Â if you target a few high degree vertices, that's much more likely to disconnect

Â the graph and cause a partition, and cause outages in the network.

Â But if you target randomly chosen vertices,

Â that's not likely to have any effect on the network itself.

Â 18:36

What about routing in these networks such as the Internet?

Â If you build shortest-path routes between every pair of vertices,

Â it turns out that in small-world power-law networks, most of these

Â shortest-path routes will pass through some high-degree vertices in the graphs.

Â So these high degree vertices get highly overloaded with these shortest paths and

Â they're likely to suffer congestions and perhaps, in fact, even crash.

Â As we have seen from the previous slide, if you disconnect some of these

Â high-degree vertices, it's also likely that the entire graph would be

Â disconnected and you might have an outage in the network.

Â So this should tell you why there are these frequent outages in

Â networks such as the internet, in networks such as the electric power grid.

Â It's because these high-degree vertices suffer congestion and

Â they might crash as a result of that, and because when they do,

Â they disconnect large portions of the graph and end up having an outage.

Â 19:52

To wrap up our discussion of structure of networks, the networks are all around us,

Â both man-made networks, such as the internet, the World Wide Web and peer2peer

Â networks, as well as natural networks that have evolved over centuries and

Â millennias such as protein networks and the human social network.

Â Yet a lot of these networks have common characteristics.

Â Many of them are small world networks that have a very high clustering coefficient

Â and very short path lengths.

Â And there's also power-law where you have a very large number of vertices that have

Â a small degree, but some vertices have very high degrees.

Â And it's useful to know these characteristics because when you design

Â distributed systems and

Â distributed protocols that run over networks like the internet, that run over,

Â that be with social networks such as online social networks.

Â It's useful to know these characteristics so

Â that you know exactly how your protocol is going to behave when it sends packets,

Â when it sends messages over these networks, or when you run a graph

Â processing algorithm that is dealing with such power-law graphs.

Â So you can better predict how your distributed system is going to behave

Â when it runs over small-world networks, such as the internet.

Â [MUSIC]

Â