
Okay, so we're back, and we're going to talk a little bit more about computing the size of a giant component in a random graph. In particular, we'll focus in on Poisson or G(n,p) random graphs, Erdős–Rényi random graphs. So we're doing a giant component calculation, and we're going to do a sort of heuristic calculation, but one that turns out to be fairly accurate. The idea here is we'll just do the following kind of calculation. Let's let q be the fraction of nodes in the largest component.

Now, let's look at a given node. If we just pick a node at random, the chance it's in the giant component is q. But another way to figure out that chance: a given node is outside the giant component exactly when all of its neighbors are outside the giant component, and if it has any neighbor inside the giant component, it'll be inside the giant component too. And so that gives us a way to do a calculation.

So the probability that a given node is outside the giant component, that it's not in the giant component, is 1-q, since q is the probability that it's in there. But that's also equal to the probability that all of its neighbors are outside. So if the node has degree d, if it has d neighbors, then the chance that each one of those is outside of the giant component is (1-q); we raise that to the dth power, and we end up with (1-q)^d as the chance that the node is outside the giant component.

And so what we can do is take an expectation over this based on the probability of the node having different degrees, and then find an equation from that. So we have this q, and we're going to try and solve for q. The probability that you're outside the giant component, that's 1-q; it's also the probability that all of your neighbors are outside the giant component. You have d neighbors. What's the probability that you have d neighbors? Well, it's given by the degree distribution. So P(d) here is our degree distribution, the chance that you have d neighbors, and once we've got a degree distribution we can solve for q.

Now, this is a calculation you could do. It's a heuristic calculation because I'm not worrying about correlations in some of these variables; I'm treating them as if they're independent, and they're not quite independent. You can do a more exact calculation using generating functions. There's some of that in the book, but this will give us a fairly good approximation and work fairly well, especially on large networks.

So what we want to do is solve for q. Now, this is an equation that so far doesn't require us to work with a specific degree distribution. You could plug any degree distribution you wanted into this and then solve for what q is. It's a nonlinear equation, so it's going to be a little difficult to get a closed form for q, but you can plug in whatever degree distribution you want and then solve that for q, or get an approximate solution for q as a function of your degree distribution. So this is a nicely well-defined equation.

And basically, when we look at this, there's always going to be a solution of q equals zero. So there is one solution to this equation which has q equals zero. Right? If we put in q equals zero on both sides, then we're just summing across all degrees for the probability of d, and we get one on both sides. That's always going to be a solution. The interesting solution is going to be the one where we have a non-zero q. So we want to solve for a non-zero q.

So one possibility is we assume everybody is outside the giant component, and then everybody ends up outside the giant component because all their neighbors are. That's the nonsensical solution in a lot of situations, so the one we want to solve for is going to be the positive solution to this equation. So if we look at this and try to solve for the positive solution, it's going to be the point where 1-q = Σ_d (1-q)^d P(d). We're solving for the 1-q that satisfies this equation, and that gives us our solution for q. That's what we need.

You could solve this for different degree distributions. Let's do it for the case of G(n,p), or approximately a Poisson random graph. So we'll plug in a Poisson distribution for P(d). So we have the Poisson distribution here: the probability that a given node has degree d is given by this formula. And when we plug that in, we end up with this equation. So now we can solve this for q, and this looks like a daunting equation. It's got a sum over factorials of d; so again, this is a sum over different d's, and we've got factorials here. It looks like a fairly nasty equation.

But luckily there are some nice parts to this equation. When you look at things that look like sums of some expression raised to the d over d factorial, that actually has a nice exponential form. So that sum is a nicely behaved sum. One useful fact, just to have in the back of your mind, is the Taylor series for e^x: e^x = Σ_d x^d / d!. So the Taylor series tells you that e raised to some power x is the same thing as summing x to the d over d factorial. So you've got a nice expression here, and in particular, when we go back, we're summing something to the d over d factorials. That's going to look like an exponential.
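A quick numerical sanity check of that series identity (the value x = 2.5 is just an arbitrary test point, not from the lecture):

```python
import math

# Check the Taylor series e^x = sum_d x^d / d! numerically
x = 2.5
partial = sum(x ** d / math.factorial(d) for d in range(40))
print(partial, math.exp(x))  # the two values agree to machine precision
```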

So when we plug in that expression, we can rewrite this, given our formula, as the exponential of what's inside here. So we rewrite this in a nice form so that now we've got 1-q equal to this expression here. Okay? And collecting terms, some of our p's simplify, and we end up with e^(-q(n-1)p). Remember that (n-1)p is the expected degree, right? So p is the probability of any link; multiply that by (n-1), and that's the expected degree. So what we end up with is 1-q = e^(-qE(d)), and if we take a log of both sides we get -log(1-q)/q = E(d). So a very nice, simple equation.

Now we've got a relationship between q on the one side and the expected degree on the other side. So now we've got an expression that tells us how large the giant component is as a function of the expected degree. It's not quite a closed-form solution for q, because it's -log(1-q)/q, but what we can do is at least plot it out and see what it looks like. So if you plot q as a function of E(d), basically what happens is that below an expected degree of one there is no giant component; q is very small at an expected degree of one, then the giant component grows, and by the time you get up to three as an expected degree, your q is actually very large. It's already close to 95%. So by an expected degree of three we've already hit close to 95% in terms of the size of the giant component, and at one we're close to zero.
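You can trace this curve directly from the log equation: for each giant-component size q, compute the expected degree E(d) = -log(1-q)/q that produces it (the q values below are illustrative sample points):

```python
import math

# Expected degree needed to produce each giant-component size q
for q in (0.01, 0.50, 0.94):
    print(round(q, 2), round(-math.log(1 - q) / q, 3))
```

A tiny q requires an expected degree of about 1 (the threshold), while q = 0.94 corresponds to an expected degree of about 3, matching the plot described above.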

So this is interesting, and this is, you know, somewhat characteristic of these Poisson random graphs. We get these tight phase transitions. So if people have fewer than one expected neighbor, fewer than one expected interaction that would transmit a diffusion, we don't get diffusion. Once you hit one, we get diffusion, and then it grows fairly rapidly, and by the time you have three interactions we're already to the point where you're very likely to hit most of the population.

So if you remember, in our promo video I said to tell at least two friends. Well, if you tell just one friend about this, then we don't get much diffusion. Two friends is at least enough to start pushing us fairly far along this curve. And if you get to three friends, you're most of the way there. So just in terms of giant component size, here we are. We see that there's a fairly tight transition from having no giant component to having almost everybody connected, and so most of the play is in this one region.

And indeed, when people study epidemiology, there's this idea about having at least one contact that's going to transmit something: if people are having more than one transmission, you're going to get diffusion or contagion, and if people are having fewer than one transmission, then things are going to die out. And that makes perfect sense. More than one and it's growing, fewer than one and it's shrinking. That's the critical point. And then here we find that it actually transitions very rapidly, so that by the time you've got three interactions that are likely to lead to contagion, our diffusion is quite extensive.

Okay, so given that, we can also calculate the probability of being in the giant component. Well, (1-q)^d was the probability that you weren't in the giant component. The probability that you are is one minus that, 1-(1-q)^d, and this is an increasing function of d. Basically, that just tells you that the more connected you are, the more likely you are to be infected. And this is exponential in d. So it tells you that increasing your interaction rate makes it much more likely, exponentially more likely, that you end up in the giant component. So you're more likely to be infected at any point in time if we do a dynamic version, and much more likely to end up infected.
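To see how quickly 1-(1-q)^d approaches one, here's a small numerical illustration (q = 0.94 is roughly the value at expected degree 3 from the earlier calculation):

```python
# Probability of being in the giant component, by degree d
q = 0.94  # giant-component fraction at expected degree about 3
for d in (1, 2, 5, 10):
    print(d, 1 - (1 - q) ** d)
```

Already at degree 2 the probability is above 99%, since the (1-q)^d term shrinks geometrically in d.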

And this gives us some insight behind that Coleman, Katz, and Menzel finding, that the people with more interactions were adopting the antibiotic, prescribing the antibiotic, at an earlier time. So here we see this directly in terms of this model.

Okay, extensions. Let's talk about some extensions quickly. Immunity: well, that's easy. If some nodes are immune, then they can't catch something and they can't transmit it, so we can just drop those out of our model to begin with and then study the giant component on the remaining nodes. And what that's going to do is actually decrease the degree, right? So if some fraction of the nodes disappears, that's going to decrease the degree, and that decreases the overall interaction.

In terms of probabilistic infections, you know, we just have links fail, and we can incorporate that directly into the link probability. So we can think about some nodes initially exposed to infection. We could have some fraction π of the nodes that are immune, and we can think also that maybe only some of the links are actually going to transmit. Let's let a fraction f of the links transmit, and then we ask: what's the extent of the infection? Basically, we can just start with our network on n nodes, delete some fraction of the nodes, delete some fraction of the links, and look at what's left. And if we get an infection in the giant component of what's left, then it will reach whatever the giant component is on what's left, and otherwise we don't get an infection.

So it's a very simple extension to start to incorporate these things. And, you know, if we let q be the fraction of the nodes remaining on the network that are in the giant component of the remaining network, then q times (1-π) is the probability of having a non-trivial contagion, and that's also, sort of, the extent of the infection. So basically what we can do is just re-solve the equation, where now we've got a lower expected degree for the individuals: moving it down given that they're not going to have as many neighbors, because some of those neighbors are immune, and then also scaling down by the f. And once we've done that, we end up with basically the same exact equation, just taking these two modulations into account.

And so what we end up with is a similar curve to the one we had before, but now the expected degree on the x-axis, the one that's relevant here, is modulo the fact that some of the nodes are going to be missing and only some fraction of the links are going to transmit. And so it still has to be at least one in terms of what's remaining, and then going up to three. So you can just do direct calculations; this just scales things back.
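A minimal sketch of that re-solved equation, assuming (as described above) that immunity π and transmission rate f both scale down the effective expected degree; the function and parameter names are illustrative, not from the lecture:

```python
import math

def reach(expected_degree, pi_immune, f_transmit, tol=1e-12):
    """Heuristic extent of an infection when a fraction pi_immune
    of nodes is immune and a fraction f_transmit of links transmits.
    Both factors scale the effective expected degree."""
    e_d = expected_degree * (1 - pi_immune) * f_transmit
    q = 0.5
    for _ in range(10_000):
        q_new = 1 - math.exp(-e_d * q)  # same Poisson equation as before
        if abs(q_new - q) < tol:
            break
        q = q_new
    # q is the giant-component share of the *remaining* network;
    # scale by the surviving node fraction for population-wide reach
    return q * (1 - pi_immune)

print(reach(6.0, 0.5, 1.0))  # effective degree 3: substantial reach
print(reach(6.0, 0.9, 1.0))  # effective degree 0.6, below one: dies out
```

Raising π enough pushes the effective expected degree below one, which is exactly the failure condition described next.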

So, implications. The infection can fail now if π is high enough, or if f or p are low enough, because that basically puts you below the one. So high π, high immunization; low f, low contagiousness of the process; low p, low contact among the population. You can think of these different modulations to what we had before, and they're going to affect the eventual process.

Okay, so that takes us through a quick look at component size and doing these kinds of calculations. This has been for a process that just spreads through a network once. The next thing we're going to look at is an ongoing process where people can change back and forth between state one and state zero. So maybe you support somebody, then you don't; you like a technology, then you don't like it. So you can change your mind over time, or there are certain kinds of diseases that people can catch and then re-catch. And so, you know, these are processes that can last in the population. So I can clean my computer of a virus, get the virus back, it can infect other people, and so forth. So I can go back and forth between the states zero and one over time, and we can look at that kind of model as well. That will be the next thing we have a look at.
