0:00

[MUSIC]

We will conclude this module with an overview of the things you will see next week, because it's a lot of stuff you will have to learn. And just so that you are a little bit prepared, I will introduce you to the different steps we are considering.

So the first step is to set up a theoretical framework. This is not something we can really teach you here because, as I already said in some other lessons, it is the environment in and for which you want to construct the composite indicator. That is, even if you're constructing the composite indicator from scratch, you are not constructing something completely out of nothing. If you are talking about a certain environmental indicator, you already have in mind what kind of things should go in. And there's some literature behind it. There's some theory behind it. There's a lot of knowledge behind it. And this we cannot teach you, especially because we don't know what you want to construct your composite indicator for.

And so, this is just to make clear that you should not construct your composite indicator in a vacuum. You should be aware that many people have already been working on that. You should know the literature and you should have a concept.

This concept, this theoretical framework, will be the basis for all the selections we have to make in the next steps. It's, for example, the basis for selecting the variables you will use. It's also the basis for the statistical method: how to weight each variable and how to aggregate the variables, whether, for example, you want to sum them up, multiply them, or apply some transformation to them. All this should be considered by looking back and asking: what was the theoretical framework I'm looking at? After all, I'm trying to construct a composite indicator in order to measure a certain phenomenon.

2:20

It certainly also determines certain kinds of subgroups: subgroups of variables, subgroups of indicators. You could also think of a completely different dimension, subgroups of countries.

The second step is something we really have to talk about a little bit, maybe not in the next module, but when we present all the different composite indicators you will come to learn about in our MOOC. There we will certainly talk about data selection. When you select data, the first and most important thing is to select data along the theoretical framework, the first step. But you also have other things to address: the analytical soundness of the data, its measurability, and also its coverage, that is, whether you find this data for all of the countries you have in mind. It doesn't really make sense to set up a composite indicator that in the end you can only calculate for two countries because for all the others you don't get the data. And then, though I think this is obvious, there is the relevance of the data you collect for your composite indicator.

3:38

If you don't really observe the things you're interested in, then you should always think about an alternative, and that's called proxy variables. Sometimes you don't need the exact information; you just need a variable that moves with it, a data proxy for the thing you want to measure.

So in the end you need to know the quality of the data. You need to know the strengths and weaknesses of the data. That means: maybe they are very well measured, maybe they are badly measured; maybe they are easily available, maybe it's very hard to get them; maybe you get them just now, but not in the future. Then you have a composite indicator just for today, but you will never be able to use it again. You might even want to make a table where you have all the pros and cons of the different variables, and then, in the first or second step, select variables along that table.
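One column of such a pros-and-cons table could be the coverage of each candidate variable. The following is only a minimal Python sketch of that idea: the country labels, variable names, and values are invented for illustration, and `coverage` is a hypothetical helper, not part of any library.

```python
# Hypothetical raw data: None marks a value we could not obtain.
data = {
    "A": {"income": 1.2, "schooling": 8.0, "air_quality": None},
    "B": {"income": 0.7, "schooling": None, "air_quality": None},
    "C": {"income": 2.5, "schooling": 11.0, "air_quality": 40.0},
    "D": {"income": 1.9, "schooling": 9.5, "air_quality": None},
}


def coverage(data):
    """Share of countries with an observed value, per variable."""
    variables = sorted({name for row in data.values() for name in row})
    n_countries = len(data)
    return {
        name: sum(row.get(name) is not None for row in data.values()) / n_countries
        for name in variables
    }


cov = coverage(data)
for name, share in cov.items():
    print(f"{name}: observed for {share:.0%} of countries")
```

A variable with low coverage, like the made-up `air_quality` here, would be a candidate for replacement by a proxy or for dropping, depending on what the theoretical framework requires.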

4:40

The next thing is the imputation of missing data. The problem is that, once you have decided on a certain set of variables, of individual indicators, you will nonetheless find that for some years, for some regions, for some countries, for some aspects, you don't have data. It should not be the common case, because then you really should reconsider the set of variables you've selected. But it definitely will happen. And then you have to impute some missing data, because it's not always recommendable to throw away all the variables where you have some missing values, or to exclude all the countries just because two or three data points are not available in certain years.

5:38

There are statistical methods to impute missing data, but these are typically made for different problems. For example, if you say, okay, they are just missing at random, then you can say: I would guess that the missing values follow the same distribution as all the others. In those cases we have very nice methods that maximize the probability of the sample you're considering, plugging in, where you have blanks, data that maximize exactly this probability. This only works if the data are really missing at random.

Very often you will find, and this is something I cannot teach you in general because it depends very much on the context, that data are missing precisely for especially poor countries, or especially rich areas, or whatever; they're not missing at random. If you have such a systematic bias, then you cannot use those methods; you have to look for alternative methods. But again, this depends on the case. You first have to be aware of it, and not just impute something and then go ahead. We will come back to this point a little bit later.
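To illustrate why you should look at the missingness pattern before imputing, here is a small Python sketch. It uses plain mean imputation as a stand-in, not the likelihood-based methods mentioned above, and all records, group labels, and function names are assumptions made up for the example.

```python
from statistics import mean

# Hypothetical records: (country, income_group, value); None = missing.
records = [
    ("A", "poor", None), ("B", "poor", None), ("C", "poor", 2.0),
    ("D", "rich", 8.0), ("E", "rich", 9.0), ("F", "rich", 7.0),
]


def missing_share_by_group(records):
    """Fraction of missing values within each group."""
    groups = {}
    for _, group, value in records:
        groups.setdefault(group, []).append(value is None)
    return {group: sum(flags) / len(flags) for group, flags in groups.items()}


def mean_impute(records):
    """Replace every missing value by the mean of all observed values."""
    observed = mean(v for _, _, v in records if v is not None)
    return [(c, g, observed if v is None else v) for c, g, v in records]


shares = missing_share_by_group(records)
imputed = mean_impute(records)
print(shares)      # missingness is concentrated in the "poor" group
print(imputed[0])  # country A receives the overall mean of the observed values
```

In this toy data the values are missing almost only for the poor group, and the only observed poor country has a value of 2.0, while the imputed value for the others is the overall mean, dominated by the rich group. That is exactly the systematic bias the naive approach hides.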

The next step is multivariate analysis. What does it mean? It basically means that you have to think about everything in this world as correlated, as having some covariances, and this correlation can be strong or it can be weak. Maybe you don't think that there's a correlation. But then you look at it, and it turns out, yes, these variables always move in exactly the same direction. This basically means that if you include both, you include the same information twice.

7:43

And this you should take into account when you're thinking about weighting. So should I really include this information twice, or should I just take one of these variables and then think about a reasonable weighting for that? Otherwise the weights should account for the covariance structure. Certainly the covariance structure should also somehow be related to the theoretical framework that we discussed in the first step.

So you might start with, maybe, a cluster analysis or a principal component analysis. The reason is that sometimes you have included many different variables, but it turns out that they all carry more or less the same information. And you would like to know whether the phenomenon you want to measure is really as multi-dimensional as you think.

I'll just give you a very simple example. It's not really a composite indicator, but when we have student evaluations, there's very often the discussion whether we should put 3, 5, or 25 questions on the form. And we ask about all the different aspects and dimensions of the quality of teaching. However, very often, when you finally do a principal component analysis, it turns out that it's just one dimension: students like the class, or they don't like it. And whether you ask one question, whether they like it or not, or whether you ask them 20 questions about all the different aspects and dimensions that teaching comprises or entails, in the end it doesn't make a difference.

9:36

This kind of PCA or cluster analysis helps you to reduce the dimensionality a little bit before you start to think about weighting.
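The student evaluation example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the answers to three questions are invented, and the share of variance carried by the first principal component is obtained by a simple power iteration on the correlation matrix rather than by a full PCA routine. The helper names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    return [[pearson(a, b) for b in columns] for a in columns]


def largest_eigenvalue(matrix, iterations=200):
    """Power iteration for the dominant eigenvalue of a correlation matrix.

    Starts from the all-ones vector, which is adequate when the variables
    are mostly positively correlated, as in this example.
    """
    n = len(matrix)
    v = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return eigenvalue


# Invented answers of six students to three evaluation questions (scale 1 to 5).
q1 = [1, 2, 3, 4, 5, 4]
q2 = [2, 2, 3, 5, 5, 4]
q3 = [1, 3, 3, 4, 4, 5]

corr = correlation_matrix([q1, q2, q3])
# The eigenvalues of a correlation matrix sum to the number of variables,
# so this ratio is the variance share of the first principal component.
share = largest_eigenvalue(corr) / len(corr)
print(f"first component explains {share:.0%} of the variance")
```

A share close to one says that the three questions essentially measure a single dimension, which is the situation described above for the evaluation forms.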

Another aspect is normalization. If you do a PCA, you will find in some manuals, if you read them before you perform the PCA, that you cannot just compare all kinds of variables no matter on what scales they are measured. This is maybe obvious, but very often people forget about it: they have to bring the variables onto a comparable scale before they compare them or before they aggregate them.
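Two common ways to bring variables onto a comparable scale are min-max rescaling and standardization (z-scores). The lecture does not say which method will be used, so the following Python sketch is only an assumption-laden illustration, with made-up numbers and hypothetical helper names.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale to [0, 1]; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center at 0 with unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Invented values on very different scales.
gdp_per_capita = [1000.0, 20000.0, 50000.0]
life_expectancy = [55.0, 70.0, 82.0]

print(min_max(gdp_per_capita))
print(min_max(life_expectancy))
print(z_score(life_expectancy))
```

After either transformation the two variables live on the same scale, so adding or comparing them no longer lets the variable with the largest raw numbers dominate.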

The next and almost final step is weighting and aggregation. We already talked a little bit about it, because normalization already results in a kind of weighting. And if you take into account the covariance structure between the different variables, and say, okay, I have three variables included that more or less tell me the same story even if they have completely different names, then this already gives me the idea that I have to think over and over about the different weights I want to give to the variables when I include them in the composite indicator.

It's certainly a question of importance, and it's a subjective decision: how important is this or that variable for me in my composite indicator? Even if people say, I'm using a statistical method for it: there are many statistical methods to calculate weights, and the choice of statistical method is a subjective choice at the end of the day.

11:11

The aggregation is even a little bit more complicated, but all these things you will learn next week. The aggregation is a more complex thing because here you have to think about, for example: should I add all the numbers or should I multiply the numbers? And what is the difference? Well, the difference is quite essential, because if you just add the numbers, that means that they are exchangeable. So, for example, if you think about the well-being of a person, you could say: well, he doesn't have enough food, but maybe I can just give him more education. And if you just sum up the variables education and food, then you could still have an indicator that goes up, up, up, up, but he's starving. This is an essential question when aggregating the different variables.
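The food and education example can be made concrete with a short Python sketch. It compares a weighted sum with a weighted product (a weighted geometric mean, chosen here as one possible way of "multiplying the numbers"); the scores, the equal weights, and the function names are assumptions for illustration only.

```python
from math import prod


def arithmetic(scores, weights):
    """Weighted sum: a low score can be fully offset by a high one."""
    return sum(w * s for s, w in zip(scores, weights))


def geometric(scores, weights):
    """Weighted product: a zero in any dimension pulls the result to zero."""
    return prod(s ** w for s, w in zip(scores, weights))


weights = [0.5, 0.5]    # food, education (normalized scores in [0, 1])
balanced = [0.5, 0.5]   # some food, some education
starving = [0.0, 1.0]   # no food, maximal education

print(arithmetic(balanced, weights), arithmetic(starving, weights))
print(geometric(balanced, weights), geometric(starving, weights))
```

Under the weighted sum both people get the same score, so education fully compensates for the missing food. Under the weighted product the starving person scores zero, no matter how much education is added.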

12:04

Then finally, once you have constructed the composite indicator, you should do what we call a sensitivity analysis or robustness check. That means: on the way to constructing a composite indicator, you made several subjective decisions, even if you just wanted to impute some missing data.

12:29

What you could do now is this: in each step, when you made such a decision, you thought about different alternatives. Just try the different alternatives and see whether you get completely different results. For example, check whether the composite indicator using all the imputation methods and variable selections you have decided on, and an alternative one a colleague has voted for, give the countries more or less the same ranking or a completely different one. This is a sensitivity analysis. It just tells you a little bit whether the composite indicator moves a lot with your decisions or not.

And it's not just about the individual decisions; it also depends on which methods you have chosen. It could be that, in the end, after all the effort you have made, the composite indicator is driven by just one variable, and this you would like to see. So you also have to simulate some data, some new distributions for the data you included in the composite indicator, and see which are the drivers of my composite indicator and whether there are some variables that don't actually matter: I included them, but in the end they don't move the composite indicator.
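A very small version of such a robustness check, comparing your own weighting with a colleague's alternative, might look like the following Python sketch. The country scores and both weight vectors are invented, the indicator is a plain weighted sum, and the Spearman formula used assumes rankings without ties; none of this is prescribed by the lecture.

```python
def composite(rows, weights):
    """Weighted-sum composite indicator per country."""
    return {c: sum(w * x for x, w in zip(vals, weights)) for c, vals in rows.items()}


def ranking(scores):
    """Countries ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def spearman(rank_a, rank_b):
    """Spearman rank correlation of two rankings of the same items, no ties."""
    n = len(rank_a)
    pos_b = {c: i for i, c in enumerate(rank_b)}
    d2 = sum((i - pos_b[c]) ** 2 for i, c in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Invented normalized scores on three variables for four countries.
rows = {
    "A": [0.9, 0.2, 0.5],
    "B": [0.4, 0.8, 0.6],
    "C": [0.1, 0.5, 0.9],
    "D": [0.7, 0.6, 0.1],
}

mine = ranking(composite(rows, [1 / 3, 1 / 3, 1 / 3]))   # equal weights
colleague = ranking(composite(rows, [0.6, 0.2, 0.2]))    # first variable dominates
rho = spearman(mine, colleague)

print(mine)       # ['B', 'A', 'C', 'D']
print(colleague)  # ['A', 'D', 'B', 'C']
print(rho)
```

In this toy data the rank correlation between the two versions is zero, so the ranking depends heavily on the weighting decision. A value near one would indicate that the indicator is robust to that particular choice.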

With all these things and tools at hand, you should be able to construct, or at least to understand and get an intuition for, a reasonable construction of composite indicators.

Â [MUSIC]
