The first is a very important dichotomy that we should be aware of, which is

the primary versus secondary data dichotomy.

There are two types of data, broadly speaking, that we would use

for research and analytics: primary data and secondary data.

What is primary data?

Primary data is data collected at source, and hence primary in form.

Specifically, primary data is data that would not exist but

for your research or analytics project.

If you didn't go out and collect it, it would not, in some sense, exist as data.

The source of the data, because it is primary in form, would be individuals,

groups of individuals, organizations, institutions, and so on.

Surveys, interviews, focus groups: all of these fall under the ambit of primary data.

On the other hand, secondary data are data that were collected previously.

Whether or not you're doing research or analytics,

that data would have existed anyway.

A good example is sales records within a company.

They are collected anyway by the point-of-sale system.

ERP data within a firm is going to be collected anyway, and so is accounting data.

So whether or not you're doing something with it is a secondary issue.

The data would continue to exist regardless.

This is an important dichotomy,

because we will see that the type of data you have will, in some sense,

influence the questions you can ask and the answers you can hope to get.

All right, what will follow quickly is a multiple choice question on

this dichotomy, based on what we've seen so far in the slide.

So now let's get to the four data types.

There are four types of data, based on the four primary scales, right?

And the data types, in some sense,

correspond to the scales directly.

So these four are the nominal, ordinal, interval, and ratio scales.

Nominal basically means of a name, right?

So it's just a name, it's just a label, and no further information can be gleaned.

And the example I put on the slide is not a good example because, in some sense,

Coke and Pepsi are not uninformative labels.

In a blind test, people would normally prefer Pepsi, but in a non-blind test,

where the names are visible, people tend to say they prefer Coke.

I mean, this has been, in some sense, documented repeatedly.

Ordinal means having an order, so ordinal data carries ordered information.

So this conveys not just label information,

it also conveys preference information.

So when you say, I prefer A to B, you know that there are two entities, A and B, and

at the same time you know that you prefer A to B, so there is a direction implied.

The third data type is interval data.

Interval data is not just nominal and ordinal.

Sure, it has labels and it has direction.

But in addition to that, it has magnitude information.

Say I rate A a 7 and B a 4 on a scale of 10.

So it's telling me not just that I prefer A to B, which is direction, and

that there are A and B, which are nominal labels.

It is telling me how much I prefer A to B.

And finally, ratio data conveys information on an absolute scale.

So I paid, say, $11 for A and $12 for B.

Ratio basically has all the properties

of all the other scales.

The reason it differs from interval is that this is an absolute scale.

It is understood independently of the subject.

So $0, or 0 rupees in this case, is understood the same way by everybody, right?

So there is a zero point,

and it is considered fixed.
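
One way to see why the fixed zero point matters is to move the zero of each scale and check what survives. The sketch below reuses the lecture's numbers (ratings of 7 and 4, prices of $11 and $12). The offset of 10 and the helper names `shift`, `difference`, and `ratio` are assumptions made purely for illustration, and it assumes the rating scale's zero is arbitrary.

```python
# Values from the lecture's examples; the helper names are hypothetical.
ratings = {"A": 7, "B": 4}        # interval: ratings on a scale of 10
prices = {"A": 11.0, "B": 12.0}   # ratio: dollars paid, zero means "nothing paid"


def shift(values, offset):
    """Relabel a scale by moving its zero point by `offset`."""
    return {name: value + offset for name, value in values.items()}


def difference(values):
    """How far apart A and B are on the scale."""
    return values["A"] - values["B"]


def ratio(values):
    """How many times larger A is than B."""
    return values["A"] / values["B"]


# The same judgments written on a scale that starts 10 points higher.
shifted_ratings = shift(ratings, 10)

# Differences survive the relabeling, so "how much I prefer A to B" is stable.
print(difference(ratings), difference(shifted_ratings))

# Ratios do not survive it, so "A is 1.75 times as good as B" is not a safe claim.
print(ratio(ratings), ratio(shifted_ratings))

# Prices have a fixed zero, so nobody relabels them and the ratio is meaningful.
print(ratio(prices))
```

The difference between the two ratings stays the same after the shift while their ratio changes, which is the sense in which only a scale with a fixed zero supports statements like "twice as much".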

Here's another quick example, from the world of sports.

Nominal would be the numbers assigned to runners in a race.

It doesn't matter what number a particular runner is wearing.

It doesn't affect the performance in any way.

Ordinal would be the rank order of winners.

So I know that A came first and B came second, but

I don't know the difference between them.

I don't know whether A won by an inch or by a mile, so to speak.

Interval would be a performance rating on a 0 to 10 scale.

This is done in gymnastics, for instance, where judges give out ratings.

And ratio would be, in a race, say, the time taken to complete it.

So 15.2 seconds for A and 14.1 seconds for B tells me

the exact difference, measured against an absolute zero point.

What does it all mean?

Why should we care about the four primary scales?

Okay, look at the first column there.

If you have nominal data, the most you can do in terms of analysis is the mode,

frequencies, and percentages.

That's about it.

Nothing else can really be done.

If you have ordinal data, however, because you now have ordered information, you can,

in addition to what you can do with nominal data, bring in medians.

So half the ordering is above the median, and the other half is below it.
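
The nominal and ordinal summaries just described can be sketched with Python's standard library. The brand choices and the four-level rating scale below are made-up data for illustration only, and `median_low` is used as one reasonable choice because it always returns a category that actually occurs in the data.

```python
from collections import Counter
from statistics import median_low, mode

# Nominal data (hypothetical): brand labels carry no order.
brands = ["Coke", "Pepsi", "Coke", "Coke", "Pepsi"]

counts = Counter(brands)                      # frequencies
total = sum(counts.values())
percentages = {brand: 100 * n / total for brand, n in counts.items()}
most_common_brand = mode(brands)              # mode

print(counts)
print(percentages)
print(most_common_brand)

# Ordinal data (hypothetical): categories with a known order, worst to best.
levels = ["poor", "fair", "good", "excellent"]
responses = ["good", "poor", "excellent", "good", "fair"]

# Turn each response into its position in the ordering, then take the median.
positions = [levels.index(response) for response in responses]
median_category = levels[median_low(positions)]

print(median_category)
```

Note that nothing here adds or averages the labels themselves: the nominal part only counts, and the ordinal part only uses the order of the categories.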

When we move from ordinal to interval, we're not just taking a step,

we're actually taking a leap.

Â 5:26

Because the moment you get to interval and

to ratio, some very interesting properties come into play.

Well, looking at the slide, can you guess what they are?

So, right, the mean and the variance.

The moment the mean and the variance become meaningful, statistical analysis and

statistical inference of a parametric variety are now in play, right?

A lot is now possible, which basically tells me that if you have a choice

in your data collection, in the research design for your analytics,

you should ideally want data in either interval or

ratio form, as far as possible.

So the first two scales, nominal and ordinal, are called non-metric.

The last two are called metric because the mean and the variance,

and the arithmetic mean in particular, are meaningful there.
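
For metric data, the mean and the variance can be computed directly. A minimal sketch follows, using race times as ratio data. The first two times are the ones from the sports example, the other two are made up for illustration, and the sample variance (dividing by n - 1) is an assumed choice.

```python
from statistics import mean, variance

# Race times in seconds; 15.2 and 14.1 come from the lecture, the rest are hypothetical.
race_times = [15.2, 14.1, 16.0, 14.7]

average_time = mean(race_times)        # arithmetic mean
time_variance = variance(race_times)   # sample variance, divides by n - 1

print(average_time)
print(time_variance)
```

Running the same two calls on runner numbers or finishing ranks would still return numbers, but they would not describe anything real, which is the practical meaning of calling those scales non-metric.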

Based on what we've seen about

the four data types corresponding to the four primary scales,

what will follow shortly are four multiple choice questions.

So just to sum up what we did, any psychometric scale that yields interval or

ratio data is a metric scale.

The other two would be non-metric.
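
The summary can also be written down as a small lookup, sketched here under the assumption that each scale inherits everything allowed for the scales before it. The names `SCALES`, `NEW_STATISTICS`, `permissible_statistics`, and `is_metric` are hypothetical. The lecture treats interval and ratio together and names no statistic that only ratio data allows, so that entry is left empty.

```python
# The four primary scales, in the lecture's order.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become meaningful at each scale, as listed in the lecture.
NEW_STATISTICS = {
    "nominal": ["mode", "frequencies", "percentages"],
    "ordinal": ["median"],
    "interval": ["mean", "variance"],
    "ratio": [],  # assumption: nothing extra is named in the lecture
}


def permissible_statistics(scale):
    """Return every statistic allowed for `scale`, including inherited ones."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale: {scale!r}")
    allowed = []
    for name in SCALES:
        allowed.extend(NEW_STATISTICS[name])
        if name == scale:
            break
    return allowed


def is_metric(scale):
    """Interval and ratio scales are metric; nominal and ordinal are non-metric."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale: {scale!r}")
    return scale in ("interval", "ratio")


for scale in SCALES:
    print(scale, is_metric(scale), permissible_statistics(scale))
```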
