Another useful transformation is turning

a quantitative attribute into an ordinal one. How do you do that?

Through an operation that is called binning. What does it mean?

It means that you take the quantities,

and you group them into a number of ranges, or bins,

and the bins are ordered according to the values they contain.

In this case, you are effectively

transforming a quantitative attribute into an ordinal one.

There are many situations where this can be very, very useful.

Let me give you an example.

Here, I have another data set that

contains information about sales of certain products coming from a company.

And say that I have an attribute that is called profit,

the amount of profit that comes from each sale.

So, you can imagine that for every single sale,

the profit amount changes,

so there are lots of different values.

But if I want to transform this attribute into a discrete one,

into an ordinal one,

what I can do is just aggregate these values into a number of bins,

which is exactly what I did here.

Each bin goes from one value to another,

and I just create one single category for each range.
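To make the binning idea concrete, here is a minimal sketch in Python. The profit values, the bin edges, and the category names are all made up for illustration; they do not come from the data set shown in the lecture.

```python
from bisect import bisect_right

# Hypothetical profit per sale (illustrative values only).
profits = [-120.0, 15.5, 48.0, 250.0, 999.9, 0.0]

# Assumed bin edges and ordered category names.
# Bins: below 0, 0 up to 50, 50 up to 500, 500 and above.
edges = [0, 50, 500]
labels = ["loss", "low", "medium", "high"]


def bin_value(value, edges, labels):
    """Return the label of the bin that contains value."""
    # bisect_right counts how many edges are <= value,
    # which is exactly the index of the bin.
    return labels[bisect_right(edges, value)]


binned = [bin_value(p, edges, labels) for p in profits]

# The categories are ordinal: their position in `labels` gives the order.
sorted_binned = sorted(binned, key=labels.index)

print(binned)
print(sorted_binned)
```

The order of `labels` is what makes the result ordinal rather than just categorical, so it is worth keeping the labels in a list rather than in an unordered set.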

This is very, very useful.

Another transformation that is common is the idea of rescaling or

re-expressing a given quantitative attribute, typically through normalization.

What is normalization?

So if your attribute has a given minimum and maximum value,

you can represent the same range using a different scale.

For instance, say that you have an attribute

that goes from one given minimum to a maximum,

and you want to rescale it to a new scale between zero and one,

or between minus one and plus one.

This is also very common and very useful in certain cases.
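The rescaling just described can be sketched as a simple min-max normalization. The function name `rescale` and the sample values are assumptions made for this example.

```python
def rescale(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] to [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    if span == 0:
        # All values are identical; map them all to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / span
    return [new_min + (v - old_min) * scale for v in values]


# Hypothetical attribute values (illustrative only).
values = [10.0, 20.0, 40.0, 50.0]

unit = rescale(values)               # scale between 0 and 1
signed = rescale(values, -1.0, 1.0)  # scale between -1 and +1

print(unit)
print(signed)
```

The relative spacing between the values is preserved; only the scale they are expressed on changes.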

Another one is transforming quantitative values into percentages.

There are many situations where calculating

percentages makes comparison between values easier.

Or sometimes, rather than using the values,

you want to use the distance of the values from a reference point.

A very common situation is you have all the quantities in an attribute,

you calculate the average value and you want to

re-express them in terms of distance from the average value.

This is also very useful.
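As a small illustration of re-expressing values as distances from a reference point, here is a sketch that uses the average as the reference. The values are invented for the example.

```python
from statistics import mean

# Hypothetical quantities (illustrative only).
values = [4.0, 8.0, 6.0, 10.0]

# The reference point: here, the average of the attribute.
average = mean(values)

# Each value re-expressed as its signed distance from the average.
deviations = [v - average for v in values]

print(average)
print(deviations)
```

Negative numbers are below the average and positive numbers are above it, which is often easier to read in a chart than the raw quantities.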

In this specific chart,

what I'm showing is the percentage case.

The chart on the left is expressing information in terms of counts,

and the one on the right is expressing information in terms of percentages.

In this chart, what I'm showing is information coming from food inspections.

There is a food inspection data set in New York,

and what happens is that for every restaurant,

there is an inspection and

the restaurant receives a grade according to a number of parameters.

So, what I'm trying to do in this analysis

is to see how the distribution of grades changes across cuisine types.

Now, since the number of restaurants differs across cuisine types,

if I use only raw numbers,

which is what I'm representing on the left chart,

I can't really compare across different cuisines or it's a little hard.

On the right chart,

I'm re-expressing the same information in terms of percentages,

so I no longer have the problem

that the different cuisine types have different frequencies.

And now, it's easier to compare them.
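The step from counts to percentages can be sketched like this. The cuisine names and the grade counts are invented for the example; they are not taken from the New York inspection data.

```python
# Hypothetical grade counts per cuisine type (illustrative only).
counts = {
    "Cuisine X": {"A": 80, "B": 15, "C": 5},
    "Cuisine Y": {"A": 300, "B": 150, "C": 50},
}


def to_percentages(grade_counts):
    """Convert one cuisine's grade counts to percentages of its own total."""
    total = sum(grade_counts.values())
    return {grade: 100.0 * n / total for grade, n in grade_counts.items()}


percentages = {
    cuisine: to_percentages(grades) for cuisine, grades in counts.items()
}

for cuisine, grades in percentages.items():
    print(cuisine, grades)
```

In raw counts, Cuisine Y has more A grades than Cuisine X simply because it has more restaurants. In percentages, each cuisine is divided by its own total, so the share of A grades can be compared directly.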

So this is an example of why it is useful sometimes to

re-express a quantitative attribute with a different quantity.

So, I think the most important message here is this:

we can't really cover all possible transformations because

every single project may require some very specific types of transformations.

Here, I just presented some that tend to be very common.

But you have to think about transformation as part of the design process.

That's very, very, very important.

Creating the right, effective visual representation for

a given problem is not only about finding the right graphical format,

but also finding the right information.

It's almost never the case that you can take the original data and represent it as it is.

You need some intermediary transformations.

And part of the problem for a designer is

to figure out what the best transformation is

to create a visual representation that is effective.

So, keep in mind,

visualizing data is not only about how to visualize data,

but also about what information to visualize.

That's a very, very important message to keep in mind.