Here's an example of a medical record.
So, this is again another form of data.
People are very interested in studying medical records,
and trying to understand how people can either
improve the way that we insure people or
improve the way that we give people medical care.
And so, these data again are text files that
are not necessarily formatted in a very nice way.
So what you might want to be able to do is
things like extract the allergy name from this text file,
or extract which different things were ordered, and so
forth, and then use that data to maybe answer some questions.
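To make that extraction step a bit more concrete, here is a minimal sketch in Python. It assumes the record puts each field on its own line as "Label: item, item". That layout, the sample text, and the helper name extract_field are all made up for illustration and are not taken from the record shown here. Real medical records are usually much messier than this.

```python
import re

# Hypothetical record text: the layout and field labels are invented
# for illustration and do not come from the record in the lecture.
RECORD = """\
Patient: [redacted]
Allergies: Penicillin (rash); Latex
Orders: CBC, chest X-ray
"""


def extract_field(text, label):
    """Return the items listed after 'label:' on its line, or [] if absent."""
    pattern = rf"^{re.escape(label)}\s*:\s*(.*)$"
    match = re.search(pattern, text, flags=re.MULTILINE | re.IGNORECASE)
    if match is None:
        return []
    # Split the rest of the line on commas or semicolons.
    items = re.split(r"[;,]", match.group(1))
    # Drop parenthetical notes such as "(rash)" and surrounding whitespace.
    cleaned = [re.sub(r"\(.*?\)", "", item).strip() for item in items]
    return [item for item in cleaned if item]


allergies = extract_field(RECORD, "Allergies")
orders = extract_field(RECORD, "Orders")
```

A pattern like this only works when the labels are consistent. With free-form notes you would probably need something more forgiving than a single regular expression.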
Data might also be video. So in
this particular case, machine learning experts developed an
algorithm that could learn whether a video was of a cat or of something else.
It seems like kind of a trivial application.
It's actually quite a hard problem, and they solved it.
But in this particular case, the data were actually the videos themselves.
These are stills from the videos themselves.
Data might also be an audio file.
So this is an example, DarwinTunes, where
people actually studied the evolution of music over
time, where people decided whether innovations introduced
into the audio file were interesting or not.
So, over time they evolved music that was more
melodious and more interesting for people to listen to.
So, the data itself was actually the audio files in this study.
Data might also come as access to files, whether
through an API or as spreadsheets on open government websites.
So, for example, data.gov has a lot of data sets that might
be accessible, and those data sets might be in any number of
formats, from very simple tab- or comma-separated files, to Excel
files, to something that's much more complicated and messy like raw text files.
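For the simple end of that range, here is a rough sketch of reading a tab- or comma-separated file with Python's standard csv module. The sample rows are made up and do not come from data.gov. The helper name read_delimited and the rule for guessing the delimiter (whichever of tab or comma shows up more often in the header line) are assumptions for illustration only.

```python
import csv
import io


def read_delimited(text):
    """Parse tab- or comma-separated text into a list of dict rows."""
    header = text.splitlines()[0]
    # Assumed heuristic: the delimiter that is more common in the header wins.
    delimiter = "\t" if header.count("\t") > header.count(",") else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Made-up sample data, not a real data.gov data set.
CSV_TEXT = "state,year,count\nMD,2012,10\nVA,2012,7\n"
TSV_TEXT = "state\tyear\tcount\nMD\t2012\t10\nVA\t2012\t7\n"

csv_rows = read_delimited(CSV_TEXT)
tsv_rows = read_delimited(TSV_TEXT)
```

Every value comes back as a string, so numeric columns such as year and count would still need converting before any analysis. Excel files and raw text need different tools altogether.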