By the way, this sentence is used a lot in AI for
testing because it's a short sentence that contains every letter of the alphabet from A to Z,
so you see this sentence a lot.
But, given that recording of "the quick brown fox jumps over the lazy dog," you
can then also get a recording of car noise like this.
So, that's what the inside of a car sounds like,
if you're driving in silence.
And if you take these two audio clips and add them together,
you can then synthesize what
saying "the quick brown fox jumps over the lazy dog" would sound like,
if you were saying that in a noisy car. So, it sounds like this.
So, this is a relatively simple audio synthesis example.
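To make that concrete, here is a minimal sketch of this kind of additive synthesis in Python, assuming you have two mono WAV files at the same sample rate; the file names, the 0.5 noise scale, and the clipping step are illustrative assumptions rather than a prescribed recipe.

```python
# Minimal sketch of additive audio synthesis. The file names are hypothetical;
# any clean speech clip and any in-car noise clip at the same sample rate work.
import numpy as np
from scipy.io import wavfile

rate_speech, speech = wavfile.read("quick_brown_fox.wav")   # clean speech
rate_noise, noise = wavfile.read("car_noise.wav")           # car background noise
assert rate_speech == rate_noise, "resample first if the sample rates differ"

# Work in float so the sum does not overflow the 16-bit integer range.
speech = speech.astype(np.float32)
noise = np.resize(noise.astype(np.float32), speech.shape)   # match lengths

# "Add them together": additive mixing, scaling the noise down a bit
# (0.5 is an arbitrary choice) so the speech stays intelligible.
noisy = np.clip(speech + 0.5 * noise, -32768, 32767)

wavfile.write("fox_in_noisy_car.wav", rate_speech, noisy.astype(np.int16))
```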
In practice, you might synthesize other audio effects like
reverberation, which is the sound of
your voice bouncing off the walls of the car and so on.
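Reverberation can be approximated in a similar way, by convolving the clip with a measured room or car-interior impulse response. The sketch below assumes a hypothetical impulse-response file at the same sample rate; it is only meant to show the convolution step, not a tuned effect chain.

```python
# Sketch of simulated reverberation via convolution with an impulse response.
# "car_interior_ir.wav" is a hypothetical impulse-response recording.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

rate, speech = wavfile.read("fox_in_noisy_car.wav")
_, impulse_response = wavfile.read("car_interior_ir.wav")

reverberant = fftconvolve(speech.astype(np.float32),
                          impulse_response.astype(np.float32))

# Rescale to the 16-bit range before writing the result back out.
reverberant *= 32767.0 / np.max(np.abs(reverberant))
wavfile.write("fox_with_reverb.wav", rate, reverberant.astype(np.int16))
```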
But through artificial data synthesis,
you might be able to quickly create more data that sounds like it
was recorded inside the car without needing to go out there and collect tons of data,
maybe thousands or tens of thousands of hours of
data in a car that's actually driving along.
So, if your error analysis shows you that you should try to
make your data sound more like it was recorded inside the car,
then this could be a reasonable process for
synthesizing that type of data to feed to your learning algorithm.
Now, there is one note of caution I
want to sound on artificial data synthesis which is that,
let's say, you have 10,000 hours of data that was recorded against a quiet background.
And, let's say, that you have just one hour of car noise.
So, one thing you could try is to take this one hour
of car noise and repeat it 10,000 times in
order to add it to these 10,000 hours of data recorded against a quiet background.
If you do that, the audio will sound perfectly fine to the human ear,
but there is a chance,
there is a risk that your learning algorithm will over fit to the one hour of car noise.
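As a rough sketch of what that repetition looks like in code, with placeholder random arrays standing in for the real recordings and the lengths shrunk so it runs quickly, notice that every synthesized clip draws its noise from the same single recording:

```python
# Rough sketch of reusing one noise recording across a whole clean corpus.
# Placeholder arrays stand in for real recordings (16 kHz assumed, and only
# 60 seconds of "noise" here instead of a full hour, to keep it small).
import numpy as np

rng = np.random.default_rng(0)
sample_rate = 16_000
car_noise_hour = rng.standard_normal(sample_rate * 60).astype(np.float32)
clean_clips = [rng.standard_normal(sample_rate * 5).astype(np.float32)
               for _ in range(3)]                      # three 5-second clips

def synthesize_noisy_clip(clean_clip, noise, noise_scale=0.5):
    """Mix a clean clip with a randomly positioned segment of the noise recording."""
    start = rng.integers(0, len(noise) - len(clean_clip))
    segment = noise[start:start + len(clean_clip)]
    return clean_clip + noise_scale * segment

# Every clip in the synthesized corpus gets its noise from the same one
# recording, which is exactly the overfitting risk described above.
noisy_corpus = [synthesize_noisy_clip(clip, car_noise_hour) for clip in clean_clips]
```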
And, in particular, if this is the set of
all audio that you could record in the car or,
maybe, the set of all car noise backgrounds you can imagine,
if you have just one hour of car noise background,
you might be simulating, or synthesizing from, just a very small subset of this space.
And to the human ear,
all this audio sounds just fine because one hour of car noise
sounds just like any other hour of car noise to the human ear.
But, it's possible that you're synthesizing data from a very small subset of this space,
and the neural network might be
overfitting to the one hour of car noise that you may have.
I don't know if it would be practically feasible to
inexpensively collect 10,000 hours of car noise so that
you don't need to repeat the same one hour of
car noise over and over, but instead have 10,000 unique hours
of car noise to add to 10,000 hours of unique audio recorded against a clean background.
But it's possible, no guarantees, that using 10,000 hours of unique car noise
rather than just one hour could result in better performance for your learning algorithm.
And the challenge with artificial data synthesis is that, to the human ear,
as far as your ears can tell,
these 10,000 hours all sound the same as this one hour,
so you might end up creating this
very impoverished synthesized data set from
a much smaller subset of the space without actually realizing it.
Here's another example of artificial data synthesis.
Let's say you're building a self-driving car, and so you want to detect
vehicles like this and put a bounding box around them, let's say.
So, one idea that a lot of people have discussed is, well,
why not use computer graphics to simulate tons of images of cars?
And, in fact, here are a couple of pictures of
cars that were generated using computer graphics.
And I think these graphics effects are actually pretty good and I can
imagine that by synthesizing pictures like these,
you could train a pretty good computer vision system for detecting cars.
Unfortunately, the picture that I
drew on the previous slide again applies in this setting.
Maybe this is the set of all cars and,
if you synthesize just a very small subset of these cars,
then to the human eye,
maybe the synthesized images look fine.
But you might overfit to this small subset you're synthesizing.
In particular, one idea that a lot of people have independently raised is,
to find a video game with good computer graphics of cars and just
grab images from it to get a huge data set of pictures of cars.
It turns out that if the video game has just 20 unique cars in it,
then the game looks fine,
because as you're driving around in the video game you see
these 20 other cars and it looks like a pretty realistic simulation.
But the world has a lot more than 20 unique designs of cars,
and if your entire synthesized training set has only 20 distinct cars,
then your neural network will probably overfit to these 20 cars.
And it's difficult for a person to easily tell that,
even though these images look realistic,
you're really covering such a tiny subset of the sets of all possible cars.
So, to summarize, if you think you have a data mismatch problem,
I recommend you do error analysis,
and look at the training set and the dev set,
to try to figure out and gain insight into how these two distributions of data might differ.
And then see if you can find some ways to get
more training data that looks a bit more like your dev set.
One of the ways we talked about is artificial data synthesis.
And artificial data synthesis does work.
In speech recognition, I've seen artificial data synthesis significantly
boost the performance of what were already very good speech recognition systems.
So, it can work very well.
But, if you're using artificial data synthesis,
just be cautious and bear in mind whether or not you might be accidentally
simulating data only from a tiny subset of the space of all possible examples.
So, that's it for how to deal with data mismatch.
Next, I'd like to share with you some thoughts
on how to learn from multiple types of data at the same time.