0:01

In order to build deep neural networks,

one modification to the basic convolutional operation

that you need to really use is padding.

Let's see how it works.

What we saw in earlier videos is that if you take

a six-by-six image and convolve it with a three-by-three filter,

you end up with a four-by-four output, a four-by-four matrix.

And that's because there are only four-by-four possible positions

for the three-by-three filter to fit in your six-by-six matrix.

The math for this turns out to be that if we have an n-by-n image,

and convolve that with an f-by-f filter,

then the dimension of the output will be n

minus f plus one by n minus f plus one.

In this example, six minus three plus one is equal to four,

which is why you wind up with a four-by-four output.
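As an illustration (this code is an addition, not part of the lecture; the function name `conv2d_valid` is made up), here is a minimal NumPy sketch of this no-padding convolution, showing that a six-by-six image convolved with a three-by-three filter gives a four-by-four output:

```python
import numpy as np

def conv2d_valid(image, filt):
    # Slide an f-by-f filter over an n-by-n image with no padding.
    # The output is (n - f + 1) x (n - f + 1), matching the formula above.
    n, f = image.shape[0], filt.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * filt)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # six-by-six image
filt = np.ones((3, 3))                            # three-by-three filter
print(conv2d_valid(image, filt).shape)            # (4, 4)
```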

There are two downsides to this.

One is that every time you apply a convolutional operator,

your image shrinks, so you go from six-by-six down to four-by-four,

and you can only do this a few times before your image starts getting really small,

maybe shrinking down to one-by-one or something.

So maybe you don't want your image to shrink

every time you detect edges or other features on it,

so that's one downside.

And the second downside is that,

if you look at a pixel at the corner or the edge,

this little pixel is touched,

that is, used, in only one of the outputs,

because it touches just that one three-by-three region.

Whereas, if you take a pixel in the middle, say this pixel,

then there are a lot of three-by-three regions that overlap that pixel,

and so it's as if pixels on the corners or on the edges are used much less in the output,

so you're throwing away a lot of the information near the edge of the image.
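You can verify this imbalance with a short sketch (an addition of mine, not from the lecture) that counts how many three-by-three windows touch each pixel of a six-by-six image:

```python
import numpy as np

# Count how many 3x3 filter positions use each pixel of a 6x6 image.
n, f = 6, 3
counts = np.zeros((n, n), dtype=int)
for i in range(n - f + 1):
    for j in range(n - f + 1):
        counts[i:i + f, j:j + f] += 1

print(counts[0, 0])  # corner pixel: used by only 1 window
print(counts[3, 3])  # central pixel: used by 9 windows
```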

So to solve both of these problems,

both the shrinking output,

and when we build really deep neural networks,

you'll see why you don't want the image to shrink on every step,

because if you have maybe a 100-layer deep net,

and it shrinks on every layer, then after 100 layers,

you end up with a very small image.

So that was one problem. The other is throwing away a lot

of the information from the edges of the image.

So in order to fix both of these problems,

what you can do is,

before applying the convolutional operation,

you can pad the image.

So in this case,

let's say you pad the image with

an additional border of one pixel all around the edges.

So if you do that,

then the six-by-six image,

you've now padded this to an eight-by-eight image.

And if you convolve an eight-by-eight image with a three-by-three filter,

you now get a six-by-six output.

So you managed to preserve the original input size of six-by-six.
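In NumPy, this zero-padding step might be sketched as follows (an addition of mine; `np.pad` pads with zeros by default):

```python
import numpy as np

image = np.random.rand(6, 6)          # six-by-six input
padded = np.pad(image, pad_width=1)   # zero border of one pixel all around
print(padded.shape)                   # (8, 8)
# Convolving this 8x8 padded image with a 3x3 filter gives 8 - 3 + 1 = 6,
# so the original six-by-six size is preserved.
```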

So by convention, you pad with zeros,

and p is the padding amount.

So in this case,

p is equal to one, because we're padding all around with an extra border of one pixel,

and the output becomes n plus two p minus f plus one,

by n plus two p minus f plus one.

So this becomes six plus two times one,

minus three plus one,

by the same thing.

So six plus two minus three plus one,

that's equal to six.

So you end up with a six-by-six output,

which preserves the size of the original image.
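The arithmetic can be wrapped in a small helper (a sketch of my own, with the made-up name `output_size`):

```python
def output_size(n, p, f):
    # Output dimension for an n-by-n input, padding p, and f-by-f filter.
    return n + 2 * p - f + 1

print(output_size(6, 0, 3))  # 4: no padding shrinks the image
print(output_size(6, 1, 3))  # 6: p = 1 preserves the six-by-six size
```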

So this green pixel actually influences all of these cells of the output,

and so the effect of,

maybe not quite throwing away, but counting less,

the information from the corner or the edge of the image is reduced.

And I've shown here the effect of padding the border with just one pixel.

If you want, you can also pad the border with two pixels,

in which case you add on another border here,

or you can pad with even more pixels if you choose.

What I'm drawing here

would be a padding of p equals two.

In terms of how much to pad,

it turns out there are two common choices,

called Valid convolutions and Same convolutions.

Not really great names,

but in a valid convolution,

this basically means no padding.

And so in this case,

you might have an n-by-n image, convolve with an f-by-f filter,

and this would give you an n minus f plus one by n minus f plus one dimensional output.

So this is like the example we had from the previous videos,

where we had a six-by-six image convolved with a three-by-three filter,

and that gave you a four-by-four output.

The other most common choice of padding is called the Same convolution.

And that means when you pad,

the output size is the same as the input size.

So, if you actually look at this formula,

when you pad by p pixels,

then it's as if n goes to n plus two p,

and then you have, from the rest of this,

minus f plus one.

So given an n-by-n image,

and a padding of a border of p pixels all around,

then the output size is, by this formula,

n plus two p minus f plus one.

And so if you want n plus two p minus f plus one to be equal to n,

so that the output size is the same as the input size,

then if you take this and solve for p,

the n cancels out on both sides,

and you find that p is equal to f minus one over two.

So when f is odd,

by choosing the padding size like this,

you can make sure that the output size is the same as the input size.
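As a sketch (my addition, with the made-up name `same_padding`), the rule p = (f - 1) / 2 for odd f:

```python
def same_padding(f):
    # Padding that keeps the output size equal to the input size.
    # Only an integer when the filter size f is odd.
    assert f % 2 == 1, "same padding needs an odd filter size"
    return (f - 1) // 2

print(same_padding(3))  # 1, as in the three-by-three example
print(same_padding(5))  # 2, as in the five-by-five example
```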

And that's why, for example,

when the filter was three-by-three, as in the example on the previous slide,

the padding that makes the output size the same as the input size was three minus

one over two, which is one.

And as another example,

if your filter is five-by-five,

so f is equal to five,

then if you plug it into that equation,

you find that a padding of two is required to keep the output size the same as

the input size when the filter is five-by-five.

By convention, in computer vision,

f is usually odd.

It's actually almost always odd.

And you rarely see even-numbered filters

used in computer vision.

And I think there are two reasons for that.

One is that if f were even,

then you would need some asymmetric padding;

it's only when f is odd that this type of same convolution gives a natural padding region,

padding the same dimension all around,

rather than padding more on the left and less on the right,

or something asymmetric.

And second, when you have an odd-dimension filter,

such as three-by-three or five-by-five,

then it has a central position, and sometimes in computer vision,

it's nice to have a distinguished pixel,

a pixel you can call the central pixel,

so you can talk about the position of the filter.

Maybe none of these is a great reason for using f to be pretty much always odd,

but if you look at the convolution literature,

you see three-by-three filters are very common,

and you see some five-by-five, seven-by-seven.

And sometimes, later,

we will also talk about one-by-one filters and when that makes sense.

But just by convention,

I recommend you just use odd-numbered filters.

I think that you can probably get

just fine performance even if you use an even number value for f,

but if you stick to the common computer vision convention,

you'd usually just use odd-numbered f. So you've now seen how to use padded convolutions.

To specify the padding for the convolution operation,

you can either specify the value for p,

or you can just say that this is a valid convolution, which means p equals zero,

or you can say this is a same convolution, which means pad

as much as you need to make sure the output has the same dimension as the input.

So that's it for padding. In the next video,

let's talk about how you can implement Strided Convolution.
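Putting the two conventions together, a hypothetical helper (the names here are my own, not a real library API) that maps "valid" and "same" to a padding amount and returns the output size:

```python
def conv_output_size(n, f, padding="valid"):
    # padding may be an integer p, "valid" (p = 0), or "same" (p = (f-1)/2).
    if padding == "valid":
        p = 0
    elif padding == "same":
        p = (f - 1) // 2
    else:
        p = padding
    return n + 2 * p - f + 1

print(conv_output_size(6, 3, "valid"))  # 4
print(conv_output_size(6, 3, "same"))   # 6
print(conv_output_size(6, 5, "same"))   # 6
```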
