0:01

In order to build deep neural networks, one modification to the basic convolutional operation that you need to really use is padding. Let's see how it works.

What we saw in earlier videos is that if you take a six by six image and convolve it with a three by three filter, you end up with a four by four output, a four by four matrix, and that's because there are only four by four possible positions for the three by three filter to fit in your six by six matrix. And the math of this turns out to be that if you have an n by n image and convolve that with an f by f filter, then the dimension of the output will be n minus f plus one by n minus f plus one. And in this example, six minus three plus one is equal to four, which is why you wound up with a four by four output.
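
This valid-size arithmetic can be checked with a short sketch. The function name conv2d_valid and the toy six by six image below are illustrative choices, not anything defined in the lecture:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide an f x f filter over an n x n image with no padding.

    The output is (n - f + 1) x (n - f + 1), matching the formula above.
    """
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            # Element-wise product of the filter with one f x f window, then sum.
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.arange(36).reshape(6, 6)       # a 6 x 6 image
kernel = np.ones((3, 3))                  # a 3 x 3 filter
print(conv2d_valid(image, kernel).shape)  # (4, 4), since 6 - 3 + 1 = 4
```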

So there are two downsides to this. One is that every time you apply a convolutional operator, your image shrinks. You go from six by six down to four by four, and you can only do this a few times before your image starts getting really small, maybe shrinking down to one by one or something. So maybe you don't want your image to shrink every time you detect edges or other features on it; that's one downside. The second downside is that if you look at the pixel at the corner or the edge, this little pixel is touched, or used, in only one of the outputs, because it touches only that one three by three region. Whereas if you take a pixel in the middle, say this pixel, then there are a lot of three by three regions that overlap that pixel, and so it's as if pixels on the corners or on the edges are used much less in the output. So you're throwing away a lot of the information near the edge of the image.

So, to solve both of these problems, both the shrinking output, and, when you build really deep neural networks, you'll see why you don't want the image to shrink on every step, because if you have maybe a hundred layer deep net, and it shrinks a bit on every layer, then after a hundred layers you end up with a very small image. So that was one problem; the other is throwing away a lot of the information from the edges of the image. So in order to fix both of these problems, what you can do is, before applying the convolutional operation, pad the image.

So in this case, let's say you pad the image with an additional border, an additional border of one pixel all around the edges. If you do that, then instead of a six by six image, you've now padded this to an eight by eight image, and if you convolve an eight by eight image with a three by three filter, you now get not a four by four but a six by six output, so you've managed to preserve the original input size of six by six. By convention, when you pad, you pad with zeros, and if p is the padding amount, so in this case p is equal to one, because we're padding all around with an extra border of one pixel, then the output becomes n plus 2p minus f plus one by n plus 2p minus f plus one. So this becomes six plus two times one minus three plus one by the same thing, and six plus two minus three plus one equals six. So you end up with a six by six image that preserves the size of the original image.
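
As a quick check of this arithmetic (a minimal sketch; NumPy's np.pad is used here as a stand-in for whatever framework you use), padding a six by six image by one pixel of zeros gives eight by eight, and a three by three filter then produces a six by six output:

```python
import numpy as np

n, f, p = 6, 3, 1              # input size, filter size, padding
image = np.ones((n, n))        # any 6 x 6 image
padded = np.pad(image, p)      # zero border of width p on every side -> 8 x 8
out_size = n + 2 * p - f + 1   # n + 2p - f + 1 = 6, same as the input
print(padded.shape, out_size)  # (8, 8) 6
```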

So this corner pixel now actually influences all of these cells of the output, and so the effect of throwing away, or maybe not throwing away but counting less of, the information from the corner or the edge of the image is reduced. And I've shown here the effect of padding the border with just one pixel. If you want, you can also pad the border with two pixels, in which case you add on another border here, and you can pad it with even more pixels if you choose. So, what I'm drawing here would be padding p equals two.

In terms of how much to pad, it turns out there are two common choices, called valid convolutions and same convolutions. Neither is really a great name, but a valid convolution basically means no padding. In this case, you might have an n by n image convolved with an f by f filter, and this would give you an n minus f plus one by n minus f plus one dimensional output. So this is like the example we had in the previous videos, where a six by six image convolved with a three by three filter gave you a four by four output.

The other most common choice of padding is called the same convolution, and that means you pad so that the output size is the same as the input size. So if we actually look at this formula, when you pad by p pixels, it's as if n goes to n plus 2p, and then you have the rest of this, minus f plus one. So if we have an n by n image and pad a border of p pixels all around, then the output size is n plus 2p minus f plus one by n plus 2p minus f plus one. And so, if you want n plus 2p minus f plus one to be equal to n, so that the output size is the same as the input size, then you solve for p: n cancels out on both sides, and this implies that p is equal to f minus one over two. So when f is odd, by choosing the padding size this way, you can make sure that the output size is the same as the input size. And that's why, for example, when the filter was three by three, as in the previous slide, the padding that would make the output size the same as the input size was three minus one over two, which is one.

And as another example, if your filter is five by five, so f is equal to five, then if you plug that into the equation, you find that a padding of two is required to keep the output size the same as the input size.
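
The p = (f - 1) / 2 rule is easy to encode; the helper name same_padding below is just an illustrative choice:

```python
def same_padding(f):
    """Padding that keeps the output size equal to the input size (stride 1).

    Solving n + 2p - f + 1 = n for p gives p = (f - 1) / 2,
    which is a whole number only when f is odd.
    """
    assert f % 2 == 1, "same padding needs an odd filter size"
    return (f - 1) // 2

print(same_padding(3))  # 1, as with the three by three filter above
print(same_padding(5))  # 2, as with the five by five filter
```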

And by convention in computer vision, f is usually odd. It's actually almost always odd; you rarely see even numbered filters used in computer vision. I think there are two reasons for that. One is that if f were even, you would need some asymmetric padding: only if f is odd does this type of same convolution give a natural padding region with the same dimension all around, rather than padding more on the left and less on the right, or something asymmetric like that.

And then second, when you have an odd dimension filter, such as three by three or five by five, it has a central position, and sometimes in computer vision it's nice to have a distinguished pixel, a pixel you can call the central pixel, so you can talk about the position of the filter. Maybe neither of these is a great reason for f to be pretty much always odd, but if you look at the convolutional literature, you'll see that three by three filters are very common, along with some five by fives and seven by sevens. And actually, later we'll also talk about one by one filters and why they make sense.

But just by convention, I recommend you use odd numbered filters as well. I think you can probably get just fine performance even with an even number value for f, but if you stick to the common computer vision convention, you'll usually just use odd numbered f. So you've now seen how to use padded convolutions. To specify the padding for your convolution operation, you can either specify the value for p, or you can just say that this is a valid convolution, which means p equals zero, or you can say this is a same convolution, which means pad as much as you need to make sure the output has the same dimension as the input.
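
Putting the two conventions together, a small helper (a hypothetical name, conv_output_size; the "valid" and "same" strings simply mirror the lecture's terminology) computes the output size for either choice:

```python
def conv_output_size(n, f, padding="valid"):
    """Output size for an n x n input and an f x f filter, stride 1.

    "valid" means p = 0; "same" means p = (f - 1) / 2, so the output
    size equals n when f is odd. An explicit integer p is also accepted.
    """
    if padding == "valid":
        p = 0
    elif padding == "same":
        p = (f - 1) // 2
    else:
        p = padding
    return n + 2 * p - f + 1

print(conv_output_size(6, 3, "valid"))  # 4
print(conv_output_size(6, 3, "same"))   # 6
print(conv_output_size(6, 5, 2))        # 6, explicit p = 2 for f = 5
```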

So that's it for padding. In the next video, let's talk about how you can implement strided convolutions.
