We have already hinted that a convolutional hidden layer does not need to have only one feature map, so let us look at what it would look like to have multiple feature maps. It is quite simple: each of the feature maps has its own kernel, and the computation is carried out separately for each kernel over the same image, as a separate cross-correlation (or convolution, if you want) operation for each kernel. Each feature map has its own set of hidden units, so our hidden units now have three dimensions: which feature map they belong to, and then the row and the column.

If we continue from the example we have used so far, with an image of size 7×7 and kernels of size 3×3, and we have a number of feature maps F_out, then the number of hidden units is F_out × 5 × 5, derived the same way as we have been deriving the feature map sizes so far. The number of parameters is the number of feature maps times the number of parameters per feature map, that is 3×3 plus one for the bias, so F_out × (3×3 + 1). And the number of connections, counted the same way we have seen so far, is 5×5 × 3×3, once for each of the feature maps, so F_out × 5×5 × 3×3.

A different situation arises when we have multiple input feature maps, or input channels, as we move into the first hidden layer. A very common example is the red, green and blue channels of an image: essentially three matrices of the same dimensionality representing our image. In the example on the slide we have two channels. Each channel has its own kernel associated with it; these are different kernels, and each channel is, if you want, its own image, although really it is a different channel of the same image. The result of the computation is a single feature map: although it is not spelled out on the slide, the typical choice is to sum the results of the two cross-correlations together, and that is how we get our feature map.

If we now have F_in input channels, the number of input units is the size of the image, 7×7, times the number of channels. Since each individual kernel is again 3×3, the number of hidden units is 5×5. As for the kernel size, we can actually say that our kernel is both of these kernels together: imagine adding an extra dimension to the kernel, where the first dimension indicates which channel that slice of the kernel is applied to. The slice on top, w_{0,0,0} to w_{0,2,2}, is applied to the first channel, and the slice below, w_{1,0,0} to w_{1,2,2} (the second value of the first dimension), is used for the cross-correlation over the second channel. So you can see the two together as one three-dimensional kernel, but the operations are exactly as we described. With that in mind, the number of parameters is the size of each of these kernel matrices, 3×3, times how many of them there are (the number of channels), plus one for the bias: F_in × 3×3 + 1.
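As a minimal sketch (not part of the original slides), the following NumPy snippet implements this multi-channel, multi-feature-map cross-correlation directly. The shapes follow the running example (7×7 image, 3×3 kernels); the values chosen for F_in and F_out, and the explicit loops, are just one straightforward way to illustrate the computation.

```python
import numpy as np

# Valid (no padding) cross-correlation with multiple input channels and
# multiple output feature maps. Shapes assumed here:
#   image:   (F_in, 7, 7)
#   kernels: (F_out, F_in, 3, 3)  -- one 3D kernel per output feature map
#   biases:  (F_out,)             -- one bias per output feature map
def cross_correlate(image, kernels, biases):
    f_out, f_in, kh, kw = kernels.shape
    _, h, w = image.shape
    out_h, out_w = h - kh + 1, w - kw + 1          # 7 - 3 + 1 = 5
    out = np.zeros((f_out, out_h, out_w))
    for o in range(f_out):                         # one feature map per kernel
        for i in range(out_h):
            for j in range(out_w):
                # take the local receptive field across ALL input channels
                patch = image[:, i:i + kh, j:j + kw]
                # sum the per-channel cross-correlations, then add the bias
                out[o, i, j] = np.sum(patch * kernels[o]) + biases[o]
    return out

F_in, F_out = 2, 4                                 # example values, not from the slide
image = np.random.randn(F_in, 7, 7)
kernels = np.random.randn(F_out, F_in, 3, 3)
biases = np.random.randn(F_out)

feature_maps = cross_correlate(image, kernels, biases)
print(feature_maps.shape)                          # (4, 5, 5): F_out feature maps of 5x5
```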
And the number of connections is again the size of our output feature map times the size of each of these kernel matrices, times the number of channels; in our example that is 2 × 5×5 × 3×3. We do not tie the weights across the input channels: as we said, the kernels are different for each channel. This means that our local receptive fields now reach across multiple input images, seeing each channel, if you want, as an image of its own.

You can of course imagine combining the two, as will quite frequently be the case: multiple input channels, and multiple feature maps at the hidden layer. Then our number of input units is the number of channels times the dimensionality of the input image, 7×7 here. The number of hidden units is 5×5 times the number of output feature maps. Each of our kernels is the three-dimensional kernel we saw earlier, where one of the dimensions is the number of channels and the rest is 3×3 as we have seen so far, so each kernel has size F_in × 3×3. The number of parameters is then the number of input channels times the number of output feature maps times the size of each kernel matrix, 3×3, plus one bias for each output feature map, so plus F_out. And the number of connections is the number of input channels times the number of output feature maps times 5×5 times 3×3.
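To make the bookkeeping for this combined case concrete, here is a short sketch that evaluates the counts described above for the running example; the particular values of F_in and F_out are illustrative assumptions, not numbers from the slides.

```python
# Counting units, parameters and connections for the combined case:
# F_in input channels, F_out output feature maps, 7x7 image, 3x3 kernels.
F_in, F_out = 2, 4                       # example values
in_h = in_w = 7
k = 3
out_h = out_w = in_h - k + 1             # 5

input_units = F_in * in_h * in_w                      # F_in * 7 * 7
hidden_units = F_out * out_h * out_w                  # F_out * 5 * 5
parameters = F_in * F_out * k * k + F_out             # one bias per output feature map
connections = F_in * F_out * out_h * out_w * k * k    # F_in * F_out * 5*5 * 3*3

print(input_units, hidden_units, parameters, connections)
# with F_in=2, F_out=4: 98 100 76 1800
```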