Convolutional Neural Network
============================
Hubel and Wiesel (1962) experiments -> inspiration for CNNs
    a single neuron detects edges oriented at 45 degrees
filter kernel -> a small matrix convolved over patches of the image
    (typically 3x3, 5x5, or 7x7; smaller tends to work better)
    returns a feature map
CNN -> multiple layers of kernels (the 1st layer computes on the input image,
    subsequent layers compute on the feature maps generated by the previous
    layer)
strides -> number of pixels the kernel shifts between successive
    computations on the same layer (a stride smaller than the kernel size
    means neighbouring patches overlap)
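The kernel/stride mechanics above can be sketched in plain NumPy (a toy single-channel version; `conv2d_valid` and the edge kernel are illustrative, not library code):

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` with the given stride ('valid' style:
    the kernel only visits positions fully inside the image)."""
    k = kernel.shape[0]
    n = image.shape[0]
    out = (n - k) // stride + 1
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            fmap[i, j] = np.sum(patch * kernel)   # dot product with the patch
    return fmap

# A 3x3 vertical-edge kernel applied to an 8x8 image:
image = np.zeros((8, 8))
image[:, 4:] = 1.0                # right half bright -> a vertical edge
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]])
fmap = conv2d_valid(image, edge_kernel, stride=1)
print(fmap.shape)                 # (6, 6): (8 - 3)//1 + 1 = 6
```

The feature map responds strongly where the edge sits under the kernel, which is the "single neuron detects an oriented edge" idea from Hubel and Wiesel.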
(max) pooling kernel -> looks at a patch of the image and
    returns the (maximum) value in that patch
    (doesn't have any learnable parameters)
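A minimal max-pooling sketch in the same NumPy style (`max_pool` is a made-up helper name):

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """2x2 max pooling: keep only the largest value in each patch.
    There are no learnable parameters -- it is a fixed reduction."""
    n = fmap.shape[0]
    out = (n - size) // stride + 1
    pooled = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            pooled[i, j] = fmap[i*stride:i*stride+size,
                                j*stride:j*stride+size].max()
    return pooled

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fmap))   # halves each spatial dimension: 4x4 -> 2x2
```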
usually the number of feature maps is doubled after a pooling layer is
    computed: maps of (n,n) -> (m,m), e.g. (28x28)x128 -> pool ->
    (14x14)x128 -> conv -> (14x14)x256
No. of weights required per layer = (k1 x k1) x c1 x c2 in the 1st layer
    ((k1, k1) is the dimension of the filter kernel, c1 the number of
    channels in the input, c2 the number of feature maps in the 1st layer)
    for the 2nd layer: (k2 x k2) x c2 x c3 (c3 = number of feature maps
    in the 2nd layer)
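The weight-count formula is plain arithmetic and easy to check (the layer sizes below are made-up examples; biases, one per output map, are not counted in the note above):

```python
def conv_weights(k, c_in, c_out):
    """Weights in a conv layer: (k x k) kernel per (input, output) map pair."""
    return k * k * c_in * c_out

# 1st layer: 3x3 kernels, RGB input (c1 = 3), 64 feature maps (c2 = 64)
print(conv_weights(3, 3, 64))    # 1728
# 2nd layer: 3x3 kernels, 64 -> 128 feature maps
print(conv_weights(3, 64, 128))  # 73728
```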
conv2d -> padding 'same' adds 0's at the borders so the output
    dimensions match the input image size
    'valid' does the convolution on actual pixels alone -> returns
    a smaller dimension relative to the image
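The output-size rules for 'same' vs 'valid' can be written down directly (following the usual TensorFlow convention; `conv_out_size` is an illustrative helper, not a library call):

```python
import math

def conv_out_size(n, k, stride=1, padding="valid"):
    """Spatial output size of a conv layer on an (n x n) input
    with a (k x k) kernel."""
    if padding == "same":
        return math.ceil(n / stride)      # zero-padded borders
    return (n - k) // stride + 1          # 'valid': actual pixels only

print(conv_out_size(28, 3, padding="same"))            # 28
print(conv_out_size(28, 3, padding="valid"))           # 26
print(conv_out_size(28, 3, stride=2, padding="same"))  # 14
```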
technique: use a smaller subset of the train/test data and try to overfit
    the model (reach ~100% on train to verify that the model is expressive
    enough to learn the data)
Deconvolutional Layers (a misnomer; really transposed convolutions):
    upsampling an image using this layer
    (tf.layers.conv2d_transpose, tf.nn.conv2d_transpose)
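A rough NumPy sketch of the upsampling step inside a stride-2 transposed convolution (zero insertion only; a real conv2d_transpose layer then convolves this with a learned kernel to fill the gaps -- the helper name here is invented):

```python
import numpy as np

def upsample_zero_insert(x, stride=2):
    """Insert (stride - 1) zeros between neighbouring pixels,
    growing an (n x n) map to (n*stride x n*stride)."""
    n = x.shape[0]
    up = np.zeros((n * stride, n * stride))
    up[::stride, ::stride] = x    # originals land on a sparse grid
    return up

x = np.ones((2, 2))
up = upsample_zero_insert(x)
print(up.shape)   # (4, 4)
```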
Transfer Learning:
==================
using pretrained networks as a starting point for a task (reusing a subset
    of layers)
    e.g. VGG (Visual Geometry Group) networks (224x224 input -> 1000 classes)
    -> classification (what) & localization (where)
CNNs work great for classification (since the output is largely invariant
    to location); to predict the location, use the earlier layers (which
    contain locality info) for the final output
can be used to identify a class not in the 1000 pretrained classes
can be used with a different input size, e.g. 64x64 (depends on the first
    layer's filter size)
Regularization:
===============
Dropout-based regularization is great for image classification applications.
    (Warning: not to be used on data without redundancy; image data has a lot
    of redundancy, e.g. identifying a face from a partial view is quite easy)
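A sketch of (inverted) dropout in NumPy, the variant most frameworks use at training time (the function name and seed are illustrative):

```python
import numpy as np

def dropout(x, rate=0.5, rng=None):
    """Inverted dropout: zero out a random `rate` fraction of activations,
    then rescale survivors by 1/(1 - rate) so the expected sum is unchanged.
    Applied only during training; at test time the layer is the identity."""
    if rng is None:
        rng = np.random.default_rng(0)   # fixed seed for reproducibility
    mask = rng.random(x.shape) >= rate   # True = keep this activation
    return x * mask / (1.0 - rate)

acts = np.ones((4, 4))
dropped = dropout(acts, rate=0.5)
print(dropped)   # each entry is either 0.0 or 2.0
```

Forcing the network to classify with random activations missing only works because redundant inputs (like images) still carry enough signal, which is the warning above.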