- AlexNet
- Max pooling, ReLU nonlinearity
- More data and a bigger model (7 hidden layers, 650k units, 60M parameters)
- GPU implementation (50x speedup over CPU)
- Trained on two GPUs for a week
- Dropout regularization
- ~61M parameters in total (see the sketch below)
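A minimal PyTorch sketch of an AlexNet-style network, only to illustrate the ingredients listed above (ReLU non-linearities, max pooling, dropout). Channel sizes follow the common single-GPU variant (as in torchvision), not the original two-GPU split, so treat it as an approximation.

```python
import torch
import torch.nn as nn

# AlexNet-style sketch: ReLU non-linearities, overlapping max pooling,
# dropout in the classifier. Widths follow the single-GPU variant.
class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),                  # ReLU non-linearity
            nn.MaxPool2d(kernel_size=3, stride=2),  # overlapping max pooling
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                      # dropout regularization
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
print(sum(p.numel() for p in model.parameters()))  # ~61M parameters
```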
- VGG Net
- Small filters, Deeper networks
- AlexNet (8 layers) vs. VGG (16–19 layers)
- Only 3x3 convolution stride 1, pad 1 and 2x2 maxpool stride 2
- Why 3x3 stacks?
- Stacked convolution layers build up a large effective receptive field:
- Two 3x3 layers => 5x5 receptive field
- Three 3x3 layers => 7x7 receptive field
- More non-linearity
- Fewer parameters to learn for the same receptive field (27C² weights for three 3x3 layers vs. 49C² for one 7x7 layer); the full network still has ~140M parameters.
- ** A stack of smaller convolution layers has the same effective receptive field as a single larger convolution layer,
- ** but it is deeper, has more non-linearities, and uses fewer parameters (see the sketch below).
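A short sketch of the trade-off above, comparing one 7x7 convolution with a stack of three 3x3 convolutions. The channel count C = 256 is only an example; any value shows the same 49C² vs. 27C² ratio.

```python
import torch
import torch.nn as nn

C = 256  # example channel count (illustrative assumption)

# One 7x7 conv vs. a stack of three 3x3 convs: same 7x7 effective receptive
# field, but the stack is deeper (3 ReLUs) and has fewer parameters.
single_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3)
stacked_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(single_7x7))   # 7*7*C*C + C     ~= 49 C^2
print(params(stacked_3x3))  # 3*(3*3*C*C + C) ~= 27 C^2

# Both map an HxWxC feature map to the same spatial size.
x = torch.randn(1, C, 32, 32)
assert single_7x7(x).shape == stacked_3x3(x).shape
```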
- GoogLeNet
- You don't have to decide which single type of convolution to use at each layer: the Inception module applies each convolution in parallel and concatenates the resulting feature maps before passing them to the next layer (see the sketch below).
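A sketch of a naive Inception-style module to make the "run everything in parallel and concatenate" idea concrete. The branch widths are illustrative assumptions, and the real GoogLeNet module additionally uses 1x1 bottleneck convolutions before the 3x3/5x5 branches to reduce cost.

```python
import torch
import torch.nn as nn

# Naive Inception-style module: instead of choosing one filter size per layer,
# run 1x1, 3x3, 5x5 convolutions and a 3x3 max pool in parallel, then
# concatenate their feature maps along the channel dimension.
class NaiveInception(nn.Module):
    def __init__(self, in_ch: int, c1: int, c3: int, c5: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Conv2d(in_ch, c3, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, c5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every branch preserves the spatial size, so the outputs can be
        # concatenated channel-wise.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.pool(x)],
            dim=1,
        )

x = torch.randn(1, 192, 28, 28)
y = NaiveInception(192, 64, 128, 32)(x)
print(y.shape)  # (1, 64 + 128 + 32 + 192, 28, 28)
```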
- ResNet
- Making the network deeper can degrade performance (quite counter-intuitive).
- This is because deeper networks fail to learn the identity mapping on their own.
- So, use the identity mapping explicitly!
- Rather than just expecting the deeper network to learn the identity mapping as a new function, we can give it a hint.
- First trial
- Shortcut connection 1: identity
- The residual block with an identity shortcut can be applied when the input and output have the same spatial & channel dimensions.
- Shortcut connection 2: when the channel dimension increases (see the sketch below)
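A sketch of a basic residual block covering both shortcut cases. When the input and output dimensions match, the shortcut is a plain identity; when the channel dimension increases (and the spatial size shrinks), one common option is a strided 1x1 convolution that projects the input so the addition still works. The channel widths below are illustrative.

```python
import torch
import torch.nn as nn

# Basic residual block: output = ReLU(F(x) + shortcut(x)).
class BasicBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        if stride != 1 or in_ch != out_ch:
            # Shortcut connection 2: project the input when dimensions change.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            # Shortcut connection 1: plain identity mapping.
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # F(x) + x

same_dim = BasicBlock(64, 64)             # identity shortcut
increase = BasicBlock(64, 128, stride=2)  # projection shortcut
x = torch.randn(1, 64, 56, 56)
print(same_dim(x).shape, increase(x).shape)  # (1,64,56,56) (1,128,28,28)
```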
To be Continued . . .