- AlexNet
- Max pooling, ReLU nonlinearity
- More data and a bigger model (7 hidden layers, 650k units, 60M parameters)
- GPU implementation (50x speedup over CPU)
- Trained on two GPUs for a week
- Dropout regularization
- ~61M parameters in total (see the sketch below)
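A minimal PyTorch sketch of an AlexNet-style network, only to illustrate the ingredients listed above (ReLU non-linearities, max pooling, dropout). Channel sizes follow the common single-GPU variant (as in torchvision), not the original two-GPU split, so treat it as an approximation.

```python
import torch
import torch.nn as nn

# AlexNet-style sketch: ReLU non-linearities, overlapping max pooling,
# dropout in the classifier. Widths follow the single-GPU variant.
class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),                  # ReLU non-linearity
            nn.MaxPool2d(kernel_size=3, stride=2),  # overlapping max pooling
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                      # dropout regularization
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
print(sum(p.numel() for p in model.parameters()))  # ~61M parameters
```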
- VGG Net
- Small filters, Deeper networks
- AlexNet (8 layers) vs. VGG (16–19 layers)
- Only 3x3 convolution stride 1, pad 1 and 2x2 maxpool stride 2
- Why 3x3 stacks?
- Stacked convolution layers build up a large effective receptive field:
- Two 3x3 layers => 5x5 receptive field
- Three 3x3 layers => 7x7 receptive field
- More non-linearity
- Fewer parameters to learn for the same receptive field (27C² weights for three 3x3 layers vs. 49C² for one 7x7 layer); the full network still has ~140M parameters.
- ** A stack of smaller convolution layers has the same effective receptive field as a single larger convolution layer,
- ** but it is deeper, has more non-linearities, and uses fewer parameters (see the sketch below).
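A short sketch of the trade-off above, comparing one 7x7 convolution with a stack of three 3x3 convolutions. The channel count C = 256 is only an example; any value shows the same 49C² vs. 27C² ratio.

```python
import torch
import torch.nn as nn

C = 256  # example channel count (illustrative assumption)

# One 7x7 conv vs. a stack of three 3x3 convs: same 7x7 effective receptive
# field, but the stack is deeper (3 ReLUs) and has fewer parameters.
single_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3)
stacked_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(single_7x7))   # 7*7*C*C + C     ~= 49 C^2
print(params(stacked_3x3))  # 3*(3*3*C*C + C) ~= 27 C^2

# Both map an HxWxC feature map to the same spatial size.
x = torch.randn(1, C, 32, 32)
assert single_7x7(x).shape == stacked_3x3(x).shape
```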
- GoogLeNet
- You don't have to decide which single type of convolution to use at each layer: the Inception module applies each convolution in parallel and concatenates the resulting feature maps before passing them to the next layer (see the sketch below).
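A sketch of a naive Inception-style module to make the "run everything in parallel and concatenate" idea concrete. The branch widths are illustrative assumptions, and the real GoogLeNet module additionally uses 1x1 bottleneck convolutions before the 3x3/5x5 branches to reduce cost.

```python
import torch
import torch.nn as nn

# Naive Inception-style module: instead of choosing one filter size per layer,
# run 1x1, 3x3, 5x5 convolutions and a 3x3 max pool in parallel, then
# concatenate their feature maps along the channel dimension.
class NaiveInception(nn.Module):
    def __init__(self, in_ch: int, c1: int, c3: int, c5: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Conv2d(in_ch, c3, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, c5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every branch preserves the spatial size, so the outputs can be
        # concatenated channel-wise.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.pool(x)],
            dim=1,
        )

x = torch.randn(1, 192, 28, 28)
y = NaiveInception(192, 64, 128, 32)(x)
print(y.shape)  # (1, 64 + 128 + 32 + 192, 28, 28)
```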
- ResNet
- Making the network deeper can degrade performance (quite counter-intuitive).
- This is because deeper networks fail to learn the identity mapping on their own.
- So, use the identity mapping explicitly!
- Rather than just expecting the deeper network to learn the identity mapping as a new function, we can give it a hint.
- First trial
- Shortcut connection 1: identity
- The residual block with an identity shortcut can be applied when the input and output have the same spatial & channel dimensions.
- Shortcut connection 2: when the channel dimension increases (see the sketch below)
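A sketch of a basic residual block covering both shortcut cases. When the input and output dimensions match, the shortcut is a plain identity; when the channel dimension increases (and the spatial size shrinks), one common option is a strided 1x1 convolution that projects the input so the addition still works. The channel widths below are illustrative.

```python
import torch
import torch.nn as nn

# Basic residual block: output = ReLU(F(x) + shortcut(x)).
class BasicBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        if stride != 1 or in_ch != out_ch:
            # Shortcut connection 2: project the input when dimensions change.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            # Shortcut connection 1: plain identity mapping.
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # F(x) + x

same_dim = BasicBlock(64, 64)             # identity shortcut
increase = BasicBlock(64, 128, stride=2)  # projection shortcut
x = torch.randn(1, 64, 56, 56)
print(same_dim(x).shape, increase(x).shape)  # (1,64,56,56) (1,128,28,28)
```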
To be Continued . . .