Original article
https://blog.csdn.net/qq_33037903/article/details/88774615
Background
In 2012 AlexNet dramatically reduced the classification error rate on ImageNet, and deep neural networks entered a phase of rapid development. In 2014 the Visual Geometry Group at the University of Oxford tried building deeper networks, described in the paper as "VERY DEEP CONVOLUTIONAL NETWORKS", such as VGG16 with its 16 layers. That looks unremarkable today, but it is several times deeper than AlexNet. At that time the vanishing-gradient problem in deep networks had not been solved, and hardware such as GPUs was limited, so deep networks were hard to train. Even so, VGG was clearly the best image-classification model of its day, taking first place in the ILSVRC 2014 localization task and second place in classification. Incidentally, after 2012 the standard benchmark dataset was mainly ImageNet, later joined by Microsoft's COCO dataset.
Original paper
https://arxiv.org/pdf/1409.1556.pdf
The paper was published at ICLR (International Conference on Learning Representations) 2015; at the time of writing it has been cited 27,081 times.
My GitHub implementation
https://github.com/uestcsongtaoli/vgg_net
Model overview
This post focuses on VGG16.
The upper half of the figure above is fairly intuitive; the lower half is a common way of drawing a plain network architecture. Note that the last layer in the lower half should not be 4096; it should be the number of classes, num_classes.
The model can be divided into 5 stages (I suspect this idea comes from the 5 convolutional layers of AlexNet; see the architecture section of my AlexNet post). Each stage consists of two or three convolutions followed by a pooling layer, and the network ends with 3 fully connected layers for classification.
First define a conv block consisting of a convolution, BatchNormalization and an activation:
from keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                          Dropout, Flatten, Input, MaxPool2D)

def conv_block(layer, filters, kernel_size=(3, 3), strides=(1, 1), padding='same', name=None):
    # Conv -> BatchNorm -> ReLU, reused by every stage below
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               strides=strides,
               padding=padding,
               kernel_initializer="he_normal",
               name=name)(layer)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x
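The stage snippets below consume an input_layer tensor that is not defined in the snippets themselves; a minimal sketch, using the Input layer imported above and assuming the 224x224x3 image size used by VGG (the variable name input_layer simply matches what Stage 1 expects):

# Hypothetical input definition so that Stage 1 below has a tensor to consume;
# 224x224 RGB matches the input size used in the VGG paper.
input_layer = Input(shape=(224, 224, 3), name="input")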
Stage 1
- Conv layer Conv_1_1
- Conv layer Conv_1_2
- Pooling layer max_pool_1
x = conv_block(input_layer, filters=64, kernel_size=(3, 3), name="conv1_1_64_3x3_1")
x = conv_block(x, filters=64, kernel_size=(3, 3), name="conv1_2_64_3x3_1")
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2), name="max_pool_1_2x2_2")(x)
Stage 2
- Conv layer Conv_2_1
- Conv layer Conv_2_2
- Pooling layer max_pool_2
x = conv_block(x, filters=128, kernel_size=(3, 3), name="conv2_1_128_3x3_1")
x = conv_block(x, filters=128, kernel_size=(3, 3), name="conv2_2_128_3x3_1")
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2), name="max_pool_2_2x2_2")(x)
Stage 3
- Conv layer Conv_3_1
- Conv layer Conv_3_2
- Conv layer Conv_3_3
- Pooling layer max_pool_3
x = conv_block(x, filters=256, kernel_size=(3, 3), name="conv3_1_256_3x3_1")
x = conv_block(x, filters=256, kernel_size=(3, 3), name="conv3_2_256_3x3_1")
x = conv_block(x, filters=256, kernel_size=(3, 3), name="conv3_3_256_3x3_1")  # 3x3, matching VGG16 (configuration D) and the layer name
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2), name="max_pool_3_2x2_2")(x)
Stage 4
- Conv layer Conv_4_1
- Conv layer Conv_4_2
- Conv layer Conv_4_3
- Pooling layer max_pool_4
x = conv_block(x, filters=512, kernel_size=(3, 3), name="conv4_1_512_3x3_1")
x = conv_block(x, filters=512, kernel_size=(3, 3), name="conv4_2_512_3x3_1")
x = conv_block(x, filters=512, kernel_size=(3, 3), name="conv4_3_512_3x3_1")  # 3x3, matching VGG16 (configuration D) and the layer name
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2), name="max_pool_4_2x2_2")(x)
Stage 5
- Conv layer Conv_5_1
- Conv layer Conv_5_2
- Conv layer Conv_5_3
- Pooling layer max_pool_5
x = conv_block(x, filters=512, kernel_size=(3, 3), name="conv5_1_512_3x3_1")
x = conv_block(x, filters=512, kernel_size=(3, 3), name="conv5_2_512_3x3_1")
x = conv_block(x, filters=512, kernel_size=(3, 3), name="conv5_3_512_3x3_1")  # 3x3, matching VGG16 (configuration D) and the layer name
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2), name="max_pool_5_2x2_2")(x)
FC Layers
Three fully connected layers, with a softmax classifier at the end:
# FC layer 1
x = Flatten()(x)
x = Dense(2048)(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
x = Activation("relu")(x)
# FC layer 2
x = Dense(1024)(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
x = Activation("relu")(x)
# FC layer 3
x = Dense(num_classes)(x)
x = BatchNormalization()(x)
x = Activation("softmax")(x)
The number of units in the fully connected layers can be changed to suit the problem at hand. I felt 4096 units was overkill for a 10-class problem, so I simplified things and roughly halved the parameters; after all, the lab machines are not that powerful!
This is the table of VGG configurations at different depths from the original paper.
Personal takeaways
- After reading VGG you get the feeling that it simply reworks every layer of AlexNet: the 5 stages correspond to AlexNet's 5 convolutional layers, and the 3 fully connected layers are kept as they were.
- The input image size is still 224x224x3.
- With the deeper network, the trained model does perform noticeably better than AlexNet.
- The usual tricks are all included: max_pooling / batch_normalization / dropout (which I did not add).
- Tuning mainly consisted of using he_normal initialization and trying different optimizers such as Adamax; a sketch follows after this list.
- Since the network is not particularly deep by today's standards, VGG is still chosen as the backbone for some tasks.
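Picking up the tuning bullet above, here is a hedged sketch of how the optimizer choice might be wired up; Adamax is just one of the options mentioned, and the loss, batch size and the x_train / y_train arrays are placeholders rather than values from the original post:

from keras.optimizers import Adamax

model.compile(optimizer=Adamax(),                # one optimizer that was tried; Adam or SGD also work
              loss="categorical_crossentropy",   # assumes one-hot encoded labels
              metrics=["accuracy"])
# x_train / y_train stand in for your own preprocessed data:
# model.fit(x_train, y_train, batch_size=32, epochs=50, validation_split=0.1)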
Further reading
- Reading the VGG Network Paper and Implementing It From Scratch with Keras
  https://hackernoon.com/learning-keras-by-implementing-vgg16-from-scratch-d036733f2d5
- VGG Convolutional Neural Networks Practical
  A tutorial by the original authors, explained in detail; the source code is MATLAB.
  http://www.robots.ox.ac.uk/~vgg/practicals/cnn/index.html
- Convolutional Neural Network with VGG Architecture
  Clear architecture diagrams.
  https://betweenandbetwixt.com/2018/12/23/convolutional-neural-network-with-vgg-architecture/
- VGG in TensorFlow
  https://www.cs.toronto.edu/~frossard/post/vgg16/
Code
- VGG16 – Implementation Using Keras
  https://engmrk.com/vgg16-implementation-using-keras/
- VGG16 model for Keras
  https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3
- VGG16 model for TensorFlow
  https://github.com/machrisaa/tensorflow-vgg/blob/master/vgg16.py