Faster R-CNN: Faster Region-based Convolutional Neural Networks

http://blog.csdn.net/column/details/ym-alanyannick.html

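First of all, thanks to my senior Ao Chuan for lending me a computer to practice on!
So far I have been studying the strengths and applications of CNNs in image classification, but how to use a CNN for object detection puzzled me for a long time. After a few days of study, here are my notes.
Before Faster R-CNN there were R-CNN and Fast R-CNN. Since Faster R-CNN improves on both of them, I will start with Faster R-CNN.

Please point out any mistakes!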

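Theory

Before studying object detection, I tried to imagine how a CNN could be used for it. My first idea was to cut the image into pieces and feed them into the network. R-CNN does something similar to a sliding window, in two steps:
1. Extract region proposals
2. Classify the region proposals with a CNN

Proposal extraction methods have evolved as follows: 1. sliding window 2. Selective Search / EdgeBoxes 3. RPN (Region Proposal Network)
Other deep learning detection strategies use the strong representational power of CNNs to regress object locations directly, for example YOLO.

Relationship between R-CNN, Fast R-CNN, and Faster R-CNN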

image.png

SPP Net

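A typical CNN ends in fully connected layers or a classifier, and both need a fixed input size. The input therefore has to be cropped or warped, and this preprocessing loses content or introduces geometric distortion. The first contribution of SPP Net is to bring the spatial pyramid idea into CNNs, so that the network accepts inputs at multiple scales.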

image.png

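As the figure shows, an SPP layer is inserted between the convolutional layers and the fully connected layers. The network input can now be any size: inside the SPP layer each pooling filter adapts its size to the input, while the output size of the SPP layer stays fixed.
In R-CNN, every proposed region is first rescaled to a common size and then fed through the CNN separately, which is very inefficient.
In SPP Net, the convolutional layers run only once on the original image to produce a feature map for the whole image. The patch that each proposed region maps to on that feature map is then used as the region's convolutional feature and passed to the SPP layer and the layers after it. This saves a great deal of computation, giving a speedup of up to about 100x over R-CNN (the SPP Net paper reports 24-102x at test time).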
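To make the "variable input, fixed output" property concrete, here is a minimal NumPy sketch of spatial pyramid max pooling. It is my own illustration, not code from SPP Net: the function name `spp_pool` and the pyramid levels `(1, 2, 4)` are assumptions chosen for the example.

```python
import numpy as np

def spp_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) feature map into a fixed-length vector.

    `levels` is a hypothetical pyramid: level n splits the map into an
    n x n grid of bins whose size adapts to H and W.
    """
    h, w, c = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # floor for the start and ceil for the end of each bin,
            # so bins cover the whole map and are never empty
            r0 = (i * h) // n
            r1 = -((-(i + 1) * h) // n)  # ceil((i + 1) * h / n)
            for j in range(n):
                c0 = (j * w) // n
                c1 = -((-(j + 1) * w) // n)
                pooled.append(feature_map[r0:r1, c0:c1, :].max(axis=(0, 1)))
    return np.concatenate(pooled)

# Two feature maps of different spatial size give vectors of the same length.
small = spp_pool(np.random.rand(13, 17, 8))
large = spp_pool(np.random.rand(40, 25, 8))
print(small.shape, large.shape)
```

With 8 channels and levels 1, 2 and 4 the output always has 8 * (1 + 4 + 16) = 168 values, whatever H and W are, which is what lets the fully connected layers keep a fixed input size.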

Overall structure of Fast R-CNN

image.png

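As the figure shows, the Fast R-CNN network has two output layers: a softmax and a bbox regressor. (In R-CNN and SPP Net, by contrast, classification and regression are two separate stages; here they are integrated into one network.) It also adds an RoI pooling layer, which is similar to an SPP layer with a single scale. Note: Fast R-CNN still uses Selective Search to extract region proposals.

RoI pooling layer
This is a simplified version of SPP pooling and can be seen as a "pyramid" with only one level. The input is N feature maps, each computed from a whole image, plus a list of R RoIs (proposed regions). Each feature map is H x W x C, and each RoI is a tuple (n, r, c, h, w), where n is the index of the feature map, (r, c) is the top-left corner of the RoI, and h and w are its height and width. The output is a max-pooled feature map of size H' x W' x C, marked by the red box in the figure above.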
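The RoI pooling described above can be sketched in a few lines of NumPy. This is only an illustration under my own assumptions: the function name `roi_max_pool` is made up, the output size is set to 2 x 2 to keep the example small, and every RoI is assumed to lie fully inside its feature map.

```python
import numpy as np

def roi_max_pool(feature_maps, rois, out_h=2, out_w=2):
    """feature_maps: (N, H, W, C) array; rois: list of (n, r, c, h, w).

    Returns an (R, out_h, out_w, C) array, one pooled map per RoI.
    """
    channels = feature_maps.shape[-1]
    out = np.zeros((len(rois), out_h, out_w, channels), dtype=feature_maps.dtype)
    for k, (n, r, c, h, w) in enumerate(rois):
        # Cut the RoI out of feature map n
        patch = feature_maps[n, r:r + h, c:c + w, :]
        for i in range(out_h):
            r0 = (i * h) // out_h
            r1 = -((-(i + 1) * h) // out_h)  # ceil, so no bin is empty
            for j in range(out_w):
                c0 = (j * w) // out_w
                c1 = -((-(j + 1) * w) // out_w)
                out[k, i, j, :] = patch[r0:r1, c0:c1, :].max(axis=(0, 1))
    return out

fmap = np.arange(16).reshape(1, 4, 4, 1)
pooled = roi_max_pool(fmap, [(0, 0, 0, 4, 4), (0, 1, 1, 3, 2)])
print(pooled[..., 0])
```

Both RoIs come out as 2 x 2 maps even though one is 4 x 4 and the other 3 x 2 on the feature map, which is the whole point of the layer.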

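Overall framework of Faster R-CNN

image.png

The main contribution of Faster R-CNN is the Region Proposal Network (RPN), a network designed to extract region proposals. It replaces the time-consuming Selective Search and speeds up detection considerably. The figure shows the structure of Faster R-CNN; the yellow part is the RPN, and everything apart from the RPN is inherited from Fast R-CNN (FR-CNN).

Overall structure of the RPN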

image.png

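The RPN is structured much like Fast R-CNN. It attaches to the feature map output by the last convolutional layer and has a sliding-window layer (taking the place of Fast R-CNN's RoI layer) followed by two output layers: one outputs the probability that the window is a region proposal, the other outputs the bbox regression offsets. It is also trained in a similar way to Fast R-CNN. Note: the RPN and Fast R-CNN share their convolutional layers.

image.png

The RPN slides a small window (the red box in the figure) over the feature map output by the last convolutional layer. Each window is mapped by a fully connected layer to a 256-d vector (in practice this is implemented as a 3x3 convolution, and 256-d is the ZF model's size), which is the input to the two output layers. Each sliding-window position also corresponds to k anchor boxes; the paper uses 3 scales and 3 aspect ratios, so 3*3=9 anchors. Each anchor corresponds to a reference box on the original image, and this is how the method improves scale invariance.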
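As a rough sketch of how the 3 x 3 = 9 anchors per position can be generated: the code below is my own simplification, not the `generate_anchors.py` from py-faster-rcnn. The scales (128, 256, 512) and ratios (1:2, 1:1, 2:1) follow the paper; the function names and the feature stride of 16 are assumptions for this example.

```python
import numpy as np

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (k, 4) anchors as (x1, y1, x2, y2), centered on the origin."""
    anchors = []
    for s in scales:
        for ratio in ratios:  # ratio = height / width, area stays s * s
            w = s / np.sqrt(ratio)
            h = s * np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Copy the k base anchors to every position of a feat_h x feat_w map.

    stride is the assumed ratio between image pixels and feature-map cells.
    """
    xs = np.arange(feat_w) * stride
    ys = np.arange(feat_h) * stride
    sx, sy = np.meshgrid(xs, ys)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)
    # (positions, 1, 4) + (1, k, 4) -> (positions * k, 4)
    return (shifts[:, None, :] + anchors[None, :, :]).reshape(-1, 4)

base = make_anchors()
all_anchors = shift_anchors(base, feat_h=40, feat_w=60)
print(base.shape, all_anchors.shape)
```

For a feature map of about 60 x 40 this gives 60 * 40 * 9 = 21600 anchors, each of which gets an objectness score and a set of regression offsets from the two RPN output layers.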

Multi-task loss

image.png

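Fast R-CNN has two network output layers, which brings the bbox regression that used to be separate from the network into the network itself. It also defines a single loss function that optimizes both output layers at the same time.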

image.png
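For reference, here is a minimal sketch of such a joint loss for a single RoI, following the form given in the Fast R-CNN paper: L = L_cls(p, u) + lambda * [u >= 1] * L_loc(t^u, v), where L_cls is the log loss of the true class u and L_loc is a smooth L1 loss on the box offsets. The function names and argument layout are my own assumptions.

```python
import numpy as np

def smooth_l1(x):
    """0.5 * x^2 where |x| < 1, otherwise |x| - 0.5 (elementwise)."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def multi_task_loss(cls_probs, label, bbox_pred, bbox_target, lam=1.0):
    """Loss for one RoI.

    cls_probs:   (K + 1,) softmax output, index 0 is background
    label:       true class u
    bbox_pred:   (K + 1, 4) predicted offsets, one row per class
    bbox_target: (4,) regression target v
    """
    loss_cls = -np.log(cls_probs[label])
    if label == 0:
        # Background RoIs have no box, so only the classification term counts
        return float(loss_cls)
    loss_loc = smooth_l1(bbox_pred[label] - bbox_target).sum()
    return float(loss_cls + lam * loss_loc)

probs = np.array([0.2, 0.8])
pred = np.array([[0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 2.0, 0.0]])
print(multi_task_loss(probs, 1, pred, np.zeros(4)))
```

The indicator [u >= 1] is what the `label == 0` branch implements: the regression term is switched off for background samples.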

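RoI-centric sampling vs. image-centric sampling

RoI-centric sampling: RoIs are sampled uniformly at random from all RoIs of all images, so each SGD mini-batch contains samples from many different images (this is what SPP Net uses). In SPP Net, backpropagation does not reach the layers before the SPP pooling layer, because it would require computing the convolutional layers over the receptive field of every RoI, and that receptive field often covers the whole image. This is slow and memory-hungry. Fast R-CNN sets out to remove this limitation.
Image-centric sampling: mini-batches are sampled hierarchically, first sampling images and then sampling RoIs within them. Because the sampled RoIs are restricted to a few images, RoIs from the same image share computation and memory. With this strategy, backpropagation runs end to end and the whole network can be fine-tuned.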
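The hierarchical sampling can be sketched as follows. This is an illustration only: the function name, the dict layout, and the default of 2 images with 128 RoIs per mini-batch (64 per image) are assumptions for the example, not code from the paper.

```python
import numpy as np

def image_centric_batch(rois_per_image, num_images=2, batch_size=128, rng=None):
    """Sample a mini-batch as a list of (image_id, roi_index) pairs.

    rois_per_image: dict mapping image_id -> number of RoIs in that image.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Step 1: sample a few images
    image_ids = rng.choice(sorted(rois_per_image), size=num_images, replace=False)
    per_image = batch_size // num_images
    batch = []
    for image_id in image_ids:
        image_id = int(image_id)
        n = rois_per_image[image_id]
        # Step 2: sample RoIs only inside the chosen image
        picks = rng.choice(n, size=min(per_image, n), replace=False)
        batch.extend((image_id, int(p)) for p in picks)
    return batch

counts = {0: 2000, 1: 2000, 2: 2000, 3: 2000}
batch = image_centric_batch(counts)
print(len(batch), sorted({img for img, _ in batch}))
```

All 128 RoIs come from just 2 images, so the convolutional feature maps are computed twice per mini-batch rather than once per RoI.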

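So that the shared convolutional layers converge when training both the RPN and Fast R-CNN, the paper designs a four-step training strategy:

(1): Train the RPN end to end, with the network initialized from an ImageNet pre-trained model.
(2): Train Fast R-CNN on the region proposals generated by the RPN from step 1, again initialized from an ImageNet pre-trained model.
(3): Initialize the RPN with the Fast R-CNN parameters from the previous step, keep the convolutional layers fixed, and fine-tune only the layers unique to the RPN. (From this step on, the convolutional layers are shared.)
(4): Keep the convolutional layers fixed and fine-tune only the layers unique to Fast R-CNN.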

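Strategies and parameter settings used in training

Training sample selection and its parameters

Parameter settings in Fast R-CNN
    ims_per_batch: 1 or 2
    batch_size: 128
    fg_fraction = 0.25: the fraction of positive samples in each batch
    fg_thresh = 0.6: RoIs whose IoU with a ground-truth (GT) box is above 0.6 are positive samples
    bg_thresh_hi = 0.5, bg_thresh_lo = 0.1: RoIs whose IoU with GT is between 0.1 and 0.5 are negative samples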
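Here is a small sketch of how these thresholds could be applied when picking training samples. It is my own simplified version, not the sampling code in py-faster-rcnn: the function names are made up, boxes are (x1, y1, x2, y2) with continuous coordinates (no +1 pixel convention), and the defaults are the values listed above.

```python
import numpy as np

def iou(box, gt_boxes):
    """IoU between one box and a (G, 4) array of ground-truth boxes."""
    x1 = np.maximum(box[0], gt_boxes[:, 0])
    y1 = np.maximum(box[1], gt_boxes[:, 1])
    x2 = np.minimum(box[2], gt_boxes[:, 2])
    y2 = np.minimum(box[3], gt_boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area + gt_area - inter)

def label_rois(rois, gt_boxes, fg_thresh=0.6, bg_thresh_hi=0.5, bg_thresh_lo=0.1):
    """Return 1 (positive), 0 (negative) or -1 (unused) for each RoI."""
    max_iou = np.array([iou(r, gt_boxes).max() for r in rois])
    labels = -np.ones(len(rois), dtype=int)
    labels[max_iou > fg_thresh] = 1
    labels[(max_iou >= bg_thresh_lo) & (max_iou < bg_thresh_hi)] = 0
    return labels

def sample_rois(labels, batch_size=128, fg_fraction=0.25, rng=None):
    """Pick indices of positives and negatives for one batch."""
    if rng is None:
        rng = np.random.default_rng()
    fg = np.flatnonzero(labels == 1)
    bg = np.flatnonzero(labels == 0)
    num_fg = min(int(batch_size * fg_fraction), len(fg))
    num_bg = min(batch_size - num_fg, len(bg))
    return (rng.choice(fg, size=num_fg, replace=False),
            rng.choice(bg, size=num_bg, replace=False))

gt = np.array([[0.0, 0.0, 10.0, 10.0]])
rois = np.array([[0.0, 0.0, 10.0, 10.0],    # IoU 1.0
                 [0.0, 0.0, 10.0, 5.0],     # IoU 0.5
                 [0.0, 0.0, 10.0, 3.0],     # IoU 0.3
                 [20.0, 20.0, 30.0, 30.0]]) # IoU 0.0
print(label_rois(rois, gt))
```

Note that with fg_thresh at 0.6 and bg_thresh_hi at 0.5, RoIs with an IoU between 0.5 and 0.6 end up in neither group, and RoIs with IoU below 0.1 are also left out of the negatives.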

Implementation

Reference: http://blog.csdn.net/u012177034/article/details/52288835

1. Download the py-faster-rcnn source code

git clone --recursive https://github.com/rbgirshick/py-faster-rcnn

2. Build the lib library

cd $FRCN_ROOT/lib
make

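3. Build Caffe

This part is really painful: the Caffe version that py-faster-rcnn was written against is very old and cannot be compiled as-is. You can download the copy I provide: https://pan.baidu.com/s/1pLkIFDx password: sj9y. My configuration: GTX 1070, CUDA 8.0, cuDNN 6.5, i7. The commands below merge upstream Caffe into the bundled caffe-fast-rcnn: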

cd caffe-fast-rcnn  
git remote add caffe https://github.com/BVLC/caffe.git  
git fetch caffe  
git merge caffe/master

4. Run the demo

cd $FRCN_ROOT
./tools/demo.py

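5. Modify the demo to process a video stream

Faster R-CNN really cannot meet real-time requirements: on this configuration it runs at about 8 fps with a latency of roughly 0.5 s.
The original code draws with matplotlib, and every displayed figure has to be closed by hand. To process video it is better to use OpenCV, but note that OpenCV's drawing arguments differ from matplotlib's.
demo_vedio.py
The changes are in the vis_detections() function.
I'll just paste the whole script: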

#!/usr/bin/env python

# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------

"""
Demo script showing detections in sample images.

See README.md for installation instructions before running.
"""

import _init_paths
from fast_rcnn.config import cfg
from fast_rcnn.test import im_detect
from fast_rcnn.nms_wrapper import nms
from utils.timer import Timer
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as sio
import caffe, os, sys, cv2
import argparse

CLASSES = ('__background__',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')

NETS = {'vgg16': ('VGG16',
                  'VGG16_faster_rcnn_final.caffemodel'),
        'zf': ('ZF',
                  'ZF_faster_rcnn_final.caffemodel')}


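def vis_detections(im, class_name, dets, thresh=0.5):
    """Draw detected bounding boxes on the frame with OpenCV."""
    inds = np.where(dets[:, -1] >= thresh)[0]
    if len(inds) == 0:
        return

    font = cv2.FONT_HERSHEY_SIMPLEX
    for i in inds:
        bbox = dets[i, :4]
        score = dets[i, -1]
        # Draw inside the loop so every detection is shown, not only the last one.
        # OpenCV needs integer pixel coordinates: (x1, y1) top-left, (x2, y2) bottom-right.
        x1, y1, x2, y2 = [int(v) for v in bbox]
        cv2.rectangle(im, (x1, y1), (x2, y2), (0, 255, 0), 5)
        # Label the box with the class name and its actual score
        cv2.putText(im, '{} {:.2f}'.format(class_name, score), (x1, y2),
                    font, 1, (0, 255, 0), 2)
    cv2.imshow("im", im)
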
def demo(net, im):
    """Detect object classes in an image using pre-computed object proposals."""

    # Load the demo image
    #im_file = os.path.join(cfg.DATA_DIR, 'demo', image_name)
    #im = cv2.imread(im_file)
    
    # Detect all object classes and regress object bounds
    timer = Timer()
    timer.tic()
    scores, boxes = im_detect(net, im)
    timer.toc()
    print ('Detection took {:.3f}s for '
           '{:d} object proposals').format(timer.total_time, boxes.shape[0])

    # Visualize detections for each class
    CONF_THRESH = 0.8
    NMS_THRESH = 0.3
    for cls_ind, cls in enumerate(CLASSES[1:]):
        cls_ind += 1 # because we skipped background
        cls_boxes = boxes[:, 4*cls_ind:4*(cls_ind + 1)]
        cls_scores = scores[:, cls_ind]
        dets = np.hstack((cls_boxes,
                          cls_scores[:, np.newaxis])).astype(np.float32)
        keep = nms(dets, NMS_THRESH)
        dets = dets[keep, :]
        vis_detections(im, cls, dets, thresh=CONF_THRESH)

def parse_args():
    """Parse input arguments."""
    parser = argparse.ArgumentParser(description='Faster R-CNN demo')
    parser.add_argument('--gpu', dest='gpu_id', help='GPU device id to use [0]',
                        default=0, type=int)
    parser.add_argument('--cpu', dest='cpu_mode',
                        help='Use CPU mode (overrides --gpu)',
                        action='store_true')
    parser.add_argument('--net', dest='demo_net', help='Network to use [vgg16]',
                        choices=NETS.keys(), default='vgg16')

    args = parser.parse_args()

    return args

if __name__ == '__main__':
    cfg.TEST.HAS_RPN = True  # Use RPN for proposals

    args = parse_args()

    prototxt = os.path.join(cfg.MODELS_DIR, NETS[args.demo_net][0],
                            'faster_rcnn_alt_opt', 'faster_rcnn_test.pt')
    caffemodel = os.path.join(cfg.DATA_DIR, 'faster_rcnn_models',
                              NETS[args.demo_net][1])

    if not os.path.isfile(caffemodel):
        raise IOError(('{:s} not found.\nDid you run ./data/script/'
                       'fetch_faster_rcnn_models.sh?').format(caffemodel))

    if args.cpu_mode:
        caffe.set_mode_cpu()
    else:
        caffe.set_mode_gpu()
        caffe.set_device(args.gpu_id)
        cfg.GPU_ID = args.gpu_id
    net = caffe.Net(prototxt, caffemodel, caffe.TEST)

    print '\n\nLoaded network {:s}'.format(caffemodel)

    # Warmup on a dummy image
    im = 128 * np.ones((300, 500, 3), dtype=np.uint8)
    for i in xrange(2):
        _, _= im_detect(net, im)

    videoCapture = cv2.VideoCapture('/home/noneland/PycharmProjects/Train0707/BR2.avi') 
    success, im = videoCapture.read()
    while success :
        demo(net, im)
        success, im = videoCapture.read() 
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
    videoCapture.release()
    cv2.destroyAllWindows()
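On the point that OpenCV's drawing arguments differ from matplotlib's: matplotlib's Rectangle takes one corner plus a width and a height, with float coordinates and RGB colors in the 0..1 range, while cv2.rectangle takes two integer corner points and a BGR color in the 0..255 range. The two helpers below are hypothetical names of my own, written only to show the conversion.

```python
def mpl_rect_to_cv2(xy, width, height):
    """Turn matplotlib Rectangle arguments into the two corner points
    (pt1, pt2) that cv2.rectangle expects, as integer pixel tuples."""
    x, y = xy
    pt1 = (int(round(x)), int(round(y)))
    pt2 = (int(round(x + width)), int(round(y + height)))
    return pt1, pt2

def mpl_color_to_cv2(rgb):
    """Turn a matplotlib RGB color with channels in 0..1 into an
    OpenCV BGR color with channels in 0..255."""
    r, g, b = [int(round(255 * channel)) for channel in rgb]
    return (b, g, r)

pt1, pt2 = mpl_rect_to_cv2((10.4, 20.6), 30.0, 40.0)
red_bgr = mpl_color_to_cv2((1.0, 0.0, 0.0))
print(pt1, pt2, red_bgr)
```

The same channel order applies to the frames themselves: cv2.VideoCapture returns BGR images, which is what cv2.imshow expects, so no channel swap is needed when the whole pipeline stays in OpenCV.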

A very nice diagram of how it works:

image.png

Caffe Study Notes 8: Running Faster R-CNN and a Real-Time Demo Test
