目标检测之RCNN套路
Selective Search
- 《Selective Search for Object Recognition》
- matlab code(论文作者提供) :
- http://disi.unitn.it/~uijlings/MyHomepage/index.php#page=projects1
- python code :
- https://github.com/AlpacaDB/selectivesearch
- https://github.com/belltailjp/selective_search_py
- 解决物体识别画框的问题:
- Instead of an exhaustive search we use segmentation as selective search yielding a small set of class independent object locations.
- In contrast to the segmentation, instead of focusing on the best segmentation algorithm, we use a variety of strategies to deal with as many image conditions as possible, thereby severely reduc-ing computational costs while potentially capturing more objects accurately.
- Instead of learning an objectness measure on randomly sampled boxes, we use a bottom-up grouping procedure to generate good object locations.
- 设计原则:
- 能捕捉任意scale的物体,bottom-up
- 区分方法多样性,物体在图片里区分可能需要有n种区分方法,都要能识别出来
- 快速,尽可能少的在这一步浪费算力
- 技术方案
- Hierarchical Grouping Algorithm
- 提出四种计算similarity的方案,进行加权融合
The Region-based CNN (R-CNN)
- 《Rich feature hierarchies for accurate object detection and semantic segmentation》
- https://github.com/rbgirshick/rcnn
- 贡献
- 把神经网络从图片分类取得的好效果迁移到目标检测上了
- Approach
- 对于输入图片用上面的Selective Search技术选差不多2000个bottom-up region
- 把图片resize到cnn接受的尺寸
- cnn抽取feature
- SVM判断分类
- bounding-box regressors改善定位
- 这篇文章作者非常良心,工程化细节介绍的很详细,值得大家反复摩擦,搞清楚每一个细节,而且也给出了code(matlab晕)
- selective search到cnn的resize方案调研
- fine-tuning阶段和svm训练阶段positive sample的选取原则不同
- pre-training和fine-tuning的learning rate的设置
- mini-batch的选择方案
- 为什么还要单独训练svm而不是fine-tuning一个softmax分类器就够了
- 增加bounding box regression可以提升效果
- 错误分析工具
- 《Diagnosing error in object detectors》
- 训练及验证集的使用
- 训练集未完整标注
- 验证集进行了完整标注
Fast-RCNN
- 《Fast R-CNN》
- https://github.com/rbgirshick/fast-rcnn
- RCNN的问题:
- multi-stage:卷积神经网络算feature + SVM 做分类 + bounding-box regressors 改善定位
- 速度慢:训练慢,推理慢
- Fast-RCNN贡献:
- single-stage + multi-task loss
- 更高的准确度
- 更快的速度
- Fast-RCNN网络架构(Figure 1):
- network takes as input an entire image and a set of object proposals. The network first processes the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map.
- 用cnn算整个图片的feature vector,然后用ROI pooling layer去对应区域的feature map进行pooling操作
- for each object proposal a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map.
- Each feature vector is fed into a sequence of fully connected (fc) layers
- branch into two sibling output layers:
- one that produces softmax probability estimates over K object classes plus a catch-all “background” class
- another layer that outputs four real-valued numbers for each of the K object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes
- network takes as input an entire image and a set of object proposals. The network first processes the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map.
- 小trick
- 保证缩放不变性,引入图像金字塔来辅助
- 用SVD压缩卷积参数W,从而加速计算
- Truncated SVD can reduce detection time by more than 30% with only a small (0.3 percent- age point) drop in mAP and without needing to perform additional fine-tuning after model compression
- 《Exploiting linear structure within convolutional networks for efficient evaluation》
- 经验
- multi-task training的效果要比multi-stage training的效果要好
- 图像金字塔引入对效果提升很小,但是计算量的消耗很大
- 更多的region候选集合虽然理论上提高召回率但是准确率会有下降
Faster-RCNN
- 《Faster R-CNN: Towards real-time object detection with region proposal networks》
- https://github.com/ShaoqingRen/faster_rcnn
- https://github.com/rbgirshick/py-faster-rcnn
- 计算时proposal是计算瓶颈,提出region proposal network(RPN)来提供proposal,而不再使用之前的selective search,且RPN和预测的fast cnn模型是share feature的,从而达到加速的目的
- VGG模型可以达到5fps
- RPN模型上是一种fully-convolutional network
- 整体模型架构可见Figure2,基本思路是:
- 对整图用卷积神经网络算全局feature
- 基于图片的全局feature计算proposal
- 基于计算的proposal和全局feature计算物体识别
- 在faster-RCNN之前如果要做尺度不变性要训练时进行1)图像缩放或2)多个卷积层的filter size来处理,对训练时长增加很明显,而faster-rcnn只需要对“anchor”进行处理
- anchor的计算从算法上保证了一定是平移不变的
- 训练过程:
- 1、用ImageNet pre-trained的模型初始化RPN,end-to-end对RPN进行fine-tuning
- 2、用ImageNet pre-trained的模型初始化Fast RCNN,基于第一步RPN算的proposal训练fast rcnn,注意截止到第二步,fast rcnn和rpn还没有share feature
- 3、用第二部训练的模型来初始化RPN,训练的时候shared feature那些层是固定权重的,不参与训练,只有专属RPN的部分参与模型训练
- 4、同第三步,share feature层不参与训练,fine tuning专属fast rcnn的那些层
Mask R-CNN
- https://github.com/facebookresearch/Detectron
- inference速度:5fps,基本等同于faster-rcnn
- 训练:单台机器配备8个GPU,训练1~2天
- 主要改进:
- multi-task loss增加一个branch,用于输出像素级的mask
- 增加RoIAlign层来辅助mask预测(对比faster RCNN的ROIPool层)
联系我: