3.1 Introduction to Vision Tasks

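Computer vision tasks include the following (see Figure 1):

Image classification: determine which category an image belongs to. The output is a class.
Object detection: determine what each object is and draw a box around it. The output is a bounding box and a class.
Semantic segmentation: determine the category of every pixel. The output is a class for each pixel.
Instance segmentation: determine which object instance every pixel belongs to. The output is an instance for each pixel.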

Figure 1 Computer vision tasks

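The difference between semantic segmentation and instance segmentation is easiest to see with two overlapping sheep: semantic segmentation labels the pixels of both animals simply as "sheep", while instance segmentation also tells the two sheep apart.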
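To make the sheep example concrete, here is a minimal sketch with two tiny hand-made masks. The mask values and the helper `distinct_labels` are invented for illustration and do not come from any real dataset or library.

```python
# Toy 3x6 "image" with two sheep standing side by side.
# 0 is background in both masks.
semantic_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],   # 1 = class "sheep" for every sheep pixel
    [0, 1, 1, 1, 1, 0],
]
instance_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],   # 1 = first sheep, 2 = second sheep
    [0, 1, 1, 2, 2, 0],
]


def distinct_labels(mask):
    """Return the set of non-background labels that appear in a mask."""
    return {value for row in mask for value in row} - {0}


print(distinct_labels(semantic_mask))  # one class shared by both sheep
print(distinct_labels(instance_mask))  # one label per sheep
```

Both masks cover the same pixels; only the instance mask says where one sheep ends and the other begins.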

3.2 Object Detection

Figure 2 Object detection example

The result of object detection on an image:

target class
Bounding Box(x, y, width, height)
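A detection result of this shape can be sketched as a small record type. The class name `Detection`, the field name `label` (used because `class` is a Python keyword), and the assumption that (x, y) is the top-left corner are all illustrative choices, not something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a class label plus a bounding box."""
    label: str    # target class
    x: int        # assumed: left edge of the box, in pixels
    y: int        # assumed: top edge of the box, in pixels
    width: int
    height: int

    def corners(self):
        """Return the (top-left, bottom-right) corners of the box."""
        return (self.x, self.y), (self.x + self.width, self.y + self.height)


det = Detection(label="sheep", x=30, y=40, width=100, height=60)
print(det.corners())
```

The corner form is what drawing functions such as `cv2.rectangle` expect, so the conversion tends to come up as soon as boxes are visualized.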

3.2.1 R-CNN

Original paper

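R-CNN consists of three parts:

Region proposal: a module that generates class-agnostic region proposals. No neural network is involved here; it uses classical image-processing techniques to produce candidate regions that may contain an object.
Feature extraction: a CNN that extracts a fixed-size feature vector from each region. The CNN is used only as a feature extractor.
Classifier and bounding-box regression: each class has its own linear SVM that decides whether a candidate region belongs to that class; a bounding-box regressor then refines the position of the box.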
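The per-class linear SVM step can be sketched in a few lines. This is only an illustration of the scoring rule: the weights, biases, class names, and two-dimensional feature vector below are made up, whereas a real R-CNN uses learned weights and CNN features with thousands of dimensions.

```python
def svm_score(weights, bias, features):
    """Linear SVM decision value: w . x + b."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def classify_region(features, classifiers):
    """Score one region with every class's SVM.

    classifiers maps class name -> (weights, bias).
    Returns the classes whose score is positive, with their scores.
    """
    scores = {name: svm_score(w, b, features)
              for name, (w, b) in classifiers.items()}
    return {name: s for name, s in scores.items() if s > 0}


# Hypothetical classifiers and features, for illustration only
classifiers = {
    "sheep": ([1.0, -1.0], 0.0),
    "dog": ([-1.0, 1.0], -0.5),
}
features = [2.0, 0.5]
print(classify_region(features, classifiers))
```

Because every class has an independent classifier, a region can be accepted by one class, by several, or by none (in which case it is treated as background).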

Region Proposal
R-CNN uses the selective search algorithm for Region Proposal; its computational cost is still high.

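Figure 3 Region Proposal example

Figure 4 Original image

Figure 5 Graph-based image segmentation result

Judging from Figures 4 and 5, the selective search algorithm is still computationally expensive.
Starting from Figure 5, a clustering algorithm produces Figure 6.

Figure 6 Over-segmented image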

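The Region Proposals are then derived from the over-segmented image. The steps are:

1. Add all fine-grained segments to the candidate regions (a segment is not rectangular, of course, so it has to be converted to a rectangular region, i.e. its bounding box);
2. Among the candidate regions, merge the most similar regions according to a similarity measure, and add the merged region to the candidates;
3. Go back to step 1 and keep repeating this process.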
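The merging loop described in these steps can be sketched as follows. This is a simplified illustration, not the OpenCV implementation: the `similarity` function is a hypothetical stand-in (how tightly the enclosing rectangle fits the pair), whereas selective search combines several cues such as color and texture, and this sketch compares every pair of boxes without checking whether they are adjacent.

```python
from itertools import combinations


def union_box(a, b):
    """Smallest (x, y, w, h) rectangle that encloses boxes a and b."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)


def similarity(a, b):
    """Hypothetical measure: area of the pair / area of its enclosing box."""
    u = union_box(a, b)
    return (a[2] * a[3] + b[2] * b[3]) / (u[2] * u[3])


def propose_regions(segment_boxes):
    """Greedy bottom-up merging of bounding boxes into region proposals."""
    active = list(segment_boxes)
    proposals = list(segment_boxes)        # every fine-grained box is a candidate
    while len(active) > 1:
        # pick the most similar pair and merge it
        a, b = max(combinations(active, 2), key=lambda pair: similarity(*pair))
        active.remove(a)
        active.remove(b)
        merged = union_box(a, b)
        active.append(merged)
        proposals.append(merged)           # the merged box is a candidate too
    return proposals


boxes = [(0, 0, 10, 10), (10, 0, 10, 10), (50, 50, 5, 5)]
for box in propose_regions(boxes):
    print(box)
```

The loop ends when a single box covering everything remains, so the proposals span all scales from the smallest segments up to the whole image.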

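At this point the article, to my surprise, provides source code for an OpenCV implementation of selective search. After 3 seconds of thought, I decided to paste it here and walk through it. (Note: this algorithm lives in the contrib package, so it has to be installed with pip install opencv-contrib-python.)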

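"""Usage: python selective_search.py <image_path> <f|q>

f: fast mode (lower recall), q: quality mode (higher recall, slower).
While the window is open: m = show more boxes, l = show fewer, q = quit.
"""
import sys
import cv2

print(cv2.__version__)

if __name__ == '__main__':
    # Check the command-line arguments before using them
    if len(sys.argv) < 3:
        print(__doc__)
        sys.exit(1)

    # Enable OpenCV's optimized code paths and use 4 threads
    cv2.setUseOptimized(True)
    cv2.setNumThreads(4)

    # Read the image
    im = cv2.imread(sys.argv[1])
    if im is None:
        print('Cannot read image: {}'.format(sys.argv[1]))
        sys.exit(1)
    # Resize to a height of 200 pixels, keeping the aspect ratio
    newHeight = 200
    newWidth = int(im.shape[1] * newHeight / im.shape[0])
    im = cv2.resize(im, (newWidth, newHeight))

    # Create a Selective Search Segmentation object with default parameters
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()

    # Set the image the algorithm will run on
    ss.setBaseImage(im)

    # Fast mode: quicker, but lower recall
    if sys.argv[2] == 'f':
        ss.switchToSelectiveSearchFast()
    # Quality mode: higher recall, but slower
    elif sys.argv[2] == 'q':
        ss.switchToSelectiveSearchQuality()
    else:
        print(__doc__)
        sys.exit(1)

    # Run the algorithm; each result is a rectangle (x, y, w, h)
    rects = ss.process()
    print('Total Number of Region Proposals: {}'.format(len(rects)))

    # Number of region proposals to show initially
    numShowRects = 100
    # Step by which to increase/decrease the number
    # of region proposals shown
    increment = 50

    while True:
        # Draw on a copy so the resized image stays clean
        imOut = im.copy()

        # Draw the first numShowRects region proposals
        for x, y, w, h in rects[:numShowRects]:
            cv2.rectangle(imOut, (x, y), (x + w, y + h), (0, 255, 0),
                          1, cv2.LINE_AA)

        # Show the output and wait for a key press
        cv2.imshow("Output", imOut)
        k = cv2.waitKey(0) & 0xFF

        if k == ord('m'):
            # m: show more proposals
            numShowRects += increment
        elif k == ord('l') and numShowRects > increment:
            # l: show fewer proposals
            numShowRects -= increment
        elif k == ord('q'):
            # q: quit
            break

    cv2.destroyAllWindows()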

Feature Extraction

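Feature extraction uses AlexNet. In accuracy and speed this network falls short of SqueezeNet, MobileNet, ShuffleNet, MNasNet, and so on, but for its time it was groundbreaking.
One point to note is that Regions of different sizes have to be converted to a single, uniform input size before they can be fed to the network. This is a weakness of R-CNN, and it was addressed by its successors (Fast R-CNN introduced RoI pooling, which Faster R-CNN keeps).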
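The resizing step can be sketched as a nearest-neighbour warp of a cropped region to a fixed output size. This is a simplified illustration on a nested-list "image"; the function name `warp_region` is made up, and in practice one would more likely crop the array and call `cv2.resize` on it (the R-CNN paper warps each region to 227 x 227 for AlexNet).

```python
def warp_region(image, box, out_size):
    """Warp the region box = (x, y, w, h) of image to out_size = (out_w, out_h).

    Uses nearest-neighbour sampling and ignores the aspect ratio,
    so the region is stretched or squeezed to fit.
    """
    x, y, w, h = box
    out_w, out_h = out_size
    return [
        [image[y + (r * h) // out_h][x + (c * w) // out_w]
         for c in range(out_w)]
        for r in range(out_h)
    ]


# A 4x4 toy image whose pixel value encodes its position
image = [[row * 4 + col for col in range(4)] for row in range(4)]

# Two regions of different sizes end up with the same 2x2 shape
print(warp_region(image, (0, 0, 4, 4), (2, 2)))
print(warp_region(image, (1, 1, 2, 2), (2, 2)))
```

Every proposal goes through this warp and then through the CNN separately, which is a large part of why R-CNN is slow.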