Image Recognition and Convolutional Neural Networks

Author: 又双叒叕苟了一天 | Published: 2018-08-03 11:05

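Applications: natural language processing, drug discovery, disaster and climate detection, Go-playing AI, and more.

Difference from a fully connected neural network: only some of the nodes in adjacent layers are connected. To show the dimensions of each layer's neurons, the nodes of each convolutional layer are usually organized as a three-dimensional matrix.

Drawback of fully connected networks in image recognition: the number of parameters grows very large, which slows computation and easily leads to overfitting.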

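Components of a convolutional neural network

A CNN is mainly built from the following 5 structures:

Input layer: a three-dimensional matrix represents one image. The height and width of the matrix are the image size, and the depth is the number of color channels, e.g. 1 for grayscale and 3 for RGB.
Convolutional layer: each node takes only a small patch of the previous layer as input, usually $3\times3$ or $5\times5$ . The layer analyzes each patch more deeply to obtain more abstract features. In general, the depth of the node matrix increases after a convolutional layer.
Pooling layer: does not change the depth of the three-dimensional matrix, but shrinks its width and height, which reduces the number of parameters in later layers. It can be seen as converting a higher-resolution image into a lower-resolution one.
Fully connected layer: after several rounds of convolution and pooling, 1-2 fully connected layers produce the output. The convolutional and pooling layers can be seen as feature extraction, and the final fully connected layers perform the classification.
Softmax layer: converts the output into a probability distribution.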
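
As a rough illustration of what the softmax layer does, here is a minimal pure-Python sketch. The function name `softmax` and the sample scores are made up for this example; this is not the TensorFlow implementation.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into a probability distribution."""
    # Subtracting the maximum keeps exp() from overflowing and does not change the result.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three positive numbers, the largest for the largest score
print(sum(probs))  # very close to 1.0
```

The outputs are all positive and sum to 1, so they can be read as class probabilities.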

Convolutional layer

The filter, also called the kernel, is the most important part of a convolutional layer.

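The filter size is usually $3\times3$ or $5\times5$ . A filter transforms a patch of that size in the node matrix being processed into a node matrix of unit width and height whose depth equals the filter depth.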
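
To make the filter computation concrete, here is a minimal pure-Python sketch that slides one 2-D filter over a single-channel image with stride 1 and no padding, then applies ReLU. The name `conv2d_single` and the tiny 3x3 input are made up for this example. A real convolutional layer also sums over the input depth and repeats this once per output channel.

```python
def conv2d_single(image, kernel, bias=0.0):
    """Slide one 2-D filter over a 2-D image (stride 1, no padding), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch whose top-left corner is (i, j), plus the bias.
            s = bias
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(max(0.0, s))  # ReLU
        output.append(row)
    return output

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_single(image, kernel)
print(result)  # [[6.0, 8.0], [12.0, 14.0]]
```

Each output value comes from one patch, which is the "patch to unit-size node" mapping described above.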

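A convolutional layer usually keeps the width and height of the node matrix unchanged, which is done by zero-padding the border; without padding, the output is smaller than the input. The output size can also be adjusted by changing the stride.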
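
The effect of padding and stride on the output size can be sketched as follows. The helper name `conv_output_size` is made up for this example; the two formulas are the ones TensorFlow documents for `padding="SAME"` and `padding="VALID"`.

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one dimension for a convolution or pooling filter."""
    if padding == "SAME":
        # Zero-padding is added so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: the filter must fit entirely inside the input.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(conv_output_size(32, 5, 1, "SAME"))   # 32
print(conv_output_size(32, 5, 1, "VALID"))  # 28
print(conv_output_size(28, 2, 2, "SAME"))   # 14
```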

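If a convolutional layer uses $5\times5$ filters with depth 16 on an input of depth 3, it has $5\times5\times3\times16+16=1216$ parameters. By contrast, a fully connected layer on a $32\times32\times3$ input matrix has about 1.5M parameters (for example $32\times32\times3\times500+500=1536500$ , assuming 500 hidden nodes). Convolutional networks therefore greatly reduce the number of parameters.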
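
The parameter counts can be checked with a few lines of Python. The function names are made up for this example, and the 500 hidden nodes for the fully connected layer are an assumption, chosen because it gives a count of roughly 1.5M.

```python
def conv_layer_params(filter_h, filter_w, in_depth, out_depth):
    """Weights of all filters plus one bias per output channel."""
    return filter_h * filter_w * in_depth * out_depth + out_depth

def fc_layer_params(n_in, n_out):
    """One weight per input/output pair plus one bias per output node."""
    return n_in * n_out + n_out

conv_count = conv_layer_params(5, 5, 3, 16)
fc_count = fc_layer_params(32 * 32 * 3, 500)  # 500 hidden nodes is an assumption
print(conv_count)  # 1216
print(fc_count)    # 1536500
```

Note that the convolutional count does not depend on the image size, only on the filter size and the depths.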

Forward-propagation code for a convolutional layer:

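import tensorflow as tf

# Example input batch of 32x32 RGB images (shape assumed for illustration).
input_tensor = tf.placeholder(tf.float32, [None, 32, 32, 3], name="input")

# Filter weights: height, width, input depth, output depth.
filter_weight = tf.get_variable("weights", [5, 5, 3, 16], initializer=tf.truncated_normal_initializer(stddev=0.1))

# One bias per output channel.
biases = tf.get_variable("biases", [16], initializer=tf.constant_initializer(0.1))

# Forward propagation of the convolutional layer: add the bias, then apply ReLU for non-linearity.
# strides controls the step size; padding controls the padding scheme.
conv = tf.nn.conv2d(input_tensor, filter_weight, strides=[1, 1, 1, 1], padding="SAME")
bias = tf.nn.bias_add(conv, biases)
actived_conv = tf.nn.relu(bias)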

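Pooling layer

A pooling layer shrinks the matrix very effectively, which both speeds up computation and helps prevent overfitting.

A pooling layer that takes the maximum is called a max-pooling layer, and one that takes the mean is called an average-pooling layer. Max pooling is the most common; average pooling is used much less often.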
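
The difference between the two pooling types can be seen in a small pure-Python sketch on a single-channel matrix without padding. The name `pool2d` and the 4x4 input are made up for this example.

```python
def pool2d(matrix, size, stride, mode="max"):
    """Apply max or average pooling to a 2-D matrix (no padding)."""
    out = []
    for i in range(0, len(matrix) - size + 1, stride):
        row = []
        for j in range(0, len(matrix[0]) - size + 1, stride):
            # Collect the values inside the current pooling window.
            window = [matrix[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        out.append(row)
    return out

m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
max_pooled = pool2d(m, 2, 2, "max")
avg_pooled = pool2d(m, 2, 2, "avg")
print(max_pooled)  # [[6, 8], [14, 16]]
print(avg_pooled)  # [[3.5, 5.5], [11.5, 13.5]]
```

With a 2x2 window and stride 2, the 4x4 matrix shrinks to 2x2 in both cases.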

Forward-propagation code for a pooling layer:

pool = tf.nn.max_pool(actived_conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding="SAME")

The corresponding function for average pooling is tf.nn.avg_pool().

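A LeNet-5 model program for MNIST image recognition

I may have changed a few of the parameters; you can also tune them yourself to speed up the run.
Forward propagation: mnist_inference.py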

import tensorflow as tf

INPUT_NODE = 784
OUTPUT_NODE = 10

IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10

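# Filter size and depth of the first convolutional layer
CONV1_DEEP = 6
CONV1_SIZE = 5
# Filter size and depth of the second convolutional layer
CONV2_DEEP = 16
CONV2_SIZE = 5
# Number of nodes in the fully connected layer
FC_SIZE = 120

# Dropout is used during training but not during testing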
def inference(input_tensor, train, regularizer):
    with tf.variable_scope("layer1-conv"):
        # Filter parameters: height, width, input depth, output depth
        conv1_weights = tf.get_variable("weight", [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
                                        initializer=tf.truncated_normal_initializer(stddev=0.1))
        # One bias per output channel
        conv1_biases = tf.get_variable("bias", [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        # Forward pass: the two middle stride values set the step along height and width,
        # the outer two are always 1; padding="SAME" pads with zeros
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding="SAME")
        # Add the bias and apply the ReLU activation
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    with tf.name_scope("layer2-pool1"):
        # ksize sets a 2x2 pooling filter; the stride is 2 along height and width; zero-padding is used
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # Second convolutional layer
    with tf.variable_scope("layer3-conv2"):
        conv2_weights = tf.get_variable("weight", [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
                                        initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable("bias", [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding="SAME")
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    # Second pooling layer
    with tf.name_scope("layer4-pool2"):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    # Shape of the second pooling layer's output, 4-D: batch size, height, width, depth
    pool_shape = pool2.get_shape().as_list()
    # Number of values per sample
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
    # Flatten each sample into a vector
    reshaped = tf.reshape(pool2, [pool_shape[0], nodes])

    # First fully connected layer
    with tf.variable_scope("layer5-fc1"):
        fc1_weights = tf.get_variable("weight", [nodes, FC_SIZE],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection("losses", regularizer(fc1_weights))
        fc1_biases = tf.get_variable("bias", [FC_SIZE], initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)

        # Apply dropout only during training (keep probability 0.5)
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    with tf.variable_scope("layer6-fc2"):
        fc2_weights = tf.get_variable("weight", [FC_SIZE, NUM_LABELS],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection("losses", regularizer(fc2_weights))
        fc2_biases = tf.get_variable("bias", [NUM_LABELS],
                                     initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases
    return logit
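
To see why the first fully connected layer receives the number of inputs it does, the layer shapes can be traced by hand. Below is a minimal sketch with made-up helper names (`same_out`, `lenet5_shapes`), assuming the constants above and `padding="SAME"` everywhere, as in the code.

```python
import math

def same_out(size, stride):
    """Output length along one dimension with padding='SAME'."""
    return math.ceil(size / stride)

def lenet5_shapes(image_size=28, channels=1, conv1_deep=6, conv2_deep=16):
    """Trace (height, width, depth) through the conv/pool layers of inference()."""
    shapes = [("input", (image_size, image_size, channels))]
    s = same_out(image_size, 1)  # conv, stride 1
    shapes.append(("layer1-conv", (s, s, conv1_deep)))
    s = same_out(s, 2)           # pool, stride 2
    shapes.append(("layer2-pool1", (s, s, conv1_deep)))
    s = same_out(s, 1)           # conv, stride 1
    shapes.append(("layer3-conv2", (s, s, conv2_deep)))
    s = same_out(s, 2)           # pool, stride 2
    shapes.append(("layer4-pool2", (s, s, conv2_deep)))
    nodes = s * s * conv2_deep   # length of the flattened vector per sample
    return shapes, nodes

shapes, nodes = lenet5_shapes()
for name, shape in shapes:
    print(name, shape)
print("flattened nodes:", nodes)  # 784
```

So `fc1_weights` has shape [784, 120] with these settings. That the flattened size equals the 784 input pixels is a coincidence of this configuration.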

Training: mnist_train.py

import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_inference
import numpy as np
from tqdm import tqdm

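BATCH_SIZE = 100  # update the parameters once per 100 examples
LEARNING_RATE_BASE = 0.8  # initial learning rate; if the loss diverges, try a much smaller value such as 0.01
LEARNING_RATE_DECAY = 0.99  # learning-rate decay rate
REGULARAZTION_RATE = 0.0001  # regularization coefficient
TRAINING_STEPS = 30000  # number of training steps
MOVING_AVERAGE_DECAY = 0.99  # decay rate of the moving-average model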
MODEL_SAVE_PATH = "./"
MODEL_NAME = "model.ckpt"

def train(mnist):
    # Placeholders for the input images and the labels
    x = tf.placeholder(tf.float32, [
        BATCH_SIZE,
        mnist_inference.IMAGE_SIZE,
        mnist_inference.IMAGE_SIZE,
        mnist_inference.NUM_CHANNELS],
                       name="x-input")
    y_ = tf.placeholder(tf.float32, [None, mnist_inference.OUTPUT_NODE], name="y-input")

    # Create the L2 regularizer and build the forward pass with it
    regularizer = tf.contrib.layers.l2_regularizer(REGULARAZTION_RATE)
    y = mnist_inference.inference(x, True, regularizer)

    # Variable that is incremented on every parameter update
    global_step = tf.Variable(0, trainable=False)

    # Create the moving-average model and apply it to all trainable variables
    variable_average = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variable_average_op = variable_average.apply(tf.trainable_variables())

    # Cross-entropy loss plus the regularization terms
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection("losses"))

    # Learning rate with exponential decay
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        mnist.train.num_examples / BATCH_SIZE,
        LEARNING_RATE_DECAY)

    # Optimization algorithm
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Bundle the training step and the moving-average update into one op
    with tf.control_dependencies([train_step, variable_average_op]):
        train_op = tf.no_op(name="train")

    saver = tf.train.Saver()

    with tf.Session() as sess:
        # Initialize the variables
        tf.global_variables_initializer().run()

        for i in tqdm(range(TRAINING_STEPS)):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            reshaped_xs = np.reshape(xs, (BATCH_SIZE,
                                          mnist_inference.IMAGE_SIZE,
                                          mnist_inference.IMAGE_SIZE,
                                          mnist_inference.NUM_CHANNELS))
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: reshaped_xs, y_: ys})
            if i % 1000 == 0:
                print("After %d training step(s), loss on training "
                      "batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

def main(argv=None):
    mnist = input_data.read_data_sets("./MNIST_data/", one_hot=True)
    train(mnist)

if __name__ == '__main__':
    tf.app.run()
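
For reference, the schedule that tf.train.exponential_decay computes with its default staircase=False can be written out in plain Python. The function name `decayed_learning_rate` is made up for this example, and 550 decay steps assumes 55000 training examples with a batch size of 100.

```python
def decayed_learning_rate(base_rate, global_step, decay_steps, decay_rate):
    """base_rate * decay_rate ** (global_step / decay_steps), i.e. continuous decay."""
    return base_rate * decay_rate ** (global_step / decay_steps)

# Assumed: 55000 training examples / batch size 100 = 550 steps per epoch.
for step in (0, 550, 5500, 30000):
    print(step, decayed_learning_rate(0.8, step, 550, 0.99))
```

With these settings the learning rate is multiplied by 0.99 once per pass over the training set.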

Evaluation on the validation set: mnist_eval.py

import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_inference
import mnist_train
import numpy as np

EVAL_INTERVAL_SECS = 10

def evaluate(mnist):
    with tf.Graph().as_default() as g:
        x = tf.placeholder(tf.float32, [
            mnist.validation.num_examples,
            mnist_inference.IMAGE_SIZE,
            mnist_inference.IMAGE_SIZE,
            mnist_inference.NUM_CHANNELS],
                           name="x-input")
        y_ = tf.placeholder(tf.float32, [None, mnist_inference.OUTPUT_NODE], name="y-input")

        reshaped_xs = np.reshape(mnist.validation.images, (mnist.validation.num_examples,
                                      mnist_inference.IMAGE_SIZE,
                                      mnist_inference.IMAGE_SIZE,
                                      mnist_inference.NUM_CHANNELS))
        # Feed dict for the validation set
        validate_feed = {x: reshaped_xs, y_: mnist.validation.labels}

        # Forward pass without dropout or regularization
        y = mnist_inference.inference(x, False, None)

        # Accuracy computation
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        # Restore the moving-average values saved during training
        variable_averages = tf.train.ExponentialMovingAverage(mnist_train.MOVING_AVERAGE_DECAY)
        variable_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variable_to_restore)

        # Compute the accuracy every EVAL_INTERVAL_SECS seconds
        while True:
            with tf.Session() as sess:
                # get_checkpoint_state reads the checkpoint file to find the latest model file in this directory
                ckpt = tf.train.get_checkpoint_state(mnist_train.MODEL_SAVE_PATH)

                if ckpt and ckpt.model_checkpoint_path:
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    accuracy_score = sess.run(accuracy, feed_dict=validate_feed)
                    print("After %s training step(s), validation "
                          "accuracy = %g" % (global_step, accuracy_score))
                else:
                    print("No checkpoint file found")
                    return
            time.sleep(EVAL_INTERVAL_SECS)

def main(argv=None):
    mnist = input_data.read_data_sets("./MNIST_data/", one_hot=True)
    evaluate(mnist)

if __name__ == '__main__':
    tf.app.run()
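
The evaluation script restores the moving-average ("shadow") values of the variables instead of their raw values. Below is a minimal sketch of the update rule that tf.train.ExponentialMovingAverage documents, including the smaller effective decay used early on when num_updates is passed. The function name `ema_update` and the sample numbers are made up for this example.

```python
def ema_update(shadow, value, decay, num_updates=None):
    """One moving-average step: shadow = decay * shadow + (1 - decay) * value."""
    if num_updates is not None:
        # Early in training a smaller decay lets the average catch up faster.
        decay = min(decay, (1 + num_updates) / (10 + num_updates))
    return decay * shadow + (1 - decay) * value

# Hypothetical variable that jumps from 0 to 5 at step 0.
print(ema_update(0.0, 5.0, 0.99, num_updates=0))  # close to 4.5 (effective decay 0.1)
print(ema_update(0.0, 5.0, 0.99))                 # close to 0.05 (fixed decay 0.99)
```

In the training script the step counter global_step is passed as num_updates, so the averages track the variables closely in the first steps and become smoother later.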
