{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 神经网络学习"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 人工智能是什么呢?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"人工智能被分为强人工智能和弱人工智能:\n",
"+ 强人工智能:在各方面都能够和人类比肩的人工智能,“一种宽泛的心理能力,能够进行思考、计划、解决问题、抽象思理解复杂理念、快速学习和从经验中学习等”\n",
"+ 弱人工智能:擅长于单个方面的人工智能,比如AlphaGo。"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"弱人工智能,本质上还是人类为计算机设计的算法,一系列定义良好的计算过程,能够将输入数据转化成输出数据。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"神经网络模拟人脑神经元的连接来达到学习功能,通过逐层抽象将输入数据逐层映射为概念等高等语义。"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"<div align=\"center\">\n",
"<img src=\"https://imgbed.momodel.cn/alg2.png\" width = 600>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1 人脑神经机制"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"首先,人脑神经元的活动和学习机制是怎样的呢?"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" <iframe\n",
" width=\"800\"\n",
" height=\"600\"\n",
" src=\"https://view.officeapps.live.com/op/view.aspx?src=https://files.momodel.cn/nn_brain_3.pptx\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" ></iframe>\n",
" "
],
"text/plain": [
"<IPython.lib.display.IFrame at 0x7fb8fceea7d0>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import IFrame\n",
"src = 'https://files.momodel.cn/nn_brain_3.pptx'\n",
"IFrame('https://view.officeapps.live.com/op/view.aspx?src='+src, width=800, height=600)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"神经学假说认为:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<table>\n",
" <tr>\n",
" <td ><center><img src=\"http://imgbed.momodel.cn/微信图片_20200114133721.png\"/></center></td>\n",
" <td><center><img src=\"http://imgbed.momodel.cn/微信图片_20200114133731.png\"/></center></td>\n",
" <td><center><img src=\"http://imgbed.momodel.cn/微信图片_20200114133746.png\"/></center></td>\n",
" </tr>\n",
"</table>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"高层的特征是低层特征的组合,从低层到高层的特征表示越来越抽象,越来越能表现语义。\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"人工智能中神经网络正是体现“逐层抽象、渐进学习”机制的学习模型。"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<img src=\"http://imgbed.momodel.cn/微信图片_20200114133755.png\"/>\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"人眼在辨识图片时,会先提取边缘特征,再识别部件,最后再得到最高层的模式。"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<img src=\"http://imgbed.momodel.cn//20200103102429.png\" width=500>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"神经元的功能:\n",
"+ 物理反应:将前序神经元所传递过来的信息按联结权重累加\n",
"\n",
"$ 神经元i获得信息 = 神经元k信息 \\times 联结权重k + 神经元l信息 \\times 联结权重l + 神经元m信息 \\times 联结权重m $\n",
"\n",
"+ 化学反应:对神经元i获得的信息施以一个非线性变换(通过激活函数),激活若干信息、而非“来者不拒”\n",
"+ 信息流通:将神经元非线性变换后所得的信息继续向后传递"
]
},
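{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three steps above can be traced by hand. The numbers below are made up purely for illustration, and using the sigmoid function (introduced in Section 2.2) as the nonlinear transformation is an assumption, not something this section prescribes. Suppose neuron $i$ receives the values $1.0$, $2.0$ and $3.0$ from neurons $k$, $l$ and $m$ through connection weights $0.5$, $-1.0$ and $0.4$. The weighted sum is\n",
"\n",
"$$ s_i = 1.0 \\times 0.5 + 2.0 \\times (-1.0) + 3.0 \\times 0.4 = -0.3 $$\n",
"\n",
"and the nonlinear transformation gives\n",
"\n",
"$$ a_i = \\frac{1}{1+e^{-s_i}} = \\frac{1}{1+e^{0.3}} \\approx 0.426 $$\n",
"\n",
"The value $a_i$ is what neuron $i$ would pass on to the neurons that follow it. The symbols $s_i$ and $a_i$ are introduced here only for this sketch."
]
},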
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2 感知机模型"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 2.1 感知机模型:\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"<table>\n",
" <tr>\n",
" <td ><img src=\"http://imgbed.momodel.cn/感知器模型.png\" width=300/></td>\n",
" <td><img src=\"http://imgbed.momodel.cn//20200208141322.png\" width=400></td>\n",
" </tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"**输入项**:3个,$x_1,x_2,x_3$ \n",
"**神经元**:1个,用圆圈表示 \n",
"**权重**:每个输入项均通过权重与神经元相连(比如 $w_i$ 是 $x_i$ 与神经元相连的权重) \n",
"**输出**:1个\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"**工作方法**:\n",
"+ 计算输入项传递给神经元的信息加权总和,即:$y_{sum} = w_1x_1+w_2x_2+w_3x_3$\n",
"+ 如果 $y_{sum}$ 大于某个预定阀值(比如 0.5),则输出为 1,否则为 0 。\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 激活函数"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"在输出的判断上,其实不仅可以简单的按照阈值来判断,可以通过一个函数来进行计算,这个函数称为**激活函数**。\n",
"\n",
"常见的激活函数有: sigmoid,tanh,relu 等。\n",
"\n",
"下面我们看看这些激活函数的曲线图。"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"\n",
"def plot_activation_function(activation_function):\n",
" \"\"\"\n",
" 绘制激活函数\n",
" :param activation_function: 激活函数名\n",
" :return:\n",
" \"\"\"\n",
" x = np.arange(-10, 10, 0.1)\n",
" y_activation_function = activation_function(x)\n",
"\n",
" # 绘制坐标轴\n",
" ax = plt.gca()\n",
" ax.spines['right'].set_color('none')\n",
" ax.spines['top'].set_color('none')\n",
" ax.xaxis.set_ticks_position('bottom')\n",
" ax.yaxis.set_ticks_position('left')\n",
" ax.spines['bottom'].set_position(('data', 0))\n",
" ax.spines['left'].set_position(('data', 0))\n",
"\n",
" # 绘制曲线图\n",
" plt.plot(x, y_activation_function)\n",
" \n",
" # 展示函数图像\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.sigmoid函数\n",
"\n",
"Sigmoid函数是一个在生物学中常见的S型函数,也称为S型生长曲线。在信息科学中,由于其单增以及反函数单增等性质,Sigmoid函数常被用作神经网络的阈值函数,将变量映射到0,1之间。"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"$$ f(x) = \\frac{1}{1+e^{-x}}$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def sigmoid(x):\n",
" \"\"\"\n",
" sigmoid函数\n",
" :param x: np.array 格式数据\n",
" :return: sigmoid 函数\n",
" \"\"\"\n",
" y = \n",
" \n",
" return y\n",
"\n",
"# 绘制 sigmoid 函数图像\n",
"plot_activation_function(sigmoid)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"def sigmoid(x):\n",
" \"\"\"\n",
" sigmoid函数\n",
" :param x: np.array 格式数据\n",
" :return: sigmoid 函数\n",
" \"\"\"\n",
" y = 1/(1+np.exp(-x))\n",
" \n",
" return y\n",
"\n",
"# 绘制 sigmoid 函数图像\n",
"plot_activation_function(sigmoid)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"优点:\n",
"+ Sigmoid 函数的输出映射在(0,1)之间,单调连续,输出范围有限,优化稳定,可以用作输出层。它在物理意义上最为接近生物神经元。\n",
"+ 求导容易。\n",
"\n",
"缺点:\n",
"+ 由于其软饱和性,容易产生梯度消失,导致训练出现问题。\n",
"+ 其输出并不是以0为中心的。"
]
},
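{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both the easy derivative and the vanishing gradient noted above can be seen by writing the derivative out. This is a standard calculus result rather than something specific to this notebook:\n",
"\n",
"$$ f'(x) = \\frac{e^{-x}}{(1+e^{-x})^2} = f(x)\\,\\bigl(1-f(x)\\bigr) $$\n",
"\n",
"The derivative reuses the value $f(x)$ already computed in the forward pass. It peaks at $x=0$, where $f'(0) = 0.5 \\times 0.5 = 0.25$, and shrinks quickly as $|x|$ grows: at $x=10$, $f'(10) \\approx 4.5 \\times 10^{-5}$. Since backpropagation multiplies such factors layer by layer, a chain of saturated sigmoid units can drive the gradient toward zero."
]
},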
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.tanh函数\n",
"tanh是双曲函数中的一个,$Tanh()$为双曲正切。在数学中,双曲正切“Tanh”是由基本双曲函数双曲正弦和双曲余弦推导而来。\n",
"\n",
"公式如下:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"$$ f(x) = \\frac{1-e^{-2x}}{1+e^{-2x}}$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def tanh(x):\n",
" \"\"\"\n",
" tanh函数\n",
" :param x: np.array 格式数据\n",
" :return: tanh 函数\n",
" \"\"\"\n",
" y = \n",
" return y\n",
"\n",
"# 绘制 tanh 函数图像\n",
"plot_activation_function(tanh)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"def tanh(x):\n",
" \"\"\"\n",
" tanh函数\n",
" :param x: np.array 格式数据\n",
" :return: tanh 函数\n",
" \"\"\"\n",
" y = (1-np.exp(-2*x))/(1+np.exp(-2*x))\n",
" return y\n",
"\n",
"# 绘制 tanh 函数图像\n",
"plot_activation_function(tanh)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"优点:\n",
"+ 比Sigmoid函数收敛速度更快。\n",
"+ 相比Sigmoid函数,其输出以0为中心。\n",
"\n",
"缺点:\n",
"+ 还是没有改变Sigmoid函数的最大问题——由于饱和性产生的梯度消失。"
]
},
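{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comparison with the sigmoid function can be made concrete. Writing $\\sigma(x) = 1/(1+e^{-x})$ for the sigmoid function (the symbol $\\sigma$ is introduced here only for convenience), tanh turns out to be a rescaled and shifted sigmoid:\n",
"\n",
"$$ \\tanh(x) = \\frac{1-e^{-2x}}{1+e^{-2x}} = \\frac{2}{1+e^{-2x}} - 1 = 2\\sigma(2x) - 1 $$\n",
"\n",
"so its output lies in $(-1, 1)$ and is centered at 0. Its derivative is\n",
"\n",
"$$ \\tanh'(x) = 1 - \\tanh^2(x) $$\n",
"\n",
"which equals $1$ at $x=0$ (compared with $0.25$ for the sigmoid function) but still tends to $0$ as $|x|$ grows. That is the saturation behind the vanishing-gradient disadvantage listed above. The larger slope near zero is one commonly cited reason for the faster convergence, though it is best read as a heuristic rather than a guarantee."
]
},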
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3.ReLU函数\n",
"\n",
"Relu激活函数(The Rectified Linear Unit),用于隐层神经元输出。公式如下:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"$$ f(x) = max(0, x)$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def relu(x):\n",
" \"\"\"\n",
" relu 函数\n",
" :param x: np.array 格式数据\n",
" :return: relu 函数\n",
" \"\"\"\n",
" \n",
" y = \n",
"\n",
" return y\n",
"\n",
"# 绘制 relu 函数\n",
"plot_activation_function(relu)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"def relu(x):\n",
" \"\"\"\n",
" relu 函数\n",
" :param x: np.array 格式数据\n",
" :return: relu 函数\n",
" \"\"\"\n",
" \n",
" y = np.maximum(0,x)\n",
"\n",
" return y\n",
"\n",
"# 绘制 relu 函数\n",
"plot_activation_function(relu)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"优点:\n",
"+ 因为它线性、非饱和的形式,ReLU在SGD中能够快速收敛。\n",
"+ 有效缓解了梯度消失的问题。\n",
"+ 提供了神经网络的稀疏表达能力。\n",
"\n",
"缺点:\n",
"+ 随着训练的进行,可能会出现神经元死亡,权重无法更新的情况。如果发生这种情况,那么流经神经元的梯度从这一点开始将永远是0。也就是说,ReLU神经元在训练中不可逆地死亡了。"
]
},
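{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both the gradient advantage and the dying-neuron problem above follow from the derivative of ReLU, a standard result stated here as a sketch:\n",
"\n",
"$$ f'(x) = \\begin{cases} 1, & x > 0 \\\\ 0, & x < 0 \\end{cases} $$\n",
"\n",
"For positive inputs the derivative is exactly $1$, so the gradient passes through unchanged however large $x$ is, unlike the sigmoid and tanh functions. For negative inputs the derivative is $0$. If a neuron's weighted input is negative for every training sample, no gradient reaches its weights and they stop changing, which is the dead-neuron situation. The function is not differentiable at $x=0$, and implementations simply pick a value there by convention."
]
},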
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 单层感知机实例"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"我们根据上面的定义可以编写一个简单的感知机模型。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def perceptron(x, w, threshold):\n",
" \"\"\"\n",
" 感知机模型\n",
" :param x: 输入数据 np.array 格式\n",
" :param w: 权重 np.array 格式,需要与 x 一一对应\n",
" :param threshold: 阀值\n",
" :return: 0或者1\n",
" \"\"\"\n",
" x = np.array(x)\n",
" w = np.array(w)\n",
" \n",
" #计算信息加权总和\n",
" y_sum = \n",
" \n",
" # 计算输出大于阀值返回 1,否则返回 0\n",
" output = \n",
" return output\n",
"\n",
"\n",
"# 输入数据\n",
"x = np.array([1, 1, 4])\n",
"# 输入权重\n",
"w = np.array([0.5, 0.2, 0.3])\n",
"# 返回结果\n",
"perceptron(x, w, 0.8)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def perceptron(x, w, threshold):\n",
" \"\"\"\n",
" 感知机模型\n",
" :param x: 输入数据 np.array 格式\n",
" :param w: 权重 np.array 格式,需要与 x 一一对应\n",
" :param threshold: 阀值\n",
" :return: 0或者1\n",
" \"\"\"\n",
" x = np.array(x)\n",
" w = np.array(w)\n",
" \n",
" #计算信息加权总和\n",
" y_sum = np.sum(w * x)\n",
" \n",
" # 计算输出大于阀值返回 1,否则返回 0\n",
" output = 1 if y_sum > threshold else 0\n",
" return output\n",
"\n",
"\n",
"# 输入数据\n",
"x = np.array([1, 1, 4])\n",
"# 输入权重\n",
"w = np.array([0.5, 0.2, 0.3])\n",
"# 返回结果\n",
"perceptron(x, w, 0.8)"
]
},
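{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of the call above can be verified by hand using the same inputs $x = (1, 1, 4)$, weights $w = (0.5, 0.2, 0.3)$ and threshold $0.8$:\n",
"\n",
"$$ y_{sum} = 0.5 \\times 1 + 0.2 \\times 1 + 0.3 \\times 4 = 1.9 > 0.8 $$\n",
"\n",
"so the perceptron outputs 1, matching the returned value. A common equivalent convention, not used in this notebook, folds the threshold into a bias term $b = -\\text{threshold}$ (the symbol $b$ is introduced only for this remark), so the test becomes\n",
"\n",
"$$ w_1x_1 + w_2x_2 + w_3x_3 + b > 0 $$\n",
"\n",
"With the numbers above, $1.9 - 0.8 = 1.1 > 0$, which gives the same output."
]
},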
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3 神经网络"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.1 神经网络的结构"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"神经网络:相互连接形成一个无环图的神经元集合。\n",
"\n",
"神经网络通常由不同的神经元层组织而成,常见的层类型有:\n",
"+ 全连接层\n",
"+ 卷积层\n",
"+ 池化层\n",
"+ Dropout层\n",
"+ RNN层\n",
"+ ......"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"首先,我们来看一个完全由全连接层联结起来的神经网络,也被称为多层感知器(Multilayer Perceptrons, MLP)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"<center><video src=\"http://files.momodel.cn/nn_fullconnect.mp4\" controls=\"controls\" width=800px></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"神经网络架构示意图如下:\n",
"\n",
"<img src=\"http://imgbed.momodel.cn//20200103111837.png\" width=400>\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"与感知机的不同,神经网络:\n",
"+ 输入层和输出层之间存在若干隐藏层。\n",
"+ 每个隐藏层中包含若干神经元。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2 补充材料:梯度下降与误差反向传播"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 正向传播 与 反向传播"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"\n",
"<img src=\"http://imgbed.momodel.cn//20200103111837.png\" width=400>\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"+ 前馈神经网络接收一个输入 $x$ ,并产生一个输出 $\\hat{y}$ 信息正向流过网络,称为**正向传播(forward propagation)**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"+ 训练过程中:正向传播可以一直继续下去直到产生一个标量代价\n",
"$J(\\theta)$ ."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"+ 反向传播:允许信息从代价出发,反向流过网络"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 误差反向传播"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"+ **核心问题**:给定某个函数 ,计算f在点x处的梯度$\\epsilon{f(x)}$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"+ 在神经网络中, 对应于**损失函数**,而输入x 则是由**训练数据**和神经网络**权重参数**构成"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"+ 反向传播:通过递归地应用链式法则来计算表达式梯度的一种方法"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"反向传播指的仅仅是计算梯度的方法,并不是多层神经网络的整个学习算法!"
]
},
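{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since backpropagation only supplies the gradient, a separate rule is needed to actually update the parameters. A minimal sketch of such a rule is plain gradient descent, where $\\eta$ denotes the learning rate (a symbol and hyperparameter not defined elsewhere in this notebook):\n",
"\n",
"$$ \\theta \\leftarrow \\theta - \\eta \\, \\nabla_{\\theta} J(\\theta) $$\n",
"\n",
"As a toy example chosen only for illustration, take $J(\\theta) = \\theta^2$, start at $\\theta = 1$ and use $\\eta = 0.1$. The gradient is $2\\theta = 2$, so one step gives $\\theta = 1 - 0.1 \\times 2 = 0.8$, and the cost drops from $1$ to $0.64$. Optimizers such as Adam, which is used in Section 3.3, refine this basic rule but still consume gradients computed by backpropagation."
]
},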
{
"cell_type": "markdown",
"metadata": {},
"source": [
"简单表达式与梯度解释:\n",
"\n",
"梯度:是一个向量,表示函数在该点处沿着该方向变化最快,变化率最大\n",
"\n",
"而对于单变量的情况下,梯度也就等同于它的导数。\n",
"\n",
"<img src=\"https://imgbed.momodel.cn/20200327040148.png\" width=800>\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"复合表达式和链式法则:\n",
"\n",
"<img src=\"https://imgbed.momodel.cn/20200327040301.png\" width=800>\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"反向直播的直观理解:\n",
"$f(x,y,z) = (x+y)z, q=x+y, f=qz$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"<img src=\"https://imgbed.momodel.cn/20200327040327.png\" width=600>"
]
},
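{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this concrete, here is the expression $f = (x+y)z$ worked through with example inputs $x=-2$, $y=5$, $z=-4$. These values are chosen for illustration and may differ from those in the figure. The forward pass computes\n",
"\n",
"$$ q = x + y = 3, \\qquad f = qz = -12 $$\n",
"\n",
"The backward pass starts from $\\partial f / \\partial f = 1$ and applies the chain rule at each node:\n",
"\n",
"$$ \\frac{\\partial f}{\\partial q} = z = -4, \\qquad \\frac{\\partial f}{\\partial z} = q = 3 $$\n",
"\n",
"$$ \\frac{\\partial f}{\\partial x} = \\frac{\\partial f}{\\partial q}\\frac{\\partial q}{\\partial x} = -4 \\times 1 = -4, \\qquad \\frac{\\partial f}{\\partial y} = \\frac{\\partial f}{\\partial q}\\frac{\\partial q}{\\partial y} = -4 \\times 1 = -4 $$\n",
"\n",
"Each node needs only its local derivative and the gradient arriving from its output, which is why the computation can proceed one node at a time from the output back to the inputs."
]
},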
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 神经网络实现MNIST手写数字分类"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"好,我们接下来,依旧使用MNIST手写数字数据集来实现一个神经网络的分类方法。"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"采用 **keras** 框架搭建一个神经网络解决手写体数字识别问题。 \n",
"1. 导入相关包"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# 数据集 mnist\n",
"from tensorflow.keras.datasets import mnist\n",
"# 序列模型 Sequential\n",
"from tensorflow.keras.models import Sequential\n",
"# 神经网络层 Dense,Activation,Dropout\n",
"from tensorflow.keras.layers import Dense, Activation, Dropout\n",
"# 工具 np_utils\n",
"from tensorflow.python.keras.utils.np_utils import to_categorical\n",
"\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"!mkdir -p ~/.keras/datasets\n",
"!cp ./mnist.npz ~/.keras/datasets/mnist.npz"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. 下载 **MNIST** 数据集并将它们转换为模型所能使用的格式。"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# 获取数据\n",
"(X_train, y_train),(X_test,y_test) = mnist.load_data()\n",
"\n",
"# 将训练集数据形状从(60000,28,28)修改为(60000,784)\n",
"X_train = X_train.reshape(len(X_train), -1)\n",
"X_test = X_test.reshape(len(X_test), -1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"通过plot查看数据集情况"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 360x360 with 9 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"# 查看一些图片\n",
"def plot_images(imgs):\n",
" \"\"\"绘制几个样本图片\n",
" :param show: 是否显示绘图\n",
" :return:\n",
" \"\"\"\n",
" sample_num = min(9, len(imgs))\n",
" img_figure = plt.figure(1)\n",
" img_figure.set_figwidth(5)\n",
" img_figure.set_figheight(5)\n",
" for index in range(0, sample_num):\n",
" ax = plt.subplot(3, 3, index + 1)\n",
" ax.imshow(imgs[index].reshape(28, 28), cmap='gray')\n",
" ax.grid(False)\n",
" plt.margins(0, 0)\n",
" plt.show()\n",
"\n",
"\n",
"plot_images(X_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"对图像数据进行数据处理。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# 将数据集图像像素点的数据类型从 uint8 修改为 float32\n",
"X_train = X_train.astype('float32')\n",
"X_test = X_test.astype('float32')\n",
"\n",
"# 把数据集图像的像素值从 0-255 放缩到[-1,1]之间\n",
"X_train = (X_train - 127) / 127\n",
"X_test = (X_test - 127) / 127\n",
"\n",
"# 数据集类别个数\n",
"nb_classes = 10\n",
"\n",
"# 把 y_train 和 y_test 变成了 one-hot 的形式,即之前是 0-9 的一个数值, \n",
"# 现在是一个大小为 10 的向量,它属于哪个数字,就在哪个位置为 1,其他位置都是 0。\n",
"y_train = to_categorical(y_train, nb_classes)\n",
"y_test = to_categorical(y_test, nb_classes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. 搭建神经网络模型"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def create_model():\n",
" \"\"\"\n",
" 采用 keras 搭建神经网络模型\n",
" :return: 神经网络模型\n",
" \"\"\"\n",
" # 选择模型,选择序贯模型(Sequential())\n",
" model = Sequential()\n",
" # 添加全连接层,共 512 个神经元\n",
" model.add(Dense(512, input_shape=(784,)))\n",
" # 添加激活层,激活函数选择 relu \n",
" model.add(Activation('relu'))\n",
" # 添加全连接层,共 512 个神经元\n",
" model.add(Dense(512))\n",
" # 添加激活层,激活函数选择 relu \n",
" model.add(Activation('relu'))\n",
" # 添加全连接层,共 10 个神经元\n",
" model.add(Dense(10))\n",
" # 添加激活层,激活函数选择 softmax\n",
" model.add(Activation('softmax'))\n",
" return model\n",
"\n",
"# 实例化模型\n",
"model = create_model()"
]
},
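{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a feel for the size of this model, the number of trainable parameters can be counted by hand. The sketch assumes the Keras default that each `Dense` layer has one weight per (input, unit) pair plus one bias per unit:\n",
"\n",
"$$ 784 \\times 512 + 512 = 401920 $$\n",
"\n",
"$$ 512 \\times 512 + 512 = 262656 $$\n",
"\n",
"$$ 512 \\times 10 + 10 = 5130 $$\n",
"\n",
"for a total of $669706$ parameters. The final softmax activation turns the 10 output values $z_1, \\dots, z_{10}$ (notation introduced here for illustration) into probabilities that sum to 1:\n",
"\n",
"$$ \\text{softmax}(z)_i = \\frac{e^{z_i}}{\\sum_{j=1}^{10} e^{z_j}} $$\n",
"\n",
"Calling `model.summary()` should report the same counts for the three `Dense` layers."
]
},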
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. 训练和测试神经网络模型"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/5\n",
"938/938 [==============================] - 42s 45ms/step - loss: 0.2597 - accuracy: 0.9200\n",
"Epoch 2/5\n",
"938/938 [==============================] - 41s 43ms/step - loss: 0.1255 - accuracy: 0.9614\n",
"Epoch 3/5\n",
"938/938 [==============================] - 42s 45ms/step - loss: 0.0952 - accuracy: 0.9702\n",
"Epoch 4/5\n",
"938/938 [==============================] - 41s 44ms/step - loss: 0.0805 - accuracy: 0.9748\n",
"Epoch 5/5\n",
"938/938 [==============================] - 41s 44ms/step - loss: 0.0691 - accuracy: 0.9777\n",
"313/313 [==============================] - 1s 2ms/step - loss: 0.0867 - accuracy: 0.9732\n",
"Test loss: 0.08665528893470764\n",
"Accuracy: 0.9732000231742859\n"
]
}
],
"source": [
"def fit_and_predict(model, model_path):\n",
" \"\"\"\n",
" 训练模型、模型评估、保存模型\n",
" :param model: 搭建好的模型\n",
" :param model_path:保存模型路径\n",
" :return:\n",
" \"\"\"\n",
" # 编译模型\n",
" model.compile(optimizer='Adam',\n",
" loss='categorical_crossentropy', \n",
" metrics=['accuracy'])\n",
" # 模型训练\n",
" model.fit(X_train, y_train, epochs=5, batch_size=64)\n",
" # 保存模型\n",
" model.save(model_path)\n",
" # 模型评估,获取测试集的损失值和准确率\n",
" loss, accuracy = model.evaluate(X_test, y_test)\n",
" # 打印结果\n",
" print('Test loss:', loss)\n",
" print(\"Accuracy:\", accuracy)\n",
" \n",
"# 训练模型和评估模型\n",
"fit_and_predict(model, model_path='./model.h5')"
]
},
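{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two details of the cell above can be worked out on paper. First, the `categorical_crossentropy` loss for a single sample with one-hot label $y$ and predicted probabilities $\\hat{y}$ is, in its standard form,\n",
"\n",
"$$ L = -\\sum_{i=1}^{10} y_i \\log \\hat{y}_i $$\n",
"\n",
"Because $y$ is one-hot, only the true class contributes. If the model assigns probability $0.9$ to the correct digit, then $L = -\\log 0.9 \\approx 0.105$ (natural logarithm; the value $0.9$ is an illustrative assumption). The loss printed in the log is an average over samples.\n",
"\n",
"Second, the step counts in the log follow from the batch sizes. Training uses $\\lceil 60000 / 64 \\rceil = 938$ steps per epoch, and evaluation, assuming the Keras default batch size of 32, uses $\\lceil 10000 / 32 \\rceil = 313$ steps."
]
},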
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"那么,神经网络模型,是如何识别这一张张图片的呢?以下,我们使用通过下面几个小视频,你会更好的理解神经网络:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"神经网络的整体模型"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"<center><video src=\"http://files.momodel.cn/nn_media1.mp4\" controls=\"controls\" width=800px></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"通过激活函数,神经元将每一层的“化学状态”传递到下一层。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center><video src=\"http://files.momodel.cn/nn_media2.mp4\" controls=\"controls\" width=800px></center>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"对于MNIST的图像分类,这里需要传递的信息就是图形信息:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"<center><video src=\"http://files.momodel.cn/nn_media3.mp4\" controls=\"controls\" width=800px></center>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"图像信息在神经网络中的传递:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"<center><video src=\"http://files.momodel.cn/nn_media4.mp4\" controls=\"controls\" width=800px></center>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"具体表述成函数的形式,一个第二层的神经元的值就应该是这样获得的:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"<center><video src=\"http://files.momodel.cn/nn_media5.mp4\" controls=\"controls\" width=800px></center>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"全连接网络中的参数设置。"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"<center><video src=\"http://files.momodel.cn/nn_media6.mp4\" controls=\"controls\" width=800px></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 实践与体验\n",
"#### 调节神经网络结构和参数\n",
"\n",
"1. 将两层隐藏层改为一层,训练模型并在测试集上测试,得出准确率。\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def create_model1():\n",
" \"\"\"\n",
" 搭建神经网络模型 model1,比 model 少一层隐藏层\n",
" :return: 模型 model1\n",
" \"\"\"\n",
" # 选择模型,选择序贯模型(Sequential())\n",
" model = \n",
" # 添加全连接层,共 512 个神经元\n",
" model.add()\n",
" # 添加激活层,激活函数选择 relu\n",
" model.add()\n",
" # 添加全连接层,共 10 个神经元\n",
" model.add()\n",
" # 添加激活层,激活函数选择 softmax\n",
" model.add()\n",
" return model\n",
"\n",
"# 搭建神经网络\n",
"model1 = create_model1()\n",
"# 训练神经网络模型,保存模型和评估模型\n",
"fit_and_predict(model1, model_path='./model1.h5')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. 修改两层隐藏层神经元的数量,然后训练模型得出准确率。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def create_model2():\n",
" \"\"\"\n",
" 搭建神经网络模型 model2,隐藏层的神经元数目比 model 少一半\n",
" :return: 神经网络模型 model2\n",
" \"\"\"\n",
" # 选择模型,选择序贯模型(Sequential())\n",
" model = Sequential()\n",
" # 添加全连接层,共 256 个神经元\n",
" model.add()\n",
" # 添加激活层,激活函数选择 relu\n",
" model.add()\n",
" # 添加全连接层,共 256 个神经元\n",
" model.add()\n",
" # 添加激活层,激活函数选择 relu\n",
" model.add()\n",
" # 添加全连接层,共 10 个神经元\n",
" model.add()\n",
" # 添加激活层,激活函数选择 softmax\n",
" model.add()\n",
" return model\n",
"\n",
"# 搭建神经网络模型\n",
"model2 = create_model2()\n",
"# 训练神经网络模型,保存模型并评估模型\n",
"fit_and_predict(model2,model_path='./model2.h5')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. 输入一个手写数字,比较三种模型输出结果的差异,对其差异进行分析解释。"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"np.set_printoptions(suppress=True)\n",
"from tensorflow.keras.models import load_model\n",
"\n",
"# 加载模型\n",
"model = load_model('./model.h5')\n",
"model1 = load_model('./model1.h5')\n",
"model2 = load_model('./model2.h5') "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"原始模型\n",
"其各类别预测概率:[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.],预测值: 7,真实值:7\n",
"\n"
]
}
],
"source": [
"# 预测结果\n",
"predict_results = np.round(model.predict(X_test)[0],3)\n",
"# predict_results1 = np.round(model1.predict(X_test)[0],3)\n",
"# predict_results2 = np.round(model2.predict(X_test)[0],3)\n",
"\n",
"# 打印预测结果\n",
"print('原始模型\\n其各类别预测概率:%s,预测值: %s,真实值:%s\\n' % (predict_results,np.argmax(predict_results),np.argmax(y_test[0])))\n",
"print('只有一个隐藏层的模型\\n其各类别各类别预测概率:%s,预测值: %s,真实值:%s\\n' % (predict_results1,np.argmax(predict_results1),np.argmax(y_test[0])))\n",
"print('隐藏神经元数量更改后的模型\\n其各类别预测概率:%s,预测值: %s,真实值:%s' % (predict_results2,np.argmax(predict_results2),np.argmax(y_test[0])))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"pycharm": {
"stem_cell": {
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": []
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}