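When hand-writing a BP neural network in Python, why does the output layer use one-hot encoding with sigmoid (rather than softmax) for three-class Iris classification?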

The task is to complete the forward pass, the backward pass, and the gradient-descent update inside `bp_train`, using the helper functions the framework already provides (`hidden_in`, `hidden_out`, `predict_in`, `predict_out`, `sig`, `partial_sig`). The required prediction accuracy is above 90%.

Why sigmoid at the output layer instead of softmax:

1. Iris classification is a three-class problem, and the task description says the output-layer Z values go through softmax. Softmax with cross-entropy loss is the usual choice for multi-class output, so the first idea is to skip `predict_out` and compute a softmax (with the row maximum subtracted before `np.exp` to avoid overflow) inside `bp_train`.
2. The framework, however, defines `predict_out` as `sig(predict_in)`, and only the marked region of `bp_train` may be changed. The prediction code is not shown; it may well be external code that combines the trained parameters with the given `predict_out`. Training with softmax and then predicting with sigmoid would make the two stages inconsistent.
3. Training with sigmoid removes that mismatch and leaves the backward pass unchanged. Each output node is treated as an independent binary classifier, and the loss is the sum of the per-node binary cross-entropies. The derivative of that loss with respect to the output-layer input is `predict_output - one_hot`, the same expression that softmax with cross-entropy yields.
4. At prediction time the class whose output node has the largest value is taken. The three sigmoid outputs do not sum to 1, so they are not a true multi-class probability distribution, but the largest one is enough to pick a class.

Why the labels need one-hot encoding: `label` arrives as an `(m, 1)` matrix of class indices (0, 1, 2), possibly stored as floats, while the residual `predict_output - one_hot` needs an `(m, n_output)` target. The conversion is done once, before the `while` loop:

1. Flatten `label` to a 1-D array and cast it to `int`.
2. Create an `(m, n_output)` matrix of zeros.
3. Set the entry at each sample's class index to 1, then convert the result to a matrix.

Each iteration then does the following:

1. Forward pass: `hidden_input = hidden_in(feature, w0, b0)` (that is, `feature * w0 + b0`), `hidden_output = hidden_out(hidden_input)`, `predict_input = predict_in(hidden_output, w1, b1)`, `predict_output = predict_out(predict_input)`. Both activations are sigmoid.
2. Backward pass: the output-layer residual is `output_delta = predict_output - one_hot`; the hidden-layer residual is `hidden_delta = np.multiply(output_delta * w1.T, partial_sig(hidden_input))`, where `partial_sig` returns `sig(x) * (1 - sig(x))` element by element.
3. Update: `w1 -= alpha * (hidden_output.T * output_delta)`, `b1 -= alpha * np.sum(output_delta, axis=0)`, `w0 -= alpha * (feature.T * hidden_delta)`, `b0 -= alpha * np.sum(hidden_delta, axis=0)`.

Notes on the matrix operations:

- On `np.mat` objects, `*` is matrix multiplication, so the element-wise product in `hidden_delta` has to use `np.multiply`. Both factors there have shape `(m, n_hidden)`.
- `hidden_output.T` is `(n_hidden, m)` and `output_delta` is `(m, n_output)`, so their product is `(n_hidden, n_output)`, matching `w1`. Likewise `feature.T * hidden_delta` is `(n, n_hidden)`, matching `w0`.
- This is batch gradient descent: every update uses all samples, and the bias gradient is the per-sample gradient summed over rows with `np.sum(..., axis=0)`, giving shapes `(1, n_output)` and `(1, n_hidden)`.
- The template already contains `i = 0`, `while i <= maxcycle:` and an `i += 1` after the `End` marker. The filled-in region must not increment `i` again, otherwise `i` would advance by 2 per pass. With `<=`, the loop body runs `maxcycle + 1` times.

Following the task requirements, the complete implementation of the forward pass, backward pass, and gradient descent is:

```python
#encoding=utf8
import numpy as np
from math import sqrt

# The framework relies on np.mat. NumPy 2.0 removed that alias;
# np.asmatrix is the equivalent call there.

# Training routine for the BP neural network
def bp_train(feature, label, n_hidden, maxcycle, alpha, n_output):
    '''
    Train a BP network with one hidden layer by batch gradient descent.
    input:  feature(mat): features
            label(mat): labels (class indices, one per row)
            n_hidden(int): number of hidden-layer nodes
            maxcycle(int): maximum number of iterations
            alpha(float): learning rate
            n_output(int): number of output-layer nodes
    output: w0(mat): weights between the input layer and the hidden layer
            w1(mat): weights between the hidden layer and the output layer
            b0(mat): biases between the input layer and the hidden layer
            b1(mat): biases between the hidden layer and the output layer
    '''
    m, n = np.shape(feature)
    # Initialization: uniform values in a symmetric range around 0
    w0 = np.mat(np.random.rand(n, n_hidden))
    w0 = w0 * (8.0 * sqrt(6) / sqrt(n + n_hidden)) - \
         np.mat(np.ones((n, n_hidden))) * \
         (4.0 * sqrt(6) / sqrt(n + n_hidden))
    b0 = np.mat(np.random.rand(1, n_hidden))
    b0 = b0 * (8.0 * sqrt(6) / sqrt(n + n_hidden)) - \
         np.mat(np.ones((1, n_hidden))) * \
         (4.0 * sqrt(6) / sqrt(n + n_hidden))
    w1 = np.mat(np.random.rand(n_hidden, n_output))
    w1 = w1 * (8.0 * sqrt(6) / sqrt(n_hidden + n_output)) - \
         np.mat(np.ones((n_hidden, n_output))) * \
         (4.0 * sqrt(6) / sqrt(n_hidden + n_output))
    b1 = np.mat(np.random.rand(1, n_output))
    b1 = b1 * (8.0 * sqrt(6) / sqrt(n_hidden + n_output)) - \
         np.mat(np.ones((1, n_output))) * \
         (4.0 * sqrt(6) / sqrt(n_hidden + n_output))
    # Training
    i = 0
    # Convert the labels to one-hot encoding (once, before the loop)
    one_hot = np.zeros((m, n_output))
    label_array = np.array(label).flatten().astype(int)
    one_hot[np.arange(m), label_array] = 1
    one_hot = np.mat(one_hot)
    while i <= maxcycle:
        #********* Begin *********#
        # Forward pass
        # Input of the hidden layer
        hidden_input = hidden_in(feature, w0, b0)
        # Output of the hidden layer (sigmoid)
        hidden_output = hidden_out(hidden_input)
        # Input of the output layer
        predict_input = predict_in(hidden_output, w1, b1)
        # Output of the output layer (sigmoid, same as at prediction time)
        predict_output = predict_out(predict_input)
        # Backward pass
        # Residual between the hidden layer and the output layer
        output_delta = predict_output - one_hot
        # Residual between the input layer and the hidden layer
        hidden_delta = np.multiply(output_delta * w1.T, partial_sig(hidden_input))
        # Update weights and biases
        w1 -= alpha * (hidden_output.T * output_delta)
        b1 -= alpha * np.sum(output_delta, axis=0)
        w0 -= alpha * (feature.T * hidden_delta)
        b0 -= alpha * np.sum(hidden_delta, axis=0)
        #********* End *********#
        i += 1
    return w0, w1, b0, b1

# Input of the hidden layer
def hidden_in(feature, w0, b0):
    m = np.shape(feature)[0]
    hidden_in = feature * w0
    for i in range(m):
        hidden_in[i,] += b0
    return hidden_in

# Output of the hidden layer
def hidden_out(hidden_in):
    hidden_output = sig(hidden_in)
    return hidden_output

# Input of the output layer
def predict_in(hidden_out, w1, b1):
    m = np.shape(hidden_out)[0]
    predict_in = hidden_out * w1
    for i in range(m):
        predict_in[i,] += b1
    return predict_in

# Output of the output layer
def predict_out(predict_in):
    result = sig(predict_in)
    return result

# Sigmoid function
def sig(x):
    return 1.0 / (1 + np.exp(-x))

# Derivative of the sigmoid function, computed element by element
def partial_sig(x):
    m, n = np.shape(x)
    out = np.mat(np.zeros((m, n)))
    for i in range(m):
        for j in range(n):
            out[i, j] = sig(x[i, j]) * (1 - sig(x[i, j]))
    return out
```
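The residual `predict_output - one_hot` is only correct for a sigmoid output layer if the loss is the sum of per-node binary cross-entropies. A quick way to convince yourself is a finite-difference check. The sketch below is illustrative only: `sigmoid`, `bce_loss`, `numeric_grad` and the random test values are names and data made up for this check, not part of the exercise framework, and it uses plain NumPy arrays rather than `np.mat`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(z, y):
    """Sum of per-node binary cross-entropies for sigmoid outputs."""
    a = sigmoid(z)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

def numeric_grad(z, y, eps=1e-6):
    """Central-difference estimate of d(loss)/dz, one entry at a time."""
    grad = np.zeros_like(z)
    for idx in np.ndindex(z.shape):
        z_plus, z_minus = z.copy(), z.copy()
        z_plus[idx] += eps
        z_minus[idx] -= eps
        grad[idx] = (bce_loss(z_plus, y) - bce_loss(z_minus, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3))        # hypothetical output-layer inputs, 4 samples
y = np.eye(3)[[0, 2, 1, 0]]        # one-hot targets for those 4 samples
analytic = sigmoid(z) - y          # the residual formula used in bp_train
max_gap = np.max(np.abs(analytic - numeric_grad(z, y)))
print(max_gap)                     # should be tiny if the formula is right
```

If the loss were mean squared error instead, the analytic gradient would pick up an extra `a * (1 - a)` factor and this check would fail, which is why the choice of loss matters for the residual used in the code.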

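The one-hot conversion and the "take the largest output node" rule can be tried in isolation. This is a minimal sketch under stated assumptions: `to_one_hot` and `predict_labels` are hypothetical helper names, the weights are hand-picked toy values rather than trained parameters, and plain NumPy arrays with `@` stand in for the framework's `np.mat` and `*`.

```python
import numpy as np

def to_one_hot(label, n_output):
    """Turn class indices of shape (m, 1) or (m,) into an (m, n_output) 0/1 array."""
    idx = np.asarray(label).flatten().astype(int)
    one_hot = np.zeros((idx.size, n_output))
    one_hot[np.arange(idx.size), idx] = 1
    return one_hot

def predict_labels(feature, w0, w1, b0, b1):
    """Hypothetical helper: sigmoid on both layers, then argmax over output nodes."""
    def sig(x):
        return 1.0 / (1.0 + np.exp(-x))
    hidden = sig(feature @ w0 + b0)
    scores = sig(hidden @ w1 + b1)   # independent per-node scores, rows need not sum to 1
    return np.argmax(scores, axis=1)

# Labels as they might arrive: float class indices in a column
labels = np.array([[0.0], [2.0], [1.0]])
encoded = to_one_hot(labels, 3)
print(encoded)

# Toy parameters chosen by hand so the outcome is easy to follow
feature = np.array([[5.0, -5.0], [-5.0, 5.0]])
w0, b0 = np.eye(2), np.zeros((1, 2))
w1, b1 = np.array([[4.0, 0.0, -4.0], [-4.0, 0.0, 4.0]]), np.zeros((1, 3))
predicted = predict_labels(feature, w0, w1, b0, b1)
print(predicted)
```

The same argmax step applied to the output of the framework's `predict_out` gives the predicted class for each sample.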
Authorship statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
