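The network architecture is unchanged; I only added dropout after each layer, and the result improved significantly. At first I trained for 500 epochs, which took forever and ended up overfitting. I then noticed that the results were already very good by epoch 69, so I settled on 69 epochs.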
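As a reminder of what a `Dropout(0.4)` layer does during training, here is a minimal pure-Python sketch of "inverted" dropout. It is an illustration only, not Keras's actual implementation; the function name `inverted_dropout` and its arguments are made up for this example.

```python
import random


def inverted_dropout(activations, rate, rng):
    """Training-time dropout on a flat list of activations.

    Each value is zeroed with probability `rate`; the survivors are scaled
    by 1 / (1 - rate) so the expected value of every unit stays the same.
    (Hypothetical helper for illustration, not a Keras function.)
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the run is repeatable
    # Roughly 40% of the entries come back as 0.0, the rest as 1 / 0.6.
    print(inverted_dropout([1.0] * 10, rate=0.4, rng=rng))
```

At test time no units are dropped and no scaling is applied, which is why the validation numbers are computed with the full network.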
# Larger CNN for the MNIST Dataset
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
import matplotlib.pyplot as plt
from keras.constraints import maxnorm
from keras.optimizers import SGD

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][channels][rows][cols]
# (channels-first layout, i.e. Theano-style dimension ordering)
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

###raw
# define the larger model
def larger_model():
    # create model: two conv/pool stages, then two dense layers,
    # with Dropout(0.4) after every stage
    model = Sequential()
    model.add(Convolution2D(30, 5, 5, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.4))
    model.add(Convolution2D(15, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.4))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.4))
    model.add(Dense(50, activation='relu'))
    model.add(Dropout(0.4))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# build the model
model = larger_model()
# Fit the model (69 epochs, the point where the results were already very good)
model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=69, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Large CNN Error: %.2f%%" % (100 - scores[1] * 100))

Result:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
convolution2d_1 (Convolution2D)  (None, 30, 24, 24)    780         convolution2d_input_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 30, 12, 12)    0           convolution2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 15, 10, 10)    4065        maxpooling2d_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 15, 5, 5)      0           convolution2d_2[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 15, 5, 5)      0           maxpooling2d_2[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 375)           0           dropout_1[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 128)           48128       flatten_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 50)            6450        dense_1[0][0]
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 10)            510         dense_2[0][0]
====================================================================================================
Total params: 59933
____________________________________________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/69
34s - loss: 0.4248 - acc: 0.8619 - val_loss: 0.0832 - val_acc: 0.9746
Epoch 2/69
35s - loss: 0.1147 - acc: 0.9638 - val_loss: 0.0518 - val_acc: 0.9831
Epoch 3/69
35s - loss: 0.0887 - acc: 0.9719 - val_loss: 0.0452 - val_acc: 0.9855
...
Epoch 66/69
38s - loss: 0.0134 - acc: 0.9955 - val_loss: 0.0211 - val_acc: 0.9943
Epoch 67/69
38s - loss: 0.0114 - acc: 0.9960 - val_loss: 0.0171 - val_acc: 0.9950
Epoch 68/69
38s - loss: 0.0116 - acc: 0.9959 - val_loss: 0.0192 - val_acc: 0.9956
Epoch 69/69
38s - loss: 0.0132 - acc: 0.9969 - val_loss: 0.0188 - val_acc: 0.9978
Large CNN Error: 0.22%

real    41m47.350s
user    157m51.145s
sys     6m5.829s
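This is my best result so far: 99.78%. There is still plenty of room for improvement, and I will post an update the next time the accuracy goes up.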
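The output shapes and parameter counts in the layer summary can be checked by hand. The sketch below redoes that arithmetic in plain Python, assuming 'valid' convolutions with stride 1, non-overlapping 2x2 pooling, and one bias per filter or unit; the helper names are made up for this example.

```python
def conv_params(filters, kernel_h, kernel_w, in_channels):
    """Weights plus one bias per filter for a 2D convolution layer."""
    return filters * (kernel_h * kernel_w * in_channels + 1)


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return units * (inputs + 1)


def valid_conv_size(size, kernel):
    """Output width/height of a stride-1 'valid' convolution."""
    return size - kernel + 1


def summarize():
    """Return (flattened feature size, per-layer parameter counts)."""
    size = valid_conv_size(28, 5)    # 28 -> 24
    size //= 2                       # 2x2 max pooling: 24 -> 12
    size = valid_conv_size(size, 3)  # 12 -> 10
    size //= 2                       # 2x2 max pooling: 10 -> 5
    flat = 15 * size * size          # 15 feature maps of 5x5 -> 375
    counts = [
        conv_params(30, 5, 5, 1),    # convolution2d_1
        conv_params(15, 3, 3, 30),   # convolution2d_2
        dense_params(flat, 128),     # dense_1
        dense_params(128, 50),       # dense_2
        dense_params(50, 10),        # dense_3
    ]
    return flat, counts


if __name__ == "__main__":
    flat, counts = summarize()
    print(flat, counts, sum(counts))
    # prints: 375 [780, 4065, 48128, 6450, 510] 59933
```

Pooling, dropout, and flatten layers have no trainable parameters, so only the five layers above contribute to the total of 59933. Almost all of it sits in the first dense layer.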
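Summary:
1. CNNs really do have an edge over a traditional MLP for image recognition, and over traditional machine learning algorithms as well (although some people have also gotten very good results with random forests).
2. Making the network deeper, i.e. adding more convolutional layers, helps accuracy, but it can also slow training down considerably.
3. Adding a suitable amount of Dropout can improve accuracy.
4. As for the activation function, the best one is... never mind, just pick relu. No elaborate justification: relu helps avoid vanishing gradients, and that alone is reason enough to choose it. Its other advantages, such as faster training, I will save for a dedicated post.
5. More epochs are not always better, since training too long can easily overfit. Plot a convergence curve yourself: in Keras you can plot the History object returned by model.fit() to see whether training is converging or diverging.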
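To make the last point concrete, here is a small sketch for inspecting a training history. It assumes the dictionary layout Keras exposes as `history.history` (lists under keys such as `loss` and `val_loss`); the helper names `best_epoch` and `plot_history` are made up for this example. The sample values are the validation losses of epochs 66 to 69 from the log above.

```python
def best_epoch(val_losses, first_epoch=1):
    """Return the epoch number with the lowest validation loss.

    `first_epoch` is the epoch number of the first entry in `val_losses`.
    """
    if not val_losses:
        raise ValueError("val_losses is empty")
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return first_epoch + best_index


def plot_history(history_dict):
    """Plot training vs. validation loss per epoch (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    plt.plot(history_dict["loss"], label="train loss")
    plt.plot(history_dict["val_loss"], label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.show()


if __name__ == "__main__":
    # val_loss for epochs 66..69, copied from the training log
    tail = [0.0211, 0.0171, 0.0192, 0.0188]
    print(best_epoch(tail, first_epoch=66))  # prints: 67
    # With a real run:
    #   history = model.fit(...)
    #   plot_history(history.history)
```

If the validation loss flattens out or starts rising while the training loss keeps falling, the model is beginning to overfit. Keras also provides an `EarlyStopping` callback in `keras.callbacks` that can stop training automatically based on a monitored metric.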