Dropout 的应用

对于深层深度 RNN，在训练集上很容易过拟合。Dropout 是防止过拟合的常用技术。

可以简单的在 RNN 层之前或之后添加一层 Dropout 层，但如果需要在 RNN 层之间应用 Dropout 技术就需要DropoutWrapper。

下面的代码中，每一层的 RNN 的输入前都应用了 Dropout，Dropout 的概率为 50%。

keep_prob = 0.5
cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
cell_drop = tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob)
multi_layer_cell = tf.contrib.rnn.MultiRNNCell([cell_drop]*n_layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)

同时也可以通过设置output_keep_prob来在输出应用 Dropout 技术。

然而在以上代码中存在的主要问题是，Dropout 不管是在训练还是测试时都起作用了，而我们想要的仅仅是在训练时应用 Dropout。

很不幸的是DropoutWrapper不支持is_training这样一个设置选项。因此必须自己写 Dropout 包装类，或者创建两个计算图，一个用来训练，一个用来测试。后则可通过如下面代码这样实现。

import sys
is_training  = (sys.argv[-1] == "train")
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])
cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
if is_training:
    cell = tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob)
multi_layer_cell = tf.contrib.rnn.MultiRNNCell([cell]*n_layers)
rnn_outpus, status = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
[...] #  bulid the rest of the graph
init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as sess:
    if is_training:
        init.run()
        for iteration in range(n_iterations):
            [...] #  train the model
        save_path = saver.save(sess, "/tmp/my_model.ckpt")
    else:
        saver.restore(sess, "/tmp/my_model.ckpt")
        [...] #  use the model

通过以上的方法就能够训练各种 RNN 网络了。然而对于长序列的 RNN 训练还言之过早，事情会变得有一些困难。

那么我们来探讨一下究竟这是为什么和怎么应对呢？