Reusing TensorFlow Models

If the original model was trained with TensorFlow, you can simply restore it and continue training it on the new task:

    [...] # construct the original model

    with tf.Session() as sess:
        saver.restore(sess, "./my_model_final.ckpt")
        # continue training the model...

Full code:

    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets("/tmp/data/")  # load MNIST

    n_epochs = 20    # training hyperparameters (values assumed)
    batch_size = 200

    n_inputs = 28 * 28  # MNIST
    n_hidden1 = 300
    n_hidden2 = 50
    n_hidden3 = 50
    n_hidden4 = 50
    n_hidden5 = 50
    n_outputs = 10

    X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
    y = tf.placeholder(tf.int64, shape=(None), name="y")

    with tf.name_scope("dnn"):
        hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name="hidden1")
        hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu, name="hidden2")
        hidden3 = tf.layers.dense(hidden2, n_hidden3, activation=tf.nn.relu, name="hidden3")
        hidden4 = tf.layers.dense(hidden3, n_hidden4, activation=tf.nn.relu, name="hidden4")
        hidden5 = tf.layers.dense(hidden4, n_hidden5, activation=tf.nn.relu, name="hidden5")
        logits = tf.layers.dense(hidden5, n_outputs, name="outputs")

    with tf.name_scope("loss"):
        xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
        loss = tf.reduce_mean(xentropy, name="loss")

    with tf.name_scope("eval"):
        correct = tf.nn.in_top_k(logits, y, 1)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name="accuracy")

    learning_rate = 0.01
    threshold = 1.0  # gradient clipping threshold

    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    grads_and_vars = optimizer.compute_gradients(loss)
    capped_gvs = [(tf.clip_by_value(grad, -threshold, threshold), var)
                  for grad, var in grads_and_vars]
    training_op = optimizer.apply_gradients(capped_gvs)

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, "./my_model_final.ckpt")
        for epoch in range(n_epochs):
            for iteration in range(mnist.train.num_examples // batch_size):
                X_batch, y_batch = mnist.train.next_batch(batch_size)
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,
                                                    y: mnist.test.labels})
            print(epoch, "Test accuracy:", accuracy_val)
        save_path = saver.save(sess, "./my_new_model_final.ckpt")
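
If you no longer have the Python code that built the original graph, you do not need to reconstruct it by hand: you can load the graph structure from the `.meta` file that the Saver writes next to the checkpoint, then look up the operations you need by name. A minimal sketch, assuming the checkpoint from the listing above and its default `my_model_final.ckpt.meta` file (the tensor and operation names are taken from that listing; the training op name is assumed):

    import tensorflow as tf

    # Recreate the graph structure from the .meta file instead of from Python code
    saver = tf.train.import_meta_graph("./my_model_final.ckpt.meta")

    # Fetch the ops/tensors we need by name (names as defined in the listing above)
    graph = tf.get_default_graph()
    X = graph.get_tensor_by_name("X:0")
    y = graph.get_tensor_by_name("y:0")
    accuracy = graph.get_tensor_by_name("eval/accuracy:0")
    training_op = graph.get_operation_by_name("GradientDescent")  # default op name, assumed

    with tf.Session() as sess:
        saver.restore(sess, "./my_model_final.ckpt")  # restore the variable values
        # ...continue training exactly as before...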

However, in general you will want to reuse only part of the original model (as we will discuss shortly). A simple solution is to configure the Saver to restore only a subset of the variables from the original model. For example, the following code restores only hidden layers 1, 2, and 3:

    n_inputs = 28 * 28  # MNIST
    n_hidden1 = 300  # reused
    n_hidden2 = 50   # reused
    n_hidden3 = 50   # reused
    n_hidden4 = 20   # new!
    n_outputs = 10   # new!

    X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
    y = tf.placeholder(tf.int64, shape=(None), name="y")

    with tf.name_scope("dnn"):
        hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name="hidden1") # reused
        hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu, name="hidden2") # reused
        hidden3 = tf.layers.dense(hidden2, n_hidden3, activation=tf.nn.relu, name="hidden3") # reused
        hidden4 = tf.layers.dense(hidden3, n_hidden4, activation=tf.nn.relu, name="hidden4") # new!
        logits = tf.layers.dense(hidden4, n_outputs, name="outputs") # new!

    with tf.name_scope("loss"):
        xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
        loss = tf.reduce_mean(xentropy, name="loss")

    with tf.name_scope("eval"):
        correct = tf.nn.in_top_k(logits, y, 1)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name="accuracy")

    learning_rate = 0.01

    with tf.name_scope("train"):
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        training_op = optimizer.minimize(loss)
    [...] # build new model with the same definition as before for hidden layers 1-3

    reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                                   scope="hidden[123]") # regular expression
    reuse_vars_dict = dict([(var.op.name, var) for var in reuse_vars])
    restore_saver = tf.train.Saver(reuse_vars_dict) # to restore layers 1-3

    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

    with tf.Session() as sess:
        init.run()
        restore_saver.restore(sess, "./my_model_final.ckpt")

        for epoch in range(n_epochs):                                        # not shown in the book
            for iteration in range(mnist.train.num_examples // batch_size):  # not shown
                X_batch, y_batch = mnist.train.next_batch(batch_size)        # not shown
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})    # not shown
            accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,    # not shown
                                                    y: mnist.test.labels})   # not shown
            print(epoch, "Test accuracy:", accuracy_val)                     # not shown

        save_path = saver.save(sess, "./my_new_model_final.ckpt")

First we build the new model, making sure to copy the original model's hidden layers 1 to 3. We also create a node to initialize all the variables. Then we get the list of all the variables just created in the graph, keeping only those whose scope matches the regular expression hidden[123] (i.e., all the variables in hidden layers 1 to 3). Next, we create a dictionary mapping the name of each variable in the original model to its name in the new model (generally you want to keep the exact same names). Then we create one Saver that will restore only these variables, and another Saver to save the entire new model, not just layers 1 to 3. We then start a session, initialize all the variables in the model, and restore the variable values from the original model's layers 1 to 3. Finally, we train the model on the new task and save it.
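
Note that reuse_vars_dict maps each variable's checkpoint name to the corresponding variable object in the new graph; here both names are identical, which is why the dictionary looks redundant. If you had renamed a scope in the new model, you would key the dictionary by the original (checkpoint) name instead. A minimal sketch of that case, using a hypothetical scope new_hidden1 that replaces the original hidden1:

    # Hypothetical renaming: the checkpoint stored the layer under "hidden1",
    # but the new graph calls it "new_hidden1".
    new_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                                 scope="new_hidden1")

    # Keys = names as stored in the checkpoint, values = variables in the new graph
    rename_map = {var.op.name.replace("new_hidden1", "hidden1"): var
                  for var in new_vars}

    restore_saver = tf.train.Saver(rename_map)  # restores old weights into the renamed vars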

The more similar the tasks are, the more layers you can reuse (starting with the lower layers). For very similar tasks, you can try keeping all the hidden layers and replacing only the output layer.
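
For instance, reusing all five hidden layers of the original model from the first listing and replacing only the output layer could look like the sketch below; the scope regex simply widens to cover all five layers, and the name new_outputs is arbitrary:

    [...]  # build hidden1 through hidden5 exactly as in the original model
    new_logits = tf.layers.dense(hidden5, n_outputs, name="new_outputs")  # new!

    reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                                   scope="hidden[12345]")  # all five hidden layers
    restore_saver = tf.train.Saver({var.op.name: var for var in reuse_vars})

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        init.run()
        restore_saver.restore(sess, "./my_model_final.ckpt")
        # train on the new task: only the new output layer starts from scratch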