LSTM
Parameter overview:
- init_scale - the initial scale of the weights
- learning_rate - the initial value of the learning rate
- max_grad_norm - the maximum permissible norm of the gradient
- num_layers - the number of LSTM layers
- num_steps - the number of unrolled steps of LSTM
This is the time_step, i.e. the number of words in each input sequence.
- hidden_size - the number of LSTM units
That is, how many units each LSTM layer contains.
- max_epoch - the number of epochs trained with the initial learning rate
- max_max_epoch - the total number of epochs for training
- keep_prob - the probability of keeping weights in the dropout layer
- lr_decay - the decay of the learning rate for each epoch after "max_epoch"
- batch_size - the batch size
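For reference, the parameters above can be gathered into a single configuration object. The sketch below uses a hypothetical `SmallConfig` dataclass: the field names come from the list, but the values are illustrative assumptions and are not settings taken from these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallConfig:
    """Illustrative hyperparameters; every value here is an assumption."""
    init_scale: float = 0.1      # initial scale of the weights
    learning_rate: float = 1.0   # initial learning rate
    max_grad_norm: float = 5.0   # maximum permissible gradient norm
    num_layers: int = 2          # stacked LSTM layers
    num_steps: int = 20          # unrolled time steps per sequence
    hidden_size: int = 200       # units per LSTM layer
    max_epoch: int = 4           # epochs trained at the initial learning rate
    max_max_epoch: int = 13      # total training epochs
    keep_prob: float = 1.0       # 1.0 means nothing is dropped
    lr_decay: float = 0.5        # per-epoch decay factor after max_epoch
    batch_size: int = 20         # sequences per mini-batch


config = SmallConfig()
tokens_per_batch = config.batch_size * config.num_steps
print(tokens_per_batch)  # 400
```

Freezing the dataclass keeps the settings from being changed by accident during a run.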
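LSTM input
The input ids are mapped to embedding vectors with embedding_lookup. Each input to the network has shape [size_batches, seq_length, rnn_size]; the three dimensions are the batch size, the sequence length (number of time steps), and the number of units in the RNN.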
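The reason for splitting the data this way is:
To keep the learning process tractable, it is common practice to truncate the backpropagated gradients at a fixed number (seq_length) of unrolled time steps. This is easy to implement by feeding inputs of length seq_length at each iteration and running a backward pass after each iteration.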
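One way to picture this truncation is a batching routine that cuts a long token stream into blocks of `seq_length` steps, with the targets shifted one position ahead. The function below is a simplified sketch written for illustration (the name `iterate_batches` is hypothetical); it is not the reader used by the code in these notes.

```python
def iterate_batches(token_ids, batch_size, seq_length):
    """Yield (x, y) blocks, each of shape [batch_size, seq_length].

    The stream is split into batch_size parallel rows; each block covers
    seq_length consecutive steps, and y is x shifted one token ahead.
    """
    row_len = len(token_ids) // batch_size
    rows = [token_ids[i * row_len:(i + 1) * row_len] for i in range(batch_size)]
    num_blocks = (row_len - 1) // seq_length
    if num_blocks == 0:
        raise ValueError("decrease batch_size or seq_length")
    for b in range(num_blocks):
        start = b * seq_length
        x = [row[start:start + seq_length] for row in rows]
        y = [row[start + 1:start + seq_length + 1] for row in rows]
        yield x, y


batches = list(iterate_batches(list(range(20)), batch_size=2, seq_length=3))
first_x, first_y = batches[0]
print(first_x)  # [[0, 1, 2], [10, 11, 12]]
print(first_y)  # [[1, 2, 3], [11, 12, 13]]
```

Gradients would then be propagated back through at most `seq_length` steps per block, while the recurrent state can still be carried from one block to the next.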
How the input shape changes:
x_data = [446, 50, 50], i.e. [number_batch, size_batch, seq_length]
==> embedding = [65, 128], i.e. [vocab_size, rnn_size]
==> input_data = [50, 50], i.e. [batch_size, seq_length]
==> embedding_lookup(embedding, input_data) = [50, 50, 128], i.e. [batch_size, seq_length, rnn_size]
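The shape bookkeeping can be checked without TensorFlow. The sketch below is a plain-Python stand-in for the lookup (the local function `embedding_lookup` is written for this example and only mimics the indexing behaviour); the sizes 65, 128 and 50 are the ones quoted in the text.

```python
import random


def embedding_lookup(embedding, ids):
    """Replace every id in a [batch_size, seq_length] grid by its embedding row."""
    return [[embedding[token_id] for token_id in row] for row in ids]


vocab_size, rnn_size = 65, 128
batch_size, seq_length = 50, 50

rng = random.Random(0)
# Randomly initialized embedding matrix of shape [vocab_size, rnn_size].
embedding = [[rng.uniform(-0.1, 0.1) for _ in range(rnn_size)]
             for _ in range(vocab_size)]
# One batch of token ids of shape [batch_size, seq_length].
input_data = [[rng.randrange(vocab_size) for _ in range(seq_length)]
              for _ in range(batch_size)]

inputs = embedding_lookup(embedding, input_data)
shape = (len(inputs), len(inputs[0]), len(inputs[0][0]))
print(shape)  # (50, 50, 128)
```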
LSTM initialization
# size is hidden_size
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0)
lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=config.keep_prob)  # apply dropout to the cell outputs
cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers)  # stack num_layers cells into a multi-layer RNN
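As a rough illustration of what `output_keep_prob` controls, the sketch below implements the usual inverted-dropout rule in plain Python: each value is kept with probability `keep_prob` and scaled by `1 / keep_prob`, otherwise it is set to zero. The helper `dropout` is written for this example and is not TensorFlow's implementation.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(42)
cell_output = [1.0] * 10000
dropped = dropout(cell_output, keep_prob=0.5, rng=rng)

# Roughly half the values survive, and the mean stays close to the original 1.0.
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_value = sum(dropped) / len(dropped)
print(round(kept_fraction, 2), round(mean_value, 2))
```

The rescaling keeps the expected value of each output unchanged, so nothing needs to be adjusted when `keep_prob` is set to 1.0 for evaluation.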
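LSTM input (word embeddings)
Each word is represented by a dense vector (a word2vec-style embedding). The embedding matrix is initialized randomly and is then updated continuously as the model trains.
# embedding_matrix is a tensor of shape [vocabulary_size, embedding_size]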
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)
LSTM training process
outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        # Reuse the same weights at every time step after the first.
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
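The control flow of this unrolled loop can be followed with a toy example. In the sketch below, `toy_cell` is a hypothetical stand-in for the LSTM cell (it simply adds the input to the state); the point is only to show how the state is threaded through the time steps and how one output per step is collected.

```python
def toy_cell(x_t, state):
    """Stand-in for an RNN cell: new state = state + input, output = new state."""
    new_state = [[s + x for s, x in zip(s_row, x_row)]
                 for s_row, x_row in zip(state, x_t)]
    return new_state, new_state


batch_size, num_steps, size = 2, 3, 2
# inputs has shape [batch_size, num_steps, size].
inputs = [[[1, 1], [2, 2], [3, 3]],
          [[10, 10], [20, 20], [30, 30]]]

state = [[0] * size for _ in range(batch_size)]   # initial state
outputs = []
for time_step in range(num_steps):
    x_t = [seq[time_step] for seq in inputs]      # same as inputs[:, time_step, :]
    cell_output, state = toy_cell(x_t, state)
    outputs.append(cell_output)

print(len(outputs))   # 3
print(state)          # [[6, 6], [60, 60]]
```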
LSTM loss definition
output = tf.reshape(tf.concat(1, outputs), [-1, size])  # flatten to [batch_size * num_steps, size]
softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])
logits = tf.matmul(output, softmax_w) + softmax_b
loss = tf.nn.seq2seq.sequence_loss_by_example(
    [logits],
    [tf.reshape(self._targets, [-1])],
    [tf.ones([batch_size * num_steps])])
self._cost = cost = tf.reduce_sum(loss) / batch_size
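The per-token quantity being summed here is the softmax cross-entropy between a row of logits and its target id. A minimal plain-Python sketch of that computation follows; the helper `softmax_cross_entropy` and the small logits table are made up for illustration, and all weights are taken to be 1 as in the call above.

```python
import math


def softmax_cross_entropy(logits, target):
    """Return -ln(softmax(logits)[target]) for one token, computed stably."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


batch_size, num_steps, vocab_size = 2, 2, 3
# One row of logits per token: batch_size * num_steps rows of vocab_size scores.
logits = [[2.0, 0.0, 0.0],
          [0.0, 2.0, 0.0],
          [0.0, 0.0, 2.0],
          [1.0, 1.0, 1.0]]
targets = [0, 1, 2, 0]

loss = [softmax_cross_entropy(row, t) for row, t in zip(logits, targets)]
cost = sum(loss) / batch_size   # summed over all tokens, averaged over the batch
print(round(cost, 4))
```

Dividing by `batch_size` rather than by the number of tokens means the cost is a sum over the `num_steps` positions of each sequence.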
LSTM iteration process
def run_epoch(session, m, data, eval_op, verbose=False):
    epoch_size = ((len(data) // m.batch_size) - 1) // m.num_steps
    start_time = time.time()
    costs = 0.0
    iters = 0
    state = m.initial_state.eval()
    for step, (x, y) in enumerate(reader.ptb_iterator(data, m.batch_size,
                                                      m.num_steps)):
        # Evaluate three fetches (cost, final_state, eval_op) given three
        # feeds: input_data, targets and initial_state.
        cost, state, _ = session.run([m.cost, m.final_state, eval_op],
                                     {m.input_data: x,
                                      m.targets: y,
                                      m.initial_state: state})
        costs += cost
        iters += m.num_steps
        if verbose and step % (epoch_size // 10) == 10:
            print("%.3f perplexity: %.3f speed: %.0f wps" %
                  (step * 1.0 / epoch_size, np.exp(costs / iters),
                   iters * m.batch_size / (time.time() - start_time)))
    return np.exp(costs / iters)
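The value returned is the exponential of the average per-word loss (the perplexity), where the loss is $$Loss = -\frac{1}{N}\sum_{i=1}^{N} \ln P_{target_i}$$
$$TotalLoss = e^{Loss}$$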
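A small worked example makes the two formulas concrete. The sketch below computes the exponential of the average negative log-probability for a list of target-word probabilities; the function name `perplexity` and the sample probabilities are chosen for illustration.

```python
import math


def perplexity(target_probs):
    """exp of the average negative log-probability of the target words."""
    loss = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(loss)


# A model that gives every target word probability 1/4 is as uncertain as a
# uniform choice among 4 words; a more confident model scores lower.
uniform = perplexity([0.25] * 8)
confident = perplexity([0.9] * 8)
print(round(uniform, 6), round(confident, 6))
```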
for i in range(config.max_max_epoch):
    lr_decay = config.lr_decay ** max(i - config.max_epoch, 0.0)
    m.assign_lr(session, config.learning_rate * lr_decay)
    print("Epoch: %d Learning rate: %.3f" % (i + 1, session.run(m.lr)))
    train_perplexity = run_epoch(session, m, train_data, m.train_op,
                                 verbose=True)
The model is initialized first; the loss is then minimized by running train_op, using an optimizer such as SGD.
RNN-LSTM parameter settings
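The learning-rate schedule used in the epoch loop can be tabulated on its own. The sketch below reuses the same formula, `lr_decay ** max(i - max_epoch, 0.0)`; the helper name and the numeric values are assumptions picked for illustration.

```python
def learning_rate_for_epoch(i, learning_rate, lr_decay, max_epoch):
    """Learning rate for 0-based epoch i, using the training loop's formula."""
    return learning_rate * lr_decay ** max(i - max_epoch, 0.0)


# Illustrative values: start at 1.0 and halve the rate each epoch once
# i exceeds max_epoch = 4.
schedule = [learning_rate_for_epoch(i, 1.0, 0.5, 4) for i in range(8)]
print(schedule)  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.25, 0.125]
```

Because the exponent is zero whenever `i <= max_epoch`, the rate stays at its initial value for those epochs and decays geometrically afterwards.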
import argparse
parser = argparse.ArgumentParser()
# add each argument with its name, type, default value and help text
parser.add_argument('--batch_size', type=int, default=50, help='mini batch size')
parser.add_argument('--learn_rate', type=float, default=0.01, help='learn rate.')
# ... further arguments are added in the same way
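To see how such a parser behaves, the following self-contained sketch builds the two arguments shown above and parses an explicit argument list instead of the real command line. The wrapper function `build_parser` and the description string are additions made for this example.

```python
import argparse


def build_parser():
    """Build a parser with the two options from the notes."""
    parser = argparse.ArgumentParser(description="RNN-LSTM training options")
    parser.add_argument('--batch_size', type=int, default=50, help='mini batch size')
    parser.add_argument('--learn_rate', type=float, default=0.01, help='learn rate')
    return parser


parser = build_parser()
defaults = parser.parse_args([])                      # no flags: defaults apply
custom = parser.parse_args(['--batch_size', '32'])    # override one value
print(defaults.batch_size, defaults.learn_rate)  # 50 0.01
print(custom.batch_size)                         # 32
```

Passing a list to `parse_args` is convenient for testing; calling it with no arguments reads `sys.argv` instead.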
save and restore
model = Model(saved_args, True)
with tf.Session() as sess:
    # equivalent: tf.initialize_all_variables().run()
    sess.run(tf.initialize_all_variables())
    saver = tf.train.Saver(tf.all_variables())  # save all variables
    ckpt = tf.train.get_checkpoint_state(args.save_dir)
    if ckpt and ckpt.model_checkpoint_path:
        # restore the session from ckpt.model_checkpoint_path
        saver.restore(sess, ckpt.model_checkpoint_path)
        print(model.sample(sess, chars, vocab, args.n, args.prime))