Logs27: Mutual Information
- Run with large data. Train the seq2seq and the backward model intensively.
- Training data set
- p_i: "Let's have curry for lunch."
- q_i: "Maybe Coco ichi?"
- p_i+1: "Sounds good."
- Train seq2seq
- X: concat(p_i, q_i)
- Y: p_i+1
- Train seq2seq_backward
- X: p_i+1
- Y: q_i
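A minimal sketch of how the forward and backward training pairs above could be built from one conversation triple; the function name and the plain string concatenation are assumptions, not the actual repo code:

```python
def make_training_pairs(p_i, q_i, p_next):
    """Build (X, Y) pairs for the two models.

    seq2seq          : X = concat(p_i, q_i) -> Y = p_i+1
    seq2seq_backward : X = p_i+1            -> Y = q_i
    """
    forward_pair = (p_i + " " + q_i, p_next)   # for seq2seq
    backward_pair = (p_next, q_i)              # for seq2seq_backward
    return forward_pair, backward_pair

fwd, bwd = make_training_pairs(
    "Let's have curry for lunch.", "Maybe Coco ichi?", "Sounds good.")
# fwd == ("Let's have curry for lunch. Maybe Coco ichi?", "Sounds good.")
# bwd == ("Sounds good.", "Maybe Coco ichi?")
```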
- RL Training
- Beam Search
- X: concat(p_i, q_i) [batch_size, decoder_length]
- Note: q_i should also be accessible as an iterator, since we need it when calculating the reward.
- beam_replies: [batch_size, decoder_length, beam_width]
- logits: [batch_size, decoder_length, vocab_size]
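A tiny shape sketch for the beam-search outputs listed above, showing how one beam candidate would be sliced out for the reward step; the dummy tensors and variable names are assumptions used only to pin down the layout:

```python
import numpy as np

batch_size, decoder_length, beam_width, vocab_size = 32, 20, 5, 8000

# Shapes as listed in the notes above (dummy zero tensors).
X = np.zeros((batch_size, decoder_length), dtype=np.int32)  # concat(p_i, q_i)
beam_replies = np.zeros((batch_size, decoder_length, beam_width), dtype=np.int32)
logits = np.zeros((batch_size, decoder_length, vocab_size), dtype=np.float32)

# One candidate reply a per batch element, for beam index b; this is what
# goes into the reward calculation below.
b = 0
a_b = beam_replies[:, :, b]    # [batch_size, decoder_length]
```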
- Calc reward (see the sketch after this block)
- Given: p_i, q_i, a[beam_index] (from beam_search)
- (1/N_a) * log P_seq2seq(a | p_i, q_i), shape: [batch_size, beam_size]
- NOTE: Don't use logP_rl here.
- model_seq2seq.get_logits(p_i + q_i)
- For 1 example: a float value
- For batch data:
- Get logits [batch_size, decoder_length, vocab_size] for [batch_size, decoder_length]
- Then calculate it and loop over all beam candidates
- (1/N_qi) * log P_backward(q_i | a), shape: [batch_size, beam_size]
- model_backward.get_logits(a) for i in range(beam_width)
- For 1 example: 1 float value.
- For batch data:
- Get logits [batch_size, decoder_length]
- Do it beam_width times (one [batch_size, decoder_length] logits slice per beam).
- Get log_prob: [batch_size, decoder_length, beam_width]
- We already have this.
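A minimal numpy sketch of the two reward terms described above, assuming we already have each model's decoder logits for the relevant sequences; run it once per beam candidate to fill the [batch_size, beam_size] reward matrices. The function names, the masking, and the plain sum of the two terms are assumptions:

```python
import numpy as np

def length_normalized_logprob(logits, token_ids, lengths):
    """(1/N) * sum_t log p(token_t), computed from decoder logits.

    logits:    [batch, dec_len, vocab]  raw decoder outputs
    token_ids: [batch, dec_len]         the sequence whose probability we want
    lengths:   [batch]                  true (unpadded) lengths N
    """
    # log-softmax over the vocabulary axis (max-subtracted for stability)
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # pick the log-prob of each target token
    tok_lp = np.take_along_axis(log_probs, token_ids[..., None], axis=-1)[..., 0]
    # zero out padding positions, then length-normalize
    mask = np.arange(tok_lp.shape[1])[None, :] < lengths[:, None]
    return (tok_lp * mask).sum(axis=1) / lengths            # [batch]

def mi_reward(fwd_logits, a_ids, a_len, bwd_logits, qi_ids, qi_len):
    # (1/N_a)  * log P_seq2seq(a | p_i, q_i), from the forward model's logits
    r_fwd = length_normalized_logprob(fwd_logits, a_ids, a_len)
    # (1/N_qi) * log P_backward(q_i | a), from the backward model's logits
    r_bwd = length_normalized_logprob(bwd_logits, qi_ids, qi_len)
    return r_fwd + r_bwd                                     # [batch]
```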
- Beam Search
- [done] Make it possible for beam search to coexist with infer
- Return infer_logits when doing beam search
- Get logits for predicted_id
- Have beam_logits.
- Refactoring
- Extract the attention method.
- Unify the model class?
- Confirm beam_logits has the same shape and values as logits.
- For one beam search result, get its indices.
- Fetch the logprob at those indices.
- Feed the reward back? Or make it work for multiple beams.
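A small self-contained check for the items above: gather the log-probability at each predicted_id from the full logits and compare it with beam_logits. The variable names and tolerance are assumptions:

```python
import numpy as np

def logprob_at_predicted_ids(logits, predicted_ids):
    """logits: [batch, dec_len, vocab], predicted_ids: [batch, dec_len] -> [batch, dec_len]."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return np.take_along_axis(log_probs, predicted_ids[..., None], axis=-1)[..., 0]

# If beam_logits really holds the per-token log-probs of the chosen beam,
# this should match element-wise:
# np.testing.assert_allclose(beam_logits,
#                            logprob_at_predicted_ids(logits, predicted_ids), atol=1e-5)
```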
- Wait... we'll have to use conversations.db after all, because we need p_seq2seq(a | p_i, q_i).
- Fully understand MI
- Read the original paper
- Read the original original paper
- we did not train a joint model (log p(T|S)−λ log p(T)), but instead trained maximum likelihood models, and used the MMI criterion only during testing.
- P_MI is trained by calculating MI between source and target.
- P_RL is trained by RL agents (so that they can get dialogue history)?
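To make the quoted criterion concrete, a minimal sketch of test-time MMI reranking, score(T) = log p(T|S) - lambda * log p(T); the candidate scores are made-up numbers for illustration, not model output:

```python
def mmi_rerank(candidates, lam=0.5):
    """Rerank N-best replies with score(T) = log p(T|S) - lam * log p(T)."""
    scored = [(reply, lp_t_given_s - lam * lp_t) for reply, lp_t_given_s, lp_t in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Made-up numbers: the generic reply has a high log p(T) and gets demoted.
ranked = mmi_rerank([
    ("I don't know.",    -5.0, -2.0),   # likely under p(T|S) but very generic
    ("Maybe Coco ichi?", -6.0, -9.0),   # less likely a priori, so less penalized
])
# ranked[0][0] == "Maybe Coco ichi?"
```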
- Let's check the existing implementation.
- Understand where p_i and q_i come from in the training
- p_i: "Let's eat curry"
- q_i: "How about kokoichi?"
- p_i+1: "Sounds good"
- Always start with a small model.
- Have backward seq2seq training in place.
- Find old implementation of mutual information.
- Build the MI model; this happens when decoding the best N results and applying mutual information.