Logs27: Mutual Information
- Run with large data. Train the seq2seq and the backward model intensively.
- Training data set
- p_i: "Let's have curry for lunch."
- q_i: "Maybe Coco ichi?"
- p_i+1: "Sounds good."
- Train seq2seq
- X: concat(p_i, q_i)
- Y: p_i+1
- Train seq2seq_backward
- X: p_i+1
- Y: q_i
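A minimal sketch of how the forward and backward training pairs above could be built from one conversation triple; the function name and the plain string concatenation are assumptions, not the actual repo code:

```python
def make_training_pairs(p_i, q_i, p_next):
    """Build (X, Y) pairs for the two models.

    seq2seq          : X = concat(p_i, q_i) -> Y = p_i+1
    seq2seq_backward : X = p_i+1            -> Y = q_i
    """
    forward_pair = (p_i + " " + q_i, p_next)   # for seq2seq
    backward_pair = (p_next, q_i)              # for seq2seq_backward
    return forward_pair, backward_pair

fwd, bwd = make_training_pairs(
    "Let's have curry for lunch.", "Maybe Coco ichi?", "Sounds good.")
# fwd == ("Let's have curry for lunch. Maybe Coco ichi?", "Sounds good.")
# bwd == ("Sounds good.", "Maybe Coco ichi?")
```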
- RL Training
- Beam Search
- X: concat(p_i, q_i) [batch_size, decoder_length]
- Note: q_i should also be accessible as an iterator, since we need it when calculating the reward.
- beam_replies: [batch_size, decoder_length, beam_width]
- logits: [batch_size, decoder_length, vocab_size]
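A tiny shape sketch for the beam-search outputs listed above, showing how one beam candidate would be sliced out for the reward step; the dummy tensors and variable names are assumptions used only to pin down the layout:

```python
import numpy as np

batch_size, decoder_length, beam_width, vocab_size = 32, 20, 5, 8000

# Shapes as listed in the notes above (dummy zero tensors).
X = np.zeros((batch_size, decoder_length), dtype=np.int32)  # concat(p_i, q_i)
beam_replies = np.zeros((batch_size, decoder_length, beam_width), dtype=np.int32)
logits = np.zeros((batch_size, decoder_length, vocab_size), dtype=np.float32)

# One candidate reply a per batch element, for beam index b; this is what
# goes into the reward calculation below.
b = 0
a_b = beam_replies[:, :, b]    # [batch_size, decoder_length]
```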
- Calc reward (see the sketch after this block)
- Given: p_i, q_i, a[beam_index] (from beam_search)
- (1/N_a) * log P_seq2seq(a | p_i, q_i), shape: [batch_size, beam_size]
- NOTE: Don't use logP_rl here.
- model_seq2seq.get_logits(p_i + q_i)
- For 1 example: a float value
- For batch data:
- Get logits [batch_size, decoder_length, vocab_size] for [batch_size, decoder_length]
- Then calculate it and loop over all beam candidates
- (1/N_qi) * log P_backward(q_i | a), shape: [batch_size, beam_size]
- model_backward.get_logits(a) for i in range(beam_width)
- For 1 example: 1 float value.
- For batch data:
- Get logits [batch_size, decoder_length]
- Do it beam_width times (one [batch_size, decoder_length] logits slice per beam).
- Get log_prob: [batch_size, decoder_length, beam_width]
- We already have this.
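A minimal numpy sketch of the two reward terms described above, assuming we already have each model's decoder logits for the relevant sequences; run it once per beam candidate to fill the [batch_size, beam_size] reward matrices. The function names, the masking, and the plain sum of the two terms are assumptions:

```python
import numpy as np

def length_normalized_logprob(logits, token_ids, lengths):
    """(1/N) * sum_t log p(token_t), computed from decoder logits.

    logits:    [batch, dec_len, vocab]  raw decoder outputs
    token_ids: [batch, dec_len]         the sequence whose probability we want
    lengths:   [batch]                  true (unpadded) lengths N
    """
    # log-softmax over the vocabulary axis (max-subtracted for stability)
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # pick the log-prob of each target token
    tok_lp = np.take_along_axis(log_probs, token_ids[..., None], axis=-1)[..., 0]
    # zero out padding positions, then length-normalize
    mask = np.arange(tok_lp.shape[1])[None, :] < lengths[:, None]
    return (tok_lp * mask).sum(axis=1) / lengths            # [batch]

def mi_reward(fwd_logits, a_ids, a_len, bwd_logits, qi_ids, qi_len):
    # (1/N_a)  * log P_seq2seq(a | p_i, q_i), from the forward model's logits
    r_fwd = length_normalized_logprob(fwd_logits, a_ids, a_len)
    # (1/N_qi) * log P_backward(q_i | a), from the backward model's logits
    r_bwd = length_normalized_logprob(bwd_logits, qi_ids, qi_len)
    return r_fwd + r_bwd                                     # [batch]
```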
- Beam Search
- [done] Make it possible for beam search to coexist with infer
- Return infer_logits when doing beam search
- Get logits for predicted_id
- Have beam_logits.
- Refactoring
- Extract the attention method.
- Unify the model class?
- Confirm beam_logits has the same shape and values as logits.
- For one beam search result, get its indices.
- Fetch the logprob at those indices.
- Feed the reward back? Or make it work for multiple beams.
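A small self-contained check for the items above: gather the log-probability at each predicted_id from the full logits and compare it with beam_logits. The variable names and tolerance are assumptions:

```python
import numpy as np

def logprob_at_predicted_ids(logits, predicted_ids):
    """logits: [batch, dec_len, vocab], predicted_ids: [batch, dec_len] -> [batch, dec_len]."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return np.take_along_axis(log_probs, predicted_ids[..., None], axis=-1)[..., 0]

# If beam_logits really holds the per-token log-probs of the chosen beam,
# this should match element-wise:
# np.testing.assert_allclose(beam_logits,
#                            logprob_at_predicted_ids(logits, predicted_ids), atol=1e-5)
```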
- Wait... we'll have to use conversations.db after all, because we need p_seq2seq(a | p_i, q_i).
- Fully understand MI
- Read the original paper
- Read the original original paper
- we did not train a joint model (log p(T|S)−λ log p(T)), but instead trained maximum likelihood models, and used the MMI criterion only during testing.
- P_MI is trained by calculating MI between source and target.
- P_RL is trained by RL agents (so that they can get dialogue history)?
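To make the quoted criterion concrete, a minimal sketch of test-time MMI reranking, score(T) = log p(T|S) - lambda * log p(T); the candidate scores are made-up numbers for illustration, not model output:

```python
def mmi_rerank(candidates, lam=0.5):
    """Rerank N-best replies with score(T) = log p(T|S) - lam * log p(T)."""
    scored = [(reply, lp_t_given_s - lam * lp_t) for reply, lp_t_given_s, lp_t in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Made-up numbers: the generic reply has a high log p(T) and gets demoted.
ranked = mmi_rerank([
    ("I don't know.",    -5.0, -2.0),   # likely under p(T|S) but very generic
    ("Maybe Coco ichi?", -6.0, -9.0),   # less likely a priori, so less penalized
])
# ranked[0][0] == "Maybe Coco ichi?"
```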
- Let's check the existing implementation.
- Understand where p_i and q_i come from in the training
- p_i: "Let's eat curry"
- q_i: "How about kokoichi?"
- p_i+1: "Sounds good"
- Always start with a small model.
- Have backward seq2seq training in place.
- Find old implementation of mutual information.
- Build the MI model; this happens when decoding the best N results and applying mutual information.