This hack is necessitated by the open issue pytorch/pytorch#1591.
First, the usual pack_padded_sequence and pad_packed_sequence calls for handling variable-length sequences:
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seq_len, bsz, n_dims = feats.size()
packed_input = pack_padded_sequence(feats, lengths, batch_first=False)
packed_output, self.hidden = self.lstm(packed_input, self.hidden)
# lstm_out --> max(lengths) x bsz x hidden_dim, which can be shorter than seq_len
lstm_out, output_lengths = pad_packed_sequence(packed_output, batch_first=False)
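To see the behaviour concretely, here is a minimal, self-contained sketch (the tensor sizes and the lengths list are made up for illustration): when every sequence in the batch is shorter than seq_len, the padded output comes back with max(lengths) time steps rather than seq_len.

import torch
from torch import autograd
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seq_len, bsz, n_dims, hidden_dim = 10, 3, 4, 8                # illustrative sizes
feats = autograd.Variable(torch.zeros(seq_len, bsz, n_dims))  # padded batch, time-major
lengths = [7, 5, 3]                                           # true lengths, sorted descending
lstm = torch.nn.LSTM(n_dims, hidden_dim)

packed_input = pack_padded_sequence(feats, lengths, batch_first=False)
packed_output, hidden = lstm(packed_input)
lstm_out, output_lengths = pad_packed_sequence(packed_output, batch_first=False)

print(lstm_out.size(0))  # 7, i.e. max(lengths), not seq_len == 10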
The hack is needed because the output size of the Variable returned by pad_packed_sequence is determined by the maximum length in output_lengths, not by seq_len for the batch. Also, you may have to hardcode MAXLEN in sequence/loss masking procedures (see the masking sketch after the snippet below):
if lstm_out.size(0) < seq_len:
    # pad the LSTM output back up to seq_len with zeros so downstream code sees a fixed length
    dummy_tensor = autograd.Variable(torch.zeros(seq_len - lstm_out.size(0), bsz, self.hidden_dim))
    lstm_out = torch.cat([lstm_out, dummy_tensor], 0)
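For the masking side mentioned above, here is a minimal sketch of a loss mask built against a hardcoded MAXLEN; the MAXLEN value, the sequence_mask helper, and the masked-average reduction are assumptions for illustration, not the exact procedure used in this model.

import torch
from torch import autograd

MAXLEN = 50  # hypothetical hardcoded maximum sequence length

def sequence_mask(lengths, max_len=MAXLEN):
    # lengths: list of true sequence lengths for the batch
    # returns a (max_len x bsz) mask with 1.0 at valid time steps and 0.0 at padding
    bsz = len(lengths)
    mask = torch.zeros(max_len, bsz)
    for b, length in enumerate(lengths):
        mask[:length, b] = 1.0
    return autograd.Variable(mask)

# Example: average a per-time-step loss only over the valid (unpadded) positions.
# per_step_loss is assumed to be a (MAXLEN x bsz) Variable of unreduced losses.
# mask = sequence_mask(lengths)
# loss = (per_step_loss * mask).sum() / mask.sum()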
Our accuracy metrics have remained stable and our predictions have stayed in line with expectations, so I think this hack works well.