This hack is necessitated by the open issue pytorch/pytorch#1591.
First, the usual pack_padded_sequence and pad_packed_sequence calls for handling variable-length sequences:
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seq_len, bsz, n_dims = feats.size()
packed_input = pack_padded_sequence(feats, lengths, batch_first=False)
packed_output, self.hidden = self.lstm(packed_input, self.hidden)
# lstm_out --> max(lengths) x bsz x hidden_dim, which can be shorter than seq_len
lstm_out, output_lengths = pad_packed_sequence(packed_output, batch_first=False)
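To see the behaviour concretely, here is a minimal, self-contained sketch (the tensor sizes and the lengths list are made up for illustration): when every sequence in the batch is shorter than seq_len, the padded output comes back with max(lengths) time steps rather than seq_len.

import torch
from torch import autograd
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seq_len, bsz, n_dims, hidden_dim = 10, 3, 4, 8                # illustrative sizes
feats = autograd.Variable(torch.zeros(seq_len, bsz, n_dims))  # padded batch, time-major
lengths = [7, 5, 3]                                           # true lengths, sorted descending
lstm = torch.nn.LSTM(n_dims, hidden_dim)

packed_input = pack_padded_sequence(feats, lengths, batch_first=False)
packed_output, hidden = lstm(packed_input)
lstm_out, output_lengths = pad_packed_sequence(packed_output, batch_first=False)

print(lstm_out.size(0))  # 7, i.e. max(lengths), not seq_len == 10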
The hack is needed because the output size of the Variable returned by pad_packed_sequence is determined by the maximum length in output_lengths, not by seq_len for the batch. Also, you may have to hardcode MAXLEN in sequence/loss masking procedures (see the masking sketch after the snippet below):
if lstm_out.size(0) < seq_len:
    # pad the LSTM output back up to seq_len with zeros so downstream code sees a fixed length
    dummy_tensor = autograd.Variable(torch.zeros(seq_len - lstm_out.size(0), bsz, self.hidden_dim))
    lstm_out = torch.cat([lstm_out, dummy_tensor], 0)
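For the masking side mentioned above, here is a minimal sketch of a loss mask built against a hardcoded MAXLEN; the MAXLEN value, the sequence_mask helper, and the masked-average reduction are assumptions for illustration, not the exact procedure used in this model.

import torch
from torch import autograd

MAXLEN = 50  # hypothetical hardcoded maximum sequence length

def sequence_mask(lengths, max_len=MAXLEN):
    # lengths: list of true sequence lengths for the batch
    # returns a (max_len x bsz) mask with 1.0 at valid time steps and 0.0 at padding
    bsz = len(lengths)
    mask = torch.zeros(max_len, bsz)
    for b, length in enumerate(lengths):
        mask[:length, b] = 1.0
    return autograd.Variable(mask)

# Example: average a per-time-step loss only over the valid (unpadded) positions.
# per_step_loss is assumed to be a (MAXLEN x bsz) Variable of unreduced losses.
# mask = sequence_mask(lengths)
# loss = (per_step_loss * mask).sum() / mask.sum()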
Our accuracy metrics have remained stable and our predictions have stayed in line with expectations, so I think this hack works well.