Attending to Future Tokens for Bidirectional Sequence Generation
Accepted at Empirical Methods for Natural Language Processing (EMNLP) 2019 NLP experienced a major change in the previous months. Previously, each NLP task defined a neural model and trained this model on the given task. But in recent months, various papers (ELMo [1], ULMFiT [2], GPT [3], BERT [4], GPT2 [5]) showed that it is possible to pre-train a NLP model on a language modelling task (more on this below) and then use this model as a starting point to fine-tune to further tasks. This has been labelled as an important turning point for NLP by many ([6], [7], [8], inter alia).