TransTCN: An Attention-based TCN Framework for Sequential Modeling

ICLR 2022 Conference Withdrawn Submission

Keywords:

sequential modeling, multi-head attention, TCN, Transformer

Abstract:

Among sequential modeling problems, the ability to model long-term dependencies remains a significant challenge. Recurrent networks capture this information through recurrent connections, but training must proceed sequentially along those temporal connections, which reduces efficiency. A temporal convolutional network (TCN), in contrast, exploits the parallelism of convolution and models sequential information through causal, dilated convolutional layers. Moreover, the Transformer has exhibited a strong ability to capture long-term dependencies. In this study, we therefore introduce the attention blocks of the Transformer into the TCN model to form a model called TransTCN, which models sequential information with attention modules. The model was evaluated across a wide range of time-series tasks that are commonly used to benchmark TCNs and recurrent networks. To the best of our knowledge, TransTCN is the first framework to combine Transformer attention with a TCN and achieve state-of-the-art performance. The experimental results show that the word-level perplexity on PennTreebank reached only $1.33$, whereas the original TCN achieved $87.90$, an improvement of roughly $66\times$. In addition, loss and perplexity/bpc improved on nearly all other datasets commonly used to evaluate TCNs, except for several datasets on which our approach maintained performance similar to the original TCN. Furthermore, the training process of TransTCN converges faster than that of TCN.
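Since no implementation details are given here, the following is a minimal, hypothetical PyTorch sketch of the idea described in the abstract: a causal, dilated TCN residual block followed by a Transformer-style multi-head self-attention block with a causal mask. The class names (`CausalConv1d`, `TransTCNBlock`), layer sizes, and the exact way attention is combined with the convolutions are assumptions for illustration, not the authors' architecture.

```python
# Hypothetical sketch of a TransTCN-style block: causal dilated convolutions
# (as in a TCN) plus masked multi-head self-attention (as in the Transformer).
# All names and hyperparameters are assumed, not taken from the paper.
import torch
import torch.nn as nn


class CausalConv1d(nn.Module):
    """1-D convolution with left-only padding so outputs depend only on past inputs."""

    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))    # pad on the left only
        return self.conv(x)


class TransTCNBlock(nn.Module):
    """Causal dilated convolutions with a residual connection, followed by
    causally masked multi-head self-attention over the time dimension."""

    def __init__(self, channels, kernel_size=3, dilation=1, n_heads=4, dropout=0.1):
        super().__init__()
        self.conv1 = CausalConv1d(channels, channels, kernel_size, dilation)
        self.conv2 = CausalConv1d(channels, channels, kernel_size, dilation)
        self.attn = nn.MultiheadAttention(channels, n_heads, dropout=dropout,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                          # x: (batch, channels, time)
        # TCN part: two causal dilated convolutions with a residual connection.
        h = self.drop(self.act(self.conv1(x)))
        h = self.drop(self.act(self.conv2(h)))
        x = x + h

        # Attention part: each step may attend only to itself and earlier steps.
        t = x.transpose(1, 2)                      # -> (batch, time, channels)
        mask = torch.triu(torch.ones(t.size(1), t.size(1), dtype=torch.bool,
                                     device=t.device), diagonal=1)
        a, _ = self.attn(t, t, t, attn_mask=mask)
        t = self.norm(t + a)
        return t.transpose(1, 2)                   # back to (batch, channels, time)


if __name__ == "__main__":
    block = TransTCNBlock(channels=64, dilation=2)
    y = block(torch.randn(8, 64, 100))
    print(y.shape)                                 # torch.Size([8, 64, 100])
```

In this sketch the masked attention layer lets every time step attend directly to all earlier steps, which reflects the abstract's motivation for adding Transformer attention on top of the causal, dilated convolutions of a TCN.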

One-sentence Summary:

TransTCN achieves a perplexity roughly $66$ times better than that of the original TCN by using attention as its long-term dependency module.

Supplementary Material:

zip