Temporal Convolutional Networks, The Next Revolution for Time-Series?
This post reviews the latest innovations in TCN-based solutions. We first present a case study of motion detection and briefly review the TCN architecture and its advantages over conventional approaches such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). Then, we introduce several novel applications of TCN, including improving traffic prediction, sound event localization & detection, and probabilistic forecasting.
A brief review of TCN
The seminal work of Lea et al. (2016) first proposed Temporal Convolutional Networks (TCNs) for video-based action segmentation. The conventional process for this task has two steps: first, computing low-level features using (usually) a CNN that encodes spatial-temporal information, and second, feeding these low-level features into a classifier that captures high-level temporal information using (usually) an RNN. The main disadvantage of such an approach is that it requires two separate models. TCN provides a unified approach that captures both levels of information hierarchically.
The encoder-decoder framework is presented in Fig. 1; further information regarding the architecture can be found in the first two references (at the end of the post). The key properties are as follows. A TCN can take a series of any length and output a series of the same length. It uses causal convolutions within a 1D fully convolutional network architecture. A key characteristic is that the output at time t is convolved only with elements from time t and earlier, so no information from the future leaks into a prediction.
Lea et al. (2016)
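To make the causal property concrete, here is a minimal sketch of a single causal 1D convolution in plain Python. It is an illustration under stated assumptions, not the implementation from Lea et al.: the function name causal_conv1d, the filter weights, and the zero left-padding are all choices made for this example.

```python
def causal_conv1d(x, weights, dilation=1):
    """Causal 1D convolution over a list of numbers.

    weights[0] multiplies the current sample x[t]; weights[i] multiplies
    x[t - i * dilation]. Samples before the start of the series count as
    zeros (left padding), so the output has the same length as the input.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation
            if j >= 0:  # never look past the start, never look ahead
                acc += w * x[j]
        y.append(acc)
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = causal_conv1d(x, weights=[0.5, 0.5])
print(y)  # [0.5, 1.5, 2.5, 3.5, 4.5]

# Changing future values leaves the earlier outputs untouched.
x_changed = x[:3] + [100.0, 100.0]
print(causal_conv1d(x_changed, weights=[0.5, 0.5])[:3] == y[:3])  # True
```

The output has one value per input step, and the first three outputs are identical for both series because they never see the samples at positions 3 and 4.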
The buzz around TCN has reached even the Nature family of journals, with the recent publication in Scientific Reports of the work by Yan et al. (2020) on TCN for weather prediction tasks, specifically the advance prediction of ENSO (El Niño-Southern Oscillation). In their work, a comparative experiment was conducted with TCN and LSTM. One of their results was that the TCN performs well in prediction tasks with time-series data.
Yan et al. (2020)
The next sections present applications and extensions of this classical TCN.
Improving traffic prediction
Ridesharing and online navigation services can improve traffic prediction and change the way of life on the road. Fewer traffic jams, less pollution, and safe and fast driving are just a few examples of the benefits that better traffic predictions can bring. As this is a real-time, data-driven problem, it is necessary to utilize the accumulated data of upcoming traffic. For this reason, Dai et al. (2020) recently presented a Hybrid Spatio-Temporal Graph Convolutional Network (H-STGCN). The general idea is to take advantage of the piecewise-linear flow-density relationship and convert the upcoming traffic volume into its equivalent in travel time. One of the most interesting techniques they used in this work is graph convolution, which captures the spatial dependency. The compound adjacency matrix captures the innate characteristics of traffic approximation (for more information, please see Li et al., 2017). The following architecture presents four modules that describe the entire prediction process.
Dai et al. (2020)
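The idea of using graph convolution to capture spatial dependency can be sketched in a few lines. The example below is a generic mean-aggregation step over a small road graph, written only for illustration: it does not reproduce the compound adjacency matrix or any other detail of H-STGCN, and the function name graph_conv, the three-segment road, and the travel times are all hypothetical.

```python
def graph_conv(adj, features, weight=1.0):
    """One simplified graph-convolution step with scalar node features.

    Each node's new value is the weighted mean of its own value and the
    values of its neighbours (a row-normalised adjacency with self-loops).
    """
    n = len(adj)
    out = []
    for i in range(n):
        # Neighbours of node i, plus node i itself (self-loop).
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        out.append(weight * sum(features[j] for j in nbrs) / len(nbrs))
    return out


# Hypothetical road with three segments in a chain: 0 - 1 - 2.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
travel_time = [10.0, 20.0, 60.0]  # made-up travel times per segment

smoothed = graph_conv(adj, travel_time)
print(smoothed)  # [15.0, 30.0, 40.0]
```

After one step, the congested segment 2 has already raised the value at segment 1, which is the kind of spatial propagation a graph convolution provides. A real model would use vector features and learned weight matrices instead of a single scalar weight.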
Sound event localization & detection
The field of sound event localization and detection (SELD) continues to grow. Understanding the environment plays a critical role in autonomous navigation. Guirguis et al. (2020) recently proposed SELD-TCN, a novel architecture for this task. They claim that their framework outperforms the state-of-the-art in the field, with faster training time. In the SELDnet baseline (architecture below), a multichannel audio recording sampled at 44.1 kHz is passed through a short-time Fourier transform, and the phase and magnitude of the spectrum are extracted and stacked as separate input features. Then, convolutional blocks and recurrent blocks (bi-directional GRUs) are connected, followed by a fully-connected block. The outputs of SELDnet are Sound Event Detection (SED) and Direction of Arrival (DOA).
Guirguis et al. (2020)
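The feature-extraction step described above (a short-time Fourier transform whose magnitude and phase are stacked per channel) can be sketched as follows. This is a toy version in plain Python with a naive DFT, under assumptions made for this post: the frame length, hop size, Hann window, and the function name stft_features are illustrative and are not the settings used in SELDnet or SELD-TCN.

```python
import cmath
import math


def stft_features(signal, frame_len=8, hop=4):
    """Return (magnitude, phase), each a list of frames x frequency bins."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    magnitude, phase = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT of the windowed frame (fine for a toy example).
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(n_bins)
        ]
        magnitude.append([abs(c) for c in spectrum])
        phase.append([cmath.phase(c) for c in spectrum])
    return magnitude, phase


# Hypothetical 2-channel recording: the same tone, centred on bin 2.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
recording = [tone, tone]

per_channel = [stft_features(ch) for ch in recording]
# Stack magnitudes first, then phases: 2 * channels feature maps in total.
features = [m for m, _ in per_channel] + [p for _, p in per_channel]

peak_bin = max(range(len(features[0][0])), key=lambda k: features[0][0][k])
print(len(features), len(features[0]), len(features[0][0]))  # 4 3 5
print(peak_bin)  # 2
```

Each feature map has shape frames x frequency bins, and the stacked list is what a convolutional front end would receive as its input channels.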
To outperform it, they present SELD-TCN:
Guirguis et al. (2020)
Because dilated convolutions enable the network to process a variety of inputs, a deeper network may be required, and deeper networks are prone to unstable gradients during backpropagation. They overcome this challenge by adapting the WaveNet architecture (Rethage et al., 2018). They showed that recurrent layers are not required for SELD tasks, and successfully detected the start and end times of active sound events.
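The trade-off between depth and temporal context can be made concrete by computing the receptive field of a stack of dilated causal convolutions. The sketch below assumes one convolution per layer and a dilation that doubles at every layer; the kernel size and the number of layers are illustrative and are not taken from Guirguis et al.

```python
def receptive_field(kernel_size, dilations):
    """Number of input time steps that influence one output step.

    Assumes one causal convolution per layer; a layer with dilation d
    extends the receptive field by (kernel_size - 1) * d steps.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical schedule: the dilation doubles at every layer.
dilations = [2 ** i for i in range(6)]  # [1, 2, 4, 8, 16, 32]

print(receptive_field(3, dilations))  # 127
print(receptive_field(3, [1] * 6))    # 13, same depth without dilation
```

With the same six layers, the dilated stack sees 127 time steps instead of 13, which is why dilation lets a network cover long contexts without becoming extremely deep.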
Probabilistic forecasting
A novel framework designed by Chen et al. (2020) can be applied to estimate probability density. Time series prediction improves many business decision-making scenarios (for example, resource management). Probabilistic forecasting can extract information from historical data and minimize the uncertainty of future events. When the task is to predict millions of related data series (as in the retail business), parameter estimation requires prohibitive labor and computing resources. To address these difficulties, they proposed a CNN-based density estimation and prediction framework. Their framework can learn the latent correlation among series. The novelty in their work is the deep TCN they proposed, as presented in their architecture:
Chen et al. (2020)
The encoder-decoder design might help in building practical large-scale applications.
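To show what training a probabilistic forecaster involves, here are two loss functions commonly used for this purpose: the Gaussian negative log-likelihood (the network predicts a mean and a standard deviation) and the quantile, or pinball, loss (the network predicts a chosen quantile). They are given as a generic sketch; the exact objective used by Chen et al. should be checked against their paper, and the function names and numbers below are made up for the example.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a Normal(mu, sigma**2) forecast."""
    return (0.5 * math.log(2 * math.pi * sigma ** 2)
            + (y - mu) ** 2 / (2 * sigma ** 2))


def quantile_loss(y, y_hat, q):
    """Pinball loss for a forecast y_hat of the q-th quantile of y."""
    diff = y - y_hat
    return max(q * diff, (q - 1) * diff)


# Hypothetical observed demand and forecasts.
y_true = 10.0

# A confident, correct forecast scores better than a vague one.
print(gaussian_nll(y_true, mu=10.0, sigma=1.0) <
      gaussian_nll(y_true, mu=10.0, sigma=5.0))  # True

# For the 0.9 quantile, forecasting too low costs more than too high.
print(quantile_loss(y_true, 8.0, q=0.9) >
      quantile_loss(y_true, 12.0, q=0.9))  # True
```

Both losses reward forecasts that express the right amount of uncertainty, which is the difference between a probabilistic forecast and a single point prediction.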
Summary
In this post, we presented recent works that involve temporal convolutional networks and outperform classical CNN and RNN approaches for time series tasks. For further information, please feel free to email me.
— — — — — — — — — — — — — — — — — — — — — — — — —
About the Author
Barak Or received B.Sc. (2016) and M.Sc. (2018) degrees in aerospace engineering, as well as a B.A. in economics and management (2016, cum laude), from the Technion, Israel Institute of Technology. He was with Qualcomm (2019–2020), where he mainly dealt with Machine Learning and Signal Processing algorithms. Barak is currently studying toward his Ph.D. at the University of Haifa. His research interests include sensor fusion, navigation, deep learning, and estimation theory. www.Barakor.com
LinkedIn: https://www.linkedin.com/in/barakor/
Twitter: BarakOr2
— — — — — — — — — — — — — — — — — — — — — — — — —
References
Lea, Colin, et al. “Temporal convolutional networks: A unified approach to action segmentation.” European Conference on Computer Vision. Springer, Cham, 2016.
Lea, Colin, et al. “Temporal convolutional networks for action segmentation and detection.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Yan, Jining, et al. “Temporal convolutional networks for the advance prediction of ENSO.” Scientific Reports 10.1 (2020): 1–15.
Li, Yaguang, et al. “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting.” arXiv preprint arXiv:1707.01926 (2017).
Rethage, Dario, Jordi Pons, and Xavier Serra. “A wavenet for speech denoising.” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018.
Chen, Yitian, et al. “Probabilistic forecasting with temporal convolutional neural network.” Neurocomputing (2020).
Guirguis, Karim, et al. “SELD-TCN: Sound Event Localization & Detection via Temporal Convolutional Networks.” arXiv preprint arXiv:2003.01609 (2020).