KEYWORDS: Video, RGB color model, Optical flow, Video acceleration, Network architectures, Convolution, Image classification, 3D modeling, Video processing, Lithium
We propose a network structure for action recognition that can extract multi-scale temporal representations of actions. The key idea of the network, called the multi-scale temporal pooling dense convolutional network (MTPDNet), is to combine a multi-scale temporal pooling module with a dense connection module. The multi-scale temporal pooling module consists of multiple temporal scale levels. At each scale level, the video frames are divided into several segments, and a pooling operation is performed on each segment to obtain temporal pooling information. The number of segments differs across scale levels, so that the pooled information covers multiple temporal scales. In addition, at each scale level, we adopt a redesigned dense connection module to learn motion representations from the temporal pooling information. Finally, a prediction is made independently at each scale level, and the class scores of all scale levels are fused to produce the final prediction scores. Experimental results on two standard datasets, UCF101 and HMDB51, show that MTPDNet achieves results comparable to or better than those of leading methods, demonstrating the effectiveness of combining multi-scale temporal pooling with dense connections.
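The segment-then-pool scheme and the late fusion of per-scale class scores can be illustrated with a minimal NumPy sketch. It rests on several assumptions, because the abstract does not specify these details. It assumes contiguous, near-equal segments, average pooling, and scale levels with 1, 2, and 4 segments. It also assumes fusion by averaging per-scale softmax probabilities. Finally, it replaces the redesigned dense connection module with a toy linear classifier. All function names (`segment_pool`, `multi_scale_temporal_pooling`, `toy_scale_classifier`, `fuse_scores`) are hypothetical, and this is not the MTPDNet implementation.

```python
import numpy as np


def segment_pool(features: np.ndarray, num_segments: int, mode: str = "avg") -> np.ndarray:
    """Split (T, D) per-frame features into `num_segments` contiguous, near-equal
    segments along time and pool each one, returning (num_segments, D)."""
    T = features.shape[0]
    if not 1 <= num_segments <= T:
        raise ValueError("num_segments must be between 1 and the number of frames")
    # Integer segment boundaries, e.g. T=10, num_segments=3 -> [0, 3, 6, 10].
    bounds = [(i * T) // num_segments for i in range(num_segments + 1)]
    pool = np.mean if mode == "avg" else np.max  # pooling type is an assumption
    return np.stack([pool(features[a:b], axis=0) for a, b in zip(bounds[:-1], bounds[1:])])


def multi_scale_temporal_pooling(features: np.ndarray, scales=(1, 2, 4), mode: str = "avg") -> dict:
    """One pooled array per scale level; each level uses a different segment count."""
    return {s: segment_pool(features, s, mode) for s in scales}


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def toy_scale_classifier(pooled: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL stand-in for the per-scale dense connection module:
    flatten the (num_segments, D) pooled features, keeping segment order,
    and apply a linear layer to get class probabilities."""
    return softmax(pooled.reshape(-1) @ weight)


def fuse_scores(per_scale_probs: list) -> np.ndarray:
    """Late fusion by averaging class probabilities across scale levels (assumed rule)."""
    return np.mean(np.stack(per_scale_probs), axis=0)


# Toy usage: random "frame features" stand in for CNN features of RGB frames.
rng = np.random.default_rng(0)
T, D, C = 16, 8, 5  # frames, feature dimension, number of classes (toy sizes)
frame_features = rng.standard_normal((T, D))
pooled = multi_scale_temporal_pooling(frame_features, scales=(1, 2, 4))
probs = [toy_scale_classifier(p, rng.standard_normal((p.size, C))) for p in pooled.values()]
final = fuse_scores(probs)
print({s: p.shape for s, p in pooled.items()})  # {1: (1, 8), 2: (2, 8), 4: (4, 8)}
print(final.shape, round(float(final.sum()), 6))  # (5,) 1.0
```

Coarse levels (few segments) summarize the whole clip, and fine levels (many segments) preserve more temporal ordering. Each level gets its own classifier, so the fused score draws on every temporal resolution.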