Driver action recognition is essential to vehicle safety and smart-car systems that treat the vehicle cabin as a diagnostic space. In video-based action recognition, attention-based architectures have surpassed conventional deep learning methods, and prior frameworks have produced excellent results on the public Drive&Act dataset. However, existing frameworks disregard both the temporal ordering of frames in the video and the spatial layout of the relevant interacting objects, which degrades performance on actions that involve semantic reversals. These include so-called conjugate actions: pairs in which one action is the other performed backward in time, such as moving the hand rightward versus leftward. We propose a feature engineering approach that models the motion of the human pose, using keypoints relevant to the action to capture sequential order. We first apply the Video Swin architecture to the Drive&Act dataset; we then compute histograms of oriented displacements from human joint locations and their frame-to-frame displacements, and train a support vector machine to classify actions within conjugate pairs. Performance improves on two conjugate pairs, namely fastening/unfastening the seat belt and taking off/putting on sunglasses. Integrating our module with existing deep learning models raises the overall accuracy by 3%, to 72%. Furthermore, our approach can be extended to other action classes.
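The following is a minimal sketch of the histogram-of-oriented-displacements (HOD) plus SVM stage described above. The bin count, joint set, normalization, and SVM kernel are illustrative assumptions, not the paper's exact configuration; the `hod_features` helper and the data layout are hypothetical.

```python
# Minimal HOD + SVM sketch (assumed configuration, not the paper's exact setup).
import numpy as np
from sklearn.svm import SVC

def hod_features(keypoints: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Compute a histogram of oriented displacements for one clip.

    keypoints: array of shape (T, J, 2) holding (x, y) joint locations
    for T frames and J joints. Returns a (J * n_bins,) feature vector.
    """
    disp = np.diff(keypoints, axis=0)                # (T-1, J, 2) frame-to-frame displacements
    angles = np.arctan2(disp[..., 1], disp[..., 0])  # orientation of each displacement
    mags = np.linalg.norm(disp, axis=-1)             # magnitude used as the histogram weight
    feats = []
    for j in range(keypoints.shape[1]):
        hist, _ = np.histogram(angles[:, j], bins=n_bins,
                               range=(-np.pi, np.pi), weights=mags[:, j])
        total = hist.sum()
        feats.append(hist / total if total > 0 else hist)  # per-joint L1 normalization
    return np.concatenate(feats)

# Hypothetical usage: clips is a list of (T, J, 2) pose sequences for one
# conjugate pair (e.g., fastening vs. unfastening the seat belt), labels is 0/1.
# X = np.stack([hod_features(clip) for clip in clips])
# clf = SVC(kernel="rbf").fit(X, labels)
# preds = clf.predict(X)
```

Because the two members of a conjugate pair trace near-identical spatial trajectories in opposite directions, the signed displacement orientations separate them even when appearance-based features do not.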