Human Action Recognition (HAR) is a challenging task in pattern recognition. This paper proposes a novel dynamic PSO-ConvNet model for learning actions in videos. The model builds on a framework in which the weight vector of each neural network represents the position of a particle in phase space, and particles share their current weight vectors and their gradient estimates of the loss function. Experiments on the UCF-101 dataset demonstrate substantial improvements of up to 9% in accuracy, and experiments on larger and more varied datasets, including Kinetics-400 and HMDB-51, favor Collaborative Learning over Non-Collaborative (Individual) Learning.
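The particle view described above can be illustrated with a toy sketch. This is not the paper's update rule: the quadratic loss, the function names (`loss`, `grad`, `collaborative_step`, `run`), and the hyperparameters `lr` and `pull` are all assumptions chosen for illustration. Each particle is a plain weight vector, and at every step the particles exchange their gradient estimates and the best current weight vector.

```python
import random

TARGET = 1.0  # assumed optimum of the toy loss, not taken from the paper


def loss(w):
    # Toy quadratic loss standing in for a network's training loss.
    return sum((wi - TARGET) ** 2 for wi in w)


def grad(w):
    # Exact gradient of the toy loss; a real model would use backpropagation.
    return [2.0 * (wi - TARGET) for wi in w]


def collaborative_step(positions, lr=0.1, pull=0.1):
    """One synchronous update in which particles share gradients and positions."""
    grads = [grad(w) for w in positions]             # shared gradient estimates
    losses = [loss(w) for w in positions]
    best = positions[losses.index(min(losses))]      # shared best weight vector
    dim = len(positions[0])
    mean_grad = [sum(g[d] for g in grads) / len(grads) for d in range(dim)]

    new_positions = []
    for w, g in zip(positions, grads):
        new_positions.append([
            # Average the particle's own gradient with the shared one,
            # then drift toward the best particle (hypothetical rule).
            w[d] - lr * 0.5 * (g[d] + mean_grad[d]) + pull * (best[d] - w[d])
            for d in range(dim)
        ])
    return new_positions


def run(n_particles=4, dim=3, steps=200, seed=0):
    """Return the best loss among particles before and after training."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-2.0, 2.0) for _ in range(dim)]
                 for _ in range(n_particles)]
    initial = min(loss(w) for w in positions)
    for _ in range(steps):
        positions = collaborative_step(positions)
    return initial, min(loss(w) for w in positions)


if __name__ == "__main__":
    initial, final = run()
    print(f"best loss before: {initial:.4f}, after: {final:.2e}")
```

Setting `pull=0.0` and replacing `mean_grad` with the particle's own gradient would reduce this to independent gradient descent per particle, which is one simple way to picture the Individual Learning baseline.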
