Multi-Scale Pooling In Deep Neural Networks For Dense Crowd Estimation

Authors

  • Mr. Ali Raza Dept. Electronic Engineering, Quaid-e-Awam University, Larkana, Pakistan
  • Dr. Fareed Ahmed Computer System Engineering, Quaid-e-Awam University, Nawabshah, Pakistan
  • Ghulam Hussain Electronic Engineering, Quaid-e-Awam University, Larkana
  • Dr. Kamran National Centre of Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
  • Mr. Arsalan Ahmed Dept. Electronic Engineering, Quaid-e-Awam University, Larkana, Pakistan

DOI:

https://doi.org/10.30537/sjet.v5i1.1023

Keywords:

Perspective Distortion, local scale, image patches, crowd counting, deep learning

Abstract

State-of-the-art methods for counting people in densely crowded scenes fall short of accurate crowd density estimation for the following reasons. They typically apply the same filters over the whole image or over large image patches. Only afterwards do they compensate for perspective distortion by estimating local scale, which is achieved by training an additional classifier that selects the best kernel size from a limited set of choices. As a result, these methods are limited in the context they can exploit because they are not end-to-end trainable; they cannot account for rapid scale changes because they assign a single scale to large image patches; and they can use only a narrow range of receptive fields if the networks are to remain of feasible size. In this study, we introduce an end-to-end trainable deep architecture that merges features obtained from multiple kernels of different sizes and learns both to handle rapid scale changes and to use the right context at each image location. The technique adaptively encodes the scale of the relevant contextual information to predict crowd density accurately. The training and validation losses of the proposed approach are 5% and 4% lower, respectively, than those of the state-of-the-art context-aware method.
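The idea of merging features pooled at several kernel sizes and choosing the right scale per image location can be illustrated with a minimal sketch. It is not the authors' network: the function names `avg_pool_same` and `fuse_multi_scale`, the kernel sizes (1, 3, 5), and the contrast-based weights that stand in for learned ones are all assumptions made for illustration, and the code operates on a single-channel feature map using plain Python lists.

```python
import math


def avg_pool_same(feat, k):
    """Average-pool a 2D grid with a k x k window, stride 1, clipped at borders."""
    h, w = len(feat), len(feat[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            rows = range(max(0, i - r), min(h, i + r + 1))
            cols = range(max(0, j - r), min(w, j + r + 1))
            vals = [feat[a][b] for a in rows for b in cols]
            out[i][j] = sum(vals) / len(vals)
    return out


def fuse_multi_scale(feat, kernel_sizes=(1, 3, 5), scale_logits=None):
    """Blend pooled maps using per-location softmax weights over the scales."""
    h, w = len(feat), len(feat[0])
    pooled = [avg_pool_same(feat, k) for k in kernel_sizes]
    if scale_logits is None:
        # Assumption: a hand-set stand-in for learned weights. Scales whose
        # pooled value stays close to the local feature get a higher weight.
        scale_logits = [
            [[-abs(p[i][j] - feat[i][j]) for j in range(w)] for i in range(h)]
            for p in pooled
        ]
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            logits = [scale_logits[s][i][j] for s in range(len(pooled))]
            m = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in logits]
            z = sum(exps)
            fused[i][j] = sum(e / z * p[i][j] for e, p in zip(exps, pooled))
    return fused
```

Because the weights are computed separately at every location, neighbouring pixels can favour different kernel sizes, which is the property the abstract contrasts with assigning a single scale to a large patch. In a trainable model the logits would be produced by learned layers rather than by the fixed rule assumed here.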

Published

2022-06-30