This paper proposes a one-shot, scene-specific crowd counting technique based on deep learning. It uses CSRNet, a convolutional neural network, as the backbone model. To adapt the model to a target scene, it fine-tunes only the decoder part of CSRNet using a single labeled image of that scene. Experimental results on standard datasets show that the proposed approach outperforms baseline methods and generalizes across datasets with the same or different object types. The paper addresses the novel problem of one-shot scene-specific crowd counting, with potential applications in areas such as surveillance and traffic monitoring.
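
The adaptation step described here (keep the encoder fixed, update only the decoder on one labeled image) can be illustrated with a deliberately tiny, dependency-free sketch. This is not CSRNet and not the paper's training code: the two-feature "encoder", the linear "decoder", the six-pixel "image", the learning rate, and every function name below are assumptions chosen only to make the idea runnable. The predicted count is taken as the sum of the predicted density values, which is the usual convention in density-map counting.

```python
# Toy sketch of one-shot decoder adaptation. All names and numbers here are
# hypothetical; the real method fine-tunes the decoder layers of CSRNet.

def encode(pixels, enc_weights):
    """Frozen encoder: map each pixel value to a small feature vector."""
    return [[w * p for w in enc_weights] for p in pixels]

def decode(features, dec_weights, dec_bias):
    """Trainable decoder: linear map from features to a per-pixel density."""
    return [sum(w * f for w, f in zip(dec_weights, feat)) + dec_bias
            for feat in features]

def mse(pred, target):
    """Mean squared error between predicted and ground-truth density."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def adapt_decoder(pixels, target, enc_weights, dec_weights, dec_bias,
                  lr=0.2, steps=2000):
    """Fine-tune only the decoder parameters on a single labeled image."""
    features = encode(pixels, enc_weights)  # computed once: encoder is frozen
    n = len(pixels)
    w, b = list(dec_weights), dec_bias      # copies, so the inputs stay intact
    for _ in range(steps):
        pred = decode(features, w, b)
        err = [p - t for p, t in zip(pred, target)]
        # Gradients of the MSE loss with respect to decoder parameters only.
        grad_w = [2.0 / n * sum(e * feat[k] for e, feat in zip(err, features))
                  for k in range(len(w))]
        grad_b = 2.0 / n * sum(err)
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# One labeled "image" from the target scene (flattened, made-up values).
pixels = [0.0, 0.2, 0.5, 0.9, 1.0, 0.3]
target = [0.8 * p + 0.1 for p in pixels]    # made-up ground-truth density

enc_weights = (1.0, 0.5)                    # stands in for pretrained, frozen encoder
dec_w0, dec_b0 = [0.3, 0.2], 0.0            # stands in for pretrained decoder

feats = encode(pixels, enc_weights)
loss_before = mse(decode(feats, dec_w0, dec_b0), target)

dec_w, dec_b = adapt_decoder(pixels, target, enc_weights, dec_w0, dec_b0)
adapted = decode(feats, dec_w, dec_b)
loss_after = mse(adapted, target)
count_after = sum(adapted)                  # predicted count = sum of density

print(loss_before, loss_after, count_after)
```

In a real deep learning framework the same effect is usually obtained by excluding the encoder's parameters from the optimizer, so that only the decoder receives gradient updates; the hand-written gradients above stand in for that mechanism.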