This document summarizes a proposed system for tracking and counting humans in visual surveillance video. The system first applies background subtraction to the input video frames to detect foreground objects. It runs this step on both grayscale and binary image formats and selects the better-performing one. Features are then extracted from the detected objects, and the objects are tracked from frame to frame using these features. Counting is performed by analyzing the pixels that represent detected humans across frames. Experimental results on videos with challenges such as shadows, illumination changes, and occlusions show that the system can accurately track and count humans with near real-time performance.
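
The four stages named above (background subtraction, feature extraction, frame-to-frame tracking, counting) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the proposed system: the static background frame, the fixed difference threshold, the choice of area and centroid as features, the greedy nearest-centroid matching, and counting by the number of distinct tracks. The function names and parameters (`foreground_mask`, `extract_blobs`, `track_and_count`, `threshold`, `min_area`, `max_dist`) are hypothetical. Only the binary-mask path is shown; the grayscale variant and the selection between the two formats are omitted.

```python
from collections import deque

def foreground_mask(frame, background, threshold=30):
    """Binary mask: 1 where a pixel differs from the background by more than threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def extract_blobs(mask, min_area=2):
    """Label 4-connected foreground regions and return area and centroid per region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects the pixels of one region.
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = len(pixels)
            if area >= min_area:  # discard tiny regions as noise (assumed heuristic)
                centroid = (sum(p[0] for p in pixels) / area,
                            sum(p[1] for p in pixels) / area)
                blobs.append({"area": area, "centroid": centroid})
    return blobs

def track_and_count(frames, background, max_dist=3.0):
    """Greedy nearest-centroid tracking; returns the number of distinct tracks created."""
    tracks = {}   # track id -> centroid in the previous frame
    next_id = 0
    for frame in frames:
        updated = {}
        for blob in extract_blobs(foreground_mask(frame, background)):
            cy, cx = blob["centroid"]
            best_id, best_d = None, max_dist
            for tid, (ty, tx) in tracks.items():
                d = ((cy - ty) ** 2 + (cx - tx) ** 2) ** 0.5
                if d <= best_d and tid not in updated:
                    best_id, best_d = tid, d
            if best_id is None:   # no nearby track: treat as a new object
                best_id = next_id
                next_id += 1
            updated[best_id] = (cy, cx)
        tracks = updated          # tracks with no match in this frame are dropped
    return next_id
```

Frames and the background are plain lists of rows of grayscale intensities, so the sketch needs no third-party libraries. A sketch this simple does not handle shadows, illumination changes, or occlusions; an object that disappears for one frame would be counted again when it reappears.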