The document presents a technical paper for the COMSNETS 2025 conference describing a deepfake detection system that combines a CNN with spatio-temporal features. Its methodology preprocesses video frames, extracts features with ResNet34 and a Dense Swin Transformer, and uses Local Ternary Patterns (LTP) for texture analysis. The proposed model reportedly achieves a prediction accuracy of 98.25%, and the authors suggest that future work strengthen the performance evaluation with cross-validation.
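The summary does not say how the paper configures its Local Ternary Patterns. It does not give the threshold, the neighbourhood, or how the codes are combined with the ResNet34 and Dense Swin Transformer features. The sketch below therefore only illustrates the standard 3x3 LTP texture descriptor, under stated assumptions. These include a threshold `t = 5`, a clockwise neighbour ordering that starts at the top-left pixel, and input given as a grayscale frame in the form of a list of integer rows. Each ternary code (+1, 0, -1) is split into an "upper" and a "lower" binary pattern, and their 256-bin histograms are concatenated into one texture feature vector. All function names are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed neighbour ordering: clockwise from the top-left pixel.
# Bit i of a code corresponds to NEIGHBOUR_OFFSETS[i] (illustrative choice).
NEIGHBOUR_OFFSETS: List[Tuple[int, int]] = [
    (-1, -1), (-1, 0), (-1, 1), (0, 1),
    (1, 1), (1, 0), (1, -1), (0, -1),
]


def ternary_code(neighbour: int, centre: int, t: int) -> int:
    """Map one neighbour to +1, 0 or -1 relative to the centre pixel."""
    if neighbour >= centre + t:
        return 1
    if neighbour <= centre - t:
        return -1
    return 0  # neighbour lies within (centre - t, centre + t)


def ltp_codes(image: List[List[int]], t: int = 5) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (upper, lower) LTP code maps for all interior pixels.

    The upper pattern sets a bit wherever the ternary code is +1,
    the lower pattern wherever it is -1. Border pixels are skipped,
    so each map has shape (H-2) x (W-2).
    """
    h, w = len(image), len(image[0])
    upper = [[0] * (w - 2) for _ in range(h - 2)]
    lower = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = image[y][x]
            up = low = 0
            for bit, (dy, dx) in enumerate(NEIGHBOUR_OFFSETS):
                s = ternary_code(image[y + dy][x + dx], c, t)
                if s == 1:
                    up |= 1 << bit
                elif s == -1:
                    low |= 1 << bit
            upper[y - 1][x - 1] = up
            lower[y - 1][x - 1] = low
    return upper, lower


def ltp_histogram(image: List[List[int]], t: int = 5) -> List[int]:
    """Concatenate 256-bin histograms of the upper and lower codes (512 bins total)."""
    upper, lower = ltp_codes(image, t)
    hist = [0] * 512
    for row_u, row_l in zip(upper, lower):
        for u, l in zip(row_u, row_l):
            hist[u] += 1
            hist[256 + l] += 1
    return hist


if __name__ == "__main__":
    patch = [
        [10, 20, 30],
        [40, 25, 10],
        [25, 26, 5],
    ]
    print(ltp_codes(patch, t=5))  # ([[132]], [[27]])
```

In the 3x3 example, the centre pixel is 25. Neighbours at or above 30 (30 and 40) set upper bits 2 and 7, which gives 132. Neighbours at or below 20 (10, 20, 10 and 5) set lower bits 0, 1, 3 and 4, which gives 27. The values 26 and 25 fall inside the tolerance band and contribute nothing. The threshold band is what distinguishes LTP from Local Binary Patterns: small intensity fluctuations around the centre map to 0 rather than flipping a bit, which makes the descriptor less sensitive to noise. This noise tolerance is plausibly why a texture descriptor of this kind suits subtle manipulation artefacts, but the paper's actual settings may differ.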