This paper proposes a method for real-time 3D pose estimation and tracking of objects using natural landmarks. Scale-invariant feature matching provides the initial pose estimate, and KLT (Kanade-Lucas-Tomasi) tracking of keypoints provides fast local pose updates between frames. Experimental results show that the mono camera mode achieves higher frame rates, while the stereo camera mode provides more accurate pose estimates. Future work aims to improve computational efficiency through GPU implementations and to integrate contour-based tracking into the same framework.
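
The two-stage structure described above (a slower global initialization followed by fast local updates) can be illustrated with a minimal control-flow sketch. This is not the paper's implementation. The names `TwoStageTracker`, `TrackState`, `initialize`, and `track` are hypothetical, the two callables merely stand in for the scale-invariant feature matcher and the KLT tracker, and the rule of re-initializing when fewer than `min_keypoints` keypoints survive is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Pose = Tuple[float, ...]  # placeholder pose vector; the representation is an assumption


@dataclass
class TrackState:
    pose: Pose
    keypoints: List[Point]  # 2D keypoints currently being followed


class TwoStageTracker:
    """Control flow only: global initialization, then local tracking.

    `initialize` stands in for scale-invariant feature matching and
    `track` for KLT keypoint tracking; both are hypothetical callables.
    """

    def __init__(
        self,
        initialize: Callable[[object], Optional[TrackState]],
        track: Callable[[object, TrackState], Optional[TrackState]],
        min_keypoints: int = 4,  # assumed threshold, not taken from the paper
    ) -> None:
        self.initialize = initialize
        self.track = track
        self.min_keypoints = min_keypoints
        self.state: Optional[TrackState] = None

    def process(self, frame: object) -> Tuple[str, Optional[Pose]]:
        """Return (mode, pose) for one frame; pose is None if the object is not found."""
        if self.state is not None:
            # Fast path: update the pose from locally tracked keypoints.
            updated = self.track(frame, self.state)
            if updated is not None and len(updated.keypoints) >= self.min_keypoints:
                self.state = updated
                return "tracking", updated.pose
            # Too few keypoints survived: drop the state and re-initialize.
            self.state = None

        # Slow path: global pose estimate from feature matching.
        self.state = self.initialize(frame)
        if self.state is None:
            return "lost", None
        return "initial", self.state.pose
```

In use, `process` would be called once per camera frame. The expensive initializer runs only on the first frame or after tracking is lost, which is the mechanism by which such a split can keep per-frame cost low.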