This document describes an improvisation support system for music beginners based on body-motion tracking. The system lets users input pitch and rhythm through gestures captured either by a 3D motion-sensing camera or by smartphone sensors. In the 3D camera approach, hand gestures such as finger positions determine pitch, and rhythmic gestures such as tapping are recognized for timing. In the smartphone approach, pitch is input through phone movement, while rhythmic gestures include shaking, clapping, or tapping. Both approaches use machine learning models trained on motion data to map gestures to musical notes and timing within defined tonality constraints. An evaluation found that the 3D camera recognized gestures more accurately, while the smartphone approach could reach a wider audience. The system aims to help beginners participate in musical improvisation.
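To illustrate how a tonality constraint might shape pitch input, here is a minimal sketch of mapping a continuous gesture value (e.g. normalized hand height from a motion sensor) onto notes of a chosen scale. The function name, parameters, and C-major scale are illustrative assumptions, not details taken from the system described above.

```python
# Hypothetical sketch: quantize a normalized sensor value (0.0-1.0), such as
# hand height, to a MIDI pitch constrained to a scale (tonality constraint).
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of the C major scale


def height_to_pitch(height, low_note=60, octaves=2, scale=C_MAJOR):
    """Map a normalized gesture height to the nearest allowed scale tone."""
    height = min(max(height, 0.0), 1.0)  # clamp noisy sensor readings
    # Enumerate every permitted pitch over the requested range.
    allowed = [low_note + 12 * o + pc for o in range(octaves) for pc in scale]
    # Linearly index into the allowed pitches by gesture height.
    idx = round(height * (len(allowed) - 1))
    return allowed[idx]


print(height_to_pitch(0.0))  # 60 (C4, the lowest allowed note)
print(height_to_pitch(1.0))  # 83 (B5, the highest allowed note)
```

Constraining output to an explicit pitch list like this guarantees that any gesture, however imprecise, still produces an in-key note, which is the point of a tonality constraint for beginners.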
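Rhythmic gestures such as tapping or shaking on a smartphone are typically detected as peaks in accelerometer magnitude. The following sketch shows one simple way this could be done; the threshold, sample rate, and refractory window are assumed values for illustration, not parameters reported by the system.

```python
import math


def detect_taps(samples, rate_hz=100.0, threshold=2.0, refractory_s=0.15):
    """Return timestamps (seconds) of acceleration peaks above a threshold.

    samples: sequence of (ax, ay, az) accelerometer readings in g.
    A refractory window debounces a single tap into one event.
    """
    taps = []
    last = -refractory_s  # allow a peak at t = 0
    for i, (ax, ay, az) in enumerate(samples):
        t = i / rate_hz
        mag = math.sqrt(ax * ax + ay * ay + az * az)
        if mag > threshold and t - last >= refractory_s:
            taps.append(t)
            last = t
    return taps


# Example: quiet signal (~1 g from gravity) with two sharp spikes.
readings = [(0.0, 0.0, 1.0)] * 100
readings[10] = (0.0, 0.0, 3.0)
readings[50] = (0.0, 0.0, 3.0)
print(detect_taps(readings))  # [0.1, 0.5]
```

A learned model, as used by the system, would replace this fixed threshold with a classifier trained on motion data, which is what allows it to distinguish shaking, clapping, and tapping rather than just detecting generic peaks.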