Improvisation Ensemble Support Systems
for Music Beginners
based on Body Motion Tracking
Shugo ICHINOSE*, Souta MIZUNO*, Shun SHIRAMATSU*, Tetsuro KITAHARA**
Department of Computer Science, Graduate School of Engineering, Nagoya Institute of Technology*
Department of Information Science, College of Humanities and Sciences, Nihon University**
Introduction
• There are three cognitive elements in melody recognition:
  1. Rhythm (easy)
  2. Pitch contour (easy)
  3. Tonality (difficult)
• Tonality makes it difficult for music beginners to attempt musical improvisation
Purpose
Developing an improvisation ensemble support system
• The input is the user's body motion, representing pitch contour and rhythm (the easy elements, e.g., clapping)
• The output is harmonic sound (tone and chord) satisfying tonality, the difficult element
Approach
Two approaches were considered for developing the system:
• Approach with a 3D motion sensor camera (Intel RealSense 3D camera)
  – Pro: high motion recognition accuracy
  – Con: such cameras are not yet widespread
• Approach with smartphone sensors
  – Pro: smartphones are widely used
  – Con: it is difficult to recognize body motion with high accuracy
Approach with 3D Motion Sensor Camera
[System diagram: the RealSense camera captures the user's hand; the RealSense SDK performs finger detection and gesture recognition and hands the recognition result to the ensemble support system; the system determines the coordinates of the hand, controls the performance sound with gestures, and its pitch determiner applies the tonality constraints obtained from Songle; the performance sound is output together with the background music.]
How to Input Body Motion
• The up-and-down movement of the fingertip represents the pitch contour (the time change of fingertip height)
• The fingertip height is converted into a pitch satisfying the tonality (e.g., 392, 440, 494, and 523 Hz in the figure), as in the sketch below
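Below is a minimal sketch, not the authors' code, of this mapping: the fingertip height is normalized and snapped to the nearest pitch allowed by the tonality constraints. The vertical range (0–400 mm) and the helper name height_to_pitch are illustrative assumptions.

```python
def height_to_pitch(y_mm: float, allowed_hz: list[float],
                    y_min: float = 0.0, y_max: float = 400.0) -> float:
    """Map a fingertip height to the nearest pitch permitted by the tonality constraints."""
    # Normalize the height into [0, 1], clamping values outside the tracked range.
    ratio = max(0.0, min(1.0, (y_mm - y_min) / (y_max - y_min)))
    # Interpolate a raw target frequency between the lowest and highest allowed pitch.
    target = min(allowed_hz) + ratio * (max(allowed_hz) - min(allowed_hz))
    # Snap to the nearest allowed pitch.
    return min(allowed_hz, key=lambda f: abs(f - target))

# Example with the frequencies shown on the slide (G4, A4, B4, C5).
print(height_to_pitch(250.0, [392.0, 440.0, 494.0, 523.0]))  # -> 494.0
```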
Sustained Sound and Decaying Sound
This system outputs two types of sounds:
• Sustained sound: the volume is held until the offset gesture (e.g., violin, flute)
• Decaying sound: the volume decays over time after the onset (e.g., piano, metallophone, xylophone)
How to Operate the System
The user can operate the system with gestures:
• Thumb up: switch the sound type (sustained/decaying)
• Spread fingers: onset of a sustained sound
• Fist: offset of a sustained sound
• Tap: onset of a decaying sound
Improving the Accuracy of Gesture Recognition
• Delay and false recognition were noticeable in the default gesture recognition functions of the RealSense SDK
• The accuracy of gesture recognition was improved by optimizing thresholds
• A motion is recognized as a tap only if all of the following thresholds are satisfied (see the sketch below):
  – Fingertip speed > 1.8 m/s
  – Palm speed > 0.6 m/s
  – Moving distance > 20 mm
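A minimal sketch of this tap test follows; the function name and the per-frame measurement of the three quantities are assumptions, while the thresholds are the ones given above.

```python
FINGERTIP_SPEED_MIN = 1.8   # m/s
PALM_SPEED_MIN = 0.6        # m/s
MOVING_DISTANCE_MIN = 20.0  # mm

def is_tap(fingertip_speed: float, palm_speed: float, moving_distance: float) -> bool:
    """Recognize a tap only when all three threshold tests pass."""
    return (fingertip_speed > FINGERTIP_SPEED_MIN
            and palm_speed > PALM_SPEED_MIN
            and moving_distance > MOVING_DISTANCE_MIN)

print(is_tap(2.1, 0.8, 25.0))  # True
print(is_tap(2.1, 0.4, 25.0))  # False: palm speed below threshold
```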
Determination of Output Sound
• The output sound is restricted by tonality constraints
• Tonality constraints are a list of (millisecond, pitch frequencies) entries specifying the pitches that can be output under a particular chord
  – They change according to the chord progression of the background music
  – They were prepared in advance

Example tonality constraints derived from the background music, as a list of (ms, allowed frequencies in Hz); the slide labels the three chord regions C#M7, B♭m7, and F#M7:

  3033–5283    261.626  329.628  349.228  391.955  440
  6033–8283    261.626  293.665  349.228  440
  9033–11283   261.626  293.665  349.228  440  466.164

(rows are given every 750 ms on the slide)
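One plausible way to store and query such constraints, sketched below under the assumption that each row holds until the next one starts: a time-sorted list of (millisecond, frequencies) rows looked up by binary search during playback.

```python
import bisect

# (start time in ms, frequencies in Hz that may be output), from the slide.
constraints = [
    (3033, [261.626, 329.628, 349.228, 391.955, 440.0]),
    (6033, [261.626, 293.665, 349.228, 440.0]),
    (9033, [261.626, 293.665, 349.228, 440.0, 466.164]),
]
start_times = [t for t, _ in constraints]

def allowed_pitches(now_ms: int) -> list[float]:
    """Return the frequencies permitted at the given playback time."""
    i = bisect.bisect_right(start_times, now_ms) - 1
    return constraints[max(i, 0)][1]  # before the first row, fall back to it

print(allowed_pitches(7000))  # -> the row starting at 6033 ms
```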
Constituents of Tonality Constraints
• Tonality constraints contain the constituent notes of a chord plus the notes that frequently co-occur with that chord (FCNs)
  – Tonality constraints of CM = { C, E, G } + { FCNs }
  – Tonality constraints of Cm = { C, E♭, G } + { FCNs }
• FCNs are determined by statistical analysis of 100 songs
• Song data were obtained using the Songle API [Goto 11]
  – Songle analyzes the chord progressions of song data on the web

[Goto 11] Goto et al. (2011). Songle: A Web Service for Active Music Listening Improved by User Contributions. Proc. of ISMIR 2011, pp. 311–316.
Preparation for Selecting FCNs
• Chords and melody notes are represented by their position relative to the key or the chord root (scale degree)
  – If the key is C, the chord A is represented as chord VI
  – If the root note of a chord is A, the melody note B is represented as melody II

Scale degree of A against the tonic C:
  Ⅰ  #Ⅰ  Ⅱ  #Ⅱ  Ⅲ  Ⅳ  #Ⅳ  Ⅴ  #Ⅴ  Ⅵ  #Ⅵ  Ⅶ
  C  C#  D  D#  E  F  F#  G  G#  A  A#  B

Scale degree of B against the chord root A:
  Ⅰ  #Ⅰ  Ⅱ  #Ⅱ  Ⅲ  Ⅳ  #Ⅳ  Ⅴ  #Ⅴ  Ⅵ  #Ⅵ  Ⅶ
  A  A#  B  C  C#  D  D#  E  F  F#  G  G#
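A minimal sketch of this relative representation: the scale degree is the semitone interval from the tonic (for chords) or from the chord root (for melody notes), written with the Roman numerals used above.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
DEGREES = ["I", "#I", "II", "#II", "III", "IV", "#IV", "V", "#V", "VI", "#VI", "VII"]

def scale_degree(note: str, reference: str) -> str:
    """Scale degree of `note` relative to `reference` (a tonic or a chord root)."""
    interval = (NOTES.index(note) - NOTES.index(reference)) % 12
    return DEGREES[interval]

print(scale_degree("A", "C"))  # VI: chord A in the key of C
print(scale_degree("B", "A"))  # II: melody note B over the chord root A
```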
Preparation for Selecting FCNs
• FCNs are determined on the basis of scale degree length
  – Scale degree length refers to the total duration for which each scale degree sounds while each chord is played

[Histogram of scale degree length on the chord IM, with one bar per scale degree]
Counting Scale Degree Length
There are rules on how to count scale degree length.
1. Categorize by scale degree: even if the note names differ, notes that have the same scale degree are counted as the same one.
   – Example: the note D# on chord E with tonic A and the note F# on chord G with tonic C are both regarded as melody Ⅶ on the chord ⅤM, so they are counted together.
Counting Scale Degree Length
2. Major and minor: even if the scale degrees are the same, they are counted separately when one occurs in a major key and the other in a minor key.
   – Example: D# on chord E in the key of A (major) and F# on chord G in the key of Cm (minor) have the same scale degree, but they are counted separately because the keys differ in mode.

A sketch applying both counting rules follows.
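The sketch below shows how the two counting rules could be applied when accumulating scale degree lengths; the note-event format (note name, chord root, key mode, duration) is an assumption.

```python
from collections import defaultdict

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# (scale degree as a semitone interval, key mode) -> accumulated duration (s)
length = defaultdict(float)

def count(note: str, chord_root: str, mode: str, duration: float) -> None:
    # Rule 1: pool notes by scale degree against the chord root,
    # regardless of their note names.
    degree = (NOTES.index(note) - NOTES.index(chord_root)) % 12
    # Rule 2: keep separate bins for major and minor keys.
    length[(degree, mode)] += duration

# D# over chord E in A major and F# over chord G in C minor share the
# scale degree VII (interval 11) but land in separate major/minor bins.
count("D#", "E", "major", 0.5)
count("F#", "G", "minor", 0.5)
print(dict(length))  # {(11, 'major'): 0.5, (11, 'minor'): 0.5}
```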
Selecting FCNs
• A scale degree is selected as an FCN when its relative scale degree length f exceeds a threshold proportional to the maximum value:

  f > f_max × α

[Histogram of relative scale degree length (vertical axis 0–0.3) per scale degree, with bars marked as chord constituents or FCNs according to whether they exceed f_max × α]
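A minimal sketch of this selection rule follows. The alpha value, the histogram numbers, and the exclusion of chord constituents from the FCN set are illustrative assumptions.

```python
def select_fcns(hist: dict[str, float], chord_tones: set[str],
                alpha: float = 0.5) -> set[str]:
    """Return the non-chord-tone degrees whose weight f exceeds f_max * alpha."""
    f_max = max(hist.values())
    return {deg for deg, f in hist.items()
            if f > f_max * alpha and deg not in chord_tones}

hist = {"I": 0.28, "II": 0.12, "III": 0.25, "IV": 0.05, "V": 0.20, "VI": 0.15}
print(select_fcns(hist, chord_tones={"I", "III", "V"}))  # -> {'VI'}
```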
Evaluation Experiment for the 3D Motion Sensor Camera
We asked the experiment participants to operate the system and to answer the following questions:
1. Comparing the proposed tap recognition method with the RealSense SDK default method:
   – Q1-1: Whether there was a delay
   – Q1-2: Whether there was false recognition
2. Comparing the statistically generated tonality constraints with tonality constraints consisting only of chord constituents:
   – Q2-1: Whether there was dissonance
   – Q2-2: Whether the user could perform as intended
Result of Experiment
Tap experiment: the proposed method was rated better than the default method.

  Question (0–6 scale, higher is better)    Proposed method   Default method
  Q1-1: absence of delay                    4.75              3.92
  Q1-2: absence of false recognition        5.00              2.92

Tonality constraints experiment: constraints including FCNs allowed some dissonance, but let users perform slightly more as intended.

  Question (0–6 scale, higher is better)    With FCNs   Constituent notes only
  Q2-1: absence of dissonance               4.33        5.25
  Q2-2: performed as intended               5.58        5.50
Additional Evaluation Experiment
• Comparing the delay of the proposed method and the default one
• The experiment participants tapped along with the beat of a song
• The start of each beat is regarded as the ideal timing of tap recognition
• The delay between this ideal timing and the time when the performance sound is heard was recorded
Result of Additional Evaluation Experiment
• Average delay: proposed tap function −34.62 ms, default tap function 69.63 ms
• The average delay of the default method lies in the area of serious delay [Tanaka 13]
• The proposed method is closer to the ideal timing

[Histograms of delay (−375 to 375 ms) against frequency (times) for the proposed and default tap functions, with the ideal timing at 0 ms and the area of serious delay marked]

[Tanaka 13] Tanaka et al. (2013). The Effect of Sound Delay Conditions on Electronic Drum Performance. Technical Committee of Musical Acoustics, Acoustical Society of Japan. (in Japanese)
Approach with Smartphone Sensors
• Smartphones are widely used
• The pitch contour is input by moving the smartphone up and down
• There are three options for specifying rhythm (see the sketch below):
  1. Shake: acceleration and gyro sensors
  2. Clap: ambient light sensor
  3. Tap: a button located on the screen
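As a rough illustration of the three options, the sketch below tests each input channel against a threshold; the threshold values and the idea of detecting a clap as a brief shadow over the light sensor are assumptions, since sensor access itself is platform specific.

```python
SHAKE_ACCEL_MIN = 15.0   # m/s^2: acceleration magnitude treated as a shake
CLAP_LIGHT_DROP = 0.5    # fraction of baseline treated as a covered sensor

def shake_detected(accel_magnitude: float) -> bool:
    return accel_magnitude > SHAKE_ACCEL_MIN

def clap_detected(light_lux: float, baseline_lux: float) -> bool:
    # Clapping in front of the phone briefly shadows the ambient light sensor.
    return light_lux < baseline_lux * CLAP_LIGHT_DROP

def tap_detected(button_pressed: bool) -> bool:
    # The on-screen button already yields a discrete event.
    return button_pressed
```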
Estimating the Note Name with a Bayesian Network
• The note names to output are estimated by a Bayesian network from the value of each sensor and from context such as tonality

Sensor inputs:
  a: acceleration in the y-axis direction
  v: speed
  vc: variation in speed
  g: gravitational acceleration
  p: distance traveled
  t: attack timing
  rm: the most frequent prediction result of the last m predictions

Context such as tonality:
  c: chord of the background music
  ni-1: previous note name

Output:
  ni: note name to output
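The slide specifies a Bayesian network over these variables; as a simplified stand-in, the sketch below estimates the note name with a naive Bayes over discretized features, which keeps the same inputs and output but assumes all features are conditionally independent.

```python
import math
from collections import Counter, defaultdict

class NaiveNoteEstimator:
    """Simplified stand-in for the note-name network (naive Bayes)."""

    def __init__(self):
        self.class_counts = Counter()            # note name -> count
        self.feat_counts = defaultdict(Counter)  # (feature index, value) -> note counts

    def fit(self, samples):
        """samples: iterable of (tuple of discretized features, note name)."""
        for feats, note in samples:
            self.class_counts[note] += 1
            for i, val in enumerate(feats):
                self.feat_counts[(i, val)][note] += 1

    def predict(self, feats):
        def log_posterior(note):
            logp = math.log(self.class_counts[note])
            for i, val in enumerate(feats):
                cnt = self.feat_counts[(i, val)][note]
                # Laplace smoothing so unseen feature values do not zero out a class.
                logp += math.log((cnt + 1) / (self.class_counts[note] + 2))
            return logp
        return max(self.class_counts, key=log_posterior)
```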
Estimating Attack Timing with a Bayesian Network
• Attack timings are estimated by a second Bayesian network
  – Used only for clap and shake, because the touchscreen of a smartphone (tap) is reliable enough on its own

Inputs:
  ax: acceleration in the x-axis direction
  ay: acceleration in the y-axis direction
  v: speed
  vc: variation in speed
  g: gravitational acceleration

Output:
  t: attack timing (0 or 1)
Collecting Training Data by Experiment
• Training data for the Bayesian networks were collected in a participant experiment
  – Sensor data were obtained from five participants, who raised and lowered their smartphones in accordance with the pitch contour of the melody of an existing tune
• The models were trained to estimate the note name and attack timing from the smartphone sensor data
Evaluation of Prediction Accuracy
1. Evaluation of the prediction accuracy of the note name
   – The test data and the training data are the same
   – Two types of accuracy were calculated, per note and per sample (one sample = 5 ms); a sketch of both measures follows
   – Example: a passage with the note names C, D, E spanning 24 samples counts as 3 notes and 24 samples
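A minimal sketch of the two measures; scoring a note as correct when the majority of its 5 ms samples match is one reasonable reading of the per-note unit and therefore an assumption.

```python
def sample_accuracy(pred: list[str], truth: list[str]) -> float:
    """Fraction of 5 ms samples whose estimated note name matches the original."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def note_accuracy(pred: list[str], truth: list[str],
                  note_spans: list[tuple[int, int]]) -> float:
    """Fraction of notes (half-open sample ranges) scored correct by majority vote."""
    correct = 0
    for start, end in note_spans:
        hits = sum(pred[i] == truth[i] for i in range(start, end))
        correct += hits * 2 > (end - start)
    return correct / len(note_spans)
```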
Evaluation of Prediction Accuracy
2. Evaluation of the prediction accuracy of attack timing
   – The recall and precision against the original song were examined, calculated per note:

  Recall = (number of estimated attack timings that match the original ones) / (number of attack timings in the original)

  Precision = (number of estimated attack timings that match the original ones) / (number of estimated attack timings)
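The sketch below computes these two measures; matching an estimated attack to an original one requires a tolerance window, whose width (75 ms here) is an illustrative assumption.

```python
def match_count(estimated_ms, original_ms, tol_ms=75):
    """Count original attacks matched by a distinct estimate within the tolerance."""
    remaining = sorted(estimated_ms)
    matched = 0
    for t in sorted(original_ms):
        hit = next((e for e in remaining if abs(e - t) <= tol_ms), None)
        if hit is not None:
            remaining.remove(hit)  # each estimate may match only one attack
            matched += 1
    return matched

def recall_precision(estimated_ms, original_ms, tol_ms=75):
    m = match_count(estimated_ms, original_ms, tol_ms)
    return m / len(original_ms), m / len(estimated_ms)

print(recall_precision([0, 480, 1020, 1490], [0, 500, 1000, 1500, 2000]))
# -> (0.8, 1.0): 4 of 5 original attacks found, and every estimate matched one
```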
Result of Evaluation
• Prediction accuracy of the note name
  – Per-sample accuracy is higher than per-note accuracy
  – In both cases, touch gives the highest accuracy
• Prediction accuracy of attack timing
  – Precision of shake is low: small movements are misrecognized as shakes
  – Recall of clap is low: the ambient light sensor is not reliable

Accuracy of note name (per note): ratio of notes estimated with the same note name as in the original tune
  Shake  0.49 (131/270)
  Clap   0.49 (133/270)
  Touch  0.56 (152/270)

Accuracy of note name (per sample): ratio of samples estimated with the same note name as in the original tune
  Shake  0.66 (42805/65278)
  Clap   0.73 (47627/65218)
  Touch  0.75 (49155/65251)

Accuracy of attack timing (per note):
  Operation   Recall            Precision
  Shake       0.63 (171/270)    0.26 (171/661)
  Clap        0.14 (39/270)     0.31 (39/151)
Social Reuse of Improvisational Melody Data
• Our system can gather two types of users' performance data:
  – Pitch contour data without conversion, together with the tonality constraints
  – Tonal melody data converted from the pitch contour data
Social Reuse of Improvisational Melody Data
• If users publish their performance data as open data, the data can be used for collaborative music composing or remixing

[Diagram: performance data shared and remixed among multiple users]
Social Reuse of Improvisational Melody Data
• In particular, pitch contour data without tonality constraints can be applied to various chord progressions
  – For example, pitch contour data from our system can be used in a loop sequencer based on pitch contour (melodic outline) [Kitahara 16]

[Kitahara 16] T. Kitahara et al. (2015). A loop sequencer that selects music loops based on the degree of excitement. Proc. of the 12th Sound and Music Computing Conference (SMC 2015), pp. 435–438.
Conclusion
• Two types of ensemble support systems were developed
  – One uses a 3D motion sensor camera and the other uses smartphone sensors
  – Both automatically adjust note pitches to satisfy the tonality of the background music
  – The systems enable music novices to participate in improvisational ensembles
• As future work, we are considering
  – the social reuse of improvisational melody data shared as open data
  – ensembles among multiple users, not only with background music
Improvisational Ensemble
We define an improvisational ensemble as either of the following:
• Several people play instruments together with no plan
• One person plays an instrument with no plan in accordance with background music
Available Gestures in RealSense
Thumb up, fist, V sign, spread fingers, thumb down, full pinch, tap, wave, swipe left, swipe right, and two-finger pinch open
Improvement of Other Gestures
• The default function is used as it is, with two additional conditions based on openness
  – Openness is a value indicating the degree to which the fingers are open
• Condition 1: the openness is 90 or more (openness ≥ 90)
• Condition 2: the openness decreases by 10 or more from the previous frame (old openness − openness ≥ 10)
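A minimal sketch of the two tests; the 0–100 openness value follows the RealSense SDK hand module, and writing the second test as a decrease (to match the prose above) is an editorial assumption.

```python
def openness_conditions(openness: int, old_openness: int) -> tuple[bool, bool]:
    """Evaluate the two openness thresholds described above."""
    widely_open = openness >= 90                   # fingers spread wide
    sharp_close = old_openness - openness >= 10    # openness dropped quickly
    return widely_open, sharp_close
```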
Changing the Range of Pitch
• The user can change the range of pitch using the depth value
  – The depth value is the distance between the hand and the camera
• To change the range of pitch (①), the user extends the hand toward the camera and then moves it up or down to change the pitch (②)
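A minimal sketch of this idea: the depth value selects an octave shift for the playable range. The depth bands and the base range are illustrative assumptions, not values from the slide.

```python
def pitch_range(depth_mm: float, base_low_hz: float = 262.0,
                base_high_hz: float = 523.0) -> tuple[float, float]:
    """Choose the playable pitch range from the hand-camera distance."""
    if depth_mm < 300:      # hand extended toward the camera
        factor = 2.0        # shift the range one octave up
    elif depth_mm > 600:    # hand pulled back
        factor = 0.5        # shift the range one octave down
    else:
        factor = 1.0        # default range
    return base_low_hz * factor, base_high_hz * factor

print(pitch_range(250.0))  # -> (524.0, 1046.0)
```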