MuseGAN: Multi-track Sequential Generative
Adversarial Networks for Symbolic Music
Generation and Accompaniment
Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang, Yi-Hsuan Yang
Research Center of IT Innovation, Academia Sinica
Demo Page https://salu133445.github.io/musegan/
*these authors contributed equally to this work
Outline
。Goals & Challenges
。Data
。Proposed Model
。Results & Evaluation
。Future Works
Source Code https://github.com/salu133445/musegan
Demo Page https://salu133445.github.io/musegan/
Goals
Generate pop music
。of multiple tracks
。in piano-roll format
。using GAN with CNNs
[Source Code] https://github.com/salu133445/musegan
[Demo Page] https://salu133445.github.io/musegan/
Challenge I
Multitrack Interdependency
Tracks: vocal, piano, bass, drums, strings (music & clip by phycause)
Approach: Multi-track GAN
Challenge II
Music Texture
melody, chord (harmony)
Approach: Convolutional Neural Networks
Challenge III
Temporal Structure (4/4 time)
song: paragraph 1, paragraph 2, paragraph 3
paragraph: phrase 1, phrase 2, phrase 3, phrase 4
phrase (e.g. phrase 2): bar 1, bar 2, bar 3, bar 4
bar: beat 1, beat 2, beat 3, beat 4
beat: step 1, step 2, ···, step 24
Fixed Structure within a phrase
Approach: Convolutional Neural Networks
Data Representation
Piano-roll: polyphonic, with symbolic timing
[Figure: piano-roll with pitch and time axes over Bar 1 to Bar 4, divided into time steps; example note A3 from t0 to t1]
Multi-track Piano-roll: polyphonic and multi-track, with symbolic timing
[Figure: multi-track piano-roll with pitch, time, and tracks axes]
Data Representation
4 bars × 96 time steps × 84 pitches × 5 tracks: a 4×96×84×5 tensor
Tracks: Drums, Guitar, Piano, Strings, Bass
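The tensor layout above can be sketched in NumPy. This is only an illustration of the shape: the ordering of the track axis, the boolean dtype, and the example note are assumptions, not details taken from the slides.

```python
import numpy as np

# Dimensions taken from the slide: 4 bars x 96 time steps x 84 pitches x 5 tracks.
N_BARS, N_STEPS, N_PITCHES, N_TRACKS = 4, 96, 84, 5
TRACKS = ["Drums", "Guitar", "Piano", "Strings", "Bass"]  # axis order is an assumption

# A binary multi-track piano-roll: True where a note is active.
phrase = np.zeros((N_BARS, N_STEPS, N_PITCHES, N_TRACKS), dtype=bool)

# In 4/4 time, 96 steps per bar gives 24 steps per beat.
STEPS_PER_BEAT = N_STEPS // 4

# Hypothetical example: a bass note at pitch index 9 held for the first beat of bar 1.
phrase[0, :STEPS_PER_BEAT, 9, TRACKS.index("Bass")] = True

print(phrase.shape)       # (4, 96, 84, 5)
print(int(phrase.sum()))  # 24
```

Indexing `phrase[b, :, :, k]` then yields the ordinary (time × pitch) piano-roll of track `k` in bar `b`.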
Data
LPD (Lakh Pianoroll Dataset)
。>170,000 multi-track piano-rolls
。Derived from Lakh MIDI Dataset
。Mainly pop songs
Pypianoroll (Python package)
。Manipulation & Visualization
。Efficient Save/Load
。Parse/Write MIDI files
。On PyPI (pip installable)
[Dataset] https://salu133445.github.io/musegan/dataset
[Pypianoroll] https://salu133445.github.io/pypianoroll/
Generative Adversarial Networks
Generator: random noise z ~ p(z) → G → fake data G(z)
Discriminator: real data x or fake data G(z) → D → real/fake
Real data: 4-bar phrases of 5 tracks
Discriminator used here: critic (WGAN-GP)
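The slide only names the critic as WGAN-GP. For reference, the standard WGAN-GP critic objective is written out below as background, not as a formula read off the slides. The symbols are assumptions of this note: p_d is the real-data distribution, p_g the generator distribution, λ the penalty weight, and x̂ a point sampled on the line between a real and a generated sample.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim p_g}\big[ D(\tilde{x}) \big]
    - \mathbb{E}_{x \sim p_d}\big[ D(x) \big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}
      \Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \big)^2 \Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The first two terms estimate the Wasserstein distance between the two distributions, and the last term pushes the critic's gradient norm toward 1.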
MuseGAN – An Overview
1 random noise → temporal generator (G_temp) → 4 latent variables → bar generator (G_bar) → 4 piano-roll matrices
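The two-stage pipeline in the overview can be sketched at the level of array shapes. The fixed random linear maps below are hypothetical stand-ins for the learned networks, and `Z_DIM` is an arbitrary size chosen for illustration. Only the counts (1 noise vector, 4 latent variables, 4 piano-roll matrices) come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, N_BARS, N_STEPS, N_PITCHES = 32, 4, 96, 84  # Z_DIM is a hypothetical size

# Hypothetical stand-ins for the learned networks: fixed random linear maps.
W_temp = rng.normal(size=(Z_DIM, N_BARS * Z_DIM))
W_bar = rng.normal(size=(Z_DIM, N_STEPS * N_PITCHES))

def g_temp(z):
    """Temporal generator: 1 noise vector -> 4 latent vectors (one per bar)."""
    return np.tanh(z @ W_temp).reshape(N_BARS, Z_DIM)

def g_bar(latent):
    """Bar generator: 1 latent vector -> 1 binary piano-roll matrix (time x pitch)."""
    return (latent @ W_bar).reshape(N_STEPS, N_PITCHES) > 0

z = rng.normal(size=Z_DIM)                    # 1 random noise
latents = g_temp(z)                           # 4 latent variables
bars = np.stack([g_bar(v) for v in latents])  # 4 piano-roll matrices

print(latents.shape, bars.shape)  # (4, 32) (4, 96, 84)
```

The same bar generator is applied to each latent vector in turn, so the temporal generator alone decides how the four bars relate to each other.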
MuseGAN – Bar Generator
[Figure: bar generators G, one per track, each fed with random vectors z]
No Coordination: track-dependent random vectors
Coordination: track-independent random vectors
Random vectors by dependency:
                     Time-dependent   Time-independent
Track-dependent      Melody           Groove
Track-independent    Chords           Style
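The time/track dependency table can be read as four kinds of random vectors feeding the bar generator of each track. A minimal sketch follows, assuming the four vectors are simply concatenated. The slides do not state how they are combined, and `Z_DIM` is a hypothetical size.

```python
import numpy as np

rng = np.random.default_rng(0)
N_TRACKS, N_BARS, Z_DIM = 5, 4, 16  # Z_DIM is a hypothetical size

style = rng.normal(size=Z_DIM)                       # track-independent, time-independent
chords = rng.normal(size=(N_BARS, Z_DIM))            # track-independent, time-dependent
groove = rng.normal(size=(N_TRACKS, Z_DIM))          # track-dependent, time-independent
melody = rng.normal(size=(N_TRACKS, N_BARS, Z_DIM))  # track-dependent, time-dependent

def bar_generator_input(track, bar):
    """Concatenate the four vectors seen by one track's generator at one bar (assumed)."""
    return np.concatenate([style, chords[bar], groove[track], melody[track, bar]])

inputs = np.array([[bar_generator_input(i, t) for t in range(N_BARS)]
                   for i in range(N_TRACKS)])
print(inputs.shape)  # (5, 4, 64)
```

Track-independent vectors are shared by all five generators, which is what lets them coordinate. Time-independent vectors stay fixed across the four bars of a phrase.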
Results
More Samples on Demo Page https://salu133445.github.io/musegan/
[Figure: Sample 1 and Sample 2]
[Figure: piano-rolls of the Bass, Drums, Guitar, Strings, and Piano tracks at Step 0, Step 700, Step 2500, Step 6000, and Step 7900; annotations: Drum pattern, Chords, Bass Line]
Monitor the Training
Objective Metrics
UPC: number of used pitch classes per bar
QN: ratio of qualified notes
[Figure: UPC vs. step and QN vs. step]
[Figure: Negative Critic Loss vs. step, steps 0 to 8000]
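The two objective metrics can be sketched for a single bar stored as a boolean (time × pitch) piano-roll. The slides do not define a "qualified" note, so the `min_steps` threshold below is an assumption, and the function names are hypothetical.

```python
import numpy as np

def used_pitch_classes_per_bar(bar):
    """UPC for one bar: number of distinct pitch classes with at least one active step."""
    active_pitches = np.flatnonzero(bar.any(axis=0))
    # The count of distinct residues mod 12 does not depend on which pitch is index 0.
    return len(set(int(p) % 12 for p in active_pitches))

def qualified_note_ratio(bar, min_steps=3):
    """QN for one bar; `min_steps` is an assumed minimum length for a qualified note."""
    # Pad with silence so every note has a clear onset and offset.
    padded = np.pad(bar.astype(int), ((1, 1), (0, 0)))
    diff = np.diff(padded, axis=0).T      # shape: (pitch, time + 1)
    onsets = np.argwhere(diff == 1)       # rows of (pitch, onset step), sorted by pitch
    offsets = np.argwhere(diff == -1)     # rows of (pitch, offset step), same order
    if len(onsets) == 0:
        return 0.0
    lengths = offsets[:, 1] - onsets[:, 1]
    return float(np.mean(lengths >= min_steps))

bar = np.zeros((96, 84), dtype=bool)
bar[0:24, 0] = True    # long note
bar[24:26, 12] = True  # 2-step note, one octave above the first
bar[48:72, 7] = True   # long note on a different pitch class

print(used_pitch_classes_per_bar(bar))  # 2
print(qualified_note_ratio(bar))        # 2 of the 3 notes are long enough
```

In a binary piano-roll, back-to-back repetitions of the same pitch cannot be told apart from one held note, so this sketch counts them as a single note.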
User Study
H: harmonious
R: rhythmic
MS: musically structured
C: coherent
OR: overall rating
Models: composer, jamming, hybrid
Summary
。MuseGAN
◦ a novel GAN for multi-track sequence generation
◦ multi-track, polyphonic music
◦ human-AI cooperative scenario (see the paper)
。Lakh Pianoroll Dataset (LPD) (new dataset!!)
。Pypianoroll (new package!!)
Future Works
Full Song Generation
Hierarchical Temporal Structure: song, paragraph, phrase, bar, beat, step
Future Works
Cross-modal Generation
。Music + Video
。Music + Lyrics
。Video + Text
Q&A

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment (AAAI 2018)