An AI pipeline to "play a picture of a musical score", and its implications for generative AI
Introduction
Understanding, interpreting, and listening to the content of a musical score is difficult if you are not a musician. Consider a scenario where you are browsing a digital library and reading about Johann Sebastian Bach: you can read that he was a German composer and musician, and you can see images of Bach's scores for the English Suites, but, if you are not an expert musician, it can be difficult to play these scores, which could give you a better understanding of what you are reading and learning from the digital library.
Another important point is the possibility of making music scores searchable and easily accessible. As you will see in the next section, the process of building an audio file from a picture of a musical score includes the creation of a standard, machine-readable representation of the score. This means that music enthusiasts and researchers can efficiently locate specific musical pieces, composers, or genres within vast collections of scores. Researchers can also analyse and compare musical compositions, styles, and historical trends more efficiently by processing and aggregating data from digitized scores. This quantitative approach allows for deeper insights into the evolution of music and its various aspects, contributing to the field of musicology. This functionality significantly enhances the user experience and access to musical resources.
In this article I will describe a pipeline that takes a picture as input (a photo of your musical score) and produces an audio file generated from that score.
AI pipeline
Theoretical description
The process that enables the reproduction of a musical score starting from a picture is described in the following picture and leverages the Optical Music Recognition (OMR) technique.
From Wikipedia we can read the following definition of OMR:
"Optical music recognition (OMR) is a field of research that investigates how to computationally read musical notation in documents. The goal of OMR is to teach the computer to read and interpret sheet music and produce a machine-readable version of the written music score. Once captured digitally, the music can be saved in commonly used file formats, e.g. MIDI (for playback) and MusicXML (for page layout). In the past it has, misleadingly, also been called "music optical character recognition". Due to significant differences, this term should no longer be used."
Given a picture as input, the main steps of OMR are those illustrated in the picture above. Optical Music Recognition is a technology closely linked to several research domains, such as computer vision, document analysis, and music information retrieval.
Optical music recognition is able to convert a picture into a standardised music representation format, a language that machines can easily interpret. In particular, this standard is MusicXML; the definition on the w3.org website is the following:
"MusicXML is a standard open format for exchanging digital sheet music. It is designed for sharing sheet music files between applications, and for archiving sheet music files for use in the future. As of this publication date it is supported by over 250 applications."
Once you have the MusicXML file generated from your picture, the last steps to play your musical score are converting the MusicXML into a MIDI file and then rendering the MIDI file into an audio (WAV) file.
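As a preview of the implementation described in the next sections, the whole flow can be sketched in a few lines of Python (a minimal sketch, assuming the Oemer command-line tool and the partitura library introduced below, and that Oemer writes the MusicXML file next to the input picture):

import subprocess
import partitura as pt

# Step 1: picture -> MusicXML, running the Oemer OMR tool
subprocess.run(['oemer', 'bach_suite_a_minor.jpg'], check=True)

# Step 2: MusicXML -> MIDI, using partitura
score = pt.load_score('bach_suite_a_minor.musicxml')
pt.save_score_midi(score.parts[0], 'mypart.mid')

# Step 3: MIDI -> WAV, rendered with pydub and mido (full code later in the article)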
A possible implementation in Python
All the steps I explained in the previous section can be arranged in a Python pipeline. In particular, the libraries you can use are: Oemer for the OMR step, partitura for the MusicXML-to-MIDI conversion, and mido together with pydub for the MIDI-to-WAV rendering.
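All of these libraries are published on PyPI, so (assuming a standard Python environment) a single command should install them:

pip install oemer partitura mido pydub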
The documentation of each library I mentioned allows you to create a minimal set of code to run each step of the described process. Below I report an example, with code, to play Johann Sebastian Bach's English Suite No. 2 in A Minor with the described pipeline and tools.
Example: English Suite No. 2 in A Minor, Johann Sebastian Bach
In this section, the end-to-end process from the picture to the WAV file is walked through for a sample case: playing the Bourrée from Johann Sebastian Bach's English Suite No. 2 in A Minor.
Below is the starting picture:
The first step is to use the Oemer tool to generate the MusicXML code. Once you have installed the tool, you can run the following command to obtain the MusicXML:
oemer /Users/simoneromano/Desktop/bach_suite_a_minor.jpg
This is the first part of the generated MusicXML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 4.0 Partwise//EN" "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="4.0">
  <work>
    <work-title>Bach_suite_a_minor</work-title>
  </work>
  <identification>
    <creator type="composer">Transcribed by Oemer</creator>
  </identification>
  <part-list>
    <score-part id="P1">
      <part-name>Piano</part-name>
      <score-instrument id="P1-I1">
        <instrument-name>Piano</instrument-name>
        <instrument-sound>keyboard.piano</instrument-sound>
      </score-instrument>
      <midi-instrument id="P1-I1">
        <midi-channel>1</midi-channel>
        <midi-program>1</midi-program>
        <volume>80</volume>
        <pan>0</pan>
      </midi-instrument>
    </score-part>
  </part-list>
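Since this output is machine-readable, it can be queried programmatically right away. As a quick, minimal sketch (using only Python's standard library, on the file generated above), you can list the parts declared in the MusicXML:

import xml.etree.ElementTree as ET

# Parse the MusicXML produced by Oemer and list the declared parts
tree = ET.parse('bach_suite_a_minor.musicxml')
for score_part in tree.getroot().iter('score-part'):
    print(score_part.get('id'), score_part.findtext('part-name'))  # e.g. P1 Piano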
The next step is to generate the MIDI file starting from the MusicXML. Below is the code I used to implement this step, using the partitura library in Python:
import numpy as np
import partitura as pt

# Load the MusicXML file generated by Oemer
score = pt.load_score('bach_suite_a_minor.musicxml')
part = score.parts[0]

# Optional check: print the notes as (start, end, MIDI pitch) triples
pianoroll = np.array([(n.start.t, n.end.t, n.midi_pitch) for n in part.notes])
print(pianoroll)

# Save the part as a MIDI file
pt.save_score_midi(part, 'mypart.mid')
Once you have the MIDI file, you can use the following code, which uses the pydub and mido libraries, to generate the WAV file:
from collections import defaultdict

from mido import MidiFile
from pydub import AudioSegment
from pydub.generators import Sine

def note_to_freq(note, concert_A=440.0):
    # Convert a MIDI note number to its frequency in Hz (A4 = MIDI note 69)
    return (2.0 ** ((note - 69) / 12.0)) * concert_A

mid = MidiFile("mypart.mid")
output = AudioSegment.silent(duration=mid.length * 1000.0)
tempo = 100  # bpm

def ticks_to_ms(ticks):
    # Convert MIDI ticks to milliseconds at the fixed tempo above
    tick_ms = (60000.0 / tempo) / mid.ticks_per_beat
    return ticks * tick_ms

for track in mid.tracks:
    current_pos = 0.0
    current_notes = defaultdict(dict)
    for msg in track:
        current_pos += ticks_to_ms(msg.time)
        if msg.type == 'note_on' and msg.velocity > 0:
            # Remember when each note started, per channel
            current_notes[msg.channel][msg.note] = (current_pos, msg)
        elif msg.type == 'note_off' or (msg.type == 'note_on' and msg.velocity == 0):
            # A note_on with velocity 0 is equivalent to a note_off
            start_pos, start_msg = current_notes[msg.channel].pop(msg.note)
            duration = current_pos - start_pos
            # Render the note as a sine wave, slightly shortened and faded to avoid clicks
            signal_generator = Sine(note_to_freq(msg.note))
            rendered = signal_generator.to_audio_segment(
                duration=max(duration - 50, 1), volume=-20
            ).fade_out(100).fade_in(30)
            output = output.overlay(rendered, position=start_pos)

output.export("mypart.wav", format="wav")
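If you want to check the result directly from Python, a minimal sketch (assuming a playback backend such as simpleaudio or ffplay is available to pydub) is:

from pydub import AudioSegment
from pydub.playback import play

# Load the rendered file and play it
play(AudioSegment.from_wav("mypart.wav"))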
You can listen to this draft version below.
As you can hear, the main issues in the auto-generated audio come from elements of the score that are not fully extracted. By working on the image-processing steps, these could be improved.
Conclusion
In this article, I showed how it is possible to play a musical score starting from a picture of it (a photo), using a combination of deep learning, machine learning, and programming skills, which form the basis of the Optical Music Recognition (OMR) technique I described. Considering this is a multi-domain topic, having a musician's background can help, in particular to interpret the results of the algorithms.
It is really important to notice that one of the main benefits of this pipeline, in my approach, is the possibility of automatically creating a standard representation of the music (MusicXML), which researchers can use to analyse and compare musical compositions, styles, and historical trends more efficiently. Here I'm thinking of the use of generative AI to navigate MusicXML data and generate comparisons, reports, descriptions, etc.
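As a hint of what such analyses could look like, here is a minimal sketch that compares two scores by their pitch-class distributions, reusing the partitura API shown above (the second score file is hypothetical):

import numpy as np
import partitura as pt

def pitch_class_histogram(path):
    # Normalised histogram of the 12 pitch classes in the first part of a score
    part = pt.load_score(path).parts[0]
    pitches = np.array([n.midi_pitch for n in part.notes])
    histogram = np.bincount(pitches % 12, minlength=12)
    return histogram / histogram.sum()

h1 = pitch_class_histogram('bach_suite_a_minor.musicxml')
h2 = pitch_class_histogram('another_score.musicxml')  # hypothetical second score
print(np.abs(h1 - h2).sum())  # a simple distance between the two distributions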
This could be the main topic of a dedicated article.