AI pipeline to "play a picture of a musical score", and its implications for generative AI

Musical score picture to audio file

Introduction

Understanding, interpreting and listening to the content of a musical score is difficult if you are not a musician. Consider the scenario where you are browsing a digital library and reading about Johann Sebastian Bach: you can read that he was a German composer and musician, and you can see some images of Bach's musical scores for the English Suites but, if you are not an expert musician, it can be difficult to play these scores, even though hearing them could give you a better understanding of what you are reading and learning from the digital library.

Another important point is the possibility of making music scores searchable and easily accessible. As you will see in the next section, the process of building an audio file from a picture of a musical score includes the creation of a standard, machine-readable representation of the score. This means that music enthusiasts and researchers can efficiently locate specific musical pieces, composers, or genres within vast collections of scores. Researchers can analyse and compare musical compositions, styles, and historical trends more efficiently by processing and aggregating data from digitized scores. This quantitative approach allows for deeper insights into the evolution of music and its various aspects, contributing to the field of musicology, and it significantly enhances the user experience and access to musical resources.

In this article I will describe a pipeline that takes as input a picture (a photo of your musical score) and produces an audio file generated from that score.

AI pipeline

Theoretical description

The process that enables the reproduction of a musical score starting from a picture is described in the following figure and leverages the Optical Music Recognition (OMR) technique.

[Image: AI pipeline to generate an audio file from a picture of a musical score.]

From Wikipedia we can read the following definition of OMR:

"Optical music recognition (OMR) is a field of research that investigates how to computationally read musical notation in documents. The goal of OMR is to teach the computer to read and interpret sheet music and produce a machine-readable version of the written music score. Once captured digitally, the music can be saved in commonly used file formats, e.g. MIDI (for playback) and MusicXML (for page layout). In the past it has, misleadingly, also been called "music optical character recognition". Due to significant differences, this term should no longer be used."

The main steps of OMR, given an input picture, are the following (a toy sketch of the first step follows the list):

  1. staffline extraction
  2. noteheads extraction
  3. note group extraction
  4. symbol extraction (keys, accidentals, etc.)
  5. barlines extraction
  6. rhythm extraction
  7. MusicXML building
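
As a rough illustration of the first of these steps, below is a minimal, heuristic sketch of staffline detection based on horizontal ink projections, assuming numpy and Pillow are installed. Real OMR tools such as Oemer use trained neural models for this; the sketch only conveys the intuition:

import numpy as np
from PIL import Image

def find_staffline_rows(image_path, threshold=128, row_fill_ratio=0.5):
  # Load the score picture as grayscale; ink pixels are the dark ones.
  gray = np.array(Image.open(image_path).convert("L"))
  ink = gray < threshold

  # Stafflines are long horizontal runs of ink, so rows whose ink density
  # exceeds a fraction of the page width are staffline candidates.
  row_density = ink.mean(axis=1)
  candidate_rows = np.where(row_density > row_fill_ratio)[0]

  # Merge adjacent candidate rows: after scanning, each staffline is a few
  # pixels thick.
  lines = []
  for row in candidate_rows:
    if lines and row - lines[-1][-1] <= 1:
      lines[-1].append(row)
    else:
      lines.append([row])
  return [int(np.mean(line)) for line in lines]

# Each returned value is the approximate vertical position of one staffline;
# grouping them five at a time identifies the staves.
print(find_staffline_rows("bach_suite_a_minor.jpg"))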

Optical Music Recognition is a technology that is closely linked to various research domains, such as computer vision, document analysis, and music information retrieval.


[Image: Optical Music Recognition: involved domains.]


Optical Music Recognition is able to convert a picture into a standardised music representation format, a language that machines can easily interpret. In particular, this standard is MusicXML; its definition on the w3.org website is the following:

"MusicXML is a standard open format for exchanging digital sheet music. It is designed for sharing sheet music files between applications, and for archiving sheet music files for use in the future. As of this publication date it is supported by over 250 applications."

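To give a concrete idea of the format, below is a small hand-written MusicXML fragment (illustrative only, not taken from the pipeline output) encoding a single quarter note, a middle C:

<note>
  <pitch>
    <step>C</step>
    <octave>4</octave>
  </pitch>
  <duration>1</duration>
  <type>quarter</type>
</note>
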
Once you have the MusicXML file from your picture, the last steps to play your musical score are:

  1. generate a MIDI file
  2. convert the MIDI file to wav
  3. play it!

A possible implementation in Python

All the steps I explained in the previous section can be arranged in a Python pipeline. In particular, the libraries you can use are the following:

  1. Oemer (End-to-end OMR): this tool allows you to generate a MusicXML file from a picture of a musical score
  2. partitura: this library allows you to generate a MIDI representation from the MusicXML file
  3. pydub and mido: these libraries allow you to generate a wav file from the MIDI

The documentation of each library I mentioned allows you to create a minimal set of code to run each step of the described process. Below I report a code example to play Johann Sebastian Bach's English Suite No. 2 in A Minor with the described pipeline and tools.

Example: English Suite No. 2 in A Minor, Johann Sebastian Bach

In this section, the end-to-end process from the picture to the wav file is walked through for a sample case: playing the Bourrée from Johann Sebastian Bach's English Suite No. 2 in A Minor.

Below is the starting picture:

[Image: Suite No. 2 in A minor, BWV 807 - Bourrée.]

The first step is to use the Oemer tool to generate the MusicXML code. Once you have installed the tool, you can run the following command to get the MusicXML:

oemer /Users/simoneromano/Desktop/bach_suite_a_minor.jpg

This is the first part of the generated MusicXML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 4.0 Partwise//EN" "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="4.0">
  <work>
    <work-title>Bach_suite_a_minor</work-title>
  </work>
  <identification>
    <creator type="composer">Transcribed by Oemer</creator>
  </identification>
  <part-list>
    <score-part id="P1">
      <part-name>Piano</part-name>
      <score-instrument id="P1-I1">
        <instrument-name>Piano</instrument-name>
        <instrument-sound>keyboard.piano</instrument-sound>
      </score-instrument>
      <midi-instrument id="P1-I1">
        <midi-channel>1</midi-channel>
        <midi-program>1</midi-program>
        <volume>80</volume>
        <pan>0</pan>
      </midi-instrument>
    </score-part>
  </part-list>        

The next step is to generate the MIDI file starting from the MusicXML. Below is the code I used to implement this step with the partitura library in Python:

import numpy as np
import partitura as pt

# Load the MusicXML file generated by Oemer into a partitura Score object.
score = pt.load_score('bach_suite_a_minor.musicxml')

# Work on the first (and, for this score, only) part.
part = score.parts[0]

# Optional sanity check: dump the notes as (onset, offset, MIDI pitch) tuples.
pianoroll = np.array([(n.start.t, n.end.t, n.midi_pitch) for n in part.notes])
print(pianoroll)

# Export the part as a MIDI file.
pt.save_score_midi(part, 'mypart.mid')

Once you have the MIDI file, you can use the following code, which uses the pydub and mido libraries, to generate the wav file:

from collections import defaultdict
from mido import MidiFile
from pydub import AudioSegment
from pydub.generators import Sine

def note_to_freq(note, concert_A=440.0):
  # Convert a MIDI note number to a frequency in Hz (A4 = MIDI 69 = 440 Hz).
  return (2.0 ** ((note - 69) / 12.0)) * concert_A

mid = MidiFile("mypart.mid")

# Start from a silent segment as long as the whole MIDI file (in ms).
output = AudioSegment.silent(duration=mid.length * 1000.0)

# Fixed tempo assumption: set_tempo meta messages in the file are ignored.
tempo = 100 # bpm

def ticks_to_ms(ticks):
  tick_ms = (60000.0 / tempo) / mid.ticks_per_beat
  return ticks * tick_ms


for track in mid.tracks:
  # Absolute position (in ms) of the current message within the track.
  current_pos = 0.0

  # Notes currently sounding, keyed by channel and note number.
  current_notes = defaultdict(dict)

  for msg in track:
    current_pos += ticks_to_ms(msg.time)

    if msg.type == 'note_on' and msg.velocity > 0:
      current_notes[msg.channel][msg.note] = (current_pos, msg)

    # A note_on with velocity 0 is equivalent to a note_off.
    elif msg.type == 'note_off' or (msg.type == 'note_on' and msg.velocity == 0):
      if msg.note not in current_notes[msg.channel]:
        continue # note_off without a matching note_on: skip it
      start_pos, start_msg = current_notes[msg.channel].pop(msg.note)

      duration = current_pos - start_pos

      # Render the note as a plain sine wave, slightly shortened and faded
      # in and out to avoid clicks between consecutive notes.
      signal_generator = Sine(note_to_freq(msg.note))
      rendered = signal_generator.to_audio_segment(
          duration=max(duration - 50, 1), volume=-20).fade_out(100).fade_in(30)

      output = output.overlay(rendered, position=start_pos)

output.export("mypart.wav", format="wav")
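
Finally, to cover the last step of the pipeline ("play it!"), you can play the exported wav file directly from Python with pydub's playback helper; note that this relies on an audio backend such as simpleaudio, pyaudio or ffplay being available on your machine:

from pydub import AudioSegment
from pydub.playback import play

# Load the generated wav file and play it on the default audio device.
play(AudioSegment.from_wav("mypart.wav"))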

You can listen to this draft version below.

As you can hear, the main issues in the auto-generated audio concern the extraction of the following elements:

  1. rhythm
  2. accidentals

These aspects could be enhanced by working on the image processing applied to the score picture before recognition; a possible starting point is sketched below.
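
For example, here is a minimal pre-processing sketch, assuming OpenCV (cv2) is installed: it converts the photo to grayscale, removes some noise and binarizes it with adaptive thresholding before handing it to Oemer. This is just one possible heuristic, not part of the Oemer tool itself:

import cv2

# Load the photo of the score and convert it to grayscale.
image = cv2.imread("bach_suite_a_minor.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Light denoising to remove scanning artifacts without blurring the notation.
denoised = cv2.medianBlur(gray, 3)

# Adaptive thresholding binarizes the page even under the uneven lighting
# that is typical of photos taken with a phone (block size 35, constant 15).
binary = cv2.adaptiveThreshold(denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 35, 15)

# Save the cleaned-up picture and feed this file to Oemer instead.
cv2.imwrite("bach_suite_a_minor_clean.jpg", binary)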

Conclusion

In this article, I showed how it is possible to play a musical score starting from its picture (a photo), using a combination of deep learning, machine learning and programming skills that form the basis of the Optical Music Recognition (OMR) technique I described. Since this is a multi-domain topic, having a background as a musician can help, in particular to interpret the results of the algorithms.

It is really important to notice that one of the main benefits of this pipeline, in my approach, is the possibility of automatically creating a standard representation of the music (MusicXML), which researchers can use to analyse and compare musical compositions, styles, and historical trends more efficiently. Here I am thinking of the use of generative AI to navigate MusicXML data and generate comparisons, reports, descriptions, etc.
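
Because MusicXML is plain XML, this kind of analysis can start from very little code. As an illustration (standard library only, using the file generated earlier), the sketch below extracts the pitch classes from a MusicXML file and counts them: a simple structured feature that a researcher, or a generative model, could reason over:

from collections import Counter
import xml.etree.ElementTree as ET

# Parse the MusicXML produced by the pipeline.
tree = ET.parse("bach_suite_a_minor.musicxml")

# Collect the pitch class (step plus optional alteration) of every note.
pitches = []
for note in tree.iter("note"):
  pitch = note.find("pitch")
  if pitch is None:
    continue # skip rests, which have no <pitch> element
  step = pitch.findtext("step")
  alter = pitch.findtext("alter")
  pitches.append(step + ("#" if alter == "1" else "b" if alter == "-1" else ""))

# A simple pitch-class histogram: a first, rough fingerprint of the piece.
print(Counter(pitches).most_common())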

This could be the main topic of a dedicated article.

