So, naturally, I started with a basic piece. This is an image of a score sheet for "Baa Baa Black Sheep", a popular nursery rhyme, taken from Wikimedia Commons [1].
To prepare the image for processing, I edited it in GIMP into one long image so that the staves are continuous. I took out the text, the G-clefs, and the rests, leaving only the notes themselves. My idea was to set a small window and examine the entire image note by note.
Here's what I did. First, from the long, continuous image I cropped individual notes corresponding to the tones used in the song. From inspection, I know that the song is in the key of D, whose notes are D, E, F#, G, A, B, C#, D. (I know a bit about music because I play the guitar and the piano.) Only the first six notes of the key were used in the song, so there were fewer samples to crop. I needed one sample per tone because the y-position of a note in the image is what identifies its tone. After cropping each note into a small 22x54 image, I took its center of mass so that the note is reduced to a single point. I made a list of the y-coordinates of the notes:
note | y-coordinate (pixels) |
d | 48.31579 |
e | 42.33333 |
f# | 38.28205 |
g | 33.3 |
a | 29.28205 |
b | 25.36585 |
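The center-of-mass step and the lookup against this table can be sketched in a few lines of Scilab. This is only a minimal sketch, not the original script: the crop is a made-up blob, it is assumed to be already thresholded so that note pixels are 1 and the background is 0, and the names crop, refY and pitchNames are chosen just for illustration.

```scilab
// Hypothetical 54x22 binary crop (54 rows high, 22 columns wide):
// 1 where the note head is, 0 elsewhere.
crop = zeros(54, 22);
crop(46:50, 8:14) = 1;              // a blob centered on row 48

// y-coordinate of the center of mass: average row index of the note pixels
[rowIdx, colIdx] = find(crop > 0);
yc = mean(rowIdx);

// Reference y-coordinates from the table above, in the order d e f# g a b
refY = [48.31579 42.33333 38.28205 33.3 29.28205 25.36585];
pitchNames = ["d" "e" "f#" "g" "a" "b"];

// Pick the table entry closest to the measured centroid
[dmin, k] = min(abs(refY - yc));
disp(pitchNames(k));
```

For this blob the centroid lands on row 48, so the closest table entry is d.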
So now we have a way to determine the tone of a note based on its position. Next, I set a 22x54 window that scans the length of the staff and identifies which tone each note is. The next step is to determine what type of note it is: a quarter note, eighth note, etc. This is where template matching comes in. Having samples of a quarter note, an eighth note and a half note already, it's a simple thing to take the correlation of the image and each template (in our case, the note type). I only took the notes and ignored the rests; instead of pausing for one count, I made the note preceding each rest one count longer. (Plus, it sounds better that way.) I also took note of the positions of all the quarter and eighth notes. Hence, if a quarter note precedes a rest, I convert it into a half note. (Luckily no eighth note precedes a rest, so it's a bit easier.) I assigned the number 4 for a quarter note, 8 for an eighth note and 2 for a half note. So my data looks like this:
seq = [1 1 5 5 6 6 6 6 5 4 4 3 3 2 2 1 5 ... ]; //total of 50 notes
notes = [4 4 4 4 8 8 8 8 2 4 4 4 4 4 4 2 4 ...]; //total of 50 notes
So now I have a list of the note sequence and the type of notes for the song. The only thing left to do is to play it. I used Ma'am Jing's code and modified it accordingly.
tones = [293.66 329.63 369.99 392.00 440.00 493.88]; //frequencies in Hz of D, E, F#, G, A, B
function n = note(f, t) //from Ma'am Jing
    n = sin(2*%pi*f*t);
endfunction
s = [0];
for i = 1:50
    //determine type of note (duration in seconds)
    if notes(i)==4 then
        t = soundsec(0.5);
    elseif notes(i)==8 then
        t = soundsec(0.25);
    elseif notes(i)==2 then
        t = soundsec(1);
    end
    //determine note
    f = tones(seq(i));
    //append the note to the signal, one octave higher (f*2)
    s = [s, note(f*2,t)];
end
Then I saved the result as a .wav file and hosted it on archive.org so I can post a link here:
(Edit: I can't put an embedded music player, so here's the link instead: baabaa.wav)
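Two details of the playback code can be checked with a bit of arithmetic. The values in tones agree with the standard equal-temperament frequencies (A4 = 440 Hz), and the three branches of the if/elseif chain amount to a duration of 2 divided by the note-type code, in seconds. A short sketch, assuming the usual MIDI note numbers for D4 to B4; midi2freq is a hypothetical helper written here, not a built-in Scilab function.

```scilab
// Equal-temperament frequency of MIDI note number m (A4 = note 69 = 440 Hz)
function f = midi2freq(m)
    f = 440 * 2 .^ ((m - 69) / 12);
endfunction

// D4 E4 F#4 G4 A4 B4 as MIDI note numbers (assumed octave)
midi = [62 64 66 67 69 71];
freqs = midi2freq(midi);   // approx. 293.66 329.63 369.99 392.00 440.00 493.88

// Note-type codes 4, 8, 2 mapped to seconds: 0.5, 0.25, 1
noteTypes = [4 8 2];
secs = 2 ./ noteTypes;
```

Since doubling a frequency raises the pitch by one octave, note(f*2,t) in the loop plays the melody an octave above these values.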
So, I did it! I don't think this is the most optimized solution, though, since I had to do a lot of pre-processing on the image before I could actually extract the notes and play them.
Grade I give myself: 10/10. I did all the required tasks and finished them early, though I did not extend the activity any further because it was hard.
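For reference, the note-type check from the template-matching step can be sketched as follows. This is a minimal sketch under stated assumptions rather than the code actually used: it uses a plain normalized correlation between equal-sized patches (the post does not say which correlation routine was used), the tiny 6x4 templates are invented stand-ins for the real 22x54 crops, and patchcorr is a hypothetical helper.

```scilab
// Normalized correlation between two equal-sized patches (1 = identical shape)
function c = patchcorr(a, b)
    a = a - mean(a);
    b = b - mean(b);
    c = sum(a .* b) / sqrt(sum(a .^ 2) * sum(b .^ 2));
endfunction

// Invented 6x4 templates standing in for the real note samples
quarter = [0 0 1 0; 0 0 1 0; 0 0 1 0; 0 0 1 0; 1 1 1 0; 1 1 1 0];
eighth  = [0 0 1 1; 0 0 1 1; 0 0 1 0; 0 0 1 0; 1 1 1 0; 1 1 1 0];
half    = [0 0 1 0; 0 0 1 0; 0 0 1 0; 0 0 1 0; 1 1 1 0; 1 0 1 0];
codes   = [4 8 2];          // same codes as in the notes list

// A patch cut from the staff: here a quarter note with one noisy pixel
patch = quarter;
patch(1, 1) = 1;

// Correlate against every template and keep the best match
scores = zeros(1, 3);
scores(1) = patchcorr(patch, quarter);
scores(2) = patchcorr(patch, eighth);
scores(3) = patchcorr(patch, half);
[best, k] = max(scores);
noteType = codes(k);
```

With these toy templates the noisy patch still correlates best with the quarter note, so noteType comes out as 4.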