Thursday, August 29, 2013

Activity 12 - Playing Notes by Image Processing

In this activity, we try to play music by extracting notes from an image of a musical sheet. This is a challenge for us since no other instructions are given; we are free to combine skills that we have learned in the past to achieve the desired results.

So, naturally, I started with a basic piece. This is an image of a score sheet for "Baa Baa Black Sheep", a popular nursery rhyme, taken from Wikimedia Commons [1].


To prepare the image for processing, I edited it in GIMP into one long image so that the staff runs continuously. I removed the text, the G clefs, and the rests, leaving only the notes themselves. My idea is to set a small window and examine the entire image note by note.

Here's what I did. First, from the long, continuous image I cropped individual notes corresponding to the tones used in the song. From inspection, I know that the song is in the key of D, whose notes are D, E, F#, G, A, B, C#, D. (I know a bit about music because I play the guitar and the piano.) Only the first six notes of the key of D are used in the song, which makes it a little easier to crop each individual note. The y-position of a note within the image is what identifies its tone, so after cropping each note into a small 22x54 image I took its center of mass, reducing the note to a single point. A short sketch of this centroid step follows the list below. Here are the y-coordinates of the notes:

d 48.31579
e 42.33333
f# 38.28205
g 33.3
a 29.28205
b 25.36585
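
To make that concrete, here is a minimal sketch of the centroid step in Scilab. It assumes the cropped note is already loaded as a matrix win of zeros and ones (rows = y, columns = x, note pixels equal to 1); the variable names are placeholders of mine, not part of the actual code.

rows = (1:size(win, 1))';                      //row (y) indices
ycm  = sum(rows .* sum(win, 'c')) / sum(win);  //intensity-weighted y center of mass

refy  = [48.32 42.33 38.28 33.30 29.28 25.37]; //reference y-positions from the list above
names = ["d" "e" "f#" "g" "a" "b"];
[m, k] = min(abs(refy - ycm));                 //closest reference position
disp(names(k));                                //identified tone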

So now we have a way to determine the note based on its position. Next, I set a 22x54 window that scans along the length of the staff and identifies each note. The next step is to determine what type of note it is: a quarter note, an eighth note, and so on. This is where template matching comes in. Having samples of a quarter note, an eighth note, and a half note, it's a simple thing to take the correlation of the windowed image with each template and pick the note type that matches best (a small sketch of this correlation appears after the data below). I only took the notes and ignored the rests; instead of pausing for one count, I made the note preceding each rest one count longer to account for the rest. (Plus, it sounds better that way.) I kept track of the positions of all the quarter and eighth notes, so whenever a quarter note precedes a rest I convert it into a half note. (Luckily no eighth note precedes a rest, so it's a bit easier.) I assigned the number 4 to a quarter note, 8 to an eighth note, and 2 to a half note. So my data looks like this:
tones = [293.66 329.63 369.99 392.00 440.00 493.88];
seq = [1 1 5 5 6 6 6 6 5 4 4 3 3 2 2 1 5 ... ]; //total of 50 notes
notes = [4 4 4 4 8 8 8 8 2 4 4 4 4 4 4 2 4 ...]; //total of 50 notes
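
The correlation step above can be sketched with a plain normalized correlation built from core Scilab functions. The window win and the three template variables are hypothetical names, and this is only one way the comparison could be written, not necessarily how my actual code does it.

function r = imcorr(A, B)
    //normalized correlation coefficient of two same-sized grayscale matrices
    A = A - mean(A);
    B = B - mean(B);
    r = sum(A.*B) / sqrt(sum(A.^2) * sum(B.^2));
endfunction

scores = [imcorr(win, tmpl_quarter), imcorr(win, tmpl_eighth), imcorr(win, tmpl_half)];
types  = [4 8 2];               //same coding as the notes array
[m, k] = max(scores);
notetype = types(k);            //note type with the best template match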
So now I have a list of the note sequence and the type of notes for the song. The only thing left to do is to play it. I used Ma'am Jing's code and modified it accordingly.
function n = note(f, t)    //from Ma'am Jing
    n = sin (2*%pi*f*t);
endfunction;

s=[0];
for i=1:50     //loop over all 50 notes
    //set the duration from the note type (4 = quarter, 8 = eighth, 2 = half)
    if notes(i)==4 then
        t = soundsec(0.5);
    elseif notes(i)==8 then
        t = soundsec(0.25);
    elseif notes(i)==2 then
        t = soundsec(1);
    end
    //get the frequency of the note from the sequence
    f = tones(seq(i));
    //append the note (one octave higher) to the signal
    s = [s, note(f*2,t)];
end
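
For reference, writing the result to disk can be done with something like the line below; wavwrite() is a standard Scilab function, and the 22050 Hz rate is only my assumption here, since it is the default sampling rate of soundsec().

wavwrite(s, 22050, 'baabaa.wav');    //save the whole sequence as a .wav file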
Then I saved the output as a .wav file and hosted it on archive.org so I can post a link here:

(Edit: I can't put an embedded music player, so here's the link instead: baabaa.wav)

So, I did it! I don't think this is the most optimized solution, though, since I had to apply many pre-processing steps to the image before I was actually able to extract the notes and play them.

Grade I give myself: 10/10. I did all the required tasks and finished them early, though I did not extend the activity any further because it was hard.

Tuesday, August 20, 2013

Activity 11 - Application of Binary Operations

For this activity we want to obtain a best estimate (mean and standard deviation) of the size of the objects in a region of interest. In our case, an image of scattered punched paper serves as the subject, with the paper punchings acting as "cells", and we want to find the average cell size in the image. The first thing to do is to divide the image into smaller 256x256 regions using GIMP. Here are some sample subimages.


Figure 1. Sample subimages. I used GIMP to crop several 256x256 images
to be used in this activity.
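
The writeup stops here, but as a rough sketch of the next step, each subimage could be binarized by thresholding before any areas are measured; the variable sub and the threshold value of 200 below are assumptions, not the actual values used.

bw = bool2s(sub > 200);    //binarize: paper "cells" become 1, background 0
npix = sum(bw);            //total cell area in this subimage, in pixels

From the binarized subimages, a blob-labeling step (for example with an image processing toolbox) would then give the area of each individual cell, from which mean() and stdev() give the best estimate of the cell size.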

Sunday, August 11, 2013

Activity 9 - Color Image Segmentation

In this activity, we learned how to extract a colored region of interest (ROI) from a colored image. This is called color segmentation, as in the activity title. We learned two techniques: parametric and non-parametric segmentation.

First, I have this simple colored image of a clothes hook mounted on a white dresser. I wanted to become familiar with the techniques first, so I began with an easy image. The first thing to do is to select a monochromatic ROI in the image, as I have done below.

 Figure 1. (a) The original image. (b) Cropped monochromatic
region of interest (ROI).

Next, I took the RGB channels of the image in Scilab. Remember in the previous post, I did not know how to do it in Scilab and so I had to use GIMP; it turns out it was just an additional three lines. I then transformed the RGB channels into the normalized chromaticity coordinates (NCC), given by r = R/(R+G+B), g = G/(R+G+B), and b = B/(R+G+B).
Although we should have three NCC terms corresponding to RGB, note that these are normalized values, and as such the sum r + g + b = 1. Therefore, we can express b in terms of r and g, b = 1 - r - g. In short, we only need two coordinates.
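
Those extra lines look roughly like the sketch below; the use of SIVP's imread() and the filename are assumptions on my part.

im = double(imread('hook.png'));   //load the image (SIVP); filename is hypothetical
R = im(:,:,1);                     //red channel
G = im(:,:,2);                     //green channel
B = im(:,:,3);                     //blue channel
I = R + G + B;
I(find(I == 0)) = 1;               //guard against division by zero on black pixels
r = R ./ I;                        //normalized chromaticity coordinates
g = G ./ I;                        //b = 1 - r - g, so only r and g are needed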

Thursday, August 8, 2013

Activity 8 - Enhancement in the Frequency Domain

This activity involves applying the Fourier Transform to images and changing details in the frequency domain to enhance an image. First, for some warm-ups, I created various symmetric patterns and observed the FT of the images.

In an image, a 1-pixel dot represents a Dirac delta. The FT of two dots symmetric about the center is a sinusoid pattern, as shown below.

Figure 1. (a) Image of two dots symmetric about the center. Each dot is only one
pixel. (b) FT of the image in (a), a sinusoid pattern.
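
A minimal sketch of how this test pattern and the modulus of its FT can be generated; the 128x128 canvas and the dot separation are arbitrary choices, and imshow() assumes the SIVP toolbox.

N = 128;
im = zeros(N, N);
im(N/2 + 1, N/2 + 1 - 20) = 1;           //1-pixel dot left of center
im(N/2 + 1, N/2 + 1 + 20) = 1;           //1-pixel dot right of center
FTmod = abs(fftshift(fft2(im)));         //modulus of the FT: sinusoidal fringes
imshow(FTmod / max(FTmod));              //imshow() is from SIVP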

In general, the convolution of a pattern and a Dirac delta is a repetition of that pattern at the location of the Dirac delta. Thus, an image of two circles symmetric about the center can be considered as the convolution of a single circle with two dots symmetric about the center. Also, we now know from the previous activity that the FT of a convolution of two images is equal to the product of the FTs of the images.

The FT of two circles symmetric about the center is therefore equal to the product of an Airy pattern and a sinusoid. As the radius varies, we can also see how the pattern changes: the FT scales inversely, so smaller circles produce wider Airy patterns.

 Figure 2. Left, from top to bottom: Image of two circles symmetric about the center
with radii 0.01, 0.05 and 0.1, respectively. Right, from top to bottom: FT of the
corresponding images from the left. We can see that as the radius decreases, the
Airy pattern dominates over the sinusoid pattern.

The FT of two squares symmetric about the center is equal to the product of a 2D sinc function and a sinusoid. Here I also show the FTs of squares of different sizes.

Figure 3. Left, from top to bottom: Image of two squares symmetric about the center
with sides 0.01, 0.05 and 0.1, respectively. Right, from top to bottom: FT of the
corresponding images from the left. We can see that as the length of the sides
decreases, the sinc pattern dominates over the sinusoid pattern.

I did the same for Gaussian circles and obtained the corresponding FTs.

Figure 4. Left, from top to bottom: Image of two Gaussian circles symmetric about
the center with the standard deviation σ = 0.01, 0.05 and 0.1, respectively. Right, 
from top to bottom: FT of the corresponding images from the left. We can see that
as σ decreases, the Gaussian pattern dominates over the sinusoid pattern. 

Here I placed random white dots on a black background and convolved the result with different patterns. I show here the convolution with a star and with a pentagon. The patterns appear precisely at the locations of the randomly generated Dirac deltas.

Figure 5. Top, left to right: A pentagon and its convolution with randomly placed
Dirac deltas. Bottom, left to right: A five-pointed star and its convolution with
randomly placed Dirac deltas.
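
A rough sketch of this convolution using the product of the FTs (the convolution theorem mentioned above); the number of dots is arbitrary, and pat is a hypothetical N x N image containing the pattern.

N = 256;
dots = zeros(N, N);
for k = 1:10
    dots(ceil(N*rand()), ceil(N*rand())) = 1;   //ten randomly placed Dirac deltas
end
FTprod = fft2(dots) .* fft2(pat);               //product of the two FTs
conv_im = abs(ifft(FTprod));                    //back to image space: copies of the
                                                //pattern at each delta (circularly
                                                //shifted by the pattern's own offset)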

And last for the preliminaries, I generated equally spaced white line patterns and obtained their corresponding FTs.

Figure 6. Left, from top to bottom: Regularly spaced white lines every 50, 20,
25 (both horizontal and vertical), 10, and 5 lines. Right, from top to bottom: FT of
the corresponding images from the left, which look like interference
patterns from a grating.

Wednesday, August 7, 2013

Activity 7 - Fourier Transform Model of Image Formation

This activity served as an introduction to manipulating images in the frequency domain to enhance or remove certain parts of an image.

Brief background: the Fourier Transform is a linear transform of a function f(x) into F(k), given (using the convention with 2π in the exponent) by

F(k) = ∫ f(x) exp(-i 2π k x) dx.

In App Phy 183 and App Phy 185 we applied the Fourier Transform to various time-domain signals (sound, etc.) to convert them into the frequency domain. Now, we apply the two-dimensional FT to an image to obtain its spatial frequency spectrum:

F(fx, fy) = ∫∫ f(x, y) exp[-i 2π (fx x + fy y)] dx dy.

In practice, this integral is approximated by the discrete FT, which the Fast Fourier Transform (FFT) algorithm computes efficiently and which is widely available in computational and image processing environments such as Python and Scilab. This exercise is a familiarization with the fft() and fft2() functions of Scilab.
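
As a small familiarization sketch, here is how the two functions can be used on arbitrary test signals (a 100 Hz sine and a circular aperture, both my own choices):

//1-D: FT of a 100 Hz sine sampled at 1 kHz
fs = 1000;
t = 0:(1/fs):(1 - 1/fs);
sig = sin(2*%pi*100*t);
S = abs(fft(sig));                       //peaks at the bins corresponding to +/-100 Hz

//2-D: FT of a centered circular aperture (Airy-like pattern)
N = 128;
x = linspace(-1, 1, N);
[X, Y] = meshgrid(x, x);
circ = zeros(N, N);
circ(find(X.^2 + Y.^2 < 0.1^2)) = 1;     //aperture of radius 0.1
FTmod = abs(fftshift(fft2(circ)));       //fftshift puts the zero frequency at the center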