Friday, September 6, 2013

Activity 13 - Image Compression

For this activity, we want to try our hand at image compression. There are two main ways to compress an image, lossy and lossless compression, and there are many techniques under each. For lossy compression, there is the discrete cosine transform [1], color palette indexing [2] and chroma subsampling [3]. For lossless compression, there is run-length encoding [4] and DPCM [5]. For more details on these types of compression, you can read the links provided.

Here, we used principal component analysis (PCA) to compress an image. First, I chose a simple image, one from the Windows 7 sample images. I resized the original 1024x768 image to 800x600 so that processing would be a little faster.

Figure 1. "Tulips" by David Nadalin, 2008. From Windows 7
default sample images.


Thursday, August 29, 2013

Activity 12 - Playing Notes by Image Processing

In this activity, we try to play music by extracting notes from an image of sheet music. This is a challenge for us since no other instructions are given; we are free to combine the skills we have learned in the past to achieve the desired results.

So, naturally, I started with a basic piece. This is an image of a score sheet for "Baa Baa Black Sheep", a popular nursery rhyme, taken from Wikimedia Commons [1].


To prepare the image for processing, I edited it using GIMP to become one long image so that the staves would be continuous. I took out the text, the G-clefs, and the rests, leaving only the important notes. My idea is to set a small window and examine the entire image note by note.

Here's what I did. First, from the long, continuous image I cropped individual notes corresponding to the tones used in the song. From inspection, I know that the song is in the key of D. In that key, the notes are D, E, F#, G, A, B, C#, D. (I know a bit about music because I play the guitar and the piano.) Only the first six notes of the key of D were used in the song, so it was a little easier for me to crop one sample of each note. I needed a sample of each because the y-position of a note with respect to the image is what identifies its tone. After cropping each note into a small 22x54 image, I took its center of mass so that the note is reduced to a single point. I made a list of the y-coordinates of the notes:

d 48.31579
e 42.33333
f# 38.28205
g 33.3
a 29.28205
b 25.36585
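
One way such a lookup could be written is as a nearest-value match against this list. This is only a sketch under that assumption; the names `ycoords`, `names`, and `nearest_tone`, and the sample value 37.9, are made up for illustration.

```scilab
// Hypothetical sketch: map a note's center-of-mass y-coordinate to a tone
// by picking the closest reference value from the list above.
ycoords = [48.31579 42.33333 38.28205 33.3 29.28205 25.36585];
names   = ["d" "e" "f#" "g" "a" "b"];

function k = nearest_tone(y, ycoords)
    // k is the index of the reference y-coordinate closest to y
    [d, k] = min(abs(ycoords - y));
endfunction

k = nearest_tone(37.9, ycoords);   // 37.9 is an illustrative measured value
disp(names(k));
```

The index k can then be used directly to look up a frequency in a list of tones ordered the same way.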

So now we have a way to determine the note based on its position. Next, I set a 22x54 window that scans the length of the staff and identifies each note it passes over. The next step is to determine the type of note: a quarter note, an eighth note, etc. This is where template matching comes in. Having samples of a quarter note, an eighth note and a half note already, it's a simple thing to take the correlation of the image and the template (in our case, the note type). I only took the notes and ignored the rests; instead of pausing for one count, I recorded the positions of all the quarter and eighth notes and made any note preceding a rest one count longer. (Plus, it sounds better that way.) Hence, if a quarter note precedes a rest, I convert it into a half note. (Luckily no eighth note precedes a rest so it's a bit easier.) I assigned the number 4 for a quarter note, 8 for an eighth note and 2 for a half note. So my data looks like this:
tones = [293.66 329.63 369.99 392.00 440.00 493.88];
seq = [1 1 5 5 6 6 6 6 5 4 4 3 3 2 2 1 5 ... ]; //total of 50 notes
notes = [4 4 4 4 8 8 8 8 2 4 4 4 4 4 4 2 4 ...]; //total of 50 notes
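
The template-matching step that produces the note-type codes above could be sketched as follows. This assumes each window and template have the same size and scores them with a normalized correlation coefficient, which may differ from the correlation actually used in the activity. The names `corr_score` and `note_type` and the tiny 2x2 "templates" are hypothetical stand-ins for real note images.

```scilab
// Hypothetical sketch: score a note window against each note-type template
// and return the code (4, 8 or 2) of the best-matching template.
function c = corr_score(a, b)
    // normalized correlation coefficient of two same-size images
    a = double(a) - mean(double(a));
    b = double(b) - mean(double(b));
    c = sum(a .* b) / sqrt(sum(a .^ 2) * sum(b .^ 2));
endfunction

function code = note_type(win, templates, codes)
    best = -%inf;
    code = 0;
    for j = 1:length(codes)
        c = corr_score(win, templates(j));
        if c > best then
            best = c;
            code = codes(j);
        end
    end
endfunction

// Made-up 2x2 stand-ins for the quarter, eighth and half note templates
templates = list([0 1; 1 1], [1 0; 0 1], [1 1; 0 0]);
codes = [4 8 2];
disp(note_type([0 1; 1 1], templates, codes));
```

A window with constant pixel values has zero variance and would make the score undefined, so blank regions of the staff would need to be skipped before matching.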
So now I have a list of the note sequence and the type of notes for the song. The only thing left to do is to play it. I used Ma'am Jing's code and modified it accordingly.
function n = note(f, t)    //from Ma'am Jing
    n = sin (2*%pi*f*t);
endfunction;

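s = [0];
for i = 1:length(notes)     //loop over all 50 notes
    //determine duration from the type of note
    if notes(i)==4 then
        t = soundsec(0.5);      //quarter note
    elseif notes(i)==8 then
        t = soundsec(0.25);     //eighth note
    elseif notes(i)==2 then
        t = soundsec(1);        //half note
    end
    //determine note
    f = tones(seq(i));
    //append note, played one octave higher (f*2)
    s = [s, note(f*2,t)];
end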
Then I saved the result as a .wav file and hosted it on archive.org so I can post a link here:

(Edit: I can't put an embedded music player, so here's the link instead: baabaa.wav)
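
Saving a signal to a .wav file in Scilab could look like the sketch below. The single 440 Hz tone stands in for the full song, and the 22050 Hz sampling rate is an assumption based on the default rate of soundsec; the file name is taken from the link above.

```scilab
// Hypothetical sketch: write a sound vector to a .wav file.
t = soundsec(0.5);               // time samples for half a second
s = sin(2*%pi*440*t);            // stand-in signal, values within [-1, 1]
wavwrite(s, 22050, "baabaa.wav"); // 22050 Hz assumed to match soundsec
```

The rate passed to wavwrite should match the rate used to generate the time samples, otherwise the pitch and tempo of the saved file will be off.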

So, I did it! I don't think this is the most optimized solution, though, since I had to do a lot of pre-processing on the image before I was actually able to extract the notes and play them.

Grade I give myself: 10/10. I did all the required tasks and finished them early, though I did not extend the activity any further because it was hard.

Tuesday, August 20, 2013

Activity 11 - Application of Binary Operations

For this activity we want to obtain a best estimate (mean and standard deviation) of the size of several objects in a small region of interest. In our case an image of scattered punched paper serves as the subject image, with the punched paper acting as "cells", and we want to find the average cell size in the image. The first thing to do is to divide the image into smaller 256x256 regions using GIMP. Here are some sample subimages.


Figure 1. Sample subimages. I used GIMP to crop several 256x256 images
to be used in this activity.

Sunday, August 11, 2013

Activity 9 - Color Image Segmentation

In this activity, we learned how to extract a colored region of interest (ROI) from a colored image. This is called color segmentation, as in the activity title. We learned two techniques: parametric and non-parametric segmentation.

First, I have this simple colored image of a clothes hook mounted on a white dresser. I wanted to get familiar with the techniques first, so I began with an easy image. The first thing to do is to select a monochromatic ROI in the image, as I have done below.

 Figure 1. (a) The original image. (b) Cropped monochromatic
region of interest (ROI).

Next, I took the RGB channels of the image in Scilab. Remember that in the previous post I did not know how to do this in Scilab, so I had to use GIMP. It turns out it takes just three additional lines. I then transformed the RGB channels into the normalized chromaticity coordinates (NCC), given by

r = R/(R+G+B),  g = G/(R+G+B),  b = B/(R+G+B).

Although we should have three NCC terms corresponding to RGB, note that these are normalized values, and as such the sum r + g + b = 1. Therefore, we can express b in terms of r and g: b = 1 - r - g. In short, we only need two coordinates.
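
The conversion to NCC, where each channel is divided by the per-pixel sum R + G + B, could be sketched in Scilab as below. The small matrices are made-up stand-ins for the channels of a real image, and replacing zero sums with a large number is one common way to avoid dividing by zero on black pixels, not necessarily what was done here.

```scilab
// Hypothetical sketch: RGB channels to normalized chromaticity coordinates.
R = [200 10; 0  90];
G = [ 20 10; 0  90];
B = [ 30 10; 0 120];

I = R + G + B;                  // per-pixel intensity
I(find(I == 0)) = 1000000;      // black pixels: avoid division by zero
r = R ./ I;
g = G ./ I;
b = 1 - r - g;                  // not needed; follows from r + g + b = 1
disp(r);
disp(g);
```

With this workaround a black pixel ends up at r = g = 0, so it may be worth checking that this point does not fall inside the ROI's color range.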