Friday, September 6, 2013

Activity 13 - Image Compression

For this activity, we want to try our hand at image compression. There are two main ways to compress an image - lossy and lossless compression - and there are many techniques under each. There are the discrete cosine transform [1], color palette indexing [2] and chroma subsampling [3] for lossy compression. On the other hand, there are run-length encoding [4] and DPCM [5] for lossless compression. For more details on these types of compression, you can read the links provided.

Here, we used principal component analysis (PCA) to compress an image. First, I chose a simple image, one from the Windows 7 sample images. I resized the original 1024x768 image to 800x600 so that processing would be a little faster.

Figure 1. "Tulips" by David Nadalin, 2008. From Windows 7
default sample images.
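My actual processing was done in Scilab, but the idea can be sketched in Python. One common way to apply PCA to an image is to cut it into small blocks, treat each block as a vector, and keep only the top principal components. The block size, component count, and the synthetic gradient "image" below are illustrative choices, not the values I used on the tulips photo:

```python
import numpy as np

def pca_compress(img, block=10, n_components=10):
    """Compress a grayscale image with PCA on small blocks.

    Each block x block patch is flattened into a vector; PCA keeps
    only the top principal components, and the image is rebuilt
    from the truncated representation.
    """
    h, w = img.shape
    h, w = h - h % block, w - w % block          # trim to a multiple of the block size
    patches = (img[:h, :w]
               .reshape(h // block, block, w // block, block)
               .swapaxes(1, 2)
               .reshape(-1, block * block))       # one row per patch
    mean = patches.mean(axis=0)
    centered = patches - mean
    # PCA via SVD of the centered data matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                     # top principal axes
    coeffs = centered @ basis.T                   # projection (the compressed data)
    recon = coeffs @ basis + mean                 # reconstruction
    return (recon.reshape(h // block, w // block, block, block)
                 .swapaxes(1, 2)
                 .reshape(h, w))

# demo on a synthetic 600x800 gradient "image"
img = np.add.outer(np.linspace(0, 1, 600), np.linspace(0, 1, 800))
out = pca_compress(img, block=10, n_components=5)
```

The compressed representation is the coefficient matrix plus the retained basis vectors, which is much smaller than the original pixel data when few components are kept.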


Thursday, August 29, 2013

Activity 12 - Playing Notes by Image Processing

In this activity, we try to play music by extracting notes from an image of a musical sheet. This is a challenge for us since no other instructions are given; we are free to combine skills that we have learned in the past to achieve the desired results.

So, naturally, I started with a basic piece. This is an image of a score sheet for "Baa Baa Black Sheep", a popular nursery rhyme, taken from Wikimedia Commons [1].


To prepare the image for processing, I edited it using GIMP into one long image so that the staves would be continuous. I took out the text, the G-clefs, and the rests, leaving only the important notes. My idea is to set a small window and examine the entire image note by note.

Here's what I did. First, from the long, continuous image I cropped individual notes corresponding to the tones used in the song. From inspection, I know that the song is in the key of D. In that key, the notes are D, E, F#, G, A, B, C#, D. (I know a bit about music because I play the guitar and the piano.) Also, only the first six notes of the key of D are used in the song, so it was a little easier for me to crop each individual note. This matters because the y-position of the note with respect to the image is what identifies the tone. After cropping each note into a 22x54 small image, I took its center of mass so that the note is reduced to a single point. I made a list of the y-coordinates of the notes:

d 48.31579
e 42.33333
f# 38.28205
g 33.3
a 29.28205
b 25.36585
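With that list, identifying a tone is just a nearest-neighbor lookup on the y-coordinate of a note's centroid. A small Python sketch of the idea, using the values from the list above (the test y-values are made up):

```python
# Reference y-centroids measured from the cropped 22x54 note images
# (values copied from the list above)
NOTE_Y = {
    "d": 48.31579, "e": 42.33333, "f#": 38.28205,
    "g": 33.3, "a": 29.28205, "b": 25.36585,
}

def classify_note(y):
    """Return the tone whose reference y-centroid is closest to y."""
    return min(NOTE_Y, key=lambda tone: abs(NOTE_Y[tone] - y))

print(classify_note(48.0))   # 'd'
print(classify_note(30.1))   # 'a'
```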

So now we have a way to determine the note based on its position. Next, I set a 22x54 window which scans the length of the staff and identifies each note. The next step is to determine what type of note it is: a quarter note, eighth note, etc. I only took the notes and ignored the rests; instead of pausing for one count, I made the preceding note one count longer to account for the rest. (Plus, it sounds better that way.) Hence, if a quarter note precedes a rest, I convert it into a half note. (Luckily no eighth note precedes a rest, so it's a bit easier.)

This is where template matching comes in. Having samples of a quarter note, an eighth note and a half note already, it's a simple thing to take the correlation of the windowed image and each template and pick the best match. I assigned the number 4 for a quarter note, 8 for an eighth note and 2 for a half note. So my data looks like this:
tones = [293.66 329.63 369.99 392.00 440.00 493.88];
seq = [1 1 5 5 6 6 6 6 5 4 4 3 3 2 2 1 5 ... ]; //total of 50 notes
notes = [4 4 4 4 8 8 8 8 2 4 4 4 4 4 4 2 4 ...]; //total of 50 notes
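The correlation step behind those arrays can be sketched in Python. The 54x22 toy "templates" below are made-up stand-ins for my actual cropped note images; only the normalized-correlation logic is the point:

```python
import numpy as np

def match_score(window, template):
    """Normalized correlation between a window and a template
    (both same-size grayscale/binary arrays)."""
    w = window - window.mean()
    t = template - template.mean()
    denom = np.sqrt((w * w).sum() * (t * t).sum())
    return (w * t).sum() / denom if denom else 0.0

def classify_type(window, templates):
    """Pick the note type (quarter/eighth/half...) whose template
    correlates best with the window."""
    return max(templates, key=lambda k: match_score(window, templates[k]))

# toy templates: a filled note head vs. a head with a flag-like blob
quarter = np.zeros((54, 22)); quarter[40:50, 5:15] = 1
eighth  = quarter.copy();     eighth[5:15, 15:20] = 1
templates = {"quarter": quarter, "eighth": eighth}
print(classify_type(quarter, templates))  # 'quarter'
```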
So now I have a list of the note sequence and the type of notes for the song. The only thing left to do is to play it. I used Ma'am Jing's code and modified it accordingly.
function n = note(f, t)    //from Ma'am Jing
    n = sin (2*%pi*f*t);
endfunction;

s=[0];
for i=1:50     //determine type of note
    if notes(i)==4 then
        t = soundsec(0.5);
    elseif notes(i)==8 then
        t = soundsec(0.25);
    elseif notes(i)==2 then
        t = soundsec(1);
    end
    //determine note
    f = tones(seq(i));
    //append the note to the sound vector (f*2 plays one octave higher)
    s = [s, note(f*2,t)];
end
Then I saved the sound as a .wav file and hosted it on archive.org so I can post a link here:

(Edit: I can't put an embedded music player, so here's the link instead: baabaa.wav)
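For reference, the same synthesize-and-save pipeline can be sketched in Python with the standard-library wave module. The 22050 Hz sample rate is an assumption, and only the first six notes of the song are included here:

```python
import math, wave, struct

FS = 22050  # sample rate in Hz (an assumption, not taken from the Scilab run)

def note(freq, seconds):
    """Sine tone, same idea as the Scilab note() function."""
    n = int(FS * seconds)
    return [math.sin(2 * math.pi * freq * i / FS) for i in range(n)]

# first few tones/durations of the song (from the arrays above)
tones = [293.66, 329.63, 369.99, 392.00, 440.00, 493.88]
seq   = [1, 1, 5, 5, 6, 6]
notes = [4, 4, 4, 4, 8, 8]
dur   = {4: 0.5, 8: 0.25, 2: 1.0}   # quarter, eighth, half note lengths

samples = []
for s, n in zip(seq, notes):
    samples += note(tones[s - 1] * 2, dur[n])   # *2: one octave up, as in the script

with wave.open("baabaa.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)       # 16-bit samples
    f.setframerate(FS)
    f.writeframes(b"".join(struct.pack("<h", int(32767 * x)) for x in samples))
```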

So, I did it! I don't think this is the most optimized solution, though, since I had to do a lot of pre-processing on the image before I could actually extract the notes and play them.

Grade I give myself: 10/10. I did all the required tasks and finished them early, though I did not extend the activity anymore because it was hard.

Tuesday, August 20, 2013

Activity 11 - Application of Binary Operations

For this activity we want to obtain a best estimate (mean and standard deviation) of the size of several objects in a small region of interest. In our case an image of scattered punched paper serves as the subject, with the punched paper acting as "cells", and we want to find the average cell size in the image. The first thing to do is to divide the image into smaller 256x256 regions using GIMP. Here are some sample subimages.


Figure 1. Sample subimages. I used GIMP to crop several 256x256 images
to be used in this activity.
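The measurement itself — find each white blob and take statistics of the areas — can be sketched like this in Python. A toy binary image stands in for the punched-paper subimages; in Scilab this labeling is usually done with a blob-labeling function from an image toolbox:

```python
import numpy as np
from collections import deque

def blob_areas(binary):
    """Label 4-connected white blobs in a binary image via BFS
    flood fill and return the pixel area of each blob."""
    seen = np.zeros_like(binary, dtype=bool)
    areas = []
    h, w = binary.shape
    for i in range(h):
        for j in range(w):
            if binary[i, j] and not seen[i, j]:
                area, q = 0, deque([(i, j)])
                seen[i, j] = True
                while q:
                    y, x = q.popleft()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                areas.append(area)
    return areas

# toy "punched paper": two 5x5 white squares on a black background
img = np.zeros((20, 20), dtype=bool)
img[2:7, 2:7] = True
img[10:15, 10:15] = True
areas = blob_areas(img)
print(np.mean(areas), np.std(areas))  # 25.0 0.0
```

The mean and standard deviation of `areas` are exactly the best estimate this activity asks for.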

Sunday, August 11, 2013

Activity 9 - Color Image Segmentation

In this activity, we learned how to extract a colored region of interest (ROI) from a colored image. This is called color segmentation, as in the activity title. We learned two techniques: parametric and non-parametric segmentation.

First, I have this simple colored image of a clothes hook mounted on a white dresser. I wanted to be familiar first with the techniques so I began with an easy image. The first thing to do is to select a monochromatic ROI in the image, as I have done below.

 Figure 1. (a) The original image. (b) Cropped monochromatic
region of interest (ROI).

Next, I took the RGB channels of the image in Scilab. Remember in the previous post, I did not know how to do it in Scilab and so I had to use GIMP. Turns out it was just an additional three lines. I then transformed the RGB channels into the normalized chromaticity coordinates (NCC), given by

r = R / (R + G + B), g = G / (R + G + B), b = B / (R + G + B).

Although we should have three NCC terms corresponding to RGB, note that these are normalized values, and as such the sum r + g + b = 1. Therefore, we can express b in terms of r and g: b = 1 - r - g. In short, we only need two coordinates.
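In Python the conversion is just as short. A sketch on two hand-made pixels; the guard against division by zero on black pixels is my addition:

```python
import numpy as np

def to_ncc(rgb):
    """Convert an RGB image (H x W x 3 floats) to normalized
    chromaticity coordinates r and g; b = 1 - r - g is redundant."""
    total = rgb.sum(axis=2)
    total[total == 0] = 1.0          # avoid dividing by zero on black pixels
    r = rgb[..., 0] / total
    g = rgb[..., 1] / total
    return r, g

# a pure-red pixel and a gray pixel
px = np.array([[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]])
r, g = to_ncc(px)
print(r, g)   # red maps to (1, 0); gray maps to (1/3, 1/3)
```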

Thursday, August 8, 2013

Activity 8 - Enhancement in the Frequency Domain

This activity involves applying the Fourier Transform to images and changing details in the frequency domain to enhance an image. First, for some warm-ups, I created various symmetric patterns and observed the FT of the images.

In an image, a 1-pixel dot represents a Dirac delta. The FT of two dots symmetric about the center is a sinusoidal pattern, as shown below.

Figure 1. (a) Image of two dots symmetric about the center. Each dot is only one
pixel. (b) FT of the image in (a), a sinusoid pattern.
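This dot-pair result is easy to reproduce with NumPy's FFT. The 128-pixel grid and the 20-pixel dot separation below are arbitrary choices; the fringe pattern's magnitude turns out to be a pure cosine along the horizontal frequency axis:

```python
import numpy as np

N = 128
img = np.zeros((N, N))
# two 1-pixel dots, symmetric about the center
img[N // 2, N // 2 - 10] = 1
img[N // 2, N // 2 + 10] = 1

ft = np.abs(np.fft.fftshift(np.fft.fft2(img)))
# for a symmetric dot pair, |FT| is the same in every row and
# oscillates like |2 cos(...)| along the horizontal axis
row = ft[0]
print(row.min(), row.max())
```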

In general, the convolution of a pattern and a Dirac delta is a repetition of that pattern at the location of the Dirac delta. Thus, an image of two circles symmetric about the center can be considered as the convolution of a circle with two dots symmetric about the center. Also, we now know from the previous activity that the FT of a convolution of two images is equal to the product of the FTs of the images.

The FT of two circles symmetric about the center is equal to the product of an Airy pattern and a sinusoid. With varying radius, we can also see that the pattern changes.

 Figure 2. Left, from top to bottom: Image of two circles symmetric about the center
with radii 0.01, 0.05 and 0.1, respectively. Right, from top to bottom: FT of the
corresponding images from the left. We can see that as the radius decreases, the
Airy pattern dominates over the sinusoid pattern.

The FT of two squares symmetric about the center is equal to the product of a sinc function and a sinusoid. Here I also show the FT of squares of different sizes.

Figure 3. Left, from top to bottom: Image of two squares symmetric about the center
with sides 0.01, 0.05 and 0.1, respectively. Right, from top to bottom: FT of the
corresponding images from the left. We can see that as the length of the sides
decrease, the sinc pattern dominates over the sinusoid pattern.

I did the same for Gaussian circles and obtained the corresponding FTs.

Figure 4. Left, from top to bottom: Image of two Gaussian circles symmetric about
the center with the standard deviation σ = 0.01, 0.05 and 0.1, respectively. Right, 
from top to bottom: FT of the corresponding images from the left. We can see that
as σ decreases, the Gaussian pattern dominates over the sinusoid pattern. 

Here I placed random white dots on a black background and convolved the result with different patterns. I show here the convolution with a star and with a pentagon. The locations of the patterns are precisely the randomly generated Dirac deltas.

Figure 5. Top, left to right: A pentagon and the convolution with randomly placed
dirac deltas. Bottom, left to right: A five-pointed star and the convolution with
randomly placed dirac deltas.

And last for the preliminaries, I generated equally-spaced white patterns and obtained their corresponding FTs.

Figure 6. Left, from top to bottom: Regularly spaced white lines every 50, 20,
25 (both horizontal and vertical), 10, and 5 lines. Right, from top to bottom: FT of
the corresponding images from the left, which look like interference
patterns from a grating.

Wednesday, August 7, 2013

Activity 7 - Fourier Transform Model of Image Formation

This activity served as an introduction to manipulating images in the frequency domain to enhance or remove certain parts of an image.

Brief background: basically the Fourier Transform is a linear transform of a function f(x) into F(k) through the following equation:

F(k) = ∫ f(x) e^(-i2πkx) dx

In App Phy 183 and App Phy 185 we applied the Fourier Transform on various time-domain signals (sound, etc.) to convert them into the frequency domain. Now, we apply the two-dimensional FT to an image to obtain its spatial frequencies:

F(k_x, k_y) = ∬ f(x, y) e^(-i2π(k_x x + k_y y)) dx dy

In practice, this integral is approximated by the Fast Fourier Transform (FFT) algorithm, an efficient implementation of the discrete FT that is widely used in computational and image processing environments like Python and Scilab. This exercise is a familiarization with the fft() and fft2() functions of Scilab.
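A quick NumPy equivalent of the fft2 warm-up, on an arbitrary circular aperture. One detail worth noting: fft2 puts the zero frequency at the corner of the array, so fftshift is needed to move it to the center before displaying the spectrum:

```python
import numpy as np

# a white circular aperture on a black background
N = 256
y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
circle = (x**2 + y**2 < 0.1**2).astype(float)

# fft2 leaves the zero frequency at index [0, 0];
# fftshift recenters it at [N//2, N//2] for viewing
ft = np.fft.fftshift(np.fft.fft2(circle))
intensity = np.abs(ft)**2
print(intensity.shape)
```

After the shift, the brightest point of the spectrum (the DC term) sits exactly at the center of the array, which is why FT displays are conventionally shown this way.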

Wednesday, July 3, 2013

Activity 6 - Enhancement by Histogram Manipulation

This activity, as the title suggests, involves playing around with the histogram of a grayscale image to correct or enhance the image. In our case, we chose a dark image and we want to enhance the image by correcting the brightness and contrast levels so that more detail will be visible.

So here's what I did. First I chose my image. I have a sunset image of Dresden, Germany (from my adviser Dr. Rene Batac who's taking up his postdoc there) as shown below:
Figure 1. Sunset in Dresden. My adviser sent this to me
because he knew how much I loved sunsets, and it was a
particularly beautiful sunset.

As you can see, the foreground is dark while the background is light, and so I cropped the image to only select the lower half of the sunset image,
Figure 2. Lower half of the sunset image above. The foliage
and a few buildings are barely visible in this half.
and used this image for the activity today. The goal now is to adjust the brightness and contrast levels to make the foreground visible.

To enhance my image, here are the steps I did. First, I converted the image to grayscale using Scilab and took the histogram of the gray values. Normalizing the histogram, I would obtain the probability density function (PDF) of my image. Strictly speaking, a PDF describes the probability that a variable may take on a specific value, but here the PDF of the image describes the likelihood that a gray value (from 0-255) will appear on the image. As we can see in the images below, the PDF peaks around the lower values, signifying that the image is mostly dark-pixeled. Another related graph is the cumulative distribution function (CDF) of the image, which describes the likelihood that a gray value will be less than or equal to some given gray value.
For my image, the CDF rises rapidly along the low gray values and tapers off at 1. This is expected, as the probability cannot be more than 1.
Figure 3. from left to right: (a) histogram; (b) PDF; and (c) CDF of the grayscaled
image shown in Figure 2. The gray values are concentrated at the lower end.
The next task is to correct the image by applying a linear CDF onto the original image. To do this, I first created a uniformly distributed PDF and took the corresponding CDF using Scilab. For each CDF value in the image, I need to find the gray value with that same CDF value in the linear CDF. My first idea was to use find() and return the index of the CDF value. However, this does not work, as the CDF values of the image do not exactly match any entry in the linear CDF I constructed. The trick is to use the mathematical equations and treat the CDF as a continuous function, rather than a matrix.
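That interpolation trick looks like this in Python, where np.interp stands in for treating the CDF as a continuous function. The synthetic dark image below is a stand-in for my sunset photo:

```python
import numpy as np

def equalize(gray):
    """Histogram-equalize an 8-bit grayscale image by backprojecting
    its CDF onto a linear (uniform) CDF, interpolating instead of
    matching discrete entries with find()."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    cdf = hist.cumsum() / gray.size              # image CDF, in [0, 1]
    # linear CDF of a uniform distribution: y = x / 255;
    # for each gray level, find the output level whose linear CDF
    # equals the image CDF at that level
    new_levels = np.interp(cdf, np.linspace(0, 1, 256), np.arange(256))
    return new_levels[gray].astype(np.uint8)

# a dark test image: gray values concentrated in 0..49
rng = np.random.default_rng(0)
dark = rng.integers(0, 50, size=(100, 100))
out = equalize(dark)
print(dark.max(), out.max())   # the equalized image spreads out toward 255
```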

Tuesday, June 25, 2013

Activity 5 - Area Estimation

We started this activity last Thursday and were allotted two meetings for it. The activity was a bit tedious but easy once you get the hang of it. Scilab, GIMP and even Microsoft Paint (in my case at least) were utilized in this activity.

To start, the task is to estimate the area of a certain figure in an image. First, we were asked to create black and white pictures of regular shapes. I created five - rectangle, square, circle, equilateral triangle and right triangle - using Paint, as it was the easiest method for me. The background was black, and the shapes were white. Also, the area of the shapes should be analytically known; in my case, I measured the pixel dimensions to obtain the area.

Next, I loaded the image in Scilab and obtained the edge image as asked in the activity. Before running the script I made, however, I typed help edge on the console to learn more about the command. I found out that there are five methods Scilab can use to calculate edges. I used all five methods and compared the results.

So here are my results. First I used the rectangle. I measured the length and width of the figure to be 119 and 449 pixels, respectively, and so the area of the rectangle is 53,431 square pixels (sq. px.).
Figure 1.  A rectangle, with an area of
53,431 sq. px.

Then I used all five methods to obtain the edges, and ultimately the area, of the rectangle. First, I used the Sobel gradient estimation method, which uses a pair of 3x3 convolution masks to estimate the gradient along the rows and the columns of a 2D matrix (in our case an image) [1]. The result is shown below.
Figure 2. Obtaining the edges of the
rectangle using the Sobel method.
e = edge(im, 'sobel');
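For comparison, here is a dependency-free Python sketch of the same Sobel idea on a synthetic rectangle. The kernels are the standard Sobel masks; the gradient threshold and the rectangle's dimensions are arbitrary:

```python
import numpy as np

# Sobel kernels: 3x3 masks estimating the gradient along x and y
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
KY = KX.T

def sobel_edges(img, thresh=1.0):
    """Gradient-magnitude edge map of a grayscale image
    (the 3x3 correlation is done by hand to avoid dependencies)."""
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * KX).sum()
            gy[i, j] = (patch * KY).sum()
    return np.hypot(gx, gy) > thresh

# a white 10x20 rectangle: true area 200 sq. px.
img = np.zeros((40, 40))
img[10:20, 5:25] = 1.0
area_by_pixels = int(img.sum())
print(area_by_pixels)  # 200
```

Counting the white pixels directly gives the reference area; the edge map from `sobel_edges` outlines the same rectangle, and the pixels inside the edge contour can be counted to estimate the area the way the activity asks.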

Wednesday, June 19, 2013

Activity 4 - Image Types and Formats

Yesterday's activity was about the different image types and formats used in image processing. The first task was to obtain a true color image and manipulate it by converting the image into grayscale, binary and indexed forms. For this part, I used GNU Image Manipulation Program (GIMP) and obtained true color images of beautiful landscapes online.

This is the image that I chose, scaled down to 25% for web viewing, which I obtained here. The original size is 1280 x 1024 pixels and the resolution is 72 x 72 pixels per inch (ppi). Also, the image file size is 404 kB. After rescaling, the new image size, resolution and file size are 320 x 256 pixels, 72 ppi and 58.0 kB, respectively. The next three images show the grayscale, binary and indexed forms of the original, true color image.


Clockwise from top left: (a) rescaled image in true color, 58.0 kB; (b) image converted
into grayscale, 49.8 kB; (c) image converted into binary, 4.76 kB; and (d) indexed image
with 32 values, 25.7 kB.