Fourier Transform. Appplications to Image Processing

Contents

2D Fourier transform

Let $f(x,y)$ denote an $M\times N$ image, for $x=0,1,2,\ldots,M-1$ (columns) and $y=0,1,2,\ldots,N-1$ (rows).

The 2D discrete Fourier Transform (DFT) of $f$, denoted by $F(m,n)$, is given by

$$ F(m,n)= \frac{1}{MN}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y) \exp(-2\pi i(\frac{x}{M}m+\frac{y}{N}n)),$$

for $m=0,1,2,\ldots,M-1$ and $n=0,1,2,\ldots,N-1$.

The $M\times N$ rectangular region defined for $ (m,n)$ is called the frequency domain, and the values of $F(m,n)$ are called the Fourier coefficients.

The inverse discrete Fourier transform is given by

$$ f(x,y)=MN\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} F(m,n) \exp(2\pi i(\frac{m}{M}x+\frac{n}{N}y)),$$

for $x=0,1,2,\ldots,M-1$ and $y=0,1,2,\ldots,N-1$.

Remark. There are other ways of normalizing the DFT and its inverse through the factors $1/MN$ and $MN$. The one we have chosen is the usual in Python.

Even if $f(x,y)$ is real, its transform $F(m,n)$ is in general a complex function. For visualization, we use the power spectrum,

$$P(m,n)= F(m,n) \bar F(m,n),$$

where $\bar F$ denotes the complex conjugate of $F$.

Property: If the image is periodic, that is, it takes the same values in the top and bottom edges, and in the left and right edges, then, the DFT is also periodic, and the power spectrum is symmetric about the origin: $$ P(m,n)=P(-m,-n). $$

If the image is not periodic, then $F$ may have discontinuities. This is known as the Gibbs phenomenon.

Thanks to the above property, we may represent the DFT in a square centered at $ (0,0) $, which is more convenient for visualization porposes.

The DFT and its inverse are obtained in practice using a fast Fourier Transform. In Matlab, this is done using the command |fft2|: |F=fft2(f)|.

To compute the power spectrum, we use the Matlab function |abs|: |P=abs(F)^2|. If we want to move the origing of the transform to the center of the frequency rectangle, we use |Fc=fftshift(F)|. Finally, if we want to enhance the result, we use a $log$ scale.

Example.

In [1]:
from __future__ import division             # forces floating point division 
import numpy as np                          # Numerical Python 
import matplotlib.pyplot as plt             # Python plotting
from PIL import Image                       # Python Imaging Library
from numpy.fft import fft2, fftshift, ifft2 # Python DFT

# Show plots in the notebook (don't use it in Python scripts)
%matplotlib inline 

We start creating a periodic image of size $(601,1201)$. The period, $10.5$, is in the horizontal direction. In the vertical direction the image is constant:

In [2]:
hW, hH = 600, 300
hFreq = 10.5

# Mesh on the square [0,1)x[0,1)
x = np.linspace( 0, 2*hW/(2*hW +1), 2*hW+1)     # columns (Width)
y = np.linspace( 0, 2*hH/(2*hH +1), 2*hH+1)     # rows (Height)

[X,Y] = np.meshgrid(x,y)
A = np.sin(hFreq*2*np.pi*X)

plt.imshow(A, cmap = 'gray');
H,W = np.shape(A)

Next, we compute the DFT, center it in the origin, and show the plot of the (square root of the) power spectrum

In [3]:
F = fft2(A)/(W*H)                          
F = fftshift(F)
P = np.abs(F)                            
plt.imshow(P, extent = [-hW,hW,-hH,hH]);

We zoom a little bit to see the high values of $P$ at the frequencies $\pm hFreq$

In [4]:
plt.imshow(P[hH-25:hH+25,hW-25:hW+25], extent=[-25,25,-25,25]);

Another examples: With the same mesh than above, we create new images:

In [5]:
hFreq = 10.5
vFreq = 20.5

A1 = np.sin(hFreq*2*np.pi*X) + np.sin(vFreq*2*np.pi*Y)

plt.figure()
plt.imshow(A1, cmap = 'gray');

F1 = fft2(A1)/(W*H)                          
F1 = fftshift(F1)
P1 = np.abs(F1)

plt.figure()
plt.imshow(P1[hH-25:hH+25,hW-25:hW+25], extent=[-25,25,-25,25]);
In [6]:
hFreq = 10.5
vFreq = 20.5

A2 = np.sin(hFreq*2*np.pi*X + vFreq*2*np.pi*Y)

plt.figure()
plt.imshow(A2, cmap = 'gray');

F2 = fft2(A2)/(W*H)                          
F2 = fftshift(F2)
P2 = np.abs(F2)

plt.figure()
plt.imshow(P2[hH-25:hH+25,hW-25:hW+25], extent=[-25,25,-25,25]);

We finally use a real image, with a high horizontal frequency component, as it is seen by the alignment of relative maxima in the horizontal axis:

In [7]:
I = Image.open("canes.jpg")
I = I.convert('L')                     # 'L' for gray scale mode
A3 = np.asarray(I, dtype = np.float32)  # Image class instance, I1, to float32 Numpy array, a

H,W = np.shape(A3)
hW = np.fix(0.5*W)
hH = np.fix(0.5*H)

plt.imshow(A3, cmap = 'gray');
In [8]:
F3 = fft2(A3)/(W*H)                          
F3 = fftshift(F3)
P3 = np.abs(F3)

r = 100
plt.figure()
plt.imshow(np.log(1+P3[hH-r:hH+r,hW-r:hW+r]), extent=[-r,r,-r,r]);

Application: image compression

An image is recovered from the set of coefficients $F$ by using the inverse Fourier Transform. As we have seen, in general, not all the coefficients have the same value. In order to use only the most important, we may select those that are larger in modulus than some given threshold $T$, i.e. we set to zero all the coefficients but those contained in the following set:

$$ S_T=\left\{0\leq m\leq M-1, 0\leq n \leq N-1 : \left| F(m,n)\right| \geq T\right\} .$$

Let us denote by $g(T)$ the number of indices, $(m,n)$, in $S_T$. One one hand, for $T=0$ we have $g(0)= MN$, since $\left| F(m,n)\right| $ is always non-negative. On the other hand, for $T> \max \left| F(m,n)\right| $ we obviously have $g(T)=0$. Observe that $g$ is a decreasing function of $T$. We construct here a plot of $g$ using the command |nnz|, which gives the number of non zero elements of a matrix.

In [9]:
points = 100
Trange = np.linspace(0.05,1,points)
g = np.zeros(points, dtype = 'float32')
n = 0

for T in Trange:
    g[n] = np.count_nonzero(P3 >= T)
    n += 1   
    
plt.plot(g);    

Now we perform a thresholding, i.e., set to zero all the coefficients which are smaller than a fixed $T$. How do we choose $T$? To start with,

  • $T$ must be large enough, to neglect some coefficients,
  • $T$ must be small enough, to keep some coefficients.

We take, for instance $T=0.1$ and look at the result:

In [10]:
T = 0.1
c = F3 * (P3 >= T)
fM = ifft2(c)*W*H
plt.imshow(np.abs(fM), cmap = 'gray');

We, finally, compute the ratio between the number of nonzero coefficients of the Fourier transform, $F$, and that of the thresholded matrix $c$. This is a kind of compression ratio:

In [11]:
out1 = np.count_nonzero(F3)
out2 = np.count_nonzero(c)
print out1/out2
240.988429104

Exercise. For a given image of size $W\times H$, we want to compress it at some given ratio $r$ by keeping the most significative Fourier coefficients, i.e., we want to determine the threshold, $T$, such that the compression ratio equals $r$.

In other words, we want to compute $T$ such that $ \displaystyle\frac{g(T)}{WH}=r $.

To do this, we compute the zero of the function $$h(T)=g(T) -rWH.$$

Follow these steps to write the script compression.py:

  • Load the image cameraman.tif and define $r=1/200$.
  • Compute its Fourier transform, $F$, as well as the max and min of its modulus, say $maxF,minF$.
  • Function $ h(T) $ may be computed as in the previous examples, i.e. using np.count_nonzero. To gain some intuition, plot this function for T=MinF:200.
  • Since $h(T)$ is decreasing, we may look for which $T$ there is a change of sign of $h$ (and therefore an approximation to a zero of $h$) by checking this sign from $T=minF$ to $T=maxF$. Once the value of $T$ around which a change of sign of $h$ has been determined (graphically), we may define the thresholded matrix of Fourirer coefficients, $c$, and the compressed image fM as we did in this section. Plot the original and the compressed images.
  • Check that the compression rate is close to $r$ (out1/out2).
  • Adapt any of the Bisection codes to compute an approximation to the unique zero of $h$. By the way, why is it unique? Check that the solution is approximately equal to the computed above.
  • Finally, transform the script into a function with input: a given image and a given ratio; output: threshold T and plot of original and compressed images.

Application: vehicle orientation

In this application, we have a vehicle that we want to orientate to follow vertical lines.

First, we retake the original image $A$ and rotate it:

In [12]:
A4 = 0.5*(1+ A)                           # normalize to [0,1]
I = Image.fromarray(np.uint8(A4*255))     # to PIL image

rot_angle = -65
B = I.rotate(rot_angle, expand = True)   # rotate without cropping

B4=np.asarray(B,dtype=np.float32)
sY,sX = np.shape(B4)
cX = np.fix(sX/2)                         # middle point of the image
cY = np.fix(sY/2)

plt.figure()
plt.imshow(B,cmap = 'gray');

F4 = fft2(B4)/(sX*sY)
F4 = fftshift(F4)
P4 = np.abs(F4)

plt.figure()
plt.imshow(P4[cY-25:cY+25,cX-25:cX+25], extent=[-25,25,-25,25]);

Due to the large constant (black) region, the value of the power spectrum at the origin is high, shadowing the values of the important frequencies. We remove it by substracting the mean value of the image.

In [13]:
F4 = fft2(B4-B4.mean())/(sX*sY)
F4 = fftshift(F4)
P4 = np.abs(F4)
plt.figure()
plt.imshow(P4[cY-25:cY+25,cX-25:cX+25], extent=[-25,25,-25,25]);

We compute the coordinates of the largest frequency component, and find the angle

In [14]:
indices = np.where(P4 == np.max(P4))
maxY = (indices[0][0]-cY)/sY 
maxX = (indices[1][0]-cX)/sX 

alpha=np.arctan(maxY/maxX)*180/np.pi

print alpha
65.1189251526

Then, we rotate the original image correspondingly

In [15]:
C=B.rotate(alpha)
plt.figure()
plt.imshow(C,cmap = 'gray');

Exercise. Create a function with:

  • Input: an image (with some peridic behavior)
  • Output: the angle to set the image straight, the straighten image, and the computed power spectrum

Apply it on the images as below:

In [16]:
%run angle.py
In [17]:
I5 = Image.open("furrows1.jpg")

alpha5, C5, P5 = angle(I5)

print alpha5
plt.figure()
plt.imshow(I5,cmap = 'gray');
plt.figure()
plt.imshow(C5,cmap = 'gray');
plt.figure()
plt.imshow(P5);
-30.865815124

In this example, the algorithm worked well. Notice the high values of P5 far from the origin.

However, in the next example, the algorithm does not work. Why? How could we fix it?

In [18]:
I6 = Image.open("furrows2.jpg")
alpha6, C6, P6 = angle(I6)

print alpha6
plt.figure()
plt.imshow(I6,cmap = 'gray');
plt.figure()
plt.imshow(C6,cmap = 'gray');
plt.figure()
plt.imshow(P6);
90.0
<string>:31: RuntimeWarning: divide by zero encountered in double_scalars