Let $f(x,y)$ denote an $M\times N$ image, for $x=0,1,2,\ldots,M-1$ (columns) and $y=0,1,2,\ldots,N-1$ (rows).
The 2D discrete Fourier Transform (DFT) of $f$, denoted by $F(m,n)$, is given by
$$ F(m,n)= \frac{1}{MN}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y) \exp(-2\pi i(\frac{x}{M}m+\frac{y}{N}n)),$$
for $m=0,1,2,\ldots,M-1$ and $n=0,1,2,\ldots,N-1$.
The $M\times N$ rectangular region defined for $ (m,n)$ is called the frequency domain, and the values of $F(m,n)$ are called the Fourier coefficients.
The inverse discrete Fourier transform is given by
$$ f(x,y)=MN\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} F(m,n) \exp(2\pi i(\frac{m}{M}x+\frac{n}{N}y)),$$
for $x=0,1,2,\ldots,M-1$ and $y=0,1,2,\ldots,N-1$.
Remark. There are other ways of normalizing the DFT and its inverse through the factors $1/MN$ and $MN$. The one we have chosen is the usual in Python.
Even if $f(x,y)$ is real, its transform $F(m,n)$ is in general a complex function. For visualization, we use the power spectrum,
$$P(m,n)= F(m,n) \bar F(m,n),$$where $\bar F$ denotes the complex conjugate of $F$.
Property: If the image is periodic, that is, it takes the same values in the top and bottom edges, and in the left and right edges, then, the DFT is also periodic, and the power spectrum is symmetric about the origin: $$ P(m,n)=P(-m,-n). $$
If the image is not periodic, then $F$ may have discontinuities. This is known as the Gibbs phenomenon.
Thanks to the above property, we may represent the DFT in a square centered at $ (0,0) $, which is more convenient for visualization porposes.
The DFT and its inverse are obtained in practice using a fast Fourier Transform. In Matlab, this is done using the command |fft2|: |F=fft2(f)|.
To compute the power spectrum, we use the Matlab function |abs|: |P=abs(F)^2|. If we want to move the origing of the transform to the center of the frequency rectangle, we use |Fc=fftshift(F)|. Finally, if we want to enhance the result, we use a $log$ scale.
Example.
from __future__ import division # forces floating point division
import numpy as np # Numerical Python
import matplotlib.pyplot as plt # Python plotting
from PIL import Image # Python Imaging Library
from numpy.fft import fft2, fftshift, ifft2 # Python DFT
# Show plots in the notebook (don't use it in Python scripts)
%matplotlib inline
We start creating a periodic image of size $(601,1201)$. The period, $10.5$, is in the horizontal direction. In the vertical direction the image is constant:
hW, hH = 600, 300
hFreq = 10.5
# Mesh on the square [0,1)x[0,1)
x = np.linspace( 0, 2*hW/(2*hW +1), 2*hW+1) # columns (Width)
y = np.linspace( 0, 2*hH/(2*hH +1), 2*hH+1) # rows (Height)
[X,Y] = np.meshgrid(x,y)
A = np.sin(hFreq*2*np.pi*X)
plt.imshow(A, cmap = 'gray');
H,W = np.shape(A)
Next, we compute the DFT, center it in the origin, and show the plot of the (square root of the) power spectrum
F = fft2(A)/(W*H)
F = fftshift(F)
P = np.abs(F)
plt.imshow(P, extent = [-hW,hW,-hH,hH]);
We zoom a little bit to see the high values of $P$ at the frequencies $\pm hFreq$
plt.imshow(P[hH-25:hH+25,hW-25:hW+25], extent=[-25,25,-25,25]);
Another examples: With the same mesh than above, we create new images:
hFreq = 10.5
vFreq = 20.5
A1 = np.sin(hFreq*2*np.pi*X) + np.sin(vFreq*2*np.pi*Y)
plt.figure()
plt.imshow(A1, cmap = 'gray');
F1 = fft2(A1)/(W*H)
F1 = fftshift(F1)
P1 = np.abs(F1)
plt.figure()
plt.imshow(P1[hH-25:hH+25,hW-25:hW+25], extent=[-25,25,-25,25]);
hFreq = 10.5
vFreq = 20.5
A2 = np.sin(hFreq*2*np.pi*X + vFreq*2*np.pi*Y)
plt.figure()
plt.imshow(A2, cmap = 'gray');
F2 = fft2(A2)/(W*H)
F2 = fftshift(F2)
P2 = np.abs(F2)
plt.figure()
plt.imshow(P2[hH-25:hH+25,hW-25:hW+25], extent=[-25,25,-25,25]);
We finally use a real image, with a high horizontal frequency component, as it is seen by the alignment of relative maxima in the horizontal axis:
I = Image.open("canes.jpg")
I = I.convert('L') # 'L' for gray scale mode
A3 = np.asarray(I, dtype = np.float32) # Image class instance, I1, to float32 Numpy array, a
H,W = np.shape(A3)
hW = np.fix(0.5*W)
hH = np.fix(0.5*H)
plt.imshow(A3, cmap = 'gray');
F3 = fft2(A3)/(W*H)
F3 = fftshift(F3)
P3 = np.abs(F3)
r = 100
plt.figure()
plt.imshow(np.log(1+P3[hH-r:hH+r,hW-r:hW+r]), extent=[-r,r,-r,r]);
An image is recovered from the set of coefficients $F$ by using the inverse Fourier Transform. As we have seen, in general, not all the coefficients have the same value. In order to use only the most important, we may select those that are larger in modulus than some given threshold $T$, i.e. we set to zero all the coefficients but those contained in the following set:
$$ S_T=\left\{0\leq m\leq M-1, 0\leq n \leq N-1 : \left| F(m,n)\right| \geq T\right\} .$$
Let us denote by $g(T)$ the number of indices, $(m,n)$, in $S_T$. One one hand, for $T=0$ we have $g(0)= MN$, since $\left| F(m,n)\right| $ is always non-negative. On the other hand, for $T> \max \left| F(m,n)\right| $ we obviously have $g(T)=0$. Observe that $g$ is a decreasing function of $T$. We construct here a plot of $g$ using the command |nnz|, which gives the number of non zero elements of a matrix.
points = 100
Trange = np.linspace(0.05,1,points)
g = np.zeros(points, dtype = 'float32')
n = 0
for T in Trange:
g[n] = np.count_nonzero(P3 >= T)
n += 1
plt.plot(g);
Now we perform a thresholding, i.e., set to zero all the coefficients which are smaller than a fixed $T$. How do we choose $T$? To start with,
We take, for instance $T=0.1$ and look at the result:
T = 0.1
c = F3 * (P3 >= T)
fM = ifft2(c)*W*H
plt.imshow(np.abs(fM), cmap = 'gray');
We, finally, compute the ratio between the number of nonzero coefficients of the Fourier transform, $F$, and that of the thresholded matrix $c$. This is a kind of compression ratio:
out1 = np.count_nonzero(F3)
out2 = np.count_nonzero(c)
print out1/out2
Exercise. For a given image of size $W\times H$, we want to compress it at some given ratio $r$ by keeping the most significative Fourier coefficients, i.e., we want to determine the threshold, $T$, such that the compression ratio equals $r$.
In other words, we want to compute $T$ such that $ \displaystyle\frac{g(T)}{WH}=r $.
To do this, we compute the zero of the function $$h(T)=g(T) -rWH.$$
Follow these steps to write the script compression.py
:
np.count_nonzero
. To gain some intuition, plot this function for T=MinF:200
. fM
as we did in this section. Plot the original and the compressed images. out1/out2
).In this application, we have a vehicle that we want to orientate to follow vertical lines.
First, we retake the original image $A$ and rotate it:
A4 = 0.5*(1+ A) # normalize to [0,1]
I = Image.fromarray(np.uint8(A4*255)) # to PIL image
rot_angle = -65
B = I.rotate(rot_angle, expand = True) # rotate without cropping
B4=np.asarray(B,dtype=np.float32)
sY,sX = np.shape(B4)
cX = np.fix(sX/2) # middle point of the image
cY = np.fix(sY/2)
plt.figure()
plt.imshow(B,cmap = 'gray');
F4 = fft2(B4)/(sX*sY)
F4 = fftshift(F4)
P4 = np.abs(F4)
plt.figure()
plt.imshow(P4[cY-25:cY+25,cX-25:cX+25], extent=[-25,25,-25,25]);
Due to the large constant (black) region, the value of the power spectrum at the origin is high, shadowing the values of the important frequencies. We remove it by substracting the mean value of the image.
F4 = fft2(B4-B4.mean())/(sX*sY)
F4 = fftshift(F4)
P4 = np.abs(F4)
plt.figure()
plt.imshow(P4[cY-25:cY+25,cX-25:cX+25], extent=[-25,25,-25,25]);
We compute the coordinates of the largest frequency component, and find the angle
indices = np.where(P4 == np.max(P4))
maxY = (indices[0][0]-cY)/sY
maxX = (indices[1][0]-cX)/sX
alpha=np.arctan(maxY/maxX)*180/np.pi
print alpha
Then, we rotate the original image correspondingly
C=B.rotate(alpha)
plt.figure()
plt.imshow(C,cmap = 'gray');
Exercise. Create a function with:
Apply it on the images as below:
%run angle.py
I5 = Image.open("furrows1.jpg")
alpha5, C5, P5 = angle(I5)
print alpha5
plt.figure()
plt.imshow(I5,cmap = 'gray');
plt.figure()
plt.imshow(C5,cmap = 'gray');
plt.figure()
plt.imshow(P5);
In this example, the algorithm worked well. Notice the high values of P5
far from the origin.
However, in the next example, the algorithm does not work. Why? How could we fix it?
I6 = Image.open("furrows2.jpg")
alpha6, C6, P6 = angle(I6)
print alpha6
plt.figure()
plt.imshow(I6,cmap = 'gray');
plt.figure()
plt.imshow(C6,cmap = 'gray');
plt.figure()
plt.imshow(P6);