ICA on Images with Python


What is Independent Component Analysis (ICA)?

If you’re already familiar with ICA, feel free to skip below to how we implement it in Python.

ICA is a dimensionality reduction algorithm that transforms a set of variables into a new set of components, chosen so that the statistical independence between the new components is maximized. This is similar to Principal Component Analysis (PCA), which maps a collection of variables to statistically uncorrelated components, except that ICA goes a step further: it maximizes statistical independence rather than just producing components that are uncorrelated.
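
The gap between "uncorrelated" and "independent" is easy to see numerically. The snippet below is a minimal sketch that is not part of the original workflow; it uses only NumPy, and the variables x and y are made up for illustration. Here y is completely determined by x, yet their linear correlation is close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# x is symmetric around zero; y is a deterministic function of x.
x = rng.uniform(-1.0, 1.0, size=100_000)
y = x ** 2

# Linear correlation is near zero because positive and negative x cancel out.
linear_corr = np.corrcoef(x, y)[0, 1]

# A nonlinear statistic exposes the dependence: x**2 and y are the same values.
nonlinear_corr = np.corrcoef(x ** 2, y)[0, 1]

print(f"corr(x, y)    = {linear_corr:.4f}")
print(f"corr(x**2, y) = {nonlinear_corr:.4f}")
```

A method that only removes correlation would treat x and y as already "separated", while a method that targets independence would not.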

Like other dimensionality reduction methods, ICA seeks to reduce the number of variables in a set of data while retaining key information. In the example we lay out in this post, the variables represent pixels in an image. One motivation for using ICA on images is image compression: rather than storing thousands or even millions of pixels in an image, we store the independent components, which can take up much less memory. Also, by its nature, ICA extracts the independent components of images, which means that it tends to find the curves and edges within an image. For example, in facial recognition, ICA can pick out features such as the eyes, the nose, and the mouth as independent components.
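
To get a rough feel for the storage savings, here is a back-of-the-envelope sketch. It assumes the layout used later in this post (image rows as samples, image columns as features), and the helper name stored_values and the 512 x 512 image size are hypothetical. To rebuild the image we would need the transformed data, the mixing matrix, and the per-column mean.

```python
def stored_values(height, width, n_components):
    """Count the numbers needed to rebuild an image from its components.

    Assumes rows are samples and columns are features, so we keep:
      - the transformed image: height x n_components
      - the mixing matrix:     width x n_components
      - the per-column mean:   width
    """
    return height * n_components + width * n_components + width


height, width, k = 512, 512, 10  # hypothetical image size and component count
raw = height * width
compressed = stored_values(height, width, k)

print(raw, compressed, round(raw / compressed, 1))  # prints: 262144 10752 24.4
```

This counts values only; the real saving also depends on the data types used to store them.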

ICA can be implemented in several open source languages, including Python, R, and Scala. This post will show you how to do ICA in Python with scikit-learn.

For more information on the mathematics behind ICA and how it functions as an algorithm, see here. Also, for a contrast between ICA and PCA, check out this Udacity video.

ICA with Python

First, let’s load the packages we’ll need. The main functionality we want is the FastICA class available from sklearn.decomposition. We’ll also load the skimage package, which we’ll use to read in a sample image, and pylab, which will show the image on screen (you may need this if you’re using an IPython Notebook).

# load packages
from sklearn.decomposition import FastICA
from pylab import *
from skimage import data, io, color

Next, we read in the image. We will set the parameter as_gray equal to True (older scikit-image releases spelled it as_grey). This converts the image to grayscale, so every pixel becomes a single floating-point value between 0 and 1 rather than a 3-dimensional RGB value. For more information, see this link.

emc2_image = io.imread("emc2.png", as_gray = True)
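
To make the grayscale step concrete, the sketch below collapses a tiny RGB array into one value per pixel using NumPy only. It is an illustration rather than the library's own code: the 2 x 2 image is made up, and the weights are the luminance coefficients documented for skimage.color.rgb2gray.

```python
import numpy as np

# Hypothetical 2x2 RGB image with 8-bit channels (values 0..255).
rgb = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

# Luminance weights as documented for skimage.color.rgb2gray.
weights = np.array([0.2125, 0.7154, 0.0721])

# Scale channels to [0, 1], then take a weighted sum over the channel axis.
gray = (rgb.astype(np.float64) / 255.0) @ weights

print(gray.shape)  # (2, 2): one value per pixel instead of three
print(gray)
```

The result is a 2-dimensional array, which is the shape FastICA expects as input.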

Now, we choose a number of components we want, and use that number to create a FastICA object. In the sample below, we’ll create a FastICA object with 10 components. This will allow us to run ICA on our image, resulting in 10 independent components.


ica = FastICA(n_components = 10)

Then, we use our object, ica, to run the ICA algorithm on the image.


# run ICA on image
ica.fit(emc2_image)
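
After fitting, the FastICA object exposes what it learned. The sketch below uses a synthetic array as a stand-in for the image (the file emc2.png is not assumed to be available), with rows treated as samples and columns as features. The random_state and max_iter arguments are additions for reproducibility and are not part of the original snippet. On pure noise like this, scikit-learn may emit a convergence warning, which is harmless here.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic stand-in for a grayscale image: 64 rows (samples) x 48 columns (features).
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(64, 48))

ica = FastICA(n_components=10, random_state=0, max_iter=1000)
ica.fit(image)

print(ica.components_.shape)  # (10, 48): one row per independent component
print(ica.mixing_.shape)      # (48, 10): maps components back to pixel columns
print(ica.mean_.shape)        # (48,): per-column mean removed before fitting
```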

An important check with any type of dimensionality reduction is how much information has been lost. In our example, we will reconstruct the image from the independent components. In other words, how does the image look if we only know the 10 independent components we’ve extracted?


# reconstruct image with independent components
emc2_image_ica = ica.fit_transform(emc2_image)
emc2_restored = ica.inverse_transform(emc2_image_ica)

# show image to screen
io.imshow(emc2_restored)
show()
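
Looking at the picture is one check; a number is another. The sketch below measures the root-mean-square error (RMSE) between an image and its 10-component reconstruction. It is an illustration under stated assumptions: the image is synthetic (a smooth pattern plus noise), and RMSE is just one reasonable choice of error metric.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic stand-in for the image: a smooth pattern plus a little noise.
rng = np.random.default_rng(0)
rows = np.linspace(0.0, 1.0, 64)[:, None]
cols = np.linspace(0.0, 1.0, 48)[None, :]
image = (
    0.5
    + 0.4 * np.sin(4 * np.pi * rows) * np.cos(2 * np.pi * cols)
    + 0.05 * rng.standard_normal((64, 48))
)

ica = FastICA(n_components=10, random_state=0, max_iter=1000)
sources = ica.fit_transform(image)         # shape (64, 10)
restored = ica.inverse_transform(sources)  # shape (64, 48)

# Root-mean-square difference between original and reconstructed pixels.
rmse = float(np.sqrt(np.mean((image - restored) ** 2)))
print(f"RMSE with 10 components: {rmse:.4f}")
```

A smaller RMSE means less information was lost in the round trip.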

As you can see, using just 10 independent components still shows a very recognizable version of our original picture. What happens if we change the number of components?

[Figures: reconstructions of the image using one, three, five, ten, and twenty components]

With five independent components, our image is fairly recognizable. With twenty components, it looks very similar to the original version.
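
The same comparison can be run as a loop over component counts. The sketch below is a hedged illustration: the helper reconstruction_rmse is hypothetical, the image is synthetic noise rather than the picture used in this post, and scikit-learn may print convergence warnings for some settings.

```python
import numpy as np
from sklearn.decomposition import FastICA


def reconstruction_rmse(image, n_components, seed=0):
    """Fit FastICA, rebuild the image, and return the root-mean-square error."""
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    restored = ica.inverse_transform(ica.fit_transform(image))
    return float(np.sqrt(np.mean((image - restored) ** 2)))


# Synthetic stand-in for a grayscale image: 64 rows x 48 columns.
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(64, 48))

# Same component counts as the reconstructions shown in this post.
errors = {k: reconstruction_rmse(image, k) for k in (1, 3, 5, 10, 20)}
for k, err in errors.items():
    print(f"{k:>2} components: RMSE = {err:.4f}")
```

The error should shrink as components are added, which matches what we see visually.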

ICA has many other applications, including analyzing stock market prices, facial recognition, and more.

Andrew Treadway
