How do computers see an image?

January 15, 2016

We all love looking at beautiful images, but have you ever wondered how a computer sees one? This tutorial explains how images are stored in a computer.

Consider the image above. A person can easily tell that it shows a cat. But can a computer really see the cat? The answer is no: a computer sees a matrix of numbers, each between 0 and 255 for a typical 8-bit image. Broadly, we can classify images as grayscale images or color images.

First, we will discuss grayscale images and then move on to color images. The image above is a grayscale image, which means each pixel stores a single value: its brightness. A pixel is the smallest controllable element of a picture represented on the screen.

Let us first look at what a computer sees for the image above.

We’ve resized the image above to 18 * 18 pixels to make it easier to follow. Unlike us, a computer sees an image as a 2D matrix.

You might have heard people say that an image is of size 1800 * 700 or 1300 * 700. This size gives the width and height of the image. In other words, if the size is 1300 * 700, there are 1300 pixels horizontally and 700 vertically, for a total of 910,000 (1300 * 700) pixels. If an image is 700 pixels wide and 500 pixels high, the matrix has 500 rows and 700 columns, so its dimensions written rows-first are (500, 700). Each element in the matrix represents the brightness of the corresponding pixel: 0 represents black, 255 represents white, and any value in between is a shade of gray.
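To make the idea concrete, here is a minimal sketch in Python. It assumes the NumPy library, which the tutorial itself does not mention, and the pixel values are made up for illustration rather than taken from the cat image. Note that NumPy lists the number of rows (the height) first, then the number of columns (the width).

```python
import numpy as np

# A tiny grayscale "image" with 3 rows and 4 columns (illustrative values).
# uint8 stores whole numbers from 0 to 255, the range used for brightness.
gray = np.array(
    [
        [0, 64, 128, 255],
        [32, 96, 160, 224],
        [255, 192, 128, 0],
    ],
    dtype=np.uint8,
)

height, width = gray.shape   # NumPy reports (rows, columns) = (height, width)
total_pixels = gray.size     # number of elements = width * height

print(gray.shape)            # (3, 4)
print(total_pixels)          # 12
print(int(gray[0, 0]))       # 0, a black pixel
print(int(gray[0, 3]))       # 255, a white pixel

# An all-black image that is 1300 pixels wide and 700 pixels high.
big = np.zeros((700, 1300), dtype=np.uint8)
print(big.size)              # 910000
```

Indexing works the same way as the shape: `gray[row, column]` picks out the brightness of a single pixel.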

Color Images

In a grayscale image, each pixel holds a single intensity: how bright or dark the pixel is. In other words, the image has only one channel. A color image, on the other hand, has 3 channels: R, G and B (red, green, blue). Standard digital cameras produce images with 3 (RGB) channels.

As you can see in the image above, the color image is composed of three channels: red, green and blue. Now the question is, how do computers see this image? Again, the answer is that they see a matrix. But how do we represent this image as a matrix when it has 3 channels, unlike a grayscale image, which has only one? In this case, we use a 3D matrix: one 2D matrix for each of the red, green and blue channels, stacked on top of each other.
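The stacking step can be sketched as follows. This assumes NumPy (not part of the original tutorial), and the three channel matrices are tiny, made-up examples: a fully red image built from a red channel at full intensity and empty green and blue channels.

```python
import numpy as np

height, width = 2, 3  # deliberately tiny, for illustration only

# One 2D matrix per channel.
red = np.full((height, width), 255, dtype=np.uint8)   # red at full intensity
green = np.zeros((height, width), dtype=np.uint8)     # no green
blue = np.zeros((height, width), dtype=np.uint8)      # no blue

# Stack the three 2D matrices along a new last axis to get a 3D matrix.
rgb = np.stack([red, green, blue], axis=-1)

print(rgb.shape)            # (2, 3, 3): height, width, channels
print(rgb[0, 0].tolist())   # [255, 0, 0]: the top-left pixel is pure red
```

Putting the channel axis last is one common convention; some libraries place it first instead, so it is worth checking the shape before working with an image.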

For example, the matrix representing a 700px by 700px color image has shape (700, 700, 3). Each pixel in this color image has three numbers (each ranging from 0 to 255) associated with it. These numbers represent the intensity of red, green and blue in that particular pixel.
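As a rough illustration of the example above, the sketch below creates a blank 700 by 700 color image and reads back the three numbers of one pixel. NumPy is an assumption here, and the pixel position and the orange color value are arbitrary choices for the demonstration.

```python
import numpy as np

# An all-black 700 x 700 color image: every channel of every pixel is 0.
image = np.zeros((700, 700, 3), dtype=np.uint8)

# Color one pixel (row 10, column 20) orange: lots of red, some green, no blue.
image[10, 20] = [255, 165, 0]

r, g, b = image[10, 20]          # the three numbers stored for that pixel
print(image.shape)               # (700, 700, 3)
print(int(r), int(g), int(b))    # 255 165 0

# Selecting a single channel gives back an ordinary 2D matrix.
red_channel = image[:, :, 0]
print(red_channel.shape)         # (700, 700)
```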

Conclusion

Computers see images as matrices. A grayscale image has one channel (gray), so we can represent it as a 2D matrix in which each element is the brightness of that particular pixel, where 0 means black and 255 means white. A color image has three channels: red, green and blue (RGB). We can represent it as a 3D matrix whose depth is 3.