A digital image is basically a representation of colors through numbers. In this tutorial we’ll look at the logic behind those numbers and at the most common image types in Python.

If digital images were biological beings, we could say they have thrived in evolutionary terms in the first decades of the new millennium. Digital image production has surged enormously since the first digital cameras reached mainstream consumer markets in the late 90s.

As Computer Vision is a cutting-edge technology in domains ranging from Robotics to Autonomous Driving, understanding digital images can be a very satisfying and fruitful process.

Used Where?

  • Image manipulation
  • Image research
  • Computer vision
  • Data science
  • Web
  • Apps

First things first, let’s import the PIL library and its Image module:


import PIL
from PIL import Image

Estimated Time: 5 mins
Skill Level: Beginner
Types: RGB, B&W, Monochrome
Libraries: PIL, cv2, numpy
Tutorial Provided by: HolyPython.com

RGB

Next, we can open a color image and see what it looks like in numbers. To open the image with PIL:

# Replace with the path to an image on your own computer
f1 = r'C:/Users/xx/Desktop/151013.jpg'
data1 = Image.open(f1)

To get the pixel values of an image, we can use the .getdata() method of the Image object:

raw1 = data1.getdata()

If we convert raw1 to a list and print it, we’ll get a list of tuples with 3 elements each. Each tuple has 3 values because the image we opened is an “RGB” image, which stands for its Red, Green and Blue channels.

It turns out that a wide range of visible colors can be produced by combining red, green and blue light, a process known as additive color mixing.
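The three channels can be seen directly without a photo on disk. The minimal sketch below builds a tiny 2×1 RGB image in memory with Image.new and putpixel. The pixel colors are arbitrary and chosen only for illustration. It then uses .getbands() and .split() to separate the red, green and blue channels.

```python
from PIL import Image

# Build a tiny 2x1 RGB image in memory (arbitrary example colors)
img = Image.new("RGB", (2, 1))
img.putpixel((0, 0), (186, 190, 189))  # a light gray pixel
img.putpixel((1, 0), (255, 0, 0))      # a pure red pixel

print(img.mode)        # RGB
print(img.getbands())  # ('R', 'G', 'B')

# Each pixel is a tuple of 3 values: (red, green, blue)
pixels = list(img.getdata())
print(pixels)          # [(186, 190, 189), (255, 0, 0)]

# split() returns one single-channel ("L") image per band
red, green, blue = img.split()
print(list(red.getdata()))    # [186, 255]
print(list(green.getdata()))  # [190, 0]
print(list(blue.getdata()))   # [189, 0]
```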

Let’s take a partial look:

print(list(raw1))

Output:

[(186, 190, 189), (186, 190, 189), (186, 190, 189), (186, 190, 189), (186, 190, 189), (186, 190, 189), (186, 190, 189), (186, 190, 189), (186, 190, 189), (186, 190, 189), (186, 190, 189), (184, 188, 187), (185, 189, 188), (186, 190, 189), (187, 191, 190), (187, 191, 190) ……… (187, 191, 190), (187, 191, 190), (186, 190, 189), (186, 190, 189), (188, 192, 191), (188, 192, 191), (187, 191, 190), (184, 188, 187), (182, 186, 185), (182, 186, 185), (182, 186, 185), (184, 188, 187), (185, 189, 188)]

There will be as many tuples as there are pixels in the image. So if the image is 500×500 pixels, it contains 250,000 pixels, and the list above will hold 250,000 tuples, each representing one pixel with 3 values: red, green and blue.
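A quick way to check this relationship is to compare the length of the pixel list with width × height. The sketch below uses a blank 500×500 image created in memory as a stand-in for a real photo. With your own file, you would use data1 instead.

```python
from PIL import Image

# Stand-in for a real photo: a blank 500x500 RGB image
img = Image.new("RGB", (500, 500))

width, height = img.size  # size is (width, height)
pixels = list(img.getdata())

print(width * height)  # 250000
print(len(pixels))     # 250000, one tuple per pixel
print(len(pixels[0]))  # 3 values: red, green, blue
```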

Each value is also usually a number between 0 and 255, which gives 256 possible values. This is because the common 24-bit color depth is split into 8 bits per channel (8 + 8 + 8 = 24). A channel represented with 8 bits can hold 2⁸ = 2×2×2×2×2×2×2×2 = 256 distinct values.

Bunch of croc babies tanning in the sun (RGB Image)
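The arithmetic behind these numbers can be checked in plain Python. The sketch shows how many values one 8-bit channel can hold. It also shows how many distinct colors three such channels can encode together in 24 bits.

```python
bits_per_channel = 8
channels = 3  # red, green, blue

values_per_channel = 2 ** bits_per_channel
print(values_per_channel)  # 256, i.e. 0 to 255

total_bits = bits_per_channel * channels
print(total_bits)  # 24

# Every combination of the three channel values is a distinct color
total_colors = values_per_channel ** channels
print(total_colors)  # 16777216
print(total_colors == 2 ** total_bits)  # True
```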

Black and White

Now let’s convert the same image to Black & White and investigate its pixel values. For this conversion we use .convert("L"), where mode "L" holds 8-bit grayscale pixels:

data2 = data1.convert("L")
raw2 = data2.getdata()
print(list(raw2))

Output:

[189, 189, 188, 188, 190, 190, 189, 186, 184, 184, 184, 186, 187, 188, 187, 185, 184, 185, 186, 185, 187, 190, 190, 188, 186, 185, 186, 186, 186, 186, 185, 185, 184, 184, 184, 183, 183, 183, 183,    ……….   183, 183, 183, 179, 180, 181, 174, 175, 176, 176, 174, 173, 174, 175, 172, 173, 174, 175, 175, 175, 175, 174, 175, 175, 175, 174, 174, 173, 173, 173, 173, 173, 173, 172, 172, 171, 171, 171, 170, 170, 170, 169, 169, 168, 168, 168, 167, 167, 167, 166, 166, 165, 165, 165, 165]

Now, instead of tuples of 3 values, we have a single value for each pixel, again ranging from 0 to 255. A value of 0 means pitch black, 255 means white, and any value in between is a shade of gray.

Same image in Black & White
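How does Pillow turn three values into one? According to the Pillow documentation, converting a color image to mode "L" uses the ITU-R 601-2 luma transform: L = R × 299/1000 + G × 587/1000 + B × 114/1000. The sketch below compares that formula with .convert("L") on a single pixel taken from the RGB output above. The helper luma is a hypothetical name used only for illustration. Pillow uses integer arithmetic internally, so the check allows a difference of up to 1 from the formula.

```python
from PIL import Image

def luma(r, g, b):
    """ITU-R 601-2 luma transform, as described in the Pillow docs."""
    return r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000

rgb = (186, 190, 189)  # first pixel from the RGB output above

# A 1x1 image holding just that pixel, converted to grayscale ("L")
gray = Image.new("RGB", (1, 1), rgb).convert("L")
pillow_value = gray.getpixel((0, 0))
formula_value = luma(*rgb)

print(pillow_value, round(formula_value, 2))

# Green is weighted most and blue least
print(round(luma(0, 255, 0)), round(luma(0, 0, 255)))  # 150 29
```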

Monochrome

The difference between B&W and Monochrome is that monochrome pixels take only one of two values, 0 or 255, while B&W pixels can take any value from 0 to 255 inclusive.

This means each monochrome pixel is either fully black or fully white, with no gray tones in between.

For this conversion we use .convert("1"), where mode "1" stores 1-bit pixels (black or white):

data3 = data1.convert("1")
raw3 = data3.getdata()
print(list(raw3))

Output:

[0, 255, 0, 255, 0, 255, 0, 255, 255, 0, 255, 0, 255, 255, 0, 255, 0, 255, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 255, 255, 255, 0, 255, 255, 0, 255, 255, 255, 0, 255, 255, 255, 255, 0, 255, 0, 255, 0, 255, 0, 255, 255, 0, 255, 0, 255, 255, 0, 255, 255, 0, 255, 0, 255, 255, 0,     ……     , 255, 0, 255, 255, 255, 0, 255, 0, 255, 255, 0, 255, 255, 255, 255, 0, 255, 255, 0, 255, 255, 0, 255, 255, 0, 255, 255, 0, 255, 255, 0, 255, 255, 0, 255, 255, 0, 255, 255, 0, 255, 0, 255, 0, 255, 0, 255, 255, 0, 255, 255]

Monochrome Image

RGBA (RGB with Alpha Transparency Channel)

Another useful image type is RGBA. RGBA images have one more channel than RGB’s red, green and blue: alpha.

The alpha channel is particularly useful when you need to adjust transparency. Alpha sets a pixel’s opacity: 0 is fully transparent and 255 is fully opaque. For instance, the RGBA color (0, 0, 0, 255) is solid pitch black, since R, G and B are all zero and alpha is at its maximum, meaning no transparency at all.

Similarly, (255, 255, 255, 25) is white at roughly 10% opacity (about 90% transparent). The alpha value is approximately 10 percent of its maximum, and R, G and B are all at maximum, creating white.
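To see what an alpha of 25 actually does, the sketch below layers the white (255, 255, 255, 25) color from above over an opaque black background with Image.alpha_composite. That function requires both images to be RGBA and the same size. At roughly 10% opacity the white barely shows, so the result should be a very dark gray close to 255 × 25/255 = 25.

```python
from PIL import Image

# Opaque black background and a nearly transparent white layer (alpha 25 of 255)
background = Image.new("RGBA", (1, 1), (0, 0, 0, 255))
overlay = Image.new("RGBA", (1, 1), (255, 255, 255, 25))

# Composite the overlay on top of the background using its alpha channel
result = Image.alpha_composite(background, overlay)
r, g, b, a = result.getpixel((0, 0))

print(r, g, b, a)  # a very dark gray (about 25), fully opaque
print(round(25 / 255 * 100, 1), "% opacity")  # 9.8 % opacity
```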

You can also apply this knowledge to pixel-level image manipulation.

Transparency is particularly useful when:

  • Drawing on images with transparency
  • Applying color shades, color masks or tint on images
  • Applying transparent text on images (Watermarking)
  • Combining multiple images with transparent blending

You can easily convert an image to RGBA when opening it with the following code:

from PIL import Image
img = Image.open(r'testimage.jpg').convert("RGBA")
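When an RGB image is converted this way, each 3-value pixel gains a fourth value. The sketch below uses a tiny in-memory image instead of testimage.jpg so that it runs on its own. The new alpha value is 255, which means fully opaque, so the image looks the same until you change alpha.

```python
from PIL import Image

# A 1x1 RGB image standing in for a photo opened from disk
rgb_img = Image.new("RGB", (1, 1), (186, 190, 189))
rgba_img = rgb_img.convert("RGBA")

print(rgb_img.getpixel((0, 0)))   # (186, 190, 189)
print(rgba_img.getpixel((0, 0)))  # (186, 190, 189, 255): alpha added, fully opaque
print(rgba_img.getbands())        # ('R', 'G', 'B', 'A')
```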

Pixel Position

So if .getdata() gives us tens of thousands of numbers, how do we know which number represents which pixel? It’s easier than it sounds.

If you were lucky (or unlucky) enough to witness the dial-up modem miracle, you might remember that images on websites often stopped partway through loading because of slow connections (maybe 1 kb per second, or sometimes just bytes per second).

The bottom part of the image was missing in those cases because pixels load from left to right along a row. Once a row is filled, loading continues with the next row until all the rows are finished, down to the bottom. The list returned by .getdata() follows the same order: the first row from left to right, then the second row, and so on.
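If pixels are counted row by row like this, the pixel at column x and row y of the .getdata() list sits at index y × width + x. The sketch below checks that formula against .getpixel() on a small in-memory image. The pixel_index helper is a hypothetical name used only for illustration.

```python
from PIL import Image

def pixel_index(x, y, width):
    """Position of pixel (x, y) in the flattened, row-by-row getdata() list."""
    return y * width + x

# A 4x3 grayscale image where each pixel's value equals its own index (0..11)
width, height = 4, 3
img = Image.new("L", (width, height))
img.putdata(list(range(width * height)))

pixels = list(img.getdata())
x, y = 2, 1  # third column, second row (both counted from 0)
i = pixel_index(x, y, width)

print(i)                     # 6
print(pixels[i])             # 6
print(img.getpixel((x, y)))  # 6, the same pixel
```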

Pixel positions

Now that you know the inner workings of digital images, it’s worth knowing that a lot of machine learning is actually done by analyzing or manipulating the numbers pixels take as values.
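The tutorial’s library list includes numpy. Here is a minimal sketch of how pixel values are commonly handed to machine learning code: as a NumPy array. np.asarray turns a Pillow image into an array of shape (height, width, channels), so rows come first, matching the row-by-row order described above. The tiny single-color image is a stand-in for a real photo.

```python
import numpy as np
from PIL import Image

# A stand-in photo: 4 pixels wide, 2 pixels tall, filled with one color
img = Image.new("RGB", (4, 2), (186, 190, 189))

arr = np.asarray(img)
print(img.size)   # (4, 2) -> (width, height)
print(arr.shape)  # (2, 4, 3) -> (height, width, channels)
print(arr.dtype)  # uint8, i.e. values 0 to 255

# Example manipulation: invert every value, then turn it back into an image
inverted = Image.fromarray(255 - arr)
print(inverted.getpixel((0, 0)))  # (69, 65, 66)
```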
