Saturday, February 26, 2011


This is a project I've been wanting to do for a while, and I finally got around to putting the code together yesterday.

It's based on a project called JSteg, the idea of which is to simply alter JPEG data after the lossy compression phase has been completed.

Microsoft has a fairly good explanation of how JPEG works, which can be simplified into three main phases:

  1. Downsampling
  2. Transforming and quantising
  3. Lossless compression

Downsampling
The first phase represents the image in terms of brightness and chrominance: brightness is measured on a one-dimensional scale, and chrominance on a two-dimensional scale, with one axis being red-green and the other being blue-yellow. Experiments have shown that the brightness channel is what affects our perception of images the most, so the chrominance channels can be stored at a lower resolution. The JSteg page has an example of how far the chrominance channels can be downsampled before noticeable changes start to appear.
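As a rough illustration of this phase, here is a minimal sketch in Python. It assumes the full-range YCbCr conversion used by JFIF files and a simple 2x2 averaging of the chrominance channels; the function names are made up for this example and are not taken from JSteg or from the project.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to brightness (Y) and chrominance (Cb, Cr).

    Uses the full-range JFIF formulas; values are left as floats.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def downsample_2x2(channel):
    """Halve a channel in both directions by averaging each 2x2 block.

    `channel` is a list of rows; this sketch assumes an even width and height.
    """
    out = []
    for row in range(0, len(channel), 2):
        out_row = []
        for col in range(0, len(channel[0]), 2):
            total = (channel[row][col] + channel[row][col + 1]
                     + channel[row + 1][col] + channel[row + 1][col + 1])
            out_row.append(total / 4)
        out.append(out_row)
    return out
```

Only the two chrominance channels would be passed through `downsample_2x2`; the brightness channel keeps its full resolution.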

Transforming and quantising
The data is then subjected to a discrete cosine transform, applied to 8x8 blocks, which effectively replaces the pixel values with frequency values. These are then quantised (divided by a step size and rounded, smoothing off the rough edges if you like, which turns many of the small high-frequency values into 0) and rearranged in a zigzag order so that the resulting 0 values sit next to each other. You can imagine what the next step is now.
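To make the transform, quantise and rearrange steps concrete, here is a small sketch for a single 8x8 block. It is deliberately naive and makes one simplifying assumption: a single uniform quantisation step, where a real JPEG encoder uses a table with a different step for each frequency. All names are invented for the example.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_2d(block):
    """Naive 8x8 type-II DCT: pixel values in, frequency coefficients out."""
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out


def quantise(coeffs, step):
    """Divide every coefficient by `step` and round to the nearest integer."""
    return [[round(value / step) for value in row] for row in coeffs]


def zigzag_indices(n=N):
    """(row, col) pairs in zigzag order: low frequencies first, high last."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals, alternating direction on each one.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def zigzag_scan(block):
    """Flatten an 8x8 block into a 64-element list in zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]
```

For a perfectly flat block, for instance `zigzag_scan(quantise(dct_2d([[8] * 8 for _ in range(8)]), 16))`, everything except the first coefficient comes out as 0, which is exactly the kind of run the next phase takes advantage of.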

Lossless compression
The adjacent zeroes are perfect fodder for run-length encoding, followed by Huffman coding. The resulting space savings come not only from the direct reduction in the space used by chrominance values, but also from simplifying the data so that traditional compression works well on it.
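A simplified sketch of the run-length step is shown here. It is not the exact JPEG format (which also caps the run length and folds in a size category before Huffman coding); it only illustrates why long runs of zeroes compress so well. The function name and the "EOB" marker string are choices made for this example.

```python
def rle_zeros(coeffs):
    """Encode a coefficient list as (zeroes_skipped, value) pairs.

    Trailing zeroes are not stored at all: a single "EOB" (end of block)
    marker stands in for them.
    """
    pairs = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs
```

A block that is mostly zeroes shrinks to a handful of pairs, and those pairs are what the Huffman coder would then see.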

Hiding a 160-bit hash
This is actually the easiest part. All the complicated work is done in the lossy stage, so all that needs to be done to hide our hash is to alter the data before it is losslessly compressed. The basic operation is as follows:
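  • Decompression (undoing only the lossless stage, to get back the quantised coefficients)
  • Altering the coefficients
  • Compression (re-applying the lossless stage)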
We hide the data by overwriting the least significant bit (LSB) of the coefficients, which results in a minor change spread across that part of the image. The advantage of saving a fixed length of data is that no extra metadata needs to be stored. Any file that hasn't had anything encoded will still produce a hash when read; it just won't have any meaning. That gives the encoding some amount of deniability.
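The LSB overwrite itself can be sketched in a few lines. This is not the project's code, just an illustration under some assumptions: the quantised coefficients have already been read into a flat list of integers, one bit is stored per coefficient starting from the first, and SHA-1 is used in the usage example only because it happens to produce 160 bits. All function names are hypothetical.

```python
import hashlib

HASH_BITS = 160  # fixed payload length, so no length field is needed


def bytes_to_bits(data):
    """Split bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack a list of bits (most significant first) back into bytes."""
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[pos:pos + 8]))
        for pos in range(0, len(bits), 8)
    )


def embed(coeffs, digest):
    """Return a copy of `coeffs` with the digest written into the LSBs."""
    bits = bytes_to_bits(digest)
    if len(bits) != HASH_BITS:
        raise ValueError("digest must be exactly 160 bits")
    if len(coeffs) < HASH_BITS:
        raise ValueError("not enough coefficients to hold the digest")
    out = list(coeffs)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        # Python's bitwise operators handle negative integers correctly here.
        out[i] = (out[i] & ~1) | bit
    return out


def extract(coeffs):
    """Read 160 bits from the LSBs, whether or not anything was embedded."""
    return bits_to_bytes([c & 1 for c in coeffs[:HASH_BITS]])
```

For example, `extract(embed(coeffs, hashlib.sha1(b"message").digest()))` returns the same 20 bytes that went in, with each coefficient changed by at most 1. Calling `extract` on untouched coefficients still returns 20 bytes, which is where the deniability comes from.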

You can download the project from here if you'd like to try it out for yourself.
