publicMasters/2-InformationTheory/3InformationTheory.md

Cours de Florent.

# Information Theory

## A quick introduction to compression

When we store or transmit data, it makes sense to try to reduce the storage space or the transmission delay. We are looking for two algorithms : one that can code a chunk of data to some smaller sized chunk of data; and, a second one that can from the compressed data reconstruct (perfectly or not) the initial larger chunk of data.

When we can reconstruct the data, we are performing lossless compression, otherwise we speak of irreversible or lossy compression.

We shall provide an introduction with simple methods.


## Image compression
If the data repeats locally some symbols, for example long sequences of white symbols in some black and white image, we may gain a lot of space if we describe the sequence using a number for the length of the sequence together with the repeated symbol.

This so called run length encoding was used for early file images before the gif format appeared.
The gif formal itself uses also another compression method known as LSW.

In contrast with the above where there is no loss of information, let us cite the jpg format which aims at compressing data with loss of indormation, but when performed adequatly the human eye can not detect the loss of information.

If you are interested in patent controversy, which seems to be an american sport, there are historical examples with both the gif and jpg format.


* (run length encoding)[https://en.wikipedia.org/wiki/Run-length_encoding]
* (gif)[https://en.wikipedia.org/wiki/GIF]
* (LSW)[https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch]
* (jpg)[https://en.wikipedia.org/wiki/JPEG]

## a digression : vectorial graphics format.

So far we mentionned only so called *raster graphics* formats, i.e. rectangles of pixels.
There is an alternative, in particular for artifical images : icons, diagrams, maps etc.

In contrast in a *vectorial format*, rather than describing the image, the file will describe in a mini language a recipe explaining how to draw it.
This means in particular that on a screen : the image can be zoomed to an arbitrary precision.

One format which includes as an essential component this idea is the SVG format, now a standard on the web. Basically an SVG file looks similar to an html file (it is an XML file).
Just like HTML, this XML text format allows also many manipulation by scripting techniques via the so called Document Object Model (DOM). For example, to highlight part of the image if something is selected.

You may draw some svg pictures instead of coding using for example inkscape.

* (svg)[https://en.wikipedia.org/wiki/SVG]
* (tutorial and examples on w3 schools)[https://www.w3schools.com/graphics/tryit.asp?filename=trysvg_myfirst]

## A first concrete example of compression : image and run length encoding

(activité informatique débranché irem clermont)[http://www.irem.univ-bpclermont.fr/Images-numeriques]

## Variable length encoding : Hufman code.

(Hufman)[https://en.wikipedia.org/wiki/Huffman_coding]


## Archive

(tar)["https://en.wikipedia.org/wiki/Tar_(computing)"]

## Backup

(rsync)[https://en.wikipedia.org/wiki/Rsync]