diff --git a/2-InformationTheory/0InformationTheory.md b/2-InformationTheory/0InformationTheory.md
index 9a08b97..d5e0a18 100644
--- a/2-InformationTheory/0InformationTheory.md
+++ b/2-InformationTheory/0InformationTheory.md
@@ -1,8 +1,10 @@
-Cours de Florent.
-
 # Information Theory
 This is a general field of study that we could understand as covering a number of topics outlined below.
+We shall cover only some of them in depth.
+
+Regarding evaluation, there will be one group project on patents, which provides one grade, and one written exam on the topics covered during the course.
+
 ## Database theory
 In particular the relational model, where data is stored in tables (like an excel table) and that can be queried efficiently. In practice, the standard is SQL and compliant system include ORACLE, Postgresql, Mysql etc. This goes back to the seventies for the theory with mature software since the late eighties. Most websites propose content that is constructed from data stored in such a database.
diff --git a/2-InformationTheory/3InformationTheory.md b/2-InformationTheory/3InformationTheory.md
index 0527353..63a8758 100644
--- a/2-InformationTheory/3InformationTheory.md
+++ b/2-InformationTheory/3InformationTheory.md
@@ -1,5 +1,3 @@
-Cours de Florent.
-
 # Information Theory
 ## A quick introduction to compression
@@ -22,12 +20,12 @@ In contrast with the above where there is no loss of information, let us cite th
 If you are interested in patent controversy, which seems to be an american sport, there are historical examples with both the gif and jpg format.
-* (run length encoding)[https://en.wikipedia.org/wiki/Run-length_encoding]
-* (gif)[https://en.wikipedia.org/wiki/GIF]
-* (LSW)[https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch]
-* (jpg)[https://en.wikipedia.org/wiki/JPEG]
+* [run length encoding](https://en.wikipedia.org/wiki/Run-length_encoding)
+* [gif](https://en.wikipedia.org/wiki/GIF)
+* [LZW](https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch)
+* [jpg](https://en.wikipedia.org/wiki/JPEG)
-## a digression : vectorial graphics format.
+## A digression : vector graphics format.
 So far we mentionned only so called *raster graphics* formats, i.e. rectangles of pixels. There is an alternative, in particular for artifical images : icons, diagrams, maps etc.
@@ -40,22 +38,79 @@ Just like HTML, this XML text format allows also many manipulation by scripting
 You may draw some svg pictures instead of coding using for example inkscape.
-* (svg)[https://en.wikipedia.org/wiki/SVG]
-* (tutorial and examples on w3 schools)[https://www.w3schools.com/graphics/tryit.asp?filename=trysvg_myfirst]
+* [svg](https://en.wikipedia.org/wiki/SVG)
+* [tutorial and examples on w3 schools](https://www.w3schools.com/graphics/tryit.asp?filename=trysvg_myfirst)
+
+## An experiment : size does matter.
+We consider several files in the folder [jokeTextInImage](jokeTextInImage/).
+
+Save them on a computer on which you have access to a terminal with basic Linux commands.
+
+Use the command ls with the appropriate options to find out how much space the data requires.
+
+NB. The jpg file is the original "joke" I found on the web. The txt, svg and html files were crafted by hand to provide the same information content (txt) and to mimic the displayed format (svg, html).
+The other two png files were obtained by using the print screen option.
+
+
+ ```bash
+ ls -Shl
+ ```
+
+ The manual explains the role of the options.
+ ```bash
+ man ls
+ ```
+
+This variant gives larger sizes (the actual size on disk) because it includes metadata, not just the data, and because the disk is organised in blocks of a certain minimal size (probably at least 4 kB).
+ ```bash
+ ls -sSh
+ ```
+ Hang on : what does the size mean here? What is the actual unit?
+
+ [Byte](https://en.wikipedia.org/wiki/Byte)
+
+ In French, a byte is therefore called an octet (8 bits).
+ For historical reasons, a byte may mean e.g. 6 bits, but in practice it is now uniformly understood as 8 bits.
+
+ We can compute the size of the text file without too much trouble.
+ We just have to count the number of characters.
+ We already discussed ASCII and we know that we need 7 bits, and in practice 8 bits, for a basic character as used in English-speaking countries.
+
+ The following command can help us do the counting (see the manual of the command wc).
+ ```bash
+ man wc
+ ```
+ Number of bytes
+ ```bash
+ wc -c joke_as_a_text_file.txt
+ ```
+ Number of characters
+ ```bash
+ wc -m joke_as_a_text_file.txt
+ ```
+is the same here.
+
+It does coincide with the data size given by the ls command.
+
+We can reproduce the experiment with the html and svg files (they are also text files) with the same result.
+
+For the jpg file it is harder to understand why the image has this data size.
+
+
 ## A first concrete example of compression : image and run length encoding
-(activité informatique débranché irem clermont)[http://www.irem.univ-bpclermont.fr/Images-numeriques]
+[unplugged computer science activity, IREM Clermont](http://www.irem.univ-bpclermont.fr/Images-numeriques)
 ## Variable length encoding : Hufman code.
-(Hufman)[https://en.wikipedia.org/wiki/Huffman_coding]
+[Huffman](https://en.wikipedia.org/wiki/Huffman_coding)
 ## Archive
-(tar)["https://en.wikipedia.org/wiki/Tar_(computing)"]
+[tar](https://en.wikipedia.org/wiki/Tar_(computing))
 ## Backup
-(rsync)[https://en.wikipedia.org/wiki/Rsync]
+[rsync](https://en.wikipedia.org/wiki/Rsync)
diff --git a/2-InformationTheory/jokeTextInImage/460386452_10162762322329276_8719283729344919114_n.jpg b/2-InformationTheory/jokeTextInImage/460386452_10162762322329276_8719283729344919114_n.jpg
new file mode 100644
index 0000000..de0fe4a
Binary files /dev/null and b/2-InformationTheory/jokeTextInImage/460386452_10162762322329276_8719283729344919114_n.jpg differ
diff --git a/2-InformationTheory/jokeTextInImage/PrintScreenOfJPGFile.png b/2-InformationTheory/jokeTextInImage/PrintScreenOfJPGFile.png
new file mode 100644
index 0000000..0d11645
Binary files /dev/null and b/2-InformationTheory/jokeTextInImage/PrintScreenOfJPGFile.png differ
diff --git a/2-InformationTheory/jokeTextInImage/PrintScreenOfSVGFile.png b/2-InformationTheory/jokeTextInImage/PrintScreenOfSVGFile.png
new file mode 100644
index 0000000..c8f71cf
Binary files /dev/null and b/2-InformationTheory/jokeTextInImage/PrintScreenOfSVGFile.png differ
diff --git a/2-InformationTheory/jokeTextInImage/joke_as_a_text_file.txt b/2-InformationTheory/jokeTextInImage/joke_as_a_text_file.txt
new file mode 100644
index 0000000..ecc600b
--- /dev/null
+++ b/2-InformationTheory/jokeTextInImage/joke_as_a_text_file.txt
@@ -0,0 +1,3 @@
+Don't be scared but...
+
+Halloween 2024 is on Friday the 13th for the 1st time in 666 years.
diff --git a/2-InformationTheory/jokeTextInImage/joke_as_an_svg_file.html b/2-InformationTheory/jokeTextInImage/joke_as_an_svg_file.html
new file mode 100644
index 0000000..283cd7a
--- /dev/null
+++ b/2-InformationTheory/jokeTextInImage/joke_as_an_svg_file.html
@@ -0,0 +1,26 @@
+
+
+
+
+
+
+
+
+ Don't be scared but...
+
+
+ Halloween 2024 is on
+
+
+ Friday the 13th for the
+
+
+ 1st time in 666 Years.
+
+
+ Sorry, your browser does not support inline SVG.
+
+
+
+
diff --git a/2-InformationTheory/jokeTextInImage/joke_as_an_svg_file.svg b/2-InformationTheory/jokeTextInImage/joke_as_an_svg_file.svg
new file mode 100644
index 0000000..7f238c6
--- /dev/null
+++ b/2-InformationTheory/jokeTextInImage/joke_as_an_svg_file.svg
@@ -0,0 +1,18 @@
+
+
+
+
+ Don't be scared but...
+
+
+ Halloween 2024 is on
+
+
+ Friday the 13th for the
+
+
+ 1st time in 666 Years.
+
+
+ Sorry, your browser does not support inline SVG.
+