publicMasters/2-InformationTheory/0InformationTheory.md

2.6 KiB

Cours de Florent.

Information Theory

This is a general field of study that we could understand as covering a number of topics outlined below.

Database theory

In particular the relational model, where data is stored in tables (like an excel table) and that can be queried efficiently. In practice, the standard is SQL and compliant system include ORACLE, Postgresql, Mysql etc. This goes back to the seventies for the theory with mature software since the late eighties. Most websites propose content that is constructed from data stored in such a database.

In the last 20 years, new models have arisen, in particular graph model that are better suited for data that is not necessrily regular and sometimes partial and where data is queried in a local fashion. The keyword NOSQL is used with mature system like MongoDB in use in the industry. The data is queried and stored differently. This kind of database model is used for example by software of the caisse d'allocation familiale (CAF) to search and detect for large scale fraud.

We shall try to give the intuition behind the relational model by studying and manipulating data stored in tables in csv format.

Coding theory

When we store or transmit data, no system is perfect and some bits of information are incorrectly stred/retrieved or transmitted.

The purpose of this field is to come up with coding and decoding methods that allows to detect and correct errors with a high probablilty.

We shall provide an introduction with simple codes.

Compression theory

When we store or transmit data, it makes sense to try to reduce the storage space or the transmission delay. We are looking for two algorithms : one that can code a chunk of data to some smaller sized chunk of data; and, a second one that can from the compressed data reconstruct (perfectly or not) the initial larger chunk of data.

When we can reconstruct the data, we are performing lossless compression, otherwise we speak of irreversible or lossy compression.

For example, jpg is an image format that allows to compress information with some loss of information, but when performed adequatly the human eye can not detect the loss of information.

We shall provide an introduction with simple methods.

Regular expressions.

Regular expressions (regexp for short) provide an effective tool to define languages. The correspondance with finite automata mean that it is possible to efficiently compile a regular expression into an automaton that recognises the corresponding language.

On Linux this is precisely what the command grep does.

We will explain how an automaton can be generated from a regexp and see how to use the grep command to solve riddles.