Information theory (start)

Florent Madelaine 2022-12-07 17:07:32 +01:00
parent 56d47ef786
commit 48677c4ad8
4 changed files with 125 additions and 0 deletions

0InformationTheory.md

@ -0,0 +1,38 @@
Florent's course.
# Information Theory
Information theory is understood here as a broad field of study covering the topics outlined below.
## Database theory
We focus on the relational model, where data is stored in tables (like an Excel spreadsheet) and can be queried efficiently. In practice, the standard query language is SQL, and compliant systems include Oracle, PostgreSQL and MySQL. The theory goes back to the seventies, and mature software has existed since the late eighties. Most websites serve content built from data stored in such a database.
In the last 20 years, new models have arisen, in particular graph models, which are better suited to data that is not necessarily regular, is sometimes partial, and is queried in a local fashion. The keyword NoSQL covers these models, with mature systems such as MongoDB (a document store) in use in industry. The data is stored and queried differently. This kind of database model is used, for example, by software of the Caisse d'allocations familiales (CAF) to detect large-scale fraud.
We shall try to give the intuition behind the relational model by studying and manipulating data stored in tables in CSV format.
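As a rough illustration of what manipulating CSV tables in a relational way can look like, here is a minimal sketch of the three basic operations (selection, projection, join) using only Python's standard `csv` module. The two tables, their column names and the helper functions `select`, `project` and `join` are invented for this example; they are not part of the course material.

```python
import csv
import io

# Two small hypothetical tables in CSV format (invented for illustration).
STUDENTS_CSV = """id,name,city
1,Alice,Caen
2,Bob,Paris
3,Chloe,Caen
"""

GRADES_CSV = """student_id,course,grade
1,InfoTheory,15
2,InfoTheory,11
3,InfoTheory,17
"""

def read_table(text):
    """Parse CSV text into a list of rows, each row a dict keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def select(rows, predicate):
    """Selection: keep only the rows satisfying the predicate (SQL WHERE)."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Projection: keep only the listed columns (SQL SELECT col1, col2)."""
    return [{c: row[c] for c in columns} for row in rows]

def join(left, right, left_key, right_key):
    """Join: pair up rows whose key columns agree (SQL JOIN ... ON)."""
    return [{**l, **r} for l in left for r in right if l[left_key] == r[right_key]]

students = read_table(STUDENTS_CSV)
grades = read_table(GRADES_CSV)

# Names and grades of the students living in Caen.
caen = select(students, lambda row: row["city"] == "Caen")
result = project(join(caen, grades, "id", "student_id"), ["name", "grade"])
print(result)
```

Note that `csv.DictReader` returns every value as a string, so the grades here are text; a real system would attach a type to each column.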
## Coding theory
When we store or transmit data, no system is perfect: some bits of information are incorrectly stored, retrieved or transmitted.
The purpose of this field is to devise coding and decoding methods that detect and correct errors with high probability.
We shall provide an introduction with simple codes.
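One of the simplest codes one could start with is a single parity bit, which detects (but cannot correct) any single flipped bit. The sketch below is only an illustration; the function names `add_parity` and `parity_ok` are chosen for this example.

```python
def add_parity(bits):
    """Append one bit so that the total number of 1s in the word is even."""
    return bits + [sum(bits) % 2]

def parity_ok(word):
    """Return True when the word contains an even number of 1s."""
    return sum(word) % 2 == 0

message = [1, 0, 1, 1]
sent = add_parity(message)     # the codeword actually stored or transmitted
received = sent.copy()
received[2] ^= 1               # simulate the channel flipping one bit

print(parity_ok(sent), parity_ok(received))
```

Two flipped bits would cancel out and go unnoticed, which is why codes that correct errors need more redundancy.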
## Compression theory
When we store or transmit data, it makes sense to try to reduce the storage space or the transmission delay. We are looking for two algorithms: one that encodes a chunk of data into a smaller chunk, and a second one that reconstructs (perfectly or not) the initial, larger chunk from the compressed data.
When we can reconstruct the data exactly, we are performing lossless compression; otherwise we speak of irreversible or lossy compression.
For example, JPEG is an image format that compresses with some loss of information, but when the compression is performed adequately the human eye cannot detect the loss.
We shall provide an introduction with simple methods.
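To make the pair of algorithms concrete, here is a minimal sketch of run-length encoding, one simple lossless method (chosen here for illustration, not necessarily the one used in the course). The names `rle_encode` and `rle_decode` are assumptions of this example.

```python
from itertools import groupby

def rle_encode(text):
    """Replace each run of identical characters by a (character, length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original text exactly from the (character, length) pairs."""
    return "".join(ch * count for ch, count in pairs)

data = "aaaabbbcca"
encoded = rle_encode(data)
print(encoded)
print(rle_decode(encoded) == data)  # lossless: decoding gives back the input
```

This only saves space when the data contains long runs; on text without repeated characters the encoded form is larger than the input.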
## Regular expressions
Regular expressions (regexp for short) provide an effective tool to define languages.
The correspondence with finite automata means that it is possible to efficiently compile a regular expression into an automaton that recognises the corresponding language.
On Linux, this is essentially what the grep command does.
We will explain how an automaton can be generated from a regexp and see how to use the grep command to solve riddles.

1InformationTheory.md

@ -0,0 +1,22 @@
Florent's course.
# Information Theory
## Coding theory
When we store or transmit data, no system is perfect: some bits of information are incorrectly stored, retrieved or transmitted.
The purpose of this field is to devise coding and decoding methods that detect and correct errors with high probability.
We shall provide an introduction with simple codes.
## Topics covered on the board
* Binary symmetric channel
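* Coding and decoding one bit to obtain an arbitrarily small probability of error
* Example for a probability of error of 1/6: repeating 3 times, repeating 5 times.
This gives a practical first taste of Shannon's noisy-channel coding theorem: repetition makes the error as small as we like, but at the cost of a rate that tends to zero, whereas the theorem guarantees reliable codes at any fixed rate below the channel capacity.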
[Details here](https://en.wikipedia.org/wiki/Binary_symmetric_channel)
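The board example can be checked numerically. The sketch below assumes a binary symmetric channel that flips each bit independently with probability p = 1/6, a repetition code of odd length n, and decoding by majority vote; the function names are chosen for this illustration. Decoding fails exactly when more than half of the n copies are flipped.

```python
from fractions import Fraction
from math import comb

def encode(bit, n):
    """Repetition code: send the same bit n times."""
    return [bit] * n

def decode(word):
    """Majority vote over the received copies (the length is assumed odd)."""
    return 1 if sum(word) > len(word) / 2 else 0

def error_probability(p, n):
    """Probability that more than half of the n copies are flipped."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p = Fraction(1, 6)
print(error_probability(p, 1))   # no coding: 1/6
print(error_probability(p, 3))   # repeating 3 times: 2/27
print(error_probability(p, 5))   # repeating 5 times: 23/648
print(decode([1, 0, 1]))         # one flipped copy is corrected: 1
```

Using `Fraction` keeps the values exact: the error drops from about 0.167 to about 0.074 and then about 0.035, while the number of bits sent per message bit grows from 1 to 3 to 5.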

2InformationTheory.md

@ -0,0 +1,65 @@
Florent's course.
# Information Theory
## A quick introduction to regular expressions
Regular expressions (regexp for short) provide an effective tool to define languages.
The correspondence with finite automata means that it is possible to efficiently compile a regular expression into an automaton that recognises the corresponding language.
On Linux, this is essentially what the grep command does.
We will explain how an automaton can be generated from a regexp and see how to use the grep command to solve riddles.
### Ingredients of classical regexp
```
the letters of the alphabet
+ means or
    used as L1 + L2 where L1 and L2 are two languages
    means a word of L1 or a word of L2
    denotes the union of the languages
. means concatenation
    used as L1.L2
    means a word of L1 followed by a word of L2
* means repetition 0 or more times
    used as L*
    means the empty word epsilon (0 repetitions) or one or more words of L one after another
    equivalent to epsilon + L + L.L + L.L.L + L.L.L.L + ...
```
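For readers who want to experiment, the classical notation translates almost directly into the syntax of Python's `re` module: the union `+` is written `|`, concatenation is written by juxtaposition, and `*` keeps its meaning. The sketch below takes the classical expression `(a + b)* . a . b . b` as an example; both this expression and the helper `in_language` are chosen for illustration.

```python
import re

# Classical notation: (a + b)* . a . b . b
# Python notation:    (a|b)*abb
pattern = re.compile(r"(a|b)*abb")

def in_language(word):
    """True when the whole word (not just a part of it) matches the regexp."""
    return pattern.fullmatch(word) is not None

for word in ["abb", "aababb", "ab", ""]:
    print(repr(word), in_language(word))
```

`fullmatch` is used because a word belongs to the language only if the expression describes it entirely; `search` would also accept words that merely contain a matching factor.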
### Construction: from regexp to automaton
We first allow automata whose transitions may be labelled with epsilon.
Then we show how to do without them.
Details on the board.
Note that JFLAP offers an activity for this construction.
There is also an inverse transformation from automaton to regexp, also available in JFLAP.
This shows that languages defined by a regexp and languages recognized by a finite automaton form the same class of languages, commonly known as regular languages.
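The construction itself is done on the board; as a complement, here is a minimal sketch of one standard way to carry it out (a Thompson-style construction), which may differ in its details from the version presented in class. The regexp is given as nested tuples rather than parsed from text, and all names (`build`, `closure`, `accepts`, `EPS`) are assumptions of this example. Instead of removing epsilon transitions beforehand, the simulation follows them on the fly through the epsilon-closure.

```python
from itertools import count

EPS = None  # label used for epsilon transitions

def build(regex, fresh, delta):
    """Add an epsilon-NFA for regex to delta; return its (start, accept) states."""
    kind = regex[0]
    start, accept = next(fresh), next(fresh)

    def edge(src, label, dst):
        delta.setdefault((src, label), set()).add(dst)

    if kind == "sym":                      # a single letter
        edge(start, regex[1], accept)
    elif kind == "union":                  # L1 + L2
        for sub in regex[1:]:
            s, a = build(sub, fresh, delta)
            edge(start, EPS, s)
            edge(a, EPS, accept)
    elif kind == "concat":                 # L1 . L2
        s1, a1 = build(regex[1], fresh, delta)
        s2, a2 = build(regex[2], fresh, delta)
        edge(start, EPS, s1)
        edge(a1, EPS, s2)
        edge(a2, EPS, accept)
    elif kind == "star":                   # L*
        s, a = build(regex[1], fresh, delta)
        edge(start, EPS, accept)           # zero repetitions
        edge(start, EPS, s)
        edge(a, EPS, s)                    # one more repetition
        edge(a, EPS, accept)
    else:
        raise ValueError(f"unknown operator: {kind}")
    return start, accept

def closure(states, delta):
    """All states reachable from the given ones using epsilon transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(regex, word):
    """Build the automaton for regex and run it on word."""
    delta = {}
    start, accept = build(regex, count(), delta)
    current = closure({start}, delta)
    for letter in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, letter), set())
        current = closure(moved, delta)
    return accept in current

# (a + b)* . a . b . b
A, B = ("sym", "a"), ("sym", "b")
REGEX = ("concat", ("star", ("union", A, B)), ("concat", A, ("concat", B, B)))
print(accepts(REGEX, "aababb"), accepts(REGEX, "ab"))
```

Rebuilding the automaton on every call keeps the sketch short; a real tool would build it once and reuse it.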
### grep
We shall in fact use the extended regular expressions of grep.
Use the command egrep or grep -E (recent versions of GNU grep consider egrep obsolescent, so grep -E is preferable).
See the manual of grep for the syntax.
### Some additional commands
* tr to replace one character by another
* grep to search for some regular expression line by line
* wc to count lines, words or characters
* sort to sort the lines of a file
### Exercise
Wordle.
Demo.
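The demo itself uses grep on the command line. As a complementary sketch in Python, the same idea can be expressed with the `re` module: one regexp captures what is known about each position, and two extra filters handle the letters known to be present or absent. The word list, the clues and the helper `candidates` are all invented for this illustration.

```python
import re

# Hypothetical word list; in practice one would read a dictionary file.
WORDS = ["crane", "trace", "grace", "brave", "crate", "react", "slate"]

def candidates(words, pattern, must_contain="", must_not_contain=""):
    """Keep the words matching the positional pattern and the letter constraints."""
    regex = re.compile(pattern)
    return [
        w for w in words
        if regex.fullmatch(w)
        and all(c in w for c in must_contain)
        and not any(c in w for c in must_not_contain)
    ]

# Suppose the clues so far say:
#   2nd letter is r, 3rd is a, 5th is e      (green)
#   c occurs, but not in first position      (yellow)
#   n and b do not occur                     (grey)
print(candidates(WORDS, r"[^c]ra.e", must_contain="c", must_not_contain="nb"))
```

Each of the three conditions plays the role of one grep in a pipeline: a positional pattern, a search for a required letter, and an inverted search for the excluded letters.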