Information Theory

A quick introduction to regular expressions.

Regular expressions (regexp for short) provide an effective tool to define languages. Their correspondence with finite automata means that a regular expression can be compiled efficiently into an automaton that recognises the corresponding language.

On Linux this is precisely what the command grep does.

We will explain how an automaton can be generated from a regexp and see how to use the grep command to solve riddles.

Ingredients of classical regexp.

the letters of the alphabet

+ means or
used as L1 + L2 where L1 and L2 are two languages
means a word of L1 or a word of L2.
denotes the union of the languages

. means concatenation 
used as L1.L2 
means a word of L1 followed by a word of L2

* means repetition 0 or more times
used as L*
means the empty word epsilon (0 repetitions) or one or more words of L one after another
equivalent to epsilon + L + L.L + L.L.L + L.L.L.L + ...
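
The three operators can be tried out on small finite languages. The following is a minimal sketch, not part of the course material: languages are modelled as Python sets of strings, the empty string stands for epsilon, and the function names (union, concat, star) and the bound max_reps are illustrative choices. Since L* is usually infinite, star only computes the words obtained with at most max_reps repetitions.

```python
# Languages are modelled as sets of strings; "" plays the role of epsilon.

def union(l1, l2):
    """L1 + L2: the words of L1 or of L2."""
    return l1 | l2

def concat(l1, l2):
    """L1.L2: a word of L1 followed by a word of L2."""
    return {u + v for u in l1 for v in l2}

def star(l, max_reps):
    """Finite approximation of L*: at most max_reps repetitions."""
    result = {""}              # 0 repetitions: the empty word
    layer = {""}
    for _ in range(max_reps):
        layer = concat(layer, l)   # one more repetition
        result |= layer
    return result

L1 = {"a", "ab"}
L2 = {"b"}
print(sorted(union(L1, L2)))    # ['a', 'ab', 'b']
print(sorted(concat(L1, L2)))   # ['ab', 'abb']
print(sorted(star(L2, 3)))      # ['', 'b', 'bb', 'bbb']
```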

Construction: from regexp to automaton

We first allow automata whose transitions may be labelled with epsilon. Then we show how to do without such transitions.

Details on the board.
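
For reference, here is a sketch of one standard way to carry out this construction (usually called Thompson's construction); the version shown on the board may differ in its details. The regexp is assumed to be given already parsed, as nested tuples ("sym", c), ("or", r1, r2), ("cat", r1, r2), ("star", r). This encoding and the names build, closure and accepts are choices made for the example. Words are then recognised by following sets of states and closing them under epsilon transitions, which is the main ingredient for getting rid of epsilon transitions.

```python
from itertools import count

EPS = None  # label used for epsilon transitions

def build(regex, trans, fresh):
    """Add an epsilon-automaton fragment for regex to trans.

    trans is a list of (source, label, target) triples and fresh yields
    unused state numbers. Returns the (start, accept) states of the fragment.
    """
    kind = regex[0]
    if kind == "sym":                      # a single letter
        s, t = next(fresh), next(fresh)
        trans.append((s, regex[1], t))
        return s, t
    if kind == "cat":                      # r1 . r2
        s1, t1 = build(regex[1], trans, fresh)
        s2, t2 = build(regex[2], trans, fresh)
        trans.append((t1, EPS, s2))
        return s1, t2
    if kind == "or":                       # r1 + r2
        s1, t1 = build(regex[1], trans, fresh)
        s2, t2 = build(regex[2], trans, fresh)
        s, t = next(fresh), next(fresh)
        trans.extend([(s, EPS, s1), (s, EPS, s2), (t1, EPS, t), (t2, EPS, t)])
        return s, t
    if kind == "star":                     # r*
        s1, t1 = build(regex[1], trans, fresh)
        s, t = next(fresh), next(fresh)
        trans.extend([(s, EPS, s1), (t1, EPS, s1), (s, EPS, t), (t1, EPS, t)])
        return s, t
    raise ValueError(f"unknown node: {kind}")

def closure(states, trans):
    """All states reachable from states using epsilon transitions only."""
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for src, label, dst in trans:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen

def accepts(regex, word):
    """Build the automaton for regex and run it on word."""
    trans = []
    start, accept = build(regex, trans, count())
    current = closure({start}, trans)
    for c in word:
        moved = {dst for src, label, dst in trans
                 if src in current and label == c}
        current = closure(moved, trans)
    return accept in current

# The regexp (a+b)*.a.b.b : words over {a, b} that end with abb
a, b = ("sym", "a"), ("sym", "b")
regex = ("cat", ("star", ("or", a, b)), ("cat", a, ("cat", b, b)))
print(accepts(regex, "babb"))  # True
print(accepts(regex, "abab"))  # False
```

The automaton is rebuilt on every call to accepts to keep the sketch short; a real tool would build it once and reuse it.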

Note that JFLAP proposes an activity for this construction.

There is also an inverse transformation from automaton to regexp, also available on JFLAP. This shows that languages defined by a regexp and languages recognized by a finite automaton form the same class of languages, commonly known as regular languages.

grep

We shall in fact use the extended regular expressions of grep. Use the command egrep or grep -E.

See the manual of grep for the syntax.
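
The extended syntax writes the classical operators slightly differently: the union + becomes |, concatenation is plain juxtaposition, and * is unchanged (in the extended syntax, + means one or more repetitions). The sketch below uses Python's re module, which shares these three notations, only to illustrate the translation; it is not a substitute for the grep manual, and the sample words are made up for the example. It also shows the difference between matching a whole word and finding a match somewhere inside it. By default grep selects a line as soon as some part of it matches, so a whole-line match needs the anchors ^ and $ or the option -x.

```python
import re

# Classical (a+b)*.a.b.b written in the extended syntax.
pattern = "(a|b)*abb"

for word in ["abb", "babb", "abab", "xabby"]:
    whole = re.fullmatch(pattern, word) is not None   # the whole word is in the language
    inside = re.search(pattern, word) is not None     # some part of the word is
    print(word, whole, inside)
```

With grep, the corresponding whole-line test would be written grep -E '^(a|b)*abb$'.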

Some additional commands

tr to replace one character with another
grep to search for some regular expression line by line
wc to count lines, words, or characters
sort to sort the lines of a file

Exercise

Wordle.

Demo.
