Formal Languages and Compilers Notes & Tutorials Chapter 1 Finite Automata and Regular Languages

Self-study notes and tutorials for Formal Languages and Compilers.
About the author: Da Shuangge, a small-time uploader (UP) on Bilibili and a one-on-one programming tutor.

1 Finite Automata and Regular Languages

In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression in the strict theoretical sense (as opposed to many modern regular expression engines, which are augmented with features that allow recognition of non-regular languages).

Alternatively, a regular language can be defined as a language recognized by a finite automaton. The equivalence of regular expressions and finite automata is known as Kleene’s theorem (after American mathematician Stephen Cole Kleene). In the Chomsky hierarchy, regular languages are the languages generated by Type-3 grammars.

1 Languages

Formal language theory treats a language as a mathematical object.

Alphabet, string and language

Formal notions (basic symbols and concepts):

  • symbol: a single basic symbol

  • alphabet $\Sigma$: a non-empty finite set of symbols, usually written $\Sigma$

  • string over $\Sigma$: a finite sequence (ordered arrangement) of symbols from $\Sigma$

  • $|w|$: the length of the string $w$ (the number of symbols in $w$)

  • $\varepsilon$: the empty string

  • $\Sigma^*$: the set of all strings over $\Sigma$ (the language universe)

  • language: a set of strings

Relationship:
$L \subseteq \Sigma^*$, that is, a language $L$ is a subset of $\Sigma^*$.

$L$ may be infinite.

Example

  • symbols: 0, 1
  • $\Sigma = \{0, 1\}$
  • strings: 10, 01, 101, 010
  • language: {0, 011, 0111, 01111, …}
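
The notions above can be tried out in a few lines of Python. This is only a sketch: the helper name `strings_up_to` and the finite language `{"10", "01"}` are made up for illustration, and since $\Sigma^*$ is infinite, only its strings up to a chosen length are listed.

```python
from itertools import product

SIGMA = ("0", "1")  # the alphabet from the example above


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)  # n == 0 yields "", the empty string


universe_prefix = list(strings_up_to(SIGMA, 2))
print(universe_prefix)  # ['', '0', '1', '00', '01', '10', '11']

# A language is just a set of strings; this finite one is a hypothetical example.
language = {"10", "01"}
print(language <= set(universe_prefix))  # True: every word is a string over SIGMA
print(len("101"))  # 3, the length |w| of the string w = 101
```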

2 Deterministic Finite Automaton

A machine that recognizes whether a given string belongs to a given set.

DFA: Deterministic Finite Automaton
Basic introduction

In a DFA, for each input symbol, one can determine the state to which the machine will move.
Hence, it is called a deterministic automaton.
As it has a finite number of states, the machine is called a deterministic finite machine or a deterministic finite automaton.

Formal Definition of a DFA

A deterministic finite automaton $M$ is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ where

  • $Q$: a finite set of states
  • $\Sigma$: a finite set of input symbols (the alphabet)
  • $\delta$: a transition function, $\delta : Q \times \Sigma \rightarrow Q$
  • $q_0$: the initial (start) state, $q_0 \in Q$
  • $F$: a set of accept states (also called final states), $F \subseteq Q$

Graphical Representation of a DFA
A DFA is represented by a digraph called a state diagram.

  • The vertices represent the states.
  • The arcs, labeled with input symbols, show the transitions.
  • The initial state is marked by a single incoming arc that has no source state.
  • Accept (final) states are indicated by double circles.

If, after processing the whole input string, $M$ is in a state that belongs to $F$, then the input is accepted.
Otherwise, it is rejected.

Example

The following example is a DFA $M$ over a binary alphabet that accepts exactly the inputs containing an even number of 0s.

$M = (Q, \Sigma, \delta, q_0, F)$

  • $Q = \{q_0, q_1\}$
  • $\Sigma = \{0, 1\}$
  • $F = \{q_0\}$

The transition function $\delta$ is as follows:
$\delta(q_0, 0) = q_1$
$\delta(q_0, 1) = q_0$
$\delta(q_1, 0) = q_0$
$\delta(q_1, 1) = q_1$

$\delta$ can also be shown as a state transition table:

| State | 0 | 1 |
|-------|-------|-------|
| $q_0$ | $q_1$ | $q_0$ |
| $q_1$ | $q_0$ | $q_1$ |

The state diagram of $M$ is as follows:

[Figure: state diagram of $M$]

Analysis: $M$ changes state when it reads a 0 and stays in the same state when it reads a 1.
$M$ accepts only if it ends in state $q_0$.

Therefore $M$ accepts exactly the strings with an even number of 0s and any number of 1s.
The corresponding regular expression is (1*)(0(1*)0(1*))*,
where * means that the preceding expression is repeated any number of times (zero or more).
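
To check the analysis, here is a minimal Python sketch that simulates $M$ and compares it with the regular expression. The names `DELTA`, `START`, `ACCEPT`, and `dfa_accepts` are chosen for this illustration only; the transition table is the one given above.

```python
import re

# Transition table of the example DFA M: (state, symbol) -> next state.
DELTA = {
    ("q0", "0"): "q1",
    ("q0", "1"): "q0",
    ("q1", "0"): "q0",
    ("q1", "1"): "q1",
}
START = "q0"
ACCEPT = {"q0"}


def dfa_accepts(w):
    """Run M on w and report whether it stops in an accept state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]  # exactly one next state per (state, symbol)
    return state in ACCEPT


# The regular expression from the text, written in Python's re syntax.
EVEN_ZEROS = re.compile(r"(1*)(0(1*)0(1*))*")

for w in ["", "1", "0", "00", "1001", "0001"]:
    # Both columns should agree for every input string.
    print(repr(w), dfa_accepts(w), EVEN_ZEROS.fullmatch(w) is not None)
```

`fullmatch` is used rather than `search` because the whole input must belong to the language, not just some substring of it.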

extended transition function

$\hat\delta : Q \times \Sigma^* \rightarrow Q$

  • $\hat\delta(q, \varepsilon) = q$
  • $\hat\delta(q, ax) = \hat\delta(\delta(q, a), x)$, where $a \in \Sigma$ and $x \in \Sigma^*$
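
The two defining equations translate almost directly into a recursive function. The sketch below reuses the even-number-of-0s DFA from the earlier example; `delta_hat` and `DELTA` are names picked for this illustration.

```python
# Transition table of the even-number-of-0s DFA from the example above.
DELTA = {
    ("q0", "0"): "q1",
    ("q0", "1"): "q0",
    ("q1", "0"): "q0",
    ("q1", "1"): "q1",
}


def delta_hat(q, w):
    """Extended transition function, following the two defining equations."""
    if w == "":                # delta_hat(q, epsilon) = q
        return q
    a, x = w[0], w[1:]         # split w = a x with a in Sigma, x in Sigma*
    return delta_hat(DELTA[(q, a)], x)


# q0 -0-> q1 -1-> q1 -1-> q1 -0-> q0
print(delta_hat("q0", "0110"))  # q0
```

An iterative loop over the symbols computes the same value; the recursive form is shown only because it mirrors the definition.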

Regular

  • $w$ is accepted by $M$ if $\hat\delta(q_0, w) \in F$.
  • $w$ is rejected by $M$ if $\hat\delta(q_0, w) \notin F$.
  • $L(M) = \{ w \in \Sigma^* \mid \hat\delta(q_0, w) \in F \}$ is the language accepted by $M$.
  • $A \subseteq \Sigma^*$ is regular if $A = L(M)$ for some DFA $M$.

Simply put, a language $A$ (a subset of $\Sigma^*$) is regular
if some DFA accepts exactly the strings of $A$.

Supplementary notation

  • $\mathbb{N}$: the set of natural numbers,
    that is, 0 and the positive integers
  • $\bar{A}$: the complement of the set $A$
  • $\emptyset$: the empty set, which contains no elements; it is different from $\{\varepsilon\}$, the set that contains only the empty string

3 Non-Deterministic Finite Automata

In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if

  • each of its transitions is uniquely determined by its source state and input symbol, and
  • reading an input symbol is required for each state transition.

A nondeterministic finite automaton (NFA), or nondeterministic finite-state machine, does not need to obey these restrictions. In particular, every DFA is also an NFA. Sometimes the term NFA is used in a narrower sense, referring to an NFA that is not a DFA; the notes below mainly discuss NFAs in this narrow sense.

Simply put, in a DFA, for each state and each input symbol ($a \in \Sigma$), the next state is uniquely determined.
If the next state is not unique, because there are several candidates or none at all, the automaton is an NFA.

An NFA is also called an NDFA. Every NFA can be converted to an equivalent DFA.

Formal Definition of an NFA
A nondeterministic finite automaton $M$ is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ where

  • $Q, \Sigma, q_0, F$ have the same meaning as in a DFA
  • the transition function is $\delta : Q \times \Sigma \rightarrow \mathcal{P}(Q)$.

$\mathcal{P}(Q)$ denotes the power set of $Q$, that is, the set of all subsets of $Q$:
$\mathcal{P}(Q) = \{ S \mid S \subseteq Q \}$
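
To get a feel for the power set, the sketch below lists $\mathcal{P}(Q)$ for a small state set. The helper `power_set` and the three-state set are hypothetical, chosen only to show that a set with $n$ states has $2^n$ subsets, including the empty set.

```python
from itertools import chain, combinations


def power_set(states):
    """Return all subsets of `states` as frozensets, smallest subsets first."""
    items = sorted(states)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


Q = {"q0", "q1", "q2"}  # hypothetical state set for illustration
P = power_set(Q)
print(len(P))  # 8, i.e. 2 ** len(Q)
for subset in P:
    print(sorted(subset))  # the first line printed is [], the empty set
```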

An example shows the difference between the two:

  • DFA: $\delta(q_0, a) = q_1$; the result is a single state
  • NFA: $\delta(q_0, a) = \{q_0, q_1\}$; the result is a set of states (possibly several states, possibly the empty set)

An NFA $M$ operates in basically the same way as a DFA.
The differences are as follows:

  • If $M$ is in state $q$ and the next symbol is $a$, then $M$ may move to any state in $\delta(q, a)$.
  • If $\delta(q, a)$ is empty, then $M$ gets stuck.
  • $M$ accepts $w$ if at least one transition sequence ends in a state $p \in F$ after reading all of $w$.

Example

The state diagram of the NFA $M_2$ is shown below:

[Figure: state diagram of $M_2$]

The transition relation is as follows:
$\delta(q_0, 0) = \{q_0\}$
$\delta(q_0, 1) = \{q_0, q_1\}$
$\delta(q_1, 0) = \{q_2\}$
$\delta(q_1, 1) = \{q_2\}$

As a state transition table:

| State | 0 | 1 |
|-------|-----------|----------------|
| $q_0$ | $\{q_0\}$ | $\{q_0, q_1\}$ |
| $q_1$ | $\{q_2\}$ | $\{q_2\}$ |

Possible transition sequences for input 110:

[Figure: transition sequences of $M_2$ on input 110]

At least one sequence ends in a state belonging to $F$,
so 110 is accepted by the NFA.
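
Instead of following one transition sequence at a time, a program can track the set of all states $M_2$ might be in after each symbol. The sketch below does this for the transition relation listed above. Note the assumption: the notes do not list $F$ for $M_2$ (the state diagram is missing), so `ACCEPT = {"q2"}` is a guess that is consistent with 110 being accepted. The function names are also made up for this example.

```python
# Transition relation of M2 from the text; a missing entry means the empty set.
DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}
START = "q0"
ACCEPT = {"q2"}  # ASSUMPTION: F is not given in the notes; q2 is a guess


def reachable(w):
    """Return the set of states M2 can be in after reading all of w."""
    current = {START}
    for symbol in w:
        nxt = set()
        for q in current:
            nxt |= DELTA.get((q, symbol), set())  # stuck branches add nothing
        current = nxt
    return current


def nfa_accepts(w):
    """Accept if at least one transition sequence ends in an accept state."""
    return bool(reachable(w) & ACCEPT)


print(sorted(reachable("110")))  # ['q0', 'q2']
print(nfa_accepts("110"))        # True under the assumed F = {q2}
```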

NFA -> DFA
Using the subset construction algorithm, every NFA can be converted into an equivalent DFA.

As an example, convert the NFA $M_2$ above into a DFA.
Here $q_{01}$ denotes the set $\{q_0, q_1\}$, and the other subscripted states are named in the same way.

The state transition table of the converted DFA is as follows:

| State | 0 | 1 |
|-----------|----------|-----------|
| $q_0$ | $q_0$ | $q_{01}$ |
| $q_{01}$ | $q_{02}$ | $q_{012}$ |
| $q_{02}$ | $q_0$ | $q_{01}$ |
| $q_{012}$ | $q_{02}$ | $q_{012}$ |

The state diagram of the DFA is as follows:

[Figure: state diagram of the converted DFA]
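
The conversion can be sketched in Python as a breadth-first search over reachable subsets. This is one common way to write the subset construction, not necessarily the exact procedure the course uses; `subset_construction` and `name` are hypothetical helper names. Run on $M_2$, it should reproduce the four rows of the DFA table.

```python
from collections import deque

# Transition relation of the NFA M2; a missing entry means the empty set.
NFA_DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}
SIGMA = ("0", "1")


def subset_construction(start):
    """Build the DFA table over the subsets reachable from {start}."""
    table = {}
    queue = deque([frozenset({start})])
    while queue:
        subset = queue.popleft()
        if subset in table:
            continue  # this DFA state has already been expanded
        table[subset] = {}
        for a in SIGMA:
            # The DFA successor is the union of the NFA successors.
            target = frozenset(
                p for q in subset for p in NFA_DELTA.get((q, a), ())
            )
            table[subset][a] = target
            queue.append(target)
    return table


def name(subset):
    """Render {q0, q1} as 'q01', matching the naming used in the text."""
    if not subset:
        return "{}"
    return "q" + "".join(sorted(s[1:] for s in subset))


dfa = subset_construction("q0")
for subset, row in dfa.items():
    print(name(subset), name(row["0"]), name(row["1"]))
```

Only reachable subsets are generated, which is why the result has 4 states rather than all $2^3 = 8$ subsets. A DFA state would be accepting when its subset contains an accept state of the NFA; since the notes do not list $F$ for $M_2$, the sketch leaves accept states out.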

$\varepsilon$-Transitions
Formal Definition of an NFA with $\varepsilon$-transitions
A nondeterministic finite automaton $M$ with $\varepsilon$-transitions is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ where

  • $Q, \Sigma, q_0, F$ have the same meaning as in a DFA
  • $\varepsilon$ is a special symbol with $\varepsilon \notin \Sigma$
  • $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \rightarrow \mathcal{P}(Q)$
  • $\delta$ may have $\varepsilon$-transitions and yields a set of successor states.
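
The notes give no example for $\varepsilon$-transitions, so the following sketch uses a made-up three-state automaton that accepts $a^*b^*$. It relies on the standard $\varepsilon$-closure idea (all states reachable using only $\varepsilon$-moves), which is not defined in these notes; encoding $\varepsilon$ as the empty string `""` and all names here are assumptions for illustration.

```python
EPS = ""  # hypothetical encoding of the epsilon label

# Hypothetical epsilon-NFA for a*b*: s0 -eps-> s1 (loops on a) -eps-> s2 (loops on b).
DELTA = {
    ("s0", EPS): {"s1"},
    ("s1", "a"): {"s1"},
    ("s1", EPS): {"s2"},
    ("s2", "b"): {"s2"},
}
START = "s0"
ACCEPT = {"s2"}


def epsilon_closure(states):
    """Return all states reachable from `states` using only epsilon-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for p in DELTA.get((q, EPS), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure


def accepts(w):
    """Simulate the epsilon-NFA: close, read a symbol, close again."""
    current = epsilon_closure({START})
    for symbol in w:
        moved = set()
        for q in current:
            moved |= DELTA.get((q, symbol), set())
        current = epsilon_closure(moved)
    return bool(current & ACCEPT)


print(sorted(epsilon_closure({"s0"})))  # ['s0', 's1', 's2']
print(accepts("aab"), accepts("ba"))    # True False
```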

QUESTION

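  1. Must every state of a DFA be able to handle every input symbol?
    That is, must each state have an outgoing arrow for each input symbol, and what happens if one is missing?
    Answer: Yes. In a DFA, $\delta$ must give a next state for every state and every input symbol.
    A missing transition (the empty set in an NFA) can be replaced by a transition to a dead state that loops back to itself on every symbol.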

To be continued. . .

Origin blog.csdn.net/python1639er/article/details/125744557