Kiyoshi's Blog

Introduction to The Theory of Computation

7/21/2024|Academic|6 min read

Introduction

The Theory of Computation is about how machines can process formal languages; i.e. how we can describe patterns of strings such as aaa, aabaa so that a machine can read/recognize them.

There are three major problems that you should keep asking about:

  • What problems are computers capable of solving?
  • What resources are needed to solve a problem?
  • Are some problems hard than others?

A Problem

Before talking about the detail of computer theory, it is important to define what is a problem.

The Theory of Computation focuses on decision problems the most; i.e. making a decision or computing a value based on some input.

Additionally, the difficulty of a problem is positively correlated to the number of resources the problem requires to be solved.

Vocabulary

  • Character/Symbol: a, b, c, 0, 1, ...
  • Alphabet: a non-empty set of symbols
  • Word/String: a sequence of symbols: 000111000, AceY, Aron, ...
  • Language: a set of strings

Language Operations

Let r1r_1 and r2r_2 be languages over the alphabet Σ\Sigma.

Union

r1r2r_1\cup r_2 is the language that includes any string that is either in r1r_1 or r2r_2.

r1r2={wwr1wr2}r_1\cup r_2=\{w\mid w\in r_1 \lor w\in r_2 \}

Concatenation

r1r2r_1\circ r_2 is the language that includes any string from r1r_1 concatenated with any string from r2r_2.

r1r2={w1w2w1r1w2r2}r_1\circ r_2=\{w_1\circ w_2\mid w_1\in r_1\land w_2\in r_2\}

Kleene Star

SS^* is the set of all finite strings made up of concatenations of elements of SS.

S={w1w2wnw1,w2,,wnSn0}S^*=\{w_1\circ w_2\circ\cdots\circ w_n\mid w_1,w_2,\dots,w_n\in S\land n\geq 0\}

Language

Regular Language

A regular language is a language that can be described with a Regular Expression (RegEx) or recognized by a Deterministic Finite Automaton (DFA) or a Nondeterministic Finite Automaton (DFA).

A regular language is closed under:

  • Union
  • Concatenation
  • Kleene Star
  • Complementation

Regular Expression (RegEx)

A regular expression is a string of symbols that describes a language.

RR is a regular expression over the alphabet Σ\Sigma if:

  1. R=aR=a, where aΣa\in \Sigma
  2. R=ϵR=\epsilon
  3. R=R=\emptyset
  4. R=(R1R2)R=(R_1\cup R_2), where R1R_1, R2R_2 are themselves regular expressions
  5. R=(R1R2)R=(R_1\circ R_2), where R1R_1, R2R_2 are themselves regular expressions
  6. R=(R1)R=(R^*_1), where R1R_1 is a regular expression

Notation: As mentioned above, a regular expression is a string of symbols that follow particular rules. Moreover, the alphabet of allowable symbols you can use for a regular expression is:

{(,),,,,ϵ,}Σ\{(,),\cup,\circ,*,\epsilon,\emptyset\}\cup \Sigma

Say we have a language AA that can be described as aba^*b^*, then:

A=L(ab)A=L(a^*b^*)

Deterministic Finite Automaton (DFA)

A deterministic finite automaton is a directed graph with the following properties:

  • The vertices of the graph are called states
  • The edges of the graph are called transitions
  • The transitions are labeled with characters from an alphabet Σ\Sigma
  • Each vertex has one outgoing edge for each character in Σ\Sigma
  • Exactly one vertex is labeled as the "start state"
  • Each vertex is either an accept state or not

An Example DFA

Notation: Formally, we can define a DFA with a 5-tuple (Q,Σ,δ,q0,F)(Q,\Sigma,\delta,q_0,F) where:

  • QQ is a finite set called the states
  • Σ\Sigma is a finite set called the alphabet
  • δ:Q×ΣQ\delta:Q\times\Sigma\rightarrow Q is the transition function
  • q0Qq_0\in Q is the start state
  • FQF\subseteq Q is the set of accept states

Therefore, with the notation, we can describe the above DFA by M=({q0,q1,q2,q3},{0,1},δ,q0,{q3})M=(\{q_0,q_1,q_2,q_3\},\{0,1\},\delta,q_0,\{q_3\}) where the function δ\delta is specified by:

StateInputOutput
q0q_000q1q_1
q0q_011q1q_1
q1q_100q1q_1
q1q_111q3q_3
q2q_200q3q_3
q2q_211q2q_2
q3q_300q3q_3
q3q_311q3q_3

Nondeterministic Finite Automaton (NFA)

A nondeterministic finite automaton has the same properties as a DFA except for vertexes. Instead, an NFA's vertexes can:

  • Have any number of edges
  • Have epsilon (the empty string ϵ\epsilon) transitions

Notation: Formally, we can define an NFA with a 5-tuple (Q,Σ,δ,q0,F)(Q,\Sigma,\delta,q_0,F) where:

  • QQ is a finite set called the states
  • Σ\Sigma is a finite set called the alphabet
  • δ:Q×ΣϵP(Q)\delta:Q\times\Sigma_\epsilon\rightarrow \mathcal{P}(Q) is the transition function
  • q0Qq_0\in Q is the start state
  • FQF\subseteq Q is the set of accept states

DFA vs NFA vs RegEx

Every DFA is essentially an NFA since it meets all the properties that an NFA possesses. On the other hand, an NFA can always be converted to a DFA with the standard subset construction.

Similarly, regex can be converted to NFAs using structural induction, and a DFA can be converted to a regex using generalized nondeterministic finite automata (GNFAs).

Non-regular Language

A DFA can have some "memory" —— states. A DFA is finite because it can only "remember":

  • finitely far in the past
  • finitely much information

In other words, if a computation path visits the same state more than once, the machine cannot tell the difference between the first time and the future times it visited the state.

L={0n10nn0}L=\{0^n10^n\mid n\geq0\} is an example of a non-regular language; there is no way one can design a DFA that recognizes this language. The inability is caused by the variable nn that is used twice to control the number of zeros in the front matches the number of zeros at the back.

Proof of A Non-regular Language: The Pumping Lemma

Human brains are finite as well; there is no way for one to test all the possible DFAs to prove the non-regularity of a language. However, there is a lemma that can help us.

The pumping lemma says: if AA is a regular language, then \exists a number pp (the pumping length) where, if ss is any string in language AA of length at least pp, then ss may be divided into three pieces, s=xyzs=xyz such that:

  • y>0|y|>0, AND
  •   i0\forall \; i\geq 0, xyizAxy^iz\in A, AND,
  • xyp|xy|\leq p

With these properties, we can easily prove if a language is non-regular.

Limitation: Note that in practice, there are non-regular languages that follow the pumping lemma. Therefore, the pumping lemma can only be used to prove the non-regularity and it is never sufficient to prove if any language is regular.

Example 1: Prove L={0n1nn0}L=\{0^n1^n\mid n\geq 0\} is not regular.

Claim: The language L={0n1nn0}L=\{0^n1^n\mid n\geq 0\} is not regular.

Proof: Consider an arbitrary positive integer pp. WTS pp is not a pumping length for LL.

Assume the language L={0n1nn0}L=\{0^n1^n\mid n\geq 0\} is regular.

Consider the string s=0p1ps=0^p1^p, sLs\in L, since the language is regular, then ss can be pumped.

Apply the pumping lemma, the string can be divided into three parts:

s=xyz=0ix0jy0k1pz  where  i0,j>0,k0,i+j+k=p\begin{align*} s&=xyz \\ &=\underbrace{0^i}_x \underbrace{0^j}_y \underbrace{0^k1^p}_z\;\text{where}\;i\geq 0,j>0,k\geq0,i+j+k=p \end{align*}

Then by pumping lemma, we know that xy2zLxy^2z\in L, while i+2j+kpi+2j+k\geq p.

Thus, xy2zxy^2z is a string that has more zeros than ones, so xy2zLxy^2z\notin L. Contradiction!

Therefore, the language L={0n1nn0}L=\{0^n1^n\mid n\geq 0\} is non-regular.

Trick: it makes the proof clean and simple to make the first part of the string be length pp since there is only one way to split the string.

Example 2: Prove L={www{0,1}}L=\{ww\mid w\in \{0,1\}^*\} is not regular.

Claim: the language L={www{0,1}}L=\{ww\mid w\in \{0,1\}^*\} is not regular.

Proof: Assume the language L={www{0,1}}L=\{ww\mid w\in \{0,1\}^*\} is regular.

Consider the string 0n10n10^n10^n1 where n0n\geq 0.

Apply the pumping lemma, then   p:0p10p1L\exists\;p:0^p10^p1\in L.

Let s=0p10p1s=0^p10^p1, ss can be divided into three parts:

s=xyz=0ix0jy0k10p1z  where  i0,j>0,k0,i+j+k=p\begin{align*} s&=xyz \\ &=\underbrace{0^i}_x \underbrace{0^j}_y \underbrace{0^k10^p1}_z\;\text{where}\;i\geq 0,j>0,k\geq0,i+j+k=p \end{align*}

Then by pumping lemma, we know that xy2zLxy^2z\in L, while i+2j+kpi+2j+k\geq p.

Thus, xy2zxy^2z is a string that has more zeros in the first half than the latter half, so xy2zLxy^2z\notin L. Contradiction!

Therefore, the language L={www{0,1}}L=\{ww\mid w\in \{0,1\}^*\} is non-regular.

Trick: when trying to prove a very broad-defined language, we can pick the string pattern in our favor. As long as the picked string follows the defined pattern when proving its non-regularity, pumping can always push it to the original "space" where the language the not defined.

Context-free Language

A context-free language is a language that can be recognized by a Context-free Grammar (CFG) or a Pushdown Automaton (PDA).

A context-free language is closed under:

  • Union
  • Concatenation
  • Kleene Star

Context-free Grammar (CFG)

Pushdown Automaton (PDA)

Copyright © 2022 - 2024 Kiyoshi's BlogPowered by Nuxt.js • Designed by Kiyoshi