Nested word explained
In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words,so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.[1]
Formal definition
, the notation
denotes the set
\{1,2,\ldots,\ell-1,\ell\}
, with the special case
.
A matching relation ↝ of length
is a subset of
\{-infty,1,2,\ldots,\ell-1,\ell\} x \{1,2,\ldots,\ell-1,\ell,infty\}
such that:
- all nesting edges are forward, that is, if then ;
- nesting edges never have a finite position in common, that is, for, there is at most one position h such that, and there is at most one position j such that i ↝ j; and
- nesting edges never cross, that is, there are no such that both and .
A position i is referred to as
- a call position, if i ↝ j for some j,
- a pending call if i ↝ ∞,
- a return position, if h ↝ i for some h,
- a pending return if −∞ ↝ i, and
- an internal position in all remaining cases.
A nested word of length
over an
alphabet Σ is a pair (
w,↝), where
w is a word, or
string, of length
over Σ and ↝ is a matching relation of length
.
Encoding nested words into ordinary words
Nested words over the alphabet
\Sigma=\{a1,a2,\ldots,an\}
can be encoded into "ordinary" words over the
tagged alphabet
, in which each symbol
a from Σ has three tagged counterparts: thesymbol
⟨a for encoding a call position in a nested word labelled with
a, the symbol
a⟩ for encoding a return position labelled with
a, and finally the symbol
a itself for representing an internal position labelled with
a. More precisely, let
φ be the function mapping nested words over Σ to words over
such that each nested word (
,↝) is mapped to the word
, where the letter
equals
⟨a,
a, and
a⟩, if
and
i is a (possibly pending) call position, an internal position, and a (possibly pending) return position, respectively.
Example
For illustration, let be the nested word over a ternary alphabet with and matching relation . Then its encoding as word reads as .
Automata
Nested word automaton
A nested word automaton has a finite number of states, and operates in almost the same way as a deterministic finite automaton on classical strings: a classical finite automaton reads the input word
from left to right, and the state of the automaton after reading the
jth letter
depends on the state in which the automaton was before reading
.
In a nested word automaton, the position
in the nested word (w,↝) might be a return position; if so, the state after reading
will not only depend on the
linear state in which the automaton was before reading
, but also on a
hierarchical state propagated by the automaton at the time it was in the corresponding call position. In analogy to
regular languages of words, a set
L of nested words is called
regular if it is accepted by some (finite-state) nested word automaton.
Visibly pushdown automaton
Nested word automata are an automaton model accepting nested words. There is an equivalent automaton model operating on (ordinary) words. Namely, the notion of a deterministic visibly pushdown automaton is a restriction of the notion of a deterministic pushdown automaton.
Following Alur and Madhusudan, a deterministic visibly pushdown automaton is formally defined as a 6-tuple
M=(Q,\hat{\Sigma},\Gamma,\delta,q0,F)
where
is a finite set of
states,
is the
input alphabet, which – in contrast to that of ordinary pushdown automata – is partitioned into three sets
,
, and
. The alphabet
denotes the set of
call symbols,
contains the
return symbols, and the set
contains the
internal symbols,
is a finite set which is called the
stack alphabet, containing a special symbol
denoting the empty stack,
\delta=\deltac\cup\deltar\cup\deltaint
is the
transition function, which is partitioned into three parts corresponding to call transitions, return transitions, and internal transitions, namely
\deltac\colonQ x \Sigmac\toQ x \Gamma
, the
call transition function\deltar\colonQ x \Sigmar x \Gamma\toQ
, the
return transition function\deltaint:Q x \Sigmaint\toQ
, the
internal transition function,
is the
initial state, and
is the set of
accepting states.
The notion of computation of a visibly pushdown automaton is a restriction of the one used for pushdown automata. Visibly pushdown automata only add a symbol to the stack when reading a call symbol
, they only remove the top element from the stack when reading a return symbol
and they do not alter the stack when reading an internal event
. A computation ending in an accepting state is an
accepting computation.
As a result, a visibly pushdown automaton cannot push to and pop from the stack with the same input symbol. Thus the language
cannot be accepted by a visibly pushdown automaton for any partition of
, however there are pushdown automata accepting this language.
over a tagged alphabet
is accepted by a deterministic visibly pushdown automaton, then
is called a
visibly pushdown language.
Nondeterministic visibly pushdown automata
Nondeterministic visibly pushdown automata are as expressive as deterministic ones. Hence one can transform a nondeterministic visibly pushdown automaton into a deterministic one, but if the nondeterministic automaton had
states, the deterministic one may have up to
states.
Decision problems
Let
be the size of the description of an automaton
, then it is possible to check if a word
n is accepted by the automaton in time
. In particular, the emptiness problem is solvable in time
.If
is fixed, it is decidable in time
and space
where
is the depth of
n in a streaming seeing. It is also decidable with space
and time
, and by a uniform Boolean circuit of depth
.
For two nondeterministic automata A and B, deciding whether the set of words accepted by A is a subset of the word accepted by B is EXPTIME-complete. It is also EXPTIME-complete to figure out if there is a word that is not accepted.
Languages
As the definition of visibly pushdown automata shows, deterministic visibly pushdown automata can be seen as a special case of deterministic pushdown automata; thus the set VPL of visibly pushdown languages over
forms a subset of the set
DCFL of
deterministic context-free languages over the set of symbols in
. In particular, the function that removes the matching relation from nested words transforms regular languages over nested words into context-free languages.
Closure properties
The set of visibly pushdown languages is closed under the following operations:
- set operations:
- union
- intersection
- complement,
thus giving rise to a Boolean algebra.
For the intersection operation, one can construct a VPA M simulating two given VPAs
and
by a simple product construction : For
, assume
is given as
(Qi, \hat{\Sigma}, \Gammai, \deltai, si, Zi, Fi)
. Then for the automaton
M, the set of states is
, the initial state is
, the set of final states is
, the stack alphabet is given by
, and the initial stack symbol is
.
If
is in state
on reading a
call symbol
, then
pushes the stack symbol
and goes to state
, where
is the stack symbol pushed by
when transitioning from state
to
on reading input
.
If
is in state
on reading an
internal symbol
, then
goes to state
, whenever
transitions from state
to
on reading
a.
If
is in state
on reading a
return symbol
, then
pops the symbol
from the stack andgoes to state
, where
is the stack symbol popped by
when transitioning from state
to
on reading
.
Correctness of the above construction crucially relies on the fact that the push and pop actions of the simulatedmachines
and
are synchronized along the input symbols read. In fact, a similar simulation is no longer possible for
deterministic pushdown automata, as the larger class of deterministic context-free languages is no longer closed under intersection.
In contrast to the construction for concatenation shown above, the complementation construction for visibly pushdown automata parallels the standard construction[2] for deterministic pushdown automata.
Moreover, like the class of context free languages the class of visibly pushdown languages is closed under prefix closure and reversal, hence also suffix closure.
Relation to other language classes
point out that the visibly pushdown languages are more general than the parenthesis languages suggested in . As shown by, the visibly pushdown languages in turn are strictly contained in the class of languages described by operator-precedence grammars, which were introduced by and enjoy the same closure properties and characteristics (see for ω languages and logic and automata-based characterizations). In comparison to conjunctive grammars, a generalization of context-free grammars, shows that the linear conjunctive languages form a superclass of the visibly pushdown languages. The table at the end of this article puts the family of visibly pushdown languages in relation to other language families in the Chomsky hierarchy.Rajeev Alur and Parthasarathy Madhusudan[3] [4] related a subclass of regular binary tree languages to visibly pushdown languages.
Other models of description
Visibly pushdown grammars
Visibly pushdown languages are exactly the languages that can be described by visibly pushdown grammars.
Visibly pushdown grammars can be defined as a restriction of context-free grammars. A visibly pushdown grammar G is defined by the 4-tuple:
G=(V=V0\cupV1,\Sigma,R,S)
where
and
are disjoint finite sets; each element
is called
a non-terminal character or a
variable. Each variable represents a different type of phrase or clause in the sentence. Each variable defines a sub-language of the language defined by
, and the sub-languages of
are the one without pending calls or pending returns.
is a finite set of
terminals, disjoint from
, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar
.
is a finite relation from
to
such that
\existw\in(V\cup\Sigma)*:(S,w)\inR
. The members of
are called the
(rewrite) rules or
productions of the grammar. There are three kinds of rewrite rules. For
,
and
and if
then
and
and if
then
is the
start variable (or
start symbol), used to represent the whole sentence (or program).
Here, the asterisk represents the Kleene star operation and
is the empty word.
Uniform Boolean circuits
The problem whether a word of length
is accepted by a given nested word automaton can be solved by uniform
Boolean circuits of depth
.
Logical description
Regular languages over nested words are exactly the set of languages described by monadic second-order logic with two unary predicates call and return, linear successor and the matching relation ↝.
See also
Notes
- https://scholar.google.com/scholar?as_q=&as_oq=%22nested+words%22+%22visibly+pushdown%22 Google Scholar search results
- .
- Book: 10.1145/1007352.1007390 . Alur . R. . Madhusudan. 978-1581138528 . P.. Visibly pushdown languages . Proceedings of the thirty-sixth annual ACM symposium on Theory of computing - STOC '04 . 202–211. 2004 . 7473479 . http://www.cis.upenn.edu/~alur/Stoc04.pdf. Sect.4, Theorem 5,
- 10.1145/1516512.1516518. Alur . R. . Madhusudan . P. . Adding nesting structure to words . Journal of the ACM . 56 . 3 . 1–43 . 2009 . 10.1.1.145.9971. 768006 . Sect.7
References
- Floyd . R. W. . Robert W. Floyd . Syntactic Analysis and Operator Precedence . Journal of the ACM . 10 . 3 . 316–333 . July 1963 . 10.1145/321172.321179. 19785090 . free .
- McNaughton . R. . Parenthesis Grammars . Journal of the ACM . 14 . 3 . 490–500 . 1967 . 10.1145/321406.321411. 10926200 . free .
- 10.2168/LMCS-4(4:11)2008 . Alur . R. . Arenas . M. . Barcelo . P. . Etessami . K. . Immerman . N. . Libkin . L. . Grädel . Erich . First-Order and Temporal Logics for Nested Words . Logical Methods in Computer Science . 4 . 4 . 2008 . 0811.0537. 220091601 .
- Crespi Reghizzi. Stefano. Mandrioli. Dino. Operator precedence and the visibly pushdown property. Journal of Computer and System Sciences. 2012. 78. 6. 1837–1867. 10.1016/j.jcss.2011.12.006. free.
- Lonati. Violetta. Mandrioli. Dino. Panella. Federica. Pradella. Matteo. Operator Precedence Languages: Their Automata-Theoretic and Logic Characterization. SIAM Journal on Computing. 2015. 44. 4. 1026–1088. 10.1137/140978818. 2434/352809. free.
- Okhotin, Alexander: Comparing linear conjunctive languages to subfamilies of the context-free languages, 37th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2011).
- Book: John E. . Hopcroft . Jeffrey D. . Ullman . Introduction to Automata Theory, Languages, and Computation. 1979. Addison-Wesley. 978-0-201-02988-8. Introduction to Automata Theory, Languages, and Computation .
External links