A syntactic predicate specifies the syntactic validity of applying a production in a formal grammar and is analogous to a semantic predicate that specifies the semantic validity of applying a production. It is a simple and effective means of dramatically improving the recognition strength of an LL parser by providing arbitrary lookahead. In their original implementation, syntactic predicates had the form “(α)?” and could only appear on the left edge of a production. The required syntactic condition α could be any valid context-free grammar fragment.
More formally, a syntactic predicate is a form of production intersection, used in parser specifications or in formal grammars. In this sense, the term predicate has the meaning of a mathematical indicator function. If p1 and p2, are production rules, the language generated by both p1 and p2 is their set intersection.
As typically defined or implemented, syntactic predicates implicitly order the productions so that predicated productions specified earlier have higher precedence than predicated productions specified later within the same decision. This conveys an ability to disambiguate ambiguous productions because the programmer can simply specify which production should match.
Parsing expression grammars (PEGs), invented by Bryan Ford, extend these simple predicates by allowing "not predicates" and permitting a predicate to appear anywhere within a production. Moreover, Ford invented packrat parsing to handle these grammars in linear time by employing memoization, at the cost of heap space.
It is possible to support linear-time parsing of predicates as general as those allowed by PEGs, but reduce the memory cost associated with memoization by avoiding backtracking where some more efficient implementation of lookahead suffices. This approach is implemented by ANTLR version 3, which uses Deterministic finite automata for lookahead; this may require testing a predicate in order to choose between transitions of the DFA (called "pred-LL(*)" parsing).[1]
The term syntactic predicate was coined by Parr & Quong and differentiates this form of predicate from semantic predicates (also discussed).[2]
Syntactic predicates have been called multi-step matching, parse constraints, and simply predicates in various literature. (See References section below.) This article uses the term syntactic predicate throughout for consistency and to distinguish them from semantic predicates.
Bar-Hillel et al.[3] show that the intersection of two regular languages is also a regular language, which is to say that the regular languages are closed under intersection.
The intersection of a regular language and a context-free language is also closed, and it has been known at least since Hartmanis[4] that the intersection of two context-free languages is not necessarily a context-free language (and is thus not closed). This can be demonstrated easily using the canonical Type 1 language,
L=\{anbncn:n\ge1\}
Let
L1=\{ambncn:m,n\ge1\}
L2=\{anbncm:m,n\ge1\}
L3=L1\capL2
Given the strings , , and , it is clear that the only string that belongs to both L1 and L2 (that is, the only one that produces a non-empty intersection) is .
In most formalisms that use syntactic predicates, the syntax of the predicate is noncommutative, which is to say that the operation of predication is ordered. For instance, using the above example, consider the following pseudo-grammar, where X ::= Y PRED Z is understood to mean: "Y produces X if and only if Y also satisfies predicate Z":
S ::= a X X ::= Y PRED Z Y ::= a+ BNCN Z ::= ANBN c+ BNCN ::= b [BNCN] c ANBN ::= a [ANBN] b
Given the string , in the case where Y must be satisfied first (and assuming a greedy implementation), S will generate aX and X in turn will generate , thereby generating . In the case where Z must be satisfied first, ANBN will fail to generate , and thus is not generated by the grammar. Moreover, if either Y or Z (or both) specify any action to be taken upon reduction (as would be the case in many parsers), the order that these productions match determines the order in which those side-effects occur. Formalisms that vary over time (such as adaptive grammars) may rely on these side effects.
Parr & Quong give this example of a syntactic predicate:
In the first production of rule stat, the syntactic predicate (declaration)? indicatesthat declaration is the syntactic context that must be present for the rest of that production to succeed. We can interpret the use of (declaration)? as "I am not sure ifdeclaration will match; let me try it out and, if it does not match, I shall try the nextalternative." Thus, when encountering a valid declaration, the rule declaration will berecognized twice—once as syntactic predicate and once during the actual parse to execute semantic actions.
Of note in the above example is the fact that any code triggered by the acceptance of the declaration production will only occur if the predicate is satisfied.
The language
L=\{anbncn|n\ge1\}
Using a bound predicate:
S → B
A → X 'c+' X → 'a' [X] 'b' B → 'a+' Y Y → 'b' [Y] 'c'
Using two free predicates:
A → <'a+'>a <'b+'>b Ψ(a b)X <'c+'>c Ψ(b c)Y
X → 'a' [X] 'b' Y → 'b' [Y] 'c'
(Note: the following example actually generates
L=\{anbncn|n\ge0\}
S → AB&DC A → aA | ε B → bBc | ε C → cC | ε D → aDb | ε
Although by no means an exhaustive list, the following parsers and grammar formalisms employ syntactic predicates:
<nowiki><before ...></nowiki>
" or "<nowiki><!before ...></nowiki>
" (that is: "not before"). Perl 5 also has such lookahead, but it can only encapsulate Perl 5's more limited regexp features.. Terence Parr . The Definitive ANTLR Reference: Building Domain-Specific Languages . . 2007 . 328 . 978-3-540-63293-1 .
. Parr . Terence J. . Terence Parr . Quong . Russell . Adding Semantic and Syntactic Predicates to LL(k) parsing: pred-LL(k) . Adding semantic and syntactic predicates to LL(k): Pred-LL(k) . Lecture Notes in Computer Science . 263–277 . Army High Performance Computing Research Center Preprint No. 93-096 . October 1993 . 786 . 10.1007/3-540-57877-3_18 . 978-3-540-57877-2 . 10.1.1.26.427 . https://www.infona.pl/resource/bwmeta1.element.springer-2220fd34-4f51-3df0-82d8-f7d1e5244b9f . August 26, 2023 .