Characteristic samples
Characteristic samples is a concept in the field of grammatical inference, related to passive learning. In passive learning, an inference algorithm $I$ is given a set of pairs of strings and labels, $S$, and returns a representation $R$ that is consistent with $S$. Characteristic samples consider the scenario when the goal is not only finding a representation consistent with $S$, but finding a representation that recognizes a specific target language.

A characteristic sample of a language $L$ is a set of pairs of the form $(s, l(s))$ where:

- $l(s) = 1$ if and only if $s \in L$
- $l(s) = -1$ if and only if $s \notin L$

Given the characteristic sample $S$, $I$'s output on it is a representation $R$, e.g. an automaton, that recognizes $L$.
Formal Definition
The Learning Paradigm associated with Characteristic Samples
There are three entities in the learning paradigm connected to characteristic samples: the adversary, the teacher, and the inference algorithm.

Given a class of languages $\mathcal{C}$ and a class of representations $\mathcal{R}$ for the languages, the paradigm goes as follows:

- The adversary selects a language $L \in \mathcal{C}$ and reports it to the teacher.
- The teacher $T$ then computes a set of strings and labels them correctly according to $L$, trying to make sure that the inference algorithm will compute $L$.
- The adversary can add correctly labeled words to the set in order to confuse the inference algorithm.
- The inference algorithm $I$ gets the sample and computes a representation $R$ consistent with the sample.

The goal is that when the inference algorithm receives a characteristic sample for a language $L$, or a sample that subsumes a characteristic sample for $L$, it will return a representation that recognizes exactly the language $L$.
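To make the interaction concrete, here is a minimal Python sketch of the three roles. It is illustrative only: the stub teacher, the memorizing learner, and the example target language are assumptions made for this demonstration, not part of the formal model.

```python
# Minimal sketch of the adversary/teacher/learner protocol, assuming
# languages are given as membership predicates over strings.

def teacher(target):
    # A stub: labels a few words correctly according to the target language.
    # A real teacher would emit a characteristic sample for the learner.
    return {(w, 1 if target(w) else -1) for w in ["", "a", "b", "ab"]}

def adversary_pad(sample, target, extra_words):
    # The adversary may only add *correctly* labeled words.
    return sample | {(w, 1 if target(w) else -1) for w in extra_words}

def infer(sample):
    # A trivially consistent learner: memorizes the positive examples.
    positives = {w for (w, label) in sample if label == 1}
    return lambda w: w in positives

target = lambda w: w.endswith("a")   # the adversary's chosen language
sample = adversary_pad(teacher(target), target, ["ba", "bb"])
representation = infer(sample)
# The output is consistent with the sample (though it need not generalize).
assert all(representation(w) == (label == 1) for (w, label) in sample)
```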
Sample
A sample $S$ is a set of pairs of the form $(s, l(s))$ such that $l(s) \in \{-1, 1\}$.
Sample consistent with a language
We say that a sample $S$ is consistent with a language $L$ if for every pair $(s, l(s))$ in $S$: $l(s) = -1$ if and only if $s \notin L$.
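With labels drawn from $\{-1, 1\}$, consistency reduces to one comparison per pair. A minimal sketch, assuming the language is given as a membership predicate:

```python
def is_consistent(sample, in_language):
    # Every pair (s, l(s)) must satisfy: l(s) = 1 iff s is in L
    # (equivalently, l(s) = -1 iff s is not in L).
    return all((label == 1) == in_language(s) for (s, label) in sample)

L = lambda s: s.endswith("a")   # example: strings over {a, b} ending in a
assert is_consistent({("a", 1), ("ba", 1), ("b", -1)}, L)
assert not is_consistent({("b", 1)}, L)
```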
Characteristic sample
Given an inference algorithm $I$ and a language $L$, a sample $S$ that is consistent with $L$ is called a characteristic sample of $L$ for $I$ if:

- $I$'s output on $S$ is a representation $R$ that recognizes $L$.
- For every sample $S'$ that is consistent with $L$ and also fulfils $S \subseteq S'$, $I$'s output on $S'$ is a representation $R'$ that recognizes $L$.
A class of languages $\mathcal{C}$ is said to have characteristic samples if every $L \in \mathcal{C}$ has a characteristic sample.
Related Theorems
Theorem
If equivalence is undecidable for a class of representations $\mathcal{R}$ over an alphabet $\Sigma$ of cardinality bigger than 1, then $\mathcal{R}$ does not have polynomially sized characteristic samples.[1]
Proof
Given a class of representations $\mathcal{R}$ for which equivalence is undecidable, for every polynomial $p$ and every $n \in \mathbb{N}$ there exist two representations $R_1$ and $R_2$ of sizes bounded by $n$ that recognize different languages but are inseparable by any string of size bounded by $p(n)$. If this were not the case, we could decide whether $R_1$ and $R_2$ are equivalent by simulating their runs on all strings of size smaller than $p(n)$, contradicting the assumption that equivalence is undecidable. Hence any sample of strings of length bounded by $p(n)$ that is consistent with the language of $R_1$ is also consistent with the language of $R_2$, so polynomially sized characteristic samples cannot exist.
Theorem
If $S_1$ is a characteristic sample for $L_1$ and is also consistent with a language $L_2 \neq L_1$, then every characteristic sample of $L_2$ is inconsistent with $L_1$.
Proof
Given a class that has characteristic samples, let $R_1$ and $R_2$ be representations that recognize $L_1$ and $L_2$ respectively. Under the assumption that there is a characteristic sample $S_1$ for $L_1$ that is also consistent with $L_2$, suppose for contradiction that there exists a characteristic sample $S_2$ for $L_2$ that is consistent with $L_1$. By the definition of a characteristic sample, the inference algorithm must return a representation which recognizes the language if given a sample that subsumes the characteristic sample itself. But for the sample $S_1 \cup S_2$, which subsumes both characteristic samples and is consistent with both languages, the answer of the inference algorithm needs to recognize both $L_1$ and $L_2$, a contradiction.
Theorem
If a class is polynomially learnable using example-based queries, it is learnable with characteristic samples.[2]
Polynomially characterizable classes
Regular languages
The proof that DFAs are learnable using characteristic samples relies on the fact that every regular language has a finite number of equivalence classes with respect to the right congruence relation $\sim_L$, where $x \sim_L y$ for $x, y \in \Sigma^*$ if and only if $\forall z \in \Sigma^*: xz \in L \leftrightarrow yz \in L$. Note that if $x, y$ are not congruent with respect to $\sim_L$, there exists a string $z$ such that $xz \in L$ but $yz \notin L$, or vice versa; such a string is called a separating suffix.
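A separating suffix for two inequivalent states can be found by a breadth-first search over pairs of states. The sketch below assumes a DFA given as a nested transition dictionary, a set of accepting states, and an alphabet; these conventions are illustrative, not a standard API.

```python
from collections import deque

def separating_suffix(delta, accepting, alphabet, q1, q2):
    # Returns a shortest suffix z such that exactly one of the runs from
    # q1 and q2 on z ends in an accepting state, or None if q1 and q2 are
    # equivalent. BFS over pairs of states yields a shortest such z.
    seen = {(q1, q2)}
    queue = deque([(q1, q2, "")])
    while queue:
        p, q, z = queue.popleft()
        if (p in accepting) != (q in accepting):
            return z
        for a in alphabet:
            pair = (delta[p][a], delta[q][a])
            if pair not in seen:
                seen.add(pair)
                queue.append((*pair, z + a))
    return None

# Two-state DFA for "strings over {a, b} ending in a"; state 1 accepts.
delta = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 0}}
assert separating_suffix(delta, {1}, "ab", 0, 1) == ""
```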
Constructing a characteristic sample
The construction of a characteristic sample for a language $L$ by the teacher goes as follows. Firstly, by running a depth-first search on a deterministic automaton $A$ recognizing $L$, starting from its initial state, we get a prefix-closed set of words $W$, ordered in shortlex order, containing one access word per state. From the fact above, we know that for every two states in the automaton there exists a separating suffix separating every two strings whose runs of $A$ end in the respective states. We refer to the set of separating suffixes as $S$. The labeled set (sample) of words the teacher gives the adversary is $\{(w, l(w)) \mid w \in W \cdot S \cup W \cdot \Sigma \cdot S\}$, where $l(w)$ is the correct label of $w$ (whether it is in $L$ or not). We may assume that $\varepsilon$ belongs to both $W$ and $S$.
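Under the same illustrative DFA conventions, and reusing the separating_suffix sketch above, the teacher's side might look as follows. This is a sketch of the idea, not an optimized or canonical implementation.

```python
from itertools import combinations

def accepts(delta, accepting, q0, w):
    # Runs the DFA on w and reports whether it accepts.
    q = q0
    for a in w:
        q = delta[q][a]
    return q in accepting

def build_characteristic_sample(delta, accepting, alphabet, q0):
    # Access words W: depth-first search from the initial state yields a
    # prefix-closed set with one access word per reachable state.
    access = {q0: ""}
    stack = [q0]
    while stack:
        q = stack.pop()
        for a in alphabet:
            if delta[q][a] not in access:
                access[delta[q][a]] = access[q] + a
                stack.append(delta[q][a])
    W = set(access.values())
    # Separating suffixes S: one per pair of distinct states, plus epsilon.
    # In a minimal DFA every pair of distinct states is separable.
    S = {""}
    for p, q in combinations(access, 2):
        z = separating_suffix(delta, accepting, alphabet, p, q)
        if z is not None:
            S.add(z)
    # Label every word of W.S and W.Sigma.S correctly.
    words = {w + z for w in W for z in S}
    words |= {w + a + z for w in W for a in alphabet for z in S}
    return {(u, 1 if accepts(delta, accepting, q0, u) else -1) for u in words}
```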
Constructing a deterministic automaton
Given the sample $S_A$ from the adversary, the construction of the automaton by the inference algorithm starts with defining $P$ and $Suf$, the sets of prefixes and suffixes of the words in $S_A$ respectively. Now the algorithm constructs a matrix $M$ where the elements of $P$ function as the rows, ordered by the shortlex order, and the elements of $Suf$ function as the columns, ordered by the shortlex order. Next, the cells in the matrix are filled in the following manner for prefix $p$ and suffix $s$:

- If $(p \cdot s, 1) \in S_A$, then $M[p][s] = 1$
- else, if $(p \cdot s, -1) \in S_A$, then $M[p][s] = -1$
- else (the word $p \cdot s$ is unlabeled), $M[p][s] = 0$
Now, we say rows $p_1$ and $p_2$ are distinguishable if there exists an index $i$ such that $M[p_1][i]$ and $M[p_2][i]$ are both nonzero and $M[p_1][i] \neq M[p_2][i]$, i.e. the two rows differ on a commonly labeled suffix. The next stage of the inference algorithm is to construct the set $D$ of distinguishable rows of $M$, by initializing $D$ with the row of $\varepsilon$ and iterating from the first row of $M$ downwards, doing the following for each row $p$:

- If $p$ is distinguishable from all elements in $D$, add it to $D$
- else, skip it and move on to the next row
From the way the teacher constructed the sample it passed to the adversary, we know that for every $w \in W$ and every $\sigma \in \Sigma$, the row of $w \cdot \sigma$ exists in $M$, and from the construction of $D$, there exists a row $d \in D$ such that $w \cdot \sigma$ and $d$ are indistinguishable. The output automaton will be defined as follows (a sketch of the whole procedure in code is given after this list):

- The set of states is $D$.
- The initial state is the state corresponding to the row of $\varepsilon$.
- The accepting states are the set $F = \{d \in D \mid M[d][\varepsilon] = 1\}$.
- The transition function is defined by $\delta(d, \sigma) = d'$, where $d'$ is the element of $D$ that is indistinguishable from the row of $d \cdot \sigma$.
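Putting the steps together, the inference algorithm can be sketched as below, with the same 1/-1/0 matrix convention as above. The sketch assumes the received sample subsumes a characteristic sample of the target language, so every row it needs exists in the matrix and has an indistinguishable representative in $D$.

```python
def infer_dfa(sample, alphabet):
    # Build the matrix M over prefixes x suffixes of the sample words,
    # select a maximal set D of pairwise-distinguishable rows, and read
    # off a DFA. A sketch only; it may fail on samples that do not
    # subsume a characteristic sample.
    label = dict(sample)                     # word -> 1 or -1
    words = set(label)
    shortlex = lambda w: (len(w), w)
    P = sorted({w[:i] for w in words for i in range(len(w) + 1)}, key=shortlex)
    Suf = sorted({w[i:] for w in words for i in range(len(w) + 1)}, key=shortlex)
    # M[p][j] is 1 or -1 where the word p + Suf[j] is labeled, else 0.
    M = {p: [label.get(p + s, 0) for s in Suf] for p in P}
    def distinguishable(p1, p2):
        # The rows differ at a column where both entries are labeled.
        return any(x != y and x and y for x, y in zip(M[p1], M[p2]))
    # D: initialize with the row of epsilon, then scan rows in shortlex order.
    D = [""]
    for p in P:
        if all(distinguishable(p, d) for d in D):
            D.append(p)
    # Read off the automaton: initial state epsilon, accepting rows with
    # M[d][epsilon] = 1, transitions by merging d + a into its representative.
    eps = Suf.index("")
    accepting = {d for d in D if M[d][eps] == 1}
    trans = {d: {a: next(e for e in D if not distinguishable(d + a, e))
                 for a in alphabet} for d in D}
    return D, "", accepting, trans
```

Composed with the teacher's sketch above, `infer_dfa(build_characteristic_sample(delta, accepting, alphabet, q0), alphabet)` should recover an automaton equivalent to the original one; the theory's guarantee is that the inference algorithm keeps returning an equivalent automaton even when the adversary first pads the sample with additional correctly labeled words.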
Other polynomially characterizable classes
- The class of languages recognizable by multiplicity automata[3]
- The class of languages recognizable by tree automata[4]
- The class of languages recognizable by multiplicity tree automata
- The class of languages recognizable by fully ordered lattice automata
- The class of languages recognizable by visibly one-counter automata[5]
- The class of fully informative omega-regular languages
Non polynomially characterizable classes
There are some classes that do not have polynomially sized characteristic samples. For example, from the first theorem in the Related theorems section, it has been shown that the following classes of languages do not have polynomially sized characteristic samples:

- The class of context-free languages over an alphabet $\Sigma$ of cardinality larger than 1
- The class of linear grammar languages over an alphabet $\Sigma$ of cardinality larger than 1
- The class of languages of simple deterministic grammars
- The class of languages of nondeterministic finite automata
Relations to other learning paradigms
Classes of representations that have characteristic samples relate to the following learning paradigms:
Class of semi-poly teachable languages
A representation class $\mathcal{R}$ is semi-poly $T/L$ teachable if there exist three polynomials $p, q, r$, a teacher $T$, and an inference algorithm $I$, such that for any adversary $Adv$ the following holds:

- $Adv$ selects a representation $R$ of size $n$ from $\mathcal{R}$
- $T$ computes a sample that is consistent with the language that $R$ recognizes, of size bounded by $p(n)$, with the strings in the sample bounded in length by $q(n)$
- $Adv$ adds correctly labeled strings to the sample computed by $T$, making the new sample of size $m$
- $I$ then computes a representation equivalent to $R$ in time bounded by $r(m)$
A class for which there exists a polynomial-time algorithm that, given a sample, returns a representation consistent with the sample is called consistency easy.
Polynomially characterizable languages
Given a representation class $\mathcal{R}$ and a set $\mathcal{I}$ of identification algorithms for $\mathcal{R}$, $\mathcal{R}$ is polynomially characterizable for $\mathcal{I}$ if every $R \in \mathcal{R}$ has a characteristic sample $S_R$ of size polynomial in $R$'s size such that for every $I \in \mathcal{I}$, $I$'s output on $S_R$ is a representation equivalent to $R$.
Relations between the paradigms
Theorem
A consistency-easy class $\mathcal{R}$ has characteristic samples if and only if it is semi-poly $T/L$ teachable.
Proof
Assuming $\mathcal{R}$ has characteristic samples, then for every representation $R$ its characteristic sample $S_R$ satisfies the conditions on the sample computed by the teacher, and by the definition of a characteristic sample the output of $I$ on every sample $S'$ such that $S_R \subseteq S'$ is equivalent to $R$.

Assuming $\mathcal{R}$ is semi-poly $T/L$ teachable, then for every representation $R$ the sample computed by the teacher $T$ is a characteristic sample for $R$.
Theorem
If $\mathcal{R}$ has characteristic samples, then $\mathcal{R}$ is polynomially characterizable.
Proof
Assume for contradiction that $\mathcal{R}$ is not polynomially characterizable. Then there are two non-equivalent representations $R_1, R_2$ with characteristic samples $S_1$ and $S_2$ respectively such that $S_1 \cup S_2$ is consistent with both. From the definition of characteristic samples, any inference algorithm $I$ needs to infer from the sample $S_1 \cup S_2$ a representation compatible with both $R_1$ and $R_2$, a contradiction.
References
- de la Higuera, Colin (1997). "Characteristic Sets for Polynomial Grammatical Inference". Machine Learning. 27 (2): 125–138. doi:10.1023/A:1007353007695.
- Goldman, Sally A.; Mathias, H. David (April 1996). "Teaching a Smarter Learner". Journal of Computer and System Sciences. 52 (2): 255–267. doi:10.1006/jcss.1996.0020.
- Beimel, Amos; Bergadano, Francesco; Bshouty, Nader H.; Kushilevitz, Eyal; Varricchio, Stefano (May 2000). "Learning functions represented as multiplicity automata". Journal of the ACM. 47 (3): 506–530. doi:10.1145/337244.337257.
- Burago, Andrey (1994). "Learning structurally reversible context-free grammars from queries and counterexamples in polynomial time". Proceedings of the Seventh Annual Conference on Computational Learning Theory (COLT '94). New York: ACM Press. pp. 140–146. doi:10.1145/180139.181075. ISBN 0-89791-655-7.
- Berman, Piotr; Roos, Robert (October 1987). "Learning one-counter languages in polynomial time". 28th Annual Symposium on Foundations of Computer Science (SFCS 1987). IEEE. pp. 61–67. doi:10.1109/sfcs.1987.36. ISBN 0-8186-0807-2.