Characteristic samples

Characteristic samples is a concept in the field of grammatical inference, related to passive learning. In passive learning, an inference algorithm $I$ is given a set of pairs of strings and labels $S$, and returns a representation $R$ that is consistent with $S$. Characteristic samples consider the scenario when the goal is not only finding a representation consistent with $S$, but finding a representation that recognizes a specific target language.

A characteristic sample of a language $L$ is a set of pairs of the form $(s, l(s))$ where:

$l(s) = 1$ if and only if $s \in L$
$l(s) = -1$ if and only if $s \notin L$

Given a characteristic sample $S$, $I$'s output on it is a representation $R$, e.g. an automaton, that recognizes $L$.

Formal Definition

The Learning Paradigm associated with Characteristic Samples

There are three entities in the learning paradigm connected to characteristic samples: the adversary, the teacher, and the inference algorithm.

Given a class of languages $C$ and a class of representations $\mathcal{R}$ for the languages, the paradigm goes as follows:

  1. The adversary $A$ selects a language $L \in C$ and reports it to the teacher.
  2. The teacher $T$ then computes a set of strings and labels them correctly according to $L$, trying to make sure that the inference algorithm will compute $L$.
  3. The inference algorithm $I$ gets the sample and computes a representation $R \in \mathcal{R}$ consistent with the sample.

The goal is that when the inference algorithm receives a characteristic sample for a language $L$, or a sample that subsumes a characteristic sample for $L$, it will return a representation that recognizes exactly the language $L$.
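As a schematic illustration, here is a minimal Python sketch of one round of this paradigm; the callables and their signatures are hypothetical stand-ins rather than interfaces from the literature.

```python
from typing import Callable, Set, Tuple

Sample = Set[Tuple[str, int]]     # pairs (string, label) with labels in {-1, 1}
Language = Callable[[str], bool]  # a membership predicate standing in for L

def one_round(adversary: Callable[[], Language],
              teacher: Callable[[Language], Sample],
              learner: Callable[[Sample], object]) -> object:
    L = adversary()         # the adversary selects a language L in C
    sample = teacher(L)     # the teacher labels strings correctly according to L
    return learner(sample)  # the inference algorithm returns a representation
                            # consistent with the sample; on a (superset of a)
                            # characteristic sample it must recognize exactly L
```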

Sample

A sample $S$ is a set of pairs of the form $(s, l(s))$ such that $l(s) \in \{-1, 1\}$.

Sample consistent with a language

We say that a sample $S$ is consistent with a language $L$ if for every pair $(s, l(s))$ in $S$:

$l(s) = 1$ if and only if $s \in L$
$l(s) = -1$ if and only if $s \notin L$
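Consistency is mechanical to check; the following minimal Python sketch (the helper names and the example language are illustrative assumptions) applies the definition directly.

```python
from typing import Callable, Iterable, Tuple

def is_consistent(sample: Iterable[Tuple[str, int]],
                  in_L: Callable[[str], bool]) -> bool:
    """A sample is consistent with L iff every pair (s, l(s)) has
    l(s) = 1 exactly when s is in L, and l(s) = -1 otherwise."""
    return all((label == 1) == in_L(s) for s, label in sample)

# Example: L = strings containing the substring "ab".
in_L = lambda s: "ab" in s
print(is_consistent([("ab", 1), ("ba", -1)], in_L))  # True
print(is_consistent([("ab", -1)], in_L))             # False
```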

Characteristic sample

Given an inference algorithm $I$ and a language $L$, a sample $S$ that is consistent with $L$ is called a characteristic sample of $L$ for $I$ if:

  1. $I$'s output on $S$ is a representation $R$ that recognizes $L$.
  2. For every sample $D$ that is consistent with $L$ and also fulfils $S \subseteq D$, $I$'s output on $D$ is a representation $R$ that recognizes $L$.

A class of languages $C$ is said to have characteristic samples if every $L \in C$ has a characteristic sample.

Related Theorems

Theorem

If equivalence is undecidable for a class $\mathbb{C}$ over an alphabet $\Sigma$ of cardinality bigger than 1, then $\mathbb{C}$ does not have polynomially sized characteristic samples.[1]

Proof

Given a class of representations $\mathbb{C}$ such that equivalence is undecidable, for every polynomial $p(x)$ and every $n \in \mathbb{N}$ there exist two representations $r_1$ and $r_2$ of sizes bounded by $n$ that recognize different languages but are inseparable by any string of size bounded by $p(n)$. If this were not the case, we could decide whether $r_1$ and $r_2$ are equivalent by simulating their runs on all strings of size smaller than $p(n)$, contradicting the assumption that equivalence is undecidable. Consequently, no sample built from strings of length bounded by $p(n)$ can characterize the language of $r_1$ while being inconsistent with the language of $r_2$, so polynomially sized characteristic samples cannot exist.

Theorem

If $S_1$ is a characteristic sample for $L_1$ and is also consistent with $L_2$, then every characteristic sample of $L_2$ is inconsistent with $L_1$.

Proof

Given a class $\mathbb{C}$ that has characteristic samples, let $R_1$ and $R_2$ be representations that recognize $L_1$ and $L_2$ respectively. Under the assumption that there is a characteristic sample $S_1$ for $L_1$ that is also consistent with $L_2$, assume for contradiction that there exists a characteristic sample $S_2$ for $L_2$ that is consistent with $L_1$. The union $S_1 \cup S_2$ then subsumes both characteristic samples and is consistent with both languages. By the definition of characteristic sample, the inference algorithm $I$ must return a representation that recognizes the language whenever it is given a sample that subsumes the characteristic sample itself, so on $S_1 \cup S_2$ its output would need to recognize both $L_1$ and $L_2$, a contradiction.

Theorem

If a class is polynomially learnable by example-based queries, it is learnable with characteristic samples.[2]

Polynomially characterizable classes

Regular languages

The proof that DFAs are learnable using characteristic samples relies on the fact that every regular language has a finite number of equivalence classes with respect to the right congruence relation $\sim_L$, where $x \sim_L y$ for $x, y \in \Sigma^*$ if and only if $\forall z \in \Sigma^*: xz \in L \leftrightarrow yz \in L$. Note that if $x$ and $y$ are not congruent with respect to $\sim_L$, there exists a string $z$ such that $xz \in L$ but $yz \notin L$, or vice versa; such a string is called a separating suffix.
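For intuition, here is a small Python sketch, assuming the example language $L$ of strings over {a, b} ending in 'a' (an illustrative choice, not from the article), that finds a separating suffix by brute force:

```python
from itertools import product

# A minimal sketch: L = strings over {a, b} that end in 'a'.
def in_L(w: str) -> bool:
    return w.endswith("a")

def separating_suffix(x: str, y: str, max_len: int = 3):
    """Brute-force search for a suffix z with xz in L but yz not in L,
    or vice versa."""
    for n in range(max_len + 1):
        for z in map("".join, product("ab", repeat=n)):
            if in_L(x + z) != in_L(y + z):
                return z
    return None  # no separating suffix of length <= max_len

print(repr(separating_suffix("a", "b")))   # '' : the empty suffix separates them
print(repr(separating_suffix("ba", "a")))  # None: 'ba' and 'a' are congruent here
```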

Constructing a characteristic sample

The construction of a characteristic sample for a language $L$ by the teacher goes as follows. Firstly, by running a depth-first search on a deterministic automaton $A$ recognizing $L$, starting from its initial state, we get a prefix-closed set of words $W$, ordered in shortlex order. From the fact above, we know that for every two states of the automaton there exists a separating suffix separating every two strings whose runs of $A$ end in those respective states. We refer to the set of separating suffixes as $S$; we may assume that $\epsilon \in S$. The labeled set (sample) of words the teacher gives the adversary is $\{(w, l(w)) \mid w \in WS \cup W\Sigma S\}$, where $l(w)$ is the correct label of $w$ (whether it is in $L$ or not).
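The following Python sketch mirrors this construction under simplifying assumptions: the example DFA, its dictionary encoding, and the brute-force search for separating suffixes are illustrative choices, not the article's prescribed data structures.

```python
from itertools import product

# Illustrative DFA over {a, b} accepting strings that end in 'a'.
DFA = {"initial": 0,
       "accepting": {1},
       "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}}
SIGMA = "ab"

def run(w):  # state reached by the automaton on w
    q = DFA["initial"]
    for c in w:
        q = DFA["delta"][(q, c)]
    return q

def accepts(w):
    return run(w) in DFA["accepting"]

def access_words():
    """Depth-first search from the initial state: one access word per
    reachable state; the resulting set W is prefix closed."""
    words, stack = {}, [("", DFA["initial"])]
    while stack:
        w, q = stack.pop()
        if q in words:
            continue
        words[q] = w
        for c in reversed(SIGMA):
            stack.append((w + c, DFA["delta"][(q, c)]))
    return sorted(words.values(), key=lambda u: (len(u), u))  # shortlex

def separating_suffixes(W):
    """One separating suffix for every pair of access words reaching
    different states (found here by brute force)."""
    S = {""}  # we may assume epsilon is in S
    for x, y in product(W, repeat=2):
        if run(x) != run(y):
            S.add(next(z for n in range(len(W) + 1)
                       for z in map("".join, product(SIGMA, repeat=n))
                       if accepts(x + z) != accepts(y + z)))
    return S

W = access_words()
S = separating_suffixes(W)
# The sample: every word in WS and in W·Sigma·S, correctly labeled.
sample = {w: (1 if accepts(w) else -1)
          for w in {x + z for x in W for z in S}
                 | {x + c + z for x in W for c in SIGMA for z in S}}
```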

Constructing a deterministic automaton

Given the sample $W$ from the adversary, the construction of the automaton by the inference algorithm $I$ starts with defining $P = \mathrm{prefix}(W)$ and $S = \mathrm{suffix}(W)$, the sets of prefixes and suffixes of the words of $W$ respectively. Now the algorithm constructs a matrix $M$ in which the elements of $P$ function as the rows and the elements of $S$ function as the columns, both ordered by the shortlex order. Next, the cells in the matrix are filled in the following manner for prefix $p_i$ and suffix $s_j$:
  1. If $p_i s_j \in W$, then $M_{ij} = l(p_i s_j)$.
  2. Else, $M_{ij} = 0$.

Now, we say that rows $i$ and $t$ are distinguishable if there exists an index $j$ such that $M_{ij} = -M_{tj} \neq 0$. The next stage of the inference algorithm is to construct the set $Q$ of pairwise distinguishable rows of $M$, by initializing $Q$ with $\epsilon$ and iterating from the first row of $M$ downwards, doing the following for row $r_i$:
  1. If $r_i$ is distinguishable from all elements of $Q$, add it to $Q$.
  2. Else, pass on to the next row.

From the way the teacher constructed the sample it passed to the adversary, we know that for every $s \in Q$ and every $\sigma \in \Sigma$, the row $s\sigma$ exists in $M$, and from the construction of $Q$, there exists a row $s' \in Q$ such that $s'$ and $s\sigma$ are indistinguishable. The output automaton is defined as follows:

The set of states is $Q$.
The initial state is $\epsilon \in Q$.
The set of accepting states is $\{s \in Q \mid l(s) = 1\}$.
The transition function is $\delta(s, \sigma) = s'$, where $s'$ is the element of $Q$ that is indistinguishable from $s\sigma$.
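A matching Python sketch of the inference algorithm is given below. It is illustrative: it assumes the sample is a dictionary from strings to labels in {-1, 1} (e.g. the `sample` produced by the teacher sketch above) and that it subsumes a characteristic sample, so every row needed for the transition function exists. On the teacher's sample above it reconstructs the two-state automaton for strings ending in 'a'.

```python
def infer_dfa(sample, sigma="ab"):
    shortlex = lambda w: (len(w), w)
    P = sorted({w[:i] for w in sample for i in range(len(w) + 1)}, key=shortlex)
    S = sorted({w[i:] for w in sample for i in range(len(w) + 1)}, key=shortlex)
    # M[p][s] = l(ps) if ps appears in the sample, 0 otherwise.
    M = {p: {s: sample.get(p + s, 0) for s in S} for p in P}

    def distinguishable(p, t):
        # opposite non-zero labels in some column
        return any(M[p][s] == -M[t][s] != 0 for s in S)

    # Q: pairwise distinguishable rows, scanned in shortlex order from epsilon.
    Q = [""]
    for p in P:
        if all(distinguishable(p, q) for q in Q):
            Q.append(p)

    # delta(s, sigma) = the element of Q indistinguishable from s·sigma;
    # the row s·sigma exists because the sample subsumes a characteristic one.
    delta = {(q, c): next(q2 for q2 in Q if not distinguishable(q + c, q2))
             for q in Q for c in sigma}
    accepting = {q for q in Q if sample.get(q) == 1}
    return Q, "", accepting, delta  # states, initial state, accepting, transitions
```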

Other polynomially characterizable classes

Classes that have been shown to be polynomially learnable with queries, and hence to have characteristic samples, include functions represented as multiplicity automata,[3] structurally reversible context-free grammars,[4] and one-counter languages.[5]

Non polynomially characterizable classes

There are some classes that do not have polynomially sized characteristic samples. For example, from the first theorem in the Related theorems segment, it has been shown that the following classes of languages do not have polynomial sized characteristic samples:

CFG

- The class of context-free grammars Languages over

\Sigma

of cardinality larger than

1

LING

- The class of linear grammar languages over

\Sigma

of cardinality larger than

1

SDG

- The class of simple deterministic grammars Languages

NFA

- The class of nondeterministic finite automata Languages

Relations to other learning paradigms

Classes of representations that have characteristic samples relate to the following learning paradigms:

Class of semi-poly teachable languages

A representation class $C$ is semi-poly $T/L$ teachable if there exist three polynomials $p, q, r$, a teacher $T$, and an inference algorithm $I$ such that for any adversary $A$ the following holds:

  1. $A$ selects a representation $R$ of size $n$ from $C$.
  2. $T$ computes a sample consistent with the language that $R$ recognizes, of size bounded by $p(n)$ and with the strings in the sample bounded in length by $q(n)$.
  3. $A$ adds correctly labeled strings to the sample computed by $T$, making the new sample of size $m$.
  4. $I$ then computes a representation equivalent to $R$ in time bounded by $r(m)$.

A class of languages for which there exists a polynomial-time algorithm that, given a sample, returns a representation consistent with the sample is called consistency easy.
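As a small illustration of the size constraints in the teachability definition above, the following sketch (the polynomials and the helper name are placeholders, not from the article) checks a teacher's sample against the bounds $p(n)$ and $q(n)$:

```python
# Placeholder check of the teacher's obligations in the semi-poly T/L
# definition: at most p(n) labeled pairs, each string of length <= q(n).
def teacher_output_ok(sample, n, p, q):
    return len(sample) <= p(n) and all(len(s) <= q(n) for s, _ in sample)

# Example with placeholder polynomials p(n) = n**2 and q(n) = 2*n + 1:
print(teacher_output_ok({("ab", 1), ("b", -1)}, n=2,
                        p=lambda n: n**2, q=lambda n: 2*n + 1))  # True
```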

Polynomially characterizable languages

Given a representation class $\mathcal{R}$ and a set $\mathcal{I}$ of identification algorithms for $\mathcal{R}$, $\mathcal{R}$ is polynomially characterizable for $\mathcal{I}$ if every $R \in \mathcal{R}$ has a characteristic sample $S$ of size polynomial in $R$'s size such that for every $I \in \mathcal{I}$, $I$'s output on $S$ is $R$.

Relations between the paradigms

Theorem

A consistency-easy class $C$ has characteristic samples if and only if it is semi-poly $T/L$ teachable.
Proof

Assuming $C$ has characteristic samples, then for every representation $R \in C$, its characteristic sample $S$ satisfies the conditions on the sample computed by the teacher, and by the definition of characteristic sample, the output of $I$ on every sample $S'$ such that $S \subseteq S'$ is equivalent to $R$.

Conversely, assuming $C$ is semi-poly $T/L$ teachable, then for every representation $R \in C$ the sample $S$ computed by the teacher is a characteristic sample for $R$.

Theorem

If $C$ has characteristic samples, then $C$ is polynomially characterizable.
Proof

Assume for contradiction that $C$ is not polynomially characterizable. Then there are two non-equivalent representations $R_1, R_2 \in C$, with characteristic samples $S_1$ and $S_2$ respectively, such that the union $S_1 \cup S_2$ is consistent with both languages. From the definition of characteristic samples, any inference algorithm $I$ would then need to infer from the sample $S_1 \cup S_2$ a representation compatible with both $R_1$ and $R_2$, a contradiction.


References

  1. De la Higuera, Colin (1997). "Characteristic Sets for Polynomial Grammatical Inference". Machine Learning. 27 (2): 125–138. doi:10.1023/A:1007353007695.
  2. Goldman, Sally A.; Mathias, H. David (April 1996). "Teaching a Smarter Learner". Journal of Computer and System Sciences. 52 (2): 255–267. doi:10.1006/jcss.1996.0020.
  3. Beimel, Amos; Bergadano, Francesco; Bshouty, Nader H.; Kushilevitz, Eyal; Varricchio, Stefano (May 2000). "Learning functions represented as multiplicity automata". Journal of the ACM. 47 (3): 506–530. doi:10.1145/337244.337257.
  4. Burago, Andrey (1994). "Learning structurally reversible context-free grammars from queries and counterexamples in polynomial time". Proceedings of the Seventh Annual Conference on Computational Learning Theory (COLT '94). New York: ACM Press. pp. 140–146. doi:10.1145/180139.181075. ISBN 0-89791-655-7.
  5. Berman, Piotr; Roos, Robert (October 1987). "Learning one-counter languages in polynomial time". 28th Annual Symposium on Foundations of Computer Science (SFCS 1987). IEEE. pp. 61–67. doi:10.1109/sfcs.1987.36. ISBN 0-8186-0807-2.