Colour refinement algorithm explained
In graph theory and theoretical computer science, the colour refinement algorithm, also known as naive vertex classification or the 1-dimensional version of the Weisfeiler-Leman algorithm, is a routine used for testing whether two graphs are isomorphic.[1] While it solves graph isomorphism on almost all graphs, there are pairs of non-isomorphic graphs that it cannot distinguish, for example any two regular graphs with the same degree and the same number of vertices.
Description
The algorithm takes as input a graph $G$ with $n$ vertices. It proceeds in iterations, and each iteration produces a new colouring of the vertices. Formally, a "colouring" is a function from the vertices of this graph into some set (of "colours"). The algorithm defines a sequence of vertex colourings $\lambda_0, \lambda_1, \lambda_2, \ldots$ as follows:
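$\lambda_0$ is the initial colouring. If the graph is unlabelled, the initial colouring assigns the same trivial colour $\lambda_0(v)$ to each vertex $v$. If the graph is labelled, $\lambda_0(v)$ is the label of vertex $v$.
For all vertices $v$ and all $i \geq 0$, we set $\lambda_{i+1}(v) = \left(\lambda_i(v), \{\{\lambda_i(w) \mid w \text{ is a neighbour of } v\}\}\right)$.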
In other words, the new colour of the vertex $v$ is the pair formed from the previous colour and the multiset of the colours of its neighbours. The algorithm keeps refining the current colouring. At some point it stabilises, i.e., $\lambda_{i+1}$ induces the same partition of the vertices into colour classes as $\lambda_i$. This final colouring is called the stable colouring.
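The iteration described above can be sketched in Python as follows. This is a minimal illustration rather than an efficient implementation: the function name `colour_refinement`, the adjacency-dictionary input format and the `labels` parameter are assumptions made for the example. After each round the (colour, multiset) pairs are renamed to small integers, which keeps colours compact without changing the partition they induce.

```python
def colour_refinement(adj, labels=None):
    """Return (colouring, rounds): a stable colouring and the rounds used.

    adj    -- dict mapping each vertex to a list of its neighbours
    labels -- optional dict of initial vertex labels (illustrative parameter)
    """
    # lambda_0: the vertex label if given, otherwise one trivial colour.
    raw = {v: (labels[v] if labels is not None else 0) for v in adj}
    # Rename labels to integers (order of first appearance) so they sort.
    ids = {}
    colour = {v: ids.setdefault(raw[v], len(ids)) for v in adj}
    rounds = 0
    while True:
        # Signature of v: (old colour, sorted multiset of neighbour colours).
        sig = {v: (colour[v], tuple(sorted(colour[w] for w in adj[v])))
               for v in adj}
        # Rename the distinct signatures to consecutive integers.
        names = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_colour = {v: names[sig[v]] for v in adj}
        # A round can only split colour classes, so an unchanged number of
        # classes means the partition is stable.
        if len(set(new_colour.values())) == len(set(colour.values())):
            return colour, rounds
        colour, rounds = new_colour, rounds + 1


# Path on four vertices: 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
colouring, rounds = colour_refinement(path)
# The two end vertices share one colour, the two inner vertices another.
print(colouring, rounds)
```

The stopping test relies on the fact that each new colour contains the previous colour as its first component, so classes are never merged.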
Graph Isomorphism
Colour refinement can be used as a subroutine for an important computational problem: graph isomorphism. In this problem we have as input two graphs $G$ and $H$, and our task is to determine whether they are isomorphic. Informally, this means that the two graphs are the same up to relabelling of vertices.
To test whether $G$ and $H$ are isomorphic, we could try the following. Run colour refinement on both graphs. If the stable colourings produced are different, we know that the two graphs are not isomorphic. However, it could be that the same stable colouring is produced despite the two graphs not being isomorphic; see below.
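One way to make "different stable colourings" concrete is sketched below, under stated assumptions. The sketch refines the disjoint union of the two graphs, so that both share a single set of colour names, and then compares how often each colour occurs in each graph. The names `stable_colouring` and `distinguished` and the adjacency-dictionary format are hypothetical choices for this example.

```python
from collections import Counter


def stable_colouring(adj):
    """Stable colouring of an unlabelled graph given as an adjacency dict."""
    colour = {v: 0 for v in adj}
    while True:
        sig = {v: (colour[v], tuple(sorted(colour[w] for w in adj[v])))
               for v in adj}
        names = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_colour = {v: names[sig[v]] for v in adj}
        if len(set(new_colour.values())) == len(set(colour.values())):
            return colour
        colour = new_colour


def distinguished(g, h):
    """True if colour refinement shows that g and h are not isomorphic."""
    # Tag vertices with their graph of origin and form the disjoint union.
    union = {("g", v): [("g", w) for w in nbrs] for v, nbrs in g.items()}
    union.update({("h", v): [("h", w) for w in nbrs] for v, nbrs in h.items()})
    colour = stable_colouring(union)
    # Count how often each stable colour occurs on each side.
    hist_g = Counter(c for (side, _), c in colour.items() if side == "g")
    hist_h = Counter(c for (side, _), c in colour.items() if side == "h")
    return hist_g != hist_h


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
relabelled_path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

print(distinguished(path, star))             # True
print(distinguished(path, relabelled_path))  # False
```

A result of `True` is a proof of non-isomorphism. A result of `False` is inconclusive: the graphs may be isomorphic, or they may be a pair that colour refinement cannot tell apart.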
Complexity
It is easy to see that if colour refinement is given an $n$-vertex graph as input, a stable colouring is produced after at most $n-1$ iterations: every iteration before stabilisation splits at least one colour class, and there can be at most $n$ classes. Conversely, there exist graphs where this bound is realised. The algorithm admits an $O((n+m)\log n)$ implementation, where $n$ is the number of vertices and $m$ the number of edges.[2] This complexity has been proven to be optimal under reasonable assumptions.[3]
Expressivity
We say that two graphs $G$ and $H$ are distinguished by colour refinement if the algorithm yields a different output on $G$ than on $H$. There are simple examples of graphs that are not distinguished by colour refinement. For example, it does not distinguish a cycle of length 6 from a pair of triangles (example V.1 in [4]). Despite this, the algorithm is very powerful in that a random graph will be identified by the algorithm asymptotically almost surely.[5] Even stronger, it has been shown that as the number of vertices $n$ increases, the proportion of graphs that are not identified by colour refinement decreases exponentially in $n$.[6]
The expressivity of colour refinement also has a logical characterisation: two graphs can be distinguished by colour refinement if and only if they can be distinguished by the two-variable fragment of first-order logic with counting.[7]
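The 6-cycle and the pair of triangles mentioned above can be checked with a short script. The sketch below is illustrative only; the helper names `refine` and `component_count` are assumptions of this example. Every vertex in both graphs has exactly two neighbours, so the first round already reproduces the trivial colouring, even though the two graphs differ in their number of connected components.

```python
def refine(adj):
    """Stable colouring of an unlabelled graph given as an adjacency dict."""
    colour = {v: 0 for v in adj}
    while True:
        sig = {v: (colour[v], tuple(sorted(colour[w] for w in adj[v])))
               for v in adj}
        names = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_colour = {v: names[sig[v]] for v in adj}
        if len(set(new_colour.values())) == len(set(colour.values())):
            return colour
        colour = new_colour


def component_count(adj):
    """Number of connected components, found by depth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v])
    return count


# A cycle on vertices 0..5, and two triangles on {0, 1, 2} and {3, 4, 5}.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {i: [j for j in range(3 * (i // 3), 3 * (i // 3) + 3) if j != i]
             for i in range(6)}

# Refine the disjoint union so both graphs share the same colour names.
union = {("c", v): [("c", w) for w in nbrs] for v, nbrs in cycle6.items()}
union.update({("t", v): [("t", w) for w in nbrs] for v, nbrs in triangles.items()})
colours = refine(union)

print(set(colours.values()))                                # {0}
print(component_count(cycle6), component_count(triangles))  # 1 2
```

All twelve vertices end up in one colour class, so colour refinement gives the same output on both graphs although they are not isomorphic.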
Notes and References
- Grohe, Martin; Kersting, Kristian; Mladenov, Martin; Schweitzer, Pascal (2021). "Color Refinement and Its Applications". An Introduction to Lifted Probabilistic Inference. doi:10.7551/mitpress/10548.003.0023. ISBN 9780262365598. S2CID 59069015.
- Cardon, A.; Crochemore, M. (1982-07-01). "Partitioning a graph in O(|A| log₂ |V|)". Theoretical Computer Science. 19 (1): 85–98. doi:10.1016/0304-3975(82)90016-0. ISSN 0304-3975.
- Berkholz, Christoph; Bonsma, Paul; Grohe, Martin (2017-05-01). "Tight Lower and Upper Bounds for the Complexity of Canonical Colour Refinement". Theory of Computing Systems. 60 (4): 581–614. arXiv:1509.08251. doi:10.1007/s00224-016-9686-0. ISSN 1433-0490. S2CID 12616856.
- Grohe, Martin (2021-06-29). "The Logic of Graph Neural Networks". 2021 36th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). LICS '21. New York, NY, USA: Association for Computing Machinery. pp. 1–17. arXiv:2104.14624. doi:10.1109/LICS52264.2021.9470677. ISBN 978-1-6654-4895-6. S2CID 233476550.
- Babai, László; Erdős, Paul; Selkow, Stanley M. (August 1980). "Random Graph Isomorphism". SIAM Journal on Computing. 9 (3): 628–635. doi:10.1137/0209047. ISSN 0097-5397.
- Babai, L.; Kučera, L. (1979). "Canonical labelling of graphs in linear average time". 20th Annual Symposium on Foundations of Computer Science (SFCS 1979). pp. 39–46. doi:10.1109/SFCS.1979.8. https://ieeexplore.ieee.org/document/4567999/ (retrieved 2024-01-18).
- Grohe, Martin (1998). "Finite variable logics in descriptive complexity theory". Bulletin of Symbolic Logic. 4 (4): 345–398.