Tuesday, July 9, 2019

graph equivalence 7: why it DOESN'T work, and a conjecture


I've got around to read a bit more about the graph equivalence (i.e. isomorphism) problem. Looks like the state of the art algorithm is called "nauty". I don't understand it yet, I haven't found a straightforward enough description nor understood the complex kind of description. But based on the introductory page http://pallini.di.uniroma1.it/Introduction.html,  some things are easy to notice.

First, they use the same signature-based approach for comparison of the graphs, they call it the "canonical labeling".

Second, they use the same approach of splitting the set of nodes into the equivalency classes, they call it "coloring" and "partition refinement". Which is good, this means that I've reinvented something that is known to work in general.

Third, right that page contains an example of a graph on which my algorithm doesn't work. It's the pink one in the section "Refinement: equitable partition". Here is a copy, with names assigned to the nodes:

.              -----A-----
.             /     |     \
.            /      |      \
.           B-------+------ H
.           |       |       |
.           |       |       |
.           C-------+-------G
.           |       |       |
.           |       |       |
.           D-------+-------F
.            \      |      /
.             \     |     /
.              -----E-----

I've been starting with the premise that the nodes would have tags, and the links will be directional, but with a basic unlabelled non-directional graph the things are much harder. All you've got to start with is the number of links in every node. If there is some asymmetry, it will be used to unravel the dependencies. But if every node has the same number of links then the logic is stuck, and as this example shows, it's possible to make all the nodes have the same number of links but not be all symmetrical.

Before I've read further about the existing algorithm, I want to take another independent stab at the solution. As I've described in the Part 4, I've started with the idea of describing the graph topology in terms of the spanning trees, rooted at each of the nodes, and then tried to simplify it with the hashes. This simplification has turned out to be wrong. So let's return to the spanning trees (with redundant links on the side), and take another approach to avoid the circular dependency. When computing the equivalency class (AKA "hash" AKA "color") of each node on each top-level iteration of the algorithm, instead of looking at the immediate neighbor nodes, let's take this node as the root of the spanning tree and look at its relation with each other node by building a path from the root to that node (if the graph is disconnected and no path exists, take an empty path, or just don't enter any path for that node). To account for the possible redundant links, we could find multiple paths: remove all the links but one from the destination node, find a path, then return to the original graph, and remove all but a different single link, find another path, and so on.

The path in general would include the tags on the nodes, the number of links connected to them (in the original graphs), and so on, to make it more distinctive. But if the nodes are untagged (initially belonging to the same equivalence class), and all have the same number of links, the only distinction between the paths would be their length.

Let's build the paths:

Root A:
  Node B can be reached via the paths:
    using the link AB: AB - 1 step
    using the link HB: AHB - 2 steps
    using the link CB: AHGCB or AEDCB - 4 steps
  Node C can be reached via the paths:
    ABC - 2 steps
    AHGC - 3 steps
    AEDC - 3 steps
  Node D can be reached via the paths:
    AED - 2 steps
    ABCD - 3 steps
    AEFD - 3 steps
  Node E can be reached via the paths:
    AE - 1 step
    ABCDE - 4 steps
    AHGFE - 4 steps
  Node F is similar to D
  Node G is similar to C
  Node H is similar to B

So overall the set of paths going from the root A can be described in the shorter form as:
  2x(1, 2, 4); (1, 4, 4); 4x(2, 3, 3)

Root B (using the shorter notation):
  Node A: 1, 2, 4
  Node C: 1, 3, 4
  Node D: 2, 3, 4
  Node E: 2, 3, 4
  Node F: 3, 3, 4
  Node G: 2, 2, 4
  Node H: 1, 2, 3

The resulting set of paths from the root B:
  (1, 2, 3); (1, 2, 4); (1, 3, 4); (2, 2, 4); 2x(2, 3, 4); (3, 3, 4)

Root C: (using the shorter notation):
  Nodes A and E: 2, 3, 3
  Nodes B and D: 1, 3, 4
  Nodes F and H: 2, 2, 3
  Node G: 1, 3, 3

The resulting set of paths from the root C:
  (1, 3, 3); 2x(1, 3, 4); 2x(2, 2, 3); 2x(2, 3, 3)

The node E will have the same paths as A; G will have the same paths as C; nodes D, F, H will have the same paths as B. As you can see, these three classes of nodes will have distinct sets of paths, so they will be successfully split into three classes indeed.

This solves the problem for this specific example. Would it solve every possible example? I don't know, right now I can't think of a proof one way or another.

But let's estimate, how difficult would it be to build these paths. If there are N nodes and total L links, first we'll have the loop that goes over every of the N root nodes. Then it would go over each of the (N-1) target nodes, and for each one of them go over up to (N-1) links that enter it, and find the shortest (or, with tags, also "lexicographically first") path between the root and the target node (which shouln't take longer than L steps). Then we'll sort and collate the paths to each target, and concatenate them into the sorted set of paths, which should take within O(N^3). And then sort and collate the nodes by their paths, which should be within O(N^4) or so. Which means that it's all still polynomial.

So, if this approach works for every possible example, that would mean that it could solve each possible example in polynomial time. I've grown a bit more pessimistic about that, so probably it doesn't work for every possible example. But it would be interesting to understand, why.

As a side note, I've tried to reduce this graph a bit, and apply my new algorithm to the reduced version:

.              -----A-----
.             /     |     \
.            /      |      \
.           B-------+------ F
.           |       |       |
.           |       |       |
.           C-------+-------E
.            \      |      /
.             \     |     /
.              -----D-----

And the algorithm produced the same path sets for every node. Only then did I notice that in the reduced graph all the nodes really are equivalent! This graph can be seen as a triangular prism, with the top face ABF and bottom face CDE:

.              A
.             /|\
.            / | \
.           B--+- F
.           |  |  |
.           |  |  |
.           |  D  |
.           | / \ |
.           |/   \|
.           C-----E

And all the corners in this prism really are equivalent! The extra horizontal line in the original example adds the magic to make the nodes different.

Another side note: it's easy to construct a similar directional graph too. Just replace each "horizontal" and "vertical" non-directional link that has a pair of nodes at the end with a pair of arrows, one going one way, another one going back (between the same pair of nodes, or inserting another pair of nodes for the "back" link below the original pair). And as a final touch, turn the links that go "around the circle" into the arrows, all of them going either clockwise or counter-clockwise.

P.S. A good reference page: http://algorist.com/problems/Graph_Isomorphism.html


No comments:

Post a Comment