General graphs are worse: the "redundant links" in them that aren't in the tree present a major problem. Take a look at the "bowtie" graph from part 3:
A   D
|\ /|
| C |
|/ \|
B   E
If we try to follow the algorithm for the tree, starting from the root C and going into the subtree rooted at A, then we would have to go to the 2nd-level subtree rooted at B, then to the 3rd-level subtree rooted at C again, and so on, getting stuck in an endless recursion. This is not good.
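To make the problem concrete, here is a minimal sketch of that endless recursion. The names `BOWTIE`, `tree_signature` and `recursion_blows_up` are made up for this illustration, the tags are left out, and the signature format is just nested parentheses, so this is a simplified stand-in for the tree algorithm and not the real thing.

```python
# The bowtie graph as an adjacency map; every link is listed from both ends.
BOWTIE = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D"],
}

def tree_signature(graph, node, parent=None):
    """Tree-style signature: recurse into every neighbor except the parent."""
    children = [tree_signature(graph, n, node)
                for n in graph[node] if n != parent]
    return "(" + "".join(sorted(children)) + ")"

def recursion_blows_up(graph, root):
    """Return True if the tree-style recursion never bottoms out."""
    try:
        tree_signature(graph, root)
    except RecursionError:
        return True
    return False
```

On a real tree the recursion ends at the leaves, but on the bowtie the walk C, A, B, C, A, B, ... never ends, and Python gives up with a RecursionError.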
So my first successful attempt at solving this was:
- For each node, find a spanning tree with this node as root. Build the tree from the shortest paths from the root to each node (not only in terms of the number of links in the path but also in terms of the sorting of the tags on the path).
- Build the root node's signature from this spanning tree.
- Then append the information about the redundant links that are not included in the tree:
  - Each redundant link can be described as the tag on it plus the paths from both of its ends to the root. For two redundant links to be equivalent, their tags must be equal and the subtrees rooted at every node on these paths from the ends to the root have to be equivalent.
  - So build the lists of such subtrees for the paths from both sides of the link, and order these two lists by the usual subtree comparison criteria.
  - Then build the signature of the redundant link from its tag and the two lists of subtrees, with the two lists themselves ordered relative to each other by the subtree order.
  - Then order the redundant links according to their signatures.
  - Then add the signatures of the redundant links in this order to the end of the root node's signature.
- And then the nodes can be compared by these signatures as before, and the rest of the algorithm is the same.
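The first and third steps above can be sketched roughly like this. It's only an illustration under assumptions: the names `LINKS`, `spanning_tree` and `path_to_root` are invented here, the graph is assumed to have at most one link between any pair of nodes, and the tie-breaking between equal tags just falls back on the node names, which is a simplification of the path ordering described above. Building the actual signatures is left out.

```python
from collections import deque

# Links of the bowtie graph as (node, node, tag); all the tags are equal here.
LINKS = [
    ("A", "B", "x"), ("A", "C", "x"), ("B", "C", "x"),
    ("C", "D", "x"), ("C", "E", "x"), ("D", "E", "x"),
]

def spanning_tree(links, root):
    """Build a shortest-path (BFS) spanning tree rooted at root.

    Returns (parent, redundant): the map from each node to its parent
    in the tree, and the list of links that did not make it into the tree.
    """
    adjacent = {}
    for u, v, tag in links:
        adjacent.setdefault(u, []).append((tag, v))
        adjacent.setdefault(v, []).append((tag, u))
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Visit the links in the order of their tags (assumption: ties
        # are broken by node name, purely to keep the sketch short).
        for tag, other in sorted(adjacent[node]):
            if other not in parent:
                parent[other] = node
                queue.append(other)
    redundant = [(u, v, tag) for u, v, tag in links
                 if parent.get(u) != v and parent.get(v) != u]
    return parent, redundant

def path_to_root(parent, node):
    """List the nodes from the given node up to the root of the tree."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path
```

For the bowtie rooted at C, all four other nodes hang directly off the root and the redundant links are A-B and D-E. Rooted at A, the redundant links are B-C and D-E.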
I believe this would work, but in a slow and painful way, even slower than the algorithm for the trees (although as far as I can tell, it's still better than the factorial complexity; it's just that the degree of the polynomial is rather high).
Then I came up with the hash-based version that I described first, and then I found the explanation of how it works.
Basically, the problem with the algorithm for the tree is that it recurses in depth, and with loops present this depth goes to infinity. The alternative is to limit the depth and to build the signatures of all the nodes in parallel.
We start with the signatures of depth 0 for each node, formed of the tag on the node and of the sorted tags of the links connected to it.
The signatures of depth 1 are formed from the tag on the node, followed by the sorted links and signatures of depth 0 of the nodes from the other ends of these links.
The signatures of depth 2 are formed from the tag on the node, followed by the sorted links and signatures of depth 1 of the nodes from the other ends of these links.
And so on. We stop when all the signatures include all the nodes and all the links. But once every signature includes all the nodes, doing one more step guarantees that all the remaining links get included too, so "all the nodes included, plus one more step" can be used as an equivalent stopping condition.
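Here is a minimal sketch of this parallel build-up, with the signatures kept as plain strings. The function name `node_signatures`, the string format, and the `reached` bookkeeping are assumptions made for this illustration. It also assumes a connected graph, otherwise the stopping condition would never be met.

```python
def node_signatures(node_tags, links):
    """Build the signatures of all the nodes in parallel, level by level.

    node_tags maps node -> tag, links is a list of (node, node, tag).
    """
    adjacent = {n: [] for n in node_tags}
    for u, v, tag in links:
        adjacent[u].append((tag, v))
        adjacent[v].append((tag, u))
    # Depth 0: the node's tag plus the sorted tags of its links.
    sigs = {n: node_tags[n] + "["
               + ",".join(sorted(t for t, _ in adjacent[n])) + "]"
            for n in node_tags}
    # The set of nodes already included in each node's signature.
    reached = {n: {n} for n in node_tags}
    all_nodes = set(node_tags)
    while True:
        # If every signature already includes every node, the step
        # below is the "one more step" and the last one.
        done = all(reached[n] == all_nodes for n in node_tags)
        new_sigs, new_reached = {}, {}
        for n in node_tags:
            parts = sorted(tag + ":" + sigs[other]
                           for tag, other in adjacent[n])
            new_sigs[n] = node_tags[n] + "(" + ",".join(parts) + ")"
            new_reached[n] = reached[n].union(
                *(reached[other] for _, other in adjacent[n]))
        sigs, reached = new_sigs, new_reached
        if done:
            return sigs
```

On the bowtie with all the tags equal, A, B, D and E end up with identical signatures and C gets a different one. Note that the strings grow very fast with the depth, since each level embeds the complete signatures of the previous level.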
The signatures we get from this fully represent the topology of the graph, and depend only on the topology. So when we order the nodes according to these signatures, we can use the rest of the algorithm from the tree unchanged, assigning the IDs to the unique nodes in this order, and then picking one node from the first equivalence class.
Well, the hashes are a short and efficient way to represent these node signature strings, and, ignoring the hash collisions, the two algorithms are the same.
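To show the correspondence, here is a sketch of the same level-by-level computation with each signature string replaced by its hash. The function name `node_hashes`, the explicit `depth` parameter, and the choice of SHA-256 from Python's `hashlib` are assumptions for this illustration, not necessarily the hash function or the stopping condition of the version described earlier.

```python
import hashlib

def node_hashes(node_tags, links, depth):
    """Like the string signatures, but each level is stored as a hash."""
    def digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    adjacent = {n: [] for n in node_tags}
    for u, v, tag in links:
        adjacent[u].append((tag, v))
        adjacent[v].append((tag, u))
    # Depth 0: hash of the node's tag and the sorted tags of its links.
    hashes = {n: digest(node_tags[n] + "|"
                        + ",".join(sorted(t for t, _ in adjacent[n])))
              for n in node_tags}
    for _ in range(depth):
        # The next level combines the node's tag with the sorted pairs of
        # (link tag, previous-level hash of the node at the other end).
        hashes = {n: digest(node_tags[n] + "|" + ",".join(
                      sorted(tag + ":" + hashes[other]
                             for tag, other in adjacent[n])))
                  for n in node_tags}
    return hashes
```

Every value stays at 64 hex characters no matter how deep we go, instead of growing with each level like the strings do. For a connected graph, a depth equal to the number of nodes is more than enough to cover everything.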
I'll describe the solution for the hash collisions in the next part.