Sunday, July 1, 2018

graph equivalence 6: optimizations

<<Prev Next>>

I've already mentioned some possible optimizations but let's have another look at them.

The first optimization is that whenever a node gets assigned an ID, this ID can also be used as its hash H(0), or more exactly, since IDs start with 0, H(0) = ID+1, to avoid having the hash value of 0. It's not a very good hash but this sidesteps the whole problem of making sure that H(0) of each node with the  assigned ID is unique. It also makes easy the checking that the hashes of the other nodes don't collide with the hashes of the nodes with the assigned IDs, just by checking that their value is greater than N.

The second optimization is something I've already talked about: the simplified computation of H(D+1) that goes like this:

1. Combine S(D) of this node and of all the directly connected nodes to produce S(D+1).
2. If the node has an ID assigned, assume H(D+1) = H(0) and stop. Otherwise continue with the next steps.
3. Include the hash H(0) at distance 0 of the node itself into H(D+1).
4. Iterate through the links connected to this node in the order of their tags (pre-sort them at the start of the algorithm). Within the links of the same tag, combine the previous hashes H(D) of the nodes at the other end of the links in a commutative way, such as by adding them together. Then include the tag of the link and the combined hashes of the nodes into H(D+1).


Skipping the sorting of the nodes on the links with the same tag reduces the time to O(N). Then the main algorithm can be re-shuffled a bit by moving the search for the unique nodes into the outer loop:

1. While there are any nodes with unassigned ID:
  1.1. Repeat for D=0...N+1
    1.1.1. For each node, compute H(D), S(D).
    1.1.2. If all S(D)==S(D-1), break out of the loop (it means that the whole topology is included).
  1.2. Order nodes in the ascending order by (ID, H(D)), the unassigned IDs go after all the assigned IDs.
  1.3. See if any of the nodes with unassigned ID have unique H(D). If any of them do, assign the IDs for them, continuing from the last assigned ID, and going in the order of their H(D).
  1.4.  If there still are any nodes with unassgined ID, take one arbitrary node with the lowest H(D) and assign the next ID in sequence to it.

This way the computation of the loop in 1.1 would take O(N^2),. The sorting in 1.2 would take O(N*log(N)), the iteration in 1.3 will take O(N), and 1.4 is O(1). The largest one there is O(N^2). Then we multiply it by the outer loop executing N times and get O(N^3). So this optimization saved the factor of log(N) at the run time.

The third optimization is kind of straightforward: when sorting the nodes by H(D), there is really no need to include the nodes with the assigned IDs into the sort, they will stay first anyway. It's not a very might optimization but saves a little.

The fourth optimization is that the signatures for the graphs can be computed in parallel before comparison. So if you have a million graphs, that allows for some impressive parallelism.

<<Prev Next>>