Tuesday, September 19, 2017

Reversible computing

I've read an article in IEEE Spectrum about reversible computing: https://spectrum.ieee.org/computing/hardware/the-future-of-computing-depends-on-making-it-reversible

I've never thought about it before: the normal switching in the circuits continuously dissipates energy, and the way to fix that would be to make sure that the energy is always shuffled around and never dissipated (although of course there will always be "friction losses"). I haven't found much other information on the Internet. So I've started thinking about how such reversible computing could be implemented, based on the example from the article, and I think I've come up with a decently simple design on the digital level; I'm not sure why the article says that it's complicated. Of course, it might end up extremely complicated in the physical implementation, about which I know nothing. So, here is what I've come up with:

The first obvious problem was: how do the values (the "billiard balls" from the mechanical analogy) get returned back? That has a kind of obvious fundamental solution: in all digital automata the information fundamentally goes in circles, like this:

          +-----+ 
          |     |---> output   
          |     |   
          |     |    +------+
          |     |    |      |
input --->|logic|--->|memory|--+
          |     |    |      |  |
      +-->|     |    +------+  |
      |   +-----+              |              
      +------------------------+

Well, it mostly goes in circles; there are also inputs and outputs. So the next problem is how to make the inputs and outputs work. Note that the inputs don't come from nowhere, and the outputs don't go into nowhere; instead they connect to other circuits. So the inputs and outputs can also be made in the form of cycles, with an interaction between the output of one circuit and the input of another circuit:


output --->-+ | +-<-- recycle
            | > |
recycle <---+ | +---> input



The output/input interaction can be represented as a function with the inputs:

x - the output value
1 - the constant 1

and outputs:

rx - recycling of the output value
i - the input sent to the circuit
ri - recycling of the input value

The truth table for this function will be:

x  1  |  rx i  ri
==================
0  1  |  0  0  1
1  1  |  1  1  0

As you can see, the number of 1s in the outputs always matches the number of 1s in the inputs. If x=1, the constant 1 gets shipped to the input; otherwise it goes into the recycling and back to the constant.
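To make it concrete, here is a minimal Python sketch of this interaction function (the names are mine), with a check that the packets are conserved:

def io_link(x):
    # The output/input interaction: takes the output value x and a
    # constant 1 (an "energy packet"), and routes them into 3 outputs.
    rx = x       # recycling of the output value, back to its producer
    i = x        # the constant 1 gets shipped as the input when x=1
    ri = 1 - x   # otherwise the constant 1 goes back to be recycled
    return rx, i, ri

# check: the 1s are conserved, exactly as in the truth table above
for x in (0, 1):
    rx, i, ri = io_link(x)
    assert rx + i + ri == x + 1
    print(f"x={x} 1=1  |  rx={rx} i={i} ri={ri}")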

The next problem is the memory: it has to store a value that might be 0 or 1, which is obviously not symmetric. But it looks like the basic solution for that is easy: for each memory cell, keep 2 sub-cells, one of them storing the value and the other one its complement. That way there will always be exactly one packet of energy stored, and the value will be determined by the sub-cell where it's stored. Then the contents of one sub-cell would be sent out for the next step of computation, and the other sub-cell would stay until the new value gets stored. I guess this could also be expressed equivalently as a constant 1 that gets connected to one or another output of the memory unit.
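Here is a small sketch of such a memory cell in Python (the class layout is just my illustration of the idea):

class DualRailCell:
    # One bit stored as 2 complementary sub-cells, so the cell always
    # holds exactly one energy packet; the bit is encoded by which
    # sub-cell the packet sits in.

    def __init__(self, value=0):
        self.sub = [0, 0]
        self.sub[value] = 1   # the single packet goes into sub[value]

    def read(self):
        return self.sub[1]    # packet in sub[1] means the bit is 1

    def write(self, value):
        # Store a new bit: consume one incoming packet for the new
        # value and release the old packet for recycling.
        recycled = 1          # the displaced old packet
        self.sub = [0, 0]
        self.sub[value] = 1
        return recycled

cell = DualRailCell(0)
assert cell.read() == 0 and sum(cell.sub) == 1   # always one packet
cell.write(1)
assert cell.read() == 1 and sum(cell.sub) == 1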

How would the recycling be done? For that we've got to connect the 1s in the results back to where they came from. And ultimately they all come from the constant 1s, either introduced in the computations or used in the logic of the inputs. As far as I can tell, by the time the bits get recycled, they get really mixed up, and there is no easy way to put them back. The only universal way I see is to order all the original "constant 1s" and all the recycled values (the specific order doesn't matter, there just has to be some order) and then do the logic like this:

If the 1st recycled value is 1, put it into the 1st "constant 1".
If the 2nd recycled value is 1 {
  If the 1st "constant 1" is still 0, put the 2nd recycled value into it,
  else put the 2nd recycled value into the 2nd "constant 1".
}
If the 3rd recycled value is 1 {
  If the 1st "constant 1" is still 0, put the 3rd recycled value into it,
  else if the 2nd "constant 1" is still 0, put the 3rd recycled value into it,
  else put the 3rd recycled value into the 3rd "constant 1".
}

and so on. As you can see, the logic can be parallelized to some extent, but it will still require as many steps as there are input values. So for large circuits it becomes quite slow. Maybe this can be optimized in some way for specific applications, but that would be quite difficult. Maybe that's what they mean when they say that the design of the reversible circuits is difficult.
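In code, this cascade amounts to letting each recycled 1 fall into the first still-empty "constant 1" slot. A minimal Python sketch (the function name and the list representation are mine; the sequential loop reflects the step-by-step nature of the logic):

def recycle(recycled):
    # Route the recycled packets back into the ordered "constant 1"
    # slots: each recycled 1 lands in the first slot that is still 0.
    slots = [0] * len(recycled)
    for r in recycled:
        if r == 1:
            slots[slots.index(0)] = 1   # first still-empty slot
    return slots

# 3 packets come back out of 5 possible; they compact into the
# first 3 slots, and the total packet count is conserved
assert recycle([0, 1, 1, 0, 1]) == [1, 1, 1, 0, 0]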

But I think that there is a simple solution: make each individual circuit small. Maybe limit it to 2 inputs and possibly one additional constant 1. That would limit the recycling to 3 steps, which is reasonably fast. Then use the input/output method described above to connect these small circuits into large circuits. Then, as the inputs propagate through the layers of these simple circuits, the early layers would compute their recycling in parallel with that propagation, so overall there would be very little penalty.

Note that there would be no memory between these layers of small circuits; it would be mostly layers of the small circuits, with one memory layer at the end.

The classic digital electronics has implementations of functions like AND and OR with a large number of inputs, not just 2, that get computed in the same amount of time as a function with 2 inputs, which comes in quite handy. This won't be that easy for the reversible circuits. But there is probably no point in trying to do this for the reversible circuits in the first place: even if a function of N arguments were computed in one step, it would still have to do the recycling in N steps. On the other hand, such a function of N arguments can be built as a tree hierarchy of 2-argument functions, of depth log2(N). This tree would be computed in log2(N) steps, and the recycling of the last element would take 5 steps (the recycling of the previous elements could be done in parallel). The number 5 comes from a logical operation on 2 inputs producing 3 outputs to stay reversible, as given in the example in the article (and I suspect that it's a universal property of such operations), plus the 2 inputs having 1 recycling output each. So the total time would be (log2(N)+5), which is better than N. Even if we include the I/O overhead, that would still be (2*log2(N)+5), which is still better than N.
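Putting rough numbers on this cost model (my own accounting, just following the estimates above):

import math

def flat_steps(n):
    return n   # recycling after an n-input function: n steps

def tree_steps(n, with_io=False):
    # depth-log2(n) tree of 2-input functions; the earlier layers
    # recycle in parallel with propagation, leaving 5 steps at the end
    depth = math.ceil(math.log2(n))
    return (2 * depth if with_io else depth) + 5

for n in (8, 64, 1024):
    print(n, flat_steps(n), tree_steps(n), tree_steps(n, with_io=True))
# for 1024 inputs: 1024 steps flat versus 15 (25 with the I/O overhead)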

Thus I think that the basic design of the reversible circuits on the digital level, with the "billiard ball analogs", looks not that difficult. It might be that I've missed something and got it all wrong. And of course, the implementation on the level of the underlying electronic elements like transistors might be very difficult. I'm still trying to figure out how it could be done, and since my knowledge of transistors is fairly limited, it's difficult for me. I have some ideas, but I'd need to think some more about them before writing them down (and they might be completely wrong anyway).

A bit on a side subject, about the logic. In the article they use the example of the function [(NOT A) AND B], which looks kind of unconventional (but maybe they have some good reasons, one of them being the convenience of producing both the result and its complement). Well, there is more than one way to create a set of operations that can represent all the boolean functions. One is the set of (AND, OR, NOT); then (AND-NOT) and (OR-NOT) each do everything in one operation (and [(NOT A) AND B] does that too, given a constant 1: NOT A = [(NOT A) AND 1], and then [A AND B] = [(NOT (NOT A)) AND B]); and yet another set is (AND, XOR, constant 1): XOR with a constant 1 gives NOT, though XOR alone produces only the linear functions, so AND is still needed. The XOR gate looks more straightforward to me; it was the first one I've thought of. And it turns out that XOR is quite popular in reversible computing: the reversible XOR is the CNOT gate, and the 3-bit gate built around it, which mixes in an AND, is the well-known Toffoli gate. But in the end it probably doesn't matter much. Or maybe the best way is to use the set of (AND, OR, NOT), the reason being that I think OR of 2 arguments can get by with only 2 outputs, and so can NOT, with only AND requiring 3 outputs. That means one fewer recycling step after the OR, which means slightly higher performance and about 10% (half of the saved 20%) simpler recycling circuitry, compared to building all the circuitry out of one universal function like (AND-NOT), (OR-NOT), or (XOR).
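These claims are easy to sanity-check by brute force: enumerate a candidate gate's truth table and verify both that it conserves the number of 1s (the packets) and that distinct inputs always produce distinct outputs, which is what reversibility actually requires. A small Python sketch (the particular output layouts of the gates are my own guesses, not taken from the article):

from itertools import product

def check(gate, n_inputs):
    # Return (conserves the 1s, invertible) for a gate given as a
    # function from input bits to a tuple of output bits.
    seen = set()
    conserves = invertible = True
    for ins in product((0, 1), repeat=n_inputs):
        outs = gate(*ins)
        if sum(outs) != sum(ins):
            conserves = False
        if outs in seen:
            invertible = False
        seen.add(outs)
    return conserves, invertible

# [(NOT A) AND B] embedded as 2 inputs -> 3 outputs, with A and
# [A AND B] kept as the extra outputs for recycling
def and_not(a, b):
    return ((1 - a) & b, a, a & b)

# a 2-output OR candidate: [A OR B] with [A AND B] as the extra output
def or2(a, b):
    return (a | b, a & b)

print(check(and_not, 2))   # (True, True): conserving and invertible
print(check(or2, 2))       # (True, False): conserving, not invertible

Interestingly, the 2-output OR does conserve the packets, but the inputs (0,1) and (1,0) produce identical outputs, so it is not strictly invertible; if the energy argument requires full invertibility, OR may need a 3rd output after all, which would eat into the savings estimated above.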
