Wednesday, February 22, 2017

a better explanation of Bayes-AdaBoost connection

After some re-reading, my previous posts on the Bayesian interpretation of AdaBoost look kind of cryptic. It's no wonder, because I was writing them as I was understanding things, so they came out more as a personal reminder than as a general explanation. But I've finally got around to writing another version as a whitepaper that goes all the way from scratch and hopefully is much more understandable:

https://sourceforge.net/p/exbayes/code/HEAD/tree/whitepapers/AdaboostBayes.pdf?format=raw

Monday, February 20, 2017

Bayesian examples now on SourceForge

I've been working on one more post on the Bayes-AdaBoost cross-over, and I've finally got tired of pasting the substantial examples right in the post. So I've created a SourceForge project for them and uploaded all the examples there:

https://sourceforge.net/p/exbayes/code/HEAD/tree/v1/

The proper SVN checkout recipe from there is:

svn checkout svn://svn.code.sf.net/p/exbayes/code/ exbayes

Saturday, February 4, 2017

The mythical 10x programmer, or the silver bullet explained

Recently I was talking about some of my experiences from the Aleri times when my interlocutor responded, "so I guess you're one of those 10x programmers". This made me pause. Like any programmer per Larry Wall's recipe, I have a good amount of hubris, but can I really say that I'm a 10x programmer? After some thinking, I guess sometimes I am and sometimes I'm not. This got me thinking further about what was different between the situations when I was and when I wasn't. And I think the difference is very much about the organization and methodology, and thus can probably be applied to any programmer to improve their productivity. I've tried to distill the knowledge, and here it goes. By the way, Fred Brooks also called the goal of a 10x productivity increase the "silver bullet", so here is maybe your silver bullet. Maybe. I'm not entirely sure that the recipe will work out of the box, it will probably need at least some debugging. Or maybe it won't work but it's still worth a try.

Let me start by describing a couple of cases when I could fairly be said to have been a 10x programmer. There were some other good cases too, but on these couple of occasions I actually had a benchmark to compare against.

One case was at Aleri, working on the CEP system. Here is the real short version of the events: at the time there were 3 main players in the CEP world: StreamBase, Aleri and Coral8, all having an approximate parity in their products. Around 2009 Coral8 apparently ran out of money, so Aleri used this opportunity to consume it. And then it had two approximately equal products and set about merging them. The merging didn't go that great, and Aleri ran out of money too, eventually getting consumed in 2010 by Sybase. See, even though the Coral8 product was about the same from the functional standpoint, it had a code base about 10x the size, and maybe not quite 10x but a lot more people working on it. Since the Coral8 code base was so much larger and there were many more people familiar with it, it was taken as the foundation of the merged product. And this merging then crashed and burned. Another merging attempt was then done under Sybase, this time the other way around, and it worked much better. So it's an interesting example where the 10x was there, then it wasn't, and then it was again.

Another case was UnitedLinux. The full history of UnitedLinux is an interesting story of dog-eat-dog in the world of the Linux vendors but I won't go into the full funny details, just touch on the important parts. One of the conditions of the UnitedLinux consortium set by SuSE was that SuSE does all the general development. Everyone else does just their branding customizations and additional packages. So Caldera, where I worked at the time (soon to be renamed SCO, with the SCO Linux brand of UnitedLinux), had closed its Linux development office in Germany, with most of the people heading to SuSE. When UnitedLinux was released, the various consortium members (I remember Conectiva, and whoever else) held their release parties and posted pictures with a great many people. We didn't have any party at SCO: see, the SCO Linux was done by only two developers and half a tester (and a bit of a graphical artist), and Ron was in California while I was in New Jersey, so having a party would have been difficult. And we had it ready earlier: the final image was sent to testing about 20 minutes after downloading the drop from SuSE. I think I still have the commemorative t-shirt somewhere; one thing Caldera was big on was the t-shirts.

So what was special in these cases? In a very short summary, I'd say that it was the compact code done with the familiar technologies, incrementally, with low interdependencies, and with a high level of script automation. Oh, and you need to have the good engineers as well. But just the good engineers aren't enough: I wouldn't say that the engineers at Coral8 were any worse than at Aleri, they just got mired in the slow methodologies.

Let's look at the details.

The compact code pretty much means the straightforward and to-the-point code. There is a classic book "The Practice of Programming" (and I liked it so much that I've patterned the name of my book after it) that teaches exactly this: when you have a problem, solve it in the most straightforward way. There are lots of published systems of various code decorations, and all of them are wrong. The decorations don't make the code better, they make it worse, more difficult to read and modify. The extra layers of abstractions don't help anything either, they make the code more difficult to both understand and modify (not to mention writing it in the first place). There is an article about "worse is better" written by Richard Gabriel of the Lisp world when he discovered that the simpler Unix code worked better than his "better" over-abstracted code. But this phrase is misleading, better is better, it's just that the over-abstractions are worse, not better. This absolutely doesn't mean that the code should be flat abstraction-wise. The code that is too flat is also an unmaintainable spaghetti mess. The abstraction layers are needed. But they have to be introduced for a good reason, not on a whim and not by the bureaucratic rules. The good functions are usually pretty large. The good classes are usually pretty large. The good rule of thumb for making a chunk of code into a new abstraction is to notice when you do the same thing over and over again. Then it's time to put this code into one place and call it from the other places. A good abstraction reduces the size of code, does not increase it.
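To make the rule of thumb concrete, here is a minimal made-up C++ sketch (the helper name and the file-reading chore are my illustration, not code from any project mentioned here): once the same open-check-read dance shows up in a third place, it gets pulled into one function, and the total amount of code shrinks.

#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Before, every caller repeated the same open-check-read sequence inline.
// After noticing the repetition, it lives in one place and gets called
// from the other places; the abstraction earns its keep by reducing code.
std::string readFileOrThrow(const std::string &path)
{
    std::ifstream in(path);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}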

The problem with the overly-abstracted code is really the same as with the  under-abstracted flat code: there are only so many things you can keep in your head at a time. In the code that is too flat you need to keep too many interdependencies of that flat level in your head. In the code that is too abstracted you need to keep too many abstractions and their APIs and calling patterns in your head. The optimal spot is in the middle.

A lot of things are easier to just do than to use a library. If you do the logging from a shell script, a 10-line shell function will not only work much better than a shell version of log4j but will also be a lot easier to call (not to mention avoiding an extra dependency). Or iterating over a container with a plain for-loop is not only easier to write and understand but also much easier to remember than the STL template machinery for that.
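As a rough C++ sketch of the container point (my example, written in the 2017-era style the post implies): the plain indexed loop is trivial to write and remember, while the explicit iterator spelling is the kind of thing you have to look up.

#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> v = {1, 2, 3, 4};

    // The library way: spelling out the iterator type from memory.
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        printf("%d\n", *it);

    // The plain way: an indexed loop that anyone can write and read.
    for (size_t i = 0; i < v.size(); i++)
        printf("%d\n", v[i]);

    return 0;
}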

An important thing is to keep the documentation close to the code. This means both the good comments in the code and the user documentation.

The basic rule behind the good comments is very simple: the comments must be one (or sometimes more) step more abstract than the code. A comment should explain either why something is done (or not done), especially if the reasoning is not obvious, or tell what is done, as a short summary in the higher-level terms. As long as this rule is followed, the more comments, pretty much the better (but there is only so much you can write without breaking this rule).
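Here is a small invented C++ fragment of what that rule looks like in practice: the function comment summarizes the "what" one step above the code, and the inline comment explains a "why" that isn't obvious from the code itself.

#include <vector>

// A made-up row type, just enough for the illustration.
struct Row {
    int id;
    int version;
    bool deleted;
};

// Collect the ids of the rows changed since the last pass, so that only
// they get re-sent downstream. (The "what", one step above the code.)
std::vector<int> collectChangedIds(const std::vector<Row> &rows, int lastVersion)
{
    std::vector<int> ids;
    for (size_t i = 0; i < rows.size(); i++) {
        // Deleted rows get their version bumped too and are included on
        // purpose: the consumer needs to see the deletions. (The "why",
        // explaining a decision that isn't obvious from the code.)
        if (rows[i].version > lastVersion)
            ids.push_back(rows[i].id);
    }
    return ids;
}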

Keeping the user documentation together with the code, in the same versioning system and in the same tree hierarchy, is very important. This way the documentation can be updated in parallel with the code and will never become obsolete. And at the end of the release cycle all the changes can be found and re-edited. This principle implies that the documentation must be kept in a text-based format that is easy to diff, just as the code is, and that the tools used on it must not change that exact text format at random. Which generally means that the editing should be done with a normal text editor, and the WYSIWYG tools should go into the garbage bucket. The pain inflicted by them is much greater than the gain. The ease of versioning and diffing is much more important than the ease of visual formatting. While on the subject of documentation, the technical writers need to either really be technical or be kept in check. The precise meaning of documentation is important, and I've seen too often an edit that was meant to make a sentence smoother also make it ambiguous or entirely meaningless. And "smoother" is not always "better" either. I've bought a book of Richard Feynman's letters, more for the collection purposes since the letters are usually fairly boring, but unexpectedly it also turned out to be great reading. Except for the part where it included the magazine articles about his Nobel prize. The style of these articles was completely different from Feynman's, it was the usual unreadable journalist drivel. Don't turn your documentation into the journalist drivel.

To give another example of where I think smooth is bad, the sentence above "An important thing is to keep the documentation close to the code" is better in that place than "Keeping the documentation close to the code is important". The complex sentence slows you down and gives you a breather before the text switches to a new subject, emphasizing that switch. Also, it brings the word "important" closer to the start of the sentence where it's more noticeable.

Next principle, the familiar technologies, affects the speed of development a lot. It's kind of obvious but the counter-point is "if we use that latest and greatest new tool, we'll be done much faster". And sometimes the latest and greatest tool really does that. But most of the time the familiarity is much more important. If you have an idea of how the goal can be straightforwardly achieved with the familiar tools, that's usually the best bet by a wide margin. If not, learn the new tool. Sometimes this would also give you an idea of how the goal can be achieved easier with your old tools. And a lot of technologies are just crap anyway. So the overall priority should be not "the goal for the tool" but "the tool for the goal".

This doesn't mean that the new tools shouldn't be learned or that everyone has to be familiar with every tool to be used. Usually one person having the familiarity with a tool is enough to frame the code around it, giving everyone else a big head start in learning it. It's much easier to add code to an existing unfamiliar framework than to build a new framework with an unfamiliar tool. And there are the times to use the familiar tools and times to learn the new ones. There is the cadence of development when the things start amorphous and slow, and then as the solutions for various items start falling together the project starts crystallizing and picking up the pace until it rolls at full steam to the goal. Then the next cycle starts. The slow part of the cycle is a good time to learn the new tools. But things tend to go a lot faster when only one thing is new: a new tool for an old purpose or an old tool for a new purpose. Of course sometimes a bullet has to be bitten and a new tool learned together with the new purpose but that goes slower.


On to the incrementality. It's nothing new, Eric Raymond wrote about it in "The Cathedral and the Bazaar". But it still gets lost surprisingly often, even in the "agile methodologies". Instead of "we'll make this great new thing to replace that dingy old thing" the right approach is "we'll transform this dingy old thing into this great new thing". The programs are not physical objects, they're more like blueprints for the physical objects. It's not like you have to raze a bridge to build a new bridge in its place. Instead, a program's run is more like a physical object: every time you restart a program, an old bridge gets razed and a new one gets built for you according to the new plan. Or I guess another good example from the world of the physical things is the automobile manufacturing: the car models might stay but their internal mechanics are updated pretty regularly. At some point the cars would look the same but the new cars from that point on would get, say, a better alternator. The Chevy small-block V8 engine has been around for more than 50 years. The only item that has stayed the same for all this time is the dimensions of the cylinders. Everything else has been replaced. But it hasn't been replaced in one go, it has been a product of a long evolution with many gradual small changes.

Every time you hear about an order-of-magnitude budget and time overrun, it's when someone tried to re-do a program from scratch. Fred Brooks talks about this in his "Mythical Man-Month", so the effect has been known since at least the 1970s, and yet it keeps being repeated. And guess what, that's not limited to software either. Have you ever heard of Tucker? It was a new automobile company in the 1940s that decided to design its car completely from scratch, with all the newest and greatest technology. They never got the cars working, they kept falling apart until the company ran out of money, and its founder got sued by the government for all the dubious financial activities he'd done to delay the running out of money. Here someone might ask "but what about the iPhone, what about Tesla?" They are actually examples of evolutionary development. The iPhone was built around the clever idea of a big-screen phone used as an application platform but it wasn't developed all at once, and Apple didn't try to reinvent the cell communications chipsets at the same time. Some of the evolutionary moments of the iPhone are widely known: such as, they started with a stylus, as was then typical, and then Steve Jobs said "screw this, use the fingers instead". And the last-moment decision to replace the plastic screens with glass has been a subject of many moralizing articles. Tesla is best seen in contrast to GM's EV1. GM tried to develop the whole technology chain from scratch, spent a huge amount of time and money, and failed. While Tesla piggybacked on the already existing technologies and packaged them together into a car (one that they also took off Lotus's shelf). Even then Tesla's path wasn't easy, they've had their Tucker-esque moments with the failing transmissions (which they eventually simply got rid of) and with the battery cooling for a few years before they were able to release their first car.

One of the worst statements I've heard about why a system needed to be rewritten from scratch went like this: "it has accumulated all these little features that make changing it without breaking something difficult, so we have to re-write everything to get rid of them and make the changes simple again". It completely misses the point: it's these little features that make the product worthwhile. Doing 80% of the functionality is easy but nobody needs only 80% of the functionality (and in reality that particular project overran by years). The code is a living document that preserves the knowledge of the subject area. Discarding the code means discarding this knowledge, which then has to be re-learned. This is not to say that throwing away the old code and re-writing it from scratch is never the best solution. But even then it's usually done by replacing the code in small chunks and checking that the functionality is preserved after the move of every chunk. To give an analogy from biology, the human (and animal) organs develop by growing the new cells within a scaffolding of the other neighboring cells that shapes them and makes them properly differentiate for their function. The old code provides such a scaffolding for the new chunks.

I want to illustrate this last point with a couple of examples. I'll leave the company and all the details in the first example unnamed, since the example is kind of embarrassing. This company decided to make an appliance with its software. It had all the software bits and pieces that it had been selling to the customers, so the only thing that needed to be done was to put it all together, like the customers do. Simple, right? Turns out, no: the bits and pieces would not work together, and no customer had actually put them all together because they didn't work. Each piece worked in a way, so each responsible team was happy, but they didn't work together. And getting them to work together took a large amount of time and work. Another similar example comes from Microsoft, where I've seen a very similar issue with the PowerShell support in the Windows clusters: there are many components involved and each one has its own PowerShell commandlets for management, but they don't mesh well together with each other, need to be combined in very non-obvious ways with no big goal-oriented commandlets, and some things turn out to be not really doable at all.

The obvious objection is "but how do we test all these changes?" This is where the automated testing comes in, and it's such a big subject that I wrote a whole separate post about it. But it's really a bogus question. When you write a completely new system, you still test each chunk of code as you write it, and if you don't have the automated testing then you just test it manually. The difference is that in an incomplete system your chunk will have only the limited interactions, so you would be able to test only some of its effects. In a complete system you get a much more complete coverage. And if the old system doesn't have any automated tests then this is a good way to start them: for each chunk, write the tests before replacing it, and then use them to check that the replaced chunk still works in the same way. Another important point: a chunk doesn't have to be all in one place, it might be distributed, such as a one-line change in a thousand files.
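A hedged sketch of that "write the tests before replacing a chunk" idea in C++, with invented names and no particular test framework: the behavior of the old chunk gets pinned down first, and the replacement has to match it.

#include <cassert>
#include <cstdio>
#include <string>

// The old chunk about to be replaced: a hypothetical legacy price formatter.
std::string formatPriceOld(int cents)
{
    return std::to_string(cents / 100) + "."
        + (cents % 100 < 10 ? "0" : "") + std::to_string(cents % 100);
}

// The new chunk that must behave identically.
std::string formatPriceNew(int cents)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "%d.%02d", cents / 100, cents % 100);
    return buf;
}

int main()
{
    // Characterization tests: capture what the old code does, then check
    // that the replacement does exactly the same thing.
    int samples[] = {0, 5, 99, 100, 101, 12345};
    for (int cents : samples)
        assert(formatPriceOld(cents) == formatPriceNew(cents));
    printf("the replaced chunk matches the old behavior\n");
    return 0;
}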

The small changes don't have to go together with the frequent releases. But every time you do a release, one way or another you get the issue of "we have to complete all the changes before the release!", and it gets worse if you want to do more frequent releases. Well, you definitely do want to make sure that things don't get accidentally broken at the last moment, so the commits have to slow down just before the release. But it doesn't mean that you can't include the changes at all. Any change is fine to include as long as it doesn't break the existing functionality. The new functionality doesn't have to work; as long as it doesn't get called in the production environment nobody cares about that, it only has to not break the old functionality. This way you can integrate the changes a lot earlier, and then spend a lot less effort merging them together.
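A minimal sketch of that "integrate early but don't call it in production" approach, in C++ with an invented environment-variable flag (not a description of any specific system):

#include <cstdio>
#include <cstdlib>

// A hypothetical switch: off by default, turned on only where the new
// code path is being tried out, so production keeps the old behavior.
static bool newPricingEnabled()
{
    const char *v = getenv("ENABLE_NEW_PRICING");
    return v != nullptr && *v == '1';
}

static int computePriceOld(int quantity) { return quantity * 100; }
static int computePriceNew(int quantity) { return quantity * 95; /* still being finished */ }

static int computePrice(int quantity)
{
    // The new functionality is merged early but never called unless the
    // flag is set, so it can't break the old functionality meanwhile.
    return newPricingEnabled() ? computePriceNew(quantity)
                               : computePriceOld(quantity);
}

int main()
{
    printf("%d\n", computePrice(3)); // prints 300 unless the flag is set
    return 0;
}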

In the post on testing I've already said that an important part of the incrementality is the incremental tests: the ability to easily run the immediate subset of tests affected by a change. But a prerequisite for it is the incrementality of the build: the ability to quickly build all the components dependent on the change, all the way to the final package, while skipping the build of the unchanged parts of the tree. "Quickly" here means right away, not in some kind of a nightly build. This does wonders for the turnaround time. Obviously, in some cases, like a fundamental library, pretty much all of the tree will depend on it. But then the incrementality should be interpreted as the ability to build a few of the representative projects based on it, to test that they didn't break. If the full nightly test shows that some dependent components got broken, the ability to rebuild them quickly also does wonders for the turnaround of the investigation.

Another thing about the build is that I think the purely rules-based builds are bad. It's not that having the common rules is bad, they're actually quite convenient. But the build system must also allow the quick one-offs. These one-offs are very important for the automation of the build. They allow you to quickly write, say, a small Perl script that would do some kind of transformation or code generation. A build system without such one-offs turns the hour-long tasks into the week-long ones. So just please do support the full proper make and the scripting environment, not some crap like CMake. The counter-argument is of course "but we need to build on multiple platforms including Windows". Yeah, building on Windows sucks, and the only decent solution I see is to bring a decent environment of portable tools to each of your build platforms.

Another very simple but important thing about the build is that it must detect the errors. The Java-based tools in particular are notorious for ignoring the status codes, so when something breaks, the only way you know it is by the result files being absent. Or not even by them being absent but by them failing in the tests. This is bad. All the errors have to be detected as early as possible.
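For contrast, a tiny C++ sketch of the "detect the errors as early as possible" attitude (the commands here are hypothetical): every step's status code gets checked, and a failure stops the build loudly right away instead of leaving absent files to be discovered later.

#include <cstdio>
#include <cstdlib>

// Run one build step and fail loudly right away on a non-zero status,
// instead of silently carrying on and producing absent result files.
static void runOrDie(const char *cmd)
{
    fprintf(stderr, "+ %s\n", cmd);
    int status = system(cmd);
    if (status != 0) {
        fprintf(stderr, "FAILED (status %d): %s\n", status, cmd);
        exit(1);
    }
}

int main()
{
    runOrDie("cc -c foo.c -o foo.o"); // hypothetical build steps
    runOrDie("cc foo.o -o foo");
    return 0;
}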

This looks like a good place for another related rant, on reliability. Basically, when something works, it should work reliably. As the saying commonly attributed to Einstein goes, "The definition of insanity is doing the same thing over and over again and expecting different results." But in reality we don't control everything, and when these hidden changes cause a different result of the same actions, it's just maddening and time-consuming. The good code, including the tests and the build, should work reliably. If the underlying systems are known to have outages, it should retry and still work reliably on top of them, and not propagate this unreliability up. Every step the unreliability propagates makes it more difficult to fix, so it's best nipped in the bud. There are of course the reasonable limits too: the calls should not get stuck retrying forever, and probably not even for an hour. Although retrying for an hour might be fine for a scheduled nightly run, for a manual run it's way too long. And when a program is retrying it should not be silent, it should be telling what it is doing.
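And a small sketch of what "retry, but within limits and not silently" might look like in C++ (the operation, the bound, and the backoff are arbitrary placeholders):

#include <chrono>
#include <cstdio>
#include <thread>

// A stand-in for a call into a flaky underlying system.
static bool tryFetch()
{
    return false; // pretend it keeps failing, for the sake of the example
}

static bool fetchWithRetries()
{
    const int maxAttempts = 5; // a reasonable bound, not forever
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        if (tryFetch())
            return true;
        if (attempt < maxAttempts) {
            // Don't propagate the flakiness up, but don't be silent either:
            // tell what is being done while retrying.
            fprintf(stderr, "fetch failed, attempt %d of %d, retrying in %d s\n",
                    attempt, maxAttempts, attempt);
            std::this_thread::sleep_for(std::chrono::seconds(attempt));
        }
    }
    return false; // give up after the bound and let the caller decide
}

int main()
{
    if (!fetchWithRetries())
        fprintf(stderr, "fetch did not succeed after retries\n");
    return 0;
}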

Next item is the low interdependencies. Basically, it means that the people should be able to work without stepping on each other too much. You might ask: but doesn't the principle of the small code base go against it? More code should allow more people to work on its parts, right? No, it doesn't work like this. The catch is that if you have more code solving the same problem, any given change still touches about the same percentage of that code. This is why a sprawling code base is slow in development, because any change requires changing a lot of code. But this is also why it doesn't allow more people to work on it in parallel. So the corollary is that if you add too many people to a project, they will start stepping on each other a lot and the productivity will drop.

The other major kind of interaction is between the people in general. If you need to do a 5-minute task yourself, it takes you 5 minutes. But if you need someone else to do a 5-minute task, it's likely to take you a day of interaction. Which still might be faster than learning how to do it yourself, but if the task is repetitive then this becomes a major drag. To work fast, this drag has to be reduced and eliminated whenever possible. One of the consequences is that the possessive code ownership is bad. Now, having someone knowledgeable available to talk about a part of the code is good, and it also helps to discuss the general directions of the code to avoid breaking things for other people. But there should not be a single-person or single-team bottleneck for changes to a certain part of the code, nor any single person controlling what is allowed to go in. There are other reasons to avoid these things too but even the speed of development should be a sufficient reason. Another corollary is that the strict code style requirements are bad. The small style variations just don't matter, and the enforcement is a lot of pain with no gain whatsoever.

Another case where the excessive interaction shows is planning. Have you seen these beautiful Gantt charts and critical path diagrams, with the carefully planned moments of interdependencies? Well, they're all garbage. They might be an interesting exercise on the way to the better understanding of the project but things never work out like this in reality. In reality all the interdependency points end up as a mess. So there is no point in overplanning. Instead I really like the saying of my past manager: "We can plan an aggressive schedule because we don't mind it slipping once in a while". When combined with the idea of the incrementality, this points to the following concept: the detail level of the plans must be tied to their distance. A detailed planning of the far-off items doesn't make sense because they are sure to change before we get there. These items should be on the list, so that we can keep track of the direction, but only in the general terms. Only the very near-term items need to be planned in detail, and the farther in the future the less detail is needed. Though the very near-term items have their Heisenbergian uncertainty as well. In my experience the minimal unit of time for planning is a week. If you have five items that you expect to take one day each, they will never work out like this. Some of them will take an hour and some will take three days. But if you lump all five together, they will take together a week with a decent amount of certainty.

Since I've already talked about the scripted automation throughout the text, this is a wrap for this post. The things I've described look kind of small but they really are important, and I believe that taken together they do amount to the 10x difference in productivity. What do you think?

P.S. Someone else's thoughts on the subject of 10x: http://antirez.com/news/112