This year my daughter was taking a college class on optimization, so I've had a look too. No, it's not about software optimization, it's about optimization as a math discipline. And I've learned a good deal of new things that I want to share (and also write down a summary for myself, before I forget too much of it). The reason why you'd want to care about it is that it has an application to neural networks: the gradient descent algorithm used for their training is one of the optimization methods, and there are interesting consequences to think about. I'll get to that part eventually (it will take multiple posts), but first things first.
They've started with the "linear programming" problems. I've been thinking "oh, I know this stuff, it's all recursion + memoization". But no, it turns out that they use this name for something completely different. In programming we use this name for the discrete problems where we need to find the best-fitting combination of discrete units - not much programming there, but at least some in all that recursive trying. What they do in optimization is call "linear programming" the continuous (analog) problems where the expressions are linear.
So you have a number of variables - call them x, y, z, ..., or x_1, x_2, x_3, ..., or a vector x (this subject is rife with vectors and matrices). And you have a function of them that you want to maximize or minimize. A linear function (that's the source of "linear" in "linear programming") would be:
c_1 * x_1 + c_2 * x_2 + c_3 * x_3 + ...
Well, the solution would seem easy: to minimize it, send all the values of x to minus infinity (or, for those with c < 0, to plus infinity), right? But the problems also come with conditions on x, given as inequalities, which in "linear programming" are also linear expressions. A simple example would be
x_1 >= 0
or a more complex example
3 * x_1 - 5 * x_2 <= 5
So instead of going to infinity, we'll hit the boundaries of conditions, and the problem is pretty much about which boundaries will be most restrictive.
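To make this concrete, here is a minimal sketch of solving such a problem with SciPy's LP solver (the specific numbers are made up for illustration, they're not from the class):

from scipy.optimize import linprog

# The objective: minimize c . x. To maximize instead, negate c;
# here we effectively maximize x_1 + 2*x_2.
c = [-1, -2]

# The conditions, in the form A_ub . x <= b_ub.
A_ub = [
    [1, 1],    # x_1 + x_2 <= 4
    [3, -5],   # 3*x_1 - 5*x_2 <= 5
]
b_ub = [4, 5]

# x_1 >= 0, x_2 >= 0 (these also happen to be linprog's defaults).
bounds = [(0, None), (0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x, res.fun)  # the optimal corner and the objective value there

The solver essentially figures out which of the boundaries end up being the restrictive ones.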
In visual terms, the conditions form a polyhedral shape (which might be enclosed or open on some side). In the 2-dimensional case (with only x_1 and x_2) we can represent it as a fence standing on the plane, and then the function will be some inclined plane. If we build a model of this out of plywood and let a ball roll on the plywood sheet that represents the inclined plane of the function, the ball would stop against the fence at the lowest point and solve the minimization problem (BTW, the minimization and maximization problems can be turned into each other by simply negating the objective function; and if you negate a condition's coefficients, you also turn around its "less" or "greater" sign to keep it equivalent). There would always be a corner (or maybe a whole edge) that is at the lowest level. Or if the fence happens to be open at the lowest point, the ball would roll out to "minus infinity" - that's called an "unbounded problem". It could also be that the conditions contradict each other, and when we try to build the fence, we could not do it - that's a problem with no solution (an "infeasible" one). But since each condition cuts the plane along an infinitely long straight line, the fenced area cut off by all these lines is guaranteed to be convex (a "convex polyhedron"), with no strangely-shaped nooks and crannies.
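Just to see how these two failure modes look to a solver, here is another small sketch (again with made-up numbers): one unbounded problem and one with contradictory conditions, and how linprog reports them through its status code:

from scipy.optimize import linprog

# Unbounded: minimize -x_1 with only x_1 >= 0 as a condition,
# so the ball rolls out to "minus infinity" of the objective.
res = linprog([-1], bounds=[(0, None)])
print(res.status)  # 3 means "unbounded"

# Contradictory conditions: x_1 <= -1 together with x_1 >= 0,
# the fence cannot be built at all.
res = linprog([1], A_ub=[[1]], b_ub=[-1], bounds=[(0, None)])
print(res.status)  # 2 means "infeasible"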
In the full multi-dimensional form the conditions represent hyperplanes in that multidimensional space that cut and split the space, forming a multidimensional "fence". This fence would still be a convex polyhedron, and the solution would be at the point of this "fence" that sits lowest along the direction of the objective function.
For solving this, they have two slightly different "standard" forms of presenting these equations: one is called the "standard form" and the other the "canonical form". I guess two different people approached these problems differently, so now there are forms for both approaches, but the problems can be converted easily between them.
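As an example of the kind of conversion involved (a common one, though I won't vouch that it matches the exact definitions from the class): an inequality can be turned into an equality by adding a non-negative "slack" variable, so A*x <= b becomes A*x + s = b with s >= 0. A small sketch:

import numpy as np

def add_slacks(A_ub, b_ub):
    # Turn A_ub . x <= b_ub into A_eq . [x, s] == b_eq, with one
    # slack variable s_i >= 0 per inequality (an identity block).
    A_ub = np.asarray(A_ub, dtype=float)
    m = A_ub.shape[0]
    A_eq = np.hstack([A_ub, np.eye(m)])
    return A_eq, np.asarray(b_ub, dtype=float)

A_eq, b_eq = add_slacks([[1, 1], [3, -5]], [4, 5])
print(A_eq)
# [[ 1.  1.  1.  0.]
#  [ 3. -5.  0.  1.]]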