IEEE Spectrum printed an issue on AI, and in it they described a way to teach old neural networks new tricks without making them forget the old ones: when training the network on a data set for a new kind of classification, the more important connections get marked as immutable.
Which looks like a generalization of the technique of "emergents" (see https://babkin-cep.blogspot.com/2017/06/emergents.html): there everything below the top 2 layers is declared immutable, and the top 2 layers are re-trained for the new classification from scratch. The idea being that the first training taught the network which details are important, and the re-training then assembles the new classifier from these details.
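The "emergents" re-training can be sketched roughly like this (a toy stack of weight matrices, shapes and training loop purely hypothetical): the layers below the top 2 are kept frozen, while the top 2 are re-initialized and would be trained from scratch on the new data.

```python
import numpy as np

# A toy 4-layer network as a list of weight matrices (shapes hypothetical).
layers = [np.ones((3, 3)) * i for i in range(1, 5)]

# Everything below the top 2 layers is declared immutable.
frozen = layers[:-2]
snapshots = [w.copy() for w in frozen]  # keep copies to verify immutability

rng = np.random.default_rng(0)
for i in (len(layers) - 2, len(layers) - 1):  # the top 2 layers only
    # Re-initialize from scratch; the actual training on the new
    # classification data set would then update only these two layers.
    layers[i] = rng.standard_normal(layers[i].shape)

# The frozen lower layers (the "emergents") are untouched.
assert all(np.array_equal(w, s) for w, s in zip(frozen, snapshots))
```

The point of the sketch is only the split: the lower layers that extract the important details stay fixed, and a fresh classifier is built on top of them.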
But the generalization goes further: it can look at the weights of the connections, and if a weight is close to 0 (on the scale of [-1, 1]), that connection can't be important, so it's a candidate for re-training, letting the model learn the new details too. They say in the article that the old classifier degrades somewhat after this re-training, but that's not surprising: the set of "emergents" has changed during the re-training while the old classifier section still expects the old set. It would make sense to then freeze the network below the top 2 layers (the "new emergents") and do another round of training on the old classifier part. The newly learned emergents might even improve the classification of the previously learned categories. They do talk in the article about doing a periodic re-training with the previously trained subsets of data, but not quite in the same context.
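The selection of the re-trainable connections can be sketched as a simple gradient mask (the weights, threshold, and gradient here are made-up illustration values, not from the article): connections with weight near 0 are the only ones updated, while the important, large-weight connections stay frozen.

```python
import numpy as np

# Hypothetical trained weights on a [-1, 1] scale.
w = np.array([[0.9, -0.05, 0.4],
              [0.01, -0.7, 0.1]])
w_before = w.copy()

threshold = 0.2                       # illustration value, not from the article
retrainable = np.abs(w) < threshold   # near-zero: can't be important, re-train
frozen = ~retrainable                 # important connections stay immutable

# Stand-in gradient from training on the new data set.
grad = np.full_like(w, 0.05)
lr = 0.1
w -= lr * grad * retrainable          # masked update: frozen weights unchanged

assert np.allclose(w[frozen], w_before[frozen])   # important ones intact
assert not np.allclose(w[retrainable], w_before[retrainable])
```

This is the same freezing idea as in the "emergents" technique, just applied per connection by weight magnitude instead of per layer.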