optimize_for_inference isn't a function that is usually presented in beginner
materials for PyTorch. I just read about it in Aurélien Géron's new edition of
his Hands-On ML book. This edition (considered a 1st edition) focuses on
PyTorch in Part II. I loved all his previous editions because he really takes
beginners on a deep dive into DL and generously sprinkles links to
recent/important papers on each of the topics. In this case, he explains that
the function optimize_for_inference fuses batch normalization layers with the
preceding linear layers.
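
Here's a minimal sketch of what using it looks like. The entry point is torch.jit.optimize_for_inference, which expects a scripted module in eval mode; the toy model, input shapes, and tolerance below are my own illustrations, not from the book:

```python
import torch
import torch.nn as nn

# Toy conv + batch norm model; eval mode so BN uses its running statistics
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
).eval()

scripted = torch.jit.script(model)
# Freezes the module and applies inference-oriented rewrites,
# including folding the BatchNorm2d into the Conv2d
optimized = torch.jit.optimize_for_inference(scripted)

x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    # The fused module should be numerically equivalent (up to float error)
    print(torch.allclose(model(x), optimized(x), atol=1e-5))
```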
Batch normalization will learn $\gamma$ and $\beta$, which will scale and shift the inputs to the BN layer after normalizing them (using the running mean $\mu$ and standard deviation $\sigma$). So, if the previous layer computes $\mathbf{X}\mathbf{W} + \mathbf{b}$, then the BN layer will compute $\gamma \otimes (\mathbf{X}\mathbf{W} + \mathbf{b} - \mu)/\sigma + \beta$. The BN layer can be eliminated if we let $\mathbf{W}' = \gamma \otimes \mathbf{W}/\sigma$ and $\mathbf{b}' = \gamma \otimes (\mathbf{b} - \mu)/\sigma + \beta$, so that the previous layer directly computes $\mathbf{X}\mathbf{W}' + \mathbf{b}'$.
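
To make that concrete, here is a small sketch that folds a BatchNorm1d into a preceding Linear layer by hand using exactly those substitutions, then checks the outputs match. The random statistics and the tolerance are illustrative choices of mine:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(4, 3)
bn = nn.BatchNorm1d(3).eval()  # eval mode: normalize with running stats

# Give the BN layer non-trivial statistics and affine parameters
bn.running_mean = torch.randn(3)      # mu
bn.running_var = torch.rand(3) + 0.5  # sigma^2
bn.weight.data = torch.randn(3)       # gamma
bn.bias.data = torch.randn(3)         # beta

# W' = gamma * W / sigma,  b' = gamma * (b - mu) / sigma + beta
# (sigma includes BN's small eps smoothing term)
sigma = torch.sqrt(bn.running_var + bn.eps)
fused = nn.Linear(4, 3)
fused.weight.data = lin.weight.data * (bn.weight.data / sigma).unsqueeze(1)
fused.bias.data = (
    bn.weight.data * (lin.bias.data - bn.running_mean) / sigma + bn.bias.data
)

x = torch.randn(8, 4)
with torch.no_grad():
    print(torch.allclose(bn(lin(x)), fused(x), atol=1e-5))  # True
```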
It appears that fusion with CNN layers has been a well-established technique for speeding up inference, and it's even possible to reduce the memory requirements during training by recognizing that fusion will occur. A sketch of the conv case follows.
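
For that conv case, PyTorch itself carries a small utility that performs the folding on an eval-mode conv/BN pair. A sketch using torch.nn.utils.fusion.fuse_conv_bn_eval, with toy layers of my own choosing:

```python
import torch
import torch.nn as nn
from torch.nn.utils.fusion import fuse_conv_bn_eval

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1).eval()
bn = nn.BatchNorm2d(8).eval()

# Returns a single Conv2d whose weight/bias absorb the BN scale and shift
fused = fuse_conv_bn_eval(conv, bn)

x = torch.randn(2, 3, 16, 16)
with torch.no_grad():
    print(torch.allclose(bn(conv(x)), fused(x), atol=1e-5))  # True
```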