TIL Google published "Primer: Searching for Efficient Transformers for Language Modeling" in 2021, which describes a variant of the ReLU activation funct... #deep learning
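For context, the ReLU variant the Primer paper is best known for is squared ReLU, which simply squares the output of a standard ReLU. Here is a minimal sketch in plain Python, assuming that is the variant the truncated sentence above refers to; the function names `relu` and `squared_relu` are illustrative and not taken from the post.

```python
def relu(x: float) -> float:
    """Standard ReLU: max(0, x)."""
    return max(0.0, x)


def squared_relu(x: float) -> float:
    """Squared ReLU: apply ReLU, then square the result.

    Assumption: this is the ReLU variant the post is talking about.
    """
    return relu(x) ** 2


if __name__ == "__main__":
    # Negative inputs stay at zero; positive inputs grow quadratically.
    for value in (-2.0, 0.0, 0.5, 3.0):
        print(value, relu(value), squared_relu(value))
```

Compared with plain ReLU, the only change is the extra squaring step, so it drops into an existing feed-forward block with a one-line edit.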