This article discusses the recent trend toward energy-efficient LLMs that rely on linear operations, a departure from the non-linear modeling that was their original strength. The trend is motivated by concerns about the energy required to run LLMs at scale, and it has led to architectures such as Mamba and RWKV. Linearity offers simplicity and scalability, but it is limited in handling complex relationships, which is where deep learning and neural networks excel. The article also touches on the difference between deep learning and neural networks, and on their importance in a non-linear world.
