Transformer-based Large Language Models (LLMs) have emerged as the backbone of Natural Language Processing (NLP) thanks to their self-attention mechanism. However, self-attention scales quadratically with sequence length, which limits its efficiency on long sequences. Selective state-space models (SSMs) offer a more efficient alternative, reducing computational complexity and memory requirements to linear in sequence length. Recent studies have shown that SSMs can match, and in some cases outperform, Transformers on language modeling tasks.
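
To make the complexity contrast concrete, the following is a minimal sketch of a selective SSM recurrence in NumPy. It is not the parameterization of any specific published model; the state size, the softplus step size, and the projections W_delta, W_B, and W_C are illustrative assumptions. The point it demonstrates is that each token updates a fixed-size hidden state, so time and memory grow linearly with the number of tokens, whereas self-attention materializes pairwise token interactions and grows quadratically.

```python
# Minimal sketch (assumed parameterization, not a specific model's): a selective
# SSM processes a sequence with a recurrence over a fixed-size hidden state,
# giving O(L) time and O(1) state memory in the sequence length L.
import numpy as np

def selective_ssm(x, d_state=16, seed=0):
    """x: (L, d_model) input sequence -> (L, d_model) output, linear in L."""
    rng = np.random.default_rng(seed)
    L, d_model = x.shape
    A = -np.exp(rng.standard_normal((d_model, d_state)))     # stable per-channel state decay
    W_delta = rng.standard_normal((d_model, d_model)) * 0.1   # input-dependent step size ("selectivity")
    W_B = rng.standard_normal((d_model, d_state)) * 0.1       # input-dependent input projection
    W_C = rng.standard_normal((d_model, d_state)) * 0.1       # input-dependent output projection

    h = np.zeros((d_model, d_state))                          # hidden state: size independent of L
    y = np.zeros_like(x)
    for t in range(L):
        delta = np.log1p(np.exp(x[t] @ W_delta))[:, None]     # softplus step size, shape (d_model, 1)
        B_t = x[t] @ W_B                                      # (d_state,)
        C_t = x[t] @ W_C                                      # (d_state,)
        A_bar = np.exp(delta * A)                             # discretized state transition
        h = A_bar * h + delta * x[t][:, None] * B_t[None, :]  # recurrent state update
        y[t] = h @ C_t                                        # read out this token's output
    return y

y = selective_ssm(np.random.randn(1024, 64))                  # 1024 tokens, fixed state footprint
print(y.shape)                                                # (1024, 64)
```

Because the loop carries only the (d_model, d_state) hidden state from one token to the next, doubling the sequence length doubles the work but leaves the memory for the state unchanged, which is the efficiency argument sketched above.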
