Researchers have extended the Transformer to computer vision, where Transformer-based models now achieve strong results across a wide range of image tasks. Among them, TTSR and SwinIR stand out: TTSR introduces a texture Transformer for reference-based image super-resolution, while SwinIR builds on Swin Transformer blocks for image restoration, including super-resolution reconstruction. The Multi-Attention Fusion Transformer (MAFT) combines global and local attention aggregation modules to further improve performance on image tasks.
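The idea of combining global and local attention branches can be illustrated with a minimal sketch. This is not MAFT's actual implementation; the function names (`global_attention`, `local_attention`, `fused_attention`), the window size, and the weighted-sum fusion with coefficient `alpha` are all illustrative assumptions. The global branch lets every token attend to all tokens, the local branch restricts attention to non-overlapping windows, and the fusion mixes the two outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product self-attention over a set of tokens
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

def global_attention(x):
    # every token attends to every other token
    return attention(x, x, x)

def local_attention(x, window=4):
    # tokens attend only within non-overlapping windows (Swin-style locality)
    n, _ = x.shape
    out = np.zeros_like(x)
    for s in range(0, n, window):
        w = x[s:s + window]
        out[s:s + window] = attention(w, w, w)
    return out

def fused_attention(x, alpha=0.5, window=4):
    # hypothetical fusion: a fixed weighted sum of the two branches
    # (a real model would typically learn the fusion weights)
    return alpha * global_attention(x) + (1 - alpha) * local_attention(x, window)

tokens = np.random.default_rng(0).normal(size=(8, 16))
out = fused_attention(tokens)
print(out.shape)  # (8, 16)
```

In practice the fusion weights, projections, and window partitioning would be learned parameters of the network; the sketch only shows how the two attention scopes complement each other.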
