Abstract
Since its introduction in 2015, U-net has become the state-of-the-art neural network architecture for biomedical image segmentation. Although many modifications have been proposed, few have introduced genuinely novel concepts. Recently, significant breakthroughs have been achieved by incorporating attention or, more specifically, Transformers. Efforts to apply self-attention mechanisms to computer vision tasks culminated in the Vision Transformer (ViT). Because ViT has some downsides compared to convolutional neural networks (CNNs), hybrid architectures that combine the advantages of both concepts prevail, especially in the small-data regimes common in medicine. U-net architectures still outperform ViT models in such settings, as ViT's high accuracy relies on massive training data. This paper investigates how adding attention to the U-net architecture affects segmentation results. We evaluate the models on a publicly available dataset of 1136 retinal optical coherence tomography (OCT) images from 24 patients suffering from neovascular age-related macular degeneration (nAMD). We also compare against previously published results and find that the attention-based U-net model achieves higher Dice scores by a significant margin. The code is publicly available.
Keywords
Vision Transformer; attention; convolutional neural networks; U-net; automatic segmentation; retinal optical coherence tomography images