Combining self-attention and convolution to understand OCT images

  • Abbas Haider
  • David Wright
  • Ruth Hogg
  • Hui Wang
  • Tunde Peto
  • Richard Gault

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Optical Coherence Tomography (OCT) is a widely used imaging modality for diagnosing retinal diseases, with artificial intelligence (AI) increasingly supporting clinical decision-making. A prominent AI model in this domain, RetFound, is a large Vision Transformer (ViT-H/24) pretrained on nearly one million OCT scans that has shown promise in tasks such as age-related macular degeneration (AMD) detection. However, its substantial size (over 300 million parameters) and high computational requirements pose challenges for real-world deployment. In this study, we hypothesize that ViTs alone are inefficient for OCT interpretation and propose a novel hybrid model that combines self-attention with convolutional layers. This architecture leverages both global context and local structure while remaining lightweight and training-efficient. We demonstrate the effectiveness of this approach on downstream binary classification tasks, where our model achieves an AUROC of 0.51, outperforming RetFound (0.49). On a subsequent multiclass task, our model achieves an AUROC of 0.85, closely matching RetFound's 0.87, with both outperforming other benchmark models. Additionally, the model demonstrates competitive image reconstruction, indicating a stronger grasp of underlying OCT structures.
Original language: English
Title of host publication: Irish Pattern Recognition and Classification Society
Editors: Sonya Coleman, Dermot Kerr
Place of Publication: Online
Publisher: Irish Pattern Recognition and Classification Society
Pages: 202-206
Number of pages: 4
ISBN (Electronic): 9780993420795
Publication status: Published - 1 Sept 2025
Externally published: Yes
Event: 27th Irish Machine Vision and Image Processing Conference - Derry, United Kingdom
Duration: 1 Sept 2025 to 3 Sept 2025
https://imvipconference.github.io/#

Conference

Conference: 27th Irish Machine Vision and Image Processing Conference
Abbreviated title: IMVIP 2025
Country/Territory: United Kingdom
City: Derry
Period: 1/09/25 to 3/09/25
