Self-supervised Vision Transformers for Land-cover Segmentation and Classification

Item Type: Conference or Workshop Item (Paper)
Abstract: Transformer models have recently approached or even surpassed the performance of ConvNets on computer vision tasks like classification and segmentation. To a large degree, these successes have been enabled by the use of large-scale labelled image datasets for supervised pre-training. This poses a significant challenge for the adaptation of vision Transformers to domains where datasets with millions of labelled samples are not available. In this work, we bridge the gap between ConvNets and Transformers for Earth observation by self-supervised pre-training on large-scale unlabelled remote sensing data. We show that self-supervised pre-training yields latent task-agnostic representations that can be utilized for both land cover classification and segmentation tasks, where they significantly outperform the fully supervised baselines. Additionally, we find that subsequent fine-tuning of Transformers for specific downstream tasks performs on par with commonly used ConvNet architectures. An ablation study further illustrates that the labelled dataset size can be reduced to one-tenth after self-supervised pre-training while still maintaining the performance of the fully supervised approach.
Authors: Scheibenreif, Linus Mathias; Hanna, Joëlle; Mommert, Michael & Borth, Damian
Research Team: AIML Lab
Language: English
Subjects: computer science
HSG Classification: contribution to scientific community
Date: 19 June 2022
Publisher: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Place of Publication: Earthvision Workshop
Depositing User: Joëlle Hanna
Date Deposited: 15 Jun 2022 15:13
Last Modified: 20 Jul 2022 17:48
URI: https://www.alexandria.unisg.ch/publications/266502

Download

Self-Supervised_Vision_Transformers_for_Land-Cover_Segmentation_and_Classification_CVPRW_2022_paper.pdf - Accepted Version


Citation

Scheibenreif, Linus Mathias; Hanna, Joëlle; Mommert, Michael & Borth, Damian: Self-supervised Vision Transformers for Land-cover Segmentation and Classification. 2022.
