Geospatial technologies have risen rapidly to a position of the utmost importance across the globe. By providing a better understanding of the earth’s ever-evolving landscape and our complicated interactions with the environment, these technologies help us navigate complex global challenges. As the amount of geospatial data increases, researchers are investigating ways to bring the full force of deep learning to bear on its analysis.
In artificial intelligence (AI), foundation models have emerged as a transformative technology, offering unmatched performance in domains such as computer vision and natural language processing. But when existing image models are adapted to the geospatial domain, they tend to fall short because of the inherent differences between natural images and remote sensing data. On the other hand, training geospatial foundation models from scratch is resource-intensive, time-consuming, and environmentally costly.
In our recent work “Towards Geospatial Foundation Models via Continual Pretraining”, published at the 2023 International Conference on Computer Vision (ICCV), we show how to create more-powerful geospatial foundation models while keeping resource demands in check. Instead of following the usual playbook, we examine the potential of continual pretraining, which involves further refining existing foundation models for specific domains through a secondary pretraining phase. The refined model can then be fine-tuned to various downstream tasks within its domain.
In tests, we compared our approach with six baselines on seven downstream datasets covering tasks such as change detection, classification, multi-label classification, semantic segmentation, and super-resolution. Across all seven tasks, our approach significantly outperformed the baselines.
Our approach has the potential to improve performance by using large-scale natural-image representations as a foundation on which robust geospatial models can be built. The computer vision community continually improves models trained on natural images, offering an ever-growing source of better pretrained baseline models. Our approach opens the door for geospatial models to leverage this progress with minimal resource consumption, ultimately leading to sustainable benefits for the geospatial community.
GeoPile
Building an effective foundation model begins with the choice of data. A common choice for pretraining geospatial models is data from the Sentinel-2 satellite. However, it is not enough to simply have a large corpus of such images.
To pretrain our geospatial model, we use the type of self-supervision that has become standard for foundation models: in a process known as masked image modeling (MIM), we mask patches of the input images, and the model learns to fill them in. But in this context, the lack of complexity and variability in the Sentinel-2 data can make the reconstruction task too straightforward.
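To make the MIM idea concrete, here is a minimal sketch of the masking-and-reconstruction step. It is illustrative only: scalar values stand in for real image patches, the 60% mask ratio is a hypothetical choice, and the function names are not from our paper.

```python
import random

def random_patch_mask(num_patches: int, mask_ratio: float = 0.6, seed: int = 0):
    """Choose which patch indices are hidden from the model's encoder."""
    rng = random.Random(seed)
    n_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), n_masked))
    return [i in masked for i in range(num_patches)]

def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error computed only on the masked (hidden) patches."""
    errs = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errs) / len(errs)

# Toy example: 16 patches, each summarized by a single scalar value.
target = [float(i) for i in range(16)]
pred = [t + 0.5 for t in target]          # an imperfect reconstruction
mask = random_patch_mask(16, mask_ratio=0.6)
loss = masked_reconstruction_loss(pred, target, mask)
```

The key property is that the loss is scored only on patches the model never saw, so trivially copying visible pixels earns nothing; this is why overly uniform imagery makes the task too easy.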
To address this challenge, we combined data from five open-source datasets, with both labeled and unlabeled images, to produce a diverse set of geospatial pretraining data, which we call GeoPile. For texture detail, we ensured a variety of ground sample distances (GSDs), including images with much higher resolution than those captured by Sentinel-2 (which has a GSD of 10 meters). In addition, the labeled datasets contribute a wide range of image classes from general remote sensing scenes, ensuring visual diversity across samples.
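A data pipeline along these lines can be sketched as follows. The source names, image counts, and GSD values below are hypothetical stand-ins (only Sentinel-2's 10 m GSD comes from the text above); the point is that labels are discarded, since self-supervised MIM needs only the pixels, and sources are shuffled so every batch mixes resolutions.

```python
import random

def build_pretraining_pool(sources: dict, seed: int = 0):
    """Flatten several datasets into one self-supervised pretraining pool.

    Any labels the source datasets carry are dropped: MIM uses pixels only.
    """
    rng = random.Random(seed)
    pool = []
    for name, meta in sources.items():
        for i in range(meta["num_images"]):
            pool.append({"source": name, "gsd_m": meta["gsd_m"], "image_id": i})
    rng.shuffle(pool)  # mix sources so each batch sees varied GSDs
    return pool

# Hypothetical source metadata for illustration.
sources = {
    "sentinel2": {"gsd_m": 10.0, "num_images": 5},
    "high_res_aerial": {"gsd_m": 0.3, "num_images": 3},
    "labeled_scenes": {"gsd_m": 1.0, "num_images": 4},
}
pool = build_pretraining_pool(sources)
gsds = {rec["gsd_m"] for rec in pool}
```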
Continual pretraining for geospatial foundation models
Much previous research on geospatial foundation models (GFMs) has ignored existing models pretrained on natural images. We, on the other hand, reason that leveraging the knowledge encoded in these models should produce strong performance with minimal overhead. To this end, we propose an unsupervised, multi-objective training paradigm for efficient and effective pretraining of geospatial models.
Our GFM continual-pretraining paradigm is a teacher-student approach that uses two parallel model branches. The teacher (F_T) is equipped with the pretrained knowledge of an ImageNet-22k initialization and acts as a guiding force during training. The student (F_S) starts from a blank slate and evolves into the final geospatial foundation model.
This paradigm enables an ideal two-fold optimization. Distillation from the teacher’s intermediate features ensures that the student can benefit from the teacher’s diverse knowledge, learning more in less time. At the same time, the student is given the freedom to adapt to in-domain data through its own MIM pretraining objective and to acquire new features that improve performance.
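The two-fold objective described above can be sketched as a single loss with two terms: a feature-distillation term matching the student's intermediate features to the teacher's, plus the student's own masked-reconstruction term. This is a minimal scalar sketch, not our actual implementation; the equal weighting of the two terms and the function names are illustrative assumptions.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def gfm_objective(student_feats, teacher_feats,
                  student_recon, pixel_target, mask,
                  distill_weight=1.0):
    """Two-fold loss: distill from the teacher's intermediate features,
    and reconstruct the masked patches via the student's MIM objective."""
    distill = mse(student_feats, teacher_feats)
    recon_errs = [(p - t) ** 2
                  for p, t, m in zip(student_recon, pixel_target, mask) if m]
    recon = sum(recon_errs) / len(recon_errs)
    return distill_weight * distill + recon

# Toy numbers: the student already matches the teacher's features exactly,
# so only the masked-reconstruction term contributes to the loss.
loss = gfm_objective(
    student_feats=[1.0, 2.0], teacher_feats=[1.0, 2.0],
    student_recon=[0.0, 1.0, 2.0], pixel_target=[0.5, 1.0, 2.0],
    mask=[True, False, True],
)
```

Because the distillation term pulls the student toward the teacher while the MIM term is scored on in-domain imagery, the student inherits natural-image knowledge without being locked to it.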