
IBM, NASA, and Hugging Face Speed and Innovate Climate Analysis

The increasing frequency and severity of extreme weather events, along with mounting concerns about global climate change, have underscored the importance of environmental and climate analysis. Gaining full value from climate information resources, such as the Earth-observation satellite images collected by NASA and other government organizations, is no easy task. Conventional hands-on methodologies make analysis tedious and slow. While digital tools can help streamline the process, the sheer number and volume of NASA's datasets remain a massive barrier for scientists to overcome.

The recent announcement that an IBM geospatial foundation model – built from NASA’s satellite data – will now be openly available on Hugging Face is likely to significantly speed and enhance climate and environmental analysis. Let’s consider it more closely.

Size and speed: Geospatial analysis challenges

Before diving into the IBM, NASA and Hugging Face announcement, what challenges are the trio attempting to address? First and foremost is the sheer size of NASA’s datasets. In 2018, the agency published a report noting that it was managing about 100 petabytes (PB) of satellite-collected data, and estimated that by 2025, that number would grow to 250PB.

Why are NASA’s data assets growing so quickly? For two reasons. The first is the agency’s careful and consistent collection methodology. IBM science writer Kim Martineau noted in a blog post coinciding with the announcement that the Harmonized Landsat Sentinel-2 (HLS) dataset, which the company and NASA used to train the new AI foundation model, captures a full view of the Earth every two to three days. In fact, the Landsat program, which began collecting satellite images in 1972, is responsible for one of the longest uninterrupted time series of Earth imagery.

The second reason is that satellite imagery technology continues to evolve and improve. The Landsat 1-5 satellites, launched between 1972 and 1984, delivered 60-meter-per-pixel images in a relatively narrow range of visual and thermal bands. Today’s Landsat 9 captures a larger number of bands at a resolution of 30 meters per pixel (for scale, a U.S. football field covers roughly five pixels). That resolution is detailed enough to detect land-use changes, but not enough to identify individual trees. As Landsat and similar technologies continue to evolve, so will the size and complexity of satellite imagery datasets.
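As a rough sanity check on that scale comparison, a few lines of Python show how the pixel count falls out. The field dimensions below are assumed values for the playing area of a U.S. football field (100 by 53.3 yards, end zones excluded), not figures from the article:

```python
# Rough check: how many 30 m Landsat 9 pixels cover a U.S. football field?
# Field dimensions are assumed values (playing area only, end zones excluded).
FIELD_LENGTH_M = 91.44   # 100 yards in meters
FIELD_WIDTH_M = 48.8     # 53.3 yards in meters
PIXEL_SIZE_M = 30.0      # Landsat 9 resolution per the article

field_area = FIELD_LENGTH_M * FIELD_WIDTH_M  # ~4,462 square meters
pixel_area = PIXEL_SIZE_M ** 2               # 900 square meters per pixel
pixels = field_area / pixel_area

print(f"A football field spans about {pixels:.1f} pixels")  # ~5 pixels
```

The same arithmetic applied to the older 60-meter Landsat 1-5 imagery gives barely more than one pixel per field, which illustrates why finer resolution multiplies dataset sizes so quickly.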

IBM and NASA’s geospatial foundation model

How are IBM and NASA approaching these challenges? Along with developing and training the foundation model, the pair have adapted new and emerging technologies to aid data-analysis efforts. For example, satellite image analysis has traditionally relied on human experts who annotate specific features and objects, somewhat akin to radiologists scrutinizing x-rays, CT scans and other medical images for signs of disease.

IBM’s foundation model is designed to remove or limit the need for manual analysis by compressing the Landsat images, capturing their basic structure and identifying specific, recurring features. As Martineau pointed out, the foundation model’s spatial attention mechanism was also expanded to include time analysis, enabling researchers to study how environmental and climate conditions have changed over days, weeks, months or years. In addition, the IBM model was trained and tuned to identify burn scars left by wildfires and damage caused by flooding.
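The article does not publish IBM’s training code, but the self-supervised idea it describes (learning an image’s basic structure without human annotation) is commonly implemented by splitting an image into patches, hiding most of them, and training the model to reconstruct what is missing. The sketch below shows only that input-preparation step, in plain Python on a toy 2D image; the function name, patch size and mask ratio are illustrative assumptions, not details from the article, and the real model operates on multi-band, multi-date satellite tiles:

```python
import random

def mask_patches(image, patch_size=4, mask_ratio=0.75, seed=0):
    """Split a square image (a 2D list of pixel values) into patches and
    hide a fraction of them. This is the input-preparation step of
    masked-image pretraining: the model only sees the visible patches
    and must infer the rest, forcing it to learn recurring structure.
    Returns (visible_patches, indices_of_masked_patches)."""
    size = len(image)
    patches = []
    for r in range(0, size, patch_size):
        for c in range(0, size, patch_size):
            patch = [row[c:c + patch_size] for row in image[r:r + patch_size]]
            patches.append(patch)
    rng = random.Random(seed)  # fixed seed for a reproducible illustration
    n_masked = int(len(patches) * mask_ratio)
    masked = set(rng.sample(range(len(patches)), n_masked))
    visible = [p for i, p in enumerate(patches) if i not in masked]
    return visible, sorted(masked)

# A toy 8x8 "image" splits into four 4x4 patches; with a 75% mask
# ratio, only one patch stays visible for the model to reason from.
toy = [[r * 8 + c for c in range(8)] for r in range(8)]
visible, masked_idx = mask_patches(toy)
print(len(visible), len(masked_idx))  # 1 visible patch, 3 masked
```

No labels appear anywhere in this process, which is why this style of pretraining scales to petabyte-sized archives that no team of human annotators could ever cover.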

How did the IBM and NASA project perform? In tests, IBM researchers found that the model delivered a 15 percent improvement over conventional state-of-the-art techniques while using half as much data. That is a crucial point given the continuing rise in dataset size and complexity. In addition, IBM noted that the base model can be redeployed for other tasks, such as predicting crop yields, tracking deforestation and monitoring greenhouse gases. IBM and NASA are also working with Clark University to adapt the new model for similarity research, time-series segmentation and other applications.

Final analysis

A point worth noting is IBM and NASA’s decision to develop the new model with open-source tools and to make it available on Hugging Face, a recognized leader in open-source AI and a well-respected repository for transformer models. That reflects IBM’s longstanding support for open-source projects and communities, and NASA’s decade-long Open-Source Science Initiative to build a more accessible, inclusive and collaborative scientific community.

In addition, the joint decision to make the model and data openly available on Hugging Face should accelerate climate and environmental analysis. It will also demonstrate the strategic value of customizing implementations for specific use cases, both in the flexibility to choose the right model for a task and in the ability to tune that model with task-specific parameters.

In essence, by providing the data and tools needed for the job, in an open ecosystem with leading partners and built on a foundation of open source, everyone wins: climate scientists and researchers, the organizations they work for, and the people and communities who benefit from their work.

Overall, IBM, NASA and Hugging Face deserve kudos for creating and delivering a promising new platform that should help deepen, speed and improve the study and analysis of data that is vital to climate and environmental sciences.
