Getting Started with Geospatial Rust
If you’ve ever looked at a satellite image of farmland, grasslands, forests, or cities, you’ve seen remote sensing data. Each pixel in those images contains measured values — stored as digital numbers (DNs) in raw data, or as surface reflectance in analysis-ready products — across different spectral bands: blue, green, red, near-infrared, and more. These bands tell us about vegetation health, water content, urban development, and land cover.
This post introduces the core concepts of geospatial raster processing for Rust programmers who’ve never worked with satellite imagery. We’ll cover what the satellites measure, why their data looks the way it does, and what you can actually do with it.
Already know remote sensing? Jump ahead to End-to-End Geospatial Processing with EORST — a code-heavy walkthrough building a complete pipeline.
Why Satellite Imagery?
Earth observation satellites carry sensors that measure sunlight reflected from Earth’s surface. Unlike a regular camera, these sensors don’t just capture red, green, and blue — they capture many more “bands” across the electromagnetic spectrum. Each band measures a different wavelength of light.
These bands reveal things our eyes can’t see:
- Near-infrared (just past what we can see) is strongly reflected by healthy vegetation
- Shortwave infrared detects water content in soil and vegetation
- Thermal infrared measures surface temperature
By combining these bands, we can map vegetation health, detect water stress, identify burn scars, classify land cover, and track changes over time.
The Two Main Satellite Platforms
Two satellite programs provide the backbone of open Earth observation data:
Landsat (NASA)
The Landsat program has been collecting imagery since 1972 — that’s over 50 years of continuous data. Landsat 8 and Landsat 9 carry the Operational Land Imager (OLI) sensor. Each scene covers 185×185 km at 30-meter resolution, revisiting the same location every 16 days.
| Band | Wavelength (µm) | Resolution | What it shows |
|---|---|---|---|
| Coastal Aerosol | 0.43–0.45 | 30m | Coastal waters, atmospheric correction |
| Blue | 0.45–0.51 | 30m | Water penetration, soil/vegetation contrast |
| Green | 0.53–0.59 | 30m | Healthy vegetation (peaks here) |
| Red | 0.64–0.67 | 30m | Vegetation biomass, chlorophyll absorption |
| NIR | 0.85–0.88 | 30m | Healthy vegetation (strongly reflects) |
| SWIR 1 | 1.57–1.65 | 30m | Soil/vegetation moisture |
| SWIR 2 | 2.11–2.29 | 30m | Soil moisture, vegetation stress |
| Pan | 0.50–0.68 | 15m | Image sharpening |
| Thermal 1 | 10.6–11.2 | 100m | Surface temperature |
| Thermal 2 | 11.5–12.5 | 100m | Surface temperature |
Sentinel-2 (ESA)
The European Sentinel-2 satellites provide higher-resolution imagery (10-20m) with more frequent revisits (every 5 days with both 2A and 2B). They include unique “red-edge” bands sensitive to vegetation stress.
| Band | Wavelength (µm) | Resolution | What it shows |
|---|---|---|---|
| Coastal | 0.44 | 60m | Coastal aerosols |
| Blue | 0.49 | 10m | Water, soils |
| Green | 0.56 | 10m | Healthy vegetation |
| Red | 0.665 | 10m | Vegetation chlorophyll |
| Red-Edge 1 | 0.705 | 20m | Early vegetation stress |
| Red-Edge 2 | 0.74 | 20m | Mid-vegetation stress |
| Red-Edge 3 | 0.783 | 20m | Late vegetation stress |
| NIR | 0.842 | 10m | Biomass, vegetation |
| NIR Narrow | 0.865 | 20m | Vegetation moisture |
| SWIR 1 | 1.61 | 20m | Soil/vegetation moisture |
| SWIR 2 | 2.19 | 20m | Soil/vegetation moisture |
Both platforms provide free, open data. Combining them gives you more frequent observations and redundant coverage.
Raw Data vs. Analysis-Ready Products
When you download satellite imagery, you have a choice: raw scenes or analysis-ready products.
Raw Data
The raw “Level-1” data (Landsat’s L1TP/L1GT, Sentinel-2’s L1C) is the sensor’s measurement at the top of the atmosphere. Problems:
- Atmospheric scattering — blue light scatters off air molecules, making everything bluer than reality
- Atmospheric absorption — water vapor absorbs SWIR bands
- Sun angle differences — same location looks different at 9am vs noon
- Sensor degradation — the sensor changes over time
- Geolocation errors — pixels might not line up perfectly
To use raw data, you’d need to apply your own atmospheric correction algorithm — a PhD’s worth of work.
Who Does All This Correction?
Thankfully, agencies like Australia’s DEA (Digital Earth Australia), the USGS, and ESA invest significant compute and expertise to process raw data into analysis-ready products. They:
- Download raw scenes from NASA/ESA archives
- Apply atmospheric correction (the complex physics)
- Orthorectify for terrain
- Calibrate across sensors and time
- Package as Cloud-Optimized GeoTIFFs (COGs)
- Serve via STAC (Spatio-Temporal Asset Catalog) APIs
STAC is a standard API specification — think of it as “REST for rasters.” You query by location, time, and cloud cover; the catalog returns URLs to cloud-hosted GeoTIFFs. No manual download needed.
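Under the hood, a STAC search is just an HTTP POST with a JSON body. A minimal sketch of building that body (the collection id and coordinates below are illustrative, and real catalogs document their own collections and extensions):

```rust
/// Build a STAC /search request body. Values here are examples only;
/// check your catalog's docs for real collection ids.
fn search_body(bbox: [f64; 4], datetime: &str, max_cloud: f64) -> String {
    format!(
        concat!(
            r#"{{"collections":["landsat-c2-l2"],"#,
            r#""bbox":[{},{},{},{}],"#,
            r#""datetime":"{}","#,
            r#""query":{{"eo:cloud_cover":{{"lt":{}}}}}}}"#
        ),
        bbox[0], bbox[1], bbox[2], bbox[3], datetime, max_cloud
    )
}

fn main() {
    // Query a bounding box in south-eastern Australia for 2023,
    // keeping only scenes under 10% cloud cover.
    let body = search_body([148.0, -36.0, 149.0, -35.0], "2023-01-01/2023-12-31", 10.0);
    println!("{body}");
}
```

The catalog responds with STAC Items whose assets are URLs to cloud-hosted COGs, ready to read directly.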
The key providers:
- DEA — Australian data, NBART naming, free
- Microsoft Planetary Computer — Global Landsat+Sentinel, free
- Element84 — AWS-hosted Landsat and Sentinel-2, free
Analysis-Ready Products
Most users don’t start with raw data. Instead, they use analysis-ready products that have been corrected:
DEA’s NBART (Nadir-corrected BRDF-Adjusted Reflectance with Terrain illumination correction) corrects for:
- Atmospheric scattering and absorption
- BRDF effects (viewing angle)
- Terrain illumination
- Sensor calibration drift
The bands are renamed: nbart_blue, nbart_green, nbart_red, nbart_nir, nbart_swir_1, nbart_swir_2
Microsoft Planetary Computer provides similar ARD (Analysis-Ready Data) with consistent Band 1-7 naming.
This is why rss_core uses .canonical_bands(["red", "nir"]) — it translates your semantic names to the provider-specific band names (nbart_red vs band4 vs B04) automatically.
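The idea behind canonical band resolution can be illustrated with a simple lookup. The mappings below echo the examples in this post (nbart_red, band4, B04); the actual rss_core tables are more complete and may differ in detail:

```rust
/// Resolve a semantic band name to a provider-specific band id.
/// A sketch of the canonical-band idea, not the rss_core implementation.
fn band_name(provider: &str, canonical: &str) -> Option<&'static str> {
    match (provider, canonical) {
        ("dea", "red") => Some("nbart_red"),
        ("dea", "nir") => Some("nbart_nir"),
        ("landsat", "red") => Some("band4"),
        ("sentinel2", "red") => Some("B04"),
        _ => None, // unknown provider/band combination
    }
}

fn main() {
    // The same semantic request works against any provider:
    for p in ["dea", "landsat", "sentinel2"] {
        println!("{p}: {:?}", band_name(p, "red"));
    }
}
```

Your analysis code asks for "red" once; the lookup layer absorbs each provider's naming quirks.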
Why Use Indices?
Once you have analysis-ready products, you face a new problem: how do you see what’s happening when you have 10-13 bands?
Human eyes can’t visualize 13-dimensional data. It’s hard to spot “that field is stressed” when you’re looking at a stack of grey rasters. That’s where spectral indices come in.
Over decades, remote sensing scientists discovered that specific materials behave differently at specific wavelengths:
- Healthy vegetation strongly reflects near-infrared (NIR) but absorbs red (chlorophyll)
- Water absorbs both red and NIR, but reflects blue-green
- Bare soil has a fairly flat response across visible + NIR
- Burned areas show low NIR (charcoal) but elevated SWIR
By exploiting these patterns, indices condense multiple bands into a single value that highlights a phenomenon. Instead of checking 4 bands visually, you look at one “vegetation health” map.
What Are Spectral Indices?
A spectral index is a mathematical combination of bands that highlights a specific phenomenon. Indices let you condense multi-band data into single-value “vegetation health” or “water presence” maps.
Here are five commonly used indices:
1. NDVI — Normalized Difference Vegetation Index
The most widely used index. Measures vegetation greenness by comparing red (absorbed) to NIR (reflected):
NDVI = (NIR - Red) / (NIR + Red)
- Values range from -1 to +1
- Healthy vegetation: 0.3 to 0.8
- Water: negative values
- Bare soil: near 0
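As a computation, NDVI is one line per pixel. A minimal Rust sketch, ignoring nodata handling and raster block layout:

```rust
/// NDVI = (NIR - Red) / (NIR + Red), computed per pixel.
fn ndvi(red: &[f32], nir: &[f32]) -> Vec<f32> {
    red.iter()
        .zip(nir)
        .map(|(&r, &n)| {
            let denom = n + r;
            // Guard against dividing by zero on empty/nodata pixels.
            if denom == 0.0 { f32::NAN } else { (n - r) / denom }
        })
        .collect()
}

fn main() {
    // Illustrative pixels: healthy vegetation (high NIR, low red),
    // sparse cover, and a water-like pixel.
    let red = [0.05_f32, 0.20, 0.30];
    let nir = [0.45_f32, 0.25, 0.10];
    println!("{:?}", ndvi(&red, &nir)); // ≈ [0.8, 0.11, -0.5]
}
```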
2. NDWI — Normalized Difference Water Index
Highlights open water:
NDWI = (Green - NIR) / (Green + NIR)
- Positive values = water
- Built-up areas: near 0 or negative
3. NBR — Normalized Burn Ratio
Detects burn scars:
NBR = (NIR - SWIR) / (NIR + SWIR)
- Healthy vegetation: positive (high NIR, low SWIR)
- Burned areas: negative (low NIR, high SWIR)
- The change in NBR (dNBR) between pre- and post-fire images indicates burn severity
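The dNBR comparison is simple arithmetic once NBR is in hand. A sketch with illustrative reflectance values:

```rust
/// NBR = (NIR - SWIR) / (NIR + SWIR).
fn nbr(nir: f32, swir: f32) -> f32 {
    (nir - swir) / (nir + swir)
}

fn main() {
    // Illustrative pixels: vigorous canopy before the fire, charcoal after.
    let pre = nbr(0.45, 0.15);  // high NIR, low SWIR -> positive
    let post = nbr(0.10, 0.30); // low NIR, high SWIR -> negative
    let dnbr = pre - post;      // larger positive dNBR = more severe burn
    println!("pre={pre}, post={post}, dNBR={dnbr}");
}
```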
4. EVI — Enhanced Vegetation Index
Similar to NDVI but corrects for atmospheric and canopy background:
EVI = 2.5 × (NIR - Red) / (NIR + 6×Red - 7.5×Blue + 1)
- Less saturated than NDVI in dense vegetation
- Better at tracking canopy variation
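In code, the extra coefficients are the only change from the NDVI pattern (the reflectance values below are illustrative):

```rust
/// EVI = 2.5 × (NIR - Red) / (NIR + 6×Red - 7.5×Blue + 1).
fn evi(nir: f32, red: f32, blue: f32) -> f32 {
    2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
}

fn main() {
    // A dense-canopy pixel, where NDVI would be near saturation.
    println!("EVI = {}", evi(0.40, 0.08, 0.04));
}
```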
5. SAVI — Soil Adjusted Vegetation Index
Works better in sparse vegetation areas:
SAVI = (NIR - Red) / (NIR + Red + L) × (1 + L)
- L = soil adjustment factor (typically 0.5)
- With L = 0, SAVI reduces to NDVI
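That last claim is easy to verify in code:

```rust
/// SAVI = (NIR - Red) / (NIR + Red + L) × (1 + L).
fn savi(nir: f32, red: f32, l: f32) -> f32 {
    (nir - red) / (nir + red + l) * (1.0 + l)
}

fn main() {
    let (nir, red) = (0.40_f32, 0.10);
    let ndvi = (nir - red) / (nir + red);
    println!("SAVI(L=0.5) = {}", savi(nir, red, 0.5));
    // With L = 0 the soil adjustment vanishes and SAVI equals NDVI:
    assert!((savi(nir, red, 0.0) - ndvi).abs() < 1e-6);
}
```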
The Cloud Problem
Here’s the catch: satellites look through the sky. Clouds obscure the ground, and their shadows darken pixels. Without accounting for clouds, your analysis is wrong.
Cloud cover varies by location and season. In tropical regions like northern Australia, the “wet season” (November to April) has daily afternoon thunderstorms — many scenes are completely clouded over.
Cloud Detection Algorithms
Several algorithms detect clouds and shadows:
- Fmask (Function of Mask) — most widely used, developed originally for Landsat
- Pixel Quality (PQ) — USGS approach
- S2Cloudless — Sentinel-2 specific
- ACCA — Automated Cloud Cover Assessment
These analyze spectral signatures and spatial patterns to classify each pixel as clear, cloud, shadow, snow, or water. The results come as QA bands alongside your data.
For most workflows, you query the STAC catalog with a cloud cover threshold (e.g., < 10%) and use the QA band to mask cloudy pixels before analysis.
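Applying a QA mask is a per-pixel bit test. A sketch in Rust; the bit positions below are made up for illustration, since every product documents its own QA encoding:

```rust
// Assumed QA bit layout (illustrative only; check your product's docs):
const QA_CLOUD: u16 = 1 << 3;  // "cloud" bit
const QA_SHADOW: u16 = 1 << 4; // "cloud shadow" bit

/// Replace cloudy/shadowed pixels with NaN so they drop out of analysis.
fn mask_clouds(values: &[f32], qa: &[u16]) -> Vec<f32> {
    values
        .iter()
        .zip(qa)
        .map(|(&v, &q)| {
            if q & (QA_CLOUD | QA_SHADOW) != 0 {
                f32::NAN
            } else {
                v
            }
        })
        .collect()
}

fn main() {
    let ndvi = [0.62_f32, 0.55, 0.08];
    let qa = [0u16, QA_CLOUD, 0];
    println!("{:?}", mask_clouds(&ndvi, &qa)); // middle pixel becomes NaN
}
```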
See DEA’s masking guide for practical examples.
Python: The De Facto Standard
Python is the standard language for geospatial analysis. Libraries like rasterio, xarray, and datacube (Open Data Cube) power most research and operational systems worldwide. It’s accessible, expressive, and fast enough for most tasks.
For smaller analyses — a few scenes, a single year — Python with Dask works well:
- `xarray` with Dask creates delayed computation graphs
- `datacube` handles STAC queries and data loading
- Processing runs across cores automatically
But there’s a catch: as your data grows, so do the problems.
When Python Struggles
For large-scale processing — thousands of scenes, continental coverage, multi-decade time series — Python starts to strain:
- Memory spikes: Dask manages multiple workers, but Python’s garbage collector runs at unpredictable times, causing pauses mid-process
- Dask overhead: The task graph and scheduler add overhead; for CPU-bound operations, you spend as much time scheduling as computing
- Silent failures: A typo in a band name runs without error and produces wrong output; tests catch things, but they’re not required
- Type confusion: Is `data` an `int16` array or `float32`? You have to track this manually or document extensively
- Dependency hell: Python versions, package conflicts, environment management — anyone who’s `conda install`ed incompatible packages knows the pain
These are all mitigable — mypy, pytest, ruff, typed libraries — but they’re opt-in. There’s no mandatory type checker or linter. If you don’t write tests, nothing fails.
Why Rust?
Rust flips this around:
- Compiler-enforced safety: You can’t accidentally mix `int16` and `float32` — the compiler rejects it
- No silent failures: Wrong types or missing error handling fail at compile time, not runtime
- No garbage collection: Consistent, predictable latency
- Safe parallelism: The borrow checker prevents data races at compile time; `rayon` handles threads safely
- Single-machine parallelism: Your binary runs on all cores locally — no Dask cluster overhead
- Cargo: Dependency management is fast, reproducible, and declarative
Note: eorst currently runs on a single machine (multicore, not multinode). Distributed processing across multiple nodes is a planned feature.
You still write your algorithms in the same conceptual style — apply functions to arrays, combine bands, compute indices. But the compiler is your teammate, catching bugs before you run.
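The safe-parallelism point can be sketched with nothing but the standard library. eorst uses rayon in practice, but `std::thread::scope` shows the same block-parallel shape, and the borrow checker guarantees the blocks never alias:

```rust
/// Compute NDVI over a raster split into blocks, one thread per block.
/// A std-only sketch; rayon's par_iter expresses the same pattern.
fn ndvi_blocks(red: &[f32], nir: &[f32], block: usize) -> Vec<f32> {
    let mut out = vec![0.0_f32; red.len()];
    std::thread::scope(|s| {
        // chunks_mut hands each thread a disjoint slice of the output,
        // so a data race is impossible — the compiler verifies this.
        for ((r, n), o) in red
            .chunks(block)
            .zip(nir.chunks(block))
            .zip(out.chunks_mut(block))
        {
            s.spawn(move || {
                for i in 0..o.len() {
                    o[i] = (n[i] - r[i]) / (n[i] + r[i]);
                }
            });
        }
    });
    out
}

fn main() {
    let red = vec![0.1_f32; 8];
    let nir = vec![0.5_f32; 8];
    println!("{:?}", ndvi_blocks(&red, &nir, 4));
}
```

If a thread tried to write outside its own block, the code simply would not compile; that is the "compiler as teammate" point in miniature.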
The Rust Libraries
Two Rust crates make up the core workflow:
rss_core handles the data access layer:
- Query satellite imagery from STAC catalogs (DEA, Microsoft Planetary Computer, Element84)
- Canonical band resolution: `.canonical_bands(["red", "nir"])` automatically translates to provider-specific band names
- File caching with TTL
- Cloud mask decoding
Earth Observation and Remote Sensing Toolkit (eorst) handles the processing layer:
- Build virtual RasterDataset from files or STAC queries
- Process blocks in parallel using rayon
- Semantic band selection with the Select trait
- Export to GeoTIFF
- Optional LightGBM integration for ML classification
The Typical Pipeline
A geospatial processing pipeline follows four steps:
- Query — Define your area of interest, time range, and cloud cover threshold
- Download — Fetch the data (or stream directly from cloud storage)
- Process — Apply your analysis (indices, classification, change detection)
- Export — Write results to GeoTIFF for use in GIS or Python
In the next post, I’ll build a complete pipeline: query, download, and process satellite imagery — End-to-End Geospatial Processing with EORST.
Benchmarking eorst vs Python+Dask? See Benchmarking Rust vs Python+Dask for NDVI: 23× Faster, and Why That Matters for Carbon — comparing eorst/Rust against Python+rioxarray on real satellite imagery. Simple NDVI is 2.9× faster in Rust; add FMask cloud masking and it’s 23× faster.