Getting Started with Geospatial Rust
If you’ve ever looked at a satellite image of farmland, grasslands, forests, or cities, you’ve seen remote sensing data. Each pixel in those images contains measured values — stored as digital numbers (DNs) in raw data, or as surface reflectance in analysis-ready products — across different spectral bands: blue, green, red, near-infrared, and more. These bands tell us about vegetation health, water content, urban development, and land cover.
This post introduces the core concepts of geospatial raster processing for Rust programmers who’ve never worked with satellite imagery. We’ll cover what the satellites measure, why their data looks the way it does, and what you can actually do with it.
Already know remote sensing? Jump ahead to End-to-End Geospatial Processing with EORST — a code-heavy walkthrough building a complete pipeline.
Why Satellite Imagery?
Earth observation satellites carry sensors that measure sunlight reflected from Earth’s surface. Unlike a regular camera, these sensors don’t just capture red, green, and blue — they capture many more “bands” across the electromagnetic spectrum. Each band measures a different wavelength of light.
These bands reveal things our eyes can’t see:
- Near-infrared (just past what we can see) is strongly reflected by healthy vegetation
- Shortwave infrared detects water content in soil and vegetation
- Thermal infrared measures surface temperature
By combining these bands, we can map vegetation health, detect water stress, identify burn scars, classify land cover, and track changes over time.
The Two Main Satellite Platforms
Two satellite programs provide the backbone of open Earth observation data:
Landsat (NASA)
The Landsat program has been collecting imagery since 1972 — that’s over 50 years of continuous data. Landsat 8 and Landsat 9 carry the Operational Land Imager (OLI) sensor. Each scene covers 185×185 km at 30-meter resolution, revisiting the same location every 16 days.
| Band | Wavelength (µm) | Resolution | What it shows |
|---|---|---|---|
| Coastal Aerosol | 0.43–0.45 | 30m | Coastal waters, atmospheric correction |
| Blue | 0.45–0.51 | 30m | Water penetration, soil/vegetation contrast |
| Green | 0.53–0.59 | 30m | Healthy vegetation (peaks here) |
| Red | 0.64–0.67 | 30m | Vegetation biomass, chlorophyll absorption |
| NIR | 0.85–0.88 | 30m | Healthy vegetation (strongly reflects) |
| SWIR 1 | 1.57–1.65 | 30m | Soil/vegetation moisture |
| SWIR 2 | 2.11–2.29 | 30m | Soil moisture, vegetation stress |
| Pan | 0.50–0.68 | 15m | Image sharpening |
| Thermal 1 | 10.6–11.2 | 100m | Surface temperature |
| Thermal 2 | 11.5–12.5 | 100m | Surface temperature |
Sentinel-2 (ESA)
The European Sentinel-2 satellites provide higher-resolution imagery (10-20m) with more frequent revisits (every 5 days with both 2A and 2B). They include unique “red-edge” bands sensitive to vegetation stress.
| Band | Wavelength (µm) | Resolution | What it shows |
|---|---|---|---|
| Coastal | 0.44 | 60m | Coastal aerosols |
| Blue | 0.49 | 10m | Water, soils |
| Green | 0.56 | 10m | Healthy vegetation |
| Red | 0.665 | 10m | Vegetation chlorophyll |
| Red-Edge 1 | 0.705 | 20m | Early vegetation stress |
| Red-Edge 2 | 0.74 | 20m | Mid-vegetation stress |
| Red-Edge 3 | 0.783 | 20m | Late vegetation stress |
| NIR | 0.842 | 10m | Biomass, vegetation |
| NIR Narrow | 0.865 | 20m | Vegetation moisture |
| SWIR 1 | 1.61 | 20m | Soil/vegetation moisture |
| SWIR 2 | 2.19 | 20m | Soil/vegetation moisture |
Both platforms provide free, open data. Combining them gives you more frequent observations and redundant coverage.
Raw Data vs. Analysis-Ready Products
When you download satellite imagery, you have a choice: raw scenes or analysis-ready products.
Raw Data
The raw “Level-1” data (Landsat’s L1TP/L1GT, Sentinel-2’s L1C) is the sensor’s measurement at the top of the atmosphere. Problems:
- Atmospheric scattering — blue light scatters off air molecules, making everything bluer than reality
- Atmospheric absorption — water vapor absorbs SWIR bands
- Sun angle differences — same location looks different at 9am vs noon
- Sensor degradation — the sensor changes over time
- Geolocation errors — pixels might not line up perfectly
To use raw data, you’d need to apply your own atmospheric correction algorithm — a PhD’s worth of work.
Who Does All This Correction?
Thankfully, agencies like Australia’s DEA (Digital Earth Australia), the USGS, and ESA invest significant compute and expertise to process raw data into analysis-ready products. They:
- Download raw scenes from NASA/ESA archives
- Apply atmospheric correction (the complex physics)
- Orthorectify for terrain
- Calibrate across sensors and time
- Package as Cloud-Optimized GeoTIFFs (COGs)
- Serve via STAC (Spatio-Temporal Asset Catalog) APIs
STAC is a standard API specification — think of it as “REST for rasters.” You query by location, time, and cloud cover; the catalog returns URLs to cloud-hosted GeoTIFFs. No manual download needed.
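Under the hood, a STAC search is just an HTTP POST with a JSON body. A minimal sketch of building that body (the collection id and coordinates below are illustrative, and real catalogs document their own collections and extensions):

```rust
/// Build a STAC /search request body. Values here are examples only;
/// check your catalog's docs for real collection ids.
fn search_body(bbox: [f64; 4], datetime: &str, max_cloud: f64) -> String {
    format!(
        concat!(
            r#"{{"collections":["landsat-c2-l2"],"#,
            r#""bbox":[{},{},{},{}],"#,
            r#""datetime":"{}","#,
            r#""query":{{"eo:cloud_cover":{{"lt":{}}}}}}}"#
        ),
        bbox[0], bbox[1], bbox[2], bbox[3], datetime, max_cloud
    )
}

fn main() {
    // Query a bounding box in south-eastern Australia for 2023,
    // keeping only scenes under 10% cloud cover.
    let body = search_body([148.0, -36.0, 149.0, -35.0], "2023-01-01/2023-12-31", 10.0);
    println!("{body}");
}
```

The catalog responds with STAC Items whose assets are URLs to cloud-hosted COGs, ready to read directly.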
The key providers:
- DEA — Australian data, NBART naming, free
- Microsoft Planetary Computer — Global Landsat+Sentinel, free
- Element84 — AWS-hosted Landsat and Sentinel-2, free
Analysis-Ready Products
Most users don’t start with raw data. Instead, they use analysis-ready products that have been corrected:
DEA’s NBART (Nadir-corrected BRDF-Adjusted Reflectance with Terrain illumination correction) corrects for:
- Atmospheric scattering and absorption
- BRDF effects (viewing angle)
- Terrain illumination
- Sensor calibration drift
The bands are renamed: nbart_blue, nbart_green, nbart_red, nbart_nir, nbart_swir_1, nbart_swir_2
Microsoft Planetary Computer provides similar ARD (Analysis-Ready Data) with consistent Band 1-7 naming.
This is why rss_core uses .canonical_bands(["red", "nir"]) — it translates your semantic names to the provider-specific band names (nbart_red vs band4 vs B04) automatically.
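The idea behind canonical band resolution can be illustrated with a simple lookup. The mappings below echo the examples in this post (nbart_red, band4, B04); the actual rss_core tables are more complete and may differ in detail:

```rust
/// Resolve a semantic band name to a provider-specific band id.
/// A sketch of the canonical-band idea, not the rss_core implementation.
fn band_name(provider: &str, canonical: &str) -> Option<&'static str> {
    match (provider, canonical) {
        ("dea", "red") => Some("nbart_red"),
        ("dea", "nir") => Some("nbart_nir"),
        ("landsat", "red") => Some("band4"),
        ("sentinel2", "red") => Some("B04"),
        _ => None, // unknown provider/band combination
    }
}

fn main() {
    // The same semantic request works against any provider:
    for p in ["dea", "landsat", "sentinel2"] {
        println!("{p}: {:?}", band_name(p, "red"));
    }
}
```

Your analysis code asks for "red" once; the lookup layer absorbs each provider's naming quirks.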
Why Use Indices?
Once you have analysis-ready products, you face a new problem: how do you see what’s happening when you have 10-13 bands?
Human eyes can’t visualize 13-dimensional data. It’s hard to spot “that field is stressed” when you’re looking at a stack of grey rasters. That’s where spectral indices come in.
Over decades, remote sensing scientists discovered that specific materials behave differently at specific wavelengths:
- Healthy vegetation strongly reflects near-infrared (NIR) but absorbs red (chlorophyll)
- Water absorbs both red and NIR, but reflects blue-green
- Bare soil has a fairly flat response across visible + NIR
- Burned areas show low NIR (charcoal) but elevated SWIR
By exploiting these patterns, indices condense multiple bands into a single value that highlights a phenomenon. Instead of checking 4 bands visually, you look at one “vegetation health” map.
What Are Spectral Indices?
A spectral index is a mathematical combination of bands that highlights a specific phenomenon. Indices let you condense multi-band data into single-value “vegetation health” or “water presence” maps.
Here are five commonly used indices:
1. NDVI — Normalized Difference Vegetation Index
The most widely used index. Measures vegetation greenness by comparing red (absorbed) to NIR (reflected):
NDVI = (NIR - Red) / (NIR + Red)
- Values range from -1 to +1
- Healthy vegetation: 0.3 to 0.8
- Water: negative values
- Bare soil: near 0
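As a computation, NDVI is one line per pixel. A minimal Rust sketch, ignoring nodata handling and raster block layout:

```rust
/// NDVI = (NIR - Red) / (NIR + Red), computed per pixel.
fn ndvi(red: &[f32], nir: &[f32]) -> Vec<f32> {
    red.iter()
        .zip(nir)
        .map(|(&r, &n)| {
            let denom = n + r;
            // Guard against dividing by zero on empty/nodata pixels.
            if denom == 0.0 { f32::NAN } else { (n - r) / denom }
        })
        .collect()
}

fn main() {
    // Illustrative pixels: healthy vegetation (high NIR, low red),
    // sparse cover, and a water-like pixel.
    let red = [0.05_f32, 0.20, 0.30];
    let nir = [0.45_f32, 0.25, 0.10];
    println!("{:?}", ndvi(&red, &nir)); // ≈ [0.8, 0.11, -0.5]
}
```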
2. NDWI — Normalized Difference Water Index
Highlights open water:
NDWI = (Green - NIR) / (Green + NIR)
- Positive values = water
- Built-up areas: near 0 or negative
3. NBR — Normalized Burn Ratio
Detects burn scars:
NBR = (NIR - SWIR) / (NIR + SWIR)
- Healthy vegetation: positive (high NIR, low SWIR)
- Burned areas: negative (low NIR, high SWIR)
- The change in NBR (dNBR) between pre- and post-fire images indicates burn severity
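The dNBR comparison is simple arithmetic once NBR is in hand. A sketch with illustrative reflectance values:

```rust
/// NBR = (NIR - SWIR) / (NIR + SWIR).
fn nbr(nir: f32, swir: f32) -> f32 {
    (nir - swir) / (nir + swir)
}

fn main() {
    // Illustrative pixels: vigorous canopy before the fire, charcoal after.
    let pre = nbr(0.45, 0.15);  // high NIR, low SWIR -> positive
    let post = nbr(0.10, 0.30); // low NIR, high SWIR -> negative
    let dnbr = pre - post;      // larger positive dNBR = more severe burn
    println!("pre={pre}, post={post}, dNBR={dnbr}");
}
```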
4. EVI — Enhanced Vegetation Index
Similar to NDVI but corrects for atmospheric and canopy background:
EVI = 2.5 × (NIR - Red) / (NIR + 6×Red - 7.5×Blue + 1)
- Less saturated than NDVI in dense vegetation
- Better at tracking canopy variation
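In code, the extra coefficients are the only change from the NDVI pattern (the reflectance values below are illustrative):

```rust
/// EVI = 2.5 × (NIR - Red) / (NIR + 6×Red - 7.5×Blue + 1).
fn evi(nir: f32, red: f32, blue: f32) -> f32 {
    2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
}

fn main() {
    // A dense-canopy pixel, where NDVI would be near saturation.
    println!("EVI = {}", evi(0.40, 0.08, 0.04));
}
```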
5. SAVI — Soil Adjusted Vegetation Index
Works better in sparse vegetation areas:
SAVI = (NIR - Red) / (NIR + Red + L) × (1 + L)
- L = soil adjustment factor (typically 0.5)
- With L = 0, SAVI reduces to NDVI
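That last claim is easy to verify in code:

```rust
/// SAVI = (NIR - Red) / (NIR + Red + L) × (1 + L).
fn savi(nir: f32, red: f32, l: f32) -> f32 {
    (nir - red) / (nir + red + l) * (1.0 + l)
}

fn main() {
    let (nir, red) = (0.40_f32, 0.10);
    let ndvi = (nir - red) / (nir + red);
    println!("SAVI(L=0.5) = {}", savi(nir, red, 0.5));
    // With L = 0 the soil adjustment vanishes and SAVI equals NDVI:
    assert!((savi(nir, red, 0.0) - ndvi).abs() < 1e-6);
}
```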
The Cloud Problem
Here’s the catch: satellites look through the sky. Clouds obscure the ground, and their shadows darken pixels. Without accounting for clouds, your analysis is wrong.
Cloud cover varies by location and season. In tropical regions like northern Australia, the “wet season” (November to April) has daily afternoon thunderstorms — many scenes are completely clouded over.
Cloud Detection Algorithms
Several algorithms detect clouds and shadows:
- Fmask (Function of Mask) — most widely used, developed originally for Landsat
- Pixel Quality (PQ) — USGS approach
- S2Cloudless — Sentinel-2 specific
- ACCA — Automated Cloud Cover Assessment
These analyze spectral signatures and spatial patterns to classify each pixel as clear, cloud, shadow, snow, or water. The results come as QA bands alongside your data.
For most workflows, you query the STAC catalog with a cloud cover threshold (e.g., < 10%) and use the QA band to mask cloudy pixels before analysis.
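Applying a QA mask is a per-pixel bit test. A sketch in Rust; the bit positions below are made up for illustration, since every product documents its own QA encoding:

```rust
// Assumed QA bit layout (illustrative only; check your product's docs):
const QA_CLOUD: u16 = 1 << 3;  // "cloud" bit
const QA_SHADOW: u16 = 1 << 4; // "cloud shadow" bit

/// Replace cloudy/shadowed pixels with NaN so they drop out of analysis.
fn mask_clouds(values: &[f32], qa: &[u16]) -> Vec<f32> {
    values
        .iter()
        .zip(qa)
        .map(|(&v, &q)| {
            if q & (QA_CLOUD | QA_SHADOW) != 0 {
                f32::NAN
            } else {
                v
            }
        })
        .collect()
}

fn main() {
    let ndvi = [0.62_f32, 0.55, 0.08];
    let qa = [0u16, QA_CLOUD, 0];
    println!("{:?}", mask_clouds(&ndvi, &qa)); // middle pixel becomes NaN
}
```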
See DEA’s masking guide for practical examples.
Python: The De Facto Standard
Python is the standard language for geospatial analysis. Libraries like rasterio, xarray, and datacube (Open Data Cube) power most research and operational systems worldwide. It’s accessible, expressive, and fast enough for most tasks.
For smaller analyses — a few scenes, a single year — Python with Dask works well:
- `xarray` with Dask creates delayed computation graphs
- `datacube` handles STAC queries and data loading
- Processing runs across cores automatically
But there’s a catch: as your data grows, so do the problems.
When Python Struggles
For large-scale processing — thousands of scenes, continental coverage, multi-decade time series — Python starts to strain:
- Memory spikes: Dask manages multiple workers, but Python’s garbage collector runs at unpredictable times, causing pauses mid-process
- Dask overhead: The task graph and scheduler add overhead; for CPU-bound operations, you spend as much time scheduling as computing
- Silent failures: A typo in a band name runs without error and produces wrong output; tests catch things, but they’re not required
- Type confusion: Is `data` an `int16` array or `float32`? You have to track this manually or document extensively
- Dependency hell: Python versions, package conflicts, environment management — anyone who’s `conda install`ed incompatible packages knows the pain
These are all mitigable — mypy, pytest, ruff, typed libraries — but they’re opt-in. There’s no mandatory type checker or linter. If you don’t write tests, nothing fails.
Why Rust?
Rust flips this around:
- Compiler-enforced safety: You can’t accidentally mix `int16` and `float32` — the compiler rejects it
- No silent failures: Wrong types or missing error handling fail at compile time, not runtime
- No garbage collection: Consistent, predictable latency
- Safe parallelism: The borrow checker prevents data races at compile time; `rayon` handles threads safely
- Single-machine parallelism: Your binary runs on all cores locally — no Dask cluster overhead
- Cargo: Dependency management is fast, reproducible, and declarative
Note: eorst currently runs on a single machine (multicore, not multinode). Distributed processing across multiple nodes is a planned feature.
You still write your algorithms in the same conceptual style — apply functions to arrays, combine bands, compute indices. But the compiler is your teammate, catching bugs before you run.
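The safe-parallelism point can be sketched with nothing but the standard library. eorst uses rayon in practice, but `std::thread::scope` shows the same block-parallel shape, and the borrow checker guarantees the blocks never alias:

```rust
/// Compute NDVI over a raster split into blocks, one thread per block.
/// A std-only sketch; rayon's par_iter expresses the same pattern.
fn ndvi_blocks(red: &[f32], nir: &[f32], block: usize) -> Vec<f32> {
    let mut out = vec![0.0_f32; red.len()];
    std::thread::scope(|s| {
        // chunks_mut hands each thread a disjoint slice of the output,
        // so a data race is impossible — the compiler verifies this.
        for ((r, n), o) in red
            .chunks(block)
            .zip(nir.chunks(block))
            .zip(out.chunks_mut(block))
        {
            s.spawn(move || {
                for i in 0..o.len() {
                    o[i] = (n[i] - r[i]) / (n[i] + r[i]);
                }
            });
        }
    });
    out
}

fn main() {
    let red = vec![0.1_f32; 8];
    let nir = vec![0.5_f32; 8];
    println!("{:?}", ndvi_blocks(&red, &nir, 4));
}
```

If a thread tried to write outside its own block, the code simply would not compile; that is the "compiler as teammate" point in miniature.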
The Rust Libraries
Two Rust crates make up the core workflow:
rss_core handles the data access layer:
- Query satellite imagery from STAC catalogs (DEA, Microsoft Planetary Computer, Element84)
- Canonical band resolution: `.canonical_bands(["red", "nir"])` automatically translates to provider-specific band names
- File caching with TTL
- Cloud mask decoding
Earth Observation and Remote Sensing Toolkit (eorst) handles the processing layer:
- Build virtual RasterDataset from files or STAC queries
- Process blocks in parallel using rayon
- Semantic band selection with the Select trait
- Export to GeoTIFF
- Optional LightGBM integration for ML classification
The Typical Pipeline
A geospatial processing pipeline follows four steps:
- Query — Define your area of interest, time range, and cloud cover threshold
- Download — Fetch the data (or stream directly from cloud storage)
- Process — Apply your analysis (indices, classification, change detection)
- Export — Write results to GeoTIFF for use in GIS or Python
In the next post, I’ll build a complete pipeline: query, download, and process satellite imagery — End-to-End Geospatial Processing with EORST.
Benchmarking eorst vs Python+Dask? See Benchmarking Rust vs Python+Dask for NDVI: 23× Faster, and Why That Matters for Carbon — comparing eorst/Rust against Python+rioxarray on real satellite imagery. Simple NDVI is 2.9× faster in Rust; add FMask cloud masking and it’s 23× faster.