Sentinel-2 Scaling & Harmonization: Offsets, Processing Baselines, and “Harmonized” Collections

TL;DR: Sentinel-2 reflectance bands are stored as integers (DN, digital numbers) plus metadata: a scale (quantification value) and, since PB 04.00, a per-band add offset. Convert with ρ = (DN + ADD_OFFSET) / QUANTIFICATION_VALUE and use the product’s declared special values for NoData and saturation. If you are using a “Harmonized” collection (for example Earth Engine’s Sentinel-2 harmonized datasets), the PB 04.00 offset has already been removed, so do not correct it twice.

Quick rules (SAFE to reflectance)

Scale: Sentinel-2 L1C (Level-1C) TOA (top-of-atmosphere) and L2A (Level-2A) BOA (bottom-of-atmosphere) are reflectance stored as integers with a scale defined in metadata (often exposed as 0.0001, meaning “scaled by 10000”). Read the quantification value rather than hard-coding it when you can.^{Google Earth Engine dataset docs: Sentinel-2 SR bands are scaled by 10000 and PB 04.00+ scenes are range-shifted by 1000 in non-harmonized form}
Add offsets (PB-dependent): From PB 04.00 onward, ESA introduced per-band radiometric offsets so dark-scene noise does not get clipped. In SAFE metadata you will typically see RADIO_ADD_OFFSET on L1C and BOA_ADD_OFFSET on L2A, and you apply the signed value before scaling.^{ESA STEP Forum: PB 04.00 introduced an additional radiometric offset in metadata to avoid truncating negative values over dark surfaces}
Special values: Do not assume DN=0 is always NoData. SAFE products declare special values (NoData and saturated) in metadata, and different toolchains may map or expose them differently.^{GDAL Sentinel-2 driver docs: Sentinel-2 products expose special values (NoData, saturated) and related masks via metadata}
“Harmonized” collections: Platform “harmonized” Sentinel-2 datasets typically remove the PB 04.00 range shift so time series align across baseline eras. That is convenient, but it also means you should not subtract or add the SAFE offsets again.^{Google Earth Engine Sentinel-2 overview: harmonized collections remove the PB 04.00 band-dependent offset for alignment with pre-04.00 scenes}

Processing baseline in plain language

A processing baseline (PB) is ESA’s versioned recipe for turning raw measurements into Level-1C and Level-2A products. Baseline changes can affect pixel semantics, not just metadata, so “same sensor” does not automatically mean “same radiometry”. ESA documents baseline evolution and also runs reprocessing campaigns to deliver more consistent time series across the archive, which is why you may see older acquisition dates appear with newer baseline semantics after reprocessing.^{Sentinel Online: Sentinel-2 processing baseline concept, baseline evolution, and archive reprocessing for consistent time series}

PB 04.00 (operational from 2022-01-25) is the baseline where the radiometric offset change became a frequent source of confusion. The goal was practical: represent near-zero and sometimes negative values over very dark targets without clipping them during quantization, by shifting the stored range and declaring the offset in metadata.^{ESA STEP Forum: Motivation for PB 04.00 radiometric offset and how it avoids losing information over dark surfaces}
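Because reprocessing can put a newer baseline on an older acquisition, the acquisition date alone does not tell you whether offsets apply. As a rough first check, the sketch below reads the baseline from a product name. It assumes the name carries the baseline as an `_N<xxyy>_` field (for example `_N0400_` for PB 04.00), and both product names are made up for illustration. The product metadata stays the authority on whether an offset is actually declared.

```python
import re

# Assumption: the product name carries the baseline as "_N<xxyy>_",
# e.g. "_N0400_" for PB 04.00. Product metadata remains the authority.
_BASELINE_PATTERN = re.compile(r"_N(\d{2})(\d{2})_")


def processing_baseline(product_name: str) -> tuple[int, int]:
    """Return the processing baseline as (major, minor), e.g. (4, 0)."""
    match = _BASELINE_PATTERN.search(product_name)
    if match is None:
        raise ValueError(f"no processing baseline field in {product_name!r}")
    return int(match.group(1)), int(match.group(2))


def expects_add_offset(product_name: str) -> bool:
    """True when the baseline is 04.00 or later, where offsets are declared."""
    return processing_baseline(product_name) >= (4, 0)


# Made-up product names, used only to exercise the parser.
OLD = "S2B_MSIL2A_20210614T102559_N0300_R108_T32UNF_20210614T131234.SAFE"
NEW = "S2A_MSIL2A_20220301T101901_N0400_R065_T32UNF_20220301T135227.SAFE"

for name in (OLD, NEW):
    print(processing_baseline(name), expects_add_offset(name))
```

Comparing the baseline as a tuple, rather than comparing acquisition dates against 2022-01-25, keeps the check correct for reprocessed scenes.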

The safe conversion from DN to reflectance

Think of SAFE as “numbers plus a recipe”. The numbers are the per-pixel integers, and the recipe is the metadata.

For Level-1C, the offset is typically exposed as RADIO_ADD_OFFSET, and ESA documentation describes the storage relationship as a quantized reflectance with an offset term. In practice you recover reflectance with the inverse form: add the offset (signed) and divide by the quantification value.^{Copernicus Data Space forum: L1C quantization and RADIO_ADD_OFFSET relationship and how to recover TOA reflectance from DN}

For Level-2A, the same principle applies but the field name is typically BOA_ADD_OFFSET. The most robust habit is to treat both the quantification value and the per-band add offsets as metadata-driven inputs, not constants, even if a specific era tends to use the same values.

Product level	Add offset field (per band)	Quantification field	Special values to respect
L1C (TOA)	`RADIO_ADD_OFFSET`	`QUANTIFICATION_VALUE`	NoData and saturated are declared in metadata, do not guess
L2A (BOA)	`BOA_ADD_OFFSET`	`BOA_QUANTIFICATION_VALUE` or `QUANTIFICATION_VALUE` (varies by packaging)	NoData and saturated are declared in metadata, do not guess
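Reading those fields can be sketched with the standard library alone. The XML below is a simplified, hand-written stand-in for a product metadata file: the element names match the fields in the table, but the nesting, the `band_id` attribute, and the values are assumptions for illustration, so check them against a real product before relying on this. The helper `read_scaling` is hypothetical and matches elements by local name so that XML namespaces do not get in the way.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for product metadata (layout and values are assumed).
SAMPLE_L2A_METADATA = """
<Product_Image_Characteristics>
  <QUANTIFICATION_VALUES_LIST>
    <BOA_QUANTIFICATION_VALUE unit="none">10000</BOA_QUANTIFICATION_VALUE>
  </QUANTIFICATION_VALUES_LIST>
  <BOA_ADD_OFFSET_VALUES_LIST>
    <BOA_ADD_OFFSET band_id="0">-1000</BOA_ADD_OFFSET>
    <BOA_ADD_OFFSET band_id="1">-1000</BOA_ADD_OFFSET>
  </BOA_ADD_OFFSET_VALUES_LIST>
</Product_Image_Characteristics>
""".strip()

QUANTIFICATION_TAGS = ("BOA_QUANTIFICATION_VALUE", "QUANTIFICATION_VALUE")


def _local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_scaling(xml_text: str, offset_tag: str) -> tuple[float, dict[int, float]]:
    """Return (quantification value, {band_id: add offset}) from metadata XML.

    An empty offset dict means no offsets are declared, which is what
    you would expect for products older than PB 04.00.
    """
    root = ET.fromstring(xml_text)
    quantification = None
    offsets: dict[int, float] = {}
    for element in root.iter():
        name = _local_name(element.tag)
        if name in QUANTIFICATION_TAGS and quantification is None:
            quantification = float(element.text)
        elif name == offset_tag:
            offsets[int(element.get("band_id"))] = float(element.text)
    if quantification is None:
        raise ValueError("no quantification value found in metadata")
    return quantification, offsets


quantification, offsets = read_scaling(SAMPLE_L2A_METADATA, "BOA_ADD_OFFSET")
print(quantification, offsets)
```

For L1C you would pass `"RADIO_ADD_OFFSET"` as the offset tag. Treating a missing offset as zero, rather than as an error, lets the same code path handle both baseline eras.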

1. Read the quantification value and the per-band add offset from metadata.
2. Read the declared special values (NoData and saturated) and decide how you will mask them.
3. Compute reflectance per band: ρ = (DN + ADD_OFFSET) / QUANTIFICATION_VALUE.
4. Apply your mask policy after conversion, and do not treat “negative” as “missing” unless your application explicitly requires it.
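The steps above can be sketched in a few lines of plain Python. The function names are hypothetical, and the offset, quantification value, and special values are hard-coded here only to keep the example self-contained; in a real pipeline they would come from the product metadata. Special values are matched on the stored integers, since that is where they are declared, and the resulting mask is carried through the conversion as `None`.

```python
from typing import Collection, Iterable, List, Optional


def dn_to_reflectance(
    dn: int,
    add_offset: float,
    quantification_value: float,
    special_values: Collection[int] = (),
) -> Optional[float]:
    """Convert one stored DN to reflectance, or None for a special value."""
    if dn in special_values:
        return None  # NoData or saturated: masked on purpose, not by accident
    # The offset is signed: add it as declared, then divide by the scale.
    return (dn + add_offset) / quantification_value


def convert_band(
    dns: Iterable[int],
    add_offset: float,
    quantification_value: float,
    special_values: Collection[int] = (),
) -> List[Optional[float]]:
    """Apply dn_to_reflectance to every DN of a band."""
    return [
        dn_to_reflectance(dn, add_offset, quantification_value, special_values)
        for dn in dns
    ]


# Illustrative inputs, assumed to have been read from metadata beforehand.
ADD_OFFSET = -1000
QUANTIFICATION_VALUE = 10000
SPECIAL_VALUES = {0, 65535}  # assumed NoData and saturated values

row = [0, 900, 1000, 3500, 65535]
print(convert_band(row, ADD_OFFSET, QUANTIFICATION_VALUE, SPECIAL_VALUES))
```

Note that the DN of 900 converts to a small negative reflectance and is kept, not masked. If you do the same arithmetic on unsigned integer arrays, cast to a floating-point type before adding a negative offset so the subtraction cannot wrap around.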

Indices note: Pure ratios like (A − B)/(A + B) cancel a shared multiplicative scale, but they do not cancel additive offsets. Indices with constants (for example EVI2 and OSAVI) also assume physically scaled reflectance. Correct offsets and scaling first if you want your index math to mean what you think it means.
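A small numeric check makes the point concrete. The DN values, offset, and quantification value below are illustrative, not taken from a real scene. Scaling alone leaves NDVI unchanged, while applying the offset moves it from roughly 0.45 to roughly 0.71 for these inputs.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference of NIR and red."""
    return (nir - red) / (nir + red)


def to_reflectance(dn: int, add_offset: float, quantification_value: float) -> float:
    """Metadata-driven conversion: (DN + ADD_OFFSET) / QUANTIFICATION_VALUE."""
    return (dn + add_offset) / quantification_value


# Illustrative stored values for one pixel in an offset-bearing product.
NIR_DN, RED_DN = 4000, 1500
ADD_OFFSET, QUANTIFICATION_VALUE = -1000, 10000

# Raw integers: the offset is still baked into both bands.
ndvi_raw = ndvi(NIR_DN, RED_DN)

# Scale only: the shared multiplicative factor cancels, so nothing changes.
ndvi_scaled_only = ndvi(NIR_DN / QUANTIFICATION_VALUE, RED_DN / QUANTIFICATION_VALUE)

# Offset and scale: the additive term does not cancel, so the index moves.
ndvi_corrected = ndvi(
    to_reflectance(NIR_DN, ADD_OFFSET, QUANTIFICATION_VALUE),
    to_reflectance(RED_DN, ADD_OFFSET, QUANTIFICATION_VALUE),
)

print(round(ndvi_raw, 4), round(ndvi_scaled_only, 4), round(ndvi_corrected, 4))
```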

The two meanings of “Harmonized”

“Harmonized” is overloaded, so you want to pin down which meaning you are relying on.

In a platform sense (for example Earth Engine), “harmonized Sentinel-2” usually means the PB 04.00 range shift has been removed so post-2022-01-25 scenes sit in the same numeric range as older scenes. That improves long time series out of the box, but it changes what you should do next: you still scale to 0–1 reflectance, but you do not apply the SAFE offsets again.^{Google Earth Engine harmonized Sentinel-2 SR: PB 04.00+ scenes are shifted to match older range; SR bands are scaled by 10000}
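In DN terms, removing the range shift amounts to folding the declared offset into the stored integers so both eras share one numeric range. The sketch below illustrates that idea with two made-up observations of the same surface; it is not a description of any platform's implementation. In particular, how a platform treats values that would fall below zero after the shift is not covered here, and this sketch simply leaves them negative.

```python
def shift_to_legacy_range(dn: int, add_offset: int) -> int:
    """Fold the declared offset into the DN so it matches the pre-04.00 range."""
    return dn + add_offset


QUANTIFICATION_VALUE = 10000

# Made-up observations of one stable surface: (date, stored DN, declared offset).
series = [
    ("2021-07-10", 1200, 0),      # pre-PB 04.00: no offset declared
    ("2022-07-10", 2200, -1000),  # PB 04.00+: stored range shifted by 1000
]

shifted = [shift_to_legacy_range(dn, offset) for _, dn, offset in series]

# After the shift, only scaling remains; no offset is applied again.
reflectances = [dn / QUANTIFICATION_VALUE for dn in shifted]

print(shifted, reflectances)
```

Both observations end up at the same DN and the same reflectance, which is the "no step at the boundary" behavior a harmonized collection is meant to give you.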

In a remote sensing science sense, “harmonized” can also mean cross-sensor harmonization (for example Landsat plus Sentinel products). This article is not about cross-sensor harmonization. It is about within-Sentinel-2 baseline-era alignment and offset handling.

Where teams get bitten in real projects

Most failures are quiet and look like “a small bias”. A multi-year vegetation series can show an artificial step at the PB 04.00 boundary if offsets are mishandled. Low-signal areas (deep water, heavy shadow, winter scenes) are where offset logic matters most, because that is where clipping and special-value confusion produce maps that look clean but are wrong.

The other common failure is double-correction. It happens when a team starts from a harmonized platform collection and then applies SAFE-era offsets again, typically because the script was originally written for raw SAFE JP2 files. The result is not “slightly off”. It is a systematic shift in reflectance across bands (0.1 in reflectance units when the offset is −1000 and the quantification value is 10000), which then contaminates thresholds, indices, and any downstream machine learning.
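One way to make double-correction loud instead of quiet is to carry an explicit flag that says whether the offset has already been removed, and refuse to apply an offset when it has. The function and parameter names below are hypothetical, and the numbers are illustrative.

```python
def reflectance(
    dn: int,
    quantification_value: float,
    add_offset: float = 0,
    offset_already_removed: bool = False,
) -> float:
    """Convert DN to reflectance, refusing to apply an offset twice."""
    if offset_already_removed and add_offset != 0:
        raise ValueError("add offset supplied for values that are already harmonized")
    return (dn + add_offset) / quantification_value


HARMONIZED_DN = 1200  # illustrative value from a harmonized collection

# Correct: harmonized values only need scaling.
correct = reflectance(HARMONIZED_DN, 10000, offset_already_removed=True)

# What a script written for raw SAFE data would compute on the same value.
double_corrected = (HARMONIZED_DN + -1000) / 10000

print(correct, double_corrected, round(double_corrected - correct, 4))

try:
    reflectance(HARMONIZED_DN, 10000, add_offset=-1000, offset_already_removed=True)
except ValueError as error:
    print("guard triggered:", error)
```

Failing fast here is a design choice: a raised error costs a few minutes, while a silent bias of 0.1 in every band can survive into thresholds and trained models.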

ClearSKY delivery modes

If you receive Sentinel-2 through ClearSKY, ask which semantics you are getting: baseline-harmonized reflectance (analysis-ready time series) or PB-native values (closer to ESA SAFE semantics for detailed QA). The right choice depends on whether your priority is production consistency across years or reproducing baseline-era behavior exactly. Either way, treat the conversion as metadata-driven, and keep provenance (baseline, scaling, offsets, processing version) attached to every raster you store.
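Keeping provenance attached can be as simple as a small record stored next to each raster. The sketch below writes one as JSON and reads it back. The class and every field name are hypothetical, chosen to mirror the items listed above (baseline, scaling, offsets, processing version); it is not a ClearSKY or ESA format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RadiometricProvenance:
    """Hypothetical sidecar record describing how a raster's values are encoded."""

    product_level: str            # "L1C" or "L2A"
    processing_baseline: str      # e.g. "04.00"
    quantification_value: float   # scale read from metadata
    add_offsets: dict             # band name -> declared add offset
    offset_already_removed: bool  # True for baseline-harmonized values
    processing_version: str       # version of your own pipeline


record = RadiometricProvenance(
    product_level="L2A",
    processing_baseline="04.00",
    quantification_value=10000.0,
    add_offsets={"B04": -1000.0, "B08": -1000.0},
    offset_already_removed=False,
    processing_version="example-pipeline 0.1",
)

# Serialize for storage next to the raster, then restore to check the round trip.
sidecar = json.dumps(asdict(record), indent=2, sort_keys=True)
restored = RadiometricProvenance(**json.loads(sidecar))

print(sidecar)
print(restored == record)
```

With a record like this, any downstream script can decide from data, not from folklore, whether an offset still needs to be applied.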

FAQ

What is the safest formula to convert Sentinel-2 DN to reflectance?

Use the metadata-driven form: ρ = (DN + ADD_OFFSET) / QUANTIFICATION_VALUE. For L1C the offset is typically RADIO_ADD_OFFSET, and for L2A it is typically BOA_ADD_OFFSET. Read both the offset and the quantification value from the product metadata instead of hard-coding constants.

What changed in Processing Baseline 04.00, and why do I care?

PB 04.00 (operational from 2022-01-25) introduced an additional radiometric offset in metadata so dark-scene noise would not be clipped during quantization. If you mix baseline eras without handling offsets, long time series can show a false step change. Platform “harmonized” collections often remove that step for you, which is useful but also changes what corrections you should apply.

What does “Harmonized Sentinel-2” mean in Google Earth Engine?

In Earth Engine, “harmonized” means PB 04.00+ scenes have been shifted to match the numeric range of older scenes, which removes the baseline-era discontinuity. The reflectance bands are still scaled integers (commonly “scaled by 10000”), so you still scale to 0–1. You should not apply the SAFE-era add offsets again on top of the harmonized values.

Do NDVI and other indices require scaling and offsets first?

If your index is a pure normalized difference, a shared multiplicative scale cancels, but additive offsets do not. Indices with constants (for example EVI2) assume reflectance is physically scaled, so computing them on raw integers can bias results. When in doubt, convert to reflectance first and keep the pipeline consistent across time.

Is DN=0 always NoData in Sentinel-2?

No. Sentinel-2 products declare special values (NoData and saturated) in metadata, and some tools expose them through derived masks. A value of zero might be used as a special value in some contexts, but you should not assume it without checking the product’s declared special values. The safest approach is: read the special values, mask them intentionally, and treat “negative after offset” as valid unless your application defines otherwise.
