Best Sentinel-2 Cloud Mask: SCL vs s2cloudless vs FMask
2026-03-31 · 8 min read · Sentinel-2 · Cloud Masking · Earth Observation · Near Real-Time

TL;DR: There is no universal winner. SCL is the best built-in Sentinel-2 quality layer, s2cloudless is usually the most useful single-date cloud probability model, and FMask is strongest when you need a fuller cloud-plus-shadow workflow. For production time series, the best answer is often a hybrid or, when the workflow allows it, less reliance on single-date cloud masks altogether.
The plain-English answer
The frustrating answer is also the honest one: Asking for the “best” Sentinel-2 cloud mask mixes together three tools that were built for slightly different jobs.
SCL, short for Scene Classification Layer, is the easiest place to start because it arrives with many Sentinel-2 L2A (Level-2A surface reflectance) products. It is explainable, convenient, and good enough for a lot of filtering tasks. But it is not just a cloud detector. ESA’s algorithm description says the scene classification map is mainly there to support Sen2Cor atmospheric correction by separating cloudy pixels, clear land, and water, while also providing classes such as cloud shadow, cloud probability classes, cirrus, and snow. That makes SCL useful, but it also tells you what it was optimized for. (ESA, Sentinel-2 L2A ATBD)
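To make the "fixed semantic classes" point concrete, here is a minimal sketch of turning an SCL raster into a boolean mask with NumPy. It assumes the SCL band is already loaded as an integer array and uses the standard Sen2Cor class codes 0 to 11. The function names are hypothetical, and which classes count as "bad" is a per-application choice; the default set below is only an illustration.

```python
import numpy as np

# Sen2Cor Scene Classification Layer (SCL) codes for Sentinel-2 L2A.
SCL_CLASSES = {
    0: "no data",
    1: "saturated or defective",
    2: "dark area pixels",
    3: "cloud shadow",
    4: "vegetation",
    5: "not vegetated",
    6: "water",
    7: "unclassified",
    8: "cloud medium probability",
    9: "cloud high probability",
    10: "thin cirrus",
    11: "snow or ice",
}

# Illustrative choice of classes to exclude. Snow (11) is left out on purpose,
# because whether snow is "bad" depends on the application.
DEFAULT_BAD_CLASSES = (0, 1, 3, 8, 9, 10)


def scl_invalid_mask(scl, bad_classes=DEFAULT_BAD_CLASSES):
    """Return a boolean array that is True where the SCL class should be masked."""
    return np.isin(np.asarray(scl), bad_classes)


def scl_class_counts(scl):
    """Count pixels per SCL class name, e.g. for a quick scene-level QA summary."""
    codes, counts = np.unique(np.asarray(scl), return_counts=True)
    return {
        SCL_CLASSES.get(int(c), f"unknown ({int(c)})"): int(n)
        for c, n in zip(codes, counts)
    }


scl = np.array([[4, 8], [3, 6]])
print(scl_invalid_mask(scl).tolist())  # [[False, True], [True, False]]
```

The mask is a lookup, not a decision: every pixel labelled 8 is masked no matter how marginal it was, which is exactly the rigidity the article describes.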
s2cloudless is different. It is a single-scene, pixel-based cloud detector that uses 10 Sentinel-2 bands and returns cloud probability, which you can threshold to match your tolerance for false positives or missed clouds. That probability output is the real reason people like it. You are not locked into one hard class boundary. One practical nuance matters, though: If you use hosted convenience layers instead of running the model yourself, some platforms expose precomputed cloud probability and cloud mask bands at 160 m resolution, which is fine for fast screening but not identical to running the model at full request resolution. (Sentinel Hub, s2cloudless repository; Sentinel Hub, Cloud Masks documentation)
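Because the output is a probability rather than a class, the operational decision reduces to a threshold. The sketch below assumes you already have a per-pixel cloud probability array in [0, 1], whether from running the model yourself or from a hosted layer. The function name and the 0.4 default are illustrative choices, not taken from the s2cloudless package.

```python
import numpy as np


def threshold_cloud_probability(prob, threshold=0.4):
    """Return a boolean cloud mask: True where cloud probability >= threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return np.asarray(prob, dtype=float) >= threshold


prob = np.array([0.05, 0.25, 0.45, 0.9])
# A conservative (alerting) setting masks more pixels...
print(threshold_cloud_probability(prob, 0.2).tolist())  # [False, True, True, True]
# ...while a permissive (coverage-first) setting masks fewer.
print(threshold_cloud_probability(prob, 0.6).tolist())  # [False, False, False, True]
```

The same probability map serves both use cases; only the threshold changes.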
FMask is different again. It is not just a cloud score. It is a fuller masking workflow designed to identify clouds and cloud shadows, and the current public implementation for Sentinel-2 expects L1C (Level-1C top-of-atmosphere) input and outputs explicit classes for clear land, clear water, cloud shadow, snow or ice, and cloud. That makes it attractive when shadow contamination is the real operational problem, not just cloud opacity. (GERSL, FMask repository)
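Consuming FMask output is then a matter of decoding its class raster. This sketch assumes the integer codes commonly documented for the GERSL implementation (0 clear land, 1 clear water, 2 cloud shadow, 3 snow or ice, 4 cloud, 255 no observation); verify them against the README of the version you actually run. The helper names are hypothetical.

```python
import numpy as np

# Integer codes commonly documented for GERSL Fmask output rasters.
# Check against the version you run before relying on them.
FMASK_CLASSES = {
    0: "clear land",
    1: "clear water",
    2: "cloud shadow",
    3: "snow or ice",
    4: "cloud",
    255: "no observation",
}
NO_OBSERVATION = 255


def fmask_usable(fmask):
    """True where Fmask labels a pixel as clear land or clear water."""
    return np.isin(np.asarray(fmask), (0, 1))


def fmask_class_fractions(fmask):
    """Fraction of observed pixels (code != 255) that fall in each Fmask class."""
    fmask = np.asarray(fmask)
    observed = fmask[fmask != NO_OBSERVATION]
    if observed.size == 0:
        return {}
    return {
        name: np.count_nonzero(observed == code) / observed.size
        for code, name in FMASK_CLASSES.items()
        if code != NO_OBSERVATION
    }


scene = np.array([[0, 0, 4], [2, 1, 255]])
print(fmask_usable(scene).tolist())  # [[True, True, False], [False, True, False]]
print(fmask_class_fractions(scene)["cloud"])  # 0.2
```

Note that cloud shadow arrives as its own class here, which is the practical difference from a cloud-only probability.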
What each option actually gives you
| Option | What it really is | What you get | Best fit | Main weakness |
|---|---|---|---|---|
| SCL | A Sen2Cor scene classification map inside Sentinel-2 L2A | Fixed semantic classes such as cloud shadow, medium and high probability cloud, cirrus, snow or ice, water, vegetation, not-vegetated | Fast quality assurance, explainable masking, product-native filtering | Not a tunable probability model, and not optimized as a standalone cloud detector |
| s2cloudless | A single-scene cloud probability model | Cloud probability, then a thresholded cloud mask if you choose one | Near real-time screening, tunable omission versus commission trade-offs | Cloud only, so you still need separate shadow handling |
| FMask | A cloud and cloud-shadow masking workflow | Clear land, water, cloud shadow, snow or ice, cloud | Pipelines that must solve cloud and shadow together | More moving parts, more preprocessing assumptions, and more scene-specific failure modes |
That table already hints at the answer. If your question is really “What should I use as the default mask on standard Sentinel-2 L2A imagery?”, SCL is the simplest honest answer. If your question is “What gives me the most control over cloud aggressiveness in a single date?”, s2cloudless is usually the more useful answer. If your question is “What gives me cloud and cloud shadow in one workflow?”, FMask is the most complete of the three.
Which cloud mask is best for near real-time work
Near real-time Earth observation has asymmetric costs. A missed cloud can quietly poison downstream indices, classifications, and alerts. An extra false cloud usually just costs you one observation. Because of that asymmetry, probability often beats hard classes. It lets you tune the mask to the application instead of inheriting someone else’s threshold.
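One way to make "tune the mask to the application" concrete is to choose the threshold that minimizes an explicit cost on a set of labelled reference pixels. This is a generic decision-theory sketch, not a procedure from any of the three tools; the function name, the candidate grid, and the cost ratios are hypothetical.

```python
import numpy as np


def pick_threshold(prob, is_cloud, cost_missed_cloud=10.0, cost_false_cloud=1.0,
                   candidates=None):
    """Return (threshold, total_cost) with the lowest cost on reference pixels.

    prob: cloud probabilities in [0, 1]; is_cloud: boolean reference labels.
    A missed cloud (cloud pixel below the threshold) costs cost_missed_cloud;
    a false cloud (clear pixel at or above the threshold) costs cost_false_cloud.
    Ties keep the first (lowest) candidate threshold.
    """
    prob = np.asarray(prob, dtype=float)
    is_cloud = np.asarray(is_cloud, dtype=bool)
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)
    best_t, best_cost = None, np.inf
    for t in candidates:
        masked = prob >= t
        missed = np.count_nonzero(is_cloud & ~masked)
        false_cloud = np.count_nonzero(~is_cloud & masked)
        cost = cost_missed_cloud * missed + cost_false_cloud * false_cloud
        if cost < best_cost:
            best_t, best_cost = float(t), float(cost)
    return best_t, best_cost


prob = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.35]
is_cloud = [False, True, True, True, True, False, False]
grid = [0.25, 0.4, 0.6]
# Alerting: a missed cloud is 10x worse than a lost observation.
print(pick_threshold(prob, is_cloud, 10, 1, grid))  # (0.25, 1.0)
# Coverage-first: losing a clear observation is 10x worse.
print(pick_threshold(prob, is_cloud, 1, 10, grid))  # (0.4, 1.0)
```

The same reference data yields different "best" thresholds once the cost asymmetry changes, which is the argument for keeping the probability rather than a fixed class.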
That is where s2cloudless tends to win in practice. It is easy to push it conservative for alerting workflows, or relax it when coverage matters more than perfect cleanliness. It also plays well with other signals. You can combine probability with view geometry, temporal consistency, or a product-native quality layer instead of pretending one bitmask should settle everything.
SCL still earns its place in near real-time pipelines because it gives you interpretable semantic classes with almost no extra work. Cloud shadow, cirrus, snow or ice, saturated or defective pixels, and water are all operationally useful categories. But when teams treat SCL as the final authority on cloud, they usually hit the same ceiling: It is convenient, yet not flexible enough when the scene gets ambiguous.
FMask becomes compelling when shadow contamination matters as much as cloud detection. That is common in mountainous terrain, urban areas with strong contrast, and workflows where dark cloud shadow can be misread as water, burn severity, or vegetation change. In those cases, a cloud-only score is not enough.
Where SCL, s2cloudless, and FMask usually break
SCL struggles most where fixed classes meet messy reality. Thin cloud edges, bright bare ground, snow transitions, and haze-like conditions are exactly where a crisp semantic label can feel overconfident. The problem is not that the map is bad. The problem is that users often ask more from it than it was meant to deliver.
s2cloudless breaks in a different way. It gives you a probability surface, which is powerful, but it does not solve the full masking problem on its own. You still need to decide how to threshold it, how much to dilate cloud edges, and how to handle shadow. If you skip those decisions, the model can look better in a demo than in an analysis pipeline.
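The threshold and the edge buffer are separate decisions, and both have to be made explicitly. Below is a NumPy-only sketch, assuming a 2D probability array; the function names and the `buffer_px` parameter are hypothetical, and a library routine such as SciPy's `scipy.ndimage.binary_dilation` would do the dilation step equally well.

```python
import numpy as np


def dilate_mask(mask, radius=1):
    """Grow a 2D boolean mask by `radius` pixels using a square neighbourhood."""
    mask = np.asarray(mask, dtype=bool)
    if radius <= 0:
        return mask.copy()
    rows, cols = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask within the square window.
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            out |= padded[radius + dr:radius + dr + rows,
                          radius + dc:radius + dc + cols]
    return out


def cloud_mask_with_buffer(prob, threshold=0.4, buffer_px=1):
    """Threshold a cloud probability map, then buffer the cloud edges."""
    return dilate_mask(np.asarray(prob, dtype=float) >= threshold, buffer_px)


prob = np.zeros((5, 5))
prob[2, 2] = 0.8
print(int(cloud_mask_with_buffer(prob, 0.5, buffer_px=1).sum()))  # 9
print(int(cloud_mask_with_buffer(prob, 0.5, buffer_px=2).sum()))  # 25
```

A single detected pixel grows into a 3 by 3 or 5 by 5 block, which shows how quickly the buffer choice trades usable pixels for safety at cloud edges. Shadow handling is still not addressed here.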
FMask’s failure mode is complexity. Once you ask one workflow to detect cloud and then match cloud shadow geometrically, you gain useful structure but also inherit more assumptions. Input level matters, preprocessing matters, and scene geometry matters. That is why FMask can be excellent in the right pipeline and still not be the easiest default for every Sentinel-2 user.
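The geometric matching step rests on a simple projection: on flat ground, a cloud at height h casts its shadow h·tan(solar zenith) away from it, on the side opposite the sun. The sketch below computes that offset in pixels. It assumes flat terrain, a nadir view, a known cloud height, azimuth measured clockwise from north, and image rows increasing southward. FMask's actual matching is more involved (it searches over candidate cloud heights), so this illustrates why geometry and assumptions matter rather than reproducing FMask; the function name is hypothetical.

```python
import math


def shadow_offset_pixels(cloud_height_m, solar_zenith_deg, solar_azimuth_deg,
                         pixel_size_m):
    """Estimate (row_offset, col_offset) of a cloud's shadow relative to the cloud.

    Simplified geometry: flat terrain, nadir view, azimuth clockwise from north,
    rows increase southward, columns increase eastward.
    """
    if not 0 <= solar_zenith_deg < 90:
        raise ValueError("solar zenith must be in [0, 90) degrees")
    # Horizontal distance between the cloud and its shadow on flat ground.
    distance_m = cloud_height_m * math.tan(math.radians(solar_zenith_deg))
    az = math.radians(solar_azimuth_deg)
    # The shadow is cast away from the sun.
    east_m = -distance_m * math.sin(az)
    north_m = -distance_m * math.cos(az)
    # A northward shift is a negative row offset in image coordinates.
    return round(-north_m / pixel_size_m), round(east_m / pixel_size_m)


# Sun due south at 45 degrees zenith, cloud at 1 km, 20 m pixels:
print(shadow_offset_pixels(1000, 45, 180, 20))  # (-50, 0), i.e. 50 pixels north
```

A 1 km error in assumed cloud height moves the predicted shadow by tens of pixels at this sun angle, which is one reason shadow matching has more failure modes than cloud detection alone.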
Why published comparisons keep disagreeing
Cloud-mask comparisons are notoriously sensitive to what exactly gets scored. Thick cloud cores are the easy part. Thin or semi-transparent clouds, cloud edges, shadows, snow, bright cities, bright rock, and coastal haze are where methods separate. CMIX, the Cloud Mask Intercomparison eXercise, found that algorithm performance varies by reference dataset and that thin or semi-transparent clouds remain harder to detect than thick clouds. In other words: The leaderboard moves when the truth data and scoring rules move. (Skakun et al., 2022, CMIX)
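To see how a score moves when only the reference changes, here is a sketch of the standard omission and commission error rates, applied to one prediction scored against two hypothetical reference labelings that disagree only on whether thin cloud counts as cloud. The data are toy values, not CMIX results, and the function name is hypothetical.

```python
import numpy as np


def omission_commission(predicted, reference):
    """Return (omission_error, commission_error) for boolean cloud masks.

    omission   = missed cloud pixels / reference cloud pixels
    commission = false cloud pixels  / predicted cloud pixels
    """
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.count_nonzero(predicted & reference)
    fn = np.count_nonzero(~predicted & reference)
    fp = np.count_nonzero(predicted & ~reference)
    omission = fn / (tp + fn) if (tp + fn) else 0.0
    commission = fp / (tp + fp) if (tp + fp) else 0.0
    return omission, commission


# Pixels: thick, thick, thin, thin, clear
predicted = np.array([True, True, True, False, False])
thin_is_cloud = np.array([True, True, True, True, False])
thin_is_clear = np.array([True, True, False, False, False])
print(omission_commission(predicted, thin_is_cloud))  # (0.25, 0.0)
print(omission_commission(predicted, thin_is_clear))  # (0.0, 0.3333333333333333)
```

Nothing about the mask changed between the two lines; only the definition of "cloud" in the reference did.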
That is why simplistic takes like “FMask is best” or “SCL is enough” usually age badly. They compress several choices into one slogan: Product level, cloud definition, shadow handling, dilation buffer, and tolerance for omission versus commission. Once you unpack those choices, the masks stop looking interchangeable.
Practical recommendation for production workflows
For a typical Sentinel-2 L2A workflow, SCL is the best built-in quality layer, but not usually the best final cloud decision by itself. For a single-date cloud detector that you can tune to your risk tolerance, s2cloudless is usually the best starting point. For a workflow that must solve cloud and cloud shadow together, especially from L1C input, FMask is the more complete answer.
In production time series, a hybrid pattern is usually stronger than any single mask. Use cloud probability as the main cloud signal, keep SCL for hard semantic vetoes such as cloud shadow or snow and ice, and then clean up the remaining mistakes with temporal quality assurance (QA). That is less elegant than choosing a single winner, but it is closer to how robust pipelines actually work.
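The single-date part of that hybrid pattern can be sketched in a few lines. This assumes the cloud probability and the SCL band have already been resampled onto the same grid (SCL is natively 20 m, and hosted probability layers may be coarser). The veto list, the 0.4 threshold, and the function name are hypothetical. SCL's own cloud classes are deliberately left out of the vetoes so that the probability remains the main cloud signal.

```python
import numpy as np

# SCL codes used as hard vetoes (illustrative choice):
# 0 no data, 1 saturated or defective, 3 cloud shadow, 11 snow or ice.
SCL_VETO_CLASSES = (0, 1, 3, 11)


def hybrid_invalid_mask(cloud_prob, scl, threshold=0.4,
                        veto_classes=SCL_VETO_CLASSES):
    """True where a pixel should be excluded: probable cloud OR an SCL veto class."""
    cloud_prob = np.asarray(cloud_prob, dtype=float)
    scl = np.asarray(scl)
    if cloud_prob.shape != scl.shape:
        raise ValueError("cloud_prob and scl must be on the same grid")
    probable_cloud = cloud_prob >= threshold
    vetoed = np.isin(scl, veto_classes)
    return probable_cloud | vetoed


prob = np.array([[0.1, 0.7], [0.2, 0.1]])
scl = np.array([[4, 4], [3, 8]])
print(hybrid_invalid_mask(prob, scl).tolist())  # [[False, True], [True, False]]
```

The bottom-right pixel is labelled "cloud medium probability" by SCL but kept, because the probability layer disagrees; the bottom-left pixel is dropped by the cloud-shadow veto even though its cloud probability is low. Temporal QA would then run on whatever survives this mask.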
There is also a more fundamental point for time-series systems: The cleanest workflow is often the one that depends less on single-date cloud masking in the first place. At ClearSKY, our output is designed to be cloud-free, and when the workflow allows it we prefer not to use cloud masks as the primary gate on input data. That is especially true for time-series analysis, where many cloud masks are inconsistent enough to introduce temporal noise of their own. In that setting, multi-observation logic and temporal methods are often more reliable than asking any one mask to be correct everywhere on every date.
FAQ
Is SCL enough for NDVI (Normalized Difference Vegetation Index) or land-cover time series?
It is often enough for quick filtering and exploratory work. It is usually not enough for the cleanest production time series because cloud edges, thin cirrus, and bright surfaces can still leak through or get over-masked. Teams that care about consistency over time usually add a probability layer, temporal cleanup, or both.
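Temporal cleanup can start as simply as flagging observations that drop sharply below their temporal neighbours, since residual cloud and haze over vegetation typically pull NDVI down. The per-pixel sketch below is a generic spike check, not ClearSKY's method or any specific tool's; the window size, the 0.15 drop, and the function name are hypothetical tuning choices. Real change such as harvest or fire can also produce dips, so flagged points need context rather than automatic deletion.

```python
import numpy as np


def flag_ndvi_dips(ndvi, window=2, drop=0.15):
    """Flag observations that fall well below their temporal neighbours.

    ndvi: 1D array of NDVI values for one pixel, in time order (NaN = missing).
    An observation is flagged when it is more than `drop` below the median of
    up to `window` valid observations on each side. Missing values are never flagged.
    """
    ndvi = np.asarray(ndvi, dtype=float)
    flags = np.zeros(ndvi.shape, dtype=bool)
    valid = np.flatnonzero(~np.isnan(ndvi))
    for k, idx in enumerate(valid):
        neighbours = np.concatenate(
            [valid[max(0, k - window):k], valid[k + 1:k + 1 + window]]
        )
        if neighbours.size == 0:
            continue
        if ndvi[idx] < np.median(ndvi[neighbours]) - drop:
            flags[idx] = True
    return flags


series = [0.62, 0.64, 0.31, 0.66, 0.65]
print(flag_ndvi_dips(series).tolist())  # [False, False, True, False, False]
```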
Why do people like s2cloudless so much?
The main reason is control. A probability map lets you choose how conservative or permissive the mask should be for a specific workflow instead of accepting one fixed class boundary. That matters a lot when the cost of a missed cloud is very different from the cost of masking a usable pixel.
When should I prefer FMask over s2cloudless?
Prefer FMask when cloud shadow is a first-order problem and you want one workflow to tackle both cloud and shadow together. That is especially relevant in terrain with strong shadow effects or in applications where dark shadows can look like real surface change. If you only need a tunable cloud signal, s2cloudless is usually simpler.
What is the best production recipe today?
The most robust answer is rarely a single mask. Start with a tunable cloud probability, keep semantic classes for obvious bad pixels, and use time-series logic to remove leftovers that neither single-date method catches cleanly. In some production systems, especially cloud-free time-series products, the better strategy is to reduce dependence on cloud masks altogether rather than trusting them as the main decision layer.


