COG Explained: Why Cloud-Optimized GeoTIFF Became the Default

TL;DR: A COG is a normal GeoTIFF arranged so clients can fetch only the pixels they need using HTTP byte-range requests. It became the default because it works with plain object storage, scales without special servers, and stays compatible with existing GeoTIFF tools.

If you wonder why modern raster catalogs keep pointing to .tif files sitting in cloud buckets, the answer is usually COG (Cloud-Optimized GeoTIFF). It is not a new raster format with its own encoding, in the way that PNG and JPEG are distinct formats. It is a GeoTIFF with a predictable internal layout that makes partial reads fast over the network.

That sounds minor until you look at how people actually use satellite imagery today: Web maps that only need the tiles visible on screen, notebooks that read a small window for analysis, and pipelines that want to process thousands of scenes without downloading terabytes first.

This guide covers what makes a GeoTIFF “cloud-optimized,” how streaming works in practice, and what to watch for when creating COGs.

What a Cloud-Optimized GeoTIFF is (and is not)

A Cloud-Optimized GeoTIFF is still a GeoTIFF. Most software that can read GeoTIFF can open a COG, because the “cloud-optimized” part is about file organization, not a new decoding algorithm. The OGC (Open Geospatial Consortium) published a formal Cloud Optimized GeoTIFF standard that describes both the TIFF layout requirements and the assumptions needed for fast web delivery. ^{OGC: Cloud Optimized GeoTIFF standard overview}

A practical definition you can use in specs is: A COG is a GeoTIFF whose metadata, internal tiling, and internal overviews (pyramids) are arranged to support efficient partial access over HTTP (Hypertext Transfer Protocol). This is also how USGS explains it in plain language: Clients can issue HTTP range requests and read only the portions of the file they need. ^{USGS FAQ: What are Cloud Optimized GeoTIFFs?}

So when someone says “we serve COGs”, they are usually saying: “You can open this GeoTIFF straight from a URL, pan and zoom, and only pull the bytes for your view.”

How COGs stream: Internal tiling and HTTP byte-range reads

COG performance comes from combining two ideas that already existed, and then being strict about how they are arranged in the file.

First: Internal tiling. A tiled GeoTIFF stores data in small blocks (tiles) instead of long scanlines, so reading a small window does not force you to read unrelated pixels.

Second: Byte-range reads. HTTP range requests let a client ask a server for only a slice of a file by byte offsets, instead of downloading the whole thing. If the server supports it, the client sends a Range: bytes=... header and gets back only those bytes. ^{RFC 7233: HTTP Range Requests}
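To make the header mechanics concrete, here is a minimal sketch using only Python's standard library. The helper names `range_header` and `parse_content_range` are hypothetical and not part of any COG tooling. They only show that byte ranges are inclusive on both ends and that a partial response reports which bytes it carries.

```python
import re


def range_header(start: int, length: int) -> str:
    """Build a Range header value for `length` bytes starting at `start`.

    HTTP byte ranges are inclusive on both ends, so the last byte
    requested is start + length - 1.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value: str):
    """Parse a Content-Range value such as 'bytes 0-16383/1048576'.

    Returns (first_byte, last_byte, total_size); total_size is None
    when the server reports the size as '*'.
    """
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


# Asking for the first 16 KiB of a file:
print(range_header(0, 16384))  # bytes=0-16383
```

A client would send the first value as the `Range` request header and read the second form back from the `Content-Range` response header.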

A COG works because the important metadata is positioned so the client can figure out which byte ranges contain the tiles it wants. The OGC standard formalizes this, and the COG driver documentation for GDAL (Geospatial Data Abstraction Library) is a good “how it’s laid out in practice” reference for implementers. ^{GDAL documentation: COG driver and layout details}

Here is the mental model for what happens when a web map or notebook opens a COG from object storage:

The client reads the header and key metadata at the start of the file (often via a small range request).
The client selects an overview level that matches the requested zoom and resolution.
The client computes which tiles intersect the viewport or analysis window.
The client issues a handful of range requests for just those tile byte ranges, then decodes them locally.
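The second and third steps above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how any particular reader implements it: the function names are hypothetical, overview factors are treated as plain integers, and real clients such as GDAL apply their own selection rules.

```python
def pick_overview(target_res: float, base_res: float, factors: list[int]) -> int:
    """Pick the coarsest overview that is still at least as fine as the target.

    `base_res` is the full-resolution pixel size and `factors` are the
    overview decimation factors stored in the file. A return value of 1
    means "read the full-resolution image".
    """
    best = 1
    for factor in sorted(factors):
        if base_res * factor <= target_res:
            best = factor
    return best


def tiles_for_window(col_off: int, row_off: int, width: int, height: int,
                     tile_size: int) -> list[tuple[int, int]]:
    """Return (tile_col, tile_row) indices that intersect a pixel window."""
    first_col = col_off // tile_size
    first_row = row_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A 10 m image viewed at roughly 45 m per screen pixel:
print(pick_overview(45.0, 10.0, [2, 4, 8, 16]))   # 4
# A 100 x 100 pixel window that straddles two 512-pixel tiles:
print(tiles_for_window(500, 0, 100, 100, 512))    # [(0, 0), (1, 0)]
```

Each tile index is then looked up in the tile offset and byte-count arrays from the file's metadata to get the byte range to request.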

That is why the internal ordering matters so much. If the overviews or tile offsets are scattered, the client is forced into many small requests, and latency dominates.
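One way to see the effect of ordering is to merge the byte ranges a client needs and count what is left. The sketch below is a hypothetical helper, not taken from any specific reader: tiles stored next to each other collapse into one request, while scattered tiles do not.

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge inclusive (start, end) byte ranges that touch or overlap.

    Ranges separated by at most `max_gap` bytes are merged too, trading
    a little wasted bandwidth for fewer round trips.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] - 1 <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Three tiles stored back to back need one request:
print(coalesce_ranges([(0, 99), (100, 199), (200, 299)]))
# [(0, 299)]

# The same three tiles scattered through the file need three:
print(coalesce_ranges([(0, 99), (5000, 5099), (90000, 90099)]))
# [(0, 99), (5000, 5099), (90000, 90099)]
```

The `max_gap` parameter is an assumption of this sketch. It stands in for the tuning many readers expose for how aggressively to merge nearby reads.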

Why it became the default for web maps and cloud analytics

COG won because it fits how cloud infrastructure actually behaves.

Object storage (like Amazon S3 and Azure Blob Storage) is great at serving immutable files, and because you typically pay for the data transferred out, you do not want to ship entire multi-gigabyte rasters to every user. COG matches that with “read small parts, on demand” behavior, without needing a specialized raster server in the middle.

It also became a social standard. Major public archives started publishing COG versions of popular datasets so users could stream them efficiently. For example, the AWS Open Data Sentinel-2 L2A COGs dataset explicitly exists because the original JPEG2000 assets were converted into Cloud-Optimized GeoTIFFs for cloud-friendly access. ^{AWS Open Data Registry: Sentinel-2 L2A COGs}

And institutions began recommending it as the default distribution format wherever GeoTIFF already made sense. NASA’s Earthdata guidance explicitly recommends using COG for distribution when you would otherwise distribute GeoTIFF, and notes that OGC COG 1.0 was published in July 2023. ^{NASA Earthdata: COG guidance and standard status}

One more reason it stuck: It plays nicely with STAC (SpatioTemporal Asset Catalog) catalogs. STAC is a JSON metadata convention for describing and discovering spatiotemporal assets, and “assets are COGs behind HTTPS URLs” became a simple, interoperable pattern across many catalogs. ^{STAC specification overview}
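As a rough illustration of that pattern, the sketch below picks the COG assets out of a STAC item that has already been parsed into a Python dict. The item contents and URLs are made up for the example, and the function name `cog_assets` is hypothetical. The media type string is the one STAC recommends for COG assets, but catalogs vary in how strictly they set it, so treat the filter as a starting point.

```python
COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def _normalize(media_type: str) -> str:
    """Lowercase and drop whitespace so minor formatting differences match."""
    return "".join(media_type.split()).lower()


def cog_assets(item: dict) -> dict:
    """Return {asset_key: href} for assets declared as COGs in a STAC item."""
    wanted = _normalize(COG_MEDIA_TYPE)
    return {
        key: asset["href"]
        for key, asset in item.get("assets", {}).items()
        if _normalize(asset.get("type", "")) == wanted
    }


# Hypothetical item: one COG band, one thumbnail, one untyped asset.
item = {
    "type": "Feature",
    "id": "example-scene",
    "assets": {
        "B04": {"href": "https://example.com/B04.tif", "type": COG_MEDIA_TYPE},
        "thumbnail": {"href": "https://example.com/thumb.jpg", "type": "image/jpeg"},
        "metadata": {"href": "https://example.com/meta.xml"},
    },
}
print(cog_assets(item))  # {'B04': 'https://example.com/B04.tif'}
```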

COG vs GeoTIFF vs chunked cloud formats

COG is a layout convention for a file format people already know, which is why adoption was so fast. But it is not the best tool for every job.

What you need most	COG (Cloud-Optimized GeoTIFF)	Plain GeoTIFF	Chunked cloud formats (for example, Zarr)
Fast random reads over HTTP	Yes, designed for it	Sometimes, but often inefficient	Yes, designed for it
Backward compatibility with GeoTIFF tools	Yes	Yes	No (different ecosystem)
Frequent updates or appends	Not ideal	Not ideal	Better fit

If your primary workflow is “publish scenes for discovery, visualization, and windowed analysis,” COG is hard to beat. If your primary workflow is “store multi-dimensional arrays with lots of chunk-wise writes,” consider chunked formats instead.

How to create a good COG with GDAL (not just a valid one)

Most “bad COG experiences” are not about the idea. They are about files that technically open but behave like a slow remote drive.

A good COG is usually: Internally tiled, equipped with internal overviews (pyramids stored inside the same file), and written so the metadata needed for tile access can be read up front. GDAL’s COG driver can create compliant files and is widely used as a reference implementation. ^{GDAL documentation: Creating COGs and choosing options}

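Here is a simple example that produces a tiled COG with internal overviews (the COG driver requires GDAL 3.1 or later and builds the overviews automatically when the source has none):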

gdal_translate input.tif output_cog.tif -of COG -co COMPRESS=DEFLATE -co BLOCKSIZE=512

Validation is not only about “does it open.” It is about request count and throughput. A quick checklist you can apply before publishing:

Confirm the file is tiled (not strip-organized) and has internal overviews at sensible factors.
Confirm the file can be efficiently read over HTTP range requests (few requests for a typical viewport).
Use realistic block sizes for your use case. Too small increases request overhead. Too big wastes bandwidth.
Keep metadata compact and avoid optional extras that bloat the initial header read.
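The first checklist item can be spot-checked from the leading bytes of a file, without any geospatial library. The sketch below is a partial check under stated assumptions: it reads only the first IFD (image file directory), looks for the TileWidth tag, and says nothing about overviews or data ordering. The function names are hypothetical. A full validation is better left to GDAL-based tooling.

```python
import struct

TILE_WIDTH = 322  # TIFF tag ID for TileWidth; present only in tiled images


def first_ifd_tags(head: bytes) -> set[int]:
    """Return the tag IDs found in the first IFD, given a file's leading bytes."""
    if head[:2] == b"II":
        order = "<"  # little-endian
    elif head[:2] == b"MM":
        order = ">"  # big-endian
    else:
        raise ValueError("not a TIFF: bad byte-order mark")

    (magic,) = struct.unpack_from(order + "H", head, 2)
    if magic == 42:  # classic TIFF: 4-byte offsets, 12-byte IFD entries
        (ifd_off,) = struct.unpack_from(order + "I", head, 4)
        (count,) = struct.unpack_from(order + "H", head, ifd_off)
        first_entry, entry_size = ifd_off + 2, 12
    elif magic == 43:  # BigTIFF: 8-byte offsets, 20-byte IFD entries
        (ifd_off,) = struct.unpack_from(order + "Q", head, 8)
        (count,) = struct.unpack_from(order + "Q", head, ifd_off)
        first_entry, entry_size = ifd_off + 8, 20
    else:
        raise ValueError("not a TIFF: bad magic number")

    if first_entry + count * entry_size > len(head):
        raise ValueError("first IFD extends past the bytes provided")

    # Each IFD entry starts with a 2-byte tag ID.
    return {
        struct.unpack_from(order + "H", head, first_entry + i * entry_size)[0]
        for i in range(count)
    }


def is_tiled(head: bytes) -> bool:
    """True when the first IFD declares tiles rather than strips."""
    return TILE_WIDTH in first_ifd_tags(head)
```

In practice `head` would be the first few kilobytes of the file, read locally or fetched with a single range request.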

If you publish at scale, track these two operational metrics: How many range requests a typical render takes, and how many total bytes are transferred per render. Those numbers often predict cost and user experience better than raw file size.
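Both metrics can be derived from the Range headers a client sent during one render, for example as captured from access logs or a debugging proxy. The sketch below assumes simple single-range headers of the form `bytes=first-last`, and the function name `render_metrics` is hypothetical. It counts bytes requested, which can differ slightly from bytes transferred once response headers are included.

```python
def render_metrics(range_headers: list[str]) -> dict:
    """Summarize one render: number of range requests and bytes requested."""
    total_bytes = 0
    for value in range_headers:
        unit, _, spec = value.partition("=")
        first, _, last = spec.partition("-")
        if unit.strip() != "bytes" or not first or not last:
            raise ValueError(f"unsupported Range header: {value!r}")
        # Byte ranges are inclusive on both ends.
        total_bytes += int(last) - int(first) + 1
    return {"requests": len(range_headers), "bytes": total_bytes}


# One header read plus one 256 KiB block of tiles:
print(render_metrics(["bytes=0-16383", "bytes=1048576-1310719"]))
# {'requests': 2, 'bytes': 278528}
```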

Common pitfalls that break performance

The phrase “COG compatible” is sometimes used loosely. In practice, performance issues show up when the file layout forces clients into too many network round trips.

Common offenders include:

Missing overviews, so every zoom reads full resolution.
External overviews stored as separate files, which complicates hosting and access patterns.
Non-tiled internal layout (strip-organized data).
Tile ordering that scatters offsets and increases request count.

Also: Remember that “cloud optimized” assumes your HTTP stack actually supports range requests end-to-end. If a proxy strips the Range header, or the server ignores it and answers with a 200 and the full body instead of a 206 Partial Content, clients end up downloading the whole file, and the COG behaves like a big file again. ^{RFC 7233: Requirements and behavior for byte range requests}
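A quick way to test the whole path, including any CDN or proxy in front of the bucket, is to ask for a single byte and look at the status code. This is a minimal sketch using Python's standard library. The function names and the example URL are hypothetical.

```python
import urllib.error
import urllib.request


def classify_range_response(status: int, headers: dict) -> str:
    """Interpret the response to a request that carried a Range header."""
    names = {key.lower() for key in headers}
    if status == 206 and "content-range" in names:
        return "partial"        # range honored: only the requested bytes came back
    if status == 200:
        return "full-download"  # range ignored: the whole file is being sent
    if status == 416:
        return "unsatisfiable"  # range understood but outside the file
    return "unexpected"


def probe_range_support(url: str) -> str:
    """Request the first byte of `url` and classify how the server answered."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(request) as response:
            return classify_range_response(response.status, dict(response.headers.items()))
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx and 5xx responses, including 416.
        return classify_range_response(err.code, dict(err.headers.items()))


# Hypothetical URL; point this at one of your own published files.
# print(probe_range_support("https://example.com/scene.tif"))
```

Running the probe against the public URL rather than the origin bucket matters, because the component that drops the header is often somewhere in between.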

When COG is the wrong default

COG is a great distribution and access format. It is not automatically the best storage format for every pipeline.

If you need multi-dimensional data cubes, fast chunk-wise writes, or many concurrent updates to the same dataset, consider formats designed around chunked storage and parallel writes. You can still publish derived layers as COG for broad compatibility, while keeping your internal processing format optimized for computation.

Where ClearSKY fits

ClearSKY delivers its satellite mini-tiles and tiles as Cloud-Optimized GeoTIFFs by default. For enterprise workflows, outputs can be delivered straight into your S3-compatible object storage, so your own tools can stream the data using the same byte-range and overview behavior described above.

FAQ

Is a COG a different file format than GeoTIFF?

No. A COG is a GeoTIFF with a specific internal organization that makes network access efficient, but it remains readable by standard GeoTIFF software. The OGC standard formalizes the requirements so different tools can create and read COGs consistently. ^{OGC: Cloud Optimized GeoTIFF standard overview}

Does a COG have to live in the cloud to be useful?

No. The same layout that makes COG fast over the web can also make local reads faster for windowed access, because tiling and overviews reduce unnecessary I/O. The “cloud” part mainly reflects the assumption of HTTP access and object storage hosting, not a different encoding.

Why do COGs load quickly in web maps without downloading the whole file?

Because clients can request just the byte ranges that contain the tiles needed for the current view and resolution. That behavior relies on HTTP range requests and on the file storing offsets so the client can jump directly to those tiles. ^{USGS FAQ: What are Cloud Optimized GeoTIFFs?}

What is the easiest way to create and validate a COG?

Use GDAL’s COG driver to create a file with tiling and internal overviews, and then test a real read pattern over HTTP to confirm request count and throughput. GDAL’s documentation covers creation options and layout expectations that most tools follow. ^{GDAL documentation: COG driver and creation options}