Over the past year, we’ve been hearing more and more about Apache Iceberg.

But how can we explain what Apache Iceberg is to an information worker? The simple answer: “It’s like Excel, but for huge amounts of data.”

We’re all familiar with the volume problem: an ever-growing number of Excel files stored in folders on our hard drive or in the cloud. Every day, we waste valuable time trying to:

  • Find the right file
  • Find the correct version
  • Go back to a previous version
  • Add an extra column
  • Fix mistakes

Without Iceberg, you rely on file names and folder structures to manage your data. With Iceberg, you get a smart system that:

  • Keeps track of everything neatly
  • Supports version control
  • Helps you search quickly and accurately

Where did Apache Iceberg come from? We all know Netflix, which runs one of the largest data platforms in the world. They stored massive amounts of data in Amazon S3 using Apache Hive tables. But they ran into serious challenges when trying to provide a fast and reliable system for their data engineers and, most importantly, for their users.

Some of their problems:

  • No ACID transactions → errors during updates/deletes
  • Unsafe schema changes → jobs would crash
  • Too many small files → slow queries
  • No version control → no rollback possible

To solve these problems, Netflix developed their own solution: Iceberg. They open-sourced it and donated it to the Apache Software Foundation, hence the name Apache Iceberg.

Let us explain it with a practical example: a sports store selling soccer shoes. Its monthly sales live in one folder of Excel files:

  • /sales-soccer-shoes-2025
  • /sales_jan.xlsx
  • /sales_feb.xlsx
  • /sales_mar.xlsx

When it’s time to hand over your files to the accountant for tax filings, changes or corrections are needed. This leads to:

  • /sales_jan_v2.xlsx
  • /sales_jan_draft.xlsx
  • /sales_jan_final.xlsx

You can imagine how quickly this becomes messy—and that's just for one product.

What does Apache Iceberg do?

  • Automatically store everything in one organized structure
  • Track all previous versions
  • Easily correct mistakes
  • Add new columns without breaking anything
  • Instantly find specific data

Instead of scattered Excel files, rows are stored automatically in Parquet files (a format designed for big data).

Date        Store    Brand   Quantity
2025-03-01  Ghent    Adidas  150
2025-03-01  Antwerp  Nike    90
2025-03-02  Ghent    Puma    120

As a user, you just see one clean, logical table. But behind the scenes, Iceberg manages all the files, versions, and changes, and keeps searches fast.
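For readers who like to see the idea in code, here is a toy sketch in plain Python of what "behind the scenes" could look like: every change adds a new snapshot that points to a list of unchangeable data files, and the table you see is simply whatever the chosen snapshot points to. This is only an illustration of the concept. It is not Iceberg's API or its real metadata format, and the names `Snapshot`, `ToyTable`, `append` and `scan` are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One version of the table: an id plus the data files it points to."""
    snapshot_id: int
    files: tuple  # each "file" is an immutable tuple of rows


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table; not a real Iceberg class."""
    snapshots: list = field(default_factory=list)

    def append(self, rows):
        # New rows go into a new "file"; older files are never modified.
        previous = self.snapshots[-1].files if self.snapshots else ()
        new_file = tuple(rows)
        self.snapshots.append(
            Snapshot(len(self.snapshots) + 1, previous + (new_file,))
        )

    def scan(self, snapshot_id=None):
        # Read the latest snapshot, or an older one if an id is given.
        if not self.snapshots:
            return []
        if snapshot_id is None:
            snap = self.snapshots[-1]
        else:
            snap = self.snapshots[snapshot_id - 1]
        return [row for data_file in snap.files for row in data_file]


table = ToyTable()
table.append([
    ("2025-03-01", "Ghent", "Adidas", 150),
    ("2025-03-01", "Antwerp", "Nike", 90),
])
table.append([("2025-03-02", "Ghent", "Puma", 120)])

current_rows = table.scan()                # the table as it is now
first_version = table.scan(snapshot_id=1)  # the table as it was after the first write
```

The point of the design is that nothing is overwritten: the second write adds a file and a snapshot, so the first version stays readable for as long as its snapshot is kept.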

You stay in control:

  • “Show me yesterday’s sales.”
  • “What did I sell before March 2nd?”
  • “Which version is correct?”
  • “I need to correct the numbers for March 1st.”
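To show how a few of these questions could translate into code, here is a small plain-Python sketch that uses the three rows from the example table. It only illustrates the idea of filtering and of keeping old versions after a correction. It does not use Iceberg itself, the helper names `sold_before` and `correct_quantity` are invented for this example, and the corrected quantity of 140 is a made-up value.

```python
from datetime import date

# Version 1 of the table: the rows from the example.
v1 = [
    {"date": date(2025, 3, 1), "store": "Ghent", "brand": "Adidas", "quantity": 150},
    {"date": date(2025, 3, 1), "store": "Antwerp", "brand": "Nike", "quantity": 90},
    {"date": date(2025, 3, 2), "store": "Ghent", "brand": "Puma", "quantity": 120},
]
history = [v1]  # every version of the table, oldest first


def sold_before(rows, cutoff):
    """Total quantity sold strictly before the cutoff date."""
    return sum(r["quantity"] for r in rows if r["date"] < cutoff)


def correct_quantity(history, day, store, brand, quantity):
    """Store a corrected copy as a new version; older versions stay untouched."""
    latest = history[-1]
    corrected = [
        {**r, "quantity": quantity}
        if (r["date"], r["store"], r["brand"]) == (day, store, brand)
        else r
        for r in latest
    ]
    history.append(corrected)


# "What did I sell before March 2nd?"
before_fix = sold_before(history[-1], date(2025, 3, 2))

# "I need to correct the numbers for March 1st." (140 is a hypothetical value)
correct_quantity(history, date(2025, 3, 1), "Ghent", "Adidas", 140)
after_fix = sold_before(history[-1], date(2025, 3, 2))

# "Which version is correct?" Both are still available for comparison.
original_total = sold_before(history[0], date(2025, 3, 2))
```

Because the correction is written as a new version instead of an edit in place, the answer from before the fix can still be reproduced later, which is what makes rollback and auditing possible.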

Ivan Schotsmans

BARC Ambassador
