Apache Iceberg: Excel for Massive Data Volumes!

Over the past year, we’ve been hearing more and more about Apache Iceberg.

But how can we explain what Apache Iceberg is to an information worker? The simple answer: “It’s like Excel, but for huge amounts of data.”

We’re all familiar with the volume problem: an ever-growing number of Excel files stored in folders on our hard drive or in the cloud. Every day, we waste valuable time trying to:

Find the right file
Find the correct version
Go back to a previous version
Add an extra column
Fix mistakes

Without Iceberg, you rely on file names and folder structures to manage your data. With Iceberg, you get a smart system that:

Keeps track of everything neatly
Supports version control
Helps you search quickly and accurately

Where did Apache Iceberg come from? We all know Netflix, which runs one of the largest data platforms in the world. Netflix stored massive amounts of data in Amazon S3 using Apache Hive tables, but ran into serious challenges when trying to provide a fast and reliable system for its data engineers and, most importantly, for its users.

Some of their problems:

No ACID transactions → errors during updates/deletes
Unsafe schema changes → jobs would crash
Too many small files → slow queries
No version control → no rollback possible

To solve these problems, Netflix built its own table format: Iceberg. It released Iceberg as open source and then donated it to the Apache Software Foundation, hence the name Apache Iceberg.

Let’s illustrate this with a practical example: a sports store selling soccer shoes. Each month’s sales are kept in a separate Excel file:

/sales-soccer-shoes-2025
/sales_jan.xlsx
/sales_feb.xlsx
/sales_mar.xlsx

When it’s time to hand your files over to the accountant for the tax filing, changes or corrections are often still needed. This leads to files like:

/sales_jan_v2.xlsx
/sales_jan_draft.xlsx
/sales_jan_final.xlsx

You can imagine how quickly this becomes messy—and that's just for one product.

What does Apache Iceberg do?

Automatically store everything in one organized structure
Track all previous versions
Easily correct mistakes
Add new columns without breaking anything
Quickly find specific data
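The point about adding columns without breaking anything can be made concrete with a toy model. This is a minimal sketch in plain Python, not Iceberg’s API: the `ToyTable` class, its methods, and the `unit_price` column with its value are hypothetical. It mimics one idea Iceberg relies on: adding a column changes only the table’s schema metadata, rows written earlier are not rewritten, and readers see an empty value (null) for the new column in those older rows. The sample rows come from the sales example in this article.

```python
class ToyTable:
    """Toy stand-in for a table: a schema plus rows that are never rewritten."""

    def __init__(self, columns):
        self.columns = list(columns)  # current schema (metadata only)
        self.rows = []                # stored rows; each keeps only the fields it was written with

    def append(self, row):
        # Store only the fields that exist in the current schema.
        self.rows.append({c: row[c] for c in self.columns if c in row})

    def add_column(self, name):
        # Schema change: update metadata only; existing rows are left untouched.
        if name in self.columns:
            raise ValueError(f"column {name!r} already exists")
        self.columns.append(name)

    def read(self):
        # Project every row onto the current schema; missing fields read as None.
        return [tuple(r.get(c) for c in self.columns) for r in self.rows]


sales = ToyTable(["date", "store", "brand", "quantity"])
sales.append({"date": "2025-03-01", "store": "Ghent", "brand": "Adidas", "quantity": 150})
sales.append({"date": "2025-03-01", "store": "Antwerp", "brand": "Nike", "quantity": 90})

sales.add_column("unit_price")  # hypothetical new column
sales.append({"date": "2025-03-02", "store": "Ghent", "brand": "Puma",
              "quantity": 120, "unit_price": 89.99})  # hypothetical price

for row in sales.read():
    print(row)
# ('2025-03-01', 'Ghent', 'Adidas', 150, None)
# ('2025-03-01', 'Antwerp', 'Nike', 90, None)
# ('2025-03-02', 'Ghent', 'Puma', 120, 89.99)
```

The two older rows were never touched; they simply read as `None` in the new column, which is roughly how Iceberg treats a newly added column in files written before the change.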

Instead of being scattered across Excel files, the rows are stored automatically in Parquet files (a file format designed for big data):

Date        Store    Brand   Quantity
2025-03-01  Ghent    Adidas  150
2025-03-01  Antwerp  Nike    90
2025-03-02  Ghent    Puma    120

As a user, you just see one clean, logical table. Behind the scenes, Iceberg manages all the files, versions, and changes, and it lets you search the data quickly.
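Part of that search speed comes from metadata: Iceberg’s manifest files record, for each data file, statistics such as the lowest and highest value in each column, so a query engine can skip files that cannot contain matching rows. The sketch below is a simplified, hypothetical model of that idea in plain Python; the `DataFile` class, the `files_to_scan` function, and the file names are illustrative, not Iceberg’s API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataFile:
    path: str        # hypothetical Parquet file name
    min_date: date   # lowest "date" value in the file (kept in table metadata)
    max_date: date   # highest "date" value in the file


# Hypothetical files, one per month, mirroring the monthly Excel files above.
files = [
    DataFile("sales-2025-01.parquet", date(2025, 1, 1), date(2025, 1, 31)),
    DataFile("sales-2025-02.parquet", date(2025, 2, 1), date(2025, 2, 28)),
    DataFile("sales-2025-03.parquet", date(2025, 3, 1), date(2025, 3, 31)),
]


def files_to_scan(files, start, end):
    """Return only the files whose [min_date, max_date] range overlaps [start, end].

    The decision uses the metadata alone; no data file is opened.
    """
    return [f.path for f in files if f.max_date >= start and f.min_date <= end]


# Query: sales in the first week of March. Only one of the three files needs reading.
print(files_to_scan(files, date(2025, 3, 1), date(2025, 3, 7)))
# ['sales-2025-03.parquet']
```

With three files the saving is small, but the same check lets an engine skip most of thousands of files before reading a single row.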

You stay in control:

“Show me yesterday’s sales.”
“What did I sell before March 2nd?”
“Which version is correct?”
“I need to correct the numbers for March 1st.”
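The questions above map onto Iceberg’s snapshots: every commit (an append, a correction, a delete) produces a new snapshot of the table, and older snapshots stay readable until they are expired, which is what makes time travel and rollback possible. The toy model below is a hypothetical sketch in plain Python; the `SnapshotTable` class, its methods, and the corrected quantity are invented for illustration and are not Iceberg’s API. In practice, query engines such as Spark or Trino expose time travel on Iceberg tables through SQL.

```python
class SnapshotTable:
    """Toy model: each commit stores a new immutable snapshot of all rows."""

    def __init__(self):
        self.snapshots = [()]  # snapshot 0: the empty table
        self.current = 0       # id of the snapshot readers see by default

    def _commit(self, rows):
        self.snapshots.append(tuple(rows))
        self.current = len(self.snapshots) - 1
        return self.current

    def append(self, new_rows):
        return self._commit(self.read() + list(new_rows))

    def correct(self, match, fix):
        # Build a new version of the rows; older snapshots are left untouched.
        return self._commit(fix(r) if match(r) else r for r in self.read())

    def read(self, snapshot_id=None):
        sid = self.current if snapshot_id is None else snapshot_id
        return list(self.snapshots[sid])

    def rollback(self, snapshot_id):
        # Make an older snapshot current again without deleting newer ones.
        self.current = snapshot_id


# Rows are (date, store, brand, quantity), taken from the example table.
t = SnapshotTable()
v1 = t.append([("2025-03-01", "Ghent", "Adidas", 150),
               ("2025-03-01", "Antwerp", "Nike", 90)])
v2 = t.append([("2025-03-02", "Ghent", "Puma", 120)])

# "What did I sell before March 2nd?"
print([r for r in t.read() if r[0] < "2025-03-02"])

# "I need to correct the numbers for March 1st." (95 is a hypothetical value)
v3 = t.correct(lambda r: r[0] == "2025-03-01" and r[1] == "Antwerp",
               lambda r: r[:3] + (95,))

# "Which version is correct?" Compare the row before and after the fix.
print(t.read(v2)[1], "->", t.read(v3)[1])

# Undo the correction if it turns out to be wrong.
t.rollback(v2)
```

As in Iceberg, rolling back here only changes which snapshot is current; no rows are rewritten, and the corrected version remains available.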

Ivan Schotsmans

BARC Ambassador

BARC is a leading analyst firm for data & analytics and enterprise software with a reputation for unbiased and trusted advice. Our expert analysts deliver a wide range of research, events and advisory services for the data & analytics community.
