The TIP of your Lakehouse
TL;DR
The Iceberg Catalog landscape is evolving rapidly with significant announcements from Snowflake and Databricks. Adding to this vibrant ecosystem, HANSETAG introduces TIP, a Rust-native Iceberg REST Catalog that prioritizes data quality, governance, and flexibility. It innovates with change events, contract validation, and multi-tenancy — all in a lightweight, customizable solution.
In recent days, seismic shifts have reverberated through the data landscape. Notably, Snowflake announced their open-source Iceberg REST catalog, Polaris, while Databricks acquired Tabular.io, the company founded by the creators of Apache Iceberg. Not stopping there, Databricks also open-sourced Unity Catalog, with Iceberg REST (read) support. It seems that these days, to be taken seriously as a Data Company, you need to open-source your Iceberg Catalog implementation.
It's already Tuesday, and a suspicious absence looms: no new catalog has surfaced this week. But fear not! Allow us to introduce TIP, a Rust-native, multi-tenant, single-binary implementation of the Iceberg REST Catalog. But beware: we are doing things a bit differently from existing (and announced 😉) Catalogs. More on that later. Let's start with some basics.
Who we are
At HANSETAG, we build a state-of-the-art Data Product Platform that enables companies to seamlessly build, share and monetize their Data Assets. Our platform ensures consistently high Data Quality through enforced SLOs documented in Data Contracts. To power our platform, we need a reliable multi-tenant Iceberg Catalog under the hood. As existing implementations didn't meet our needs, we built our own Catalog and released it under the Apache License. Feel free to contribute! 😊
Data Lakehouse
Data Lakehouses have become the go-to architecture for Data & Analytics initiatives and provide the foundation for implementing a Data Mesh. They combine the flexibility of Data Lakes with the structure and comfort of Data Warehouses. It is crucial to recognize that not all Lakehouses are equal. Over the years, three table formats have emerged for Lakehouses: Apache Iceberg, Delta Lake and Apache Hudi. Because of Iceberg's rapid development and its great vendor-independent community, we have chosen Iceberg as our initial go-to Lakehouse format at HANSETAG.
Iceberg and the REST Catalogs
The Apache Iceberg Catalog is the brain of the Data Lakehouse, providing efficient management and organization of large-scale datasets. It serves as the central repository of namespaces, tables and views and manages fine-grained access to individual objects.
Iceberg supports Catalogs other than REST. However, as underlined by the recent market activities of Snowflake and Databricks, the Iceberg REST Catalog has the brightest future. It is also the only Iceberg Catalog today that enables clients to access data without specifying storage credentials on the client side. The Iceberg community is currently considering augmenting all other Catalogs via the REST Catalog as part of a new version of the REST specification.
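To make the REST interaction more concrete, here is a minimal Rust sketch of how a client might address a table. The route shape GET /v1/{prefix}/namespaces/{namespace}/tables/{table}, the unit separator (0x1F, encoded as %1F) between the levels of a nested namespace, and the X-Iceberg-Access-Delegation header follow the Iceberg REST OpenAPI specification. The base URL, the warehouse prefix and the helper functions are hypothetical and not part of any library.

```rust
/// Percent-encode a single path segment, keeping only RFC 3986 unreserved characters.
fn encode_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for byte in segment.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build the loadTable route: {base}/v1/{prefix}/namespaces/{namespace}/tables/{table}.
/// Levels of a nested namespace are joined with the unit separator 0x1F (%1F).
/// Hypothetical handling: the prefix is used verbatim and omitted when empty.
fn load_table_url(base: &str, prefix: &str, namespace: &[&str], table: &str) -> String {
    let levels: Vec<String> = namespace.iter().map(|level| encode_segment(level)).collect();
    let prefix_part = if prefix.is_empty() {
        String::new()
    } else {
        format!("/{}", prefix)
    };
    format!(
        "{}/v1{}/namespaces/{}/tables/{}",
        base.trim_end_matches('/'),
        prefix_part,
        levels.join("%1F"),
        encode_segment(table)
    )
}

/// Header asking the catalog to sign storage requests on the client's behalf;
/// the specification's other value is "vended-credentials".
const ACCESS_DELEGATION: (&str, &str) = ("X-Iceberg-Access-Delegation", "remote-signing");
```

With remote-signing, the catalog signs the client's storage requests. With vended-credentials, it hands out credentials instead. In both cases, the client does not have to configure storage keys itself.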
How TIP is different
Building a Catalog is no rocket science, but it was still a decision we didn't make lightly. We prioritize Data Quality and Data Governance for our Platform and want to give users full control of their data, whether self-hosted or in the Cloud. As a result, we have built a Catalog that innovates upon existing standards in significant ways:
- Multi-Tenant: A single deployment of TIP can serve multiple projects. Each project can in turn contain multiple warehouses. Projects and warehouses can be added dynamically at runtime via the REST API, and each can reside on a different storage location.
- Change Events: External systems need to know what changes happen to our tables. We emit events in the CloudEvents format for every change, for example when the schema of a table changes.
- Change Approval / Contract Validation: Data Contracts are at the core of what we do at HANSETAG. While it's great to learn via Change Events that a schema changed, that change may still break the Data Contracts of your Data Product and consequently all downstream Data Pipelines. With TIP, you can prohibit such changes using our ContractVerification trait. Stay tuned for our open-source implementation of Data Contracts!
- Written in Rust: Single small all-in-one binary. No JVM or Python env required.
- Customizable: TIP is meant to be extended, but comes with batteries included. We expose key interfaces publicly, which allows us to add integrations in the future and enables you to write your own integrations by implementing a few methods. These interfaces are:
  - Backend Database (Catalog): Pre-shipped with Postgres
  - Secret Store: Pre-shipped with Postgres; Vault is work in progress
  - Event Store: Pre-shipped with NATS
  - Authentication: Pre-shipped with OIDC; Zitadel is work in progress
  - Authorization: OpenFGA, to be pre-shipped soon
  - Contract Validation: Support for our soon-to-be-released Data Contract Library
- Storage Access Management: Built-in S3 request signing supports both self-hosted S3-compatible storage and AWS S3 WITHOUT sharing S3 credentials with clients. We are also working on vended credentials!
- External Fine-Grained Access (FGA): Our Catalog does not store any permissions internally and never will. We use OpenFGA, which is based on Google's Zanzibar paper, to implement authorization. If your company already has a different system in place, you can integrate with it by implementing a handful of methods in the AuthZHandler trait.
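The multi-tenant layout described in the list above (one deployment, many projects, each with several warehouses on their own storage) can be pictured with a small in-memory model. This is a hypothetical Rust sketch: Catalog, Project, StorageLocation and their methods are illustrative names, not TIP's actual types, and a real deployment would keep this state in its backend database rather than in a HashMap.

```rust
use std::collections::HashMap;

/// Hypothetical storage location; each warehouse may live in its own bucket.
#[derive(Debug, Clone, PartialEq)]
struct StorageLocation {
    bucket: String,
    key_prefix: String,
}

/// A project groups several warehouses, keyed by warehouse name.
#[derive(Debug, Default)]
struct Project {
    warehouses: HashMap<String, StorageLocation>,
}

/// One catalog deployment serving many projects, keyed by project name.
#[derive(Debug, Default)]
struct Catalog {
    projects: HashMap<String, Project>,
}

impl Catalog {
    /// Register a project at runtime; returns false if the name is already taken.
    fn add_project(&mut self, name: &str) -> bool {
        if self.projects.contains_key(name) {
            return false;
        }
        self.projects.insert(name.to_string(), Project::default());
        true
    }

    /// Add a warehouse to an existing project, rejecting unknown projects and duplicates.
    fn add_warehouse(
        &mut self,
        project: &str,
        warehouse: &str,
        location: StorageLocation,
    ) -> Result<(), String> {
        let p = self
            .projects
            .get_mut(project)
            .ok_or_else(|| format!("unknown project '{project}'"))?;
        if p.warehouses.contains_key(warehouse) {
            return Err(format!("warehouse '{warehouse}' already exists in '{project}'"));
        }
        p.warehouses.insert(warehouse.to_string(), location);
        Ok(())
    }

    /// Resolve where a warehouse stores its data.
    fn location(&self, project: &str, warehouse: &str) -> Option<&StorageLocation> {
        self.projects.get(project)?.warehouses.get(warehouse)
    }
}
```

Keeping the storage location on the warehouse rather than on the deployment is what lets two warehouses of the same project point at entirely different buckets.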
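Change Events and Contract Validation fit together naturally: a proposed schema change is first checked against a contract, and only then committed and announced. The Rust sketch below assumes a drastically simplified schema (column name mapped to type name) and a hypothetical SchemaContract trait; it is not TIP's ContractVerification trait, whose methods and signatures may differ. The emitted event carries the four required CloudEvents 1.0 context attributes (specversion, id, source, type); the source path and the event type string are made-up examples.

```rust
use std::collections::BTreeMap;

/// Simplified schema: column name -> type name (far simpler than an Iceberg schema).
type Schema = BTreeMap<String, String>;

/// Hypothetical contract hook; NOT TIP's actual ContractVerification trait.
trait SchemaContract {
    fn check(&self, old: &Schema, new: &Schema) -> Result<(), String>;
}

/// Example contract: the listed columns must keep existing with their original type.
struct RequiredColumns(Vec<String>);

impl SchemaContract for RequiredColumns {
    fn check(&self, old: &Schema, new: &Schema) -> Result<(), String> {
        for col in &self.0 {
            match (old.get(col), new.get(col)) {
                (Some(_), None) => return Err(format!("required column '{col}' would be dropped")),
                (Some(before), Some(after)) if before != after => {
                    return Err(format!("required column '{col}' changes type {before} -> {after}"))
                }
                _ => {}
            }
        }
        Ok(())
    }
}

/// The required CloudEvents 1.0 context attributes.
#[derive(Debug, PartialEq)]
struct CloudEvent {
    specversion: &'static str,
    id: String,
    source: String,
    r#type: String,
}

/// Commit a schema change only if the contract allows it, then emit an event.
/// A rejected change leaves `current` untouched and produces no event.
fn update_schema(
    current: &mut Schema,
    proposed: Schema,
    contract: &dyn SchemaContract,
    table: &str,
    seq: u64,
) -> Result<CloudEvent, String> {
    contract.check(current, &proposed)?;
    *current = proposed;
    Ok(CloudEvent {
        specversion: "1.0",
        id: format!("{table}-{seq}"),
        source: format!("/catalog/tables/{table}"),
        r#type: "com.example.table.schema.updated".to_string(),
    })
}
```

Checking before mutating means downstream consumers only ever see events for changes that passed validation.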
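Zanzibar-style authorization, as used by OpenFGA, boils down to checking relation tuples of the form object#relation@user. The following Rust sketch shows what a pluggable authorization hook could look like. Authorizer, Tuple, InMemoryTuples and can_load_table are hypothetical names, not the actual AuthZHandler trait, and the in-memory tuple set merely stands in for an external system such as OpenFGA. Only a single level of implied relations (for example, owners are also readers) is modeled, unlike OpenFGA's full authorization models.

```rust
use std::collections::HashSet;

/// A relation tuple: object#relation@user, e.g. table:orders#reader@user:anna.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Tuple {
    object: String,
    relation: String,
    user: String,
}

impl Tuple {
    fn new(object: &str, relation: &str, user: &str) -> Self {
        Tuple {
            object: object.to_string(),
            relation: relation.to_string(),
            user: user.to_string(),
        }
    }
}

/// Hypothetical authorization hook; the catalog would only ever talk to this trait.
trait Authorizer {
    fn check(&self, user: &str, relation: &str, object: &str) -> bool;
}

/// In-memory stand-in for an external tuple store.
struct InMemoryTuples {
    tuples: HashSet<Tuple>,
    /// (stronger, weaker): holding `stronger` also grants `weaker` (one level only).
    implied: Vec<(&'static str, &'static str)>,
}

impl Authorizer for InMemoryTuples {
    fn check(&self, user: &str, relation: &str, object: &str) -> bool {
        let has = |rel: &str| self.tuples.contains(&Tuple::new(object, rel, user));
        // Direct grant, or a stronger relation that implies the requested one.
        has(relation)
            || self
                .implied
                .iter()
                .any(|&(stronger, weaker)| weaker == relation && has(stronger))
    }
}

/// Catalog-side call site: any Authorizer implementation can be plugged in.
fn can_load_table(authz: &dyn Authorizer, user: &str, table: &str) -> bool {
    authz.check(user, "reader", &format!("table:{table}"))
}
```

Because the catalog depends only on the trait, swapping the in-memory store for a client of your company's existing permission system would not touch the call sites.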
Getting Started
```shell
git clone https://github.com/hansetag/iceberg-catalog.git
cd iceberg-catalog/examples
docker compose up
```
Head over to our documentation on GitHub for more details, and don't forget to leave us a ⭐!
GitHub: https://github.com/hansetag/iceberg-catalog
HANSETAG: https://hansetag.com/