Katalyst DI

Palantir Foundry use cases: ETL, Data Lakes, & Transformation

Our job on the data team is to make life easy; ease of ETL, data transformation, and creating data lakes is one of the main reasons we chose Palantir Foundry as an essential part of our tech stack for data analysis.

At the beginning of this year, I received my official Palantir Foundry Data Engineer Certification! You can read about my colleague Jeff's experience with the Palantir Foundry Certification program in one of our earlier blogs. Suffice it to say, soon there will be two of us at Katalyst, and we will need to get him to write another blog about Foundry and how we use it to create ontologies of construction supply chains. These ontologies turn raw project data into actionable information you can use to make better-informed decisions about your projects.

Before you can leverage an ontology, you need to take many data sources, transform them so they are compatible with each other, then put that data somewhere until the moment it is needed. A question that often comes up is: "If you're a software company, why do you use Foundry?" Palantir Foundry, one of our backend data platforms, is how I make all of that happen. Today I'll tell you how we use Palantir Technologies' product Foundry for ETL, data lakes, and data transformation to accomplish our customers' business objectives with a data-driven approach.

ETL in Palantir Foundry

First, if you aren't familiar with the term: ETL stands for Extract, Transform, and Load. Foundry is a tool I use to access many external data sources via REST API endpoints and other data connections. The source may be a tool such as Autodesk BIM360, Procore, or any of the other integrations we offer for enterprise clients. This functionality allows us to leverage your current tools to help accelerate your package library. There are two ways we work with data integration in Foundry: a "backend" and a "frontend." For the ETL part, we typically leverage the "backend" through API-based data sources. We would consider this a "data pull." This functionality allows us to take your package library, find the right endpoints, then grab the data from your other systems via an API data integration.
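To make the extract/transform/load pattern concrete, here is a minimal sketch of a "data pull." The endpoint path, field names, and stubbed fetch function are all hypothetical, not our actual integration code:

```python
# Illustrative ETL sketch. The endpoint, field names, and in-memory
# "warehouse" are made up for demonstration purposes.

def extract(fetch):
    """Extract: pull raw records via an API client (here, a stub)."""
    return fetch("/v1/packages")

def transform(records):
    """Transform: normalize field names and values so sources line up."""
    return [
        {"package_id": r["id"], "name": r["name"].strip().title()}
        for r in records
    ]

def load(rows, store):
    """Load: land the cleaned rows in a target dataset (here, a list)."""
    store.extend(rows)
    return store

# Stub standing in for a REST call to a tool like Procore or BIM360.
fake_fetch = lambda path: [{"id": 1, "name": " generator set "}]

warehouse = []
load(transform(extract(fake_fetch)), warehouse)
print(warehouse)  # [{'package_id': 1, 'name': 'Generator Set'}]
```

In a real pipeline, the extract step would authenticate against the source system and the load step would write to a managed dataset, but the three-stage shape stays the same.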

One of Foundry's strengths is how quickly it lets us spin up new data integrations. It's a robust tool that allows us to create custom data integrations with relatively little effort on our end. There are also pre-configured connectors we leverage; for instance, if you are a supplier and want to connect your HubSpot instance, we can do that with one of the off-the-shelf connectors. Our job on the data team is to make life easy, and ease of integration is one of the main reasons we chose Palantir Foundry as an essential part of our tech stack.


While those connections are great on their own, they are enhanced by what we do with the data. Supplier lead time is one area where we utilize this functionality. In short, a collaborator can add a lead time to your scope in a project or to a non-localized package within the KDI platform. That data gets stored in our Postgres server, and we log the historical record in Palantir Foundry. The results are then mapped to your instance in KatalystDI, which allows your design team to make decisions informed by your unique supply chain constraints. This same functionality sends updates to on-site teams or procurement when there are supply chain disruptions.
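The flow above can be sketched as an append-only history log with a "latest value wins" view on top, which is roughly what the design team sees. All names, dates, and values here are invented for illustration:

```python
# Hedged sketch of the lead-time flow: collaborator entries are appended
# to a history log, and the latest value per package is surfaced back.
from datetime import date

history = []  # stands in for the historical log kept in Foundry

def record_lead_time(package, weeks, on):
    """A collaborator adds or updates a lead time for a package."""
    history.append({"package": package, "weeks": weeks, "on": on})

def current_lead_times(log):
    """Latest lead time per package (last write wins)."""
    latest = {}
    for entry in log:
        latest[entry["package"]] = entry["weeks"]
    return latest

record_lead_time("air-handler", 12, on=date(2023, 1, 5))
record_lead_time("air-handler", 16, on=date(2023, 2, 1))  # supplier update
print(current_lead_times(history))  # {'air-handler': 16}
```

Keeping the full history, rather than overwriting, is what makes the disruption alerts possible: a jump between consecutive entries for the same package is the signal to notify on-site teams or procurement.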

Palantir Foundry & Data Transformation Use Cases

Access controls for enterprise data are a highly discussed topic when it comes to developing new projects. You don't want your competitors getting a leg up on you by accident. Let's say we want to pull data via integration from 3 different supplier sources, then aggregate and anonymize it for an outside team, like a consultant, to leverage.

Foundry provides tools, like Cipher, to encrypt and mask sensitive values in datasets before they become operational. These allow us to merge one source with the others easily and remove information that is extraneous or that not everyone needs access to. If we want to anonymize your data to share with a supplier, or put together big-picture views on the state of multiple products to share with external project members without cross-pollinating your data, we can run the data through a pipeline that strips out any identifying information. As you use the KatalystDI platform, the data you and your partners provide will enhance the outputs of these workflows. You will be able to use these "big picture" anonymized sources in presentations, for public reporting, or in other situations where it's important to protect internal IP.

I'll give an example: say we have data on multiple types of air condenser units. We can strip out the identifying data and get an idea of lead time and cost. The cleanup could be a union, a join, and/or complete anonymization, which then allows you to surface actionable insights to your internal team or to external stakeholders such as an EPC.
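A toy version of that cleanup might look like the following, with pandas standing in for the pipeline tooling. The supplier names, columns, and figures are all fabricated for illustration:

```python
# Sketch of the cleanup described above: union three supplier feeds,
# drop the identifying column, and aggregate lead time and cost.
import pandas as pd

supplier_a = pd.DataFrame({"supplier": ["A"], "unit": ["condenser"], "lead_weeks": [10], "cost": [4000]})
supplier_b = pd.DataFrame({"supplier": ["B"], "unit": ["condenser"], "lead_weeks": [14], "cost": [3600]})
supplier_c = pd.DataFrame({"supplier": ["C"], "unit": ["condenser"], "lead_weeks": [12], "cost": [3800]})

# Union: stack the three compatible feeds into one dataset.
combined = pd.concat([supplier_a, supplier_b, supplier_c], ignore_index=True)

# Anonymize: strip the column that identifies which supplier is which.
anonymized = combined.drop(columns=["supplier"])

# Aggregate: a big-picture view safe to share externally.
summary = anonymized.groupby("unit").agg(
    avg_lead_weeks=("lead_weeks", "mean"),
    avg_cost=("cost", "mean"),
)
print(summary)
```

The consultant on the receiving end sees average lead time and cost per unit type, but nothing that ties a number back to a specific supplier.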

Palantir Data Lakes

You've probably heard the term "big data." It sounds complicated, but if you kept up with the first part of this blog, you're already halfway to being an expert. Palantir Foundry makes "big data" simple to digest. I'll start with an example. Imagine all of the data sets you interact with on a daily basis. Maybe that sounds too complex, so I'll get a bit more granular. You're working in a variety of operational systems. I won't name specific tools, but it probably looks like some combination of the following: ERP, project management, scheduling, and logistics, on top of the tools your vendors or customers are using (depending on which side of the project equation you are on).

Foundry allows us to take all of these data sets, including the anonymized ones I just mentioned, and combine them. This aggregation gives us data management abilities to start to do what we do best: build models. We do that through Data Lakes.


Imagine dumping all of these things into a lake, then going fishing. Except unlike actual fishing, we aren't throwing a line into the water and hoping to catch a record-breaking bass, trout, tuna, or whatever you're looking for. Instead, we key in some commands and out pops a prize-winning fish that tells us you have 5 generators on order across a handful of projects. Generators have been delivered late in 4 out of 6 projects your company has worked on in the last 5 months, so maybe you should consider rescheduling the installation team by a week.
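Here is what that "prize-winning fish" looks like as code: a scan over delivery records pooled from several projects, flagging items with a high late-delivery rate. The records and the threshold are invented for illustration, but they mirror the generator example above:

```python
# Toy data-lake query: pooled delivery records from six projects,
# with a late-delivery flag per record. Data is fabricated.
deliveries = [
    {"project": p, "item": "generator", "late": late}
    for p, late in [(1, True), (2, True), (3, False), (4, True), (5, True), (6, False)]
]

def late_rate(records, item):
    """Fraction of deliveries of `item` that arrived late."""
    hits = [r for r in records if r["item"] == item]
    return sum(r["late"] for r in hits) / len(hits)

rate = late_rate(deliveries, "generator")
if rate > 0.5:  # illustrative threshold for raising a flag
    print(f"generators late in {rate:.0%} of projects - consider rescheduling install")
```

None of the individual operational systems could answer this on its own; it only falls out once scheduling, procurement, and logistics data sit in the same lake.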

Palantir ETL, Data Transformation, and Data Lakes: Why it matters

Regardless of the portion of a construction project you are working on (supplier, contractor, owner, or investor), you have the same desire: delivery on time and within budget. The enterprise data you already collect in other operational systems can help you achieve that; you just need a way to unlock it with the right data sets to surface actionable insights you can make decisions on. Big data can seem complex, but these functions in Palantir Foundry help us aggregate your data, through data integrations to KatalystDI with the right access controls, so you can make sense of it all.
