The toolset around implementing a Data Vault continues to grow. Once you have decided to implement Data Vault, there are still further choices to make if the implementation is to succeed. The DataOps.live Data Platform makes those choices simpler to act on: it abstracts the various tools so that the only thing our platform needs is configuration telling each tool what to do. Snowflake automation is built into our platform, so running a pipeline produces the proper environment the Data Vault needs.
More can be said about Data Vault than can be covered adequately in a simple blog post. The essence of Data Vault modeling and implementation is that tables are created using three main structures:
- HUB
- LINK
- SATELLITE
One of the most important topics when designing a Data Vault structure is the business key. A business (or natural) key is the key that the business uses to identify its records in the source systems. I think of things like an account number or a customer ID number; chances are these keys are not the primary keys on the tables, yet they are the keys I use as a customer when interacting with a business.
A hub is the set of unique business keys for one business concept. This table pattern can be loaded from multiple source systems that share the same business keys. In the diagram above, the hubs are the white circles with the labels.
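Concretely, a hub is often implemented as a table carrying the business key, a hash of it, and load metadata. The sketch below is a minimal Snowflake-style example; the table and column names (hub_customer, customer_id, and so on) are illustrative assumptions, not output from our platform:

```sql
-- Hypothetical hub for the "customer" business concept.
CREATE TABLE IF NOT EXISTS hub_customer (
    hub_customer_hk  BINARY(32)    NOT NULL,  -- hash key computed from the business key
    customer_id      VARCHAR       NOT NULL,  -- the business key itself
    load_datetime    TIMESTAMP_NTZ NOT NULL,  -- when this key was first seen
    record_source    VARCHAR       NOT NULL,  -- which source system supplied it
    CONSTRAINT pk_hub_customer PRIMARY KEY (hub_customer_hk)  -- informational in Snowflake
);
```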
A link is a relationship between business keys. These relationships can represent transactions, hierarchies, ownership, and other associations, and a link can relate two or more hubs. In the diagram above, the link is the triangle in the middle.
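A link then stores one row per distinct relationship, carrying the hash keys of the hubs it connects. Again a hedged sketch with hypothetical names, here relating the customer hub to an assumed account hub:

```sql
-- Hypothetical link between hub_customer and an assumed hub_account.
CREATE TABLE IF NOT EXISTS link_customer_account (
    link_customer_account_hk  BINARY(32)    NOT NULL,  -- hash of the combined business keys
    hub_customer_hk           BINARY(32)    NOT NULL,  -- points into hub_customer
    hub_account_hk            BINARY(32)    NOT NULL,  -- points into hub_account
    load_datetime             TIMESTAMP_NTZ NOT NULL,
    record_source             VARCHAR       NOT NULL,
    CONSTRAINT pk_link_customer_account PRIMARY KEY (link_customer_account_hk)
);
```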
A satellite is a time-dimensional table housing detailed information about a hub's or link's business keys. In the diagram above, the satellites are the ellipses connected to either the hubs or the links.
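A satellite hangs off a hub or link and is versioned by load date, so history is preserved rather than overwritten. In this hedged sketch the descriptive columns and the hashdiff (a change-detection hash over the payload) are illustrative assumptions:

```sql
-- Hypothetical satellite holding descriptive attributes for hub_customer.
CREATE TABLE IF NOT EXISTS sat_customer_details (
    hub_customer_hk  BINARY(32)    NOT NULL,  -- parent hub's hash key
    load_datetime    TIMESTAMP_NTZ NOT NULL,  -- when this version was loaded
    hashdiff         BINARY(32)    NOT NULL,  -- hash of the payload, used to detect changes
    customer_name    VARCHAR,                 -- descriptive attributes about the key
    customer_email   VARCHAR,
    record_source    VARCHAR       NOT NULL,
    CONSTRAINT pk_sat_customer_details PRIMARY KEY (hub_customer_hk, load_datetime)
);
```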
As I said at the beginning of this article, more can be said about Data Vault than I intend to cover here, so if you want to learn more, I suggest reaching out to Dan Linstedt, the inventor of Data Vault. He will point you in the right direction.
Our founders made a video some time ago describing the relationship between Data Vault and DataOps.live. The essence of that video has not changed, even though our software has matured with new capabilities.
The DataOps.live data platform can augment the development, creation, and ongoing growth of a production-quality Data Vault implementation with several of our features, described below:
Environment management
Our feature-branching capability gives each data engineer their own branch environment for both code and data. As the team grows the Data Vault, engineers create new branches, and those branches are merged as work goes to production and new structures are added to the production Data Vault.
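Under the hood, a branch environment in Snowflake can be created cheaply as a zero-copy clone of production. Our platform automates this for you; the sketch below only illustrates the idea, and the database names are hypothetical:

```sql
-- Spin up an isolated environment for a feature branch as a zero-copy clone.
CREATE DATABASE prod_db_feature_dv_orders CLONE prod_db;

-- ...develop and test the new Data Vault structures in the clone...

-- Tear the environment down once the branch has been merged.
DROP DATABASE IF EXISTS prod_db_feature_dv_orders;
```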
Security
The DataOps.live data platform can manage even the most sophisticated security requirements. I wrote about implementing a Snowflake-recommended security model for your production database in this article, and I will not repeat much of it here, but there will always be a need to implement security around even the most carefully designed data model. Our platform eases the management of the role hierarchy and ensures that if a role has been dropped for some reason, it will be redeployed the next time an orchestration is run.
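As a flavor of what the platform keeps in place, here is a hedged sketch of a simple role hierarchy over a Data Vault schema; the role, database, and schema names are hypothetical:

```sql
-- Hypothetical reader/writer roles over a Data Vault schema.
CREATE ROLE IF NOT EXISTS dv_reader;
CREATE ROLE IF NOT EXISTS dv_writer;
GRANT ROLE dv_reader TO ROLE dv_writer;  -- writers inherit read access

GRANT USAGE ON DATABASE prod_db TO ROLE dv_reader;
GRANT USAGE ON SCHEMA prod_db.data_vault TO ROLE dv_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA prod_db.data_vault TO ROLE dv_reader;
GRANT INSERT ON ALL TABLES IN SCHEMA prod_db.data_vault TO ROLE dv_writer;
```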
Orchestration
Often there are several tools in an organization's data ecosystem: an ETL tool whose load processes need monitoring, a test framework whose results need checking, a data catalog that needs every piece of metadata captured and published. Stringing these disparate tools together can be overwhelming, and the effort is often undervalued by data product consumers (trust me, I know). The DataOps.live data platform integrates all of them in a single framework. Adding a new tool is a matter of writing a configuration file and testing it in your feature branch; once it works, merge the branch into dev and start the process of promoting to production.
Testing
Ensuring that the assumptions made about the data are in fact valid is crucial to getting the data loaded, processed, enriched, and made available to the right people at the right time in the right format. If a test fails, you will be notified before the data product consumer is. As many of our customers have found, your consumers will support you when you tell them that stale good data is better than fresh bad data. As you grow the tests and learn more about the data that makes up your Data Vault, you can run the whole suite during every orchestration run. Wouldn't you sleep better knowing the data is validated on every run? Or would you rather let your customers find problems with the data before you do?
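To make that concrete, here is a hedged sketch of the kinds of checks such tests express, reusing the illustrative tables from earlier; both queries should return zero rows on healthy data:

```sql
-- Test 1: a hub's business key must be unique.
SELECT customer_id, COUNT(*) AS duplicates
FROM hub_customer
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- Test 2: every satellite row must have a parent hub row.
SELECT s.hub_customer_hk
FROM sat_customer_details AS s
LEFT JOIN hub_customer AS h
  ON h.hub_customer_hk = s.hub_customer_hk
WHERE h.hub_customer_hk IS NULL;
```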
We can show you how our data platform can support a Data Vault implementation, and we would love to hear how these features would help your production implementation. In the meantime, check out our #TrueDataOps Podcast series, which features pioneers and thought leaders in the Data Vault and DataOps worlds!