Skip to content
DataOps.live Professional EditionNEW
Purpose-built environment for small data teams and dbt Core developers.
DataOps.live Enterprise Edition
DataOps.live is the leading provider of Snowflake environment management, end-to-end orchestration, CI/CD, automated testing & observability, and code management, wrapped in an elegant developer interface.
Spendview for Snowflake FREE

An inexpensive, quick and easy way to build beautiful responsive website pages without coding knowledge.


Pricing and Edition

See whats included in our Professional and Enterprise Editions.

Getting Started
Docs- New to DataOps.liveStart learning by doing. Create your first project and set up your DataOps execution environment.
Join the Community
Join the CommunityFind answers to your DataOps questions, collaborate with your peers, share your knowledge!
#TrueDataOps Podcast
#TrueDataOps PodcastWelcome to the #TrueDataOps podcast with your host Kent Graziano, The Data Warrior!
Resource Hub
On-demand resources: eBooks, white papers, videos, webinars.

Customer Stories
Academy

Enroll in the DataOps.live Academy to take advantage of training courses. These courses will help you make the most out of DataOps.live.


Learning Resources
A collection of resources to support your learning journey.
Events
Connect with fellow professionals, expand your network, and gain knowledge from our esteemed product and industry experts.
Blogs

Stay updated with the latest insights and news from our DataOps team and community.


#TrueDataOps.org
#TrueDataOps is defined by seven key characteristics or pillars:
In The News

In The News

Stay up-to-date with the latest developments, press releases, and news.
About Us
About UsFounded in 2020 with a vision to enhance customer insights and value, our company has since developed technologies focused on DataOps.
Careers

Careers

Join the DataOps.live team today! We're looking for colleagues on our Sales, Marketing, Engineering, Product, and Support teams.
DataOps.liveJan 4, 2021 6:18:00 PM2 min read

PART 1: The Challenges of Repeatable and Idempotent Schema Management: Introduction

This is the first in a series of blog posts discussing automating the lifecycle events (e.g. creation, alteration, deletion) of database objects which is critical in achieving repeatable DataOps processes. Rather than the database itself being the source of truth for the data structure and models, instead the code within the central repository should define these.

However, applying changes (creations, alterations, deletions) to database objects requires the use of SQL, which is an imperative language. This means that a sequence of operations needs to be executed in a certain order from a known database state in order to reach the target configuration.

State-0-N


In the diagram above, the target is State N, starting from State 0. Operations A, B and C are applied in sequence, transiting through states 1 and 2 in order to reach the target state.

Take the following example:

CREATE DATABASE ...
USE DATABASE ...
CREATE SCHEMA ...
USE SCHEMA ...
CREATE TABLE ...
CREATE ROLE ...
GRANT ROLE ... TO TABLE ...


The above code expects the starting state to be a certain way, i.e. no database, schema, table etc. But if the database, for example, was to already exist when the code was run, there would be an error.

In general, imperative procedures suffer from a number of limitations and drawbacks, including:

  • The starting point must be well defined and known in advance, otherwise changes may fail (e.g. if an object to be altered does not exist).
  • Some changes must be executed in sequence (e.g. applying a grant to a table after it has been created), leading to complex serial/parallel branching processes (DAGs).
  • A failure part-way through an imperative process can leave the database in an unknown state.
  • The process author will need to be familiar with the low-level database commands and operations, including limiting factors such as execution times and potential race conditions.

Ideally, it would be possible to apply changes to a database in an idempotent way. This means that a sequence of operations is still applied, but it is insensitive to the initial state, and could therefore be applied one or more times with the same successful effect. If idempotent code could be used, a DataOps process would not need to know the current state of the database, worry about when the process was last executed (or what the result of the last execution was), or care if external processes have updated the database, as the result should always be the same.

State-X-to-N


In the diagram above, an idempotent approach can move the database from State X , where X is an unknown state that could be state 0, 1 or 2, to the target State N.

The next two blog posts will examine different DDL operations used in SQL-based relational databases for the ability to be executed in an idempotent way, particularly in the context of common DataOps use cases, highlighting key exceptions and methods for handling them. Snowflake has been chosen as a popular cloud-based database platform, commonly used in DataOps applications.

RELATED ARTICLES