Guy Adams - CTO, DataOps.live · Aug 9, 2022 · 5 min read

DataOps with highly governed infrastructure

Introduction

DataOps pipelines, based on the #TrueDataOps philosophy, follow the software CI/CD paradigm. Part of this, particularly the Continuous Deployment part, usually includes managing a significant element of the target environment. For example, for a simple web application on Kubernetes, a development pipeline would not only build and test the app but also create the Pods and Services (and any other related setup) in the Kubernetes cluster as part of deploying the application, and remove all of these again when development of the feature is complete. Phrased another way, the job of a pipeline could be stated as "build and deploy everything required to have the correct version of my application running", including all of the infrastructure as required.
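To make this concrete, here is a minimal sketch of such a pipeline in GitLab CI syntax (the style DataOps.live pipelines build on). It deploys a per-branch review environment and tears it down again; the k8s/ manifest path and the namespace naming scheme are illustrative assumptions, not a prescribed layout:

```yaml
# Minimal sketch of a per-branch review environment (GitLab CI syntax).
# The k8s/ path and the namespace naming are illustrative assumptions.
deploy_review:
  stage: deploy
  script:
    # Create an isolated namespace for this feature branch, then deploy into it
    - kubectl create namespace "review-$CI_COMMIT_REF_SLUG" --dry-run=client -o yaml | kubectl apply -f -
    - kubectl apply -f k8s/ --namespace "review-$CI_COMMIT_REF_SLUG"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    on_stop: stop_review

stop_review:
  stage: deploy
  script:
    # Deleting the namespace removes the Pods, Services, and related setup
    - kubectl delete namespace "review-$CI_COMMIT_REF_SLUG"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  when: manual
```

When the feature is finished, the stop_review job removes everything the pipeline created, which is exactly the elastic model discussed below.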

Why?

If we think about the old way of software development, it looked something like this: 

Feature Branch diagram 1


The fixed number of ‘hard’ environments here makes the infrastructure management problem relatively simple but places a huge number of limitations on development engineers, including: 

  • Huge coordination requirements and time sharing 
  • More time setting up/resetting environments than actually developing
  • Constant conflicts between engineers 
  • Huge inefficiencies 
  • Greatly increased time to value  


The software world has moved to a far more elastic model:
Feature Branch diagram 2


This uses the dynamic and elastic capabilities of something like Kubernetes to allow developers to create environments as they need them, just for them, for as long as they need them, and then to tear them down again afterwards. This has dramatically improved the efficiency of software engineers.
 

The on-prem/legacy models for data were very similar to those of software:


These came with all the same challenges, and in many cases were actually worse than in the software world: working out whether your dev server has the correct versions of some libraries is a pain, but working out whether it has every record of correct data is almost impossible. In particular, there has been a high barrier to creating a database. In the legacy world a database is a meaningful, tangible thing with clear costs associated with it, especially if that database holds a considerable volume of data. This in turn creates a high barrier to each developer, or even each feature, having its own development environment.
 

Cloud Data Platforms, specifically Snowflake, now enable data to follow similar models to software, with Snowflake providing the dynamic and elastic capabilities for data that Kubernetes does for software: 

Databases are no longer precious objects with high barriers to creation; in fact, they are little more than logical containers. In Snowflake, a table (and other similar object types) carries the actual data, state, and information; databases and schemas are simply logical containers that make those objects easier to manage at scale. There is therefore a very low barrier to creating new databases for individual engineers or individual features to develop and test against, and even with large volumes of data, the use of Zero Copy Clone makes this high-value capability very cheap.
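For example, a pipeline job can give each feature branch its own full-size database almost instantly. The sketch below uses the real SnowSQL CLI and real Snowflake CLONE syntax, but the job itself and the PROD_DB / FEATURE_1234_DB names are illustrative assumptions:

```yaml
# Sketch of a job creating a per-feature development database (GitLab CI syntax).
# PROD_DB and FEATURE_1234_DB are hypothetical names.
create_feature_database:
  stage: deploy
  script:
    # Zero Copy Clone is a metadata-only operation: even a large production
    # database clones in seconds and consumes no extra storage until the
    # cloned data starts to diverge from the source.
    # In practice the target name would be derived from the branch name.
    - snowsql -q "CREATE OR REPLACE DATABASE FEATURE_1234_DB CLONE PROD_DB"
```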

DataOps Patterns 

If the job of a software CI/CD pipeline is to "build and deploy everything required to have the correct version of my application running", including all of the infrastructure as required, then it is reasonable that, as a default design pattern, DataOps.live does the same for data. The pipeline builds all of the infrastructure (primarily using DataOps.live's Snowflake Object Lifecycle Engine, SOLE) and then builds and deploys the Data Product(s) on top of this:

DataOps.live Build and Deploy infrastructure
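Schematically, such a combined pipeline runs the infrastructure stages before the data stages. The stage names below are illustrative rather than actual DataOps.live stage names:

```yaml
# Illustrative stage ordering for a combined pipeline; names are assumptions
stages:
  - Snowflake Orchestration     # SOLE builds/updates databases, schemas, roles, grants
  - Data Ingestion              # land raw data into the newly built containers
  - Data Transformation         # e.g. dbt-style modelling on top
  - Testing and Observability   # validate the Data Product end to end
```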


For many customers, this model works fine. However, for some organizations it is not suitable, for example where there is a corporate requirement for separation of responsibilities around infrastructure.
 

Separation for Highly Governed Infrastructure 

While DataOps allows a large amount of configuration and code to co-exist in a single project, it also supports breaking these out into separate projects. By far the most common separation is to break out a Data Product and its infrastructure.  

Infrastructure Project 

 The Infrastructure Project: 

  • Contains only the configuration and code to build the infrastructure 
  • Manages a subset of object types. These can vary based on detailed requirements, but would typically be account-level objects, e.g. (a configuration sketch follows this list):

Schema layout


  • Has members of the infrastructure team as Owners and Maintainers, i.e. the only ones who can approve changes going into protected branches such as QA and Master 
  • Has a much simpler pipeline, e.g.:

    Pipeline - Validate - Snowflake Orch and prep


  • Typically runs at a lower frequency, maybe daily, or even weekly in a defined maintenance window
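As referenced above, the kind of configuration such a project owns might look like the following. This is a schematic sketch only, not exact SOLE syntax; all object names are assumptions (warehouse_size and auto_suspend are real Snowflake warehouse parameters):

```yaml
# Schematic account-level configuration owned by the infrastructure team.
# Not exact SOLE syntax; names and attributes are illustrative.
roles:
  ANALYST_ROLE:
    comment: Read-only access for data consumers
warehouses:
  TRANSFORM_WH:
    warehouse_size: XSMALL   # real Snowflake parameter: warehouse size
    auto_suspend: 60         # real Snowflake parameter: suspend after 60s idle
databases:
  PROD_DB:
    comment: Production container; its schemas and tables belong to the Data Project
```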

However, this doesn't mean that only members of the infrastructure team can make changes: any permitted member of a Data Product team can create a branch in an infrastructure project, make the changes they need, and submit them as a merge request; it just needs a member of the infrastructure team to approve. This means the request is made as an actual proposed new configuration, not a vague text description, cutting down time and mistakes.

Infrastructure Project  

This also means that infrastructure changes, even when going through another team for approval, can often be achieved in hours rather than days or weeks.

Data Project 

In contrast, the Data Project: 

  • Contains only the configuration and code to ingest, transform, test, and observe the data, without worrying about the underlying infrastructure
  • Manages a subset of object types. These can vary based on detailed requirements, but would typically be database-level objects, e.g.:

Schema table


  • Has members of the Data Product team as Owners and Maintainers
  • Has more sophisticated pipelines (often more than one), e.g.:

DOL Main panel


  • Typically runs at higher frequencies, and often different subsets will run at different frequencies based on the availability of data for ingestion (see the scheduling sketch after this list)
  • All of this allows Data Product engineers to develop rapidly, without the need for external approval, within the infrastructure that has been requested and provided by the infrastructure team.
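As referenced in the list above, running different subsets at different frequencies is straightforward with scheduled pipelines. The sketch below uses real GitLab CI scheduling syntax; the job names and the INGEST_SOURCE variable are assumptions:

```yaml
# Illustrative: one Data Project, two ingestion jobs on different schedules.
# Each scheduled pipeline would set INGEST_SOURCE (a hypothetical variable).
stages:
  - Data Ingestion

ingest_orders:
  stage: Data Ingestion
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $INGEST_SOURCE == "orders"'
  script:
    - echo "Hourly orders feed: land and test new order records"

ingest_reference_data:
  stage: Data Ingestion
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $INGEST_SOURCE == "reference"'
  script:
    - echo "Weekly reference data refresh"
```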

Infrastructure to Data Project Relationship 

There are many possible approaches to how many infrastructure projects you have; the choice is dictated mainly by your chosen Snowflake architecture and by the complexity of shared versus Data Product specific infrastructure. The simplest model is a single infrastructure project that supports all the Data Product projects:


However, if there is a considerable amount of infrastructure specific to each Data Product, this could be split into a core infrastructure project plus one infrastructure project per Data Product.

Conclusion

Careful attention and a dash of discipline can go a long way towards the success of Data Product teams. Taking its cues from the world of software development, DataOps.live provides an effective approach for developers in a data context to "build and deploy everything required" for their data products to work correctly, supported by Snowflake capabilities, extending to the infrastructure required, and including highly governed environments.

You have the flexibility to separate a Data Product and its infrastructure on a project-by-project basis, while also benefiting from the specific change control, approvals, owners, and maintainers that this may entail, in a way that never risks slowing development and deployment.

Indeed, DataOps is a great way to deliver existing and new data services and products rapidly, despite changing infrastructures. 

 


Guy Adams - CTO, DataOps.live

Snowflake Global #1 Data SuperHero! An experienced CTO and VP, I'm passionate about DataOps. I've spent 20+ years running software development organizations and now my focus is on bringing the principles and business value from DevOps and CI/CD to data. Cofounder of the truedataops.org movement. Also Dad, technologist, (over) engineer, amateur inventor, skier, and mildly eccentric.
