Doug 'The Data Guy' Needham · Jan 24, 2023 · 6 min read

What does Data Operations mean to me?

When you think about Data Operations, or “DataOps”—what comes to mind?  

DataOps can be a confusing topic to discuss. People have varying ideas about what makes up the responsibilities of DataOps and how it relates to data management. I have worked in Data Operations and been a Data Operations manager several times in my career. In my experience, far more people understand the moving pieces involved in running an application for an enterprise than understand the equivalent pieces for data.

Load balancers, dynamic web servers, Kubernetes clusters running microservices, virtual machines or containerized applications coming up and down throughout the day, adequate disk storage, centralized log collection, blue-green deployments, application databases configured for high availability, sometimes even more than one type of database server supporting the applications: these are all relatively common topics in the operation of an application supporting a business, a discipline more commonly known as DevOps.

Data is different. In the application world, if a microservice misbehaves, you can destroy it and replace it with another. In the data world, destroying data could be a crime, depending on the nature of the data being kept.

In the application world, APIs process one record at a time and return their results in milliseconds. In the data world, we have processes that can manipulate millions of records at a time and run for minutes to hours (I have even seen data processing jobs run for days).

We speak of a data ecosystem because many moving parts protect, enrich, enable, and produce data products for an enterprise.

My mission statement for a data operations group is as follows:  

“To get the right data to the right people, at the right time, and in the right format.” 

  • What is the right data?
  • Who are the right people?
  • When is the right time?
  • Where is the definition of the format? 

This simple mission statement, backed up by a proper philosophical approach, can accomplish many things. This statement implies that data must move. 

Move from where to where?  

Data must move for it to be of value. Typically, there is a process that brings application data into an enrichment platform (data lake, data warehouse, data mart, data lakehouse) to be used by non-application users. The tools needed to move data are very diverse: ETL, ELT, data loading platforms, change-data-capture processes, transaction-log monitoring, and even queries run directly against the application database (this last approach should be frowned upon because it impacts application performance).
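To make that movement concrete, here is a minimal sketch of watermark-based incremental extraction, a lightweight cousin of change data capture. Everything in it is an illustrative assumption: the app.db file, the orders table, and the updated_at column are hypothetical, and a real loader would persist the watermark between runs.

```python
# A minimal sketch of watermark-based incremental extraction.
# Assumes a hypothetical application database "app.db" with an "orders"
# table carrying an "updated_at" column.
import sqlite3

def extract_changed_rows(conn: sqlite3.Connection, last_watermark: str) -> list[tuple]:
    """Pull only the rows modified since the last successful load."""
    cur = conn.execute(
        "SELECT id, customer_id, amount, updated_at "
        "FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    )
    return cur.fetchall()

# Usage: ideally point at a read replica, not the live application database.
conn = sqlite3.connect("app.db")
rows = extract_changed_rows(conn, "2023-01-01T00:00:00Z")
print(f"{len(rows)} changed rows to load into the enrichment platform")
```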

The reason the data has to move is that a data model designed for application performance is seldom the database design that is useful for data scientists, business analysts, or business intelligence consumers. The data modeling techniques used for application performance are different from those needed for reporting and analysis. Not only must the data flow; it must also be transformed.
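As a small illustration of that transformation, here is a hedged sketch that flattens a normalized application schema into one wide reporting table. The orders and customers tables, and the reporting_orders target, are hypothetical names carried over from the extraction sketch above.

```python
# A hedged sketch of the transform step: denormalizing application tables
# into a single wide table that analysts can query directly.
# Table and column names are illustrative, not a prescribed schema.
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS reporting_orders AS
SELECT o.id         AS order_id,
       o.amount     AS order_amount,
       o.updated_at AS order_updated_at,
       c.name       AS customer_name,
       c.region     AS customer_region
FROM orders o
JOIN customers c ON c.id = o.customer_id;
"""

def build_reporting_table(conn: sqlite3.Connection) -> None:
    conn.executescript(DDL)
    conn.commit()
```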

 
Do I have the right data?  

Knowing the data collected by the application systems can take time and effort. I have gone through the process of explaining what a data dictionary is to several application developers. I don’t think I would have had the strength to describe a data catalog to that audience.  

Data dictionaries, metadata, and the data-model configurations that record when a row was created or updated, not to mention when it was soft-deleted rather than hard-deleted, are important. So are data model design and documentation that illustrate how the data in one table relates to the data in another. These are all fundamental parts of the data ecosystem, even if they are not part of the application environment.
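For readers who have not seen these conventions, here is a minimal sketch of the audit columns described above. The customers table is again a hypothetical example; the pattern, created/updated timestamps plus a nullable deleted_at marker, is what matters.

```python
# A minimal sketch of lifecycle audit columns: created_at/updated_at
# timestamps plus a soft-delete marker, so downstream consumers can see
# when a row changed or was retired without the row disappearing.
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS customers (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    region     TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    deleted_at TEXT  -- NULL means the row is live; set on soft delete
);
"""

def soft_delete(conn: sqlite3.Connection, customer_id: int) -> None:
    # Mark the row as deleted instead of removing it (a hard delete).
    conn.execute(
        "UPDATE customers SET deleted_at = datetime('now'), "
        "updated_at = datetime('now') WHERE id = ?",
        (customer_id,),
    )
    conn.commit()
```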

Having a well-defined data modeling process as part of the deployment of the enrichment platform is crucial to success. Finding the data necessary to answer a question should be as simple as looking it up. It should not be a multi-week effort to pin down various application developers and ask them what they were thinking when they created a new table. 

The world runs on schedules. Having the data movement and transformation processes run frequently, and ensuring that the data stays fresh enough for its purpose, is why some data operations groups run multiple shifts. At one point, one of my promotions to data operations manager required me to go into the office and monitor all the data feeds starting at 0400.

It was an on-premises, physical-server solution, a few years before the data cloud or even cloud computing was available. If there was any failure, whether soft or hard, the data still had to flow, so fixing it on the spot (in production) was mandatory. The need for the data to flow has only grown with the arrival of more diverse cloud-based applications.

Ensuring that the data is fit for purpose requires that the data be transformed. The physical data model for an application database will differ from the one for an enrichment platform. Data structures optimized for application performance are rarely useful as-is for analytical, visualization, or reporting purposes.

Machine learning algorithms, data visualization tools, business intelligence tools, dashboards, and even Excel spreadsheets need data in a particular structure. This is all transformation work that the DataOps team must provide.
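One common reshaping is from long, event-style rows to the wide, one-row-per-entity layout that ML and BI tools usually expect. A tiny, self-contained sketch (the metric names are made up):

```python
# Pivot long (entity, metric, value) rows into one wide row per entity,
# the shape most ML and BI tools expect. Metric names are illustrative.
from collections import defaultdict

long_rows = [
    ("cust_1", "order_count", 12),
    ("cust_1", "total_spend", 340.5),
    ("cust_2", "order_count", 3),
    ("cust_2", "total_spend", 87.0),
]

wide: dict[str, dict[str, float]] = defaultdict(dict)
for entity, metric, value in long_rows:
    wide[entity][metric] = value

print(wide["cust_1"])  # {'order_count': 12, 'total_spend': 340.5}
```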

Whether there is a formal or an informal group within the enterprise named DataOps, I assure you that all this work is necessary. Sometimes the budget for all these tools, much like the data ecosystem described above, is scattered across an assortment of organizational responsibilities, with one set of people taking on the job of ensuring that data flows. Hidden in the whitespace of the budget line items resides a group of people working tirelessly to ensure that the enterprise's most valuable asset, DATA, can do its job.

Their job could be made more efficient, and they could start producing valuable data products for your organization, if they were not so busy working with Terraform or writing integration code to orchestrate all the tools in the ecosystem. They should instead be able to write individual test cases for the intermediate data structures that will ultimately produce the organization's data products.
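To show what such test cases might look like, here is a hedged sketch of three quick assertions against the hypothetical reporting_orders table from the earlier sketches: row count, join integrity, and freshness. The names and checks are illustrative assumptions, not a standard.

```python
# A minimal sketch of per-structure data tests run before an intermediate
# table feeds downstream data products. Names and checks are illustrative.
import sqlite3

def test_reporting_orders(conn: sqlite3.Connection) -> None:
    # The load actually produced rows.
    (count,) = conn.execute("SELECT COUNT(*) FROM reporting_orders").fetchone()
    assert count > 0, "reporting_orders is empty"

    # No rows lost their customer attributes in the join.
    (nulls,) = conn.execute(
        "SELECT COUNT(*) FROM reporting_orders WHERE customer_name IS NULL"
    ).fetchone()
    assert nulls == 0, f"{nulls} rows have no customer attributes"

    # Freshness: there is at least a newest-row timestamp to check against SLAs.
    (newest,) = conn.execute(
        "SELECT MAX(order_updated_at) FROM reporting_orders"
    ).fetchone()
    assert newest is not None, "cannot determine freshness of reporting_orders"
```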

A Data Platform is needed to manage the operations of these moving parts. It needs to: 

  • Coordinate the various tools within the ecosystem (a minimal sketch follows this list).
  • Manage data structures and objects within the database.
  • Manage versioning and branching of the production database.
  • Run all tests that have ever been written to validate the data.
  • Provide visibility into the overall status of operations.
  • Ensure that the data flows to the right place.
  • Avoid interfering with production in any negative way.
  • Capture metadata and generate documentation dynamically from it.
  • Allow for the straightforward integration of new tools.
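
On the coordination point, the kernel of what an orchestration platform automates can be sketched in a few lines: run steps in order, stop on failure, report status. This is a deliberately naive stand-in for a real orchestrator, wiring together the hypothetical functions from the earlier sketches.

```python
# A deliberately naive orchestration sketch: execute pipeline steps in
# order, stop on the first failure, and report per-step status.
from typing import Callable

def run_pipeline(steps: list[tuple[str, Callable[[], None]]]) -> None:
    for name, step in steps:
        try:
            step()
            print(f"[ok]   {name}")
        except Exception as exc:
            print(f"[fail] {name}: {exc}")
            raise  # downstream steps depend on this one, so stop the run

# Usage with the illustrative functions defined earlier:
# run_pipeline([
#     ("extract",   lambda: extract_changed_rows(conn, watermark)),
#     ("transform", lambda: build_reporting_table(conn)),
#     ("test",      lambda: test_reporting_orders(conn)),
# ])
```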

These are just some of the things that DataOps.live does out of the box that I have had to build from scratch during my experience as a Data Operations Manager.  

As you think about how to enable your data operations team to turn things up to 11, be sure to check out our free trial through Snowflake Partner Connect.  



The Data Operations team that uses this platform will be able to show an organization how to use data to grow the business. 
 


Doug 'The Data Guy' Needham

“The Data Guy” Needham started his career as a Marine database administrator, supporting operational systems that spanned the globe in support of Marine Corps missions. Since then, Doug has worked as a consultant, data engineer, and data architect for enterprises of all sizes. He is currently working as a data scientist, tinkering with graphs and enrichment platforms and showing others how to get more meaning from data.
