DataOps.live · Jun 8, 2021 · 3 min read

DataOps launches support for Snowflake Snowpark

Introduction

DataOps.live recently announced support for Snowflake Snowpark.

Snowflake is known for its performance, scalability, and concurrency. Before Snowpark, users interacted with Snowflake predominantly through SQL. Now customers can execute more workflows entirely within Snowflake’s Data Cloud, without the need to manage additional processing systems.

Snowpark enables users comfortable with other languages, such as Scala and Java, to write code that is natural for them, using the widely adopted and familiar DataFrame model. DataOps.live enables that Scala or Java code to be stored, managed, and lifecycled, and projects the latest code into Snowflake (via Snowpark) every time a pipeline is run, selecting the right version for the environment it runs against (e.g. dev, test, or production).

In conjunction with Java UDFs and other recent developments, Snowpark has made Snowflake the first fully programmable data cloud.

DataOps.live enables Snowpark source code to be fully managed and lifecycled through development, test, and production environments alongside all other objects in Snowflake.

 


How does Snowpark work?

In very simple terms, the Snowpark libraries allow you to use standard DataFrame paradigms in code (e.g. Scala). When the code is executed, the Snowpark libraries transparently compile these operations into a combination of SQL and Java UDFs, run them inside Snowflake, and convert the results back into DataFrames.

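For illustration, here is a minimal sketch of what that looks like in Snowpark's Scala API (the connection properties, table, and column names are placeholder assumptions, not taken from the original post):

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

object SnowparkSketch {
  def main(args: Array[String]): Unit = {
    // Connection properties are placeholders -- substitute your own.
    val session = Session.builder.configs(Map(
      "URL"       -> "https://<account>.snowflakecomputing.com",
      "USER"      -> "<user>",
      "PASSWORD"  -> "<password>",
      "WAREHOUSE" -> "COMPUTE_WH",
      "DB"        -> "DEMO_DB",
      "SCHEMA"    -> "PUBLIC"
    )).create

    // A Scala lambda registered as a UDF; Snowpark ships it to
    // Snowflake and runs it there as a Java UDF.
    val discounted = udf((amount: Double) => amount * 0.9)

    // Standard DataFrame operations, built up lazily on the client side.
    val totals = session.table("ORDERS")              // hypothetical table
      .filter(col("AMOUNT") > 100)
      .select(col("REGION"), discounted(col("AMOUNT")).as("NET"))
      .groupBy(col("REGION"))
      .agg(sum(col("NET")).as("TOTAL_NET"))

    // show() triggers compilation to SQL (plus the UDF) and pushes all
    // execution into Snowflake; results come back as a DataFrame.
    totals.show()
  }
}
```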

This approach is game-changing for the Snowflake developer experience.

Snowpark Development

In addition to managing all the existing objects in Snowflake (see our blog series on Imperative vs. Declarative for how to manage these), Snowpark presents some additional challenges for developers and users. As well as the normal source of truth for ‘everything data’ that we store in the DataOps git repository, we now have full software management requirements for the Scala (or Java) code itself.



We need to branch, version, compile, test, and deploy the software and its artifacts just like any other software project.

As Snowflake continues its journey toward becoming the programmable data cloud, the ability to lifecycle and manage software code as well as data objects is critical. The programmable data cloud means that the future of DataOps is really Data + Software (code).

As it happens, this fits perfectly with the #TrueDataOps philosophy, which advocates “starting with pure DevOps and Agile principles (which have been battle hardened over 20+ years) and determining where they don't meet the demands of Data and adapting accordingly”.

The key requirements for developing with Snowpark are:

  • Full development cycle and environment management, e.g. Feature Branch → Dev → QA → PROD

  • Full code lifecycle: diffs, merge requests, rollback, etc.

  • Full git compatibility, so you can continue to use the IDE and development tools of your choice

 


Snowpark Execution

Running Snowpark really means running the application in which the Snowpark libraries are used. For example, if Scala is used, an environment is needed with the runtime tools and libraries for that specific language, plus the Snowpark libraries themselves.
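For a Scala application, that environment typically includes a JVM, a build tool such as sbt, and the Snowpark client library. A minimal build.sbt sketch (the version numbers are assumptions; check for the ones appropriate to your setup):

```scala
// build.sbt -- minimal sketch of a Snowpark application's build definition.
name := "snowpark-app"
version := "0.1.0"
scalaVersion := "2.12.13"  // Snowpark's Scala API targets Scala 2.12

// Snowpark client library (version is an assumption -- check for the latest).
libraryDependencies += "com.snowflake" % "snowpark" % "1.6.2"
```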


Fortunately, DataOps.live provides execution environments (we call them runners) that can run anywhere a customer needs, including in a private cloud or on premises.



The use of this Snowpark runner abstracts away all of the complexity and dependency management, and makes running a Snowpark application as part of a DataOps pipeline as simple as adding a single job to the pipeline definition.


In many cases a Snowpark application will be used to do advanced data manipulation, in particular manipulation beyond what can be achieved within SQL, but with results still stored back into Snowflake. In these cases the automated data testing within the DataOps Modelling and Transformation Engine can be used to validate the results of the Snowpark application.

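As a minimal sketch of that pattern, here is a Snowpark application deriving some data and persisting it back into Snowflake, where downstream automated tests can then validate it (it assumes a `session` created as in the earlier sketch; the table and column names are hypothetical):

```scala
// Derive results beyond what is convenient in pure SQL.
// RAW_EVENTS, USER_ID, and EVENT_ID are hypothetical names.
val results = session.table("RAW_EVENTS")
  .groupBy(col("USER_ID"))
  .agg(count(col("EVENT_ID")).as("EVENT_COUNT"))

// Persist the derived data back into Snowflake, where the pipeline's
// automated data tests can pick it up and validate it.
results.write.mode(SaveMode.Overwrite).saveAsTable("USER_EVENT_COUNTS")
```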


In practice, a realistic Snowpark scenario may require many Snowpark applications to be executed at different points in a pipeline, interspersed with ingestion, transformation, testing, data sharing, etc., with all the correct dependencies modelled and tested. DataOps.live orchestrates these Snowpark applications as part of a complete end-to-end pipeline.

Conclusion

Snowpark is a transformational new Snowflake feature, but one that creates new requirements in terms of management and deployment, adding many of the requirements of the software development world to the existing data requirements of DataOps. Despite being heavily extended for data, DataOps.live has its roots in the software world and has lost none of its ability to provide complete lifecycle management and deployment of software source code like that used in Snowpark. In addition, DataOps.live provides turnkey, fully tested execution environments that make running a Snowpark application simple.

 

Ready to get started?

Access the latest resources on DataOps lifecycle management and support for Snowpark and Java UDFs from Snowflake.

 

 
