Community blog | DataOps.live

Building Data Applications using Snowflake Cortex ML and LLMs

Written by Thomas Steinborn, SVP Products | May 13, 2024 4:21:18 PM

Overview of Snowflake AI/ML 

Snowflake provides a rich set of Artificial Intelligence (AI) and Machine Learning (ML) capabilities covering various use cases.

Summary of Snowflake AI, ML, and LLM capabilities:

  • Cortex Search Service
  • Document AI
  • Snowflake Copilot
  • Snowpark ML
  • Snowpark ML Packages
  • Snowpark Model Registry
  • Snowpark Feature Store
  • Snowpark Container Services (with Nvidia GPUs) 

Snowpark provides great flexibility in your choice of data science tasks. You can choose which Python packages to use, which models to run, and, of course, use the ever-popular Pandas. 

Snowpark Container Services (SPCS) gives more freedom and is highly effective for AI tasks when used with GPU compute pools. One common scenario is to train a model on SPCS and then use Snowpark ML to run predictions. This offers a good balance between cost and performance. 

Snowflake Cortex functions provide serverless ML and LLM functions on top of your data in Snowflake. Let’s focus on Cortex and how DataOps.live helps you build data applications rapidly. 

 

What can you do with Cortex LLM functions and DataOps? 

Snowflake Cortex LLM functions cover common use cases for text analytics and chatbots. 

Let's start with text analytics against the transcriptions of all your meeting recordings. Snowflake simplifies access to accurate summaries without any prompt engineering. A simple call to SNOWFLAKE.CORTEX.SUMMARIZE against your table of transcriptions is sufficient. 
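As a minimal sketch of such a call, the Python below builds the query and runs it through a Snowpark session. The table name MEETING_TRANSCRIPTS and the columns MEETING_ID and TRANSCRIPT are assumptions for illustration, not names from the original application.

```python
# Hypothetical table and column names; replace them with your own.
TRANSCRIPT_TABLE = "MEETING_TRANSCRIPTS"


def build_summary_query(table: str = TRANSCRIPT_TABLE) -> str:
    """Return SQL that summarizes every transcript in the table."""
    # The table name is interpolated as an identifier, so pass trusted values only.
    return (
        "SELECT MEETING_ID, "
        "SNOWFLAKE.CORTEX.SUMMARIZE(TRANSCRIPT) AS SUMMARY "
        f"FROM {table}"
    )


def fetch_summaries(session, table: str = TRANSCRIPT_TABLE) -> dict:
    """Run the query on a Snowpark session and map meeting id to summary."""
    rows = session.sql(build_summary_query(table)).collect()
    return {row[0]: row[1] for row in rows}
```

Here `session` is expected to be a Snowpark `Session`. No prompt is written anywhere, because SUMMARIZE takes only the text to summarize.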

Within DataOps.live, you can develop a full Streamlit application that calls the native function from Python. 
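The following is a sketch of what the core of such a Streamlit page could look like, not the code of the original application. The function name `render_summary_page`, the table MEETING_TRANSCRIPTS, and its columns are hypothetical. The Streamlit module and the Snowpark session are passed in as arguments so the logic can be tried without a live connection.

```python
def render_summary_page(st, session, table="MEETING_TRANSCRIPTS"):
    """Render a page listing meeting summaries and return the selected id.

    `st` is the streamlit module and `session` is a Snowpark session.
    The table and column names are assumptions for illustration.
    """
    st.title("Meeting summaries")
    rows = session.sql(
        "SELECT MEETING_ID, SNOWFLAKE.CORTEX.SUMMARIZE(TRANSCRIPT) AS SUMMARY "
        f"FROM {table}"
    ).collect()
    summaries = {row[0]: row[1] for row in rows}
    if not summaries:
        st.write("No transcripts found.")
        return None
    # Let the user pick a meeting, then show its summary.
    chosen = st.selectbox("Meeting", list(summaries))
    st.write(summaries[chosen])
    return chosen


# In Streamlit in Snowflake, the entry point could look like this:
#   import streamlit as st
#   from snowflake.snowpark.context import get_active_session
#   render_summary_page(st, get_active_session())
```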

 

The final result can be a rich user experience built with DataOps.live and deployed as Streamlit in Snowflake. Based on the summary, you can then choose a meeting recording and analyze it. 

 

Once you find a recording that interests you, start interacting with it. 

Let’s create a chatbot to query the full transcription in natural language. The SNOWFLAKE.CORTEX.COMPLETE function is the right choice to pass your input as prompts to a Large Language Model (LLM). 
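A minimal sketch of one chatbot turn, assuming a Snowpark session and a transcript already loaded as a string. The helper names `build_prompt` and `ask_transcript` and the prompt wording are illustrative assumptions. COMPLETE is called with a model name and a prompt string.

```python
def build_prompt(transcript: str, question: str) -> str:
    """Combine the transcript and the user's question into one prompt."""
    return (
        "Answer the question using only the meeting transcript below.\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}"
    )


def ask_transcript(session, transcript: str, question: str,
                   model: str = "mistral-large") -> str:
    """Send the prompt to SNOWFLAKE.CORTEX.COMPLETE and return the answer."""
    # Bind variables (?) keep quotes in the transcript from breaking the SQL.
    rows = session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE(?, ?)",
        params=[model, build_prompt(transcript, question)],
    ).collect()
    return rows[0][0]
```

Each call is stateless in this sketch. A multi-turn chatbot would also need to carry earlier questions and answers into the prompt.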

 

Snowflake offers a choice of LLMs so you can tailor the chatbot to your use case:

  • mistral-large 
  • mixtral-8x7b 
  • mistral-7b 
  • llama2-70b-chat 
  • gemma-7b 

For our example, mistral-large gave the best results. 

In addition, you can use the just-announced Snowflake Arctic model. 
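To judge which model suits your own data, one option is to send the same prompt to each model and read the answers side by side. The sketch below assumes a Snowpark session; the function name `compare_models` is hypothetical, and the model identifiers are the ones listed above.

```python
# Model identifiers taken from the list above.
MODELS = ("mistral-large", "mixtral-8x7b", "mistral-7b",
          "llama2-70b-chat", "gemma-7b")


def compare_models(session, prompt: str, models=MODELS) -> dict:
    """Return a mapping of model name to that model's answer for one prompt."""
    answers = {}
    for model in models:
        rows = session.sql(
            "SELECT SNOWFLAKE.CORTEX.COMPLETE(?, ?)",
            params=[model, prompt],
        ).collect()
        answers[model] = rows[0][0]
    return answers
```

Each model is billed and rate-limited separately, so a comparison like this is best run on a small sample of prompts.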

Build and deploy it with DataOps.live and provide a fully immersive experience to your users. 

 

 

What can you do with Snowflake Cortex ML functions and DataOps? 

Cortex also provides ML-based functions that work on top of your Snowflake data. They open up further use cases, e.g., time-series forecasting and anomaly detection. Time-series forecasting employs a machine learning algorithm to predict future values from historical time-series data. Anomaly detection is the process of identifying outliers in data. 
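To make the idea of an outlier concrete, here is a deliberately simple illustration in plain Python. It flags values that lie far from the mean, measured in standard deviations. This is not the algorithm Cortex uses, which relies on a trained ML model; the function name, the sample data, and the threshold are assumptions for illustration.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return indexes of values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


daily_orders = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_outliers(daily_orders))  # prints [7], the index of the value 50
```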

When you want to use time-series forecasts, you can use the Snowflake Object Lifecycle Engine (SOLE) to create your data tables, run your data pipeline to ingest the necessary data, and then launch into our development environment, DataOps.live Develop. 

You can explore the underlying Snowflake data directly in our browser-based IDE. We will use a Jupyter Notebook to connect to Snowflake. Then, we will run a Pandas query with Snowpark on your data table. 
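A minimal sketch of that exploration step, assuming a Snowpark session is available in the notebook. The helper name `load_sample`, the table name SALES_HISTORY, and the `connection_parameters` dictionary are hypothetical.

```python
def load_sample(session, table_name: str, max_rows: int = 1000):
    """Return up to max_rows rows of a Snowflake table as a pandas DataFrame."""
    # table() and limit() are lazy; to_pandas() runs the query and fetches rows.
    return session.table(table_name).limit(max_rows).to_pandas()


# In a notebook, with connection_parameters as a dict of your account settings:
#   from snowflake.snowpark import Session
#   session = Session.builder.configs(connection_parameters).create()
#   df = load_sample(session, "SALES_HISTORY")
#   df.describe()
```

Limiting the rows before converting keeps the notebook responsive, since `to_pandas()` pulls the result into local memory.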

Once you have reviewed the data, plotted it, and found an interesting pattern, you can prototype the forecast and visualize the expected forecast for the next few months together with its upper and lower bounds. 

Next, you can create a Snowflake view for your training data. Once done, you can create your SNOWFLAKE.ML.FORECAST model my_forecast_model. Later, you can use the new model in standard SQL with CALL my_forecast_model!FORECAST. 
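The two statements can be sketched as follows, wrapped in Python so they can be run from a Snowpark session. The view name MY_TRAINING_VIEW and the column names TS and SALES are assumptions for illustration; only my_forecast_model comes from the text above.

```python
def build_forecast_statements(model_name="my_forecast_model",
                              training_view="MY_TRAINING_VIEW",
                              timestamp_col="TS",
                              target_col="SALES",
                              periods=3):
    """Return the CREATE and CALL statements for a Cortex forecast model."""
    # Names are interpolated directly, so pass trusted identifiers only.
    create = (
        f"CREATE OR REPLACE SNOWFLAKE.ML.FORECAST {model_name}(\n"
        f"  INPUT_DATA => SYSTEM$REFERENCE('VIEW', '{training_view}'),\n"
        f"  TIMESTAMP_COLNAME => '{timestamp_col}',\n"
        f"  TARGET_COLNAME => '{target_col}'\n"
        ")"
    )
    call = f"CALL {model_name}!FORECAST(FORECASTING_PERIODS => {int(periods)})"
    return create, call


def train_and_forecast(session, **kwargs):
    """Train the model on the view, then return the forecast rows."""
    create, call = build_forecast_statements(**kwargs)
    session.sql(create).collect()  # training happens when the model is created
    return session.sql(call).collect()
```

For a single time series, each returned row should carry the timestamp, the forecast value, and the lower and upper bounds, which is what you need for the plot described earlier.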
