And 5 Reasons Snowflake’s Data Sharehouse Does
For the past decade the technology world has been obsessed with APIs. While the nature of the APIs themselves has changed over the years (e.g. from SOAP to REST), the premise has always been the same—APIs are the way to allow different systems to share information. This doctrine is so well accepted, no one questions it. Certainly, for transactional processes, such as creating a user account, APIs work fine.
But what about a Service Provider wanting to share usage data with an Enterprise customer? Let’s say they want to share performance and operational time series data for a global network, which could easily run to many terabytes.
It’s straightforward to provide an API to query usage data for a certain historical period. However, standard BI tools don’t talk to APIs directly. Instead, the Enterprise would use the APIs to query the usage data from the Service Provider and then store it locally, typically in a SQL database. The APIs are effectively being used as an interface for creating a copy of the data. But there are problems with this model of data sharing.
Why APIs won’t work for data sharing
1. The data is being stored twice (probably more than twice in most cases)—incurring additional cost.
2. APIs usually cannot handle massive quantities of data in a single query (and should have abuse protection built in to prevent people from trying), so pulling a complete history requires scripting to import the data in small chunks, creating a big processing overhead on extraction.
3. When new data is added, the Enterprise needs to either poll for it (which introduces delay) or use a streaming API such as WebSockets, which requires a lot of sophisticated development to manage reconnections, collect missing data, etc.
4. If the Service Provider needs to make a change or fix an error in historical data, it’s very hard for the Enterprise to discover this and make the associated updates.
5. There are enormous issues associated with trying to keep copies of a data set in sync and correct. This is especially problematic when monitoring SLAs or other contentious topics—if the Service Provider and the Enterprise have differences in their data, both will be adamant that they are right—there is no single source of truth.
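To make points 2 and 4 concrete, here is a minimal Python sketch of the copy-by-API pattern. Everything in it is an assumption made for illustration: the in-memory PROVIDER_ROWS dictionary stands in for the Service Provider's data, fetch_usage is a hypothetical API call, and the three-day limit per request is an invented abuse-protection rule.

```python
from datetime import date, timedelta

# Hypothetical stand-in for the Service Provider's store: one usage value per day.
PROVIDER_ROWS = {date(2024, 1, 1) + timedelta(days=i): 100 + i for i in range(10)}

MAX_DAYS_PER_CALL = 3  # assumed abuse-protection limit on the API


def fetch_usage(start, end):
    """Hypothetical API call: rows for [start, end), capped per request."""
    if (end - start).days > MAX_DAYS_PER_CALL:
        raise ValueError("requested range exceeds the per-call limit")
    return {d: v for d, v in PROVIDER_ROWS.items() if start <= d < end}


def pull_history(start, end):
    """Copy the full history locally by looping over small windows."""
    local_copy, calls = {}, 0
    cursor = start
    while cursor < end:
        window_end = min(cursor + timedelta(days=MAX_DAYS_PER_CALL), end)
        local_copy.update(fetch_usage(cursor, window_end))
        calls += 1
        cursor = window_end
    return local_copy, calls


local_copy, calls = pull_history(date(2024, 1, 1), date(2024, 1, 11))

# The provider later corrects a historical value; the local copy is not told.
PROVIDER_ROWS[date(2024, 1, 2)] = 0
stale_days = [d for d, v in local_copy.items() if PROVIDER_ROWS[d] != v]
```

Even in this toy case, ten days of history take several round trips, and the only way for the Enterprise to notice the corrected day is to pull and compare everything again.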
Snowflake Data Sharing eliminates all of these issues. The Service Provider can provide a Secure View onto the data for the Enterprise, which can then use this data in a form that is ready for consumption by BI tools. This brings some clear advantages.
Why Snowflake’s Data Sharehouse works
1. Data is stored once and only once
2. There is no effort or processing required to enable access to a long history of data
3. There is no effort or processing required to get the most recent data as it is written
4. Any bulk updates to historical data are seen by the Enterprise instantly
5. The data as seen by both the Service Provider and the Enterprise is guaranteed to be identical—since there is only one physical copy of the data in existence, there is now a single source of truth
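As a rough sketch of what the Secure View setup might look like, the Python helper below assembles the kind of Snowflake SQL statements involved. All database, schema, table, view, share and account names are hypothetical, as is the customer_id filter column. The statements are only built as strings here, not executed, and should be checked against the Snowflake documentation before use.

```python
def provider_statements(db, schema, table, view, share, customer_id, consumer_account):
    """Statements the Service Provider might run (all names are hypothetical).

    Identifiers are interpolated directly, so this is for illustration only.
    """
    fq_view = f"{db}.{schema}.{view}"
    return [
        # Expose only this customer's rows through a secure view.
        f"CREATE SECURE VIEW {fq_view} AS "
        f"SELECT * FROM {db}.{schema}.{table} WHERE customer_id = '{customer_id}'",
        f"CREATE SHARE {share}",
        f"GRANT USAGE ON DATABASE {db} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {db}.{schema} TO SHARE {share}",
        f"GRANT SELECT ON VIEW {fq_view} TO SHARE {share}",
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account}",
    ]


def consumer_statements(provider_account, share, local_db):
    """Statement the Enterprise might run to mount the share as a database."""
    return [f"CREATE DATABASE {local_db} FROM SHARE {provider_account}.{share}"]


provider_sql = provider_statements(
    db="usage_db", schema="public", table="usage_metrics",
    view="enterprise_usage", share="enterprise_share",
    customer_id="ACME", consumer_account="enterprise_account",
)
consumer_sql = consumer_statements("provider_account", "enterprise_share", "provider_usage")
```

Once the share is mounted, the Enterprise's BI tools would query provider_usage.public.enterprise_usage like any other view; no rows are copied, which is what the five points above rely on.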
APIs can now be relieved of the burden of analytics and data extraction and be used for what they are good at: transactional processing. And with Snowflake’s unique data sharehouse capability, data sharing can finally live up to the promises being made for it.