Roche
Case Study
Doing now what patients need next
The DataOps.live platform helps data product teams at this global pharmaceutical giant orchestrate and benefit from next-generation analytics on a self-service data and analytics infrastructure built on Snowflake and other tools, following a data mesh approach.
- How can data management and analytics be improved to empower teams and drive the company's purpose?
- The DataOps.live platform enables a key capability of the self-service data and analytics infrastructure at the heart of the data mesh implementation, integrating Snowflake, AWS, and other tools in a true DataOps approach.
- 120 releases a month vs. one per quarter
- 6-8 weeks average MVP time
- $40+ million in cost savings
AWS Solution focus:
For Roche PDIL, DataOps.live was used to build orchestration pipelines for a range of workload use cases in Snowflake. These pipelines allow Roche to automate environment management within Snowflake, orchestrate third-party tools such as Talend for data integration, and integrate data cataloguing tools such as Collibra. DataOps.live runs data quality checks and extracts metadata at each step to ensure observability and monitoring, all while maintaining an agile environment in which developers release frequently, often daily.
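To illustrate the kind of environment management such a pipeline can automate, the sketch below creates an isolated, per-branch Snowflake database via zero-copy cloning. This is a minimal Python illustration, not the actual DataOps.live pipeline configuration; the account, role, warehouse, and database names are hypothetical placeholders.

```python
# Minimal sketch: per-branch Snowflake environment management of the kind a
# DataOps.live pipeline can automate. Role, warehouse, and database names are
# hypothetical placeholders, not Roche's actual configuration.
import os
import snowflake.connector

def create_branch_environment(branch: str) -> None:
    """Create an isolated database for a feature branch via zero-copy clone."""
    env_db = f"ANALYTICS_{branch.upper().replace('-', '_')}"

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        role="DATAOPS_ADMIN",      # hypothetical role
        warehouse="DATAOPS_WH",    # hypothetical warehouse
    )
    try:
        cur = conn.cursor()
        # Zero-copy clone keeps the branch environment cheap and disposable.
        cur.execute(f"CREATE DATABASE IF NOT EXISTS {env_db} CLONE ANALYTICS_PROD")
        cur.execute(f"GRANT USAGE ON DATABASE {env_db} TO ROLE ANALYST")
    finally:
        conn.close()

if __name__ == "__main__":
    create_branch_environment("feature-talend-ingest")
```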
To integrate multiple data sources, AWS S3 was chosen as the central data lake. S3 organizes data hierarchically using key prefixes, which allows DataOps.live to apply lifecycle policies for data retention management. Since the customer already had DataOps.live runners deployed in their AWS account, appropriate AWS IAM policies and roles were also configured to grant the runners access to the S3 buckets used for staging.
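The following sketch shows, with boto3, the two pieces just described: a lifecycle rule that expires staged objects, and an IAM policy granting a runner role access to the staging bucket. The bucket name, prefix, retention period, and role name are hypothetical placeholders.

```python
# Minimal sketch of the S3 retention and access setup described above, using
# boto3. Bucket, prefix, retention, and role names are hypothetical.
import json
import boto3

BUCKET = "roche-pdil-staging"  # hypothetical bucket name

s3 = boto3.client("s3")
iam = boto3.client("iam")

# Lifecycle rule: expire staged objects after 30 days so the staging prefix
# does not grow unbounded.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-staging-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "staging/"},
                "Expiration": {"Days": 30},
            }
        ]
    },
)

# IAM policy granting the DataOps.live runner role read/write access to the
# staging prefix only.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/staging/*",
            ],
        }
    ],
}
policy = iam.create_policy(
    PolicyName="dataops-runner-staging-access",
    PolicyDocument=json.dumps(policy_document),
)
iam.attach_role_policy(
    RoleName="dataops-live-runner",  # hypothetical runner role name
    PolicyArn=policy["Policy"]["Arn"],
)
```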
The resulting pipeline orchestrates a Talend job that retrieves newly landed data from S3 and ingests it into the Integration schema in Snowflake. DataOps.live then runs data quality tests on both the source and the transformed data as it moves between schemas (e.g., the Integration, Processed, and Reporting schemas). Finally, the metadata generated across the stages of the pipeline is extracted, compiled into a formatted document, and published to Collibra for cataloguing.
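A minimal sketch of those last two steps is shown below: simple row-count and null checks run against each schema, with the results compiled and posted to the catalogue. The table name, database, and Collibra URL are hypothetical placeholders; the actual Collibra import route and payload format depend on the Collibra REST API in use.

```python
# Minimal sketch of the quality-check and metadata-publication steps described
# above. Schema, table, database, and endpoint names are hypothetical; the
# Collibra URL stands in for whatever import route the catalogue exposes.
import json
import os
import requests
import snowflake.connector

def run_quality_checks(cur) -> dict:
    """Run simple row-count and null checks as data moves between schemas."""
    results = {}
    for schema in ("INTEGRATION", "PROCESSED", "REPORTING"):
        cur.execute(f"SELECT COUNT(*) FROM {schema}.ORDERS")  # hypothetical table
        results[f"{schema}.row_count"] = cur.fetchone()[0]
        cur.execute(f"SELECT COUNT(*) FROM {schema}.ORDERS WHERE ORDER_ID IS NULL")
        results[f"{schema}.null_order_ids"] = cur.fetchone()[0]
    return results

def publish_to_collibra(metadata: dict) -> None:
    """Post the compiled metadata document to a (hypothetical) Collibra endpoint."""
    resp = requests.post(
        "https://roche.collibra.example/rest/2.0/import",  # placeholder URL
        headers={"Authorization": f"Bearer {os.environ['COLLIBRA_TOKEN']}"},
        data=json.dumps(metadata),
        timeout=30,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        database="ANALYTICS_PROD",  # hypothetical database
    )
    try:
        checks = run_quality_checks(conn.cursor())
        publish_to_collibra({"pipeline": "talend-s3-ingest", "checks": checks})
    finally:
        conn.close()
```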