This blog was co-authored by Starburst Technical Content Marketing Manager Cindy Ng.
In the data analytics and compliance world, data sovereignty is a concept that has our attention. Policy makers suggest that the best way to protect citizen data while encouraging data-driven innovation is to ensure that data resides on local servers and is subject to the laws of the country in which it is processed. For example, if data is stored in the EU, regardless of where it originated, it is bound by the protections of the General Data Protection Regulation (GDPR). Moreover, EU legislators anticipate [1] that EU “citizens will trust and embrace data-driven innovations only if they are confident that any personal data sharing in the EU will be subject to full compliance with the EU’s strict data protection rules.”
3 Problems Associated with Data Stored in Different Geographical Locations
More than 100 countries [2] have some sort of data sovereignty law in place, which means that organizations end up with siloed customer data that cannot easily be integrated into dashboards or analyses. Access to all relevant customer information is vital to any data analytics team that wants to drive meaningful data insights to inform current and future business decisions. Without complete, real-time data, your organization is operating at a significant disadvantage. Here are three reasons why:
#1 Hidden Fees From Data Egress Charges
Cloud providers typically do not charge for ingress (transferring data into the cloud), but they do charge for egress (moving data back out, for example to your on-premises environment). Egress charges can also include data transfer fees, assessed when data moves to a new geographical region or between availability zones within the same cloud provider. These charges are often seen as hidden fees because they are hard to monitor and manage, especially in large enterprises where multiple offices and departments all perform data analytics.
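To make the arithmetic concrete, here is a minimal sketch of how transfer charges add up across departments. The per-GB rates, the volumes, and the `TransferRates` and `monthly_transfer_cost` names are hypothetical, chosen only for illustration; actual pricing varies by provider, region, and volume tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRates:
    """Hypothetical per-GB rates in dollars; not any provider's real pricing."""
    ingress_per_gb: float = 0.0        # ingress is typically free
    egress_per_gb: float = 0.09        # assumed rate for moving data out of the cloud
    cross_region_per_gb: float = 0.02  # assumed rate for moving data between regions


def monthly_transfer_cost(ingress_gb, egress_gb, cross_region_gb, rates=TransferRates()):
    """Return the estimated monthly transfer bill in dollars for one team."""
    return (
        ingress_gb * rates.ingress_per_gb
        + egress_gb * rates.egress_per_gb
        + cross_region_gb * rates.cross_region_per_gb
    )


# Assumed scenario: three departments each load 10 TB, pull 2 TB back out,
# and copy 5 TB between regions every month.
departments = 3
cost = departments * monthly_transfer_cost(
    ingress_gb=10_000, egress_gb=2_000, cross_region_gb=5_000
)
print(f"Estimated monthly transfer cost: ${cost:,.2f}")
```

Even though ingress is by far the largest volume in this scenario, it contributes nothing to the bill; the smaller egress and cross-region transfers account for all of it, which is why these charges are easy to overlook.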
#2 Storage Costs Associated with Data Duplication
System errors, software bugs, and third-party data all contribute to data duplication and its storage costs, but here are two common sources of duplicated data:
Data Duplication As a Byproduct of Mergers & Acquisitions
When companies merge, data from multiple sources goes through a massive migration project. Even though the data structures of the two companies may differ, they often hold similar data sets, such as customer information. The resulting duplication becomes complicated to untangle and makes the new organization more vulnerable to data breaches: more data, more data governance problems.
Lack of Data Quality
Organizations that lack data governance policies or have limited data quality processes often have inaccurate, duplicated data. Without clear guidelines on how to enter data correctly in systems such as a CRM, a single record ends up with multiple entries, which undermines accuracy. Eventually, a data analyst has a difficult time synthesizing all the data, and the resulting inefficiencies and inaccuracies can be amplified downstream.
#3 Complex ETL Pipelines
Companies that have invested in a single source of truth know the pains of moving data into a data warehouse. Data engineers spend about 70% of their time moving, copying, or ETL/ELTing data, which limits real-time data analysis and slows time-to-insight.
Egress charges, data duplication, and complex ETL pipelines bring us to Starburst: it not only reduces the need to copy and move data, it also helps your organization meet compliance requirements by design.
A Brief Word about Privacy and Security
Before we get into how Starburst Stargate can help your organization meet regulatory requirements, repeat after me: privacy is not security, and security is not privacy. The two share a common goal of protecting sensitive data, but they take very different approaches to achieving it. Security focuses on protecting the data from breaches. Privacy governs how data is collected, shared, and used.
It’s common for companies to believe that if they’re responsibly managing sensitive data according to specific data security requirements, they’re also complying with data privacy requirements. That’s just not true. Even with the best security tools, insiders or third-party vendors with access to sensitive data can mismanage it if they’re unaware of privacy policies. Yes, you can have security without privacy, but you cannot have privacy without security. Security is the best way to ensure that companies are both meeting compliance and guaranteeing privacy.
This is where Starburst comes in. Starburst helps with regulations such as the Sarbanes-Oxley Act (SOX), the Health Insurance Portability and Accountability Act (HIPAA), the General Data Protection Regulation (GDPR), and more through security features that address authentication, access control, and auditing requirements. Additionally, the Starburst architecture stores no data and focuses only on compute, which eliminates another layer of complexity in the compliance world.
How Starburst Stargate Enables Data Sovereignty
Starburst Stargate addresses the challenge of moving sensitive data across borders by deploying Starburst clusters where the data lives. The Starburst Stargate connector enables you to link data catalogs and data sources supported by remote Starburst clusters. Because aggregated or filtered results are computed within the remote Starburst cluster, you can run queries as if all the data were in the same location. All of these functions are performed without moving or copying data, while maintaining compliance.
Don’t just take our word for it. Here’s what Alexander Seeholzer, Director of Data Services at SOPHiA GENETICS, said about performing data and analytics functions without jeopardizing compliance: “Cross-regional querying is also very important. Due to data compliance laws, data can only remain in its country of origin. With Starburst, we have the ability to query business metrics across borders without compromising on security, governance, and regulations.”
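The idea of computing aggregates where the data lives can be illustrated with a small conceptual sketch. This is plain Python, not Starburst code or the Stargate connector's API; the `REGIONAL_ORDERS` data and the `local_aggregate` and `federated_revenue` functions are hypothetical stand-ins for regional data sources and the clusters that sit next to them.

```python
# Hypothetical per-region order tables. In practice each would live in a
# data source that never leaves its region.
REGIONAL_ORDERS = {
    "eu": [
        {"customer": "a", "amount": 120.0},
        {"customer": "b", "amount": 80.0},
    ],
    "us": [
        {"customer": "c", "amount": 200.0},
    ],
}


def local_aggregate(region):
    """Stands in for work done inside the region: only totals are returned,
    never the row-level customer records."""
    rows = REGIONAL_ORDERS[region]
    return {
        "region": region,
        "orders": len(rows),
        "revenue": sum(row["amount"] for row in rows),
    }


def federated_revenue(regions):
    """Combine the per-region aggregates into a single global answer."""
    partials = [local_aggregate(region) for region in regions]
    return {
        "orders": sum(p["orders"] for p in partials),
        "revenue": sum(p["revenue"] for p in partials),
        "by_region": {p["region"]: p["revenue"] for p in partials},
    }


result = federated_revenue(["eu", "us"])
print(result)
```

The point of the design is that `federated_revenue` only ever sees counts and sums. The customer-level rows stay in the structure that represents their home region, which is the property a cross-border query needs to preserve.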
Below are Starburst Stargate security features that enable governance, privacy, and protection through authentication, access control, and auditing:
User impersonation authenticates users in the external service using credentials stored in the connector properties file. It enables, say, an administrator to connect to the underlying data source on behalf of another individual, taking on the identity of a specific user of the data source. With this method, you don’t have to replicate the data source’s users, and you maintain the existing data source security strategy.
Starburst supports Kerberos authentication, so you can keep your existing authentication system without having to implement a new authentication process.
Credential passthrough allows the Starburst user to connect to the underlying data source using the same credential, and inherit the security strategy already configured in the data source. Compliance requirements such as GDPR, SOX or HIPAA are set on the data source and are then automatically taken into account by Starburst.
Role-based access control (RBAC) rules via Apache Ranger ensure that PII data (at a table, column, or row level) can be accessed only from a specific environment, in accordance with its respective geographical laws and regulations. When access is revoked, for example after an entitlement review, the change takes effect immediately without impacting query performance.
As data volume rises, companies are under pressure to monitor data access within their organization. With Starburst’s event logging functionality, a complete compliance-level historical audit trail is available in real time. The log tracks all queries submitted across all data sources, and creates an audit report on the data accessed as well as the users involved in the queries that were executed.
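To show how access control and auditing fit together, here is a minimal sketch of a region-aware, column-level policy check that records every decision. It is an illustration in plain Python under assumed names (`Policy`, `AccessController`, the roles, and the table are all hypothetical); it does not represent Apache Ranger's policy model or Starburst's event logger.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: a role may read certain columns of a table,
    but only from the listed regions."""
    role: str
    table: str
    allowed_columns: frozenset
    allowed_regions: frozenset


@dataclass
class AccessController:
    policies: list
    audit_log: list = field(default_factory=list)

    def check(self, user, role, table, columns, region):
        """Return True if some policy permits the request; log every decision."""
        allowed = any(
            p.role == role
            and p.table == table
            and set(columns) <= p.allowed_columns
            and region in p.allowed_regions
            for p in self.policies
        )
        # Both granted and denied requests are recorded for the audit trail.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "table": table,
            "columns": list(columns),
            "region": region,
            "allowed": allowed,
        })
        return allowed


controller = AccessController(policies=[
    Policy("eu_analyst", "customers",
           frozenset({"id", "country", "email"}), frozenset({"eu"})),
    Policy("global_analyst", "customers",
           frozenset({"id", "country"}), frozenset({"eu", "us"})),
])

eu_ok = controller.check("ana", "eu_analyst", "customers", ["id", "email"], "eu")
us_denied = controller.check("ana", "eu_analyst", "customers", ["id", "email"], "us")
pii_denied = controller.check("bob", "global_analyst", "customers", ["email"], "us")
print(eu_ok, us_denied, pii_denied, len(controller.audit_log))
```

In this sketch the EU analyst can read the `email` column only when querying from the EU environment, the global analyst never sees it, and all three requests end up in `audit_log` regardless of outcome.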
Meet Regulatory Requirements with Starburst Stargate
Starburst Stargate reduces data movement to a minimum and applies regulatory rules on local, siloed data sources, while making it possible to federate data from multiple environments, cloud or on-premises. By reducing the need to move data and enabling access to data where it lives, Starburst offers customers faster time to insight while meeting the most stringent regulatory standards.
Read Brian Luisi’s post as he delves deeper into the architecture aspect of Starburst Stargate.