Data Pandemic Stories: How Data Drives Digital Transformation in a Crisis, by Promethium

After debuting our blog series on data pandemic stories with a story from Tableau and a perspective from Privacera, we are excited to bring you the third installment, authored by Kaycee Lai, CEO of Promethium, a Starburst partner. The year 2020 will be etched in history as a rather unusual one: a year in which the outbreak of a global pandemic wreaked havoc but also taught us important lessons. Behavioral patterns changed forever, data science models were put to the test, and the need for fast data access and "analytics anywhere" emerged as a key growth driver. Companies with strong digital foundations thrived while others were forced to adapt. According to a report by McKinsey, digital transformation “vaulted five years forward.” If you have a compelling data pandemic story to tell and would like to be featured on the Starburst Blog, please write to us at content@starburst.io.

—Your friends at Starburst



One of the things we’ve seen in the pandemic is that organizations able to adapt quickly by making data-driven decisions have thrived. At Promethium, we’ve witnessed that the change ushered in by COVID boils down to one word: speed.

Before COVID, it typically took months to answer a question with data. When the pandemic hit, the acceptable time frame (including discovery, ETL, prep, and complex queries) went from months to minutes. People started saying, "We're not sure what our inventory levels will be in four months; we need to make decisions now." Companies need to be able to pivot on a dime and react to changing market conditions.

So how do companies go from a data analytics time frame of months to one in which answers are delivered in minutes? There are three main factors at play here:

1. Breaking down data silos. We've had this challenge forever because we keep creating data silos. "My data warehouse is here...my database is here...my cloud is here...by the way, marketing doesn't like this cloud, so they went with that cloud." But technologies have come along that allow you to cross these barriers and get to the data very quickly, such as the data federation and virtualization work done by the Trino project (formerly “PrestoSQL”).

2. Collaboration. Data analytics is a team sport. You need collaboration between the people who have the technical skill sets to get the data, and the people who provide the business context. Too often the data analytics narrative goes something like this: "Tell me what you want, and then leave me alone for two months, and then hopefully I’ll have something for you…[two months pass]...Oh, it's wrong? Whoops! Let me do it again." We don't do this in our everyday lives. We get things done very quickly because we can collaborate and communicate in real time. People are now starting to expect this level of service in data analytics.

3. Skill sets. For a long time the skills to analyze data were centralized in one team. Advancements in technology, such as natural language processing, have created a big shift toward self-service analytics. For the next generation of data users, if you've used Google, you’ll be able to get what you're looking for very quickly.

Major technology shifts in these areas have made it possible to go from months to only minutes to get data-backed answers to business questions.

Addressing the knowledge gap with crowdsourcing and transparency

Clearly we're getting over the coding gap, but we’ll always require the ability to understand how to use data. So an interesting question begins to pop into everyone’s mind: What are the types of skills that you need in order to understand whether you’ve found the right ‘needle’ in the haystack?   

We do this by going back to human nature and looking at the most intuitive processes. Traditionally we relied on tribal knowledge, but that creates problems in the data analytics lifecycle because often we have trouble tracking down – or perhaps don’t have a solid working relationship with – the right subject matter expert. 

The answer is to rely on something that has worked in many other industries: crowdsourcing. For instance, we use Yelp to source the experiences of thousands of other people through something as simple as a five-star rating system. Applied to data, this kind of rating helps tremendously in guiding people to the right data in a timely manner, and it complements traditional approaches such as tags.

Another factor that influences collaboration is transparency. During the pandemic, food delivery apps like DoorDash have soared in popularity, and one key reason for this is that they tell you what step of the order process you're in. This gives people more confidence in the process, and can also speed it along because it introduces structure and accountability. If the person who requested a data-derived answer has full visibility into the data steward’s process for getting that answer, and can communicate in real time throughout the process, we find that it proceeds much more quickly, and the answer tends to be much more on-target.

So, what’s the next chapter of the data pandemic?

In my opinion, the next chapter of the ‘data pandemic’ will be the return to service of industries and businesses that have been largely shut down. Consider travel and leisure as an example. Disney announced that its Disneyland California parks are reopening on April 30, and Royal Caribbean announced that it is launching seven-night cruises from the Bahamas. For companies like these, reopening isn’t just a matter of turning the lights back on; they need to reconnect with massive supply chains, reboard or onboard tens of thousands of employees, and more. Data needs to be available now, not months from now, to make accurate forecasting, supply chain, operations, and human resource decisions for an effective return to service.

Kaycee Lai

As the founder and CEO of Promethium, Kaycee has a proven track record of bringing new, disruptive technologies such as data de-duplication, data virtualization, and hyper-converged infrastructure to market. With nearly 20 years of experience in the technology industry, Kaycee has led global operations and product management for both startups and Fortune 500 companies. Most recently, Kaycee was the President and Chief Operating Officer for Waterline Data, where he helped transform the company into a leader in the data governance space by growing revenues by 600% within 12 months. At Waterline Data, Kaycee worked with numerous global organizations in their efforts to automate data governance and prepare for GDPR. Prior to Waterline Data, Kaycee helped introduce hyper-converged infrastructure and software-defined storage to the market while at Virsto Software. During that time, Virsto’s revenues increased by 500% under his leadership, resulting in the successful acquisition by VMware. Prior to Virsto, Kaycee held senior leadership positions at Avamar (acquired by EMC for $165M), Delphix, and EMC. At EMC, he served as General Manager for a $120M business unit that was growing at 50% per year. A self-proclaimed “data geek”, Kaycee began his career as a business analyst working with data, databases, and business intelligence solutions at companies such as EMC, Microsoft and The Federal Reserve. Kaycee graduated from Pomona College with a Bachelor’s in Psychology.
