
A key engineering responsibility at Starburst is performance. One focus is reducing the amount of time a CPU has to spend on a given query, also known as CPU time. Even when many queries run concurrently, CPU time remains a stable metric that reflects real performance.

However, as the CPU time of an individual query drops, Trino's ability to keep the CPUs fully utilized can drop with it, often because of how stages are scheduled. A reduction in CPU time therefore does not always translate into a reduction in latency or wall time. After a year of major CPU time improvements, we decided to focus our efforts on increasing CPU utilization and reducing query wall time.

Increasing CPU utilization

The most significant change is that Trino's query.execution-policy now defaults to phased rather than all-at-once. The all-at-once policy schedules all query stages in a single shot, favoring simplicity and reduced latency. The phased execution policy was later added as a configuration option that schedules only the stages of a query that can make progress.
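
To make the change concrete, here is a minimal sketch of where the policy is controlled; the property and session names follow the Trino documentation, and the values shown are only examples, not a recommendation:

    # etc/config.properties on the coordinator -- cluster-wide setting
    query.execution-policy=phased

    -- per-session override from any Trino client, handy for comparing the two policies
    SET SESSION execution_policy = 'all-at-once';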

Recently, Starburst Co-founder and Trino Committer Karol Sobczak observed that the phased execution policy could schedule stages that in turn create subsequent stages which cannot make progress, defeating the purpose of the policy. Fixing this logic reduced latency and made it possible to set the phased execution policy as the default.

Additional performance enhancements

Other significant changes include adaptively setting task.concurrency to the number of physical cores on a node and increasing the default value of hive.split-loader-concurrency. Our observation is that hyper-threaded cores do not translate into improved query performance, so counting only physical cores is the better default. Increasing the split loader concurrency, in turn, helps the engine process partitions and small files more quickly.
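
As a rough illustration only (the values are examples for a hypothetical 16-core worker, not the adaptive defaults the engine now computes), both settings live in the standard configuration files:

    # etc/config.properties -- explicit override; leaving this unset lets Trino
    # pick the number of physical cores on the node
    task.concurrency=16

    # etc/catalog/hive.properties -- Hive connector split loader concurrency
    hive.split-loader-concurrency=64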

Benchmark metrics

Based on our internal benchmarking, which showed a 20% reduction in wall time for TPC-H on partitioned data, customers can expect an average wall time reduction of 13%. We have seen improvements as high as 50% for TPC-H query 12 on partitioned data.
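
For reference, TPC-H query 12 joins lineitem with orders and aggregates line counts by ship mode. The sketch below follows the standard TPC-H query text with typical substitution parameters, so the exact values may differ from a given benchmark run:

    SELECT
        l_shipmode,
        sum(CASE WHEN o_orderpriority IN ('1-URGENT', '2-HIGH') THEN 1 ELSE 0 END) AS high_line_count,
        sum(CASE WHEN o_orderpriority NOT IN ('1-URGENT', '2-HIGH') THEN 1 ELSE 0 END) AS low_line_count
    FROM orders
    JOIN lineitem ON o_orderkey = l_orderkey
    WHERE l_shipmode IN ('MAIL', 'SHIP')
      AND l_commitdate < l_receiptdate
      AND l_shipdate < l_commitdate
      AND l_receiptdate >= DATE '1994-01-01'
      AND l_receiptdate < DATE '1994-01-01' + INTERVAL '1' YEAR
    GROUP BY l_shipmode
    ORDER BY l_shipmode;

A scan-heavy join followed by a small aggregation is the kind of workload where the scheduling and split loading changes described above can show up directly as wall time savings.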

The benchmark results were obtained by running the TPC-H and TPC-DS benchmarks with one coordinator and six worker nodes. The data was queried through the Hive connector, with both partitioned and unpartitioned data at the 1TB scale factor.

Figure: TPC-H benchmark wall time

While TPC-H and TPC-DS are both decision-support benchmarks, TPC-H is more representative of ad hoc queries, which tend to be simpler.

Figure: TPC-DS benchmark wall time

Available out of the box

The best part about these enhancements is that they do not require any action on your end: they are available out of the box in the upcoming LTS release. However, the release respects existing configuration, so if you have explicitly set query.execution-policy or task.concurrency, you may need to unset them to pick up the new defaults.
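
If you are not sure whether an existing deployment overrides these properties, a quick check is to remove any explicit entries from the coordinator configuration and then confirm the effective values from a client; a sketch, assuming a standard etc layout:

    # etc/config.properties -- delete explicit overrides such as:
    # query.execution-policy=all-at-once
    # task.concurrency=8

    -- then verify the effective values from any Trino client
    -- (look for execution_policy and task_concurrency in the output)
    SHOW SESSION;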

Sit back, relax, and savor the faster query processing.

