Optimal Predicate Pushdown Synthesis (PLDI 2026 - PLDI Research Papers)

Mon 15 - Fri 19 June 2026 Boulder, Colorado, United States

Who

Robert Zhang, Eric Hayden Campbell, Dixin Tang, Işıl Dillig

Track

PLDI 2026 PLDI Research Papers

Time Zone

The program is currently displayed in (GMT-06:00) Mountain Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 19 Jun 2026 17:10 - 17:30 at Flatirons 2 - Optimization of Data Pipelines Chair(s): Pavel Panchekha

Abstract

Predicate pushdown is a long-standing performance optimization that filters data as early as possible in a computational workflow. In modern data pipelines, this transformation is especially important because much of the computation occurs inside user-defined functions (UDFs) written in general-purpose languages such as Python and Scala. These UDFs capture rich domain logic and complex aggregations and are among the most expensive operations in a pipeline. Moving filters ahead of such UDFs can yield substantial performance gains, but doing so requires semantic reasoning. This paper introduces a general semantic foundation for predicate pushdown over stateful fold-based computations.
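As a rough illustration of the idea (not taken from the paper), the sketch below writes a small fold-based aggregation in plain Python and compares filtering its output with filtering its input. The row layout and the names `ROWS`, `step`, `totals`, `original`, and `pushed` are all hypothetical.

```python
from functools import reduce

# Hypothetical input rows: (user, amount).
ROWS = [("ann", 40), ("bob", 15), ("ann", 70), ("cat", 5), ("bob", 30)]


def step(state, row):
    """One fold step: add the row's amount to that user's running total."""
    user, amount = row
    new_state = dict(state)
    new_state[user] = new_state.get(user, 0) + amount
    return new_state


def totals(rows):
    """Fold-based UDF: per-user totals over all rows."""
    return reduce(step, rows, {})


def original(rows, wanted):
    """Aggregate every row, then keep only the wanted user's total."""
    return {u: t for u, t in totals(rows).items() if u == wanted}


def pushed(rows, wanted):
    """Filter the input first, so the fold only processes matching rows."""
    return totals([r for r in rows if r[0] == wanted])


print(original(ROWS, "ann"))  # {'ann': 110}
print(pushed(ROWS, "ann"))    # {'ann': 110}
```

Here the two versions agree because each user's total depends only on that user's rows. Establishing that kind of fact about an arbitrary UDF is the semantic reasoning the abstract refers to.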

We view pushdown as a correspondence between two programs that process different subsets of input data, with correctness witnessed by a bisimulation invariant relating their internal states. Building on this foundation, we develop a sound and relatively complete framework for verification, alongside a synthesis algorithm that automatically constructs optimal pushdown decompositions by finding the strongest admissible pre-filters and weakest residual post-filters. We implement this approach in a tool called Pusharoo and evaluate it on 150 real-world pandas and Spark data-processing pipelines. Our evaluation shows that Pusharoo is significantly more expressive than prior work, producing optimal pushdown transformations with a median synthesis time of 1.6 seconds per benchmark. Furthermore, our experiments demonstrate that the discovered pushdown optimizations speed up end-to-end execution by an average of 2.4× and up to two orders of magnitude.
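The following is a minimal, hand-written sketch of what a pushdown decomposition with a pre-filter and a residual post-filter might look like. It is not Pusharoo's algorithm or output. The data and every name in it (`step`, `post_filter`, `pre_filter`, `residual`, `invariant`, `run_both`) are assumptions made for illustration. The invariant is only checked at run time on one input, which is far weaker than the verification the abstract describes.

```python
# Hypothetical input rows: (user, region, amount).
ROWS = [
    ("ann", "US", 60),
    ("bob", "EU", 200),
    ("ann", "US", 50),
    ("cat", "US", 30),
    ("bob", "EU", 10),
]


def step(state, row):
    """Fold step: accumulate a total per (user, region) key."""
    user, region, amount = row
    new_state = dict(state)
    key = (user, region)
    new_state[key] = new_state.get(key, 0) + amount
    return new_state


def post_filter(state):
    """Original filter on the output: region is US and total exceeds 100."""
    return {k: t for k, t in state.items() if k[1] == "US" and t > 100}


def pre_filter(row):
    """Part of the filter that can be checked on a single input row."""
    return row[1] == "US"


def residual(state):
    """Part that depends on the aggregate, so it stays after the fold."""
    return {k: t for k, t in state.items() if t > 100}


def invariant(s_orig, s_push):
    """Relates the two states: pushed state = original state on US keys."""
    return s_push == {k: t for k, t in s_orig.items() if k[1] == "US"}


def run_both(rows):
    """Run both programs in lockstep, checking the invariant at each step."""
    s_orig, s_push = {}, {}
    assert invariant(s_orig, s_push)
    for row in rows:
        s_orig = step(s_orig, row)
        if pre_filter(row):  # the pushed program skips non-matching rows
            s_push = step(s_push, row)
        assert invariant(s_orig, s_push)
    return post_filter(s_orig), residual(s_push)


out_original, out_pushed = run_both(ROWS)
print(out_original)  # {('ann', 'US'): 110}
print(out_pushed)    # {('ann', 'US'): 110}
```

In this sketch the region test moves ahead of the fold, while the threshold on the total must remain afterwards because it depends on the accumulated state.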

DOI

https://doi.org/10.1145/3808312

Robert Zhang

University of Texas at Austin

United States

Eric Hayden Campbell

University of Texas at Austin

United States

Dixin Tang

University of Texas at Austin

United States

Işıl Dillig

University of Texas at Austin

United States

Session Program

Fri 19 Jun
Displayed time zone: Mountain Time (US & Canada)

16:10 - 17:30	Optimization of Data Pipelines (PLDI Research Papers) at Flatirons 2. Chair(s): Pavel Panchekha, University of Utah

16:10 20m Talk	[SIGPLAN OOPSLA’25] Homomorphism Calculus for User-Defined Aggregations. PLDI Research Papers. Ziteng Wang (University of Texas at Austin), Ruijie Fang (University of Texas at Austin), Linus Zheng (University of Texas at Austin), Dixin Tang (University of Texas at Austin), Işıl Dillig (University of Texas at Austin)
16:30 20m Talk	Bonsai: Compiling Queries to Pruned Tree Traversals (Distinguished Paper). PLDI Research Papers. Alexander J Root (Stanford University), Christophe Gyurgyik (Stanford University), Purvi Goel (Stanford University), Kayvon Fatahalian (Stanford University), Jonathan Ragan-Kelley (Massachusetts Institute of Technology), Andrew Adams (Adobe Research), Fredrik Kjolstad (Stanford University)
16:50 20m Talk	A Compiler for Fused Relational Operations on Multisets. PLDI Research Papers. James Dong (Stanford University), Fredrik Kjolstad (Stanford University)
17:10 20m Talk	Optimal Predicate Pushdown Synthesis. PLDI Research Papers. Robert Zhang (University of Texas at Austin), Eric Hayden Campbell (University of Texas at Austin), Dixin Tang (University of Texas at Austin), Işıl Dillig (University of Texas at Austin)