This program is tentative and subject to change.

Fri 19 Jun 2026 16:50 - 17:10 at Flatirons 2 - Optimization of Data Pipelines

Predicate pushdown is a long-standing performance optimization that filters data as early as possible in a computational workflow. In modern data pipelines, this transformation is especially important because much of the computation occurs inside \emph{user-defined functions (UDFs)} written in general-purpose languages such as Python and Scala. These UDFs capture rich domain logic and complex aggregations and are among the most expensive operations in a pipeline. Moving filters ahead of such UDFs can yield substantial performance gains, but doing so requires \emph{semantic} reasoning. This paper introduces a general semantic foundation for predicate pushdown over stateful fold-based computations.

We view pushdown as a correspondence between two programs that process different subsets of input data, with correctness witnessed by a \emph{bisimulation invariant} relating their internal states. Building on this foundation, we develop a sound and relatively complete framework for verification, alongside a synthesis algorithm that automatically constructs \emph{optimal pushdown decompositions} by finding the strongest admissible pre-filters and weakest residual post-filters. We implement this approach in a tool called Pusharoo and evaluate it on 150 real-world pandas and Spark data-processing pipelines. Our evaluation shows that Pusharoo is significantly more expressive than prior work, producing optimal pushdown transformations with a median synthesis time of 1.6 seconds per benchmark. Furthermore, our experiments demonstrate that the discovered pushdown optimizations speed up end-to-end execution by an average of 2.4$\times$ and up to two orders of magnitude.

This program is tentative and subject to change.

Fri 19 Jun

Displayed time zone: Mountain Time (US & Canada) change

15:50 - 17:10
Optimization of Data PipelinesPLDI Research Papers at Flatirons 2
15:50
20m
Talk
[SIGPLAN] Homomorphism Calculus for User-Defined Aggregations
PLDI Research Papers
Ziteng Wang University of Texas at Austin, Ruijie Fang University of Texas at Austin, Linus Zheng University of Texas at Austin, Dixin Tang University of Texas at Austin, Işıl Dillig University of Texas at Austin
16:10
20m
Talk
Bonsai: Compiling Queries to Pruned Tree Traversals
PLDI Research Papers
Alexander J Root Stanford University, Christophe Gyurgyik Stanford University, Purvi Goel Stanford University, Kayvon Fatahalian Stanford University, Jonathan Ragan-Kelley Massachusetts Institute of Technology, Andrew Adams Adobe Research, Fredrik Kjolstad Stanford University
DOI Pre-print
16:30
20m
Talk
A Compiler for Fused Relational Operations on Multisets
PLDI Research Papers
James Dong Stanford University, Fredrik Kjolstad Stanford University
DOI
16:50
20m
Talk
Optimal Predicate Pushdown Synthesis
PLDI Research Papers
Robert Zhang University of Texas at Austin, Eric Hayden Campbell University of Texas at Austin, Dixin Tang University of Texas at Austin, Işıl Dillig University of Texas at Austin
DOI