Hardening the Foundation: Testing Data and Compute-Intensive AI-Enabling Stacks
The modern AI revolution relies on a complex, integrated stack that spans from distributed data processing frameworks like Apache Spark to specialized hardware accelerators like FPGAs. However, the rapid evolution of this data-to-silicon pipeline has outpaced our ability to ensure its correctness using traditional software testing. To maintain developer productivity and maximize the potential of heterogeneous hardware, we must rethink how we discover edge cases across these diverse system layers. In this talk, I will reflect on my group’s experience designing domain-aware testing engines for data-intensive and compute-intensive systems. I will argue that traditional fuzzing is insufficient for the evolving requirements of extensible Multi-Level Intermediate Representations (MLIR) and reconfigurable hardware targets. Specifically, I will highlight the specialization bottleneck—the high manual effort required to encode domain-specific constraints and custom mutation operators.
To lower this barrier to entry, I will discuss automated techniques for fuzzer specialization, including custom mutation synthesis from examples and rule-based repair. I will conclude by advocating for a shift toward Property-Based Testing (PBT) to bridge the gap between the scale of random fuzzing and the rigor of formal methods, enabling the validation of critical invariants across the entire AI-enabling technology stack.
Miryung Kim is a Professor and Vice Chair of Graduate Studies in UCLA’s Computer Science Department. Recognizing an industry-wide shift toward data-intensive software engineering, she led early research on the role of data scientists in software teams. Her current research focuses on developer tools for data- and compute-intensive systems, addressing scale and complexity challenges that traditional debugging and testing cannot meet. Her research established the significance of code clones in software evolution, demonstrating how recurring patterns can automate bug fixes and refactoring, insights that inform today’s AI-driven developer tools. For these contributions to data-driven software analytics, she received the IEEE TCSE New Directions Award. She was also honored with the ACM SIGSOFT Influential Educator Award; eight of her former students and postdocs now hold faculty positions at institutions such as Columbia, Purdue, and Virginia Tech. She served as Program Co-Chair of FSE, delivered keynotes at ASE and ISSTA, and is currently an Amazon Scholar at AWS.
