publications | Jeffrey J. Ma

2025

Understanding Silent Data Corruption in LLM Training

Jeffrey Ma, Hengzhi Pei , Leonard Lausen , and George Karypis

2025

Abs arXiv

As the scale of training large language models (LLMs) increases, one emergent failure is silent data corruption (SDC), where hardware produces incorrect computations without explicit failure signals. In this work, we are the first to investigate the impact of real-world SDCs on LLM training by comparing model training between healthy production nodes and unhealthy nodes exhibiting SDCs. With the help from a cloud computing platform, we access the unhealthy nodes that were swept out from production by automated fleet management. Using deterministic execution via XLA compiler and our proposed synchronization mechanisms, we isolate and analyze the impact of SDC errors on these nodes at three levels: at each submodule computation, at a single optimizer step, and at a training period. Our results reveal that the impact of SDCs on computation varies on different unhealthy nodes. Although in most cases the perturbations from SDCs on submodule computation and gradients are relatively small, SDCs can lead models to converge to different optima with different weights and even cause spikes in the training loss. Our analysis sheds light on further understanding and mitigating the impact of SDCs.
A2Perf: Real-World Autonomous Agents Benchmark

Ikechukwu Uchendu , Jason Jabbour , Korneel Van Berghe , Joel Runevic , Matthew Stewart , Jeffrey Jian Ma, Srivatsan Krishnan , Austin V Huang , Izzeddin Gur , Colton Bishop , Paige Bailey , Wenjie Jiang , Ebrahim Songhori , Sergio Guadarrama , Jie Tan , J K Terry , Aleksandra Faust , and Vijay Janapa Reddi

2025

2024

FedStaleWeight: Buffered Asynchronous Federated Learning with Fair Aggregation via Staleness Reweighting

Jeffrey Ma, Alan Tu , Yiling Chen , and Vijay Reddi

Jun 2024

Abs arXiv

Federated Learning (FL) endeavors to harness decentralized data while preserving privacy, facing challenges of performance, scalability, and collaboration. Asynchronous Federated Learning (AFL) methods have emerged as promising alternatives to their synchronous counterparts bounded by the slowest agent, yet they add additional challenges in convergence guarantees, fairness with respect to compute heterogeneity, and incorporation of staleness in aggregated updates. Specifically, AFL biases model training heavily towards agents who can produce updates faster, leaving slower agents behind, who often also have differently distributed data which is not learned by the global model. Naively upweighting introduces incentive issues, where true fast updating agents may falsely report updates at a slower speed to increase their contribution to model training. We introduce FedStaleWeight, an algorithm addressing fairness in aggregating asynchronous client updates by employing average staleness to compute fair re-weightings. FedStaleWeight reframes asynchronous federated learning aggregation as a mechanism design problem, devising a weighting strategy that incentivizes truthful compute speed reporting without favoring faster update-producing agents by upweighting agent updates based on staleness. Leveraging only observed agent update staleness, FedStaleWeight results in more equitable aggregation on a per-agent basis. We both provide theoretical convergence guarantees in the smooth, non-convex setting and empirically compare FedStaleWeight against the commonly used asynchronous FedBuff with gradient averaging, demonstrating how it achieves stronger fairness, expediting convergence to a higher global model accuracy. Finally, we provide an open-source test bench to facilitate exploration of buffered AFL aggregation strategies, fostering further research in asynchronous federated learning paradigms.

2021

Polymatrix Competitive Gradient Descent

Jeffrey Ma, Alistair Letcher , Florian Schäfer , Yuanyuan Shi , and Anima Anandkumar

Nov 2021

Abs arXiv

Many economic games and machine learning approaches can be cast as competitive optimization problems where multiple agents are minimizing their respective objective function, which depends on all agents’ actions. While gradient descent is a reliable basic workhorse for single-agent optimization, it often leads to oscillation in competitive optimization. In this work we propose polymatrix competitive gradient descent (PCGD) as a method for solving general sum competitive optimization involving arbitrary numbers of agents. The updates of our method are obtained as the Nash equilibria of a local polymatrix approximation with a quadratic regularization, and can be computed efficiently by solving a linear system of equations. We prove local convergence of PCGD to stable fixed points for n-player general-sum games, and show that it does not require adapting the step size to the strength of the player-interactions. We use PCGD to optimize policies in multi-agent reinforcement learning and demonstrate its advantages in Snake, Markov soccer and an electricity market game. Agents trained by PCGD outperform agents trained with simultaneous gradient descent, symplectic gradient adjustment, and extragradient in Snake and Markov soccer games and on the electricity market game, PCGD trains faster than both simultaneous gradient descent and the extragradient method.

2020

Diagnostic Image Quality Assessment and Classification in Medical Imaging: Opportunities and Challenges

Jeffrey Ma, Ukash Nakarmi , Cedric Yue Sik Kin , Christopher Sandino , Joseph Y. Cheng , Ali B. Syed , Peter Wei , John M. Pauly , and Shreyas Vasanawala

Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), May 2020

Abs arXiv HTML

Magnetic Resonance Imaging (MRI) suffers from several artifacts, the most common of which are motion artifacts. These artifacts often yield images that are of non-diagnostic quality. To detect such artifacts, images are prospectively evaluated by experts for their diagnostic quality, which necessitates patient-revisits and rescans whenever non-diagnostic quality scans are encountered. This motivates the need to develop an automated framework capable of accessing medical image quality and detecting diagnostic and non-diagnostic images. In this paper, we explore several convolutional neural network-based frameworks for medical image quality assessment and investigate several challenges therein.
Analysis of Deep Learning models for Diagnostic Image Quality Assessment in Magnetic Resonance Imaging

Jeffrey Ma, Ukash Nakarmi , Cedric Yue Sik Kin , Joseph Y. Cheng , Christopher Sandino , Ali B. Syed , Peter Wei , John M. Pauly , and Shreyas Vasanawala

Proceedings of the 2020 28th International Society for Magnetic Resonance in Medicine (ISMRM) Annual Meeting, Aug 2020

Abs HTML

In this abstract we investigate deep learning frameworks for medical image quality assessment and automatic classification of diagnostic and non-diagnostic quality images.
Toward Continuous Social Phenotyping: Analyzing Gaze Patterns in an Emotion Recognition Task for Children With Autism Through Wearable Smart Glasses

Anish Nag , Nick Haber , Catalin Voss , Serena Tamura , Jena Daniels , Jeffrey Ma, Bryan Chiang , Shasta Ramachandran , Jessey Schwartz , Terry Winograd , Carl Feinstein , and Dennis P Wall

Journal of Medical Internet Research (JMIR), Apr 2020

Abs HTML

Several studies have shown that facial attention differs in children with autism. Measuring eye gaze and emotion recognition in children with autism is challenging, as standard clinical assessments must be delivered in clinical settings by a trained clinician. Wearable technologies may be able to bring eye gaze and emotion recognition into natural social interactions and settings.