Rui's (handsome) headshot

Rui Pan 潘瑞

CS Ph.D. student @ Princeton


I'm a 2nd-year CS Ph.D. student at Princeton University, advised by Prof. Ravi Netravali. I received my B.S. in CS and Math from the University of Wisconsin–Madison, where I was fortunate to be advised by Prof. Shivaram Venkataraman on systems (cluster scheduling & resource management) for ML. I have also worked with Prof. Yiting Xia at the Max Planck Institute for Informatics on networked systems for ML. I am broadly interested in the intersection of systems, networks, and machine learning. In particular, I enjoy working on systems and networks for machine learning and video applications.


(*Equal contribution)
  • Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
    Yinwei Dai*, Rui Pan*, Anand Iyer, Kai Li, Ravi Netravali
    arXiv 2023
    Machine learning (ML) inference platforms are tasked with balancing two competing goals: ensuring high throughput given many requests, and delivering low-latency responses to support interactive applications. Unfortunately, existing platform knobs (e.g., batch sizes) fail to ease this fundamental tension, and instead only enable users to harshly trade off one property for the other. This paper explores an alternate strategy to taming throughput-latency tradeoffs by changing the granularity at which inference is performed. We present Apparate, a system that automatically applies and manages early exits (EEs) in ML models, whereby certain inputs can exit with results at intermediate layers. To cope with the time-varying overhead and accuracy challenges that EEs bring, Apparate repurposes exits to provide continual feedback that powers several novel runtime monitoring and adaptation strategies. Apparate lowers median response latencies by 40.5-91.5% and 10.0-24.2% for diverse CV and NLP workloads, respectively, without affecting throughputs or violating tight accuracy constraints.

  • Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning
    Pengfei Zheng, Rui Pan, Tarannum Khan, Shivaram Venkataraman, Aditya Akella
    USENIX NSDI 2023
    Dynamic adaptation has become an essential technique in accelerating distributed machine learning (ML) training: Recent studies have shown that dynamically adjusting model structure (e.g., lottery ticket hypothesis) or hyperparameters (e.g., batch size) can significantly accelerate training without sacrificing accuracy. However, existing ML cluster schedulers are not designed to handle dynamic adaptation. We show that existing schemes fail to provide fairness and degrade system efficiency when the training throughput changes over time under dynamic adaptation. We design Shockwave, a scheduler with future planning that builds on two key ideas. First, Shockwave extends classic market theory from static settings to dynamic settings to co-optimize efficiency and fairness. Second, Shockwave utilizes stochastic dynamic programming to handle uncertain, dynamic throughput. We build a system for Shockwave and validate its performance with both trace-driven simulation and cluster experiments. Results show that for traces of ML jobs with dynamic adaptation, Shockwave improves makespan by 1.3× and fairness by 2× when compared with existing fair scheduling schemes.

  • Efficient Flow Scheduling in Distributed Deep Learning Training with Echelon Formation
    Rui Pan*, Yiming Lei*, Jialong Li, Zhiqiang Xie, Binhang Yuan, Yiting Xia
    ACM HotNets 2022
This paper discusses why flow scheduling does not apply to distributed deep learning training and presents EchelonFlow, the first network abstraction to bridge the gap. EchelonFlow deviates from the common belief that semantically related flows should finish at the same time. After extensive workflow analysis of diverse training paradigms, we made the key observation that distributed training jobs exhibit strict computation patterns and may consume data at different times. We devise a generic method to model the drastically different computation patterns across training paradigms, and formulate EchelonFlow to regulate flow finish times accordingly. Case studies of mainstream training paradigms under EchelonFlow demonstrate the expressiveness of the abstraction, and our system sketch suggests the feasibility of an EchelonFlow scheduling system.

Some other non-peer-reviewed write-ups include:
  • CS 759 Project Report: Cautiously Aggressive GPU Space Sharing for Improving Resource Utilization and Job Efficiency (pdf)
  • CS 744 Project Report: Comparing Black-Box Optimization Methods for Online DBMS Tuning (pdf)
  • AgDH: A System for Gathering and Disseminating Dairy Data (pdf)



  • Applied Scientist Intern @ AWS, AWS AI Team

    May 2024 - Aug 2024

    Santa Clara, CA

  • 🇩🇪 Research Intern @ Max Planck Institute for Informatics (MPI-INF)

    Feb 2022 - Aug 2022

    Saarbrücken, Germany

    Advisor: Prof. Yiting Xia

Professional Activities


  • Checking out new places, either in person or on Google Maps. Cities I have lived in for more than a few months include: Shanghai, Pittsburgh, Madison, Berkeley, Saarbrücken, and Princeton.
  • Collecting postcards. I love postcards, send me one or let me know if you want one!
  • Sports. I play soccer and table tennis for fun, and I am a fan of FC Barcelona.
  • Watching movies & making pop culture references.
  • Music. I used to play accordion and alto saxophone because of Chinese parenting.
  • Writing. I have a personal blog that hosts some paper reading notes and other random blog posts. Some of my most-visited writings include:


ruipan at cs dot princeton dot edu

© 2023 Rui Pan. Powered by Bootstrap. Feel free to fork this website's source code; just remember to remove the analytics.