Publications
2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
International Conference on Machine Learning (ICML), 2025
Palu: Compressing KV-Cache with Low-Rank Projection
International Conference on Learning Representations (ICLR), 2025
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025
2024
ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
BBS: Bi-directional Bit-level Sparsity for Deep Learning Acceleration
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2024
Kratos: An FPGA Benchmark for Unrolled DNNs with Fine-Grained Sparsity and Mixed Precision
International Conference on Field-Programmable Logic and Applications (FPL), 2024
FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
International Conference on Automated Machine Learning (AutoML), 2024
Towards Neural Architecture Search through Hierarchical Generative Modeling
International Conference on Machine Learning (ICML), 2024
Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
International Conference on Machine Learning (ICML), 2024
Encodings for Prediction-based Neural Architecture Search
International Conference on Machine Learning (ICML), 2024
Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision
Design Automation Conference (DAC), 2024
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2024
On Latency Predictors for Neural Architecture Search
Conference on Machine Learning and Systems (MLSys), 2024
2023
M4BRAM: Mixed-Precision Matrix-Matrix Multiplication in FPGA Block RAMs
International Conference on Field-Programmable Technology (FPT), 2023
DiviML: A Module-based Heuristic for Mapping Neural Networks onto Heterogeneous Platforms
International Conference on Computer-Aided Design (ICCAD), 2023
Multi-Predict: Few Shot Predictors for Efficient Neural Architecture Search
International Conference on Automated Machine Learning (AutoML), 2023
BRAMAC: Compute-in-BRAM Architectures for Multiply-Accumulate on FPGAs
International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2023
Learned Connectivity Sparsification for LUT-based Neural Networks
ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2023
2022
BLOX: Macro Neural Architecture Search Benchmark and Algorithms
Conference on Neural Information Processing Systems (NeurIPS), 2022
Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022
Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference
International Symposium on Field-Programmable Gate Arrays (FPGA), 2022