Publications

2025

xKV: Cross-Layer SVD for KV-Cache Compression

Chi-Chih Chang, Chien-Yu Lin, Yash Akhauri, Wei-Cheng Lin, Kai-Chiang Wu, Luis Ceze, Mohamed Abdelfattah

arxiv: 2503.18893

Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models

Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Mohamed Abdelfattah, Diana Marculescu

International Conference on Machine Learning (ICML), 2025

Palu: Compressing KV-Cache with Low-Rank Projection

Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Mohamed Abdelfattah, Kai-Chiang Wu

International Conference on Learning Representations (ICLR), 2025

FlashDepth: Real-time Streaming Depth Estimation at 2K Resolution

Gene Chou, Wenqi Xian, Guandao Yang, Mohamed Abdelfattah, Bharath Hariharan, Noah Snavely, Ning Yu, Paul Debevec

arxiv: 2504.07093

BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration

Yuzong Chen, Ahmed Fathy, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed Abdelfattah

IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025

2024

ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

Yash Akhauri, Ahmed Fathy, Jordan Dotzel, Zhiru Zhang, Alexander M. Rush, Safeen Huda, Mohamed Abdelfattah

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

BBS: Bi-directional Bit-level Sparsity for Deep Learning Acceleration

Yuzong Chen, Jian Meng, Jae-sun Seo, Mohamed Abdelfattah

IEEE/ACM International Symposium on Microarchitecture (MICRO), 2024

Kratos: An FPGA Benchmark for Unrolled DNNs with Fine-Grained Sparsity and Mixed Precision

Xilai Dai, Yuzong Chen, Mohamed Abdelfattah

International Conference on Field-Programmable Logic and Applications (FPL), 2024

FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search

Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li

International Conference on Automated Machine Learning (AutoML), 2024

Towards Neural Architecture Search through Hierarchical Generative Modeling

Lichuan Xiang, Łukasz Dudziak, Mohamed Abdelfattah, Abhinav Mehrotra, Nicholas D. Lane, Hongkai Wen

International Conference on Machine Learning (ICML), 2024

Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs

Jordan Dotzel, Yuzong Chen, Bahaa Kotb, Sushma Prasad, Gang Wu, Sheng Li, Mohamed Abdelfattah, Zhiru Zhang

International Conference on Machine Learning (ICML), 2024

Encodings for Prediction-based Neural Architecture Search

Yash Akhauri, Mohamed Abdelfattah

International Conference on Machine Learning (ICML), 2024

Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision

Ahmed Fathy, Susanne Balle, Deshanand Singh, Mohamed Abdelfattah

Design Automation Conference (DAC), 2024

PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration

Ahmed Fathy, Angela Cui, Javier Fernandez-Marques, Nicholas D. Lane, Mohamed Abdelfattah

ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2024

On Latency Predictors for Neural Architecture Search

Yash Akhauri, Mohamed Abdelfattah

Conference on Machine Learning and Systems (MLSys), 2024

2023

Multi-Predict: Few Shot Predictors for Efficient Neural Architecture Search

Yash Akhauri, Mohamed Abdelfattah

International Conference on Automated Machine Learning (AutoML), 2023

BRAMAC: Compute-in-BRAM Architectures for Multiply-Accumulate on FPGAs

Yuzong Chen, Mohamed Abdelfattah

International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2023

Learned Connectivity Sparsification for LUT-based Neural Networks

Erwei Wang, Georgios Stavrou, Peter Cheung, George Constantinides, Mohamed Abdelfattah, James Davis

ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2023

2022

BLOX: Macro Neural Architecture Search Benchmark and Algorithms

Thomas Chau, Łukasz Dudziak, Hongkai Wen, Nicholas D. Lane, Mohamed Abdelfattah

Conference on Neural Information Processing Systems (NeurIPS), 2022

Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design

Hongxiang Fan, Thomas Chau, Stylianos Venieris, Royson Lee, Alexandros Kouris, Wayne Luk, Nicholas D. Lane, Mohamed Abdelfattah

IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

Zero-Cost Operation Scoring in Differentiable Architecture Search

Lichuan Xiang, Łukasz Dudziak, Mohamed Abdelfattah, Thomas Chau, Nicholas D. Lane, Hongkai Wen

arxiv: 2106.06799

Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference

Erwei Wang, James Davis, Georgios Stavrou, Peter Cheung, George Constantinides, Mohamed Abdelfattah

International Symposium on Field-Programmable Gate Arrays (FPGA), 2022