[Preprint] Automated Backend-Aware Post-Training Quantization

Quantization is a key technique for reducing the resource requirements and improving the performance of neural network deployment. However, different hardware backends such as x86 CPUs, NVIDIA GPUs, ARM CPUs, and accelerators may demand different …

[ICML 2021 (Long Oral)] Characterizing Structural Regularities of Labeled Data in Overparameterized Models

Human learners appreciate that observations usually form hierarchies of regularities and sub-regularities. For example, English verbs have irregular cases that must be memorized (e.g., go → went) and regular cases that generalize well (e.g., kiss → …

[Preprint] Relay: A High-Level Compiler for Deep Learning

Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of …