FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference.
The library provides efficient low-precision general matrix multiplication for small batch sizes and supports techniques that minimize accuracy loss, such as row-wise quantization and outlier-aware quantization. FBGEMM also exploits fusion opportunities to overcome the unique challenges of matrix multiplication at lower precision with bandwidth-bound operations.
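To illustrate why row-wise quantization helps limit accuracy loss, here is a minimal NumPy sketch. It is not FBGEMM's API: the function names (`quantize_rowwise`, `dequantize_rowwise`, `quantize_per_tensor`) and the unsigned 8-bit min/max scheme are assumptions chosen for illustration. The sketch compares one scale per row against a single scale for the whole matrix.

```python
import numpy as np


def quantize_rowwise(w):
    """Quantize each row to uint8 with its own scale and minimum (illustrative only)."""
    w = np.asarray(w, dtype=np.float32)
    row_min = w.min(axis=1, keepdims=True)
    row_max = w.max(axis=1, keepdims=True)
    # Guard against division by zero when a row is constant.
    scale = np.maximum(row_max - row_min, 1e-8) / 255.0
    q = np.clip(np.rint((w - row_min) / scale), 0, 255).astype(np.uint8)
    return q, scale, row_min


def dequantize_rowwise(q, scale, row_min):
    """Map uint8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale + row_min


def quantize_per_tensor(w):
    """Baseline: treat the whole matrix as a single row, so it shares one scale."""
    w = np.asarray(w, dtype=np.float32)
    q, scale, w_min = quantize_rowwise(w.reshape(1, -1))
    return q.reshape(w.shape), scale, w_min


# Two rows with very different value ranges.
w = np.array([[0.01, -0.02, 0.03, 0.0],
              [10.0, -20.0, 5.0, 15.0]], dtype=np.float32)

q, scale, row_min = quantize_rowwise(w)
qt, scale_t, min_t = quantize_per_tensor(w)

# Worst-case reconstruction error on the small-range row (row 0).
err_rowwise = np.abs(dequantize_rowwise(q, scale, row_min) - w)[0].max()
err_tensor = np.abs(dequantize_rowwise(qt, scale_t, min_t) - w)[0].max()
print(err_rowwise, err_tensor)
```

With a single shared scale, the large-range row dictates the step size and the small-range row collapses onto almost one quantization level. Giving each row its own scale keeps the step size proportional to that row's range.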
FBGEMM_GPU (FBGEMM GPU Kernels Library) is a collection of high-performance PyTorch GPU operator libraries for training and inference. The library provides efficient table batched embedding bag, data layout transformation, and quantization support.
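As a rough reference for what a batched embedding bag lookup computes, the following NumPy sketch sum-pools rows from several embedding tables and concatenates the results per sample. It is a plain CPU illustration, not the FBGEMM_GPU operator: the names `embedding_bag_sum` and `batched_embedding_bag`, the indices/offsets layout, and the choice of sum pooling are assumptions made for this example, and a GPU implementation would be expected to handle all tables in fused kernels instead of a Python loop.

```python
import numpy as np


def embedding_bag_sum(table, indices, offsets):
    """Sum-pool rows of `table`; bag b uses indices[offsets[b]:offsets[b + 1]]."""
    num_bags = len(offsets) - 1
    out = np.zeros((num_bags, table.shape[1]), dtype=table.dtype)
    for b in range(num_bags):
        rows = indices[offsets[b]:offsets[b + 1]]
        # An empty bag sums to a zero vector.
        out[b] = table[rows].sum(axis=0)
    return out


def batched_embedding_bag(tables, indices_per_table, offsets_per_table):
    """Pool every table separately, then concatenate the pooled vectors per sample."""
    pooled = [
        embedding_bag_sum(t, i, o)
        for t, i, o in zip(tables, indices_per_table, offsets_per_table)
    ]
    return np.concatenate(pooled, axis=1)


# Two hypothetical tables with embedding dimensions 2 and 3, batch size 2.
table0 = np.arange(8, dtype=np.float32).reshape(4, 2)
table1 = np.ones((3, 3), dtype=np.float32)

indices0 = np.array([0, 2, 1])
offsets0 = np.array([0, 2, 3])   # sample 0 -> rows {0, 2}, sample 1 -> row {1}
indices1 = np.array([0, 1, 2])
offsets1 = np.array([0, 0, 3])   # sample 0 -> empty bag, sample 1 -> rows {0, 1, 2}

out = batched_embedding_bag(
    [table0, table1], [indices0, indices1], [offsets0, offsets1]
)
print(out)
```

Each output row holds the pooled vector from every table for one sample, so its width is the sum of the tables' embedding dimensions.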