Fast and Accurate GPU Quantization for Transformers

· 23 min read

As Transformer models increase in size, the computational cost of running inference grows with them. Many organisations now face the challenge of deploying state-of-the-art models cost-effectively.