How to Deploy HuggingFace Translation Models on GPU Servers
Ever since the release of the HuggingFace🤗 Transformers library, it has been incredibly simple to train, fine-tune and run state-of-the-art Transformer-based translation models. The library has also accelerated the development of our recently launched Translation feature. However, deploying these models in production on GPU servers is still not straightforward. In this post I want to share how we at Speechmatics deployed a performant real-time translation service for more than 30 languages, and how we open-sourced part of our solution in the process.