Microsoft yesterday announced the expanded availability of Azure OpenAI Service. With this expansion, customers in Australia East, Canada East, East United States 2, Japan East, and United Kingdom South can access the popular OpenAI models GPT-4 and GPT-3.5 Turbo (deployed in Azure as GPT-35-Turbo). Before this expansion, Azure OpenAI Service was available in East United States, France Central, South Central United States, and West Europe. During the recent earnings call, Microsoft said that Azure OpenAI Service now serves over 11,000 customers and attracts an average of 100 new customers daily.
Today Microsoft also announced the general availability of the Azure ND H100 v5 Virtual Machine (VM) series, featuring the latest NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking. The new VM series is designed specifically for AI workloads and is available in the East United States and South Central United States Azure regions. Although the VMs are generally available, customers must register their interest to gain access to them.
The ND H100 v5 VMs include the following features:
- Equipped with eight NVIDIA H100 Tensor Core GPUs, these VMs promise significantly faster AI model performance than previous generations.
- 4th Gen Intel Xeon Scalable processors form the foundation of these VMs, ensuring fast host-side processing.
- NVIDIA Quantum-2 ConnectX-7 InfiniBand, with 400 Gb/s per GPU and 3.2 Tb/s of cross-node bandwidth per VM, ensures seamless performance across the GPUs, matching the capabilities of top-performing supercomputers globally.
- PCIe Gen5 provides 64 GB/s of bandwidth per GPU, enabling fast data transfer between CPU and GPU.
- DDR5 memory is at the core of these VMs, delivering greater data transfer speeds and efficiency, making them ideal for workloads with larger datasets.
- Up to six times faster matrix multiplication operations when using the new 8-bit FP8 floating-point data type, compared to FP16 in previous generations.
- Up to two times faster end-to-end inference on large language models such as BLOOM 175B, demonstrating their potential to further optimize AI applications.
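The interconnect figures quoted above are internally consistent, which a quick back-of-the-envelope sketch makes explicit: eight GPUs per VM, each with 400 Gb/s of InfiniBand bandwidth, yields the announced 3.2 Tb/s of cross-node bandwidth per VM. (The aggregate PCIe figure is a derived value, not one stated in the announcement.)

```python
# Back-of-the-envelope check of the ND H100 v5 interconnect figures
# from the feature list above.

GPUS_PER_VM = 8            # eight NVIDIA H100 Tensor Core GPUs per VM
IB_GBPS_PER_GPU = 400      # NVIDIA Quantum-2 ConnectX-7 InfiniBand, Gb/s per GPU
PCIE_GBYTES_PER_GPU = 64   # PCIe Gen5 host bandwidth, GB/s per GPU

# Cross-node InfiniBand bandwidth per VM.
cross_node_gbps = GPUS_PER_VM * IB_GBPS_PER_GPU       # 3200 Gb/s
cross_node_tbps = cross_node_gbps / 1000              # 3.2 Tb/s, as announced

# Derived (not stated in the announcement): aggregate CPU-GPU bandwidth per VM.
pcie_aggregate_gbytes = GPUS_PER_VM * PCIE_GBYTES_PER_GPU  # 512 GB/s

print(f"Cross-node bandwidth: {cross_node_tbps} Tb/s per VM")
print(f"Aggregate PCIe Gen5 bandwidth: {pcie_aggregate_gbytes} GB/s per VM")
```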