We reported last month that the UK Met Office has purchased a £1.2bn supercomputer from Microsoft. The supercomputer will be twice as powerful as any other in the country and rank in the top 25 in the world.
Now more detail has emerged about the system, courtesy of the publication High-Performance Computing Wire.
The cluster is expected to be the most powerful weather and climate-focused supercomputer in the world when it launches in 2022.
The Met Office will initially receive four Microsoft Azure-integrated HPE Cray EX supercomputers with AMD Epyc Milan CPUs, which together will deliver over 60 peak petaflops across the four quadrants, a sixfold increase in the service’s computing power. They will be coupled with an active data archive system capable of supporting nearly four exabytes of data.
Then, 5-8 years later, the system will be upgraded with fourth-generation AMD Genoa CPUs, tripling its power for a total 18-fold improvement over the Met Office’s current capacity.
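For readers who want to check the arithmetic, here is a minimal sketch of how the quoted figures combine. It uses only the numbers reported above. The even split across the four quadrants and the treatment of "over 60" as exactly 60 are simplifying assumptions for illustration, not figures from the Met Office.

```python
# Figures quoted in the article (all approximate).
initial_peak_pflops = 60   # "over 60 peak petaflops" across the four quadrants
initial_speedup = 6        # "sixfold increase" over the current capacity
upgrade_multiplier = 3     # the later upgrade is described as tripling power
quadrants = 4

# Successive speedups multiply rather than add: 6 x 3 = 18.
total_speedup = initial_speedup * upgrade_multiplier

# Assumption: peak performance is split evenly across the four quadrants.
per_quadrant_pflops = initial_peak_pflops / quadrants

# Rough implied peak after the upgrade, treating "over 60" as exactly 60.
upgraded_peak_pflops = initial_peak_pflops * upgrade_multiplier

print(f"Total speedup: {total_speedup}x")
print(f"Per-quadrant peak (assumed even split): {per_quadrant_pflops} petaflops")
print(f"Implied peak after upgrade: {upgraded_peak_pflops} petaflops")
```

The point is simply that the two improvements compound, which is where the 18-times figure comes from.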
“We realized we had to do something slightly different with our next procurement,” explained Richard Lawrence, an IT fellow for supercomputing at the Met Office. “It takes us on average about two years to procure any new supercomputer and then another year to bring into operation. So that’s a lot of time for a lot of people that we do with each procurement, and that’s really expensive and we don’t see particularly good value for us. So we wanted to see if we could change our approach to allow us to spend less time buying supercomputers and more time in utilizing them.”
Unlike before, the Met Office will not manage the supercomputer itself, but will instead offload the work to Microsoft as an HPC-as-a-service installation.
“[Microsoft will] be providing us a supercomputer, all of the power for the supercomputer, the hosting for the supercomputer, and everything that’s supporting us in making use of that supercomputer as well,” Lawrence said. For the second generation, he elaborated, “we’ve built into the procurement a mechanism to allow us to analyze what’s available within the market and make sure that the refresh we get halfway through allows us to meet our performance goals and is proving to be good value for the money.”
The four-system cluster will also be hosted offsite for the first time, with two systems at each of two separate Microsoft datacenters in the southern UK, adding further resilience.
“The reason why we’re [splitting] into four is to give us a bit more flexibility when we are wanting to patch supercomputers and have more flexibility when one of them develops a fault and we need to switch operations to run in a different … supercomputer,” Lawrence explained.
“That’s a large investment – certainly the largest the Met Office has ever dealt with,” Lawrence said. “And the reason why we were successful in going out to the government and getting them to invest this amount is because we spent a large amount of time articulating the benefits, not just to the Met Office but to the wider UK.”