Microsoft's Computational Network Toolkit Outperforms Rival Deep Learning Toolkits in Efficiency
Microsoft has been a long-time investor in speech recognition research; it has worked on the problem for more than 20 years, nearly as long as its research division has existed. Consumers have recently seen the fruits of that labor in products such as Skype Translator, the Project Oxford Speech APIs, and Cortana. Last year Microsoft open-sourced its Computational Network Toolkit (CNTK), sharing its deep learning tools with the entire speech research community.

Microsoft has also been working on a new tool, Azure GPU Lab, which will be released soon. Combining CNTK with Azure GPU Lab has allowed Microsoft to build and train the deep neural networks behind Cortana's speech recognition. The key result: this setup lets the company build and train those networks up to 10 times faster than its previous methods.

Microsoft has begun using these tools in other areas of the company as well, with similar performance improvements. By open sourcing them, Microsoft hopes the wider machine learning and AI communities will take advantage of the tools and that shared ideas will move progress forward faster.

Microsoft compared its tools against four leading deep learning toolkits. Here is how the benchmark was set up, and the results:

[Chart: frames processed per second by each toolkit, on 1, 4, and 8 GPUs]

"There are a number of deep learning toolkits available, from Torch, Theano and Caffe to the recently open sourced toolkits from Google and IBM. We compared CNTK with four popular toolkits. We focus on comparing the raw computational efficiency of different toolkits using simulated data with an effective mini batch size (8192) in order to fully utilize all GPUs. With a fully connected 4-layer neural network . . . the number of frames each toolkit can process per second is illustrated in the chart. We include two configurations on a single Linux machine with 1 and 4 GPUs (Nvidia K40) respectively. We also report our 8-GPU CNTK speed on Azure GPU Lab with 2 identical Linux machines (2 x 4 GPUs) as used in the baseline benchmark. CNTK compares favorably in computational efficiency for distributed deep learning (4 GPUs or 8 GPUs) on all these toolkits we tested. CNTK can easily scale beyond 8 GPUs across multiple machines with superior distributed system performance."
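To make the "frames per second" metric concrete, here is a minimal sketch of the kind of throughput measurement the benchmark describes: push simulated minibatches of 8192 frames through a fully connected 4-layer network and count frames processed per second. This is not Microsoft's benchmark code; the layer sizes, activations, and batch count are illustrative assumptions, and it runs a CPU forward pass in NumPy rather than multi-GPU training.

```python
# Illustrative sketch (not Microsoft's benchmark): frames-per-second
# throughput of a fully connected 4-layer network on simulated data.
# Layer sizes, activations, and batch count are assumptions.
import time
import numpy as np

rng = np.random.default_rng(0)
minibatch = 8192  # effective minibatch size from the quoted setup
layer_sizes = [512, 2048, 2048, 2048, 132]  # assumed input/hidden/output dims

# Random weights and zero biases for the 4 fully connected layers
weights = [rng.standard_normal((m, n)).astype(np.float32) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n, dtype=np.float32) for n in layer_sizes[1:]]

def forward(x):
    """One forward pass: sigmoid hidden layers, linear output layer."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid activation
    return x @ weights[-1] + biases[-1]

# Simulated acoustic frames, as in the benchmark's simulated-data setup
data = rng.standard_normal((minibatch, layer_sizes[0])).astype(np.float32)

n_batches = 10
start = time.perf_counter()
for _ in range(n_batches):
    out = forward(data)
elapsed = time.perf_counter() - start
print(f"{n_batches * minibatch / elapsed:,.0f} frames/sec (forward only)")
```

A real benchmark like the one quoted would also include the backward pass and parameter updates, which is where multi-GPU scaling and the toolkits' distributed-training machinery come into play.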

Microsoft hopes to show the wider AI community how capable CNTK is this Friday at the Neural Information Processing Systems Conference, and to begin work on computing systems that can see, hear, speak, understand, and even begin to reason. This is an exciting time for technology!