Microsoft open sources Distributed Machine Learning Toolkit to make big data research more efficient


Published on November 14, 2015

by Pradeep Viswav


Microsoft recently released the Distributed Machine Learning Toolkit (DMTK), which combines algorithmic and system innovations so that big models can be trained efficiently on just a modest cluster. The aim is to make big data research more scalable, efficient and flexible.

The toolkit, available now on GitHub, is designed for distributed machine learning, that is, using multiple computers in parallel to solve a complex problem. It contains a parameter server-based programming framework for running machine learning tasks on big data. It also contains two distributed machine learning algorithms which, according to Microsoft, can be used to train the fastest and largest topic model and the largest word-embedding model in the world.

The toolkit offers rich and easy-to-use APIs that lower the barrier to distributed machine learning, so researchers and developers can focus on the core of machine learning: data, models and training.

The current version of DMTK includes the following components (more will be added in future versions):

• DMTK Framework: a flexible framework that supports a unified interface for data parallelization, a hybrid data structure for big model storage, model scheduling for big model training, and automatic pipelining for high training efficiency.

• LightLDA: an extremely fast and scalable topic model algorithm, with an O(1) Gibbs sampler and an efficient distributed implementation.

• Distributed (Multi-sense) Word Embedding: a distributed version of the (multi-sense) word embedding algorithm.
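To give a feel for what an O(1) sampler means in practice, here is a minimal Python sketch of Walker's alias method, a standard technique for drawing from a fixed discrete distribution in constant time per draw after a linear-time setup. This is an illustration only and not LightLDA's code: LightLDA's published design pairs alias tables with Metropolis-Hastings steps, which are not shown here, and the function names `build_alias_table` and `sample` are invented for this example.

```python
import random


def build_alias_table(probs):
    """Build alias tables for a discrete distribution in O(n) time."""
    n = len(probs)
    total = sum(probs)
    # Rescale so that the average bucket weight is exactly 1.0.
    scaled = [p * n / total for p in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # chance of keeping bucket s's own outcome
        alias[s] = l          # otherwise fall through to outcome l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover buckets are full (up to rounding) and never use an alias.
    for i in small + large:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one outcome in O(1): pick a bucket, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Hypothetical topic weights, used only for illustration.
prob, alias = build_alias_table([0.5, 0.3, 0.2])
draws = [sample(prob, alias) for _ in range(10)]
```

Each draw costs one random bucket choice and one comparison, regardless of how many outcomes (for a topic model, topics) there are. That property is what makes this kind of table attractive when the number of topics is very large.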

Machine learning researchers and practitioners can also build their own distributed machine learning algorithms on top of the DMTK framework by making small modifications to their existing single-machine algorithms.
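The general parameter-server pattern behind that idea can be sketched in a few lines of Python. This is a toy, single-process illustration under stated assumptions and not the DMTK API: the `ParameterServer` class and its `pull` and `push` methods are hypothetical names, and the "workers" are simply data shards processed one after another rather than separate machines. The point is that the single-machine gradient step stays unchanged, and only the place where the weights are read from and written to moves to a shared server.

```python
class ParameterServer:
    """Toy stand-in for a parameter server: holds the shared model weights."""

    def __init__(self, dim, lr):
        self.weights = [0.0] * dim
        self.lr = lr

    def pull(self):
        # Workers fetch a copy of the current weights.
        return list(self.weights)

    def push(self, grad):
        # Workers send a gradient; the server applies the update.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grad)]


def local_gradient(weights, shard):
    """Single-machine step: mean squared-error gradient on one data shard."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for k, xi in enumerate(x):
            grad[k] += 2.0 * err * xi / len(shard)
    return grad


def train(shards, dim, lr=0.05, rounds=200):
    server = ParameterServer(dim, lr)
    for _ in range(rounds):
        for shard in shards:  # each shard stands in for one worker machine
            server.push(local_gradient(server.pull(), shard))
    return server.weights


# Synthetic data for y = 2*x + 1; the constant 1.0 feature acts as the bias.
data = [([float(x), 1.0], 2.0 * x + 1.0) for x in range(-4, 5)]
shards = [data[0::2], data[1::2]]  # data parallelism: split rows across workers
weights = train(shards, dim=2)
```

In a real deployment the pull and push calls would cross the network and workers would run concurrently, which is where scheduling and pipelining, as listed among the framework's features, come into play.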

Pradeep Viswav

Software and Services Expert

Pradeep is a Computer Science and Engineering Graduate. He was also a Microsoft Student Partner. He is currently working in a leading IT company.
