Colin’s Blog October 2016
This blog post summarizes recent interesting developments at Microsoft.
I hope you enjoy it! Leave a comment or reach out to us with any questions.
New utilities to boost Data Science Productivity
Two data science utilities are now available through GitHub to help boost your productivity: Interactive Data Exploration, Analysis and Reporting (IDEAR), and Automated Modeling and Reporting (AMAR).
IDEAR helps data scientists explore, visualize and analyze data, and helps provide insights into the data in an interactive manner. AMAR is a customizable tool to train machine learning models with hyper-parameter sweeping, compare the accuracy of those models and look at variable importance. With virtually no setup and coding effort, the modeling report can provide an initial assessment of the prediction accuracy of several commonly used machine learning approaches.
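For readers new to the term, here is a minimal sketch of what hyper-parameter sweeping and accuracy comparison look like in practice. It is not AMAR's code and does not use its API. The synthetic data, the simple k-nearest-neighbour classifier, and every function name (make_data, knn_predict, accuracy, sweep) are assumptions made for illustration, and variable importance is not covered.

```python
import random
from collections import Counter


def make_data(n, seed):
    """Two noisy 2-D clusters labelled 0 and 1 (synthetic, for illustration only)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points nearest to `point`."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of held-out points classified correctly for a given k."""
    correct = sum(knn_predict(train, p, k) == y for p, y in test)
    return correct / len(test)


def sweep(train, test, grid):
    """Evaluate every candidate value of k and return {k: accuracy}."""
    return {k: accuracy(train, test, k) for k in grid}


train = make_data(200, seed=0)
test = make_data(100, seed=1)

# The "sweep": try each hyper-parameter value, then compare the results.
results = sweep(train, test, grid=[1, 3, 5, 9, 15])
best_k = max(results, key=results.get)

for k, acc in results.items():
    print(f"k={k:2d}  accuracy={acc:.2f}")
print("best k:", best_k)
```

A tool like AMAR automates this kind of loop across several model types and writes the comparison up as a report, so the sketch only shows the underlying idea.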
Find out more about these new utilities on the Cortana Intelligence and Machine Learning Blog.
Helping reduce the computational time of sequencing a genome
Microsoft has come up with a way to significantly reduce the time it takes to do the major computational aspects of sequencing a genome.
Microsoft’s method of running the Burrows-Wheeler Aligner (BWA) and the Broad Institute’s Genome Analysis Toolkit (GATK) on its Azure cloud computing system is seven times faster than the previous version, allowing researchers and medical professionals to get results in just four hours instead of 28. BWA and GATK are two of the most common computational tools used in combination for genome sequencing.
Experts say that, over time, the ability to sequence the genomic data of plants and animals could also hasten important breakthroughs in other research fields, such as renewable energy and efficient food production.
A ‘genomics revolution’
The quicker Azure-based offering comes as the ability to analyze genomic data is becoming much more affordable, making it available to more people who need it and fueling a genomics revolution.
Microsoft holds a nonexclusive license from the Broad Institute to provide GATK on Azure. It plans to work with the Broad Institute to incorporate these performance improvements into future versions of GATK. The Broad Institute would then make these improvements available to researchers.
Cloud computing is ideal for this type of computational work because the work takes a lot of computing power, requires a lot of data storage, and arrives as requests that come in fits and bursts. For most research centers and science facilities, it would be too expensive to invest in the necessary computing capability and impractical to host all that data on their own, if only because the sheer volume of data is growing exponentially. As these tools become more useful, most researchers also want to focus on getting the results they need rather than worrying about the technical side of things.
Eventually, the Microsoft team hopes to use another company strength – developing an ecosystem around a technology – to help research centers and other institutions implement these systems. Microsoft’s genomics team is talking to independent software vendors about ways to make that happen.