Indiana University Bloomington

DARPA awards $1.4 million grant to IU informatics professor for streamlining programming

Probabilistic problems currently require expensive, custom-crafted programs

March 25, 2014


BLOOMINGTON, Ind. -- By understanding, managing and inferring patterns from data, machine learning has brought us self-driving vehicles, spam filters and smartphone personal assistants. Now an Indiana University Bloomington computer scientist has received $1.4 million to give machine learning more muscle by making it applicable to greater amounts of more diverse data.

Chung-chieh “Ken” Shan, an assistant professor in the School of Informatics and Computing, will receive the funding from the U.S. Defense Department’s Defense Advanced Research Projects Agency over 46 months. The work will focus on probabilistic programming, a relatively new programming paradigm for managing uncertain information. Currently, most probabilistic problems require expensive programs custom-crafted by hard-to-find experts. These programs remain painfully slow with unpredictable performance when tackling large, complex data sets.

Shan’s charge is to develop a new probabilistic programming system that would let more people build machine learning applications and make existing experts more effective at creating powerful applications that need less data to produce accurate results.

“Building probabilistic systems today is error-prone and requires painstaking manual effort,” said Shan, who came to IU in 2013 with a Bachelor of Arts in mathematics and a Ph.D. in computer science from Harvard University. “After that substantial time from a programmer, it can then take many computers considerable time to compute a subtly incorrect result.”

Manual implementation is required because the detailed reasoning needed to orchestrate and monitor inference techniques used in making decisions has not been mechanized, he said. Rather, programmers using machine learning preprocess the model input and post-process the inference output by hand.

“That expectation just does not scale to a greater variety of data and users, or to tuning for diverse hardware,” Shan said.

DARPA has recognized that demand for these capabilities is accelerating, yet every new application still requires a Herculean effort. If integrated models can be constructed across a wide variety of domains and tool types, the new systems could help revolutionize machine learning capabilities in fields including intelligence, natural language processing, predictive analytics and cybersecurity.

One group of researchers has already found that probabilistic computer modeling can interpret verbal autopsy data more quickly and cheaply than physician reviews. Another group is using machine learning to develop “smart drugs” that can automatically detect, diagnose and treat a variety of diseases using a cocktail of chemicals.

Shan said the primary challenges are to develop the theoretical underpinnings that would allow programming to move from a case-by-case, ad hoc basis to mechanization, and to build a system that offers efficient execution.

“We want a system that automatically orchestrates and monitors parallel, online and multi-method inference,” he said. “And the key technique for both efficiency and mechanization is a symbolic and executable representation of models and algorithms that integrates various inference techniques and summarizes the symmetries inherent in large data sets.”

If successful, the new systems would meet DARPA’s five primary objectives for advancing machine learning: shorten code to make models faster to write and easier to understand; reduce development time and cost to encourage experimentation; facilitate construction of more sophisticated models that use rich domain knowledge and separate queries from underlying code; reduce the level of expertise needed to build applications; and support construction of integrated models across a wide variety of domains and tool types.

After receiving his Ph.D. in 2005, Shan spent six years at Rutgers University, a semester at Cornell University and a semester at Japan’s University of Tsukuba before coming to IU Bloomington in 2013.

School of Informatics and Computing assistant professor Chung-chieh "Ken" Shan | Photo by Indiana University

Probabilistic programming is a new programming paradigm for managing uncertain information. | Photo by DARPA

Media Contacts

Stephen Chaplin