Machine Learning at MIT

EVENTS

ML Seminars

ML Seminars (Spring 2025)

Brandon Wood (FAIR)
      UMA: A Family of Universal Models for Atoms
       Monday, June 30, 2025. 12 PM   32-G449 (Kiva)

Abstract:
The ability to quickly and accurately compute properties from atomic simulations is critical for advancing a large number of applications in chemistry and materials science including drug discovery, energy storage, and semiconductor manufacturing. To address this need, Meta FAIR presents a family of Universal Models for Atoms UMA, designed to push the frontier of speed, accuracy, and generalization. UMA models are trained on half a billion unique 3D atomic structures (the largest training runs to date) by compiling data across multiple chemical domains, e.g. molecules, materials, and catalysts. We develop empirical scaling laws to help understand how to increase model capacity alongside dataset size to achieve the best accuracy. The UMA small and medium models utilize a novel architectural design we refer to as mixture of linear experts that enables increasing model capacity without sacrificing speed. For example, UMA-medium has 1.4B parameters but only ~50M active parameters per atomic structure. We evaluate UMA models on a diverse set of tasks across multiple domains and find that, remarkably, a single model without any fine-tuning can perform similarly or better than specialized models. We are releasing the UMA code, weights, and associated data to accelerate computational workflows and enable the community to continue to build increasingly capable AI models.
  BIO
I’m a research scientist on the FAIR chemistry team in San Francisco. My research lies at the intersection of deep learning, chemistry/physics, and large scale computing. Lately, I have been working on generalizable machine learning potentials and generative models for molecules and materials. Prior to joining FAIR, I was a postdoctoral fellow at NERSC and I completed my PhD with Kristin Persson at UC Berkeley.
Andre Martins (Instituto Superior Tecnico)
      Dynamic Sparsity for Machine Learning
       Monday, March 3, 2025. 11 AM   32-D463 (Star)

Abstract:
In this talk, I describe how sparse modeling techniques can be extended and adapted for facilitating dynamic sparsity in neural models, where different neural pathways are activated depending on the input. The building block is a family of sparse transformations induced by Tsallis entropies called alpha-entmax, a drop-in replacement for softmax, which contains sparsemax as a particular case. Entmax transformations are differentiable and (unlike softmax) they can return sparse probability distributions, useful for routing and interpretability. They can also be used to design new Fenchel-Young loss functions, replacing the cross-entropy loss. Variants of these sparse transformations and losses have been applied with success to machine translation, natural language inference, visual question answering, Hopfield networks, reinforcement learning, and other tasks. I will end by covering a recent application of these sparse losses to conformal prediction.
  BIO
André F. T. Martins (PhD 2012, Carnegie Mellon University and Instituto Superior Técnico; https://andre-martins.github.io/) is an Associate Professor at Instituto Superior Técnico, University of Lisbon, researcher at Instituto de Telecomunicações, and the VP of AI Research at Unbabel. His research, funded by a ERC Starting Grant (DeepSPIN) and Consolidator Grant (DECOLLAGE), among other grants, include machine translation, quality estimation, structure and interpretability in deep learning systems for NLP. His work has received several paper awards at ACL conferences. He co-founded and co-organizes the Lisbon Machine Learning School (LxMLS), and he is a Fellow of the ELLIS society and co-director of the ELLIS Program in Natural Language Processing.
Rajesh Ranganath (NYU)
      Looking into the benefits and costs of flexibility through diffusion models and robustness
       Wednesday, February 19, 2025. 2:00 pm   32-G449 (Patil/Kiva)

  BIO
Rajesh Ranganath is an NYU Associate Professor working on causal, statistical, and probabilistic inference, out-of-distribution detection and generalization, generative modeling, interpretability, and AI for healthcare.

ML Seminars (Fall 2024)

Eric Vanden-Eijnden (NYU)
      Generative modeling with flows and diffusions
       Tuesday, December 3, 2024. 2:00 pm   32-G449 (Patil/Kiva)

Abstract:
Generative models based on dynamical transport have recently led to significant advances in unsupervised learning. At mathematical level, these models are primarily designed around the construction of a map between two probability distributions that transform samples from the first into samples from the second. While these methods were first introduced in the context of image generation, they have found a wide range of applications, including in scientific computing where they offer interesting ways to reconsider complex problems once thought intractable because of the curse of dimensionality. In this talk, I will discuss the mathematical underpinning of generative models based on flows and diffusions, and show how a better understanding of their inner workings can help improve their design. These results indicate how to structure the transport to best reach complex target distributions while maintaining computational efficiency, both at learning and sampling stages. I will also discuss applications of generative AI in scientific computing, in particular in the context of Monte Carlo sampling, with applications to the statistical mechanics and Bayesian inference, as well as probabilistic forecasting, with application to fluid dynamics and atmosphere/ocean science.
  BIO
Eric Vanden-Eijnden is a Professor at NYU Courant working on ML foundations and applications to scientific computing.
Volodymyr Kuleshov (Cornell Tech)
      Simple and effective discrete diffusion for modeling language and biological sequences
       Tuesday, November 19, 2024. 2:00 pm   32-G449 (Patil/Kiva)

Abstract:
While diffusion generative models excel at high-quality image generation, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods on discrete data such as text or biological sequences. Our work takes steps towards closing this gap via a simple and effective framework for discrete diffusion. This framework is simple to understand—it optimizes a mixture of denoising (e.g., masking) losses—and can be seen as endowing BERT-like models with principled samplers and variational estimators of log-likelihood. Crucially, our algorithms are not constrained to generate data sequentially, and therefore have the potential to improve long-term planning, controllable generation, and sampling speed. In the context of language modeling, our framework enables deriving masked diffusion language models (MDLMs), which achieve a new state-of-the-art among diffusion models, and approach AR quality. Combined with novel extensions of classifier-free and classifier-based guidance mechanisms, these algorithms are also significantly more controllable than AR models. Discrete diffusion extends beyond language to science, where it forms the basis of a new generation of DNA foundation models. Our largest models focus on plants and set a new state of the art in genome annotation, while also enabling effective generation. Discrete diffusion models hold the promise to advance progress in generative modeling and its applications in language understanding and scientific discovery.
  BIO
Volodymyr Kuleshov is the Joan Eliasoph, M.D. Assistant Professor at the Jacobs Technion-Cornell Institute at Cornell Tech and in the Computer Science Department at Cornell University. He obtained his Ph.D. in Computer Science from Stanford University, where he was the recipient of the Arthur Samuel Best Thesis Award. Kuleshov’s research interests are in the field of machine learning and its applications in scientific discovery, health, and sustainability. His work has been featured in Nature Biotechnology, Nature Medicine, Nature Communications, and has been recognized with an NSF CAREER award, NIH MIRA award, as well as multiple industry awards. Kuleshov is also a co-founder of Afresh, a startup that uses AI to drive down food waste and that is deployed in 10% of US supermarkets.
Swaroop Mishra (Google DeepMind)
      Instruction Following and Reasoning in Large Language Models
       Tuesday, November 12, 2024. 2:00 pm   32-D463 (Star)

  BIO
Swaroop Mishra is a Senior Research Scientist at Google DeepMind (formerly Google Brain), where he works on Gemini reasoning. He and his team recently built a system that received a Silver Medal at the International Mathematical Olympiad (IMO) 2024. His main research contributions are on instruction-tuning and reasoning methods, including natural instructions, super-natural instructions, reframing, question-decomposition, math via programs, help me think, and instruction-bias (EACL 2023 Outstanding Paper Award). He is a co-organizer of the MATH-AI workshop at NeurIPS 2022 and NeurIPS 2024. His work on "Natural Instructions" has recently received the AI2 "Lasting Impact Paper Award."
Andrew Gordon Wilson (NYU)
      Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
       Wednesday, November 6, 2024. 1:00 pm   32-G882 (Hewlett Room)

Abstract:
Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. We present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses previously proposed structures, such as low-rank, Kronecker, Tensor-Train, and Monarch, along with many novel structures. We develop a taxonomy of all such operators based on their computational and algebraic properties, which provides insights into their compute-optimal scaling laws. Combining these insights with empirical evaluation, we identify a subset of structures that achieve better performance than dense layers as a function of training compute, which we then develop into a high-performance sparse mixture-of-experts layer.

  BIO
Andrew Gordon Wilson is a Professor at New York University working on a prescriptive approach to building autonomous intelligent systems, which involves understanding LLMs, vision models, distribution shifts, inductive biases, scalable inference, and ML for physics and biology.
Jonathan Frankle (Databricks)
      Training Modern LLMs from Scratch
       Wednesday, October 16, 2024. 2:00 pm   32-G449 (Patil/Kiva)

Abstract:
In this talk, I will describe the process of training contemporary LLMs from scratch based on my experience doing so at scale in industry with models like DBRX and MPT. I will start from the fundamental design decisions that go into building a model and the cost of doing so, and I will conclude with the logistics of training it, fine-tuning it, and aligning it with human preferences. Databricks believes in open science, so I will be able to openly share details about how we train industrial-grade LLMs.

  BIO
Jonathan Frankle is Chief AI Scientist at Databricks. He joined Databricks via the $1.3B acquisition of MosaicML, where he was a cofounder. His primary research interest is understanding how practical neural networks learn in practice with the goal of making them more efficient to train.
Finale Doshi-Velez (Harvard)
      Interpretability and Interaction for Improved AI Decision Support
       Tuesday, October 8, 2024. 11:00 am   32-G449 (Patil/Kiva)

Abstract:
Explanations have been proposed as a way to improve the human+AI performance in the context of AI decision support. By providing context for an AI recommendation, the reasoning goes, people will be able to use the decision support to ultimately make better choices. However, many studies have established that reality does not pan out this way: not only does AI decision support often fail to improve human+AI decision quality, but sometimes it makes it worse. Two factors affect whether an explanation is effective. First, its content must be appropriate for the use case. Indeed, identifying what explanations are needed for what use cases is the grand challenge in interpretable machine learning. Second, the delivery -- factors such as timing, engagement mechanisms -- also affect how people will use an AI recommendation and explanation. In this talk, I'll focus mostly on the first element, sharing ongoing work on optimizing explanations for specific properties and a sim2real approach for identifying promising explanations in-silico before moving on to expensive user studies. Regarding delivery, I'll also touch on recent work in which we use machine learning to personalize delivery strategies to the needs of different users. Through this presentation, I hope to not only present many interesting machine learning problems related to effective human+AI interaction, but also describe a path towards improving generalization in interpretable machine learning.

  BIO
Finale Doshi-Velez is a CS Professor at Harvard. She did her PhD at MIT, and her interests lie at the intersection of machine learning, healthcare, and interpretability.
Yoon Kim (MIT)
      Efficient Sequence Modeling with Linear Transformers
       Thursday, September 19, 2024. 2:00 pm   32-G449 (Patil/Kiva)

  BIO
Yoon Kim is an assistant professor at MIT. He works on large-scale models, language models, and symbolic control of neural networks.

Related Seminars (Fall 2024)

Irene Chen (UC Berkeley)
      Leveraging Large Datasets and Large Language Models to Improve Health Equity
       Thursday, October 17, 2024. 2:00 pm   32-G449 (Patil/Kiva)

Abstract:
The proliferation of medical data and the advancements of large language models (LLMs) promise to revolutionize healthcare; however, ensuring and increasing health equity remains a significant challenge. In this talk, I will present recent work on two critical aspects of this evolving landscape. First, I will examine the unexpected consequences of multi-source data scaling. Counter to intuition, adding training data can sometimes reduce overall accuracy, produce uncertain fairness outcomes, and diminish worst-subgroup performance. These findings underscore the complexity of working with disparate data sources in healthcare AI. Next, I will showcase innovative applications of LLMs in women's health. Through participatory design with healthcare workers and patients, we've developed guiding principles for LLM use in maternal health. Additionally, we demonstrate how LLMs can generate rationales for contraceptive medication switches using clinical notes. The talk concludes by emphasizing vigilance and ethical considerations as we advance towards more data-driven and AI-assisted healthcare.

  BIO
Irene is an Assistant Professor at UC Berkeley and UCSF in Computational Precision Health (CPH), Electrical Engineering and Computer Science (EECS), and Berkeley AI Research (BAIR). She is interested in how we can make machine learning systems for healthcare to be more robust, equitable, and impactful.

ML Seminars (Fall 2018)

ML Seminars typically happen on Wednesdays; exceptions noted in red.

Moritz Hardt (UC Berkeley), Sep 12, 32-D463, 3:30PM-4:30PM. When Recurrent Models Don't Need To Be Recurrent
Percy Liang (Stanford), Sep 19, 32-D463, 3:30PM-4:30PM. Adversaries, Extrapolation, and Language
Francis Bach (INRIA), Oct 17, 6-120, 4:30-5:30pm. Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
Emma Brunskill (Stanford), Oct 31, 34-101, 4:30-5:30pm. Towards Better Reinforcement Learning for High Stakes Domains
Sasha Rakhlin, Nov 14, 32-155, 4:30-5:30pm. Is Learning Compatible with (Over)fitting to the Training Data?

ML Seminars (Spring 2018)

Sanjoy Dasgupta (UCSD, San Diego, CA)
      Using interaction for simpler and better learning
       18 April, 2018. 3:00pm   32-G882

Abstract:
In the usual setup of supervised learning, the learner is given a stack of labeled examples and told to fit a classifier to them. It would be quite unnatural for a human to learn in this way, and indeed this model is known to suffer from a variety of fundamental hardness barriers. However, many of these hurdles can be overcome by moving to a setup in which the learner interacts with a human (or other information source) during the learning process.
We will see how interaction makes it possible to:
1. Learn DNF (disjunctive normal form) concepts.
2. Perform machine teaching in situations where the student’s concept class is unknown.
3. Improve the results of unsupervised learning. We will present a generic approach to “interactive structure learning” that, for instance, yields simple interactive algorithms for topic modeling and hierarchical clustering. Along the way, we will present a novel cost function for hierarchical clustering, as well as an efficient algorithm for approximately minimizing this cost.

  BIO
Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He works on algorithms for machine learning, with a focus on unsupervised and interactive learning
Dan Roy (Univ Toronto)
      Nonvacuous Generalization Bounds for Deep Neural Networks via PAC-Bayes
       25 April, 2018. 4:30pm   32-141

Abstract:
A serious impediment to a rigorous understanding of the generalization performance of algorithms like SGD for neural networks is that most generalization bounds are numerically vacuous when applied to modern networks on real data sets. In recent work (Dziugaite and Roy, UAI 2017), we argue that it is time to revisit the problem of computing nonvacuous bounds, and show how the empirical phenomenon of "flat minima" can be operationalized using PAC-Bayesian bounds, yielding the first nonvacuous bounds for a large (stochastic) neural network on MNIST. The bound is obtained by first running SGD and then optimizing the distribution of a random perturbation of the weights so as to capture the flatness and minimize the PAC-Bayes bound. I will describe this work, its antecedents, its goals, and subsequent work, focusing on where others have and have not made progress towards understanding generalization according to our strict criteria.

Joint work with Gintare Karolina Dziugaite based on https://arxiv.org/abs/1703.11008, https://arxiv.org/abs/1712.09376, and https://arxiv.org/abs/1802.09583

  BIO
Dan Roy is an Assistant Professor in the Department of Statistical Sciences and, by courtesy, Computer Science at the University of Toronto, and a founding faculty member of the Vector Institute for Artificial Intelligence. Daniel is a recent recipient of an Ontario Early Researcher Award and Google Faculty Research Award. Before joining U of T, Daniel held a Newton International Fellowship from the Royal Academy of Engineering and a Research Fellowship at Emmanuel College, University of Cambridge. Daniel earned his S.B., M.Eng., and Ph.D. from the Massachusetts Institute of Technology: his dissertation on probabilistic programming won an MIT EECS Sprowls Dissertation Award. Daniel's group works on foundations of machine learning and statistics.
Arthur Gretton (UCL, London, UK)
TBD
02 May, 2018. 4:00pm 32-G882
Shai Shalev-Shwartz (HUJI, Jerusalem, Israel)
TBD
07 May, 2018. 3:30pm Monday 32-G449
Joëlle Pineau (McGill; FAIR Montreal)
TBD
09 May, 2018. 4:30pm 32-141

ML Seminars (Fall 2017)

Yisong Yue. (Caltech).
      The dueling bandits problem
       8th Sep, 2017. 2pm-3pm   Friday   32-G882

Abstract:
In this talk, I will present the Dueling Bandits Problem, which is an online learning framework tailored towards real-time learning from subjective human feedback. In particular, the Dueling Bandits Problem only requires pairwise comparisons, which are shown to be reliably inferred in a variety of subjective feedback settings such as for information retrieval an recommender systems. I will provide an overview of the Dueling Bandits Problem with basic algorithmic results. I will then conclude by discussing some ongoing research directions with applications to personalized medicine.
This is joint work with Josef Broder, Bobby Kleinberg, Thorsten Joachims, Yanan Sui, Vincent Zhuang, and Joel Burdick.

  BIO
Yisong Yue is an assistant professor in the Computing and Mathematical Sciences Department at the California Institute of Technology. He was previously a research scientist at Disney Research. Before that, he was a postdoctoral researcher in the Machine Learning Department and the iLab at Carnegie Mellon University. He received a Ph.D. from Cornell University and a B.S. from the University of Illinois at Urbana-Champaign. Yisong's research interests lie primarily in the theory and application of statistical machine learning. He is particularly interested in developing novel methods for spatiotemporal reasoning, structured prediction, interactive learning systems, and learning with humans in the loop. In the past, his research has been applied to information retrieval, recommender systems, text classification, learning from rich user interfaces, analyzing implicit human feedback, data-driven animation, behavior analysis, sports analytics, policy learning in robotics, and adaptive routing & allocation problems.
Alex Smola. Amazon
Sequence Modeling: From Spectral Methods and Bayesian Nonparametrics to Deep Learning
11th Sep, 2017. 3pm-4pm Monday 32-G463

Abstract:
In this talk I will summarize a few recent developments in the design and analysis of sequence models. Starting with simple parametric models such as HMMs for sequences we look at nonparametric extensions in terms of their ability to model more fine-grained types of state and transition behavior. In particular we consider spectral embeddings, nonparametric Bayesian models such as the nested Chinese Restaurant Franchise and the Dirichlet-Hawkes Process. We conclude with a discussion of deep sequence models for user return time modeling, time-dependent collaborative filtering, and large-vocabulary user profiling.
About the speaker: AWS Spotlight on Alex Smola
Noam Brown. CMU
      Libratus: Beating Top Humans in No-Limit Poker
      18th Sep, 2017. 3pm-54m   32-G449

Abstract:
Poker has been a challenge problem in AI and game theory for decades. As a game of imperfect information, poker involves obstacles not present in games like chess or Go. No program has been able to beat top professionals in large poker games, until now. In January 2017, our AI Libratus decisively defeated a team of the top professional players in heads-up no-limit Texas Hold'em. Libratus features a number of innovations which form a new approach to AI for imperfect-information games. The algorithms are domain-independent and can be applied to a variety of strategic interactions involving hidden information

This talk is based on joint work with Tuomas Sandholm.

  BIO
Noam Brown is a PhD student in computer science at Carnegie Mellon University advised by Professor Tuomas Sandholm. His research combines reinforcement learning and game theory to develop AIs capable of strategic reasoning in imperfect-information interactions. He has applied this research to creating Libratus, the first AI to defeat top humans in no-limit Texas Hold'em. His current research is focused on expanding the applicability of the technology behind Libratus to other domains.
Alekh Agarwal. MSR NYC
      Sample-Efficient Reinforcement Learning with Rich Observations
      20th Sep, 2017. 4pm-5pm   32-G882

Abstract:
This talk considers a core question in reinforcement learning (RL): How can we tractably solve sequential decision making problems where the learning agent receives rich observations? We begin with a new model called Contextual Decision Processes (CDPs) for studying such problems, and show that it encompasses several prior setups to study RL such as MDPs and POMDPs. Several special cases of CDPs are, however, known to be provably intractable in their sample complexities. To overcome this challenge, we further propose a structural property of such processes, called the Bellman Rank. We find that the Bellman Rank of a CDP (and an associated class of functions) provides an intuitive measure of the hardness of a problem in terms of sample complexity and is small in several practical settings. In particular, we propose an algorithm, whose sample complexity scales with the Bellman Rank of the process, and is completely independent of the size of the observation space of the agent. We also show that our techniques are robust to our modeling assumptions, and make connections to several known results as well as highlight novel consequences of our results.

This talk is based on joint work with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire.

  BIO
Alekh Agarwal is a researcher in the New York lab of Microsoft Research, prior to which he obtained his PhD from UC Berkeley. Alekh’s research currently focuses on topics in interactive machine learning, including contextual bandits, reinforcement learning and online learning. Previously, he has worked on several topics in optimization including stochastic and distributed optimization. He has won several awards for his research including the NIPS 2015 best paper award.
Alex Smola. Amazon
Tutorial on Deep Learning with Apache MXNet Gluon
11th Oct, 2017. 2pm-5pm 54-100

Abstract:
Deep Learning short-course.
This tutorial introduces Gluon, a flexible new interface that pairs MXNet’s speed with a user-friendly frontend. Symbolic frameworks like Theano and TensorFlow offer speed and memory efficiency but are harder to program. Imperative frameworks like Chainer and PyTorch are easy to debug but they can seldom compete with the symbolic code when it comes to speed. Gluon reconciles the two, removing a crucial pain point by using just-in-time compilation and an efficient runtime engine for efficiency.
In this crash course, we’ll cover deep learning basics, the fundamentals of Gluon, advanced models, and multiple-GPU deployments. We will walk you through MXNet’s NDArray data structure and automatic differentiation tools. Well show you how to define neural networks at the atomic level, and through Gluon’s predefined layers. We’ll demonstrate how to serialize models and build dynamic graphs. Finally, we will show you how to hybridize your networks, simultaneously enjoying the benefits of imperative and symbolic deep learning.
About the speaker: AWS Spotlight on Alex Smola
Ohad Shamir. Weizmann Institute of Science, Israel
Failures of Gradient-Based Deep Learning
18th Oct, 2017. 4pm-5pm 32-G882

Abstract:
In recent years, deep learning has become the go-to solution for a broad range of applications, with a long list of success stories. However, it is important, for both theoreticians and practitioners, to also understand the associated difficulties and limitations. In this talk, I'll describe several simple problems for which commonly-used deep learning approaches either fail or suffer from significant difficulties, even if one is willing to make strong distributional assumptions. We illustrate these difficulties empirically, and provide theoretical insights explaining their source and (sometimes) how they can be remedied.

Includes joint work with Shai Shalev-Shwartz and Shaked Shammah.

BIO:
Ohad Shamir is a faculty member in the Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science, Israel. He received a PhD in computer science from the Hebrew University in 2010, advised by Prof. Naftali Tishby. Between 2010-2013 he was a postdoctoral and associate researcher at Microsoft Research. His research focuses on machine learning, with emphasis on algorithms which combine practical efficiency and theoretical insight. He is also interested in the many intersections of machine learning with related fields, such as optimization, statistics, theoretical computer science and AI.
Michael Bronstein. USI Lugano (Switzerland)
Geometric Deep Learning: Going Beyond Euclidean Data
25 Oct, 2017. 4pm-5pm 32-G882

Abstract:
In the past decade, deep learning methods have achieved unprecedented performance on a broad range of problems in various fields from computer vision to speech recognition. So far research has mainly focused on developing deep learning methods for Euclidean-structured data. However, many important applications have to deal with non-Euclidean structured data, such as graphs and manifolds. Such geometric data are becoming increasingly important in computer graphics and 3D vision, sensor networks, drug design, biomedicine, recommendation systems, and web applications. The adoption of deep learning in these fields has been lagging behind until recently, primarily since the non-Euclidean nature of objects dealt with makes the very definition of basic operations used in deep networks rather elusive. In this talk, I will introduce the emerging field of geometric deep learning on graphs and manifolds, overview existing solutions and applications as well as key difficulties and future research directions.

(based on M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, P. Vandergheynst, "Geometric deep learning: going beyond Euclidean data", IEEE Signal Processing Magazine 34(4):18-42, 2017)

BIO:
Michael Bronstein (PhD with distinction 2007, Technion, Israel) is a professor at USI Lugano, Switzerland and Tel Aviv University, Israel. He also serves as a Principal Engineer at Intel Perceptual Computing. During 2017-2018 he is a fellow at the Radcliffe Institute for Advanced Study at Harvard University. Michael's main research interest is in theoretical and computational methods for geometric data analysis. He authored over 150 papers, the book Numerical geometry of non-rigid shapes (Springer 2008), and over 20 granted patents. He was awarded three ERC grants, Google Faculty Research award (2016), and Rudolf Diesel fellowship (2017) at TU Munich. He was invited as a Young Scientist to the World Economic Forum, an honor bestowed on forty world’s leading scientists under the age of forty. Michael is a Senior Member of the IEEE, alumnus of the Technion Excellence Program and the Academy of Achievement, ACM Distinguished Speaker, and a member of the Young Academy of Europe. In addition to academic work, Michael is actively involved in commercial technology development and consulting to start-up companies. He was a co-founder and technology executive at Novafora (2005-2009) developing large-scale video analysis methods, and one of the chief technologists at Invision (2009-2012) developing low-cost 3D sensors. Following the multi-million acquisition of Invision by Intel in 2012, Michael has been one of the key developers of the Intel RealSense technology.
Katherine Heller. Duke University
      Machine Learning for Healthcare Data
      29th Nov, 2017; 4pm-5pm 32-G882

Abstract:
We will discuss multiple ways in which healthcare data is acquired and machine learning methods are currently being introduced into clinical settings. This will include: 1) Modeling disease trends and other predictions, including joint predictions of multiple conditions, from electronic health record (EHR) data using Gaussian processes. 2) Predicting surgical complications and transfer learning methods for combining databases 3) Using mobile apps and integrated sensors for improving the granularity of recorded health data for chronic conditions and 4) The combination of mobile app and social network information in order to predict the spread of contagious disease. Current work in these areas will be presented and the future of machine learning contributions to the field will be discussed.

  BIO
Katherine Heller is an Assistant Professor in Statistical Science at Duke University. She is the recent recipient of a Google faculty research award, a first round BRAIN initiative award from the NSF, as well as a CAREER award. She received her PhD from the Gatsby Computational Neuroscience Unit at UCL, and was a postdoc at the University of Cambridge on an EPSRC postdoc fellowship, and at MIT on an NSF postdoc fellowship.

ML Seminars (Spring 2017)

Amir Globerson. Tel Aviv University
Efficient Optimization of a Convolutional Network with Gaussian Inputs
1st March, 2017; 5pm-6pm 32-G643
Mehryar Mohri. Courant Institute, NYU
Online Learning for Time Series Prediction
8th March, 2017; 4pm-5pm 32-G463
Lester Mackey. Microsoft Research
Measuring Sample Quality with Kernels
15 March, 2017; 4pm-5pm 32-G463
Ben Recht. UC Berkeley
Optimization Challenges in Deep Learning
22 March, 2017; 4pm-5pm 32-G463
Ruslan Salakhutdinov Carnegie Mellon University, Pittsburgh, PA
Learning Deep Unsupervised and Multimodal Models
05th Apr, 2017; 4pm-5pm 34-101
Jeff Miller. Harvard University, Cambridge
Robust Bayesian inference via coarsening
26th Apr, 2017; 3pm-4pm 32-G575
Ryan Adams. Harvard University and Google Brain
Building Probabilistic Structure into Massively Parameterized Models
10th May, 2017; 4pm-5pm 32-141

ML Seminars (Fall 2016)

Honglak Lee University of Michigan, Ann Arbor
Deep architectures for visual reasoning, multimodal learning, and decision-making
16th Nov, 2016; 4pm-5pm 32-G463
Elad Hazan Princeton University
A Non-generative Framework and Convex Relaxations for Unsupervised Learning
26th Oct, 2016; 4pm-5pm 32-G463
Tina Eliassi-Rad (Northeastern)
The Reasonable Effectiveness of Roles in Complex Networks
19th Oct, 2016; 32-G575
Carlo Morselli School of Criminology, University of Montreal
Criminal Networks
29th Sep, 2016; 4pm-5pm 4-237
Gah-Yi Vahn (LSB)
The data-driven (s, S) policy: why you can have confidence in censored demand data
5th Oct, 2016; 4:00 PM to 5:00 PM 32-G575
Le Song (Georgia Tech).
Discriminative Embedding of Latent Variable Models for Structured Data
16th Sep, 2016; 2pm-3pm 32-G882
Ashish Kapoor (MSR Redmond).
Safe Decision Making Under Uncertainty
14th Sep, 2016; 4pm-5pm 32-D507
Alan Malek (UC Berkeley).
Minimax strategies for online linear regression, square-loss prediction, and time series prediction
15th Aug, 2016; 11am 32-D677
Sashank Reddi (CMU).
Faster Stochastic Methods for Nonconvex Optimization in Machine Learning
13th July, 2016; 3pm 32-G882
Andre Wibisono (UC Berkeley).
A variational perspective on accelerated methods in optimization
14th July, 2016; 3pm 32-G882