Introduces principles, algorithms, and applications of machine learning from the point of view of modeling and prediction; formulation of learning problems; representation, over-fitting, generalization; clustering, classification, probabilistic modeling; and methods such as support vector machines, hidden Markov models, and neural networks. Students taking graduate version complete additional assignments. Meets with 6.862 when offered concurrently. Recommended prerequisites: 6.006 and 18.06. Enrollment may be limited; no listeners.

Probabilistic modeling for problems of inference and machine learning from data, emphasizing analytical and computational aspects. Distributions, marginalization, conditioning, and structure; graphical and neural network representations. Belief propagation, decision-making, classification, estimation, and prediction. Sampling methods and analysis. Introduces asymptotic analysis and information measures. Computational laboratory component explores the concepts introduced in class in the context of contemporary applications.

F18

(2.087, 6.0002, 6.01, 18.03, or 18.06) & (6.008, 6.041B, 14.30, 16.09, or 18.05)

Hands-on analysis of data demonstrates the interplay between statistics and computation. Includes four modules, each centered on a specific data set, and introduced by a domain expert. Provides instruction in specific, relevant analysis methods and corresponding algorithmic aspects. Potential modules may include medical data, gene regulation, social networks, finance data (time series), traffic, transportation, weather forecasting, policy, or industrial web applications. Projects address a large-scale data analysis question. Students taking graduate version complete additional assignments. Limited enrollment; priority to Statistics and Data Science minors and to juniors and seniors.

Introduction to the methodological foundations of data science, emphasizing basic concepts as well as modern methodologies. Learning of distributions and their parameters. Testing of multiple hypotheses. Linear and nonlinear regression and prediction. Classification. Learning of dynamical models. Uncertainty quantification. Model validation. Causal inference. Applications and case studies drawn from electrical engineering, computer science, the life sciences, finance, and social networks.

Principles, techniques, and algorithms in machine learning from the point of view of statistical inference; representation, generalization, and model selection; and methods such as linear/additive models, active learning, boosting, support vector machines, non-parametric Bayesian methods, hidden Markov models, Bayesian networks, and convolutional and recurrent neural networks. Recommended prerequisite: 6.036 or other previous experience in machine learning.

Among different approaches in modern machine learning, the course focuses on a regularization perspective and includes both shallow and deep networks. The content is roughly divided into three parts. In the first part, key algorithmic ideas are introduced, with an emphasis on the interplay between modeling and optimization aspects. Algorithms that will be discussed include classical regularization networks (regularized least squares, SVM, logistic regression), stochastic gradient methods, implicit regularization, sketching, sparsity-based methods, and deep neural networks. In the second part, key ideas in statistical learning theory will be developed to analyze the properties of the algorithms previously introduced. Classical concepts like generalization, uniform convergence, and Rademacher complexities will be developed, together with topics such as bounds based on margin, stability, and privacy. The final part of the course focuses on deep learning networks. It will introduce an emerging theoretical framework addressing three key puzzles in deep learning: approximation theory (which functions can be represented more efficiently by deep networks than by shallow networks), optimization theory (why stochastic gradient descent can easily find global minima), and machine learning (whether classical learning theory can explain generalization in deep networks). It will also discuss connections with the architecture of the visual cortex, which was the original inspiration for the layered local connectivity of modern networks and may provide ideas for future developments of deep learning.

Introduction to statistical inference with probabilistic graphical models. Directed and undirected graphical models, and factor graphs, over discrete and Gaussian distributions; hidden Markov models, linear dynamical systems. Sum-product and junction tree algorithms; forward-backward algorithm, Kalman filtering and smoothing. Min-sum and Viterbi algorithms. Variational methods, mean-field theory, and loopy belief propagation. Particle methods and filtering. Building graphical models from data, including parameter estimation and structure learning; Baum-Welch and Chow-Liu algorithms. Selected special topics.

S19

linear algebra and probability (e.g., 18.06/18.700 and 6.041/6.431)

In this research-oriented course, we will introduce graphical models in the framework of exponential families. We will see that polynomial equations and combinatorial constraints naturally arise and call for algebraic and combinatorial methods to advance the statistical methodology.
In particular, we will highlight the role of conic duality for Gaussian graphical models and polyhedral geometry for discrete graphical models. We will also develop methods for causal inference making use of the inherent combinatorial and algebraic structure in directed graphical models. Finally, we will discuss graphical models with hidden variables by highlighting the connections to tensor decompositions.
The overarching goal of this course is to provide an overview of the interplay of techniques from combinatorics and applied algebraic geometry with problems arising in statistics, in particular in graphical models. Specific topics include exponential families, Gröbner bases, conditional independence ideals, Bayesian networks, determinantal varieties, and hyperbolic polynomials.

Introduction to principles of Bayesian and non-Bayesian statistical inference. Hypothesis testing and parameter estimation, sufficient statistics; exponential families. EM algorithm. Log-loss inference criterion, entropy and model capacity. Kullback-Leibler distance and information geometry. Asymptotic analysis and large deviations theory. Model order estimation; nonparametric statistics. Computational issues and approximation techniques; Monte Carlo methods. Selected topics such as universal inference and learning, and universal features and neural networks.

Machine Learning for Healthcare

S19

6.034 or 6.438 or 6.806 or 6.036 or 6.867 or 9.520

Introduces students to machine learning in healthcare, including the nature of clinical data and the use of machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, and improving clinical workflows. Topics include causality, interpretability, algorithmic fairness, time-series analysis, graphical models, deep learning, and transfer learning. Guest lectures by clinicians from the Boston area and course projects with real clinical data emphasize subtleties of working with clinical data and translating machine learning into clinical practice. Limited to 55.

As both the number of data sets and data set sizes grow, practitioners are interested in learning increasingly complex information and interactions from data. Probabilistic modeling in general, and Bayesian approaches in particular, provide a unifying framework for flexible modeling that includes prediction, estimation, and coherent uncertainty quantification. In this course, we will cover the modern challenges of Bayesian inference, including (but not limited to) speed of approximate inference, making use of distributed architectures, streaming data, and complex data interactions. We will study Bayesian nonparametric models, wherein model complexity grows with the size of the data; this allows us, e.g., to learn a greater diversity of topics as we read more documents from Wikipedia, or to identify more friend groups as we process more of Facebook's network structure.

Modern machine learning systems are often built on top of algorithms that do not have provable guarantees, and when and why they work is a subject of debate. In this class, we will focus on designing algorithms whose performance we can rigorously analyze for fundamental machine learning problems. We will cover topics such as nonnegative matrix factorization, tensor decomposition, sparse coding, learning mixture models, matrix completion, and inference in graphical models. Almost all of these problems are computationally hard in the worst case, so developing an algorithmic theory is about (1) choosing the right models in which to study these problems and (2) developing the appropriate mathematical tools (often from probability, geometry, or algebra) to rigorously analyze existing heuristics or to design fundamentally new algorithms.

Probabilistic Systems Analysis (Intro to Probability)

F18, S19

Fundamentals of Probability (Graduate)

Probability and Statistics

S19

Probability and Random Variables (Undergraduate)

F18, S19

Calc II

Fundamentals of Statistics

---

Linear Algebra (Undergraduate)

F18, S19

Matrix Methods in Data Analysis, Signal Processing, and ML (Undergraduate)

Optimization Methods (Undergraduate)

F18

Introduction to Mathematical Programming (Graduate)

F18