ML Seminars (Fall 2024)


  • Jonathan Frankle (Databricks)
          Training Modern LLMs from Scratch

           Wednesday, October 16, 2024. 2:00 pm   32-G449 (Patil/Kiva)

     Abstract:

    In this talk, I will describe the process of training contemporary LLMs from scratch based on my experience doing so at scale in industry with models like DBRX and MPT. I will start from the fundamental design decisions that go into building a model and the cost of doing so, and I will conclude with the logistics of training it, fine-tuning it, and aligning it with human preferences. Databricks believes in open science, so I will be able to openly share details about how we train industrial-grade LLMs.

      BIO

    Jonathan Frankle is Chief AI Scientist at Databricks. He joined Databricks via the $1.3B acquisition of MosaicML, where he was a cofounder. His primary research interest is understanding how neural networks learn in practice, with the goal of making them more efficient to train.
  • Finale Doshi-Velez (Harvard)
          Interpretability and Interaction for Improved AI Decision Support

           Tuesday, October 8, 2024. 11:00 am   32-G449 (Patil/Kiva)

     Abstract:

    Explanations have been proposed as a way to improve human+AI performance in the context of AI decision support. By providing context for an AI recommendation, the reasoning goes, people will be able to use the decision support to ultimately make better choices. However, many studies have established that reality does not pan out this way: not only does AI decision support often fail to improve human+AI decision quality, but sometimes it makes it worse. Two factors affect whether an explanation is effective. First, its content must be appropriate for the use case; indeed, identifying what explanations are needed for what use cases is the grand challenge in interpretable machine learning. Second, the delivery, including factors such as timing and engagement mechanisms, also affects how people will use an AI recommendation and explanation. In this talk, I'll focus mostly on the first element, sharing ongoing work on optimizing explanations for specific properties and a sim2real approach for identifying promising explanations in silico before moving on to expensive user studies. Regarding delivery, I'll also touch on recent work in which we use machine learning to personalize delivery strategies to the needs of different users. Through this presentation, I hope not only to present many interesting machine learning problems related to effective human+AI interaction, but also to describe a path towards improving generalization in interpretable machine learning.

      BIO

    Finale Doshi-Velez is a CS Professor at Harvard. She did her PhD at MIT, and her interests lie at the intersection of machine learning, healthcare, and interpretability.
  • Yoon Kim (MIT)
          Efficient Sequence Modeling with Linear Transformers

           Thursday, September 19, 2024. 2:00 pm   32-G449 (Patil/Kiva)

      BIO

    Yoon Kim is an assistant professor at MIT. He works on large-scale language models and the symbolic control of neural networks.

Related Seminars (Fall 2024)


  • Irene Chen (UC Berkeley)
          Leveraging Large Datasets and Large Language Models to Improve Health Equity

           Thursday, October 17, 2024. 2:00 pm   32-G449 (Patil/Kiva)

     Abstract:

    The proliferation of medical data and the advancements of large language models (LLMs) promise to revolutionize healthcare; however, ensuring and increasing health equity remains a significant challenge. In this talk, I will present recent work on two critical aspects of this evolving landscape. First, I will examine the unexpected consequences of multi-source data scaling. Counter to intuition, adding training data can sometimes reduce overall accuracy, produce uncertain fairness outcomes, and diminish worst-subgroup performance. These findings underscore the complexity of working with disparate data sources in healthcare AI. Next, I will showcase innovative applications of LLMs in women's health. Through participatory design with healthcare workers and patients, we've developed guiding principles for LLM use in maternal health. Additionally, we demonstrate how LLMs can generate rationales for contraceptive medication switches using clinical notes. The talk concludes by emphasizing vigilance and ethical considerations as we advance towards more data-driven and AI-assisted healthcare.

      BIO

    Irene is an Assistant Professor at UC Berkeley and UCSF in Computational Precision Health (CPH), Electrical Engineering and Computer Science (EECS), and Berkeley AI Research (BAIR). She is interested in making machine learning systems for healthcare more robust, equitable, and impactful.

ML Seminars (Fall 2018)

ML Seminars typically happen on Wednesdays; exceptions are noted in the listings below.
  • Moritz Hardt (UC Berkeley), Sep 12, 32-D463, 3:30PM-4:30PM. When Recurrent Models Don't Need To Be Recurrent
  • Percy Liang (Stanford), Sep 19, 32-D463, 3:30PM-4:30PM. Adversaries, Extrapolation, and Language
  • Francis Bach (INRIA), Oct 17, 6-120, 4:30-5:30pm. Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
  • Emma Brunskill (Stanford), Oct 31, 34-101, 4:30-5:30pm. Towards Better Reinforcement Learning for High Stakes Domains
  • Sasha Rakhlin (MIT), Nov 14, 32-155, 4:30-5:30pm. Is Learning Compatible with (Over)fitting to the Training Data?

ML Seminars (Spring 2018)


  • Sanjoy Dasgupta (UCSD, San Diego, CA)
          Using interaction for simpler and better learning

           18 April, 2018. 3:00pm   32-G882

     Abstract:

    In the usual setup of supervised learning, the learner is given a stack of labeled examples and told to fit a classifier to them. It would be quite unnatural for a human to learn in this way, and indeed this model is known to suffer from a variety of fundamental hardness barriers. However, many of these hurdles can be overcome by moving to a setup in which the learner interacts with a human (or other information source) during the learning process.
    We will see how interaction makes it possible to:
    1. Learn DNF (disjunctive normal form) concepts.
    2. Perform machine teaching in situations where the student’s concept class is unknown.
    3. Improve the results of unsupervised learning.
    We will present a generic approach to “interactive structure learning” that, for instance, yields simple interactive algorithms for topic modeling and hierarchical clustering. Along the way, we will present a novel cost function for hierarchical clustering, as well as an efficient algorithm for approximately minimizing this cost.
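
    To convey the flavor of such an objective, here is the cost function from Dasgupta's earlier work on hierarchical clustering (STOC 2016); the talk may present a variant, so treat this as background rather than the talk's exact definition. Given pairwise similarities $w_{ij}$ and a candidate tree $T$,
    \[
      \mathrm{cost}(T) \;=\; \sum_{i<j} w_{ij}\,\bigl|\mathrm{leaves}(T[i \vee j])\bigr|,
    \]
    where $T[i \vee j]$ is the subtree rooted at the least common ancestor of leaves $i$ and $j$. Similar pairs should be merged low in the tree, where their least common ancestor spans few leaves; a good hierarchy minimizes this cost.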

      BIO

    Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He works on algorithms for machine learning, with a focus on unsupervised and interactive learning.
  • Dan Roy (Univ Toronto)
          Nonvacuous Generalization Bounds for Deep Neural Networks via PAC-Bayes

           25 April, 2018. 4:30pm   32-141

     Abstract:

    A serious impediment to a rigorous understanding of the generalization performance of algorithms like SGD for neural networks is that most generalization bounds are numerically vacuous when applied to modern networks on real data sets. In recent work (Dziugaite and Roy, UAI 2017), we argue that it is time to revisit the problem of computing nonvacuous bounds, and show how the empirical phenomenon of "flat minima" can be operationalized using PAC-Bayesian bounds, yielding the first nonvacuous bounds for a large (stochastic) neural network on MNIST. The bound is obtained by first running SGD and then optimizing the distribution of a random perturbation of the weights so as to capture the flatness and minimize the PAC-Bayes bound. I will describe this work, its antecedents, its goals, and subsequent work, focusing on where others have and have not made progress towards understanding generalization according to our strict criteria.

    Joint work with Gintare Karolina Dziugaite, based on https://arxiv.org/abs/1703.11008, https://arxiv.org/abs/1712.09376, and https://arxiv.org/abs/1802.09583.
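
    As background, here is a sketch of the kind of bound being optimized; this is the standard Langford–Seeger form of the PAC-Bayes theorem as commonly stated, not a formula taken from the talk. With probability at least $1-\delta$ over an i.i.d. sample $S$ of size $m$, simultaneously for all distributions $Q$ over classifiers,
    \[
      \mathrm{kl}\bigl(\hat{e}_S(Q) \,\|\, e_D(Q)\bigr) \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{m},
    \]
    where $P$ is a prior fixed before seeing $S$, $\hat{e}_S(Q)$ and $e_D(Q)$ are the empirical and true error rates of the stochastic classifier $Q$, and $\mathrm{kl}$ is the KL divergence between Bernoulli distributions. A flat minimum admits a broad Gaussian $Q$ centered at the SGD solution that keeps $\mathrm{KL}(Q\,\|\,P)$ small while barely increasing $\hat{e}_S(Q)$, which is what can make the bound nonvacuous.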

      BIO

    Dan Roy is an Assistant Professor in the Department of Statistical Sciences and, by courtesy, Computer Science at the University of Toronto, and a founding faculty member of the Vector Institute for Artificial Intelligence. He is a recent recipient of an Ontario Early Researcher Award and a Google Faculty Research Award. Before joining U of T, he held a Newton International Fellowship from the Royal Academy of Engineering and a Research Fellowship at Emmanuel College, University of Cambridge. He earned his S.B., M.Eng., and Ph.D. from the Massachusetts Institute of Technology; his dissertation on probabilistic programming won an MIT EECS Sprowls Dissertation Award. His group works on the foundations of machine learning and statistics.
  • Arthur Gretton (UCL, London, UK)
          TBD

           02 May, 2018. 4:00pm   32-G882
  • Shai Shalev-Shwartz (HUJI, Jerusalem, Israel)
          TBD

           07 May, 2018. 3:30pm   Monday   32-G449
  • Joëlle Pineau (McGill; FAIR Montreal)
          TBD

           09 May, 2018. 4:30pm   32-141

ML Seminars (Fall 2017)

  • Yisong Yue. Caltech
          The Dueling Bandits Problem

           8th Sep, 2017. 2pm-3pm   Friday   32-G882

     Abstract:

    In this talk, I will present the Dueling Bandits Problem, an online learning framework tailored to real-time learning from subjective human feedback. In particular, the Dueling Bandits Problem requires only pairwise comparisons, which have been shown to be reliably inferred in a variety of subjective feedback settings such as information retrieval and recommender systems. I will provide an overview of the Dueling Bandits Problem along with basic algorithmic results, and conclude by discussing some ongoing research directions with applications to personalized medicine.
    This is joint work with Josef Broder, Bobby Kleinberg, Thorsten Joachims, Yanan Sui, Vincent Zhuang, and Joel Burdick.
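
    To make the interaction model concrete, here is a minimal sketch of the protocol in Python; the preference matrix and the naive round-robin learner are hypothetical illustrations, not algorithms from the talk.

        import random

        # Hypothetical preference matrix: P[i][j] is the probability that
        # arm i wins a duel against arm j (so P[i][j] + P[j][i] = 1).
        P = [[0.5, 0.7, 0.8],
             [0.3, 0.5, 0.6],
             [0.2, 0.4, 0.5]]

        def duel(i, j):
            """One round of feedback: only the winner of the comparison is observed."""
            return i if random.random() < P[i][j] else j

        # Naive explore-then-exploit learner, for illustration only:
        # duel random pairs, then commit to the empirical winner.
        wins = [0] * len(P)
        for _ in range(200):
            i, j = random.sample(range(len(P)), 2)  # learner chooses a pair of arms
            wins[duel(i, j)] += 1                   # observes a single pairwise outcome

        best = max(range(len(P)), key=wins.__getitem__)
        print("empirical best arm:", best)          # arm 0 beats every other arm here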

      BIO

    Yisong Yue is an assistant professor in the Computing and Mathematical Sciences Department at the California Institute of Technology. He was previously a research scientist at Disney Research. Before that, he was a postdoctoral researcher in the Machine Learning Department and the iLab at Carnegie Mellon University. He received a Ph.D. from Cornell University and a B.S. from the University of Illinois at Urbana-Champaign. Yisong's research interests lie primarily in the theory and application of statistical machine learning. He is particularly interested in developing novel methods for spatiotemporal reasoning, structured prediction, interactive learning systems, and learning with humans in the loop. In the past, his research has been applied to information retrieval, recommender systems, text classification, learning from rich user interfaces, analyzing implicit human feedback, data-driven animation, behavior analysis, sports analytics, policy learning in robotics, and adaptive routing & allocation problems.
  • Alex Smola. Amazon
          Sequence Modeling: From Spectral Methods and Bayesian Nonparametrics to Deep Learning

          11th Sep, 2017. 3pm-4pm Monday 32-G463

     Abstract:

    In this talk I will summarize a few recent developments in the design and analysis of sequence models. Starting with simple parametric models for sequences such as HMMs, we look at nonparametric extensions in terms of their ability to model more fine-grained types of state and transition behavior. In particular we consider spectral embeddings and nonparametric Bayesian models such as the nested Chinese Restaurant Franchise and the Dirichlet-Hawkes Process. We conclude with a discussion of deep sequence models for user return time modeling, time-dependent collaborative filtering, and large-vocabulary user profiling.
    About the speaker: AWS Spotlight on Alex Smola
  • Noam Brown. CMU
          Libratus: Beating Top Humans in No-Limit Poker

          18th Sep, 2017. 3pm-4pm   32-G449

     Abstract:

    Poker has been a challenge problem in AI and game theory for decades. As a game of imperfect information, poker involves obstacles not present in games like chess or Go. No program had been able to beat top professionals in large poker games until now: in January 2017, our AI Libratus decisively defeated a team of top professional players in heads-up no-limit Texas Hold'em. Libratus features a number of innovations which form a new approach to AI for imperfect-information games. The algorithms are domain-independent and can be applied to a variety of strategic interactions involving hidden information.

    This talk is based on joint work with Tuomas Sandholm.

      BIO

    Noam Brown is a PhD student in computer science at Carnegie Mellon University advised by Professor Tuomas Sandholm. His research combines reinforcement learning and game theory to develop AIs capable of strategic reasoning in imperfect-information interactions. He has applied this research to creating Libratus, the first AI to defeat top humans in no-limit Texas Hold'em. His current research is focused on expanding the applicability of the technology behind Libratus to other domains.
  • Alekh Agarwal. MSR NYC
          Sample-Efficient Reinforcement Learning with Rich Observations

          20th Sep, 2017. 4pm-5pm   32-G882

     Abstract:

    This talk considers a core question in reinforcement learning (RL): How can we tractably solve sequential decision making problems where the learning agent receives rich observations? We begin with a new model called Contextual Decision Processes (CDPs) for studying such problems, and show that it encompasses several prior setups for studying RL, such as MDPs and POMDPs. Several special cases of CDPs are, however, known to be provably intractable in their sample complexities. To overcome this challenge, we further propose a structural property of such processes, called the Bellman Rank. We find that the Bellman Rank of a CDP (and an associated class of functions) provides an intuitive measure of the hardness of a problem in terms of sample complexity and is small in several practical settings. In particular, we propose an algorithm whose sample complexity scales with the Bellman Rank of the process and is completely independent of the size of the observation space of the agent. We also show that our techniques are robust to our modeling assumptions, and make connections to several known results as well as highlight novel consequences of our results.

    This talk is based on joint work with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire.
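
    For reference, here is a sketch of the Bellman Rank definition as it appears in the associated paper (Jiang et al., 2017); the notation is ours and may differ from the talk's. For value functions $f, f'$ in a class $\mathcal{F}$, define the average Bellman error of $f'$ at step $h$ under the roll-in policy of $f$:
    \[
      \mathcal{E}_h(f, f') \;=\; \mathbb{E}\Bigl[\, f'(x_h, a_h) - r_h - f'\bigl(x_{h+1}, \pi_{f'}(x_{h+1})\bigr) \;\Big|\; a_{1:h-1} \sim \pi_f,\; a_h \sim \pi_{f'} \Bigr].
    \]
    The CDP has Bellman Rank at most $M$ if, for every $h$, the matrix $\bigl[\mathcal{E}_h(f, f')\bigr]_{f, f' \in \mathcal{F}}$ has rank at most $M$. The algorithm's sample complexity then scales polynomially with $M$, the horizon, and $\log|\mathcal{F}|$, with no dependence on the size of the observation space.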

      BIO

    Alekh Agarwal is a researcher in the New York lab of Microsoft Research, prior to which he obtained his PhD from UC Berkeley. Alekh’s research currently focuses on topics in interactive machine learning, including contextual bandits, reinforcement learning and online learning. Previously, he has worked on several topics in optimization including stochastic and distributed optimization. He has won several awards for his research including the NIPS 2015 best paper award.
  • Alex Smola. Amazon
          Tutorial on Deep Learning with Apache MXNet Gluon

          11th Oct, 2017. 2pm-5pm 54-100

     Abstract:

    Deep Learning short-course.
    This tutorial introduces Gluon, a flexible new interface that pairs MXNet’s speed with a user-friendly frontend. Symbolic frameworks like Theano and TensorFlow offer speed and memory efficiency but are harder to program. Imperative frameworks like Chainer and PyTorch are easy to debug but can seldom compete with symbolic code when it comes to speed. Gluon reconciles the two, removing a crucial pain point, by combining just-in-time compilation with an efficient runtime engine.
    In this crash course, we’ll cover deep learning basics, the fundamentals of Gluon, advanced models, and multiple-GPU deployments. We will walk you through MXNet’s NDArray data structure and automatic differentiation tools. We’ll show you how to define neural networks both at the atomic level and through Gluon’s predefined layers. We’ll demonstrate how to serialize models and build dynamic graphs. Finally, we will show you how to hybridize your networks, simultaneously enjoying the benefits of imperative and symbolic deep learning.
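
    As a taste of the workflow the course covers, here is a minimal sketch against the MXNet Gluon API (an illustration only, not code from the tutorial; assumes an MXNet 1.x installation):

        import mxnet as mx
        from mxnet import autograd, gluon, nd

        # Build a network from Gluon's predefined layers, then hybridize it
        # so the imperative program is JIT-compiled into a symbolic graph.
        net = gluon.nn.HybridSequential()
        with net.name_scope():
            net.add(gluon.nn.Dense(128, activation='relu'))
            net.add(gluon.nn.Dense(10))
        net.initialize(mx.init.Xavier())
        net.hybridize()

        loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
        trainer = gluon.Trainer(net.collect_params(), 'sgd',
                                {'learning_rate': 0.1})

        # One training step on dummy data; autograd.record() tapes the
        # forward pass so loss.backward() can compute gradients.
        x = nd.random.normal(shape=(32, 784))
        y = nd.array([i % 10 for i in range(32)])
        with autograd.record():
            loss = loss_fn(net(x), y)
        loss.backward()
        trainer.step(batch_size=32)
        print(loss.mean().asscalar())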
    About the speaker: AWS Spotlight on Alex Smola
  • Ohad Shamir. Weizmann Institute of Science, Israel
          Failures of Gradient-Based Deep Learning

          18th Oct, 2017. 4pm-5pm 32-G882

     Abstract:

    In recent years, deep learning has become the go-to solution for a broad range of applications, with a long list of success stories. However, it is important, for both theoreticians and practitioners, to also understand the associated difficulties and limitations. In this talk, I'll describe several simple problems for which commonly-used deep learning approaches either fail or suffer from significant difficulties, even if one is willing to make strong distributional assumptions. We illustrate these difficulties empirically, and provide theoretical insights explaining their source and (sometimes) how they can be remedied.

    Includes joint work with Shai Shalev-Shwartz and Shaked Shammah.

      BIO

    Ohad Shamir is a faculty member in the Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science, Israel. He received a PhD in computer science from the Hebrew University in 2010, advised by Prof. Naftali Tishby. Between 2010-2013 he was a postdoctoral and associate researcher at Microsoft Research. His research focuses on machine learning, with emphasis on algorithms which combine practical efficiency and theoretical insight. He is also interested in the many intersections of machine learning with related fields, such as optimization, statistics, theoretical computer science and AI.
  • Michael Bronstein. USI Lugano (Switzerland)
          Geometric Deep Learning: Going Beyond Euclidean Data

          25 Oct, 2017. 4pm-5pm 32-G882

     Abstract:

    In the past decade, deep learning methods have achieved unprecedented performance on a broad range of problems in fields ranging from computer vision to speech recognition. So far, research has mainly focused on developing deep learning methods for Euclidean-structured data. However, many important applications have to deal with non-Euclidean structured data, such as graphs and manifolds. Such geometric data are becoming increasingly important in computer graphics and 3D vision, sensor networks, drug design, biomedicine, recommendation systems, and web applications. The adoption of deep learning in these fields has lagged behind until recently, primarily because the non-Euclidean nature of the objects involved makes the very definition of basic deep-network operations rather elusive. In this talk, I will introduce the emerging field of geometric deep learning on graphs and manifolds, and overview existing solutions and applications as well as key difficulties and future research directions.

    (based on M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, P. Vandergheynst, "Geometric deep learning: going beyond Euclidean data", IEEE Signal Processing Magazine 34(4):18-42, 2017)
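
    To illustrate why basic operations become elusive off the Euclidean grid, here is the standard spectral construction of graph convolution (a summary of a textbook identity, not necessarily the formulation used in the talk). With graph Laplacian $L = \Phi \Lambda \Phi^\top$,
    \[
      x \star g \;=\; \Phi\, \hat{g}(\Lambda)\, \Phi^\top x,
    \]
    where $\Phi$ are the Laplacian eigenvectors, $\Lambda$ its eigenvalues, and $\hat{g}$ a learned spectral filter. Since a general graph has no translation structure with which to define convolution directly, this construction mimics the Euclidean convolution theorem: convolution equals multiplication in the Fourier basis.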

      BIO

    Michael Bronstein (PhD with distinction 2007, Technion, Israel) is a professor at USI Lugano, Switzerland and Tel Aviv University, Israel. He also serves as a Principal Engineer at Intel Perceptual Computing. During 2017-2018 he is a fellow at the Radcliffe Institute for Advanced Study at Harvard University. Michael's main research interest is in theoretical and computational methods for geometric data analysis. He has authored over 150 papers, the book Numerical geometry of non-rigid shapes (Springer 2008), and over 20 granted patents. He was awarded three ERC grants, a Google Faculty Research award (2016), and a Rudolf Diesel fellowship (2017) at TU Munich. He was invited as a Young Scientist to the World Economic Forum, an honor bestowed on forty of the world's leading scientists under the age of forty. Michael is a Senior Member of the IEEE, an alumnus of the Technion Excellence Program and the Academy of Achievement, an ACM Distinguished Speaker, and a member of the Young Academy of Europe. In addition to academic work, Michael is actively involved in commercial technology development and consulting for start-up companies. He was a co-founder and technology executive at Novafora (2005-2009), developing large-scale video analysis methods, and one of the chief technologists at Invision (2009-2012), developing low-cost 3D sensors. Following the multi-million-dollar acquisition of Invision by Intel in 2012, Michael has been one of the key developers of the Intel RealSense technology.
  • Katherine Heller. Duke University
          Machine Learning for Healthcare Data

          29th Nov, 2017. 4pm-5pm   32-G882

     Abstract:

    We will discuss multiple ways in which healthcare data is acquired and machine learning methods are currently being introduced into clinical settings. This will include:
    1. Modeling disease trends and other predictions, including joint predictions of multiple conditions, from electronic health record (EHR) data using Gaussian processes.
    2. Predicting surgical complications, and transfer learning methods for combining databases.
    3. Using mobile apps and integrated sensors to improve the granularity of recorded health data for chronic conditions.
    4. Combining mobile app and social network information in order to predict the spread of contagious disease.
    Current work in these areas will be presented, and the future of machine learning contributions to the field will be discussed.

      BIO

    Katherine Heller is an Assistant Professor in Statistical Science at Duke University. She is the recent recipient of a Google faculty research award, a first round BRAIN initiative award from the NSF, as well as a CAREER award. She received her PhD from the Gatsby Computational Neuroscience Unit at UCL, and was a postdoc at the University of Cambridge on an EPSRC postdoc fellowship, and at MIT on an NSF postdoc fellowship.

ML Seminars (Spring 2017)

  • Amir Globerson. Tel Aviv University
          Efficient Optimization of a Convolutional Network with Gaussian Inputs
          1st March, 2017; 5pm-6pm   32-G643
  • Mehryar Mohri. Courant Institute, NYU
          Online Learning for Time Series Prediction
          8th March, 2017; 4pm-5pm   32-G463
  • Lester Mackey. Microsoft Research
          Measuring Sample Quality with Kernels
          15 March, 2017; 4pm-5pm   32-G463
  • Ben Recht. UC Berkeley
          Optimization Challenges in Deep Learning
          22 March, 2017; 4pm-5pm   32-G463
  • Ruslan Salakhutdinov. Carnegie Mellon University, Pittsburgh, PA
          Learning Deep Unsupervised and Multimodal Models
          05th Apr, 2017; 4pm-5pm   34-101
  • Jeff Miller. Harvard University, Cambridge
          Robust Bayesian inference via coarsening
          26th Apr, 2017; 3pm-4pm   32-G575
  • Ryan Adams. Harvard University and Google Brain
          Building Probabilistic Structure into Massively Parameterized Models
          10th May, 2017; 4pm-5pm   32-141

ML Seminars (Fall 2016)

  • Honglak Lee. University of Michigan, Ann Arbor
          Deep architectures for visual reasoning, multimodal learning, and decision-making
          16th Nov, 2016; 4pm-5pm   32-G463
  • Elad Hazan. Princeton University
          A Non-generative Framework and Convex Relaxations for Unsupervised Learning
          26th Oct, 2016; 4pm-5pm   32-G463
  • Tina Eliassi-Rad (Northeastern)
          The Reasonable Effectiveness of Roles in Complex Networks
          19th Oct, 2016;   32-G575
  • Carlo Morselli. School of Criminology, University of Montreal
          Criminal Networks
          29th Sep, 2016; 4pm-5pm   4-237
  • Gah-Yi Vahn (LBS)
          The data-driven (s, S) policy: why you can have confidence in censored demand data
          5th Oct, 2016; 4pm-5pm   32-G575
  • Le Song (Georgia Tech).
          Discriminative Embedding of Latent Variable Models for Structured Data
          16th Sep, 2016; 2pm-3pm   32-G882
  • Ashish Kapoor (MSR Redmond).
          Safe Decision Making Under Uncertainty
          14th Sep, 2016; 4pm-5pm   32-D507
  • Alan Malek (UC Berkeley).
          Minimax strategies for online linear regression, square-loss prediction, and time series prediction
          15th Aug, 2016; 11am   32-D677
  • Sashank Reddi (CMU).
          Faster Stochastic Methods for Nonconvex Optimization in Machine Learning
          13th July, 2016; 3pm   32-G882
  • Andre Wibisono (UC Berkeley).
          A variational perspective on accelerated methods in optimization
          14th July, 2016; 3pm   32-G882