Statistical Learning (IEMS-402, Northwestern)

Graduate course, Northwestern IEMS, 2025

Required graduate course on mathematical foundations of data science. Syllabus:[link]

Objectives:

This course provides foundational and advanced concepts in statistical learning theory, essential for analyzing complex data and making informed predictions. Students will delve into both asymptotic and non-asymptotic analyses of machine learning algorithms, addressing critical challenges such as model bias, variance, and robustness in uncertain environments. Toward the end of the course, students will apply these principles to modern machine learning contexts, including scaling laws and benign overfitting in deep learning, generative AI, and language models (e.g., the Neural Tangent Kernel, the mean-field limit of neural networks, and in-context learning).

ChatGPT Tutor: The [link] here provides a Large Language Model agent specifically trained for this course.

If you have a question, the quickest way to get a response from the teaching staff is to post it to the class Piazza forum. For suggestions to improve Yiping’s teaching, you can use the (anonymous) form; you can also raise private matters here. If you wish to contact me via email, please include the tag “[IEMS402]” in the subject line so that I do not overlook your message.

Syllabus

Text Book

Preliminaries

Here is a review of all the preliminaries we’ll use in this class:

  • Probability and Optimization Review: [link]
    • The different notions of convergence of random variables are useful but not required in this class; our proofs will mostly be intuitive (see the sketch below).
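For readers who want the formal statements (not required for this class), here is a minimal reference sketch of the three standard notions and how they relate: almost-sure convergence implies convergence in probability, which implies convergence in distribution. The notation (random variables X_n, X with CDFs F_{X_n}, F_X) is generic and not taken from the course notes.

```latex
% Convergence in probability:
X_n \xrightarrow{p} X \iff \forall \varepsilon > 0:\ \lim_{n\to\infty} \Pr\big(|X_n - X| > \varepsilon\big) = 0
% Almost-sure convergence (stronger; implies convergence in probability):
X_n \xrightarrow{a.s.} X \iff \Pr\big(\lim_{n\to\infty} X_n = X\big) = 1
% Convergence in distribution (weaker; implied by convergence in probability):
X_n \xrightarrow{d} X \iff \lim_{n\to\infty} F_{X_n}(t) = F_X(t)\ \text{for every continuity point } t \text{ of } F_X
```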

Schedule

[Homework 1 DUE]
  • Lecture 3: 1.13
  • Lecture 4: 1.15 (Concepts)
    • Diagram of Learning: Supervised, Unsupervised, Semi-supervised, Self-supervised, Generative AI
    • spectral clustering and t-SNE
    • Infomax and self-supervised learning
    • Relationship between spectral clustering, t-SNE and self-supervised learning
    • Suggested Reading:
      • Zhou X, Belkin M. Semi-supervised learning by higher order regularization. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 2011: 892-900.
      • Hjelm R D, Fedorov A, Lavoie-Marchildon S, et al. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670, 2018.
      • Linderman G C, Steinerberger S. Clustering with t-SNE, provably. SIAM Journal on Mathematics of Data Science, 2019, 1(2): 313-332.
      • HaoChen J Z, Wei C, Gaidon A, et al. Provable guarantees for self-supervised deep learning with spectral contrastive loss. Advances in Neural Information Processing Systems, 2021, 34: 5000-5011.
    • Advanced Reading:
      • X. Cheng and N. Wu. “Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation”. Applied and Computational Harmonic Analysis, 61, 132-190 (2022)
      • Cai T T, Ma R. Theoretical foundations of t-SNE for visualizing high-dimensional clustered data. Journal of Machine Learning Research, 2022, 23(301): 1-54.
[Homework 2 DUE]
  • Lecture 5: 1.20 [No Class] (Martin Luther King Jr. Day)
  • Lecture 6: 1.22
    • Asymptotic normality
    • Inverse function theorem, (Implicit) Delta Method
    • Moment Methods
[Homework 3 DUE][Homework 4 DUE][Homework 5 DUE]
  • Lecture 11: 2.10 [Advanced Topic]
    • Generalization Theory of Neural Networks [Textbook]
    • Other Ideas of Generalization:
      • PAC-Bayes, Algorithm Stability, …
    • Suggested Reading:
  • Lecture 12: [Midterm] 2.12
  • Lecture 13: 2.17
    • Ledoux-Talagrand Contraction Principle
    • Dudley’s theorem
  • Lecture 14: 2.19 [Advanced Topic]
    • Localized Complexity
    • Non-parametric Least-square
    • Suggested Reading:
[Homework 6 DUE]
  • Lecture 15: 2.24
  • Lecture 16: 2.26
    • Distribution Shift (Stanford CS329D Machine Learning Under Distribution Shift)
    • Distributionally Robust Optimization
    • Suggested Reading:
      • Toward an inductive modeling language for distribution shifts
      • Geirhos R, Jacobsen J H, Michaelis C, et al. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2020, 2(11): 665-673.
      • Sagawa S, Koh P W, Hashimoto T B, et al. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization.
      • Duchi J, Namkoong H. Variance-based regularization with convex objectives. Journal of Machine Learning Research, 2019, 20(68): 1-55. (NeurIPS 2017 Best Paper)
      • Duchi J C, Namkoong H. Learning models with uniform performance via distributionally robust optimization. The Annals of Statistics, 2021, 49(3): 1378-1406.
      • Hu W, Niu G, Sato I, et al. Does distributionally robust supervised learning give robust classifiers? International Conference on Machine Learning. PMLR, 2018: 2029-2037.
[Homework 7 DUE][Homework 8 DUE]
  • Lecture 17: 3.10
  • Lecture 18: 3.12 [Advanced Topic]
    • Implicit Bias ([LTFP] Section 12.1)
    • Large Language Model
      • In-context Learning, Chain-of-thoughts and Circuit Theory, Alignment of AI
    • Advanced Reading:
      • Kim J, Nakamaki T, Suzuki T. Transformers are minimax optimal nonparametric in-context learner. arXiv preprint arXiv:2408.12186, 2024.
      • Von Oswald J, Niklasson E, Randazzo E, et al. Transformers learn in-context by gradient descent. International Conference on Machine Learning. PMLR, 2023: 35151-35174.
      • Shen L, Mishra A, Khashabi D. Do pretrained Transformers Really Learn In-context by Gradient Descent?. arXiv preprint arXiv:2310.08540, 2023.
      • Giannou A, Yang L, Wang T, et al. How Well Can Transformers Emulate In-context Newton’s Method?. arXiv preprint arXiv:2403.03183, 2024.
      • Feng G, Zhang B, Gu Y, et al. Towards revealing the mystery behind chain of thought: a theoretical perspective. Advances in Neural Information Processing Systems, 2024, 36.
      • Lee J, Xie A, Pacchiano A, et al. Supervised pretraining can learn in-context reinforcement learning. Advances in Neural Information Processing Systems, 2024, 36.

Homework

Your homework grade will be computed as max(HW1, HW8) + max(HW2, HW3) + max(HW4, HW5) + max(HW6, HW7).
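In code, the rule above reads as follows. This is a minimal sketch: only the pairing (HW1 vs. HW8, HW2 vs. HW3, HW4 vs. HW5, HW6 vs. HW7) comes from the syllabus; the score dictionary, the per-homework scale, and the example numbers are made up for illustration.

```python
# Sketch of the homework grading rule: sum of the better score within each pair.
def homework_grade(scores: dict[str, float]) -> float:
    pairs = [("HW1", "HW8"), ("HW2", "HW3"), ("HW4", "HW5"), ("HW6", "HW7")]
    return sum(max(scores[a], scores[b]) for a, b in pairs)

# Example with hypothetical scores on a 0-100 scale:
example = {"HW1": 80, "HW2": 95, "HW3": 70, "HW4": 88,
           "HW5": 92, "HW6": 75, "HW7": 85, "HW8": 90}
print(homework_grade(example))  # 90 + 95 + 92 + 85 = 362
```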

Exams

  • [Practice Mid-Term Exam]
    • Modern Machine Learning Concepts, Bias and Variance Trade-off
    • Kernel Smoothing, Asymptotic Theory, Influence Function, Concentration Inequality, Uniform Bound
  • [Practice Final Exam]
    • Rademacher complexity, Covering Number, Dudley’s theorem
    • RKHS, Optimal Transport, Robust Learning

Other Reading

General Policies

In general, we do not grant extensions on assignments/exams. There are several exceptions:

  • Medical Emergencies: If you are sick and unable to complete an assignment or attend class, please go to University Health Services. For minor illnesses, we expect grace days or our late penalties to provide sufficient accommodation. For medical emergencies (e.g. prolonged hospitalization), students may request an extension afterward by contacting their Student Liaison or Academic Advisor and having them reach out to the instructor on their behalf. Please plan ahead if possible.
  • Family/Personal Emergencies: If you have a family emergency (e.g. death in the family) or a personal emergency (e.g. mental health crisis), please contact your academic adviser or Counseling and Psychological Services (CaPS). In addition to offering support, they will reach out to the instructors for all your courses on your behalf to request an extension.
  • University-Approved Absences: If you are attending an out-of-town university-approved event (e.g. multi-day athletic/academic trip organized by the university), you may request an extension for the duration of the trip. You must provide confirmation of your attendance, usually from a faculty or staff organizer of the event.

Accommodations for Students with Disabilities: If you have a disability and have an accommodation letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with the Moses Center for Student Accessibility as early as possible in the semester for assistance. (Telephone: 212-998-4980, Website: http://www.nyu.edu/csd) We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them. Please note that it is your responsibility to schedule exams at the Moses Center, and to ensure that you are receiving all accommodations you are approved for.

Collaboration among Students: The purpose of student collaboration is to facilitate learning, not to circumvent it. Studying the material in groups is strongly encouraged. You may also seek help from other students in understanding the material needed to solve a particular homework problem, provided no written notes (including code) are shared or taken at that time, and provided learning is facilitated, not circumvented. The actual solution must be written by each student alone.

The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved. Specifically, each assignment solution must include answers to the following questions:

  • Did you receive any help whatsoever from anyone in solving this assignment? Yes / No. If you answered ‘yes’, give full details: ____ (e.g. “Jane Doe explained to me what is asked in Question 3.4”)
  • Did you give any help whatsoever to anyone in solving this assignment? Yes / No. If you answered ‘yes’, give full details: _____ (e.g. “I pointed Joe Smith to section 2.3 since he didn’t know how to proceed with Question 2”)