Statistical Learning for Data Analysis (IEMS-304, Northwestern)
Undergraduate course, Northwestern IEMS, 2025
Required undergraduate course on predictive modeling in data science. Syllabus:[pdf]
Objectives:
- Understand common data structures in modern predictive and explanatory modeling in business, engineering, and science, and learn how to formulate appropriate solutions.
- Learn the basics of R and how to use it for various regression and classification problems.
- Develop the ability to fit appropriate linear and logistic models, including model selection and model diagnostics.
- Develop the ability to interpret fitted linear and logistic regression models for both explanatory and predictive purposes.
- Learn concepts in regression and classification with nonlinearity, including maximum likelihood estimation, cross-validation, ridge, and lasso.
- Learn how to fit and interpret popular supervised learning models including trees, smoothers, nearest neighbors, random forests, and boosted trees.
Logistics
Time and Location: Monday, Wednesday, and Friday, 9:00 A.M. - 9:50 A.M., Tech L251
Office Hour: Monday, 10 A.M., Tech M237
TA Office Hour:
ChatGPT Tutor: The link here provides a large language model agent that is specifically trained for this course.
Preliminary: [Note]
Schedule:
Textbook:
- ISL: James, Gareth, et al. An Introduction to Statistical Learning.
- 229: [Stanford CS229 Lecture Note]
Lecture 1: Introduction to Statistical Learning
(4.1,4.2,4.4) [Feedback Form]
Slide:[pdf] [annotated slide], Reference: ISL Section 2
- Logistics
- What is and why statistical learning?
- Supervised Learning, Unsupervised Learning, Reinforcement Learning
Reading
Lab 1: Statistical Review (4.4)
Reference: [Note]
Homework 1: (Due:4.11)
Homework: [pdf], [latex], Data: [miles.csv]
Lecture 2: Simple Linear Regression
(4.7,4.9,4.11)[Feedback Form]
Slide:[pdf] [annotated slide], [additional note on math]
Reference: ISL Section 3.1, 4.3
- Simple Linear Regression and the closed-form least squares solution
- Unbiasedness and Variance, Confidence Interval
- Log-likelihood, logistic regression
- Gradient descent and Newton's method
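To make the first and last topics above concrete, here is a minimal sketch (not taken from the course slides) that fits a simple linear regression twice: once with the closed-form least squares formulas, and once with gradient descent on the mean squared error. The function names, the toy data, the step size, and the iteration count are all illustrative assumptions.

```python
def fit_closed_form(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def fit_gradient_descent(x, y, step=0.01, iters=20000):
    """Minimize the mean squared error by plain gradient descent.

    The step size and iteration count are hand-picked for the toy data
    below (an assumption); other data may need different values.
    """
    n = len(x)
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        resid = [b0 + b1 * xi - yi for xi, yi in zip(x, y)]
        grad_b0 = 2.0 * sum(resid) / n
        grad_b1 = 2.0 * sum(r * xi for r, xi in zip(resid, x)) / n
        b0 -= step * grad_b0
        b1 -= step * grad_b1
    return b0, b1


# Hypothetical toy data, roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_closed_form(x, y)
gd_intercept, gd_slope = fit_gradient_descent(x, y)
print(intercept, slope)
print(gd_intercept, gd_slope)
```

On this toy data the two approaches should agree to several decimal places, which is a quick sanity check that the gradient is coded correctly.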
Lab 2: Linear Algebra Review (4.11)
Slide: [pdf]
Homework 2: (Due:4.18)
Lecture 3: Multiple Linear Regression
(4.14,4.16,4.18,4.23,4.25) [Feedback Form]
[application examples] [annotated slide]
Slide:[pdf] [annotated slide], Reference: ISL Section 3.2,3.3
- Multiple Linear Regression, Degrees of Freedom
- t-test, confidence interval
- Categorical Predictor and interaction
- Leverage and Influence
- Residual Diagnostics
Reading:
- [Linear Regression]
Lab 3: Linear Regression
(4.18)
Midterm 1 (4.21)
- Taxonomy of Learning, Bias and Variance Trade-off
- Optimization
- Linear Regression and Statistical Inference
Cheatsheet:
Lecture 4: Model and Variable Selection, Shrinkage, and Multicollinearity
(4.28,4.30,5.2,5.5)[Feedback Form]
Slide:[pdf] [annotated slide], Reference: ISL Section 6
- Model and Variable Selection
- Multicollinearity
- James-Stein Estimator, Ridge Regression, Lasso, ISTA
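As a rough illustration of how ISTA relates to the lasso, the sketch below (not taken from the course slides) minimizes `(1/(2n)) * ||y - X b||^2 + lam * ||b||_1` by alternating a gradient step on the squared-error term with soft-thresholding. The function names, the choice of objective scaling, and the toy data are assumptions. The step size uses the bound `trace(X^T X) / n`, which is never smaller than the largest eigenvalue of `X^T X / n`, so it is safe but conservative.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_ista(X, y, lam, iters=1000):
    """ISTA for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    lipschitz = sum(v * v for row in X for v in row) / n
    step = 1.0 / lipschitz
    beta = [0.0] * p
    for _ in range(iters):
        resid = [yi - sum(xij * bj for xij, bj in zip(row, beta))
                 for row, yi in zip(X, y)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n
                for j in range(p)]
        beta = [soft_threshold(bj - step * gj, step * lam)
                for bj, gj in zip(beta, grad)]
    return beta


# Hypothetical toy design with orthogonal columns, so X^T X / n is the identity
# and the lasso solution is the soft-thresholded value of X^T y / n = [2, 1].
X = [[1.0, 1.0], [1.0, -1.0]]
y = [3.0, 1.0]

beta_lasso = lasso_ista(X, y, lam=1.5)   # second coefficient is shrunk to zero
beta_ols = lasso_ista(X, y, lam=0.0)     # no penalty: ordinary least squares
print(beta_lasso, beta_ols)
```

With an orthogonal design the effect of the penalty is easy to read off: each coefficient is the least squares value pulled toward zero by `lam`, and coefficients smaller than `lam` in magnitude are set exactly to zero.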
Lab 4: Shrinkage
(5.2)
Lecture 5: Basic Nonlinear and Nonparametric regression/classification
(5.5,5.12,5.14)
Slide:[pdf] [annotated slide], Reference: ISL Section 2.2,3.5,4.6.5, 5.2,5.3,7
- K-Nearest Neighbors, nonlinear regression
- Bootstrap and Conformal Prediction [note]
Readings
Midterm 2 (5.9)
- Linear Regression
- Model Selection
- Shrinkage
Cheatsheet:
Lecture 6: Trees and Neural Networks
(5.16,5.19,5.21)[Feedback Form]
Slide:[pdf] [annotated slide], Reference: ISL Section 8.1, 229 Section 7
- Neural Networks
- Regression Tree, Classification Tree
Readings
- [Decision Tree]
- [Random Forest]
Lab 5: Nonlinear Regression
(5.23)
Lecture 7: Unsupervised Learning
(5.23,5.26,5.28)[Feedback Form]
Slide:[pdf] [annotated slide], Reference: ISL Section 10, 229 Section 10, 12
- $k$-means, Spectral Clustering, PCA
Lab 6: Unsupervised Learning
(5.30)
Lecture 8: Ensemble/Committee Methods
(5.30,6.2,6.4)[Feedback Form]
Slide:[pdf] [annotated slide], Reference: ISL Section 8.2
Final Review
(6.8) Cheat Sheet:
ChatGPT
While the use of AI tools to aid in problem-solving is becoming increasingly prevalent, it is important to note that relying solely on AI to complete your homework is not in accordance with the expectations of this course. Submitting AI-generated solutions without proper acknowledgment is a violation of ethical guidelines and academic standards.
General Policies
In general, we do not grant extensions on assignments/exams. There are several exceptions:
- Medical Emergencies: If you are sick and unable to complete an assignment or attend class, please go to University Health Services. For minor illnesses, we expect grace days or our late penalties to provide sufficient accommodation. For medical emergencies (e.g. prolonged hospitalization), students may request an extension afterward by contacting their Student Liaison or Academic Advisor and having them reach out to the instructor on their behalf. Please plan ahead if possible.
- Family/Personal Emergencies: If you have a family emergency (e.g. death in the family) or a personal emergency (e.g. mental health crisis), please contact your academic advisor or Counseling and Psychological Services (CaPS). In addition to offering support, they will reach out to the instructors for all your courses on your behalf to request an extension.
- University-Approved Absences: If you are attending an out-of-town university-approved event (e.g. multi-day athletic/academic trip organized by the university), you may request an extension for the duration of the trip. You must provide confirmation of your attendance, usually from a faculty or staff organizer of the event.