Statistical Learning for Data Analysis (IEMS-304, Northwestern)

Undergraduate course, Northwestern IEMS, 2025

Required undergraduate course on predictive modeling in data science. Syllabus:[pdf]

Objectives:

Understand common data structures in modern predictive and explanatory modeling in business, engineering, and science and how to formulate the appropriate solutions.
Learn R software basics and how to use it for various regression and classification problems.
Develop ability to fit appropriate linear and logistic models, including model selection and model diagnostics.
Develop ability to interpret fitted linear and logistic regression models for both explanatory and predictive purposes.
Learn concept in regression and classification with nonlinearity, including maximum likelihood estimation, cross-validation, ridge and lasso.
Learn how to fit and interpret popular supervised learning models including trees, smoothers, nearest neighbors, random forests, and boosted trees.

Logistics

Time and Location: Monday, Wednesday and Friday, 9.00 A.M.- 9.50 A.M. Tech L251

Office Hour: Monday 10 A.M. Tech M237

TA Office Hour:

ChatGPT Tutor: The link here provides a Large Language Model agents that is specifically trained for this course.

Preliminary: [Note]

Scheldue:

[Cheat Sheet and Sample Exam]

Textbook:

ISL: James, Gareth, et al. An introduction to statistical learning.
229: [Stanford CS229 Lecture Note]

Lecture 1: Introduction to Statistical Learning

(4.1,4.2,4.4) [Feedback Form]

Slide:[pdf] [annotated slide], Reference: ISL Section 2

Logistics
What is and why statistical learning?
Supervised Learning, Unsupervised learning, Reinforcement Learning

Reading

Lab 1: Statistical Review (4.4)

Reference: [Note]

Homework 1: (Due:4.11)

Homework: [pdf], [latex], Data: [miles.csv]

Lecture 2: Simple Linear Regression

(4.7,4.9,4.11)[Feedback Form]

Slide:[pdf] [annotated slide], [additional note on math]

Reference: ISL Section 3.1, 4.3

Simple Linear Regression and closed form of Least Square
Unbias and Varaince, Confidence Interval
log likelihood, logistic regression
Gradient descent and Newton methods

Lab 2: Linear Algebra Review (4.11)

Slide: [pdf]

Homework 2: (Due:4.18)

Homework: [pdf], [latex],

Lecture 3: Multiple Linear Regression

(4.14,4.16,4.18,4.23,4.25) [Feedback Form]

[application examples] [annotated slide]

Slide:[pdf] [annotated slide], Reference: ISL Section 3.2,3.3

Multiple Linear Regression, Degree of Freedom
t-test, confidence interval
Categorical Predictor and interaction
Leverage and Influence
Residual Diagnostics Reading:
[Linear Regression]

Midterm 1 (4.21)

Taxonomy of Learning, Bias and Variance Trade-off
Optimization
Linear Regression and Statistical Inference

Homework 3: (Due:5.2)

Homework: [pdf], [latex], [data]

Lab 3: Linear Regression

(5.2)

Midterm 2 (5.09)

Linear Regression
Model Selection
Shrinkage

Lecture 4: Model and Variable Selection, Shrinkage, and Multicollinearity

(4.28,4.30,5.2,5.5)[Feedback Form]

Slide:[pdf] [annotated slide], Reference: ISL Section 6

Model and Variable Selection,
Multicollinearity
James-Stein Estimator, Ridge Regression, Lasso, ISTA

Lab 4: Shrinkage

(5.2)

Homework 4: (Due:5.16)

Homework: [pdf], [latex], [data]

Homework 5: (Due: 5.23)

Homework: [pdf] [latex]

Lecture 5: Basic Nonlinear and Nonparametric regression/classification

(5.5,5.12,5.14) Slide:[pdf] [annotated slide], Reference: ISL Section 2.2,3.5,4.6.5, 5.2,5.3,7

K-Nearest Neighboor, nonlinear regression
Bootstrap and Conformal Prediction [note]

Homework 6: (Due: 5.31)

Homework: [pdf] [latex] [data HW6] [spambase.txt] [spambase_info.txt]

Readings

[Curse of Dimensionality in Classification]

Cheatsheet:

Lecture 6: Trees and Neural Network

(5.16,5.19,5.21)[Feedback Form]

Slide:[pdf] [annotated slide], Reference: ISL Section 8.1, 229 Section 7

Neural Network
Regression Tree, Classification Tree

Readings -[Decision Tree] -[Random Forest]

Lab 5: Nonlinear Regression

(5.23)

Lecture 7: Unsupervised Learning

(5.23,5.26,5.28)[Feedback Form]

Slide:[pdf] [annotated slide], Reference: ISL Section 10, 229 Section 10, 12

$k-$means, Spectral Clustering, PCA

Lab 6: Unsupervised Learning

(5.30)

Final Review

(6.8) Cheat Sheet:

ChatGPT

While the use of AI tools to aid in problem-solving is becoming increasingly prevalent, it is important to note that relying solely on AI to complete your homework is not in accordance with the expectations of this course. Submitting AI-generated solutions without proper acknowledgment is a violation of ethical guidelines and academic standards.

General Policies

In general, we do not grant extensions on assignments/exams. There are several exceptions:

Medical Emergencies: If you are sick and unable to complete an assignment or attend class, please go to University Health Services. For minor illnesses, we expect grace days or our late penalties to provide sufficient accommodation. For medical emergencies (e.g. prolonged hospitalization), students may request an extension afterward by contacting their Student Liaison or Academic Advisor and having them reach out to the instructor on their behalf. Please plan ahead if possible.
Family/Personal Emergencies: If you have a family emergency (e.g. death in the family) or a personal emergency (e.g. mental health crisis), please contact your academic adviser or Counseling and Psychological Services (CaPS). In addition to offering support, they will reach out to the instructors for all your courses on your behalf to request an extension.
University-Approved Absences: If you are attending an out-of-town university-approved event (e.g. multi-day athletic/academic trip organized by the university), you may request an extension for the duration of the trip. You must provide confirmation of your attendance, usually from a faculty or staff organizer of the event.

Share on

Twitter Facebook LinkedIn

Yiping Lu