2022 Reading Group
Aim: study the rate of learning when the hypothesis class is a function space with a uniform convergence property (for example, Hölder functions), across different settings.
Textbooks and References
- Weak Convergence and Empirical Processes, by van der Vaart and Wellner [WCEP]
- Concentration Inequalities: A Nonasymptotic Theory of Independence, by Boucheron et al. [CI]
- Asymptotic Statistics, by van der Vaart [AS]
- High Dimensional Probability, by Roman Vershynin [HDP]
- High Dimensional Statistics, by Martin Wainwright [HDS]
- Empirical Processes in M-Estimation, by Sara van de Geer [M-Estimate]
- Introduction to Semi-parametric Statistics by Bodhisattva Sen [Semi-Para]
- Introduction to Nonparametric Statistics by Bodhisattva Sen [Non-para]
- Mathematical Statistics: A Non-Asymptotic Approach, by A. Rakhlin
Syllabus
Week 1: Concentration Inequality and Basic Asymptotic Statistics
- Basic concentration inequalities for sub-Gaussian/sub-exponential random variables (a standard tail bound is sketched after this week's reading)
- Asymptotic normality, delta method, moment method
- Efficiency of estimators, local asymptotic normality
Reference:
- [HDP] Chapter 1
- [AS] Chapters 2-5
- [Semi-Para] Chapters 2 and 3
Reading:
- Han J, Hu R, Long J. A class of dimension-free metrics for the convergence of empirical measures. Stochastic Processes and their Applications, 2023, 164: 242-287.
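For quick reference on the first bullet above, here is one standard statement of the sub-Gaussian tail bound; the constants follow a common convention and differ slightly between [HDP] and [CI].

```latex
% Sub-Gaussian random variable: X is sub-Gaussian with parameter sigma if
\mathbb{E}\, e^{\lambda (X - \mathbb{E}X)} \;\le\; e^{\lambda^2 \sigma^2 / 2}
\quad \text{for all } \lambda \in \mathbb{R}.
% Hoeffding-type bound for independent sub-Gaussian X_1, ..., X_n (via the Chernoff argument):
\mathbb{P}\Big( \sum_{i=1}^n (X_i - \mathbb{E}X_i) \ge t \Big)
\;\le\; \exp\!\Big( -\frac{t^2}{2 \sum_{i=1}^n \sigma_i^2} \Big).
```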
Week 2: Uniform Law of Large Numbers
- Symmetrization
- Covering numbers, Dudley's entropy integral (both sketched below)
Reference:
- [WCEP] Chapter 2
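The two topics above can be compressed into two displays; constants are suppressed and vary across sources ([WCEP] Chapter 2, [HDP]).

```latex
% Symmetrization: for i.i.d. X_1, ..., X_n and independent Rademacher signs eps_i,
\mathbb{E} \sup_{f \in \mathcal{F}} \big| P_n f - P f \big|
\;\le\; 2\, \mathbb{E} \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i) \Big|.
% Dudley's entropy integral bound (C an absolute constant, N the covering number):
\mathbb{E}_{\varepsilon} \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i)
\;\le\; \frac{C}{\sqrt{n}} \int_0^\infty \sqrt{\log N\big(\epsilon, \mathcal{F}, L_2(P_n)\big)}\; d\epsilon.
```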
Week 3: Uniform Central Limit Theorem (Optional)
- Motivation: goodness-of-fit testing, the Kolmogorov-Smirnov statistic
- Uniform central limit theorem, functional delta method
Reference:
- Applications of Empirical Process Theory, Section 4.2
Reading:
- Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality (NeurIPS), 2020
- Gretton A, Borgwardt K M, Rasch M J, et al. A kernel two-sample test. The Journal of Machine Learning Research, 2012, 13(1): 723-773.
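To make the kernel two-sample test concrete, here is a minimal NumPy sketch of the unbiased MMD² statistic from the Gretton et al. paper; the Gaussian kernel, the fixed bandwidth, and the toy data are illustrative choices, and the permutation test used to calibrate the rejection threshold is omitted.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples X (m x d) and Y (n x d)."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Drop the diagonal terms so the within-sample averages are unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

# Toy usage: two samples whose means differ slightly.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))
print(mmd2_unbiased(X, Y))
```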
Week 4: Nonparametric Statistics 1: RKHS (Optional)
Reading:
- Dieuleveut A, Bach F. Nonparametric stochastic approximation with large step-sizes. The Annals of Statistics, 2016.
- Fischer S, Steinwart I. Sobolev norm learning rates for regularized least-squares algorithms. Journal of Machine Learning Research, 2020, 21(205): 1-38.
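A minimal NumPy sketch of kernel ridge regression, the regularized least-squares estimator whose learning rates the Fischer-Steinwart paper analyzes; the Gaussian kernel, bandwidth, and regularization level here are arbitrary illustrative choices rather than the tuned values the theory prescribes.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def kernel_ridge_fit(X, y, lam=1e-2, bandwidth=0.5):
    """Solve (K + n*lam*I) alpha = y; the estimator is f_hat(x) = sum_i alpha_i k(x, X_i)."""
    n = len(X)
    K = rbf_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def kernel_ridge_predict(X_train, alpha, X_test, bandwidth=0.5):
    return rbf_kernel(X_test, X_train, bandwidth) @ alpha

# Toy usage: noisy regression of a smooth function on [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(100)
alpha = kernel_ridge_fit(X, y)
X_test = np.linspace(0, 1, 5).reshape(-1, 1)
print(kernel_ridge_predict(X, alpha, X_test))
```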
Week 5: Minimax Lower Bounds
Reference:
- https://web.stanford.edu/class/ee378c/
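As a pointer to what this week covers, one standard form of Fano's method (the exact version and constants in the EE378C notes may differ slightly):

```latex
% Fano's method: let theta_1, ..., theta_M be 2*delta-separated under a metric d, let V be
% uniform on {1, ..., M}, and let X ~ P_{theta_V}. Then
\inf_{\hat\theta} \sup_{\theta} \mathbb{E}\, d(\hat\theta, \theta)
\;\ge\; \delta \Big( 1 - \frac{I(V; X) + \log 2}{\log M} \Big),
\qquad
I(V; X) \;\le\; \max_{i, j} D_{\mathrm{KL}}\big( P_{\theta_i} \,\|\, P_{\theta_j} \big).
```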
Week 6: Nonparametric Statistics 2: Localized Complexity
Reference:
- Applications of Empirical Process Theory Section 4.3
- Geoffrey Chinot's PhD thesis, Section 1
- Lecture Note 2: Generalization Errors, by Taiji Suzuki (in Japanese)
- Towards Problem-dependent Optimal Learning Rates (More detailed version: Xu Y, Zeevi A. Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory. arXiv preprint arXiv:2011.06186, 2020.)
- Rakhlin A, Sridharan K, Tsybakov A B. Empirical entropy, minimax regret and minimax risk. Bernoulli, 2017, 23(2): 789-824.
- Variance-regularized slow rate in the Orthogonal Statistical Learning paper (Foster & Syrgkanis)
Reading:
- 2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization by Vladimir Koltchinskii
- Yang J, Sun S, Roy D M. Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes. NeurIPS 2019: 10802-10812.
- Kur G, Putterman E, Rakhlin A. On the Variance, Admissibility, and Stability of Empirical Risk Minimization. Advances in Neural Information Processing Systems, 2024, 36.
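A compressed statement of the localized-complexity mechanism behind this week's readings; the constants and the precise sub-root/Bernstein conditions are spelled out in the Koltchinskii lecture and the Xu-Zeevi paper.

```latex
% Localized Rademacher complexity and its fixed point:
\psi_n(r) \;=\; \mathbb{E}\, \mathcal{R}_n\big( \{ f \in \mathcal{F} : P f^2 \le r \} \big),
\qquad
r^* \;=\; \inf\{ r > 0 : \psi_n(r) \le r \}.
% Under a sub-root condition on psi_n and a Bernstein-type relation between the variance and
% the excess risk, ERM satisfies, with probability at least 1 - e^{-t},
L(\hat f) - L(f^*) \;\lesssim\; r^* + \frac{t}{n}.
```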
Week 8: Nonparametric Statistics 4: Deep Learning
- Neural scaling laws, and explaining them via nonparametric statistics (the classical minimax rate is recalled after this week's list)
Reference:
- Explaining Neural Scaling Laws, by Yasaman Bahri et al. arXiv:2102.06701
- Suzuki T. Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality. ICLR 2019.
- Chen M, Jiang H, Liao W, et al. Nonparametric regression on low-dimensional manifolds using deep ReLU networks. NeurIPS 2019.
- Singh S, Uppal A, Li B, et al. Nonparametric density estimation under adversarial losses. NeurIPS 2018.
- Local Rademacher complexity of DNNs (lecture notes by Taiji Suzuki): http://ibis.t.u-tokyo.ac.jp/suzuki/lecture/2020/intensive2/%E8%AC%9B%E7%BE%A92.pdf
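For orientation, the classical minimax rate for β-smooth (e.g. Hölder) regression in d dimensions, which is the nonparametric benchmark these scaling-law and deep-network papers compare against:

```latex
% Minimax rate for estimating a beta-smooth regression function on [0,1]^d in squared L_2 loss:
\inf_{\hat f} \sup_{f \in \mathcal{H}^{\beta}([0,1]^d)} \mathbb{E}\, \| \hat f - f \|_{L_2}^2
\;\asymp\; n^{-\frac{2\beta}{2\beta + d}}.
% Read as a power law in the sample size n, with exponent governed by smoothness beta and dimension d.
```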
Week 9: Nonparametric Statistics 5: Classification
- Tsybakov A B. Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 2004, 32(1): 135-166.
- Kim Y, Ohn I, Kim D. Fast convergence rates of deep neural networks for classification. arXiv preprint arXiv:1812.03599, 2018. (Uses L-infinity covering numbers.)
- Bos T, Schmidt-Hieber J. Convergence rates of deep ReLU networks for multiclass classification. arXiv preprint arXiv:2108.00969, 2021.
- Audibert J Y, Tsybakov A B. Fast learning rates for plug-in classifiers. The Annals of Statistics, 2007, 35(2): 608-633.
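The fast classification rates in the Tsybakov and Audibert-Tsybakov papers hinge on the margin (low-noise) condition, which in one common form reads:

```latex
% Tsybakov margin condition: with eta(x) = P(Y = 1 | X = x), assume for some C, alpha > 0
\mathbb{P}\big( 0 < |\eta(X) - \tfrac{1}{2}| \le t \big) \;\le\; C\, t^{\alpha}
\quad \text{for all } t > 0.
% Larger alpha means less mass near the decision boundary, and correspondingly faster
% attainable rates for the excess classification risk.
```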
Week 10: Semi-parametric Statistics
Reference:
- [Semi-Para] Sections 3 and 4
- Foster D J, Syrgkanis V. Orthogonal statistical learning. arXiv preprint arXiv:1901.09036, 2019.
- Mackey L, Syrgkanis V, Zadik I. Orthogonal machine learning: Power and limitations. International Conference on Machine Learning, PMLR, 2018: 3375-3383.
- Causal Inference Tutorial by Rahul Singh
- Stats 361: Causal Inference, by Stefan Wager
Reading:
- Knowledge Distillation as Semi-Parametric Inference. ICLR 2021.
- Demirer M, Syrgkanis V, Lewis G, et al. Semi-parametric efficient policy learning with continuous actions. arXiv preprint arXiv:1905.10116, 2019.
- Chernozhukov V, Chetverikov D, Demirer M, et al. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 2018.
- Chernozhukov V, Newey W, Singh R, et al. Adversarial Estimation of Riesz Representers. arXiv preprint arXiv:2101.00009, 2020. https://arxiv.org/pdf/2105.15197.pdf
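A minimal NumPy sketch of cross-fitted double/debiased ML for the partially linear model Y = θ·D + g(X) + ε, D = m(X) + v, in the spirit of the Chernozhukov et al. paper; the k-NN nuisance estimators, the two folds, and the toy data are illustrative simplifications, not the paper's recommended choices.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=10):
    """Simple k-nearest-neighbour regression, used as a stand-in nuisance estimator."""
    dists = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(dists, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

def dml_plr(Y, D, X, n_folds=2, k=10, seed=0):
    """Cross-fitted residual-on-residual estimate of theta in Y = theta*D + g(X) + eps."""
    n = len(Y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    num, den = 0.0, 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Nuisances E[Y|X] and E[D|X] are fit on the complementary fold (cross-fitting).
        ell_hat = knn_predict(X[train], Y[train], X[test], k)
        m_hat = knn_predict(X[train], D[train], X[test], k)
        v = D[test] - m_hat
        num += np.sum(v * (Y[test] - ell_hat))
        den += np.sum(v * v)
    return num / den

# Toy usage with true theta = 1.5 and nonlinear nuisances.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
D = np.sin(X[:, 0]) + 0.5 * rng.standard_normal(400)
Y = 1.5 * D + np.cos(X[:, 1]) + 0.5 * rng.standard_normal(400)
print(dml_plr(Y, D, X))
```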
Week 11: Nonparametric Bandits
Reference:
- Foster D, Rakhlin A. Beyond UCB: Optimal and efficient contextual bandits with regression oracles. International Conference on Machine Learning, PMLR, 2020: 3199-3210.
- Hu Y, Kallus N, Mao X. Smooth contextual bandits: Bridging the parametric and non-differentiable regret regimes. Conference on Learning Theory, PMLR, 2020: 2007-2010.
- Philippe Rigollet and Assaf Zeevi. Nonparametric bandits with covariates. Conference on Learning Theory (COLT), page 54, 2010.
- Krishnamurthy S K, Hadad V, Athey S. Adapting to misspecification in contextual bandits with offline regression oracles. arXiv preprint arXiv:2102.13240, 2021.
Reading:
- Wang T, Rudin C. Bandits for BMO Functions. International Conference on Machine Learning, PMLR, 2020: 9996-10006.
- Wang T, Ye W, Geng D, et al. Towards practical Lipschitz stochastic bandits. arXiv preprint arXiv:1901.09277, 2019.
Week 12: Nonparametric Weakly Supervised Learning
- Active learning, disagreement-coefficient-based methods
- Nonparametric transfer learning
Reference:
- Works by Samory Kpotufe
- Cai T T, Wei H. Transfer learning for nonparametric classification: Minimax rate and adaptive classifier. The Annals of Statistics, 2021, 49(1): 100-128.
- Minsker S. Plug-in approach to active learning. Journal of Machine Learning Research, 2012, 13(Jan): 67-90.
- Castro R M, Nowak R D. Minimax bounds for active learning. IEEE Transactions on Information Theory, 2008, 54(5): 2339-2353.
Week 13: Empirical Landscape: Uniform Convergence of Gradients
Reference:
- https://github.com/tengyuma/cs229m_notes/blob/main/Winter2021/pdf/02-17-2021.pdf
- https://github.com/tengyuma/cs229m_notes/blob/main/Winter2021/pdf/03-01-2021.pdf
- https://github.com/tengyuma/cs229m_notes/blob/main/Winter2021/pdf/03-03-2021.pdf
- Mei S, Bai Y, Montanari A. The landscape of empirical risk for nonconvex losses. Annals of Statistics, 2018, 46(6A): 2747-2774.
- Foster D J, Sekhari A, Sridharan K. Uniform convergence of gradients for non-convex learning and optimization. arXiv preprint arXiv:1810.11059, 2018.
Week 14: Drawbacks of Empirical Process Theory, Interpolation Estimators
Reference:
- Yang Z, Bai Y, Mei S. Exact gap between generalization error and uniform convergence in random feature models. arXiv preprint arXiv:2103.04554, 2021.
- Average case complexity, reference:
Week 15: Non-P-Donsker Classes
Examples: log-concave densities, convex functions; cases where ERM is not optimal.
References:
- Kur, Gil, Yuval Dagan, and Alexander Rakhlin. “Optimality of maximum likelihood for log-concave density estimation and bounded convex regression.” arXiv preprint arXiv:1903.05315 (2019).
- Kur G, Rakhlin A. On the Minimal Error of Empirical Risk Minimization. arXiv preprint arXiv:2102.12066, 2021.
Week 16: Learning Dynamical Systems
- Hang H, Steinwart I. A Bernstein-type inequality for some mixing processes and dynamical systems with an application to learning. The Annals of Statistics, 2017, 45(2): 708-743.
- Foster D, Sarkar T, Rakhlin A. Learning nonlinear dynamical systems from a single trajectory. Learning for Dynamics and Control, PMLR, 2020: 851-861.
- Nickl R, Ray K. Nonparametric statistical inference for drift vector fields of multi-dimensional diffusions. The Annals of Statistics, 2020, 48(3): 1383-1408.
- Hang H, Steinwart I. Fast learning from α-mixing observations. Journal of Multivariate Analysis, 2014, 127: 184-199.
Reading List
- Fusheng Liu, Haizhao Yang, Soufiane Hayou, and Qianxiao Li. Connecting optimization and generalization via gradient flow path length. arXiv preprint arXiv:2202.10670, 2022.
- Adaptive learning rates for support vector machines working on data with low intrinsic dimension
- Which Spaces can be Embedded in Reproducing Kernel Hilbert Spaces?