# Information Theoretic Methods In Statistics

## Course information

**Instructor**: Jonathan Niles-Weed (jnw@cims.nyu.edu)

### Description

This class will develop the information-theoretic methods needed to analyze statistical procedures and prove their optimality. Particular attention will be paid to techniques applicable to high-dimensional and non-parametric estimation problems.

### Prerequisites

Probability, mathematical statistics. No prior exposure to information theory is required.

### Resources

Lectures will be based on the following two sources:

- “Information-theoretic Methods for High-dimensional Statistics” (Yihong Wu)
- “Information theory and statistics” (John Duchi)

### List of topics

- Decision theory, Bayes & minimax risk [Wu Chapters 1-2]
- f-divergences, Neyman-Pearson lemma [Duchi Chapters 2.2-2.3, Wu Chapter 4]
- Inequalities among f-divergences, variational inequalities and duality, Cramér-Rao bound [Duchi Chapters 2.2-2.3, Wu Chapter 6]
- Le Cam’s method, Assouad’s method [Duchi Chapters 7-8, Wu Chapters 9-10]
- Mutual information, Fano’s inequality [Duchi Chapter 7, Wu Chapter 13]
- Yang-Barron [Duchi Chapter 10, Wu Chapter 16]
- Functional estimation, method of fuzzy hypotheses [Wu Chapters 22-23]
- Polynomial methods, Cai-Low [Wu Chapter 25]

## Logistics

Class: Tuesdays, 11 am - 12:50 pm, Warren Weaver 312

### Office Hours

By appointment.

### Grading

Final project, which can be:

- a piece of original research
- a written mathematical summary of a paper or topic not covered in class

The project is due Monday, May 9 at 5 PM Eastern Time.

The final project is *not* intended to be difficult.
If you opt for the written summary approach, a mathematical write-up of ~3 pages
is sufficient.
Feel free to skip unimportant or technical preliminaries: it’s more important to
capture the essential pieces of the argument.
Given the topics of the class, you may focus on lower bounds if you prefer.

Here are some papers that could be good sources for projects:

- “On the minimax rate of the Gaussian sequence model under bounded convex constraints” (Neykov, arXiv:2201.07329)
- “Minimax Risk Over Hyperrectangles, and Implications” (Donoho et al., Annals of Statistics ‘90)
- “Minimax estimation of linear and quadratic functionals on sparsity classes” (Collier et al., Annals of Statistics ‘17)
- “Volume Ratio, Sparsity, and Minimaxity under Unitarily Invariant Norms” (Ma and Wu, IEEE Transactions on Information Theory ‘15)
- “On estimation of the L_r norm of a regression function” (Lepski et al., PTRF ‘99)
- “Minimax risk over l_p-Balls for l_q-error” (Donoho and Johnstone, PTRF ‘94)
- “Geometrizing rates of convergence, II.” (Donoho and Liu, Annals of Statistics ‘91)
- “Dualizing Le Cam’s method, with applications to estimating the unseens” (Polyanskiy and Wu, arXiv:1902.05616)
- “Hypothesis testing for densities and high-dimensional multinomials: Sharp local minimax rates” (Balakrishnan and Wasserman, Annals of Statistics ‘19)
- “Optimal rates of estimation for multi-reference alignment” (Bandeira et al., Mathematical Statistics and Learning ‘20)
- “Optimal Rates of Aggregation” (Tsybakov, Computational Learning Theory and Kernel Machines ‘03)