Mathematical Statistics - MATH-UA.0234 (Spring 2019)

Course description

Lectures: Tuesdays, Thursdays, 12:30-1:45PM, in CIWW (Courant Institute) 201.
Office hours: Tuesdays 10:00-11:00AM, Wednesdays 12:30-1:30PM, Thursdays 2:00-3:00PM, in office WWH926.
Appointments can also be arranged by email (thomasl@math.nyu.edu).
Recitation sessions: Fridays, 2:00-3:15PM in CIWW 201. T.A. Alexisz Gaál (Office WWH830).
Course description: A mathematically oriented introduction to statistics.
Prerequisites: Theory of Probability (UA.0233). Basic linear algebra is helpful but not required.
Textbook: "All of Statistics", by L. Wasserman. Available for free (PDF/EPUB format) from the publisher via the NYU network. See also the book webpage for errata and data.
Other references:
  • The excellent material (slides, videos, etc.) of the MIT class "Statistics for Applications" is available online. It overlaps substantially with our class (perhaps a bit more "applied" and less "technical").
  • If you are interested in an introduction to "data science" in Python, this is a great book (there are of course many other resources online).
  • This book is a very good, non-technical introduction to "statistical learning".
  • This (Stochastics, by H.-O. Georgii) could be another textbook, more technical than the one we use.
  • This (Introduction to Mathematical Statistics and Its Applications, by Larsen and Marx) is a more "user-friendly" textbook that is worth a look.
Grading: Homework (25%), Midterm (35%), Final (40%)
Class outline
  • Introduction, reminders of probability theory.
  • Parametric estimation: method of moments, maximum likelihood, M-estimators. Some general theory of estimators. The non-parametric case.
  • Hypothesis testing, confidence intervals, p-values. Some classical tests. The non-asymptotic case.
  • Regression and introduction to statistical learning.
  • If and when time permits: a word on Bayesian statistics, PCA, etc.
Sessions
  • Jan 29: Logistics and introduction. Definition of a statistical model, a statistic, an estimator. Reminders on probability distributions.
    Sections 6.1--6.2.
    A note on the distinction between parametric/non-parametric models (reading is optional).
  • Jan 31: Estimators: definitions of unbiasedness, consistency, and asymptotic normality. Example: the empirical mean. Reminders (convergence in probability and in distribution, LLN, CLT).
    Section 6.3.1; see also Sections 5.1--5.4.
  • Homework 1 due Friday, February 8th. Solution.
  • Feb 5: Applications of the empirical mean to building confidence intervals (the standard interval is recalled below). Estimation of the cumulative distribution function: pointwise analysis, Glivenko-Cantelli.
    Sections 6.3.2, 7.1
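    For reference (phrasing ours, standard CLT-based form; details may differ from lecture): if $X_1, \dots, X_n$ are i.i.d. with mean $\mu$ and finite variance, an asymptotic $1-\alpha$ confidence interval for $\mu$ is $\bar{X}_n \pm z_{\alpha/2} \, \hat{\sigma}_n / \sqrt{n}$, where $\bar{X}_n$ is the empirical mean, $\hat{\sigma}_n$ the sample standard deviation, and $z_{\alpha/2}$ the standard normal quantile of order $1-\alpha/2$.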
  • Feb 7: Quantitative tools: the Markov, Chebyshev, Chernoff, and Hoeffding inequalities (Hoeffding's bound is recalled below).
    Section 4.1
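    For reference, Hoeffding's inequality in the form used for non-asymptotic intervals (statement as in standard references): if $X_1, \dots, X_n$ are independent with $a \le X_i \le b$, then $P(|\bar{X}_n - E[\bar{X}_n]| \ge t) \le 2\exp(-2nt^2/(b-a)^2)$ for every $t > 0$ and every finite $n$, with no asymptotics needed.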
  • Homework 2 due Friday, February 15th. Solution.
  • Feb 12: Conclusion about non-parametric estimation of CDF. Asymptotic versus non-asymptotic confidence intervals for Bernoulli random variables, the "complementary error function".
    Sections 6.3.2, 7.1
  • Feb 14: Parametric estimation: reminders on some classical families, method of moments, maximum likelihood.
    Sections 9.1, 9.2, 9.3
  • Homework 3 due Friday, February 22nd. Solution.
  • Feb 19: The method of moments/MLE: examples (Bernoulli, uniform, Gaussian...); a worked Bernoulli example is sketched below.
    Examples (not sections) 9.4, 9.5, 9.10, 9.11, 9.12
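    A minimal worked example in the spirit of this session (write-up ours): for $X_1, \dots, X_n$ i.i.d. Bernoulli($p$), the log-likelihood is $\ell_n(p) = S \log p + (n-S)\log(1-p)$ with $S = \sum_i X_i$; solving $\ell_n'(p) = S/p - (n-S)/(1-p) = 0$ gives $\hat{p} = S/n = \bar{X}_n$, which coincides with the method-of-moments estimator.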
  • Feb 21: Some properties of the MLE: connection with the Kullback-Leibler divergence, Fisher information, and asymptotic normality (key identities recalled below).
    Sections 9.4, 9.5, 9.6, 9.7
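    Key identities for reference (standard regularity conditions assumed): the Fisher information is $I(\theta) = \mathrm{Var}_\theta(\partial_\theta \log f(X;\theta)) = -E_\theta[\partial_\theta^2 \log f(X;\theta)]$, and the MLE satisfies $\sqrt{n}(\hat{\theta}_n - \theta) \to N(0, 1/I(\theta))$ in distribution.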
  • Homework 4 due Friday, March 1st. Solution.
  • Feb 26: Conclusion about MLE: Fisher information, proof of asymptotic normality, Newton-Raphson method, efficiency.
    Sections 9.7, 9.8, 9.13.1, 9.13.4
  • Feb 28: Hypothesis testing, null versus alternative hypothesis, power function, size and level of a test.
    Chapter 10 (introduction).
  • Homework 5 due Friday, March 15th. Solution.
  • March 5: Hypothesis testing: the Wald test (its basic form is recalled below).
    Section 10.1.
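    For reference, the basic form of the test (notation ours): to test $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$ with an asymptotically normal estimator $\hat{\theta}$, compute $W = (\hat{\theta} - \theta_0)/\widehat{\mathrm{se}}(\hat{\theta})$ and reject $H_0$ at asymptotic level $\alpha$ when $|W| > z_{\alpha/2}$.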
  • March 7: The Wald test, continued: control of type I and type II errors.
    Section 10.1, 10.2.
  • March 12: Midterm. Syllabus for the midterm, midterm, and solutions.
  • March 14: The bootstrap method (a short code sketch is given below). Chapter 8.
    Lecture notes.
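    A minimal sketch of the nonparametric bootstrap in Python (illustrative only; the sample, the choice of the median as the statistic, and all variable names are ours, not from the lecture notes):

        import numpy as np

        rng = np.random.default_rng(0)
        data = rng.exponential(scale=2.0, size=100)    # made-up sample

        B = 2000                        # number of bootstrap replicates
        medians = np.empty(B)
        for b in range(B):
            # resample n points from the data, with replacement
            resample = rng.choice(data, size=data.size, replace=True)
            medians[b] = np.median(resample)

        se_boot = medians.std(ddof=1)   # bootstrap estimate of the standard error
        print(f"median = {np.median(data):.3f}, bootstrap se = {se_boot:.3f}")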
  • (March 15: No recitation.)
  • March 19: Spring recess.
  • March 21: Spring recess.
  • March 26: $p$-values, the $\chi^2$ distribution and the $\chi^2$ test (the test statistic is recalled below).
    Sections 10.2--10.4
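    For reference, Pearson's statistic in its standard form: with $k$ categories, observed counts $O_j$, and expected counts $E_j = n p_j$ under $H_0$, the test statistic is $T = \sum_{j=1}^{k} (O_j - E_j)^2 / E_j$, approximately $\chi^2_{k-1}$-distributed under $H_0$ for large $n$.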
  • March 28: Goodness-of-fit tests.
    Section 10.8
  • Homework 6 due Friday, April 5th. Solution.
  • April 2: Some terminology of statistical learning.
    Lecture notes.
  • April 4: Some terminology of statistical learning, continued.
  • Homework 7 due Friday, April 12th. Solution.
  • April 9: Simple linear regression (the least-squares formulas are recalled below).
    Sections 13.1, 13.3
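    For reference, the least-squares estimates in the model $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ (standard formulas): $\hat{\beta}_1 = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$.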
  • April 11: Multiple linear regression, geometric interpretation.
  • Homework 8 due Friday, April 19th. Solution.
  • April 16: Correlation, covariance.
    Sections 14.1, 14.2
  • April 18: Independence, and testing for independence.
    Section 15.1
  • Homework 9 due Friday, April 26th. Solution.
  • April 23: Testing independence (end). Testing causality.
    Sections 15.1, 15.2, 16.1
  • April 25: Causality (end), and some paradoxes (a numeric illustration of Simpson's paradox is given below).
    Sections 16.1, 16.2, 16.3, 16.4
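    A small numeric illustration of Simpson's paradox (the numbers are made up for illustration): suppose a treatment succeeds on 18/20 (90%) of mild cases and 60/80 (75%) of severe cases, while the control succeeds on 70/80 (87.5%) of mild cases and 14/20 (70%) of severe cases. The treatment is better within each stratum, yet overall the control wins, 84/100 (84%) versus 78/100 (78%), because the control group contains mostly mild cases.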
  • Homework 10 due Friday, May 3rd. Solution.
  • April 30: Recap on tests.
  • May 2: Some technical points: efficiency of the MLE. Gaussian vectors, and transforming a Gaussian vector into independent components (Section 14.3).
  • May 7: Practice/review. Practice 1, Practice 2.
  • May 9: Practice/review
  • Friday, May 10th, 2:00-3:50PM: Final exam. Syllabus for the final.
    Final exam and elements of solution.