Statistics for Machine Learning A.Y. 2024/25
This is the home page of a Ph.D.-level course offered at the National Ph.D. in AI - Society. The program covers the basic methodologies, techniques, and tools of statistical analysis, including probability theory, random variables, convergence theorems, statistical models, estimation theory, hypothesis testing, Bayesian inference, and causal reasoning. Other topics include the bootstrap, expectation-maximization, and applications to data science problems.
The course is an extract of the M.Sc. level course Statistics for Data Science.
Instructor
- Salvatore Ruggieri
- Università di Pisa
- Office hours: Tuesdays h 14:00 - 16:00 or by appointment, at the Department of Computer Science, room 321/DO, or via Teams.
Hours and rooms
The course will be offered in one week, to be fixed in the period February 2025 - May 2025. A Teams channel will be used to post news, notes, Q&A, and other course-related material. The lectures will be live-streamed and recorded.
Pre-requisites
Students should be comfortable with most of the topics on mathematical calculus covered in:
- [P] J. Ward, J. Abdey. Mathematics and Statistics. University of London, 2013. Chapters 1-8 of Part 1.
Extra lessons to refresh these notions may be scheduled in the first part of the course.
Mandatory Teaching Material
The following is the mandatory textbook:
- [T] F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. A Modern Introduction to Probability and Statistics. Springer, 2005.
Software
Some running examples will be provided using the R programming language. However, knowledge of R is not required for the exam.
Exams
Ph.D. students may take the exam in the form of a report on an advanced topic or survey, to be agreed upon. The topic is typically related to the objectives of the student's Ph.D. studies.
Student project
- The project replaces the written part of the examination
Class calendar
Lessons will NOT be live-streamed, but recordings of past years are available here for non-attending students.
To watch the recordings online, you must be connected to the unipi.it VPN. Alternatively, right click on the link and download the whole file, then watch it locally on your device using e.g. VLC media player.
Slides and R scripts might be updated after the classes to align with actual content of lessons and to correct typos. Be sure to download the updated versions.
# | Date | Room | Topic | Mandatory teaching material |
---|---|---|---|---|
01 | 20/02 16-18 | Fib-C | Introduction. Probability and independence. rec01 (.mp4) | [T] Chpts. 1-3 slides01 (.pdf) |
02 | 22/02 14-16 | Fib-C | R basics. rec02 (.mp4) | [R] Chpts. 1,2.1-2.3 slides02 (.pdf), script02 (.R) |
03 | 23/02 11-13 | Fib-C | Bayes' rule and applications. rec03 (.mp4) | [T] Chpt. 3 slides03 (.pdf), script03 (.R) |
04 | 27/02 16-18 | Fib-C | Discrete random variables. rec04 (.mp4) | [T] Chpts. 4, 9.1, 9.2, 9.4 [R] Chpt. 3 slides04 (.pdf), script04 (.R) |
05 | 29/02 14-16 | Fib-C | Discrete random variables (continued). rec05 (.mp4) | |
06 | 01/03 11-13 | Fib-C | Review: derivatives and integrals. rec06 (.mp4) | [P] Chpts. 1-8 slides06 (.pdf), script06 (.R) |
07 | 05/03 16-18 | Fib-C | R data access and programming. rec07 (.mp4) | [R] Chpt. 2.3,2.4 script07 (.zip) |
08 | 07/03 14-16 | Fib-C | Continuous random variables. rec08 (.mp4) | [T] Chpts. 5, 9.2-9.4 [R] Chpt. 3 slides08 (.pdf), script08 (.R) |
09 | 08/03 11-13 | Fib-C | Expectation and variance. Computations with random variables. rec09 (.mp4) | [T] Chpts. 7,8 slides09 (.pdf), script09 (.R) |
10 | 12/03 16-18 | Fib-C | Expectation and variance. Computations with random variables (continued). Moments. Functions of random variables. rec10 (.mp4) | [T] Chpts. 9-11 slides10 (.pdf), script10 (.zip) |
11 | 14/03 14-16 | Fib-C | Functions of random variables (continued). Distances between distributions. rec11 (.mp4) | Murphy's book Chpt. 6 slides11 (.pdf), script11 (.R) |
12 | 15/03 11-13 | Fib-C | Simulation. rec12 (.mp4) | [T] Chpts. 6.1-6.2 slides12 (.pdf), script12 (.R) script12_sol07 (.R) |
13 | 19/03 16-18 | Fib-C | Power laws and Zipf's law. rec13 (.mp4) | Newman's paper Sect I, II, III(A,B,E,F) slides13 (.pdf), script13 (.R) |
14 | 21/03 14-16 | Fib-C | Law of large numbers. The central limit theorem. rec14 (.mp4) | [T] Chpts. 13-14 slides14 (.pdf), script14 (.R) |
15 | 22/03 11-13 | Fib-C | Graphical summaries. Kernel Density Estimation. rec15 (.mp4) | [T] Chpt. 15, [R] Chpt. 4 slides15 (.pdf), script15 (.R) |
16 | 26/03 16-18 | Fib-C | Numerical summaries. rec16 (.mp4) | [T] Chpt. 16, [R] Chpt. 4 slides16 (.pdf), script16 (.R) |
17 | 28/03 14-16 | Fib-C | Data preprocessing in R. Estimators. rec17 (.mp4) | [R] Chpt. 10, [T] Chpts. 17.1-17.3 script17 (.R), dataprep.R |
18 | 04/04 14-16 | Fib-C | Unbiased estimators. Efficiency and MSE. rec18 (.mp4) | [T] Chpts. 19, 20 slides18 (.pdf), script18 (.R) |
19 | 05/04 11-13 | Fib-C | Maximum likelihood estimation. rec19 (.mp4) | [T] Chpt. 21 s4dsln.pdf Chpt. 1 slides19 (.pdf), script19 (.R) |
20 | 09/04 16-18 | Fib-C | Linear regression. Least squares estimation. rec20 (.mp4) | [T] Chpts. 17.4,22 [R] Chpt. 6 s4dsln.pdf Chpt. 2 slides20 (.pdf), script20 (.R) |
21 | 11/04 14-16 | Fib-C | Non-linear and multiple linear regression. rec21 (.mp4) | [R] Chpt. 12.1,13,16.1-16.2 s4dsln.pdf Chpt. 2 slides21 (.pdf), script21 (.R) |
22 | 12/04 11-13 | Fib-C | Issues with linear regression. Logistic regression. rec22 (.mp4) | [R] Chpt. 12.1,13,16.1-16.2 slides22 (.pdf), script22 (.zip) |
23 | 16/04 16-18 | Fib-C | Statistical decision theory. rec23 (.mp4) | s4dsln.pdf Chpt. 4 slides23 (.pdf), script23 (.R) |
24 | 18/04 14-16 | Fib-C | Statistical decision theory (continued). rec24 (.mp4) | |
25 | 19/04 11-13 | Fib-C | Statistical decision theory (continued). Project presentation. | |
26 | 23/04 16-18 | Fib-C | Confidence intervals: mean, proportion, linear regression. rec26 (.mp4) | [T] Chpts. 23.1,23.2,23.4,24.3,24.4 s4dsln.pdf Chpt. 3 slides26 (.pdf), script26 (.R) |
27 | 30/04 16-18 | Fib-C | Confidence intervals (continued). Bootstrap and resampling methods. rec27 (.mp4) | [T] Chpts. 18.1-18.3,23.3 slides27 (.pdf), script27 (.R) |
28 | 02/05 14-16 | Fib-C | Bootstrap and resampling methods (continued). rec28 (.mp4) | |
29 | 03/05 11-13 | Fib-C | Hypothesis testing. One-sample tests of the mean and application to linear regression. rec29 (.mp4) | [T] Chpts. 25,26,27, [R] Chpts. 5.1,5.2 s4dsln.pdf Chpt. 3.3 slides29 (.pdf), script29 (.R) |
s03 | 07/05 16-18 | Fib-C | Mandatory seminar: Introduction to causal modeling and reasoning. Speakers: I. Beretta and M. Cinquini. rec_s03 (.mp4) | slides_s03 (.pdf) |
30 | 09/05 14-16 | Fib-C | One-sample tests of the mean and application to linear regression (continued). Classifier performance metrics in R. rec30 (.mp4) | slides30 (.pdf), script30 (.R) |
31 | 10/05 11-13 | Fib-C | Two-sample tests of the mean and applications to classifier comparison. rec31 (.mp4) | [T] Chpt. 28, [R] Chpts. 5.3-5.7 slides31 (.pdf), script31 (.R) |
32 | 14/05 16-18 | Fib-C | Multiple-sample tests of the mean and applications to classifier comparison. rec32 (.mp4) | [R] Chpt. 7 slides32 (.pdf), script32 (.R) |
33 | 16/05 14-16 | Fib-C | Fitting distributions. Testing independence/association. rec33 (.mp4) | [R] Chpt. 8 K-S, slides33 (.pdf), script33 (.R) |
34 | 17/05 11-13 | Fib-C | Fitting distributions. Testing independence/association (continued). Project Q&A. | |
35 | 21/05 16-18 | Fib-C | Project Q&A. | |
Past years
This 9 ECTS course replaces an older 6 ECTS version, Statistical Methods for Data Science A.Y. 2020/21 (500PP), which is discontinued. Students who have the 6 ECTS version in their study plan can still take its exam for A.Y. 2021/22, 2022/23 and 2023/24. However, there will be no specific project for the 6 ECTS version.