phdai:sml [20/10/2024 at 11:55 (5 months ago)] – [Exams] Salvatore Ruggieri | phdai:sml [25/02/2025 at 21:27 (2 weeks ago)] (current version) – Salvatore Ruggieri |
---|---|
| |
This is the home page of a Ph.D. level course offered at the National Ph.D. in AI - Society. | This is the home page of a Ph.D. level course offered at the National Ph.D. in AI - Society. |
The program covers the basic methodologies, techniques and tools of statistical analysis. This includes basic knowledge of probability theory, random variables, convergence theorems, statistical models, estimation theory, hypothesis testing, bayesian inference, causal reasoning. Other topics covered include bootstrap, expectation-maximization, and applications to data science problems. | The program covers the basic methodologies, techniques and tools of statistical analysis. This includes basic knowledge of probability theory, random variables, convergence theorems, statistical models, estimation theory, hypothesis testing, bayesian inference, causal reasoning. Other topics covered include bootstrap, expectation-maximization, and applications to machine learning problems. |
| |
The course is an extract of the M.Sc. level course [[mds:sds:start|Statistics for Data Science]]. | The course is an extract of the M.Sc. level course [[mds:sds:start|Statistics for Data Science]]. |
| |
=====Instructor===== | =====Instructors===== |
| |
| * **Andrea Pugnana** |
| * Università di Pisa |
| * [[https://andrepugni.github.io/]] |
| * [[[email protected]]] |
| |
* **Salvatore Ruggieri** | * **Salvatore Ruggieri** |
* [[http://pages.di.unipi.it/ruggieri/]] | * [[http://pages.di.unipi.it/ruggieri/]] |
* [[[email protected]]] | * [[[email protected]]] |
* **Office hours:** Tuesdays h 14:00 - 16:00 or by appointment, at the Department of Computer Science, room 321/DO, or via Teams. | |
| |
=====Hours and rooms===== | |
| |
The course will be offered in one week to be fixed in the period February 2025 - May 2025. | |
A Teams channel will be used to post news, notes, Q&A, and other stuff related to the course. The lectures will be live-streamed and recorded. | |
| |
=====Pre-requisites===== | =====Pre-requisites===== |
* **[P]** J. Ward, J. Abdey. **Mathematics and Statistics**. University of London, 2013. __Chapters 1-8 of Part 1__. | * **[P]** J. Ward, J. Abdey. **Mathematics and Statistics**. University of London, 2013. __Chapters 1-8 of Part 1__. |
| |
Extra-lessons refreshing such notions may be planned in the first part of the course. | You can refresh such notions through this [[http://131.114.72.230/sds/video/sds06_20220225.mp4|recording]] and {{:mds:sds:s4ds06.pdf|slides}}. |
| |
| |
=====Exams===== | =====Exams===== |
| |
Ph.D. students may do an exam in the form of a report on an advanced topic/survey to be agreed upon. The topics is typically related/relevant to the objectives of the Ph.D. studies of the student. | Ph.D. students may do an exam (on voluntary basis) in the form of a report and a presentation on an advanced topic/survey to be agreed upon. The topic is typically related/relevant to the objectives of the Ph.D. studies of the student. |
=====Student project===== | |
| |
* The project replaces the written part of the examination | |
* {{:mds:sds:s4ds.project.2024.pdf |Project description and rules and Q&A}}. | |
* [[http://131.114.72.230/sds/video/s4ds.project.2024.mp4|Recording of project description (.mp4)]] | |
| |
| Ph.D. students will receive an attendance statement if they attend at least 7 out of the 10+1 classes. |
| |
=====Class calendar===== | =====Class calendar===== |
| |
| Please, subscribe to the [[https://teams.microsoft.com/l/team/19%3AWNufde35gV7296FvkTLCt7tHKShmbJQTqZT8h5lsPMA1%40thread.tacv2/conversations?groupId=914fe1d1-5fb2-4023-be47-e5a9e4eda8a8&tenantId=c7456b31-a220-47f5-be52-473828670aa1|course Teams channel]] to receive updates on the course. |
| |
Lessons will be **NOT** be live-streamed, but recordings of past years are available here for non-attending students.\\ | Lessons will be both live-streamed (see Teams channel) and in presence at the Department of Computer Science, University of Pisa. |
| |
To watch the recordings online, you must be connected to the [[https://start.unipi.it/en/help-ict/vpn/|unipi.it VPN]]. Alternatively, right click on the link and download the whole file, then watch it locally on your device using e.g. [[http://www.videolan.org/vlc/|VLC media player]]. | Teaching material might be updated **after the classes** to align with actual content and to correct typos. //Be sure to download the updated versions.// |
| |
Slides and R scripts might be updated **after the classes** to align with actual content of lessons and to correct typos. Be sure to download the updated versions. | |
| |
^ # ^ Date ^ Room ^ Topic ^ Mandatory teaching material ^ | |
|01| 20/02 16-18| Fib-C | Introduction. Probability and independence. [[http://131.114.72.230/sds/video/sds01_20220215.mp4|rec01 (.mp4)]] | **[T]** Chpts. 1-3 {{:mds:sds:s4ds01.pdf|slides01 (.pdf)}}| | |
|02| 22/02 14-16| Fib-C | R basics. [[http://131.114.72.230/sds/video/sds02_20220217.mp4|rec02 (.mp4)]] | **[R]** Chpts. 1,2.1-2.3 {{:mds:sds:s4ds02.pdf|slides02 (.pdf)}}, {{:mds:sds:s4ds02.r|script02 (.R)}}| | |
|03| 23/02 11-13| Fib-C | Bayes' rule and applications. [[http://131.114.72.230/sds/video/sds03_20220218.mp4|rec03 (.mp4)]] | **[T]** Chpt. 3 {{:mds:sds:s4ds03.pdf|slides03 (.pdf)}}, {{:mds:sds:s4ds03.r|script03 (.R)}}| | |
|04| 27/02 16-18 | Fib-C | Discrete random variables. [[http://131.114.72.230/sds/video/sds04_20220222.mp4|rec04 (.mp4)]] | **[T]** Chpts. 4, 9.1, 9.2, 9.4 **[R]** Chpt. 3 {{:mds:sds:s4ds04.pdf|slides04 (.pdf)}}, {{:mds:sds:s4ds04.r|script04 (.R)}}| | |
|05| 29/02 14-16 | Fib-C | Discrete random variables (continued). [[http://131.114.72.230/sds/video/sds05_20220224.mp4|rec05 (.mp4)]] | | | |
|06| 01/03 11-13 | Fib-C | Recalls: derivatives and integrals. [[http://131.114.72.230/sds/video/sds06_20220225.mp4|rec06 (.mp4)]] | **[P]** Chpt. 1-8 {{:mds:sds:s4ds06.pdf|slides06 (.pdf)}}, {{:mds:sds:s4ds06.r|script06 (.R)}}| | |
|07| 05/03 16-18 | Fib-C | R data access and programming. [[http://131.114.72.230/sds/video/sds07_20220301.mp4|rec07 (.mp4)]] | **[R]** Chpt. 2.3,2.4 {{:mds:sds:s4ds07.zip|script07 (.zip)}} | | |
|08| 07/03 14-16 | Fib-C | Continuous random variables.[[http://131.114.72.230/sds/video/sds08_20220303.mp4|rec08 (.mp4)]] | **[T]** Chpts. 5, 9.2-9.4 **[R]** Chpt. 3 {{:mds:sds:s4ds08.pdf|slides08 (.pdf)}}, {{:mds:sds:s4ds08.r|script08 (.R)}}| | |
|09| 08/03 11-13 | Fib-C | Expectation and variance. Computations with random variables.[[http://131.114.72.230/sds/video/sds09_20220304v2.mp4|rec09 (.mp4)]] | **[T]** Chpts. 7,8 {{:mds:sds:s4ds09.pdf|slides09 (.pdf)}}, {{:mds:sds:s4ds09.r|script09 (.R)}}| | |
|10| 12/03 16-18| Fib-C | Expectation and variance. Computations with random variables (continued). Moments. Functions of random variables. [[http://131.114.72.230/sds/video/sds10_20220308v3.mp4|rec10 (.mp4)]] | **[T]** Chpts. 9-11 {{:mds:sds:s4ds10.pdf|slides10 (.pdf)}}, {{:mds:sds:s4ds10.zip|script10 (.zip)}} | | |
|11| 14/03 14-16| Fib-C | Functions of random variables (continued). Distances between distributions. [[http://131.114.72.230/sds/video/sds11_20240314.mp4|rec11 (.mp4)]] | {{:mds:sds:murphychpt6.pdf|Murphy's book}} Chpt. 6 {{:mds:sds:s4ds11.pdf|slides11 (.pdf)}}, {{:mds:sds:s4ds11.R|script11 (.R)}} | | |
|12| 15/03 11-13 | Fib-C | Simulation. [[http://131.114.72.230/sds/video/sds12_20220311v2.mp4|rec12 (.mp4)]] | **[T]** Chpts. 6.1-6.2 {{:mds:sds:s4ds12.pdf|slides12 (.pdf)}}, {{:mds:sds:s4ds12.r|script12 (.R)}} {{:mds:sds:s4ds12_sol07.r|script12_sol07 (.R)}}| | |
|13| 19/03 16-18 | Fib-C | Power laws and Zipf's law. [[http://131.114.72.230/sds/video/sds13_20220315.mp4|rec13 (.mp4)]] | [[https://arxiv.org/pdf/cond-mat/0412004.pdf | Newman's paper]] Sect I, II, III(A,B,E,F) {{:mds:sds:s4ds13.pdf|slides13 (.pdf)}}, {{:mds:sds:s4ds13.r|script13 (.R)}}| | |
|14| 21/03 14-16| Fib-C | Law of large numbers. The central limit theorem. [[http://131.114.72.230/sds/video/sds14_20220317.mp4|rec14 (.mp4)]] | **[T]** Chpts. 13-14 {{:mds:sds:s4ds14.pdf|slides14 (.pdf)}}, {{:mds:sds:s4ds14.R|script14 (.R)}} | | |
|15| 22/03 11-13 | Fib-C | Graphical summaries. Kernel Density Estimation. [[http://131.114.72.230/sds/video/sds15_20220322.mp4|rec15 (.mp4)]] | **[T]** Chpt. 15, **[R]** Chpt. 4 {{:mds:sds:s4ds15.pdf|slides15 (.pdf)}}, {{:mds:sds:s4ds15.r|script15 (.R)}}| | |
|16| 26/03 16-18| Fib-C | Numerical summaries.[[http://131.114.72.230/sds/video/sds16_20220324.mp4|rec16 (.mp4)]] | **[T]** Chpt. 16, **[R]** Chpt. 4 {{:mds:sds:s4ds16.pdf|slides16 (.pdf)}}, {{:mds:sds:s4ds16.r|script16 (.R)}} | | |
|17| 28/03 14-16 | Fib-C |Data preprocessing in R. Estimators.[[http://131.114.72.230/sds/video/sds17_20220325.mp4|rec17 (.mp4)]] | **[R]** Chpt. 10, **[T]** Chpts. 17.1-17.3{{:mds:sds:s4ds17.r|script17 (.R)}}, {{ :mds:sds:dataprep.r | dataprep.R}} | | |
|18| 04/04 14-16 | Fib-C | Unbiased estimators. Efficiency and MSE.[[http://131.114.72.230/sds/video/sds18_20220329.mp4|rec18 (.mp4)]] | **[T]** Chpts. 19, 20 {{:mds:sds:s4ds18.pdf|slides18 (.pdf)}}, {{:mds:sds:s4ds18.r|script18 (.R)}} | | |
|19| 05/04 11-13 | Fib-C | Maximum likelihood estimation.[[http://131.114.72.230/sds/video/sds19_20220331.mp4|rec19 (.mp4)]] | **[T]** Chpt. 21 {{ :mds:sds:s4dsln.pdf |}} Chpt. 1 {{:mds:sds:s4ds19.pdf|slides19 (.pdf)}}, {{:mds:sds:s4ds19.r|script19 (.R)}} | | |
|20| 09/04 16-18 | Fib-C | Linear regression. Least squares estimation.[[http://131.114.72.230/sds/video/sds20_20220405.mp4|rec20 (.mp4)]] | **[T]** Chpts. 17.4,22 **[R]** Chpt. 6 {{ :mds:sds:s4dsln.pdf |}} Chpt. 2 {{:mds:sds:s4ds20.pdf|slides20 (.pdf)}}, {{:mds:sds:s4ds20.r|script20 (.R)}} | | |
|21| 11/04 14-16 | Fib-C | Non-linear, and multiple linear regression.[[http://131.114.72.230/sds/video/sds21_20220407.mp4|rec21 (.mp4)]] | **[R]** Chpt. 12.1,13,16.1-16.2 {{ :mds:sds:s4dsln.pdf |}} Chpt. 2 {{:mds:sds:s4ds21.pdf|slides21 (.pdf)}}, {{:mds:sds:s4ds21.R|script21 (.R)}} | | |
|22| 12/04 11-13 | Fib-C | Issues with linear regression. Logistic regression.[[http://131.114.72.230/sds/video/sds22_20220408.mp4|rec22 (.mp4)]] | **[R]** Chpt. 12.1,13,16.1-16.2 {{:mds:sds:s4ds22.pdf|slides22 (.pdf)}}, {{:mds:sds:s4ds22.zip|script22 (.zip)}} | | |
|23| 16/04 16-18 | Fib-C | Statistical decision theory.[[http://131.114.72.230/sds/video/sds23_20220412.mp4|rec23 (.mp4)]] | {{ :mds:sds:s4dsln.pdf |}} Chpt. 4 {{:mds:sds:s4ds23.pdf|slides23 (.pdf)}}, {{:mds:sds:s4ds23.r|script23 (.R)}} | | |
|24| 18/04 14-16 | Fib-C | Statistical decision theory (continued).[[http://131.114.72.230/sds/video/sds24_20220421.mp4|rec24 (.mp4)]] | | | |
|25| 19/04 11-13 | Fib-C | Statistical decision theory (continued). Project presentation. | | | |
|26| 23/04 16-18| Fib-C | Confidence intervals: mean, proportion, linear regression.[[http://131.114.72.230/sds/video/sds26_20220422.mp4|rec26 (.mp4)]] | **[T]** Chpts. 23.1,23.2,23.4,24.3,24.4 {{ :mds:sds:s4dsln.pdf |}} Chpt. 3 {{:mds:sds:s4ds26.pdf|slides26 (.pdf)}}, {{:mds:sds:s4ds26.r|script26 (.R)}} | | |
|27| 30/04 16-18| Fib-C | Confidence intervals (continued). Bootstrap and resampling methods.[[http://131.114.72.230/sds/video/sds27_20220426.mp4|rec27 (.mp4)]] | **[T]** Chpts. 18.1-18.3,23.3 {{:mds:sds:s4ds27.pdf|slides27 (.pdf)}}, {{:mds:sds:s4ds27.r|script27 (.R)}} | | |
|28| 02/05 14-16| Fib-C | Bootstrap and resampling methods (continued).[[http://131.114.72.230/sds/video/sds28_20220428.mp4|rec28 (.mp4)]] | | | |
|29| 03/05 11-13| Fib-C | Hypotheses testing. One-sample tests of the mean and application to linear regression.[[http://131.114.72.230/sds/video/sds29_20220429.mp4|rec29 (.mp4)]] | **[T]** Chpts. 25,26,27, **[R]** Chpts. 5.1,5.2 {{ :mds:sds:s4dsln.pdf |}} Chpt.3.3 {{:mds:sds:s4ds29.pdf|slides29 (.pdf)}}, {{:mds:sds:s4ds29.r|script29 (.R)}} | | |
|s03| 07/05 16-18| Fib-C | //Mandatory seminar:// Introduction to causal modeling and reasoning. Speakers: I. Beretta and M. Cinquini. [[http://131.114.72.230/sds/video/sds_s03_20240507.mp4|rec_s03 (.mp4)]] | {{:mds:sds:s4ds_s03.pdf|slides_s03 (.pdf)}}| | |
|30| 09/05 14-16| Fib-C | One-sample tests of the mean and application to linear regression (continued). Classifier performance metrics in R. [[http://131.114.72.230/sds/video/sds30_2022mix.mp4|rec30 (.mp4)]] | {{:mds:sds:s4ds30.pdf|slides30 (.pdf)}}, {{:mds:sds:s4ds30.r|script30 (.R)}} | | |
|31| 10/05 11-13| Fib-C | Two-sample tests of the mean and applications to classifier comparison. [[http://131.114.72.230/sds/video/sds31_2022mix.mp4|rec31 (.mp4)]] | **[T]** Chpt. 28, **[R]** Chpts. 5.3-5.7 {{:mds:sds:s4ds31.pdf|slides31 (.pdf)}}, {{:mds:sds:s4ds31.r|script31 (.R)}} | | |
|32| 14/05 16-18| Fib-C | Multiple-sample tests of the mean and applications to classifier comparison.[[http://131.114.72.230/sds/video/sds32_2022mix.mp4|rec32 (.mp4)]] | **[R]** Chpt. 7 {{:mds:sds:s4ds32.pdf|slides32 (.pdf)}}, {{:mds:sds:s4ds32.r|script32 (.R)}} | | |
|33| 16/05 14-16| Fib-C | Fitting distributions. Testing independence/association.[[http://131.114.72.230/sds/video/sds33_2022mix.mp4|rec33 (.mp4)]] | **[R]** Chpt. 8 {{ :mds:smd:ks.pdf | K-S}}, {{:mds:sds:s4ds33.pdf|slides33 (.pdf)}}, {{:mds:sds:s4ds33.r|script33 (.R)}} | | |
|34| 17/05 11-13| Fib-C | Fitting distributions. Testing independence/association (continued). Project Q&A. | | | |
|35| 21/05 16-18| Fib-C | Project Q&A. | | | |
| |
| |
| |
=====Seminars of past years===== | |
| |
In some years, speakers were invited to give a seminar on advanced topics. Here it is a list of seminars held in past years. | |
| |
^ # ^ Date ^ Room ^ Topic ^ Teaching material ^ | ^ # ^ Date ^ Room ^ Topic ^ Teaching material ^ |
|s01| 04/05/2022 9-11| Gerace+Teams | Bias in statistics and causal reasoning. Speaker: prof. Fabrizia Mealli [[http://131.114.72.230/sds/video/sds_s01_20220504.mp4|rec_s01 (.mp4)]] | {{:mds:sds:s4ds_s01.pdf|slides_s01 (.pdf)}} [[https://statistics.fas.harvard.edu/files/statistics-2/files/statistical_paradises_and_paradoxes.pdf|Optional reading]] | | |01| 17/3 11-13|Sem. Est| Introduction. Probability and independence. Bayes' rule. Speaker: A. Pugnana| ... | |
|s02| 04/05/2022 11-13| Gerace+Teams | Bias in statistics and causal reasoning (continued). Speaker: prof. Fabrizia Mealli [[http://131.114.72.230/sds/video/sds_s02_20220504.mp4|rec_s02 (.mp4)]] | | | |02| 20/3 14-16| Sem. Est| Discrete and continuous random variables. Speaker: A. Pugnana| ... | |
| |03| 24/3 11-13| Sem. Est| Expectation and variance. Computations with random variables. Moments. Speaker: A. Pugnana| ... | |
=====Past years===== | |04| 27/3 14-16| Sem. Est| Functions of random variables. Distances between distributions. Simulation. Speaker: A. Pugnana| ... | |
| |05| 31/3 11-13| Sem. Est| Law of large numbers. The central limit theorem. Graphical summaries. Kernel Density Estimation. Numerical summaries. Speaker: A. Pugnana| ... | |
* [[mds:sds:2022|Statistics for Data Science A.Y. 2022/23]] | |06| 10/4 16-18 | Sem. Est | Unbiased estimators. Efficiency and MSE. Maximum likelihood estimation. Speaker: S. Ruggieri. | ... | |
| |07| 14/4 11-13 | Sem. Est | Statistical decision theory. Speaker: S. Ruggieri. | ... | |
Moreover, this course of 9 ECTS replaces an older 6 ECTS version: [[mds:smd: |Statistical Methods for Data Science A.Y. 2020/21 (500PP)]]. The 6 ECTS version is discontinued. Students having the 6 ECTS version in their study plan can still take the 6 ECTS version exam for the A.Y. 2021/22, 2022/23 and 2023/24. However, there will no specific project for the 6 ECTS version. | |08| 28/4 11-13 | Sem. Ovest | Confidence intervals and Hypotheses testing. Fitting distributions. Testing independence/association. Speaker: S. Ruggieri. | ... | |
| |09| 5/5 11-13 | Sem. Est | Bootstrap and resampling methods. Speaker: S. Ruggieri. | ... | |
| |10| 8/5 16-18 | Sem. Ovest | Multiple-sample tests of the mean and applications to classifier comparison. Speaker: S. Ruggieri. | ... | |
| |Extra| 12/05 14-16 | Fib-C | //Seminar:// Introduction to causal modeling and reasoning. Speakers: I. Beretta and M. Cinquini. | ... | |
| |