mds:sds:2022 [12/04/2023 at 11:44 (23 months ago)] – removed, Salvatore Ruggieri
mds:sds:2022 [08/08/2024 at 12:34 (7 months ago)] (current version), Salvatore Ruggieri
 +====== Statistics for Data Science (628PP) A.Y. 2022/23 ======
 +  * **Salvatore Ruggieri**
 +    * Università di Pisa
 +    * [[]]
 +    * [[[email protected]]]   
 +    * **Office hours:** Wednesdays 14:00 - 16:00 or by appointment, at the Department of Computer Science, room 321/DO, or via Teams.
 +^  Day of Week  ^  Hour  ^  Room  ^ 
 +|  Wednesday  |  9:00 - 11:00  |  Fib-C  | 
 +|  Thursday  |  11:00 - 13:00  |  Fib-C  | 
 +|  Friday  |  14:00 - 16:00  |  Fib-C  | 
 +Students should be comfortable with most of the topics on mathematical calculus covered in:
 +  * **[P]** J. Ward, J. Abdey. **Mathematics and Statistics**. University of London, 2013. __Chapters 1-8 of Part 1__.
 +Extra lessons to refresh these notions may be scheduled in the first part of the course.
 +=====Mandatory Teaching Material=====
 +The following are //mandatory textbooks//:
 +  * **[T]** F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. **A Modern Introduction to Probability and Statistics**. Springer, 2005.
 +  * **[R]** P. Dalgaard. **Introductory Statistics with R**. 2nd edition, Springer, 2008.
 +  * selected chapters of other books for advanced topics 
 +  * [[|R]]
 +  * [[|R Studio]]
 +=====Preliminary program and calendar=====
 +  * [[|Preliminary program]].
 +  * [[|Calendar of lessons]].
 +__//There are no mid-terms//.__ The exam consists of a written part and an oral part. The written part consists of open questions and exercises on the topics of the course. Each question is assigned a number of points, for a total of 30. Students are admitted to the oral part if they score at least 18 points. Example written texts: **{{ :mds:sds:sds_sample1.pdf | sample1}}**, **{{ :mds:sds:sds_sample2.pdf | sample2}}**. The oral part consists of a critical discussion of the written part, plus open questions and problem solving on the topics of the course (both theory and R programming).
 +Registration for exams is mandatory (**beware of the registration deadline!**): [[|register here]]\\
 +^  Date  ^  Hour  ^  Room  ^  Notes  ^
 +|  6/3/2024  |  11:00 - 13:00  |  Dip. Inf. - Seminari Est  |  [[|Extra-ordinary exam]]  |
 +=====Student project=====
 +  * The project replaces the written part of the examination
 +  * {{:mds:sds:sds.project.2023.pdf |Project description, rules, and Q&A}}.
 +=====Teams channel =====
 +A [[|Teams channel]] will be used to post news, Q&A, and other material related to the course.
 +=====Class calendar=====
 +Lessons will **NOT** be live-streamed, but recordings from past years are available here for non-attending students.\\
 +To watch the recordings online, you must be connected to the [[| VPN]]. Alternatively, right-click on the link, download the whole file, and watch it locally on your device using e.g. [[|VLC media player]].
 +Slides and R scripts might be updated after the classes to align with the actual content of the lessons and to correct typos. Be sure to download the updated versions.
 +^ # ^ Date ^ Room ^ Topic ^ Teaching material ^ 
 +|01| 22/02 9-11| Fib-C | Introduction. Probability and independence.  [[|rec01 (.mp4)]] | **[T]** Chpts. 1-3 {{:mds:sds:sds01.pdf|slides01 (.pdf)}}|
 +|02| 23/02 11-13| Fib-C | R basics. [[|rec02 (.mp4)]] | **[R]** Chpts. 1,2.1-2.3 {{:mds:sds:sds02.pdf|slides02 (.pdf)}},  {{:mds:sds:sds02.r|script02 (.R)}}| 
 +|03| 24/02 14-16| Fib-E | Bayes' rule and applications. [[|rec03 (.mp4)]] | **[T]** Chpt. 3 {{:mds:sds:sds03.pdf|slides03 (.pdf)}},  {{:mds:sds:sds03.r|script03 (.R)}}| 
 +|04| 01/03 9-11 | Fib-C | Discrete random variables. [[|rec04 (.mp4)]] | **[T]** Chpts. 4, 9.1, 9.2, 9.4 **[R]** Chpt. 3 {{:mds:sds:sds04.pdf|slides04 (.pdf)}},  {{:mds:sds:sds04.r|script04 (.R)}}| 
 +|05| 02/03 11-13 | Fib-C | Discrete random variables (continued). [[|rec05 (.mp4)]] |  |
 +|06| 03/03 14-16 | Fib-C | Review: derivatives and integrals. [[|rec06 (.mp4)]] | **[P]** Chpts. 1-8 {{:mds:sds:sds06.pdf|slides06 (.pdf)}},  {{:mds:sds:sds06.r|script06 (.R)}}|
 +|07| 08/03 9-11 | Fib-C | R data access and programming. [[|rec07 (.mp4)]] | **[R]** Chpt. 2.3,2.4 {{|script07 (.zip)}} |
 +|08| 09/03 11-13 | Fib-C | Continuous random variables.[[|rec08 (.mp4)]] | **[T]** Chpts. 5, 9.2-9.4 **[R]** Chpt. 3 {{:mds:sds:sds08.pdf|slides08 (.pdf)}},  {{:mds:sds:sds08.r|script08 (.R)}}| 
 +|09| 10/03 14-16 | Fib-C | Expectation and variance. Computations with random variables.[[|rec09 (.mp4)]] | **[T]** Chpts. 7,8 {{:mds:sds:sds09.pdf|slides09 (.pdf)}},  {{:mds:sds:sds09.r|script09 (.R)}}|
 +|10| 15/03 9-11| Fib-C | Expectation and variance. Computations with random variables (continued).[[|rec10 (.mp4)]] |   |
 +|11| 16/03 11-13| Fib-C | Moments. Functions of random variables.[[|rec11 (.mp4)]] | **[T]** Chpts. 9-11 {{:mds:sds:sds11.pdf|slides11 (.pdf)}}, {{|script11 (.zip)}} |
 +|12| 17/03 14-16 | Fib-C | Simulation. [[|rec12 (.mp4)]] | **[T]** Chpts. 6.1-6.2 {{:mds:sds:sds12.pdf|slides12 (.pdf)}},  {{:mds:sds:sds12.r|script12 (.R)}} {{:mds:sds:sds12_sol07.r|script12_sol07 (.R)}}|
 +|13| 22/03 9-11 | Fib-C | Power laws and Zipf's law. [[|rec13 (.mp4)]] | [[ | Newman's paper]] Sect I, II, III(A,B,E,F) {{:mds:sds:sds13.pdf|slides13 (.pdf)}},  {{:mds:sds:sds13.r|script13 (.R)}}|
 +|14| 23/03 11-13| Fib-C | Law of large numbers. The central limit theorem. [[|rec14 (.mp4)]] | **[T]** Chpts. 13-14  {{:mds:sds:sds14.pdf|slides14 (.pdf)}}, {{:mds:sds:sds14.R|script14 (.R)}} |
 +|15| 24/03 14-16 | Fib-C | Graphical summaries. Kernel Density Estimation. [[|rec15 (.mp4)]] | **[T]** Chpt. 15, **[R]** Chpt. 4  {{:mds:sds:sds15.pdf|slides15 (.pdf)}}, {{:mds:sds:sds15.r|script15 (.R)}}|
 +|16| 29/03 9-11| Fib-C | Numerical summaries.[[|rec16 (.mp4)]] | **[T]** Chpt. 16, **[R]** Chpt. 4 {{:mds:sds:sds16.pdf|slides16 (.pdf)}}, {{:mds:sds:sds16.r|script16 (.R)}} |
 +|17| 30/03 11-13 | Fib-C | Data preprocessing in R. Estimators. [[|rec17 (.mp4)]] | **[R]** Chpt. 10, **[T]** Chpts. 17.1-17.3 {{:mds:sds:sds17.r|script17 (.R)}}, {{ :mds:sds:dataprep.r | dataprep.R}} |
 +|18| 31/03 14-16 | Fib-C  | Unbiased estimators. Efficiency and MSE.[[|rec18 (.mp4)]] | **[T]** Chpts. 19, 20  {{:mds:sds:sds18.pdf|slides18 (.pdf)}}, {{:mds:sds:sds18.r|script18 (.R)}} |
 +|19| 05/04 9-11 | Fib-C | Maximum likelihood estimation.[[|rec19 (.mp4)]] | **[T]** Chpt. 21 {{ :mds:sds:sdsln.pdf |}} Chpt. 1  {{:mds:sds:sds19.pdf|slides19 (.pdf)}}, {{:mds:sds:sds19.r|script19 (.R)}} |
 +|20| 06/04 11-13 | Fib-C  | Linear regression. Least squares estimation.[[|rec20 (.mp4)]] | **[T]** Chpts. 17.4,22 **[R]** Chpt. 6 {{ :mds:sds:sdsln.pdf |}} Chpt. 2 {{:mds:sds:sds20.pdf|slides20 (.pdf)}}, {{:mds:sds:sds20.r|script20 (.R)}} |
 +|21| 12/04 9-11 | Fib-C  | Non-linear and multiple linear regression. [[|rec21 (.mp4)]] | **[R]** Chpt. 12.1,13,16.1-16.2 {{ :mds:sds:sdsln.pdf |}} Chpt. 2 {{:mds:sds:sds21.pdf|slides21 (.pdf)}}, {{:mds:sds:sds21.R|script21 (.R)}} |
 +|22| 13/04 11-13 | Fib-C  | Issues with linear regression. Logistic regression.[[|rec22 (.mp4)]] | **[R]** Chpt. 12.1,13,16.1-16.2 {{:mds:sds:sds22.pdf|slides22 (.pdf)}}, {{|script22 (.zip)}} |
 +|23| 14/04 14-16 | Fib-C  | Statistical decision theory.[[|rec23 (.mp4)]] | {{ :mds:sds:sdsln.pdf |}} Chpt. 4 {{:mds:sds:sds23.pdf|slides23 (.pdf)}}, {{:mds:sds:sds23.r|script23 (.R)}} |
 +|24| 19/04 9-11 | Fib-C  | Statistical decision theory (continued).[[|rec24 (.mp4)]] |  |
 +|25| 20/04 11-13 | Fib-C | Statistical decision theory (continued). Project presentation. | [[|See student project]] |
 +|26| 21/04 14-16 | Fib-C | Confidence intervals: mean, proportion, linear regression.[[|rec26 (.mp4)]] | **[T]** Chpts. 23.1,23.2,23.4,24.3,24.4 {{ :mds:sds:sdsln.pdf |}} Chpt. 3 {{:mds:sds:sds26.pdf|slides26 (.pdf)}}, {{:mds:sds:sds26.r|script26 (.R)}} |
 +|27| 26/04 9-11| Fib-C| Bootstrap and resampling methods.[[|rec27 (.mp4)]] | **[T]** Chpts. 18.1-18.3,23.3 {{:mds:sds:sds27.pdf|slides27 (.pdf)}}, {{:mds:sds:sds27.r|script27 (.R)}} |
 +|28| 27/04 11-13| Fib-C | Bootstrap and resampling methods (continued).[[|rec28 (.mp4)]] |   |
 +|29| 28/04 14-16| Fib-C | Hypothesis testing. One-sample tests of the mean and application to linear regression. [[|rec29 (.mp4)]]  | **[T]** Chpts. 25,26,27, **[R]** Chpts. 5.1,5.2 {{ :mds:sds:sdsln.pdf |}} Chpt. 3.3 {{:mds:sds:sds29.pdf|slides29 (.pdf)}}, {{:mds:sds:sds29.r|script29 (.R)}} |
 +|30| 3/05 9-11| Fib-C  | One-sample tests of the mean and application to linear regression (continued).[[|rec30 (.mp4)]] |  |
 +|31| 4/05 11-13| Fib-C  | Two-sample tests of the mean and applications to classifier comparison. [[|rec31 (.mp4)]] | **[T]** Chpt. 28, **[R]** Chpts. 5.3-5.7 {{:mds:sds:sds31.pdf|slides31 (.pdf)}}, {{:mds:sds:sds31.r|script31 (.R)}}  |
 +|32| 5/05 14-16| Fib-C  | Two-sample tests of the mean and applications to classifier comparison (continued).[[|rec32 (.mp4)]] |  |
 +|33| 10/05 9-11| Fib-C | Multiple-sample tests of the mean and applications to classifier comparison.[[|rec33 (.mp4)]] | **[R]** Chpt. 7 {{:mds:sds:sds33.pdf|slides33 (.pdf)}}, {{:mds:sds:sds33.r|script33 (.R)}} |
 +|34| 11/05 11-13| Fib-C | Fitting distributions. Testing independence/association.[[|rec34 (.mp4)]] | **[R]** Chpt. 8 {{ :mds:smd:ks.pdf | K-S}}, {{:mds:sds:sds34.pdf|slides34 (.pdf)}}, {{:mds:sds:sds34.r|script34 (.R)}} |
 +|35| 12/05 14-16| Fib-C  | Fitting distributions. Testing independence/association (continued). Project Q&A.  |  |
 +|36| 17/05 9-11| Fib-C | Project Q&A.  |  |
 +=====Seminars of past years=====
 +In some years, speakers were invited to give a seminar on advanced topics. Here is a list of seminars held in past years.
 +^ # ^ Date ^ Room ^ Topic ^ Teaching material ^ 
 +|s01| 04/05/2022 9-11| Gerace+Teams | Bias in statistics and causal reasoning. Speaker: prof. Fabrizia Mealli [[|rec_s01 (.mp4)]]  | {{:mds:sds:s4ds_s01.pdf|slides_s01 (.pdf)}} [[|Optional reading]] |
 +|s02| 04/05/2022 11-13| Gerace+Teams | Bias in statistics and causal reasoning (continued). Speaker: prof. Fabrizia Mealli [[|rec_s02 (.mp4)]]  |   |
 +=====Past years=====
 +This 9 ECTS course replaces an older 6 ECTS version: [[mds:smd: |Statistical Methods for Data Science A.Y. 2020/21 (500PP)]]. The 6 ECTS version is discontinued. Students who have the 6 ECTS version in their study plan can still take the 6 ECTS version exam for the A.Y. 2021/22, 2022/23 and 2023/24. However, there will be no specific project for the 6 ECTS version.
mds/sds/2022.1681299887.txt.gz · Last modified: 12/04/2023 at 11:44 (23 months ago) by Salvatore Ruggieri
