====== Decision Support Systems - Module II (6 ECTS): LABORATORY OF DATA SCIENCE (2023/2024) ======

This is the second module of [[mds:dss:start|Decision Support Systems]] (801AA, 12 ECTS), previously called Laboratory of Data Science (664AA, 6 ECTS).

**Instructors**:

  * **Anna Monreale**
    * KDD Laboratory, Università di Pisa
    * [[http://pages.di.unipi.it/amonreale/]]
    * [[anna.monreale@unipi.it]]
    * Office hours: Tuesday 11:00-13:00, online on Teams or at the Department of Computer Science, room 374/E (please ask for an appointment by email).
    * Telephone +39-050-2213119
  * **Cristiano Landi**
    * KDD Laboratory, Università di Pisa
    * [[cristiano.landi@phd.unipi.it]]
    * Office hours: Tuesday 14:00-16:00, online on Teams or at the Department of Computer Science, room 343 (please ask for an appointment by email).

====== Hours and Rooms ======

**Classes **

^ Day of Week ^ Hour ^ Room ^
| Tuesday | 09:00 - 11:00 | Room Lab. M |
| Wednesday | 14:00 - 16:00 | Room Lab. H |

A [[https://teams.microsoft.com/l/channel/19%3a8a60419ca5ec46dabe98174af70283e1%40thread.tacv2/Module%2520II%2520-%2520Laboratory%2520of%2520Data%2520Science?groupId=6bc87f32-e2c1-46b8-9c9f-928cae8bbe4d&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Teams channel]] will be used ONLY to post news, Q&A, and other material related to the course.

Lectures are held in person only and will **NOT** be live-streamed. Recordings of this year's lectures (or of previous years' lectures) will be made available here for non-attending students.

====== Learning Material ======

===== Slides & Recordings of the classes =====

  * The slides used in the course will be added to the calendar after each class.
  * Recordings of each lecture will be made available for non-attending students.
===== Past Exams =====

  * {{ :mds:lbi:2016midterm1text.pdf |2016/17 text}}, {{ :mds:lbi:2015fallmidterm1text.pdf | 2015/16 text}} and {{ :mds:lbi:2015wintermidterm1.zip | 2015/16 solution}}, {{:mds:lbi:2015midterm1text.pdf | 2014/15 text}} and {{ :mds:lbi:2015midterm1.zip |2014/15 solution}}, {{ :mds:lbi:2014midterm1text.pdf | 2013/14 text}}, {{ :mds:lbi:2013midterm1.pdf | 2012/13 text }} and {{ :mds:lbi:2013midterm1.zip |2012/13 solution}}.

===== Software =====

  * Anaconda with Python 3.7 (please avoid Python 3.8)
  * SQL Server 2019 Developer Edition or newer, with [[https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-ver16|SQL Server 2019 Management Studio]].
  * Visual Studio Community 2022. Include the SSDT workload in the Visual Studio Installer. Instructions in Italian: [[https://learn.microsoft.com/it-it/sql/ssdt/download-sql-server-data-tools-ssdt?view=sql-server-ver15#ssdt-for-visual-studio-2022|Data Tools Visual Studio 2022 IT]]; in English: [[https://learn.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt?view=sql-server-ver15#ssdt-for-visual-studio-2022|Data Tools Visual Studio 2022 EN]].
  * Microsoft Excel
  * [[https://powerbi.microsoft.com/it-it/desktop/| Power BI Desktop]]

**Note**: preconfigured virtual machines for both AMD64 (Intel/AMD) and ARM (Apple Silicon) architectures are available in the [[https://teams.microsoft.com/l/channel/19%3a8a60419ca5ec46dabe98174af70283e1%40thread.tacv2/Module%2520II%2520-%2520Laboratory%2520of%2520Data%2520Science?groupId=6bc87f32-e2c1-46b8-9c9f-928cae8bbe4d&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Teams channel]].

===== F.A.Q. =====

  * [[http://www.sid.unipi.it/polo2/2015/03/26/connessione-alle-reti-wifi/ | Connection to wi-fi]]
  * [[http://www.sid.unipi.it/polo2/studenti/ | F.A.Q.s about the labs]]
  * [[https://start.unipi.it/help-ict/vpn/ | Unipi VPN ]]
  * [[https://autenticazione.unipi.it/auth/auth.signin | Unipi Authentication]]: to access the VPN, make sure that network access services are enabled on your profile. Follow this link to access your Unipi profile.

====== Class calendar - (2023-2024) ======

^ ^ Day ^ Topic ^ Slides ^ Data/Software ^ References ^ Video Lectures ^ Teacher ^
|1. |19.09 09:00-11:00| Introduction to the Course. BI Architecture. File data access. | {{ :mds:lbi:2023-lds.01.introduction.pptx |}}{{ :mds:lbi:2023-lds.02.bi_architectures.pptx |}}{{ :mds:lbi:2023-lds.03.file_data_access.pptx |}}| | - **BI technology:** [[https://cacm.acm.org/magazines/2011/8/114953-an-overview-of-business-intelligence-technology/fulltext | An Overview of Business Intelligence Technology]] - **File access:** {{ :mds:lbi:filesystem.pdf | File System Interface}} | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/Ee07D1eAEOBKgSm07pea8vwB5K_PHB5H7RMJiiTQOBBmGA?e=UtKOEa&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19|video]]| Pellungrini |
|2. |20.09 14:00-16:00| Representation formats: CSV, FLV, ARFF, XML. Python Recap.| {{ :mds:lbi:lds.04.python.pdf |}}| | - **File Formats:** [[http://www.stat.auckland.ac.nz/~paul/ItDT | Introduction to Data Technologies (Chps. 5, 6)]], [[http://weka.wikispaces.com/ARFF+(stable+version)|Weka ARFF Format]], [[http://weka.wikispaces.com/XRFF|XRFF Format]] - **Python reference:** [[https://www.spronck.net/pythonbook/ | Free python book with exercises]] | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/Efdgh7Je6GhNhwITA2Yy3LYBrw4Bnd7FtVzVNu4u9WuyMA?e=8NN28v&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19|video]] | Pellungrini|
|3. |26.09 09:00-11:00| Python Recap + Python Exercises| | {{ :mds:lbi:max_subseq_student_sol_2023.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EezPV373UilBlaozzn6AiVsBHm58eq8fakNFrHwsmQ2BKw?e=heD2UG&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19 |video1]] [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/ESBSI-5LHohPuw_Lkn-DNG8BEJMFmi1ElQTbNXxNyBlFTQ?e=HxtWqs&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19|video2]] [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EUNKwKLAqfhChaL8qrezevEB9V7IASoxVNzHnvZClxE33w?e=6jJaZY&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19|video3]] | Monreale, Landi|
|4. |27.09 14:00-16:00| Python File Access + Exercises | {{ :mds:lbi:lds.05.fileaccess-python2021.pdf |}}| {{ :mds:lbi:data1.zip |}} {{ :mds:lbi:270923solutions.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EXM9R7_lRglNprErjSjj5YUBzOBie5qBRZbMAWOs8974Eg?e=5tJOtV&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19|video]] | Pellungrini, Landi|
|5. |03.10 9:00-11:00| Python File Access + Exercises | | {{ :mds:lbi:03102023solutions.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EdS0HqSsj0dMkOhCdKH19AEBEYUZLxpIi-tphnsfMKoQJA?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19&e=SQJcmg|video]] | Pellungrini, Landi|
|6. |04.10 14:00-16:00| Python File Access Exercises + RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | {{ :mds:lbi:lds.06.relational_data_access_2023.pdf |}} {{ :mds:lbi:ex-customers.pdf |}}| {{ :mds:lbi:data-customers.zip |}} {{ :mds:lbi:ex-customers_solution2023.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EexIkz2jGUVJuyz82d04ZoQBQQ1uYZ5gEYZI5MRhb0JdwQ?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19&e=hgCjLh|video]] | Pellungrini, Landi|
|7. |10.10 9:00-11:00| RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | Same slides as previous lecture | | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/ETZbyVa90KVFk8ExBm2_xj8BMpVuntyVpH88pvuu6Pv6Sg?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19&e=9A2uud|video]] | Monreale, Landi |
|8. |11.10 14:00-16:00| Exercise on Stratified Sampling + SQL Server management demo | {{ :mds:lbi:lds.07.sqlserver.pdf |}} | {{ :mds:lbi:2023-code-db-samples.zip |}}| | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/ETrphePLewRFlEO1uDWi7I8BoVlYbdXuh5U9566sNoeCZg?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19&e=850E1f|video]] | Monreale, Landi |
|9. |17.10 9:00-11:00| Stratified Sampling + SQL Server management demo + ETL tools: SQL Server Integration Services (SSIS). | {{ :mds:lbi:lds.08.etlandssis.pdf |}} | {{ :mds:lbi:stratifiedsampling.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EX0hGBBl0z9PrGELGU4SPKMB1WTp7SIo4CdVu-0Sqe52lg?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19&e=NGymIC|video_prev_years]] | Monreale, Landi |
|10. |18.10 14:00-16:00| ETL tools: SQL Server Integration Services (SSIS). | | {{ :mds:lbi:20231018_ssis_examples.zip |}} | | [[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EfLPvXHYwNlEnSAhrqiuDIkBNXiJUkmXPLJ-_yUelJG97A?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19&e=kn5dPN|video]] | Pellungrini, Landi |
|11. |24.10 09:00-11:00| ETL practice: Pipeline | Same slides as previous lecture| | | | Monreale, Landi |
|12. |25.10 14:00-16:00| ETL practice: Stratified sampling + Dissimilarity Index | {{ :mds:lbi:ex-midterm.pdf |}} | | | | Monreale, Landi |
|13. |31.10 09:00-11:00| ETL: Surrogate Keys + SCD | {{ :mds:lbi:exercisefact_table.pdf }} | {{ :mds:lbi:lds-ssis-projects.zip}} | | [[https://unipiit.sharepoint.com/:v:/s/a__td_61299/Eckgf1qGEnlHraYA8aRS7-cBpIdbHjnlaztyRDoxIMRkWA?e=fgUEpX&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19|video]] | Monreale, Landi |
|14. |07.11 9:00-11:00| CDC Process + Dissimilarity Index | [[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/mds/lbi/2015midterm1text.pdf|Exercise MPD]] | {{ :mds:lbi:lds-ssis-projects-full.zip }} | | [[https://unipiit.sharepoint.com/sites/a__td_61299/Shared%20Documents/Module%20II%20-%20Laboratory%20of%20Data%20Science/Recordings/Riunione%20in%20_Module%20II%20-%20Laboratory%20of%20Data%20Science_-20231107_092319-Registrazione%20della%20riunione.mp4?web=1|video]] | Monreale, Landi |
|15. |08.11 14:00-16:00| Project Support | | | | | Monreale, Pellungrini, Landi |
|16. |14.11 09:00-11:00| DW + SSAS | {{ :mds:lbi:lds.09.dwandolap.pdf |}}{{ :mds:lbi:lds.09.ssas-21.pdf |}} | | 1) SSAS (OLAP): documentation; 2) S. Harinath et al. Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox, 2012. Chps. 4-6 | [[https://unipiit.sharepoint.com/:v:/s/a__td_61299/EY9SV0IEiG1DieCGnwtqBYgBX7NTwx2h7CaudAGfLxkJ7w?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZyIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19&e=3T8EC4|video]] | Monreale, Landi |
|17. |15.11 14:00-16:00| OLAP Cube | | {{ :mds:lbi:15-foodmart_monreale_full.zip |}} Instructions for the SSAS project: to avoid conflicts during deployment/processing, follow these steps once the solution is open: (1) rename the project as _foodmart; (2) from the project properties select 'Deployment', then rename the database as _foodmart; (3) click the "Show All Files" button just above "Solution Explorer", right-click the .database file that appears and choose "View Code", then change the ID from the current name to _foodmart and save the file; (4) change the credentials of the connection to the database on SQL Server. Alternatively, you may [[http://technet.microsoft.com/en-us/library/ms175630.aspx#bkmk_newusingwizard|import the project]] from the SSAS server and rename it as _foodmart (step 4 is still necessary).| | [[https://unipiit.sharepoint.com/:v:/s/a__td_61299/Ed9j5fa7SMhMjX0tjXMLELwBHuFNgarGx8t1nL0OMxEO1Q?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZy1FbWFpbCIsInJlZmVycmFsQXBwUGxhdGZvcm0iOiJXZWIiLCJyZWZlcnJhbE1vZGUiOiJ2aWV3In19&e=NyAVmX|video]] | Monreale, Landi |
| |21.11 09:00-11:00| Canceled | | | | | |
|18. |22.11 14:00-16:00| OLAP Cube, Measure setup, Calculated Members, Excel Power Pivot integration. | Same slides as the last lecture | {{ :mds:lbi:foodmart_monreale_full.zip |}}| |[[https://unipiit.sharepoint.com/:v:/s/a__td_61299/EZNRBOWu8f9Fg38SZVtOv6YBp9OuwidZIDDOsEjj9OSQqQ?e=CP6ZhH|Video]] | Pellungrini, Landi|
|19. |28.11 09:00-11:00| Visual Studio advanced features and first MDX examples | Same slides as the last lecture | | |[[https://unipiit.sharepoint.com/:v:/s/Registrazioni628/EXAG2WGKbu9GiBB578Q-_TwBZRmzFHEwtXeBFpOFsykrBQ?e=rvABxQ&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZy1MaW5rIiwicmVmZXJyYWxBcHBQbGF0Zm9ybSI6IldlYiIsInJlZmVycmFsTW9kZSI6InZpZXcifX0%3D|Video]] | Pellungrini, Landi|
|20. |29.11 14:00-16:00| MDX | | | | [[https://unipiit.sharepoint.com/:v:/s/a__td_61299/EQWfqpb76wdHk-ih_3oIPjUB07CpXiy-NsgJl3fGPp1VtA?e=fAgBsa|Video1]] [[https://unipiit.sharepoint.com/:v:/s/a__td_61299/EQH0oiTu6OVDpt7NWZGiGogBkZHQafN4ZlX-jfD_1If0SQ?e=6yerN6|Video2]]| Pellungrini, Landi|
|21. |05.12 09:00-11:00| MDX | {{ :mds:lbi:mdx-practice.zip |}} {{ :mds:lbi:msx-queries-generate.zip |}} | | | | Monreale|
|22. |06.12 14:00-16:00| MDX Practice | {{ :mds:lbi:practice2.mdx.zip |}} | | | |Pellungrini|
|23. |12.12 09:00-11:00| Power BI, Power Pivot, MDX Practice| {{ :mds:lbi:mdxqueryies-12dec.mdx.zip |}} | | | |Monreale|
|24. |13.12 14:00-16:00| MDX Practice + Project Discussion| See the file with MDX queries from the previous lecture | | | |Monreale, Pellungrini, Landi|

====== Exams ======

//There are no mid-terms//.

The exam of Decision Support Systems (801AA, 12 ECTS) consists of a written part and an oral part on the topics of the first module (50% of the final grade), and a lab project with discussion on the topics of the second module (50% of the final grade).

For the rules of the first module, visit [[http://didawiki.di.unipi.it/doku.php/mds/dsd/start|Module I: Decision Support Databases]]. For details on the lab project, read the next section carefully.

Module I and Module II must be passed at most one year apart (they can be taken in any order).

**PROJECT **

The project consists of a set of assignments corresponding to a BI process: data integration, construction of an OLAP cube, querying of an OLAP cube, and reporting. The project must be carried out by a team of 2 students (at most 3, after asking the teachers for authorization). Each part of the project **must be documented** with a brief PDF report (no more than 2-3 pages) describing your solution.
**Project to be delivered by 31 December 2023 **

  * The first part of the project consists of the **assignments** described here: {{ :mds:lbi:lds_project_2023_part_1.pdf |}}
  * The second part of the project consists of the **assignments** described here: {{ :mds:lbi:lds_project_2023_part_2.pdf |}}
  * The third part of the project consists of the **assignments** described here: {{ :mds:lbi:lds_project_2023_part_3.pdf |}}
  * Remember to re-submit all three parts of the project together with the third part, as specified in the document above.
  * **Dataset:** {{ :mds:lbi:lds_project_2023.zip |}}
  * **Deadline**: First deadline 15 November
  * **Deadline**: Second deadline 8 December
  * **Deadline**: Third deadline -

**Project to be delivered during the exam sessions **

Students who did not deliver the above project by 31 December 2023 must request a new project from the teachers by email. The assigned project will require about 2 weeks of work and, after delivery, it will be discussed during the oral exam. For these students, the oral exam will also cover some practical parts that could not be included in the project. ** Please write to all teachers!**

===== Exam sessions =====

Registration for the written exam is mandatory (**pay attention to the registration deadline!**): [[https://esami.unipi.it/esami2/|register here]]\\
In the notes, please indicate "Only Lab" to take only the discussion of the lab project, "Only DSD" to take only the written and oral parts of the DSD module, or "DSD+Lab" to take both.

**Important:** the date of the lab project discussion will be communicated to you. The dates on the [[https://esami.unipi.it/esami2/|registration website]] refer **only** to the written part of the DSD module.

===== Past Editions =====

  * [[LDS 2022-2023]]
  * [[LDS 2021-2022]]
  * [[LDS 2020-2021]]
  * [[LDS 2019-2020]]
  * [[LDS 2018-2019]]
  * [[LBI 2017-2018]]