Indice
LABORATORY OF DATA SCIENCE (2020/2021)
Instructors:
- Anna Monreale
- KDD Laboratory, Università di Pisa
- Office hours: Wednesday: 11:00-13:00 online using Teams (Appointment by email).
- Telephone +39-050-2213119
- Roberto Pellungrini
- KDD Laboratory, Università di Pisa
- Office hours: Thursday 14:00-16:00, Online using Teams (Appointment by email).
- Telephone +39-050-2212728
News
- [10/11/2020] Instructions for the SSAS project in the Lecture of today: to avoid conflicts in deployment/process follow this steps once the solution is opened: (1) rename the project as <your account>_foodmart (2) from project properties select 'Deployment', then rename the database as <your account>_foodmart; (3) click on the button “show all files” just above “Solution explorer” right click on “view code” on the .database file that is visualized, and then change the ID from ruggieri_foodmart into <your account>_foodmart, and finally save the file; (4) change the credentials of connection to database on SQL Server. As an alternative solution you mayimport the project from the SSAS server and rename it as <your account>_foodmart (step 4 is still necessary).
- [13/09/2020]: The lecture will be online. You can join the class by using this link: https://teams.microsoft.com/l/team/19%3ad3bfc8ae4ed04ea99ccd7857a43e101e%40thread.tacv2/conversations?groupId=eb724cb5-3ac7-4b8e-9280-05e191cf8477&tenantId=c7456b31-a220-47f5-be52-473828670aa1
Hours and Rooms
Classes
Lessons will be held onilne by Teams Platform
Day of Week | Hour | Room |
---|---|---|
Tuesday | 11:00 - 12:45 | Teams |
Thursday | 11:00 - 12:45 | Teams |
Link to Teams module: https://teams.microsoft.com/l/team/19%3ad3bfc8ae4ed04ea99ccd7857a43e101e%40thread.tacv2/conversations?groupId=eb724cb5-3ac7-4b8e-9280-05e191cf8477&tenantId=c7456b31-a220-47f5-be52-473828670aa1
Learning Material
Slides & Registration of the classes
- The slides used in the course will be inserted in the calendar after each class.
- Registration of each lecture will be available on Teams
Past Exams
Software
- Anaconda with Python 3.7 (Please, avoid Python 3.8)
- SQL Server 2019 Developer Edition:SQL Server 2019 Management Studio.
- For Data Tools we will publish instruction soon.
- Microsoft Excel
F.A.Q.
Class calendar - (2020-2021)
Day | Topic | Slides | Data/Software | References | Teacher | |
---|---|---|---|---|---|---|
1. | 15.09 11:00-12:45 | Introduction. File data access. Representation formats: CSV, FLV, ARFF, XML | 2020-lds.01.introduction.pptx.pdf 2020-lds.02.bi_architectures.pptx.pdf 2020-lds.03.file_data_access.pptx.pdf | - BI technology: An Overview of Business Intelligence Technology - File access: File System Interface - File Formats: Introduction to data technologies(Chps. 5, 6), Weka ARFF Format, XRFF Format | Monreale | |
2. | 17.09 11:00-13:00 | Python Recap | 2020-lds.04.python.pptx.pdf | Free Python book: http://www.spronck.net/pythonbook/ | Pellungrini | |
3. | 22.09 11:00-13:00 | File Access in Python | 2020-lds.05.fileaccess-python.pptx.pdf | census.csv.zip csv2arrf.py.zip Collection of files | Pellungrini | |
4. | 24.09 11:00-13:00 | Lab practice: XML2CSV/CSV2JSON file format conversion | lds.file.format.zip | Pellungrini | ||
5. | 29.09 11:00-13:00 | Python Exercises | ex-customers.pdf | data-customers.zip ex-customers_solution.zip | Pellungrini | |
6. | 01.10 11:00-13:00 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | lbi.06.relational_data_access-complete.pdf | Monreale | ||
7. | 06.10 11:00-13:00 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | lbi.06.relational_data_access-complete.pdf | code-db-samples.zip | Monreale | |
8. | 08.10 11:00-13:00 | Stratified Sampling Ex, SQL Management Studio Demo, Project Explaination | lds.07.sqlserver.pdf | stratifiedsampling.zip | Monreale, Pellungrini | |
9. | 13.10 11:00-13:00 | ETL, Sequel Server Data Tools Demo | lds.08.etlandssis.pdf | lds_first_project.zip | Pellungrini | |
10. | 15.10 11:00-13:00 | SSIS exercises | ex-midterm.pdf | Pellungrini | ||
11. | 20.10 11:00-13:00 | stratified sampling with SSIS + SSIS practice | 2015midterm1text.pdflds_first_project.zip | Monreale/Pellungrini | ||
12. | 22.10 11:00-13:00 | SSIS practice + Project support | 2015midterm1text.pdf | Monreale/Pellungrini | ||
13. | 27.10 11:00-13:00 | SSIS: Surrogate Keys | Monreale/Pellungrini | |||
14. | 29.10 11:00-13:00 | SSIS: slowly changing dimensions | 2016ssis.zip | Monreale/Pellungrini | ||
15. | 03.11 11:00-13:00 | Datawarehousing and OLAP recap. Data cubes, analytic SQL, and materialized views in SQL Server. | lds.09.dwandolap.pdf | lbi.08.afdemo.sql.zip | Monreale/Pellungrini | |
16. | 05.11 11:00-13:00 | OLAP with SQL Server Analysis Services (SSAS): data source views, dimensions, | lds.09.dwandolap.pdf lds.10.ssas.pdf | 1) SSAS (olap): documentation; 2) S. Harinath et al. Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chps. 4-6. | Monreale/Pellungrini | |
17. | 10.11 11:00-13:00 | OLAP with SQL Server Analysis Services (SSAS): dimensions, hierarchies. Data cubes, Parent-child hierarchies. OLAP explorative data analysis with Pivot Tables in Excel. | lds.10.ssas.pdf | monreale_foodmart.zip Notice: Please read the instructions in the Section NEWS! | Pivot Tables in Excel: G. Harvey. Excel 2013 All-in-One For Dummies, 2013. Chp. VII-2. | Monreale/Pellungrini |
18. | 12.11 11:00-13:00 | OLAP explorative data analysis with Pivot Tables in Excel. | foodmartexplorative.xlsx | Monreale/Pellungrini | ||
19. | 17.11 11:00-13:00 | Introduction to MDX | MDX: 1) documentation and a useful guide on ordering; 2) S. Harinath ed al. Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chp. 3. | Monreale/Pellungrini | ||
20. | 19.11 11:00-13:00 | Introduction to MDX | mdx-practice-2020.txt.zip mdx-ex.pdf | Monreale | ||
21. | 24.11 11:00-13:00 | Practice on MDX | lbi.09.mdxsample.mdx.zip lbi.09.mdxpractice.mdx.zip | Pellungrini | ||
22. | 25.11 11:00-13:00 | project check | Please do exercize on MDX. Here: ex-mdx.pdf you can find other queries that we will solve during the next lectures | Monreale/Pellungrini | ||
23. | 01.12 11:00-13:00 | Microstrategy presentation | Monreale/Pellungrini | |||
24. | 03.12 11:00-13:00 | PowerBI Desktop + Correction Ex. MDX | lds.12.powerbi.pdf | mdx-corrections.mdx.zip | Monreale/Pellungrini | |
25. | 10.12 11:00-13:00 | Microstrategy Viz | mircostrategy-material.zip If you need the password please check the common chat in Teams, or wirte an email to the teachear. | Monreale/Pellungrini |
Exams
PROJECT
A project consists in a set of assignements corresponding to a BI process: data integration, construction of an OLAP cube, qurying of a OPLAP cube and reporting.
The project has to be performed by a team of 2 students (at most 3 after asking authorization for that to the teachers).
Project to be delivered within 31 December 2020
- First part of the project consists in the assignments described here: Project Description Part 1
- Second part of the project cosist in the assignments described here: Project Description Part 2
- Third part of the project cosist in the assignments described here: Project Description Part 3
- Remember to re-submit all three parts of the project with your third part, as specified in the document above.
- Dataset: project_data.zip
- Deadline: the fist part has to be delivered within November, 18th 2020.
- Deadline: the second part has to be delivered within December, 4th 2020.
- Deadline: the third part has to be delivered within December, 31st 2020.
Project to be delivered during the exam sessions Students who did not deliver the above project within 31 December 2020 need to ask by email a new project to the teachers. The project that will be assigned will require about 2 weeks of work and after the delivery it will be discussed during the oral exam. For those students, the oral exams will also cover some practical parts that could not be included in the project.
Exam sessions
Session | Date | Time | Room | Notes | Marks |
---|
Extra sessions A.A. 2019/20
Date | Time | Room | Notes | Results |
---|---|---|---|---|