====== LABORATORY OF DATA SCIENCE (2018/2019) ======

Teacher:
  * Anna Monreale
    * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa
    * http://kdd.isti.cnr.it/homes/monreale/
    * anna [dot] monreale [at] unipi [dot] it
    * Office hours by appointment, Room 374/DO, Dept. of Computer Science.
    * Telephone +39-050-2213119

Teaching assistant:
  * Roberto Pellungrini
    * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa
    * roberto [dot] pellungrini [at] di [dot] unipi [dot] it
    * Office hours: Thursday 14:30-16:30, Room 384/DO, Dept. of Computer Science.
    * Telephone +39-050-2212728

====== News ======

  * [11-02-2019]: Results of the written exam of 06/02/2019: Results and Oral dates
  * [21-01-2019]: Results of the written exam of 16/01/2019: Results and Oral dates
  * [27-12-2018]: Results of the second midterm: Results and Oral dates
  * [01-12-2018]: Text of Exercises in MDX and Analytical SQL: exercises_mdx.pdf
  * [13-11-2018]: Results of the first midterm: Results
  * [07-11-2018]: Instructions for the SSAS project in today's lecture. To avoid conflicts in deployment/processing, follow these steps once the solution is opened: (1) rename the project as <your account>_foodmart; (2) from the project properties select 'Deployment', then rename the database as <your account>_foodmart; (3) click the “Show All Files” button just above “Solution Explorer”, right-click the .database file that appears and choose “View Code”, then change the ID from ruggieri_foodmart to <your account>_foodmart and save the file; (4) change the credentials of the connection to the database on SQL Server. Alternatively, you may import the project from the SSAS server and rename it as <your account>_foodmart (step 4 is still necessary).
  * [20-10-2018]: Here you can find exercises similar to those in the first mid-term. Please try to solve them; we will discuss the solutions during the lesson on October 25, 2018.
  * [09-10-2018]: The lesson of September 17 will be made up on October 25, 2018, Room M.
  * [09-09-2018]: Lessons will start on Monday, September 24th. Please see details below.

====== Hours and Rooms ======

Classes: lessons will be held at Polo Didattico “L. Fibonacci”, Via F. Buonarroti 4, Pisa.

^ Day of Week ^ Hour ^ Room ^
| Monday | 09:00 - 11:00 | LAB M |
| Tuesday | 11:00 - 13:00 | LAB M |

Office hours by appointment, Room 374/DO, Dept. of Computer Science.

====== Learning Material ======

===== Slides & Recordings of the classes =====

  * The slides used in the course will be added to the calendar after each class.
  * A recording of each lecture will be published in the calendar after each class.

===== Past Exams =====

  * 2016/17 text, 2015/16 text and 2015/16 solution, 2014/15 text and 2014/2015 solution, 2013/14 text, 2012/13 text and 2012/13 solution.

===== Software =====

  * Anaconda with Python 3.5
  * SQL Server 2016 Developer Edition:
    - Mandatory: SQL Server 2016 Management Studio and SQL Server 2016 Data Tools (no need to install Microsoft Visual Studio 2015! Choose option 4 “Download SSDT as an ISO image”)
    - Optional (not recommended on laptops): SQL Server 2016 Developer Edition can be downloaded from Microsoft or from MSDN-AA. During installation, set the following options as a minimum.
  * Microsoft Excel
  * Power BI Desktop
  * WEKA: https://www.cs.waikato.ac.nz/ml/weka/
  * WEKA API: Wrapper in Python - https://pypi.org/project/python-weka-wrapper/

===== F.A.Q. =====

  * Connection to wi-fi
  * F.A.Q.s about the labs

====== Class calendar - (2018-2019) ======

^ ^ Day ^ Topic ^ Slides ^ Recording ^ Data/Software ^ References ^
| | 17.09 09:00-11:00 | Canceled - the lesson will be made up on October 19, 2018, Room I, 11:00-13:00 | | | | |
| | 18.09 11:00-13:00 | Canceled - the lesson will be made up on October 25, 2018, Room M, 09:00-11:00 | | | | |
|1. | 24.09 09:00-11:00 | Introduction. File data access. Representation formats: CSV, FLV, ARFF, XML | lds.01.introduction.pdf, lds.02.bi_architectures.pdf, lds.03.file_data_access.pdf | Video 24/09/2018 | | BI technology: An Overview of Business Intelligence Technology; File access: File System Interface; File formats: Introduction to data technologies (Chps. 5, 6), Weka ARFF Format, XRFF Format |
|2. | 25.09 11:00-13:00 | Python Recap | Python Recap | Video 25/09/2018 | | |
|3. | 01.10 09:00-11:00 | File data access in Python. Lab practice on file access. | lds.05.fileaccess-python.pdf | Video 01/10/2018 | Sample data, code-2018-09-25.zip | |
|4. | 02.10 11:00-13:00 | Lab practice on file access and transformation from CSV to ARFF file format. | | Video 02/10/2018 | xmlelements2csv.zip, csv2arff.zip, code-2018-10-01.zip | |
|5. | 08.10 09:00-11:00 | Lab practice on file access. | | Video 08/10/2018 | ex-customers.pdf, data-customers.zip | |
|6. | 09.10 11:00-13:00 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | lbi.06.relationaldataaccess-1.pdf | Video 09/10/2018 | | |
|7. | 15.10 09:00-11:00 | Lab practice: stratified sampling in ODBC. | lbi.06.relational_data_access-complete.pdf | Video 15/10/2018 | code-2018-10-15.zip | |
|8. | 16.10 11:00-13:00 | Introduction to SQL Server. ETL tools: SQL Server Integration Services (SSIS). | lds.07.sqlserver.pdf, lds.08.etlandssis.pdf | Video 16/10/2018 | stratifiedsampling.zip | |
|9. | 19.10 11:00-13:00 | SSIS samples and lab practice: update and pipeline. | | Video 19/10/2018 | lds-ssis-samples.zip, ex-midterm.pdf | |
|10. | 22.10 09:00-11:00 | SSIS samples and lab practice: sampling, update, surrogate keys. | | Video 22/10/2018 | | |
|11. | 23.10 11:00-13:00 | SSIS samples and lab practice: surrogate keys, slowly changing dimensions, mid-term practice | | Video 23/10/2018 | 2016ssis.zip | |
|12. | 25.10 09:00-11:00 | SSIS samples and lab practice: surrogate keys, slowly changing dimensions, mid-term practice | | Video 25/10/2018 | Dissimilarity.py, MDP.py, exam 14/4/2015, siss-mdp.zip, ssis-dissimilarityindex.zip | |
|13. | 05.11 09:00-11:00 | Data warehousing and OLAP recap. Data cubes, analytic SQL, and materialized views in SQL Server. | lds.09.dwandolap.pdf | Video 05/11/2018 First Part, Video 05/11/2018 Second Part | lbi.08.afdemo.sql.zip | For DW and OLAP: Decision support databases course lecture notes. |
|14. | 07.11 11:00-13:00 | OLAP with SQL Server Analysis Services (SSAS): data source views, dimensions, hierarchies. Data cubes. | lds.10.ssas.pdf | Video 06/11/2018 | monreale_foodmart.zip. Notice: please read the instructions in the News section! | 1) SSAS (OLAP): documentation; 2) S. Harinath et al. Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chps. 4-6. |
|15. | 12.11 09:00-11:00 | Parent-child hierarchies. OLAP explorative data analysis with Pivot Tables in Excel. | | Video 12/11/2018 First Part, Video 12-13/11/2018 | | Pivot Tables in Excel: G. Harvey. Excel 2013 All-in-One For Dummies, 2013. Chp. VII-2. |
|16. | 13.11 11:00-13:00 | Calculated metrics. ROLAP and MOLAP in SSAS. | | The video of the previous lecture also covers the topic of this lecture. | foodmartexplorative.xlsx | MDX: 1) documentation and a useful guide on ordering; 2) S. Harinath et al. Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chp. 3. |
|17. | 19.11 09:00-11:00 | Practice with MDX. | | Video 19/11/2018 | lbi.09.mdxsample.mdx.zip | |
|18. | 20.11 11:00-13:00 | Practice with MDX. | | Video 20/11/2018 | lbi.09.mdxpractice.mdx.zip | |
|19. | 26.11 09:00-11:00 | Practice with MDX. | | Video 26/11/2018 | 20170208.pdf | |
|20. | 27.11 11:00-13:00 | Reporting with Power BI Desktop. Data Mining pre-processing in WEKA. | lds.12.powerbi.pdf, lds.13.weka.pdf | Video 27/11/2018 | weka.3.7.9.light.zip, wekapatch.zip | |
|21. | 03.12 09:00-11:00 | WEKA Classification. | meta-cost-classification.pdf | Video 03/12/2018 | lsd.practice.ee.pdf, Data-ee | |
|22. | 04.12 11:00-13:00 | WEKA Classification (practice) & AR | lds.14.associationrules.pdf | Video 1 04/12/2018, Video 2 04/12/2018, Video 3 04/12/2018 | lds.practicesolution.ee.pdf | |
|23. | 10.12 09:00-11:00 | WEKA AR & Practice. Weka API. | lds.15.wekaapi.pdf | Video 1 10/12/2018, Video 2 10/12/2018 | Python example for WEKA API | |
|24. | 11.12 11:00-13:00 | Practice for the second midterm | | | Queries sec. Midterm, Weka practice, Exercise on MDX | |

====== Exams ======

===== Mid-term exams =====

Rule: Students may do the second mid-term even if they did have the first mid-term.

^ Date ^ Hour ^ Room ^ Notes ^ Marks ^
| 29.10.2018 | 09:00 - 12:00 | Room M | | |
| 17.12.2018 | 09:00 - 12:00 | Room M | | |

===== Exam sessions =====

Rule: Students who have at least one mid-term exam may take only one part of the written exam in the exam sessions.

^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^
| 1. | 16.01.2019 | 09:00 - 12:00 | Room M | | |
| 2. | 06.02.2019 | 09:00 - 12:00 | Room M | | |
| 3. | 18.06.2019 | 09:00 - 13:00 | Room H | Oral exam on DM1 by 15 July. If you cannot take it by that date, you can take the oral exam in September. | |
| 4. | 09.07.2019 | 09:00 - 13:00 | Room H | Oral exam on DM1 by 15 July. If you cannot take it by that date, you can take the oral exam in September. | |

===== Extra sessions A.A. 2017/18 =====

^ Date ^ Time ^ Room ^ Notes ^ Results ^
| 29.10.2018 | 09:00 - 12:00 | Room M | | |

===== Past Editions =====

  * LABORATORY OF DATA SCIENCE (2018/2019)
  * BUSINESS INTELLIGENCE LAB (2017/2018)