====== LABORATORY OF DATA SCIENCE (2019/2020) ======

Teacher:
  * Anna Monreale
  * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa
  * anna [dot] monreale [at] unipi [dot] it
  * Office hours: Monday 9:00-11:00 or by appointment, Room 374/DO, Dept. of Computer Science.
  * Telephone +39-050-2213119

Teaching assistant:
  * Roberto Pellungrini
  * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa
  * roberto [dot] pellungrini [at] di [dot] unipi [dot] it
  * Office hours: Wednesday 14:30-16:30, Room 384/DO, Dept. of Computer Science.
  * Telephone +39-050-2212728

====== News ======

  * [27-12-2019]: Results of the second mid-term test and proposed schedule for the oral exam: results-second-midtermtest.pdf. If the proposed date or time does not suit you, please write me an email.
  * [26-11-2019]: Additional lectures: Friday, Nov 29, 14-16, room L1 and Friday, Dec 06, 14-16, room M.
  * [23-11-2019]: First mid-term test: Results
  * [21-11-2019]: Instructions for the SSAS project used in today's lecture. To avoid conflicts during deployment/processing, follow these steps once the solution is open: (1) rename the project as <your account>_foodmart; (2) in the project properties select 'Deployment', then rename the database as <your account>_foodmart; (3) click the “show all files” button just above “Solution explorer”, right-click the .database file that appears and choose “view code”, then change the ID from ruggieri_foodmart to <your account>_foodmart and save the file; (4) change the credentials of the connection to the database on SQL Server. Alternatively, you may import the project from the SSAS server and rename it as <your account>_foodmart (step 4 is still necessary).
  * [18-11-2019]: The lesson of Tuesday 19/11/2019 is canceled.
  * [02-11-2019]: Since the mid-term will be held on 5 Nov, the Thursday lesson of next week is canceled.
  * [31-10-2019]: On November 4, 11-13, in Room C, I am organising an additional lesson dedicated to practice for the written exam.
  * [02-10-2019]: Instructions for installing the Microsoft tools are available in the Software section.
  * [09-09-2019]: Lessons will start on Tuesday, September 24. Please see details below.

====== Hours and Rooms ======

Classes

Lessons will be held at: Polo Didattico “L.
Fibonacci”, Via F. Buonarroti 4, Pisa.

^ Day of Week ^ Hour ^ Room ^
| Tuesday | 11:00 - 13:00 | LAB M |
| Thursday | 11:00 - 13:00 | LAB M |

Office hours by appointment, Room 374/DO, Dept. of Computer Science.

====== Learning Material ======

===== Slides & Recordings of the classes =====

  * The slides used in the course will be added to the calendar after each class.
  * The recording of each lecture will be published in the calendar after each class.

===== Past Exams =====

  * 2016/17 text, 2015/16 text and 2015/16 solution, 2014/15 text and 2014/2015 solution, 2013/14 text, 2012/13 text and 2012/13 solution.

===== Software =====

  * Anaconda with Python 3.5
  * SQL Server 2016 Developer Edition: SQL Server 2016 Management Studio or SQL Server 2017 Management Studio, and SQL Server 2016 Data Tools. For Data Tools, my suggestion is to install SSDT for VS2015 17.4, which is the same version installed on the laboratory computers. Note: it is mandatory to install Integration Services and Analysis Services, so you must select these two components during the installation.
  * Instructions for SQL Server will be available soon.
  * Optional (not recommended on laptops): SQL Server 2016 Developer Edition can be downloaded from Microsoft or from MSDN-AA.
  * Microsoft Excel
  * Power BI Desktop
  * WEKA
  * WEKA API: Wrapper in Python

===== F.A.Q. =====

  * Connection to wi-fi
  * F.A.Q.s about the labs

====== Class calendar - (2019-2020) ======

^ ^ Day ^ Topic ^ Slides ^ Recording ^ Data/Software ^ References ^
| | 17.09 11:00-13:00 | Canceled - the lesson will be rescheduled. | | | | |
| | 19.09 11:00-13:00 | Canceled - the lesson will be rescheduled. | | | | |
| 1. | 24.09 11:00-13:00 | Introduction. File data access. Representation formats: CSV, FLV, ARFF, XML | 2019-lds.01.introduction.pdf, 2019-lds.02.bi_architectures.pdf, lds.03.file_data_access.pdf | Video on Introduction, Video on File Access | | BI technology: An Overview of Business Intelligence Technology; File access: File System Interface; File formats: Introduction to data technologies (Chps. 5, 6), Weka ARFF Format, XRFF Format |
| 2. | 26.09 11:00-13:00 | Python Recap | Python Recap | Video 26/09/2019 | | |
| 3. | 01.10 11:00-13:00 | File data access in Python. Lab practice on file access. | lds.05.fileaccess-python.pdf | Video File Access - Python | Sample data | |
| 4. | 03.10 11:00-13:00 | Lab practice on file access and conversion from CSV to ARFF file format. | | Video CSV2ARFF | | |
| 5. | 08.10 11:00-13:00 | Lab practice on file access. | | Video | ex-customers.pdf | |
| 6. | 10.10 11:00-13:00 | Practice + RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | lbi.06.relationaldataaccess-1.pdf | Video on RDBMS access - Part1 | SolutionEx: 2018-10-09 | |
| 7. | 15.10 11:00-13:00 | Lab practice: stratified sampling in ODBC. | lbi.06.relational_data_access-complete.pdf | Video on RDBMS access - Part2 | | |
| 8. | 17.10 11:00-13:00 | Introduction to SQL Server. ETL tools: SQL Server Integration Services (SSIS). | lds.07.sqlserver.pdf, lds.08.etlandssis.pdf | Video on Sol. Stratified Sampling and ETL tools | | |
| 9. | 22.10 11:00-13:00 | SSIS samples and lab practice pipeline. | | Video on SSIS | ex-midterm.pdf | |
| 10. | 24.10 11:00-13:00 | SSIS Dissimilarity - Mid-term practice | | | exam 14/4/2015 | |
| 11. | 29.10 11:00-13:00 | Stratified Sampling + Update | | | Exercises: 20190618.pdf, 20190401.pdf | |
| 12. | 31.10 11:00-13:00 | Practice for Midterm 20190206.pdf | | | | |
| 13. | 04.11 11:00-13:00 | Practice for Midterm | | | Ex. Python | |
| 14. | 12.11 11:00-13:00 | SSIS: surrogate keys, slowly changing dimensions | | Video 2019-11-12 | | |
| 15. | 14.11 11:00-13:00 | Data warehousing and OLAP recap. Data cubes, analytic SQL, and materialized views in SQL Server. | lds.09.dwandolap.pdf | Video 2019-11-14 | | |
| | 19.11 11:00-13:00 | Canceled | | | | |
| 16. | 21.11 11:00-13:00 | OLAP with SQL Server Analysis Services (SSAS): data source views, dimensions, hierarchies. Data cubes. | lds.10.ssas.pdf | First Video 21/11/2019, Second Video 21/11/2019 | Notice: please read the instructions in the News section! | 1) SSAS (OLAP): documentation; 2) S. Harinath et al. Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chps. 4-6. |
| 17. | 26.11 11:00-13:00 | Parent-child hierarchies. OLAP explorative data analysis with Pivot Tables in Excel. | | Video 26/11/2019 | | Pivot Tables in Excel: G. Harvey. Excel 2013 All-in-One For Dummies, 2013. Chp. VII-2. |
| 18. | 28.11 11:00-13:00 | ROLAP and MOLAP in SSAS. MDX. | | Video 28/11/2019 | | MDX: 1) documentation and a useful guide on ordering; 2) S. Harinath et al. Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chp. 3. |
| 19. | 29.11 11:00-13:00 | Calculated metrics. MDX Demo. | | Video on ExcelReport, Video on MDXQueries | foodmartexplorative.xlsx | |
| 20. | 03.12 11:00-13:00 | Practice with MDX. | | This part is covered by the previous video | | |
| 21. | 05.12 11:00-13:00 | Practice with MDX | | Video 5/12/19 | | |
| 22. | 06.12 11:00-13:00 | Reporting with Power BI Desktop. Data Mining pre-processing in WEKA. | lds.12.powerbi.pdf, lds.13.weka.pdf | Video 6/12/19 | | |
| 23. | 10.12 11:00-13:00 | WEKA API | lds.15.wekaapi.pdf | Video 10/12/2019 | training set for exercise on Weka, validation set for exercise on Weka, Python example for WEKA API | |
| 24. | 12.12 11:00-13:00 | Practice for the second midterm | | | Queries to solve with MDX (this file is a more complete version of the one published in the last lecture), Exercise on MDX, Solution Ex. | |

====== Exams ======

===== Mid-term exams =====

Rule: Students may do the second mid-term even if they did have the first mid-term.
^ Date ^ Hour ^ Room ^ Notes ^ Marks ^
| 5/11/2019 | 14:00 | H | | |
| 17/12/2019 | 14:00 | M | | |

===== Exam sessions =====

Rule: Students having at least one mid-term exam may do only one part of the written exam in the exam sessions.

^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^

===== Extra sessions A.Y. 2018/19 =====

^ Date ^ Time ^ Room ^ Notes ^ Results ^
| 5/11/2019 | 14:00 | H | | |

===== Past Editions =====

  * LABORATORY OF DATA SCIENCE (2019/2020)
  * LABORATORY OF DATA SCIENCE (2018/2019)
  * BUSINESS INTELLIGENCE LAB (2017/2018)
  * LBI 2016/2017