====== Big Data Analytics A.A. 2022/23 ======

This year, the course 599AA Big Data Analytics (BDA) is replaced by 783AA Geospatial Analytics (GSA). For any questions, please contact Luca Pappalardo (luca [dot] pappalardo [at] isti [dot] cnr [dot] it).

====== Learning goals ======

In our digital society, every human activity is mediated by information technologies and therefore leaves digital traces behind. These massive traces are stored in public or private repositories: phone call records, movement trajectories, soccer-logs, and social media records are all examples of “Big Data”, a novel and powerful “social microscope” for understanding the complexity of our societies. Analyzing big data sources is a complex task that requires knowledge of several technological and methodological tools. This course has three objectives:

  * introduce the emerging field of big data analytics and social mining;
  * introduce the technological landscape of big data, such as programming tools to analyze big data, query NoSQL databases, and perform predictive modeling;
  * guide students in developing an open-source and reproducible big data analytics project based on the analysis of real-world datasets.

====== Module 1: Big Data Analytics and Social Mining ======

In this module, analytical methods and processes are presented through exemplary case studies in challenging domains, organized according to the following topics:

  * The Big Data scenario and the new questions to be answered
  * Sports Analytics
    - Soccer data landscape and injury prediction
    - Analysis and evolution of sports performance
  * Mobility Analytics
    - Mobility data landscape and mobility data mining methods
    - Understanding human mobility with vehicular sensors (GPS)
    - Mobility Analytics: novel demography with mobile-phone data
  * Social Media Mining
    - The social media data landscape: Facebook, LinkedIn, Twitter, Last.fm
    - Sentiment analysis: example from human migration studies
    - Discussion on ethical issues of Big Data Analytics
  * Well-being & Nowcasting
    - Nowcasting influenza with retail market data
    - Predicting well-being from human mobility patterns
  * Paper presentations by students

====== Module 2: Big Data Analytics Technologies ======

This module provides students with the technologies to collect, manipulate, and process big data. In particular, the following tools will be presented:

  * Python for Data Science
  * The Jupyter Notebook: developing open-source and reproducible data science
  * MongoDB: fast querying and aggregation in NoSQL databases
  * GeoPandas: analyzing geospatial data with Python
  * Scikit-learn: machine learning in Python
  * Keras: deep learning in Python

====== Module 3: Laboratory for Interactive Project Development ======

During the course, teams of students will be guided in the development of a big data analytics project. The projects will be based on real-world datasets covering several thematic areas. Discussions and presentations will be held in class at different stages of the project.

  * 1st Mid Term: Data Understanding and Project Formulation
  * 2nd Mid Term: Model(s) construction and evaluation
  * 3rd Mid Term: Model interpretation/explanation
  * Exam: Final Project results

====== Calendar ======

15/09 (Mod. 1) Introduction to the course, The Big Data scenario lesson1_introduction_to_the_course_2021.pdf

17/09 (Mod. 2) Python for Data Science and the Jupyter Notebook: developing open-source and reproducible data science
  * How to install Jupyter notebook: https://jupyter.readthedocs.io/en/latest/install.html
  * Python notebooks: https://jovian.ai/jonpappalord/collections/bda-2021-2022
  * datasets: data_python_for_data_science.zip

22/09 (Mod. 2) Data Exploration and Understanding practice in Python
  * Python notebooks: https://jovian.ai/jonpappalord/collections/bda-2021-2022
  * datasets: data_python_for_data_science.zip

24/09 (Mod.
3) Presentation of datasets for the project bda21_22_datasets_1_.pdf

29/09 (Mod. 2) Scikit-learn: programming tools for data mining (part 1) https://jovian.ai/jonpappalord/classification

01/10 (Mod. 2) Scikit-learn: programming tools for data mining (part 2) https://jovian.ai/jonpappalord/clustering

06/10 (Mod. 2) GeoPandas and scikit-mobility: managing geographic data in Python (part 1)
  * datasets: https://bit.ly/301XRwF
  * code: https://jovian.ai/jonpappalord/bda-geopandas

08/10 (Mod. 2) GeoPandas and scikit-mobility: managing geographic data in Python (part 2)
  * https://jovian.ai/jonpappalord/collections/scikit-mobility-tutorial

13/10 (Mod. 1) Case Study 1: Injury prediction, dealing with unbalanced datasets, and performing feature selection: bda_2122_injury_forecasting.pdf
  * Prevedere è meglio che curare: AI al servizio dello sport (video in Italian; “Predicting is better than curing: AI at the service of sport”) https://www.youtube.com/watch?v=ZrTSLCB7ZLg

15/10 (Mod. 2) Feature selection in Python
  * notebook: https://jovian.ai/jonpappalord/feature-selection
  * dataset1: https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009/version/2
  * dataset2: https://www.kaggle.com/andrewmvd/heart-failure-clinical-data

20/10 (Mod. 3) Mid Term 1
  * BigData-Islanders
  * WeMine
  * cpu_in_flames

22/10 (Mod. 3) Mid Term 1
  * How I Met Your Big Data
  * SLM
  * The Missing Values

27/10 (Mod. 3) Comments and discussion on Mid Term 1 tips_mid_1_bda2122.pdf

29/10 (Mod. 1) Case Study 2: How to use Data Science to nowcast well-being bda_wellbeing.pdf

03/11 (Mod. 1) Case Study 3: Performance evaluation in sports
  * bda_2122_evaluting_soccer_performance.pdf
  * bda_2122_performance_evaluation.pdf

05/11 NO LESSON

10/11 (Mod. 2) Interpretations and Explanations 1: https://jovian.ai/jonpappalord/explanations

12/11 (Mod. 2) Interpretations and Explanations 2: https://jovian.ai/jonpappalord/explanations2

17/11 (Mod.
3) Mid Term 2
  * How I Met Your Big Data
  * WeMine
  * The Missing Values

19/11 (Mod. 3) Mid Term 2
  * BigData-Islanders
  * SLM
  * cpu_in_flames

24/11 NO LESSON

26/11 NO LESSON

01/12 (Mod. 3) Paper presentations
  * BigData-Islanders
  * SLM

03/12 (Mod. 3) Paper presentations
  * cpu_in_flames
  * The Missing Values

10/12 (Mod. 3) Paper presentations
  * How I Met Your Big Data
  * WeMine

15/12 (Mod. 3) Mid Term 3
  * How I Met Your Big Data
  * BigData-Islanders
  * cpu_in_flames

17/12 (Mod. 3) Mid Term 3
  * WeMine
  * SLM
  * The Missing Values

===== Exam (Appelli) =====

  - Jan 26th, 2022
  - Feb 11th, 2022

====== Previous Big Data Analytics websites ======

  * Big Data Analytics A.A. 2021/22
  * Big Data Analytics A.A. 2020/21
  * Big Data Analytics A.A. 2019/20
  * Big Data Analytics A.A. 2018/19
  * Big Data Analytics A.A. 2017/18
  * Big Data Analytics A.A. 2016/17
  * Big Data Analytics A.A. 2015/16