mds:txa:start [29/10/2024 at 11:03 (5 months ago)] – previous version restored (20/12/2022 at 09:37 (23 months ago)) Sid Polo2 | mds:txa:start [09/12/2024 at 10:48 (3 months ago)] (current version) – [Previous editions] Laura Pollacci |
---|
| ====== Text Analytics (635AA) A.Y. 2023/24 ====== |
====== Text Analytics (635AA) A.Y. 2022/23 ====== | |
| |
| |
==== Teacher ==== | ==== Teacher ==== |
| |
[[http://luciacpassaro.github.io/|Lucia Passaro]] (lucia.passaro [at] unipi [dot] it) | [[https://laurapollacci.github.io/txa.html|Laura Pollacci]] (laura.pollacci [at] di [dot] unipi [dot] it) |
| |
Office hours: Monday 16-18 via [[https://teams.microsoft.com/l/chat/0/0?[email protected]|Teams]] | Office hours: |
| |
| |
==== Schedule ==== | ==== Schedule ==== |
| |
^ Day ^ Hour ^ Room ^ | ^ Day ^ Hour ^ Room ^ |
| Monday | 9-11 | Fib M1 | | | Thursday | 16-18 | Fib C1 | |
| Friday| 11-13 | Fib M1 | | | Friday| 11-13 | Fib M1 | |
| |
| |
[[https://teams.microsoft.com/l/team/19%3au_2NWnfXHAGPknxec1GtEY5y8UrjGRSAQjuJ1tySJ7w1%40thread.tacv2/conversations?groupId=414d90af-3f1a-4188-9dd7-21ac607e5c1f&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Team of the class]] | [[https://teams.microsoft.com/l/channel/19%3aiBnp7L1JmbHPmkQ3NcO3NrxPDZB-RhMvlQzMRdCrWFM1%40thread.tacv2/Generale?groupId=9e5370ba-93b4-41d0-b0b1-b7464ab92f11&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Team of the class]] |
| |
==== Objectives ==== | ==== Objectives ==== |
The course targets text analytics systems and applications to respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. | The course targets text analytics systems and applications to respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. |
The main objectives of the course are: | The main objectives of the course are: |
- Learning essential techniques, algorithms, and models used in natural language processing. | - Learning essential techniques, algorithms, and models used in natural language processing. |
- Understanding of the architectures of typical text analytics applications and of libraries for building them. | - Understanding of the architectures of typical text analytics applications and of libraries for building them. |
- Expertise in design, implementation, and evaluation of applications that exploit analysis, interpretation, and transformation of texts. | - Expertise in design, implementation, and evaluation of applications that exploit analysis, interpretation, and transformation of texts. |
| |
* Transfer learning | * Transfer learning |
* Quantification | * Quantification |
| |
| |
==== Lecture Notes ==== | ==== Lecture Notes ==== |
| |
^ Date ^ Lecture ^ Slides ^ Material / Reference ^ | ^ Date ^ Lecture ^ Slides ^ Material / Reference ^ |
| 2022/09/16 | Introduction to the course, NLP & Text Analytics. | [[https://drive.google.com/file/d/1wc6yvn6Y5QrFXyFw53xeB4M6MsMmWssS/view?usp=sharing| 1 - Introduction to the Text Analytics course]]|J. Eisenstein. Introduction to Natural Language Processing. MIT Press.[[https://drive.google.com/file/d/17T4zo2uGssKBa_MrHsLW-uSmyP_ZJvpj/view?usp=sharing| Chp. 1]].| | | 2023/09/21 | Introduction to the course, NLP & Text Analytics. | [[https://drive.google.com/file/d/11BPheGG5YiZcNeFFQirMMSIrObEsbayf/view?usp=drive_link| 1 - Introduction to the Text Analytics course]]|J. Eisenstein. Introduction to Natural Language Processing. MIT Press.[[https://drive.google.com/file/d/1v455MySmNo5qVSRle676L0pjc4wktVNh/view?usp=drive_link| Chp. 1]].| |
| 2022/09/19 | Review of Probability. Language and Probability. | [[https://drive.google.com/file/d/1-exk-JS0_Oa3Eg1ApTGlxonQlL3KbTQG/view?usp=sharing| 2 - Reminds on Probability]]| | | | 2023/09/22 | Review of probability. | [[https://drive.google.com/file/d/1fH8sjhnh9dlPcPMwpAYSbsP0tbCaamSV/view?usp=sharing| 2 - Reminds on probability]]| |
| 2022/09/23 | Introduction to Python.| [[https://drive.google.com/file/d/1lpyA0N4K0d0ZTrJgokot1NwC_w4HG6gG/view?usp=sharing| 3 - Introduction to Python]]|[[https://drive.google.com/file/d/1BubwKtByCankjnbClWErvSsw9EjCLnte/view?usp=sharing|Introduction to Python - Notebook.]]| | | 2023/09/28 | Introduction to Python. | [[https://drive.google.com/file/d/1fOn73KfDqlaU-0dgXs4-qkIbm8ZCg8Px/view?usp=sharing| 3 - Introduction to Python]]| [[https://drive.google.com/file/d/16BIcJuP4vB5b5oUmV03R7fX_-wRaFI8Y/view?usp=sharing | L3 - Introduction_to_Python.ipynb]] | |
| 2022/09/30 | Introduction to Python (continued). Project Presentation and Important Dates. | [[https://drive.google.com/file/d/1FjCYvOkZDWomEsJuXD32Vl_155kxnKik/view?usp=sharing|Project and Dates]]| | | | 2023/09/29 | Introduction to Python - part 2. Project and Dates | [[https://drive.google.com/file/d/11E-3DWARykKVZDuB1vuDoXySAPPWYFoq/view?usp=sharing| 4 - Project and Dates]]| |
| 2022/10/03 | Probabilistic Language Models. | [[https://drive.google.com/file/d/1B5HfPtPgK41Ig_NWrPim6YxK3mCF-XSj/view?usp=sharing| 5 - Probabilistic Language models]]|D. Jurafsky, J.H. Martin.[[https://drive.google.com/file/d/1OXSjwE0-ZN6DZ4MELOMp8JVy-tP2_4Iw/view?usp=sharing| Chp. 3]]. [[https://drive.google.com/file/d/1osuyJi5ZbBMghOrQz_IVqMsfxi2-1Vzj/view?usp=sharing| Probabilistic Language Models - Notebook]].| | | 2023/10/05 | Probabilistic language models| [[https://drive.google.com/file/d/1Nj6FgcBSK9otmJwjDj2bxWWulCzPlHZb/view?usp=drive_link|5 - Probabilistic language models]]| D. Jurafsky, J.H. Martin. [[ https://drive.google.com/file/d/1K3B0s0-T3NnpfgmR6NGsZdwWqGoa0S5Q/view?usp=drive_link|Ch3]] [[https://drive.google.com/file/d/13r6wn4jlrOncZ0zUc5efmu2RgqDGUz2g/view?usp=drive_link|L5 Probabilistic Language Model.ipynb]] | |
| 2022/10/07 | Text Indexing: Strings, Regular Expressions and BS4. | [[https://drive.google.com/file/d/1hkkjm5saUiKqL-9KgGgozTBupgIOus74/view?usp=sharing| 6 - Text Indexing-1]]|D. Jurafsky, J.H. Martin.[[https://drive.google.com/file/d/1RO_PGJj0a8v_N0dnw5iK4nGiabaAKZc5/view?usp=sharing| Chp. 2]]. [[https://drive.google.com/file/d/1IX8qSNdSbTFz5n1yMsMqtX9QU6HOofdv/view?usp=sharing| Strings, Regular Expressions and BS4 - Notebook]].| | | 2023/10/06| Text Indexing: Strings, Regular Expressions and BS4. | [[https://drive.google.com/file/d/1Zp6vqh5Wj9YzwtpcgMSxm7NUZ_oN8SW7/view?usp=sharing| 6 - Text indexing 1]] | D. Jurafsky, J.H. Martin. [[https://drive.google.com/file/d/1SH4Em84AEHNzc6OzrhjvW_ggo_0nJiOx/view?usp=sharing|Ch2]] [[https://drive.google.com/file/d/13miwALDtad7ERoObFnlPjeUYBaAfwZGF/view?usp=sharing|L6.1 - Strings Regular expressions and BS4.ipynb]]| |
| 2022/10/10 | Text Indexing: Linguistic annotation. NLTK. | [[https://drive.google.com/file/d/11AjdH0K1W5OytgdofaxlCP_nQlI-rRTB/view?usp=sharing| 6 - Text Indexing-2]]|[[https://drive.google.com/file/d/1uigGIb0_9bX2Gb5g6SX51JHyN3kxN4y_/view?usp=sharing| Linguistic annotation with NLTK - Notebook]].| | | 2023/10/12| Linguistic annotation. NLTK. | [[https://drive.google.com/file/d/1t2WNuMZ1PAE4i_GgPbd-DCJWx8gWnhQf/view?usp=sharing| 6 - Text Indexing 2]]|[[https://drive.google.com/file/d/14ahCe4h45MHn_yMhUbOwO7o8Ms9jl-sD/view?usp=sharing|L6.2 - Linguistic annotation with NLTK.ipynb]] | |
| 2022/10/14 | Text Indexing: Collocations with Gensim. Stanza. spaCy. Feature selection. | [[https://drive.google.com/file/d/13RDX2D2m8Bhkv0_qddvpWndBYQWoKYpY/view?usp=sharing| 6 - Text Indexing-3]]|[[https://drive.google.com/file/d/12L7nHe9TvZJPSS4RaiyGyIaPnx8cXkrN/view?usp=sharing| L6.3.4 - collocations - stanza - spacy - Notebooks]].| | |2023/10/13| //Lesson canceled due to UNIPI orientation days.//| |
| 2022/10/17 | Text Indexing: Vector space models. | [[https://drive.google.com/file/d/1AhhYq-1mCGqtVcUnvoiAb7c2WYs4CSm2/view?usp=sharing| 6 - Text Indexing-4]]|D. Jurafsky, J.H. Martin.[[https://drive.google.com/file/d/1A1aKTIQh8CnEU8QBkmet1iADpTBAdUHR/view?usp=sharing| Chp. 6]]. [[https://drive.google.com/file/d/1dyn540ISuJ8wMlBkUoFH5J9ctNIcHj54/view?usp=sharing| L6.5 - Vector space model - toy example - Notebook]].| | |2023/10/19| Feature Selection| [[https://drive.google.com/file/d/1iWDaF7BXykUrRwOrIfc8ERlOewXaOQm7/view?usp=sharing|6 - Text Indexing 3]] | [[https://drive.google.com/file/d/1mD4v_ts0A1CHcTrU9nIYz-Nugvok1jks/view?usp=sharing |L6.3 - Gensim collocations - Stanza - Spacy (Notebooks)]] | |
| 2022/10/21 | Machine Learning for Text Analytics. | [[https://drive.google.com/file/d/1eHQR4GhtPjgN7muIRLfyQBXcgM4oQmUK/view?usp=sharing| 10 - Machine Learning for Text Analytics]]| | | |2023/10/20| Vector space models | [[https://drive.google.com/file/d/1JIKfDSAZh3raAfRB_tTGFNqjxgfukKGy/view?usp=sharing|6 - Text Indexing 4]] | D. Jurafsky, J.H. Martin. [[https://drive.google.com/file/d/1Hj3n4qCuZpTIrS_M352QAyH70xC6Fxrg/view?usp=share_link|Chp. 6.]] [[https://drive.google.com/file/d/1RUJYFizlp1ldl2DbmZDDXvw8WhDS6E4k/view?usp=sharing|L6.4 - Vector space model - toy example]]| |
| 2022/10/24 | Student project presentations: proposal, brainstorming, discussion. | | | |2023/10/26| //Lesson canceled//| |
| 2022/10/28 | Student project presentations: proposal, brainstorming, discussion. | | | |2023/10/27| //Lesson canceled//| |
| 2022/11/04 | Machine Learning for Text Analytics. Experiments and Practice. | [[https://drive.google.com/file/d/1HXC4pHde9D7bYYAw4vM6ihS2u7kChgO8/view?usp=share_link| 13 - Experiments]]| [[https://drive.google.com/file/d/1xbRmZ-HudXRIqBbQDpXjOlUNyrq7-yop/view?usp=share_link| Classification sklearn - Notebook.]]| | |2023/11/02| Machine Learning for Text Analytics. | [[ https://drive.google.com/file/d/1zc925Q0yzdmh2nvB0McdQBOeVgJ1aD3R/view?usp=sharing| 10 - Machine Learning for Text Analytics]] - corrected| |
| 2022/11/07 | Topic Modeling. | [[https://drive.google.com/file/d/1ytnJjLHtLT97gCNbzBp_I2TCcY7bfjMN/view?usp=share_link| 14 - Topic modeling]]| Zhai and Massung (2016) Text Data Management and Analysis. [[https://drive.google.com/file/d/1iJ71WZIpWP-cWxLtvsf5L4vp_epJH0uV/view?usp=share_link| Chp 17]].[[https://drive.google.com/file/d/1fKpyNYs9kNlPJpiYiDkzO_j6TkyHM8sS/view?usp=share_link| Topic Modeling - Notebooks.]]| | |2023/11/03| Machine Learning for Text Analytics: Design Experimental Protocols. Student presentations: How to. | [[https://drive.google.com/file/d/1gaaWVORZnp7gJ6ZGloKlyQTSw07SZ8in/view?usp=sharing| 11 - Design Experimental Protocols]]. [[https://drive.google.com/file/d/1b5I7NhRXuzjk93Pea6pyxzzCw31OhD8Z/view?usp=sharing| 11.1 - Student presentations: How to]] | [[https://drive.google.com/file/d/1X0BYS66px-aTYoDVZzx2sTixmaX4agrP/view?usp=sharing | L.11 - Classification with SkLearn]] | |
| 2022/11/11 | A primer on Neural Networks. | [[https://drive.google.com/file/d/1_snMjfUb1z5YLBEHft6HJo65w4EWYD-v/view?usp=share_link|15 - A Primer on Neural Networks]]| | | |2023/11/09| Student project presentations: proposal, brainstorming, discussion. | |
| 2022/11/14 | A primer on Neural Networks (continued). Practice.| | [[https://drive.google.com/file/d/1UEKJ_E1hD92E4OPw5HUOhvf1NKxxTR2T/view?usp=share_link| From SVM to NN, Classification with Keras - Notebooks.]]| | |2023/11/10| Student project presentations: proposal, brainstorming, discussion. | |
| 2022/11/18 | Neural Language Models. Word2vec | [[https://drive.google.com/file/d/1Juf8aMqg_c5wW1KvQxfzvz2A6diV4a4A/view?usp=share_link| 17 - Neural Language Models-1]]|[[https://drive.google.com/file/d/1ffEsnsmb_o3iX9YBkS095UMPrMOMGrdO/view?usp=share_link|Word2vec with Gensim - Notebook.]]| | |2023/11/16| Topic Modeling | [[https://drive.google.com/file/d/1M7EMWkYfqDWZjf6W22yIVJLK0QbJTh_v/view?usp=sharing|12 - Topic Modeling]] | Zhai and Massung (2016) Text Data Management and Analysis. [[https://drive.google.com/file/d/1Cwzon44c0-7b_4bbHyUO6ArolacQFY_5/view?usp=sharing|Chp 17]]. [[https://drive.google.com/file/d/1-Iyz860uAII3pplAk_VMqi5gK5N_S4pD/view?usp=sharing |L.12 -Topic Modeling - Notebook.]]. [[https://drive.google.com/file/d/1H60PV4Wt5gRs_B6MB4J2YJ-gsiySf6lv/view?usp=sharing|L.12.1 - Topic Modeling pyLDAvis - Notebook]]| |
| 2022/11/21 | Neural Language Models. Doc2vec. Transformer. BERT. | [[https://drive.google.com/file/d/10_VjJacKzajp7yNuSOhZo-nNUJjChkcN/view?usp=share_link| 18 - Neural Language Models-2]]|D. Jurafsky, J.H. Martin. Chps. [[https://drive.google.com/file/d/14oI6vsl4KCpGyamBbVjeYPTuzWOSNEtV/view?usp=share_link|7]] [[https://drive.google.com/file/d/1wonZ08i0etFhEMSjQEVU2vKf6UyUjEHb/view?usp=share_link|9]][[https://drive.google.com/file/d/1BsCfRzp3t6xAe4GfUTZbHfBujXyxtdAA/view?usp=share_link|11]].[[https://drive.google.com/file/d/1hs6ffqsn1gLM6RXSsFcTYjfh-AjFYXDu/view?usp=share_link|Doc2vec with Gensim - Notebook.]]| | |2023/11/17| A primer on Neural Networks |[[https://drive.google.com/file/d/1MS7upbsydqkPMIRfYv9pKHXz2mfGb1ST/view?usp=sharing |13 - A primer on Neural Networks]] | |
| 2022/11/25 | Seminar (Alessandro Bondielli). Evaluating strategies for Automatic Profiling of Résumés.| |[[https://drive.google.com/file/d/1dopSg44-kSGhIo3nv2wLFis7IRePp6xM/view?usp=share_link|A case study.]] | | |2023/11/23|Neural Networks | [[https://drive.google.com/file/d/13tQ1m-ogPR3R_PSAWLDomvPmsBal8E55/view?usp=sharing | 14 - Neural Networks]] | [[https://drive.google.com/file/d/1ZP9WN4OTSw2VoO7jWIpJlWBh_oGFwxjN/view?usp=sharing| From SVM to NN, Classification with Keras - Notebooks.]] | |
| 2022/12/02 | Student project presentations: ongoing experiments. Discussion. | | | |2023/11/24| Neural Language Models | [[https://drive.google.com/file/d/1vezeT7l6Wd9D0otEYXSAjg0ih1XoggmW/view?usp=sharing| 15 - Neural Language Models]]| D. Jurafsky, J.H. Martin. Chps. [[https://drive.google.com/file/d/10SjSlr4bk6jBWTEkA4vsTUomB8y4iJ-C/view?usp=sharing|7]] [[https://drive.google.com/file/d/1MkfAsC-rY6HuWM6ZTS1TB8LoLxN-sPPq/view?usp=sharing|9]] [[https://drive.google.com/file/d/1P3j4qTH6IH_R42huYLL83cvPd1Ci2Ar1/view?usp=sharing|11]] | |
| 2022/12/05 | Student project presentations: ongoing experiments. Discussion. | | | |2023/11/30| Student project presentations: ongoing experiments. Neural Language Models Practice | [[https://drive.google.com/file/d/1Dc0l2zQfX9poOymZKrhYiHMUiv9TT7m_/view?usp=sharing|16 - Neural Language Models Word2Vec]]| [[https://drive.google.com/file/d/14BIROGvYzNjbmmVzZqeiY-tLkhRAR8tW/view?usp=sharing |Word2vec - Notebook.]]| |
| 2022/12/09 | Fine-tuning BERT. Advanced applications (Conversational Agents, Affective Computing).| [[https://drive.google.com/file/d/1RdiNnhM5he2ZIfLBFZ-dPdhlDZMZEHjO/view?usp=share_link| 22 - Advanced applications]]| [[https://drive.google.com/file/d/1q7ZsRYoA4fL4e0VRytezq1-b6s-FpkJD/view?usp=share_link|BERT finetune - Notebooks]]. Recommended chapters: D. Jurafsky, J.H. Martin.[[https://drive.google.com/file/d/1BWfVPq4HiTWzvUHGaqaEkJEWieQxhf-g/view?usp=share_link|20]];[[https://drive.google.com/file/d/148pdYBYtUCwCHR349-HDjEMJMS11ONDi/view?usp=share_link|24]].| | |2023/12/01| Student project presentations: ongoing experiments. Neural Language Models Practice | [[https://drive.google.com/file/d/1R4Yfr5v8ygsK61dV-h-mZhU_iY0OuZmK/view?usp=sharing|17 - Neural Language Models Doc2Vec]]|[[https://drive.google.com/file/d/1JaGXJE-rF3Yvmtd1Je8NCdDapLiL17Pg/view?usp=sharing|Doc2Vec - Notebook]]| |
| |2023/12/07| Neural Language Models - part 2 |[[https://drive.google.com/file/d/1QxmavpSIjX1x46UkNR1RflY64Sbc3vLs/view?usp=sharing|Neural Language Models - part 2]]| |
| |2023/12/11| BERT. Project Submission |[[https://drive.google.com/file/d/1JX6HCObZYtLUApYJDl1ftDTl5nKn-aHi/view?usp=sharing| 19 - Bert]]. [[https://drive.google.com/file/d/1GOwUTqWnkONM-SI8D0JANGKuqX0pBp35/view?usp=sharing|Project Submission]]| [[ https://drive.google.com/file/d/1JX6HCObZYtLUApYJDl1ftDTl5nKn-aHi/view?usp=sharing|Bert - Notebooks]] | |
| |2023/12/14| Advanced Topics | [[ https://drive.google.com/file/d/14zg2w7-s_cpIJQBwGfXoj_yfjZNZLYQh/view?usp=sharing |20 - Advanced Topics]]| Recommended chapters: D. Jurafsky, J.H. Martin. [[https://drive.google.com/file/d/1ik_BGxKUNAi5GwQZQv4vI9Gqvv4wkWK9/view?usp=sharing|20]];[[https://drive.google.com/file/d/1VJbNelq63EagAxdgleJu2isJVBBb_vkl/view?usp=sharing|24]].| |
| |
==== Exam ==== | ==== Exam ==== |
| |
| |
** Non-Attending students ** | ** Non-Attending students ** |
| |
The exam for non-attending students consists of a written exam with open questions and exercises, and an oral discussion of the course topics. | The exam for non-attending students consists of a written exam with open questions and exercises, and an oral discussion of the course topics. |
| |
Further bibliography will be listed with the material for the individual lessons. | Further bibliography will be listed with the material for the individual lessons. |
==== Previous editions ==== | |
| |
| |
| ==== Previous editions ==== |
| * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1730200187|2022-2023]] |
* [[http://didawiki.cli.di.unipi.it/doku.php/mds/txa/start?rev=1649067582|2021-2022]] | * [[http://didawiki.cli.di.unipi.it/doku.php/mds/txa/start?rev=1649067582|2021-2022]] |
* [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1612257498|2020-2021]] | * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1612257498|2020-2021]] |