- +====== Text Analytics (635AA) A.Y. 2023/24 ======
-====== Text Analytics (635AA) A.Y. 2022/23 ======+
 ==== Teacher ==== ==== Teacher ====
-[[|Lucia Passaro]] (lucia.passaro [at] unipi [dot] it)+[[|Laura Pollacci]] (laura.pollacci [at] di [dot] unipi [dot] it)
-Office hours: Monday 16-18 via [[[email protected]|Teams]]+Office hours:
 ==== Schedule ==== ==== Schedule ====
-^ Day ^ Hour ^ Room ^  +^ Day ^ Hour ^ Room ^ 
-| Monday | 9-11 | Fib M1 |+| Thursday | 16-18 | Fib C1 |
 | Friday| 11-13 | Fib M1 | | Friday| 11-13 | Fib M1 |
-[[|Team of the class]]+[[|Team of the class]]
 ==== Objectives ==== ==== Objectives ====
-The course targets text analytics systems and applications to respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. +The course targets text analytics systems and applications to respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form.
 The main objectives of the course are: The main objectives of the course are:
   - Learning essential techniques, algorithms, and models used in natural language processing.   - Learning essential techniques, algorithms, and models used in natural language processing.
-  - Understanding of the architectures of typical text analytics applications and of libraries for building them. +  - Understanding of the architectures of typical text analytics applications and of libraries for building them.
   - Expertise in design, implementation, and evaluation of applications that exploit analysis, interpretation, and transformation of texts.   - Expertise in design, implementation, and evaluation of applications that exploit analysis, interpretation, and transformation of texts.
   * Transfer learning   * Transfer learning
   * Quantification   * Quantification
 ==== Lecture Notes ==== ==== Lecture Notes ====
 ^ Date ^ Lecture ^ Slides ^ Material / Reference ^ ^ Date ^ Lecture ^ Slides ^ Material / Reference ^
-| 2022/09/16 | Introduction to the course, NLP & Text Analytics. | [[| 1 - Introduction to the Text Analytics course]] | J. Eisenstein. Introduction to Natural Language Processing. MIT Press. [[| Chp. 1]]. | +| 2023/09/21 | Introduction to the course, NLP & Text Analytics. | [[| 1 - Introduction to the Text Analytics course]] | J. Eisenstein. Introduction to Natural Language Processing. MIT Press. [[| Chp. 1]]. |
-| 2022/09/19 | Reminds on Probability. Language and Probability. | [[| 2 - Reminds on Probability]] | +| 2023/09/22 | Reminds on probability. | [[| 2 - Reminds on probability]] |
-| 2022/09/23 | Introduction to Python. | [[| 3 - Introduction to Python]] | [[|Introduction to Python Notebook.]] | +| 2023/09/28 | Introduction to Python. | [[| 3 - Introduction to Python]] | [[| L3 Introduction_to_Python.ipynb]] |
-| 2022/09/30 | Introduction to Python (continued). Project Presentation and Important Dates | [[|Project and Dates]] | +| 2023/09/29 | Introduction to Python - part 2. Project and Dates | [[| 4 - Project and Dates]] |
-| 2022/10/03 | Probabilistic Language Models. | [[| 5 - Probabilistic Language models]] | D. Jurafsky, J.H. Martin. [[ 3]] [[| Probabilistic Language Models - Notebook]]. | +| 2023/10/05 | Probabilistic language models | [[|5 - Probabilistic language models]] | D. Jurafsky, J.H. Martin. [[|Ch3]] [[|L5 Probabilistic Language Model.ipynb]] |
-| 2022/10/07 | Text Indexing: Strings, Regular Expressions and BS4. | [[| 6 - Text Indexing-1]] | D. Jurafsky, J.H. Martin. [[| Chp. 2]] [[| Strings, Regular Expressions and BS4 - Notebook]]. | +| 2023/10/06 | Text Indexing: Strings, Regular Expressions and BS4. | [[| 6 - Text indexing 1]] | D. Jurafsky, J.H. Martin. [[|Ch2]] [[|L6.1 - Strings Regular expressions and BS4.ipynb]] |
-| 2022/10/10 | Text Indexing: Linguistic annotation. NLTK. | [[| 6 - Text Indexing-2]] | [[| Linguistic annotation with NLTK - Notebook]]. | +| 2023/10/12 | Linguistic annotation. NLTK. | [[| 6 - Text Indexing 2]] | [[|L6.2 - Linguistic annotation with NLTK.ipynb]] |
-| 2022/10/14 | Text Indexing: Collocations with Gensim. Stanza. spaCy. Feature selection. | [[| 6 - Text Indexing-3]] | [[| L6.3.4 - collocations - stanza - spacy - Notebooks]]. | +| 2023/10/13 | //Lesson canceled due to UNIPI orientation days.// |
-| 2022/10/17 | Text Indexing: Vector space models | [[| 6 - Text Indexing-4]] | D. Jurafsky, J.H. Martin. [[| Chp. 6]] [[| L6.- Vector space model - toy example - Notebook]]. | +| 2023/10/19 | Feature Selection | [[|6 - Text Indexing 3]] | [[|L6.3 - Gensim collocations - Stanza Spacy (Notebooks)]] |
-| 2022/10/21 | Machine Learning for Text Analytics. | [[| 10 - Machine Learning for Text Analytics]] | +| 2023/10/20 | Vector space models | [[|6 - Text Indexing 4]] | D. Jurafsky, J.H. Martin. [[|Chp. 6.]] [[|L6.- Vector space model - toy example]] |
-| 2022/10/24 | Student project presentations: proposal, brainstorming, discussion. | | | +| 2023/10/26 | //Lesson canceled// |
-| 2022/10/28 | Student project presentations: proposal, brainstorming, discussion. | | | +| 2023/10/27 | //Lesson canceled// |
-| 2022/11/04 | Machine Learning for Text Analytics: Experiments and Practice. | [[ Experiments]] | [[ sklearn - Notebook.]] | +| 2023/11/02 | Machine Learning for Text Analytics. | [[| 10 - Machine Learning for Text Analytics]] - corrected |
-| 2022/11/07 | Topic Modeling | [[ - Topic modeling]] | Zhai and Massung (2016) Text Data Management and Analysis. [[| Chp 17]]. [[| Topic Modeling - Notebooks.]] | +| 2023/11/03 | Machine Learning for Text Analytics: Design Experimental Protocols. Student presentations: How to. | [[ - Design Experimental Protocols]]. [[ Student presentations: How to]] | [[ L.11 - Classification with SkLearn]] |
-| 2022/11/11 | A primer on Neural Networks | [[|15 A Primer on Neural Networks]] | +| 2023/11/09 | Student project presentations: proposal, brainstorming, discussion. |
-| 2022/11/14 | A primer on Neural Networks (continued). Practice. | | [[ SVM to NN, Classification with Keras Notebooks.]] | +| 2023/11/10 | Student project presentations: proposal, brainstorming, discussion. |
-| 2022/11/18 | Neural Language Models. Word2vec | [[ - Neural Language Models-1]] | [[|Word2vec with Gensim Notebook.]] | +| 2023/11/16 | Topic Modeling | [[|12 - Topic Modeling]] | Zhai and Massung (2016) Text Data Management and Analysis. [[|Chp 17]]. [[|L.12 - Topic Modeling - Notebook.]]. [[|L.12.1 Topic Modeling pyLDAvis - Notebook]] |
-| 2022/11/21 | Neural Language Models. Doc2vec. Transformer. BERT. | [[ - Neural Language Models-2]] | D. Jurafsky, J.H. Martin. Chps. [[|7]] [[|9]] [[|11]]. [[|Doc2vec with Gensim Notebook.]] | +| 2023/11/17 | A primer on Neural Networks | [[|13 A primer on Neural Networks]] |
-| 2022/11/25 | Seminar (Alessandro Bondielli). Evaluating strategies for Automatic Profiling of Résumés. | | [[|A case study.]] | +| 2023/11/23 | Neural Networks | [[ 14 - Neural Networks]] | [[ SVM to NN, Classification with Keras Notebooks.]] |
-| 2022/12/02 | Student project presentations: ongoing experiments. Discussion. | | | +| 2023/11/24 | Neural Language Models | [[ - Neural Language Models]] | D. Jurafsky, J.H. Martin. Chps. [[|7]] [[|9]] [[|11]] |
-| 2022/12/05 | Student project presentations: ongoing experiments. Discussion. | | | +| 2023/11/30 | Student project presentations: ongoing experiments. Neural Language Models Practice | [[|16 Neural Language Models Word2Vec]] | [[|Word2vec - Notebook.]] |
-| 2022/12/09 | Fine-tuning BERT. Advanced applications (Conversational Agents, Affective Computing). | [[ Advanced applications]] | [[|BERT finetune - Notebooks]]. Recommended chapters: D. Jurafsky, J.H. Martin. [[|20]]; [[|24]]. | +| 2023/12/01 | Student project presentations: ongoing experiments. Neural Language Models Practice | [[|17 - Neural Language Models Doc2Vec]] | [[|Doc2Vec - Notebook]] |
- +| 2023/12/07 | Neural Language Models - part 2 | [[|Neural Language Models - part 2]] |
 +| 2023/12/11 | BERT. Project Submission | [[ Bert]]. [[|Project Submission]] | [[|Bert - Notebooks]] |
 +| 2023/12/14 | Advanced Topics | [[|20 - Advanced Topics]] | Recommended chapters: D. Jurafsky, J.H. Martin. [[|20]]; [[|24]]. |
 ==== Exam ==== ==== Exam ====
Line 78: Line 80:
-** Non-Attending students ** +** Non-Attending students **
 The exam for non-attending students will consist of a written exam with open questions and exercises, and an oral discussion on the topics of the course. The exam for non-attending students will consist of a written exam with open questions and exercises, and an oral discussion on the topics of the course.
Line 94: Line 96:
 Further references will be listed as material for the individual lessons. Further references will be listed as material for the individual lessons.
-==== Previous editions ==== 
 +==== Previous editions ====
 +  * [[|2022-2023]]
   * [[|2021-2022]]   * [[|2021-2022]]
   * [[|2020-2021]]   * [[|2020-2021]]
