
Machine Learning: Neural Networks and Advanced Models (AA2)

Apprendimento Automatico: Reti Neurali e Modelli Avanzati

Instructor: Davide Bacciu

Contact: email - phone 050 2212749

Office: Room 367, Dipartimento di Informatica, Largo B. Pontecorvo 3, Pisa

Office Hours: Tuesday, 17-19


News

(04/04/2016) Note for Students of Academic Year 2015/2016 - The AA2 course is inactive during the 2015/2016 academic year. Students interested in the course can take the replacement course "Computational Neuroscience" from the M.Sc. in Bionics Engineering.

(02/04/2015) The list of midterm assignments to students is now out

(13/03/2015) The midterm reading list and dates are now out

(25/02/2015) Updated course schedule - A new course schedule is now in place. Note that the lecture of Monday 02/03/2015 will exceptionally be held in room C1 at 16-18

(29/01/2015) Course information - Added course description, topics and reference materials.

(20/01/2015) Course Didawiki first online - Preliminary information on course schedule. More information to come by early February.

Course Information

Weekly Schedule

The course is held in the second term. The preliminary schedule for A.A. 2014/15 is provided in the table below.

Note that the lecture of Monday 02/03/2015 will exceptionally be held in room C1 at 16-18.

Day       Time    Room
Monday    11-13   C1
Thursday  14-16   C1

Objectives

Machine learning has recently become a central area of computer science, playing a major role in the development of a wide range of advanced applications. Machine learning solutions are used to address a variety of problems in computer science (e.g. search engines, machine vision), engineering (e.g. robotics, signal processing) as well as in other research and application areas (biology, chemistry, medicine), leading to novel multidisciplinary fields such as bioinformatics, cheminformatics and biomedical signal processing. Providing solutions to challenging applications requires the ability to design machine learning models capable of dealing with complex domains that include noisy, hard-to-interpret, semantically rich information, such as natural language documents, images and videos, as well as non-vectorial relational information, such as sequences, trees and graphs in general.

The goal of this course is to provide the knowledge needed to become a specialist in the design of novel machine learning models for such advanced applications and complex data domains. Students are expected to gain knowledge of state-of-the-art machine learning models such as recurrent neural networks, reservoir computing, deep learning, kernel methods and probabilistic generative models. The course focuses on the treatment of complex application domains (images, biomedical data, etc.) and non-vectorial information, through the introduction of adaptive methods for the processing of sequences and structures of variable size. Much emphasis is given to the synergy between the development of advanced learning methodologies and the modelling of innovative interdisciplinary applications for complex domains of the Natural Sciences, as well as to the introduction of students to innovative research themes.

Students completing the course are expected to gain in-depth knowledge of the selected research topics, to understand their theory and applications, and to be able to independently read, understand and discuss research works in the field. The course is targeted at students pursuing specializations in machine learning and computational intelligence, but it is also of interest to data mining and information retrieval specialists, roboticists and students with a bioinformatics curriculum.

Please feel free to contact the Instructor for advice on the machine learning curriculum or on the availability of final projects.

Course Prerequisites

Course prerequisites include knowledge of machine learning fundamentals (e.g. as covered in course AA1). Knowledge of elements of probability and statistics, calculus and optimization algorithms is a plus.

Course Overview

The course introduces advanced machine learning models and interdisciplinary applications, with a focus on the adaptive processing of complex data and structured information.

The course is articulated in four parts. The first three parts introduce advanced models associated with three major machine learning paradigms, namely neural networks, probabilistic and Bayesian learning, and kernel methods. We will follow an incremental approach, starting with the introduction of learning models for sequential data processing and showing how these can be extended to deal with more complex structured domains. The fourth part is devoted to discussing advanced applications, with particular emphasis on multidisciplinary ones. These case studies will show how innovative learning models arise from the need to provide solutions to novel applications.
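As a minimal illustration of this incremental approach, the sketch below (in Python, with purely illustrative names and dimensions, and not part of the official course materials) shows how the state-transition function of a simple recurrent network, which folds over a sequence, generalizes to a recursive bottom-up encoding of a tree:

  import numpy as np

  rng = np.random.default_rng(0)
  n_in, n_h = 3, 5                                 # input and state sizes (arbitrary)
  W_in = 0.1 * rng.standard_normal((n_h, n_in))    # input-to-state weights
  W_h = 0.1 * rng.standard_normal((n_h, n_h))      # state-to-state weights

  def step(x, h):
      """One state transition: h_t = tanh(W_in x_t + W_h h_{t-1})."""
      return np.tanh(W_in @ x + W_h @ h)

  def encode_sequence(xs):
      """Fold the transition over a sequence, as in a simple recurrent network."""
      h = np.zeros(n_h)
      for x in xs:
          h = step(x, h)
      return h

  def encode_tree(label, children):
      """Recursive generalization: a node's state depends on the states of
      all its children (summed here) rather than on a single predecessor."""
      h = sum((encode_tree(*c) for c in children), np.zeros(n_h))
      return step(label, h)

  seq_state = encode_sequence([rng.standard_normal(n_in) for _ in range(4)])
  tree_state = encode_tree(rng.standard_normal(n_in),
                           [(rng.standard_normal(n_in), []),
                            (rng.standard_normal(n_in), [])])

The only change between the two cases is that a node's state is computed from the combined states of its children rather than from a single predecessor state.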

The course hosts guest seminars by national and international researchers working in the field, as well as by companies engaged in the development of advanced applications using machine learning models.

Topics covered - dynamical recurrent neural networks; reservoir computing; graphical models and Bayesian learning; hidden Markov models; Markov random fields; latent variable models; non-parametric and kernel-based methods; learning in structured domains (sequences, trees and graphs); unsupervised learning for complex data; deep learning; emerging topics and applications in machine learning.
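As a concrete taste of one topic from the list above, the following toy sketch (with illustrative parameters assumed here, not course code) implements the forward recursion of a hidden Markov model, which computes the likelihood of an observation sequence and is treated in lectures 9 and 11 (see also [7]):

  import numpy as np

  # Toy parameters (illustrative assumptions) for a 2-state HMM over 2 symbols.
  A = np.array([[0.7, 0.3], [0.4, 0.6]])    # A[i, j] = P(next state j | state i)
  B = np.array([[0.9, 0.1], [0.2, 0.8]])    # B[i, k] = P(symbol k | state i)
  pi = np.array([0.6, 0.4])                 # initial state distribution

  def forward(obs):
      """Forward recursion: alpha_t(j) = P(o_1..o_t, q_t = j); the sequence
      likelihood P(o_1..o_T) is the sum of the final alpha values."""
      alpha = pi * B[:, obs[0]]
      for o in obs[1:]:
          alpha = (alpha @ A) * B[:, o]
      return alpha.sum()

  print(forward([0, 1, 0]))   # likelihood of observing the symbol sequence 0, 1, 0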

Textbook and Teaching Materials

The course does not have an official textbook covering all its contents. However, good reference books covering parts of the course are listed at the bottom of this section (note that some of them have an electronic version freely available for download).

Lecture slides will be made available on this page at the end of each lesson and, together with course attendance, should be sufficient to prepare for the final exam. Suggested readings are also listed in the detailed lecture schedule.

The official language of the course is English: all materials, references and books are in English. Classes will be held in English if international students are attending.

Neural Networks:

  [NN] Simon O. Haykin
  Neural Networks and Learning Machines
  Pearson (2008) 

Probabilistic Models:

  [BRML] David Barber
  Bayesian Reasoning and Machine Learning
  Cambridge University Press (2012)

A PDF version of [BRML] and the associated software are freely available

Inference and Learning:

  [MCK] David J.C. MacKay
  Information Theory, Inference, and Learning Algorithms
  Cambridge University Press (2003)

A PDF version of [MCK] is freely available

Kernel Methods:

  [KM] John Shawe-Taylor and Nello Cristianini
  Kernel Methods for Pattern Analysis
  Cambridge University Press (2004) 

Lectures

# Date (Time) Room Topic References / Additional Material
1 23/2/15 (16-18) C1 Introduction to the course: motivations and aim; course housekeeping (exams, timetable, materials); introduction to structured data slides
2 26/2/15 (14-16) C1 Recurrent Neural Networks: basic models (guest lecture by Alessio Micheli). Time representation: explicit/implicit; feedbacks; shift operator; simple recurrent neural networks [NN] Sect. 15.1, 15.2
3 02/3/15 (16-18) C1 Recurrent Neural Networks: basic models (guest lecture by Alessio Micheli). Properties; transductions; unfolding; RNN taxonomy [NN] Sect. 15.2, 15.3, 15.5
4 05/3/15 (14-16) C1 Recurrent Neural Networks: learning algorithms (guest lecture by Alessio Micheli). BPTT (outline); RTRL (development) [NN] Sect 15.6, 15.7, 15.8
5 09/3/15 (11-13) C1 Recurrent Neural Networks: Reservoir Computing and Echo State Networks (guest lecture by Claudio Gallicchio; a minimal code sketch is given after this table) slides [1][2] Reservoir Computing and Echo State Networks [3] Echo State Networks
[4] Markovianity and Architectural Factors
6 12/3/15 (14-16) C1 Probabilistic and Graphical Models: probability refresher; conditional independence; graphical model representation; Bayesian Networks slides [BRML] Chapter 1 and 2
[BRML] Sect. 3.1, 3.2 and 3.3.1
7 16/3/15 (11-13) C1 Directed and Undirected Graphical Models: Bayesian Networks; Markov Networks; Markov Blanket; d-separation; structure learning slides [BRML] Sect. 3.3 (Directed Models)
[BRML] Sect. 4.1, 4.2.0-4.2.2 (Undirected Models)
[BRML] Sect. 4.5 (Expressiveness)
8 19/3/15 (14-16) C1 Inference in Graphical Models: inference on a chain; factor graphs; sum-product algorithm; elements of approximate inference slides [BRML] Sect. 4.4 (Factor Graphs)
[BRML] Sect. 5.1.1 (Variable Elimination and Inference on Chain)
[BRML] Sect. 5.1.2-5.1.5 (Sum-product Algorithm)
[5] Factor graphs and the sum-product algorithm
[MCK] Sect. 26.1 and 26.2 (More on sum-product)
[BRML] Sect. 28.3-28.5 and [MCK] Chapter 33 (Variational Inference)
[BRML] Sect. 27.1-27.4 and [MCK] Chapter 29 (Sampling methods)
9 23/3/15 (11-13) C1 Dynamic Bayesian Networks I: Hidden Markov Models; forward-backward algorithm; generative models for sequential data [BRML] Sect. 23.1.0 (Markov Models)
[BRML] Sect. 23.2.0-23.2.4 (HMM and forward backward)
[7] A classical tutorial introduction to HMMs
10 26/3/15 (14-16) C1 Processing of structured domains in ML: Recursive Neural Networks for trees (guest lecture by Alessio Micheli) [6] General framework for adaptive processing of structured data
11 30/3/15 (11-13) C1 Dynamic Bayesian Networks II: EM learning; applications of HMM slides [BRML] Sect. 23.3.1 (Learning in HMM)
[BRML] Sect. 23.4.2 (Input-output HMM)
[BRML] Sect. 23.4.4 (Dynamic BN)
[7] A classical tutorial introduction to HMMs
12 02/4/15 (14-16) C1 Questions & Answers; Exercises on graphical models; Midterm exam arrangements
MID 14/4/15 (15-18) C1 Midterm Exams
13 16/4/15 (14-16) C1 Generative Modeling of Tree-Structured Data slides [7] [8] Bottom-up hidden tree Markov models
[9] Top-down hidden tree Markov model
[10] Learning tree transductions
[11] Tree visualization on topographic maps
14 20/4/15 (11-13) C1 Latent Topic Models slides [BRML] Sect. 20.4-20.6.1 [12] LDA foundation paper
[13] A gentle introduction to latent topic models
15 23/4/15 (14-16) C1 Reservoir Computing for Trees and Graphs (guest lecture by Claudio Gallicchio) slides [14] TreeEsn
[15] GraphEsn
[16] Additional on GraphEsn
[17] Constructive NN for graphs
16 27/4/15 (11-13) C1 Deep Learning slides [18] A classic, accessible paper by one of the pioneers of Deep Learning
[19] Recent review paper
[20] A freely available book on deep learning from Microsoft Research
17 30/4/15 (14-16) C1 Kernel and non-parametric methods: kernel method refresher; kernels for complex data (sequences, trees and graphs); convolutional kernels; adaptive kernels slides [KM] Chapters 2 and 9 - Kernel methods refresher and kernel construction
[KM] Chapter 11 - Kernels for structured data
[KM] Chapter 12 - Generative kernels
[21] Generative kernels on hidden states multisets
18 04/5/15 (11-13) C1 Kernel and non-parametric methods: Linear and Non-Linear Dimensionality Reduction (guest lecture by Alexander Schulz) slides [BRML] Sect. 15.1-15.2 PCA
[BRML] Sect. 15.7 Kernel PCA
[22] t-SNE
19 07/5/15 (14-16) C1 Kernel and non-parametric methods: Recent Advances in Dimensionality Reduction (guest lecture by Alexander Schulz) slides
20 11/5/15 (11-13) C1 An Overview of ML research at UNIPI; final project proposals
21 18/5/15 (11-13) C1 Company Talk: Henesis (Artificial Perception)
22 21/5/15 (14-16) C1 Company Talk: Kode Solutions
23 21/5/15 (16-18) C1 Final lecture: course wrap-up; final project assignments; exam information
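As a complement to the reservoir computing lectures (5 and 15), here is a minimal echo state network sketch: the recurrent weights are generated at random, rescaled to a spectral radius below one and left untrained, while only a linear readout is fit by ridge regression (see [1]-[3]). All sizes, constants and the toy task are illustrative assumptions, not a reference implementation:

  import numpy as np

  rng = np.random.default_rng(1)
  n_in, n_res = 1, 50

  # Fixed random reservoir: these weights are never trained.
  W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
  W_res = rng.standard_normal((n_res, n_res))
  W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))   # spectral radius below 1

  def run_reservoir(inputs):
      """Drive the reservoir with an input sequence and collect its states."""
      h, states = np.zeros(n_res), []
      for x in inputs:
          h = np.tanh(W_in @ x + W_res @ h)
          states.append(h)
      return np.array(states)

  # Toy task (illustrative): one-step-ahead prediction of a sine wave.
  u = np.sin(np.linspace(0, 8 * np.pi, 400))[:, None]
  H, y = run_reservoir(u[:-1]), u[1:, 0]

  # Only the linear readout is trained, via ridge regression.
  ridge = 1e-6
  W_out = np.linalg.solve(H.T @ H + ridge * np.eye(n_res), H.T @ y)
  print("training MSE:", np.mean((H @ W_out - y) ** 2))

A practical implementation would also discard an initial transient of reservoir states (washout) and tune the input scaling, spectral radius and ridge penalty; these are omitted here for brevity.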

Exams

Course examination for students attending the lectures is performed in three stages: a midterm assignment, a final project and an oral presentation. Passing the exam requires successfully completing ALL three stages.

Midterm Assignment

Students will be asked to pick one article from the reading list and to prepare a short presentation to be given in front of the class. To be successful, the presentation should (at least) answer the questions associated with the article in the reading list, which typically include a mathematical derivation of a major theoretical result or of a learning algorithm reported in the paper. The assignment is due in the middle of the term.

Midterm assignment for academic year 2014/15

Time: Tuesday 14th April 2015, 15:00 - Room: C1

Reading list for academic year 2014/15

Reading list assignments for academic year 2014/15

Final project

Students can choose from a set of topics/problems suggested by the instructor or propose their own topic to investigate (within the scope of the course). Projects can be of the following types:

Students must select the project type and topic before the last lecture of the course. The project report/software should be handed in at least 7 days before the oral presentation.

NEW!! Project reports should be formatted using the provided LaTeX or MS Word templates.

Oral Presentation

Prepare a seminar on the project to be discussed in front of the instructor and anybody interested. Students are expected to prepare slides for a 25-minute presentation summarizing the ideas, models and results in the report. The exposition should demonstrate a solid understanding of the main ideas in the report as well as of the key concepts of the course.

Alternative Exam Modality

Working students and those not attending course lectures will hand in a final project as above and will also be subject to an oral examination comprising both an oral presentation of the project and an examination on the course program (models, algorithms and theoretical results). Students should contact the instructor by email to arrange project topics and examination dates.

Further Readings

[1] M. Lukosevicius, H. Jaeger. Reservoir computing approaches to recurrent neural network training. Computer Science Review, vol. 3(3), pp. 127-149, 2009.

[2] H. Jaeger, H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, vol. 304, pp. 78-80, 2004.

[3] H. Jaeger. The “echo state” approach to analysing and training recurrent neural networks. GMD - German National Research Institute for Computer Science, Tech. Rep., 2001.

[4] C. Gallicchio, A. Micheli. Architectural and Markovian factors of echo state networks. Neural Networks, vol. 24(5), pp. 440-456, 2011.

[5] F.R. Kschischang, B.J. Frey, H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, vol. 47(2), pp. 498-519, 2001.

[6] P. Frasconi, M. Gori, A. Sperduti. A General Framework for Adaptive Processing of Data Structures. IEEE Transactions on Neural Networks, vol. 9(5), pp. 768-786, 1998.

[7] L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, vol. 77(2), pp. 257-286, 1989.

[8] D. Bacciu, A. Micheli, A. Sperduti. Compositional Generative Mapping for Tree-Structured Data - Part I: Bottom-Up Probabilistic Modeling of Trees. IEEE Transactions on Neural Networks and Learning Systems, vol. 23(12), pp. 1987-2002, 2012.

[9] M. Diligenti, P. Frasconi, M. Gori. Hidden tree Markov models for document image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 519-523, 2003.

[10] D. Bacciu, A. Micheli, A. Sperduti. An Input-Output Hidden Markov Model for Tree Transductions. Neurocomputing, vol. 112, pp. 34-46, 2013.

[11] D. Bacciu, A. Micheli, A. Sperduti. Compositional Generative Mapping for Tree-Structured Data - Part II: Topographic Projection Model. IEEE Transactions on Neural Networks and Learning Systems, vol. 24(2), pp. 231-247, 2013.

[12] D. Blei, A.Y. Ng, M.I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.

[13] D. Blei. Probabilistic topic models. Communications of the ACM, vol. 55(4), pp. 77-84, 2012.

[14] C. Gallicchio, A. Micheli. Tree Echo State Networks. Neurocomputing, vol. 101, pp. 319-337, 2013.

[15] C. Gallicchio, A. Micheli. Graph echo state networks. Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), IEEE, 2010.

[16] C. Gallicchio, A. Micheli. Supervised State Mapping of Clustered GraphESN States. Frontiers in Artificial Intelligence and Applications, WIRN11, vol. 234, pp. 28-35, 2011.

[17] A. Micheli. Neural network for graphs: a contextual constructive approach. IEEE Transactions on Neural Networks, vol. 20(3), pp. 498-511, doi: 10.1109/TNN.2008.2010350, 2009.

[18] G.E. Hinton, R.R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, vol. 313(5786), pp. 504-507, 2006.

[19] Y. Bengio, A. Courville, P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35(8), pp. 1798-1828, 2013.

[20] L. Deng, D. Yu. Deep Learning: Methods and Applications. Foundations and Trends in Signal Processing, 2014.

[21] D. Bacciu, A. Micheli, A. Sperduti. Integrating Bi-directional Contexts in a Generative Kernel for Trees. Proceedings of the 2014 IEEE International Joint Conference on Neural Networks (IJCNN'14), pp. 4145-4151, IEEE, 2014.

[22] L. van der Maaten, G. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, vol. 9, pp. 2579-2605, 2008.