Questa è una vecchia versione del documento!
Indice
BIOINFORMATICS
Lecturer: Nadia Pisanti
NEWS
[May 5th] Seminars' schedule have been published.
[March 8th] Check all the information below concerning the e-larning procedure.
[February 17th] The class of February 24th will terminate 15' earlier.
[February 17th] This page has been created!
CLASSES IN COVID-19 TIMES
Regular frontal classes are suspended from March 5th.
As a consequence, the classes that should have taken place after March 5th will be recorded and made available for download via a google drive linked in this web page. Exceptions:
- Classes of April 20th and April21st (papers assignments for final exams) will take place on-line at the link you will find then in this web page.
- Classes of May 12th, May 18th and May 19th (students' seminars) will also take place on-line.
Prof. Pisanti makes office hours in the same time slots as those of the classes via Skype provided an appointment is arranged by mail (in order schedule time slots). By email we can exchange Skype names and agree the exact time, so please indicate within your email request the amount of time you expect to need; this will allow to schedule your requests.
LEARNING GOALS
This course has the goal to give the student an overview of algorithmic methods that have been conceived for the analysis of genomic sequences. We will focus both on theoretical and combinatorial aspects as well as on practical issues such as whole genomes sequencing, sequences alignments, the inference of repeated patterns and of long approximated repetitions, the computation of genomic distances, and several biologically relevant problems for the management and investigation of genomic data. The exam (see below for its description) has the goal to evaluate the students understanding of the problems and the methods described in the course. Moreover, the exam is additionally meant as a chance to learn how a scientific paper is like, and how to make an oral presentation on scientific/technical topics, that is designed for a specific audience.
CONTENTS
A brief introduction to molecular biology: DNA, proteins, the cell, the synthesis of a protein.
Sequences Alignments: Dynamic Programming methods for local, global, and semi-local alignments. Computing the Longest Common Subsequences. Multiple Alignments.
Pattern Matching: Exact Pattern Matching: algorithms (Knuth-)Morris-Pratt, Boyer-Moore, Karp-Rabin with preprocessing of the pattern. Algorithm with preprocessing of the text: use of indexes.
Motifs Extraction: KMR Algorithm for the extracion of exact motifs and its modifications for the inference of approximate motifs.
Finding Repetitions: Algorithms for the inference of long approximate repetitions. Filters for preprocessing.
Fragment Assembly: Genomes sequencing: some history, scientific opportunities, and practical problems. Some possible approaches for the problem of assembling sequenced fragments. Link with the “Shortest common superstring” problem, the Greedy solution. Data structures for representing and searching sequencing data.
New Generation Sequencing: Applications of High Throughput Sequencing and its algorithmic problems and challenges. Investigating data types resulting from the existing biotechnologies, and the possible data structures and algorithms for their storage and analysis.
KNOWLEDGE REQUIREMENTS
A basic course on algorithmic.
STUDY MATERIAL
SEQUENCES ALIGNMENTS Alignments
PATTERN MATCHING ExactPatternMatching e KnuthMorrisPratt&BoyerMoore KarpRabin87
SUFFIX TREE SuffixTrees
EXACT MOTIFS EXTRACTION KarpMillerRosenberg
APPROXIMATE MOTIFS EXTRACTION KMRC Speller StructuredMotifs
FINDING REPETITIONS: FILTERING TUIUIU
FRAGMENT ASSEMBLY FragmentAssembly
NEW GENERATION SEQUENCING NewGenerationSequencing
OVERVIEW OF SEQUENCING TECNOLOGIES SequencingTechnologies
HAPLOTYPE ASSEMBLY WhatsHap
BUBBLES IN DE BRUIJN GRAPHS (slides) Bubbles
EXAM ASSIGNMENT
Each student is assigned a paper that is a very recent scientific work on topics related to those of the course (tipically it is a paper accepted for publication in the proceedings of an international conference that is going to be held in a few weeks/months). The paper is part of a pool of possible papers selected by the lecturer. The paper assignment follows a brief description of all papers in the pool made by the lecturer, and a bidding phase of the students over such papers. Once the student has his/her paper assigned, the task is to prepare and make a presentation of that work that: (1) describes the results presented in that paper, (2) is suited for the actual audience (that will be the course class) as for comprehension opportunity, (3) sticks to the allowed time slot.
Students presentations usually take place all together somewhen at the end of the course. Exceptions are possible upon request for specific needs. Once the course is over, students can undergo the examination anytime during the academic year by agreeing an appointment: please, send an email to the teacher.
SEMINARS SCHEDULE
Tuesday, May 12th, starting at 11:00:
Lorenzo Beretta: Parameterized Algorithms for Matrix Completion With Radius Constraints
Monday, May 18th, starting at 14:00:
Alessio Russo: On Indeterminate Strings Matching
Alessandro Poggiali: Efficient tree-structured categorical retrieval
Lorenzo Carfagna: Double String Tandem Repeats
Tuesday, May 19th, starting at 11:00:
Pham Tran Huong Giang: Distance Measures for Tumor Evolutionary Trees
Jessica Sartori: Single-Cell Tumor Phylogeny Inference with copy-Number constrained mutation losses
Marco Marino: MosaicFlye: Resolving Long Mosaic Repeats Using Long Reads
Monday, May 25th, starting at 14:00:
Lorenzo Gazzella: Chaining with Overlaps Revisited
Lorenzo Volpi: AStarix: Fast and Optimal Sequence-to-Graph Alignment
Giulia Fois: Spectral Jaccard Similarity: A new approach to estimating pairwise sequence alignments
Michele Zoncheddu: Triplet-based similarity score for fully multi-labeled trees with poly-occurring labels
Davide Domenico Cocca: Strain-aware assembly of genomes from mixed samples using flow variation graphs
"REGISTRO DELLE LEZIONI"
RECORDED LECTURES and MATERIAL
Lecture Date | Topic | Material | |
---|---|---|---|
RECORDED LECTURES | EACH CLASS HAS ONE WHITEBOARD FILE AND TWO VIDEO FILES | Whiteboard is a .pdf and Video* are two (part1 and part2) .mp4 file WHICH YOU NEED TO DOWNLOAD TO HAVE THE AUDIO | |
3/9/2020 | Morris and Pratt Algorithm | ExactPatternMatching Whiteboard VideoPart1 VideoPart2 | |
3/10/2020 | Knuth Morris and Pratt Algorithm, Boyer and Moore Algorithm: idea | KMP&BM Whiteboard VideoPart1 VideoPart2 | |
3/16/2020 | Boyer and Moore Algorithm, Karp and Rabin Algorithm. Multiple Patterns or network expression: Aho-Corasick automata. | KarpRabin87 Whiteboard VideoPart1 VideoPart2 | |
3/17/2020 | Preprocessing the text: Suffix Trees and Suffix Arrays. Approximate Pattern Matching with Hamming Distance, Edit Distance, Don't Care Symbols, Degenerte Alphabets. | Suffix Tree Whiteboard VideoPart1 VideoPart2 | |
3/23/2020 | Motifs Inference: problem definitions with all its variants and applications in Genomics. Exact Motifs Inference with Suffix Trees: algorithm, complexity, examples. Maximality notions and their characterization on suffix trees. | Whiteboard VideoPart1 VideoPart2 | |
3/24/2020 | Exact Motifs Inference with the naming technique: KMR algorithm: description, complexity analysis, examples. Approximate Motifs Extraction: possible notions of approximation, problem definitions, and solutions. | KarpMillerRosenberg Whiteboard VideoPart1 VideoPart2 | |
3/30/2020 | Approximate Motifs Inference with Suffix Tree: Speller algorithm: description, complexity analysis, examples. Structured Motifs Extraction via Suffix Tree. | Speller StructuredMotifs Whiteboard VideoPart1 VideoPart2 | |
3/31/2020 | Approximate Motifs Inference with the naming technique: KMRC algorithm: description, complexity analysis, examples. | KMRC Whiteboard VideoPart1 VideoPart2 | |
4/6/2020 | Finding long repetitions | Filtering Whiteboard VideoPart1 VideoPart2 | |
4/7/2020 | Haplotype Assembly | WhatsHap Whiteboard VideoPart1 VideoPart2 | |
4/20/2020 14:00-16:00 | PAPER ASSIGNMENT 1 | ON-LINE LECTURE at this link List of Papers CPM RECOMBa RECOMBb RECOMBc RECOMBd RECOMBe Recorded Lecture | |
4/21/2020 | PAPER ASSIGNMENT 2 | ON-LINE LECTURE at this link ReadMe Recorded Lecture | |
4/27/2020 | THIS CLASS WILL NOT TAKE PLACE (too many hours in schedule) | ||
4/28/2020 | Fragment Assembly 1 | Fragment Assembly Whiteboard VideoPart1 VideoPart2 | |
5/4/2020 | Fragment Assembly 2 | Fragment Assembly Whiteboard VideoPart1 VideoPart2 | |
5/5/2020 | Bubbles in de Bruin Graphs | Whiteboard VideoPart1 VideoPart2 | |
5/11/2020 | THIS CLASS WILL NOT TAKE PLACE (too many hours in schedule) | ||
5/12/2020 11:00-12:00 | STUDENTS' SEMINARS | ON-LINE SHORT LECTURE at this link | |
5/18/2020 14:00-16:00 | STUDENTS' SEMINARS | ON-LINE LECTURE at this link | |
5/19/2020 11:00-13:00 | STUDENTS' SEMINARS | ON-LINE LECTURE at this link | |
5/25/2020 14:00-16:00 | STUDENTS' SEMINARS | ON-LINE LECTURE at this link |