Introduction to probabilistic graphical models

Paper readings

Description

The reading gives you the opportunity to study certain concepts of the course in greater depth. The topic has to be linked to algorithms, concepts, or methods presented in class, but beyond this requirement the choice is quite open; in particular, it may be tailored to your interests.

The work expected for the paper reading is to read one or several research papers (this depends on the topic you choose) and to write a report that presents the method in detail. In particular, we expect the main technical ideas of the method to be presented with the appropriate level of detail and equations in the report. If libraries are available that already implement the method, simple experiments can complement the presentation of the method, but we are not expecting you to implement complicated algorithms.

The project should be done in groups of three students (to the extent possible). Once you have chosen the reading you would like to do, it is mandatory to have it validated by the teachers (by email, see instructions below), because we would like the readings undertaken to be as diverse as possible.


Evaluation

The reading counts for 20% of the final grade for the class. Evaluation will be based on:
    * A report (less than 4 pages) to be handed in by January 16th, 2018. The report has to be written in such a way that any student who has followed the class can understand it (no need to introduce graphical model concepts). The report has to clearly present (in French or English) the studied problem and the existing approaches. You will be evaluated more on the clarity of the report than on its length.

 
IMPORTANT
(1) Each student or group of students has to obtain the agreement of the teachers (by email) before November 22nd, 2016.
(2) Each student or group of students has to submit a small intermediate report (one page, as a LaTeX or text document) presenting the main ideas already understood and the plan for the rest of the work, so that some feedback may be given. This intermediate report has to be submitted on Moodle before December 20th, 2016.



List of papers

Note that some rows of the table collect papers on the same topic, but they do not systematically correspond to the set of items to read for a single reading.


Probabilistic PCA and Probabilistic CCA
Interpretation of PCA as a graphical model close to factor analysis. A situation where EM has no local minima. (A short sketch of the generative model is given after the references below.)

Tipping, M. E.,  Bishop, C. M. 1999. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B 61(3):611-622. [pdf]

CCA (Canonical Correlation Analysis) is analogous to PCA for the joint analysis of two random vectors X and Y.
  1. D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, Canonical correlation analysis: an overview with application to learning methods, Neural Computation 16, no. 12 (2004): 2639-2664.
  2. F. R. Bach and M. I. Jordan, A probabilistic interpretation of canonical correlation analysis, Dept. Statist., Univ. California, Berkeley, CA, Tech. Rep. 688 (2005).
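As a pointer to what the report should make precise, here is a minimal sketch of the generative model of probabilistic PCA (Tipping and Bishop); the probabilistic CCA model of Bach and Jordan has an analogous form, with a low-dimensional latent variable shared between X and Y:

    x = W z + \mu + \epsilon, \qquad z \sim \mathcal{N}(0, I_q), \quad \epsilon \sim \mathcal{N}(0, \sigma^2 I_d),

so that marginally x \sim \mathcal{N}(\mu, W W^\top + \sigma^2 I_d); classical PCA is recovered in the limit \sigma^2 \to 0, and W, \mu, \sigma^2 can be estimated by maximum likelihood or EM.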
Learning graph structure - multinomial models
 
For complete discrete data, learning of the parameters and of the directed acyclic graph structure.

D. Heckerman, D. Geiger, D. Chickering. Learning Bayesian networks: The Combination of Knowledge and Statistical Data.  Machine Learning, 20:197-243, 1995.


Learning graph structure - Gaussian models
 
For complete Gaussian data, learning of the parameters and of the directed acyclic graph structure.


D. Geiger, D. Heckerman. Learning Gaussian networks.  Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pp. 235--243.
Variational methods for inference
A class of methods for approximate inference.

An introduction to variational methods for graphical models. M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. In M. I. Jordan (Ed.), Learning in Graphical Models, Cambridge: MIT Press, 1999

Its application to Bayesian inference.

Beal, M.J. and Ghahramani, Z.
Variational Bayesian Learning of Directed Graphical Models with Hidden Variables
Bayesian Analysis 1(4), 2006.
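For orientation, the basic bound on which these methods rest: for any distribution q over the latent variables z,

    \log p(x) \;\ge\; \mathbb{E}_{q(z)}[\log p(x, z)] - \mathbb{E}_{q(z)}[\log q(z)] = \log p(x) - \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big),

and variational inference maximizes this lower bound over a tractable family of distributions q.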
Simulation methods for inference (particle filtering)
Simulation-based inference for dynamic graphical models.
Chapter of "polycopie"

S. Arulampalam, S. Maskell, N. J. Gordon, and T. Clapp, A Tutorial on Particle Filters for On-line Non-linear/Non-Gaussian Bayesian Tracking, IEEE Transactions on Signal Processing, Vol. 50(2), pages 174-188, February 2002.

Doucet A., Godsill S.J. and Andrieu C., "On sequential Monte Carlo sampling methods for Bayesian filtering," Statist. Comput., 10, 197-208, 2000
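If you want to complement the reading with a simple experiment, here is a minimal sketch of the bootstrap particle filter; the prior sampler, transition sampler, and observation likelihood are placeholders to be supplied for a specific state-space model:

    import numpy as np

    def bootstrap_particle_filter(ys, n_particles, sample_prior, sample_transition, obs_likelihood, rng):
        # Initial particle cloud drawn from the prior; shape (n_particles, state_dim).
        particles = sample_prior(n_particles)
        means = []
        for y in ys:
            # Propagate each particle through the transition model.
            particles = sample_transition(particles)
            # Weight particles by how well they explain the current observation.
            weights = obs_likelihood(y, particles)
            weights = weights / weights.sum()
            # Filtering estimate at this time step (posterior mean).
            means.append(np.sum(weights[:, None] * particles, axis=0))
            # Multinomial resampling to fight weight degeneracy.
            idx = rng.choice(n_particles, size=n_particles, p=weights)
            particles = particles[idx]
        return np.array(means)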
Semi-Markovian models
A class of models allowing one to model the time spent in any given state of a Markov chain or an HMM. (A small sampling sketch is given below.)

Note from Kevin Murphy [pdf]
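As an illustration of the idea, a minimal sketch of sampling a state sequence from an explicit-duration semi-Markov chain; the transition matrix A (without self-transitions) and the per-state duration sampler are placeholders:

    import numpy as np

    def sample_semi_markov(init_probs, A, sample_duration, n_steps, rng):
        # Unlike a plain Markov chain (geometric durations), the time spent in
        # each state is drawn explicitly from a per-state duration distribution.
        states = []
        s = rng.choice(len(init_probs), p=init_probs)
        while len(states) < n_steps:
            d = sample_duration(s)           # e.g. Poisson or negative binomial
            states.extend([s] * d)           # remain in state s for d steps
            s = rng.choice(len(A), p=A[s])   # then jump to a different state
        return np.array(states[:n_steps])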

Learning parameters in an undirected graphical model (Markov random fields)
Chapter 9 of the "polycopié" and articles.

Dynamic graphical models

Chapter of "polycopie". Specific topics to be defined.
General applications of the sum-product algorithm (e.g., to the FFT)
S. M. Aji, R. J. McEliece, The generalized distributive law, IEEE Transactions on Information Theory, Vol. 46, No. 2 (2000), pp. 325-343.
Independent Component Analysis
A. Hyvarinen, E. Oja (2000): Independent Component Analysis: Algorithms and Applications, Neural Networks, 13(4-5):411-430, 2000.

Course notes of Herve LeBorgne: http://www.eeng.dcu.ie/~hlborgne/pub/th_chap3.pdf
Jean-Francois Cardoso, Dependence, Correlation and Gaussianity in Independent Component Analysis. Journal of Machine Learning Research 4(Dec):1177--1203, 2003. [pdf]
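As a possible simple experiment, scikit-learn's FastICA can be run on a toy mixing problem (the sources and mixing matrix below are arbitrary illustrations):

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    # Two independent, non-Gaussian sources.
    S = np.c_[np.sign(np.sin(3 * t)), np.sin(5 * t) ** 3]
    A = np.array([[1.0, 0.5], [0.7, 1.0]])   # arbitrary mixing matrix
    X = S @ A.T                               # observed linear mixtures
    ica = FastICA(n_components=2, random_state=0)
    S_hat = ica.fit_transform(X)              # recovered sources, up to permutation and scale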

Clustering through a mixture of PCA

M. E. Tipping and C. M. Bishop, Mixtures of probabilistic principal component analyzers, Neural Computation 11, no. 2 (1999): 443-482.

Stochastic relational models
  • E. M. Airoldi et al., Mixed membership stochastic block models for relational data with application to protein-protein interactions, in Proceedings of the International Biometrics Society Annual Meeting, 2006.

  • Conditional Random Fields
    Charles Sutton, Andrew McCallum, An Introduction to Conditional Random Fields for Relational Learning. In Lise Getoor and Ben Taskar, editors, Introduction to Statistical Relational Learning. MIT Press. 2007.


    Dirichlet Process
  • R. M Neal, Markov chain sampling methods for Dirichlet process mixture models, Journal of computational and graphical statistics 9, no. 2 (2000): 249-265. 
  • Y. W Teh,  Dirichlet Process,  Submitted to Encyclopedia of Machine Learning (2007). 
  • A. Ranganathan,  The Dirichlet Process Mixture (DPM) Model (2004).
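    For orientation, the stick-breaking construction of a draw G ~ DP(\alpha, G_0), which the papers above use in various forms:

        \beta_k \sim \mathrm{Beta}(1, \alpha), \qquad \pi_k = \beta_k \prod_{j < k} (1 - \beta_j), \qquad \theta_k \sim G_0, \qquad G = \sum_{k=1}^{\infty} \pi_k \, \delta_{\theta_k},

    so a Dirichlet process mixture draws each observation's parameter from G, giving a countably infinite mixture with a data-driven number of occupied components.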


  • Factorial HMM

    Z. Ghahramani and M. I. Jordan, Factorial hidden Markov models, Machine Learning 29, no. 2 (1997): 245-273.

    Generalized PCA
    M. Collins, S. Dasgupta, and R. E. Schapire, A generalization of principal component analysis to the exponential family, Advances in Neural Information Processing Systems 1 (2002): 617-624.

    Structure learning by L1 regularization

    J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9, no. 3 (2008): 432.
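    A small experiment is easy here, since the graphical lasso is implemented in scikit-learn; a minimal sketch (the data and the regularization level alpha are placeholders):

        import numpy as np
        from sklearn.covariance import GraphicalLasso

        X = np.random.randn(200, 5)               # placeholder data; use a real dataset
        model = GraphicalLasso(alpha=0.1).fit(X)
        K = model.precision_                       # sparse estimate of the inverse covariance
        # Zero off-diagonal entries of K correspond to absent edges in the Gaussian graphical model.
        edges = np.abs(K) > 1e-8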


    Mixture of log-concave densities


     
     Interesting non-parametric models for unimodal distributions.

  • T. Chang and G. Walther, Clustering with mixtures of log-concave distributions, Computational Statistics & Data Analysis 51, no. 12 (2007): 6242-6251.
  • G. Walther, Inference and modeling with log-concave distributions, Statistical Science 25 (2010).

  • Learning non-linear dynamical systems

     Idea: use non-linear Kalman filters to learn the dynamics of a non-linear dynamical system from noisy observations of the system:
    Roweis, Sam, and Zoubin Ghahramani. Learning Nonlinear Dynamical Systems Using the Expectation–maximization Algorithm. Kalman Filtering and Neural Networks 6 (2001): 175–220.
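    For reference, a minimal sketch of one extended Kalman filter step, one building block of the E-step (the paper uses an extended Kalman smoother); f, g, their Jacobians F_jac, G_jac, and the noise covariances Q, R are placeholders for a specific model:

        import numpy as np

        def ekf_step(x, P, y, f, g, F_jac, G_jac, Q, R):
            # Model: x_{t+1} = f(x_t) + w_t, y_t = g(x_t) + v_t, with Gaussian noise.
            # Predict: propagate the mean through f and linearize around the current estimate.
            x_pred = f(x)
            F = F_jac(x)
            P_pred = F @ P @ F.T + Q
            # Update: correct the prediction using the observation y.
            G = G_jac(x_pred)
            S = G @ P_pred @ G.T + R
            K = P_pred @ G.T @ np.linalg.inv(S)
            x_new = x_pred + K @ (y - g(x_pred))
            P_new = (np.eye(len(x)) - K @ G) @ P_pred
            return x_new, P_new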
     

    Hawkes process estimation
     Hawkes processes are stochastic processes that describe cascades or networks of events that trigger one another via delayed self-excitation. It is possible to design an EM algorithm to learn the structure of the stochastic process.
    Lewis, E., & Mohler, G. (2011). A nonparametric EM algorithm for multiscale Hawkes processes. Preprint.
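    For reference, the conditional intensity of a univariate Hawkes process, with background rate \mu and triggering kernel \varphi (e.g. exponential):

        \lambda(t) = \mu + \sum_{t_i < t} \varphi(t - t_i),

    and, roughly speaking, the EM algorithm treats as a latent variable, for each event, whether it was triggered by the background or by a specific earlier event.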

    Particle Filtering
    Christian A. Naesseth, Fredrik Lindsten, Thomas B. Schön (2014),
    Sequential Monte Carlo for Graphical Models

    See also the section on particle filters in chapter 21 of the polycopié



    Examples of applications of graphical models

    These articles present classical applications. They may give you ideas for an application-oriented project or may be used for article reviews.

    Bioinformatics
    Chapter of the polycopié (ask login/pwd by e-mail)

    Phylogenetic HMM:
    A. Siepel and D. Haussler, Phylogenetic hidden Markov models, Statistical Methods in Molecular Evolution (2005), 3, 325-351.

    Vision/Speech
    Articles from Kevin Murphy:
    "Using the Forest to See the Trees: A Graphical Model Relating Features, Objects and Scenes", Kevin Murphy, Antonio Torralba, William Freeman. NIPS'03 (Neural Info. Processing Systems)

    Dynamic Bayesian Networks for Audio-Visual Speech Recognition  A. Nefian, L. Liang, X. Pi, X. Liu and K. Murphy. EURASIP, Journal of Applied Signal Processing, 11:1-15, 2002

    Optimization for MAP inference in computer vision:
    MRF Optimization via Dual Decomposition: Message-Passing Revisited, Komodakis, Paragios, Tziritas, ICCV 2007. Longer technical report version.

    Learning tree-structured context models for object recognition:

    In object recognition, applying independent single-object detectors may produce semantically incorrect results, and even perfect detectors cannot solve context-related tasks in scene understanding. With a probabilistic graphical model, it is possible to capture the contextual information of a scene and use it for object recognition and scene understanding.
    Choi, M. J., Torralba, A., & Willsky, A. S. (2012). A tree-based context model for object recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(2), 240-252.

    Robotics
    Automatic construction of maps
    Simultaneous Localization and Mapping with Sparse Extended Information Filters
    Thrun et al., The International Journal of Robotics Research, 2004; 23: 693-716.
    (see also chapter of "polycopie" on Kalman filtering)
    Text
    Naive Bayes:
    A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998. 

    Latent Dirichlet allocation. D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003. [ .ps.gz | .pdf | code ]
    Text - Natural language processing
    S. Vogel, H. Ney, and C. Tillmann. HMM-based word alignment in statistical translation. In Proceedings of the 16th conference on Computational linguistics, pp. 836-841, Morristown, NJ, USA, 1996. Association for Computational Linguistics.

    Probabilistic context-free grammars:
    CMU lecture notes, 1999



    Deep learning and graphical models




    Connectionist Temporal Classification


    An algorithm for recurrent neural networks for classification along a sequence:
    Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006).
    Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning (pp. 369-376). ACM.
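    For orientation, the quantity CTC maximizes: the network outputs per-frame probabilities over labels plus a blank symbol, and with B the map that removes repeats and blanks,

        p(y \mid x) = \sum_{\pi : B(\pi) = y} \; \prod_{t=1}^{T} p(\pi_t \mid x),

    which is computed efficiently with a forward-backward recursion over the alignments \pi, much as in an HMM.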

    Restricted Boltzmann Machines


    A probabilistic form of neural network:

    Fischer, A., & Igel, C. (2014). Training restricted Boltzmann machines: an introduction. Pattern Recognition, 47(1), 25-39.
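    For reference, the energy function of a binary RBM with visible units v, hidden units h, weights W and biases b, c:

        E(v, h) = -b^\top v - c^\top h - v^\top W h, \qquad p(v, h) = \frac{1}{Z} \exp(-E(v, h)),

    and the bipartite structure makes p(h | v) and p(v | h) factorize over units, which is what block Gibbs sampling and the contrastive divergence training discussed in the paper exploit.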


    Implementation of algorithms

    N most probable configurations
    Implementation of an algorithm (for HMMs or more complex graphs), based on the following articles:
    Dennis Nilsson, Jacob Goldberger
    An Efficient Algorithm for Sequentially Finding the N-Best List, IJCAI, 1999
    Chen Yanover, Yair Weiss, Finding the M Most Probable Configurations Using Loopy Belief Propagation, NIPS 2003.
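    As a possible starting point for the implementation, a minimal log-space Viterbi pass for an HMM (the N = 1 case that the papers above generalize); names and array shapes are illustrative:

        import numpy as np

        def viterbi(log_init, log_trans, log_emit):
            # log_init: (K,), log_trans: (K, K), log_emit: (T, K) observation log-likelihoods.
            T, K = log_emit.shape
            delta = log_init + log_emit[0]             # best log-score of paths ending in each state
            back = np.zeros((T, K), dtype=int)
            for t in range(1, T):
                scores = delta[:, None] + log_trans    # scores[i, j]: best path ending in i, then i -> j
                back[t] = np.argmax(scores, axis=0)
                delta = np.max(scores, axis=0) + log_emit[t]
            # Backtrack from the best final state.
            path = [int(np.argmax(delta))]
            for t in range(T - 1, 0, -1):
                path.append(int(back[t, path[-1]]))
            return path[::-1]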
    Computation of tree-width
    Comparing the classical heuristics and finer methods:

    Mark Hopkins and Adnan Darwiche
    A Practical Relaxation of Constant-Factor Treewidth Approximation Algorithms
    Proceedings of the First European Workshop on Probabilistic Graphical Models 2002

    Also some exact methods:
    Stefan Arnborg, Derek G. Corneil, Andrzej Proskurowski, Complexity of finding embeddings in a k-tree, SIAM Journal on Algebraic and Discrete Methods (1987)