The basic idea behind this learning algorithm is pretty simple. First of all, it is an RLS method in kernel feature space, which means it is capable of learning a nonlinear regression in an online manner. Kernel RLS methods usually face memory and computational problems, since their solutions require to invert a large kernel matrix that incorporates information about all processed data. As new data patterns (x,y) arrive (remember this is set in an online environment), this matrix grows and the computation of the regression output becomes more costly in each step. The typical approach to avoid these memory and computational problems is to restrict the data patterns to a minimum number that allows to approximate the solution sufficiently well. Each time a newly arrived pattern is found to be sufficiently interesting, it is added to a "dictionary".

In contrast to these techniques, which make the dictionary expand (albeit slowly), the technique of this paper

*fixes*the size of the dictionary. As a consequence, apart from adding patterns to the dictionary, we also have to prune patterns. While the pruning itself involves only simple algebra, the criterion to determine what data to prune is more interesting. We chose to use a relevance-based criterion in this paper, motivated by the good results it yielded. More specifically, in each time step we calculate how relevant each stored pattern is with respect to the regression, and subsequently prune the least relevant pattern. (Details and references are in the paper itself.) As a result, after running a number of iterations of this algorithm, it obtains a compact set of patterns that summarizes the observed patterns pretty well.

I particularly like the interpretation of this algorithm as a simple learning method that deals with memory restrictions. Every time this method receives a new data pattern, it must determine actively what pattern to "forget" in order to maintain its memory size. So it's a way of "optimal forgetting". And as the results in the paper point out, it is also pretty efficient in doing so.

Figures: The previous figure is one of the slides. The below figure is an illustration of the "label update" procedure that can be introduced to equip the algorithm with tracking capability. I.e., in a time-varying environment the regression surface can be changing, as indicated by the new data pattern (xn,yn). The proposed procedure updates the labels in the vicinity of this point to reflect those changes.

But enough talking about this paper. My friends and colleagues are also presenting theirs, so if you are at the ICASSP be sure to check them out:

- Il Park and Jose Principe, "Quantification of Inter-trial Non-stationarity in Spike Trains from Periodically Stimulated Neural Cultures" (Multivariate and Multimodal Analysis of Brain Signals, Tuesday, March 16, 13:50 - 14:10).
- Sohan Seth, Jose C. Principe. A conditional distribution function based approach to design nonparametric tests of independence and conditional independence (Learning Theory and Models III, Wednesday, March 17, 16:00 – 18:00).
- David Ramirez, Javier Via, Ignacio Santamaria, Roberto Lopez-Valcarce, Louis L. Scharf, "Multiantenna Spectrum Sensing: Detection Of Spatial Correlation Among Time-Series With Unknown Spectra" (SPCOM-L1: Spectrum Sensing for Cognitive Radio I, Tuesday, March 16, 14:10 - 14:30).
- Javier Vía, David Ramírez, Ignacio Santamaría, Luis Vielva, "Widely And Semi-Widely Linear Processing Of Quaternion Vectors" (Detection and Estimation Techniques - II, Thursday, March 18, 13:30 - 15:30).
- Luis Vielva, Javier Vía, Jesús Gutiérrez, Óscar González, Jesús Ibáñez, Ignacio Santamaría, "Building A Web Platform For Learning Advanced Digital Communications Using A Mimo Testbed" (Signal Processing Education: Signal Processing Education, Tuesday, March 16, 14:30 - 14:50)
- Abhishek Singh, Jose Principe. Kernel width adaptation in information theoretic cost functions (Learning Theory and Models III, Wednesday, March 17, 16:00 – 18:00).
- Abhishek Singh, Jose Principe. A closed form recursive solution for maximum correntropy training (Learning Theory and Models III, Time: Wednesday, March 17, 16:00 – 18:00).
- Abhishek Singh, Tejaswi Tamminedi, Guy Yosiphon, Anurag Ganguli, Jacob Yadegar. Hidden markov models for modeling blood pressure data to predict acute hypotension (Bioinformatics and Biomedical Signal Processing, Tuesday, March 16, 16:00 – 18:00).

PS: The paper

## 2 comments:

Finite resource learning problems are very important indeed. It forces one to compress as much as possible to the existing storage. I like the simple yet elegant idea of this paper.

nice stuff!!

Post a Comment