Overview
The partially observable Markov decision process (POMDP) is a
popular model for planning under uncertainty.
Classification and regression are standard statistical tools for
reconstructing a source (or its attributes) from noise-corrupted
data. In the past, POMDPs and classification/regression were mostly
studied independently. Recently, however, a number of papers have
reported using classification/regression techniques to solve POMDPs,
or using a POMDP to build cost-sensitive classifiers.
Much work is still underway to explore how POMDP and
classification/regression techniques can be applied to each other
in a mutually beneficial way. The aim of this workshop is to bring together
researchers from the POMDP community and researchers from the statistical
learning community, and to create an opportunity for exchanging views and
reporting ongoing work on how a POMDP and a classifier/regressor can
benefit each other.
Research on this subject is far from fully explored, and it is time to
bring this new interdisciplinary area to the attention of additional
researchers. We believe that looking at POMDPs and classification/regression
from new and unified perspectives will stimulate a broader range of
contributions to both.
This is a full-day workshop with invited and contributed presentations and
an emphasis on interactive discussion.
Related work
Kearns et al. [1] showed that the concept of sample complexity
used in classification can be extended to POMDPs, and they
established an upper bound on the number of trajectories that must
be used to ensure good generalization. Their work is pioneering in
trajectory-based methods and in relating POMDPs to classification.
Several researchers have investigated using modern classifiers such as the
SVM to learn MDP policies, including Dietterich and Wang [2],
Lagoudakis and Parr [3], and Blatt and Hero [7]. Bagnell et al. [4]
reported preliminary results on classification-based policy
search in POMDPs, and Langford and Zadrozny [5] provided a theoretical
analysis of this approach. Mahadevan [6] and Li et al. [8] studied
regression methods in POMDPs.
In the opposite direction, Dimitrakakis and Bengio [11] reported using an MDP
as a gating network in a mixture of experts, and Bonet and Geffner [9] and Guo [10]
applied POMDP techniques to classification problems in which the features and
misclassification are cost-sensitive. The main drawback of the
methods in [9-10] is that the features are assumed independent.
Relaxation of this naive Bayes assumption is studied in [12], with
encouraging results reported.
The work in [1-12] signals nontrivial relationships between POMDPs and
classification/regression that can be exploited to the benefit of both.
References
[1] M. Kearns, Y. Mansour and A. Y. Ng, "Approximate planning in large POMDPs via reusable trajectories", NIPS 12, 2000
[2] T. Dietterich, X. Wang, "Batch Value Function Approximation via Support Vectors", NIPS 14, 2001
[3] M. Lagoudakis, R. Parr, "Reinforcement Learning as Classification: Leveraging Modern Classifiers", ICML, 2003
[4] J. A. Bagnell, S. Kakade, A. Y. Ng and J. Schneider, "Policy search by dynamic programming", NIPS 16, 2004
[5] J. Langford, B. Zadrozny, "Relating Reinforcement Learning Performance to Classification Performance", ICML, 2005
[6] S. Mahadevan, "Proto-Value Functions: Developmental Reinforcement Learning", ICML, 2005
[7] D. Blatt, A. Hero, "From Weighted Classification to Policy Search", NIPS, 2005
[8] H. Li, L. He, X. Liao, S. Ji, L. Carin, "Region-Based Value Iteration and Its Application to Robot Navigation in a Minefield",
NIPS Workshop on Machine Learning Based Robotics in Unstructured Environments, 2005
[9] B. Bonet, H. Geffner, "Learning Sorting and Decision Trees with POMDPs", ICML, 1998
[10] A. Guo, "Decision-theoretic Active Sensing for Autonomous Agents", AAMAS, July 2003
[11] C. Dimitrakakis, S. Bengio, "Online Policy Adaptation for Ensemble Classifiers",
Proceedings of European Symposium on Artificial Neural Networks, 28-30, 2004
[12] H. Li, X. Liao, L. Carin, "A Value-directed Bayesian Classifier", ICASSP, 2006
Organization
Organizing Committee
Xuejun Liao, Duke University, USA
Lawrence Carin, Duke University, USA
Program Committee
Alfred Hero, University of Michigan at Ann Arbor, USA
Carey E. Priebe, Johns Hopkins University, USA
Ronald Parr, Duke University, USA
Carey Schwartz, DARPA/DSO, USA
Douglas Cochran, Arizona State University, USA
Vikram Krishnamurthy, University of British Columbia, Canada
David Castanon, Boston University, USA