Workshop on

POMDP, Classification and Regression: Relationships and Joint Utilization

Held in conjunction with
ICAPS'06

16th International Conference on Automated Planning & Scheduling

June 6-10, 2006
Ambleside, The English Lake District, U.K.



New! The papers and slides have been posted.

The schedule is now available. A moderated discussion follows the technical presentations. All workshop participants are welcome to join the discussion, to take up the many challenging open questions posted in the schedule, and to propose their own questions relevant to the workshop theme. To have your questions posted in the schedule in advance, send them to xjliao@ee.duke.edu.

Overview

The partially observable Markov decision process (POMDP) is a popular model for planning under uncertainty. Classification and regression are standard statistical tools for reconstructing a source (or its attributes) from noise-corrupted data. In the past, POMDPs and classification/regression were mostly studied independently. Recently, however, a number of papers have reported using classification/regression techniques to solve POMDPs, or using POMDPs to build cost-sensitive classifiers.
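
To fix notation (these are standard POMDP definitions, not specific to any paper cited below): the agent cannot observe the state directly, so it maintains a belief b, a probability distribution over states, and after taking action a and receiving observation o it updates the belief by Bayes' rule,

    b'(s') \propto O(o \mid s', a) \sum_{s} T(s' \mid s, a) \, b(s),

where T is the transition model and O the observation model. A POMDP policy maps beliefs to actions so as to maximize expected cumulative reward; classification and regression enter naturally because this belief-to-action map can itself be treated as a classifier (discrete actions) or a regressor (continuous actions).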

Much work remains, however, in exploring how POMDPs and classification/regression techniques can be applied to each other in mutually beneficial ways. The aim of this workshop is to bring together researchers from the POMDP community and the statistical learning community, creating an opportunity to exchange views and report ongoing work on how POMDPs and classifiers/regressors can benefit each other.

The possibilities of this subject are far from fully explored, and it is time to bring this new interdisciplinary area to the attention of a wider group of researchers. We believe that looking at POMDPs and classification/regression from new and unified perspectives will stimulate a broader range of contributions to both.

This is a full-day workshop consisting of invited and contributed presentations, with an emphasis on interactive discussion.


Related Work

Kearns et al. [1] showed that the concept of "sample complexity" used in classification can be extended to POMDPs, and they established an upper bound on the number of trajectories needed to ensure good generalization. Their work pioneered trajectory-based methods and the connection between POMDPs and classification.

Several researchers have investigated using modern classifiers such as the support vector machine (SVM) to learn MDP policies, including Dietterich and Wang [2], Lagoudakis and Parr [3], and Blatt and Hero [7]. Bagnell et al. [4] reported preliminary results on classification-based policy search in POMDPs, and Langford and Zadrozny [5] gave a theoretical analysis of this reduction. Mahadevan [6] and Li et al. [8] studied regression methods in MDPs and POMDPs.
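
As a concrete illustration of the classification-based idea, here is a minimal self-contained sketch on a toy fully observed chain MDP (the toy problem, the labeling loop, and the nearest-neighbor stand-in classifier are our own illustration, not the algorithm of any paper cited above): Monte Carlo rollouts score each action in each sampled state, the best-scoring action becomes that state's label, and an off-the-shelf classifier fit to the (state, label) pairs becomes the next policy.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 1-D chain MDP: states 0..9, actions move left/right, reward 1 for
    # being in state 9. Fully observed, so a state-to-action classifier is
    # a policy.
    N_STATES, ACTIONS, GAMMA = 10, (-1, +1), 0.9

    def step(s, a):
        s2 = min(max(s + a, 0), N_STATES - 1)
        if rng.random() < 0.1:          # 10% chance the move "slips"
            s2 = s
        return s2, float(s2 == N_STATES - 1)

    def mc_q(s, a, policy, depth=20, n=20):
        """Monte Carlo estimate of Q(s, a): take action a, then follow policy."""
        total = 0.0
        for _ in range(n):
            s2, r = step(s, a)
            ret, disc = r, GAMMA
            for _ in range(depth):
                s2, r = step(s2, policy(s2))
                ret += disc * r
                disc *= GAMMA
            total += ret
        return total / n

    def fit_classifier(X, y):
        """Stand-in for any off-the-shelf classifier: 1-nearest neighbor."""
        X, y = np.asarray(X, float), np.asarray(y)
        return lambda s: int(y[np.argmin(np.abs(X - s))])

    policy = lambda s: int(rng.choice(ACTIONS))     # start from a random policy
    for _ in range(3):                              # a few improvement rounds
        X, y = [], []
        for s in range(N_STATES):
            q = [mc_q(s, a, policy) for a in ACTIONS]
            X.append(s)
            y.append(ACTIONS[int(np.argmax(q))])
        policy = fit_classifier(X, y)               # the classifier IS the policy

    print([policy(s) for s in range(N_STATES)])     # mostly +1: move toward 9

The papers above refine this basic loop in different ways (e.g., weighting the training examples by how much the action choice matters, or analyzing how classification error translates into policy loss), but the core reduction from policy learning to supervised learning is as sketched.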

In the opposite direction, Dimitrakakis and Bengio [11] reported using an MDP as the gating network in a mixture of experts, and Bonet and Geffner [9] and Guo [10] applied POMDP techniques to classification problems in which both feature acquisition and misclassification carry costs. The main drawback of the methods in [9, 10] is that the features are assumed independent given the class. Relaxation of this naive Bayes assumption is studied in [12], with encouraging results reported.
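
To make the POMDP view of cost-sensitive classification concrete, here is a minimal sketch (all the probabilities, costs, and the myopic one-step-lookahead stopping rule are our own illustration, not the method of [9], [10], or [12]): the hidden state is the unknown class label, the actions are "measure feature i" (at a cost) or "declare a class" (with a misclassification cost), and the belief over classes is updated with a naive Bayes observation model.

    import numpy as np

    # Hidden state = the unknown class label; it never changes, so the
    # POMDP belief is simply the posterior over classes. Toy setup (our
    # own numbers): 2 classes, 3 binary features.
    PRIOR     = np.array([0.5, 0.5])
    P_FEAT    = np.array([[0.9, 0.8, 0.6],    # P(feature_i = 1 | class 0)
                          [0.2, 0.3, 0.5]])   # P(feature_i = 1 | class 1)
    FEAT_COST = np.array([1.0, 1.0, 2.0])     # cost of measuring each feature
    MISCLASS  = 20.0                          # cost of declaring the wrong class

    def update(belief, i, value):
        """Naive Bayes belief update after observing feature i = value.
        This conditional-independence step is exactly what [12] relaxes."""
        like = P_FEAT[:, i] if value else 1.0 - P_FEAT[:, i]
        b = belief * like
        return b / b.sum()

    def declare_cost(belief):
        """Expected cost of declaring the most likely class right now."""
        return MISCLASS * (1.0 - belief.max())

    def classify(true_class, rng):
        """Myopic policy: measure the feature whose expected one-step cost
        reduction exceeds its measurement cost; otherwise declare."""
        belief, unused, spent = PRIOR.copy(), set(range(3)), 0.0
        while unused:
            best_i, best_gain = None, 0.0
            for i in unused:
                p1 = belief @ P_FEAT[:, i]    # predictive P(feature_i = 1)
                exp_cost = (p1 * declare_cost(update(belief, i, 1)) +
                            (1 - p1) * declare_cost(update(belief, i, 0)))
                gain = declare_cost(belief) - exp_cost - FEAT_COST[i]
                if gain > best_gain:
                    best_i, best_gain = i, gain
            if best_i is None:
                break                         # declaring now is cheapest
            value = rng.random() < P_FEAT[true_class, best_i]
            belief = update(belief, best_i, value)
            unused.discard(best_i)
            spent += FEAT_COST[best_i]
        return int(belief.argmax()), spent

    rng = np.random.default_rng(0)
    print(classify(true_class=0, rng=rng))    # (predicted class, cost spent)

Because features are queried one at a time and the belief update factorizes over features, this sketch inherits the naive Bayes independence assumption noted above; a full POMDP solution would also look ahead more than one measurement.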

The work in [1-12] points to nontrivial relationships between POMDPs and classification/regression that can be exploited to the benefit of both.

References

  1. M. Kearns, Y. Mansour and A. Y. Ng, "Approximate planning in large POMDPs via reusable trajectories", NIPS 12, 2000
  2. T. Dietterich, X. Wang, "Batch Value Function Approximation via Support Vectors", NIPS 14, 2001
  3. M. Lagoudakis, R. Parr, "Reinforcement Learning as Classification: Leveraging Modern Classifiers", ICML, 2003
  4. J. A. Bagnell, S. Kakade, A. Y. Ng and J. Schneider, "Policy search by dynamic programming", NIPS 16, 2004
  5. J. Langford, B. Zadrozny, "Relating Reinforcement Learning Performance to Classification Performance", ICML, 2005
  6. S. Mahadevan, "Proto-Value Functions: Developmental Reinforcement Learning", ICML, 2005
  7. D. Blatt, A. Hero, "From Weighted Classification to Policy Search", NIPS, 2005
  8. H. Li, L. He, X. Liao, S. Ji, L. Carin, "Region-Based Value Iteration and Its Application to Robot Navigation in a Minefield", NIPS Workshop on Machine Learning Based Robotics in Unstructured Environments, 2005
  9. B. Bonet, H. Geffner, "Learning Sorting and Decision Trees with POMDPs", ICML, 1998
  10. A. Guo, "Decision-theoretic Active Sensing for Autonomous Agents", AAMAS, 2003
  11. C. Dimitrakakis, S. Bengio, "Online Policy Adaptation for Ensemble Classifiers", Proceedings of the European Symposium on Artificial Neural Networks (ESANN), 2004
  12. H. Li, X. Liao, L. Carin, "A Value-directed Bayesian Classifier", ICASSP, 2006

Topics

We seek submissions of contributed work addressing the many challenging questions summarized in the following topics. Submissions on related topics are also welcome.

  • Trajectory-based policy search using classification and regression approaches.
  • Value function and Q-function approximation using neural networks, kernel methods, etc.
  • Novel methods for translating policy learning into classification/regression problems.
  • Application of classification/regression techniques to MDPs with a very large discrete state space or a continuous state space.
  • The use of classification/regression techniques in modeling and policy learning for POMDPs with continuous observation, action, or state spaces.
  • Application of POMDPs to non-myopic active learning in SVMs, logistic regression, and other discriminative classifiers.
  • POMDP methods for cost-sensitive feature selection and sensor scheduling, with the ultimate goal of classification or regression.
  • Methods for relaxing the naive Bayes assumption in cost-sensitive classification.
  • Planning and decision making in mixtures of experts and Bayesian networks.

Important Dates

  • Paper Submission Deadline (Extended): March 15, 2006
  • Notification of acceptance/rejection: March 30, 2006
  • Camera-ready Copy Due Date: April 15, 2006
  • Workshop date: June 7, 2006

Paper Submissions

Authors are encouraged to submit papers electronically in PDF format. Papers must be formatted using the AAAI style template and must not exceed 10 pages in length. Please send submissions by e-mail to either xjliao@ee.duke.edu or lcarin@ee.duke.edu.


Organization

Organizing Committee

Xuejun Liao, Duke University, USA
Lawrence Carin, Duke University, USA

Program Committee

Alfred Hero, University of Michigan, Ann Arbor, USA
Carey E. Priebe, Johns Hopkins University, USA
Ronald Parr, Duke University, USA
Carey Schwartz, DARPA/DSO, USA
Douglas Cochran, Arizona State University, USA
Vikram Krishnamurthy, University of British Columbia, Canada
David Castanon, Boston University, USA
