9:00 - 9:30
A Brief Tutorial on the Partially Observable Markov Decision Process and Its Applications
Lawrence Carin
9:30 - 10:00
Optimal Sensor Scheduling via Classification Reduction of Policy Search (CROPS)
Doron Blatt and Alfred O. Hero
Abstract: The problem of sensor scheduling in multi-modal sensing systems is formulated as the sequential choice of experiments problem and solved via reinforcement learning methods. The sequential choice of experiments problem is a partially observable Markov decision process (POMDP) in which the underlying state of nature is the system's state and the sensors' data are noisy state observations. The goal is to find a policy that sequentially determines the best sensor to deploy based on past data, maximizing a given utility function while minimizing the deployment cost. Several examples are considered in which the exact model of the measurements given the state of nature is unknown but a generative model (a simulation or an experiment) is available. The problem is formulated as a reinforcement learning problem and solved via a reduction to a sequence of supervised classification subproblems. Finally, a simulation and an experiment with real data demonstrate the promise of our approach.
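To make the reduction concrete, here is a minimal sketch of one stage of the idea: Monte Carlo rollouts from a generative model score each sensor at sampled states, and the policy-search step becomes an ordinary supervised classification fit. The generative model, sensor noise levels, and deployment costs below are all hypothetical illustrations, not the authors' implementation.

```python
# Hedged sketch: one-stage sensor scheduling via classification
# reduction of policy search. All model details here are invented.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
N_SENSORS, N_EPISODES, N_ROLLOUTS = 3, 2000, 20
DEPLOY_COST = np.array([0.1, 0.3, 0.5])  # assumed per-sensor costs

def simulate_return(state, sensor):
    # Hypothetical generative model: each sensor is accurate in a
    # different region of the state space and noisy elsewhere.
    bands = [abs(state) < 0.5, 0.5 <= abs(state) < 1.5, abs(state) >= 1.5]
    noise = 0.2 if bands[sensor] else 1.0
    obs = state + rng.normal(0.0, noise)
    return -(obs - state) ** 2 - DEPLOY_COST[sensor]  # utility minus cost

# Monte Carlo rollouts estimate the value of deploying each sensor
# at each sampled state ...
states = rng.normal(0.0, 1.0, size=(N_EPISODES, 1))
values = np.array(
    [[np.mean([simulate_return(s[0], a) for _ in range(N_ROLLOUTS)])
      for a in range(N_SENSORS)] for s in states])

# ... and policy search reduces to supervised classification:
# features (here just the state) -> empirically best sensor.
labels = values.argmax(axis=1)
policy = DecisionTreeClassifier(max_depth=3).fit(states, labels)
print("sensor chosen at state 0.2:", policy.predict([[0.2]])[0])
print("sensor chosen at state 2.0:", policy.predict([[2.0]])[0])
```

In the full method the classifier's input would be the history of past observations rather than the hidden state, and the reduction would be applied over a sequence of decision stages.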
10:00 - 10:30
Adaptation of the Simulated Risk Disambiguation Protocol to a Discrete Setting
Al Aksakalli, Donniell E. Fishkind, and Carey E. Priebe
Abstract: Suppose a spatial arrangement of possibly hazardous regions needs to be speedily and safely traversed, and there is a dynamic capability of discovering the true nature of each hazard when in close proximity to it; the traversal may enter the associated region only if it is revealed to be nonhazardous. The problem of identifying an optimal policy for where and when to execute disambiguations so as to minimize the expected length of the traversal can be cast as both a completely observed Markov decision process (MDP) and a partially observable Markov decision process (POMDP), and has been proven intractable in many broad settings. In this manuscript, we adapt the basic strategy of a policy called the simulated risk disambiguation protocol of Fishkind et al. (2006) to a different, discretized setting (a Canadian Traveller Problem with dependent edge probabilities), and we compare the performance of this adapted policy against that of the optimal policy on a class of instances small enough for the optimal policy to be computed. On random such instances, the adapted simulated risk disambiguation protocol performed nearly as well as the optimal protocol, and used significantly less computational effort.
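As a rough, hypothetical stand-in for the simulated-risk idea on a small discrete instance, the sketch below inflates each ambiguous edge's length by a risk penalty, plans a shortest path under the inflated lengths, and disambiguates ambiguous edges on arrival, replanning if one proves blocked. The toy graph, the penalty form length + alpha * p, and the independent blockage probabilities (the talk studies the dependent case) are all illustrative assumptions, not the protocol of Fishkind et al. (2006).

```python
# Hedged sketch of risk-penalized planning with on-arrival
# disambiguation on a tiny Canadian-Traveller-style instance.
import heapq, random

# Hypothetical graph: (u, v) -> (length, p_blocked); p_blocked > 0
# marks an ambiguous, possibly hazardous edge.
EDGES = {("s", "a"): (1.0, 0.0), ("a", "t"): (1.0, 0.6),
         ("s", "b"): (2.0, 0.0), ("b", "t"): (2.0, 0.0)}

def neighbors(u):
    for (x, y), (length, p) in EDGES.items():
        if x == u:
            yield y, length, p
        elif y == u:
            yield x, length, p

def plan(src, dst, known_blocked, alpha=1.0):
    # Dijkstra under risk-inflated lengths: length + alpha * p_blocked.
    dist, prev, pq = {src: 0.0}, {}, [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, length, p in neighbors(u):
            if frozenset((u, v)) in known_blocked:
                continue
            nd = d + length + alpha * p
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, u = [dst], dst
    while u != src:
        u = prev[u]
        path.append(u)
    return path[::-1]

def traverse(src, dst, seed=0):
    rng, blocked, walked, u = random.Random(seed), set(), 0.0, src
    while u != dst:
        v = plan(u, dst, blocked)[1]        # next step of current plan
        length, p = EDGES.get((u, v)) or EDGES[(v, u)]
        if p > 0 and rng.random() < p:      # disambiguate on arrival:
            blocked.add(frozenset((u, v)))  # truly blocked, so replan
            continue
        walked += length                    # safe (or unambiguous) edge
        u = v
    return walked

print("realized traversal length:", traverse("s", "t"))
```

The penalty weight alpha controls how strongly the planner avoids ambiguous edges; the protocol in the talk tunes this trade-off rather than using a fixed constant.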
10:30 - 11:00
11:00 - 11:30
Application of Partially Observable Markov Decision Processes to Robot Navigation in a Minefield
Lihan He, Shihao Ji, and Lawrence Carin
Abstract: We consider the problem of a robotic sensing system navigating in a minefield, with the goal of detecting potential mines at low false-alarm rates. Two types of sensors are used, namely electromagnetic induction (EMI) and ground-penetrating radar (GPR). A partially observable Markov decision process (POMDP) is used as the decision framework for the minefield problem. The POMDP model is trained with physics-based features of the various mines and clutter of interest; the training data are assumed sufficient to produce a reasonably good model. We give a detailed description of the POMDP formulation for the minefield problem and provide example results based on measured EMI and GPR data.
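The core computation inside any POMDP controller for this task is the belief update over the hidden label of an interrogated object. The minimal sketch below uses assumed alarm probabilities standing in for the EMI and GPR observation models; the numbers are invented for illustration and are not the measured models from the talk.

```python
# Hedged sketch: Bayesian belief update over {mine, clutter} given
# observations from two sensors. Confusion numbers are hypothetical.
import numpy as np

STATES = ["mine", "clutter"]
# P(observation = "alarm" | state), one row per sensor (EMI, GPR):
P_ALARM = np.array([[0.90, 0.30],    # EMI: sensitive, false-alarm prone
                    [0.80, 0.10]])   # GPR: assumed more specific

def update(belief, sensor, alarm):
    """Bayes rule: posterior proportional to likelihood times prior."""
    like = P_ALARM[sensor] if alarm else 1.0 - P_ALARM[sensor]
    post = like * belief
    return post / post.sum()

belief = np.array([0.5, 0.5])                  # uniform prior over STATES
for sensor, alarm in [(0, True), (1, False)]:  # EMI alarms, GPR does not
    belief = update(belief, sensor, alarm)
    print(f"after sensor {sensor}: P(mine) = {belief[0]:.3f}")
```

A full POMDP policy maps such beliefs to actions (deploy EMI, deploy GPR, declare mine or clutter, or move on), trading sensing cost against the risk of misclassification.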
11:30 - 12:30
Panel (Open) Discussion
We will augment the list of questions/topics dynamically. If you have interesting questions or topics relevant to the workshop theme, you are welcome to send them to us and we will post them here.
- What are emerging applications, military and civilian, in which POMDP theory can have impact, and what kinds of models are appropriate for them?
- What common datasets could be used for evaluating POMDP algorithms in sensing applications?
- What are the bottlenecks in POMDP policy search, and how can they be addressed?
- In what types of problems have POMDP approaches worked, i.e., outperformed myopic approaches, and how can such problems be identified?
- Is the reinforcement learning approach to approximating optimal POMDP policies practical?