# Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise

```bibtex
@inproceedings{Zheng2014RobustBI,
  title     = {Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise},
  author    = {Jiangchuan Zheng and Siyuan Liu and Lionel M. Ni},
  booktitle = {AAAI},
  year      = {2014}
}
```

Inverse reinforcement learning (IRL) aims to recover the reward function underlying a Markov Decision Process from expert behaviors, in support of decision-making. Most recent work on IRL assumes that all expert behaviors are equally trustworthy, and frames IRL as a process of seeking a reward function that makes those behaviors appear (near-)optimal. In reality, however, noisy expert behaviors that disobey the optimal policy are common, and they may degrade IRL performance…
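The abstract's core idea is to treat only a sparse subset of demonstrated actions as noise rather than distrusting all behaviors equally. The snippet below is a minimal illustration of that idea, not the authors' actual model: it scores each demonstrated state-action pair under a mixture of a Boltzmann expert policy and a uniform noise component, with a sparse Bernoulli prior `rho` on noise. All names (`q_values`, `noise_posteriors`) and the specific mixture likelihood are illustrative assumptions.

```python
import numpy as np

def q_values(P, r, gamma=0.9, n_iter=500):
    """Exact Q-values by value iteration.
    P: (A, S, S) transition tensor, r: (S,) state rewards."""
    V = np.zeros(r.shape[0])
    for _ in range(n_iter):
        Q = r[None, :] + gamma * P @ V   # Q[a, s]
        V = Q.max(axis=0)
    return Q

def noise_posteriors(Q, demos, beta=5.0, rho=0.1):
    """Posterior probability that each demonstrated (state, action) pair
    is noise, under a mixture of a Boltzmann expert policy and a uniform
    noise component; rho is the sparse prior probability of noise."""
    A = Q.shape[0]
    out = []
    for s, a in demos:
        z = beta * Q[:, s]
        pi = np.exp(z - z.max())
        pi /= pi.sum()                   # Boltzmann expert policy at s
        p_noise, p_expert = 1.0 / A, pi[a]
        out.append(rho * p_noise / (rho * p_noise + (1 - rho) * p_expert))
    return np.array(out)

# Two-state chain: action 0 always reaches the rewarding state 1,
# action 1 always returns to state 0.
P = np.zeros((2, 2, 2))
P[0, :, 1] = 1.0
P[1, :, 0] = 1.0
Q = q_values(P, np.array([0.0, 1.0]))
post = noise_posteriors(Q, [(0, 0), (0, 1)])
```

Here the optimal demonstration `(0, 0)` receives a low noise posterior while the suboptimal `(0, 1)` receives a high one, so a downstream reward learner could down-weight the latter instead of discarding or trusting all demonstrations uniformly.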

#### 43 Citations

CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem

- Computer Science, Mathematics
- ArXiv
- 2019

Experimental results on standard benchmarks such as objectworld and pendulum show that the proposed algorithm can effectively learn the latent reward function in complex, high-dimensional environments.

Marginal MAP Estimation for Inverse RL under Occlusion with Observer Noise

- Computer Science
- ArXiv
- 2021

It is shown that the marginal MAP (MMAP) approach significantly improves on the previous IRL technique under occlusion in both formative evaluations on a toy problem and in a summative evaluation on an onion sorting line task by a robot.

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

- Computer Science, Mathematics
- AAAI
- 2018

A sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the $\alpha$-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function is proposed.

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

- Computer Science, Mathematics
- ICML
- 2019

A novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations.

Active Learning from Critiques via Bayesian Inverse Reinforcement Learning

- Computer Science
- 2017

A novel trajectory-based active Bayesian inverse reinforcement learning algorithm that 1) queries the user for critiques of automatically generated trajectories, 2) utilizes trajectory segmentation to expedite the critique/labeling process, and 3) predicts the user's critiques to generate the most highly informative trajectory queries.

Robust Imitation via Decision-Time Planning

- 2020

The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approaches infers the (unknown) reward function via…

Active Reward Learning from Critiques

- Computer Science
- 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018

This work proposes a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that queries the user for critiques of automatically generated trajectories, utilizes trajectory segmentation to expedite the critique/labeling process, and predicts the user's critiques to generate the most highly informative trajectory queries.

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

- Computer Science, Mathematics
- ICML
- 2019

Mixed findings suggest that, at least for the foreseeable future, agents need a middle ground between the flexibility of data-driven methods and the useful bias of known human biases.

Inverse Reinforcement Learning From Like-Minded Teachers

- Computer Science
- AAAI
- 2021

It is demonstrated that inverse reinforcement learning algorithms that satisfy a certain property — that of matching feature expectations — yield policies that are approximately optimal with respect to the underlying reward function, and that no algorithm can do better in the worst case.

Inverse Reinforcement Learning from Failure

- Computer Science
- AAMAS
- 2016

This paper proposes inverse reinforcement learning from failure (IRLF), a new constrained optimisation formulation that accommodates both types of demonstrations while remaining convex and derives update rules for learning reward functions and policies.

#### References

Showing 1-10 of 28 references

Maximum Entropy Inverse Reinforcement Learning

- Computer Science
- AAAI
- 2008

A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods is developed.
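The feature-matching idea behind maximum-entropy IRL — move the reward weights until expected feature counts under the soft-optimal policy match the empirical counts from demonstrations — can be sketched on a toy chain MDP. This is a minimal illustration under assumed simplifications (state-indicator features, discounted soft value iteration, a fixed finite horizon), not the paper's exact algorithm; all function names are illustrative.

```python
import numpy as np

def maxent_policy(P, r, gamma=0.9, n_iter=200):
    """Soft (MaxEnt) value iteration; returns stochastic policy pi[a, s]."""
    V = np.zeros(P.shape[1])
    for _ in range(n_iter):
        Q = r[None, :] + gamma * P @ V
        m = Q.max(axis=0)
        V = m + np.log(np.exp(Q - m).sum(axis=0))   # soft-max over actions
    return np.exp(Q - V[None, :])

def expected_visits(P, pi, d0, T=30):
    """Expected state-visitation counts over a T-step horizon."""
    D, total = d0.copy(), d0.copy()
    for _ in range(T - 1):
        D = np.einsum('s,as,asu->u', D, pi, P)
        total += D
    return total

# Tiny 3-state chain: action 1 moves right, action 0 moves left.
S, A = 3, 2
P = np.zeros((A, S, S))
for s in range(S):
    P[1, s, min(s + 1, S - 1)] = 1.0
    P[0, s, max(s - 1, 0)] = 1.0

# Empirical counts from demos that start at state 0 and always move right.
emp = np.array([1.0, 1.0, 28.0])
d0 = np.array([1.0, 0.0, 0.0])

theta = np.zeros(S)            # reward weights; features are state indicators
for _ in range(20):
    pi = maxent_policy(P, theta)
    grad = emp - expected_visits(P, pi, d0)   # feature-matching gradient
    theta += 0.1 * grad
```

After these gradient steps, `theta` ranks the demo-preferred terminal state highest, since the gradient vanishes only when the learned policy's visitation counts reproduce the demonstrations'.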

Bayesian Inverse Reinforcement Learning

- Computer Science
- IJCAI
- 2007

This paper shows how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions and presents efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions.
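The Bayesian-IRL idea of a posterior over reward functions can be sketched with plain random-walk Metropolis-Hastings over reward vectors, using a Boltzmann action likelihood and a Gaussian prior. The paper's own PolicyWalk sampler differs in its proposal scheme; everything below (the two-state MDP, `beta`, proposal scale, chain length) is an illustrative assumption.

```python
import numpy as np

def boltzmann_loglik(r, P, demos, beta=5.0, gamma=0.9, n_iter=200):
    """Log-likelihood of demos under a Boltzmann policy for reward r."""
    V = np.zeros(r.shape[0])
    for _ in range(n_iter):
        Q = r[None, :] + gamma * P @ V
        V = Q.max(axis=0)
    ll = 0.0
    for s, a in demos:
        z = beta * Q[:, s]
        ll += z[a] - (z.max() + np.log(np.exp(z - z.max()).sum()))
    return ll

# Two-state chain: action 0 reaches state 1, action 1 returns to state 0.
P = np.zeros((2, 2, 2))
P[0, :, 1] = 1.0
P[1, :, 0] = 1.0
demos = [(0, 0), (1, 0)] * 5          # expert always picks action 0

rng = np.random.default_rng(0)
r = np.zeros(2)
ll = boltzmann_loglik(r, P, demos) - 0.5 * r @ r   # N(0, I) prior
samples = []
for step in range(600):
    prop = r + rng.normal(scale=0.5, size=2)       # random-walk proposal
    ll_prop = boltzmann_loglik(prop, P, demos) - 0.5 * prop @ prop
    if np.log(rng.random()) < ll_prop - ll:        # Metropolis accept test
        r, ll = prop, ll_prop
    if step >= 100:                                # discard burn-in
        samples.append(r.copy())
mean_r = np.mean(samples, axis=0)
```

Since the demonstrated actions always steer toward state 1, the posterior mean reward of state 1 ends up above that of state 0, which is the qualitative behavior reward learning should recover.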

Apprenticeship learning via inverse reinforcement learning

- Computer Science
- ICML
- 2004

This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.

Active Learning for Reward Estimation in Inverse Reinforcement Learning

- Computer Science
- ECML/PKDD
- 2009

An algorithm is proposed that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states, to estimate the reward function with similar accuracy as other methods from the literature while reducing the amount of policy samples required from the expert.

Inverse Reinforcement Learning with PI²

- 2010

We present an algorithm that recovers an unknown cost function from expert-demonstrated trajectories in continuous space. We assume that the cost function is a weighted linear combination of…

Nonlinear Inverse Reinforcement Learning with Gaussian Processes

- Mathematics, Computer Science
- NIPS
- 2011

A probabilistic algorithm is presented that allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

- Computer Science, Mathematics
- UAI
- 2007

A novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem is proposed.

Bayesian Multitask Inverse Reinforcement Learning

- Computer Science, Mathematics
- EWRL
- 2011

The main contribution is to formalise the problem of inverse reinforcement learning as statistical preference elicitation, via a number of structured priors, whose form captures the authors' biases about the relatedness of different tasks or expert policies.

Maximum margin planning

- Computer Science
- ICML
- 2006

This work learns mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior, and demonstrates a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference.

Supervised Probabilistic Robust Embedding with Sparse Noise

- Computer Science
- AAAI
- 2012

This paper proposes a supervised probabilistic robust embedding (SPRE) model in which data are corrupted either by sparse noise or by a combination of Gaussian and sparse noise, and devises a twofold variational EM learning algorithm in which the update of model parameters has an analytical solution.