Learning from Demonstrations with High-Level Side Information

M. Wen, I. Papusha, and U. Topcu

Final paper (IJCAI 2017)
DOI: 10.24963/ijcai.2017/426

Abstract

We consider the problem of learning from demonstration, where extra side information about the demonstration is encoded as a co-safe linear temporal logic formula. We address two known limitations of existing methods that do not account for such side information.

First, the policies that result from existing methods, while matching the expected features or likelihood of the demonstrations, may still conflict with high-level objectives that are not explicit in the demonstration trajectories. Second, existing methods fail to provide a priori guarantees on out-of-sample generalization performance with respect to such high-level goals.

This lack of formal guarantees can prevent the application of learning from demonstration to safety-critical systems, especially when the learned policy must generalize to regions of the state space with poor demonstration coverage.
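A co-safe LTL formula is one whose satisfaction can always be witnessed by a finite "good prefix", so a deterministic finite automaton (DFA) can check it on finite demonstration trajectories. The sketch below is illustrative only. It assumes a hypothetical task with the formula ¬hazard U goal ("avoid the hazard until the goal is reached") and a hand-built three-state DFA. The proposition names and helper functions are invented for this example and do not come from the paper.

```python
from typing import FrozenSet, Iterable

# Hypothetical DFA for the co-safe formula  (not hazard) U goal.
PENDING, ACCEPT, TRAP = 0, 1, 2


def dfa_step(q: int, labels: FrozenSet[str]) -> int:
    """Advance the DFA on one symbol: the set of atomic propositions true now."""
    if q != PENDING:
        return q  # accepting and trap states are absorbing
    if "goal" in labels:
        return ACCEPT  # goal reached: a good prefix has been observed
    if "hazard" in labels:
        return TRAP  # hazard visited before the goal: formula violated
    return PENDING


def satisfies(trace: Iterable[FrozenSet[str]]) -> bool:
    """True iff the finite labeled trace contains a good prefix of the formula."""
    q = PENDING
    for labels in trace:
        q = dfa_step(q, labels)
    return q == ACCEPT


demo_ok = [frozenset(), frozenset(), frozenset({"goal"})]
demo_bad = [frozenset(), frozenset({"hazard"}), frozenset({"goal"})]
print(satisfies(demo_ok), satisfies(demo_bad))  # True False
```

Because the accepting state is absorbing, anything that happens after the goal is reached cannot undo satisfaction, which is exactly what makes the formula co-safe.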

In this work, we show that side information, when explicitly taken into account, indeed improves the performance and safety of the learned policy with respect to task implementation. Moreover, we describe an automated procedure to systematically generate the features that encode side information expressed in temporal logic.
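One plausible way to turn such a formula into features is to track the DFA state alongside the environment state (a product construction) and append a one-hot indicator of the automaton state to the ordinary state features. This is a minimal sketch and not necessarily the automated procedure of the paper. The empirical expectation of these augmented features over the demonstrations is the quantity a feature-matching inverse reinforcement learning method would try to match. The DFA, the state names, `base_features`, and `label_fn` are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence

N_DFA_STATES = 3  # hypothetical DFA: 0 = pending, 1 = accepting, 2 = trap


def dfa_step(q: int, labels: FrozenSet[str]) -> int:
    """DFA for the co-safe formula (not hazard) U goal; states 1 and 2 are absorbing."""
    if q != 0:
        return q
    if "goal" in labels:
        return 1
    if "hazard" in labels:
        return 2
    return 0


def product_features(base: List[float], q: int) -> List[float]:
    """Append a one-hot encoding of the DFA state q to the base state features."""
    return base + [1.0 if i == q else 0.0 for i in range(N_DFA_STATES)]


def empirical_feature_expectation(
    demos: Sequence[Sequence[str]],
    base_features: Callable[[str], List[float]],
    label_fn: Callable[[str], FrozenSet[str]],
    gamma: float = 0.9,
) -> List[float]:
    """Discounted product-feature counts, averaged over the demonstrations."""
    total: List[float] = []
    for demo in demos:
        q = 0
        for t, s in enumerate(demo):
            # Update the automaton on entering s, so the feature at time t
            # describes the product state (s, q).
            q = dfa_step(q, label_fn(s))
            f = product_features(base_features(s), q)
            if not total:
                total = [0.0] * len(f)
            for i, v in enumerate(f):
                total[i] += (gamma ** t) * v
    return [v / len(demos) for v in total]


def label_fn(s: str) -> FrozenSet[str]:
    """Hypothetical labeling: which atomic propositions hold in state s."""
    return frozenset({s}) if s in ("hazard", "goal") else frozenset()


def base_features(s: str) -> List[float]:
    """Hypothetical base feature: indicator of the goal state."""
    return [1.0 if s == "goal" else 0.0]


demos = [["start", "a", "goal"], ["start", "hazard", "goal"]]
print(empirical_feature_expectation(demos, base_features, label_fn, gamma=1.0))
# [1.0, 1.5, 0.5, 1.0]
```

Both demonstrations reach the goal, so the goal-indicator base feature alone cannot tell them apart. The appended DFA-state components separate the demonstration that satisfies the formula (accepting state) from the one that passes through the hazard (trap state).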

Citation

M. Wen, I. Papusha, and U. Topcu. “Learning from Demonstrations with High-Level Side Information,” International Joint Conference on Artificial Intelligence (IJCAI), pp. 3055–3061, Melbourne, Australia, August 19–25, 2017.