Learning the Minimal Representation of a Dynamic System from Transition Data
44 Pages. Posted: 18 Feb 2021. Last revised: 23 Apr 2021
Date Written: January 10, 2021
This paper proposes a novel framework for learning a concise MDP model of a continuous-state-space dynamic system from observed transition data. Most existing methods in offline reinforcement learning construct functional approximations of the value function or of the transition and reward functions, which requires complex and often uninterpretable function approximators. Our approach instead partitions the system's feature space into regions that constitute the states of a finite deterministic MDP representing the system. We discuss the theoretically minimal MDP representation that preserves the values, and therefore the optimal policy, of the dynamic system. We formally define the problem of learning such a concise representation from transition data without exploration. To solve this problem, we introduce an in-sample property of partitions of the feature space, which we call coherence, and show that if the class of possible partitions has finite VC dimension, any partition that is coherent with the transition data converges to the minimal representation of the system, with provable finite-sample PAC convergence guarantees. This theoretical insight motivates our Minimal Representation Learning (MRL) algorithm, which constructs from transition data an MDP representation that approximates the minimal representation of the system. We illustrate the effectiveness of the proposed framework through numerical experiments.
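As a rough illustration of the partition idea, the sketch below maps observed transitions onto the cells of a candidate partition and checks whether the result is a well-defined finite deterministic MDP. It is not the MRL algorithm, and the abstract does not define coherence; the consistency check used here (every cell-action pair observed with a single next cell and a single reward) is an assumed stand-in. The names `build_tabular_mdp` and `cell_of`, and the toy data, are hypothetical.

```python
from collections import defaultdict

def build_tabular_mdp(transitions, cell_of):
    """Aggregate transitions (x, action, reward, x_next) over a partition.

    cell_of maps a feature value to the id of the partition cell containing it.
    Returns (model, is_consistent), where model maps (cell, action) to
    (next_cell, reward) for every pair observed with exactly one outcome.
    """
    outcomes = defaultdict(set)
    for x, action, reward, x_next in transitions:
        outcomes[(cell_of(x), action)].add((cell_of(x_next), reward))
    # Assumed consistency check: each (cell, action) pair has one observed outcome.
    is_consistent = all(len(seen) == 1 for seen in outcomes.values())
    model = {key: next(iter(seen)) for key, seen in outcomes.items() if len(seen) == 1}
    return model, is_consistent

# Toy one-dimensional data (hypothetical): moving "right" pays 1.0 only from x >= 0.5.
transitions = [
    (0.1, "right", 0.0, 0.7),
    (0.3, "right", 0.0, 0.9),
    (0.8, "right", 1.0, 0.6),
]

# Two cells split at 0.5: every (cell, action) pair has a single outcome.
fine_model, fine_ok = build_tabular_mdp(transitions, lambda x: 0 if x < 0.5 else 1)

# One cell covering everything: the same pair is seen with two different rewards.
coarse_model, coarse_ok = build_tabular_mdp(transitions, lambda x: 0)
```

In this toy case the two-cell partition passes the check and the one-cell partition fails it, which is the kind of in-sample test that could separate partitions that are too coarse from those that are fine enough.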
Keywords: offline reinforcement learning, statistical learning, off-policy evaluation, data-driven decision making, state representation learning, MDP state aggregation