Continuous‐Time Mean–Variance Portfolio Selection: A Reinforcement Learning Framework

36 Pages Posted: 7 Oct 2020

Haoran Wang

The Vanguard Group, Inc.

Xun Yu Zhou

Columbia University - Department of Industrial Engineering and Operations Research (IEOR)

Date Written: October 1, 2020

Abstract

We approach the continuous‐time mean–variance portfolio selection problem with reinforcement learning (RL). To achieve the best trade‐off between exploration and exploitation, the problem is formulated as an entropy‐regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time‐decaying variance. We then prove a policy improvement theorem, based on which we devise an implementable RL algorithm. In our simulation and empirical studies, this algorithm and its variant outperform both traditional and deep neural network based algorithms.
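The Gaussian exploratory policy with time-decaying variance can be sketched as follows. This is a minimal illustration only: the exponential decay form `exp(rho**2 * (T - t))` and the parameters `lam` (exploration weight) and `rho` (a market parameter) are assumptions chosen for the sketch, not the paper's exact closed-form solution.

```python
import math
import random


def policy_variance(t, T, lam=1.0, rho=0.5):
    """Illustrative variance of the Gaussian exploratory policy.

    The variance shrinks as t approaches the horizon T, so exploration
    tapers off over time (assumed decay form, not the paper's formula).
    """
    return (lam / 2.0) * math.exp(rho ** 2 * (T - t))


def sample_allocation(t, T, mean, lam=1.0, rho=0.5, rng=random):
    """Sample a risky-asset allocation from the exploratory policy.

    `mean` plays the role of the exploitation component (the feedback
    mean of the policy); the Gaussian noise around it is exploration.
    """
    sigma = math.sqrt(policy_variance(t, T, lam=lam, rho=rho))
    return rng.gauss(mean, sigma)


# Exploration is wider early in the horizon than near its end.
early = policy_variance(0.0, 1.0)
late = policy_variance(0.9, 1.0)
```

Under this sketch, sampling early in the investment horizon produces more dispersed allocations than sampling near the terminal time, mirroring the time-decaying variance established in the paper.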

Keywords: empirical study, entropy regularization, Gaussian distribution, mean–variance portfolio selection, policy improvement, reinforcement learning, simulation, stochastic control, theorem, value function

Suggested Citation

Wang, Haoran and Zhou, Xunyu, Continuous‐Time Mean–Variance Portfolio Selection: A Reinforcement Learning Framework (October 1, 2020). Mathematical Finance, Vol. 30, Issue 4, pp. 1273-1308, 2020, Available at SSRN: https://ssrn.com/abstract=3706168 or http://dx.doi.org/10.1111/mafi.12281

Haoran Wang

The Vanguard Group, Inc.

100 Vanguard Blvd
Malvern, PA 19355
United States

Xunyu Zhou (Contact Author)

Columbia University - Department of Industrial Engineering and Operations Research (IEOR)

331 S.W. Mudd Building
500 West 120th Street
New York, NY 10027
United States
