Estimating Policy Functions in Payments Systems Using Reinforcement Learning

High-value payments systems (HVPSs) settle transactions between large financial institutions and are considered core national financial infrastructure. In this paper, we use machine learning techniques to understand the behaviour of banks participating in the Canadian HVPS. This understanding could help regulators design policies that ensure the safety and efficiency of these systems.

In particular, we want to understand a key decision that participating banks make in the HVPS: how much liquidity to provide at the beginning of the day. Initial liquidity is necessary to process payments but is costly to participants. Yet posting too little risks delaying those payments, which is also costly to the bank. The amount of initial liquidity is a strategic decision because a bank can use incoming payments from other participants to fund its own outgoing payments; however, the timing of those incoming payments depends on the amount of liquidity the other participants post.
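
The trade-off described above can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not the model used in the paper: the function name `day_cost`, the two price parameters, the period-by-period timing, and the rule that unfunded payments are queued and retried are all hypothetical choices made for illustration.

```python
def day_cost(initial_liquidity, outgoing, incoming,
             liquidity_price=0.1, delay_price=0.2):
    """Total cost of one day for a single bank in a toy model.

    outgoing[t] and incoming[t] are payment values in period t.
    All names and prices are illustrative assumptions.
    """
    balance = initial_liquidity
    queued = 0.0        # payments waiting for funds
    delay_cost = 0.0
    for out_t, in_t in zip(outgoing, incoming):
        balance += in_t             # incoming payments can be recycled
        due = queued + out_t
        sent = min(due, balance)    # send as much as the balance allows
        balance -= sent
        queued = due - sent
        delay_cost += delay_price * queued  # pay for each period of delay
    return liquidity_price * initial_liquidity + delay_cost


# Compare a few liquidity choices for one fixed payment profile.
outgoing = [5.0, 5.0]
incoming = [0.0, 5.0]
costs = {x: day_cost(x, outgoing, incoming) for x in (0.0, 5.0, 10.0)}
best = min(costs, key=costs.get)
```

In this example, posting nothing incurs delay costs in both periods, while posting enough to cover every outgoing payment overpays for liquidity; an intermediate amount is cheapest because the second payment is funded by the incoming one. If the incoming payment arrived at a different time, the cheapest choice could change, which is what makes the decision strategic.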

Because this problem is analytically complex, we use reinforcement learning (RL) to estimate the best-response function. Using RL, we avoid modelling the bank's strategies; instead, the RL algorithm learns a strategy through interaction with the payments system environment. In a simplified setting for which we know the optimal behaviour, we demonstrate that RL techniques can replicate the expected behaviour of participating banks. In more elaborate settings, where liquidity decisions are too complex to solve analytically, the RL agents learn to reduce the total cost of processing payments despite having only partial knowledge of the environment or of the payments flow. Our results show that RL techniques are helpful in understanding the behaviour of participants in payments systems. Future work will explore the possibility of using the estimated RL policies to design more efficient payments systems.
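
To make the idea of learning a strategy through interaction concrete, the sketch below uses an epsilon-greedy bandit over a few discrete liquidity levels. This is a deliberately simple stand-in, not necessarily the algorithm used in the paper. The environment `simulate_day`, its prices, the payment sizes, and the function `learn_liquidity` are all hypothetical. The agent never sees when the incoming payment arrives; it only observes the cost of each day.

```python
import random


def simulate_day(liquidity, rng):
    """Toy environment (assumed): return one day's cost for a liquidity choice."""
    outgoing = [5.0, 5.0]
    # The incoming payment arrives early or late at random, unseen by the agent.
    incoming = [5.0, 0.0] if rng.random() < 0.5 else [0.0, 5.0]
    balance, queued, delay_cost = liquidity, 0.0, 0.0
    for out_t, in_t in zip(outgoing, incoming):
        balance += in_t
        due = queued + out_t
        sent = min(due, balance)
        balance -= sent
        queued = due - sent
        delay_cost += 0.2 * queued      # assumed delay price
    return 0.1 * liquidity + delay_cost  # assumed liquidity price


def learn_liquidity(levels=(0.0, 5.0, 10.0), episodes=500,
                    epsilon=0.1, seed=0):
    """Epsilon-greedy estimate of the cheapest liquidity level."""
    rng = random.Random(seed)
    est = {a: 0.0 for a in levels}   # running mean cost per action
    n = {a: 0 for a in levels}       # times each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.choice(levels)           # explore
        else:
            a = min(levels, key=est.get)     # exploit lowest estimated cost
        cost = simulate_day(a, rng)
        n[a] += 1
        est[a] += (cost - est[a]) / n[a]     # incremental mean update
    return min(levels, key=est.get), est


best_level, estimates = learn_liquidity()
```

Because every cost in this toy environment is positive, initial estimates of zero act as optimistic starting values, so each level is tried before the agent settles. A full treatment would condition the choice on observed state and use a richer policy class; the bandit only shows how a cost-minimizing choice can emerge without an explicit model of the other participants.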