Partial-Index and Reinforcement Learning

Many network control problems can be cast as a Markov Decision Process (MDP). However, as the network size increases, MDP is known to suffer from the curse-of-dimensionality. Although reinforcement learning methods can be used to learn the optimal control policies, since they often do not exploit model-based insights from the problem structure, they often take a long time to train, lead to control rules that are difficult to interpret, and are slow to adapt to changes. Thus, it remains an open question how to develop scalable, efficient, and interpretable solutions for such problems.

In our work, we have studied one example of such a problem, i.e., to schedule data sources in a wireless system with multiple heterogeneous and unreliable channels to minimize the total expected Age-of-Information (AoI). Although one could formulate this problem as a discrete-time MDP, it would suffer from the drawbacks described earlier. Our main contribution is to develop a new approach called partial index, which significantly generalizes the classical Whittle’s index to systems with multiple heterogeneous resources. The partial index captures how much an agent/source values the service at each channel, and can be computed independently of other agents given all channels’ current costs. As a result, it allows us to decompose the original large-scale MDP into per-agent MDPs with much lower complexity. This notion of partial index could be useful for many other problems where agents interact through sharing multiple resources. We are currently studying how to integrate partial index with reinforcement learning to produce highly scalable and adaptive solutions for constantly-changing multi-agent environments.

Selected publications:

  • Yihan Zou, Kwang Taik Kim, Xiaojun Lin, and Mung Chiang, ``Minimizing Age-of-Information in Heterogeneous Multi-Channel Systems: A New Partial-Index Approach,'' in ACM MobiHoc, July 2021. (ACM MobiHoc 2021 Best Paper Award) [pdf] .