Fig. 1: Proof-of-concept restless multi-armed bandit (RMAB)-inspired recommendation system. | npj Unconventional Computing


From: Energy efficient training of private recommendation systems using multi-armed bandit models and analog in-memory computing


Users navigate from the main web page to specific category pages. The resulting state is referred to as either the initial state (S0), used to estimate the cost (λ), or the current state (S1), used to estimate the Whittle index. Specifically, S (S0 or S1) serves as input to each core in the nonvolatile memory (NVM) crossbar. The cores are programmed with the weight values of the neural networks for each arm (A to D), which in this case are the contents. Using the initially estimated λ to compute the Whittle index, the agent selects the arm with the highest Whittle index at each S1 to maximize the total discounted rewards (TDR) for recommendation.
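The selection step described in the caption can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-layer network shape, the one-hot state encoding, the made-up weight values, and the names `crossbar_mvm`, `whittle_index`, and `select_arm` do not come from the article. The caption does not spell out how λ enters training, so the sketch covers only the choice of the arm with the highest Whittle index at S1.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def crossbar_mvm(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product, standing in for what one crossbar core computes."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v: Vector) -> Vector:
    """Element-wise rectifier applied between the two layers."""
    return [max(0.0, a) for a in v]


def whittle_index(core: Dict[str, Matrix], state: Vector) -> float:
    """Hypothetical per-arm network: state -> hidden layer -> scalar index."""
    hidden = relu(crossbar_mvm(core["w1"], state))
    return crossbar_mvm(core["w2"], hidden)[0]


def select_arm(cores: Dict[str, Dict[str, Matrix]], state: Vector) -> str:
    """Return the arm whose estimated Whittle index is highest for this state."""
    indices = {arm: whittle_index(core, state) for arm, core in cores.items()}
    return max(indices, key=indices.get)


# Made-up weights for four arms (A to D); the state is a one-hot category page.
identity = [[1.0, 0.0], [0.0, 1.0]]
cores = {
    "A": {"w1": identity, "w2": [[0.5, 0.1]]},
    "B": {"w1": identity, "w2": [[0.2, 0.9]]},
    "C": {"w1": identity, "w2": [[0.3, 0.4]]},
    "D": {"w1": identity, "w2": [[0.7, 0.2]]},
}

s1 = [0.0, 1.0]  # assumed encoding: user is on the second category page
print(select_arm(cores, s1))
```

With these illustrative weights, the same state vector is fed to every core, which mirrors the caption's description of S serving as input to each core in the crossbar. Only the final comparison across arms happens outside the cores.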
