SMONAC: Supervised Multiobjective Negative Actor-Critic for Sequential Recommendation.
Abstract
Recent research shows that optimizing for accuracy alone can lead to homogeneous, repetitive recommendations and harm long-term user engagement. Multiobjective reinforcement learning (RL) is a promising way to balance multiple objectives, including accuracy, diversity, and novelty. However, existing approaches have two deficiencies: they neglect to update the Q values of negative actions, and the RL Q-networks exert only limited regulation on the (self-)supervised learning recommendation network. To address these shortcomings, we develop the supervised multiobjective negative actor-critic (SMONAC) algorithm, which comprises a negative action update mechanism and a multiobjective actor-critic mechanism. In the negative action update mechanism, several negative actions are randomly sampled at each update step, and an offline RL approach is used to learn their Q values. In the multiobjective actor-critic mechanism, the accuracy, diversity, and novelty Q values are integrated into a scalarized Q value, which is used to criticize the supervised learning recommendation network. Comparative experiments on two real-world datasets demonstrate that SMONAC achieves substantial performance gains, especially on the diversity and novelty metrics.
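To make the two mechanisms concrete, the following is a minimal PyTorch-style sketch, not the paper's implementation: the linear networks, the squared-error penalty that drives negative-action Q values toward zero, and the scalarization weights are all illustrative assumptions, and the paper's exact offline RL update and critic targets may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ITEMS, STATE_DIM = 1000, 64

# Supervised recommendation network (actor) and one Q-network (critic)
# per objective: accuracy, diversity, novelty.
rec_net = nn.Linear(STATE_DIM, NUM_ITEMS)           # state -> item logits
q_nets = nn.ModuleList(nn.Linear(STATE_DIM, NUM_ITEMS) for _ in range(3))
weights = torch.tensor([0.6, 0.2, 0.2])             # hypothetical scalarization weights

def smonac_style_loss(state, pos_item, num_neg=8, neg_penalty=1.0):
    """One update on a batch of (state, logged positive item) pairs."""
    logits = rec_net(state)
    # (Self-)supervised loss on the logged positive action.
    sup_loss = F.cross_entropy(logits, pos_item)

    # Negative action update: randomly sample negative items and push their
    # Q values toward zero (a conservative, offline-RL-style regularizer;
    # the exact negative-Q target used in the paper is an assumption here).
    q_all = torch.stack([q(state) for q in q_nets])  # [3, batch, NUM_ITEMS]
    neg_items = torch.randint(0, NUM_ITEMS, (state.size(0), num_neg))
    q_neg = q_all.gather(2, neg_items.unsqueeze(0).expand(3, -1, -1))
    critic_loss = neg_penalty * q_neg.pow(2).mean()

    # Multiobjective actor-critic: scalarize the three Q heads and let the
    # scalarized Q criticize the actor, raising the probability of high-Q items.
    scalar_q = torch.einsum('k,kbi->bi', weights, q_all).detach()
    actor_loss = -(F.log_softmax(logits, -1) * F.softmax(scalar_q, -1)).sum(-1).mean()

    return sup_loss + critic_loss + actor_loss

# Example call on a random batch.
loss = smonac_style_loss(torch.randn(32, STATE_DIM), torch.randint(0, NUM_ITEMS, (32,)))
loss.backward()
```

The design point the sketch illustrates is the division of labor: the supervised term fits logged behavior, the negative-action term regularizes Q values on unobserved actions, and the scalarized Q term is the channel through which the multiobjective critics regulate the recommendation network.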