Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs have shown promising empirical results, owing to their favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optimization (MOBO) suffers from the hypervolume identifiability issue, which results from the non-Markovian nature of MOBO problems. To tackle this, inspired by the non-Markovian RL literature and the success of Transformers in language modeling, we present a generalized deep Q-learning framework and propose BOFormer, which substantiates this framework for MOBO via sequence modeling. Through extensive evaluation, we demonstrate that BOFormer consistently outperforms the benchmark rule-based and learning-based algorithms on various synthetic MOBO problems and real-world multi-objective hyperparameter optimization problems.
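For context, the sketch below shows the standard single-objective BO loop that such AFs plug into: fit a GP posterior to the observed data, maximize an AF over candidate points, and query the black-box function at the maximizer. The RBF kernel, the UCB rule, the grid-based candidate pool, and the toy objective `f` are illustrative assumptions, not the setup used in the paper.

```python
# Minimal single-objective BO loop: GP posterior + UCB acquisition.
# Illustrative sketch only; kernel, acquisition rule, and objective
# are assumptions, not the paper's implementation.
import numpy as np

def rbf_kernel(A, B, length_scale=0.2):
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(d**2, axis=-1) / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-4):
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(np.diag(K_ss) - np.sum(v**2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def f(x):  # hypothetical black-box objective
    return -np.sin(3 * x[:, 0]) - x[:, 0]**2 + 0.7 * x[:, 0]

rng = np.random.default_rng(0)
X_cand = np.linspace(-1, 2, 200)[:, None]   # candidate pool
X = rng.uniform(-1, 2, size=(3, 1))         # initial design
y = f(X)
for t in range(20):
    mu, sigma = gp_posterior(X, y, X_cand)
    acq = mu + 2.0 * sigma                  # UCB acquisition
    x_next = X_cand[np.argmax(acq)][None, :]
    X, y = np.vstack([X, x_next]), np.append(y, f(x_next))
print("best observed value:", y.max())
```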
In single-objective Bayesian optimization, an RL-based AF (e.g., FSAF (Hsieh et al., 2021)) takes as input the posterior mean, the posterior standard deviation, and the best function value observed so far, and then outputs the AF value. A direct extension to MOBO simply takes into account the same set of information for all $K$ objective functions. The hypervolume identifiability issue can be illustrated by comparing the hypervolume improvement incurred by the sample $x_3$ in the two different scenarios below. Even though the AF inputs at $x_3$ are identical in both scenarios, the increases in hypervolume upon sampling $x_3$ are rather different. Hence, the increase in hypervolume is not identifiable solely from the AF input of the existing RL-based AFs.
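The following toy computation makes the issue concrete (two objectives, maximization, hypothetical numbers chosen for illustration). Both histories share the same per-objective best values, so an AF that only sees per-point GP statistics and best-so-far values cannot distinguish between them, yet the hypervolume improvement from the same new sample differs:

```python
# Hypervolume identifiability demo: identical AF inputs, different
# hypervolume improvements. Numbers are hypothetical, for illustration.
import numpy as np

def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Hypervolume dominated by `points` w.r.t. `ref` (maximization)."""
    pts = np.asarray(points, dtype=float)
    pts = pts[np.argsort(-pts[:, 0])]        # sort by f1 descending
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 > prev_f2:                     # skip dominated points
            hv += (f1 - ref[0]) * (f2 - prev_f2)
            prev_f2 = f2
    return hv

history_1 = [(3.0, 1.0), (1.0, 3.0)]                 # scenario 1
history_2 = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]     # scenario 2
new_point = (2.5, 2.5)                               # same sample x_3

for name, hist in [("scenario 1", history_1), ("scenario 2", history_2)]:
    hvi = hypervolume_2d(hist + [new_point]) - hypervolume_2d(hist)
    print(name, "| per-objective best:", np.max(hist, axis=0), "| HVI:", hvi)
# scenario 1 HVI: 2.25, scenario 2 HVI: 1.25 -- the per-objective best
# values (3, 3) are identical, yet the hypervolume improvements differ.
```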
To tackle the hypervolume identifiability issue, we propose to rethink MOBO from the perspective of non-Markovian RL via sequence modeling. We propose BOFormer, which leverages the sequence modeling capability of the Transformer architecture and minimizes a generalized temporal difference loss. BOFormer comprises two distinct networks, as shown above: the upper network serves as the policy network, which uses the historical data and the Q-values predicted by the target network to estimate the Q-values for action selection; the lower network serves as the target network, responsible for constructing Q-values for past observation-action pairs.
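A minimal PyTorch sketch of this two-network setup is given below: a Transformer policy network estimates Q-values from the observation-action history, while a frozen copy supplies bootstrap targets for a TD loss. Dimensions, layer sizes, and the exact loss form are assumptions; in particular, the paper's policy network additionally consumes the Q-values predicted by the target network as part of its input, an augmentation this sketch omits for brevity along with the other MOBO-specific enhancements.

```python
# Sequence-based Q-learning with a policy network and a frozen target
# network over observation-action histories. Illustrative sketch, not
# the released BOFormer implementation.
import copy
import torch
import torch.nn as nn

class SequenceQNet(nn.Module):
    def __init__(self, obs_dim, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.q_head = nn.Linear(d_model, 1)

    def forward(self, history):
        # history: (batch, T, obs_dim) observation-action features
        h = self.encoder(self.embed(history))
        return self.q_head(h[:, -1]).squeeze(-1)   # Q-value at the last step

obs_dim, T, batch = 8, 16, 32
policy_net = SequenceQNet(obs_dim)
target_net = copy.deepcopy(policy_net).requires_grad_(False)
opt = torch.optim.Adam(policy_net.parameters(), lr=1e-4)

# One TD update on a dummy batch of transitions (h_t, r_t, h_{t+1}).
h_t = torch.randn(batch, T, obs_dim)
h_next = torch.randn(batch, T, obs_dim)       # history extended by one step
reward = torch.rand(batch)                    # e.g., hypervolume improvement
gamma = 0.99

with torch.no_grad():
    td_target = reward + gamma * target_net(h_next)
loss = nn.functional.mse_loss(policy_net(h_t), td_target)
opt.zero_grad(); loss.backward(); opt.step()
# Periodically: target_net.load_state_dict(policy_net.state_dict())
```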
In this paper, we address MOBO problems from the perspective of RL-based AFs by identifying and tackling the inherent hypervolume identifiability issue. We achieve this by first presenting a generalized DQN framework and then instantiating it through BOFormer, which leverages the sequence modeling capability of Transformers and incorporates multiple enhancements tailored to MOBO. Our experimental results show that BOFormer is a promising approach to general-purpose multi-objective black-box optimization.
@inproceedings{hung2025boformer,
  title={{BOF}ormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian {RL}},
  author={Yu Heng Hung and Kai-Jie Lin and Yu-Heng Lin and Chien-Yi Wang and Cheng Sun and Ping-Chun Hsieh},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=UnCKU8pZVe}
}