Abstract:
When allocating resources across innovation efforts, managers, investors, and policymakers face a trade-off between incurring high current costs and achieving future cost reductions, as technologies evolve endogenously with experience and economic conditions. Future innovation is also uncertain, giving rise to a non-linear dynamic portfolio optimization problem. Yet traditional optimization methods typically treat innovation as exogenous and discard the associated uncertainties. These methods also tend to be computationally intensive, difficult to interpret, and overly restrictive in their solution space. In this paper, we develop an alternative dynamic portfolio optimization framework that integrates endogenous learning, showing how Multi-Armed Bandit (MAB) strategies based on the Gittins index can be applied to technology portfolios with empirically validated experience curves. Our approach is computationally efficient and explicitly captures the exploration-exploitation trade-off, dynamically adjusting investment decisions in response to observed cost declines. The resulting optimal strategy is risk-seeking rather than risk-neutral. We demonstrate the practical relevance of our method through an application to the U.S. light vehicle sector and the decision to invest in low-carbon technologies, illustrating how dynamic MAB-based strategies outperform static alternatives that ignore endogenous uncertainties. Taken together, our findings suggest that adaptive, risk-seeking strategies may facilitate more rapid market transitions, with important implications for both innovation policy design and private investment.
Citation:
Baumgärtner, C.L., Köhler-Schindler, L. & Pless, J. (2025), 'Innovation Bandits: A Dynamic Portfolio Strategy with Endogenous Rewards', INET Oxford Working Paper Series 2025-11.