Contextual Bandit Learning in Real-World Applications - Marco Rossi (Microsoft Research)

Applications and systems constantly face decisions that require picking an action from a set of options based on contextual information. Reinforcement learning algorithms such as contextual bandits can be very effective in these settings, but applying them in practice poses fundamental challenges, and no general system exists that fully supports them. To address this problem, we created the Decision Service: the first end-to-end system for contextual learning. The Decision Service supports all aspects of contextual bandit learning through a loop of four system abstractions: explore (the decision space), log, learn, and deploy. Notably, our new explore and log abstractions ensure that the system collects correct, unbiased data, enabling both online learning and offline experimentation. The Decision Service has a simple user interface and has been applied with strong results in a variety of settings, such as content recommendation, revenue lift on landing pages, tech support, and machine failure handling.
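The abstract does not describe the Decision Service's actual interfaces. What follows is only a minimal Python sketch of why the explore and log steps make offline experimentation possible. It assumes two standard techniques, epsilon-greedy exploration and inverse propensity scoring (IPS), and it uses a toy environment. All names (`explore`, `LoggedEvent`, `ips_estimate`, `simulate`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Policy = Callable[[Sequence[float]], int]


@dataclass
class LoggedEvent:
    context: Sequence[float]
    action: int
    reward: float
    propensity: float  # probability the logging policy gave to `action`


def explore(context, policy: Policy, num_actions: int, epsilon: float,
            rng: random.Random) -> Tuple[int, float]:
    """Epsilon-greedy exploration around `policy`; returns (action, propensity)."""
    best = policy(context)
    if rng.random() < epsilon:
        action = rng.randrange(num_actions)  # uniform random action
    else:
        action = best
    # P(action) = epsilon/K for every action, plus (1 - epsilon) for the greedy one
    propensity = epsilon / num_actions + ((1.0 - epsilon) if action == best else 0.0)
    return action, propensity


def ips_estimate(log: List[LoggedEvent], target: Policy) -> float:
    """Inverse propensity scoring: unbiased offline estimate of target's mean reward."""
    if not log:
        raise ValueError("empty log")
    total = 0.0
    for event in log:
        # Only events where the target would have taken the logged action count,
        # reweighted by 1/propensity to undo the logging policy's bias.
        if target(event.context) == event.action:
            total += event.reward / event.propensity
    return total / len(log)


def simulate(num_rounds: int = 5000, num_actions: int = 3,
             epsilon: float = 0.2, seed: int = 0) -> List[LoggedEvent]:
    """Hypothetical environment: reward 1 iff the action is the argmax of the context."""
    rng = random.Random(seed)
    deployed: Policy = lambda ctx: 0  # a naive policy currently in production
    log: List[LoggedEvent] = []
    for _ in range(num_rounds):
        context = [rng.random() for _ in range(num_actions)]
        action, p = explore(context, deployed, num_actions, epsilon, rng)  # explore
        best = max(range(num_actions), key=context.__getitem__)
        reward = 1.0 if action == best else 0.0
        log.append(LoggedEvent(context, action, reward, p))  # log
    return log


if __name__ == "__main__":
    log = simulate()
    candidate: Policy = lambda ctx: max(range(len(ctx)), key=ctx.__getitem__)
    print("deployed policy, IPS estimate:", ips_estimate(log, lambda ctx: 0))
    print("candidate policy, IPS estimate:", ips_estimate(log, candidate))
```

Each logged event records the probability with which its action was chosen. That lets you score any candidate policy from the same log without deploying it. In the loop described above, a learner (the learn step) would train on this log, and the best-scoring policy would then be deployed. IPS is unbiased only when every action a candidate might take has nonzero logging probability. Epsilon-greedy guarantees this for `epsilon > 0`.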


Marco Rossi is a Sr. Data Scientist working on online learning at Microsoft Research New York City. Previously, he was a Sr. Researcher at a computer vision start-up, where he used unsupervised learning to design image recognition algorithms. He received his undergraduate and graduate degrees in Telecommunications Engineering from Politecnico di Milano, and he obtained his PhD in Electrical Engineering from NJIT. He is passionate about using data and machine learning to address real-world problems.