Reinforcement Learning (RL) algorithms are an alternative to traditional model-based control: they learn the optimal actions to take from data. Unlike model-based controllers, RL methods do not need a built-in model of the system dynamics, which enables them to make decisions successfully when the true model is complicated or not perfectly known at design time. Unfortunately, their application in many settings, such as autonomous robotics and smart buildings, is hampered by their need for large amounts of data. This project focuses on improving the data efficiency of RL systems, using Bayesian inference and reasoning techniques similar to those used in chess-playing AI. We will study systems that take into account the long-term value of each decision, both in terms of the benefits it achieves and the information it provides for future decisions. Addressing this challenge will enable the application of RL in domains such as personalised education, digital health, robotics, and the smart grid.