In reinforcement learning, the learning agent always faces a tradeoff between exploration and exploitation. Exploration is often implemented by taking random actions, but in dangerous tasks this can lead to highly negative rewards, that is, damage to the agent or to other parts of the environment. This week we discuss methods that minimize the risks of exploration so that good policies can be learned both safely and efficiently.