With the recent evolution of mobile health technologies, health scientists are increasingly interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via notification on a mobile device and designed to help the user prevent negative health outcomes and promote the adoption and maintenance of healthy behaviors. A JITAI can be operationalized by a sequence of decision rules (i.e., a treatment policy) that takes the user’s current context as input and specifies whether, and what type of, intervention should be provided at that moment. In this paper, we develop a Reinforcement Learning (RL) algorithm that continuously learns and improves the treatment policy embedded in the JITAI as data are collected from the user. This work is motivated by our collaboration on HeartSteps, a physical activity mobile health study. The RL algorithm developed in this paper will be used to decide, five times per day, whether to deliver a context-tailored activity suggestion.