Online transfer learning strategy for enhancing the scalability and deployment of deep reinforcement learning control in smart buildings
In recent years, advanced control strategies based on Deep Reinforcement Learning (DRL) have proved effective in optimizing the management of integrated energy systems in buildings, reducing energy costs and improving indoor comfort conditions compared to traditional reactive controllers. However, the scalability and implementation of DRL controllers are still limited because they require a considerable amount of time to converge to a near-optimal solution. This issue is currently addressed in the literature through offline pre-training of the DRL agent. However, this solution raises two critical issues: (1) the need to develop a building surrogate model to perform the training task, and (2) the need to perform a fine-tuning process over several training episodes to obtain a near-optimal control policy.
In this context, this paper introduces an Online Transfer Learning (OTL) strategy that exploits two knowledge-sharing techniques, weight-initialization and imitation learning, to transfer a DRL control policy from a source office building to various target buildings in a simulation environment coupling EnergyPlus and Python.
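As a minimal illustration of the weight-initialization idea only (imitation learning is not covered here), the sketch below copies the parameters of a source policy into a target agent so that online learning starts from the transferred values rather than from random ones. The dictionary layout, the layer names, and the function `init_from_source` are hypothetical and do not reflect the network or code used in the paper.

```python
import copy


def init_from_source(source_weights):
    """Return an independent copy of the source policy's parameters.

    A deep copy is used so that later updates on the target building
    do not modify the parameters kept for the source building.
    """
    return copy.deepcopy(source_weights)


# Hypothetical parameters of a tiny policy network trained on the source building.
source_weights = {
    "hidden": [[0.2, -0.1], [0.4, 0.3]],
    "output": [[0.5, -0.5]],
}

# The target agent starts from the transferred parameters.
target_weights = init_from_source(source_weights)

# An illustrative online update on the target building changes only the target copy.
target_weights["output"][0][0] += 0.01
```

The source parameters stay untouched after the update, so the same source policy can be transferred to several target buildings independently.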
A DRL controller based on discrete Soft Actor–Critic (SAC) is trained on the source building to manage the operation of a cooling system consisting of a chiller and a thermal storage. Several target buildings are defined to benchmark the performance of the OTL strategy against that of a Rule-Based Controller (RBC) and two DRL-based control strategies, deployed offline and online, respectively. The OTL strategy emulates a real-world implementation in simulation by deploying the transferred DRL agent for a single episode in each target building. Target buildings have the same geometrical features and are served by the same energy system as the source building, but differ in terms of weather conditions, electricity price schedules, occupancy patterns, and building envelope efficiency levels. The results show that the OTL strategy can reduce the cumulative sum of temperature violations by 50% and 80% on average compared to RBC and online DRL, respectively, while enhancing the energy system operation with electricity cost savings ranging between 20% and 40%. The OTL agent performs slightly worse than the offline DRL controller, but it requires no modeling effort and can be deployed directly on target buildings, emulating a real-world implementation.
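One plausible way to compute a cumulative temperature violation metric, and the relative reduction between two controllers, is sketched below. The comfort band (21 to 25 °C), the restriction to occupied time steps, the hourly time step, and the function names are assumptions made for illustration; the abstract does not specify the paper's exact definition.

```python
def cumulative_violation(temps, occupied, lower=21.0, upper=25.0, step_hours=1.0):
    """Sum of comfort-band violations in degree-hours over occupied steps.

    temps:     indoor temperatures in degrees Celsius, one per time step
    occupied:  booleans, True when the building is occupied at that step
    lower/upper: assumed comfort band limits (illustrative values)
    """
    total = 0.0
    for temp, occ in zip(temps, occupied):
        if not occ:
            continue  # violations are counted only during occupancy (assumption)
        # Distance outside the band; zero when the temperature is inside it.
        excess = max(lower - temp, 0.0, temp - upper)
        total += excess * step_hours
    return total


def relative_reduction(baseline, candidate):
    """Fractional reduction of the candidate metric with respect to the baseline."""
    return (baseline - candidate) / baseline


# Illustrative values only, not results from the paper.
temps = [20.0, 22.0, 26.5, 27.0]
occupied = [True, True, True, False]
violation = cumulative_violation(temps, occupied)
```

With these illustrative inputs, the first step is 1.0 °C below the band, the third is 1.5 °C above it, and the last is ignored because the building is unoccupied.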