We spoke with Chris Yeh, a Resnick Scholar and graduate student in CMS working with Adam Wierman, about his project to build "gyms" based on real energy and sustainability environments to test the performance of new algorithms.
What are "gyms" for algorithm development and why are they important?
Since 2016, when OpenAI (the creator of ChatGPT) released their OpenAI Gym software, the term "gym" has become synonymous with a software environment in which one or more controllable agents interact with a simulated world. For example, a "gym" could be a video game environment, or it could be a robot simulator. A software agent (e.g., a video game character or a robot controller) submits "actions" to the gym environment, which simulates the effect of each action and returns a reward to the agent. In a video game, for instance, the reward might be positive if the agent makes progress and negative if it makes a mistake.
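To make this concrete, here is a minimal sketch of that interaction loop written against the Gymnasium API (the maintained successor to OpenAI Gym). The CartPole environment and the random action choice are illustrative stand-ins only, not anything specific to SustainGym.

```python
# Minimal agent-environment interaction loop using the Gymnasium API
# (the successor to OpenAI Gym). CartPole is a stand-in environment;
# any gym-style environment exposes the same reset/step interface.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # a real agent would choose actions here
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
env.close()
```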
These "gyms" serve as useful test beds for control algorithms, specifically reinforcement learning (RL) algorithms that learn to maximize reward from repeated interactions with an environment. Whereas the original OpenAI Gym was useful for prototyping RL algorithms that worked on simpler video games, there is growing interest in developing gyms that more realistically simulate real-world systems such as the electrical grid, the stock market, and self-driving vehicles.
What prompted you to make a new set of these for sustainability applications?
Prof. Wierman's research group has a history of developing efficient control algorithms that, in theory, should be great for real-world control tasks, such as those in energy systems. However, very few sustainability-focused gyms are publicly available, and the existing ones all have significant drawbacks, making it difficult to evaluate these algorithms on sustainability-focused problems. Having previously led a multi-university team of researchers to build a different sustainability-focused remote sensing dataset (SustainBench), I figured I could do the same for sustainability-focused RL gyms. We ended up focusing on five environments that describe real-world situations and are built using real data, creating a more realistic test bed. The five environments are (1) an EV charging network, (2) battery storage systems bidding in the electricity market, (3) scheduling data center jobs to maximize clean energy use, (4) controlling thermal power plant inputs to minimize fuel consumption, and (5) intelligently coordinating AC systems to reduce building-wide energy consumption.
What has the roll-out of the gyms been like? Have you seen interest in adopting them?
The roll-out of the gyms is just getting started! We published an early version of SustainGym back in December 2022, and many researchers have reached out to me about gaining access to it. Since then, we have been working diligently to fix bugs and properly package our software for release. In mid-September, our paper was accepted to the NeurIPS 2023 conference (which will take place in December), and we are preparing the final version of our paper and software for release this week. Next week, I will also be presenting SustainGym at the INFORMS conference (in Phoenix, AZ) to raise awareness among the operations research community, which often overlaps with the reinforcement learning community. All of the project information and code can be found on the project site.
Are there any surprising results that you've seen since you started using these to test new algorithms?
Yes! So far, we've found that off-the-shelf RL algorithms, which do incredibly well on the original OpenAI Gym suite, do not necessarily perform that well on SustainGym. This suggests that the RL research community has perhaps been too focused on doing well on a particular benchmark, and we hope that SustainGym will provide another benchmark for the community to target. We also found that the performance of these off-the-shelf RL algorithms degrades when they are tested on environments that have changed over time. Finally, we showed that multi-agent RL algorithms (whose performance has been less well studied) tend to perform as well as, if not better than, single-agent RL algorithms, especially on environments that have changed over time.
What comes next?
SustainGym opens the door to a number of interesting research directions. First, we want to design RL algorithms that are more robust to changes in the environment. Second, we want to further study why and how multi-agent RL algorithms sometimes perform better. Third, we want to study RL algorithms that can specifically take advantage of unique properties of sustainable energy systems. And lastly, if we are able to improve the performance of these algorithms significantly, then we may consider reaching out to more industry partners to see if there is wider interest in adopting them in actual systems.