CleanRL-supported Papers / Projects

CleanRL has become an increasingly popular deep reinforcement learning library, especially among practitioners who prefer more customizable code. Since its debut in July 2019, CleanRL has supported many open-source projects and publications, some of which are listed below.

Feel free to edit this list if your project or paper has used CleanRL.

Publications

Rahman, Md Masudur, and Yexiang Xue. "Bootstrap Advantage Estimation for Policy Optimization in Reinforcement Learning." In Proceedings of the IEEE International Conference on Machine Learning and Applications (ICMLA), 2022. https://arxiv.org/pdf/2210.07312.pdf
Centa, Matheus, and Philippe Preux. "Soft Action Priors: Towards Robust Policy Transfer." arXiv preprint arXiv:2209.09882 (2022). https://arxiv.org/pdf/2209.09882.pdf
Weng, Jiayi, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu et al. "Envpool: A highly parallel reinforcement learning environment execution engine." In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=BubxnHpuMbG
Huang, Shengyi, Rousslan Fernand Julien Dossa, Antonin Raffin, Anssi Kanervisto, and Weixun Wang. "The 37 Implementation Details of Proximal Policy Optimization." International Conference on Learning Representations 2022 Blog Post Track, https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
Huang, Shengyi, and Santiago Ontañón. "A closer look at invalid action masking in policy gradient algorithms." The International FLAIRS Conference Proceedings, 35. https://journals.flvc.org/FLAIRS/article/view/130584
Schmidt, Dominik, and Thomas Schmied. "Fast and Data-Efficient Training of Rainbow: an Experimental Study on Atari." Deep Reinforcement Learning Workshop at the 35th Conference on Neural Information Processing Systems, https://arxiv.org/abs/2111.10247
Dossa, Rousslan Fernand Julien, Shengyi Huang, Santiago Ontañón, and Takashi Matsubara. "An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization." IEEE Access 9 (2021): 117981-117992. https://ieeexplore.ieee.org/abstract/document/9520424
Huang, Shengyi, Santiago Ontañón, Chris Bamford, and Lukasz Grela. "Gym-µRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning." In 2021 IEEE Conference on Games (CoG), pp. 1-8. IEEE, 2021. https://ieeexplore.ieee.org/abstract/document/9619076
Huang, Shengyi, and Santiago Ontañón. "Measuring Generalization of Deep Reinforcement Learning Applied to Real-time Strategy Games." AAAI 2021 Reinforcement Learning in Games Workshop, http://aaai-rlg.mlanctot.info/papers/AAAI21-RLG_paper_33.pdf
Bamford, Chris, Shengyi Huang, and Simon Lucas. "Griddly: A platform for AI research in games." AAAI 2021 Reinforcement Learning in Games Workshop, https://arxiv.org/abs/2011.06363
Huang, Shengyi, and Santiago Ontañón. "Action guidance: Getting the best of sparse rewards and shaped rewards for real-time strategy games." AIIDE Workshop on Artificial Intelligence for Strategy Games, https://arxiv.org/abs/2010.03956
Huang, Shengyi, and Santiago Ontañón. "Comparing Observation and Action Representations for Deep Reinforcement Learning in µRTS." AIIDE Workshop on Artificial Intelligence for Strategy Games, October 2019, https://arxiv.org/abs/1910.12134