Reinforcement learning for bitcoin trading: A comparative study of PPO and DQN

Authors

  • Romadhan Edy Prasetyo, Universitas Bina Sarana Informatika, Indonesia
  • Sumanto Sumanto, Universitas Bina Sarana Informatika, Indonesia
  • Indra Chaidir, Universitas Bina Sarana Informatika, Indonesia
  • Adi Supriyatna, Universitas Bina Sarana Informatika, Indonesia

DOI:

https://doi.org/10.35335/mandiri.v14i2.455

Keywords:

Bitcoin, Cryptocurrency Trading, Deep Q-Network, Proximal Policy Optimization, Reinforcement Learning

Abstract

Bitcoin’s high volatility demands automated strategies that adapt to changing market regimes while managing risk. This study compares Proximal Policy Optimization (PPO) and Deep Q-Network (DQN) for Bitcoin trading using hourly BTC/USDT data from 2019 to early 2025. The models are trained to generate buy and sell signals from technical indicators including the Relative Strength Index (RSI), MA20, volatility, Moving Average Convergence Divergence (MACD), volume trend, SMA200, and a weekly trend filter. All features are computed on hourly bars. The evaluation shows that PPO tends to trade more aggressively and delivers higher performance during bullish phases, though with greater risk in unstable markets. By contrast, DQN trades more selectively and maintains better stability in sideways or choppy conditions. These findings support the effectiveness of reinforcement learning for adaptive cryptocurrency trading and highlight complementary strengths between PPO and DQN across market regimes.
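The abstract's feature set (RSI, MA20, MACD, SMA200, volatility, volume trend) can be made concrete with a short sketch. This is a minimal illustration using standard textbook indicator definitions, not the paper's implementation: the RSI period (14) and MACD spans (12/26/9) are common defaults assumed here, since only the MA20 and SMA200 windows are stated in the abstract.

```python
import numpy as np

def sma(prices, window):
    """Simple moving average; NaN until the window fills (e.g., MA20, SMA200)."""
    out = np.full(len(prices), np.nan)
    for i in range(window - 1, len(prices)):
        out[i] = prices[i - window + 1 : i + 1].mean()
    return out

def ema(prices, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = np.empty(len(prices))
    out[0] = prices[0]
    for i in range(1, len(prices)):
        out[i] = alpha * prices[i] + (1 - alpha) * out[i - 1]
    return out

def rsi(prices, period=14):
    """Relative Strength Index with Wilder's smoothing; 14 is an assumed default."""
    deltas = np.diff(prices)
    gains = np.where(deltas > 0, deltas, 0.0)
    losses = np.where(deltas < 0, -deltas, 0.0)
    avg_gain = gains[:period].mean()
    avg_loss = losses[:period].mean()
    out = np.full(len(prices), np.nan)
    for i in range(period, len(deltas)):
        avg_gain = (avg_gain * (period - 1) + gains[i]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i]) / period
        rs = avg_gain / avg_loss if avg_loss > 0 else np.inf
        out[i + 1] = 100 - 100 / (1 + rs)
    return out

def macd(prices, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line."""
    line = ema(prices, fast) - ema(prices, slow)
    return line, ema(line, signal)
```

Features like these, computed per hourly bar, would form the observation vector an agent consumes; in a Stable-Baselines3 setup that vector is what a Gymnasium environment's `step` method returns to PPO or DQN.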

References

Avramelou, L., Nousi, P., Passalis, N., & Tefas, A. (2024). Deep reinforcement learning for financial trading using multi-modal features. Expert Systems with Applications, 238, 121849. https://doi.org/10.1016/j.eswa.2023.121849

Baradja, A., & Tjendrowasono, T. I. (2024). Pengaplikasian Deep Reinforcement Q-Learning Untuk Prediksi Perdagangan Valas Otomatis [Application of deep reinforcement Q-learning for automated forex trading prediction]. Jurnal Rekayasa Sistem Informasi Dan Teknologi, 1(3), 190–198. https://doi.org/10.59407/jrsit.v1i3.519

Faturohman, T., & Nugraha, T. (2022). Islamic stock portfolio optimization using deep reinforcement learning. Journal of Islamic Monetary Economics and Finance, 8(2), 181–200. https://doi.org/10.21098/jimf.v8i2.1430

Fegiyanto, R., Hermawan, A., & Ardiani, F. (2024). Prediksi Harga Crypto dengan Algoritma Jaringan Saraf Tiruan [Crypto price prediction with an artificial neural network algorithm]. Jurnal Indonesia : Manajemen Informatika Dan Komunikasi, 5(3), 2265–2275. https://doi.org/10.35870/jimik.v5i3.728

Firsov, D. V., Silvestrov, S. N., Kuznetsov, N. V., Zolotarev, E. V., & Pobyvaev, S. A. (2023). Using PPO Models to Predict the Value of the BNB Cryptocurrency. Emerging Science Journal, 7(4), 1206–1214. https://doi.org/10.28991/ESJ-2023-07-04-012

Huang, C. S. J., & Su, Y.-S. (2024). Trading Strategy of the Cryptocurrency Market Based on Deep Q-Learning Agents. Applied Artificial Intelligence, 38(1). https://doi.org/10.1080/08839514.2024.2381165

Huang, Y., Lu, X., Zhou, C., & Song, Y. (2023). DADE-DQN: Dual Action and Dual Environment Deep Q-Network for Enhancing Stock Trading Strategy. Mathematics, 11(17), 3626. https://doi.org/10.3390/math11173626

Huang, Y., Zhou, C., Zhang, L., & Lu, X. (2024). A Self-Rewarding Mechanism in Deep Reinforcement Learning for Trading Strategy Optimization. Mathematics, 12(24), 4020. https://doi.org/10.3390/math12244020

Indriyanti, I., Ichsan, N., Fatah, H., Wahyuni, T., & Ermawati, E. (2025). Prediksi Jangka Pendek Harga Bitcoin Dengan Metode ARIMA [Short-term Bitcoin price prediction with the ARIMA method]. INTECOMS: Journal of Information Technology and Computer Science, 8(1), 163–167. https://doi.org/10.31539/intecoms.v8i1.14446

Jeong, D. W., & Gu, Y. H. (2024). Pro Trader RL: Reinforcement learning framework for generating trading knowledge by mimicking the decision-making patterns of professional traders. Expert Systems with Applications, 254, 124465. https://doi.org/10.1016/j.eswa.2024.124465

Jing, L., & Kang, Y. (2024). Automated cryptocurrency trading approach using ensemble deep reinforcement learning: Learn to understand candlesticks. Expert Systems with Applications, 237, 121373. https://doi.org/10.1016/j.eswa.2023.121373

Kochliaridis, V., Kouloumpris, E., & Vlahavas, I. (2023). Combining deep reinforcement learning with technical analysis and trend monitoring on cryptocurrency markets. Neural Computing and Applications, 35(29), 21445–21462. https://doi.org/10.1007/s00521-023-08516-x

Kong, M., & So, J. (2023). Empirical Analysis of Automated Stock Trading Using Deep Reinforcement Learning. Applied Sciences, 13(1), 633. https://doi.org/10.3390/app13010633

Liu, F., Li, Y., Li, B., Li, J., & Xie, H. (2021). Bitcoin transaction strategy construction based on deep reinforcement learning. Applied Soft Computing, 113, 107952. https://doi.org/10.1016/j.asoc.2021.107952

Liu, X.-Y., Xia, Z., Rui, J., Gao, J., Yang, H., Zhu, M., Wang, C., Wang, Z., & Guo, J. (2022). FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (Vol. 35, pp. 1835–1849). Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2022/file/0bf54b80686d2c4dc0808c2e98d430f7-Paper-Datasets_and_Benchmarks.pdf

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236

Rizkilloh, M. F., & Widiyanesti, S. (2022). Prediksi Harga Cryptocurrency Menggunakan Algoritma Long Short Term Memory (LSTM) [Cryptocurrency price prediction using the Long Short-Term Memory (LSTM) algorithm]. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(1), 25–31. https://doi.org/10.29207/resti.v6i1.3630

Otabek, S., & Choi, J. (2024). Multi-level deep Q-networks for Bitcoin trading strategies. Scientific Reports, 14(1), 771. https://doi.org/10.1038/s41598-024-51408-w

Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N. (2021). Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research, 22(268), 1–8. https://jmlr.org/papers/v22/20-1364.html

Saepudin, D., & Rauf, K. (2025). Application of Deep Reinforcement Learning for Stock Trading on The Indonesia Stock Exchange. Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI), 14(1), 144–157. https://doi.org/10.23887/janapati.v14i1.83775

Schnaubelt, M. (2022). Deep reinforcement learning for the optimal placement of cryptocurrency limit orders. European Journal of Operational Research, 296(3), 993–1006. https://doi.org/10.1016/j.ejor.2021.04.050

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.

Théate, T., & Ernst, D. (2021). An application of deep reinforcement learning to algorithmic trading. Expert Systems with Applications, 173, 114632. https://doi.org/10.1016/j.eswa.2021.114632

Towers, M., Kwiatkowski, A., Terry, J., Balis, J. U., De Cola, G., Deleu, T., Goulão, M., Kallinteris, A., Krimmel, M., KG, A., Perez-Vicente, R., Pierré, A., Schulhoff, S., Tai, J. J., Tan, H., & Younis, O. G. (2024). Gymnasium: A Standard Interface for Reinforcement Learning Environments. arXiv preprint.

Zhang, J., Cai, K., & Wen, J. (2024). A survey of deep learning applications in cryptocurrency. IScience, 27(1), 108509. https://doi.org/10.1016/j.isci.2023.108509

Published

2025-08-22

How to Cite

Prasetyo, R. E., Sumanto, S., Chaidir, I., & Supriyatna, A. (2025). Reinforcement learning for bitcoin trading: A comparative study of PPO and DQN. Jurnal Mandiri IT, 14(2), 159–169. https://doi.org/10.35335/mandiri.v14i2.455