Reinforcement learning for Bitcoin trading: A comparative study of PPO and DQN
DOI: https://doi.org/10.35335/mandiri.v14i2.455

Keywords: Bitcoin, Cryptocurrency Trading, Deep Q-Network, Proximal Policy Optimization, Reinforcement Learning

Abstract
Bitcoin’s high volatility demands automated strategies that adapt to changing market regimes while managing risk. This study compares Proximal Policy Optimization (PPO) and Deep Q-Network (DQN) for Bitcoin trading using hourly BTC/USDT data from 2019 to early 2025. The models are trained to generate buy and sell signals from technical indicators including the Relative Strength Index (RSI), MA20, volatility, Moving Average Convergence Divergence (MACD), volume trend, SMA200, and a weekly trend filter. All features are computed on hourly bars. The evaluation shows that PPO tends to trade more aggressively and delivers higher performance during bullish phases, though with greater risk in unstable markets. By contrast, DQN trades more selectively and maintains better stability in sideways or choppy conditions. These findings support the effectiveness of reinforcement learning for adaptive cryptocurrency trading and highlight complementary strengths between PPO and DQN across market regimes.
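The indicator features named above (RSI, moving averages, and so on) are standard transformations of the hourly closing-price series. As a rough illustration of how such features might be derived before being fed to a PPO or DQN agent, the sketch below computes a simple moving average and a simplified RSI in plain Python. This is a hedged sketch, not the paper's implementation: the function names are hypothetical, and the RSI here uses plain averages of gains and losses over the lookback window rather than Wilder's exponential smoothing.

```python
def sma(prices, window):
    """Simple moving average over `window` bars; None until enough history."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(prices))
    ]

def rsi(prices, period=14):
    """Simplified RSI: plain-average gains/losses (not Wilder smoothing)."""
    out = [None] * len(prices)
    for i in range(period, len(prices)):
        deltas = [prices[j] - prices[j - 1] for j in range(i - period + 1, i + 1)]
        gains = sum(d for d in deltas if d > 0)
        losses = sum(-d for d in deltas if d < 0)
        if losses == 0:
            out[i] = 100.0  # all gains in the window: maximum RSI
        else:
            rs = gains / losses
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out
```

In a setup like the one described, each hourly bar's feature values would be stacked into the observation vector of a trading environment, which PPO or DQN agents (for example, the Stable-Baselines3 implementations the paper cites) then map to buy/sell actions.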
References
Avramelou, L., Nousi, P., Passalis, N., & Tefas, A. (2024). Deep reinforcement learning for financial trading using multi-modal features. Expert Systems with Applications, 238, 121849. https://doi.org/10.1016/j.eswa.2023.121849
Baradja, A., & Tjendrowasono, T. I. (2024). Pengaplikasian Deep Reinforcement Q-Learning Untuk Prediksi Perdagangan Valas Otomatis. Jurnal Rekayasa Sistem Informasi Dan Teknologi, 1(3), 190–198. https://doi.org/10.59407/jrsit.v1i3.519
Faturohman, T., & Nugraha, T. (2022). Islamic Stock Portfolio Optimization Using Deep Reinforcement Learning. Journal of Islamic Monetary Economics and Finance, 8(2), 181–200. https://doi.org/10.21098/jimf.v8i2.1430
Fegiyanto, R., Hermawan, A., & Ardiani, F. (2024). Prediksi Harga Crypto dengan Algoritma Jaringan Saraf Tiruan. Jurnal Indonesia : Manajemen Informatika Dan Komunikasi, 5(3), 2265–2275. https://doi.org/10.35870/jimik.v5i3.728
Firsov, D. V., Silvestrov, S. N., Kuznetsov, N. V., Zolotarev, E. V., & Pobyvaev, S. A. (2023). Using PPO Models to Predict the Value of the BNB Cryptocurrency. Emerging Science Journal, 7(4), 1206–1214. https://doi.org/10.28991/ESJ-2023-07-04-012
Huang, C. S. J., & Su, Y.-S. (2024). Trading Strategy of the Cryptocurrency Market Based on Deep Q-Learning Agents. Applied Artificial Intelligence, 38(1). https://doi.org/10.1080/08839514.2024.2381165
Huang, Y., Lu, X., Zhou, C., & Song, Y. (2023). DADE-DQN: Dual Action and Dual Environment Deep Q-Network for Enhancing Stock Trading Strategy. Mathematics, 11(17), 3626. https://doi.org/10.3390/math11173626
Huang, Y., Zhou, C., Zhang, L., & Lu, X. (2024). A Self-Rewarding Mechanism in Deep Reinforcement Learning for Trading Strategy Optimization. Mathematics, 12(24), 4020. https://doi.org/10.3390/math12244020
Indriyanti, I., Ichsan, N., Fatah, H., Wahyuni, T., & Ermawati, E. (2025). Prediksi Jangka Pendek Harga Bitcoin Dengan Metode Arima. INTECOMS: Journal of Information Technology and Computer Science, 8(1), 163–167. https://doi.org/10.31539/intecoms.v8i1.14446
Jeong, D. W., & Gu, Y. H. (2024). Pro Trader RL: Reinforcement learning framework for generating trading knowledge by mimicking the decision-making patterns of professional traders. Expert Systems with Applications, 254, 124465. https://doi.org/10.1016/j.eswa.2024.124465
Jing, L., & Kang, Y. (2024). Automated cryptocurrency trading approach using ensemble deep reinforcement learning: Learn to understand candlesticks. Expert Systems with Applications, 237, 121373. https://doi.org/10.1016/j.eswa.2023.121373
Kochliaridis, V., Kouloumpris, E., & Vlahavas, I. (2023). Combining deep reinforcement learning with technical analysis and trend monitoring on cryptocurrency markets. Neural Computing and Applications, 35(29), 21445–21462. https://doi.org/10.1007/s00521-023-08516-x
Kong, M., & So, J. (2023). Empirical Analysis of Automated Stock Trading Using Deep Reinforcement Learning. Applied Sciences, 13(1), 633. https://doi.org/10.3390/app13010633
Liu, F., Li, Y., Li, B., Li, J., & Xie, H. (2021). Bitcoin transaction strategy construction based on deep reinforcement learning. Applied Soft Computing, 113, 107952. https://doi.org/10.1016/j.asoc.2021.107952
Liu, X.-Y., Xia, Z., Rui, J., Gao, J., Yang, H., Zhu, M., Wang, C., Wang, Z., & Guo, J. (2022). FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (Vol. 35, pp. 1835–1849). Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2022/file/0bf54b80686d2c4dc0808c2e98d430f7-Paper-Datasets_and_Benchmarks.pdf
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236
Rizkilloh, M. F., & Widiyanesti, S. (2022). Prediksi Harga Cryptocurrency Menggunakan Algoritma Long Short Term Memory (LSTM). Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(1), 25–31. https://doi.org/10.29207/resti.v6i1.3630
Otabek, S., & Choi, J. (2024). Multi-level deep Q-networks for Bitcoin trading strategies. Scientific Reports, 14(1), 771. https://doi.org/10.1038/s41598-024-51408-w
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N. (2021). Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research, 22(268), 1–8. https://jmlr.org/papers/v22/20-1364.html
Saepudin, D., & Rauf, K. (2025). Application of Deep Reinforcement Learning for Stock Trading on The Indonesia Stock Exchange. Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI), 14(1), 144–157. https://doi.org/10.23887/janapati.v14i1.83775
Schnaubelt, M. (2022). Deep reinforcement learning for the optimal placement of cryptocurrency limit orders. European Journal of Operational Research, 296(3), 993–1006. https://doi.org/10.1016/j.ejor.2021.04.050
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
Théate, T., & Ernst, D. (2021). An application of deep reinforcement learning to algorithmic trading. Expert Systems with Applications, 173, 114632. https://doi.org/10.1016/j.eswa.2021.114632
Towers, M., Kwiatkowski, A., Terry, J., Balis, J. U., De Cola, G., Deleu, T., Goulão, M., Kallinteris, A., Krimmel, M., KG, A., Perez-Vicente, R., Pierré, A., Schulhoff, S., Tai, J. J., Tan, H., & Younis, O. G. (2024). Gymnasium: A Standard Interface for Reinforcement Learning Environments.
Zhang, J., Cai, K., & Wen, J. (2024). A survey of deep learning applications in cryptocurrency. iScience, 27(1), 108509. https://doi.org/10.1016/j.isci.2023.108509
License
Copyright (c) 2025 Romadhan Edy Prasetyo, Sumanto Sumanto, Indra Chaidir, Adi Supriyatna

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
