Piano fingering is a personal process in which a pianist determines the appropriate finger for playing each musical note. In this paper, we propose a novel deep reinforcement learning framework that uses the music score as the environment. We construct four environments from right-hand monophonic scores spanning various eras, types, and forms of classical music. Given the current hand position as the state, the pianist agent has to learn to choose the optimum finger as its action. We also propose a reward function that takes the fingering difficulty rules and reformulates them to compute the maximum negative difficulty of a fingering combination. We explore how each approach performs in piano fingering generation and identify the better approach between off-policy and on-policy model-free deep reinforcement learning. The results show that the off-policy method, which solves the problem with a DQN agent, outperformed the on-policy method in both training and evaluation. In addition, the experiments show promising results for future fingering generation without human supervision.
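The formulation above (score as environment, hand position as state, finger as action, negative difficulty as reward) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `ScoreEnv`, `difficulty`, `episode_return`, the state layout, and in particular the toy difficulty rule, which is a stand-in and not the fingering difficulty rules used in the paper. The sketch covers only the environment side; no DQN or on-policy agent is implemented.

```python
from typing import List, Optional, Sequence, Tuple

# Right-hand fingers: 1 = thumb ... 5 = little finger.
FINGERS = (1, 2, 3, 4, 5)

# Assumed state layout: (previous pitch, previous finger, current pitch).
# A previous finger of 0 means that no finger has been placed yet.
State = Tuple[int, int, int]


def difficulty(prev_pitch: int, prev_finger: int, pitch: int, finger: int) -> float:
    """Toy stand-in for fingering difficulty rules (hypothetical, not the paper's)."""
    interval = pitch - prev_pitch        # semitones, positive when ascending
    finger_step = finger - prev_finger   # positive when moving toward finger 5
    # Assumption: adjacent fingers comfortably span about two semitones.
    cost = float(abs(interval - 2 * finger_step))
    # Assumption: reusing one finger on a different pitch costs extra.
    if finger == prev_finger and interval != 0:
        cost += 2.0
    return cost


class ScoreEnv:
    """A monophonic score as an episodic environment; MIDI pitches are assumed."""

    def __init__(self, pitches: Sequence[int]) -> None:
        if not pitches:
            raise ValueError("score must contain at least one note")
        self.pitches: List[int] = list(pitches)
        self.t = 0
        self.prev_finger = 0

    def _state(self) -> State:
        prev_pitch = self.pitches[self.t - 1] if self.t > 0 else self.pitches[0]
        return (prev_pitch, self.prev_finger, self.pitches[self.t])

    def reset(self) -> State:
        self.t = 0
        self.prev_finger = 0
        return self._state()

    def step(self, finger: int) -> Tuple[Optional[State], float, bool]:
        if finger not in FINGERS:
            raise ValueError(f"finger must be one of {FINGERS}")
        if self.t >= len(self.pitches):
            raise RuntimeError("episode has finished; call reset()")
        if self.t == 0:
            reward = 0.0  # first note: there is no transition to score yet
        else:
            # Reward is the negative difficulty of the transition just played.
            reward = -difficulty(
                self.pitches[self.t - 1], self.prev_finger,
                self.pitches[self.t], finger,
            )
        self.prev_finger = finger
        self.t += 1
        done = self.t == len(self.pitches)
        return (None if done else self._state()), reward, done


def episode_return(env: ScoreEnv, fingering: Sequence[int]) -> float:
    """Play a fixed fingering through the score and sum the rewards."""
    env.reset()
    total = 0.0
    for finger in fingering:
        _, reward, _ = env.step(finger)
        total += reward
    return total
```

Under these assumptions, a fingering with lower total difficulty yields a higher (less negative) return, which is the quantity a learning agent would try to maximize. For example, `episode_return(ScoreEnv([60, 62, 64]), [1, 2, 3])` scores higher than the same call with `[1, 1, 1]`.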