Monitoring the three-dimensional flux distribution in a nuclear reactor core is essential for improving safety and economics, and it requires strategically placed in-core detectors. The deployment of these sensors, however, is often constrained by physical, industrial, and economic limitations. This study formulates the placement of in-core detectors as a Markov decision process and develops a reinforcement learning (RL)-based framework that produces a detector layout for a fixed number of detectors and a given set of available detector positions. The framework comprises an environment, built from a Proper Orthogonal Decomposition (POD)-based power reconstruction function and a novel reward function derived from the power reconstruction error, and a well-trained agent that iteratively updates the detector placement. Four algorithms, Proximal Policy Optimization, Deep Q-Network, Advantage Actor-Critic, and Monte Carlo Tree Search, are investigated and analyzed for optimizing the detector placement. A genetic algorithm (GA), a traditional optimization approach, is applied for comparison. The findings reveal that RL outperforms GA in the quality of the optimal solutions it finds, showing a stronger tendency to locate a global solution. Moreover, the flexibility of RL allows a reward function developed for one reactor core to be adapted to other reactors according to their particular engineering requirements, thereby further enhancing the optimization of in-core detector configurations.
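To illustrate the kind of environment the abstract describes, the sketch below builds a POD basis from flux snapshots, reconstructs the full field from sparse detector readings by least-squares fitting of the mode coefficients, and scores a candidate placement with a reward equal to the negative relative reconstruction error. This is not the paper's implementation: the synthetic snapshot data, the number of retained modes, and the example detector indices are all invented for demonstration, and the actual reward function used in the study may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic snapshot matrix: each column is one flux field over 64 core
# positions (a stand-in for precomputed reactor-physics solutions).
n_pos, n_snap = 64, 40
x = np.linspace(0.0, 1.0, n_pos)
snapshots = np.stack(
    [np.sin(np.pi * x * (1.0 + 0.5 * rng.random()))
     + 0.05 * rng.random(n_pos)
     for _ in range(n_snap)],
    axis=1,
)

# POD basis via SVD of the snapshot matrix; keep the leading r modes.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
r = 5
Phi = U[:, :r]                      # POD modes, shape (n_pos, r)

def reconstruct(field, detectors):
    """Least-squares fit of POD coefficients from sparse detector readings."""
    readings = field[detectors]
    coeffs, *_ = np.linalg.lstsq(Phi[detectors, :], readings, rcond=None)
    return Phi @ coeffs

def reward(field, detectors):
    """Illustrative reward: negative relative L2 reconstruction error."""
    err = np.linalg.norm(field - reconstruct(field, detectors))
    return -err / np.linalg.norm(field)

true_field = snapshots[:, 0]
spread = np.array([5, 20, 35, 50, 60])     # detectors spread across the core
clustered = np.array([0, 1, 2, 3, 4])      # detectors bunched at one edge
print(reward(true_field, spread), reward(true_field, clustered))
```

A spread-out placement samples the POD modes at well-separated points, so the least-squares fit is well conditioned and its reward is higher than that of the clustered placement; an RL agent in this setting would learn to favor such layouts.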