The increasing adoption of industrial robot arms in advanced manufacturing has heightened the need for flexible trajectory planning methods that go beyond traditional offline programming (OLP) tools, which are often expensive, proprietary, and restrictive. This study introduces an OLP-free pipeline that generates robot trajectory data and optimizes paths for six-degree-of-freedom (6-DOF) robot arms using discrete reinforcement learning. First, five-axis NC code derived from CAD/CAM data is converted into tool center point (TCP) trajectories through coordinate transformations. An analytical inverse kinematics solver then produces multiple joint solutions for each TCP pose; these solutions form a discrete action space from which the learning agent selects a feasible joint configuration at each point along the trajectory. A reward function that penalizes variations in joint velocity and acceleration, as well as pose error, allows motion smoothness and tracking accuracy to be optimized simultaneously. The optimized trajectories are validated in an open-source physics simulator and show improved motion stability, accuracy, and collision safety compared with conventional OLP-based paths. The proposed framework provides a flexible and cost-effective alternative to commercial OLP tools and lays a scalable foundation for future applications in automated and collaborative manufacturing systems.
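
The selection step described above can be illustrated with a minimal Python sketch. It is not the method of this study: the reward form, the weights `w_vel`, `w_acc`, `w_pose`, the time step `dt`, and the function names `step_reward` and `greedy_select` are all assumptions made for illustration, and a simple greedy choice stands in for the reinforcement learning agent. The sketch only shows how several inverse kinematics solutions per TCP pose can be treated as discrete actions and scored by joint velocity, joint acceleration, and pose error.

```python
import numpy as np


def step_reward(q_hist, q_cand, pose_err, dt=0.1, w_vel=1.0, w_acc=0.1, w_pose=10.0):
    """Hypothetical reward for moving to candidate joint vector q_cand.

    q_hist holds the two most recent joint vectors (older, newer).
    Velocity and acceleration are finite differences; all weights are assumed.
    """
    q_older, q_newer = q_hist
    v_prev = (q_newer - q_older) / dt
    v_new = (q_cand - q_newer) / dt
    a_new = (v_new - v_prev) / dt
    penalty = (w_vel * np.linalg.norm(v_new)
               + w_acc * np.linalg.norm(a_new)
               + w_pose * pose_err)
    return -penalty


def greedy_select(candidates, pose_errs, q_start):
    """Pick one IK solution index per waypoint by maximizing step_reward.

    candidates: list of arrays, each of shape (k_i, 6), one array per TCP pose.
    pose_errs:  list of sequences with one pose error per candidate.
    This greedy rule is a stand-in for a learned policy, not an RL algorithm.
    """
    hist = (q_start, q_start)  # assume the arm starts at rest
    chosen = []
    for cands, errs in zip(candidates, pose_errs):
        rewards = [step_reward(hist, q, e) for q, e in zip(cands, errs)]
        k = int(np.argmax(rewards))
        chosen.append(k)
        hist = (hist[1], cands[k])
    return chosen


# Toy data: two waypoints, two IK branches each (values are made up).
q0 = np.zeros(6)
cands = [
    np.array([[0.1] * 6, [3.0] * 6]),
    np.array([[3.1] * 6, [0.2] * 6]),
]
errs = [[0.0, 0.0], [0.0, 0.0]]
selection = greedy_select(cands, errs, q0)
```

In this toy case the selector stays on the branch closest to the previous configuration at each waypoint, which avoids a large joint jump between IK branches. A learned policy would instead account for the long-term effect of each choice along the whole trajectory.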