This research paper introduces a new approach to reinforcement learning that prioritizes learning safe actions while balancing optimality, exposure to hazardous situations, and the speed at which unsafe actions are detected. Led by Juan Andres Bazerque and Enrique Mallada, the research team studied two distinct scenarios to demonstrate the effectiveness of their approach. Under reasonable assumptions about exploration, they devised an algorithm that detects all unsafe actions within a finite number of iterations. The team also addressed the problem of finding optimal policies for a Markov decision process (MDP) subject to almost-sure (probability-one) constraints.
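As a rough illustration of the detection idea only (not the authors' algorithm), the sketch below assumes a simple bandit-style setting with a binary safety signal: any action that ever produces an unsafe outcome is flagged and excluded from further exploration, and persistent random exploration ensures every remaining action keeps being sampled. The function names and the toy environment are hypothetical.

```python
import random

# Hypothetical sketch: flag actions as unsafe once a violation is observed,
# while uniform random exploration keeps sampling every unflagged action.

def explore_and_flag(n_actions, is_safe_outcome, n_rounds, seed=0):
    """Return the set of actions flagged unsafe after n_rounds of exploration.

    is_safe_outcome(a) -> bool simulates the (possibly stochastic) safety
    signal observed when action a is played.
    """
    rng = random.Random(seed)
    unsafe = set()
    for _ in range(n_rounds):
        candidates = [a for a in range(n_actions) if a not in unsafe]
        if not candidates:
            break
        a = rng.choice(candidates)  # exploration over actions not yet flagged
        if not is_safe_outcome(a):
            unsafe.add(a)           # one observed violation flags the action
    return unsafe

if __name__ == "__main__":
    # Toy environment (assumed): actions 3 and 7 violate safety 40% of the time.
    truly_unsafe = {3, 7}
    env_rng = random.Random(1)

    def outcome(a):
        return not (a in truly_unsafe and env_rng.random() < 0.4)

    print(explore_and_flag(n_actions=10, is_safe_outcome=outcome, n_rounds=500))
```

With enough exploration rounds, every unsafe action eventually reveals itself and is flagged, which mirrors, at a very coarse level, the paper's goal of identifying all unsafe actions in finitely many iterations.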