
Precision vs Recall


Overview

Precision and recall are two fundamental metrics used to evaluate the performance of classification models, particularly on imbalanced datasets. Precision is the proportion of true positive predictions among all positive predictions made by the model, i.e. TP / (TP + FP), and reflects how well the model avoids false positives. Recall is the proportion of true positives among all actual positive cases, i.e. TP / (TP + FN), and reflects how well the model avoids false negatives. The trade-off between these metrics arises because improving one often degrades the other. For example, raising the decision threshold to be more conservative about positive predictions typically increases precision but decreases recall, causing the model to miss more actual positives; lowering the threshold to maximize recall typically reduces precision by admitting more false positives. This trade-off is critical in applications where the costs of false positives and false negatives differ significantly. A key limitation is that optimizing either metric in isolation can lead to poor real-world outcomes, so harmonized metrics such as the F1 score, the harmonic mean of precision and recall, are often used for balanced assessment.
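The definitions above can be sketched directly from confusion-matrix counts. This is a minimal illustration (the function name and example counts are hypothetical, not from any particular library):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Guards against division by zero when a denominator is empty.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical example: 80 true positives, 20 false positives, 40 false negatives.
p, r, f = precision_recall_f1(80, 20, 40)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.667 0.727
```

Note how the same model can have high precision (0.8) but noticeably lower recall (0.667): being selective about positive predictions keeps false positives down at the cost of missed positives.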

Governance Context

Within AI governance, frameworks such as the EU AI Act and the NIST AI Risk Management Framework require organizations to document and justify model performance metrics, including precision and recall, to ensure transparency and accountability. For example, the EU AI Act obligates providers to implement appropriate risk mitigation based on the model's false positive and false negative rates, particularly in high-risk applications like biometric identification or medical diagnostics. The NIST framework recommends regular monitoring and threshold adjustment to align model behavior with organizational risk appetite and societal impact. Additionally, ISO/IEC 23894:2023 encourages organizations to define acceptance criteria for precision and recall based on stakeholder needs and application context. Concrete obligations include: (1) documenting and reporting precision and recall values and their justification in regulatory filings; (2) implementing ongoing monitoring and periodic review of these metrics to ensure compliance with evolving risk thresholds and to support incident response processes. These obligations help prevent harm from model misclassification and support compliance with ethical and regulatory standards.

Ethical & Societal Implications

The precision-recall trade-off has significant ethical and societal implications, as it directly affects who is wrongly targeted or overlooked by AI systems. In healthcare, low precision can lead to overtreatment, while low recall can leave diseases undetected. In law enforcement, prioritizing recall may lead to privacy violations and wrongful accusations, disproportionately impacting marginalized communities. In financial services, low recall could allow fraud to go undetected, harming consumers and institutions. Transparent reporting, stakeholder engagement, and aligning model performance with ethical norms and societal values are essential to mitigate these risks.

Key Takeaways

- Precision and recall measure different types of model errors: false positives and false negatives.
- The trade-off between precision and recall must be managed based on application context and risk tolerance.
- Governance frameworks often require explicit documentation and justification of chosen performance metrics.
- Ethical implications include potential harm from both false positives and false negatives.
- Balanced metrics like the F1 score can help provide a more comprehensive evaluation of model performance.
- Ongoing monitoring and threshold adjustment are necessary to maintain responsible model behavior.
- Stakeholder needs and societal impacts should guide the prioritization of precision or recall.
