
Classification & Regression

Overview

Classification and regression are the two foundational types of supervised machine learning tasks. Classification predicts discrete labels or categories from input features, such as determining whether an email is spam. Regression predicts continuous numerical values, such as forecasting the price of a house given its characteristics. Both approaches require labeled data for training and are widely used across sectors. Classification is well suited to problems with a finite set of outcomes, while regression is appropriate when the outcome is a real-valued number. A key limitation is that real-world problems sometimes blur this distinction: ordinal regression and multi-label classification, for example, require nuanced approaches and do not fit neatly into either category. Model performance is also highly sensitive to data quality and feature selection, so careful consideration of the problem type, data characteristics, and evaluation metrics is essential for effective model development and deployment.
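
To make the distinction concrete, here is a minimal sketch using scikit-learn. The feature values, labels, and model choices (logistic regression for the classifier, linear regression for the regressor) are illustrative assumptions rather than recommendations.

```python
# Minimal sketch: classification vs. regression with scikit-learn.
# All feature values and labels below are illustrative placeholders.
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: predict a discrete label (e.g., spam = 1, not spam = 0).
X_clf = [[0.1, 3], [0.9, 15], [0.2, 4], [0.8, 12]]    # e.g., link ratio, exclamation count
y_clf = [0, 1, 0, 1]
clf = LogisticRegression().fit(X_clf, y_clf)
print(clf.predict([[0.85, 14]]))                      # -> a class label, e.g., [1]

# Regression: predict a continuous value (e.g., house price in thousands).
X_reg = [[1200, 2], [1800, 3], [2400, 4], [3000, 5]]  # e.g., square footage, bedrooms
y_reg = [150.0, 210.0, 280.0, 350.0]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[2000, 3]]))                       # -> a real-valued estimate
```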

Governance Context

Within AI governance frameworks such as the EU AI Act and the NIST AI Risk Management Framework, organizations must be transparent about a model's purpose and output type, clearly documenting whether it performs classification or regression. Under the EU AI Act, for example, high-risk AI systems must provide documentation on intended use and performance metrics, which differ for classification (e.g., accuracy, precision) and regression (e.g., mean squared error). NIST's framework directs implementers to assess and mitigate the risks of misclassification or prediction error and to audit model performance regularly. Both frameworks call for data governance controls that ensure the quality and representativeness of training data, which is critical for classification and regression models alike. Concrete obligations include: (1) maintaining detailed records of model type, intended use, and evaluation metrics; (2) implementing regular performance audits and bias assessments to detect and address errors or unfair outcomes.
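
As a rough illustration of obligation (1), the sketch below records model type, intended use, and the matching evaluation metrics in a plain Python dictionary; the field names and example values are assumptions, not a schema prescribed by the EU AI Act or the NIST AI RMF.

```python
# Minimal sketch: pairing each model type with its appropriate metrics
# and keeping a simple record of type, intended use, and evaluation results.
from sklearn.metrics import accuracy_score, precision_score, mean_squared_error

# Classification metrics (labels are placeholders).
y_true_cls, y_pred_cls = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
classification_record = {
    "model_type": "classification",
    "intended_use": "email spam filtering",
    "metrics": {
        "accuracy": accuracy_score(y_true_cls, y_pred_cls),
        "precision": precision_score(y_true_cls, y_pred_cls),
    },
}

# Regression metrics (values are placeholders).
y_true_reg, y_pred_reg = [150.0, 210.0, 280.0], [155.0, 200.0, 290.0]
regression_record = {
    "model_type": "regression",
    "intended_use": "house price estimation",
    "metrics": {"mean_squared_error": mean_squared_error(y_true_reg, y_pred_reg)},
}
print(classification_record)
print(regression_record)
```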

Ethical & Societal Implications

Misapplication or misinterpretation of classification and regression models can lead to unfair or harmful outcomes, such as biased loan approvals or incorrect medical diagnoses. Inadequate transparency regarding model type and performance can erode stakeholder trust. Furthermore, overreliance on automated decisions without human oversight may exacerbate existing inequalities, especially if training data is unrepresentative. Ethical governance requires rigorous monitoring, explainability, and mechanisms for recourse when errors occur. Societal impacts include potential discrimination, loss of access to critical services, and diminished accountability if errors are not traceable or correctable.

Key Takeaways

- Classification predicts discrete labels; regression predicts continuous values.
- Both require high-quality, representative training data for reliable results.
- Governance frameworks mandate transparency and risk mitigation for both model types.
- Misclassification or poor regression predictions can have significant real-world impacts.
- Ethical use requires explainability, auditability, and recourse mechanisms.
- Choosing the wrong model type or evaluation metric can lead to poor outcomes.
- Regular audits and bias assessments are essential for responsible model deployment (a minimal audit sketch follows this list).
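
The sketch below illustrates one simple audit check: comparing a classifier's accuracy across groups to flag possible disparate performance. The group labels, audit records, and the idea of using a raw accuracy gap as the trigger are illustrative assumptions; real bias assessments typically use richer fairness metrics and domain-specific thresholds.

```python
# Minimal sketch: per-group accuracy comparison as one bias-audit check.
# The (group, true label, predicted label) records are placeholder data.
from sklearn.metrics import accuracy_score

records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 1), ("B", 1, 1), ("B", 0, 0),
]
for group in ("A", "B"):
    y_true = [t for g, t, _ in records if g == group]
    y_pred = [p for g, _, p in records if g == group]
    print(group, round(accuracy_score(y_true, y_pred), 2))
# A large accuracy gap between groups would prompt further review under
# the audit and bias-assessment practices described above.
```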
