Developed by Robert Tibshirani in 1996, the LASSO (Least Absolute Shrinkage and Selection Operator) method operates on a straightforward premise: efficiently predict an outcome using a set of
predictors while maintaining a model that is both accurate and minimalist.
LASSO regression chooses the coefficients that minimize the following objective function:
RSS + λ * sum of |β_j|
The first term, the Residual Sum of Squares (RSS), measures how well the model fits the data. The second term is the LASSO penalty, where λ is the tuning parameter that controls the penalty's strength and the β_j are the coefficients of the predictors.
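To make the two terms concrete, here is a minimal sketch that evaluates the objective for a given set of coefficients. The function names `rss` and `lasso_objective`, the toy data, and the choice to leave out an intercept are assumptions made for illustration only.

```python
def rss(X, y, beta):
    """Residual sum of squares for a linear model without an intercept."""
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(b * x for b, x in zip(beta, row))
        total += (target - prediction) ** 2
    return total


def lasso_objective(X, y, beta, lam):
    """RSS plus lam times the sum of the absolute coefficient values."""
    penalty = lam * sum(abs(b) for b in beta)
    return rss(X, y, beta) + penalty


# Made-up toy data: three observations, two predictors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]
beta = [2.0, 3.0]  # reproduces y exactly, so the RSS term is 0

print(lasso_objective(X, y, beta, lam=0.0))  # fit term only
print(lasso_objective(X, y, beta, lam=1.0))  # fit term plus 1.0 * (|2| + |3|)
```

With λ = 0 only the fit term remains. Any λ > 0 adds a cost proportional to the total size of the coefficients, which is what the optimizer trades off against the fit.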
Coefficients shrink towards zero as a direct consequence of minimizing this objective function. The amount of shrinkage is set by the balance between fitting the data (lowering RSS) and keeping the model simple (reducing the sum of the absolute values of the coefficients).
For predictors with minimal impact on the outcome, the optimization process finds it more cost-effective to nullify their coefficients rather than to maintain them with non-zero values.
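The single-predictor case shows why weak coefficients land exactly on zero. Minimizing sum((y_i - b*x_i)^2) + λ|b| over one coefficient b has a closed-form solution: the unpenalized quantity sum(x_i*y_i) is pulled towards zero by λ/2, and it is set to zero outright if it lies within λ/2 of zero. The sketch below assumes no intercept; the function names and the data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Move value towards zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_single_predictor(x, y, lam):
    """Minimize sum((y_i - b * x_i) ** 2) + lam * |b| over the scalar b."""
    xy = sum(xi * yi for xi, yi in zip(x, y))
    xx = sum(xi * xi for xi in x)
    return soft_threshold(xy, lam / 2.0) / xx


x = [1.0, -1.0, 1.0, -1.0]
y_strong = [2.0, -2.0, 2.0, -2.0]  # unpenalized slope is 2
y_weak = [0.3, 0.1, -0.1, -0.3]    # unpenalized slope is about 0.1

for lam in (0.0, 1.0):
    print(lam,
          lasso_single_predictor(x, y_strong, lam),
          lasso_single_predictor(x, y_weak, lam))
```

At λ = 1 the strong slope is only reduced, while the weak slope falls inside the threshold band and becomes exactly zero. The threshold is λ/2 rather than λ because the objective used here has no factor of 1/2 in front of the RSS.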
Selecting an optimal λ is critical. Excessively high λ values can overly simplify the model, potentially overlooking significant predictors. Conversely, a λ that's too low might yield a model that's unnecessarily complex and prone to overfitting.
Imagine predicting a target variable Y based on 5 predictors (X1 through X5), where the true relationship, unknown initially, is driven mainly by X1 and X2, while X3, X4, and X5 have little or no impact:
- True relationship: Y = 2X1 + 3X2 + 0.1X3 + error, with X4 and X5 not contributing to Y at all.
If we fit a model that includes all five predictors, noise might lead it to assign an inflated coefficient to X3 and non-zero coefficients to X4 and X5, even though X3 barely affects Y and X4 and X5 do not affect it at all.
1. λ = 0: This equals ordinary least squares fitting:
- Coefficients: X1 = 2.1, X2 = 3.1, X3 = 0.5, X4 = -0.2, X5 = 0.2
2. λ is small: LASSO begins to affect the model by slightly reducing the coefficients for X3, X4, and X5:
- New Coefficients: X1 = 2.05, X2 = 3.05, X3 = 0.3, X4 = -0.1, X5 = 0.1
3. λ is moderate: Increases in λ lead to a more significant reduction, especially for less informative predictors:
- New Coefficients: X1 = 2.0, X2 = 3.0, X3 = 0, X4 = 0, X5 = 0
4. λ is large: Continuing to increase λ might excessively penalize even significant predictors:
- Hypothetical Coefficients: X1 = 1.8, X2 = 2.8, X3 = 0, X4 = 0, X5 = 0
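The four stages above can be reproduced on simulated data. The sketch below is one possible implementation, not a reference one: it generates data from the stated true relationship and minimizes RSS + λ * sum of |β_j| with a plain cyclic coordinate descent. The solver name `lasso_coordinate_descent`, the sample size, the noise level, the fixed number of sweeps, and the absence of an intercept are all assumptions. The coefficients it prints will differ from the illustrative numbers in the list.

```python
import random


def soft_threshold(value, threshold):
    """Move value towards zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=200):
    """Minimize RSS + lam * sum(|beta_j|) by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Inner product of column j with the residual that ignores predictor j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n))
            new_bj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Simulated data following Y = 2*X1 + 3*X2 + 0.1*X3 + error (X4, X5 unused).
random.seed(0)
n = 200
X = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(n)]
y = [2 * r[0] + 3 * r[1] + 0.1 * r[2] + random.gauss(0.0, 1.0) for r in X]

# Under this objective, every coefficient is zero once lam reaches 2 * max_j |x_j . y|.
lam_max = 2 * max(abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(5))

for lam in [0.0, 0.05 * lam_max, 0.3 * lam_max, lam_max]:
    beta = lasso_coordinate_descent(X, y, lam)
    print(f"lam = {lam:8.1f}:", [round(b, 2) for b in beta])
```

Expressing λ as a fraction of `lam_max` keeps the grid meaningful regardless of sample size, since the RSS term grows with the number of observations. Library implementations often scale the objective differently (for example, by dividing the RSS by the sample size), so their penalty values are not directly comparable to the λ used here.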
LASSO effectively removes the influence of X3, X4, and X5, focusing the model on the variables that truly matter (X1 and X2). By eliminating irrelevant variables, LASSO simplifies the model and potentially enhances its generalizability and predictive accuracy on new data.