Credit risk is the possibility of a loss resulting from a borrower's failure to repay a loan or meet contractual obligations. To assess credit risk, creditors evaluate customers based on their credit scores. Credit risk data is inherently imbalanced because good loans far outnumber risky loans. We are tasked with building a machine learning classification model that predicts the credit risk of a client. In our analysis, we use the credit card credit dataset from LendingClub, a peer-to-peer lending services company. We apply several techniques designed for imbalanced data, namely RandomOverSampler, SMOTE, ClusterCentroids, SMOTEENN, BalancedRandomForestClassifier, and EasyEnsembleClassifier, to train and evaluate models and to recommend the best one for credit risk predictions.
Jupyter Notebook 6.4.12
LoanStats_2019Q1.csv
Python 3.10
In each analysis with the resampling models, we used the resampled data to train a logistic regression model, calculated the balanced accuracy score from sklearn.metrics, printed the confusion matrix, and generated a classification report from imbalanced-learn.
In random oversampling, instances of the minority class are randomly selected and added to the training set until the majority and minority classes are balanced. Oversampling addresses class imbalance by duplicating or mimicking existing data.
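To make the mechanism concrete, here is a small pure-Python sketch of the idea. The helper name random_oversample and the toy rows are hypothetical and exist only for illustration; the analysis itself relies on imbalanced-learn's RandomOverSampler.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows belonging to this class; candidates for duplication.
        rows = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))  # sampled with replacement
            y_out.append(label)
    return X_out, y_out


# Toy data: six low-risk rows and two high-risk rows (made-up values).
X = [[1.0], [1.1], [0.9], [1.2], [1.3], [0.8], [5.0], [5.5]]
y = ["low_risk"] * 6 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

After resampling, both classes have six rows, and every added row is an exact copy of an existing high-risk row.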
Python Code:
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=1)
X_resampled, y_resampled = ros.fit_resample(X_train, y_train)
Balanced Accuracy Score:
0.663188044716539
Confusion Matrix:
| | Predicted High Risk | Predicted Low Risk |
|---|---|---|
| Actual High Risk | 76 | 25 |
| Actual Low Risk | 7288 | 9816 |
Classification Report:
pre rec spe f1 geo iba sup
high_risk 0.01 0.75 0.57 0.02 0.66 0.44 101
low_risk 1.00 0.57 0.75 0.73 0.66 0.42 17104
avg / total 0.99 0.57 0.75 0.72 0.66 0.42 17205
The naive random oversampling model has a balanced accuracy score of 66.3%. The precision of the model is 0.01 for high risk and 1.00 (after rounding) for low risk. In other words, when it predicts that a client is high risk, it is correct about 1% of the time, and when it predicts that a client is low risk, it is correct nearly 100% of the time. The recall is 0.75 for high risk and 0.57 for low risk, which means the model correctly identifies 75% of all high-risk clients and 57% of all low-risk clients.
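The reported scores follow directly from the confusion matrix. The short calculation below uses the four counts from the random oversampling matrix, treating high risk as the positive class; the variable names are ours.

```python
# Counts from the random oversampling confusion matrix (high risk = positive class).
tp, fn = 76, 25       # actual high risk: predicted high, predicted low
fp, tn = 7288, 9816   # actual low risk: predicted high, predicted low

recall_high = tp / (tp + fn)      # share of high-risk loans that were caught
recall_low = tn / (tn + fp)       # share of low-risk loans that were cleared
precision_high = tp / (tp + fp)   # share of high-risk predictions that were right
precision_low = tn / (tn + fn)    # share of low-risk predictions that were right

# Balanced accuracy is the mean of the per-class recalls.
balanced_accuracy = (recall_high + recall_low) / 2

print(f"recall: high={recall_high:.2f}, low={recall_low:.2f}")
print(f"precision: high={precision_high:.2f}, low={precision_low:.2f}")
print(f"balanced accuracy: {balanced_accuracy:.4f}")
```

The printed values round to the figures in the classification report. The same arithmetic applies to every confusion matrix in this analysis.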
The synthetic minority oversampling technique (SMOTE) is another oversampling approach to deal with unbalanced datasets. In SMOTE, like random oversampling, the size of the minority is increased. The key difference between the two lies in how the minority class is increased in size. As we have seen, in random oversampling, instances from the minority class are randomly selected and added to the minority class. In SMOTE, by contrast, new instances are interpolated. That is, for an instance from the minority class, a number of its closest neighbors is chosen. Based on the values of these neighbors, new values are created.
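The interpolation step can be sketched in a few lines of pure Python. The function smote_sample and the toy points are hypothetical; this shows the core idea only and is not imbalanced-learn's implementation.

```python
import math
import random


def smote_sample(point, minority, k=2, rng=None):
    """Create one synthetic point between `point` and one of its k nearest minority neighbors."""
    rng = rng or random.Random(1)
    # Rank the other minority points by Euclidean distance to `point`.
    others = [p for p in minority if p is not point]
    others.sort(key=lambda p: math.dist(point, p))
    neighbor = rng.choice(others[:k])
    # Interpolate: move a random fraction of the way toward the neighbor.
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(point, neighbor)]


# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [3.0, 3.0]]
synthetic = smote_sample(minority[0], minority, k=2, rng=random.Random(1))
print(synthetic)
```

The synthetic point always lies on the line segment between the chosen instance and one of its neighbors, so it is new data rather than a copy.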
Python Code:
from imblearn.over_sampling import SMOTE
X_resampled, y_resampled = SMOTE(random_state=1,
sampling_strategy='auto').fit_resample(X_train, y_train)
Balanced Accuracy Score:
0.6621894942066704
Confusion Matrix:
| | Predicted High Risk | Predicted Low Risk |
|---|---|---|
| Actual High Risk | 64 | 37 |
| Actual Low Risk | 5290 | 11814 |
Classification Report:
pre rec spe f1 geo iba sup
high_risk 0.01 0.63 0.69 0.02 0.66 0.44 101
low_risk 1.00 0.69 0.63 0.82 0.66 0.44 17104
avg / total 0.99 0.69 0.63 0.81 0.66 0.44 17205
The SMOTE oversampling model has a balanced accuracy score of 66.2%. The precision of the model is 0.01 for high risk and 1.00 (after rounding) for low risk. In other words, when it predicts that a client is high risk, it is correct about 1% of the time, and when it predicts that a client is low risk, it is correct nearly 100% of the time. The recall is 0.63 for high risk and 0.69 for low risk, which means the model correctly identifies 63% of all high-risk clients and 69% of all low-risk clients.
Cluster centroid undersampling is akin to SMOTE. The algorithm identifies clusters of the majority class, then generates synthetic data points, called centroids, that are representative of the clusters. The majority class is then undersampled down to the size of the minority class.
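One way to picture this is with scikit-learn's KMeans: fit as many clusters on the majority class as there are minority rows, then keep only the cluster centers. The sketch below uses made-up toy data and illustrates the idea; it is not the ClusterCentroids source code.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Toy data (made-up): 200 majority rows and 10 minority rows, two features each.
X_majority = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_minority = rng.normal(loc=3.0, scale=0.5, size=(10, 2))

# Fit one cluster per minority row, then keep only the cluster centers.
kmeans = KMeans(n_clusters=len(X_minority), n_init=10, random_state=1)
kmeans.fit(X_majority)
majority_centroids = kmeans.cluster_centers_

# Balanced set: 10 centroids standing in for the majority, 10 real minority rows.
X_balanced = np.vstack([majority_centroids, X_minority])
y_balanced = np.array([0] * len(majority_centroids) + [1] * len(X_minority))
print(X_balanced.shape, np.bincount(y_balanced))
```

The 200 majority rows are reduced to 10 synthetic centroids, so a model trained on the result sees far less majority-class detail than one trained on the original rows.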
Python Code:
from imblearn.under_sampling import ClusterCentroids
cc = ClusterCentroids(random_state=1)
X_resampled, y_resampled = cc.fit_resample(X_train, y_train)
Balanced Accuracy Score:
0.5447339051023905
Confusion Matrix:
| | Predicted High Risk | Predicted Low Risk |
|---|---|---|
| Actual High Risk | 70 | 31 |
| Actual Low Risk | 10324 | 6780 |
Classification Report:
pre rec spe f1 geo iba sup
high_risk 0.01 0.69 0.40 0.01 0.52 0.28 101
low_risk 1.00 0.40 0.69 0.57 0.52 0.27 17104
avg / total 0.99 0.40 0.69 0.56 0.52 0.27 17205
The Cluster Centroids model has a balanced accuracy score of 54.5%. The precision of the model is 0.01 for high risk and 1.00 (after rounding) for low risk. In other words, when it predicts that a client is high risk, it is correct about 1% of the time, and when it predicts that a client is low risk, it is correct nearly 100% of the time. The recall is 0.69 for high risk and 0.40 for low risk, which means the model correctly identifies 69% of all high-risk clients and 40% of all low-risk clients.
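SMOTEENN is an approach to resampling that combines aspects of both oversampling and undersampling. It first oversamples the minority class with SMOTE, then cleans the result with Edited Nearest Neighbors (ENN), which removes points whose class disagrees with that of their nearest neighbors.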
Python Code:
from imblearn.combine import SMOTEENN
smote_enn = SMOTEENN(random_state=0)
X_resampled, y_resampled = smote_enn.fit_resample(X, y)
Balanced Accuracy Score:
0.644711676499736
Confusion Matrix:
| | Predicted High Risk | Predicted Low Risk |
|---|---|---|
| Actual High Risk | 73 | 28 |
| Actual Low Risk | 7412 | 9692 |
Classification Report:
pre rec spe f1 geo iba sup
high_risk 0.01 0.72 0.57 0.02 0.64 0.42 101
low_risk 1.00 0.57 0.72 0.72 0.64 0.40 17104
avg / total 0.99 0.57 0.72 0.72 0.64 0.40 17205
The SMOTEENN model has a balanced accuracy score of 64.5%. The precision of the model is 0.01 for high risk and 1.00 (after rounding) for low risk. In other words, when it predicts that a client is high risk, it is correct about 1% of the time, and when it predicts that a client is low risk, it is correct nearly 100% of the time. The recall is 0.72 for high risk and 0.57 for low risk, which means the model correctly identifies 72% of all high-risk clients and 57% of all low-risk clients.
In each analysis with the ensemble models, we trained the model on the training data, calculated the balanced accuracy score from sklearn.metrics, printed the confusion matrix, and generated a classification report from imbalanced-learn.
Instead of building a single, complex decision tree, a random forest algorithm samples the data and builds several smaller, simpler trees. Each tree is simpler because it is built from a random subset of features. BalancedRandomForestClassifier additionally undersamples the majority class in each bootstrap sample so that every tree is trained on balanced data.
Python Code:
from imblearn.ensemble import BalancedRandomForestClassifier
rf_model = BalancedRandomForestClassifier(n_estimators=100, random_state=1)
rf_model.fit(X_train, y_train)
Balanced Accuracy Score:
0.7885466545953005
Confusion Matrix:
| | Predicted High Risk | Predicted Low Risk |
|---|---|---|
| Actual High Risk | 71 | 30 |
| Actual Low Risk | 2153 | 14951 |
Classification Report:
pre rec spe f1 geo iba sup
high_risk 0.03 0.70 0.87 0.06 0.78 0.60 101
low_risk 1.00 0.87 0.70 0.93 0.78 0.62 17104
avg / total 0.99 0.87 0.70 0.93 0.78 0.62 17205
The Balanced Random Forest model has a balanced accuracy score of 78.9%. The precision of the model is 0.03 for high risk and 1.00 (after rounding) for low risk. In other words, when it predicts that a client is high risk, it is correct about 3% of the time, and when it predicts that a client is low risk, it is correct nearly 100% of the time. The recall is 0.70 for high risk and 0.87 for low risk, which means the model correctly identifies 70% of all high-risk clients and 87% of all low-risk clients.
In AdaBoost, a model is trained and then evaluated. After evaluating the errors of the first model, another model is trained. This time, however, the model gives extra weight to the errors from the previous model. The purpose of this weighting is to minimize similar errors in subsequent models. Then, the errors from the second model are given extra weight for the third model. This process is repeated until the error rate is minimized. EasyEnsembleClassifier applies this idea to imbalanced data by training a set of AdaBoost learners, each on a balanced sample of the training set.
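The reweighting step can be illustrated with one round of the classic discrete AdaBoost update. The function name reweight and the toy labels are hypothetical, and the sketch assumes labels coded as -1 and +1 with a weighted error strictly between 0 and 1.

```python
import math


def reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting for labels in {-1, +1}."""
    # Weighted error rate of the current weak learner (assumes 0 < error < 1).
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # The learner's vote in the final ensemble: larger when the error is smaller.
    alpha = 0.5 * math.log((1 - error) / error)
    # Raise the weights of misclassified rows, lower the rest, then renormalize.
    updated = [
        w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Toy round (made-up labels): five rows with equal weights, the last one misclassified.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
weights = [0.2] * 5

alpha, new_weights = reweight(weights, y_true, y_pred)
print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

After the update, the single misclassified row carries half of the total weight, so the next learner is pushed to get it right.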
Python Code:
from imblearn.ensemble import EasyEnsembleClassifier
EE_model = EasyEnsembleClassifier(n_estimators=100, random_state=1)
EE_model.fit(X_train, y_train)
Balanced Accuracy Score:
0.9316600714093861
Confusion Matrix:
| | Predicted High Risk | Predicted Low Risk |
|---|---|---|
| Actual High Risk | 93 | 8 |
| Actual Low Risk | 983 | 16121 |
Classification Report:
pre rec spe f1 geo iba sup
high_risk 0.09 0.92 0.94 0.16 0.93 0.87 101
low_risk 1.00 0.94 0.92 0.97 0.93 0.87 17104
avg / total 0.99 0.94 0.92 0.97 0.93 0.87 17205
The Easy Ensemble model has a balanced accuracy score of 93.2%. The precision of the model is 0.09 for high risk and 1.00 (after rounding) for low risk. In other words, when it predicts that a client is high risk, it is correct about 9% of the time, and when it predicts that a client is low risk, it is correct nearly 100% of the time. The recall is 0.92 for high risk and 0.94 for low risk, which means the model correctly identifies 92% of all high-risk clients and 94% of all low-risk clients.
EasyEnsembleClassifier: 93.2% balanced accuracy, 9% precision, and 92% recall
BalancedRandomForestClassifier: 78.9% balanced accuracy, 3% precision, and 70% recall
RandomOverSampler: 66.3% balanced accuracy, 1% precision, and 75% recall
SMOTE: 66.2% balanced accuracy, 1% precision, and 63% recall
SMOTEENN: 64.5% balanced accuracy, 1% precision, and 72% recall
ClusterCentroids: 54.5% balanced accuracy, 1% precision, and 69% recall
Precision and recall in this summary refer to the high-risk class. Based on the results, the best overall model is the EasyEnsembleClassifier, an ensemble of AdaBoost learners. This model has a 93.2% balanced accuracy score, a precision of 9%, and a recall (sensitivity) of 92% for high risk. It scored highest on all three measures among the models we tested, so it is the model we recommend.
Although this model is the best of those we tested, it still has low precision. Since the precision for high risk is only 0.09, the model is correct only 9% of the time when it predicts that a client is high risk. As a result, the classifier returns many false positives: 983 of the 17,104 low-risk clients in the test set were flagged as high risk. Credit card companies may find this trade-off acceptable, since wrongly flagging a low-risk client is generally preferable to approving a risky loan, and the model misses only 8 of the 101 high-risk clients.