The dataset of interest here is the training dataset of the Loan Prediction Problem Dataset, which is publicly available on Kaggle. The goal of this project is to evaluate a Logistic Regression model on the Loan Prediction training data, fine-tune the model on a validation split, and apply appropriate and accurate diagnostics to the results.
Data Dictionary:
import pandas as pd
import numpy as np
from collections import Counter
from pandas import DataFrame, Series
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
df = pd.read_csv("loan_train.csv")
df
df.info()
# drop the Loan_ID column, which is only an identifier and carries no predictive signal
df = df.drop(['Loan_ID'], axis = 1)
# Identify the columns with null values
df.isnull().sum()
# Fill in null values: the mean for the numeric LoanAmount, the mode for the remaining columns
df['LoanAmount'] = df['LoanAmount'].fillna(df['LoanAmount'].mean())
for col in df.columns:
    df[col] = df[col].fillna(df[col].mode()[0])
df.isnull().sum()
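As a side note, the same imputation can be expressed with scikit-learn's SimpleImputer, which is convenient when the preprocessing later needs to live inside a pipeline. The sketch below re-reads the raw file into a separate DataFrame (df_alt, an illustrative name) so it does not alter the main workflow:
# Alternative imputation with SimpleImputer, on a fresh copy of the raw data
from sklearn.impute import SimpleImputer

df_alt = pd.read_csv("loan_train.csv").drop(['Loan_ID'], axis = 1)
df_alt[['LoanAmount']] = SimpleImputer(strategy = 'mean').fit_transform(df_alt[['LoanAmount']])
cat_cols = df_alt.columns[df_alt.isnull().any()]
df_alt[cat_cols] = SimpleImputer(strategy = 'most_frequent').fit_transform(df_alt[cat_cols])
df_alt.isnull().sum()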
# Encode the categorical values as integers (label encoding)
code_numeric = {'Male':1, 'Female':2,
'Yes': 1, 'No':2,
'Graduate':1, 'Not Graduate':2,
'Urban':1, 'Semiurban':2, 'Rural':3,
'Y':1, 'N':0,
'3+':3 }
df = df.applymap(lambda i: code_numeric.get(i) if i in code_numeric else i)
df['Dependents'] = pd.to_numeric(df.Dependents)
df.info()
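The mapping above is really integer (label) encoding rather than one-hot encoding. A true one-hot encoding is also possible with pd.get_dummies; for instance, the three-level area column (assumed here to be named Property_Area, as in the standard Kaggle file) could be expanded into indicator columns instead of the ordinal 1/2/3 coding. A minimal sketch, kept in a separate DataFrame and not used below:
# One-hot encode the (assumed) Property_Area column for comparison; df_onehot is not used later
df_onehot = pd.get_dummies(df, columns = ['Property_Area'], drop_first = True)
df_onehot.info()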
# shuffle the dataset, since the records should be independent of each other
df = df.sample(frac = 1).reset_index(drop = True)
# separating features and target
y = df['Loan_Status']
X = df.drop(['Loan_Status'], axis = 1)
# Split the dataset into train/validation/test sets at a 60/20/20 ratio
N = len(X)
X_train = X[:3*N//5]
X_validation = X[3*N//5:4*N//5]
X_test = X[4*N//5:]
y_train = y[:3*N//5]
y_validation = y[3*N//5:4*N//5]
y_test = y[4*N//5:]
len(X),len(X_train), len(X_validation), len(X_test)
len(y),len(y_train), len(y_validation), len(y_test)
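The same 60/20/20 split can also be produced with scikit-learn's train_test_split, which additionally allows stratifying on the target so the class balance is preserved in each subset. A minimal sketch (the variable names X_tr, X_val, X_te and so on are illustrative and not used below):
# Equivalent 60/20/20 split with train_test_split, stratified on the target
from sklearn.model_selection import train_test_split

X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size = 0.4, stratify = y, random_state = 1)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size = 0.5, stratify = y_tmp, random_state = 1)
len(X_tr), len(X_val), len(X_te)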
# Logistic Regression model
model = LogisticRegression(max_iter = 4000)
model.fit(X_train, y_train)
print('Model Score with all features: \n', model.score(X_train, y_train))
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
coeff = DataFrame({'Features': X_train.columns, 'Coefficient Estimates': model.coef_.flatten()})
coeff
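Because logistic regression is linear in the log-odds, exponentiating each coefficient gives the multiplicative change in the odds of approval per unit increase of that feature. A quick sketch using the table above:
# Odds ratios: exp(coefficient) = multiplicative change in the odds per unit increase of the feature
coeff['Odds Ratio'] = np.exp(coeff['Coefficient Estimates'])
coeff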
# get the predicted probabilities of each class (event not happening vs. happening)
# for each observation in X_train
prob = model.predict_proba(X_train)
# probabilities of the event (Loan_Status = 1) for all observations
p_x_val = prob[:,1]
p_x_val
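These probabilities come from the logistic (sigmoid) function applied to the linear combination of the features; as a sanity check, the positive-class column of predict_proba can be reproduced by hand from the fitted coefficients:
# Reproduce the positive-class probabilities manually: p = 1 / (1 + exp(-(X·w + b)))
z = X_train.values @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1 / (1 + np.exp(-z))
np.allclose(p_manual, p_x_val)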
# Sorted predicted probabilities, one value per training observation
tbl_p = DataFrame({'Observation': range(len(p_x_val)), 'Probability': np.sort(p_x_val)})
tbl_p
ax = tbl_p.plot.scatter(x = 'Observation', y = 'Probability')
# The actual prediction of the target variable for X_train
y_train_pred = model.predict(X_train)
y_train_pred
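For a binary logistic regression, predict effectively thresholds the positive-class probability at 0.5, so the labels above can be recovered from p_x_val directly; a quick check:
# Default decision rule: predict class 1 whenever P(Loan_Status = 1) exceeds 0.5
manual_labels = (p_x_val > 0.5).astype(int)
(manual_labels == y_train_pred).all()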
# actual y_train values
np.array(y_train)
print('Actual y_train values: \n', Counter(y_train))
print('Predicted y_train values: \n', Counter(y_train_pred))
confusion_matrix(y_train, y_train_pred)
# Precision = TP/(TP+FP) and Recall = TP/(TP+FN) for the training dataset,
# computed directly from the confusion matrix above
tn, fp, fn, tp = confusion_matrix(y_train, y_train_pred).ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
precision, recall
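As a sanity check, the same values can be obtained from the scikit-learn metric functions imported above, which treat class 1 (approved loans) as the positive class:
# Cross-check against scikit-learn's built-in metrics (positive class = 1)
print('Training precision:', precision_score(y_train, y_train_pred))
print('Training recall:', recall_score(y_train, y_train_pred))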
solvers = ['newton-cg','lbfgs','liblinear']
penalty = ['l2']
c_val = [100, 10, 1.0, 0.1, 0.01]
grid = dict(solver = solvers, penalty = penalty, C = c_val)
cv = RepeatedStratifiedKFold(n_splits = 10, n_repeats = 3, random_state = 1)
grid_search = GridSearchCV(estimator = model, param_grid = grid, n_jobs = -1, cv = cv,scoring = 'accuracy', error_score = 0)
grid_result = grid_search.fit(X_validation, y_validation)
print("Best: %f using %s" %(grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with %r" % (mean, stdev, param))
new_model = LogisticRegression(penalty ='l2',C = 100, solver = 'newton-cg').fit(X_train,y_train)
print("Old model score on Training set: \n", model.score(X_train, y_train))
print("New model score after tuning hyperparameter: \n", new_model.score(X_train, y_train))
y_test_old = model.predict(X_test)
y_test_new = new_model.predict(X_test)
print('Actual y_test values: \n', Counter(y_test))
print('Predicted y_test values with old model: \n', Counter(y_test_old))
print('Predicted y_test values with new model: \n', Counter(y_test_new))
print("Precision score for old model: \n",precision_score(y_test, y_test_old))
print("Recall score for old model:\n", recall_score(y_test, y_test_old))
print("Precision score for fine-tuned model: \n",precision_score(y_test, y_test_new))
print("Recall score for fine-tuned model:\n", recall_score(y_test, y_test_new))
The default Logistic Regression model performs well on the training dataset, but to avoid overfitting, which would make predictions on unseen data less accurate, it is best to split the training dataset and fine-tune the model on a separate validation set. After tuning the hyperparameters, not only did the model score improve, but so did the precision and recall scores.