(This page is awaiting an update. In the meantime, some files can be accessed on my GitHub page.)
Credit Card Default Analysis
June 2024 — August 2024
Introduction
From the perspective of a bank issuing credit cards,
it is better to lend more money to more people, provided those people do not default.
A clear view of how default risk is distributed across the whole population
can be a powerful tool for banks and companies to increase revenue.
In this project, we investigate customers' default
payments in Taiwan and develop a model that predicts whether a customer will
default on his or her credit card.
Even though the dataset is specific to Taiwan,
the approach may still be applicable to other parts of the world.
Data Preparation
The data come from the UC Irvine Machine Learning Repository.
The dataset, Default of Credit Card Clients,
contains 30,000 rows of data with 23 features.
The dataset is clean enough that this project focuses on comparing different
machine learning models in order to train one that estimates the real probability of default.
The downloadable data includes:
- [X0] ID
- [X1] LIMIT_BAL: Amount of the given credit (NT dollar). It includes both the individual consumer credit and his/her family (supplementary) credit.
- [X2] SEX: Gender (1 = male; 2 = female).
- [X3] EDUCATION: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
- [X4] MARRIAGE: Marital status (1 = married; 2 = single; 3 = others).
- [X5] AGE: Age (year).
- [X6 - X11] PAY_0, PAY_2 - PAY_6: History of past payment. The past monthly payment records (from April to September 2005) are tracked as follows: X6 = the repayment status in September 2005; X7 = the repayment status in August 2005; ...; X11 = the repayment status in April 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
- [X12 - X17] BILL_AMT1 - BILL_AMT6: Amount of bill statement (NT dollar). X12 = amount of bill statement in September 2005; X13 = amount of bill statement in August 2005; ...; X17 = amount of bill statement in April 2005.
- [X18 - X23] PAY_AMT1 - PAY_AMT6: Amount of previous payment (NT dollar). X18 = amount paid in September 2005; X19 = amount paid in August 2005; ...; X23 = amount paid in April 2005.
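The naming scheme in the list above is easy to get wrong (note that the repayment-status columns go PAY_0, PAY_2, ..., PAY_6, with no PAY_1), so it may help to encode it once. The following is a minimal sketch using only the Python standard library. The helper names `build_column_map`, `month_for`, and `describe_repayment_status` are hypothetical and are not part of the original project. Codes outside the documented scale are reported as undocumented rather than guessed.

```python
# Sketch only: the helper names below are illustrative assumptions,
# not code from the original project.

# Months of 2005 covered by the data, newest first (index 1 = September).
MONTHS = ["September", "August", "July", "June", "May", "April"]


def build_column_map():
    """Return {X-label: column name} for X1..X23 as listed above."""
    names = ["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]
    # Repayment-status columns skip PAY_1: PAY_0, then PAY_2..PAY_6.
    names += ["PAY_0"] + [f"PAY_{i}" for i in range(2, 7)]
    names += [f"BILL_AMT{i}" for i in range(1, 7)]
    names += [f"PAY_AMT{i}" for i in range(1, 7)]
    return {f"X{i}": name for i, name in enumerate(names, start=1)}


def month_for(column):
    """Return the 2005 month that a monthly column refers to."""
    if column == "PAY_0":
        return MONTHS[0]
    # Check PAY_AMT before PAY_ because both share the "PAY_" prefix.
    for prefix in ("PAY_AMT", "BILL_AMT", "PAY_"):
        if column.startswith(prefix):
            index = int(column[len(prefix):])
            if 1 <= index <= 6:
                return MONTHS[index - 1]
    raise ValueError(f"not a monthly column: {column}")


def describe_repayment_status(code):
    """Translate a repayment-status code using the documented scale."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        unit = "month" if code == 1 else "months"
        return f"payment delay for {code} {unit}"
    if code == 9:
        return "payment delay for nine months and above"
    # The description above does not define any other values.
    return "undocumented"
```

For example, `build_column_map()["X6"]` gives `"PAY_0"`, and `month_for("PAY_2")` gives `"August"`, matching X7 in the list.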