Wangsheng's World
Credit Card Default Analysis
June 2024 — August 2024
(This page is awaiting an update. Some of the project files are currently available on my GitHub page.)
Introduction
From the perspective of a bank issuing credit cards, the goal is to lend more money to more people, provided that those people do not default. A clear view of how default risk is distributed across the whole customer population can therefore be a powerful tool for banks and card issuers to increase revenue. In this project, we investigate customers' default payments in Taiwan and develop a model that predicts whether a customer will default on her/his credit card. Although the dataset is specific to Taiwan, the approach may still be applicable to other parts of the world.
Data Preparation
The data come from the UC Irvine Machine Learning Repository. The Default of Credit Card Clients dataset contains 30,000 rows with 23 features. Because the dataset is already fairly clean, this project focuses on experimenting with and comparing different machine learning models in order to train one that estimates the real probability of default.
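A minimal sketch of loading the data in Python follows. It assumes that the UCI spreadsheet has been exported to a CSV file with a single header row containing the column names listed below, and that the 0/1 target column is named `default payment next month`. Both are assumptions, so adjust the names to match your copy of the file. Only the standard library is used so that the snippet runs without extra dependencies. In practice pandas would be the more usual choice.

```python
import csv
import io
from typing import IO, List, Tuple

# Assumed name of the 0/1 target column (1 = customer defaults next month).
TARGET = "default payment next month"

def load_rows(f: IO[str]) -> Tuple[List[str], List[List[float]], List[int]]:
    """Read a CSV export of the dataset, drop ID, and split the features from the target."""
    reader = csv.DictReader(f)
    feature_names = [c for c in (reader.fieldnames or []) if c not in ("ID", TARGET)]
    X: List[List[float]] = []
    y: List[int] = []
    for row in reader:
        X.append([float(row[c]) for c in feature_names])
        y.append(int(row[TARGET]))
    return feature_names, X, y

# Tiny in-memory stand-in for the real file (values invented for illustration).
sample = io.StringIO(
    "ID,LIMIT_BAL,AGE,default payment next month\n"
    "1,20000,24,1\n"
    "2,120000,26,0\n"
)
names, X, y = load_rows(sample)
print(names)            # ['LIMIT_BAL', 'AGE']
print(sum(y) / len(y))  # share of defaulters in the sample: 0.5
```

With the real file, you would pass an open file handle, for example from `open(path, newline="")`, instead of the in-memory sample.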
The data you can download includes:
  • [X0] ID
  • [X1] LIMIT_BAL: Amount of given credit (NT dollar), including both the individual consumer credit and his/her family (supplementary) credit.
  • [X2] SEX: Gender (1 = male; 2 = female).
  • [X3] EDUCATION: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
  • [X4] MARRIAGE: Marital status (1 = married; 2 = single; 3 = others).
  • [X5] AGE: Age (year).
  • [X6 - X11] PAY_0, PAY_2 - PAY_6: History of past payments. We tracked the past monthly payment records (from April to September 2005) as follows: X6 = the repayment status in September 2005; X7 = the repayment status in August 2005; . . .; X11 = the repayment status in April 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above. (There is no PAY_1 column; PAY_0 holds the September status.)
  • [X12 - X17] BILL_AMT1 - BILL_AMT6: Amount of bill statement (NT dollar). X12 = amount of bill statement in September 2005; X13 = amount of bill statement in August 2005; . . .; X17 = amount of bill statement in April 2005.
  • [X18 - X23] PAY_AMT1 - PAY_AMT6: Amount of previous payment (NT dollar). X18 = amount paid in September 2005; X19 = amount paid in August 2005; . . .; X23 = amount paid in April 2005.
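Two details in the list above are easy to get wrong in code: the coded value scales and the fact that the monthly columns run backwards, from September to April. The sketch below, again in standard-library Python, encodes the documented scales from the list, reports any observed values that those scales do not define, and reverses the monthly columns into chronological order. Analyses of this dataset commonly report values outside the documented scales (for example 0 and -2 in the repayment-status columns), so a check like this is worth running before modelling. `months_delayed` is a hypothetical summary feature added for illustration. It is not a feature taken from this project.

```python
from typing import Dict, List, Mapping, Set

# Monthly columns as listed above, ordered September -> April 2005.
PAY_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
# chronological() below works the same way for the amount columns.
BILL_COLS = [f"BILL_AMT{i}" for i in range(1, 7)]
PAY_AMT_COLS = [f"PAY_AMT{i}" for i in range(1, 7)]

# Documented value scales for the coded columns.
DOCUMENTED: Dict[str, Set[int]] = {
    "SEX": {1, 2},
    "EDUCATION": {1, 2, 3, 4},
    "MARRIAGE": {1, 2, 3},
    **{c: {-1, *range(1, 10)} for c in PAY_COLS},  # -1 = pay duly, 1..9 = months of delay
}

def undocumented_codes(rows: List[Mapping[str, int]]) -> Dict[str, List[int]]:
    """Per column, the observed values that the documented scale does not define."""
    found: Dict[str, Set[int]] = {}
    for row in rows:
        for col, allowed in DOCUMENTED.items():
            if row[col] not in allowed:
                found.setdefault(col, set()).add(row[col])
    return {col: sorted(vals) for col, vals in found.items()}

def chronological(row: Mapping[str, float], cols: List[str]) -> List[float]:
    """Reverse September -> April columns into April -> September order."""
    return [row[c] for c in reversed(cols)]

def months_delayed(row: Mapping[str, int]) -> int:
    """Hypothetical feature: how many of the six months show a delay of one month or more."""
    return sum(1 for c in PAY_COLS if row[c] >= 1)

# Invented example row (not taken from the dataset).
row = {"SEX": 2, "EDUCATION": 5, "MARRIAGE": 1,
       "PAY_0": 2, "PAY_2": 0, "PAY_3": -1, "PAY_4": -1, "PAY_5": 1, "PAY_6": -2}
print(undocumented_codes([row]))     # {'EDUCATION': [5], 'PAY_2': [0], 'PAY_6': [-2]}
print(chronological(row, PAY_COLS))  # [-2, 1, -1, -1, 0, 2]
print(months_delayed(row))           # 2
```

How to treat undocumented codes (merging them into an "others" category, treating them as "no delay", or dropping the rows) is a modelling decision. The check only makes that decision visible.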
We have six months of each customer's billing and repayment history, and we use it to predict whether the customer will default. Because the model requires only six months of account history, which card issuers routinely record, it should be broadly applicable and easy to integrate into banks' credit card service systems.
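This page does not yet say which models were compared. As one possible baseline, the sketch below fits a logistic regression by batch gradient descent in plain Python. Logistic regression outputs a probability of default directly. The single feature (the fraction of the six months with a payment delay) and the toy training data are invented for illustration and are not results from this project. A real pipeline would more likely use a library such as scikit-learn and evaluate the model on held-out data.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability of default for one customer."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Minimise the mean log-loss with batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy data: feature = fraction of the six months with a payment delay;
# label = 1 if the (invented) customer defaulted, else 0.
X = [[0.0], [0.0], [1 / 6], [3 / 6], [5 / 6], [1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [0.0]), predict_proba(w, b, [1.0]))
```

A customer who was never late should receive a low estimated probability, and one who was late every month a high one. With real data, the predicted probabilities would also need to be checked for calibration if the aim is the "real probability of default" mentioned above.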