German Credit Dataset (Preprocessed)
credit.Rd
Preprocessed version of the German Credit Risk dataset available on Kaggle, which is based on the Statlog (German Credit Data) dataset of Hofmann (1994) available from the UCI Machine Learning Repository.
Format
A data frame with 522 rows and 6 variables:
- age
age of the customer [19-75]
- sex
sex of the customer (female, male)
- saving.accounts
savings account balance of the customer (little, moderate, rich)
- duration
payback duration of the credit (in months) [6-72]
- credit.amount
credit amount [276-18424]
- risk
whether the credit is of low/good or high/bad risk (bad, good)
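The format above can also be written as a set of checks. The following is a minimal sketch only: `check_credit_format()` is a hypothetical helper that is not part of any package, and the example rows are made up for illustration.

```r
# Hypothetical helper (an assumption, not part of any package): checks a
# data frame against the variable names, ranges, and levels documented above.
check_credit_format <- function(df) {
  expected <- c("age", "sex", "saving.accounts",
                "duration", "credit.amount", "risk")
  stopifnot(
    is.data.frame(df),
    setequal(names(df), expected),
    all(df$age >= 19 & df$age <= 75),
    all(df$sex %in% c("female", "male")),
    all(df$saving.accounts %in% c("little", "moderate", "rich")),
    all(df$duration >= 6 & df$duration <= 72),
    all(df$credit.amount >= 276 & df$credit.amount <= 18424),
    all(df$risk %in% c("bad", "good"))
  )
  invisible(TRUE)
}

# Made-up rows for illustration only; these are not taken from the dataset.
example <- data.frame(
  age = c(22, 45),
  sex = c("female", "male"),
  saving.accounts = c("little", "moderate"),
  duration = c(48, 42),
  credit.amount = c(5951, 7882),
  risk = c("bad", "good")
)
check_credit_format(example)
```

The helper stops with an error at the first violated condition and returns `TRUE` invisibly otherwise.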
Details
The dataset was further adapted: rows with missing values were removed, low-cardinality classes were binned, classes of the job feature were renamed, the savings and checking account features were defined as ordinal variables, and all feature names were converted to lowercase. Only a subset of features was retained: "age", "sex", "saving.accounts", "duration", "credit.amount", "risk".
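The adaptation steps listed above might look roughly like the following in base R. This is a sketch under stated assumptions, not the original preprocessing script: the `raw` data frame is a made-up stand-in for the Kaggle file, merging "quite rich" into "rich" is an assumed example of the binning step, and the renaming of the job classes is omitted because the mapping is not documented here.

```r
# Toy stand-in for the Kaggle data; all values are made up for illustration.
raw <- data.frame(
  Age = c(67, 22, 49, 45, 53),
  Sex = c("male", "female", "male", "male", "female"),
  Job = c(2, 2, 1, 2, 2),
  Saving.accounts = c(NA, "little", "quite rich", "moderate", "rich"),
  Duration = c(6, 48, 12, 42, 24),
  Credit.amount = c(1169, 5951, 2096, 7882, 4870),
  Risk = c("good", "bad", "good", "good", "bad"),
  stringsAsFactors = FALSE
)

# 1. Remove rows with missing values.
dat <- na.omit(raw)

# 2. Bin a low-cardinality class (assumption: "quite rich" is merged into "rich").
dat$Saving.accounts[dat$Saving.accounts == "quite rich"] <- "rich"

# 3. Define the savings feature as an ordinal variable.
dat$Saving.accounts <- factor(
  dat$Saving.accounts,
  levels = c("little", "moderate", "rich"),
  ordered = TRUE
)

# 4. Convert all feature names to lowercase.
names(dat) <- tolower(names(dat))

# 5. Keep only the documented subset of features (drops "job" here).
keep <- c("age", "sex", "saving.accounts", "duration", "credit.amount", "risk")
dat <- dat[, keep]

str(dat)
```

Removing incomplete rows first keeps the later factor conversion from silently introducing `NA` levels.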
References
Hofmann H (1994). “Statlog (German Credit Data).” UCI Machine Learning Repository. https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data.
Ferreira L (2018). “German credit risk.” Last accessed 10.04.2024, https://www.kaggle.com/datasets/kabure/german-credit-data-with-risk.