German Credit Dataset (Preprocessed): credit (mlr3summary)

Preprocessed version of the German Credit Risk dataset available on Kaggle (Ferreira 2018), which is based on the Statlog (German Credit Data) dataset of Hofmann (1994) available from the UCI Machine Learning Repository.

Format

A data frame with 522 rows and 6 variables:

age: age of the customer [19-75]
sex: sex of the customer (female, male)
saving.accounts: savings account balance of the customer, ordinal (little < moderate < rich)
duration: payback duration of the credit (in months) [6-72]
credit.amount: credit amount (in DM) [276-18424]
risk: whether the credit is of low/good or high/bad risk (bad, good)
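To make the format above concrete, the following Python sketch encodes the documented column ranges and levels as plain dictionaries and checks a single record against them. The names NUMERIC_RANGES, LEVELS and check_row are hypothetical and not part of mlr3summary, the example record is made up, and the sketch does not load the real data (which ships as an R data frame in the package).

```python
# Hypothetical schema mirroring the documented columns of `credit`.
# Numeric columns map to their documented [min, max] range.
NUMERIC_RANGES = {
    "age": (19, 75),
    "duration": (6, 72),
    "credit.amount": (276, 18424),
}
# Categorical columns map to their documented levels.
LEVELS = {
    "sex": ("female", "male"),
    "saving.accounts": ("little", "moderate", "rich"),  # ordinal, low to high
    "risk": ("bad", "good"),
}


def check_row(row):
    """Return a list of problems found in one record (empty if it conforms)."""
    problems = []
    expected = set(NUMERIC_RANGES) | set(LEVELS)
    missing = expected - set(row)
    extra = set(row) - expected
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected columns: {sorted(extra)}")
    # Numeric values must lie inside the documented closed interval.
    for name, (lo, hi) in NUMERIC_RANGES.items():
        if name in row and not lo <= row[name] <= hi:
            problems.append(f"{name}={row[name]} outside [{lo}, {hi}]")
    # Categorical values must be one of the documented levels.
    for name, levels in LEVELS.items():
        if name in row and row[name] not in levels:
            problems.append(f"{name}={row[name]!r} not in {levels}")
    return problems


# Illustrative, made-up record (not taken from the dataset).
example = {"age": 35, "sex": "female", "saving.accounts": "little",
           "duration": 24, "credit.amount": 2500, "risk": "good"}
print(check_row(example))  # → []
```

A record with, say, age 10 would instead yield a single message reporting that the value lies outside [19, 75].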

Details

The dataset was further adapted: rows with missing values were removed, infrequent classes were merged, the classes of the job feature were renamed, the savings and checking account features were encoded as ordinal variables, and all feature names were converted to lower case. Only a subset of the features was retained: "age", "sex", "saving.accounts", "duration", "credit.amount", "risk".
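The adaptation steps above can be sketched in Python on a list of dictionary records. This is an illustrative reconstruction under stated assumptions, not the package authors' preprocessing code: the toy records are made up, the merge mapping {"quite rich": "rich"} is a hypothetical example of merging an infrequent class, names are lower-cased with spaces replaced by dots (assumed from the documented names), and because Python has no ordered factor the ordinal savings level is stored as its rank. The job renaming and checking account encoding are omitted because neither feature is among the retained columns.

```python
# Output columns documented for `credit`.
KEEP = ["age", "sex", "saving.accounts", "duration", "credit.amount", "risk"]
# Ordinal levels of saving.accounts, lowest to highest.
SAVINGS_LEVELS = ("little", "moderate", "rich")


def preprocess(rows, merge_map):
    """Apply the documented steps to a list of dict records (illustrative only)."""
    # Remove rows with any missing value. This happens before column
    # selection, so a missing value in a later-dropped column still removes the row.
    rows = [r for r in rows if all(v is not None for v in r.values())]
    out = []
    for r in rows:
        # Convert names to lower case; spaces become dots (assumed naming rule).
        r = {k.lower().replace(" ", "."): v for k, v in r.items()}
        # Merge infrequent classes using a hypothetical mapping.
        level = merge_map.get(r["saving.accounts"], r["saving.accounts"])
        # Ordinal encoding stand-in: store the rank within SAVINGS_LEVELS.
        r["saving.accounts"] = SAVINGS_LEVELS.index(level)
        # Keep only the documented subset of features, in documented order.
        out.append({k: r[k] for k in KEEP})
    return out


# Made-up records using Kaggle-style column names.
raw = [
    {"Age": 30, "Sex": "male", "Saving accounts": "little", "Checking account": "moderate",
     "Duration": 12, "Credit amount": 1500, "Risk": "good"},
    {"Age": 45, "Sex": "female", "Saving accounts": "moderate", "Checking account": None,
     "Duration": 24, "Credit amount": 3000, "Risk": "bad"},
    {"Age": 52, "Sex": "male", "Saving accounts": "quite rich", "Checking account": "little",
     "Duration": 36, "Credit amount": 5000, "Risk": "good"},
]
clean = preprocess(raw, merge_map={"quite rich": "rich"})
print(len(clean))  # → 2
print(clean[1]["saving.accounts"])  # → 2
```

In this sketch the second record is dropped because its "Checking account" value is missing, even though that column is not part of the final subset.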

References

Hofmann H (1994). “Statlog (German Credit Data).” UCI Machine Learning Repository. https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data.

Ferreira L (2018). “German credit risk.” Last accessed 10.04.2024, https://www.kaggle.com/datasets/kabure/german-credit-data-with-risk.