Table 2 Example-dependent cost matrix.
| | Actual positive | Actual negative |
|---|---|---|
| Predicted positive | \(C_{TP_i}=0\) | \(C_{FP_i}=r_i+C_{FP}^{a}\) |
| Predicted negative | \(C_{FN_i}=Cl_i \cdot L_{gd}\) | \(C_{TN_i}=0\) |
For the i-th customer sample \(x_i\) in the dataset D, the cost matrix is \(C_i=[C_{FP_i},C_{FN_i},C_{TP_i},C_{TN_i}]\), where \(C_{TP_i}\) is the cost of correctly classifying a positive sample as positive, \(C_{FP_i}\) is the cost of misclassifying a negative sample as positive, \(C_{FN_i}\) is the cost of misclassifying a positive sample as negative, and \(C_{TN_i}\) is the cost of correctly classifying a negative sample as negative. Specifically, \(C_{FP_i}\) is the sum of \(r_i\) and \(C_{FP}^{a}\), where \(r_i\) is the loss from losing a quality customer. It is calculated with the time-value formula \(r_i=PV(A(Cl_i,int_{r_i},l_i),int_{cf},l_i)-Cl_i\), where \(A\) is the customer's monthly repayment amount, \(PV\) is the present value of the monthly repayments, \(int_{r_i}\) is the loan interest rate, \(l_i\) is the loan term, and \(int_{cf}\) is the cost of capital.
The customer's credit limit is calculated as \(Cl_i=\min(q \cdot Inc_i,Cl_{max},Cl_{max}(debt_i))\), where \(Inc_i\) is the customer's income, \(debt_i\) is the debt ratio, and \(q\) is a parameter that defines the maximum credit limit \(Cl_i\) as a function of the income. The maximum total credit limit \(Cl_{max}(debt_i)\) is calculated as \(Cl_{max}(debt_i)=PV(Inc_i \cdot P_m(debt_i),int_{r_i},l_i)\), where \(P_m(debt_i)=\min(A(q \cdot Inc_i,int_{r_i},l_i)/Inc_i,\,1-debt_i)\) is the largest share of income available for monthly repayments given the current debt ratio.
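
The chain of formulas above can be traced with a short Python sketch. It is illustrative only: the text does not spell out \(A\) and \(PV\), so the standard level-payment annuity and present-value formulas are assumed here, with rates expressed per month and terms in months. The function names and all numeric values are hypothetical, and \(C_{FP}^{a}\) is passed in as a given number.

```python
def annuity_payment(principal: float, rate: float, n_periods: int) -> float:
    """A(principal, rate, n): level payment per period (assumed annuity formula)."""
    if rate == 0:
        return principal / n_periods
    growth = (1 + rate) ** n_periods
    return principal * rate * growth / (growth - 1)


def present_value(payment: float, rate: float, n_periods: int) -> float:
    """PV(payment, rate, n): present value of n equal payments (assumed formula)."""
    if rate == 0:
        return payment * n_periods
    return payment / rate * (1 - (1 + rate) ** -n_periods)


def credit_limit(income: float, debt_ratio: float, q: float,
                 cl_max: float, int_r: float, term: int) -> float:
    """Cl_i = min(q * Inc_i, Cl_max, Cl_max(debt_i))."""
    # P_m(debt_i): share of income available for monthly repayments.
    p_m = min(annuity_payment(q * income, int_r, term) / income, 1 - debt_ratio)
    # Cl_max(debt_i): present value of the affordable monthly payments.
    cl_max_debt = present_value(income * p_m, int_r, term)
    return min(q * income, cl_max, cl_max_debt)


def lost_profit(cl: float, int_r: float, int_cf: float, term: int) -> float:
    """r_i = PV(A(Cl_i, int_r, l_i), int_cf, l_i) - Cl_i."""
    return present_value(annuity_payment(cl, int_r, term), int_cf, term) - cl


def cost_matrix(cl: float, r: float, c_fp_a: float, l_gd: float) -> list:
    """C_i = [C_FP, C_FN, C_TP, C_TN], in the order used in the text."""
    return [r + c_fp_a, cl * l_gd, 0.0, 0.0]


if __name__ == "__main__":
    # Hypothetical customer: all numbers below are made up for illustration.
    cl = credit_limit(income=1000.0, debt_ratio=0.9, q=3.0,
                      cl_max=25000.0, int_r=0.02, term=24)
    r = lost_profit(cl, int_r=0.02, int_cf=0.01, term=24)
    print(cost_matrix(cl, r, c_fp_a=50.0, l_gd=0.75))
```

With these assumed formulas, a customer with no existing debt receives the full \(q \cdot Inc_i\) (subject to \(Cl_{max}\)), while a high debt ratio lowers \(P_m(debt_i)\) and therefore the credit limit and both misclassification costs.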
Under the assumption that the financial institution does not leave the capital idle, \(C_{FP}^{a}\) is the potential loss from rejecting a quality customer and is calculated as \(C_{FP}^{a}=-\bar{r} \cdot \pi_0+\overline{Cl} \cdot L_{gd} \cdot \pi_1\), where \(\overline{Cl}\) is the average credit limit in the market, \(\bar{r}\) is the average profit margin, \(L_{gd}\) is the loss due to bad debt as a proportion of the credit line, and \(\pi_1\) and \(\pi_0\) are the prior probabilities that a potential customer defaults or repays the loan, respectively. Additionally, \(C_{FN_i}\) is the product of \(Cl_i\) and \(L_{gd}\). It is generally assumed [18] that the cost of misclassification is greater than the cost of correct classification, i.e., \(C_{FN_i}>C_{TP_i}\) and \(C_{FP_i}>C_{TN_i}\), and that the cost of correct classification is zero, i.e., \(C_{TP_i}=C_{TN_i}=0\). Based on the above cost matrix, the augmented feature vector for each sample is \([x_i,C_i]\). The dataset D can then be expanded to a new dataset \(D^\prime=\{(x_i,C_i,y_i)\}_{i=1}^{N}\), where the overall misclassification cost for the N samples in \(D^\prime\) is calculated as follows [21]: