A credit scoring model in one line of code (Python)

Published 2019-08-07 12:00



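Copyright notice: this is an original article by the author; please cite the source when reposting. https://blog.csdn.net/csqazwsxedc/article/details/87982257

Welcome. The previous post, "Credit card scoring model (R)", explained in detail how to develop a scorecard, so this post does not repeat that material. To make scorecard development easier, I wrote a Python package based on my own experience. After importing it, a single line of code builds a scorecard, and the default parameters are already set. If you have questions or run into bugs, feel free to contact me.
Contact (QQ): 990020186
The data and examples are on my GitHub. If you find them useful, please star the repository; everyone is welcome to learn and exchange ideas together.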

Installation

The package is published on PyPI; see the CreditScoreModel package page.

For first-time use, install it with:

pip install CreditScoreModel

Model training

# The star import exposes logistic_score_card; pandas is imported explicitly for pd.read_csv
from CreditScoreModel.LogisticScoreCard import *
import pandas as pd

# Load the "Give Me Some Credit" training and test files
data=pd.read_csv('C:\\Users\\HP\\Desktop\\give me some credit\\data\\cs-training.csv')
data_predict=pd.read_csv('C:\\Users\\HP\\Desktop\\give me some credit\\data\\cs-test.csv')

# Drop the index column and rename the label column to 'y'
del data['Unnamed: 0']
data.columns=['y','RevolvingUtilizationOfUnsecuredLines', 'age','NumberOfTime30-59DaysPastDueNotWorse', 'DebtRatio', 'MonthlyIncome','NumberOfOpenCreditLinesAndLoans', 'NumberOfTimes90DaysLate','NumberRealEstateLoansOrLines', 'NumberOfTime60-89DaysPastDueNotWorse','NumberOfDependents']
del data_predict['Unnamed: 0']

# Split into training and test sets, then fit the scorecard with the default parameters
ls=logistic_score_card()
data_train, data_test = ls.get_data_train_test(data,test_size=0.25,random_state=1234)
ls.fit(data_train)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\fixes.py:313: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  _nan_object_mask = _nan_object_array != _nan_object_array
2019 16:15:06 INFO 任务开始。。。
2019 16:15:06 INFO 连续和离散变量划分中。。。
2019 16:15:06 INFO 连续和离散变量划分完成!
2019 16:15:06 INFO 连续变量最优分组进行中。。。
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 23.53it/s]
2019 16:15:06 INFO 连续变量最优分组完成!
2019 16:15:06 INFO 根据cut离散化连续变量进行中。。。
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.38it/s]
2019 16:15:11 INFO 根据cut离散化连续变量完成!
2019 16:15:11 INFO IV值计算中。。。
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 37.45it/s]
2019 16:15:11 INFO IV值计算完成!
2019 16:15:11 INFO WOE转换中。。。
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 11.36it/s]
2019 16:15:12 INFO WOE转换完成!
2019 16:15:12 INFO 根据IV值大于 0.1 且 相关性小于 0.6 ,以及l1正则选取变量进行中。。。
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
2019 16:15:12 INFO 变量选取完成,总共 10 个变量,最终筛选出 5 个变量
2019 16:15:12 INFO 评分卡制作中。。。
2019 16:15:12 INFO 连续和离散变量划分中。。。
2019 16:15:12 INFO 连续和离散变量划分完成!
2019 16:15:12 INFO 根据cut离散化连续变量进行中。。。
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.36it/s]
2019 16:15:14 INFO 根据cut离散化连续变量完成!
2019 16:15:14 INFO WOE转换中。。。
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 11.81it/s]
2019 16:15:15 INFO WOE转换完成!
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
2019 16:15:18 INFO 评分卡制作完成!
2019 16:15:18 INFO 任务完成!

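Model results

Scorecard

Meaning of each column

col: variable name
type: variable type
cuts: cut points
cut_points: bin interval
1_num: number of samples with y = 1
0_num: number of samples with y = 0
total_num: total number of samples
1_pct: share of all y = 1 samples that fall in the bin
0_pct: share of all y = 0 samples that fall in the bin
total_pct: share of all samples that fall in the bin
1_rate: proportion of y = 1 within the bin
woe: weight of evidence of the bin
iv: IV of the bin
total_iv: IV of the variable
col_coef: logistic regression coefficient of the variable
lr_intercept: logistic regression intercept
score: points assigned to the bin

ls.score_card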
col type cuts cut_points 1_num 0_num total_num 1_pct 0_pct total_pct 1_rate woe iv total_iv col_coef lr_intercept score
0 NumberOfTime30-59DaysPastDueNotWorse continuous [-inf, 0.0, 1.0, inf] (-inf, 0.0] 3803 90648 94451 0.503509 0.863750 0.839564 0.040264 -0.539683 0.194416 0.737079 0.538172 -2.598254 20.950981
1 NumberOfTime30-59DaysPastDueNotWorse continuous [-inf, 0.0, 1.0, inf] (0.0, 1.0] 1801 10249 12050 0.238448 0.097659 0.107111 0.149461 0.892673 0.125679 0.737079 0.538172 -2.598254 -34.654352
2 NumberOfTime30-59DaysPastDueNotWorse continuous [-inf, 0.0, 1.0, inf] (1.0, inf] 1949 4050 5999 0.258043 0.038591 0.053324 0.324887 1.900110 0.416983 0.737079 0.538172 -2.598254 -73.763987
3 NumberOfTime60-89DaysPastDueNotWorse continuous [-inf, 0.0, inf] (-inf, 0.0] 5469 101314 106783 0.724083 0.965383 0.949182 0.051216 -0.287618 0.069402 0.570277 0.403517 -2.598254 8.371879
4 NumberOfTime60-89DaysPastDueNotWorse continuous [-inf, 0.0, inf] (0.0, inf] 2084 3633 5717 0.275917 0.034617 0.050818 0.364527 2.075741 0.500875 0.570277 0.403517 -2.598254 -60.419867
5 NumberOfTimes90DaysLate continuous [-inf, 0.0, inf] (-inf, 0.0] 4953 101333 106286 0.655766 0.965564 0.944764 0.046601 -0.386908 0.119863 0.833081 0.528354 -2.598254 14.746114
6 NumberOfTimes90DaysLate continuous [-inf, 0.0, inf] (0.0, inf] 2600 3614 6214 0.344234 0.034436 0.055236 0.418410 2.302207 0.713218 0.833081 0.528354 -2.598254 -87.743345
7 RevolvingUtilizationOfUnsecuredLines continuous [-inf, 0.22, 0.49, 0.86, inf] (-inf, 0.22] 1320 62341 63661 0.174765 0.594024 0.565876 0.020735 -1.223477 0.512953 1.071693 0.643205 -2.598254 56.766152
8 RevolvingUtilizationOfUnsecuredLines continuous [-inf, 0.22, 0.49, 0.86, inf] (0.22, 0.49] 916 16760 17676 0.121276 0.159700 0.157120 0.051822 -0.275223 0.010575 1.071693 0.643205 -2.598254 12.769650
9 RevolvingUtilizationOfUnsecuredLines continuous [-inf, 0.22, 0.49, 0.86, inf] (0.49, 0.86] 1695 13027 14722 0.224414 0.124129 0.130862 0.115134 0.592169 0.059386 1.071693 0.643205 -2.598254 -27.475114
10 RevolvingUtilizationOfUnsecuredLines continuous [-inf, 0.22, 0.49, 0.86, inf] (0.86, inf] 3622 12819 16441 0.479545 0.122147 0.146142 0.220303 1.367609 0.488779 1.071693 0.643205 -2.598254 -63.453483
11 age continuous [-inf, 35.0, 55.0, 62.0, inf] (-inf, 35.0] 1801 14288 16089 0.238448 0.136145 0.143013 0.111940 0.560433 0.057334 0.239843 0.462719 -2.598254 -18.706199
12 age continuous [-inf, 35.0, 55.0, 62.0, inf] (35.0, 55.0] 4061 45818 49879 0.537667 0.436582 0.443369 0.081417 0.208263 0.021052 0.239843 0.462719 -2.598254 -6.951426
13 age continuous [-inf, 35.0, 55.0, 62.0, inf] (55.0, 62.0] 898 17050 17948 0.118893 0.162463 0.159538 0.050033 -0.312225 0.013604 0.239843 0.462719 -2.598254 10.421482
14 age continuous [-inf, 35.0, 55.0, 62.0, inf] (62.0, inf] 793 27791 28584 0.104991 0.264810 0.254080 0.027743 -0.925134 0.147853 0.239843 0.462719 -2.598254 30.879240
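The score column appears to follow from woe, col_coef and the increase_score default of 50: each printed value is consistent with -(increase_score / ln 2) × col_coef × woe. The sketch below checks that relationship against two rows of the table. It is inferred from the printed numbers, not taken from the package source, and `bin_score` is a hypothetical helper name.

```python
import math

INCREASE_SCORE = 50  # increase_score default listed in the package parameters


def bin_score(coef, woe, increase_score=INCREASE_SCORE):
    """Points for one bin, assuming score = -(increase_score / ln 2) * coef * woe."""
    factor = increase_score / math.log(2)  # about 72.13 points per unit of log-odds
    return -factor * coef * woe


# Row 0: NumberOfTime30-59DaysPastDueNotWorse, bin (-inf, 0.0]
row0 = bin_score(coef=0.538172, woe=-0.539683)
# Row 10: RevolvingUtilizationOfUnsecuredLines, bin (0.86, inf]
row10 = bin_score(coef=0.643205, woe=1.367609)

# Compare with the score column of the table: 20.950981 and -63.453483
print(round(row0, 3), round(row10, 3))
```

Under this reading, a bin with negative WOE (fewer defaults than average) earns positive points, which matches the signs in the table.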

Variable types and IV values

ls.col_type_iv
col type iv
0 RevolvingUtilizationOfUnsecuredLines continuous 1.071693
1 age continuous 0.239843
2 NumberOfTime30-59DaysPastDueNotWorse continuous 0.737079
3 DebtRatio continuous 0.069471
4 MonthlyIncome continuous 0.076410
5 NumberOfOpenCreditLinesAndLoans continuous 0.073217
6 NumberOfTimes90DaysLate continuous 0.833081
7 NumberRealEstateLoansOrLines continuous 0.055378
8 NumberOfTime60-89DaysPastDueNotWorse continuous 0.570277
9 NumberOfDependents continuous 0.031616
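The training log reports that 5 of the 10 variables were kept, using IV above 0.1, correlation below 0.6 and L1 regularization. As a rough illustration of the first filter only, the sketch below applies the IV threshold to the values printed above; on this data that step alone already yields the five variables that appear in the scorecard. The correlation and L1 steps are not reproduced here, and the variable names `iv_by_column` and `selected` are illustrative.

```python
MIN_IV = 0.1  # min_iv default listed in the package parameters

# IV values copied from the ls.col_type_iv output above
iv_by_column = {
    "RevolvingUtilizationOfUnsecuredLines": 1.071693,
    "age": 0.239843,
    "NumberOfTime30-59DaysPastDueNotWorse": 0.737079,
    "DebtRatio": 0.069471,
    "MonthlyIncome": 0.076410,
    "NumberOfOpenCreditLinesAndLoans": 0.073217,
    "NumberOfTimes90DaysLate": 0.833081,
    "NumberRealEstateLoansOrLines": 0.055378,
    "NumberOfTime60-89DaysPastDueNotWorse": 0.570277,
    "NumberOfDependents": 0.031616,
}

# Keep only the variables whose IV exceeds the threshold
selected = [col for col, iv in iv_by_column.items() if iv > MIN_IV]
print(selected)
```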

Cut points for continuous variables

ls.col_continuous_cut_points
[['RevolvingUtilizationOfUnsecuredLines', [-inf, 0.22, 0.49, 0.86, inf]],
 ['age', [-inf, 35.0, 55.0, 62.0, inf]],
 ['NumberOfTime30-59DaysPastDueNotWorse', [-inf, 0.0, 1.0, inf]],
 ['DebtRatio', [-inf, 0.41, 0.67, 2.66, inf]],
 ['MonthlyIncome', [-inf, 1297.0, 4838.0, 6596.0, inf]],
 ['NumberOfOpenCreditLinesAndLoans', [-inf, 2.0, 3.0, 13.0, inf]],
 ['NumberOfTimes90DaysLate', [-inf, 0.0, inf]],
 ['NumberRealEstateLoansOrLines', [-inf, 0.0, 1.0, 2.0, inf]],
 ['NumberOfTime60-89DaysPastDueNotWorse', [-inf, 0.0, inf]],
 ['NumberOfDependents', [-inf, 0.0, 1.0, 2.0, inf]]]
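Each list of cut points defines right-closed intervals such as (35.0, 55.0], as the cut_points column of the scorecard shows. A minimal sketch of how a raw value could be mapped to its bin with the standard library follows; `bin_label` is a hypothetical helper and not part of CreditScoreModel.

```python
import bisect
import math


def bin_label(value, cuts):
    """Return the right-closed interval (lo, hi] of `cuts` that contains `value`."""
    # bisect_left gives the first index whose cut point is >= value,
    # so a value equal to a cut point falls into the interval ending there.
    idx = bisect.bisect_left(cuts, value)
    idx = min(max(idx, 1), len(cuts) - 1)  # clamp so that idx - 1 and idx are valid
    return f"({cuts[idx - 1]}, {cuts[idx]}]"


# Cut points for age copied from the output above
age_cuts = [-math.inf, 35.0, 55.0, 62.0, math.inf]

print(bin_label(35, age_cuts))
print(bin_label(36, age_cuts))
print(bin_label(80, age_cuts))
```

The value 35 lands in the first bin and 36 in the second, which is the same convention as the interval labels in the scorecard.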

Model evaluation

ls.plot_roc_ks(data_train,ls.score_card)
2019 16:15:18 INFO 预测用户分数中。。。
2019 16:15:18 INFO 根据cut离散化连续变量进行中。。。
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.37it/s]
2019 16:15:20 INFO 根据cut离散化连续变量完成!

[Figure: ROC and KS curves for the training set]

ls.plot_roc_ks(data_test,ls.score_card)
2019 16:15:21 INFO 预测用户分数中。。。
2019 16:15:21 INFO 根据cut离散化连续变量进行中。。。
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  6.97it/s]
2019 16:15:22 INFO 根据cut离散化连续变量完成!

[Figure: ROC and KS curves for the test set]
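For readers unfamiliar with the KS statistic shown in these plots, it is the largest gap between the cumulative true-positive rate and the cumulative false-positive rate as the threshold moves through the predicted probabilities. The sketch below computes it in plain Python. The labels and probabilities are toy values made up for illustration and are not taken from this dataset, and `ks_statistic` is a hypothetical helper rather than the package's implementation.

```python
def ks_statistic(labels, probas):
    """Maximum |TPR - FPR| over all thresholds; labels are 0/1, probas are P(y = 1)."""
    pairs = sorted(zip(probas, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    ks = 0.0
    for i, (proba, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Evaluate the gap only after the last sample of a group of tied probabilities
        if i + 1 == len(pairs) or pairs[i + 1][0] != proba:
            ks = max(ks, abs(tp / n_pos - fp / n_neg))
    return ks


# Toy example (illustrative values only)
toy_labels = [1, 1, 0, 1, 0, 0]
toy_probas = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
ks = ks_statistic(toy_labels, toy_probas)
print(ks)
```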

Prediction

ls.predict_score_proba(data_test,ls.score_card)
2019 16:15:22 INFO 预测用户分数中。。。
2019 16:15:22 INFO 根据cut离散化连续变量进行中。。。
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  6.99it/s]
2019 16:15:23 INFO 根据cut离散化连续变量完成!
NumberOfTime30-59DaysPastDueNotWorsescore NumberOfTime60-89DaysPastDueNotWorsescore NumberOfTimes90DaysLatescore RevolvingUtilizationOfUnsecuredLinesscore agescore score proba
0 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
1 20.950981 8.371879 14.746114 -63.453483 -18.706199 561.909292 0.112027
2 20.950981 8.371879 14.746114 56.766152 -6.951426 693.883700 0.019845
3 -34.654352 -60.419867 -87.743345 -27.475114 -6.951426 382.755896 0.601900
4 -73.763987 8.371879 14.746114 -63.453483 10.421482 496.322007 0.238491
5 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
6 20.950981 8.371879 -87.743345 -63.453483 -6.951426 471.174606 0.307389
7 20.950981 8.371879 14.746114 12.769650 -6.951426 649.887198 0.035921
8 20.950981 8.371879 14.746114 56.766152 10.421482 711.256608 0.015664
9 20.950981 8.371879 14.746114 -27.475114 -6.951426 609.642434 0.061116
10 20.950981 8.371879 14.746114 -27.475114 -6.951426 609.642434 0.061116
11 20.950981 8.371879 14.746114 56.766152 -6.951426 693.883700 0.019845
12 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
13 20.950981 8.371879 14.746114 56.766152 -6.951426 693.883700 0.019845
14 -34.654352 8.371879 14.746114 56.766152 10.421482 655.651276 0.033255
15 20.950981 8.371879 14.746114 -63.453483 10.421482 591.036974 0.077701
16 20.950981 8.371879 14.746114 -27.475114 -6.951426 609.642434 0.061116
17 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
18 20.950981 8.371879 -87.743345 -63.453483 -18.706199 459.419833 0.343125
19 20.950981 8.371879 14.746114 56.766152 -6.951426 693.883700 0.019845
20 20.950981 8.371879 14.746114 56.766152 -6.951426 693.883700 0.019845
21 20.950981 8.371879 -87.743345 -63.453483 -6.951426 471.174606 0.307389
22 -34.654352 8.371879 -87.743345 56.766152 30.879240 573.619574 0.096866
23 20.950981 8.371879 14.746114 56.766152 -6.951426 693.883700 0.019845
24 20.950981 8.371879 14.746114 12.769650 -6.951426 649.887198 0.035921
25 -73.763987 8.371879 14.746114 56.766152 30.879240 636.999398 0.042649
26 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
27 20.950981 8.371879 14.746114 -63.453483 30.879240 611.494731 0.059659
28 20.950981 8.371879 14.746114 12.769650 10.421482 667.260107 0.028452
29 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
... ... ... ... ... ... ... ...
37470 20.950981 8.371879 14.746114 12.769650 -6.951426 649.887198 0.035921
37471 -34.654352 8.371879 14.746114 12.769650 -6.951426 594.281865 0.074538
37472 20.950981 8.371879 -87.743345 -27.475114 -6.951426 507.152975 0.212299
37473 -73.763987 -60.419867 -87.743345 12.769650 -18.706199 372.136252 0.636593
37474 20.950981 8.371879 14.746114 56.766152 10.421482 711.256608 0.015664
37475 20.950981 8.371879 14.746114 56.766152 -18.706199 682.128926 0.023276
37476 20.950981 8.371879 14.746114 -63.453483 -6.951426 573.664065 0.096812
37477 20.950981 8.371879 14.746114 56.766152 -18.706199 682.128926 0.023276
37478 20.950981 8.371879 14.746114 56.766152 10.421482 711.256608 0.015664
37479 20.950981 8.371879 14.746114 -27.475114 -6.951426 609.642434 0.061116
37480 20.950981 8.371879 14.746114 56.766152 -6.951426 693.883700 0.019845
37481 -34.654352 8.371879 -87.743345 -63.453483 -6.951426 415.569274 0.489626
37482 20.950981 8.371879 14.746114 56.766152 -6.951426 693.883700 0.019845
37483 -73.763987 8.371879 14.746114 56.766152 -18.706199 587.413959 0.081378
37484 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
37485 20.950981 8.371879 14.746114 -63.453483 -18.706199 561.909292 0.112027
37486 20.950981 8.371879 14.746114 56.766152 -18.706199 682.128926 0.023276
37487 -73.763987 -60.419867 14.746114 -27.475114 -6.951426 446.135720 0.385743
37488 20.950981 8.371879 14.746114 56.766152 -18.706199 682.128926 0.023276
37489 20.950981 8.371879 -87.743345 -63.453483 -6.951426 471.174606 0.307389
37490 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
37491 20.950981 8.371879 14.746114 56.766152 -6.951426 693.883700 0.019845
37492 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
37493 -34.654352 8.371879 14.746114 56.766152 -6.951426 638.278367 0.041931
37494 20.950981 8.371879 14.746114 -27.475114 10.421482 627.015343 0.048671
37495 20.950981 8.371879 14.746114 12.769650 30.879240 687.717864 0.021578
37496 -34.654352 8.371879 14.746114 56.766152 -6.951426 638.278367 0.041931
37497 20.950981 8.371879 14.746114 12.769650 -6.951426 649.887198 0.035921
37498 20.950981 8.371879 14.746114 56.766152 30.879240 731.714366 0.011842
37499 20.950981 8.371879 14.746114 -27.475114 30.879240 647.473100 0.037099

37500 rows × 7 columns
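The last two columns appear to be derived from the per-variable scores: the total score matches base_score (600) plus the sum of the bin scores, and proba matches a logistic function of lr_intercept plus the sum of col_coef × woe. The sketch below reproduces rows 0 and 3 under that assumption. The formulas are inferred from the printed values rather than from the package source, and the function names are hypothetical.

```python
import math

BASE_SCORE = 600          # base_score default
INCREASE_SCORE = 50       # increase_score default
LR_INTERCEPT = -2.598254  # lr_intercept column of the scorecard


def total_score(bin_scores, base_score=BASE_SCORE):
    """Total score, assuming it is the base score plus the bin scores."""
    return base_score + sum(bin_scores)


def default_probability(bin_scores, intercept=LR_INTERCEPT,
                        increase_score=INCREASE_SCORE):
    """Probability of y = 1 recovered from the bin scores."""
    factor = increase_score / math.log(2)
    # Each bin score is assumed to equal -factor * coef * woe,
    # so dividing the sum by -factor recovers sum(coef * woe).
    log_odds = intercept - sum(bin_scores) / factor
    return 1.0 / (1.0 + math.exp(-log_odds))


# Per-variable scores copied from rows 0 and 3 of the table above
row0 = [20.950981, 8.371879, 14.746114, 56.766152, 30.879240]
row3 = [-34.654352, -60.419867, -87.743345, -27.475114, -6.951426]

for row in (row0, row3):
    print(round(total_score(row), 6), round(default_probability(row), 6))
```

The printed pairs can be compared with the score and proba columns of rows 0 and 3.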

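Supplement: model tuning

Variable IV and binning analysis

# Default: decision-tree binning
ls.plot_col_woe_iv(data,'age')

[Figure: WOE and IV per bin for age, default decision-tree binning]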

cut_points cut_points_id 1_num 0_num total_num 1_pct 0_pct total_pct 1_rate woe iv total_iv
1 (-inf, 36.0] 0 2628 21237 23865 0.262118 0.151721 0.159100 0.110119 0.546753 0.060360 0.250005
0 (36.0, 55.0] 1 5177 58953 64130 0.516357 0.421171 0.427533 0.080727 0.203760 0.019395 0.250005
3 (55.0, 63.0] 2 1345 26409 27754 0.134151 0.188671 0.185027 0.048461 -0.341036 0.018593 0.250005
2 (63.0, inf] 3 876 33375 34251 0.087373 0.238437 0.228340 0.025576 -1.003921 0.151657 0.250005
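The woe, iv and total_iv columns can be recomputed from the 1_num and 0_num counts. The sketch below uses the convention visible in the package warning further down, woe = ln(1_pct / 0_pct), together with iv = (1_pct - 0_pct) × woe, and reproduces the values in the table above. `woe_iv` is a hypothetical helper written for this illustration.

```python
import math


def woe_iv(bad_counts, good_counts):
    """Return [(woe, iv), ...] per bin and the total IV.

    bad_counts are the y = 1 counts per bin, good_counts the y = 0 counts.
    """
    total_bad = sum(bad_counts)
    total_good = sum(good_counts)
    rows = []
    for bad, good in zip(bad_counts, good_counts):
        bad_pct = bad / total_bad      # 1_pct
        good_pct = good / total_good   # 0_pct
        woe = math.log(bad_pct / good_pct)
        iv = (bad_pct - good_pct) * woe
        rows.append((woe, iv))
    return rows, sum(iv for _, iv in rows)


# 1_num and 0_num for the four age bins, in cut_points_id order
bad = [2628, 5177, 1345, 876]
good = [21237, 58953, 26409, 33375]

rows, total_iv = woe_iv(bad, good)
for woe, iv in rows:
    print(round(woe, 6), round(iv, 6))
print(round(total_iv, 6))
```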
# Manual binning
ls.plot_col_woe_iv(data,'age',[-inf,20,30,40,inf])
C:\ProgramData\Anaconda3\lib\site-packages\CreditScoreModel\LogisticScoreCard.py:152: RuntimeWarning: divide by zero encountered in log
  result['woe'] = np.log(result['1_pct'] / result['0_pct'])  # WOE

[Figure: WOE and IV per bin for age, manual cut points]

cut_points cut_points_id 1_num 0_num total_num 1_pct 0_pct total_pct 1_rate woe iv total_iv
3 (-inf, 20.0] 0 0 1 1 0.000000 0.000007 0.000007 0.000000 0.000000 0.000000 0.0
2 (20.0, 30.0] 1 1244 9513 10757 0.124077 0.067963 0.071713 0.115646 0.601948 0.033778 0.0
1 (30.0, 40.0] 2 2390 21949 24339 0.238380 0.156808 0.162260 0.098196 0.418847 0.034166 0.0
0 (40.0, inf] 3 6392 108511 114903 0.637542 0.775223 0.766020 0.055630 -0.195529 0.026921 0.0
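The divide-by-zero warning comes from the first bin, (-inf, 20.0], which contains no y = 1 samples, so 1_pct is 0 and the logarithm is undefined. One common workaround, shown here only as an illustration and not as what the package does, is to add a small constant to every count before computing WOE. Both `smoothed_woe` and the `adjust` value of 0.5 are assumptions made for this sketch.

```python
import math


def smoothed_woe(bad_counts, good_counts, adjust=0.5):
    """WOE per bin with `adjust` added to every count so that no ratio is zero."""
    n_bins = len(bad_counts)
    total_bad = sum(bad_counts) + adjust * n_bins
    total_good = sum(good_counts) + adjust * n_bins
    woes = []
    for bad, good in zip(bad_counts, good_counts):
        bad_pct = (bad + adjust) / total_bad
        good_pct = (good + adjust) / total_good
        woes.append(math.log(bad_pct / good_pct))
    return woes


# 1_num and 0_num for the manual age bins, in cut_points_id order
bad = [0, 1244, 2390, 6392]
good = [1, 9513, 21949, 108511]

woes = smoothed_woe(bad, good)
print([round(w, 4) for w in woes])
```

With so few samples in the first bin, merging it into its neighbour by choosing different cut points is another option.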
# Do not return the detailed table
ls.plot_col_woe_iv(data,'age',[-inf,20,30,40,inf],return_data=False)
C:\ProgramData\Anaconda3\lib\site-packages\CreditScoreModel\LogisticScoreCard.py:152: RuntimeWarning: divide by zero encountered in log
  result['woe'] = np.log(result['1_pct'] / result['0_pct'])  # WOE

[Figure: WOE and IV per bin for age, manual cut points]

Model performance analysis

# Results with the default parameters
col_result=ls.col_result
col_continuous_cut_points=[col for col in ls.col_continuous_cut_points if col[0] in ls.col_result]
data_new=data_train[ls.col_result+['y']]

score_card=ls.get_logistic_socre_card(data_new,col_continuous_cut_points)
ls.plot_roc_ks(data_new,score_card)
2019 16:15:35 INFO 评分卡制作中。。。
2019 16:15:35 INFO 连续和离散变量划分中。。。
2019 16:15:35 INFO 连续和离散变量划分完成!
2019 16:15:35 INFO 根据cut离散化连续变量进行中。。。
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.50it/s]
2019 16:15:38 INFO 根据cut离散化连续变量完成!
2019 16:15:38 INFO WOE转换中。。。
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 11.89it/s]
2019 16:15:38 INFO WOE转换完成!
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
2019 16:15:40 INFO 评分卡制作完成!
2019 16:15:40 INFO 预测用户分数中。。。
2019 16:15:40 INFO 根据cut离散化连续变量进行中。。。
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.41it/s]
2019 16:15:43 INFO 根据cut离散化连续变量完成!

[Figure: ROC and KS curves for the default variable selection]

# Example: use all variables, binned with the cut points below
col_result=['y', 'RevolvingUtilizationOfUnsecuredLines', 'age',
           'NumberOfTime30-59DaysPastDueNotWorse', 'DebtRatio', 'MonthlyIncome',
           'NumberOfOpenCreditLinesAndLoans', 'NumberOfTimes90DaysLate',
           'NumberRealEstateLoansOrLines', 'NumberOfTime60-89DaysPastDueNotWorse',
           'NumberOfDependents']
col_continuous_cut_points=[['RevolvingUtilizationOfUnsecuredLines', [-inf, 0.22, 0.49, 0.86, inf]],
                         ['age', [-inf, 35.0, 55.0, 62.0, inf]],
                         ['NumberOfTime30-59DaysPastDueNotWorse', [-inf, 0.0, 1.0, inf]],
                         ['DebtRatio', [-inf, 0.41, 0.67, 2.66, inf]],
                         ['MonthlyIncome', [-inf, 1297.0, 4838.0, 6596.0, inf]],
                         ['NumberOfOpenCreditLinesAndLoans', [-inf, 2.0, 3.0, 13.0, inf]],
                         ['NumberOfTimes90DaysLate', [-inf, 0.0, inf]],
                         ['NumberRealEstateLoansOrLines', [-inf, 0.0, 1.0, 2.0, inf]],
                         ['NumberOfTime60-89DaysPastDueNotWorse', [-inf, 0.0, inf]],
                         ['NumberOfDependents', [-inf, 0.0, 1.0, 2.0, inf]]]
data_new=data_train[col_result]

score_card=ls.get_logistic_socre_card(data_new,col_continuous_cut_points)
ls.plot_roc_ks(data_new,score_card)
2019 16:15:43 INFO 评分卡制作中。。。
2019 16:15:43 INFO 连续和离散变量划分中。。。
2019 16:15:43 INFO 连续和离散变量划分完成!
2019 16:15:43 INFO 根据cut离散化连续变量进行中。。。
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.49it/s]
2019 16:15:48 INFO 根据cut离散化连续变量完成!
2019 16:15:48 INFO WOE转换中。。。
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 12.17it/s]
2019 16:15:49 INFO WOE转换完成!
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
2019 16:15:54 INFO 评分卡制作完成!
2019 16:15:54 INFO 预测用户分数中。。。
2019 16:15:54 INFO 根据cut离散化连续变量进行中。。。
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.13it/s]
2019 16:15:58 INFO 根据cut离散化连续变量完成!

[Figure: ROC and KS curves using all variables]

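Supplement: package parameters

def __init__(self,
         max_depth=None,  # maximum depth of the binning decision tree
         max_leaf_nodes=4,  # maximum number of leaf nodes of the decision tree
         min_samples_leaf=0.05,  # minimum fraction of samples in each leaf
         base_score=600,  # base score
         increase_score=50,  # score step (points added or removed each time the odds double)
         min_iv=0.1,  # feature selection: keep variables with IV above 0.1
         max_corr=0.6,  # feature selection: keep variables with correlation below 0.6
         C=0.01,  # feature selection: L1 regularization parameter
         penalty='l1',  # feature selection: use L1 regularization
         round_num=2  # round all variables to two decimal places
         ):

    # Parameter settings
    self.max_depth = max_depth
    self.max_leaf_nodes = max_leaf_nodes
    self.min_samples_leaf = min_samples_leaf
    self.base_score = base_score
    self.increase_score = increase_score
    self.min_iv = min_iv
    self.max_corr = max_corr
    self.C = C
    self.penalty = penalty
    self.round_num = round_num

    # Stored results
    self.col_type_iv = None  # type and IV of each variable
    self.col_continuous_cut_points = None  # cut points of the continuous variables
    self.col_result = None  # variables selected for the final scorecard
    self.score_card = None  # the scorecard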

Author: yoyo

Link: https://www.pythonheidong.com/blog/article/10571/a5ffd035a625ee1bde8f/

Source: python黑洞网

Please cite the source for any form of reposting. Infringement, once discovered, will be pursued legally.
