Evaluation Metrics for Classification

[CodesSates] AI Bootcamp

웅탈 2021. 4. 14. 23:47

Confusion matrix

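from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# plot_confusion_matrix was removed in scikit-learn 1.2;
# ConfusionMatrixDisplay.from_estimator is its replacement.
fig, ax = plt.subplots()
pcm = ConfusionMatrixDisplay.from_estimator(pipe, X_val, y_val,
                                            cmap=plt.cm.Blues,
                                            ax=ax)

# Inspect the raw matrix (rows = actual class, columns = predicted class)
pcm.confusion_matrix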
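
To make the layout of that matrix concrete, here is a minimal sketch that tallies a 2x2 confusion matrix by hand. The lists `y_true` and `y_hat` are made-up toy labels, not the post's validation set. The orientation (rows = actual class, columns = predicted class, classes in ascending order) follows scikit-learn's convention.

```python
# Toy labels invented for illustration (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]

# matrix[actual][predicted], the same orientation scikit-learn uses.
matrix = [[0, 0], [0, 0]]
for actual, predicted in zip(y_true, y_hat):
    matrix[actual][predicted] += 1

# Unpack the four cells by name.
tn, fp = matrix[0]
fn, tp = matrix[1]
print(matrix)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

Reading the cells by name like this is what the metric definitions that follow rely on.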

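Accuracy is the number of samples classified correctly across all classes, divided by the total: (TP + TN) / Total
Precision is the fraction of samples predicted Positive that are actually Positive: TP / (TP + FP)
Recall (Sensitivity) is the fraction of actual Positives that are correctly predicted Positive: TP / (TP + FN)
F1 score is the harmonic mean of precision and recall: 2 * Precision * Recall / (Precision + Recall)
Type I error (FP): wrongly telling a man that he is pregnant
Type II error (FN): wrongly telling a pregnant woman that she is not pregnant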
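
The four definitions above can be worked through with plain arithmetic. The sketch below uses hypothetical counts (TP=40, FP=10, FN=20, TN=30) chosen only for illustration, and `classification_metrics` is a helper name made up for this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, not taken from the post's data.
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts precision (0.8) is higher than recall (about 0.667), and F1 (about 0.727) sits between them, closer to the lower value, which is the usual behavior of a harmonic mean.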

- Which of precision and recall to prioritize depends on the problem being solved.

- sklearn.metrics.classification_report reports precision and recall for each class (when 1 is the positive class, read the row labeled 1).

from sklearn.metrics import classification_report

y_pred = pipe.predict(X_val)  # assumed: labels predicted by the pipeline fitted earlier
print(classification_report(y_val, y_pred))

              precision    recall  f1-score   support

           0       0.76      0.80      0.78      7680
           1       0.75      0.70      0.72      6372

    accuracy                           0.75     14052
   macro avg       0.75      0.75      0.75     14052
weighted avg       0.75      0.75      0.75     14052
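
As a sanity check on how the columns of this report relate to each other, the sketch below recomputes the f1-score and accuracy values from the per-class precision, recall, and support shown above. Because the inputs are already rounded to two decimals, the results are approximate; the `report` dictionary and the `f1` helper are names introduced here for illustration.

```python
# Per-class values copied from the report above (rounded to two decimals).
report = {
    0: {"precision": 0.76, "recall": 0.80, "support": 7680},
    1: {"precision": 0.75, "recall": 0.70, "support": 6372},
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

total = sum(row["support"] for row in report.values())

for label, row in report.items():
    print(label, round(f1(row["precision"], row["recall"]), 2))

# Accuracy equals the support-weighted mean of the per-class recalls,
# because each class's recall * support is its count of correct predictions.
accuracy = sum(row["recall"] * row["support"] for row in report.values()) / total
print("accuracy", round(accuracy, 2))
```

The recomputed values match the f1-score column and the accuracy row to two decimals.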

ROC, AUC

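- The ROC curve and AUC show how a classifier performs across many threshold settings.

- Lowering the threshold raises recall (TPR) and the false positive rate (FPR) together, so the best threshold is the one that keeps recall as high as possible while keeping FPR as low as possible.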

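# scikit-learn's roc_curve computes TPR and FPR at each threshold automatically

import pandas as pd
from sklearn.metrics import roc_curve

# Probability of class 1 (assumed to come from the pipeline fitted earlier)
y_pred_proba = pipe.predict_proba(X_val)[:, 1]

# roc_curve(target values, predicted probability of class 1)
fpr, tpr, thresholds = roc_curve(y_val, y_pred_proba)

roc = pd.DataFrame({
    'FPR(Fall-out)': fpr,
    'TPRate(Recall)': tpr,
    'Threshold': thresholds
})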
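
One common way to turn "high recall, low false positive rate" into a single number is to pick the threshold that maximizes TPR - FPR (Youden's J statistic). The post does not say which criterion it uses, so treat this as one possible choice rather than the author's method. The sketch below uses invented toy labels and scores and plain Python, and `tpr_fpr` is a helper name made up for this example.

```python
# Toy labels and predicted probabilities of class 1, invented for illustration.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when predicting 1 for every score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives

best_threshold, best_j = None, -1.0
# Sweep thresholds from high to low; on ties the higher threshold is kept.
for t in sorted(set(scores), reverse=True):
    tpr, fpr = tpr_fpr(y_true, scores, t)
    j = tpr - fpr  # Youden's J statistic
    if j > best_j:
        best_threshold, best_j = t, j

print(best_threshold, best_j)
```

With the arrays returned by `roc_curve`, the same idea is usually written as taking the threshold at the index where `tpr - fpr` is largest.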

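- AUC is the area under the ROC curve; 1.0 means the model ranks every positive above every negative, and 0.5 corresponds to random guessing.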

from sklearn.metrics import roc_auc_score
auc_score = roc_auc_score(y_val, y_pred_proba)
auc_score
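
To show what that area means, the sketch below computes AUC two ways on toy data: with the trapezoid rule over ROC points, and as the fraction of (positive, negative) pairs in which the positive sample gets the higher score. The data and the helper names `auc_trapezoid` and `auc_pairwise` are invented for illustration and are not scikit-learn functions.

```python
def auc_trapezoid(fpr, tpr):
    """Area under a curve whose points are sorted by increasing FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area

def auc_pairwise(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores, invented for illustration.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

# ROC points for the toy data above, written out by hand.
fpr = [0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 1.0]
tpr = [0.0, 0.25, 0.25, 0.75, 0.75, 1.0, 1.0]

print(auc_trapezoid(fpr, tpr))
print(auc_pairwise(y_true, scores))
```

Both computations give the same value on this data, which is why AUC is often read as the probability that a randomly chosen positive is scored above a randomly chosen negative.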
