Choose Your ML Problems

[CodesSates] AI Bootcamp

웅탈 2021. 4. 22. 22:51

Data Scientist process

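1. Business problem

- Identify the problem by talking with the people who do the work

2. Data problem

- Find the data related to that problem

3. Solving the data problem

- Data processing, visualization

- Machine learning / statistics

4. Solving the business problem

- Use the solution to the data problem to solve the business problem together with the practitioners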

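Information leakage

- In a classification problem, if the target class proportions differ by 70% or more, the classes are imbalanced -> use precision, recall, the ROC curve, and AUC together instead of accuracy alone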

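# Check per-class precision and recall with classification_report()
# (pipe, X_val, y_val are assumed to be a fitted pipeline and a validation split defined earlier)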
from sklearn.metrics import classification_report
y_pred = pipe.predict(X_val)
print(classification_report(y_val, y_pred))


              precision    recall  f1-score   support

       False       0.84      0.98      0.91       302
        True       0.17      0.02      0.03        57

    accuracy                           0.83       359
   macro avg       0.50      0.50      0.47       359
weighted avg       0.73      0.83      0.77       359
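
To see why accuracy alone is misleading here, it helps to compare it with a majority-class baseline. The sketch below is a minimal illustration that only reuses the class counts from the "support" column of the report above; the variable names are made up for this example.

```python
# Class counts copied from the "support" column of the report above.
support = {False: 302, True: 57}

total = sum(support.values())

# A "model" that always predicts the most frequent class.
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total

print(f"majority class: {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.2f}")
```

Always predicting False would already score about 0.84, slightly above the 0.83 accuracy in the report, while the recall of 0.02 for True shows the model finds almost none of the minority class.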


# AUC
from sklearn.metrics import roc_auc_score

# Predicted probability of the positive class (the last column of predict_proba)
y_pred_proba = pipe.predict_proba(X_val)[:, -1]
print('AUC score: ', roc_auc_score(y_val, y_pred_proba))
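
One way to read the AUC score: it is the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. The sketch below computes that directly in plain Python. `pairwise_auc` and the toy data are hypothetical names and values for illustration; this is not how `roc_auc_score` is implemented, and it is far too slow for real datasets.

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration only.
y_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_toy, scores_toy))  # 3 of 4 pairs are ordered correctly -> 0.75
```

A score of 0.5 means the ranking is no better than chance, which is why AUC stays informative even when one class dominates.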

- For imbalanced classes, per-class weights can be set with the class_weight parameter (either explicit weights or 'balanced')
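
As a rough guide to what 'balanced' does, scikit-learn documents the weights as n_samples / (n_classes * count of each class). The sketch below reproduces that formula in plain Python using the class counts from the report above. `balanced_class_weights` is a hypothetical helper written for this note, not a scikit-learn function.

```python
def balanced_class_weights(class_counts):
    """Weights inversely proportional to class frequency: n_samples / (n_classes * count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Class counts copied from the "support" column of the classification report.
support = {False: 302, True: 57}
weights = balanced_class_weights(support)

for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

The minority class ends up with a weight roughly five times larger than the majority class, so its errors count more during training. In practice the same effect comes from passing class_weight='balanced' to an estimator that supports it, such as LogisticRegression or RandomForestClassifier.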

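Log transformation

- In regression, the target distribution may be skewed to one side (the closer it is to a normal distribution, the better the model tends to perform)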

log1p: ln(1 + x)
expm1: exp(x) - 1, the inverse of log1p.
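
The round trip can be sketched with the standard library alone. The target values below are made up to mimic a right-skewed target; the usual pattern is to train on log1p(y) and apply expm1 to the predictions to get back to the original scale.

```python
import math

# Hypothetical right-skewed target values, made up for illustration.
y = [0.0, 10.0, 100.0, 1_000.0, 100_000.0]

# Compress the long right tail; log1p is safe at 0 because ln(1 + 0) = 0.
y_log = [math.log1p(v) for v in y]

# Invert the transform, as one would do with model predictions.
y_back = [math.expm1(v) for v in y_log]

print([round(v, 3) for v in y_log])
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(y, y_back)))
```

With scikit-learn, one option is TransformedTargetRegressor with func=np.log1p and inverse_func=np.expm1, which applies both steps automatically around the wrapped regressor.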
