Upstage_Last one mile between textbook ML and its real applications review


Hardy. 2021. 5. 11. 12:19

Thanks to everyone at Upstage and to Dr. Hyun Joon Jung (정현준) for this great learning opportunity. 05/11 11am ~

Section 1. Career

Apple

; ML rarely ends up on the endpoint itself, but worked on implementing it in products

Nike

; focus on product transformation (Offline -> Online, how to give customers a good experience?)

Amazon

providing most relevant items through ML-based advertisements.

; CTR prediction, CVR prediction

* Large scale ML models handling

Section 2. Textbook AI vs. Real Applications

- What ?

Problem

Biz KPI

- Label

데이터에 대한 고찰

- Model

reusable and scalable

interpretable

- Evaluation

metrics

measure

- Serving

ML System Development

Problem Definition

- Understand a business problem and transform it to a ML problem.

- Find the intersection of user needs and AI capabilities

- Automation Vs. Augmentation

- Business KPI vs. ML metrics

Align them: explore carefully which ML metrics actually correlate with the business KPI (best practices, references, history).

What , Why , How

<->

Problem , Goals , KPI
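As a rough sketch of the "Business KPI vs. ML metrics" alignment point above, one simple check is whether an offline ML metric has historically moved together with the business KPI. The snippet below computes a Pearson correlation in plain Python. The metric names and all numbers are made up for illustration; they are not from the seminar.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical history: one entry per past experiment (illustrative values only).
offline_auc = [0.70, 0.72, 0.75, 0.78, 0.80]       # offline ML metric
revenue_lift_pct = [0.1, 0.4, 0.3, 0.9, 1.1]       # business KPI from online tests

r = pearson(offline_auc, revenue_lift_pct)
print(f"correlation between offline AUC and KPI lift: {r:.2f}")
```

A high correlation on a handful of experiments is only a hint, not proof that optimizing the ML metric improves the KPI. A low one is a signal to revisit the metric.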

Practical Tips ; Whenever we solve a problem, please imagine a realistic scenario.

( https://mlcontests.com )

Good performance on clean data matters, but so does knowing how to handle the problems that come up in a real product, such as noisy data.

Data Acquisition

- ETL

In most cases, when there is a lot of data, investing in data quality is more efficient than concentrating on the model.

- Build data products not only for ML but also data analysis and business reporting.

Practical Tips ;

- Data Engineering is as important as ML.

- Understanding what we need and how it is instrumented. (DS and DE must collaborate. The schema fixed at the start frequently turned out not to match what was needed later at modeling time.)

- Data Productionalization ( Airflow, Snowflake)

- Data Quality Validation and Monitoring.
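The schema-mismatch and data quality tips above can be illustrated with a small validation pass. This is a minimal sketch in plain Python, not a production tool. The schema, column names, and the helper names `validate_rows` and `null_rate` are all hypothetical.

```python
# Hypothetical schema: column name -> (expected type, nullable).
EXPECTED_SCHEMA = {
    "user_id": (str, False),
    "item_id": (str, False),
    "price": (float, True),
}

def validate_rows(rows, schema):
    """Return a list of human-readable problems found in `rows`."""
    problems = []
    for i, row in enumerate(rows):
        # Columns that drifted away from the agreed schema.
        for col in sorted(set(schema) - set(row)):
            problems.append(f"row {i}: missing column '{col}'")
        for col in sorted(set(row) - set(schema)):
            problems.append(f"row {i}: unexpected column '{col}'")
        # Type and null checks for the columns that are present.
        for col, (expected_type, nullable) in schema.items():
            if col not in row:
                continue
            value = row[col]
            if value is None:
                if not nullable:
                    problems.append(f"row {i}: '{col}' must not be null")
            elif not isinstance(value, expected_type):
                problems.append(
                    f"row {i}: '{col}' expected {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems

def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None (a simple monitoring signal)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is None) / len(rows)

# Illustrative rows only.
rows = [
    {"user_id": "u1", "item_id": "i1", "price": 9.99},
    {"user_id": "u2", "item_id": None, "price": None},
    {"user_id": 3, "item_id": "i3"},
]
problems = validate_rows(rows, EXPECTED_SCHEMA)
for problem in problems:
    print(problem)
print("price null rate:", null_rate(rows, "price"))
```

In a scheduled pipeline, a check like this would typically run as its own step and fail or alert when `problems` is non-empty or a null rate crosses a threshold.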

Data Analysis

Provide insights and investigate the feasibility of a problem by understanding a given data.

- Consider contextual perspectives such as time and space.

- Consider a difference between single point estimates vs. ranged estimates.

Rather than looking at each value in isolation, ask what underlying data produced it. Working through hypotheses built on several such factors leads toward good ML modeling and feature engineering.

- Only a small part of the data is labeled, while most of the data is unlabeled.

-> Annotate it ? or Derive it?

-> Explicit label vs. Implicit label
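To make the "single point estimates vs. ranged estimates" item in the list above concrete, here is a small sketch that reports a click-through rate as a range instead of a single number. It uses the Wilson score interval, which is my choice for illustration and was not named in the seminar. The click counts are made up.

```python
from math import sqrt

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

# Two hypothetical ads with the same point estimate (CTR = 5%).
small = wilson_interval(5, 100)        # 5 clicks out of 100 impressions
large = wilson_interval(500, 10_000)   # 500 clicks out of 10,000 impressions

print("small sample:", small)
print("large sample:", large)
```

Both ads have the same point estimate, but the interval for the small sample is much wider, which is exactly the information a single number hides.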

Practical Tips ;

Prepare your own toolkits for story telling

Language ; R Python Scala Hive and etc.

Visualization ; D3 Plotly Tableau

Top-down vs. Bottom-up Data Analysis

Privacy, Subjectivity, Bias, Imbalance

Reward modeling, interpretability

Data Analysis and Understanding

Key task in Data Analysis

- EDA , Confirmatory Data Analysis

- Metric Design , Golden-set Generation , Feature Engineering

ML Modeling

- For modeling, the recommendation is not to start from high-level (SOTA) models but from the most basic, interpretable ones, such as RF.

Model Verification

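- Keep questioning your metric. Avoid the mindset of "everyone else uses metric O, so I will use O too."

Instead, keep asking whether the evaluation metric can actually verify the KPI chosen at the problem definition stage.

- Most people concentrate on quantitative research, but qualitative research deserves attention as well.

(e.g. the model underperforming in specific situations)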
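One way to surface the "model underperforms in specific situations" case above is to break a metric down by slice instead of reporting only the overall number. The sketch below does this for accuracy in plain Python. The slice names and records are made up for illustration.

```python
from collections import defaultdict

def accuracy_by_slice(records):
    """records: iterable of (slice_name, y_true, y_pred).

    Returns (overall_accuracy, {slice_name: accuracy}).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, y_true, y_pred in records:
        total[slice_name] += 1
        correct[slice_name] += int(y_true == y_pred)
    n = sum(total.values())
    if n == 0:
        raise ValueError("no records to evaluate")
    overall = sum(correct.values()) / n
    per_slice = {name: correct[name] / total[name] for name in total}
    return overall, per_slice

# Hypothetical evaluation set: the model does well on the majority slice
# and poorly on a small one.
records = (
    [("returning_user", 1, 1)] * 90
    + [("new_user", 1, 1)] * 4
    + [("new_user", 1, 0)] * 6
)
overall, per_slice = accuracy_by_slice(records)
print("overall:", overall)
for name, acc in per_slice.items():
    print(name, acc)
```

The overall number looks healthy while the small slice is clearly failing. Reading through the failing examples in that slice is where the qualitative analysis starts.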

Deployment / Online Testing

- Distributed ML model serving

- Define a clear hypothesis

- Understand how many variations exist in the experiment

- Multi-armed bandits
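As an illustration of the multi-armed bandit item above, here is a minimal epsilon-greedy simulation. Epsilon-greedy is only one of several bandit strategies and is my choice for the sketch; the seminar notes do not say which one was discussed. The conversion rates are made up, and the function name `epsilon_greedy` is hypothetical.

```python
import random

def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy over Bernoulli arms.

    true_rates: hidden success probability of each arm (unknown to the policy).
    Returns (pulls per arm, estimated value per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        pulls[arm] += 1
        values[arm] += (reward - values[arm]) / pulls[arm]     # incremental mean
    return pulls, values

# Three hypothetical variants with different conversion rates.
pulls, values = epsilon_greedy([0.1, 0.3, 0.6])
print("pulls per arm:", pulls)
print("estimated values:", [round(v, 2) for v in values])
```

Unlike a fixed-split A/B test, the policy shifts traffic toward the better variant while the experiment is still running, at the cost of a less clean comparison between arms.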

! Iterative Process !

Do not expect to succeed on the first try!

developer.apple.com/design/human-interface-guidelines/machine-learning/overview/introduction/

- Gives a general overview of ML in products, including topics such as implicit / explicit.

research.fb.com/blog/2018/05/the-facebook-field-guide-to-machine-learning-video-series/

Introducing the Facebook Field Guide to Machine Learning video series - Facebook Research

The Facebook Field Guide to Machine Learning is a six-part video series developed by the Facebook ads machine learning team.…


- Includes a video series for learning the practical side.

** Understanding How we work? **

Understand what each role does, and use that understanding to empathize with other people's roles.

** A Good ML / Data Scientist or Engineer? **

from ; seminar

** Collaboration and Communication **

- There are pros and cons.

** vision **

- High quality data is commercialized and hard to access from public

- Models and basic ML infra are increasingly commoditized

I expect demand to grow less for SOTA algorithms and more for ML engineering work: how to apply existing libraries to real-world problems, and apply them well.

- Human and AI interaction is gaining a significant amount of attention
