
Upstage_GCP + ML pipeline seminar review

Hardy. 2021. 3. 9. 12:25

 

Thanks to the AI Upstage staff and Chansung Park for this great learning opportunity.

 

Why build an ML Pipeline?

- Data-driven <-> Rule-based (refresh cycle)

-> Languages keep changing over time, so data-driven models need periodic retraining

 

 

 

https://ml-ops.org/content/end-to-end-ml-workflow

 

https://ml-ops.org/content/mlops-principles

 

Hardware/Software Infrastructure

- GKE (Google Kubernetes Engine)

 

 

Control Pipeline Workflow

- AI Platform (Pipeline)

 

Define + Inject Pipeline Workflow

- TFX (Pipeline Component Chaining)

Only the Tuner and Trainer need project-specific tuning; the preceding components, from ExampleGen through Transform, are standardized, so you just configure them according to that standard.
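The chaining pattern the seminar describes can be sketched in plain Python. This is an illustration of how TFX-style components pass named output artifacts downstream, not the actual TFX API (class and artifact names here are invented for the sketch):

```python
# Minimal sketch of TFX-style component chaining (illustration only,
# not the real TFX API): each component consumes named outputs of
# earlier components, forming a DAG an orchestrator can execute.

class Component:
    def __init__(self, name, **inputs):
        self.name = name
        self.inputs = inputs
        # Each component exposes a named output artifact.
        self.outputs = {name.lower(): f"{name}-artifact"}

def build_pipeline(data_root):
    # ExampleGen ~ Transform are standardized; only their configs change.
    example_gen = Component("ExampleGen", input_base=data_root)
    statistics_gen = Component("StatisticsGen",
                               examples=example_gen.outputs["examplegen"])
    schema_gen = Component("SchemaGen",
                           statistics=statistics_gen.outputs["statisticsgen"])
    transform = Component("Transform",
                          examples=example_gen.outputs["examplegen"],
                          schema=schema_gen.outputs["schemagen"])
    # Tuner and Trainer are where project-specific tuning happens.
    tuner = Component("Tuner", examples=transform.outputs["transform"])
    trainer = Component("Trainer",
                        examples=transform.outputs["transform"],
                        hyperparameters=tuner.outputs["tuner"])
    return [example_gen, statistics_gen, schema_gen, transform, tuner, trainer]

components = build_pipeline("gs://my-bucket/data")
print([c.name for c in components])
```

In real TFX the same wiring is done with `tfx.components` classes and `outputs['...']` channels, and the resulting component list is handed to a `Pipeline` object for an orchestrator to run.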

 

Job Allocation to Other GCP Products

GKE ->

1) Dataflow: Data Injection

2) AI Platform (Training/Tuner): Model Training

3) AI Platform (Prediction): Model Serving
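The allocation above can be summarized as a lookup table. This dict is purely illustrative of the delegation described in the talk (the keys and fallback are not real TFX configuration):

```python
# Illustrative mapping of pipeline stages to the GCP products that
# execute them, as described in the talk. Not real TFX config keys.
GCP_JOB_ALLOCATION = {
    "ExampleGen/Transform": "Dataflow",             # data injection & preprocessing
    "Trainer/Tuner": "AI Platform (Training)",      # model training & tuning
    "Pusher target": "AI Platform (Prediction)",    # model serving
    "Orchestrator": "GKE (AI Platform Pipelines)",  # runs the pipeline itself
}

def where_runs(stage):
    # Look up which GCP product executes a given pipeline stage;
    # anything unlisted runs on the default GKE executor.
    return GCP_JOB_ALLOCATION.get(stage, "GKE (default executor)")

print(where_runs("Trainer/Tuner"))
```

In real TFX this delegation happens through per-component configuration (e.g., Beam pipeline arguments for Dataflow, AI Platform training arguments for the Trainer).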

 

Data Location

1) BigQuery 2) Cloud Storage (bucket version control)

 

Triggering (Data)

1) Cloud Functions 2) Cloud Build (GitHub, Cloud Source Repositories)
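A minimal sketch of option 1: a background Cloud Function (Python runtime) that fires when a new object lands in a GCS bucket and kicks off a pipeline run. The `(event, context)` signature matches GCS background functions, but `submit_pipeline_run` is a hypothetical placeholder, not a real API:

```python
# Sketch of a GCS-triggered Cloud Function. The pipeline-submission
# call is a hypothetical placeholder -- in practice you would use the
# AI Platform Pipelines (Kubeflow Pipelines) client here.

def submit_pipeline_run(data_uri):
    # Hypothetical helper: a real deployment would start a pipeline
    # run against the newly arrived data.
    return f"submitted run for {data_uri}"

def on_new_data(event, context=None):
    # 'event' carries the GCS object metadata for a finalize event.
    uri = f"gs://{event['bucket']}/{event['name']}"
    return submit_pipeline_run(uri)

print(on_new_data({"bucket": "my-data-bucket", "name": "2021-03/train.csv"}))
```

Option 2 (Cloud Build) instead triggers on a commit to the GitHub or Cloud Source Repositories repo, which suits code-driven rather than data-driven retraining.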

 

The components above can be combined into a fully automated ML pipeline.

 

What's more?

 

- Easy to scale up / down

- GCS comes with Data Version Control

- AI Platform can handle model version control

- ...etc.

 

- $300 free credit (enough for pipeline "unit test" builds, but not for heavy training workloads)

 

Incrementally move from local to cloud: there are multiple ways to build a "TFX" pipeline

 

3 ways to create Custom Components

 

- Python function with decorator

- Components using containers

- Extending existing component classes
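The first option can be illustrated with the decorator pattern in plain Python. This is not the real TFX decorator (TFX's `@component` additionally uses typed input/output annotations); it only shows the idea of turning an ordinary function into a registered component:

```python
# Plain-Python illustration of the decorator approach: wrap a function
# so the framework can discover and run it as a pipeline component.

COMPONENT_REGISTRY = {}

def component(fn):
    # Register the function under its name and return it unchanged,
    # so it remains directly callable.
    COMPONENT_REGISTRY[fn.__name__] = fn
    return fn

@component
def normalize(values):
    # Example component logic: scale values into [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(COMPONENT_REGISTRY["normalize"]([0, 5, 10]))
```

The container-based and subclassing options trade this convenience for more control: containers let a component run arbitrary non-Python tooling, while extending an existing component class reuses its standardized inputs and outputs.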

 

TFX + Model card Toolkit

- end-to-end schema 

 

The easiest way to get started

- TFX example notebooks for executing components one by one

- TFX CLI for creating a template project

 

References

- TFX Web

- TFX Youtube

- GCP Products

- Toy Project Repo

 

Summary

- GCP provides tools that make it easy to deploy pipeline content and run unit-test builds (e.g., with TFX)