Thanks to the AI Upstage team and 박찬성 for the great learning opportunity.
Why build an ML pipeline?
- Data-driven <-> rule-based (models need a refresh cycle)
-> e.g., languages keep changing over time, so models must be retrained on fresh data
Hardware/Software Infrastructure
- GKE (Google Kubernetes Engine)
Control Pipeline Workflow
- AI Platform Pipelines
Define + Inject Pipeline Workflow
- TFX (pipeline component chaining)
Only the Tuner and Trainer need per-project tuning; the upstream components from ExampleGen through Transform are standardized, so you just configure them to that standard (see the sketch below).
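A minimal sketch of that standard chaining, assuming TFX 1.x and a CSV dataset; the paths, module file, and step counts below are hypothetical:

```python
# Standard TFX component chain: ExampleGen -> StatisticsGen -> SchemaGen
# -> Transform -> Trainer. Only the Trainer (and an optional Tuner placed
# before it) usually needs per-project tuning.
from tfx import v1 as tfx

def create_pipeline(pipeline_root: str, data_root: str, module_file: str):
    # Standardized, configure-to-the-standard components:
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs['examples'])
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs['statistics'])
    transform = tfx.components.Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        module_file=module_file)  # user-provided preprocessing_fn lives here

    # The part that actually gets tuned per project:
    trainer = tfx.components.Trainer(
        module_file=module_file,  # user-provided run_fn lives here
        examples=transform.outputs['transformed_examples'],
        transform_graph=transform.outputs['transform_graph'],
        schema=schema_gen.outputs['schema'],
        train_args=tfx.proto.TrainArgs(num_steps=1000),
        eval_args=tfx.proto.EvalArgs(num_steps=100))

    return tfx.dsl.Pipeline(
        pipeline_name='demo_pipeline',
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen,
                    transform, trainer])

# Run locally first; the same pipeline can later target Kubeflow/AI Platform:
# tfx.orchestration.LocalDagRunner().run(create_pipeline(...))
```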
Job Allocation to Other GCP Products
GKE delegates work to (see the sketch below):
1) Dataflow - data ingestion
2) AI Platform (Training/Tuner) - model training
3) AI Platform (Prediction) - model serving
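A hedged sketch of how that delegation is wired up in TFX: data-heavy components (ExampleGen, Transform, ...) run as Dataflow jobs when the pipeline's Beam args select the DataflowRunner. The project, region, and bucket names below are placeholders.

```python
# Hypothetical GCP settings; with the DataflowRunner selected, Beam-based
# TFX components execute on Dataflow instead of inside the GKE cluster.
beam_pipeline_args = [
    '--runner=DataflowRunner',
    '--project=my-gcp-project',            # placeholder project id
    '--region=us-central1',
    '--temp_location=gs://my-bucket/tmp',  # placeholder staging bucket
]
# Pass these when constructing the pipeline:
# tfx.dsl.Pipeline(..., beam_pipeline_args=beam_pipeline_args)
```

Training and serving are delegated similarly: the tfx.extensions.google_cloud_ai_platform package provides Trainer and Pusher variants that submit jobs to AI Platform Training/Prediction rather than running them in-cluster.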
Data Location
1) BigQuery 2) Cloud Storage (bucket object versioning) - see the BigQuery sketch below
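A minimal sketch of ingesting training data straight from BigQuery, assuming TFX 1.x with its BigQuery extension; the query and table are placeholders:

```python
from tfx import v1 as tfx

# Reads examples directly from a BigQuery table instead of local files.
example_gen = tfx.extensions.google_cloud_big_query.BigQueryExampleGen(
    query='SELECT * FROM `my-project.my_dataset.training_table`')

# BigQuery reads execute on Beam, so the pipeline also needs GCP-aware
# Beam args, e.g. ['--project=my-gcp-project',
#                  '--temp_location=gs://my-bucket/tmp'].
```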
Triggering (Data)
1) Cloud Functions 2) Cloud Build (GitHub, GCP source repos) - see the Cloud Function sketch below
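One common data-triggering pattern, sketched under assumptions: a background Cloud Function subscribed to a bucket's object-finalize event starts a run on AI Platform Pipelines through the Kubeflow Pipelines SDK. The host URL, experiment id, and pipeline id are all placeholders.

```python
import kfp  # Kubeflow Pipelines SDK; AI Platform Pipelines hosts KFP

def trigger_pipeline(event, context):
    """Background Cloud Function fired by google.storage.object.finalize."""
    print(f"New data file: gs://{event['bucket']}/{event['name']}")
    client = kfp.Client(host='https://<pipelines-endpoint>')  # placeholder
    client.run_pipeline(
        experiment_id='<experiment-id>',        # placeholder
        job_name=f"run-for-{event['name']}",
        pipeline_id='<pipeline-id>')            # placeholder
```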
Putting all of the above together gives an automated ML pipeline.
What's more?
- Easy to scale up / down
- GCS comes with data version control (bucket object versioning)
- AI Platform can handle model version control
- ...etc.
- $300 free credit (enough to build and "unit test" the pipeline, not to run heavy training workloads)
Incrementally move from local to cloud
Multiple ways to build a "TFX" pipeline
3 ways to create Custom Components
- Python function with a decorator (see the sketch after this list)
- Components using containers
- Extending existing component classes
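A minimal sketch of the first option (Python function with a decorator), assuming TFX 1.x; the component name and its file-count check are invented for illustration:

```python
import os
from tfx import v1 as tfx

@tfx.dsl.components.component
def DataInspector(
    examples: tfx.dsl.components.InputArtifact[
        tfx.types.standard_artifacts.Examples],
    min_files: tfx.dsl.components.Parameter[int] = 1,
) -> None:
    """Fails fast if the upstream Examples artifact looks empty."""
    files = []
    for root, _, names in os.walk(examples.uri):
        files.extend(os.path.join(root, n) for n in names)
    if len(files) < min_files:
        raise ValueError(f'Expected >= {min_files} files under {examples.uri}')
    print(f'Found {len(files)} files under {examples.uri}')

# Chain it like any standard component:
# inspector = DataInspector(examples=example_gen.outputs['examples'])
```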
TFX + Model Card Toolkit
- end-to-end: model cards can be populated from the pipeline's ML Metadata (sketch below)
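A hedged sketch of generating a model card next to the pipeline, assuming the model-card-toolkit package; the output directory and model details are placeholders, and exact method names have shifted across toolkit versions:

```python
import model_card_toolkit as mctlib

toolkit = mctlib.ModelCardToolkit(output_dir='model_card_assets')
model_card = toolkit.scaffold_assets()        # can pre-fill from MLMD if wired up
model_card.model_details.name = 'demo-model'  # placeholder metadata
toolkit.update_model_card(model_card)
html = toolkit.export_format()                # renders the card as HTML
```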
The easiest way to get started
- TFX example notebooks for running the pipeline component by component
- TFX CLI for scaffolding a template project (e.g., `tfx template copy`)
References
- TFX Web
- TFX Youtube
- GCP Products
- Toy Project Repo
Summary
- GCP provides tools (e.g., TFX with AI Platform Pipelines) that make it easy to build, "unit test", and deploy ML pipelines