Question
Direction
Data
Model
Cost Function
Optimization
Tuning
Results and Benchmarking
Scaling
Deployment and Operationalisation
Infrastructure

Question

Classification - Is this A or B?
Regression - How much, or how many of these?
Anomaly Detection - Is this anomalous?

Clustering - How can these elements be grouped?
Reinforcement Learning - What should I do now?

Direction

SaaS - Pre-built Machine Learning models

Google Cloud

Vision API
Speech API
Jobs API
Video Intelligence API
Language API
Translation API

AWS

Rekognition
Lex
Polly

…and many others

Data Science and Applied Machine Learning

Google Cloud: ML Engine
AWS: Amazon Machine Learning
Tools: Jupyter / Datalab / Zeppelin
… many others

Machine Learning Research

TensorFlow
MXNet
Torch
… many others

Data

Find - Collect - Explore - Clean Features - Impute Features - Engineer Features - Select Features - Encode Features - Build Datasets
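Two of the steps above, imputing and encoding features, can be sketched in a few lines. This is a minimal illustration only: the records, the column names (age, city) and the helper functions impute_mean and one_hot are all hypothetical, and a real project would more likely use a library such as pandas or scikit-learn.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
rows = [
    {"age": 34, "city": "Paris"},
    {"age": None, "city": "Lima"},
    {"age": 50, "city": "Paris"},
]

def impute_mean(rows, key):
    """Replace missing numeric values with the mean of the observed ones."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def one_hot(rows, key):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[key] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != key}
        for c in categories:
            new[f"{key}={c}"] = 1 if r[key] == c else 0
        encoded.append(new)
    return encoded

# Impute first, then encode, to build a purely numeric dataset.
dataset = one_hot(impute_mean(rows, "age"), "city")
```

Mean imputation and one-hot encoding are only one choice for each step; the right technique depends on the data and the algorithm selected later.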

Model

Select an algorithm based on the question and the data available.

Cost Function

The cost function measures how far the algorithm, with its current parameters, is from accurately representing the training data.

It is usually called a Cost or Loss function when the goal is to minimise it, and an Objective function when the goal is to maximise it.
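As a concrete illustration, here is a sketch of one common cost function, the mean squared error, applied to a straight-line model y = w*x + b. The model, the data and the name mse_cost are assumptions made for this example; other problems call for other cost functions.

```python
def mse_cost(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the training data."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

# Toy training data generated by y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect = mse_cost(2.0, 0.0, xs, ys)  # parameters that match the data
poor = mse_cost(0.0, 0.0, xs, ys)     # parameters far from the data
```

Parameters that represent the data well give a cost near zero, while poor parameters give a larger cost.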

Optimization

Having selected a cost function, we need a method to minimise the Cost function, or maximise the Objective function. Typically this is done by Gradient Descent or Stochastic Gradient Descent.
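The following is a minimal sketch of batch Gradient Descent, assuming a straight-line model y = w*x + b and a mean squared error cost. The learning rate, step count and function name are illustrative choices, not recommended defaults.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy training data generated by y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
```

Stochastic Gradient Descent follows the same update rule but estimates the gradient from one example (or a small batch) per step instead of the full training set.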

Tuning

Different algorithms have different Hyperparameters, which affect the algorithm's performance. There are multiple methods for Hyperparameter Tuning, such as Grid search and Random search.
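The difference between Grid and Random search can be sketched as follows. The function validation_error is a hypothetical stand-in for "train a model with these Hyperparameters and return its validation error", and the names lr and depth are made up for the example.

```python
import itertools
import random

def validation_error(lr, depth):
    """Hypothetical stand-in for training a model and scoring it."""
    return (lr - 0.1) ** 2 + 0.01 * (depth - 5) ** 2

def grid_search(grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [
        dict(zip(names, combo)) for combo in itertools.product(*grid.values())
    ]
    return min(candidates, key=lambda p: validation_error(**p))

def random_search(n_trials, seed=0):
    """Sample Hyperparameters at random and return the best sample."""
    rng = random.Random(seed)
    candidates = [
        {"lr": rng.uniform(0.001, 1.0), "depth": rng.randint(1, 10)}
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda p: validation_error(**p))

best_grid = grid_search({"lr": [0.01, 0.1, 1.0], "depth": [3, 5, 7]})
best_random = random_search(n_trials=20)
```

Grid search is exhaustive over the listed values, so its cost grows quickly with the number of Hyperparameters. Random search covers the same ranges with a fixed budget of trials.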

Results and Benchmarking

Analyse the performance of each algorithm and discuss the results.

Are the results good enough for production?

Are training and inference completing in a reasonable timeframe?
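Both questions, quality and speed, can be checked with very little code. The sketch below assumes a classification task scored by accuracy. The predict function is a hypothetical placeholder for a trained model, and the data is invented for the example.

```python
import time

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def timed(fn, *args):
    """Run fn(*args) and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def predict(xs):
    """Hypothetical model: label 1 when the input exceeds a threshold."""
    return [1 if x > 0.5 else 0 for x in xs]

inputs = [0.1, 0.7, 0.9, 0.3]
labels = [0, 1, 0, 0]

predictions, seconds = timed(predict, inputs)
score = accuracy(predictions, labels)
```

The same timed helper can wrap a training call, so each candidate algorithm is compared on both its score and its running time.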

Scaling

How does my algorithm scale for both training and inference?

Deployment and Operationalisation

How can feature manipulation be done for training and inference in real-time?

How can we make sure that the algorithm is retrained periodically and redeployed into production?

How will the ML algorithms be integrated with other systems?

Infrastructure

Can the infrastructure running the machine learning process scale?

How is access to the ML algorithm provided? REST API? SDK?

Is the infrastructure appropriate for the algorithm we are running? CPUs or GPUs?

Machine Learning Process