Types
Regression
A supervised problem in which the outputs are continuous rather than discrete (e.g., predicting a price).
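A minimal sketch of the regression setting, assuming scikit-learn and NumPy are available (the data are made up for illustration):

```python
# Minimal regression sketch: learn a mapping to a continuous output.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # inputs (e.g., size)
y = np.array([150.0, 200.0, 260.0, 310.0])   # continuous outputs (e.g., price)

model = LinearRegression().fit(X, y)
print(model.predict([[5.0]]))                # continuous prediction for a new input
```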
Classification
Inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more (multi-label classification) of these classes. This is typically tackled in a supervised way.
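A minimal sketch of the supervised classification setting, using scikit-learn's built-in iris data (the choice of classifier is illustrative):

```python
# Illustrative classification sketch: fit on labeled inputs,
# then assign a class to a new input.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)            # inputs and their class labels
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:1]))                    # class assigned to an input (first sample, for illustration)
```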
Clustering
A set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task.
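A minimal sketch of the unsupervised clustering setting, assuming scikit-learn is available (the points are made up):

```python
# Illustrative clustering sketch: no labels are given; the algorithm
# discovers the group structure on its own.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # group assignments found without any ground-truth labels
```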
Density Estimation
Finds the distribution of inputs in some space.
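A minimal sketch using kernel density estimation, assuming scikit-learn is available (the sample is made up):

```python
# Illustrative density-estimation sketch: estimate the distribution
# of inputs with a kernel density estimator.
import numpy as np
from sklearn.neighbors import KernelDensity

X = np.random.default_rng(0).normal(size=(100, 1))      # made-up 1-D sample
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)
print(kde.score_samples([[0.0]]))                        # log-density at x = 0
```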
Dimensionality Reduction
Simplifies inputs by mapping them into a lower-dimensional space.
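A minimal sketch using principal component analysis, assuming scikit-learn is available:

```python
# Illustrative dimensionality-reduction sketch: map 4-D inputs into 2-D.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)            # 4 features per observation
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)                            # (150, 2): a simplified representation
```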
Kind
Parametric
Step 1: Make an assumption about the functional form, or shape, of the function f, e.g.: f is linear, so we select a linear model.
Step 2: Select a procedure to fit, or train, the model. This means estimating the beta parameters in the linear function. A common approach is (ordinary) least squares, among others.
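A minimal sketch of both steps, assuming NumPy and made-up observations:

```python
# Parametric approach: Step 1 assumes f is linear (y = b0 + b1 * x);
# Step 2 estimates the beta parameters by ordinary least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])            # made-up observations

A = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
betas, *_ = np.linalg.lstsq(A, y, rcond=None)
print(betas)                                  # estimated [b0, b1]
```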
Non-Parametric
When we do not make assumptions about the form of the function f. However, since these methods do not reduce the problem of estimating f to a small number of parameters, a large number of observations is required in order to obtain an accurate estimate of f. An example is the thin-plate spline model.
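A sketch of fitting a thin-plate spline, assuming SciPy's RBFInterpolator is available (the data here are made up):

```python
# Non-parametric approach: a thin-plate spline makes no assumption
# about the functional form of f and instead lets the data decide.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))          # many observations are needed
y = np.sin(X[:, 0]) + np.cos(X[:, 1])         # made-up target surface

spline = RBFInterpolator(X, y, kernel="thin_plate_spline")
print(spline(np.array([[0.0, 0.0]])))         # estimate of f at a new point
```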
Categories
Supervised
The computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs.
Unsupervised
No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
Reinforcement Learning
A computer program interacts with a dynamic environment in which it must achieve a certain goal (such as driving a vehicle or playing a game against an opponent). The program receives feedback in the form of rewards and punishments as it navigates its problem space.
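A compact sketch of the reward-feedback loop, using tabular Q-learning on a made-up five-state corridor environment (NumPy only; the environment is invented for illustration):

```python
# Illustrative tabular Q-learning sketch: the agent is rewarded
# for reaching the rightmost state of a 5-state corridor.
import numpy as np

n_states, n_actions = 5, 2                  # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(500):                        # episodes
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0    # reward as feedback
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))                     # learned policy: move right everywhere
```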
Approaches
Decision tree learning
Association rule learning
Artificial neural networks
Deep learning
Inductive logic programming
Support vector machines
Clustering
Bayesian networks
Reinforcement learning
Representation learning
Similarity and metric learning
Sparse dictionary learning
Genetic algorithms
Rule-based machine learning
Learning classifier systems
Taxonomy
Generative Methods
Model class-conditional pdfs and prior probabilities. “Generative” since sampling from the model can generate synthetic data points (see the sketch after the list below).
Popular models:
- Gaussians, Naïve Bayes, Mixtures of multinomials
- Mixtures of Gaussians, Mixtures of experts, Hidden Markov Models (HMM)
- Sigmoidal belief networks, Bayesian networks, Markov random fields
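A minimal generative sketch, using 1-D Gaussians as the class-conditional pdfs (NumPy only; the data and priors are made up):

```python
# Generative sketch: model class-conditional Gaussians plus priors,
# then *sample* from the model to generate synthetic data points.
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=200)    # made-up class-0 data
X1 = rng.normal(loc=3.0, scale=1.0, size=200)    # made-up class-1 data

# Fit class-conditional pdfs (here: 1-D Gaussians) and prior probabilities.
params = {c: (X.mean(), X.std()) for c, X in ((0, X0), (1, X1))}
priors = {0: 0.5, 1: 0.5}

# "Generative": draw a class from the prior, then a point from its pdf.
c = rng.choice([0, 1], p=[priors[0], priors[1]])
mu, sigma = params[c]
print(c, rng.normal(mu, sigma))                  # a synthetic labeled point
```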
Discriminative Methods
Directly estimate posterior probabilities. No attempt to model the underlying probability distributions. Computational resources are focused on the given task, which often yields better performance (a sketch follows the list below).
Popular models:
- Logistic regression, SVMs
- Traditional neural networks, Nearest neighbor
- Conditional Random Fields (CRF)
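A minimal discriminative sketch, using logistic regression from scikit-learn on the same kind of made-up two-class data:

```python
# Discriminative sketch: logistic regression estimates the posterior
# P(class | x) directly, without modeling how the inputs were generated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)
y = np.array([0] * 200 + [1] * 200)

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[1.5]]))   # estimated posterior probabilities at x = 1.5
```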
Selection Criteria
Prediction Accuracy vs Model Interpretability
There is an inherent tradeoff between prediction accuracy and model interpretability: as models become more flexible in how the function f is estimated, they become more obscure and harder to interpret. Inflexible (more restrictive) methods are better suited for inference, while flexible methods are generally preferable when prediction is the only goal.
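A sketch of the tradeoff on made-up nonlinear data: the linear model is easy to interpret (one slope, one intercept) but fits poorly, while the flexible model fits well but its decision logic is opaque (scikit-learn assumed; the random forest stands in for any flexible method):

```python
# Tradeoff sketch: inflexible vs. flexible fit on made-up nonlinear data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)

linear = LinearRegression().fit(X, y)                      # interpretable, inflexible
forest = RandomForestRegressor(random_state=0).fit(X, y)   # flexible, opaque

print(linear.score(X, y), forest.score(X, y))   # flexible model fits much better
```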