Types

Regression

A supervised problem in which the outputs are continuous rather than discrete.
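A minimal sketch, assuming scikit-learn and NumPy are available (neither is prescribed by these notes); the data and coefficients are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy supervised data: continuous target y from a noisy linear signal
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=1.0, size=100)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # recovered slope/intercept, near 3.0 and 2.0
print(model.predict([[5.0]]))          # a continuous (not discrete) prediction
```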

Classification

Inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more of these classes (assigning more than one is multi-label classification). This is typically tackled in a supervised way.
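A sketch of the binary case, again assuming scikit-learn; the two-blob data is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: two labeled classes separated in feature space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.5, 2.0]]))        # assign an unseen input to a class
print(clf.predict_proba([[1.5, 2.0]]))  # class membership probabilities
```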

Clustering

A set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task.
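A minimal k-means sketch (one clustering algorithm among many; scikit-learn assumed), where no labels are supplied and the groups are discovered:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled inputs drawn from three blobs; the groups are not known beforehand
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, (50, 2)) for c in (-5.0, 0.0, 5.0)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])       # discovered group assignments (no labels were given)
print(km.cluster_centers_)   # discovered group centers
```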

Density Estimation

Finds the distribution of inputs in some space.
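A sketch using kernel density estimation via SciPy (an assumed choice; other estimators exist):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Estimate the distribution that generated a sample, without assuming its form
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(3, 1.0, 500)])

kde = gaussian_kde(sample)
print(kde.evaluate([-2.0, 0.0, 3.0]))   # estimated density at three query points
```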

Dimensionality Reduction

Simplifies inputs by mapping them into a lower-dimensional space.
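A PCA sketch (one common such mapping; scikit-learn assumed) projecting inputs into a lower-dimensional space:

```python
import numpy as np
from sklearn.decomposition import PCA

# Map 10-dimensional inputs onto the 2 directions of highest variance
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

pca = PCA(n_components=2)
X_low = pca.fit_transform(X)             # same rows, now in 2 dimensions
print(X_low.shape)                       # (200, 2)
print(pca.explained_variance_ratio_)     # variance retained per component
```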

Kind

Parametric

Step 1: Making an assumption about the functional form or shape of our function f, e.g. f is linear, thus we will select a linear model.

Step 2: Selecting a procedure to fit or train our model. This means estimating the β parameters in the linear function. A common approach is (ordinary) least squares, amongst others.
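The two steps in a minimal NumPy sketch (the linear form and the data are illustrative assumptions):

```python
import numpy as np

# Step 1 assumed f is linear: y = beta0 + beta1 * x.
# Step 2 estimates the beta parameters by ordinary least squares.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 4.0 + 1.5 * x + rng.normal(scale=1.0, size=100)

X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimate
print(beta)                                    # close to [4.0, 1.5]
```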

Non-Parametric

When we do not make assumptions about the form of our function f. However, since these methods do not reduce the problem of estimating f to a small number of parameters, a large number of observations is required in order to obtain an accurate estimate for f. An example would be the thin-plate spline model.
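A thin-plate spline sketch via SciPy's RBFInterpolator (assuming SciPy >= 1.7); note the comparatively large sample it is given:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# No functional form is assumed for f; the spline adapts to the observations,
# which is why many of them are needed for an accurate estimate.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + rng.normal(scale=0.05, size=200)

spline = RBFInterpolator(X, y, kernel='thin_plate_spline', smoothing=1.0)
print(spline(np.array([[0.5, -1.0]])))   # estimate of f at an unseen point
```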

Categories

Supervised

The computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs.

Unsupervised

No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).

Reinforcement Learning

A computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle or playing a game against an opponent). The program is provided feedback in terms of rewards and punishments as it navigates its problem space.
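A tabular Q-learning sketch on a made-up 1-D corridor environment (the states, actions, rewards, and hyperparameters here are all illustrative assumptions):

```python
import numpy as np

# Toy environment: states 0..4, actions 0 = left / 1 = right,
# reward 1 for reaching the goal state 4, 0 otherwise.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action choice (random tie-breaking while Q is flat)
        if rng.random() < eps or Q[s, 0] == Q[s, 1]:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
        # move Q(s, a) toward reward + discounted value of the next state
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # greedy policy per state (the goal row stays untrained)
```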

Approaches

Decision tree learning

Association rule learning

Artificial neural networks

Deep learning

Inductive logic programming

Support vector machines

Clustering

Bayesian networks

Reinforcement learning

Representation learning

Similarity and metric learning

Sparse dictionary learning

Genetic algorithms

Rule-based machine learning

Learning classifier systems

Taxonomy

Generative Methods

Model class-conditional pdfs and prior probabilities. “Generative” since sampling from the model can generate synthetic data points. A sketch follows the list of popular models below.

Popular models:

  • Gaussians, Naïve Bayes, Mixtures of multinomials
  • Mixtures of Gaussians, Mixtures of experts, Hidden Markov Models (HMM)
  • Sigmoidal belief networks, Bayesian networks, Markov random fields
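The generative recipe with 1-D Gaussian class-conditionals (data and classes invented for illustration): fit p(x|c) and p(c), apply Bayes' rule for the posterior, and sample to generate synthetic points.

```python
import numpy as np
from scipy.stats import norm

# Fit class-conditional Gaussian pdfs and priors from labeled samples
rng = np.random.default_rng(0)
x0 = rng.normal(-1.0, 1.0, 100)   # class 0 sample
x1 = rng.normal(2.0, 1.5, 300)    # class 1 sample
mu = [x0.mean(), x1.mean()]
sd = [x0.std(), x1.std()]
prior = [len(x0) / 400, len(x1) / 400]

# Posterior p(c | x) via Bayes' rule for an unseen input x
x = 0.5
joint = [norm.pdf(x, mu[c], sd[c]) * prior[c] for c in (0, 1)]
posterior = [j / sum(joint) for j in joint]
print(posterior)

# "Generative": sampling the fitted model yields synthetic data points
print(rng.normal(mu[1], sd[1], 5))   # synthetic class-1 points
```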

Discriminative Methods

Directly estimate posterior probabilities, with no attempt to model the underlying probability distributions. Computational resources are focused on the given task, which often yields better performance. A sketch follows the list of popular models below.

Popular models:

  • Logistic regression, SVMs
  • Traditional neural networks, Nearest neighbor
  • Conditional Random Fields (CRF)
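For contrast, a discriminative sketch on the same kind of toy data (scikit-learn assumed): the posterior p(c|x) is estimated directly, with no model of p(x|c).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Estimate p(class | x) directly; no class-conditional densities are modeled
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(2.0, 1.5, 300)])
y = np.array([0] * 100 + [1] * 300)

clf = LogisticRegression().fit(x.reshape(-1, 1), y)
print(clf.predict_proba([[0.5]]))   # posterior probabilities for an unseen input
```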

Selection Criteria

Prediction Accuracy vs Model Interpretability.

There is an inherent tradeoff between prediction accuracy and model interpretability: as models become more flexible in how the function f is selected, they become harder to interpret. Inflexible (more restrictive) methods are therefore preferable when inference is the goal, while flexible methods tend to achieve higher prediction accuracy.