Machine learning is a subfield of computer science where computers have the ability to learn without being explicitly programmed. List of the machine learning task:
- Supervised: Approximation where the all data is labeled and the algorithms learn to predict the output from the input data (training, cross validation and testing sets). We have two types of supervised problems:
- Regression: When the output variable is a real value, such as “dollars” or “age”.
- Classification: When the output variable is a category, such as “cat” or “dog” or “Tumor benign” and “Tumor malignant”.
Regression vs Classification
- Unsupervised: Description where the all data is unlabeled and the algorithms learn to inherent structure from the input data.
- Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
- Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
Sometimes some of our data is labeled. However, most of it is unlabeled. These problems sit in between both supervised and unsupervised learning. Therefore, we can use both learning techniques to make predictions.
Let’s talk about the simplest machine learning algorithm (K-nearest neighbors) and one of the oldest (Perceptron) machine learning algorithm. Most of the Machine learning algorithms requires us to build a model but not all of them require a model. Good example would be the k-nearest neighbors (KNN) algorithm where it doesn’t require to build a model, make assumptions, tune parameters.
In pattern recognition, the k-nearest neighbors algorithm (KNN) is a non-parametric method used for classification and regression, which are a subset of supervised learning. In both cases, the input consists of the k closest training examples in the feature space. KNN uses the standard Euclidian distance to define nearest neighbors. In mathematics, the Euclidean distance (Euclidean metric) is the “ordinary” (i.e. straight-line) distance between two points in Euclidean space.
In KNN classification, a data is classified by a majority vote of its k nearest neighbors where the k is small integer. For instance, if k = 1, then the object is simply assigned to the class of that single nearest neighbor. In KNN regression, the output is the property value where the value is the average of the values of its k nearest neighbors.
- Simple to implement.
- Works well in practice.
- Does not require to build a model, make assumptions, tune parameters.
- Can be extended easily with news examples.
- Requires large space to store the entire training dataset.
- Slow! Given n examples and d features. The method takes O(n × d) to run.
- Suffers from the curse of dimensionality.
Perceptron learning algorithm is one of the oldest and simplest Linear classification method where it belongs to Neural Networks class of algorithms. It works perfectly if data is linearly separable. If not, it will not converge. The idea is to start with a random hyperplane and adjust it using your training data using the iterative method.
The Perceptron learning algorithm can represent many Boolean functions such as AND, OR, NAND, NOR, NOT but not all of them (e.g., XOR). NN Neural networks use the ability of the perceptrons to represent elementary functions and combine them in a network of layers of elementary questions. However, a cascade of linear functions is still linear!
To sum up; the Perceptron learning algorithm is a model of a single neuron that can be used for the classification problems (2-class problems). Here is the python implementation of Perceptron learning algorithm.