Machine Learning

Machine Learning - the "ability to learn without being explicitly programmed", according to Arthur Samuel. Machine learning algorithms are able to make data-driven predictions or decisions. ML allows you to uncover "hidden insights" by learning from historical relationships and trends in the data.

ML tasks are grouped by the learning "signal" or "feedback" available to the learning system:

  • Supervised learning (Develop Predictive Model based on both input and output data)
  • Unsupervised learning (Discover an internal representation from input data only)
  • Reinforcement learning
Machine Learning Sample Applications:
  • computer vision
  • OCR - Optical Character Recognition
  • detection of network intruders
  • detection of malicious insider working towards data breach
  • email filtering
Machine Learning Types
SUPERVISED LEARNING
Classification
(Assigning observations to a class - labeling data; previously labeled data are used to predict the category/class of new data.)

Classification identifies which category (subset) a new observation belongs to, on the basis of a training set of data whose category membership is known. For example, emails can be assigned to the class "spam" or "non-spam", a blood type to the class "A", "B", "AB" or "0", and an image to the class "car", "bus" and so on. We have previous data already assigned to specific classes and new data that we want to assign. In ML terminology this is supervised learning, because a correctly labeled training set is available. A classifier is an algorithm that implements classification, that is, it maps input data to a specific category.
ML Classification Algorithms:
  • Decision Trees
  • Random Forest
  • K-Nearest Neighbors (KNN)
  • Support Vector Machine (SVM)
  • Logistic Regression
  • Naive Bayes
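As an illustration of how a classifier maps input data to a category, here is a minimal sketch of K-Nearest Neighbors from the list above, in plain Python. The toy feature values, the labels and the function name knn_predict are assumptions made for this example and do not come from any particular library.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labeled training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest labeled points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: two made-up numeric features per email.
train_points = [(8, 9), (7, 8), (9, 7), (1, 0), (0, 2), (2, 1)]
train_labels = ["spam", "spam", "spam", "non-spam", "non-spam", "non-spam"]

label_a = knn_predict(train_points, train_labels, (8, 8))
label_b = knn_predict(train_points, train_labels, (1, 1))
print(label_a, label_b)
```

The correctly labeled training set is what makes this supervised: the new observation simply inherits the most common label among its closest known examples.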
Regression (Linear Regression - Linear Model, the output variable takes continuous/numeric values)
Linear Regression is a model that assumes a linear relationship between the input variables (x) and the single output variable (y); in other words, (y) can be calculated from a linear combination of the input variables (x). Simple linear regression is given by the equation y=B0+B1*x, where B0 (the intercept) and B1 (the slope) are coefficients. Training an LR model means estimating the values of the coefficients (B0 and B1) from the available data. To estimate the coefficients we usually use one of the following methods:
  • Ordinary Least Squares (minimize the sum of the squared errors/residuals)
  • Gradient Descent (iteratively minimizing the error of the model on the training data)
  • Regularization (Lasso or Ridge Regression; it combines least squares with a penalty that reduces the complexity of the model)
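The three estimation methods can be sketched for the simple model y=B0+B1*x in plain Python. This is a minimal sketch under assumptions: the function names, the toy data, the learning rate and the penalty value are all chosen for illustration, and the ridge variant here penalizes only the slope B1, not the intercept.

```python
def fit_least_squares(xs, ys, ridge_penalty=0.0):
    """Closed-form estimate of (B0, B1) for y = B0 + B1*x.

    With ridge_penalty=0.0 this is Ordinary Least Squares; a positive value
    adds ridge_penalty * B1**2 to the squared error and shrinks the slope.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / (sxx + ridge_penalty)
    b0 = mean_y - b1 * mean_x
    return b0, b1

def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Estimate (B0, B1) by iteratively reducing the mean squared error."""
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        # Prediction error of the current model on every training point.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to B0 and B1.
        grad_b0 = 2 * sum(errors) / n
        grad_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

# Toy data generated from y = 1 + 2*x with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

ols_b0, ols_b1 = fit_least_squares(xs, ys)
gd_b0, gd_b1 = fit_gradient_descent(xs, ys)
ridge_b0, ridge_b1 = fit_least_squares(xs, ys, ridge_penalty=10.0)
print(ols_b0, ols_b1, gd_b0, gd_b1, ridge_b0, ridge_b1)
```

On this noise-free data the closed form and gradient descent agree on the coefficients, while the ridge penalty pulls the slope toward zero, which is the reduction in model complexity that regularization trades against fit.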
Ranking (Learning To Rank (LTR))
LTR solves a ranking problem on a list of items: it puts them in an optimal order, as a search engine does when ordering results.
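One possible way to learn such an ordering is a pairwise approach: learn a scoring function from pairs of items where one is known to be preferred, then sort by score. The sketch below uses a simple perceptron-style update, which is only one of several LTR techniques and is chosen here for brevity. The item names, feature values and function names are hypothetical.

```python
def score(weights, features):
    """Linear score of one item: weighted sum of its features."""
    return sum(w * f for w, f in zip(weights, features))

def train_pairwise_ranker(preference_pairs, n_features, epochs=20, learning_rate=0.1):
    """Learn weights so that the preferred item of each pair gets the higher score."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in preference_pairs:
            diff = [b - w for b, w in zip(better, worse)]
            # If the pair is ordered wrongly (or tied), nudge the weights toward `better`.
            if score(weights, diff) <= 0:
                weights = [wt + learning_rate * d for wt, d in zip(weights, diff)]
    return weights

# Hypothetical items with two made-up features each.
items = {"a": [0.9, 0.4], "b": [0.5, 0.5], "c": [0.2, 0.9]}
# Training signal: known preferences a > b, b > c, a > c.
pairs = [(items["a"], items["b"]), (items["b"], items["c"]), (items["a"], items["c"])]

weights = train_pairwise_ranker(pairs, n_features=2)
ranking = sorted(items, key=lambda name: score(weights, items[name]), reverse=True)
print(ranking)
```

Because the training signal is a set of preferences rather than class labels or numeric targets, the model is judged only on the order it produces.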

UNSUPERVISED LEARNING
Clustering
(Cluster Analysis - grouping objects/input data into subsets called Clusters)
Objects in a given cluster are more similar (a smaller distance apart) to each other than to those in other clusters/groups.

ML Clustering Algorithms (best known):
  • Centroid Based: 
    • K-means
    • Gaussian Mixture Models (GMM)
    • Fuzzy C-means
  • Connectivity Based:
    • Hierarchical clustering (all variants)
  • Probabilistic:
    • Latent Dirichlet Allocation (LDA)
  • Density Based:
    • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
    • Ordering Points To Identify the Clustering Structure (OPTICS)
  • Dimensionality Reduction:
    • Principal Component Analysis (PCA)
    • Kernel PCA (KPCA)
  • Neural Networks/Deep Learning
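To make the idea of grouping by distance concrete, here is a minimal sketch of K-means from the list above, in plain Python. The toy points, the fixed initial centroids (real implementations usually pick them at random) and the function name k_means are assumptions made for this example.

```python
import math

def k_means(points, initial_centroids, max_iterations=100):
    """Cluster `points` by alternating an assignment step and a centroid update step."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Stop when no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its current members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, assignments = k_means(points, initial_centroids=[(1.0, 1.0), (9.0, 9.0)])
print(centroids, assignments)
```

No labels are supplied anywhere: the cluster indices come only from the distances between the input points.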
Segmentation (e.g. Image Segmentation)
Image Segmentation is the process of dividing an image into multiple segments (sets of pixels) that are simpler to analyze.
Image Segmentation Algorithms:
  • Spectrally Based:
    • Thresholding based
    • Support Vector Machines
  • Spatially Based
    • Morphological
      • Watershed
      • Morphological Profiles
    • Graph Based
      • Optimal Spanning Forest
      • Normalized Cuts
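The simplest entry in the list, thresholding, can be sketched in a few lines of plain Python. The 4x4 grayscale image, the threshold of 128 and the function name threshold_segment are made up for illustration; practical code would usually operate on arrays from an imaging library.

```python
def threshold_segment(image, threshold):
    """Return a binary mask: 1 where the pixel intensity exceeds the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

# Hypothetical grayscale image (intensities 0-255): a bright region on a dark background.
image = [
    [12, 20, 200, 210],
    [15, 18, 220, 205],
    [10, 190, 215, 25],
    [14, 16, 22, 19],
]

mask = threshold_segment(image, threshold=128)
for row in mask:
    print(row)
```

The mask splits the pixels into two segments (bright and dark) using only their intensity values, without looking at where the pixels are located.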
REINFORCEMENT LEARNING (Learns to react to an environment, approximate dynamic programming).
It focuses on online performance and on finding a balance between exploration of uncharted territory and exploitation of current knowledge. The main difference between classical dynamic programming techniques and reinforcement learning algorithms is that the latter do not need an exact model of the Markov Decision Process (MDP), and they target large MDPs where exact methods become infeasible. Reinforcement learning has been applied successfully to various problems:
  • long-term versus short-term reward trade-off
  • robot control
  • telecommunications
  • elevator scheduling
  • the game of Go (AlphaGo)
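The balance between exploration and exploitation, and learning without a model of the MDP, can be illustrated with tabular Q-learning and an epsilon-greedy policy. Q-learning is not named in the text above; it is picked here as one common reinforcement learning algorithm. The corridor environment, the parameter values and all function names are assumptions made for this sketch.

```python
import random

N_STATES = 5  # states 0..4 in a corridor; state 4 is the terminal goal
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)

def step(state, action):
    """Deterministic corridor: move one cell; reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so that an all-zero table does not favor one action.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # The update uses only the observed transition, not a model of the MDP.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = q_learning()
greedy_policy = ["right" if row[RIGHT] > row[LEFT] else "left" for row in q_table[:-1]]
print(greedy_policy)
```

The agent never reads the transition rules inside step; it only observes the resulting states and rewards. The discount factor gamma is what expresses the long-term versus short-term reward trade-off.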