Machine Learning for beginners

I am planning to start my career in data science and want to learn what machine learning models and algorithms are.
Before starting, I did some research, collected a few points, and want to share what I understand so far.
These are the main steps in an analysis that uses machine learning models:

1. Diagnosing the data – before deciding how to work with the data,
it is necessary to analyse the raw data itself: what kinds of measurements it
includes, which kinds of models can be applied to it, and
what the initial goal of the research is.
Try to identify all the metrics that are important to the
business. The metrics we optimise for have a profound
effect on the solution we choose, so it is important to identify
them early on. They also affect which alternatives to
machine learning are worth considering.

2. Data Preparation – merging data, imputing missing values or excluding variables
with too many missing values, sorting data, etc.

3. Model Training – actually training the models and analysing the data.

4. Results Evaluation – an important stage for understanding the results; it makes it
possible to adjust the models and correct the initial research plan.
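To make the diagnosis and preparation steps more concrete, here is a minimal Python sketch using only the standard library. The tiny customer table, the `purchases` lookup, the column names and the 50% missing-value threshold are all made up for illustration; in practice a library such as pandas would usually do this work.

```python
# Minimal sketch of step 1 (diagnosis) and step 2 (preparation).
# All records, column names and the 0.5 threshold are hypothetical.
from statistics import mean

customers = [
    {"id": 3, "age": 45, "income": None, "referral": "web"},
    {"id": 1, "age": 34, "income": 52000, "referral": None},
    {"id": 4, "age": 29, "income": 48000, "referral": None},
    {"id": 2, "age": None, "income": 61000, "referral": None},
]
purchases = {1: 3, 2: 0, 3: 5, 4: 1}  # second source: id -> number of purchases


def missing_ratio(rows, column):
    """Step 1 (diagnosis): fraction of rows where `column` is missing."""
    return sum(row[column] is None for row in rows) / len(rows)


def prepare(rows, extra, max_missing=0.5):
    """Step 2 (preparation): drop sparse columns, impute, merge and sort."""
    columns = [c for c in rows[0] if c != "id"]
    # Exclude variables with too many missing values.
    keep = [c for c in columns if missing_ratio(rows, c) <= max_missing]
    # Mean of the observed values in each kept column, used for imputation.
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in keep}
    prepared = []
    for row in rows:
        clean = {"id": row["id"]}
        for c in keep:
            clean[c] = row[c] if row[c] is not None else means[c]
        clean["purchases"] = extra[row["id"]]  # merge on the shared "id" key
        prepared.append(clean)
    return sorted(prepared, key=lambda r: r["id"])


for column in ("age", "income", "referral"):
    print(column, missing_ratio(customers, column))  # age 0.25, income 0.25, referral 0.75
print(prepare(customers, purchases))
```

Dropping a column and imputing the mean are only two of many possible choices; which one is appropriate depends on what the diagnosis step reveals about the data.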
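Model training and results evaluation can be sketched in the same spirit: hold back part of the data, train on the rest, and measure how well the model does on the held-back part. The points, labels, the every-fourth-row split and the nearest-centroid classifier below are hypothetical stand-ins chosen only because they fit in a few lines.

```python
# Sketch of training on one part of the data and evaluating on a held-out part.
# Data, split rule and the nearest-centroid model are hypothetical stand-ins.
from math import dist

data = [
    ((1.0, 1.2), "A"), ((0.8, 0.9), "A"), ((1.1, 0.7), "A"), ((0.9, 1.1), "A"),
    ((3.0, 3.1), "B"), ((3.2, 2.8), "B"), ((2.9, 3.3), "B"), ((3.1, 2.9), "B"),
]


def train_test_split(rows, test_every=4):
    """Deterministic hold-out split: every `test_every`-th row goes to the test set."""
    train = [r for i, r in enumerate(rows) if i % test_every != test_every - 1]
    test = [r for i, r in enumerate(rows) if i % test_every == test_every - 1]
    return train, test


def fit_centroids(rows):
    """Training: the 'model' is simply the mean point of each class."""
    sums, counts = {}, {}
    for (x, y), label in rows:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab]) for lab, (sx, sy) in sums.items()}


def predict(centroids, point):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda lab: dist(point, centroids[lab]))


def accuracy(centroids, rows):
    """Evaluation: share of held-out rows that are predicted correctly."""
    hits = sum(predict(centroids, p) == label for p, label in rows)
    return hits / len(rows)


train, test = train_test_split(data)
model = fit_centroids(train)
print(accuracy(model, test))  # 1.0
```

The important part is the separation: the accuracy is measured only on rows the model never saw during training, which is what makes the evaluation useful for adjusting the model or the research plan.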


Additionally, it is worth defining and explaining the main types of models one can apply:
1. Supervised learning - these are methods where a given set of independent
variables is matched to one or more dependent variables. In this kind
of analysis the model is given “labeled data”: for each measurement it sees the real value of the
target parameter together with the values of the other
parameters, so it can fit a function that maps one to the other. These can be
regression tasks (predicting continuous values) or classification tasks
(predicting class labels).

Supervised learning is useful in cases where a property (label) is available for a certain dataset (training set), but is missing and needs to be predicted for other instances.

The most widely used supervised learning algorithms include:

1. Decision Trees
2. Naïve Bayes Classification
3. Ordinary Least Squares Regression
4. Logistic Regression
5. Support Vector Machines
6. Ensemble Methods


2. Unsupervised learning - in contrast, unsupervised methods have no prior
“correct” data; the purpose of this kind of analysis is to search for
underlying patterns in the data.

Unsupervised learning is useful when the challenge is to discover implicit relationships in an unlabeled dataset (items are not pre-assigned to classes).

Common unsupervised learning techniques include:

1. Clustering Algorithms
2. Principal Component Analysis
3. Singular Value Decomposition
4. Independent Component Analysis


3. Optimization - techniques for finding the optimal set of parameters that
minimize a pre-defined cost function.
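Ordinary Least Squares Regression from the supervised list is simple enough to write by hand for a single input variable. The sketch below assumes one feature and uses the closed-form solution: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x)², and the intercept is mean y minus slope times mean x. The labeled example data are made up and follow y = 2x + 1 exactly.

```python
# Sketch of ordinary least squares for ONE feature (closed-form solution).
# The data below are hypothetical and lie exactly on y = 2x + 1.


def fit_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


xs = [1, 2, 3, 4, 5]    # independent variable
ys = [3, 5, 7, 9, 11]   # dependent variable (the "label")
intercept, slope = fit_ols(xs, ys)
print((intercept, slope))      # (1.0, 2.0)
print(intercept + slope * 6)   # prediction for a new x = 6: 13.0
```

This is the supervised pattern in miniature: the labels `ys` are known for the training points, and the fitted function predicts the label for a point where it is missing.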
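For the unsupervised side, k-means is one common clustering algorithm. This is a plain sketch of Lloyd's k-means on made-up, unlabeled 2-D points; for simplicity it uses the first k points as the starting centers, whereas real implementations usually pick them randomly or with k-means++.

```python
# Sketch of k-means clustering (Lloyd's algorithm) on hypothetical 2-D points.
# Initial centers = first k points (a simplification, not a recommendation).
from math import dist


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda i: dist(p, centers[i])) for p in points]


def kmeans(points, k, iterations=20):
    centers = list(points[:k])
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # Move the center to the mean of the points assigned to it.
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center in place
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return centers, assign(points, centers)


points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (0.5, 1.5), (9.0, 8.0)]
centers, labels = kmeans(points, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
print(centers)
```

No labels are given to the algorithm; the two groups emerge purely from the distances between points, which is the "underlying pattern" unsupervised learning looks for.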
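The optimization idea, finding parameters that minimize a pre-defined cost function, can be illustrated with gradient descent, one common optimization technique. Here the cost function is assumed to be the mean squared error of a line w * x + b on made-up data; the learning rate and number of steps are illustrative values, not tuned recommendations.

```python
# Sketch of gradient descent minimizing a mean-squared-error cost.
# Data, learning rate and step count are hypothetical choices.


def mse(w, b, xs, ys):
    """Cost function: mean squared error of the line w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.05, steps=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
print(mse(w, b, xs, ys))
```

On this data gradient descent ends up at the same line as the closed-form least-squares solution; its advantage is that the same loop also works for cost functions that have no closed-form solution.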


