A Primer in Machine Learning for Beginners

All you need to know about getting started with ML.

Unnati Shah
6 min read · Jun 5, 2023

The rise of AI has been driven largely by one tool within AI called machine learning. It is the science of getting computers to learn and act as humans do, and to improve their learning over time autonomously, by feeding them data and information in the form of observations and real-world interactions.

Coursera “How Google does Machine Learning”

Coding has been the bread and butter for developers since the dawn of computing. We’re used to creating applications by breaking down requirements into composable problems that can then be coded against. Rules and data go in, and answers come out. Rules are expressed in a programming language, and data can come from a variety of sources, from local variables all the way up to databases. Machine learning rearranges this diagram: we put answers and data in, and we get rules out.

Hubert Wang, Introduction to TensorFlow for AI, ML, and DL

So instead of us as developers figuring out the rules (when should the brick be removed, when should the player’s life end, or what is the desired analytic for any other concept), we collect a bunch of examples of what we want to see and then have the computer figure out the rules. Machine learning is very similar to traditional programming; we are just flipping the axes. Expressing a problem as rules is often not even possible, and trying to do so forces us to compromise.

The new paradigm is that we gather lots and lots of examples, put labels on them, and use the data to say: this is what walking looks like, this is what running looks like, this is what biking looks like and yes, even this is what golfing looks like. Answers and data go in, and the rules are inferred by the machine. A machine learning algorithm figures out the specific patterns in each set of data that make it distinct from the others. That’s what’s so powerful and exciting about this programming paradigm. It’s more than just a new way of doing the same old thing; it opens up possibilities that were infeasible before.
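To make the contrast concrete, here is a minimal sketch in Python. The speeds, labels, and function names are invented for illustration and do not come from the course. The first function is a rule written by hand; the second is a toy learner that derives a rule (a single speed threshold) from labeled examples.

```python
# Traditional programming: the developer writes the rule by hand.
def classify_by_rule(speed_kmh):
    # The threshold 7.0 is a guess made by the programmer.
    return "running" if speed_kmh > 7.0 else "walking"


# Machine learning: labeled examples go in, a rule (here, a threshold) comes out.
def learn_threshold(examples):
    """Pick the speed threshold that misclassifies the fewest examples.

    `examples` is a list of (speed_kmh, label) pairs, where label is
    "walking" or "running". This is a toy learner for illustration only.
    """
    speeds = sorted(speed for speed, _ in examples)
    # Candidate thresholds: midpoints between neighbouring speeds.
    candidates = [(a + b) / 2 for a, b in zip(speeds, speeds[1:])]

    def errors(threshold):
        return sum(
            ("running" if speed > threshold else "walking") != label
            for speed, label in examples
        )

    return min(candidates, key=errors)


# Hypothetical labeled data: (speed in km/h, activity).
labeled_examples = [
    (3.0, "walking"), (4.5, "walking"), (5.5, "walking"),
    (9.0, "running"), (10.5, "running"), (12.0, "running"),
]

learned = learn_threshold(labeled_examples)
print("learned threshold:", learned)
```

Nobody typed the learned threshold into the program; it was inferred from the answers and the data. Real activity recognition uses far richer sensor data and models, but the flow of answers and data in, rules out, is the same.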

The most commonly used type of machine learning learns A-to-B, or input-to-output, mappings. This is called supervised learning. The idea of supervised learning has been around for many decades, but it has really taken off in the last few years. Why is this? In a lot of industries, the amount of data available has grown enormously over the last couple of decades, thanks to the rise of the Internet and of computers. A lot of what used to be recorded on pieces of paper is now stored digitally, so we have been getting more and more data. ML is a way to get predictive insights from that data in order to make repeated decisions.

Training, Validation, and Test Datasets

As a beginner, you will need to know a few important terms: attributes, label, training set, validation set, and test set. The table below will help you understand these terms better.

Fig. 3
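As a rough illustration of these terms, the sketch below builds a small made-up dataset in which each example has attributes (the inputs) and a label (the answer), then splits it into the three sets. The 70/15/15 proportions, the attribute names, and the `split_dataset` helper are assumptions chosen for the example, not a rule from the course.

```python
import random


def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the examples and split them into training, validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]                      # used to fit the model
    validation = shuffled[n_train:n_train + n_val]  # used to compare and tune models
    test = shuffled[n_train + n_val:]               # held back for a final check
    return train, validation, test


# Hypothetical dataset: (attributes, label) pairs.
dataset = [({"height_cm": 150 + i, "weight_kg": 50 + i}, i % 2) for i in range(20)]

train_set, validation_set, test_set = split_dataset(dataset)
print(len(train_set), len(validation_set), len(test_set))
```

The important property is that the three sets do not overlap, so the test set tells you how the model behaves on data it has never seen.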

Two Stages of Machine Learning

Coursera “How Google does Machine Learning”

The first stage of ML is to train an ML model with examples. The form of machine learning we focus on here is supervised learning, where we start from examples. An example consists of a label and an input. Suppose we want to train a machine learning model to look at images and identify what is in them. The true answer is called the label: cat for the first image and dog for the second. The image itself, that is, its pixels, is the input to the model. The model is a mathematical function of a form that can be applied to a wide variety of problems. There are many such functions, but all of the models used in machine learning have a set of adjustable parameters.
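A tiny sketch of what "a mathematical function with adjustable parameters" can mean. The linear form and the names `w` and `b` are assumptions picked for simplicity; an image model has vastly more parameters, but the idea is the same.

```python
def linear_model(x, w, b):
    """A very small model: one input x and two adjustable parameters, w and b."""
    return w * x + b


# The same function gives different outputs depending on its parameters.
print(linear_model(3.0, w=1.0, b=0.0))  # 3.0
print(linear_model(3.0, w=2.0, b=1.0))  # 7.0
```

Training is the process of choosing values for parameters such as `w` and `b`.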

Coursera “How Google does Machine Learning”

When we train a model, we make tiny adjustments to its parameters so that the output of the model, the output of the mathematical function, is as close as possible to the true answer for any given input. Of course, we don’t do this one image at a time. The idea is to adjust the function so that, overall, the outputs of the model for the set of training inputs are as close as possible to the training labels. The key to making a machine learning model generalize is data, and lots and lots of it. Having labeled data is a precondition for successful supervised learning.
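The "tiny adjustments" can be sketched with gradient descent on a two-parameter linear model. This is one common training technique, used here as an assumption; the course text does not name a specific algorithm. The data, learning rate, and step count are made up for the example.

```python
# Hypothetical training data generated from y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]


def mean_squared_error(w, b):
    """Average squared gap between model outputs and training labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(inputs, labels)) / len(inputs)


w, b = 0.0, 0.0          # start with arbitrary parameters
learning_rate = 0.05     # how large each adjustment is
initial_loss = mean_squared_error(w, b)

for step in range(2000):
    # Gradients of the mean squared error with respect to w and b,
    # computed over the whole training set rather than one example.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(inputs, labels)) / len(inputs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(inputs, labels)) / len(inputs)
    # Nudge each parameter a little in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.3f} -> {final_loss:.6f}")
```

After enough small steps, `w` and `b` end up close to the values 2 and 1 that generated the data, and the error over the training set shrinks accordingly.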

It is important to realize that machine learning has two stages: training and inference. Inference is the second stage, in which the trained model is applied to new inputs. People sometimes say inference rather than prediction because prediction seems to imply a future state. In the case of images like this, we’re not really predicting that it’s a cat, just inferring that it’s a cat based on the pixel data.
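The two stages can be sketched as two separate functions. To keep the example short, two made-up numeric features stand in for the pixels of an image, and the model is a simple nearest-centroid classifier; both choices are assumptions for illustration, not how real image models work.

```python
def train(examples):
    """Training stage: compute one average point (centroid) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def infer(model, features):
    """Inference stage: return the label whose centroid is closest to the input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical features: (weight in kg, ear length in cm).
training_examples = [
    ((4.0, 6.5), "cat"), ((3.5, 6.0), "cat"), ((5.0, 7.0), "cat"),
    ((20.0, 10.0), "dog"), ((30.0, 12.0), "dog"), ((25.0, 11.0), "dog"),
]

model = train(training_examples)  # done once, on labeled data
print(infer(model, (4.2, 6.4)))   # new, unlabeled input
print(infer(model, (27.0, 11.5)))
```

Training needs labels and is usually the expensive part; inference needs only the trained model and a new input.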

Now that you have the gist of supervised learning, here is a basic explanation of the different types of learning algorithms you can use.

1. Supervised Learning Algorithms: infer a function from labeled training data

  • Used to predict a target attribute (label)
  • Need datasets with target attribute values
  • Supervised learning models can be further grouped into:
    Classification Models: problems where the output variable is a category, such as red or blue, or disease or no disease.
    Regression Models: problems where the output variable is a real number.

2. Unsupervised Learning Algorithms: find hidden structure in unlabeled data

  • Do not need datasets with target attribute values
  • Used to find patterns in the input data
  • Unsupervised learning problems can be further grouped into:
    Clustering: a clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
    Association: an association rule learning problem is where you want to discover rules that describe large portions of your data, such as people who buy X also tend to buy Y.

3. Semi-supervised Learning Algorithms: some data labeled, lots of data unlabeled

  • Used in cases where some of the input data is labeled and the rest is not.
  • Example: photo archives where only some of the images are labeled.
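The difference between supervised and unsupervised learning can be sketched on the same kind of data. Below, a nearest-neighbour classifier uses labels, while a tiny one-dimensional k-means with two clusters groups values without any labels. The customer spend figures, the label names, and both helper functions are hypothetical and chosen only to keep the example small.

```python
# Supervised (classification): labels are available, so predict the label
# of a new value from its nearest labeled neighbour.
def nearest_neighbour_label(labeled_points, x):
    closest_value, closest_label = min(labeled_points, key=lambda p: abs(p[0] - x))
    return closest_label


# Unsupervised (clustering): no labels, so group the values into two
# clusters with a minimal one-dimensional k-means (k = 2).
def two_means(values, iterations=10):
    centers = [min(values), max(values)]  # simple starting guess
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Hypothetical monthly spend per customer.
spend = [12.0, 15.0, 14.0, 95.0, 102.0, 99.0]
labeled_spend = [
    (12.0, "occasional"), (15.0, "occasional"),
    (95.0, "frequent"), (102.0, "frequent"),
]

print(nearest_neighbour_label(labeled_spend, 90.0))
print(two_means(spend))
```

The classifier can name its answer because the names were given to it as labels. The clustering step only discovers that there are two groups; deciding what to call them is left to a person.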

Machine learning is complex. For newcomers, getting started can be painful without the right resources to learn from. I hope this article helps you understand AI and ML better. Becoming a pro in AI is not far away!

Unnati Shah

Data enthusiast, currently pursuing MS in Computer Science @ USC. For more information visit my website: https://unnatibshah.github.io/