Machine Learning with PyTorch and Scikit-Learn

Sebastian Raschka

Description

Machine Learning with PyTorch and Scikit-Learn is a comprehensive guide to machine learning and deep learning with PyTorch. It acts as both a step-by-step tutorial and a reference you'll keep coming back to as you build your machine learning systems.

Packed with clear explanations, visualizations, and examples, the book covers all the essential machine learning techniques in depth. While some books teach you only to follow instructions, with this machine learning book, we teach the principles allowing you to build models and applications for yourself.

Why PyTorch?

PyTorch is the Pythonic way to learn machine learning, making it easier to learn and simpler to code with. This book explains the essential parts of PyTorch and how to create models using popular libraries, such as PyTorch Lightning and PyTorch Geometric.

You will also learn about generative adversarial networks (GANs) for generating new data and training intelligent agents with reinforcement learning. Finally, this new edition is expanded to cover the latest trends in deep learning, including graph neural networks and large-scale transformers used for natural language processing (NLP).

This PyTorch book is your companion to machine learning with Python, whether you're a Python developer new to machine learning or want to deepen your knowledge of the latest developments.




Machine Learning with PyTorch and Scikit-Learn

Develop machine learning and deep learning models with Python

Sebastian Raschka

Yuxi (Hayden) Liu

Vahid Mirjalili

BIRMINGHAM—MUMBAI

“Python” and the Python Logo are trademarks of the Python Software Foundation.

Machine Learning with PyTorch and Scikit-Learn

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Producer: Tushar Gupta

Acquisition Editor – Peer Reviews: Saby Dsilva

Project Editor: Janice Gonsalves

Content Development Editor: Bhavesh Amin

Copy Editor: Safis Editing

Technical Editor: Aniket Shetty

Proofreader: Safis Editing

Indexer: Tejal Daruwale Soni

Presentation Designer: Pranit Padwal

First published: February 2022

Production reference: 5151122

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-80181-931-2

www.packt.com

Foreword

Over recent years, machine learning methods, with their ability to make sense of vast amounts of data and automate decisions, have found widespread applications in healthcare, robotics, biology, physics, consumer products, internet services, and various other industries.

Giant leaps in science usually come from a combination of powerful ideas and great tools. Machine learning is no exception. The success of data-driven learning methods is based on the ingenious ideas of thousands of talented researchers over the field’s 60-year history. But their recent popularity is also fueled by the evolution of hardware and software solutions that make them scalable and accessible. The ecosystem of excellent libraries for numeric computing, data analysis, and machine learning built around Python, such as NumPy and scikit-learn, gained wide adoption in research and industry. This has greatly helped propel Python to become the most popular programming language.

Massive improvements in computer vision, text, speech, and other tasks brought by the recent advent of deep learning techniques exemplify this theme. These approaches draw on neural network theory developed over the last four decades, which started working remarkably well in combination with GPUs and highly optimized compute routines.

Our goal with building PyTorch over the past five years has been to give researchers the most flexible tool for expressing deep learning algorithms while taking care of the underlying engineering complexities. We benefited from the excellent Python ecosystem. In turn, we’ve been fortunate to see the community of very talented people build advanced deep learning models across various domains on top of PyTorch. The authors of this book were among them.

I’ve known Sebastian within this tight-knit community for a few years now. He has unmatched talent in easily explaining information and making the complex accessible. Sebastian contributed to many widely used machine learning software packages and authored dozens of excellent tutorials on deep learning and data visualization.

Mastery of both ideas and tools is also required to apply machine learning in practice. Getting started might feel intimidating, from making sense of theoretical concepts to figuring out which software packages to install.

Luckily, the book you’re holding in your hands does a beautiful job of combining machine learning concepts and practical engineering steps to guide you in this journey. You’re in for a delightful ride from the basics of data-driven techniques to the most novel deep learning architectures. Within every chapter, you will find concrete code examples applying the introduced methods to a practical task.

When the first edition came out in 2015, it set a very high bar for the ML and Python book category. But the excellence didn’t stop there. With every edition, Sebastian and the team kept upgrading and refining the material as the deep learning revolution unfolded in new domains. In this new PyTorch edition, you’ll find new chapters on transformer architectures and graph neural networks. These approaches are on the cutting edge of deep learning and have taken the fields of text understanding and molecular structure by storm in the last two years. You will get to practice them using new yet widely popular software packages in the ecosystem like Hugging Face, PyTorch Lightning, and PyTorch Geometric.

The excellent balance of theory and practice this book strikes is no surprise given the authors’ combination of advanced research expertise and experience in solving problems hands-on. Sebastian Raschka and Vahid Mirjalili draw from their background in deep learning research for computer vision and computational biology. Hayden Liu brings the experience of applying machine learning methods to event prediction, recommendation systems, and other tasks in the industry. All of the authors share a deep passion for education, and that passion is reflected in the approachable way the book progresses from simple to advanced topics.

I’m confident that you will find this book invaluable both as a broad overview of the exciting field of machine learning and as a treasure of practical insights. I hope it inspires you to apply machine learning for the greater good in your problem area, whatever it might be.

Dmytro Dzhulgakov

PyTorch Core Maintainer

Contributors

About the authors

Dr. Sebastian Raschka is an Asst. Professor of Statistics at the University of Wisconsin-Madison focusing on machine learning and deep learning. His recent research focused on general challenges such as few-shot learning for working with limited data and developing deep neural networks for ordinal targets. Sebastian is also an avid open-source contributor, and in his new role as Lead AI Educator at Grid.ai, he plans to follow his passion for helping people to get into machine learning and AI.

Big thanks to Jitian Zhao and Ben Kaufman, with whom I had the pleasure to work on the new chapters on transformers and graph neural networks. I’m also very grateful for Hayden’s and Vahid’s help—this book wouldn’t have been possible without you. Lastly, I want to thank Andrea Panizza, Tony Gitter, and Adam Bielski for helpful discussions on sections of the manuscript.

Yuxi (Hayden) Liu is a machine learning software engineer at Google and has worked as a machine learning scientist in a variety of data-driven domains. Hayden is the author of a series of ML books. His first book, Python Machine Learning By Example, was ranked the #1 bestseller in its category on Amazon in 2017 and 2018 and was translated into many languages. His other books include R Deep Learning Projects, Hands-On Deep Learning Architectures with Python, and PyTorch 1.x Reinforcement Learning Cookbook.

I would like to thank all the great people I worked with, especially my co-authors, my editors at Packt, and my reviewers. Without them, this book would be harder to read and to apply to real-world problems. Lastly, I’d like to thank all the readers for their support, which encouraged me to write the PyTorch edition of this bestselling ML book.

Dr. Vahid Mirjalili is a deep learning researcher focusing on computer vision applications. Vahid received a Ph.D. degree in both Mechanical Engineering and Computer Science from Michigan State University. During his Ph.D. journey, he developed novel computer vision algorithms to solve real-world problems and published several research articles that are highly cited in the computer vision community.

Other contributors

Benjamin Kaufman is a Ph.D. candidate at the University of Wisconsin-Madison in Biomedical Data Science. His research focuses on the development and application of machine learning methods for drug discovery. His work in this area has provided a deeper understanding of graph neural networks.

Jitian Zhao is a Ph.D. student at the University of Wisconsin-Madison, where she developed her interest in large-scale language models. She is passionate about deep learning, both for developing real-world applications and for its theoretical foundations.

I would like to thank my parents for their support. They encouraged me to always pursue my dream and motivated me to be a good person.

About the reviewer

Roman Tezikov is an industrial research engineer and deep learning enthusiast with over four years of experience in advanced computer vision, NLP, and MLOps. As the co-creator of the ML-REPA community, he organized several workshops and meetups about ML reproducibility and pipeline automation. One of his current work challenges involves utilizing computer vision in the fashion industry. Roman was also a core developer of Catalyst – a PyTorch framework for accelerated deep learning.

Join our book’s Discord space

Join our Discord community to meet like-minded people and learn alongside more than 2000 members at:

https://packt.link/MLwPyTorch

Contents

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Share your thoughts

Giving Computers the Ability to Learn from Data

Building intelligent machines to transform data into knowledge

The three different types of machine learning

Making predictions about the future with supervised learning

Classification for predicting class labels

Regression for predicting continuous outcomes

Solving interactive problems with reinforcement learning

Discovering hidden structures with unsupervised learning

Finding subgroups with clustering

Dimensionality reduction for data compression

Introduction to the basic terminology and notations

Notation and conventions used in this book

Machine learning terminology

A roadmap for building machine learning systems

Preprocessing – getting data into shape

Training and selecting a predictive model

Evaluating models and predicting unseen data instances

Using Python for machine learning

Installing Python and packages from the Python Package Index

Using the Anaconda Python distribution and package manager

Packages for scientific computing, data science, and machine learning

Summary

Training Simple Machine Learning Algorithms for Classification

Artificial neurons – a brief glimpse into the early history of machine learning

The formal definition of an artificial neuron

The perceptron learning rule

Implementing a perceptron learning algorithm in Python

An object-oriented perceptron API

Training a perceptron model on the Iris dataset

Adaptive linear neurons and the convergence of learning

Minimizing loss functions with gradient descent

Implementing Adaline in Python

Improving gradient descent through feature scaling

Large-scale machine learning and stochastic gradient descent

Summary

A Tour of Machine Learning Classifiers Using Scikit-Learn

Choosing a classification algorithm

First steps with scikit-learn – training a perceptron

Modeling class probabilities via logistic regression

Logistic regression and conditional probabilities

Learning the model weights via the logistic loss function

Converting an Adaline implementation into an algorithm for logistic regression

Training a logistic regression model with scikit-learn

Tackling overfitting via regularization

Maximum margin classification with support vector machines

Maximum margin intuition

Dealing with a nonlinearly separable case using slack variables

Alternative implementations in scikit-learn

Solving nonlinear problems using a kernel SVM

Kernel methods for linearly inseparable data

Using the kernel trick to find separating hyperplanes in a high-dimensional space

Decision tree learning

Maximizing IG – getting the most bang for your buck

Building a decision tree

Combining multiple decision trees via random forests

K-nearest neighbors – a lazy learning algorithm

Summary

Building Good Training Datasets – Data Preprocessing

Dealing with missing data

Identifying missing values in tabular data

Eliminating training examples or features with missing values

Imputing missing values

Understanding the scikit-learn estimator API

Handling categorical data

Categorical data encoding with pandas

Mapping ordinal features

Encoding class labels

Performing one-hot encoding on nominal features

Optional: encoding ordinal features

Partitioning a dataset into separate training and test datasets

Bringing features onto the same scale

Selecting meaningful features

L1 and L2 regularization as penalties against model complexity

A geometric interpretation of L2 regularization

Sparse solutions with L1 regularization

Sequential feature selection algorithms

Assessing feature importance with random forests

Summary

Compressing Data via Dimensionality Reduction

Unsupervised dimensionality reduction via principal component analysis

The main steps in principal component analysis

Extracting the principal components step by step

Total and explained variance

Feature transformation

Principal component analysis in scikit-learn

Assessing feature contributions

Supervised data compression via linear discriminant analysis

Principal component analysis versus linear discriminant analysis

The inner workings of linear discriminant analysis

Computing the scatter matrices

Selecting linear discriminants for the new feature subspace

Projecting examples onto the new feature space

LDA via scikit-learn

Nonlinear dimensionality reduction and visualization

Why consider nonlinear dimensionality reduction?

Visualizing data via t-distributed stochastic neighbor embedding

Summary

Learning Best Practices for Model Evaluation and Hyperparameter Tuning

Streamlining workflows with pipelines

Loading the Breast Cancer Wisconsin dataset

Combining transformers and estimators in a pipeline

Using k-fold cross-validation to assess model performance

The holdout method

K-fold cross-validation

Debugging algorithms with learning and validation curves

Diagnosing bias and variance problems with learning curves

Addressing over- and underfitting with validation curves

Fine-tuning machine learning models via grid search

Tuning hyperparameters via grid search

Exploring hyperparameter configurations more widely with randomized search

More resource-efficient hyperparameter search with successive halving

Algorithm selection with nested cross-validation

Looking at different performance evaluation metrics

Reading a confusion matrix

Optimizing the precision and recall of a classification model

Plotting a receiver operating characteristic

Scoring metrics for multiclass classification

Dealing with class imbalance

Summary

Combining Different Models for Ensemble Learning

Learning with ensembles

Combining classifiers via majority vote

Implementing a simple majority vote classifier

Using the majority voting principle to make predictions

Evaluating and tuning the ensemble classifier

Bagging – building an ensemble of classifiers from bootstrap samples

Bagging in a nutshell

Applying bagging to classify examples in the Wine dataset

Leveraging weak learners via adaptive boosting

How adaptive boosting works

Applying AdaBoost using scikit-learn

Gradient boosting – training an ensemble based on loss gradients

Comparing AdaBoost with gradient boosting

Outlining the general gradient boosting algorithm

Explaining the gradient boosting algorithm for classification

Illustrating gradient boosting for classification

Using XGBoost

Summary

Applying Machine Learning to Sentiment Analysis

Preparing the IMDb movie review data for text processing

Obtaining the movie review dataset

Preprocessing the movie dataset into a more convenient format

Introducing the bag-of-words model

Transforming words into feature vectors

Assessing word relevancy via term frequency-inverse document frequency

Cleaning text data

Processing documents into tokens

Training a logistic regression model for document classification

Working with bigger data – online algorithms and out-of-core learning

Topic modeling with latent Dirichlet allocation

Decomposing text documents with LDA

LDA with scikit-learn

Summary

Predicting Continuous Target Variables with Regression Analysis

Introducing linear regression

Simple linear regression

Multiple linear regression

Exploring the Ames Housing dataset

Loading the Ames Housing dataset into a DataFrame

Visualizing the important characteristics of a dataset

Looking at relationships using a correlation matrix

Implementing an ordinary least squares linear regression model

Solving regression for regression parameters with gradient descent

Estimating the coefficient of a regression model via scikit-learn

Fitting a robust regression model using RANSAC

Evaluating the performance of linear regression models

Using regularized methods for regression

Turning a linear regression model into a curve – polynomial regression

Adding polynomial terms using scikit-learn

Modeling nonlinear relationships in the Ames Housing dataset

Dealing with nonlinear relationships using random forests

Decision tree regression

Random forest regression

Summary

Working with Unlabeled Data – Clustering Analysis

Grouping objects by similarity using k-means

k-means clustering using scikit-learn

A smarter way of placing the initial cluster centroids using k-means++

Hard versus soft clustering

Using the elbow method to find the optimal number of clusters

Quantifying the quality of clustering via silhouette plots

Organizing clusters as a hierarchical tree

Grouping clusters in a bottom-up fashion

Performing hierarchical clustering on a distance matrix

Attaching dendrograms to a heat map

Applying agglomerative clustering via scikit-learn

Locating regions of high density via DBSCAN

Summary

Implementing a Multilayer Artificial Neural Network from Scratch

Modeling complex functions with artificial neural networks

Single-layer neural network recap

Introducing the multilayer neural network architecture

Activating a neural network via forward propagation

Classifying handwritten digits

Obtaining and preparing the MNIST dataset

Implementing a multilayer perceptron

Coding the neural network training loop

Evaluating the neural network performance

Training an artificial neural network

Computing the loss function

Developing your understanding of backpropagation

Training neural networks via backpropagation

About convergence in neural networks

A few last words about the neural network implementation

Summary

Parallelizing Neural Network Training with PyTorch

PyTorch and training performance

Performance challenges

What is PyTorch?

How we will learn PyTorch

First steps with PyTorch

Installing PyTorch

Creating tensors in PyTorch

Manipulating the data type and shape of a tensor

Applying mathematical operations to tensors

Split, stack, and concatenate tensors

Building input pipelines in PyTorch

Creating a PyTorch DataLoader from existing tensors

Combining two tensors into a joint dataset

Shuffle, batch, and repeat

Creating a dataset from files on your local storage disk

Fetching available datasets from the torchvision.datasets library

Building an NN model in PyTorch

The PyTorch neural network module (torch.nn)

Building a linear regression model

Model training via the torch.nn and torch.optim modules

Building a multilayer perceptron for classifying flowers in the Iris dataset

Evaluating the trained model on the test dataset

Saving and reloading the trained model

Choosing activation functions for multilayer neural networks

Logistic function recap

Estimating class probabilities in multiclass classification via the softmax function

Broadening the output spectrum using a hyperbolic tangent

Rectified linear unit activation

Summary

Going Deeper – The Mechanics of PyTorch

The key features of PyTorch

PyTorch’s computation graphs

Understanding computation graphs

Creating a graph in PyTorch

PyTorch tensor objects for storing and updating model parameters

Computing gradients via automatic differentiation

Computing the gradients of the loss with respect to trainable variables

Understanding automatic differentiation

Adversarial examples

Simplifying implementations of common architectures via the torch.nn module

Implementing models based on nn.Sequential

Choosing a loss function

Solving an XOR classification problem

Making model building more flexible with nn.Module

Writing custom layers in PyTorch

Project one – predicting the fuel efficiency of a car

Working with feature columns

Training a DNN regression model

Project two – classifying MNIST handwritten digits

Higher-level PyTorch APIs: a short introduction to PyTorch-Lightning

Setting up the PyTorch Lightning model

Setting up the data loaders for Lightning

Training the model using the PyTorch Lightning Trainer class

Evaluating the model using TensorBoard

Summary

Classifying Images with Deep Convolutional Neural Networks

The building blocks of CNNs

Understanding CNNs and feature hierarchies

Performing discrete convolutions

Discrete convolutions in one dimension

Padding inputs to control the size of the output feature maps

Determining the size of the convolution output

Performing a discrete convolution in 2D

Subsampling layers

Putting everything together – implementing a CNN

Working with multiple input or color channels

Regularizing an NN with L2 regularization and dropout

Loss functions for classification

Implementing a deep CNN using PyTorch

The multilayer CNN architecture

Loading and preprocessing the data

Implementing a CNN using the torch.nn module

Configuring CNN layers in PyTorch

Constructing a CNN in PyTorch

Smile classification from face images using a CNN

Loading the CelebA dataset

Image transformation and data augmentation

Training a CNN smile classifier

Summary

Modeling Sequential Data Using Recurrent Neural Networks

Introducing sequential data

Modeling sequential data – order matters

Sequential data versus time series data

Representing sequences

The different categories of sequence modeling

RNNs for modeling sequences

Understanding the dataflow in RNNs

Computing activations in an RNN

Hidden recurrence versus output recurrence

The challenges of learning long-range interactions

Long short-term memory cells

Implementing RNNs for sequence modeling in PyTorch

Project one – predicting the sentiment of IMDb movie reviews

Preparing the movie review data

Embedding layers for sentence encoding

Building an RNN model

Building an RNN model for the sentiment analysis task

Project two – character-level language modeling in PyTorch

Preprocessing the dataset

Building a character-level RNN model

Evaluation phase – generating new text passages

Summary

Transformers – Improving Natural Language Processing with Attention Mechanisms

Adding an attention mechanism to RNNs

Attention helps RNNs with accessing information

The original attention mechanism for RNNs

Processing the inputs using a bidirectional RNN

Generating outputs from context vectors

Computing the attention weights

Introducing the self-attention mechanism

Starting with a basic form of self-attention

Parameterizing the self-attention mechanism: scaled dot-product attention

Attention is all we need: introducing the original transformer architecture

Encoding context embeddings via multi-head attention

Learning a language model: decoder and masked multi-head attention

Implementation details: positional encodings and layer normalization

Building large-scale language models by leveraging unlabeled data

Pre-training and fine-tuning transformer models

Leveraging unlabeled data with GPT

Using GPT-2 to generate new text

Bidirectional pre-training with BERT

The best of both worlds: BART

Fine-tuning a BERT model in PyTorch

Loading the IMDb movie review dataset

Tokenizing the dataset

Loading and fine-tuning a pre-trained BERT model

Fine-tuning a transformer more conveniently using the Trainer API

Summary

Generative Adversarial Networks for Synthesizing New Data

Introducing generative adversarial networks

Starting with autoencoders

Generative models for synthesizing new data

Generating new samples with GANs

Understanding the loss functions of the generator and discriminator networks in a GAN model

Implementing a GAN from scratch

Training GAN models on Google Colab

Implementing the generator and the discriminator networks

Defining the training dataset

Training the GAN model

Improving the quality of synthesized images using a convolutional and Wasserstein GAN

Transposed convolution

Batch normalization

Implementing the generator and discriminator

Dissimilarity measures between two distributions

Using EM distance in practice for GANs

Gradient penalty

Implementing WGAN-GP to train the DCGAN model

Mode collapse

Other GAN applications

Summary

Graph Neural Networks for Capturing Dependencies in Graph Structured Data

Introduction to graph data

Undirected graphs

Directed graphs

Labeled graphs

Representing molecules as graphs

Understanding graph convolutions

The motivation behind using graph convolutions

Implementing a basic graph convolution

Implementing a GNN in PyTorch from scratch

Defining the NodeNetwork model

Coding the NodeNetwork’s graph convolution layer

Adding a global pooling layer to deal with varying graph sizes

Preparing the DataLoader

Using the NodeNetwork to make predictions

Implementing a GNN using the PyTorch Geometric library

Other GNN layers and recent developments

Spectral graph convolutions

Pooling

Normalization

Pointers to advanced graph neural network literature

Summary

Reinforcement Learning for Decision Making in Complex Environments

Introduction – learning from experience

Understanding reinforcement learning

Defining the agent-environment interface of a reinforcement learning system

The theoretical foundations of RL

Markov decision processes

The mathematical formulation of Markov decision processes

Visualization of a Markov process

Episodic versus continuing tasks

RL terminology: return, policy, and value function

The return

Policy

Value function

Dynamic programming using the Bellman equation

Reinforcement learning algorithms

Dynamic programming

Policy evaluation – predicting the value function with dynamic programming

Improving the policy using the estimated value function

Policy iteration

Value iteration

Reinforcement learning with Monte Carlo

State-value function estimation using MC

Action-value function estimation using MC

Finding an optimal policy using MC control

Policy improvement – computing the greedy policy from the action-value function

Temporal difference learning

TD prediction

On-policy TD control (SARSA)

Off-policy TD control (Q-learning)

Implementing our first RL algorithm

Introducing the OpenAI Gym toolkit

Working with the existing environments in OpenAI Gym

A grid world example

Implementing the grid world environment in OpenAI Gym

Solving the grid world problem with Q-learning

A glance at deep Q-learning

Training a DQN model according to the Q-learning algorithm

Replay memory

Determining the target values for computing the loss

Implementing a deep Q-learning algorithm

Chapter and book summary

Other Books You May Enjoy

Index


Share your thoughts

Once you’ve read Machine Learning with PyTorch and Scikit-Learn, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there; you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781801819312

Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly

1

Giving Computers the Ability to Learn from Data

In my opinion, machine learning, the application and science of algorithms that make sense of data, is the most exciting field of all the computer sciences! We are living in an age where data comes in abundance; using self-learning algorithms from the field of machine learning, we can turn this data into knowledge. Thanks to the many powerful open-source libraries that have been developed in recent years, there has probably never been a better time to break into the machine learning field and learn how to utilize powerful algorithms to spot patterns in data and make predictions about future events.

In this chapter, you will learn about the main concepts and different types of machine learning. Together with a basic introduction to the relevant terminology, we will lay the groundwork for successfully using machine learning techniques for practical problem solving.

In this chapter, we will cover the following topics:

The general concepts of machine learning

The three types of learning and basic terminology

The building blocks for successfully designing machine learning systems

Installing and setting up Python for data analysis and machine learning

Building intelligent machines to transform data into knowledge

In this age of modern technology, there is one resource that we have in abundance: a large amount of structured and unstructured data. In the second half of the 20th century, machine learning evolved as a subfield of artificial intelligence (AI) involving self-learning algorithms that derive knowledge from data to make predictions.

Instead of requiring humans to manually derive rules and build models from analyzing large amounts of data, machine learning offers a more efficient alternative for capturing the knowledge in data to gradually improve the performance of predictive models and make data-driven decisions.

Not only is machine learning becoming increasingly important in computer science research, but it is also playing an ever-greater role in our everyday lives. Thanks to machine learning, we enjoy robust email spam filters, convenient text and voice recognition software, reliable web search engines, recommendations on entertaining movies to watch, mobile check deposits, estimated meal delivery times, and much more. Hopefully, soon, we will add safe and efficient self-driving cars to this list.

Also, notable progress has been made in medical applications; for example, researchers demonstrated that deep learning models can detect skin cancer with near-human accuracy (https://www.nature.com/articles/nature21056). Another milestone was recently achieved by researchers at DeepMind, who used deep learning to predict 3D protein structures, outperforming physics-based approaches by a substantial margin (https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology). While accurate 3D protein structure prediction plays an essential role in biological and pharmaceutical research, there have been many other important applications of machine learning in healthcare recently. For instance, researchers designed systems for predicting the oxygen needs of COVID-19 patients up to four days in advance to help hospitals allocate resources for those in need (https://ai.facebook.com/blog/new-ai-research-to-help-predict-covid-19-resource-needs-from-a-series-of-x-rays/).

Another important topic of our day and age is climate change, which presents one of the biggest and most critical challenges. Today, many efforts are being directed toward developing intelligent systems to combat it (https://www.forbes.com/sites/robtoews/2021/06/20/these-are-the-startups-applying-ai-to-tackle-climate-change). One of the many approaches to tackling climate change is the emergent field of precision agriculture. Here, researchers aim to design computer vision-based machine learning systems to optimize resource deployment to minimize the use and waste of fertilizers.

The three different types of machine learning

In this section, we will take a look at the three types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. We will learn about the fundamental differences between the three different learning types and, using conceptual examples, we will develop an understanding of the practical problem domains where they can be applied:

Figure 1.1: The three different types of machine learning

Making predictions about the future with supervised learning

The main goal in supervised learning is to learn a model from labeled training data that allows us to make predictions about unseen or future data. Here, the term “supervised” refers to a set of training examples (data inputs) where the desired output signals (labels) are already known. Supervised learning is then the process of modeling the relationship between the data inputs and the labels. Thus, we can also think of supervised learning as “label learning.”

Figure 1.2 summarizes a typical supervised learning workflow, where the labeled training data is passed to a machine learning algorithm for fitting a predictive model that can make predictions on new, unlabeled data inputs:

Figure 1.2: Supervised learning process

Considering the example of email spam filtering, we can train a model using a supervised machine learning algorithm on a corpus of labeled emails, which are correctly marked as spam or non-spam, to predict whether a new email belongs to either of the two categories. A supervised learning task with discrete class labels, such as in the previous email spam filtering example, is also called a classification task. Another subcategory of supervised learning is regression, where the outcome signal is a continuous value.

Classification for predicting class labels

Classification is a subcategory of supervised learning where the goal is to predict the categorical class labels of new instances or data points based on past observations. Those class labels are discrete, unordered values that can be understood as the group memberships of the data points. The previously mentioned example of email spam detection represents a typical example of a binary classification task, where the machine learning algorithm learns a set of rules to distinguish between two possible classes: spam and non-spam emails.

Figure 1.3 illustrates the concept of a binary classification task given 30 training examples; 15 training examples are labeled as class A and 15 training examples are labeled as class B. In this scenario, our dataset is two-dimensional, which means that each example has two values associated with it: x1 and x2. Now, we can use a supervised machine learning algorithm to learn a rule—the decision boundary represented as a dashed line—that can separate those two classes and classify new data into each of those two categories given its x1 and x2 values:

Figure 1.3: Classifying a new data point

However, the set of class labels does not have to be of a binary nature. The predictive model learned by a supervised learning algorithm can assign any class label that was presented in the training dataset to a new, unlabeled data point or instance.

A typical example of a multiclass classification task is handwritten character recognition. We can collect a training dataset that consists of multiple handwritten examples of each letter in the alphabet. The letters (“A,” “B,” “C,” and so on) will represent the different unordered categories or class labels that we want to predict. Now, if a user provides a new handwritten character via an input device, our predictive model will be able to predict the correct letter in the alphabet with certain accuracy. However, our machine learning system will be unable to correctly recognize any of the digits between 0 and 9, for example, if they were not part of the training dataset.
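
To make this a bit more concrete, the following is a minimal sketch of training a binary classifier with scikit-learn. The tiny two-dimensional dataset and the choice of logistic regression are assumptions made purely for illustration; later chapters cover classification algorithms in detail.

```python
# A minimal classification sketch on made-up 2D data (features x1, x2).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Six labeled training examples: three from class A (0), three from class B (1)
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],
                    [6.0, 7.0], [6.5, 6.8], [7.0, 7.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X_train, y_train)                # learn a decision boundary from the labels

new_point = np.array([[2.5, 2.0]])       # a new, unlabeled data point
print(clf.predict(new_point))            # predicted class label, here [0]
print(clf.predict_proba(new_point))      # class-membership probabilities
```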

Regression for predicting continuous outcomes

We learned in the previous section that the task of classification is to assign categorical, unordered labels to instances. A second type of supervised learning is the prediction of continuous outcomes, which is also called regression analysis. In regression analysis, we are given a number of predictor (explanatory) variables and a continuous response variable (outcome), and we try to find a relationship between those variables that allows us to predict an outcome.

Note that in the field of machine learning, the predictor variables are commonly called “features,” and the response variables are usually referred to as “target variables.” We will adopt these conventions throughout this book.

For example, let’s assume that we are interested in predicting the math SAT scores of students. (The SAT is a standardized test frequently used for college admissions in the United States.) If there is a relationship between the time spent studying for the test and the final scores, we could use it as training data to learn a model that uses the study time to predict the test scores of future students who are planning to take this test.

Regression toward the mean

The term “regression” was devised by Francis Galton in his article Regression towards Mediocrity in Hereditary Stature in 1886. Galton described the biological phenomenon that the variance of height in a population does not increase over time.

He observed that the height of parents is not passed on to their children, but instead, their children’s height regresses toward the population mean.

Figure 1.4 illustrates the concept of linear regression. Given a feature variable, x, and a target variable, y, we fit a straight line to this data that minimizes the distance—most commonly the average squared distance—between the data points and the fitted line.

We can now use the intercept and slope learned from this data to predict the target variable of new data:

Figure 1.4: A linear regression example
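
As a small illustration of this idea, the sketch below fits a straight line with scikit-learn; the study-time/score pairs are invented for this example and are not real SAT data.

```python
# A minimal linear regression sketch with made-up study-time/score pairs.
import numpy as np
from sklearn.linear_model import LinearRegression

study_hours = np.array([[2.0], [4.0], [6.0], [8.0], [10.0]])  # feature x
test_scores = np.array([480.0, 540.0, 610.0, 660.0, 720.0])   # target y

reg = LinearRegression().fit(study_hours, test_scores)
print(reg.intercept_, reg.coef_[0])             # learned intercept and slope
print(reg.predict(np.array([[7.0]])))           # predicted score for 7 hours of study
```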

Solving interactive problems with reinforcement learning

Another type of machine learning is reinforcement learning. In reinforcement learning, the goal is to develop a system (agent) that improves its performance based on interactions with the environment. Since the information about the current state of the environment typically also includes a so-called reward signal, we can think of reinforcement learning as a field related to supervised learning. However, in reinforcement learning, this feedback is not the correct ground truth label or value, but a measure of how good the action was, as measured by a reward function. Through its interaction with the environment, an agent can then use reinforcement learning to learn a series of actions that maximizes this reward via an exploratory trial-and-error approach or deliberative planning.

A popular example of reinforcement learning is a chess program. Here, the agent decides upon a series of moves depending on the state of the board (the environment), and the reward can be defined as win or lose at the end of the game:

Figure 1.5: Reinforcement learning process

There are many different subtypes of reinforcement learning. However, a general scheme is that the agent in reinforcement learning tries to maximize the reward through a series of interactions with the environment. Each state can be associated with a positive or negative reward, and a reward can be defined as accomplishing an overall goal, such as winning or losing a game of chess. For instance, in chess, the outcome of each move can be thought of as a different state of the environment.

To explore the chess example further, let’s think of visiting certain configurations on the chessboard as being associated with states that will more likely lead to winning—for instance, removing an opponent’s chess piece from the board or threatening the queen. Other positions, however, are associated with states that will more likely result in losing the game, such as losing a chess piece to the opponent in the following turn. Now, in the game of chess, the reward (either positive for winning or negative for losing the game) will not be given until the end of the game. In addition, the final reward will also depend on how the opponent plays. For example, the opponent may sacrifice the queen but eventually win the game.

In sum, reinforcement learning is concerned with learning to choose a series of actions that maximizes the total reward, which could be earned either immediately after taking an action or via delayed feedback.
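
The following toy sketch illustrates the agent-environment loop described above. The environment, the reward values, and the purely random (trial-and-error) policy are all invented for illustration; the book implements real reinforcement learning algorithms with OpenAI Gym in the final chapter.

```python
# A schematic agent-environment interaction loop on an invented toy task.
import random

class ToyCorridor:
    """A 1D corridor: the agent starts at position 0; reaching position 5 gives a reward."""
    def __init__(self):
        self.position = 0

    def step(self, action):                       # action is -1 (left) or +1 (right)
        self.position = max(0, self.position + action)
        reward = 1.0 if self.position == 5 else 0.0
        done = self.position == 5
        return self.position, reward, done

env = ToyCorridor()
total_reward, done = 0.0, False
for t in range(10_000):                           # cap the episode length
    action = random.choice([-1, 1])               # trial-and-error (random) policy
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break

print(f"Episode finished after {t + 1} steps with total reward {total_reward}")
```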

Discovering hidden structures with unsupervised learning

In supervised learning, we know the right answer (the label or target variable) beforehand when we train a model, and in reinforcement learning, we define a measure of reward for particular actions carried out by the agent. In unsupervised learning, however, we are dealing with unlabeled data or data of an unknown structure. Using unsupervised learning techniques, we are able to explore the structure of our data to extract meaningful information without the guidance of a known outcome variable or reward function.

Finding subgroups with clustering

Clustering is an exploratory data analysis or pattern discovery technique that allows us to organize a pile of information into meaningful subgroups (clusters) without having any prior knowledge of their group memberships. Each cluster that arises during the analysis defines a group of objects that share a certain degree of similarity but are more dissimilar to objects in other clusters, which is why clustering is also sometimes called unsupervised classification. Clustering is a great technique for structuring information and deriving meaningful relationships from data. For example, it allows marketers to discover customer groups based on their interests, in order to develop distinct marketing programs.

Figure 1.6 illustrates how clustering can be applied to organizing unlabeled data into three distinct groups or clusters (A, B, and C, in arbitrary order) based on the similarity of their features, x1 and x2:

Figure 1.6: How clustering works
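
As a quick illustration, the sketch below groups a handful of made-up 2D points into three clusters with k-means (covered in detail in the clustering chapter); the resulting cluster labels 0, 1, and 2 are arbitrary, just like the A, B, and C labels in Figure 1.6.

```python
# A minimal k-means clustering sketch on made-up 2D data.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [1.2, 0.9],     # one group of nearby points
              [5.0, 5.2], [5.1, 4.8],     # a second group
              [9.0, 0.9], [9.2, 1.1]])    # a third group

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                     # cluster assignment for each point
print(kmeans.cluster_centers_)            # learned cluster centroids
```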

Dimensionality reduction for data compression

Another subfield of unsupervised learning is dimensionality reduction. Often, we are working with data of high dimensionality—each observation comes with a high number of measurements—that can present a challenge for limited storage space and the computational performance of machine learning algorithms. Unsupervised dimensionality reduction is a commonly used approach in feature preprocessing to remove noise from data, which can degrade the predictive performance of certain algorithms. Dimensionality reduction compresses the data onto a smaller dimensional subspace while retaining most of the relevant information.

Sometimes, dimensionality reduction can also be useful for visualizing data; for example, a high-dimensional feature set can be projected onto one-, two-, or three-dimensional feature spaces to visualize it via 2D or 3D scatterplots or histograms. Figure 1.7 shows an example where nonlinear dimensionality reduction was applied to compress a 3D Swiss roll onto a new 2D feature subspace:

Figure 1.7: An example of dimensionality reduction from three to two dimensions
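
As a simple illustration of the general idea, the sketch below projects random three-dimensional points onto a two-dimensional subspace with PCA. Note that PCA is a linear technique, whereas unrolling the Swiss roll in Figure 1.7 requires a nonlinear method; both are covered in the dimensionality reduction chapter.

```python
# A minimal dimensionality-reduction sketch: 3D -> 2D with (linear) PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X_3d = rng.normal(size=(100, 3))          # 100 examples with 3 features each

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_3d)            # compressed 2D representation
print(X_2d.shape)                         # (100, 2)
print(pca.explained_variance_ratio_)      # variance retained by each component
```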

Introduction to the basic terminology and notations

Now that we have discussed the three broad categories of machine learning—supervised, unsupervised, and reinforcement learning—let’s have a look at the basic terminology that we will be using throughout this book. The following subsection covers the common terms we will be using when referring to different aspects of a dataset, as well as the mathematical notation to communicate more precisely and efficiently.

As machine learning is a vast field and very interdisciplinary, you are guaranteed to encounter many different terms that refer to the same concepts sooner rather than later. The second subsection collects many of the most commonly used terms that are found in machine learning literature, which may be useful to you as a reference section when reading machine learning publications.

Notation and conventions used in this book

Figure 1.8 depicts an excerpt of the Iris dataset, which is a classic example in the field of machine learning (more information can be found at https://archive.ics.uci.edu/ml/datasets/iris). The Iris dataset contains the measurements of 150 Iris flowers from three different species—Setosa, Versicolor, and Virginica.

Here, each flower example represents one row in our dataset, and the flower measurements in centimeters are stored as columns, which we also call the features of the dataset:

Figure 1.8: The Iris dataset

To keep the notation and implementation simple yet efficient, we will make use of some of the basics of linear algebra. In the following chapters, we will use a matrix notation to refer to our data. We will follow the common convention to represent each example as a separate row in a feature matrix, X, where each feature is stored as a separate column.

The Iris dataset, consisting of 150 examples and four features, can then be written as a 150×4 matrix, formally denoted as $\boldsymbol{X} \in \mathbb{R}^{150 \times 4}$:

$$
\boldsymbol{X} =
\begin{bmatrix}
x_1^{(1)} & x_2^{(1)} & x_3^{(1)} & x_4^{(1)} \\
x_1^{(2)} & x_2^{(2)} & x_3^{(2)} & x_4^{(2)} \\
\vdots & \vdots & \vdots & \vdots \\
x_1^{(150)} & x_2^{(150)} & x_3^{(150)} & x_4^{(150)}
\end{bmatrix}
$$

Notational conventions

For most parts of this book, unless noted otherwise, we will use the superscript i to refer to the ith training example, and the subscript j to refer to the jth dimension of the training dataset.

We will use lowercase, bold-face letters to refer to vectors ($\boldsymbol{x} \in \mathbb{R}^{n \times 1}$) and uppercase, bold-face letters to refer to matrices ($\boldsymbol{X} \in \mathbb{R}^{n \times m}$). To refer to single elements in a vector or matrix, we will write the letters in italics ($x^{(n)}$ or $x_m^{(n)}$, respectively).

For example, $x_1^{(150)}$ refers to the first dimension of flower example 150, the sepal length. Each row in matrix $\boldsymbol{X}$ represents one flower instance and can be written as a four-dimensional row vector, $\boldsymbol{x}^{(i)} \in \mathbb{R}^{1 \times 4}$:

$$
\boldsymbol{x}^{(i)} = \begin{bmatrix} x_1^{(i)} & x_2^{(i)} & x_3^{(i)} & x_4^{(i)} \end{bmatrix}
$$

And each feature dimension is a 150-dimensional column vector, $\boldsymbol{x}_j \in \mathbb{R}^{150 \times 1}$. For example:

$$
\boldsymbol{x}_j = \begin{bmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(150)} \end{bmatrix}
$$

Similarly, we can represent the target variables (here, class labels) as a 150-dimensional column vector:

$$
\boldsymbol{y} = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(150)} \end{bmatrix}, \quad y^{(i)} \in \{\text{Setosa}, \text{Versicolor}, \text{Virginica}\}
$$
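
If you want to verify these dimensions yourself, the short snippet below loads the Iris data bundled with scikit-learn and prints the shapes of the feature matrix and the class label vector; this is just a convenience check and not part of the notation itself.

```python
# Confirming the 150x4 layout of the Iris dataset using scikit-learn.
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)             # (150, 4) -> the feature matrix X
print(y.shape)             # (150,)   -> the class label vector y
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```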

Machine learning terminology

Machine learning is a vast field and also very interdisciplinary as it brings together many scientists from other areas of research. As it happens, many terms and concepts have been rediscovered or redefined and may already be familiar to you but appear under different names. For your convenience, in the following list, you can find a selection of commonly used terms and their synonyms that you may find useful when reading this book and machine learning literature in general:

Training example: A row in a table representing the dataset and synonymous with an observation, record, instance, or sample (in most contexts, sample refers to a collection of training examples).

Training: Model fitting, for parametric models similar to parameter estimation.

Feature, abbrev. x: A column in a data table or data (design) matrix. Synonymous with predictor, variable, input, attribute, or covariate.

Target, abbrev. y: Synonymous with outcome, output, response variable, dependent variable, (class) label, and ground truth.

Loss function: Often used synonymously with a cost function. Sometimes the loss function is also called an error function. In some literature, the term “loss” refers to the loss measured for a single data point, and the cost is a measurement that computes the loss (average or summed) over the entire dataset.

A roadmap for building machine learning systems

In previous sections, we discussed the basic concepts of machine learning and the three different types of learning. In this section, we will discuss the other important parts of a machine learning system accompanying the learning algorithm.

Figure 1.9 shows a typical workflow for using machine learning in predictive modeling, which we will discuss in the following subsections:

Figure 1.9: Predictive modeling workflow

Preprocessing – getting data into shape

Let’s begin by discussing the roadmap for building machine learning systems. Raw data rarely comes in the form and shape that is necessary for the optimal performance of a learning algorithm. Thus, the preprocessing of the data is one of the most crucial steps in any machine learning application.

If we take the Iris flower dataset from the previous section as an example, we can think of the raw data as a series of flower images from which we want to extract meaningful features. Useful features could be centered around the color of the flowers or the height, length, and width of the flowers.

Many machine learning algorithms also require that the selected features are on the same scale for optimal performance, which is often achieved by transforming the features to the range [0, 1] or to a standard normal distribution with zero mean and unit variance, as we will see in later chapters.
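
For example, a minimal sketch of both transformations with scikit-learn, assuming a tiny made-up feature matrix, looks as follows; the chapter on building good training datasets revisits feature scaling in depth.

```python
# Min-max scaling and standardization on a tiny made-up feature matrix.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

print(MinMaxScaler().fit_transform(X))     # each column rescaled to [0, 1]
print(StandardScaler().fit_transform(X))   # zero mean, unit variance per column
```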

Some of the selected features may be highly correlated and therefore redundant to a certain degree. In those cases, dimensionality reduction techniques are useful for compressing the features onto a lower-dimensional subspace. Reducing the dimensionality of our feature space has the advantage that less storage space is required, and the learning algorithm can run much faster. In certain cases, dimensionality reduction can also improve the predictive performance of a model if the dataset contains a large number of irrelevant features (or noise); that is, if the dataset has a low signal-to-noise ratio.

To determine whether our machine learning algorithm not only performs well on the training dataset but also generalizes well to new data, we also want to randomly divide the dataset into separate training and test datasets. We use the training dataset to train and optimize our machine learning model, while we keep the test dataset until the very end to evaluate the final model.
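
A minimal sketch of such a random split with scikit-learn, again using the bundled Iris data for illustration, might look like this (the 70/30 split ratio is an arbitrary choice here):

```python
# Randomly splitting a dataset into separate training and test datasets.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)  # stratify keeps class ratios

print(X_train.shape, X_test.shape)   # (105, 4) (45, 4)
```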

Training and selecting a predictive model

As you will see in later chapters, many different machine learning algorithms have been developed to solve different problem tasks. An important point that can be summarized from David Wolpert’s famous No free lunch theorems is that we can’t get learning “for free” (