This book introduces basic Python 3 programming concepts related to machine learning. The first four chapters provide a fast-paced introduction to Python 3, NumPy, and Pandas. The fifth chapter covers fundamental machine learning concepts. The sixth chapter dives into machine learning classifiers, such as logistic regression, k-NN, decision trees, random forests, and SVMs. The final chapter includes material on natural language processing (NLP) and reinforcement learning (RL). Keras-based code samples supplement the theoretical discussion.
The book begins with Python basics, including conditional logic, loops, functions, and collections. It then explores data manipulation with NumPy and Pandas, followed by an introduction to machine learning that focuses on essential concepts and classifiers. Advanced topics such as NLP and RL are also covered, ensuring a comprehensive understanding of machine learning.
These concepts are crucial for developing machine learning applications. This book transitions readers from basic Python programming to advanced machine learning techniques, blending theory with practical skills. Appendices for regular expressions, Keras, and TensorFlow 2, along with companion files, enhance learning, making this an essential resource for mastering Python and machine learning.
Page count: 407
Publication year: 2024
LICENSE, DISCLAIMER OF LIABILITY, AND LIMITED WARRANTY
By purchasing or using this book and its companion files (the “Work”), you agree that this license grants permission to use the contents contained herein, but does not give you the right of ownership to any of the textual content in the book or ownership to any of the information, files, or products contained in it. This license does not permit uploading of the Work onto the Internet or on a network (of any kind) without the written consent of the Publisher. Duplication or dissemination of any text, code, simulations, images, etc. contained herein is limited to and subject to licensing terms for the respective products, and permission must be obtained from the Publisher or the owner of the content, etc., in order to reproduce or network any portion of the textual material (in any media) that is contained in the Work.
MERCURY LEARNING AND INFORMATION (“MLI” or “the Publisher”) and anyone involved in the creation, writing, production, accompanying algorithms, code, or computer programs (“the software”), and any accompanying Web site or software of the Work, cannot and do not warrant the performance or results that might be obtained by using the contents of the Work. The author, developers, and the Publisher have used their best efforts to ensure the accuracy and functionality of the textual material and/or programs contained in this package; we, however, make no warranty of any kind, express or implied, regarding the performance of these contents or programs. The Work is sold “as is” without warranty (except for defective materials used in manufacturing the book or due to faulty workmanship).
The author, developers, and the publisher of any accompanying content, and anyone involved in the composition, production, and manufacturing of this work will not be liable for damages of any kind arising out of the use of (or the inability to use) the algorithms, source code, computer programs, or textual material contained in this publication. This includes, but is not limited to, loss of revenue or profit, or other incidental, physical, or consequential damages arising out of the use of this Work.
The sole remedy in the event of a claim of any kind is expressly limited to replacement of the book and only at the discretion of the Publisher. The use of “implied warranty” and certain “exclusions” vary from state to state, and might not apply to the purchaser of this product.
Companion files are also available for download from the publisher by writing to [email protected].
Copyright ©2020 by MERCURY LEARNING AND INFORMATION LLC. All rights reserved.
This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.
Publisher: David Pallai
MERCURY LEARNING AND INFORMATION
22841 Quicksilver Drive
Dulles, VA
[email protected]
O. Campesato. Python 3 for Machine Learning.
ISBN: 978-1-68392-495-1
The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.
Library of Congress Control Number: 2020930258
20 21 22 3 2 1 Printed on acid-free paper in the United States of America.
Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For additional information, please contact the Customer Service Dept. at 800-232-0223 (toll free).
Companion files are available for download by writing to the publisher at [email protected]. All of our titles are available in digital format at Academiccourseware.com and other digital vendors. The sole obligation of MERCURY LEARNING AND INFORMATION to the purchaser is to replace the book, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.
I’d like to dedicate this book to my parents. May this bring joy and happiness into their lives.
Preface
Chapter 1 Introduction to Python 3
1.1 Tools for Python
1.1.1 easy_install and pip
1.1.2 virtualenv
1.1.3 IPython
1.2 Python Installation
1.3 Setting the PATH Environment Variable (Windows Only)
1.4 Launching Python on Your Machine
1.4.1 The Python Interactive Interpreter
1.5 Python Identifiers
1.6 Lines, Indentation, and Multilines
1.7 Quotation and Comments in Python
1.8 Saving Your Code in a Module
1.9 Some Standard Modules in Python
1.10 The help() and dir() Functions
1.11 Compile Time and Runtime Code Checking
1.12 Simple Data Types in Python
1.13 Working with Numbers
1.13.1 Working with Other Bases
1.13.2 The chr() Function
1.13.3 The round() Function in Python
1.13.4 Formatting Numbers in Python
1.14 Working with Fractions
1.15 Unicode and UTF-8
1.16 Working with Unicode
1.17 Working with Strings
1.17.1 Comparing Strings
1.17.2 Formatting Strings in Python
1.18 Uninitialized Variables and the Value None in Python
1.19 Slicing Strings
1.19.1 Testing for Digits and Alphabetic Characters
1.20 Search and Replace a String in Other Strings
1.21 Remove Leading and Trailing Characters
1.22 Printing Text without NewLine Characters
1.23 Text Alignment
1.24 Working with Dates
1.24.1 Converting Strings to Dates
1.25 Exception Handling in Python
1.26 Handling User Input
1.27 Command-Line Arguments
1.28 Summary
Chapter 2 Conditional Logic, Loops, and Functions
2.1 Precedence of Operators in Python
2.2 Python Reserved Words
2.3 Working with Loops in Python
2.3.1 Python for Loops
2.3.2 A for Loop with try/except in Python
2.3.3 Numeric Exponents in Python
2.4 Nested Loops
2.5 The split() Function with for Loops
2.6 Using the split() Function to Compare Words
2.7 Using the split() Function to Print Justified Text
2.8 Using the split() Function to Print Fixed Width Text
2.9 Using the split() Function to Compare Text Strings
2.10 Using a Basic for Loop to Display Characters in a String
2.11 The join() Function
2.12 Python while Loops
2.13 Conditional Logic in Python
2.14 The break/continue/pass Statements
2.15 Comparison and Boolean Operators
2.15.1 The in/not in/is/is not Comparison Operators
2.15.2 The and, or, and not Boolean Operators
2.16 Local and Global Variables
2.17 Scope of Variables
2.18 Pass by Reference versus Value
2.19 Arguments and Parameters
2.20 Using a while loop to Find the Divisors of a Number
2.20.1 Using a while loop to Find Prime Numbers
2.21 User-Defined Functions in Python
2.22 Specifying Default Values in a Function
2.22.1 Returning Multiple Values from a Function
2.23 Functions with a Variable Number of Arguments
2.24 Lambda Expressions
2.25 Recursion
2.25.1 Calculating Factorial Values
2.25.2 Calculating Fibonacci Numbers
2.25.3 Calculating the GCD of Two Numbers
2.25.4 Calculating the LCM of Two Numbers
2.26 Summary
Chapter 3 Python Collections
3.1 Working with Lists
3.1.1 Lists and Basic Operations
3.1.2 Reversing and Sorting a List
3.1.3 Lists and Arithmetic Operations
3.1.4 Lists and Filter-Related Operations
3.2 Sorting Lists of Numbers and Strings
3.3 Expressions in Lists
3.4 Concatenating a List of Words
3.5 The BubbleSort in Python
3.6 The Python range() Function
3.6.1 Counting Digits, Uppercase, and Lowercase Letters
3.7 Arrays and the append() Function
3.8 Working with Lists and the split() Function
3.9 Counting Words in a List
3.10 Iterating through Pairs of Lists
3.11 Other List-Related Functions
3.12 Using a List as a Stack and a Queue
3.13 Working with Vectors
3.14 Working with Matrices
3.15 The NumPy Library for Matrices
3.16 Queues
3.17 Tuples (Immutable Lists)
3.18 Sets
3.19 Dictionaries
3.19.1 Creating a Dictionary
3.19.2 Displaying the Contents of a Dictionary
3.19.3 Checking for Keys in a Dictionary
3.19.4 Deleting Keys from a Dictionary
3.19.5 Iterating through a Dictionary
3.19.6 Interpolating Data from a Dictionary
3.20 Dictionary Functions and Methods
3.21 Dictionary Formatting
3.22 Ordered Dictionaries
3.22.1 Sorting Dictionaries
3.22.2 Python Multidictionaries
3.23 Other Sequence Types in Python
3.24 Mutable and Immutable Types in Python
3.25 The type() Function
3.26 Summary
Chapter 4 Introduction to NumPy and Pandas
4.1 What is NumPy?
4.1.1 Useful NumPy Features
4.2 What are NumPy Arrays?
4.3 Working with Loops
4.4 Appending Elements to Arrays (1)
4.5 Appending Elements to Arrays (2)
4.6 Multiply Lists and Arrays
4.7 Doubling the Elements in a List
4.8 Lists and Exponents
4.9 Arrays and Exponents
4.10 Math Operations and Arrays
4.11 Working with “-1” Subranges with Vectors
4.12 Working with “-1” Subranges with Arrays
4.13 Other Useful NumPy Methods
4.14 Arrays and Vector Operations
4.15 NumPy and Dot Products (1)
4.16 NumPy and Dot Products (2)
4.17 NumPy and the “Norm” of Vectors
4.18 NumPy and Other Operations
4.19 NumPy and the reshape() Method
4.20 Calculating the Mean and Standard Deviation
4.21 Calculating Mean and Standard Deviation: Another Example
4.22 What is Pandas?
4.22.1 Pandas DataFrames
4.22.2 DataFrames and Data Cleaning Tasks
4.23 A Labeled Pandas DataFrame
4.24 Pandas Numeric DataFrames
4.25 Pandas Boolean DataFrames
4.25.1 Transposing a Pandas DataFrame
4.26 Pandas DataFrames and Random Numbers
4.27 Combining Pandas DataFrames (1)
4.28 Combining Pandas DataFrames (2)
4.29 Data Manipulation with Pandas DataFrames (1)
4.30 Data Manipulation with Pandas DataFrames (2)
4.31 Data Manipulation with Pandas DataFrames (3)
4.32 Pandas DataFrames and CSV Files
4.33 Pandas DataFrames and Excel Spreadsheets (1)
4.34 Select, Add, and Delete Columns in DataFrames
4.35 Pandas DataFrames and Scatterplots
4.36 Pandas DataFrames and Simple Statistics
4.37 Useful One-Line Commands in Pandas
4.38 Summary
Chapter 5 Introduction to Machine Learning
5.1 What is Machine Learning?
5.1.1 Types of Machine Learning
5.2 Types of Machine Learning Algorithms
5.2.1 Machine Learning Tasks
5.3 Feature Engineering, Selection, and Extraction
5.4 Dimensionality Reduction
5.4.1 PCA
5.4.2 Covariance Matrix
5.5 Working with Datasets
5.5.1 Training Data versus Test Data
5.5.2 What is Cross-validation?
5.6 What is Regularization?
5.6.1 ML and Feature Scaling
5.6.2 Data Normalization versus Standardization
5.7 The Bias-Variance Tradeoff
5.8 Metrics for Measuring Models
5.8.1 Limitations of R-Squared
5.8.2 Confusion Matrix
5.8.3 Accuracy versus Precision versus Recall
5.8.4 The ROC Curve
5.9 Other Useful Statistical Terms
5.9.1 What Is an F1 score?
5.9.2 What Is a p-value?
5.10 What is Linear Regression?
5.10.1 Linear Regression versus Curve-Fitting
5.10.2 When Are Solutions Exact Values?
5.10.3 What is Multivariate Analysis?
5.11 Other Types of Regression
5.12 Working with Lines in the Plane (optional)
5.13 Scatter Plots with NumPy and Matplotlib (1)
5.13.1 Why the “Perturbation Technique” is Useful
5.14 Scatter Plots with NumPy and Matplotlib (2)
5.15 A Quadratic Scatterplot with NumPy and matplotlib
5.16 The MSE Formula
5.16.1 A List of Error Types
5.16.2 Nonlinear Least Squares
5.17 Calculating the MSE Manually
5.18 Approximating Linear Data with np.linspace()
5.19 Calculating MSE with np.linspace() API
5.20 Linear Regression with Keras
5.21 Summary
Chapter 6 Classifiers in Machine Learning
6.1 What is Classification?
6.1.1 What Are Classifiers?
6.1.2 Common Classifiers
6.1.3 Binary versus Multiclass Classification
6.1.4 Multilabel Classification
6.2 What are Linear Classifiers?
6.3 What is kNN?
6.3.1 How to Handle a Tie in kNN
6.4 What are Decision Trees?
6.5 What are Random Forests?
6.6 What are SVMs?
6.6.1 Tradeoffs of SVMs
6.7 What is Bayesian Inference?
6.7.1 Bayes Theorem
6.7.2 Some Bayesian Terminology
6.7.3 What Is MAP?
6.7.4 Why Use Bayes Theorem?
6.8 What is a Bayesian Classifier?
6.8.1 Types of Naïve Bayes Classifiers
6.9 Training Classifiers
6.10 Evaluating Classifiers
6.11 What are Activation Functions?
6.11.1 Why Do We Need Activation Functions?
6.11.2 How Do Activation Functions Work?
6.12 Common Activation Functions
6.12.1 Activation Functions in Python
6.12.2 Keras Activation Functions
6.13 The ReLU and ELU Activation Functions
6.13.1 The Advantages and Disadvantages of ReLU
6.13.2 ELU
6.14 Sigmoid, Softmax, and Hardmax Similarities
6.14.1 Softmax
6.14.2 Softplus
6.14.3 Tanh
6.15 Sigmoid, Softmax, and Hardmax Differences
6.16 What is Logistic Regression?
6.16.1 Setting a Threshold Value
6.16.2 Logistic Regression: Important Assumptions
6.16.3 Linearly Separable Data
6.17 Keras, Logistic Regression, and Iris Dataset
6.18 Summary
Chapter 7 Natural Language Processing and Reinforcement Learning
7.1 Working with NLP
7.1.1 NLP Techniques
7.1.2 The Transformer Architecture and NLP
7.1.3 Transformer-XL Architecture
7.1.4 Reformer Architecture
7.1.5 NLP and Deep Learning
7.1.6 Data Preprocessing Tasks in NLP
7.2 Popular NLP Algorithms
7.2.1 What is an n-gram?
7.2.2 What is a skip-gram?
7.2.3 What is BoW?
7.2.4 What is Term Frequency?
7.2.5 What is Inverse Document Frequency (idf)?
7.2.6 What is tf-idf?
7.3 What are Word Embeddings?
7.4 ELMo, ULMFit, OpenAI, BERT, and ERNIE 2.0
7.5 What is Translatotron?
7.6 Deep Learning and NLP
7.7 NLU versus NLG
7.8 What is Reinforcement Learning (RL)?
7.8.1 RL Applications
7.8.2 NLP and RL
7.8.3 Values, Policies, and Models in RL
7.9 From NFAs to MDPs
7.9.1 What Are NFAs?
7.9.2 What Are Markov Chains?
7.9.3 MDPs
7.10 The Epsilon-Greedy Algorithm
7.11 The Bellman Equation
7.11.1 Other Important Concepts in RL
7.12 RL Toolkits and Frameworks
7.12.1 TF-Agents
7.13 What is Deep RL (DRL)?
7.14 Summary
Appendix A Introduction to Regular Expressions
A.1 What Are Regular Expressions?
A.2 Metacharacters in Python
A.3 Character Sets in Python
A.4 Character Classes in Python
A.5 Matching Character Classes with the re Module
A.6 Using the re.match() Method
A.7 Options for the re.match() Method
A.8 Matching Character Classes with the re.search() Method
A.9 Matching Character Classes with the findall() Method
A.9.1 Finding Capitalized Words in a String
A.10 Additional Matching Function for Regular Expressions
A.11 Grouping with Character Classes in Regular Expressions
A.12 Using Character Classes in Regular Expressions
A.12.1 Matching Strings with Multiple Consecutive Digits
A.12.2 Reversing Words in Strings
A.13 Modifying Text Strings with the re Module
A.14 Splitting Text Strings with the re.split() Method
A.15 Splitting Text Strings Using Digits and Delimiters
A.16 Substituting Text Strings with the re.sub() Method
A.17 Matching the Beginning and the End of Text Strings
A.18 Compilation Flags
A.19 Compound Regular Expressions
A.20 Counting Character Types in a String
A.21 Regular Expressions and Grouping
A.22 Simple String Matches
A.23 Additional Topics for Regular Expressions
A.24 Summary
A.25 Exercises
Appendix B Introduction to Keras
B.1 What is Keras?
B.1.1 Working with Keras Namespaces in TF 2
B.1.2 Working with the tf.keras.layers Namespace
B.1.3 Working with the tf.keras.activations Namespace
B.1.4 Working with the tf.keras.datasets Namespace
B.1.5 Working with the tf.keras.experimental Namespace
B.1.6 Working with Other tf.keras Namespaces
B.1.7 TF 2 Keras versus “Standalone” Keras
B.2 Creating a Keras-based Model
B.3 Keras and Linear Regression
B.4 Keras, MLPs, and MNIST
B.5 Keras, CNNs, and cifar10
B.6 Resizing Images in Keras
B.7 Keras and Early Stopping (1)
B.8 Keras and Early Stopping (2)
B.9 Keras and Metrics
B.10 Saving and Restoring Keras Models
B.11 Summary
Appendix C Introduction to TF 2
C.1 What is TF 2?
C.1.1 TF 2 Use Cases
C.1.2 TF 2 Architecture: The Short Version
C.1.3 TF 2 Installation
C.1.4 TF 2 and the Python REPL
C.2 Other TF 2-based Toolkits
C.3 TF 2 Eager Execution
C.4 TF 2 Tensors, Data Types, and Primitive Types
C.4.1 TF 2 Data Types
C.4.2 TF 2 Primitive Types
C.5 Constants in TF 2
C.6 Variables in TF 2
C.7 The tf.rank() API
C.8 The tf.shape() API
C.9 Variables in TF 2 (Revisited)
C.9.1 TF 2 Variables versus Tensors
C.10 What is @tf.function in TF 2?
C.10.1 How Does @tf.function Work?
C.10.2 A Caveat about @tf.function in TF 2
C.10.3 The tf.print() Function and Standard Error
C.11 Working with @tf.function in TF 2
C.11.1 An Example without @tf.function
C.11.2 An Example with @tf.function
C.11.3 Overloading Functions with @tf.function
C.11.4 What is AutoGraph in TF 2?
C.12 Arithmetic Operations in TF 2
C.13 Caveats for Arithmetic Operations in TF 2
C.13.1 TF 2 and Built-in Functions
C.14 Calculating Trigonometric Values in TF 2
C.15 Calculating Exponential Values in TF 2
C.16 Working with Strings in TF 2
C.17 Working with Tensors and Operations in TF 2
C.18 2nd Order Tensors in TF 2 (1)
C.19 2nd Order Tensors in TF 2 (2)
C.20 Multiplying Two 2nd Order Tensors in TF
C.21 Convert Python Arrays to TF Tensors
C.21.1 Conflicting Types in TF 2
C.22 Differentiation and tf.GradientTape in TF 2
C.23 Examples of tf.GradientTape
C.23.1 Using the watch() Method of tf.GradientTape
C.23.2 Using Nested Loops with tf.GradientTape
C.23.3 Other Tensors with tf.GradientTape
C.23.4 A Persistent Gradient Tape
C.24 Google Colaboratory
C.25 Other Cloud Platforms
C.25.1 GCP SDK
C.26 Summary
Index
This book endeavors to provide as much relevant information about Python and machine learning as can reasonably be included in a book of this size.
This book is intended to reach an international audience of readers with highly diverse backgrounds in various age groups. Although many readers can read English, it is not the native language of many of them (it could be their second, third, or even fourth language). Consequently, this book uses standard English rather than colloquial expressions that might confuse those readers. Many people learn through different types of imitation, which include reading, writing, or hearing new material. This book takes these points into consideration in order to provide a comfortable and meaningful learning experience for the intended readers.
Some programmers learn well from prose and others learn well from sample code (and lots of it), which means that there's no single style that suits everyone.
Moreover, some programmers want to run the code first, see what it does, and then return to the code to delve into the details (and others use the opposite approach).
Consequently, there are various types of code samples in this book: some are short, some are long, and others "build" on earlier code samples.
There are useful websites containing installation instructions for Python for various platforms. Instead of repeating those instructions in this book, that space is used for Python material. In general, this book attempts to avoid “filler” content as well as easily accessible set-up steps that are available online.
The code samples in this book have been tested in Python version 3.6.8 on a MacBook Pro with OS X 10.8.5.
The most useful prerequisite is some familiarity with another scripting language, such as Perl or PHP. Knowledge of other programming languages (such as Java) can also be helpful because of the exposure to programming concepts and constructs. The less technical knowledge that you have, the more diligence will be required in order to understand the various topics that are covered. Basic knowledge of machine learning is helpful but not required.
If you want to be sure that you can grasp the material in this book, glance through some of the code samples to get an idea of how much is familiar to you and how much is new for you.
The target audience consists of readers whose knowledge of programming languages ranges from beginner to intermediate level. During the preparation of this book, every effort has been made to accommodate those readers so that they will be adequately prepared to explore more advanced features of Python during their self-study.
One of the primary rules of exposition of virtually any kind is "show, don't tell." While this rule is not taken literally in this book, it’s the motivation for showing first and telling second. You can decide for yourself if show-first-then-tell is valid in this book by performing a simple experiment: when you see the code samples and the accompanying graphics effects in this book, determine if it's more effective to explain ("tell") the visual effects or to show them. If the adage “a picture is worth a thousand words” is true, then this book endeavors to provide both the pictures and the words.
The companion files contain all of the code samples, which spares you the time and effort of the error-prone process of manually typing code into a text file. Moreover, the book provides explanations that help you understand the code samples.
The code samples are available for download by writing to the publisher at [email protected].
The code samples show you some features of Python 3 that are useful for machine learning. In addition, clarity has higher priority than compact code that is more difficult to understand (and possibly more prone to bugs). If you decide to use any of the code in this book in a production environment, submit that code to the same rigorous analysis as the other parts of your code base.
In This Chapter
Tools for Python
Python Installation
Setting the PATH Environment Variable (Windows Only)
Launching Python on Your Machine
Python Identifiers
Lines, Indentation, and Multilines
Quotation and Comments in Python
Saving Your Code in a Module
Some Standard Modules in Python
The help() and dir() Functions
Compile Time and Runtime Code Checking
Simple Data Types in Python
Working with Numbers
Working with Fractions
Unicode and UTF-8
Working with Unicode
Working with Strings
Uninitialized Variables and the Value None in Python
Slicing and Splicing Strings
Search and Replace a String in Other Strings
Remove Leading and Trailing Characters
Printing Text without NewLine Characters
Text Alignment
Working with Dates
Exception Handling in Python
Handling User Input
Command-Line Arguments
Summary
This chapter contains an introduction to Python, with information about useful tools for installing Python modules, basic Python constructs, and how to work with some data types in Python.
The first part of this chapter covers how to install Python, some Python environment variables, and how to use the Python interpreter. You will see Python code samples, and you will also learn how to save Python code in text files that you can launch from the command line. The second part of this chapter shows you how to work with simple data types, such as numbers, fractions, and strings. The final part of this chapter discusses exceptions and how to use them in Python scripts.
If you like to read documentation, one of the best third-party documentation websites is pymotw (Python Module of the Week) by Doug Hellmann, and its home page is here:
http://pymotw.com/2/
Note: the Python scripts in this book are for Python 3 and were tested with Python 3.6.8.
The Anaconda Python distribution is available for Windows, Linux, and Mac, and it’s downloadable here:
http://continuum.io/downloads
Anaconda is well-suited for modules such as NumPy (discussed in Chapter 4) and SciPy, and if you are a Windows user, Anaconda appears to be a particularly convenient choice.
Both easy_install and pip are very easy to use when you need to install Python modules.
Whenever you need to install a Python module (and there are many in this book), use either easy_install or pip with the following syntax:
easy_install <module-name>
pip install <module-name>
Note: pure-Python modules are easier to install, whereas modules that contain C code are usually faster but can be more difficult to install.
The virtualenv tool enables you to create isolated Python environments, and its home page is here:
http://www.virtualenv.org/en/latest/virtualenv.html
virtualenv addresses the problem of preserving the correct dependencies and versions (and indirectly permissions) for different applications. If you are a Python novice you might not need virtualenv right now, but keep this tool in mind.
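virtualenv itself is a third-party tool; since Python 3.3 the standard library has shipped a similar module named venv. The following minimal sketch uses venv (not virtualenv) to create an isolated environment in a temporary directory. The directory name demo-env is arbitrary, and with_pip=False is chosen only so that the sketch runs quickly and without network access.

```python
import tempfile
import venv
from pathlib import Path

# Create an isolated environment in a throwaway directory.
# with_pip=False skips bootstrapping pip; set it to True for a usable environment.
env_dir = Path(tempfile.mkdtemp()) / "demo-env"
venv.create(env_dir, with_pip=False)

# Each environment records the base interpreter it was created from in pyvenv.cfg.
cfg = (env_dir / "pyvenv.cfg").read_text()
print(cfg)
```

Packages installed into such an environment stay inside env_dir, which is how different applications can keep different dependency versions side by side.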
Another very good tool is IPython (which won a Jolt award), and its home page is here:
http://ipython.org/install.html
Two very nice features of IPython are tab expansion and the question mark (“?”) help command, and an example of tab expansion is shown here:
ipython
Python 3.6.8 (v3.6.8:3c6b436a57, Dec 24 2018, 02:04:31)
Type "copyright", "credits" or "license" for more information.
IPython 0.13.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: di
%dirs dict dir divmod
In the preceding session, if you type the characters di and press the Tab key, IPython responds with the following line, which contains all the names that start with the letters di:
%dirs dict dir divmod
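As a rough plain-Python analogue (an illustration, not how IPython implements completion), you can filter the built-in names by prefix. The helper name completions is hypothetical; it will not list IPython-only entries such as the %dirs magic command.

```python
import builtins

def completions(prefix):
    """Return built-in names that start with prefix (a rough analogue of tab expansion)."""
    # dir() returns the names in alphabetical order.
    return [name for name in dir(builtins) if name.startswith(prefix)]

# In a plain Python session this prints ['dict', 'dir', 'divmod'].
print(completions("di"))

# The closest plain-Python counterpart of IPython's "object?" is help(object)
# or the object's docstring.
print(divmod.__doc__)
```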
If you enter a question mark (“?”), IPython provides textual assistance, the first part of which is here:
IPython -- An enhanced Interactive Python
IPython offers a combination of convenient shell features, special commands and a history mechanism for both input (command history) and output (results caching, similar to Mathematica). It is intended to be a fully compatible replacement for the standard Python interpreter, while offering vastly improved functionality and flexibility.
The next section shows you how to check whether or not Python is installed on your machine, and also where you can download Python.
Before you download anything, check if you have Python already installed on your machine (which is likely if you have a MacBook or a Linux machine) by typing the following command in a command shell:
python3 -V
The output for the MacBook used in this book is here:
Python 3.6.8
Note: install Python 3.6.8 (or as close as possible to this version) on your machine so that you will have the same version of Python that was used to test the Python scripts in this book.
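You can also check the interpreter version from inside Python via sys.version_info. This is a minimal sketch; the helper name meets_minimum and the (3, 6) default are assumptions chosen to match the version used to test this book.

```python
import sys

def meets_minimum(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    # sys.version_info is a named tuple such as (3, 6, 8, 'final', 0).
    return sys.version_info[:2] >= tuple(minimum)

print("Running Python", sys.version.split()[0])   # e.g. Running Python 3.6.8
print("At least 3.6?", meets_minimum())
```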
If you need to install Python on your machine, navigate to the Python home page and select the downloads link or navigate directly to this website:
http://www.python.org/download/
In addition, PythonWin is available for Windows, and its home page is here:
http://www.cgl.ucsf.edu/Outreach/pc204/pythonwin.html
Use any text editor that can create, edit, and save Python scripts as plain text files (don’t use Microsoft Word).
After you have Python installed and configured on your machine, you are ready to work with the Python scripts in this book.
The PATH environment variable specifies a list of directories that are searched whenever you specify an executable program from the command line. For a very good guide to setting up your environment so that the Python executable is available in every command shell, follow the instructions here:
http://www.blog.pythonlibrary.org/2011/11/24/python-101-setting-up-python-on-windows/
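To see which directories are on your PATH, and which python executable they resolve to, a short sketch using only the standard library (the helper name path_directories is hypothetical) looks like this:

```python
import os
import shutil

def path_directories():
    """Return the directories listed in PATH, in search order."""
    # os.pathsep is ":" on macOS/Linux and ";" on Windows.
    return [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]

for directory in path_directories():
    print(directory)

# shutil.which() searches the PATH directories in order and returns the full
# path of the first matching executable, or None if there is no match.
print(shutil.which("python3") or shutil.which("python"))
```

If the last line prints None, the Python executable is not on your PATH yet.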
There are three different ways to launch Python:
Use the Python interactive interpreter.
Launch Python scripts from the command line.
Use an IDE.
The next section shows you how to launch the Python interpreter from the command line, and later in this chapter you will learn how to launch Python scripts from the command line and also about Python IDEs.
Note: The emphasis in this book is to launch Python scripts from the command line or to enter code in the Python interpreter.
Launch the Python interactive interpreter from the command line by opening a command shell and typing the following command:
python3
You will see the following prompt (or something similar):
Python 3.6.8 (v3.6.8:3c6b436a57, Dec 24 2018, 02:04:31)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Now type the expression 2 + 7 at the prompt:
>>> 2 + 7
Python displays the following result:
9
>>>
Press Ctrl-D to exit the Python shell (on Windows, press Ctrl-Z followed by Enter), or type exit().
You can launch any Python script from the command line by preceding it with the word “python.” For example, if you have a Python script myscript.py that contains Python commands, launch the script as follows:
python myscript.py
As a simple illustration, suppose that the Python script myscript.py contains the following Python code:
When you launch the preceding Python script you will see the following output:
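The listing for myscript.py and its output are not reproduced in this text. As a hypothetical stand-in (not the book's original listing), a minimal myscript.py could look like this:

```python
#!/usr/bin/env python3
# myscript.py: a hypothetical minimal script for illustration.

def greeting():
    """Return the message that the script prints."""
    return "Hello World"

def main():
    print(greeting())

if __name__ == "__main__":
    # This block runs when the file is executed directly (python3 myscript.py),
    # but not when the file is imported as a module.
    main()
```

With this version of the file, the command python3 myscript.py prints Hello World.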
A Python identifier is the name of a variable, function, class, module, or other Python object, and a valid identifier conforms to the following rules:
starts with a letter A to Z or a to z or an underscore (_)
followed by zero or more letters, underscores, and digits (0 to 9)
Note: Python identifiers cannot contain characters such as @, $, and %.
Python is a case-sensitive language, so Abc and abc are different identifiers in Python.
In addition, Python has the following naming conventions:
Class names start with an uppercase letter and all other identifiers start with a lowercase letter.
A single leading underscore indicates a private identifier (by convention only).
Two leading underscores indicate a strongly private identifier (inside a class, Python mangles such names).
An identifier with two leading and two trailing underscores is a language-defined special name.
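The following sketch checks candidate names against these rules with the built-in str.isidentifier() and the keyword module (the helper is_valid_identifier and the Account class are hypothetical), and shows the name mangling that Python applies to attributes with two leading underscores.

```python
import keyword

def is_valid_identifier(name):
    """True if name is a syntactically valid identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["abc", "Abc", "_private", "x2", "2x", "my-var", "cost$", "for"]:
    print(f"{candidate!r:12} -> {is_valid_identifier(candidate)}")

class Account:
    def __init__(self):
        self._balance = 0   # single underscore: private by convention only
        self.__pin = 1234   # double underscore: stored as _Account__pin

acct = Account()
print(hasattr(acct, "__pin"))   # False: the name was mangled
print(acct._Account__pin)       # 1234
```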
Unlike other programming languages (such as Java or Objective-C), Python uses indentation instead of curly braces for code blocks. Indentation must be consistent in a code block, as shown here:
if True:
    print("ABC")
    print("DEF")
else:
    print("ABC")
    print("DEF")
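Inconsistent indentation is caught before any code runs. This sketch (the helper indentation_ok is hypothetical) uses the built-in compile() to show that a block with an over-indented line raises IndentationError.

```python
def indentation_ok(source):
    """Return True if source compiles, False if it has an indentation error."""
    try:
        compile(source, "<snippet>", "exec")   # compiles only; nothing is executed
        return True
    except IndentationError:
        return False

good = "if True:\n    print('ABC')\n    print('DEF')\n"
bad = "if True:\n    print('ABC')\n        print('DEF')\n"   # second line over-indented

print(indentation_ok(good))   # True
print(indentation_ok(bad))    # False ("unexpected indent")
```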
A Python statement normally ends at the end of a line, but you can continue a long statement on the next line by ending the line with the backslash (“\”) continuation character, as shown here:
Obviously you can place x1, x2, and x3 on the same line, so there is no reason to use three separate lines; however, this functionality is available in case you need to add a set of variables that do not fit on a single line.
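The original listing for this example is not reproduced in this text. The following sketch assumes hypothetical values for x1, x2, and x3 and adds them with backslash continuations; it also shows implicit continuation inside parentheses, which PEP 8 recommends in preference to backslashes.

```python
x1, x2, x3 = 1, 2, 3   # hypothetical values for illustration

# Backslash continuation: the statement continues on the next line.
total = x1 + \
        x2 + \
        x3

# Implicit continuation: a line break inside parentheses does not end the statement.
total2 = (x1 +
          x2 +
          x3)

print(total, total2)   # 6 6
```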
You can specify multiple statements in one line by using a semicolon (“;”) to separate each statement, as shown here:
a=10; b=5; print(a); print(a+b)
The output of the preceding code snippet is here:
10
15
Note: the use of semicolons and the continuation character is discouraged in Python.
Python allows single ('), double ("), and triple (''' or """) quotes for string literals, provided that they match at the beginning and the end of the string. You can use triple quotes for strings that span multiple lines. The following examples are legal Python strings:
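The original example strings are not reproduced in this text; here is one legal literal of each kind (the variable names are illustrative):

```python
s1 = 'single quotes'
s2 = "double quotes"
s3 = '''triple single quotes
span multiple lines'''
s4 = """triple double quotes
also span multiple lines"""

print(s1)
print(s2)
print(s3)
print(s4)
print(s3.count('\n'), s4.count('\n'))   # 1 1 (one embedded newline each)
```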
A string literal that begins with the letter “r” (for “raw”) treats backslashes as literal characters, so escape sequences such as \n are not interpreted, as shown here:
The output of the preceding code block is here:
a1: \n a2: \r a3: \t
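The code that produced this output is not shown; the following sketch reproduces it under the assumption that a1, a2, and a3 are raw strings. The last line compares lengths: the raw string r'\n' contains two characters (a backslash and the letter n), whereas '\n' is a single newline character.

```python
a1 = r'\n'  # a backslash followed by n, not a newline
a2 = r'\r'
a3 = r'\t'
print('a1:', a1, 'a2:', a2, 'a3:', a3)  # a1: \n a2: \r a3: \t

print(len(r'\n'), len('\n'))  # 2 1
```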
You can embed a single quote in a pair of double quotes (and vice versa) in order to display a single quote or a double quote. Another way to accomplish the same result is to precede a single or double quote with a backslash (\) character. The following code block illustrates these techniques:
The output of the preceding code block is here:
b1: ' b2: "
b3: ' b4: "
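The code block itself is not shown; a minimal sketch that produces the two output lines above, assuming variables named b1 through b4, is:

```python
b1 = "'"   # single quote inside double quotes
b2 = '"'   # double quote inside single quotes
b3 = '\''  # single quote escaped with a backslash
b4 = "\""  # double quote escaped with a backslash
print('b1:', b1, 'b2:', b2)
print('b3:', b3, 'b4:', b4)
```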
A hash sign (#) that is not inside a string literal is the character that indicates the beginning of a comment. Moreover, all characters after the # and up to the physical line end are part of the comment (and ignored by the Python interpreter). Consider the following code block:
#!/usr/bin/python
# First comment
print("Hello, Python!") # second comment
This code produces the following output:
Hello, Python!
As the preceding example shows, a comment may appear on the same line after a statement or expression.
You can comment multiple lines as follows:
# This is comment one
# This is comment two
# This is comment three
A blank line in Python is a line containing only whitespace, a comment, or both.
Earlier you saw how to launch the Python interpreter from the command line and then enter Python commands. However, everything that you type in the Python interpreter is valid only for the current session: if you exit the interpreter and then launch it again, your previous definitions are no longer available. Fortunately, Python enables you to store code in a text file, as discussed in the next section.
A module in Python is a text file that contains Python statements. In the previous section, you saw how the Python interpreter enables you to test code snippets whose definitions are valid for the current session. If you want to retain the code snippets and other definitions, place them in a text file so that you can execute that code outside of the Python interpreter.
The outermost statements in a Python module are executed from top to bottom when the module is imported for the first time, which sets up its variables and functions.
A Python module can be run directly from the command line, as shown here:
python First.py
As an illustration, place the following two statements in a text file called First.py:
Now type the following command:
python First.py
The output from the preceding command is 3, which is the same as executing the preceding code from the Python interpreter.
When a Python module is run directly, the special variable __name__ is set to the string "__main__". You will often see the following type of code in a Python module:
The preceding code snippet enables Python to determine if a Python module was launched from the command line or imported into another Python module.
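A typical form of that check looks like the following sketch; the function name main() is a common convention, not a requirement of the language.

```python
def main():
    # Code that should run only when the file is executed as a script
    print("Running as a script")

# __name__ is "__main__" when the file is run directly (python First.py),
# and the module's own name (for example, "First") when it is imported
if __name__ == "__main__":
    main()
```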
The Python Standard Library provides many modules that can simplify your own Python scripts. A list of the Standard Library modules is here:
http://www.python.org/doc/
Some of the most important Python modules include cgi, math, os, pickle, random, re, socket, sys, time, and urllib (note that cgi was deprecated in Python 3.11 and removed in Python 3.13).
The code samples in this book use the modules math, os, random, re, socket, sys, time, and urllib. You need to import a module before you can use it in your code. For example, the following code block shows you how to import four standard Python modules:
import datetime
import re
import sys
import time
The code samples in this book import one or more of the preceding modules, as well as other Python modules. In Chapter 8, you will learn how to write Python modules that import other user-defined Python modules.
An Internet search for Python-related topics usually returns a number of links with useful information. Alternatively, you can check the official Python documentation site: docs.python.org
In addition, Python provides the help() and dir() functions that are accessible from the Python interpreter. The help() function displays documentation strings, whereas the dir() function displays defined symbols.
For example, if you type help(sys) you will see documentation for the sys module, whereas dir(sys) displays a list of the defined symbols.
Type the following command in the Python interpreter to display the string-related methods in Python:
>>> dir(str)
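The preceding command generates output similar to the following. The exact list depends on your Python version: this particular listing comes from Python 2 (it includes decode(), which Python 3 strings do not have), whereas Python 3 adds methods such as casefold() and isidentifier():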
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
Note that dir() without arguments does not list the names of built-in functions and variables; you can obtain this information from the standard module builtins (called __builtin__ in Python 2), which is automatically available under the name __builtins__:
>>> dir(__builtins__)
The preceding command gives you a consolidated “dump” of built-in functions (including some that are discussed later in this chapter). Although the max() function obviously returns the maximum value of its arguments, the purpose of other functions such as filter() or map() is not immediately apparent (unless you have used them in other programming languages). In any case, the preceding list provides a starting point for finding out more about various Python built-in functions that are not discussed in this chapter.
The following command shows you how to get more information about a function:
help(str.lower)
The output from the preceding command is similar to the following (the exact wording varies between Python versions):
Help on method_descriptor:
lower(...)
S.lower() -> string
Return a copy of the string S converted to lowercase.
(END)
Check the online documentation and also experiment with help() and dir() when you need additional information about a particular function or module.
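Because dir() returns an ordinary list, you can also filter it in code. The following sketch lists the public (non-underscore) string methods and checks that max is one of the built-in names; the exact lists depend on your Python version.

```python
import builtins

# Public methods of str: skip the "dunder" names that start with an underscore
public_str_methods = [name for name in dir(str) if not name.startswith("_")]
print(public_str_methods[:5])

# The builtins module holds functions such as max(), filter(), and map()
print("max" in dir(builtins))  # True

# help() displays documentation; __doc__ holds the docstring as a plain string
print(str.lower.__doc__)
```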
Python performs some compile-time checking, but most checks (including type checks, name lookups, and so forth) are deferred until code execution. Consequently, if your Python code references a user-defined function that does not exist, the code will still compile successfully. In fact, the code will fail with an exception only when the execution path actually reaches the reference to the nonexistent function.
As a simple example, consider the following Python function myFunc that references the nonexistent function called DoesNotExist:
The preceding code fails only when the myFunc function is passed the value 3, at which point Python raises a NameError.
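The body of myFunc is not shown; this sketch assumes that it calls DoesNotExist only when x equals 3, so the missing name is detected only on that code path.

```python
def myFunc(x):
    if x == 3:
        # DoesNotExist is never defined, but this is only detected at run time
        print(DoesNotExist(x))
    else:
        print("x:", x)

myFunc(1)  # works: prints x: 1

try:
    myFunc(3)  # now the undefined name is looked up and the call fails
except NameError as err:
    print("Error:", err)  # Error: name 'DoesNotExist' is not defined
```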
In Chapter 2, you will learn how to define and invoke user-defined functions, along with an explanation of the difference between local versus global variables in Python.
Now that you understand some basic concepts (such as how to use the Python interpreter) and how to launch your custom Python modules, the next section discusses primitive data types in Python.
Python supports primitive data types, such as numbers (integers, floating point numbers, and exponential numbers), strings, and dates. Python also supports more complex data types, such as lists (or arrays), tuples, and dictionaries, all of which are discussed in Chapter 3. The next several sections discuss some of the Python primitive data types, along with code snippets that show you how to perform various operations on those data types.
Python provides arithmetic operations for manipulating numbers in a straightforward manner that is similar to other programming languages. The following examples involve arithmetic operations on integers:
>>> 2+2
4
>>> 4/3
1.3333333333333333
>>> 4//3
1
>>> 3*8
24
The following example assigns numbers to two variables and computes their product:
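The original snippet is not shown; a minimal version with assumed values is:

```python
x = 4
y = 7
product = x * y  # multiply the two variables
print(product)   # 28
```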
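In Python 3, the division operator (“/”) always performs true division and returns a floating point number, even when both operands are integers; use the floor division operator (“//”) to round the quotient down to an integer. (In Python 2, “/” applied to two integers truncated the result.) The following example converts a floating point number into exponential form: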
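The conversion example is not shown; one way to produce exponential form in Python 3 is the 'e' format specifier, as in the following sketch (the value of x is arbitrary). The last two lines contrast true division with floor division.

```python
x = 1234.5678
print(format(x, 'e'))      # 1.234568e+03 (six digits after the point by default)
print('{:.2e}'.format(x))  # 1.23e+03
print(1.5e3)               # 1500.0: exponential literals are ordinary floats

print(7 / 2, 7 // 2)       # 3.5 3
print(-7 // 2)             # -4: floor division rounds toward negative infinity
```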
You can use the int() function and the float() function to convert strings to numbers:
The output from the preceding code block is here:
var1: 123 var2: 456.78
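The code block that produced this output is not shown; a minimal sketch that reproduces it is:

```python
var1 = int('123')        # string -> int
var2 = float('456.78')   # string -> float
print('var1:', var1, 'var2:', var2)  # var1: 123 var2: 456.78
print(type(var1).__name__, type(var2).__name__)  # int float
```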
Alternatively, you can use the eval() function:
If you attempt to convert a string that is not a valid integer or a floating point number, Python raises an exception, so it’s advisable to place your code in a try/except block (discussed later in this chapter).
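The eval() example is not shown; the following sketch uses it on numeric strings. Keep in mind that eval() evaluates any Python expression, so it should not be applied to untrusted input; the standard library function ast.literal_eval() accepts only literals. The helper to_number() is a hypothetical name illustrating the try/except approach for invalid input.

```python
import ast

print(eval('123'), eval('456.78'))  # 123 456.78
print(ast.literal_eval('789'))      # 789: literals only, safer than eval()

def to_number(text):
    """Convert text to an int or a float; return None if neither works."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None

for text in ['42', '3.14', 'abc']:
    print(text, '->', to_number(text))  # 42 -> 42, 3.14 -> 3.14, abc -> None
```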
Numbers in Python are in base 10 (the default), but you can easily convert numbers to other bases. For example, the following code block initializes the variable x with the value 1234, and then displays that number in base 2, 8, and 16, respectively:
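The code block is not shown; a minimal sketch using the built-in functions bin(), oct(), and hex() is:

```python
x = 1234
print(bin(x))  # 0b10011010010
print(oct(x))  # 0o2322
print(hex(x))  # 0x4d2

# int() with a base argument converts a string back to an integer
print(int('4d2', 16))  # 1234
```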
Use the format() function if you want to suppress the 0b, 0o, or 0x prefixes, as shown here:
>>> format(x, 'b')
'10011010010'
>>> format(x, 'o')
'2322'
>>> format(x, 'x')
'4d2'
Negative integers are displayed with a negative sign:
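The example is not shown; a minimal sketch with the same value as before, negated, is:

```python
x = -1234
print(bin(x))          # -0b10011010010
print(format(x, 'x'))  # -4d2
```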
The Python chr() function takes a non-negative integer as a parameter and returns the character whose Unicode code point is that integer. The uppercase letters A through Z have the decimal representation 65 through 90 (hexadecimal 41 through 5a), and the lowercase letters a through z have the decimal representation 97 through 122 (hexadecimal 61 through 7a).
Here is an example of using the chr() function to print uppercase A:
>>> x=chr(65)
>>> x
'A'
The following code block prints the ASCII values for a range of integers:
Note: in Python 2, the str type holds bytes (ASCII by default), whereas in Python 3 the str type holds Unicode text and source files are UTF-8 by default.
You can represent the range of uppercase letters with the following line:
for x in range(65,91):
However, the following equivalent code snippet is more intuitive (the +1 is needed because range() excludes its upper bound):
for x in range(ord('A'), ord('Z')+1):
If you want to display the result for lowercase letters, change the preceding range to either of the following statements:
for x in range(97,123):
for x in range(ord('a'), ord('z')+1):
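A complete loop built on those ranges might look like the following sketch; it prints each uppercase letter with its decimal and hexadecimal code, then builds both alphabets as strings.

```python
# Print each uppercase letter with its decimal and hexadecimal code
for x in range(ord('A'), ord('Z') + 1):
    print(x, hex(x), chr(x))

uppercase = ''.join(chr(x) for x in range(65, 91))
lowercase = ''.join(chr(x) for x in range(ord('a'), ord('z') + 1))
print(uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(lowercase)  # abcdefghijklmnopqrstuvwxyz
```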
The Python round() function rounds a number to a given number of decimal places:
>>> round(1.23, 1)
1.2
>>> round(-3.42,1)
-3.4
Python allows you to specify the number of decimal places of precision to use when printing decimal numbers, as shown here:
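The examples are not shown; the following sketch (with an arbitrary value) demonstrates three common ways to control the number of decimal places: the format() method, an f-string (Python 3.6 and later), and the older % operator.

```python
x = 3.14159265
print('{:.2f}'.format(x))  # 3.14
print(f'{x:.4f}')          # 3.1416
print('%.3f' % x)          # 3.142
```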
Python provides the Fraction class (which is defined in the fractions module), whose constructor accepts two integers that represent the numerator and the denominator (which must be nonzero) of a fraction. Several examples of defining and manipulating fractions in Python are shown here:
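The examples themselves are not shown; a short sketch of typical Fraction operations, with arbitrary values, is:

```python
from fractions import Fraction

a = Fraction(1, 3)
b = Fraction(1, 6)
print(a + b)                  # 1/2
print(a * b)                  # 1/18
print(Fraction(6, 8))         # 3/4 (fractions are reduced automatically)
print(float(Fraction(3, 4)))  # 0.75
```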
Before delving into Python code samples that work with strings, the next section briefly discusses Unicode and UTF-8, both of which are character encodings.
A Unicode string consists of a sequence of numbers (called code points) between 0 and 0x10ffff, where each number represents a character. An encoding is the manner in which a Unicode string is translated into a sequence of bytes. Among the various encodings, Unicode Transformation Format (UTF)-8 is perhaps the most common, and it’s also the default encoding for many systems. The digit 8 in UTF-8 indicates that the encoding uses 8-bit units (one to four bytes per character), whereas UTF-16 uses 16-bit units (but this encoding is less common).
The ASCII character set is a subset of UTF-8, so a valid ASCII string can be read as a UTF-8 string without any re-encoding required. In addition, any Unicode string can be encoded as a sequence of UTF-8 bytes.
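The following sketch (with an arbitrary sample string) shows the encode and decode steps: the ASCII letters each take one byte in UTF-8, while é takes two bytes and € takes three.

```python
text = 'café €'
data = text.encode('utf-8')          # str -> bytes
print(data)                          # b'caf\xc3\xa9 \xe2\x82\xac'
print(len(text), len(data))          # 6 9: characters versus bytes
print(data.decode('utf-8') == text)  # True: bytes -> str round trip

# ASCII text produces identical bytes in ASCII and in UTF-8
print('Hello'.encode('ascii') == 'Hello'.encode('utf-8'))  # True
```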
Python 3 strings are Unicode strings, which means that you can render characters in different languages. The “u” prefix from Python 2 is still accepted, but it has no effect in Python 3, as shown here:
>>> u'Hello from Python!'
'Hello from Python!'
Special characters can be included in a string by specifying their Unicode value. For example, the following string embeds a space (which has the Unicode value 0x0020):
>>> u'Hello\u0020from Python!'
'Hello from Python!'
Listing 1.1 displays the contents of Unicode1.py that illustrates how to display a string of characters in Japanese and another string of characters in Chinese (Mandarin).
Listing 1.1: Unicode1.py
The output of Listing 1.1 is here:
Chinese: 將探討 HTML5 及其他
Hiragana: D3 は かっこぃぃ です!
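The contents of Unicode1.py are not reproduced here. The following minimal sketch, with assumed variable names, prints equivalent output; it mixes literal characters (allowed because Python 3 source files are UTF-8 by default) with a \u escape for the hiragana character は (U+306F).

```python
chinese1 = '將探討 HTML5 及其他'
hiragana = 'D3 \u306f かっこぃぃ です!'  # \u306f is the hiragana character は
print('Chinese:', chinese1)
print('Hiragana:', hiragana)
```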
The next portion of this chapter shows you how to “slice and dice” text strings with built-in Python functions.
In Python 3, a string is an immutable sequence of Unicode characters (in Python 2, a str was a sequence of bytes). You can concatenate two strings using the “+” operator. The following example displays a string and then concatenates two single-letter strings:
>>> 'abc'
'abc'
>>> 'a' + 'b'
'ab'
You can use “+” or “*” to concatenate identical strings, as shown here:
>>> 'a' + 'a' + 'a'
'aaa'
>>> 'a' * 3
'aaa'
You can assign strings to variables and display them using the print() function:
You can “unpack” the letters of a string and assign them to variables, as shown here:
The preceding code snippets show you how easy it is to extract the letters in a text string, and in Chapter 3 you will learn how to “unpack” other Python data structures.
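The snippets are not shown; a minimal sketch of assigning, printing, and unpacking a three-letter string (the value is assumed) is:

```python
x = 'abc'
print(x)            # abc

x1, x2, x3 = x      # unpacking: the number of variables must equal len(x)
print(x1, x2, x3)   # a b c
```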
You can extract substrings of a string as shown in the following examples:
As you probably expect, you will cause an error if you attempt to “subtract” two strings:
>>> 'a' - 'b'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'str'
The try/except construct in Python (discussed later in this chapter) enables you to handle the preceding type of exception more gracefully.
You can use the methods lower() and upper() to convert a string to lowercase and uppercase, respectively, as shown here:
>>> 'Python'.lower()
'python'
>>> 'Python'.upper()
'PYTHON'
>>>
The methods lower() and upper() are useful for performing a case-insensitive comparison of two ASCII strings. Listing 1.2 displays the contents of Compare.py that uses the lower() method in order to compare two ASCII strings.
Listing 1.2: Compare.py
Since x contains mixed case letters and y contains lowercase letters, Listing 1.2 displays the following output:
x and y: different
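Compare.py itself is not shown. The following sketch uses a hypothetical compare() function to show the general technique of comparing lowercased copies; it also shows casefold(), a more aggressive variant of lower() intended for caseless matching of non-ASCII text.

```python
def compare(x, y):
    if x == y:
        return 'identical'
    elif x.lower() == y.lower():
        return 'case-insensitive match'
    else:
        return 'different'

print(compare('Abc', 'abc'))  # case-insensitive match
print(compare('Abc', 'xyz'))  # different

# casefold() handles cases that lower() does not, such as the German ß
print('Straße'.casefold() == 'STRASSE'.casefold())  # True
```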
Python provides the string methods ljust(), rjust(), and center() for positioning a text string so that it is left-justified, right-justified, and centered, respectively, within a field of a given width. As you saw in a previous section, Python also provides the format() method for advanced interpolation features.
Now enter the following commands in the Python interpreter:
The output is shown here:
this is a string
this is a string
this is a string
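The interpreter commands are not shown, and the padding in the output above was lost. The following sketch assumes a field width of 24 and wraps each result in brackets so that the padding is visible.

```python
text = 'this is a string'
width = 24  # assumed field width

print('[' + text.ljust(width) + ']')   # padding added on the right
print('[' + text.rjust(width) + ']')   # padding added on the left
print('[' + text.center(width) + ']')  # padding split between both sides
```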
Python distinguishes between an uninitialized variable and the value None. The former is a variable that has not been assigned a value, whereas the value None is a value that indicates “no value.” Collections and methods often return the value None, and you can test for the value None in conditional logic (shown in Chapter 2).
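The following sketch illustrates the difference: a variable that holds None exists and can be tested with the is operator, whereas referencing a name that was never assigned raises a NameError. It also shows a method that returns None, namely list.sort(), which sorts in place.

```python
result = None
print(result is None)  # True: use "is" to test for None

try:
    print(undefined_name)  # never assigned, so this raises NameError
except NameError as err:
    print('Error:', err)

print([3, 1, 2].sort())  # None: list.sort() sorts in place and returns None
```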
Python enables you to extract substrings of a string (called “slicing”) using array notation. Slice notation is start:stop:step, where the start, stop, and step values are integers that specify the start index, the end index (which is excluded), and the increment. The interesting part about slicing in Python is that you can use negative values such as -1, which count from the right end of a string instead of the left.
Some examples of slicing a string are here:
The output from the preceding code block is here:
First 7 characters: this is
Characters 2-4: is
Right-most character: g
Right-most 2 characters: in
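The code block is not shown; the following sketch assumes the string "this is a string" and reproduces the first three lines of output. Note that text1[-2:] yields 'ng'; the value 'in' shown in the last output line corresponds to a slice such as text1[-3:-1], which excludes the final character.

```python
text1 = 'this is a string'
print('First 7 characters:', text1[0:7])       # this is
print('Characters 2-4:', text1[2:4])           # is (index 4 is excluded)
print('Right-most character:', text1[-1])      # g
print('Right-most 2 characters:', text1[-2:])  # ng
print('text1[-3:-1]:', text1[-3:-1])           # in
print('Reversed:', text1[::-1])                # gnirts a si siht
```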
Later in this chapter you will see how to insert a string in the middle of another string.
Python enables you to examine each character in a string and then test whether that character is a bona fide digit or an alphabetic character. This section provides a precursor to regular expressions that are discussed in Chapter 4.
Listing 1.3 displays the contents of CharTypes.py that illustrates how to determine if a string contains digits or letters. In case you are unfamiliar with the conditional “if” statement in Listing 1.3, more detailed information is available in Chapter 2.
Listing 1.3: CharTypes.py
Listing 1.3 initializes some variables, followed by 2 conditional tests that check whether or not str1 and str2 are digits using the isdigit() function. The next portion of Listing 1.3 checks if str3, str4, and str5 are alphabetic strings using the isalpha() function. The output of Listing 1.3 is here:
this is a digit: 4
this is a digit: 4234
this is alphabetic: b
this is alphabetic: abc
this is not pure alphabetic: a1b2c3
capitalized first letter: A1B2C3
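Listing 1.3 is not shown; the following sketch uses assumed values that match the first five output lines. Note that capitalize() uppercases only the first character, so the final output line A1B2C3 matches what upper() returns rather than capitalize().

```python
values = ['4', '4234', 'b', 'abc', 'a1b2c3']  # assumed inputs
for s in values:
    if s.isdigit():
        print('this is a digit:', s)
    elif s.isalpha():
        print('this is alphabetic:', s)
    else:
        print('this is not pure alphabetic:', s)

print('a1b2c3'.capitalize())  # A1b2c3: only the first character changes
print('a1b2c3'.upper())       # A1B2C3
```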
Python provides methods for searching for a string in a second text string and for replacing it. Listing 1.4 displays the contents of FindPos1.py that shows you how to use the find() method to search for the occurrence of one string in another string.
Listing 1.4: FindPos1.py
Listing 1.4 initializes the variables item1, item2, and text, and then searches for the index of the contents of item1