Python 3 and Data Analytics Pocket Primer - Mercury Learning and Information - E-Book

Description

This book, part of the best-selling Pocket Primer series, introduces readers to the fundamental concepts of data analytics using Python 3. The course begins with a concise introduction to Python, covering essential programming constructs and data manipulation techniques. This foundation sets the stage for deeper dives into data analytics, emphasizing the importance of data cleaning, a critical step in any data analysis process.
Following the Python basics, the course explores powerful libraries such as NumPy and Pandas for efficient data handling and manipulation. It then delves into statistical concepts, providing the necessary background for understanding data distributions and analytical methods. The course culminates in data visualization techniques using Matplotlib and Seaborn, demonstrating how to effectively communicate insights through graphical representations.
Throughout the course, numerous code samples and practical examples are provided, reinforcing learning and offering hands-on experience. Companion files with source code and figures are available online, supporting the learning journey. This comprehensive guide equips both beginners and seasoned professionals with the skills needed to excel in data analytics.


Page count: 319

Year of publication: 2024




PYTHON 3 AND DATA ANALYTICS

Pocket Primer

LICENSE, DISCLAIMER OF LIABILITY, AND LIMITED WARRANTY

By purchasing or using this book and disc (the “Work”), you agree that this license grants permission to use the contents contained herein, including the disc, but does not give you the right of ownership to any of the textual content in the book / disc or ownership to any of the information or products contained in it. This license does not permit uploading of the Work onto the Internet or on a network (of any kind) without the written consent of the Publisher. Duplication or dissemination of any text, code, simulations, images, etc. contained herein is limited to and subject to licensing terms for the respective products, and permission must be obtained from the Publisher or the owner of the content, etc., in order to reproduce or network any portion of the textual material (in any media) that is contained in the Work.

MERCURY LEARNING AND INFORMATION (“MLI” or “the Publisher”) and anyone involved in the creation, writing, or production of the companion disc, accompanying algorithms, code, or computer programs (“the software”), and any accompanying Web site or software of the Work, cannot and do not warrant the performance or results that might be obtained by using the contents of the Work. The author, developers, and the Publisher have used their best efforts to ensure the accuracy and functionality of the textual material and/or programs contained in this package; we, however, make no warranty of any kind, express or implied, regarding the performance of these contents or programs. The Work is sold “as is” without warranty (except for defective materials used in manufacturing the book or due to faulty workmanship).

The author, developers, and the publisher of any accompanying content, and anyone involved in the composition, production, and manufacturing of this work will not be liable for damages of any kind arising out of the use of (or the inability to use) the algorithms, source code, computer programs, or textual material contained in this publication. This includes, but is not limited to, loss of revenue or profit, or other incidental, physical, or consequential damages arising out of the use of this Work.

The sole remedy in the event of a claim of any kind is expressly limited to replacement of the book and/or disc, and only at the discretion of the Publisher. The use of “implied warranty” and certain “exclusions” varies from state to state and might not apply to the purchaser of this product.

Companion files for this title are available by writing to the publisher at [email protected].

PYTHON 3 AND DATA ANALYTICS

Pocket Primer

Oswald Campesato

MERCURY LEARNING AND INFORMATION
Dulles, Virginia
Boston, Massachusetts
New Delhi

Copyright ©2021 by MERCURY LEARNING AND INFORMATION LLC. All rights reserved.

This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.

Publisher: David Pallai
MERCURY LEARNING AND INFORMATION
22841 Quicksilver Drive
Dulles, VA [email protected]

O. Campesato. Python 3 and Data Analytics Pocket Primer.
ISBN: 978-1-68392-654-2

The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.

Library of Congress Control Number: 2021934305

21 22 23 3 2 1

This book is printed on acid-free paper in the United States of America.

Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For additional information, please contact the Customer Service Dept. at 800-232-0223 (toll free).

All of our titles are available in digital format at academiccourseware.com and other digital vendors. Companion files (figures and code listings) for this title are available by contacting [email protected]. The sole obligation of MERCURY LEARNING AND INFORMATION to the purchaser is to replace the disc, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.

I’d like to dedicate this book to my parents – may this bring joy and happiness into their lives.

CONTENTS

Preface

Chapter 1 Introduction to Python

Tools for Python

easy_install and pip

virtualenv

IPython

Python Installation

Setting the PATH Environment Variable (Windows Only)

Launching Python on Your Machine

The Python Interactive Interpreter

Python Identifiers

Lines, Indentation, and Multilines

Quotation and Comments in Python

Saving Your Code in a Module

Some Standard Modules in Python

The help() and dir() Functions

Compile Time and Runtime Code Checking

Simple Data Types in Python

Working with Numbers

Working with Other Bases

The chr() Function

The round() Function in Python

Formatting Numbers in Python

Working with Fractions

Unicode and UTF-8

Working with Unicode

Working with Strings

Comparing Strings

Formatting Strings in Python

Uninitialized Variables and the Value None in Python

Slicing and Splicing Strings

Testing for Digits and Alphabetic Characters

Search and Replace a String in Other Strings

Remove Leading and Trailing Characters

Printing Text without New Line Characters

Text Alignment

Working with Dates

Converting Strings to Dates

Exception Handling in Python

Handling User Input

Command-Line Arguments

Summary

Chapter 2 Working with Data

What Are Datasets?

Data Preprocessing

Data Types

Preparing Datasets

Discrete Data Versus Continuous Data

“Binning” Continuous Data

Scaling Numeric Data via Normalization

Scaling Numeric Data via Standardization

What to Look for in Categorical Data

Mapping Categorical Data to Numeric Values

Working with Dates

Working with Currency

Missing Data, Anomalies, and Outliers

Missing Data

Anomalies and Outliers

Outlier Detection

What Is Data Drift?

What Is Imbalanced Classification?

What Is SMOTE?

SMOTE Extensions

Analyzing Classifiers (Optional)

What Is LIME?

What Is ANOVA?

The Bias-Variance Trade-Off

Types of Bias in Data

Summary

Chapter 3 Introduction to NumPy

What Is NumPy?

Useful NumPy Features

What Are NumPy Arrays?

Working with Loops

Appending Elements to Arrays (1)

Appending Elements to Arrays (2)

Multiplying Lists and Arrays

Doubling the Elements in a List

Lists and Exponents

Arrays and Exponents

Math Operations and Arrays

Working with “−1” Subranges with Vectors

Working with “−1” Subranges with Arrays

Other Useful NumPy Methods

Arrays and Vector Operations

NumPy and Dot Products (1)

NumPy and Dot Products (2)

NumPy and the Length of Vectors

NumPy and Other Operations

NumPy and the reshape() Method

Calculating the Mean and Standard Deviation

Code Sample with Mean and Standard Deviation

Trimmed Mean and Weighted Mean

Working with Lines in the Plane (Optional)

Plotting Randomized Points with NumPy and Matplotlib

Plotting a Quadratic with NumPy and Matplotlib

What Is Linear Regression?

What Is Multivariate Analysis?

What about Nonlinear Datasets?

The MSE Formula

Other Error Types

Nonlinear Least Squares

Calculating MSE Manually

Find the Best-Fitting Line in NumPy

Calculating MSE by Successive Approximation (1)

Calculating MSE by Successive Approximation (2)

Google Colaboratory

Uploading CSV Files in Google Colaboratory

Summary

Chapter 4 Introduction to Pandas

What Is Pandas?

Pandas DataFrames

DataFrames and Data Cleaning Tasks

A Pandas DataFrame Example

Describing a Pandas DataFrame

Pandas Boolean DataFrames

Transposing a Pandas DataFrame

Pandas DataFrames and Random Numbers

Converting Categorical Data to Numeric Data

Matching and Splitting Strings in Pandas

Merging and Splitting Columns in Pandas

Combining Pandas DataFrames

Data Manipulation with Pandas DataFrames

Data Manipulation with Pandas DataFrames (2)

Data Manipulation with Pandas DataFrames (3)

Pandas DataFrames and CSV Files

Pandas DataFrames and Excel Spreadsheets

Select, Add, and Delete Columns in DataFrames

Handling Outliers in Pandas

Pandas DataFrames and Scatterplots

Pandas DataFrames and Simple Statistics

Finding Duplicate Rows in Pandas

Finding Missing Values in Pandas

Sorting DataFrames in Pandas

Working with groupby() in Pandas

Aggregate Operations with the titanic.csv Dataset

Working with apply() and applymap() in Pandas

Useful One-Line Commands in Pandas

Working with JSON-Based Data

Python Dictionary and JSON

Python, Pandas, and JSON

Pandas and Regular Expressions (Optional)

What Is texthero?

Summary

Chapter 5 Introduction to Probability and Statistics

What Is a Probability?

Calculating the Expected Value

Random Variables

Discrete versus Continuous Random Variables

Well-Known Probability Distributions

Fundamental Concepts in Statistics

The Mean

The Median

The Mode

The Variance and Standard Deviation

Population, Sample, and Population Variance

Chebyshev’s Inequality

What Is a p-value?

The Moments of a Function (Optional)

What Is Skewness?

What Is Kurtosis?

Data and Statistics

The Central Limit Theorem

Correlation versus Causation

Statistical Inferences

Statistical Terms – RSS, TSS, R^2, and F1 Score

What Is an F1 Score?

Gini Impurity, Entropy, and Perplexity

What Is Gini Impurity?

What Is Entropy?

Calculating Gini Impurity and Entropy Values

Multidimensional Gini Index

What Is Perplexity?

Cross-Entropy and KL Divergence

What Is Cross-Entropy?

What Is KL Divergence?

What’s Their Purpose?

Covariance and Correlation Matrices

The Covariance Matrix

Covariance Matrix: An Example

The Correlation Matrix

Eigenvalues and Eigenvectors

Calculating Eigenvectors: A Simple Example

Gauss-Jordan Elimination (Optional)

PCA (Principal Component Analysis)

The New Matrix of Eigenvectors

Well-Known Distance Metrics

Pearson Correlation Coefficient

Jaccard Index (or Similarity)

Locality-Sensitive Hashing (Optional)

Types of Distance Metrics

What Is Bayesian Inference?

Bayes’s Theorem

Some Bayesian Terminology

What Is MAP?

Why Use Bayes’s Theorem?

Summary

Chapter 6 Data Visualization

What Is Data Visualization?

Types of Data Visualization

What Is Matplotlib?

Horizontal Lines in Matplotlib

Slanted Lines in Matplotlib

Parallel Slanted Lines in Matplotlib

A Grid of Points in Matplotlib

A Dotted Grid in Matplotlib

Lines in a Grid in Matplotlib

A Colored Grid in Matplotlib

A Colored Square in an Unlabeled Grid in Matplotlib

Randomized Data Points in Matplotlib

A Histogram in Matplotlib

A Set of Line Segments in Matplotlib

Plotting Multiple Lines in Matplotlib

Trigonometric Functions in Matplotlib

Display IQ Scores in Matplotlib

Plot a Best-Fitting Line in Matplotlib

Introduction to Sklearn (scikit-learn)

The Digits Dataset in Sklearn

The Iris Dataset in Sklearn (1)

Sklearn, Pandas, and the Iris Dataset

The Iris Dataset in Sklearn (2)

The faces Dataset in Sklearn (Optional)

Working with Seaborn

Features of Seaborn

Seaborn Built-in Datasets

The Iris Dataset in Seaborn

The Titanic Dataset in Seaborn

Extracting Data from the Titanic Dataset in Seaborn (1)

Extracting Data from the Titanic Dataset in Seaborn (2)

Visualizing a Pandas Dataset in Seaborn

Data Visualization in Pandas

What Is Bokeh?

Summary

Appendix: Regular Expressions

What Are Regular Expressions?

Metacharacters in Python

Character Sets in Python

Working with “^” and “\”

Character Classes in Python

Matching Character Classes with the re Module

Using the re.match() Method

Options for the re.match() Method

Matching Character Classes with the re.search() Method

Matching Character Classes with the findall() Method

Finding Capitalized Words in a String

Additional Matching Function for Regular Expressions

Grouping with Character Classes in Regular Expressions

Using Character Classes in Regular Expressions

Matching Strings with Multiple Consecutive Digits

Reversing Words in Strings

Modifying Text Strings with the re Module

Splitting Text Strings with the re.split() Method

Splitting Text Strings Using Digits and Delimiters

Substituting Text Strings with the re.sub() Method

Matching the Beginning and the End of Text Strings

Compilation Flags

Compound Regular Expressions

Counting Character Types in a String

Regular Expressions and Grouping

Simple String Matches

Additional Topics for Regular Expressions

Summary

Exercises

Index

PREFACE

WHAT IS THE PRIMARY VALUE PROPOSITION FOR THIS BOOK?

This book contains a fast-paced introduction to as much relevant information about data analytics as possible in a book of this size. At the same time, please keep in mind: you will not become an expert in data analytics by reading this book.

However, you will be exposed to a variety of features of NumPy and Pandas, how to write regular expressions (with the accompanying appendix), and how to perform many data cleaning tasks. Keep in mind that some topics are presented in a cursory manner for two main reasons. First, it’s important that you be exposed to these concepts. In some cases, you will find topics that might pique your interest, and hence motivate you to learn more about them through self-study; in other cases, you will probably be satisfied with a brief introduction. In other words, you will decide whether or not to delve into more detail regarding the topics in this book. Second, a full treatment of all the topics that are covered in this book would significantly increase its length. This is contrary to the series design as “primers.” It’s important for you to decide if this approach is suitable for your needs and learning style: if not, you can select one or more of the plethora of data analytics books that are available.

THE TARGET AUDIENCE

The book is intended primarily for people who have worked with Python and are interested in learning about several important Python libraries, such as NumPy and Pandas.

It is also intended to reach an international audience of readers with highly diverse backgrounds in various age groups. While many readers know how to read English, their native spoken language is not English (which could be their second, third, or even fourth language). Consequently, this book uses standard English rather than colloquial expressions that might be confusing to those readers. As you know, many people learn by different types of imitation, which include reading, writing, or hearing new material. The book takes these points into consideration in order to provide a comfortable and meaningful learning experience for the intended readers.

WHAT WILL I LEARN FROM THIS BOOK?

The first chapter contains a quick tour of basic Python 3, followed by a chapter which introduces you to data types and data cleaning tasks, such as working with datasets that contain different types of data, and how to handle missing data. The third and fourth chapters introduce you to NumPy and Pandas (and many code samples).

The fifth chapter contains fundamental concepts in probability and statistics, such as the mean, mode, and variance, as well as correlation matrices. You will also learn about Gini impurity, entropy, and KL divergence. The book also covers eigenvalues, eigenvectors, and PCA (principal component analysis).

The sixth and final chapter of this book delves into data visualization with Matplotlib, Seaborn, and an example of a rendering of graphics effects in Bokeh. Finally, there is an appendix for regular expressions, with enough examples so you can understand most regular expressions that you will encounter in your code.

WHY ARE THE CODE SAMPLES PRIMARILY IN PYTHON?

Most of the code samples are short (usually less than one page and sometimes less than half a page), and if need be, you can easily and quickly copy/paste the code into a new Jupyter notebook. For the Python code samples that reference a CSV file, you do not need any additional code in the corresponding Jupyter notebook to access the CSV file. Moreover, the code samples execute quickly, so you won’t need to avail yourself of the free GPU that is provided in Google Colaboratory.

If you do decide to use Google Colaboratory, you can easily copy/paste the Python code into a notebook, and also use the upload feature to upload existing Jupyter notebooks. Keep in mind the following point: if the Python code references a CSV file, make sure that you include the appropriate code snippet (as explained in Chapter 3) to access the CSV file in the corresponding Jupyter notebook in Google Colaboratory.
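In Google Colaboratory, the function files.upload() from the google.colab module opens a file picker and returns a dictionary that maps each uploaded filename to its contents as bytes. The following sketch assumes that behavior and shows one way to turn such a dictionary into rows. The helper name rows_from_uploaded_csv, the sample file people.csv, and the UTF-8 encoding are illustrative assumptions. The upload is simulated so that the code also runs outside Colab:

```python
import csv
import io


def rows_from_uploaded_csv(uploaded, filename):
    """Parse one CSV file from a dict that maps filenames to raw bytes.

    In Colab, google.colab.files.upload() returns a dict of this shape.
    Decoding as UTF-8 is an assumption; adjust it for your data.
    """
    text = uploaded[filename].decode("utf-8")
    reader = csv.DictReader(io.StringIO(text))
    # Convert each row to a plain dict so the output is easy to inspect.
    return [dict(row) for row in reader]


if __name__ == "__main__":
    # Outside Colab, simulate an upload with an in-memory dict.
    # In a Colab notebook you would instead write:
    #   from google.colab import files
    #   uploaded = files.upload()
    fake_upload = {"people.csv": b"name,age\nAlice,30\nBob,25\n"}
    rows = rows_from_uploaded_csv(fake_upload, "people.csv")
    print(rows)  # [{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': '25'}]
```

If you prefer Pandas, pd.read_csv(io.BytesIO(uploaded[filename])) reads the same bytes directly into a DataFrame.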

DO I NEED TO LEARN THE THEORY PORTIONS OF THIS BOOK?

Once again, the answer depends on the extent to which you plan to become involved in data analytics. For example, if you plan to study machine learning, then you will probably learn how to create and train a model, which is a task that is performed after data cleaning tasks. In general, you will probably need to learn everything that you encounter in this book if you are planning to become a machine learning engineer.

WHY DOES THIS BOOK INCLUDE SKLEARN MATERIAL?

First, keep in mind that the Sklearn material in this book is minimalistic because this book is not about machine learning. Second, the Sklearn material is located in Chapter 6 where you will learn about some of the Sklearn built-in datasets. If you decide to delve into machine learning, you will have already been introduced to some aspects of Sklearn.

WHY IS A REGEX APPENDIX INCLUDED IN THIS BOOK?

Regular expressions are supported in multiple languages (including Java and JavaScript), and they enable you to perform complex tasks with very compact expressions. Alas, regular expressions can seem arcane and too complex to learn in a reasonable amount of time. Fortunately, Chapter 4 contains some Pandas-based code samples that use regular expressions to perform tasks that might otherwise be more complicated.

If you plan to use Pandas extensively or you plan to work on NLP-related tasks, then the code samples in the appendix will be very useful for you because they are more than adequate for solving certain types of tasks, such as removing HTML tags. Moreover, the knowledge you gain will transfer instantly to other languages that support regular expressions.
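As a small illustration of that compactness, here is a minimal sketch that removes HTML tags with Python's re module. The pattern and the helper name strip_html_tags are illustrative choices, not code from the appendix, and the pattern is intentionally naive: it handles simple, well-formed markup, but an HTML parser is the safer tool for arbitrary web pages.

```python
import re

# "<", then one or more characters that are not ">", then ">".
# This is not a full HTML parser: a ">" inside an attribute value,
# for example, would confuse it.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html_tags(text):
    """Return text with anything that looks like an HTML tag removed."""
    return TAG_PATTERN.sub("", text)


html = "<p>Hello, <b>data</b> analytics!</p>"
print(strip_html_tags(html))  # Hello, data analytics!
```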

GETTING THE MOST FROM THIS BOOK

Some programmers learn well from prose; others learn well from sample code (and lots of it), which means that there’s no single style that can be used for everyone.

Moreover, some programmers want to run the code first, see what it does, and then return to the code to delve into the details (and others use the opposite approach).

Consequently, there are various types of code samples in this book: some are short, some are long, and other code samples “build” from earlier code samples.

WHAT DO I NEED TO KNOW FOR THIS BOOK?

Current knowledge of Python 3.x is the most helpful skill. Knowledge of other programming languages (such as Java) can also be helpful because of the exposure to programming concepts and constructs. The less technical knowledge that you have, the more diligence will be required in order to understand the various topics that are covered.

If you want to be sure that you can grasp the material in this book, glance through some of the code samples to get an idea of how much is familiar to you and how much is new for you.

DON’T THE COMPANION FILES OBVIATE THE NEED FOR THIS BOOK?

The companion files contain all the code samples to save you the time and effort of the error-prone process of manually typing code into a text file. There are situations, however, in which you might not have easy access to the companion files. Furthermore, the book provides explanations of the code samples that are not available in the companion files.

DOES THIS BOOK CONTAIN PRODUCTION-LEVEL CODE SAMPLES?

The primary purpose of the code samples in this book is to show you Python-based libraries for solving a variety of data-related tasks in conjunction with acquiring a rudimentary understanding of statistical concepts. Clarity has higher priority than writing more compact code that is more difficult to understand (and possibly more prone to bugs). If you decide to use any of the code in this book in a production website, you ought to subject that code to the same rigorous analysis as the other parts of your code base.

WHAT ARE THE NON-TECHNICAL PREREQUISITES FOR THIS BOOK?

Although the answer to this question is more difficult to quantify, it’s very important to have a strong desire to learn about data analytics, along with the motivation and discipline to read and understand the code samples.

HOW DO I SET UP A COMMAND SHELL?

If you are a Mac user, there are three methods. The first is to use Finder to navigate to Applications > Utilities and then double-click the Terminal application. The second method applies if you already have a command shell available: you can launch a new command shell by typing the following command:

open /Applications/Utilities/Terminal.app

A third method for Mac users is to open a new command shell from a command shell that is already visible: simply press Command+N in that command shell, and your Mac will launch another command shell.

If you are a PC user, you can install Cygwin (open source, https://cygwin.com/), which simulates bash commands, or use another toolkit such as MKS (a commercial product). Please read the online documentation that describes the download and installation process. Note that custom aliases are not automatically set if they are defined in a file other than the main start-up file (such as .bash_login).

COMPANION FILES

All the code samples and figures in this book may be obtained by writing to the publisher at [email protected].

WHAT ARE THE “NEXT STEPS” AFTER FINISHING THIS BOOK?

The answer to this question varies widely because it depends heavily on your objectives. If you are interested primarily in NLP, then you can learn more advanced concepts, such as attention, transformers, and the BERT-related models.

If you are primarily interested in machine learning, there are some subfields of machine learning, such as deep learning and reinforcement learning (and deep reinforcement learning) which might appeal to you. Fortunately, there are many resources available, and you can perform an Internet search for those resources. One other point: the aspects of machine learning for you to learn depend on your interests: the needs of a machine learning engineer, data scientist, manager, student or software developer are all different.

Oswald Campesato
March 2021

CHAPTER 1

INTRODUCTION TO PYTHON

This chapter contains an introduction to Python with information about useful tools for installing Python modules, basic Python constructs, and how to work with some data types in Python.

The first part of this chapter covers how to install Python, some Python environment variables, and how to use the Python interpreter. You will see Python code samples and also how to save Python code in text files that you can launch from the command line. The second part of this chapter shows you how to work with simple data types such as numbers, fractions, and strings. The final part of this chapter discusses exceptions and how to use them in Python scripts.

NOTE

The Python scripts in this book are for Python 3, and more details are provided in a subsequent section.

TOOLS FOR PYTHON

The Anaconda distribution is available for Windows, Linux, and Mac, and it’s downloadable here:

http://continuum.io/downloads.

Anaconda is well suited for modules such as NumPy (discussed in Chapter 3) and SciPy (not discussed in this book), and if you are a Windows user, Anaconda appears to be a better alternative.

easy_install and pip

Both easy_install and pip are easy to use when you need to install Python modules, although easy_install is now deprecated and pip is the recommended tool.

Whenever you need to install a Python module (and there are many in this book), use either easy_install or pip with the following syntax:

easy_install <module-name>

pip install <module-name>

NOTE

Pure Python modules are easier to install, whereas modules with code written in C are usually faster but can be more difficult to install.
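Before you run pip, you might want to know whether a module is already available. Here is a minimal sketch that uses importlib.util.find_spec() from the standard library. The helper name is_installed is hypothetical:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module named module_name can be imported."""
    # find_spec() returns None when no module with that name can be found.
    return importlib.util.find_spec(module_name) is not None


for name in ["json", "numpy", "pandas"]:
    if is_installed(name):
        print(name + ": installed")
    else:
        print(name + ": missing (try: pip install " + name + ")")
```

Running pip through the interpreter, as in python3 -m pip install <module-name>, ensures that the module is installed for the same Python that you use to run your scripts.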

virtualenv

The virtualenv tool enables you to create isolated Python environments, and its home page is here:

http://www.virtualenv.org/en/latest/virtualenv.html.

virtualenv addresses the problem of preserving the correct dependencies and versions (and indirectly, permissions) for different applications. If you are a Python novice, you might not need virtualenv right now, but keep this tool in mind.
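Since Python 3.3, the standard library has included the venv module. It is a separate tool from virtualenv and provides a subset of its features, but the underlying idea is the same. Here is a minimal sketch that creates an isolated environment from Python code. The directory name demo-env is an arbitrary choice, and pip is skipped only to keep the example fast:

```python
import os
import tempfile
import venv

# Create a throwaway environment inside a temporary directory.
env_dir = os.path.join(tempfile.mkdtemp(), "demo-env")

# with_pip=False keeps this quick; use with_pip=True for an environment
# you actually plan to install modules into. Symlinks match the default
# behavior of "python -m venv" on macOS and Linux.
venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))

# Every environment created by venv contains a pyvenv.cfg file.
print(os.path.exists(os.path.join(env_dir, "pyvenv.cfg")))  # True
```

From a command shell, the equivalent is python3 -m venv demo-env, followed by source demo-env/bin/activate on macOS or Linux (or demo-env\Scripts\activate on Windows).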

IPython

Another very good tool is IPython (which won a Jolt award), and its home page is here:

http://ipython.org/install.html.

Type ipython to invoke IPython from the command line:

ipython

The preceding command displays the following output:

Python 3.9.1 (v3.9.1:1e5d33e9b9, Dec 7 2020, 12:44:01)

Type 'copyright', 'credits' or 'license' for more information

IPython 7.20.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

Now type a question mark (“?”) at the prompt and you will see some useful information, a portion of which is here:

IPython -- An enhanced Interactive Python

IPython offers a fully compatible replacement for the standard Python interpreter, with convenient shell features, special commands, command history mechanism and output results caching.

At your system command line, type 'ipython -h' to see the command line options available. This document only describes interactive features.

GETTING HELP
------------

Within IPython you have various way to access help:

  ?         -> Introduction and overview of IPython's features (this screen).

  object?   -> Details about 'object'.

  object??  -> More detailed, verbose information about 'object'.

  %quickref -> Quick reference of all IPython specific syntax and magics.

  help      -> Access Python's own help system.

If you are in terminal IPython you can quit this screen by pressing 'q'.

Finally, simply type quit at the command prompt and you will exit the IPython shell.

The next section shows you how to check whether or not Python is installed on your machine, and also where you can download Python.

PYTHON INSTALLATION

Before you download anything, check if you have Python already installed on your machine (which is likely if you have a MacBook or a Linux machine) by typing the following command in a command shell:

python -V

The output for the MacBook used in this book is here:

Python 3.9.1

NOTE 1

Install Python 3.9.1 (or as close as possible to this version) on your machine if you want to use the same version of Python that was used to test the Python scripts in this book. However, the Python scripts in this book will probably also work with versions of Python that are in the range 3.7.x to 3.9.1.

NOTE 2

When you install Python 3.x, the executable might be named python3 instead of python; if so, use python3 to launch the Python scripts in this book.
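A script can also check the interpreter version itself. Here is a minimal sketch, assuming that 3.7 (the lower end of the range in NOTE 1) is the minimum you care about. The helper name meets_minimum is hypothetical:

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if version_info is at least the (major, minor) minimum."""
    # Tuples compare element by element, so (3, 9) >= (3, 7) is True.
    return tuple(version_info[:2]) >= minimum


print(sys.version)      # the full version string of the running interpreter
print(meets_minimum())  # True on Python 3.7 or later
```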

If you need to install Python on your machine, navigate to the Python home page and select the downloads link or navigate directly to this website:

http://www.python.org/download/.

In addition, PythonWin is available for Windows, and its home page is here:

http://www.cgl.ucsf.edu/Outreach/pc204/pythonwin.html.

Use any text editor that can create, edit, and save Python scripts as plain text files (don’t use Microsoft Word).

After you have Python installed and configured on your machine, you are ready to work with the Python scripts in this book.

SETTING THE PATH ENVIRONMENT VARIABLE (WINDOWS ONLY)

The PATH environment variable specifies a list of directories that are searched whenever you specify an executable program from the command line. For a very good guide to setting up your environment so that the Python executable is always available in every command shell, follow the instructions here:

http://www.blog.pythonlibrary.org/2011/11/24/python-101-setting-up-python-on-windows/.
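You can watch PATH at work from Python itself. Here is a minimal sketch that uses os.environ, os.pathsep, and shutil.which() from the standard library. The output depends entirely on how your machine is configured:

```python
import os
import shutil

# PATH is one string; os.pathsep separates its entries (";" on Windows,
# ":" on macOS and Linux).
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
for directory in path_dirs:
    print(directory)

# shutil.which() looks through those directories in order, much as the
# shell does, and returns the full path of the first match (or None).
for program in ("python", "python3"):
    print(program, "->", shutil.which(program))
```

If shutil.which("python") prints None on Windows, the Python installation directory is probably missing from PATH.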

LAUNCHING PYTHON ON YOUR MACHINE

There are three different ways to launch Python:

• Use the Python Interactive Interpreter

• Launch Python scripts from the command line

• Use an IDE

The next section shows you how to launch the Python interpreter from the command line, and later in this chapter you will learn how to launch Python scripts from the command line and also about Python IDEs.

NOTE

The emphasis in this book is to launch Python scripts from the command line or to enter code in the Python interpreter.
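Scripts that you launch from the command line often follow a common pattern: the work is done in a function, and an if __name__ == "__main__": check calls that function only when the file is run directly, not when another module imports it. Here is a minimal sketch. The file name hello.py and the function main() are hypothetical:

```python
# hello.py (hypothetical): run it with  python3 hello.py Ada
import sys


def main(argv):
    """Build a greeting from the command-line arguments, if any."""
    # argv[0] is the script name; the first real argument is argv[1].
    name = argv[1] if len(argv) > 1 else "World"
    return "Hello, " + name + "!"


# This block runs when the file is launched as a script, but not when
# another module imports it.
if __name__ == "__main__":
    print(main(sys.argv))
```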

The Python Interactive Interpreter

PYTHON IDENTIFIERS

A Python identifier is the name of a variable, function, class, module, or other Python object, and a valid identifier conforms to the following rules:

• starts with a letter (A to Z or a to z) or an underscore (_)

• is followed by zero or more letters, underscores, and digits (0 to 9)

NOTE

Python identifiers cannot contain characters such as @, $, and %.

Python is a case-sensitive language, so Abc and abc are different identifiers in Python.
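You don't have to memorize these rules, because Python can check a name for you. Here is a minimal sketch that combines the string method str.isidentifier() with the keyword module, since keywords such as class follow the identifier rules but are reserved. The helper name is_valid_identifier is hypothetical:

```python
import keyword


def is_valid_identifier(name):
    """Return True if name can be used as a variable, function, or class name."""
    # isidentifier() applies the syntax rules; reserved keywords pass that
    # check, so they are rejected separately.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["total", "_count", "item2", "2item", "user@home", "class"]:
    print(candidate, is_valid_identifier(candidate))

# Case sensitivity: these are two different variables.
Abc = 1
abc = 2
print(Abc == abc)  # False
```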

In addition, Python has the following naming conventions:

• Class names start with an uppercase letter, and all other identifiers start with a lowercase letter

• An initial underscore is used for private identifiers

• Two initial underscores are used for strongly private identifiers

A Python identifier with two leading and two trailing underscore characters (such as __init__) indicates a language-defined special name.
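The following sketch, built around a hypothetical Account class, shows what these conventions mean in practice. A single leading underscore is only a signal to other programmers. Two leading underscores inside a class trigger name mangling, so Python renames the attribute __pin to _Account__pin. A name such as __repr__, with two leading and two trailing underscores, is a special method that Python calls for you:

```python
class Account:
    def __init__(self, owner, pin):
        self.owner = owner   # public attribute
        self._balance = 0    # one underscore: private by convention only
        self.__pin = pin     # two underscores: stored as _Account__pin

    def __repr__(self):      # special method used by repr() and print()
        return "Account(" + repr(self.owner) + ")"


acct = Account("Ada", 1234)
print(acct)                    # Account('Ada')
print(acct._balance)           # 0 (accessible, but meant for internal use)
print(hasattr(acct, "__pin"))  # False: the name was mangled
print(acct._Account__pin)      # 1234
```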

LINES, INDENTATION, AND MULTILINES

QUOTATION AND COMMENTS IN PYTHON

SAVING YOUR CODE IN A MODULE

SOME STANDARD MODULES IN PYTHON

The Python Standard Library provides many modules that can simplify your own Python scripts. A list of the Standard Library modules is here:

http://www.python.org/doc/.

Some of the most important Python modules include cgi, math, os, pickle, random, re, socket, sys, time, and urllib.

The code samples in this book use the modules math, os, random, re, socket, sys, time, and urllib. You need to import these modules in order to use them in your code. For example, the following code block shows you how to import four standard Python modules:

import datetime
import re
import sys
import time

The code samples in this book import one or more of the preceding modules, as well as other Python modules.
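Besides the plain import statement, Python lets you import a single name from a module and import a module under an alias. Here is a short sketch with three of the modules above. The alias system is an arbitrary choice for illustration:

```python
import re                   # use names as re.findall, re.sub, ...
from datetime import date   # bring a single name into this namespace
import sys as system        # import a module under an alias

print(re.findall(r"\d+", "3 apples and 12 pears"))  # ['3', '12']
print(date(2021, 3, 1).isoformat())                 # 2021-03-01
print(system.version_info.major)                    # 3 under Python 3
```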

THE HELP() AND DIR() FUNCTIONS

An Internet search for Python-related topics usually returns a number of links with useful information. Alternatively, you can check the official Python documentation site: docs.python.org.

In addition, Python provides the help() and dir() functions that are accessible from the Python interpreter. The help() function displays documentation strings, whereas the dir() function displays defined symbols.

For example, if you type help(sys), you will see documentation for the sys module, whereas dir(sys) displays a list of the defined symbols.

Type the following command in the Python interpreter to display the string-related methods in Python:

>>> dir(str)

The preceding command generates the following output:

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

The preceding list is a consolidated “dump” of the methods that are defined for strings, including special methods whose names start and end with two underscores (some of these methods are discussed later in this chapter).

Note that dir() called with no arguments does not list the names of built-in functions and variables. You can obtain this information from the standard module builtins (named __builtin__ in Python 2), which is automatically available under the name __builtins__:

>>> dir(__builtins__)

The preceding command displays a list of built-in names. Although the max() function obviously returns the maximum value of its arguments, the purpose of other functions such as filter() or map() is not immediately apparent (unless you have used them in other programming languages). In any case, that list provides a starting point for finding out more about various Python built-in functions that are not discussed in this chapter.

The following command shows you how to get more information about a function:

help(str.lower)

The output from the preceding command is here:

Help on method_descriptor:

lower(self, /)

    Return a copy of the string converted to lowercase.

(END)

Check the online documentation and also experiment with help() and dir() when you need additional information about a particular function or module.
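The output of dir() is an ordinary list, so you can filter it in code. Here is a minimal sketch that keeps only the public string methods. It also uses inspect.getdoc() to fetch the docstring that help() displays, without opening a pager:

```python
import inspect

# dir() returns a sorted list of names; dropping the names that start
# with "_" leaves only the public string methods.
public_methods = [name for name in dir(str) if not name.startswith("_")]
print(len(public_methods))
print(public_methods[:5])  # ['capitalize', 'casefold', 'center', 'count', 'encode']

# inspect.getdoc() returns the docstring that help() shows.
print(inspect.getdoc(str.lower))
```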

COMPILE TIME AND RUNTIME CODE CHECKING

SIMPLE DATA TYPES IN PYTHON

Python supports primitive data types, such as numbers (integers and floating point numbers, which can also be written in exponential notation) and strings, and it supports dates through the datetime module. Python also supports more complex data types, such as lists (which are similar to arrays in other languages), tuples, and dictionaries. The next several sections discuss some of the Python primitive data types, along with code snippets that show you how to perform various operations on those data types.
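The built-in type() function reports the type of any value, so it offers a quick way to explore these data types in the interpreter. Here is a minimal sketch (the sample values are arbitrary):

```python
from datetime import date

samples = [
    42,                # int
    3.14,              # float
    6.02e23,           # float written in exponential notation
    "data",            # str
    date(2021, 3, 1),  # date, from the datetime module
    [1, 2, 3],         # list
    (1, 2),            # tuple
    {"a": 1},          # dict
]

for value in samples:
    print(repr(value), "->", type(value).__name__)
```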

WORKING WITH NUMBERS