Python 3 and Data Visualization provides an in-depth exploration of Python 3 programming and data visualization techniques. The course begins with an introduction to Python, covering essential topics from basic data types and loops to advanced constructs such as dictionaries and matrices. This foundation prepares readers for the next section, which focuses on NumPy and its powerful array operations, seamlessly leading into data visualization using prominent libraries like Matplotlib.
Chapter 6 delves into Seaborn's rich visualization tools, providing insights into datasets like Iris and Titanic. The appendix covers additional visualization tools and techniques, including SVG graphics and D3 for dynamic visualizations. The companion files include numerous Python code samples and figures, enhancing the learning experience.
From foundational Python concepts to advanced data visualization techniques, this course serves as a comprehensive resource for both beginners and seasoned professionals, equipping them with the necessary skills to effectively visualize data.
Page count: 291
Year of publication: 2024
LICENSE, DISCLAIMER OF LIABILITY, AND LIMITED WARRANTY
By purchasing or using this book and companion files (the “Work”), you agree that this license grants permission to use the contents contained herein, including the disc, but does not give you the right of ownership to any of the textual content in the book / disc or ownership to any of the information or products contained in it. This license does not permit uploading of the Work onto the Internet or on a network (of any kind) without the written consent of the Publisher. Duplication or dissemination of any text, code, simulations, images, etc. contained herein is limited to and subject to licensing terms for the respective products, and permission must be obtained from the Publisher or the owner of the content, etc., in order to reproduce or network any portion of the textual material (in any media) that is contained in the Work.
MERCURY LEARNING AND INFORMATION (“MLI” or “the Publisher”) and anyone involved in the creation, writing, or production of the companion disc, accompanying algorithms, code, or computer programs (“the software”), and any accompanying Web site or software of the Work, cannot and do not warrant the performance or results that might be obtained by using the contents of the Work. The author, developers, and the Publisher have used their best efforts to ensure the accuracy and functionality of the textual material and/or programs contained in this package; we, however, make no warranty of any kind, express or implied, regarding the performance of these contents or programs. The Work is sold “as is” without warranty (except for defective materials used in manufacturing the book or due to faulty workmanship).
The author, developers, and the publisher of any accompanying content, and anyone involved in the composition, production, and manufacturing of this work will not be liable for damages of any kind arising out of the use of (or the inability to use) the algorithms, source code, computer programs, or textual material contained in this publication. This includes, but is not limited to, loss of revenue or profit, or other incidental, physical, or consequential damages arising out of the use of this Work.
The sole remedy in the event of a claim of any kind is expressly limited to replacement of the book and/or disc, and only at the discretion of the Publisher. The use of “implied warranty” and certain “exclusions” vary from state to state, and might not apply to the purchaser of this product.
Companion files for this title are available by writing to the publisher at [email protected].
Oswald Campesato
MERCURY LEARNING AND INFORMATION
Dulles, Virginia
Boston, Massachusetts
New Delhi
Copyright ©2024 by MERCURY LEARNING AND INFORMATION. An Imprint of DeGruyter Inc. All rights reserved.
This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.
Publisher: David Pallai
MERCURY LEARNING AND INFORMATION
121 High Street, 3rd Floor
Boston, MA 02110
www.merclearning.com
800-232-0223
O. Campesato. Python 3 and Data Visualization.
ISBN: 978-1-68392-946-8
The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.
Library of Congress Control Number: 2023944271
23 24 25 3 2 1
This book is printed on acid-free paper in the United States of America.
Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For additional information, please contact the Customer Service Dept. at 800-232-0223 (toll free).
All of our titles are available in digital format at academiccourseware.com and other digital vendors. Companion files (figures and code listings) for this title are available by contacting [email protected]. The sole obligation of MERCURY LEARNING AND INFORMATION to the purchaser is to replace the disc, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.
I’d like to dedicate this book to my parents: may this bring joy and happiness into their lives.
Preface
Chapter 1: Introduction to Python 3
Some Standard Modules in Python
Simple Data Types in Python
Working With Numbers
Working With Other Bases
The chr() Function
The round() Function in Python
Unicode and UTF-8
Working With Unicode
Working With Strings
Comparing Strings
Uninitialized Variables and the Value None in Python
Slicing and Splicing Strings
Testing for Digits and Alphabetic Characters
Search and Replace a String in Other Strings
Precedence of Operators in Python
Python Reserved Words
Working With Loops in Python
Python for Loops
Numeric Exponents in Python
Nested Loops
The split() Function With for Loops
Using the split() Function to Compare Words
Python while Loops
Conditional Logic in Python
The break/continue/pass Statements
Comparison and Boolean Operators
The in/not in/is/is not Comparison Operators
The and, or, and not Boolean Operators
Local and Global Variables
Scope of Variables
Pass by Reference versus Value
Arguments and Parameters
User-Defined Functions in Python
Specifying Default Values in a Function
Returning Multiple Values From a Function
Lambda Expressions
Working With Lists
Lists and Basic Operations
Lists and Arithmetic Operations
Lists and Filter-Related Operations
The join(), range(), and split() Functions
Arrays and the append() Function
Other List-Related Functions
Working With List Comprehensions
Working With Vectors
Working With Matrices
Queues
Tuples (Immutable Lists)
Sets
Dictionaries
Creating a Dictionary
Displaying the Contents of a Dictionary
Checking for Keys in a Dictionary
Deleting Keys From a Dictionary
Iterating Through a Dictionary
Interpolating Data From a Dictionary
Dictionary Functions and Methods
Other Sequence Types in Python
Mutable and Immutable Types in Python
Summary
Chapter 2: NumPy and Data Visualization
What Is NumPy?
Useful NumPy Features
What Are NumPy Arrays?
Working With Loops
Appending Elements to Arrays (1)
Appending Elements to Arrays (2)
Multiplying Lists and Arrays
Doubling the Elements in a List
Lists and Exponents
Arrays and Exponents
Math Operations and Arrays
Working With “–1” Subranges With Vectors
Working With “–1” Subranges With Arrays
Other Useful NumPy Methods
Arrays and Vector Operations
NumPy and Dot Products (1)
NumPy and Dot Products (2)
NumPy and the Length of Vectors
NumPy and Other Operations
NumPy and the reshape() Method
Calculating the Mean and Standard Deviation
Code Sample With Mean and Standard Deviation
Trimmed Mean and Weighted Mean
Working With Lines in the Plane (Optional)
Plotting Randomized Points With NumPy and Matplotlib
Plotting a Quadratic With NumPy and Matplotlib
What Is Linear Regression?
What Is Multivariate Analysis?
What About Nonlinear Datasets?
The MSE (Mean Squared Error) Formula
Other Error Types
Nonlinear Least Squares
Calculating the MSE Manually
Find the Best-Fitting Line in NumPy
Calculating MSE by Successive Approximation (1)
Calculating MSE by Successive Approximation (2)
Google Colaboratory
Uploading CSV Files in Google Colaboratory
Summary
Chapter 3: Pandas and Data Visualization
What Is Pandas?
Pandas DataFrames
Dataframes and Data Cleaning Tasks
A Pandas DataFrame Example
Describing a Pandas DataFrame
Pandas Boolean DataFrames
Transposing a Pandas DataFrame
Pandas DataFrames and Random Numbers
Converting Categorical Data to Numeric Data
Matching and Splitting Strings in Pandas
Merging and Splitting Columns in Pandas
Combining Pandas DataFrames
Data Manipulation With Pandas DataFrames
Data Manipulation With Pandas DataFrames (2)
Data Manipulation With Pandas DataFrames (3)
Pandas DataFrames and CSV Files
Pandas DataFrames and Excel Spreadsheets
Select, Add, and Delete Columns in DataFrames
Handling Outliers in Pandas
Pandas DataFrames and Scatterplots
Pandas DataFrames and Simple Statistics
Finding Duplicate Rows in Pandas
Finding Missing Values in Pandas
Sorting DataFrames in Pandas
Working With groupby() in Pandas
Aggregate Operations With the titanic.csv Dataset
Working with apply() and applymap() in Pandas
Useful One-Line Commands in Pandas
What is Texthero?
Data Visualization in Pandas
Summary
Chapter 4: Pandas and SQL
Pandas and Data Visualization
Pandas and Bar Charts
Pandas and Horizontally Stacked Bar Charts
Pandas and Vertically Stacked Bar Charts
Pandas and Nonstacked Area Charts
Pandas and Stacked Area Charts
What Is Fugue?
MySQL, SQLAlchemy, and Pandas
What Is SQLAlchemy?
Read MySQL Data via SQLAlchemy
Export SQL Data From Pandas to Excel
MySQL and Connector/Python
Establishing a Database Connection
Reading Data From a Database Table
Creating a Database Table
Writing Pandas Data to a MySQL Table
Read XML Data in Pandas
Read JSON Data in Pandas
Working With JSON-Based Data
Python Dictionary and JSON
Python, Pandas, and JSON
Pandas and Regular Expressions (Optional)
What Is SQLite?
SQLite Features
SQLite Installation
Create a Database and a Table
Insert, Select, and Delete Table Data
Launch SQL Files
Drop Tables and Databases
Load CSV Data Into a sqlite Table
Python and SQLite
Connect to a sqlite3 Database
Create a Table in a sqlite3 Database
Insert Data in a sqlite3 Table
Select Data From a sqlite3 Table
Populate a Pandas Dataframe From a sqlite3 Table
Histogram With Data From a sqlite3 Table (1)
Histogram With Data From a sqlite3 Table (2)
Working With sqlite3 Tools
SQLiteStudio Installation
DB Browser for SQLite Installation
SQLiteDict (Optional)
Working With BeautifulSoup
Parsing an HTML Web Page
BeautifulSoup and Pandas
BeautifulSoup and Live HTML Web Pages
Summary
Chapter 5: Matplotlib for Data Visualization
What Is Data Visualization?
Types of Data Visualization
What Is Matplotlib?
Matplotlib Styles
Display Attribute Values
Color Values in Matplotlib
Cubed Numbers in Matplotlib
Horizontal Lines in Matplotlib
Slanted Lines in Matplotlib
Parallel Slanted Lines in Matplotlib
A Grid of Points in Matplotlib
A Dotted Grid in Matplotlib
Two Lines and a Legend in Matplotlib
Loading Images in Matplotlib
A Checkerboard in Matplotlib
Randomized Data Points in Matplotlib
A Set of Line Segments in Matplotlib
Plotting Multiple Lines in Matplotlib
Trigonometric Functions in Matplotlib
A Histogram in Matplotlib
Histogram With Data From a sqlite3 Table
Plot Bar Charts Matplotlib
Plot a Pie Chart Matplotlib
Heat Maps in Matplotlib
Save Plot as a PNG File
Working With SweetViz
Working With Skimpy
3D Charts in Matplotlib
Plotting Financial Data With mplfinance
Charts and Graphs With Data From Sqlite3
Summary
Chapter 6: Seaborn for Data Visualization
Working With Seaborn
Features of Seaborn
Seaborn Dataset Names
Seaborn Built-In Datasets
The Iris Dataset in Seaborn
The Titanic Dataset in Seaborn
Extracting Data From Titanic Dataset in Seaborn (1)
Extracting Data From Titanic Dataset in Seaborn (2)
Visualizing a Pandas Dataset in Seaborn
Seaborn Heat Maps
Seaborn Pair Plots
What Is Bokeh?
Introduction to Scikit-Learn
The Digits Dataset in Scikit-learn
The Iris Dataset in Scikit-Learn
Scikit-Learn, Pandas, and the Iris Dataset
Advanced Topics in Seaborn
Summary
Appendix: SVG and D3
Basic Two-Dimensional Shapes in SVG
SVG Gradients and the <path> Element
SVG <polygon> Element
Bézier Curves and Transforms
SVG Filters and Shadow Effects
Rendering Text Along an SVG <path> Element
SVG Transforms
SVG and HTML
CSS3 and SVG
Similarities and Differences Between SVG and CSS3
Introduction to D3
What Is D3?
D3 Boilerplate
Method Chaining in D3
The D3 Methods select() and selectAll()
Specifying UTF-8 in HTML5 Web Pages With D3
Creating New HTML Elements
The Most Common Idiom in D3
Binding Data to Document-Object-Model Elements
Generating Text Strings
Creating Simple Two-Dimensional Shapes
Bézier Curves and Text
A Digression: Scaling Arrays of Numbers to Different Ranges
Tweening in D3
Formatting Numbers
Working With Gradients
Linear Gradients
Radial Gradients
Adding HTML <div> Elements With Gradient Effects
Other D3 Graphics Samples
D3 Application Programming Interface Reference
Other Features of D3
Summary
Index
This book contains a fast-paced introduction to relevant information about Python-based data visualization. You will learn how to generate graphics using Pandas, Matplotlib, and Seaborn. In addition, an appendix contains SVG-based and D3-based graphics effects, along with links for many additional code samples.
This book is intended primarily for those who have worked with Python and are interested in learning about graphics effects with Python libraries. It is also intended to reach an international audience of readers with highly diverse backgrounds in various age groups. Consequently, the book uses standard English rather than colloquial expressions that might be confusing to those readers. It provides a comfortable and meaningful learning experience for the intended readers.
The first chapter contains a quick tour of basic Python 3, followed by a chapter that introduces you to NumPy. The third and fourth chapters introduce you to Pandas, as well as Pandas with JSON data, MySQL, and SQL.
The fifth chapter delves into data visualization with Matplotlib and also covers SweetViz and Skimpy. The final chapter of this book shows you how to create graphics effects with Seaborn and includes an example of rendering graphics effects in Bokeh. In addition, an appendix is included with graphics effects based on SVG and D3.
Most of the code samples are short (usually less than one page and sometimes less than half a page), and if need be, you can easily and quickly copy/paste the code into a new Jupyter notebook. For the Python code samples that reference a CSV file, you do not need any additional code in the corresponding Jupyter notebook to access the CSV file. Moreover, the code samples execute quickly, so you won’t need to avail yourself of the free GPU that is provided in Google Colaboratory.
If you do decide to use Google Colaboratory, you can easily copy/paste the Python code into a notebook, and also use the upload feature to upload existing Jupyter notebooks. Keep in mind the following point: if the Python code references a CSV file, make sure that you include the appropriate code snippet (details are available online) to access the CSV file in the corresponding Jupyter notebook in Google Colaboratory.
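As a rough illustration of the kind of snippet mentioned above, the following sketch shows one way to read an uploaded CSV file. It assumes the code runs in Google Colaboratory, where files.upload() from the google.colab module returns a dictionary that maps each uploaded file name to its contents as bytes. The function names and the sample data (people.csv) are hypothetical and are not taken from the book's companion files.

```python
import csv
import io


def rows_from_upload(uploaded, filename):
    """Parse one uploaded CSV file (bytes) into a list of row dictionaries."""
    text = uploaded[filename].decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def upload_in_colab():
    """Open the Colab file picker; this only works inside a Colab notebook."""
    from google.colab import files  # available only in Google Colaboratory
    return files.upload()  # dictionary: file name -> file contents as bytes


# Stand-in for the dictionary returned by files.upload() (hypothetical data),
# so that the parsing step can also be tried outside of Colab.
sample_upload = {"people.csv": b"name,age\nAna,31\nBo,27\n"}
rows = rows_from_upload(sample_upload, "people.csv")
print(rows[0]["name"], rows[1]["age"])
```

In an actual Colab notebook you would replace sample_upload with the result of upload_in_colab(). If Pandas is available, the decoded bytes can instead be wrapped in io.BytesIO and passed to pd.read_csv().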
First, keep in mind that the Sklearn material in this book is minimalistic, because this book is not about machine learning. Second, the Sklearn material is located in Chapter 6, where you will learn about some of the Sklearn built-in datasets. If you decide to study machine learning, you will have already been introduced to some aspects of Sklearn.
Some programmers learn well from prose, whereas others learn well from sample code (and lots of it), which means that there’s no single style that can be used for everyone.
Moreover, some programmers want to run the code first, see what it does, and then return to the code to delve into the details (and others use the opposite approach).
Consequently, there are various types of code samples in this book: some are short, some are long, and other code samples “build” from earlier code samples.
Current knowledge of Python 3.x is the most helpful skill. Knowledge of other programming languages (such as Java) can also be helpful because of the exposure to programming concepts and constructs. The less technical knowledge that you have, the more diligence will be required in order to understand the various topics that are covered.
As for the non-technical skills, it’s very important to have a strong desire to learn about data visualization, along with the motivation and discipline to read and understand the code samples.
The companion files contain all the code samples, which spares you the time and effort of the error-prone process of manually typing code into a text file. The code samples also appear in the book because there are situations in which you might not have easy access to these files. Furthermore, the code samples in the book provide explanations that are not available in the companion files.
The primary purpose of the code samples in this book is to show you Python-based libraries for data visualization. Clarity has higher priority than writing more compact code that is more difficult to understand (and possibly more prone to bugs). If you decide to use any of the code in this book in a production website, you ought to subject that code to the same rigorous analysis as the other parts of your code base.
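To make the trade-off between clarity and compactness concrete, here is a small sketch (not taken from the book's code samples) that computes the same value in two ways. The data and variable names are hypothetical.

```python
values = [3, 8, 5, 12, 7, 10]

# Compact version: correct, but everything happens in a single expression.
compact = sum(v * v for v in values if v % 2 == 0)

# Clearer version: each step is named and can be inspected separately.
even_values = []
for v in values:
    if v % 2 == 0:
        even_values.append(v)

squares = []
for v in even_values:
    squares.append(v * v)

clear = sum(squares)

print(compact, clear)  # both are 308 (64 + 144 + 100)
```

The longer version is easier to step through in a notebook, because you can print even_values or squares at any point to see what each step produced.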
If you are a Mac user, there are three ways to open a command shell. The first method is to use Finder to navigate to Applications > Utilities and then double click on the Terminal application. Second, if you already have a command shell available, you can launch a new command shell by typing the following command:
open /Applications/Utilities/Terminal.app
A third method for Mac users is to open a new command shell on a MacBook from a command shell that is already visible simply by pressing command+n in that command shell, and your Mac will launch another command shell.
If you are a PC user, you can install Cygwin (open source, https://cygwin.com/), which simulates bash commands, or use another toolkit such as MKS (a commercial product). Please read the online documentation that describes the download and installation process. Note that custom aliases are not automatically set if they are defined in a file other than the main start-up file (such as .bash_login).
All the code samples and figures in this book may be obtained by writing to the publisher at [email protected].
What to study after finishing this book depends heavily on your objectives. If you are interested primarily in NLP, then you can learn more advanced concepts, such as attention, transformers, and the BERT-related models.
If you are primarily interested in machine learning, there are some subfields of machine learning, such as deep learning and reinforcement learning (and deep reinforcement learning) that might appeal to you. Fortunately, there are many resources available, and you can perform an Internet search for those resources. One other point: the aspects of machine learning for you to learn depend on who you are: the needs of a machine learning engineer, data scientist, manager, student or software developer are all different.
Oswald Campesato
September 2023