Gain practical insights into predictive modelling by implementing Predictive Analytics algorithms on public datasets with Python
If you wish to learn how to implement Predictive Analytics algorithms using Python libraries, then this is the book for you. If you are familiar with coding in Python (or some other programming/statistical/scripting language) but have never used or read about Predictive Analytics algorithms, this book will also help you. The book will be beneficial to and can be read by any Data Science enthusiasts. Some familiarity with Python will be useful to get the most out of this book, but it is certainly not a prerequisite.
Social Media and the Internet of Things have resulted in an avalanche of data. Data is powerful but not in its raw form - It needs to be processed and modeled, and Python is one of the most robust tools out there to do so. It has an array of packages for predictive modeling and a suite of IDEs to choose from. Learning to predict who would win, lose, buy, lie, or die with Python is an indispensable skill set to have in this data age.
This book is your guide to getting started with Predictive Analytics using Python. You will see how to process data and make predictive models from it. We balance both statistical and mathematical concepts, and implement them in Python using libraries such as pandas, scikit-learn, and numpy.
You'll start by getting an understanding of the basics of predictive modeling, then you will see how to cleanse your data of impurities and get it ready for predictive modeling. You will also learn more about the best predictive modeling algorithms such as Linear Regression, Decision Trees, and Logistic Regression. Finally, you will see the best practices in predictive modeling, as well as the different applications of predictive modeling in the modern world.
All the concepts in this book have been explained and illustrated using a dataset, in a step-by-step manner. The Python code snippet to implement a method or concept is followed by the output, such as charts, dataset heads, pictures, and so on. The statistical concepts are explained in detail wherever required.
Number of pages: 385
Year of publication: 2016
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2016
Production reference: 1050216
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-326-1
www.packtpub.com
Author: Ashish Kumar
Reviewer: Matt Hollingsworth
Commissioning Editor: Kartikey Pandey
Acquisition Editor: Nikhil Karkal
Content Development Editor: Amey Varangaonkar
Technical Editor: Saurabh Malhotra
Copy Editor: Sneha Singh
Project Coordinator: Francina Pinto
Proofreader: Safis Editing
Indexer: Hemangini Bari
Graphics: Disha Haria, Kirk D'Penha
Production Coordinator: Shantanu N. Zagade
Cover Work: Shantanu N. Zagade
Data science is changing the way we go about our daily lives at an unprecedented pace. The recommendations you see on e-commerce websites, the technologies that prevent credit card fraud, the logic behind airline itinerary and route selections, the products and discounts you see in retail stores, and many more decisions are largely powered by data science. Futuristic sounding applications like self-driving cars, robots to do household chores, smart wearable technologies, and so on are becoming a reality, thanks to innovations in data science.
Predictive analytics is a branch of data science, used to predict unknown future events based on historical data. It uses a number of techniques from data mining, statistical modelling and machine learning to help make forecasts with an acceptable level of reliability.
Python is a high-level, object-oriented programming language. It has gained popularity because of its clear syntax and readability, and beginners can pick up the language easily. It comes with a large library of modules that can be used to do a multitude of tasks ranging from data cleaning to building complex predictive modelling algorithms.
I'm a co-founder at Tiger Analytics, a firm specializing in providing data science and predictive analytics solutions to businesses. Over the last decade, I have worked with clients at numerous Fortune 100 companies and start-ups alike, and architected a variety of data science solution frameworks. Ashish Kumar, the author of this book, is currently a budding data scientist at our company. He has worked on several predictive analytics engagements, and understands how businesses are using data to bring in scientific decision making to their organizations. Being a young practitioner, Ashish relates to someone who wants to learn predictive analytics from scratch. This is clearly reflected in the way he presents several concepts in the book.
Whether you are a beginner in data science looking to build a career in this area, or a weekend enthusiast curious to explore predictive analytics in a hands-on manner, you will need to start from the basics and get a good handle on the building blocks. This book helps you take the first steps in this brave new world; it teaches you how to use and implement predictive modelling algorithms using Python. The book does not assume prior knowledge in analytics or programming. It differentiates itself from other such programming cookbooks as it uses publicly available datasets that closely represent data encountered in business scenarios, and walks you through the analysis steps in a clear manner.
There are nine chapters in the book. The first few chapters focus on data exploration and cleaning. They are written with beginners to programming in mind, explaining different data structures and then going deeper into various methods of data processing and cleaning. Subsequent chapters cover popular predictive modelling algorithms such as linear regression, logistic regression, clustering, and decision trees. Each chapter broadly covers four aspects of the particular model: the math behind the model, the different types of the model, implementing the model in Python, and interpreting the results.
Statistics/math involved in the model is clearly explained. Understanding this helps one implement the model in any other programming language. The book also teaches you how to interpret the results from the predictive model and suggests different techniques to fine tune the model for better results. Wherever required, the author compares two different models and explains the benefits of each of the models. It will help a data scientist narrow down to the right algorithm that can be used to solve a specific problem. In addition, this book exposes the readers to various Python libraries and guides them with the best practices while handling different datasets in Python.
I am confident that this book will guide you to implement predictive modelling algorithms using Python and prepare you to work on challenging business problems involving data. I wish this book and its author Ashish Kumar every success.
Pradeep Gulipalli
Co-founder and Head of India Operations - Tiger Analytics
Ashish Kumar has a B. Tech from IIT Madras and is a Young India Fellow from the batch of 2012-13. He is a data science enthusiast with extensive work experience in the field. As a part of his work experience, he has worked with tools, such as Python, R, and SAS. He has also implemented predictive algorithms to glean actionable insights for clients from transport and logistics, online payment, and healthcare industries. Apart from the data sciences, he is enthused by and adept at financial modelling and operational research. He is a prolific writer and has authored several online articles and short stories apart from running his own analytics blog. He also works pro-bono for a couple of social enterprises and freelances his data science skills.
He can be contacted on LinkedIn at https://goo.gl/yqrfo4, and on Twitter at https://twitter.com/asis64.
I dedicate this book to my beloved grandfather who is the prime reason behind whatever I am today. He is my source of inspiration and he is the one I want to be like. Not a single line of this book was written without thinking about him; may you stay strong and healthy.
I want to acknowledge the support of my family, especially my parents and siblings. My conversations with them were the power source, which kept me going.
I want to acknowledge the guidance and support of my friends for insisting that I should do this when I was skeptical about taking this up. I would like to thank Ajit and Pranav for being the best friends one could ask for and always being there for me. A special mention to Vijayaraghavan for lending his garden for me to work in and relax post the long writing sessions. I would like to thank my college friends, especially my wing mates, Zenithers, who have always been pillars of support. My friends at the Young India Fellowship have made me evolve as a person and I am grateful to all of them.
I would like to extend my sincere gratitude to my faculty and well wishers at IIT Madras and the Young India Fellowship. The Tiger Analytics family, especially Pradeep, provided a conducive environment and encouraged me to take up and complete this task. I would also like to convey my sincere regards to Zeena Johar for believing in me and giving me the best learning and working opportunities, which were more than what I could have asked for in my first job.
I want to thank my editors Nikhil, Amey, Saurabh, Indrajit, and reviewer, Matt, for their wonderful comments and prompt responses. I would like to thank the entire PACKT publication team that was involved with ISBN B01782.
Matt Hollingsworth is a software engineer, data analyst, and entrepreneur. He has M.S. and B.S. degrees in Physics from the University of Tennessee. He is currently working on his MBA at Stanford, where he is putting his past experience with Big Data to use as an entrepreneur. He is passionate about technology and loves finding new ways to use it to make our lives better.
He was part of the team at CERN that first discovered the Higgs boson, and he helped develop both the physics analysis and software systems to handle the massive data set that the Large Hadron Collider (LHC) produces. Afterward, he worked with Deepfield Networks to analyze traffic patterns in network telemetry data for some of the biggest computer networks in the world. He also co-founded Global Dressage Analytics, a company that provides dressage athletes with a web-based platform to track their progress and build high-quality training regimens.
If you are reading this book, chances are that you and he have a lot to talk about! Feel free to reach out to him at http://linkedin.com/in/mhworth or [email protected].
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Social media and the Internet of Things have resulted in an avalanche of data. The data is powerful but not in its raw form; it needs to be processed and modelled and Python is one of the most robust tools we have out there to do so. It has an array of packages for predictive modelling and a suite of IDEs to choose from. Learning to predict who would win, lose, buy, lie, or die with Python is an indispensable skill set to have in this data age.
This book is your guide to getting started with Predictive Analytics using Python as the tool. You will learn how to process data and build predictive models from it. Equal weight has been given to the statistical and mathematical concepts and to implementing them in Python using libraries such as pandas, scikit-learn, and NumPy. Starting with the basics of predictive modelling, you will see how to cleanse your data of impurities and make it ready for predictive modelling. You will also learn more about the best predictive modelling algorithms, such as linear regression, decision trees, and logistic regression. Finally, you will see what the best practices in predictive modelling are, as well as the different applications of predictive modelling in the modern world.
Chapter 1, Getting Started with Predictive Modelling, talks about aspects, scope, and applications of predictive modelling. It also discusses various Python packages commonly used in data science, Python IDEs, and the methods to install these on systems.
Chapter 2, Data Cleaning, describes the process of reading a dataset, getting a bird's eye view of the dataset, handling the missing values in the dataset, and exploring the dataset with basic plotting using the pandas and matplotlib packages in Python. Data cleaning and wrangling together constitute around 80% of the modelling time.
Chapter 3, Data Wrangling, describes the methods to subset a dataset, concatenate or merge two or more datasets, group the dataset by categorical variables, split the dataset into training and testing sets, generate dummy datasets using random numbers, and create simulations using random numbers.
Chapter 4, Statistical Concepts for Predictive Modelling, explains the basic statistics needed to make sense of the model parameters resulting from the predictive models. This chapter deals with concepts like hypothesis testing, z-tests, t-tests, chi-square tests, p-values, and so on followed by a discussion on correlation.
Chapter 5, Linear Regression with Python, starts with a discussion of the mathematics behind linear regression and validates that mathematics using a simulated dataset. This is followed by a summary of the implications and interpretations of various model parameters. The chapter also describes methods to implement linear regression using the statsmodels.api and scikit-learn packages and to handle related contingencies, such as multiple regression, multi-collinearity, categorical variables, non-linear relationships between predictor and target variables, outliers, and so on.
Chapter 6, Logistic Regression with Python, explains concepts such as odds ratio, conditional probability, and contingency tables, leading ultimately to a detailed discussion of the mathematics behind the logistic regression model (using code that implements the entire model from scratch) and the various tests to check the efficiency of the model. The chapter also describes methods to implement logistic regression in Python and to draw and understand an ROC curve.
Chapter 7, Clustering with Python, discusses the concepts, such as distances, the distance matrix, and linkage methods to understand the mathematics and logic behind both hierarchical and k-means clustering. The chapter also describes the methods to implement both the types of clustering in Python and methods to fine tune the number of clusters.
Chapter 8, Trees and Random Forests with Python, starts with a discussion of topics such as entropy, information gain, and the Gini index to illustrate the mathematics behind creating a decision tree, followed by a discussion of methods to handle variations, such as a continuous numerical predictor variable and missing values. This is followed by methods to implement the decision tree in Python. The chapter also gives a glimpse into understanding and implementing regression trees and random forests.
Chapter 9, Best Practices for Predictive Modelling, entails the best practices to be followed in terms of coding, data handling, algorithms, statistics, and business context for getting good results in predictive modelling.
Appendix, A List of Links, contains a list of sources which have been directly or indirectly consulted or used in the book. It also contains the link to the folder which contains datasets used in the book.
In order to make the best use of this book, you will require the following:
If you wish to learn the implementation of predictive analytics algorithms using Python libraries, then this is the book for you. If you are familiar with coding in Python (or some other programming/statistical/scripting language) but have never used or read about predictive analytics algorithms, this book will also help you. The book will be beneficial to and can be read by any data science enthusiasts. Some familiarity with Python will be useful to get the most out of this book but it is certainly not a pre-requisite.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from: http://www.packtpub.com/sites/default/files/downloads/LearningPredictiveAnalyticswithPython_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
Predictive modelling is an art; it's a science of unearthing the story hidden in silos of data. This chapter introduces the scope and application of predictive modelling and gives a glimpse of what can be achieved with it through some real-life examples.
In this chapter, we will cover the following topics in detail:
Did you know that Facebook users around the world share 2,460,000 pieces of content every minute of the day? Did you know that 72 hours' worth of new video content is uploaded to YouTube in the same time and, brace yourself, did you know that every day around 2.5 exabytes (1 exabyte = 10^18 bytes) of data is created by us humans? To give you a perspective on how much data that is, you would need about 2.5 million 1 TB (1000 GB) hard disk drives every day to store it. Within a year, the number of drives would outgrow the US population and be north of five times the UK population, and this estimate assumes that the rate of data generation remains the same, which in all likelihood will not be the case.
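The storage figure can be sanity-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes) and a constant rate of data generation, and the variable names are illustrative.

```python
# Back-of-the-envelope check, assuming decimal (SI) units.
EXABYTE = 10**18   # bytes
TERABYTE = 10**12  # bytes

# 2.5 exabytes per day, the figure quoted in the text (kept as an integer).
daily_data_bytes = 25 * EXABYTE // 10

# Number of 1 TB drives needed to hold one day's worth of data.
drives_per_day = daily_data_bytes // TERABYTE

# Assumes the generation rate stays constant for a whole year.
drives_per_year = drives_per_day * 365

print(f"1 TB drives needed per day:  {drives_per_day:,}")
print(f"1 TB drives needed per year: {drives_per_year:,}")
```

Under these assumptions the daily count comes out in the millions of drives, and the yearly count in the hundreds of millions.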
The breakneck speed at which the social media and Internet of Things have grown is reflected in the huge silos of data humans generate. The data about where we live, where we come from, what we like, what we buy, how much money we spend, where we travel, and so on. Whenever we interact with a social media or Internet of Things website, we leave a trail, which these websites gleefully log as their data. Every time you buy a book at Amazon, receive a payment through PayPal, write a review on Yelp, post a photo on Instagram, do a check-in on Facebook, apart from making business for these websites, you are creating data for them.
Harvard Business Review (HBR) says "Data is the new oil" and that "Data Scientist is the sexiest job of the 21st century". So, why is the data so important and how can we realize the full potential of it? There are broadly two ways in which the data is used:
Let us evaluate the comparisons made with oil in detail:
A more detailed comparison of oil and data is provided in the following table:
| Data | Oil |
|---|---|
| It's a non-depleting resource and also reusable. | It's a depleting resource and non-reusable. |
| Data collection requires some infrastructure or system in place. Once the system is in place, the data generation happens seamlessly. | Drilling oil requires a lot of infrastructure. Once the infrastructure is in place, one can keep drawing the oil until the stock dries up. |
| It needs to be cleaned and modelled. | It needs to be cleaned and processed. |
| The time taken to generate data varies from fractions of a second to months and years. | It takes decades to generate. |
| The worth and marketability of different kinds of data is different. | The worth of crude oil is the same everywhere. However, the price and marketability of different end products of refinement is different. |
| The time horizon for monetization of data is smaller after getting the data. | The time horizon for monetizing oil is longer than that for data. |
Predictive modelling is an ensemble of statistical algorithms coded in a statistical tool, which, when applied to historical data, outputs a mathematical function (or equation). This function can in turn be used to predict outcomes from future inputs (on which the model operates) to drive a goal in a business context or to enable better decision making in general.
To understand what predictive modelling entails, let us focus on the phrases highlighted previously.
Statistics is important for understanding data. It tells volumes about the data. How is the data distributed? Is it centered with little variance, or does it vary widely? Are two of the variables dependent on or independent of each other? Statistics helps us answer these questions. This book expects a basic understanding of statistical terms such as mean, variance, covariance, and correlation. Advanced terms, such as hypothesis testing, chi-square tests, p-values, and so on, will be explained as and when required. Statistics is the cog in the wheel called the model.
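As a quick refresher on the basic terms mentioned above, the following sketch computes the mean, sample variance, sample covariance, and Pearson correlation by hand using only the Python standard library. The function names and the small `hours`/`score` dataset are made up for illustration.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def variance(values):
    """Sample variance (divides by n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))

# Hypothetical data: hours studied versus test score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

print("mean score:  ", mean(score))
print("variance:    ", variance(score))
print("covariance:  ", covariance(hours, score))
print("correlation: ", round(correlation(hours, score), 4))
```

A correlation close to 1 suggests the two variables move together almost linearly; a value near 0 suggests no linear relationship.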
Algorithms, on the other hand, are the blueprints of a model. They are responsible for creating mathematical equations from the historical data. They analyze the data, quantify the relationship between the variables, and convert it into a mathematical equation. There is a variety of them: Linear Regression, Logistic Regression, Clustering, Decision Trees, Time-Series Modelling, Naïve Bayes Classifiers, Natural Language Processing, and so on. These models can be classified under two classes:
The selection of a particular algorithm for a model depends majorly on the kind of data available. The focus of this book would be to explain methods of handling various kinds of data and illustrating the implementation of some of these models.
There are many statistical tools available today that come with inbuilt methods for basic statistical chores. Robust open-source tools like R and Python have become extremely popular in both industry and academia. Apart from that, Python's packages are well documented; hence, debugging is easier.
Python has a number of libraries, especially for running the statistical, cleaning, and modelling chores. It has emerged as the first among equals when it comes to choosing a tool for implementing predictive modelling. As the title suggests, Python will be the choice for this book as well.
Our machinery (model) is built and operated on this oil called data. In general, a model is built on the historical data and works on future data. Additionally, a predictive model can be used to fill missing values in historical data by interpolating the model over sparse historical data. In many cases, during modelling stages, future data is not available. Hence, it is a common practice to divide the historical data into training (to act as historical data) and testing (to act as future data) through sampling.
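The idea of dividing historical data into training and testing sets through sampling can be sketched as follows. This is a minimal standard-library version; the function name `train_test_split`, the 80/20 ratio, and the fixed seed are assumptions chosen for illustration, and libraries such as scikit-learn provide their own helper for the same task.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical "historical data": 100 record identifiers.
historical = list(range(100))
train, test = train_test_split(historical, test_fraction=0.2)

print(len(train), "training rows;", len(test), "testing rows")
```

The model would be fitted on `train` only, with `test` held back to play the role of future data.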
As discussed earlier, the data might or might not have an output variable. However, one thing that it promises to be is messy. It needs to undergo a lot of cleaning and manipulation before it can become of any use for a modelling process.
Most of the data science algorithms have underlying mathematics behind them. In many of the algorithms, such as regression, a mathematical equation (of a certain type) is assumed and the parameters of the equations are derived by fitting the data to the equation.
For example, the goal of linear regression is to fit a linear model to a dataset and find the equation parameters of the following equation:
The purpose of modelling is to find the best values for the coefficients. Once these values are known, the previous equation is good to predict the output. The equation above, which can also be thought of as a linear function of Xi's (or the input variables), is the linear regression model.
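As a minimal sketch of what finding the best values for the coefficients means, the snippet below fits a one-variable linear model of the form y = a0 + a1*x by ordinary least squares. The symbol names a0 and a1, the function name, and the data points are assumptions made for illustration; they are not taken from the book's own code.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a0, a1) minimising the squared error of y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1

# Hypothetical data lying roughly on y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

a0, a1 = fit_simple_linear_regression(xs, ys)
print(f"intercept a0 = {a0:.2f}, slope a1 = {a1:.2f}")

# Once the coefficients are known, the equation predicts new outputs.
print("prediction at x = 5:", round(a0 + a1 * 5, 2))
```

With several input variables the same idea applies, but the coefficients are usually found with matrix methods, which packages handle internally.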
Another example is of logistic regression. There also we have a mathematical equation or a function of input variables, with some differences. The defining equation for logistic regression is as follows:
Here, the goal is to estimate the values of a and b by fitting the data to this equation. Any supervised algorithm will have an equation or function similar to that of the model above. For unsupervised algorithms, an underlying mathematical function or criterion (which can be formulated as a function or equation) serves the purpose. The mathematical equation or function is the backbone of a model.
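To make the estimation of a and b concrete, here is a rough sketch that assumes the standard logistic form P(y = 1 | x) = 1 / (1 + e^-(a + b*x)) and fits it with plain gradient descent on the log-loss. The optimiser, learning rate, number of epochs, and toy data are all assumptions of this sketch; packaged implementations typically use more sophisticated solvers.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Estimate (a, b) in P(y=1|x) = sigmoid(a + b*x) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(a + b * x) - y   # gradient of log-loss w.r.t. (a + b*x)
            grad_a += error
            grad_b += error * x
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b

# Hypothetical binary outcomes that become more likely as x grows.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
print("P(y=1 | x=1.0) =", round(sigmoid(a + b * 1.0), 3))
print("P(y=1 | x=3.5) =", round(sigmoid(a + b * 3.5), 3))
```

The toy outcomes deliberately overlap (x = 2.0 is a 1 while x = 2.5 is a 0) so that the estimates stay finite.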
All the effort that goes into predictive analytics and all its worth, which accrues to data, is because it solves a business problem. A business problem can be anything and it will become more evident in the following examples:
Predictive analytics is being used in an array of industries to solve business problems. Some of these industries are as follows:
All that matters is by how much the proposed solution made life better for the business. That is the reason predictive analytics is becoming an indispensable practice for management consulting.
In short, predictive analytics sits at the sweet spot where statistics, algorithms, technology, and business sense intersect. Think of a mathematician, a programmer, and a business person rolled into one.
As discussed earlier, predictive modelling is an interdisciplinary field sitting at the interface of, and requiring knowledge of, four disciplines: Statistics, Algorithms, Tools and Techniques, and Business Sense. Each of these disciplines is equally indispensable to a successful predictive modelling task.
These four disciplines of predictive modelling carry equal weights and can be better represented as a knowledge matrix; it is a symmetric 2 x 2 matrix containing four equal-sized squares, each representing a discipline.
Fig. 1.1: Knowledge matrix: four disciplines of predictive modelling
The tasks involved in predictive modelling follow the Pareto principle. Around 80% of the effort in the modelling process goes towards data cleaning and wrangling, while only 20% of the time and effort goes into implementing the model and getting the prediction. However, the meaty part of the modelling, which yields almost 80% of the results and insights, is undoubtedly the implementation of the model. This information can be better represented as a matrix, which can be called a task matrix, that looks something like the following figure:
Fig. 1.2: Task matrix: split of time spent on data cleaning and modelling and their final contribution to the model
Many of the data cleaning and exploration chores can be automated because they are alike most of the time, irrespective of the data. The part that needs a lot of human thinking is the implementation of a model, which is what makes up the bulk of this book.
In the introductory section, data has been compared with oil. While oil has been the primary source of energy for the last couple of centuries and the legends of OPEC, Petrodollars, and Gulf Wars have set the context for the oil as a begrudged resource; the might of data needs to be demonstrated here to set the premise for the comparison. Let us glance through some examples of predictive analytics to marvel at the might of data.
If you are a frequent LinkedIn user, you might be familiar with LinkedIn's "People also viewed" feature.
Let's say you have searched for some person who works at a particular organization and LinkedIn throws up a list of search results. You click on one of them and you land up on their profile. In the middle-right section of the screen, you will find a panel titled "People Also Viewed"; it is essentially a list of people who either work at the same organization as the person whose profile you are currently viewing or the people who have the same designation and belong to same industry.
Isn't it cool? You might have searched for these people separately if not for this feature. This feature increases the efficacy of your search results and saves your time.
Are you wondering how LinkedIn does it? The rough blueprint is as follows:
If you browse the Internet, which I am sure you must be doing frequently, you must have encountered online ads, both on the websites and smartphone apps. Just like the ads in the newspaper or TV, there is a publisher and an advertiser for online ads too. The publisher in this case is the website or the app where the ad will be shown while the advertiser is the company/organization that is posting that ad.
The ultimate goal of an online ad is to be clicked on. Each instance of an ad display is called an impression. The number of clicks per impression is called Click Through Rate and is the single most important metric that the advertisers are interested in. The problem statement is to determine the list of publishers where the advertiser should publish its ads so that the Click Through Rate is the maximum.
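The metric itself is simple to compute. The sketch below calculates the Click Through Rate for a few publishers and ranks them; the publisher names and counts are invented, and this only measures past performance, whereas the predictive problem described above would estimate the rate for publishers before the ads are placed.

```python
def click_through_rate(clicks, impressions):
    """Clicks per impression; defined as 0.0 when nothing was shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions

# Hypothetical publisher statistics: name -> (impressions, clicks).
publisher_stats = {
    "news_site": (10_000, 120),
    "sports_app": (4_000, 90),
    "recipe_blog": (2_500, 20),
}

# CTR per publisher.
ctr_by_publisher = {
    name: click_through_rate(clicks, impressions)
    for name, (impressions, clicks) in publisher_stats.items()
}

# Publishers ordered from highest to lowest CTR.
ranked = sorted(ctr_by_publisher, key=ctr_by_publisher.get, reverse=True)

for name in ranked:
    print(f"{name:12s} CTR = {ctr_by_publisher[name]:.2%}")
```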
Logistic regression is one of the most standard classifiers for situations with binary outcomes. In banking, whether a person will default on a loan can be predicted using logistic regression, given the person's credit history.
Based on the historical data consisting of the area and time window of the occurrence of a crime, a model was developed to predict the place and time where the next crime might take place.
The good news is that the police are using such techniques to predict the crime scenes in advance so that they can prevent it from happening. The bad news is that certain terrorist organizations are using such techniques to target the locations that will cause the maximum damage with minimal efforts from their side. The good news again is that this strategic behavior of terrorists has been studied in detail and is being used to form counter-terrorist policies.
The accelerometer in a smartphone measures the acceleration over a period of time as the user engages in various activities. The acceleration is measured along three axes: X, Y, and Z. This acceleration data can then be used to determine whether the user is sleeping, walking, running, jogging, and so on.
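A very rough sketch of the idea: combine the three axes into a single magnitude, measure how much that magnitude fluctuates over a window of readings, and map the fluctuation to an activity. The thresholds, labels, and sample readings below are arbitrary assumptions for illustration; a real system would learn the mapping from labelled data with a classifier rather than use hand-picked cut-offs.

```python
import math
import statistics

def magnitude(sample):
    """Overall acceleration from one (x, y, z) reading."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)

def activity_feature(samples):
    """Spread (population standard deviation) of magnitude over a window."""
    return statistics.pstdev([magnitude(s) for s in samples])

def guess_activity(samples, still_threshold=0.5, walk_threshold=3.0):
    """Map the spread to a label using arbitrary, hand-picked thresholds."""
    spread = activity_feature(samples)
    if spread < still_threshold:
        return "still"
    if spread < walk_threshold:
        return "walking"
    return "running"

# Hypothetical readings in m/s^2: a phone at rest and a phone on a jogger.
resting = [(0.0, 0.0, 9.8), (0.1, 0.0, 9.8), (0.0, 0.1, 9.7), (0.0, 0.0, 9.8)]
jogging = [(3.0, 1.0, 14.0), (-2.0, 0.5, 4.0), (4.0, -1.0, 16.0), (-3.0, 0.0, 3.0)]

print("resting window ->", guess_activity(resting))
print("jogging window ->", guess_activity(jogging))
```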