Data Analytics Made Easy is an accessible beginner’s guide for anyone working with data. The book interweaves four key elements:
Data visualizations and storytelling – Tired of people not listening to you and ignoring your results? Don’t worry; chapters 7 and 8 show you how to enhance your presentations and engage with your managers and co-workers. Learn to create focused content with a well-structured story behind it to captivate your audience.
Automating your data workflows – Improve your productivity by automating your data analysis. This book introduces you to the open-source platform, KNIME Analytics Platform. You’ll see how to use this no-code and free-to-use software to create a KNIME workflow of your data processes just by clicking and dragging components.
Machine learning – Data Analytics Made Easy describes popular machine learning approaches in a simplified and visual way before implementing these machine learning models using KNIME. You’ll not only be able to understand data scientists’ machine learning models; you’ll be able to challenge them and build your own.
Creating interactive dashboards – Follow the book’s simple methodology to create professional-looking dashboards using Microsoft Power BI, giving users the capability to slice and dice data and drill down into the results.
Page count: 545
Publication year: 2021
Data Analytics Made Easy
Analyze and present data to make informed decisions without writing any code
Andrea De Mauro
BIRMINGHAM - MUMBAI
Data Analytics Made Easy
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Producer: Tushar Gupta
Acquisition Editor – Peer Reviews: Saby Dsilva
Content Development Editor: Bhavesh Amin
Technical Editor: Gaurav Gavas
Project Editor: Namrata Katare
Copy Editor: Safis Editing
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Presentation Designer: Ganesh Bhadwalkar
First published: August 2021
Production reference: 2200921
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-80107-415-5
www.packt.com
Writing a book that tries to combine theory and practice of such a vast field has been possible only thanks to the eye-opening inputs, the rigorous feedback, and the heartfelt encouragement of so many wonderful people. Towards all those colleagues and friends, I am now left with a sense of profound gratitude. Although writing the names of some of them does not do them full justice, let me thank: Dimitrios Skoufakis, Maria Navrotskaya, Francesco Pisanò, Salvatore Gatto, Leonardo and Alessandro De Mauro, Lenka Dzurendova, Michele Pacifico (how could I do without his brilliant advice?), Adam Graham, Kate Daley, Angelo Spedicati, Francesco Lefons, Marcin Czajkowski, Alessio Villardita, Antonio Faraldi, Giorgio Binenti, Giuseppe Papaianni, Jacek Ludwig Scarso, Jon Thomson, Piril Paker Yagli, Paolo Palazzo, Gilda Notaro (whom I dearly miss), Felice Di Tanno, Giorgio Demetrio, Roberto Bellotti, Marcello Lando, my dear parents Gianfranco and Maria Teresa, Dyi Huijg, Cristina Trapani-Scott and all the other hosts at the Shut Up & Write!® events that accompanied so much of my evenings and weekend writing, Katia Cocca, Davide D'Emiliano, Simona Palomba, Luisa Fabro, Antonio Gatto, Daniela Meo, Rachel Breslin, Laurent Eyers, Antonella Rossi, Kasia Bojanowska, Fabio Pistilli, Jerryn Cherian, Saurabh Dichwalkar, Michael Leonhardt, Kacper Hankiewicz, Nori Reis, Tutku Oztekin, Carolina Martinez, Miguel Estrella (the picture of me on the front cover is his, although I recognize that—given the poor subject—it doesn't give full justice to his outstanding skills as a photographer), Mario Galietti, Nicola Lopez, Antonio Fazzari, Paola Lucetti, Vinay Ahuja, Giuliana Farbo, Taide Guajardo, Luca Merlo, Paolo Grue, Francesca Sagramora, Nicolas Kerling, Guy Peri, and all the other amazing friends, colleagues, and leaders at P&G who have passionately worked alongside me to elevate the role of data analytics in the way business is done. 
It would have been impossible for me to address this subject without having the opportunity to investigate it systematically alongside my many precious partners in academic research and university teaching: Valentina Poggioni, Mohamed Almgerbi, Adham Kahlawi, Andrea Sestino, Paola Demartini, Cristiano Ciappei, Gianluca Cubadda, Luca Petruzzellis, Pasquale Del Vecchio, Giusy Secundo, Andrea Bacconi, Gaetano Cascini, Francesca Montagna, Nikos Tsourakis, Dogan Duven and the amazing staff at the IUG, Alberto Pezzi, Simone Malacaria, Marco Greco, and Michele Grimaldi. I am in debt to the authors of the two thought-provoking forewords, Andy Walter and Francesco Marzoni: the best pages of the entire book are certainly theirs.
I am also very grateful to Tushar Gupta, Ravit Jain, Namrata Katare, Bhavesh Amin, Gaurav Gavas, and the rest of the amazing team at Packt for their high-quality professional support (they made a real book out of a shaky manuscript) and the vast patience they had with me throughout the last few months, and to Scott Fincher for making the book way better thanks to his careful content review and precious feedback. A special thank you goes to my dear Sławka G. Scarso: her true writing talent has always been my primary source of inspiration. Without her patience, support, and writing coaching, I wouldn't have gone far.
If it weren't for all these people, this book wouldn't be in your hands. So, if you find it helpful, the full appreciation should solely go to them.
Andrea De Mauro
A common misconception undermines the effectiveness of most digital transformations across industries. It's about the belief that hiring a pool of data and digital experts is enough to become a data-enabled, cognitive organization.
The continuum of data, analytics, and artificial intelligence is pervasive in nature. Creating sustainable value with data requires multifunctional teams of data experts, digital experience designers, business strategists, business process experts, and many more roles to partner up and work together with a common denominator of knowledge about both data and business. It's a team sport, as Andrea will say.
Hence, at least two types of capability development efforts are needed to shape a future-ready organization. They are the ones that develop:
business literacy of the few digital experts, so that they can proactively identify data-driven business opportunities and influence, with inclusive collaboration and credibility, different functions and processes;
data literacy of the many professionals in the organization and beyond, so that they can be key actors in co-developing digital and cognitive capabilities as well as leaders of process transformation and systemic adoption of those capabilities.
Several efforts of Andrea over the past years and the pages in this book are an important contribution to the industry and an essential tool for anybody who has stakes in the development of Data Literacy for The Many in their professional (or personal!) ecosystem. In fact, it's not just about the professional arena. It's about enabling every individual to play an active role in the quest to improve the state of the world. It's about making better collective decisions for our society. All with the power of data.
By democratizing data literacy and understanding analytics, we drive positive progress on at least three levels.
The single organization. Analytics is ultimately about creating actionable knowledge from data and then combining that knowledge with newly available data to create further actionable knowledge. And so on and so forth in a systemic series of iterations. Attempting to drive analytics with the sole effort of analytics experts is equivalent to starting that loop of knowledge and data from scratch, as if an organization had no history and as if its people had no domain expertise. With data literacy democratization, multifunctional teams can build value together with a common dictionary.
An entire industry. It's linked to AI Ethics and Trustworthiness, which means a responsible use of data and algorithms, without which data is not an asset, it's a liability. Having leaders that are fluent in data analytics across the value chain of a specific industry is a pre-condition for responsible AI. Why? Because building value with AI requires at least two steps: defining an objective function (what you want to optimize, like costs, sales, customer convenience, and so on) and codifying a set of constraints (within which to find an optimal solution) into mathematical equations. These constraints are ultimately the business decisions of non-analytics practitioners. Limiting the amount of sugar in food products despite a negative impact on sales growth potential, or limiting the screen time of users of a digital platform despite a negative impact on advertising revenue, are just two examples of business decisions that leaders need to take to provide relevant inputs to a good data-enabled business strategy.
Society at large. A higher penetration of data literacy across society means a higher number of decisions and actions that individuals will make based on facts and not on opinions. As the good Hans Rosling taught us over the years, data is the best tool we have to understand the world and think with clarity about its evolution. Be it facing a pandemic, reducing our carbon footprint, redesigning a justice system, increasing access to education, or managing a healthcare system, the more we learn to do it based on data, instead of following guts and opinions, the more resilient our society becomes.
Shaping digital transformation with the many, and not only with the few, creates sustainable shared value. Join us in driving the good data revolution to build a better society. Enjoy the learning!
Francesco Marzoni
Chief Data & Analytics Officer of IKEA Retail (Ingka Group)
In January 2010, I was presented with a unique opportunity from our P&G CEO and CIO to lead the analytics transformation of the company. They believed that "analytics" was going to transform P&G, our industry, and business in general. They wanted me to lead a complete transformation of analytics, data, talent, technology, and how we approach and drive value for individual business units, functional areas, and the company overall.
Driving a holistic, business-impacting analytics program is not easy. One of the many challenges is helping the leadership across the company (executives, functional leaders, analytics practitioners, and even the "frozen middle", the individuals happy with the status quo) understand the analytics journey they are embarking on; call it the basics of analytics, or call it Data Analytics Made Easy!
Andrea does a great job of providing the analytic fundamentals that professionals of all levels need on this journey. Whether you are a junior professional breaking into data science, a business leader across marketing, supply chain, HR, etc., or the executive tasked with starting/fixing/transforming your company's analytics journey, you are starting in the right place. Dig in!
As Andrea unfolds the key steps of data analytics, keep the following in mind:
Start with the business need and strategy – Sounds simple but is done incredibly poorly by most. All analytics start with the business problem you are trying to solve! It is not about fancy technology, interesting datasets, or impassioned leaders preaching to the crowds. It is about the business need. Chapter 1 provides insights into the types of analytics and how to tie them to business needs.
Invest in talent – Special Operation Forces, Rule #1: Humans are more important than hardware. Talent is critical! But like any asset, how you leverage it makes all the difference. The best analysts have three skills: 1) Analytics expertise, 2) Deep business knowledge in the domain or business unit they are working in, and 3) Effective communications skills. Focus on developing all three aspects of great analysts. Great analytics expertise without context is useless. Great business knowledge and analytics without the ability to communicate/influence makes it slow and tedious. Great communication without substance is smoke and mirrors – you know, the PowerPoint warriors. Chapters 7 and 8 are critical to the journey.
Don't wait for the data to be perfect – In talking with numerous Fortune 500 companies, one of the insights I always share is do not wait to get the data perfect. I remember the CIO during this discussion who turned as white as a ghost! They had been spending money for two years trying to get the data right before trying to do any analytics with the business. This is a waste! First, you do not know what data will be most critical without driving true business analytics. Second, nothing gets data cleaner faster than presenting it to the senior leadership of the company! Chapter 3 jumps in here, but as stated, move fast to create value for the business.
Select tools that allow your analysts and data scientists to adapt/harmonize on the fly – It is key to select the tools that allow your analysts and business teams to adjust quickly, add new data sources, and so on. Do not create a model dependent on a central team to "code" for every new business problem or adaptation. Andrea does a nice job of laying out the tools, and Chapter 9 provides key learnings as you extend the toolbox.
Network beyond your company and industry – I immediately realized there were extremely smart people working in other companies, industry bodies, academia, and non-profit institutes that could be incredibly valuable to the journey. Seek them out and learn together with the best.
Congrats on making the personal investment with this book and on your journey. Make Data Analytics Made Easy the start of your journey, and never stop learning!
Andrew J. Walter
Board and Strategic Advisor, former SVP at Procter & Gamble
Andrea De Mauro is director of business analytics at Procter & Gamble, looking after the continuous elevation of the role of data and algorithms in the business and the development of digital fluency across the global organization. He has more than 15 years of international experience in leading data analytics initiatives across multiple business domains, including sales, marketing, finance, and product supply. He is also a professor of marketing analytics and applied machine learning at the Universities of Bari and Florence, Italy, and the International University in Geneva, Switzerland. His research investigates the essential components of big data as a phenomenon and the impact of AI and data analytics on companies and people. He is the author of popular science books on data analytics and various research papers in international journals.
Scott Fincher is a data scientist with KNIME based in Austin, TX. He routinely teaches, presents, and leads group workshops covering topics such as KNIME Analytics Platform, machine learning, and the broad data science umbrella. He enjoys assisting other data scientists with general best practices and model optimization. For Scott, this is not just an academic exercise. Prior to his work at KNIME, he worked for almost 20 years as an environmental consultant, with a focus on numerical modeling of atmospheric pollutants. Scott holds an MS in statistics and a BS in meteorology, both from Texas A&M University.
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the data files
Download the color images
Conventions used
Get in touch
Share your thoughts
What is Data Analytics?
Three types of data analytics
Descriptive analytics
Predictive analytics
Prescriptive analytics
Data analytics in action
Who is involved in data analytics?
Technology for data analytics
The data analytics toolbox
From data to business value
Summary
Getting Started with KNIME
KNIME in a nutshell
Moving around in KNIME
Nodes
Hello World in KNIME
CSV Reader
Sorter
Excel Writer
Cleaning data
Excel Reader
Duplicate Row Filter
String Manipulation
Row Filter
Missing Value
Column Filter
Column Rename
Column Resorter
CSV Writer
Summary
Transforming Data
Modeling your data
Combining tables
Joiner
Aggregating values
GroupBy
Pivoting
Tutorial: Sales report automation
Concatenate
Number To String
Math Formula
Group Loop Start
Loop End
String to Date&Time
Date&Time-based Row Filter
Table Row to Variable
Extract Date&Time Fields
Line Plot
Image Writer (Port)
Summary
What is Machine Learning?
Introducing artificial intelligence and machine learning
The machine learning way
Scenario #1: Predicting market prices
Scenario #2: Segmenting customers
Scenario #3: Finding the best ad strategy
The business value of learning machines
Three types of learning algorithms
Supervised learning
Unsupervised learning
Reinforcement learning
Selecting the right learning algorithm
Evaluating performance
Regression
Classification
Underfitting and overfitting
Validating a model
Pulling it all together
Summary
Applying Machine Learning at Work
Predicting numbers through regressions
Statistics
Partitioning
Linear regression algorithm
Linear Regression Learner
Regression Predictor
Numeric Scorer
Anticipating preferences with classification
Decision tree algorithm
Decision Tree Learner
Decision Tree Predictor
Scorer
Random forest algorithm
Random Forest Learner
Random Forest Predictor
Moving Aggregation
Line Plot (local)
Segmenting consumers with clustering
K-means algorithm
Numeric Outliers
Normalizer
k-Means
Denormalizer
Color Manager
Scatter Matrix (local)
Conditional Box Plot
Summary
Getting Started with Power BI
Power BI in a nutshell
Walking through Power BI
Loading data
Transforming data
Defining the data model
Building visuals
Tutorial: Sales Dashboard
Summary
Visualizing Data Effectively
What is data visualization?
A chart type for every message
Bar charts
Line charts
Treemaps
Scatterplots
Finalizing your visual
Summary
Telling Stories with Data
The art of persuading others
The power of telling stories
The data storytelling process
Setting objectives
Selecting scenes
Evolution
Comparison
Relationship
Breakdown
Distribution
Applying structure
Beginning
Middle
End
Polishing scenes
Focusing attention
Making scenes accessible
Finalizing your story
The data storytelling canvas
Summary
Extending Your Toolbox
Getting started with Tableau
Python for data analytics
A gentle introduction to the Python language
Integrating Python with KNIME
Automated machine learning
AutoML in action: an example with H2O.ai
Summary
And now?
Useful Resources
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Why subscribe?
Other Books You May Enjoy
Index
Before we start our journey across the vast and exciting land of data analytics, it is wise to get equipped with an up-to-date map that can show us the way. In this chapter, you will cover all those fundamental concepts you need to visualize, with clarity, the role of data analytics in companies. This will let you spot opportunities for leveraging data and decide how to distill business value out of it. You also want to feel confident about the naming conventions adopted in this domain to avoid any confusion and speak decisively with those around you. Given the hectic development of data analytics these days, it is a wise choice to build a robust foundation of the key concepts before getting our hands dirty with tables and algorithms.
Specifically, within this chapter, you will find answers to the following questions:
What types of analytics can we find in companies?
Who should be designing, maintaining, and using them?
What technology is required for data analytics to work?
What is the data analytics toolbox and what does it contain?
How can data be transformed into business value?
Although this initial part of this book is more theoretical than the rest of it, let me make a promise: all the concepts you'll encounter are there to enable a better understanding of what you'll find ahead in your journey as a data analytics practitioner. We are now fully ready to go; let's get started!
The term data analytics normally denotes those processes and techniques used to extract some sort of value from data. Sometimes, the same term indicates the actual tools used to make this transformation happen. In any case, data analytics represents how we can transform crude data into something more actionable and valuable. We can recognize three different types of data analytics, each one carrying its own set of peculiarities and possible applications: descriptive, predictive, and prescriptive analytics.
Descriptive analytics is the unmissable "bread and butter" of any analytical effort. These methodologies focus on describing past data to make it digestible and useable as required by the business need. They answer the generic question "what happened?" by leveraging summary statistics (like average, median, and variance) and simple transformations and aggregations (like indices, counts, and sums), ultimately displaying the results through tables and visuals. The iconic (and most basic) deliverable within the camp of descriptive analytics is the standalone report: this can be a file in a portable format (PDF documents and Excel worksheets are the most popular ones) that is distributed on a regular basis via email or posted in a shared repository. Most managers love reports as they can find all the Key Performance Indicators (KPIs) of interest at hand, with minimal effort from their side. In fact, they don't need to "go and look" for anything: it's data by itself coming their way, right in the format they need. A more sophisticated deliverable within descriptive analytics is the interactive dashboard: in this case, users access a web-based interface from which they are guided through their data of interest. Visuals and tables will display the most relevant aspects of the business, while filters, selectors, and buttons offer users the possibility to customize their journey through data, drilling down into the aspects they are mostly intrigued by. Sometimes, dashboards are specifically designed to please senior executives, focusing on top KPIs only: in this case, they are known by the more picturesque name of management cockpits. If the standalone report gives you a preset guided tour, interactive dashboards will let you drive yourself through your data, giving you the possibility to take unusual paths. 
Although the latter have clearly more potential in relation to their ability to unveil useful insights, some less adventurous managers will still prefer the comfort of receiving standalone reports directly in their inbox. To please both cases at once, dashboards can be set to offer subscriptions: in this case, users can sign up to receive a selection of visuals or tables from the dashboard via email regularly, or as soon as the data changes. Subscriptions are a promising feature as they avoid the duplication of efforts in updating dashboards and disseminating reports.
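To make the descriptive layer concrete, here is a minimal Python sketch of the summary statistics and aggregations mentioned above. It is not taken from the book, which relies on no-code tools; the records, the `summarize` helper, and the `report_by_group` helper are all hypothetical names and values chosen purely for illustration.

```python
from collections import defaultdict
from statistics import mean, median, pvariance

# Hypothetical sales records as (country, revenue); illustrative values only.
SALES = [
    ("Italy", 120.0),
    ("Italy", 80.0),
    ("France", 200.0),
    ("France", 100.0),
    ("France", 150.0),
]


def summarize(values):
    """Return simple summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "sum": sum(values),
        "average": mean(values),
        "median": median(values),
        "variance": pvariance(values),  # population variance
    }


def report_by_group(records):
    """Aggregate (group, value) records into one summary row per group."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {group: summarize(values) for group, values in sorted(groups.items())}


if __name__ == "__main__":
    # Print one line per country, much like a row in a standalone report.
    for country, stats in report_by_group(SALES).items():
        print(country, stats)
```

A standalone report would freeze this output into a PDF or worksheet, while a dashboard would let the user choose the grouping and filters interactively.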
Predictive analytics focuses on answering the natural follow-up questions that you have after learning what happened in the past, such as: "why did it happen?" and "what will happen now?". These methodologies leverage more sophisticated techniques, including AI, to go beyond the mere description of historical facts. By using them, we can make sense of the causal relationships that lie under our data and extrapolate them, to show what the future is most likely going to look like. The simplest examples of predictive analytics are diagnostic tools: they enrich the more traditional descriptive reports with a model-based inference of possible causes behind what we see in data. By using basic methods like correlation analysis, control charting, and tests of statistical significance, these tools can highlight interesting patterns, shedding light on the reasons why the business is going in a certain way. The next level of sophistication is brought by business alerts: in this case, diagnostic checks are carried out automatically and users get notified when some situation of business interest (like the market share for a brand going below a certain threshold) arises. Similarly, in the case of anomaly detection, algorithms continuously inspect data to find any inconsistency in patterns and flag it as such, so it can be managed accordingly: for instance, the data generated by sensors in a production line can be used to spot malfunctions early on and trigger the required maintenance. Predictive analytics also includes methods that anticipate the future by generating forecasts of measures, such as sales, price, market size, and level of risk, all of which can certainly help managers make better decisions and prepare for what's about to come. Also, the behavior of individual entities like consumers and competitors can be forecasted, producing a competitive advantage and an improved Return on Investment (ROI) of forthcoming activities. 
In the case of propensity models, AI is leveraged to predict how much a customer is going to like a commercial offer, or how likely they are to leave our store or service (churn), enabling us to fine-tune our retention activities. One last example of predictive analytics is when we use data and algorithms to create smart segmentations of our business, like when we're grouping together similar customers, stores, or products. By tailoring the way we manage them, we can make our operations more efficient and the experience of our customers more personalized and engaging.
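The business alert described above (a notification when a brand's market share goes below a certain threshold) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the brand names, share values, threshold, and the `find_alerts` helper are hypothetical, and a real alerting tool would send emails rather than return strings.

```python
# Hypothetical market shares (in %) per brand; illustrative values only.
MARKET_SHARE = {"BrandA": 12.4, "BrandB": 7.9, "BrandC": 9.6}


def find_alerts(shares, threshold):
    """Return one message for every brand whose share is below the threshold."""
    return [
        f"ALERT: {brand} market share {share:.1f}% is below {threshold:.1f}%"
        for brand, share in sorted(shares.items())
        if share < threshold
    ]


if __name__ == "__main__":
    # The diagnostic check runs automatically; users only see the exceptions.
    for message in find_alerts(MARKET_SHARE, threshold=10.0):
        print(message)
```

The same pattern extends to forecasts: replacing the current share with a predicted value turns the check into an early warning.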
Prescriptive analytics transforms data into a recommended course of action, by answering the ultimate question every business manager has: "what should be done?". If descriptive and predictive analytics produce insights and inform us about our business, prescriptive analytics is certainly more assertive and direct: it tells us what to do. For example, it can simulate a large set of alternative scenarios and implement a systematic optimization across them: the output would be the "best recipe" to follow to maximize profit or minimize cost, given the current conditions. Other examples of prescriptive analytics are the so-called recommendation systems: essentially, they provide users with recommendations on products. These algorithms are virtually omnipresent in our everyday digital experience. When we shop around on Amazon or struggle to pick which TV series to binge watch next on Netflix, we are presented with the output of some recommendation system, offering us a limited number of options we are likely to enjoy. The ultimate level of sophistication comes when prescriptive analytics is not only recommending what to do but is in charge of doing it! In fact, when these systems run in real time, no human being might be able to fully control and explicitly approve every decision or recommendation that the machine has come up with. In some cases, algorithms are designed to behave like autonomous agents: these are in charge of learning through an iterative trial and error process. They will continuously test their strategy in the real world and correct it as necessary, with the ultimate objective of maximizing returns in the long term. This is what happens, for instance, with automated trading (a rising trend in FinTech) and programmatic advertising (which is the real-time buying of digital media through automatic bids).
You will find in your readings that, as a naming convention, the descriptive layer is mostly associated with the term Business Intelligence, while predictive and prescriptive analytics are known as Advanced Analytics. Since there are no clear-cut boundaries across these domains, it can sometimes get confusing – don't worry, it's normal.
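The idea of simulating alternative scenarios and optimizing across them can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the linear demand curve, its parameters, the unit cost, and the function names are hypothetical and do not come from the book.

```python
def expected_demand(price, base_demand=200.0, sensitivity=1.0):
    """Hypothetical linear demand curve: fewer units are sold at higher prices."""
    return max(0.0, base_demand - sensitivity * price)


def profit(price, unit_cost=40.0):
    """Profit of one scenario: margin per unit times the expected demand."""
    return (price - unit_cost) * expected_demand(price)


def best_price(candidates):
    """Evaluate every candidate scenario and return the most profitable price."""
    return max(candidates, key=profit)


if __name__ == "__main__":
    # Simulate prices from 50 to 200 in steps of 10 and pick the best one.
    scenarios = range(50, 201, 10)
    recommended = best_price(scenarios)
    print(f"Recommended price: {recommended}, expected profit: {profit(recommended)}")
```

Real systems replace the exhaustive loop with proper optimization methods and the toy demand curve with a predictive model, but the structure (simulate, score, recommend) stays the same.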
Over the course of this book, you will encounter all types of data analytics, and you will learn how to make them come true in your business. Figure 1.1 shows the trade-off between the potential value and complexity of these different types of analytics:
Figure 1.1: Types of data analytics – how much value would you like to unlock?
Now that we have grasped the fundamentals behind descriptive, predictive, and prescriptive analytics, it's time to see them in action. Katia is the proud Chief Data Officer (CDO) of a multinational hotel chain. Let's see how she summarizes a list of data analytics capabilities they have put together in the last few years:
Board members receive a 5-page report every month via email. It includes a short executive summary prepared by analysts with key highlights and a set of standard tables and charts with KPI trends by country, banner, and hotel type.
All managers have access to an online dashboard that's refreshed every week with top measures of interest and the possibility to drill down to very granular views, such as the latest occupancy levels of individual hotels or current room rates.
Area managers are subscribed to an alerting tool that notifies them when some facilities are off course to meeting their monthly goals. An automated email is sent when the forecast for the month is below a certain threshold versus targets.
Room rates are dynamically managed by a central system, which they've called AutoPrice. The system simulates different occupancy levels by facility, taking into account seasonality, trends, and interest displayed by web users. AutoPrice can adjust room rates daily so as to maximize profit, given the expected sales and costs.
Customers who are part of the reward program occasionally receive special offers for room upgrades or weekend escapes in luxury hotels. A propensity model tailors the offer (controlling the level of discount) to maximize redemption and expected profit.
Newsletter subscribers are grouped into four homogeneous segments (Business, Families, Deal-hunters, and Premium) according to their sociodemographic traits and interests. The content of each monthly newsletter is diversified by segment. Every new subscriber is automatically associated with a segment when they sign up.
When the COVID-19 pandemic broke out, analysts built a statistical model to anticipate the upcoming impact on sales by country and built alternative scenarios of evolution. Based on this work, the company put together a successful response plan, which included temporary closures and down-staffing, conversions of restaurants into takeaway services, and conversions of rooms for low-care infected patients in partnership with local authorities.
Following the launch of a competitive chain in France, the digital marketing team partnered with data scientists to build a data-enabled reaction plan to boost communication activities in the areas with the biggest risk of losing customers. Additionally, they built a churn model to send individual retention offers to those members who were most inclined to move to a new competitor.
We have to stop Katia right now; otherwise, she would go on and on for a few more pages! All these examples show how pervasive and versatile data analytics can be for a business. They also give us the opportunity to notice some general patterns whose value goes beyond the hotel chain's case:
The three types of data analytics are not necessarily alternatives to each other. In truth, they tend to co-exist in companies – you don't have to "pick" one. For example, you will always have a need for ongoing descriptive analytics, such as reports for management or dashboards, even if you have some forward-looking, advanced analytics tricks up your sleeve. You can't just "ignore" the basics; otherwise, you'll receive less traction for everything else.
The same aspect of a business might get value from each of the three types of analytics. Take pricing in the example of the hotel: you can report average room rates (descriptive), you can forecast scenarios based on different prices (predictive), or you can automatically set the best room rates (prescriptive). For each aspect of the business, there is an opportunity for each type of analytics.
The different types of analytics can partially overlap and cross-enrich. For example, in the management dashboard (descriptive), we might add some different color coding, depending on the likelihood that a hotel is going to meet its target for the year (predictive).
Some opportunities to create value through data analytics are ad hoc, contingent on specific one-off circumstances: this was the case for the COVID-19 model and the response to a competitive move. Others are ongoing and systematic, like the regular reports and updates to the dashboard or real-time price optimization. Data analytics can bring value to both ad hoc and ongoing cases.
Thanks to this example, we now have a clearer picture of what the three different levels of analytics look like. Before moving on to the next topic, it's worth thinking about what differentiates business intelligence (descriptive) from advanced analytics (predictive and prescriptive) in terms of business impact, as Table 1.1 summarizes:
Aspect | Descriptive (Business Intelligence) | Predictive/Prescriptive (Advanced Analytics)
User base | Broad | Limited
Implementation complexity | Low | High
Trust required by management | Low | High
Potential value | Low | High
Table 1.1: Business impact of different analytics
Descriptive analytics tends to have a broad set of potential users in the company. The same dashboards and reports can be of use to many colleagues across different levels of seniority, business units, and functions. Indeed, one of the widespread feelings in companies is that "the data is there, but we're not sure where": the more data can be democratized with a solid business intelligence offering, the more its value is unlocked. Considering the potential breadth of the user base these capabilities have, it's worth planning well for "mass" deployment activities so that everyone has the opportunity to become a user and the business impact of the capability is maximized. Often, dashboards are underutilized in the long run – or simply forgotten – because people don't know how to use them: this is all the more reason to plan for regular training sessions that keep them accessible to everyone, including newcomers.
Needless to say, the complexity of designing and implementing predictive and prescriptive analytics is higher than for descriptive analytics. You will require skilled business analysts and data scientists (we'll talk more about roles in the next few pages), and the time needed for prototyping, and then deploying and scaling, is normally longer. Hence, it's worth proceeding in an agile and iterative mode, so as to unlock incremental value progressively and avoid losing momentum and the enthusiasm of stakeholders.
Advanced analytics has a tougher job getting accepted within a company and requires more decisive sponsorship from senior managers to go through. Think about it: descriptive analytics enables better decisions by informing people about what's going on. On the other hand, prescriptive analytics will bluntly tell you what the best decision you can make is, potentially restricting the ability of managers' gut feelings to guide the business. Algorithms that are "in charge" of decisions require higher trust to be accepted than just plain reports. The more you progress to advanced analytics, the more you need to involve top management and have them sponsor the transformation. This will counterbalance the natural tendency of people to "protect" their role and power against all threats, including those algorithms that assertively prescribe decisions.
Like most things in life, you get what you pay for. Advanced analytics is more complex to build and requires more management attention, but the potential value it can unlock is higher than what descriptive analytics can do. My advice is to look for opportunities to progressively elevate the role of analytics, moving the footprint of your capabilities toward the advanced end of the ladder. At the same time, you don't want to "forget" about the power of enabling a broad set of colleagues to manage their business in a smarter way, through democratized access to data via descriptive analytics.
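To make the pricing example from earlier more tangible, here is a minimal Python sketch of how the same question can be approached with each of the three types of analytics. It is only an illustration under stated assumptions: the nightly figures, the variable cost, the straight-line demand relationship, and all function names are hypothetical and are not taken from the hotel chain's AutoPrice system. A real capability would also account for seasonality, trends, and web interest.

```python
from statistics import mean

# Hypothetical nightly records for one hotel: (room rate in EUR, rooms sold).
history = [(80, 92), (90, 85), (100, 78), (110, 70), (120, 61)]
ROOMS_AVAILABLE = 100        # assumed capacity of the hotel
COST_PER_OCCUPIED_ROOM = 30  # assumed variable cost in EUR


def average_rate(records):
    """Descriptive: report what happened (the average room rate)."""
    return mean(rate for rate, _ in records)


def fit_demand(records):
    """Predictive: fit rooms_sold = intercept + slope * rate by least squares."""
    rates = [rate for rate, _ in records]
    sold = [rooms for _, rooms in records]
    rate_mean, sold_mean = mean(rates), mean(sold)
    covariance = sum((r - rate_mean) * (s - sold_mean) for r, s in records)
    variance = sum((r - rate_mean) ** 2 for r in rates)
    slope = covariance / variance
    intercept = sold_mean - slope * rate_mean
    return intercept, slope


def forecast_rooms(rate, intercept, slope):
    """Predictive: expected rooms sold at a given rate, capped by capacity."""
    rooms = intercept + slope * rate
    return max(0.0, min(ROOMS_AVAILABLE, rooms))


def best_rate(candidates, intercept, slope):
    """Prescriptive: choose the candidate rate with the highest expected profit."""
    def expected_profit(rate):
        margin = rate - COST_PER_OCCUPIED_ROOM
        return margin * forecast_rooms(rate, intercept, slope)
    return max(candidates, key=expected_profit)


intercept, slope = fit_demand(history)
candidate_rates = range(80, 151, 5)

print("Average rate so far:", average_rate(history))
print("Forecast rooms sold at 130 EUR:", forecast_rooms(130, intercept, slope))
print("Recommended rate:", best_rate(candidate_rates, intercept, slope))
```

Notice how each step builds on the previous one: the report only summarizes past data, the forecast needs a model fitted on that data, and the recommendation needs both the forecast and an explicit business objective (here, expected profit).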
The short answer to the heading of this section is also the most obvious: everyone has a role to play in data analytics – nobody is excluded! In fact, all knowledge workers will undoubtedly have to deal with data as part of their job: they will interact with analytics in one way or another, from being solely a passive user all the way to the other end of engagement, as the main creator and owner of data capabilities. We can recognize four families of roles with regard to data analytics in companies: business users, business analysts, data scientists, and data engineers. Let's take a deep dive to understand what each role means and what competencies it requires:
Role
Vertical Business Knowledge
Data Analysis and Storytelling
Machine Learning Algorithms
Coding
Data Architecture
Business User
★★★
★
★
Business Analyst
★★
★★★
★★
★
Data Scientist
★
★★★
★★★
★
Data Engineer
★★
★★
★★★
Table 1.2: Competencies of different users
Business users of any function and level, including senior managers, surely interact with data analytics to some extent. Although their main role is being a user, they will benefit greatly from having a basic understanding of data analysis and storytelling techniques. These will enable business users to make the most out of the data as they integrate it with their business knowledge, interpret it properly, and communicate insights through effective visualizations and stories. Also, their personal productivity will be positively affected by knowing how to automate their routine "data crunching" work, using macros in Excel or, as we will learn later, workflows in KNIME. Lastly, they should have a basic understanding of what advanced analytics could do for them and acquire the fundamental concepts behind machine learning and its algorithms. They clearly don't need to become experts in this. However, until they see "what's possible," they will fail to anticipate opportunities to impact their business with data.
Business analysts (or data analysts) play the fundamental role of uniting the two – apparently detached – worlds of business and data. They have a very solid understanding of the business dynamics (market, customer, and competitor landscape) as they are constantly in touch with partners from all functions (sales, marketing, finance, and so on). Thanks to their strong business background, they can proactively intercept opportunities for analytics to make the difference and "translate" business needs into technical requirements for the next data capability to meet. Business analysts are proficient with data analysis (as they need to constantly "peel the onion" and extract business-relevant insights from large quantities of data) and storytelling (since they want their data findings to make a strong impact and drive action). At the same time, they are familiar with machine learning concepts: they will use them directly to solve their business needs, but also to prototype advanced analytics capabilities before leaving them in the capable hands of data scientists for scaling. Although not strictly required and not a focus of their job, they can benefit from having basic coding abilities to build queries for data extraction and untangle the more tedious data transformation steps.
Data scientists focus on designing and scaling advanced analytics capabilities. They are the recognized experts in machine learning algorithms and can implement predictive and prescriptive analytics from scratch or build upon existing prototypes. They collaborate closely with business analysts, through whom they stay in touch with the "latest" business necessities, and data engineers, their primary partners for ensuring the sustainability and scale of data capabilities. Data scientists are proficient at coding, especially when it comes to applying advanced transformations to data and leveraging state-of-the-art machine learning libraries.
Data science is the multidisciplinary field at the intersection of maths, statistics, and computer science that studies the systematic extraction of value from data. Everyone – not only data scientists – can benefit from using some aspects of data science.
These four actors, each with a specific part to perform, jointly cover the vast majority of interactions with data analytics in a company. Whichever character you feel closest to, you certainly have a role to play in extracting value from data. In the next section, you'll meet the tools that will let you perform at your best.
The technology that empowers data analytics in a company does not look like a monolithic body. In fact, there are several hardware and software systems involved. For simplicity, you can think of them as being organized into three layers: these are piled upon each other and form the so-called Technology Stack. Every layer relies on the one below to function properly. Let's take a bottom-up "helicopter" view of the fundamental features that you need to know about for each layer of the stack:
The underlying layer is the Physical Infrastructure. This is stuff you can touch. It is made up of servers or mainframe computers that store and process data. Companies can decide to either build and maintain a physical infrastructure of their own (normally kept in corporate data centers) or rely on cloud providers from whom they rent only the required resources.
The middle layer is the Data Platform. The technology at this level implements a logical organization of the data stored in the infrastructure (data architecture) and the available computing power. Even if data resides in different databases, at the platform level, it gets virtually unified into a simpler, more harmonious view.
The top layer is made of Applications. Here is where data analytics methods get implemented into user-facing apps. Applications leverage both the organized data and the horsepower provided by the underpinning platform to serve users in different ways. Some applications will provide interfaces for users to explore data, to make sense of it and to identify insights (Business Intelligence); others will enable expert users to take it to the next level and to build predictions or prescriptions (Advanced Analytics).
This three-layer stack model is certainly a simplification of the multifaceted reality underneath real-world data infrastructures. However, it gives us the benefit of envisioning, at once, the different levels of abstraction that data can have and introduces several challenges that come with it:
Figure 1.2: Technology Stack supporting data analytics. The arrows clarify which roles interact with which layers of the stack
The four roles we have seen earlier will "operate" at different levels of the technology stack. Data engineers will deal with the complexity of the data infrastructure and its organization in a platform. Data scientists will normally leverage applications of advanced analytics to build models and will sometimes access data directly at the platform level to enjoy maximum versatility. Business analysts will feel at ease with both advanced analytics tools and business intelligence apps, which they can also use to design actionable data exploration routes for others. Business users will solely interact with easy-to-use business intelligence interfaces: all the complexities related to storing data and its organization in the platform will be conveniently far from their sight.
Out of all the technologies related to data analytics, this book is going to focus on the application layer. This is where the "magic" happens: analytics applications can transform data into actual business value and in the next chapters, you will learn how to do this.
There are many data analytics applications out there available for use. Each of them has its strengths and peculiarities. Although some can be very versatile, no single application will satisfy the full range of analytical needs we could encounter on our way. Hence, we should pick a selection of tools that will jointly cover an acceptable range of needs: they form our data analytics toolbox. By learning how to use and how to effectively combine the few tools we have put in the toolbox, we can become autonomous data analytics practitioners. Like a plumber would have his or her preferences on the instruments to use, you will also have your own predilections and can customize your toolbox to your personal tastes. You just want to ensure you pick the right mix of tool types, so that you have a broad range of functionalities readily available to you.
Let's go through the different types of tools that qualify for being added to our toolbox:
Spreadsheets: Although their analytics ability is quite limited, spreadsheet applications are virtually omnipresent because of their ease of use and extended portability that facilitates sharing data with colleagues. Nearly everyone is able to open a Microsoft Excel file (or its open source alternative, OpenOffice Calc, or a cloud-based service such as Google Sheets) and add simple formula calculations to it. They can also be very helpful when creating simple, one-off data visualizations: their level of graphic customization is good enough for many day-to-day data presentation needs. On the other hand, spreadsheet software is inadequate for creating robust and automated data workflows: refreshing even a simple report created in Excel requires manual steps and is prone to human error.
Business Intelligence: These are the most-suited tools for creating advanced data visualizations and interactive dashboards. Tools like Microsoft Power BI, QlikView/Qlik Sense, Tableau, and TIBCO Spotfire