
Amazon Redshift Cookbook

Recipes for building modern data warehousing solutions

Shruti Worlikar

Thiyagarajan Arumugam

Harshida Patel

BIRMINGHAM—MUMBAI

Amazon Redshift Cookbook

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Parikh

Publishing Product Manager: Ali Abidi

Senior Editor: Mohammed Yusuf Imaratwale

Content Development Editor: Nazia Shaikh

Technical Editor: Arjun Varma

Copy Editor: Safis Editing

Project Coordinator: Aparna Ravikumar Nair

Proofreader: Safis Editing

Indexer: Vinayak Purushotham

Production Designer: Vijay Kamble

First published: July 2021

Production reference: 2270721

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80056-968-3

www.packt.com

Foreword

Amazon Redshift is a fully managed cloud data warehouse service that enables you to analyze all your data. Tens of thousands of customers use Amazon Redshift today to analyze exabytes of structured and semi-structured data across their data warehouse, operational databases, and data lake using standard SQL.

Our Analytics Specialist Solutions Architecture team at AWS works closely with customers to help them use Amazon Redshift to meet their unique analytics needs. In particular, the authors of this book, Shruti, Thiyagu, and Harshida, have worked hands-on with hundreds of customers of all types, from startups to multinational enterprises. They've helped with projects ranging from migrations from other data warehouses to Amazon Redshift, to delivering new analytics use cases such as building a predictive analytics solution using Redshift ML. They've also helped our Amazon Redshift service team to better understand customer needs and prioritize new feature development.

I am super excited that Shruti, Thiyagu, and Harshida have authored this book, based on their deep expertise and knowledge of Amazon Redshift, to help customers quickly perform the most common tasks. This book is designed as a cookbook to provide step-by-step instructions across these different tasks. It has clear instructions on prerequisites and steps required to meet different objectives such as creating an Amazon Redshift cluster, loading data in Amazon Redshift from Amazon S3, or querying data across OLTP sources like Amazon Aurora directly from Amazon Redshift.

I recommend this book to any new or existing Amazon Redshift customer who wants to learn not only what features Amazon Redshift provides, but also how to quickly take advantage of them.

Eugene Kawamoto

Director, Product Management

Amazon Redshift, AWS

Contributors

About the authors

Shruti Worlikar is a cloud professional with technical expertise in data lakes and analytics across cloud platforms. Her background has led her to become an expert in on-premises-to-cloud migrations and building cloud-based scalable analytics applications. Shruti earned her bachelor's degree in electronics and telecommunications from Mumbai University in 2009 and later earned her master's degree in telecommunications and network management from Syracuse University in 2011. Her work history includes roles at J.P. Morgan Chase, MicroStrategy, and Amazon Web Services (AWS). She is currently working as Manager, Analytics Specialist SA at AWS, helping customers solve real-world analytics business challenges with cloud solutions and working with service teams to deliver real value. Shruti is the DC Chapter Director for the non-profit Women in Big Data (WiBD) and engages with chapter members to build the technical and business skills that support their career advancement. Originally from Mumbai, India, Shruti currently resides in Aldie, VA, with her husband and two kids.

Thiyagarajan Arumugam (Thiyagu) is a principal big data solutions architect at AWS, architecting and building solutions at scale using big data to enable data-driven decisions. Prior to AWS, Thiyagu built big data solutions as a data engineer at Amazon, operating, migrating, and managing some of its largest data warehouses. He has worked on automated data pipelines and built data lake-based platforms to manage data at scale for data science and business analyst teams. Thiyagu is a certified AWS Solutions Architect (Professional), earned his master's degree in mechanical engineering at the Indian Institute of Technology, Delhi, and is the author of several AWS blog posts on big data. Thiyagu enjoys everything outdoors – running, cycling, ultimate frisbee – and is currently learning to play the Indian classical drum, the mrudangam. Thiyagu currently resides in Austin, TX, with his wife and two kids.

Harshida Patel is a senior analytics specialist solution architect at AWS, enabling customers to build scalable data lake and data warehousing applications using AWS analytical services. She has presented Amazon Redshift deep-dive sessions at re:Invent. Harshida has a bachelor's degree in electronics engineering and a master's in electrical and telecommunication engineering. She has over 15 years of experience architecting and building end-to-end data pipelines in the data management space. In the past, Harshida has worked in the insurance and telecommunication industries. She enjoys traveling and spending quality time with friends and family, and she lives in Virginia with her husband and son.

About the reviewers

Anusha Challa is a senior analytics specialist solution architect at AWS with over 10 years of experience in data warehousing both on-premises and in the cloud. She has worked on multiple large-scale data projects throughout her career at Tata Consultancy Services (TCS), EY, and AWS. She has worked with hundreds of Amazon Redshift customers and has built end-to-end scalable, reliable, and robust data pipelines.

Vaidy Krishnan leads business development for AWS, helping customers adopt and be successful with AWS analytics services. Prior to AWS, Vaidy spent close to 15 years building, marketing, and launching analytics products for customers at market-leading companies such as Tableau and GE, across industries ranging from healthcare to manufacturing. When not at work, Vaidy likes to travel and golf.

Table of Contents

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Chapter 1: Getting Started with Amazon Redshift

Technical requirements

Creating an Amazon Redshift cluster using the AWS Console

Getting ready

How to do it…

Creating an Amazon Redshift cluster using the AWS CLI

Getting ready

How to do it…

How it works…

Creating an Amazon Redshift cluster using an AWS CloudFormation template

Getting ready

How to do it…

How it works…

Connecting to an Amazon Redshift cluster using the Query Editor

Getting ready

How to do it…

Connecting to an Amazon Redshift cluster using the SQL Workbench/J client

Getting ready

How to do it…

Connecting to an Amazon Redshift Cluster using a Jupyter Notebook

Getting ready

How to do it…

Connecting to an Amazon Redshift cluster using Python

Getting ready

How to do it…

Connecting to an Amazon Redshift cluster programmatically using Java

Getting ready

How to do it…

Connecting to an Amazon Redshift cluster programmatically using .NET

Getting ready

How to do it…

Connecting to an Amazon Redshift cluster using the command line

Getting ready

How to do it…

Chapter 2: Data Management

Technical requirements

Managing a database in an Amazon Redshift cluster

Getting ready

How to do it…

Managing a schema in a database

Getting ready

How to do it…

Managing tables

Getting ready

How to do it…

How it works…

Managing views

Getting ready

How to do it…

Managing materialized views

Getting ready

How to do it…

How it works…

Managing stored procedures

Getting ready

How to do it…

How it works…

Managing UDFs

Getting ready

How to do it…

How it works…

Chapter 3: Loading and Unloading Data

Technical requirements

Loading data from Amazon S3 using COPY

Getting ready

How to do it…

How it works…

Loading data from Amazon EMR

Getting ready

How to do it…

Loading data from Amazon DynamoDB

Getting ready

How to do it…

How it works…

Loading data from remote hosts

Getting ready

How to do it…

Updating and inserting data

Getting ready

How to do it…

Unloading data to Amazon S3

Getting ready

How to do it…

Chapter 4: Data Pipelines

Technical requirements

Ingesting data from transactional sources using AWS DMS

Getting ready

How to do it…

How it works…

Streaming data to Amazon Redshift via Amazon Kinesis Firehose

Getting ready

How to do it…

How it works…

Cataloging and ingesting data using AWS Glue

How to do it…

How it works…

Chapter 5: Scalable Data Orchestration for Automation

Technical requirements

Scheduling queries using the Amazon Redshift query editor

Getting ready

How to do it…

How it works…

Event-driven applications using Amazon EventBridge and the Amazon Redshift Data API

Getting ready

How to do it…

How it works…

Event-driven applications using AWS Lambda

Getting ready

How to do it…

How it works…

Orchestrating using AWS Step Functions

Getting ready

How to do it…

How it works…

Orchestrating using Amazon MWAA

Getting ready

How to do it…

How it works…

Chapter 6: Data Authorization and Security

Technical requirements

Managing infrastructure security

Getting ready

How to do it

Data encryption at rest

Getting ready

How to do it

Data encryption in transit

Getting ready

How to do it

Column-level security

Getting ready

How to do it

How it works

Loading and unloading encrypted data

Getting ready

How to do it

Managing superusers

Getting ready

How to do it

Managing users and groups

Getting ready

How to do it

Managing federated authentication

Getting ready

How to do it

How it works

Using IAM authentication to generate database user credentials

Getting ready

How to do it

Managing audit logs

Getting ready

How to do it

How it works

Monitoring Amazon Redshift

Getting ready

How to do it

How it works

Chapter 7: Performance Optimization

Technical requirements

Amazon Redshift Advisor

Getting ready

How to do it…

How it works…

Managing column compression

Getting ready

How to do it…

How it works…

Managing data distribution

Getting ready

How to do it…

How it works…

Managing sort keys

Getting ready

How to do it…

How it works…

Analyzing and improving queries

Getting ready

How to do it…

How it works…

Configuring workload management (WLM)

Getting ready

How to do it…

How it works…

Utilizing Concurrency Scaling

Getting ready

How to do it…

How it works…

Optimizing Spectrum queries

Getting ready

How to do it…

How it works…

Chapter 8: Cost Optimization

Technical requirements

AWS Trusted Advisor

Getting ready

How to do it…

How it works…

Amazon Redshift Reserved Instance pricing

Getting ready

How to do it…

Configuring pause and resume for an Amazon Redshift cluster

Getting ready

How to do it…

Scheduling pause and resume

Getting ready

How to do it…

How it works…

Configuring Elastic Resize for an Amazon Redshift cluster

Getting ready

How to do it…

Scheduling Elastic Resizing

Getting ready

How to do it…

How it works…

Using cost controls to set actions for Redshift Spectrum

Getting ready

How to do it…

Using cost controls to set actions for Concurrency Scaling

Getting ready

How to do it…

Chapter 9: Lake House Architecture

Technical requirements

Building a data lake catalog using AWS Lake Formation

Getting ready

How to do it…

How it works…

Exporting a data lake from Amazon Redshift

Getting ready

How to do it…

Extending a data warehouse using Amazon Redshift Spectrum

Getting ready

How to do it…

Data sharing across multiple Amazon Redshift clusters

Getting ready

How to do it…

How it works…

Querying operational sources using Federated Query

Getting ready

How to do it…

Chapter 10: Extending Redshift's Capabilities

Technical requirements

Managing Amazon Redshift ML

Getting ready

How to do it…

How it works…

Visualizing data using Amazon QuickSight

Getting ready

How to do it…

How it works…

AppFlow for ingesting SaaS data in Redshift

Getting ready

How to do it…

How it works…

Data wrangling using DataBrew

Getting ready

How to do it…

How it works…

Utilizing ElastiCache for sub-second latency

Getting ready

How to do it…

How it works…

Subscribing to third-party data using AWS Data Exchange

Getting ready

How to do it…

How it works…

Appendix

Recipe 1 – Creating an IAM user

Recipe 2 – Storing database credentials using Amazon Secrets Manager

Recipe 3 – Creating an IAM role for an AWS service

Recipe 4 – Attaching an IAM role to the Amazon Redshift cluster

Why subscribe?

Other Books You May Enjoy

Preface

Amazon Redshift is a fully managed, petabyte-scale AWS cloud data warehousing service. It enables you to build new data warehouse workloads on AWS and migrate on-premises traditional data warehousing platforms to Redshift.

This book on Amazon Redshift starts by focusing on the Redshift architecture, showing you how to perform database administration tasks on Redshift. You'll then learn how to optimize your data warehouse to quickly execute complex analytic queries against very large datasets. Because of the massive amount of data involved in data warehousing, designing your database for analytical processing lets you take full advantage of Redshift's columnar architecture and managed services. As you advance, you'll discover how to deploy fully automated and highly scalable extract, transform, and load (ETL) processes, which help minimize the operational efforts that you have to invest in managing regular ETL pipelines and ensure the timely and accurate refreshing of your data warehouse. Finally, you'll gain a clear understanding of Redshift use cases, data ingestion, data management, security, and scaling so that you can build a scalable data warehouse platform.

By the end of this Redshift book, you'll be able to implement a Redshift-based data analytics solution and will have understood the best practice solutions to commonly faced problems.

Who this book is for

This book is for anyone involved in architecting, implementing, and optimizing an Amazon Redshift data warehouse, such as data warehouse developers, data analysts, database administrators, data engineers, and data scientists. Basic knowledge of data warehousing, database systems, and cloud concepts and familiarity with Redshift would be beneficial.

What this book covers

Chapter 1, Getting Started with Amazon Redshift, discusses how Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases. This chapter walks you through the process of creating a sample Amazon Redshift cluster to set up the necessary access and security controls to easily get started with a data warehouse on AWS. Most operations are click-of-a-button operations; you should be able to launch a cluster in under 15 minutes.

Chapter 2, Data Management, discusses how a data warehouse system has very different design goals compared to a typical transaction-oriented relational database system for online transaction processing (OLTP). Amazon Redshift is optimized for the very fast execution of complex analytic queries against very large datasets. Because of the massive amounts of data involved in data warehousing, designing your database for analytical processing lets you take full advantage of the columnar architecture and managed service. This chapter delves into the different data structure options to set up an analytical schema for the easy querying of your end users.  

Chapter 3, Loading and Unloading Data, looks at how Amazon Redshift has in-built integrations with data lakes and other analytical services and how it is easy to move and analyze data across different services. This chapter discusses scalable options to move large datasets from a data lake based out of Amazon S3 storage as well as AWS analytical services such as Amazon EMR and Amazon DynamoDB.

Chapter 4, Data Pipelines, discusses how modern data warehouses depend on ETL operations to convert bulk information into usable data. An ETL process refreshes your data warehouse from source systems, organizing the raw data into a format you can more readily use. Most organizations run ETL as a batch or as part of a real-time ingest process to keep the data warehouse current and provide timely analytics. A fully automated and highly scalable ETL process helps minimize the operational effort that you must invest in managing regular ETL pipelines. It also ensures the timely and accurate refresh of your data warehouse. Here we will discuss recipes to implement real-time and batch-based AWS native options to implement data pipelines for orchestrating data workflows.

Chapter 5, Scalable Data Orchestration for Automation, looks at how for large-scale production pipelines, a common use case is to read complex data originating from a variety of sources. This data must be transformed to make it useful to downstream applications such as machine learning pipelines, analytics dashboards, and business reports. This chapter discusses building scalable data orchestration for automation using native AWS services.

Chapter 6, Data Authorization and Security, discusses how Amazon Redshift security is one of the key pillars of a modern data warehouse for data at rest as well as in transit. In this chapter, we will discuss the industry-leading security controls provided in the form of built-in AWS IAM integration, identity federation for single sign-on (SSO), multi-factor authentication, column-level access control, Amazon Virtual Private Cloud (VPC), and AWS KMS integration to protect your data. Amazon Redshift encrypts and keeps your data secure in transit and at rest using industry-standard encryption techniques. We will also elaborate on how you can authorize data access through fine-grained access controls for the underlying data structures in Amazon Redshift.

Chapter 7, Performance Optimization, examines how Amazon Redshift being a fully managed service provides great performance out of the box for most workloads. Amazon Redshift also provides you with levers that help you maximize the throughputs when data access patterns are already established. Performance tuning on Amazon Redshift helps you manage critical SLAs for workloads and easily scale up your data warehouse to meet/exceed business needs.

Chapter 8, Cost Optimization, discusses how Amazon Redshift is one of the best price-performant data warehouse platforms on the cloud. Amazon Redshift also provides you with scalability and different options to optimize the pricing, such as elastic resizing, pause and resume, reserved instances, and using cost controls. These options allow you to create the best price-performant data warehouse solution.

Chapter 9, Lake House Architecture, looks at how AWS provides purpose-built solutions to meet the scalability and agility needs of the data architecture. With its in-built integration and governance, it is possible to easily move data across the data stores. You might have all the data centralized in a data lake, but use Amazon Redshift to get quick results for complex queries on structured data for business intelligence queries. The curated data can now be exported into an Amazon S3 data lake and classified to build a machine learning algorithm. In this chapter, we will discuss in-built integrations that allow easy data movement to integrate a data lake, data warehouse, and purpose-built data stores and enable unified governance.

Chapter 10, Extending Redshift's Capabilities, looks at how Amazon Redshift allows you to analyze all your data using standard SQL and your existing business intelligence tools. Organizations are looking for more ways to extract valuable insights from data, such as big data analytics, machine learning applications, and a range of analytical tools to drive new use cases and business processes. Building an entire solution, from data sourcing and transformation to reporting and machine learning, can be easily accomplished by taking advantage of the capabilities provided by AWS's analytical services. Amazon Redshift natively integrates with other AWS services, such as Amazon QuickSight, AWS Glue DataBrew, Amazon AppFlow, Amazon ElastiCache, AWS Data Exchange, and Amazon SageMaker, to meet your varying business needs.

To get the most out of this book

You will need access to an AWS account to perform all the recipes in this book. You will need either administrator access to the AWS account or to work with an administrator to help create the IAM user, roles, and policies as listed in the different chapters. All the data needed in the setup is provided as steps in recipes, and the Amazon S3 bucket is hosted in the Europe (Ireland) (eu-west-1) AWS region. It is preferable to use the Europe (Ireland) AWS region to execute all the recipes. If you need to run the recipes in a different region, you will need to copy the data from the source bucket (s3://packt-redshift-cookbook/) to an Amazon S3 bucket in the desired AWS region, and use that in your recipes instead.
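For example, if you plan to run the recipes in a different region such as US East (N. Virginia), the sample data can be copied across regions with the AWS CLI. The following is a minimal sketch; the destination bucket name (my-redshift-cookbook-data) is a hypothetical placeholder for a bucket that you have already created in your target region:

# Copy the cookbook sample data from the source bucket (eu-west-1) to your own bucket in the target region

$ aws s3 sync s3://packt-redshift-cookbook/ s3://my-redshift-cookbook-data/ --source-region eu-west-1 --region us-east-1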

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Amazon-Redshift-Cookbook. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800569683_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "To create the Amazon Redshift cluster, we used the redshift command and the create-cluster subcommand."

A block of code is set as follows:

SELECT 'hello world';

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

            "NodeType": "dc2.large",

            "ElasticResizeNumberOfNodeOptions": "[4]",

    …

            "ClusterStatus": "available"

Any command-line input or output is written as follows:

!pip install psycopg2-binary

### boto3 is optional, but recommended to leverage AWS Secrets Manager for storing the credentials

!pip install boto3

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Navigate to your notebook instance and open JupyterLab."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read Amazon Redshift Cookbook, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

Chapter 1: Getting Started with Amazon Redshift

Amazon Redshift is a fully managed data warehouse service in Amazon Web Services (AWS). You can query all your data, which can scale from gigabytes to petabytes, using SQL. Amazon Redshift integrates into the data lake solution through the lake house architecture, allowing you to access all your structured and semi-structured data in one place. Each Amazon Redshift data warehouse is hosted as a cluster (a group of servers or nodes) that consists of one leader node and a collection of one or more compute nodes. Each cluster is a single-tenant environment (which can be scaled to a multi-tenant architecture using data sharing), and every node has its own dedicated CPU, memory, and attached disk storage, which varies based on the node type.

This chapter will walk you through the process of creating a sample Amazon Redshift cluster and connecting to it from different clients.

The following recipes will be discussed in this chapter:

Creating an Amazon Redshift cluster using the AWS Console
Creating an Amazon Redshift cluster using the AWS CLI
Creating an Amazon Redshift cluster using an AWS CloudFormation template
Connecting to an Amazon Redshift cluster using the Query Editor
Connecting to an Amazon Redshift cluster using the SQL Workbench/J client
Connecting to an Amazon Redshift cluster using a Jupyter Notebook
Connecting to an Amazon Redshift cluster programmatically using Python
Connecting to an Amazon Redshift cluster programmatically using Java
Connecting to an Amazon Redshift cluster programmatically using .NET
Connecting to an Amazon Redshift cluster using the command line (psql)

Technical requirements

The following are the technical requirements for this chapter:

An AWS account.
An AWS administrator should create an IAM user by following Recipe 1 – Creating an IAM user in the Appendix. This IAM user will be used to execute all the recipes in this chapter.
An AWS administrator should deploy the AWS CloudFormation template to attach the IAM policy to the IAM user, which will give them access to Amazon Redshift, Amazon SageMaker, Amazon EC2, AWS CloudFormation, and AWS Secrets Manager. The template is available here: https://github.com/PacktPublishing/Amazon-Redshift-Cookbook/blob/master/Chapter01/chapter_1_CFN.yaml.
Client tools such as SQL Workbench/J, an IDE, and a command-line tool.
You will need to authorize network access from servers or clients to access the Amazon Redshift cluster: https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-authorize-cluster-access.html.
The code files for this chapter can be found here: https://github.com/PacktPublishing/Amazon-Redshift-Cookbook/tree/master/Chapter01.

Creating an Amazon Redshift cluster using the AWS Console

The AWS Management Console allows you to interactively create an Amazon Redshift cluster via a browser-based user interface. It also recommends the right cluster configuration based on the size of your workload. Once the cluster has been created, you can use the Console to monitor the health of the cluster and diagnose query performance issues from a unified dashboard.

Getting ready

To complete this recipe, you will need the following:

A new or existing AWS account. If a new AWS account needs to be created, go to https://portal.aws.amazon.com/billing/signup, enter the necessary information, and follow the steps on the site.
An IAM user with access to Amazon Redshift.

How to do it…

Follow these steps to create a cluster with minimal parameters:

Navigate to the AWS Management Console and select Amazon Redshift: https://console.aws.amazon.com/redshiftv2/.
Choose the AWS region (eu-west-1) or the corresponding region from the top right of the screen. Then, click Next.
On the Amazon Redshift Dashboard, select CLUSTERS, and then click Create cluster.
In the Cluster configuration section, type in any meaningful Cluster identifier, such as myredshiftcluster.
Choose either Production or Free trial, depending on what you plan to use this cluster for.
Select the Help me choose option for sizing your cluster for the steady state workload. Alternatively, if you know the required size of your cluster (that is, the node type and number of nodes), select I'll choose. For example, you can choose Node type: dc2.large with Nodes: 2.
In the Database configurations section, specify values for Database name (optional), Database port (optional), Master user name, and Master user password; for example:
a. Database name (optional): Enter dev
b. Database port (optional): Enter 5439
c. Master user name: Enter awsuser
d. Master user password: Enter a value for the password
Optionally, configure the Cluster permissions and Additional configurations sections when you want to pick specific network and security configurations. The console defaults to the preset configuration otherwise.
Choose Create cluster.
The cluster creation takes a few minutes to complete. Once this has happened, navigate to Amazon Redshift | Clusters | myredshiftcluster | General information to find the JDBC/ODBC URL to connect to the Amazon Redshift cluster.

Creating an Amazon Redshift cluster using the AWS CLI

The AWS command-line interface (CLI) is a unified tool for managing your AWS services. You can use this tool on the command-line Terminal to invoke the creation of an Amazon Redshift cluster.

The command-line tool helps automate cluster creation and modification. For example, you can create a shell script that takes manual point-in-time snapshots of the cluster.
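As an illustration, such a script could call the create-cluster-snapshot subcommand; this is a minimal sketch that assumes a cluster named myredshiftcluster already exists, and the snapshot identifier shown is an arbitrary example:

# Take a manual point-in-time snapshot of an existing cluster

$ aws redshift create-cluster-snapshot --cluster-identifier myredshiftcluster --snapshot-identifier myredshiftcluster-manual-snapshot-01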

Getting ready

To complete this recipe, you will need to do the following:

Install and configure the AWS CLI based on your specific operating system at https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html and use the aws configure command to set up your AWS CLI installation, as explained here: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html.
Verify that the AWS CLI has been configured using the following command, which will list the configured values:

$ aws configure list

Name                    Value             Type    Location

access_key     ****************PA4J         iam-role    

secret_key     ****************928H         iam-role    

    region                eu-west-1      config-file    

Create an IAM user with access to Amazon Redshift.

How to do it…

Follow these steps to create an Amazon Redshift cluster using the command-line tool:

Depending on the operating system the AWS CLI has been installed on, open a shell program such as bash or zsh in Linux-based systems, or the Windows command line.
Use the following command to create a two-node dc2.large cluster with the minimal set of parameters: cluster-identifier (any unique identifier for the cluster), node-type/number-of-nodes, and the master user credentials. Replace <MasterUserPassword> in the following command with a password of your choice. The password must be 8-64 characters long and must contain at least one uppercase letter, one lowercase letter, and one number. You can use any printable ASCII character except ' (single quote), " (double quote), \, /, or @:

$ aws redshift create-cluster --node-type dc2.large --number-of-nodes 2 --master-username adminuser --master-user-password <MasterUserPassword> --cluster-identifier myredshiftcluster

Here is the expected sample output:

{

    "Cluster": {

        "PubliclyAccessible": true,

        "MasterUsername": "adminuser",

        "VpcSecurityGroups": [

            {

                "Status": "active",

                "VpcSecurityGroupId": "sg-abcdef7"

            }

        ],

        "NumberOfNodes": 2,

        "PendingModifiedValues": {

            "MasterUserPassword": "****"

        },

        "VpcId": "vpc-abcdef99",

        "ClusterParameterGroups": [

            {

                "ParameterGroupName": "default.redshift-1.0",

                "ParameterApplyStatus": "in-sync"

            }

        ],

        "DBName": "dev",

        "ClusterSubnetGroupName": "default",

        "EnhancedVpcRouting": false,

   "ClusterIdentifier": "myredshiftcluster",

        "NodeType": "dc2.large",

        "Encrypted": false,

        "ClusterStatus": "creating"

    }

}

It will take a few minutes to create the cluster. You can monitor the status of the cluster creation process using the following command:

$ aws redshift describe-clusters --cluster-identifier myredshiftcluster

Here is the expected sample output:

{
    "Clusters": [
        {
            "NumberOfNodes": 2,
            "DBName": "dev",
            "Endpoint": {
                "Port": 5439,
                "Address": "myredshiftcluster.abcdefghijk.eu-west-1.redshift.amazonaws.com"
            },
            "NodeType": "dc2.large",
            "ElasticResizeNumberOfNodeOptions": "[4]",
            "ClusterStatus": "available"
        }
    ]
}

Note that "ClusterStatus": "available" indicates that the cluster is ready for use and that you can connect to it using the "Address": "myredshiftcluster.abcdefghijk.eu-west-1.redshift.amazonaws.com" endpoint.

The cluster is now ready. You can now use an ODBC/JDBC connection to connect to the Amazon Redshift cluster.

How it works…

The AWS CLI uses a hierarchical structure in the command line that is specified in the following order:

$aws <command> <subcommand> [options and parameters]

These parameters can take different types of input values, such as strings, numbers, maps, lists, and JSON structures. What is supported depends on the command and subcommand that you specify. The AWS CLI also provides help text for each command and subcommand, which is convenient when scripting. To see the help text, you can run any of the following commands:

$aws help

$aws <command> help

$aws <command> <subcommand> help

To create the Amazon Redshift cluster, we used the redshift command and the create-cluster subcommand.

You can refer to https://docs.aws.amazon.com/cli/latest/reference/redshift/create-cluster.html for the full set of parameters, or view them by running the following command in the AWS CLI:

$aws redshift create-cluster help

Creating an Amazon Redshift cluster using an AWS CloudFormation template

With an AWS CloudFormation template, you treat your infrastructure as code. This enables you to create an Amazon Redshift cluster using a JSON- or YAML-based template file. The declarative code in the file contains the steps to create the AWS resources, and it also enables easy automation and distribution. The template allows you to standardize Amazon Redshift cluster creation to meet your organizational infrastructure and security standards. Furthermore, you can distribute templates to different teams within your organization using the AWS Service Catalog for easy setup.
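Although this recipe uses the AWS Console, the same template could also be deployed from the AWS CLI once the template file has been downloaded locally (the download is covered in the How to do it… section below); a minimal sketch, where the parameter keys are taken from this recipe's input parameters and the values shown are placeholders:

# Create the stack from the downloaded template, supplying a few of its parameters

$ aws cloudformation create-stack --stack-name myredshiftcluster --template-body file://Creating_Amazon_Redshift_Cluster.json --parameters ParameterKey=MasterUserName,ParameterValue=awsuser ParameterKey=MasterUserPassword,ParameterValue=<MasterUserPassword> ParameterKey=NodeType,ParameterValue=dc2.large ParameterKey=NumberofNodes,ParameterValue=2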

Getting ready

To complete this recipe, you will need to do the following:

Create an IAM user with access to AWS CloudFormation, Amazon EC2, and Amazon Redshift.

How to do it…

We will create a CloudFormation template to author the Amazon Redshift cluster infrastructure as code using the JSON-based template. Follow these steps to create an Amazon Redshift cluster using the CloudFormation template:

Download the AWS CloudFormation template from https://github.com/PacktPublishing/Amazon-Redshift-Cookbook/blob/master/Chapter01/Creating_Amazon_Redshift_Cluster.json.
Navigate to the AWS Console, choose CloudFormation, and then choose Create stack, as shown in the following screenshot:

Figure 1.1 – Create stack

Click on the Template is ready and Upload a template file options and choose the file that was downloaded (Creating_Amazon_Redshift_Cluster.json) from your local computer. Then, click Next.
Enter the following input parameters:

a. Stack name: Enter a name for the stack; for example, myredshiftcluster.

b. ClusterType: A single-node or a multi-node cluster.

c. DatabaseName: Enter a database name; for example, dev.

d. InboundTraffic: Restrict the CIDR ranges of IPs that can access the cluster. 0.0.0.0/0 opens the cluster so that it's globally accessible.

e. MasterUserName: Enter a database master username; for example, awsuser.

f. MasterUserPassword: Enter a master user password. The password must be 8-64 characters long and must contain at least one uppercase letter, one lowercase letter, and one number. It can contain any printable ASCII character except ' (single quote), " (double quote), \, /, or @.

g. NodeType: Enter the node type; for example, dc2.large.

h. NumberofNodes: Enter the number of compute nodes; for example, 2.

i. Redshift cluster port: Choose any TCP/IP port; for example, 5439.

Click Next and Create Stack.

The AWS CloudFormation