Announcement: Mindasys is now Skillet! Check out our new website.

Key points about this course

Duration : 3 Days
Course Fee : RM4,199.00

HRD Corp Claimable Course

Analyzing Big Data with SQL

Live Virtual Class

Public Class

In-House Training

Private Class

Course Overview

In this course, you'll get an in-depth look at the SQL SELECT statement and its main clauses. The course focuses on big data SQL engines Apache Hive and Apache Impala, but most of the information is applicable to SQL with traditional RDBMs as well; the instructor explicitly addresses differences for MySQL and PostgreSQL.

What you will learn

  • Understand the basics of SELECT statements

  • Understand how and why to filter results

  • Explore grouping and aggregation to answer analytic questions

  • Work with sorting and limiting results

This course is available on face to face classroom training or live virtual class training or online training.

Course Prerequisites

To use the hands-on environment for this course, you need to download and install a virtual machine and the software on which to run it. Before continuing, be sure that you have access to a computer that meets the following hardware and software requirements:

• Windows, macOS, or Linux operating system (iPads and Android tablets will not work)

• 64-bit operating system (32-bit operating systems will not work)

• 8 GB RAM or more

• 25GB free disk space or more

• Intel VT-x or AMD-V virtualization support enabled (on Mac computers with Intel processors, this is always enabled; on Windows and Linux computers, you might need to enable it in the BIOS)

• For Windows XP computers only: You must have an unzip utility such as 7-Zip or WinZip installed (Windows XP’s built-in unzip utility will not work)

Course Objectives

By the end of the course, you will be able to

• explore and navigate databases and tables using different tools;

• understand the basics of SELECT statements;

• understand how and why to filter results;

• explore grouping and aggregation to answer analytic questions;

• work with sorting and limiting results; and

• combine multiple tables in different ways.

Course Content

Module 1: Orientation to SQL on Big Data

  • Review and Preparation
  • Using the Hue Query Editors
  • Running SQL Utility Statements
  • Running SQL SELECT Statements
  • Understanding Different SQL Interfaces
  • Overview of Beeline and Impala Shell
  • Using Beeline
  • Using Impala Shell

Module 2: SQL SELECT Essentials 

  • Introduction
  • SQL SELECT Building Blocks
  • Introduction to the SELECT List
  • Expressions and Operators
  • Data Types
  • Column Aliases
  • Built-In Functions
  • Data Type Conversion
  • The DISTINCT Keyword
  • Introduction to the FROM Clause
  • Identifiers
  • Formatting SELECT Statements
  • Using Beeline in Non-Interactive Mode
  • Using Impala Shell in Non-Interactive Mode
  • Formatting the Output of Beeline and Impala Shell
  • Saving Hive and Impala Query Results to a File

Module 3: Filtering Data 

  • Introduction
  • About the Datasets
  • Introduction to the WHERE Clause
  • Using Expressions in the WHERE Clause
  • Comparison Operators
  • Data Types and Precision
  • Logical Operators
  • Other Relational Operators
  • Understanding Missing Values
  • Handling Missing Values
  • Conditional Functions
  • Using Variables with Beeline and Impala Shell
  • Calling Beeline and Impala Shell from Scripts
  • Querying Hive and Impala in Scripts and Applications

Module 4: Grouping and Aggregating Data

  • Introduction
  • Introduction to Aggregation
  • Common Aggregate Functions
  • Using Aggregate Functions in the SELECT Statement
  • Introduction to the GROUP BY Clause
  • Choosing an Aggregate Function and Grouping Column
  • Grouping Expressions
  • Grouping and Aggregation, Together and Separately
  • NULL Values in Grouping and Aggregation
  • The COUNT Function
  • Tips for Applying Grouping and Aggregation
  • Filtering on Aggregates
  • The HAVING Clause
  • Understanding Hive and Impala Version Differences
  • Understanding Hue Version Differences

Module 5: Sorting and Limiting Data

  • Introduction
  • Introduction to the ORDER BY Clause
  • Controlling Sort Order
  • Ordering Expressions
  • Missing Values in Ordered Results
  • Using ORDER BY with Hive and Impala
  • Introduction to the LIMIT Clause
  • When to Use the LIMIT Clause
  • Using LIMIT with ORDER BY
  • Using LIMIT for Pagination
  • Review

Module 6: Combining Data

  • Introduction
  • Combining Query Results with the UNION Operator
  • Using ORDER BY and LIMIT with UNION
  • Introduction to Joins
  • Join Syntax
  • Inner Joins
  • Outer Joins
  • Conclusion

 

  • Analyzing Big Data with SQL

  • Ask For