Course Overview

Analyzing data with Python is an essential skill for Data Scientists and Data Analysts. This course will take you from the basics of data analysis with Python to building and evaluating data models.


Topics covered include:

- collecting and importing data

- cleaning, preparing & formatting data

- data frame manipulation and summarizing data

- building machine learning regression models

- model refinement

- creating data pipelines


You will learn how to import data from multiple sources, clean and wrangle data, perform exploratory data analysis (EDA), and create meaningful data visualizations.

You will then predict future trends from data by developing linear, multiple linear, and polynomial regression models and pipelines, and learn how to evaluate them. In addition to video lectures, you will learn and practice through hands-on labs and projects.

You will work with several open source Python libraries, including Pandas and NumPy, to load, manipulate, analyze, and visualize interesting datasets. You will also work with SciPy and scikit-learn to build machine learning models and make predictions.

Course Prerequisites

You should have a working knowledge of Python and Jupyter Notebooks.

Course Objectives

You will learn to:

  • Develop Python code for cleaning and preparing data for analysis - including handling missing values, formatting, normalizing, and binning data
  • Perform exploratory data analysis and apply analytical techniques to real-world datasets using libraries such as Pandas, NumPy and SciPy
  • Manipulate data using dataframes, summarize data, understand data distribution, perform correlation and create data pipelines
  • Build and evaluate regression models using the scikit-learn machine learning library and use them for prediction and decision making

Course Content

Part 1: Importing Datasets

  • The Problem
  • Understanding the Data
  • Python Packages for Data Science
  • Importing and Exporting Data in Python
  • Getting Started Analyzing Data in Python

Part 2: Data Wrangling

  • Pre-processing Data in Python
  • Dealing with Missing Values in Python
  • Data Formatting in Python
  • Data Normalization in Python
  • Binning in Python
  • Turning categorical variables into quantitative variables in Python
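
As a rough preview of the wrangling steps listed in this part, the sketch below fills a missing value, applies min-max normalization, bins a numeric column, and turns a categorical column into indicator columns. The toy data and the column names (`price`, `fuel`) are assumptions made up for illustration and are not taken from the course labs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "price": [10.0, np.nan, 25.0, 40.0],
    "fuel": ["gas", "diesel", "gas", "gas"],
})

# Missing values: replace NaN with the mean of the observed values.
df["price"] = df["price"].fillna(df["price"].mean())

# Normalization: min-max scaling to the range [0, 1].
price_range = df["price"].max() - df["price"].min()
df["price_scaled"] = (df["price"] - df["price"].min()) / price_range

# Binning: three equal-width bins with readable labels.
df["price_bin"] = pd.cut(df["price"], bins=3, labels=["low", "medium", "high"])

# Categorical to quantitative: one indicator column per category.
df = pd.concat([df, pd.get_dummies(df["fuel"], prefix="fuel")], axis=1)

print(df)
```

Mean imputation and min-max scaling are only one choice among several; other replacement values or scaling methods may suit a given dataset better.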

Part 3: Exploratory Data Analysis

  • Exploratory Data Analysis
  • Descriptive Statistics
  • GroupBy in Python
  • Correlation
  • Correlation – Statistics
  • Analysis of Variance (ANOVA)
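
The topics in this part can be illustrated with a short sketch that groups data, computes a Pearson correlation, and runs a one-way ANOVA with SciPy. The data frame and its columns (`drive`, `engine_size`, `price`) are hypothetical values chosen for the example, not course data.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "drive": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd"],
    "engine_size": [1.0, 1.2, 1.4, 2.0, 2.2, 2.4],
    "price": [10.0, 12.5, 14.0, 20.0, 21.5, 24.0],
})

# Descriptive statistics for the numeric columns.
summary = df.describe()

# GroupBy: mean price for each drive type.
group_means = df.groupby("drive")["price"].mean()

# Correlation: Pearson coefficient and its p-value.
pearson_r, pearson_p = stats.pearsonr(df["engine_size"], df["price"])

# ANOVA: test whether mean price differs between drive types.
price_groups = [group["price"] for _, group in df.groupby("drive")]
anova_f, anova_p = stats.f_oneway(*price_groups)

print(group_means)
print(pearson_r, pearson_p)
print(anova_f, anova_p)
```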

Part 4: Model Development

  • Model Development
  • Linear Regression and Multiple Linear Regression
  • Model Evaluation using Visualization
  • Polynomial Regression and Pipelines
  • Measures for In-Sample Evaluation
  • Prediction and Decision Making
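
To give a sense of how polynomial regression and pipelines fit together, here is a minimal sketch using scikit-learn. It assumes synthetic, noise-free data generated from y = 2x² + 1; the step names (`scale`, `poly`, `regress`) are arbitrary labels chosen for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for illustration): y = 2x^2 + 1, no noise.
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2 * x.ravel() ** 2 + 1

# Pipeline: scale the input, expand to polynomial features, fit a linear model.
model = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("regress", LinearRegression()),
])
model.fit(x, y)

# In-sample evaluation: R^2 on the data the model was trained on.
r2 = model.score(x, y)

# Prediction for a new input value.
prediction = model.predict(np.array([[1.5]]))

print(r2, prediction)
```

Because the data here is an exact quadratic, the in-sample R² is essentially 1; real datasets will score lower, and in-sample scores alone do not show how well a model generalizes.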

Part 5: Model Evaluation

  • Model Evaluation and Refinement
  • Overfitting, Underfitting and Model Selection
  • Ridge Regression
  • Grid Search
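
The refinement topics above can be sketched as a ridge regression whose regularization strength is chosen by a cross-validated grid search. The synthetic data, the candidate `alpha` values, and the 75/25 train/test split are all assumptions picked for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (an assumption for illustration): linear signal plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hold out a test set so evaluation uses data not seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Grid search: try each alpha with 5-fold cross-validation on the training set.
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

best_alpha = search.best_params_["alpha"]
test_r2 = search.score(X_test, y_test)  # R^2 of the refit best model on held-out data

print(best_alpha, test_r2)
```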

