Data cleaning is the process of preparing data for analysis by removing or modifying data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted.
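As a minimal sketch of those categories in pandas, the snippet below builds a small hypothetical table and applies one fix per problem type. The column names (`Suburb`, `Price`, `Notes`), the values, and the rule that a non-positive price counts as "incorrect" are assumptions chosen for illustration. They are not taken from the course dataset.

```python
import pandas as pd

# Hypothetical toy data containing each kind of problem listed above
raw = pd.DataFrame({
    "Suburb": ["Abbotsford", "Abbotsford", "Richmond", " carlton ", "Kew"],
    "Price": ["1035000", "1035000", "-1", "850000", None],
    "Notes": ["n/a", "n/a", "n/a", "n/a", "n/a"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = (
        df.drop(columns=["Notes"])          # irrelevant: drop an unused column
          .drop_duplicates()                # duplicated: keep one copy of identical rows
          .assign(
              # improperly formatted: trim whitespace and normalise capitalisation
              Suburb=lambda d: d["Suburb"].str.strip().str.title(),
              # parse text to numbers (unparseable -> NaN), then treat
              # non-positive prices as incorrect by setting them to NaN (assumed rule)
              Price=lambda d: pd.to_numeric(d["Price"], errors="coerce").where(lambda s: s > 0),
          )
    )
    # incomplete: drop rows whose price is missing or was flagged as incorrect
    return out.dropna(subset=["Price"]).reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Using `.assign` returns a new frame instead of modifying a filtered slice in place. This keeps each step side-effect free and avoids pandas' `SettingWithCopyWarning`.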
According to IBM Data Analytics, you can expect to spend up to 75% of your time cleaning data. Using Python's pandas library, we'll walk through a range of data cleaning tasks, concentrating on what is perhaps the largest of them: handling missing values. Data cleaning can be performed interactively with data wrangling tools or as batch processing through scripts.
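In pandas, handling missing values usually comes down to three calls: `isna` to detect gaps, `dropna` to discard them, and `fillna` to impute them. The following is a minimal sketch that could run as a batch script. It assumes a small hypothetical frame (the column names are illustrative only) and uses per-column median imputation as one possible strategy among several.

```python
import pandas as pd

# Hypothetical toy frame; columns and values are illustrative only
df = pd.DataFrame({
    "Rooms": [2, 3, 3, 4],
    "Car": [1.0, None, 2.0, None],
    "BuildingArea": [None, 120.0, None, 150.0],
    "YearBuilt": [1900.0, 1995.0, None, 2010.0],
})

# 1. Detect: count missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)

# 2. Option A: drop every row that has at least one missing value
dropped = df.dropna()
print(f"rows kept after dropna: {len(dropped)} of {len(df)}")

# 3. Option B: impute, here with each numeric column's median
imputed = df.fillna(df.median(numeric_only=True))
print(imputed)
```

Here `dropna()` removes every row, because each row is missing at least one value. This is why imputation, or dropping only the columns or rows with many gaps, is often preferred when missingness is spread across the data.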