Clean Your Data in Seconds with This R Function

By Naeemah Aliya Small

(This article was first published on R Programming – DataScience+, and kindly contributed to R-bloggers)

All data needs to be clean before you can explore it and build models. Common sense, right? Cleaning data can be tedious, but I created a function that will help.

The function does the following:

  • Cleans the data of NA’s and blanks
  • Separates the clean data into five data frames: integer, double, factor, numeric, and factor plus numeric
  • Opens a view of each new data frame
  • Creates a view of the summary and describe output for the clean data
  • Creates histograms of the data frames
  • Saves all the objects

This will happen in seconds.
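
To make the separation step concrete, here is a minimal base R sketch of splitting a data frame by column type and summarizing the numeric part. The example data and every object name in it (df, int_df, num_df, and so on) are made up for illustration; this shows one way to do it and is not necessarily how cleanme does it.

```r
# Hypothetical example data; not from the original article.
df <- data.frame(
  id    = 1:4,                           # integer column
  score = c(2.5, 3.1, 4.8, 1.2),         # double column
  group = factor(c("a", "b", "a", "b"))  # factor column
)

# One logical mask per column type.
is_int <- vapply(df, is.integer, logical(1))
is_dbl <- vapply(df, is.double, logical(1))
is_fct <- vapply(df, is.factor, logical(1))
is_num <- vapply(df, is.numeric, logical(1))  # integer or double

# drop = FALSE keeps a data frame even when only one column matches.
int_df     <- df[, is_int, drop = FALSE]
dbl_df     <- df[, is_dbl, drop = FALSE]
fct_df     <- df[, is_fct, drop = FALSE]
num_df     <- df[, is_num, drop = FALSE]
fct_num_df <- df[, is_fct | is_num, drop = FALSE]

# Base R summary of the numeric columns.
print(summary(num_df))

# describe() comes from Hmisc; only call it if the package is installed.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::describe(num_df))
}
```

Note that is.integer() and is.numeric() both return FALSE for factors, so factor columns land only in the factor data frames.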

Package

First, load the Hmisc package. I always save a copy of the original file.
The code below is the engine that cleans the data file.

cleandata 
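
As an illustration of what a cleaning step like this can look like, here is a minimal base R sketch that drops every row containing an NA or a blank string. It assumes a data frame named dataname whose blanks live in character columns; the example data and the helper names original and is_chr are made up, and this is not necessarily the code the article used.

```r
# Hypothetical example data with one NA and one blank string.
dataname <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", "", "d"),
  stringsAsFactors = FALSE
)

original <- dataname  # keep an untouched copy of the original data

# Treat empty or whitespace-only strings as missing.
# Assumption: blanks only occur in character columns.
cleandata <- dataname
is_chr <- vapply(cleandata, is.character, logical(1))
cleandata[is_chr] <- lapply(cleandata[is_chr], function(col) {
  col[!is.na(col) & trimws(col) == ""] <- NA
  col
})

# Keep only the rows that have no missing values.
cleandata <- cleandata[complete.cases(cleandata), , drop = FALSE]
print(cleandata)
```

If the blanks sit in factor columns instead, one option is to convert those columns with as.character() first and apply the same idea.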

The function

The function is below. You need to copy the code and save it in an R file. Run the code and the function cleanme will appear.

cleanme 

Type in and run:

cleanme(dataname)

When all the data frames appear, run the following to load the saved workspace objects:

load("cleanmework.RData")
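
For readers less familiar with .RData files, here is a small sketch of the save and load round trip. The objects clean_int and clean_num are hypothetical stand-ins for whatever data frames the function creates, and a temporary directory is used so the example does not overwrite anything in your working directory.

```r
# Hypothetical objects standing in for the data frames the function creates.
clean_int <- data.frame(id = 1:3)
clean_num <- data.frame(score = c(2.5, 3.1, 4.8))

# Write to a temporary directory so no existing file is overwritten.
path <- file.path(tempdir(), "cleanmework.RData")
save(clean_int, clean_num, file = path)

# load() restores objects under their original names and invisibly
# returns those names. Loading into a fresh environment avoids
# overwriting objects of the same name in the global workspace.
restored <- new.env()
loaded_names <- load(path, envir = restored)
print(loaded_names)
```

Calling load() without the envir argument, as in the article, puts the objects straight into the current environment.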

Enjoy
