All data needs to be clean before you can explore it and build models. Common sense, right? Cleaning data can be tedious, but I created a function that will help.
The function does the following:
- Cleans the data of NAs and blanks
- Separates the clean data into an integer dataframe, a double dataframe, a factor dataframe, a numeric dataframe, and a factor and numeric dataframe
- Views the new dataframes
- Creates a view of the summary and describe output from the clean data
- Creates histograms of the dataframes
- Saves all the objects
This will happen in seconds.
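As a rough illustration of the type-splitting step in the list above, here is a minimal sketch in base R. It is not the original function: the helper name `split_by_type`, the example data, and the column names are all assumptions made for illustration.

```r
# Hypothetical example data; the column names are illustrative only.
clean_df <- data.frame(
  id     = c(1L, 2L, 3L),
  height = c(1.62, 1.75, 1.80),
  group  = factor(c("a", "b", "a"))
)

# Split a data frame into subsets by column type.
# split_by_type is a hypothetical helper, not the function from this post.
split_by_type <- function(df) {
  is_int <- vapply(df, is.integer, logical(1))
  is_dbl <- vapply(df, is.double, logical(1))
  is_fac <- vapply(df, is.factor, logical(1))
  is_num <- vapply(df, is.numeric, logical(1))  # integer or double columns
  list(
    integer        = df[, is_int, drop = FALSE],
    double         = df[, is_dbl, drop = FALSE],
    factor         = df[, is_fac, drop = FALSE],
    numeric        = df[, is_num, drop = FALSE],
    factor_numeric = df[, is_fac | is_num, drop = FALSE]
  )
}

parts <- split_by_type(clean_df)

# Base R summary of the numeric columns only.
summary(parts$numeric)
```

Using `drop = FALSE` keeps each piece a data frame even when only one column matches, so later steps such as `summary()` or a loop of `hist()` calls can treat every piece the same way.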
The function requires the Hmisc package. I always save the original file.
The code below is the engine that cleans the data file.
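The original cleaning code is not reproduced here, so the following is only a sketch of one way to remove NAs and blanks in base R. The helper name `clean_na_blank` and the sample data are assumptions, and it treats blanks as empty or whitespace-only strings in character columns (blank factor levels are not handled).

```r
# Hypothetical raw data with one blank string and one NA.
raw_df <- data.frame(
  name  = c("Ann", "", "Cy", "Dee"),
  score = c(90, 85, NA, 70),
  stringsAsFactors = FALSE
)

# clean_na_blank is a hypothetical helper, not the function from this post.
clean_na_blank <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  # Convert empty or whitespace-only strings to NA in character columns.
  for (col in names(df)[is_chr]) {
    x <- df[[col]]
    x[!is.na(x) & trimws(x) == ""] <- NA
    df[[col]] <- x
  }
  # Keep only the rows that have no missing values in any column.
  df[complete.cases(df), , drop = FALSE]
}

clean_df <- clean_na_blank(raw_df)
```

Dropping every incomplete row is the simplest policy, but it can discard a lot of data when many columns are sparse, so it is worth comparing `nrow(raw_df)` with `nrow(clean_df)` before moving on.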
The function is below. You need to copy the code and save it in an R file. Run the code and the function
Type in and run:
When all the data frames appear, type to load the workspace as objects.
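The exact commands from the original post are not shown here. As a general illustration of saving objects and loading them back into the workspace, a minimal sketch with base R's `save()` and `load()` might look like this; the object name and the temporary file path are assumptions.

```r
# Hypothetical object standing in for one of the cleaned data frames.
clean_df <- data.frame(x = 1:3)

# Save the object to a temporary .RData file (the path is illustrative).
workspace_file <- tempfile(fileext = ".RData")
save(clean_df, file = workspace_file)

# Remove it from the workspace, then load it back from the file.
rm(clean_df)
loaded <- load(workspace_file)  # returns the names of the restored objects
```

`load()` restores each object under the name it was saved with, so `clean_df` is available again after the last line.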