We have dabbled with RevoScaleR before. In this exercise set we will work with H2O, another high-performance R library that handles big data very effectively. The exercises increase in difficulty, so please do them in sequence.
H2O requires Java to be installed on your system, so please install Java before trying H2O. As always, check the documentation before attempting this exercise set.
Answers to the exercises are available here.
If you want to install the latest release of H2O, follow these instructions.
Download the latest stable release of H2O and initialize the cluster.
Check the cluster information via h2o.clusterInfo().
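As a rough sketch of what starting a cluster can look like, the snippet below assumes the h2o package is already installed (the CRAN build is used here as an assumption and may lag behind the latest stable release) and that Java is available on the machine.

```r
# Assumption: the h2o package is installed, e.g. via install.packages("h2o")
library(h2o)

# Start a local H2O cluster, or connect to one that is already running.
# nthreads = -1 asks H2O to use all available cores.
h2o.init(nthreads = -1)

# Print version, memory, and node details for the running cluster
h2o.clusterInfo()
```

Calling h2o.init() again in the same session simply reconnects to the running cluster, so it is safe to keep at the top of each script.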
You can see how H2O works via the demo function. Check H2O's GLM via the demo method.
Download loan.csv from H2O's GitHub repo and import it using H2O.
Check the type of the imported loan data and notice that it is not a data frame, then check the summary of the loan data.
Hint: use h2o.summary().
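To illustrate the import-and-inspect pattern without giving away the loan exercise, here is a minimal sketch that writes the built-in iris data to a temporary CSV and imports that instead. The file and the object name iris_hex are purely illustrative; for the exercise you would point the path at loan.csv.

```r
library(h2o)
h2o.init()

# Write a small built-in data set to a temporary CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Import the file into the H2O cluster; the data lives in H2O, not in R's memory
iris_hex <- h2o.importFile(path = csv_path)

# The result is an H2OFrame, not a regular R data frame
class(iris_hex)
is.data.frame(iris_hex)

# Column-wise summary computed by H2O
h2o.summary(iris_hex)
```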
One might want to transfer a data frame from the R environment to H2O. Use as.h2o to convert the mtcars data frame to an H2OFrame.
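A minimal sketch of the round trip between R and H2O might look like the following. The object names mtcars_hex and mtcars_df are only illustrative.

```r
library(h2o)
h2o.init()

# Push the in-memory R data frame to the H2O cluster
mtcars_hex <- as.h2o(mtcars)
class(mtcars_hex)

# Pull the data back into R as an ordinary data frame when needed
mtcars_df <- as.data.frame(mtcars_hex)
class(mtcars_df)
```

Copying in either direction moves the whole data set, so for large data it is usually preferable to import files directly into H2O rather than going through an R data frame.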
Check the dimensions of the loan H2OFrame via h2o.dim.
Find the column names of the loan H2OFrame.
Plot a histogram of the loan amount in the loan H2OFrame.
Find the mean loan amount for each home ownership group in the loan H2OFrame.
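The same inspection and grouping steps can be sketched on a stand-in data set. The example below uses mtcars rather than the loan data, so the columns mpg and cyl are substitutes for the loan amount and home ownership columns, whose exact names you should check in your own import.

```r
library(h2o)
h2o.init()

# Stand-in data: mtcars pushed to H2O (not the loan data from the exercises)
cars_hex <- as.h2o(mtcars)

# Number of rows and columns
h2o.dim(cars_hex)

# Column names
h2o.colnames(cars_hex)

# Histogram of a numeric column, computed inside H2O
h2o.hist(cars_hex$mpg)

# Group means: treat cyl as categorical, then average mpg within each group
cars_hex$cyl <- as.factor(cars_hex$cyl)
mpg_by_cyl <- h2o.group_by(data = cars_hex, by = "cyl", mean("mpg"))
mpg_by_cyl
```

h2o.group_by takes the aggregation as an unevaluated call such as mean("mpg"), with the column name passed as a string, which differs from base R's aggregate() or tapply().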
Source: R News