Big Data Analytics with H2O in R: Exercises, Part 1

By Biswarup Ghosh

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

We have dabbled with RevoScaleR before. In this exercise set we will work with H2O, another high-performance R library that handles big data very effectively. This is a series of exercises of increasing difficulty, so please work through them in sequence.
H2O requires Java to be installed on your system, so please install Java before trying H2O. As always, check the documentation before attempting this exercise set.
Answers to the exercises are available here.
If you want to install the latest release of H2O, follow these instructions.
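
Before starting, it can help to confirm that R can actually see a Java runtime and that a local cluster comes up. The snippet below is only a minimal sketch: it assumes the h2o package is already installed (the CRAN build may lag behind the latest stable release) and that 2 GB of memory is a reasonable amount for your machine.

```r
# Check that a Java runtime is visible to R before starting H2O.
java_path <- Sys.which("java")
if (!nzchar(java_path)) {
  stop("No 'java' executable found on the PATH; install Java first.")
}

# install.packages("h2o")  # assumption: CRAN build; run once if the package is missing
library(h2o)

# Start (or connect to) a local H2O cluster.
# nthreads = -1 uses all available cores; "2G" is an assumed memory limit.
h2o.init(nthreads = -1, max_mem_size = "2G")
```

When you are done with a session, h2o.shutdown(prompt = FALSE) stops the local cluster.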

Exercise 1
Download the latest stable release of H2O and initialize the cluster.

Exercise 2
Check the cluster information via h2o.clusterInfo().

Exercise 3
You can see how H2O works via the demo function. Check H2O’s GLM via the demo method.

Exercise 4
Download loan.csv from H2O’s GitHub repo and import it using H2O.

Exercise 5
Check the type of the imported loan data and notice that it is not a data frame. Then check the summary of the loan data.
Hint: use h2o.summary().
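
To see the same idea on data you already have, here is a small sketch that does not use the loan file. It writes the built-in iris data to a temporary CSV (an assumption made purely for illustration), imports it with h2o.importFile, and inspects the result. It assumes a local cluster, so that the file path is visible to H2O.

```r
library(h2o)
h2o.init()

# Write a small CSV to a temporary location (stand-in for a real data file).
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp, row.names = FALSE)

# Import the file into the H2O cluster; R only receives a handle to it.
iris_hf <- h2o.importFile(path = tmp)

# The imported object is an H2OFrame, not an R data frame.
class(iris_hf)
is.data.frame(iris_hf)

# Column-wise summary computed by the cluster.
h2o.summary(iris_hf)
```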

Exercise 6
You may want to transfer a data frame from the R environment to H2O. Use as.h2o to convert the mtcars data frame to an H2OFrame.
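
The transfer works in both directions. As a rough sketch, using a made-up data frame (df and the frame name "scores_demo" are hypothetical, chosen only for this illustration), you can push data to the cluster with as.h2o and pull it back with as.data.frame:

```r
library(h2o)
h2o.init()

# A tiny, hypothetical data frame that lives in the R session.
df <- data.frame(id = 1:5, score = c(2.5, 3.1, 4.8, 1.9, 3.3))

# Copy it to the H2O cluster; destination_frame names the copy on the cluster.
hf <- as.h2o(df, destination_frame = "scores_demo")
class(hf)

# Bring the data back into R as an ordinary data frame.
back <- as.data.frame(hf)
class(back)
```

Pulling data back is fine for small frames like this one; for genuinely large data it is usually better to keep the frame in the cluster and bring back only summaries.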

Learn more about importing big data in the online course Data Mining with R: Go from Beginner to Advanced. In this course you will learn how to

  • work with different data import techniques,
  • know how to import data and transform it for a specific modeling or analysis goal,
  • and much more.

Exercise 7
Check the dimensions of the loan H2OFrame via h2o.dim.

Exercise 8
Find the column names of the loan H2OFrame.

Exercise 9
Plot a histogram of the loan amount in the loan H2OFrame.

Exercise 10
Find the mean loan amount for each home ownership group in the loan H2OFrame.
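
If you want to see the shape of these calls before trying them on the loan data, the sketch below runs the same kind of inspection on the built-in iris data instead. The choice of iris and of the Sepal.Length and Species columns is only an assumption for illustration; the column names in loan.csv are different.

```r
library(h2o)
h2o.init()

# Push a small built-in data set to the cluster for illustration.
iris_hf <- as.h2o(iris)

# Number of rows and columns of the H2OFrame.
h2o.dim(iris_hf)

# Column names of the H2OFrame.
h2o.colnames(iris_hf)

# Histogram of a single numeric column, computed by the cluster.
h2o.hist(iris_hf$Sepal.Length)

# Mean of a numeric column within each level of a grouping column.
by_species <- h2o.group_by(data = iris_hf, by = "Species",
                           mean("Sepal.Length"))
as.data.frame(by_species)
```

The grouping and the histogram bins are computed inside the cluster, so only the small aggregated result is brought back into R.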

