(This article was first published on **ListenData**, and kindly contributed to R-bloggers)

In a recent KDnuggets analytics software poll, Python and R were ranked the top two tools for data science and machine learning. If you want to boost your career in data science, these are the languages to focus on.


RStudio developed a package called **reticulate**, which provides an interface for running Python packages and functions from R.

**Install and Load Reticulate Package**

Run the commands below to install the package and load it into your R session.

# Install reticulate package

install.packages("reticulate")

# Load reticulate package

library(reticulate)

**Check whether Python is available on your system**

py_available()

**Import a python module within R**

You can use the function **import( )** to import a particular package or module.

os <- import("os")

os$getcwd()

[1] "C:\\Users\\DELL\\Documents"

You can use the **listdir( )** function from the **os** module to see all the files in the working directory.

os$listdir()

[1] ".conda" ".gitignore" ".httr-oauth"

[4] ".matplotlib" ".RData" ".RDataTmp"

[7] ".Rhistory" "1.pdf" "12.pdf"

[10] "122.pdf" "124.pdf" "13.pdf"

[13] "1403.2805.pdf" "2.pdf" "3.pdf"

[16] "AIR.xlsx" "app.r" "Apps"

[19] "articles.csv" "Attrition_Telecom.xlsx" "AUC.R"
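For reference, `os$getcwd()` and `os$listdir()` map onto functions of Python's standard `os` module. A minimal sketch of the same calls in plain Python, which could be typed into any Python session (the variable names are illustrative and not from the original post):

```python
import os

# The same functions that reticulate calls for os$getcwd() and os$listdir()
cwd = os.getcwd()          # current working directory as a string
files = os.listdir(cwd)    # names of the entries in that directory

print(cwd)
print(sorted(files)[:5])   # first few entries, sorted for readability
```

The output depends on where the session was started, so it will differ from the listing shown above.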

**Install Python Package**

**Step 1 : Create a new environment**

conda_create("r-reticulate")

**Step 2 : Install a package within a conda environment**

conda_install("r-reticulate", "numpy")

**Since numpy is already installed, you don’t need to install it again. The above example is just for demonstration.**

**Step 3 : Load the package**

numpy <- import("numpy")

**Working with numpy array**

Let’s create a sample numpy array

y <- array(1:4, c(2, 2))

x <- numpy$array(y)

x

[,1] [,2]

[1,] 1 3

[2,] 2 4

**Transpose the above array**

numpy$transpose(x)

[,1] [,2]

[1,] 1 2

[2,] 3 4

**Eigenvalues and eigenvectors**

numpy$linalg$eig(x)

[[1]]

[1] -0.3722813 5.3722813

[[2]]

[,1] [,2]

[1,] -0.9093767 -0.5657675

[2,] 0.4159736 -0.8245648
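The eigenvalues reported above can be checked by hand. For a 2×2 matrix they are the roots of the characteristic polynomial λ² − tr(A)·λ + det(A) = 0. A small sketch in plain Python, using only the standard library; the helper name `eigenvalues_2x2` is hypothetical and introduced here purely for illustration:

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial.

    Assumes the eigenvalues are real (non-negative discriminant).
    """
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)

# The matrix used above: rows (1, 3) and (2, 4)
low, high = eigenvalues_2x2(1, 3, 2, 4)
print(round(low, 7), round(high, 7))
```

Here the trace is 5 and the determinant is −2, so the roots are (5 ± √33)/2, which agrees with the values −0.3722813 and 5.3722813 returned by `numpy$linalg$eig(x)`.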

**Mathematical Functions**

numpy$sqrt(x)

numpy$exp(x)

**Working with Python interactively**

Using the `repl_python()` function, you can work with Python interactively from the R console. Download the **dataset** used in the program below.

repl_python()

# Load Pandas package

import pandas as pd

# Importing Dataset

travel = pd.read_excel("AIR.xlsx")

# Number of rows and columns

travel.shape

# Select random no. of rows

travel.sample(n = 10)

# Group By

travel.groupby("Year").AIR.mean()

# Filter

t = travel.loc[(travel.Month >= 6) & (travel.Year >= 1955),:]

# Return to R

exit

Note : You need to enter **exit** to return to the R environment.


**How to access objects created in python from R**

You can use the **py object** to access objects created within Python.

summary(py$t)

In this case, I am using R’s **summary( )** function to access the dataframe **t**, which was created in Python. Similarly, you can create a line plot using the ggplot2 package.

# Line chart using ggplot2

library(ggplot2)

ggplot(py$t, aes(AIR, Year)) + geom_line()

**How to access objects created in R from Python**

You can use the **r object** to accomplish this task.

**1. Let’s create an object in R**

mydata = head(cars, n=15)

**2. Use the R created object within Python REPL**

repl_python()

import pandas as pd

r.mydata.describe()

pd.isnull(r.mydata.speed)

exit

**Building Logistic Regression Model using sklearn package**

repl_python()

# Load libraries (metrics is needed for the reports at the end)

from sklearn import datasets

from sklearn import metrics

from sklearn.linear_model import LogisticRegression

# Load the iris dataset

iris = datasets.load_iris()

# Developing logit model

model = LogisticRegression()

model.fit(iris.data, iris.target)

# Scoring

actual = iris.target

predicted = model.predict(iris.data)

# Performance Metrics

print(metrics.classification_report(actual, predicted))

print(metrics.confusion_matrix(actual, predicted))
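To make the last metric concrete: a confusion matrix counts, for each actual class, how many observations were assigned to each predicted class. A minimal sketch in plain Python, assuming integer class labels starting at 0. The function name `count_confusions` and the toy labels are made up for illustration and are not the iris results:

```python
def count_confusions(actual, predicted, n_classes):
    """Return an n_classes x n_classes table: rows = actual, columns = predicted."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table

# Toy labels (hypothetical, not the iris output)
toy_actual = [0, 0, 1, 1, 2, 2]
toy_predicted = [0, 0, 1, 2, 2, 2]

for row in count_confusions(toy_actual, toy_predicted, 3):
    print(row)
```

Correct predictions land on the diagonal; in the toy data the single off-diagonal count is one observation of class 1 that was predicted as class 2.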

**Other Useful Functions**

**To see configuration of python**

Run the **py_config( )** command to find the version of Python installed on your system. It also shows details about Anaconda and numpy.

py_config()

python: C:\Users\DELL\ANACON~1\python.exe

libpython: C:/Users/DELL/ANACON~1/python36.dll

pythonhome: C:\Users\DELL\ANACON~1

version: 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]

Architecture: 64bit

numpy: C:\Users\DELL\ANACON~1\lib\site-packages\numpy

numpy_version: 1.14.2
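Much of this information is also available from inside Python itself. A sketch using only the standard library, which can be run in a `repl_python()` session. It is an approximation for illustration and not a description of how `py_config()` gathers its output; the dictionary keys are chosen here to mirror the labels above:

```python
import platform
import sys

config = {
    "python": sys.executable,                    # path to the interpreter
    "pythonhome": sys.prefix,                    # installation prefix
    "version": sys.version,                      # full version string
    "architecture": platform.architecture()[0],  # e.g. "64bit"
}

for key, value in config.items():
    print(f"{key}: {value}")
```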

**To check whether a particular package is installed**

In the following program, we are checking whether the **pandas** package is installed or not.

py_module_available("pandas")
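A comparable check can be made from the Python side with the standard library's `importlib.util.find_spec`, which looks a module up without importing it. This is a sketch of an analogous test, not a claim about what `py_module_available()` does internally; the helper name `module_available` is hypothetical:

```python
import importlib.util

def module_available(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None

print(module_available("os"))      # standard library module
print(module_available("pandas"))  # depends on the local installation
```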

Deepanshu founded ListenData with a simple objective – Make analytics easy to understand and follow. He has over 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains.
