Linear splines with convenient parametrisations

By Michał

(This article was first published on R – Brokering Closure, and kindly contributed to R-bloggers)

I have just published a small package lspline on CRAN that implements linear splines using convenient parametrisations such that

  • coefficients are slopes of consecutive segments
  • coefficients capture slope change at consecutive knots

Knot locations can be specified

  • manually (with lspline())
  • at breaks dividing the range of x into q equal-frequency intervals (with qlspline())
  • at breaks dividing the range of x into n equal-width intervals (with elspline())

The implementation follows Greene (2003, chapter 7.5.2).

The package sources are on GitHub here.

Examples

We will use the following artificial data with knots at x=5 and x=10:

set.seed(666)
n <- 200
d <- data.frame(
  x = scales::rescale(rchisq(n, 6), c(0, 20))
)
d$interval <- findInterval(d$x, c(5, 10), rightmost.closed = TRUE) + 1
d$slope <- c(2, -3, 0)[d$interval]
d$intercept <- c(0, 25, -5)[d$interval]
d$y <- with(d, intercept + slope * x + rnorm(n, 0, 1))

Plotting y against x:

library(ggplot2)
fig <- ggplot(d, aes(x=x, y=y)) + 
  geom_point(aes(shape=as.character(slope))) +
  scale_shape_discrete(name="Slope") +
  theme_bw()
fig

The slopes of the consecutive segments are 2, -3, and 0.

Setting knot locations manually

We can parametrize the spline with slopes of individual segments (default marginal=FALSE):

library(lspline)
m1 <- lm(y ~ lspline(x, c(5, 10)), data=d)
knitr::kable(broom::tidy(m1))
term estimate std.error statistic p.value
(Intercept) 0.1343204 0.2148116 0.6252941 0.5325054
lspline(x, c(5, 10))1 1.9435458 0.0597698 32.5171747 0.0000000
lspline(x, c(5, 10))2 -2.9666750 0.0503967 -58.8664832 0.0000000
lspline(x, c(5, 10))3 -0.0335289 0.0518601 -0.6465255 0.5186955

Or parametrize with coeficients measuring change in slope (with marginal=TRUE):

m2 <- lm(y ~ lspline(x, c(5,10), marginal=TRUE), data=d)
knitr::kable(broom::tidy(m2))
term estimate std.error statistic p.value
(Intercept) 0.1343204 0.2148116 0.6252941 0.5325054
lspline(x, c(5, 10), marginal = TRUE)1 1.9435458 0.0597698 32.5171747 0.0000000
lspline(x, c(5, 10), marginal = TRUE)2 -4.9102208 0.0975908 -50.3143597 0.0000000
lspline(x, c(5, 10), marginal = TRUE)3 2.9331462 0.0885445 33.1262479 0.0000000

The coefficients are

  • lspline(x, c(5, 10), marginal = TRUE)1 – the slope of the first segment
  • lspline(x, c(5, 10), marginal = TRUE)2 – the change in slope at knot x = 5; it is changing from 2 to -3, so by -5
  • lspline(x, c(5, 10), marginal = TRUE)3 – tha change in slope at knot x = 10; it is changing from -3 to 0, so by 3

The two parametrisations (obviously) give identical predicted values:

all.equal( fitted(m1), fitted(m2) )
## [1] TRUE

graphically

fig +
  geom_smooth(method="lm", formula=formula(m1), se=FALSE) +
  geom_vline(xintercept = c(5, 10), linetype=2)

Knots at n equal-length intervals

Function elspline() sets the knots at points dividing the range of x into n equal length intervals.

m3 <- lm(y ~ elspline(x, 3), data=d)
knitr::kable(broom::tidy(m3))
term estimate std.error statistic p.value
(Intercept) 3.5484817 0.4603827 7.707678 0.00e+00
elspline(x, 3)1 0.4652507 0.1010200 4.605529 7.40e-06
elspline(x, 3)2 -2.4908385 0.1167867 -21.328105 0.00e+00
elspline(x, 3)3 0.9475630 0.2328691 4.069080 6.84e-05

Graphically

fig +
  geom_smooth(aes(group=1), method="lm", formula=formula(m3), se=FALSE, n=200)

Knots at quantiles of x

Function qlspline() sets the knots at points dividing the range of x into q equal-frequency intervals.

m4 <- lm(y ~ qlspline(x, 4), data=d)
knitr::kable(broom::tidy(m4))
term estimate std.error statistic p.value
(Intercept) 0.0782285 0.3948061 0.198144 0.8431388
qlspline(x, 4)1 2.0398804 0.1802724 11.315548 0.0000000
qlspline(x, 4)2 1.2675186 0.1471270 8.615132 0.0000000
qlspline(x, 4)3 -4.5846478 0.1476810 -31.044273 0.0000000
qlspline(x, 4)4 -0.4965858 0.0572115 -8.679818 0.0000000

Graphically

fig +
  geom_smooth(method="lm", formula=formula(m4), se=FALSE, n=200)

Installation

Stable version from CRAN or development version from GitHub with

devtools::install_github("mbojan/lspline", build_vignettes=TRUE)

Acknowledgements

Inspired by Stata command mkspline and function ares::lspline from Junger & Ponce de Leon (2011). As such, the implementation follows Greene (2003), chapter 7.5.2.

  • Greene, William H. (2003) Econometric analysis. Pearson Education
  • Junger & Ponce de Leon (2011) ares: Environment air pollution epidemiology: a library for timeseries analysis. R package version 0.7.2 retrieved from CRAN archives.
To leave a comment for the author, please follow the link and comment on their blog: R – Brokering Closure.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload CAPTCHA.