4 Notes of C++ and R
The following script aims to review the function of Rcpp
. As a sociologist, I always struggle to understand some underlying processes, and I am curious about digging a bit more into this issue!
WARNING! All my notes are for macOS users.
Pending,
- Deconstruct some
C++
codes for clarity - Re-write some notes and expand them
- Add some
Rcpp
codes for Social Network Analysis (maybe new script) - Create a new script for
RcppArmadillo
4.1 C++ and Rcpp
C++
is a compiled and statistically typed language.
According to Wikipedia?
C++ is a general-purpose programming language created by Bjarne Stroustrup as an extension of the C programming language, or “C with Classes”.
Why Rcpp
?
Sometimes R code just isn’t fast enough. You’ve used profiling to figure out where your bottlenecks are, and you’ve done everything you can in R, but your code still isn’t fast enough […] learn how to improve performance by rewriting key functions in C++. This magic comes by way of the Rcpp
package (Eddelbuettel and François 2011) (with key contributions by Doug Bates, John Chambers, and JJ Allaire).
From here I would assume that you have some previous background in the C++
programming1.
4.3 Setup
Before using Rcpp
I would need to install Xcode
in the app store. For Windows, Rtool
have to be installed, and in Linux use a code such as sudo apt-get install r-base-dev
.
4.4 Rcpp Introduction
We would now create a C++
function- However, the most important element in the following code to connect R
with C++
is this code // [[Rcpp::export]]
, which allow exporting the cumSum
function that is created in the following chunk into R
.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cumSum(NumericVector x) {int n = x.size();
NumericVector out(n);0] = x[0];
out[for (int i = 1; i < n; ++i) {
/*
++i: incrementing (adding 1 to a variable)
--i: decrementing (subtracting 1 from a variable)
i++: increment after the variable's value is used in the remainder of the expression
i--: decrement occurs after the variable's value is used in the remainder of the expression
*/
1] + x[i];
out[i] = out[i-
}return out;
}
Same code in R
:
<- function(x) {
CumSum <- numeric(length(x))
output 1] <- x[1]
output[for (i in 2:length(x)) {
<- output[i-1] + x[i]
output[i]
}return(output)
}
This is the process that we are calculating with the function in the first iteration \((0-1)+2\), then \((1-1)+3\), and \((3-1)+4....\). Recall that C++
start from \(0\) rather than \(1\) (in C++, vectors indices start at 0). Now, we have three similar functions. One from base R
, the other is a C++
function (called: cumSum
), and the loop CumSum
created in R
. They should give the same results:
<- 1:10
x cumsum(x)
cumSum(x)
CumSum(x)
#> [1] 1 3 6 10 15 21 28 36 45 55
#> [1] 1 3 6 10 15 21 28 36 45 55
#> [1] 1 3 6 10 15 21 28 36 45 55
Now, we can compare the performance of each function using microbenchmark
. Rcpp
should be faster than the R
functions:
library(magrittr)
library(microbenchmark)
<- 1:1000
x microbenchmark(
native = cumsum(x),
loop = CumSum(x),
Rcpp = cumSum(x)
%>% summary(unit = "ms") %>% knitr::kable(format = "markdown") )
expr | min | lq | mean | median | uq | max | neval | cld |
---|---|---|---|---|---|---|---|---|
native | 0.005017 | 0.0053145 | 0.0058378 | 0.0055340 | 0.0063890 | 0.009001 | 100 | a |
loop | 0.089442 | 0.0916370 | 0.0952121 | 0.0949095 | 0.0974105 | 0.115373 | 100 | b |
Rcpp | 0.004407 | 0.0051995 | 0.0212336 | 0.0063640 | 0.0103825 | 1.323327 | 100 | a |
The speed and ability to cheaply use resources are two of the main features of C++
. Due that is a compiler and not an interpreter is one of the reasons why is faster than R
.
Using Rcpp
we can use C++
codes directley into R
(evalCpp
)
library(Rcpp)
# evaluate simple expression as C++
::evalCpp("3 + 2")
Rcpp#> [1] 5
set.seed(42)
::evalCpp("Rcpp::rnorm(2)")
Rcpp#> [1] 1.3709584 -0.5646982
# maximum numeric limits of my computer in terms of double
::evalCpp("std::numeric_limits<double>::max()")
Rcpp#> [1] 1.797693e+308
Also, we can add functions from C++
(cppFunction
) into the R
environment.
library(Rcpp)
## Example 1
# create a C++ function to be used in R
cppFunction('int add(int x, int y, int z) {
int sum = x + y + z;
return sum;
}')
# R function with same name as C++ function
add#> function (x, y, z)
#> .Call(<pointer: 0x10eac18f0>, x, y, z)
add(1, 2, 3)
#> [1] 6
## Example 2
# create a C++ function to be used in R
cppFunction("
int exampleCpp11(){
auto x = 10; // guesses type
return x;
}", plugins=c("cpp11")) ## C++11 is a version of the standard for the programming language C++
# R function with same name as C++ function
exampleCpp11()
#> [1] 10
## Other examples
cppFunction("
int doubleMe(int a){return a+a;}
")
doubleMe#> function (a)
#> .Call(<pointer: 0x112b89910>, a)
<- 3
a doubleMe(a)
#> [1] 6
cppFunction("
int f(int a, int b){
return(a+b);
}
")
f(21, 21)
#> [1] 42
f(21.0, 21)
#> [1] 42
f(21.5, 21.5) # !
#> [1] 42
#f(21.5, "hello, world") # error!
There are some issues with integer
and double
Some names that are used in R
:
- integer numbers:
int
- Floating point number:
double
## R
# literal numbers are double
<- 42
x storage.mode(x)
#> [1] "double"
# integers needs the L suffix
<- 42L
y storage.mode(y)
#> [1] "integer"
<- as.integer(42)
z storage.mode(z)
#> [1] "integer"
## C++
library(Rcpp)
# Literal integers are `int`
<- evalCpp("42")
x storage.mode(x)
#> [1] "integer"
# Suffix with .0 forces a double
<- evalCpp("42.0")
y storage.mode(y)
#> [1] "double"
# explicit casting with (double)
<- evalCpp("double(40+2)")
y storage.mode(y)
#> [1] "double"
# Beware of the integer division!
# integer division
evalCpp("13/4")
#> [1] 3
# explicit cast, and hence use of double division
evalCpp("(double)13/4")
#> [1] 3.25
The last function of the package Rcpp
is sourceCpp()
that is the source code in which other functions are built (e.g. evalCpp
and cppFunction
). This function builds on and extends cxxfunction()
. On the features is that can compile a number of functions and create a file from all of them. Also, have some plugins and dependency options (e.g. RcppArmadillo
, RcppEigen
, RcppSGL
). These plugins can also turn on support for C++11
, OpenMp
, and more.
We can also write some C++
codes directly in RMarkdown adding to the chunk the following line of codes
Now we will create an example, and create a function called timesTwo
and run the C++
code as an R
function from C++
. The most important element is the code // [[Rcpp::export]]
which export the following function into R
. Also, we would ask for Rcpp
to source
the function for us. Behind the scenes Rcpp
creates a wrapper, and then Rcpp
compiles, links, and loads the wrapper. Notice that std
is tacking the standard package of C++
to conduct a second function.
#include <Rcpp.h>
using namespace Rcpp;
// This is a simple example of exporting a C++ function to R. You can
// source this function into an R session using the Rcpp::sourceCpp
// function (or via the Source button on the editor toolbar). ...
// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x){return x * 2;
}
// You can include R code blocks in C++ files processed with source Cpp
// (useful for testing and development). The R code will be automatically
// fun after the compilation.
/*** R
timesTwo(42)
cat("hello!\n")
timesTwo(c(12,48,28))
*/
// [[Rcpp::export]]
double cubed(double x){
return std::pow(x, 3);
}
// [[Rcpp::export]]
NumericVector cubedVec(NumericVector x){return pow(x, 3);
}
Actually, if we check once again, we would have available the function into our R
environment.
timesTwo(42)
#> [1] 84
cubed(2)
#> [1] 8
cubedVec(3)
#> [1] 27
Now we would check the speed of a function from R
and another from C++
. We would try the Fibonacci sequence
# An R function
<- function(n){
f if(n < 2) return(n)
return(f(n-1) + f(n-2))
}sapply(0:10, f)
#> [1] 0 1 1 2 3 5 8 13 21 34 55
# A C++ function
::cppFunction('int g(int n){
Rcpp if (n < 2) return(n);
return(g(n-1) + g(n-2));
}
')
sapply(0:10, g)
#> [1] 0 1 1 2 3 5 8 13 21 34 55
# Check time! increase exponentially
library(rbenchmark)
benchmark(f(10), f(15), f(20))[,1:4]
#> test replications elapsed relative
#> 1 f(10) 100 0.007 1.000
#> 2 f(15) 100 0.074 10.571
#> 3 f(20) 100 0.738 105.429
# Now we would compare both functions, C++ perform incredible faster!
benchmark(f(20), g(20))[,1:4]
#> test replications elapsed relative
#> 1 f(20) 100 0.718 179.5
#> 2 g(20) 100 0.004 1.0
<- microbenchmark::microbenchmark(f(20), g(20))
res
res#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> f(20) 6483.526 6626.454 7084.06236 6819.7120 7263.3165 8991.043 100 b
#> g(20) 20.397 21.238 23.36513 22.7275 24.2655 39.573 100 a
suppressMessages(microbenchmark:::autoplot.microbenchmark(res))
4.5 D. Eddelbuettel tutorial
Some resources:
- rcpp
- Rcpp Gallery
- Blog
- LearnCpp
- cplusplus
- Seamless R and C++ Integration with Rcpp
- D. Eddelbuettel tutorial
- RStudio
In comparison with R
, C++
is a compiler and not interpreted. Therefore, we may need to supply
- header location via
-I
- library location via
-L
- library via
--llibraryname
Then, we would need to provide these items and compile them all together as,
g++ -I/usr/include -c qnorm_rmath.cpp
g++ -o qnorm_rmath qnorm_rmath.o -L/usr/lib -lRmath
Some notes of the differences between R
and C++
:
R
is dynamically typed:x <- 3.14; x <- "foo"
is valid.- In
C++
, each variable must be declared before first use. - Common types are
int
andlong
(possible withunsigned
),float
anddouble
,bool
, as well aschar
. - No standard string type, though
std::string
is close. - All these variables types are scalars which is fundamentally different from
R
where everything is a vector class
and (struct
) allow the creation of composite types; classes add behaviour to data to formobjects
- Variables need to be declared, cannot change
- Control structures similar to
R
:for
,while
,if
,switch
- Functions are similar too but note the difference in position-only matching, also same function name but different arguments allowed in
C++
- Pointers and memory management: very different but lots of issues people had with
C
can be avoided viaSTL
(which is somethingRcpp
promotes too) - Sometimes still useful to know that a pointer is…
Comparison between C++
: Scalar; and Rcpp
Vectors
int
: IntegerVectordouble
: NumericVectorchar[];std::string
: CharacterVectorbool
: LogicalVectorcomplex
: ComplexVector
#include <Rcpp.h>
// [[Rcpp::export]]
double getMax(Rcpp::NumericVector v){
int n = v.size(); // vectors are describing
double m = v[0]; // initialize
for (int i=0; i<n; i++){
if (v[i] > m){
"Now" << m << std::endl;
Rcpp::Rcout <<
m = v[i];
}
}return(m);
}
getMax(c(20, 201, 18))
#> Now20
#> [1] 201
#include <Rcpp.h>
//[[Rcpp::export]]
Rcpp::NumericVector colSums(Rcpp::NumericMatrix mat){size_t cols = mat.cols(); // How big is the matrix?
// knowing the size, we could create a vector
Rcpp::NumericVector res(cols); for (size_t i=0; i<cols; i++){ // Then operation from 0 from less of the columns
res[i] = sum(mat.column(i));
}return(res);
}
What we did?
NumericMatrix
andNumericVector
go-to types for matrix and vector operationso n floating point variables- We prefix with
Rcpp::
to make the namespace explicit - Accessor functions
.rows()
and.cols()
for dimensions - Result vector allocated based on number of columns column
- Function
column(i)
extracts a column, gets a vector, andsum()
operates on it - The last
sum()
was internally vectorised, no need to loop over all elements
Another example
#include <Rcpp.h>
//[[Rcpp::export]]
double getMax(Rcpp::NumericVector v){
int n = v.size(); // vectors are describing
double m = v[0];
for (int i=0; i<n; i++){
if (v[i] > m) {
"Now" << m << std::endl;
Rcpp::Rcout <<
m = v[i];
}
}return(m);
}
getMax(c(1,4,3,46))
#> Now1
#> Now4
#> [1] 46
struct Date {
unsigned int year;
unsigned int month;
unsigned int day
};
struct Person {
char firstname[20];
char lastname[20];
struct Date birthday;
unsigned long id;
};
class Date {
private:
unsigned int year;
unsigned int month;
unsigned int date;
public:
void setDate(int y, int m, int d);
int getDay();
int getMonth();
int getYear();
}
C++
has vectors as well: written as std::vector<T>
where T
denotes template meaning different types can be used to instantiate
::cppFunction("double getmax2(std::vector<double> v){
Rcpp int n = v.size(); // vectors are describing
double m = v[0]; // initialize
for(int i=0; i<n; i++){
if (v[i] > m){
m = v[i];
}
}
}")
getMax(c(4,5,2))
#> Now4
#> [1] 5
- STL vectors are widely used so
Rcpp
supports them - Very useful to access other
C++
code and libraries - One caveat:
Rcpp
vectors reuse R memory so no copies - STL vectors have different underpinning so copies
- But not a concern unless you have a) either HUGE data structures, b) or many many calls
::cppFunction("void setSecond(Rcpp::NumericVector v){
Rcpp v[1] = 42; // numeric vector (temporal copy! could not survey)
}")
<- c(1,2,3); setSecond(v); v # as expected
v #> [1] 1 42 3
<- c(1L, 2L, 3L); setSecond(v); v # different (vector of integers)
v #> [1] 1 2 3
#include <Rcpp.h>
// [[Rcpp::export]]
double getMax(Rcpp::NumericVector v){
return( max(v));
}
getMax(c(2,3,4,5))
#> [1] 5
For math operations, it should be used RcppArmadillo
!
Packages
- The standard unit of R code organization.
- Creating packages with Rcpp is easy
- Create an empty one via
Rcpp.package.skeleton()
- Rstudio has the File -> New Project -> Directory Choices -> Package Choices
- Create an empty one via
- The vignette
Rcpp-packages
has fuller details