## 7 comments on “Some Quirks of the R Language”

1. Re: timing of x <- x+0

It's not an allocation thing (exactly) -- rather, x is originally a vector of ints (as returned by `:`), and `x <- x + 0` entails coercion to double (because 0 is a double, whereas 0L is the integer literal for zero). On repeated runs x is already a double, so that step gets skipped (see storage.mode() before and after). If you use `x <- x + 0L`, you should get roughly consistent timings at each step.
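
A minimal sketch of the storage.mode() check suggested above; the variable names (`mode_before`, `mode_after`, `mode_int`) are made up for illustration:

```r
x <- 1:10
mode_before <- storage.mode(x)   # ":" returns an integer vector

x <- x + 0                       # 0 is a double, so x is coerced to double
mode_after <- storage.mode(x)

y <- 1:10 + 0L                   # 0L is an integer, so no coercion happens
mode_int <- storage.mode(y)

print(c(before = mode_before, after = mode_after, with_0L = mode_int))
```

Once x is a double, further `x <- x + 0` calls have nothing left to coerce, which would be consistent with only the first call being slower.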

• It's a great book that I've recommended on the site several times. Every beginner with R should read it.

2. The as.character(list(x)) example is funny, but the reason is not what you think: it comes down to coercion from "integer" to "numeric". Compare:

```r
is(1:10)
is(1:10 + 0)
deparse(1:10)
deparse(1:10 + 0)
deparse(1:10 + 0L)
```
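
To tie this back to as.character(list(x)), here is a small sketch, assuming the quirk in question is that as.character() on a list converts each element to a deparse-like string. The names `as_int` and `as_dbl` are illustrative only:

```r
x <- 1:10

# Each list element is turned into a string resembling its deparsed form
as_int <- as.character(list(x))       # element is an integer sequence
as_dbl <- as.character(list(x + 0))   # same values, but stored as double

print(as_int)
print(as_dbl)

# Same numbers, different storage mode, different string
print(identical(as_int, as_dbl))
```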

3. I have never figured out what's strange or wrong about Ihaka's example. This is how lexical scoping works. Inside the function, the outer x is shadowed -- in this case only if the predicate evaluates to TRUE. When x is returned, R first looks for an x in the function's own environment, then, if it can't be found there, looks in the enclosing environment (the one in which the function was defined). Unless I am missing something about this particular example, this is standard behavior for lexically scoped languages. Just because there is an artificial random component added into the mix doesn't mean scoping is a problem.
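
For readers who don't have the example in front of them, the following is a reconstruction from the description in these comments, not a verbatim copy of Ihaka's code. The string values and the name `results` are assumptions added for illustration:

```r
x <- "global"

f <- function() {
  # x becomes a local variable only when the random draw exceeds 0.5
  if (runif(1) > 0.5) x <- "local"
  x
}

set.seed(1)                        # only to make the run repeatable
results <- replicate(1000, f())
print(table(results))              # both values show up
```

The same symbol `x` on the last line of `f` refers to the local binding on some calls and to the outer binding on others.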

4. Ihaka's example illustrates that R's scoping is not "lexical" but rather dynamic with deep binding.

"Lexical" derives from "lexis," referring to written words. In a lexically scoped language, variable bindings should derive strictly from the text of the code. That is, you should be able to resolve the scope of an identifier directly by scanning the source code -- not by simulating the execution of the code. Ihaka's example shows that this is false of R: in order to resolve the location of "x" you must simulate the call to runif.

This requirement, that scoping can be determined without executing, is important because in a lexically scoped language, name resolution can be done ahead of time -- during the compilation phase, each symbol appearing in the code is replaced by a pointer offset and an integer. This is described in the original Scheme paper (http://dspace.mit.edu/handle/1721.1/5794).
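
By way of contrast, here is a deterministic sketch of why that ahead-of-time resolution is hard in R. The function `where_is_x` and its argument `u` are hypothetical; `u` stands in for the runif draw so the outcome can be controlled:

```r
x <- 0

where_is_x <- function(u) {
  if (u > 0.5) x <- 1
  list(
    # Is there a binding for x in this call's own frame?
    local = exists("x", envir = environment(), inherits = FALSE),
    # Whatever x resolves to at this point
    value = x
  )
}

hi <- where_is_x(0.9)   # the assignment ran, so x lives in the call frame
lo <- where_is_x(0.1)   # no assignment, so x is found in an outer environment
print(c(hi$local, lo$local))
print(c(hi$value, lo$value))
```

Which frame holds `x` depends on the argument value at run time, so the text of the function alone does not settle it.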

R, in contrast, has no ahead-of-time information on where variables are to be found, so it is stuck with explicitly scanning through linked lists of environments every time it wants to retrieve the value of "x". This is why R spends a good percentage of its execution time just looking up the values of variables, even in byte-compiled code. At the implementation level it is not like the lexical scoping of Scheme, but rather the less efficient "deep binding" construct of Lisp 1.5 (as described in the paper "The Function of FUNCTION in LISP", http://dspace.mit.edu/handle/1721.1/5854).
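
The environment scan described above can be imitated at the R level. This is only a conceptual sketch, not R's actual C implementation; `find_var`, `e1`, `e2` and `e3` are hypothetical names:

```r
# Walk the chain of enclosing environments by hand, counting the hops
find_var <- function(name, env) {
  steps <- 0L
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(list(value = get(name, envir = env, inherits = FALSE),
                  steps = steps))
    }
    env <- parent.env(env)   # move one level outward
    steps <- steps + 1L
  }
  stop("object '", name, "' not found")
}

# A three-level chain: e3 -> e2 -> e1 -> empty environment
e1 <- new.env(parent = emptyenv())
e2 <- new.env(parent = e1)
e3 <- new.env(parent = e2)
assign("x", 42, envir = e1)

res <- find_var("x", e3)
print(res$value)   # 42
print(res$steps)   # 2 hops outward before x was found
```

Each lookup repeats this walk from the innermost environment, which is the cost the comment is pointing at.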