18 comments on “How Much of R is Written in R?”

  1. This is great. That 22% should be rewritten in a high-level language with 0-indexed arrays, sensible data structures, and normal scope rules. This wouldn't look much like R (it would just be a bunch of libraries in a high-level language), but it would drastically increase the uptake of the R libraries by users (the lousiness and novelty of the S language are the biggest barrier). Anyone interested in joining me?

  2. Why not use R to do this?
    That is, use R to tabulate the file types in R's source code.

    e.g.
    rUrl <- "http://cran.r-project.org/src/base/R-2/R-2.13.1.tar.gz"
    (temp <- tempfile(fileext = ".tar.gz"))
    # may take a little while to grab a 20mb file.
    system.time(download.file(rUrl, temp, mode = "wb")) # binary mode keeps the archive intact on Windows
    str(file.info(temp)) # 21mb file
    # extract only filenames from a compressed tarred file (on windows)
    system.time(filePaths <- untar(tarfile = temp,
    files = NULL, list = TRUE, compressed = NA,
    verbose = FALSE, tar = Sys.getenv("TAR")))
    str(filePaths)
    head(filePaths, 2e1L)
    # focus on file extensions
    fileNames <- basename(filePaths)
    ext <- sub(".+(\\.[A-Za-z]+$)", "\\1", fileNames)
    # top 50 file types
    head(numTypes <- sort(table(ext[grep(".", ext, fixed = TRUE)]), decreasing = TRUE), 50)
    # .R, .c, .f only
    rcf <- unlist(strsplit("rcf", ""))
    (types <- paste(".", c(rcf, toupper(rcf)), sep = ""))
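    (ans <- numTypes[names(numTypes) %in% types]) # counts for just those extensions (assumed form of this line)

    > rUrl <- "http://cran.r-project.org/src/base/R-2/R-2.13.1.tar.gz"
    > (temp <- tempfile(fileext = ".tar.gz"))
    ....
    > # may take a little while to grab a 20mb file.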
    > system.time(download.file(rUrl, temp))
    trying URL 'http://cran.r-project.org/src/base/R-2/R-2.13.1.tar.gz'
    Content type 'application/x-gzip' length 22063747 bytes (21.0 Mb)
    opened URL
    downloaded 21.0 Mb

       user  system elapsed
       0.19    1.21   31.24
    > str(file.info(temp)) # 21mb file
    'data.frame': 1 obs. of 7 variables:
    $ size : num 22063747

    ....

    > round(ans / sum(ans) * 1e2L, 0)

    .R .c .f
    55 42  3
    > # remove 20mb source code.
    > file.remove(temp)
    [1] TRUE

    • Cool approach, but unless I misread, this doesn't get the lines of code, which I think is also an important measure. I'm sure it's possible to get that in R (without using system() to call wc), but I don't know how.

      Generally, my feeling is that R is very good at what R does, but that it's not really all that well suited for shell tasks. R can be used as a general scripting language, but it's nowhere near my top choice for that.

      Plus, I just love sed's regex syntax. I honestly think it's adorable. Thanks for the cool idea, though!

    • Wow! This is really cool. Your graphs are especially beautiful.

      I was actually working on something that would grab all the historical data, and was basically done with it except for fixing some weird problems that occur when you run it more than once (to update it without having to re-do everything that's already done). I'll probably still finish it up and post it since I'm nearly done, but I have to say that you guys put me to shame!

  3. Pingback: How Much of R is Written in R Part 2: Contributed Packages « librestats

  4. Pingback: Ma quanto R è scritto in R? (“But how much of R is written in R?”) « Chemiomet[R]ia

  5. Pingback: Introduction to programming in C/C++ | NerdaHolyC
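One reply to comment 2 asks how to get lines of code, not just file counts, in R without calling wc through system(). A minimal sketch, assuming the source tarball has already been unpacked into a directory (for example with untar(temp, exdir = someDir)); count_lines_by_ext() is a hypothetical helper, not part of base R, and it counts lines with readLines() rather than wc:

```r
# Hypothetical helper (not part of base R): total number of lines per
# file extension for every file under `dir`, read with readLines().
count_lines_by_ext <- function(dir, exts = c("R", "r", "c", "C", "f", "F")) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  ext <- tools::file_ext(files)   # extension without the leading dot
  keep <- ext %in% exts           # case-sensitive match, so list both cases
  n_lines <- vapply(files[keep],
                    function(f) length(readLines(f, warn = FALSE)),
                    integer(1))
  tapply(n_lines, ext[keep], sum) # sum line counts within each extension
}

# Usage on a small throwaway directory standing in for the unpacked sources.
demo <- file.path(tempdir(), "loc-demo")
dir.create(file.path(demo, "src"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("x <- 1", "y <- 2"), file.path(demo, "a.R"))
writeLines(c("int main(void) {", "  return 0;", "}"), file.path(demo, "src", "b.c"))
writeLines("plain text, not counted", file.path(demo, "README"))
loc <- count_lines_by_ext(demo)
print(loc)
```

Pointed at the directory the tarball was extracted into, it should give per-extension totals close to what wc -l reports. One known difference: wc -l counts newline characters, so a final line with no trailing newline is skipped by wc but counted by readLines().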
