13 comments on “Statistical Software Popularity on Google Scholar”

    • I remember when it was named PASW; still, everyone, in psychological research at least, called it SPSS. I would imagine that any researcher using PASW would have written it as PASW SPSS in their articles.

  1. As your Wikipedia article points out, the PASW name existed for only one year. When IBM bought them, they wisely reverted to the name everyone knew. When citing the use of PASW, people should have said that it was from SPSS, Inc., which should have us covered. It's certainly worth trying though. We'll add it to see if it adds anything. The new count-based graph should be up by 4/13/2012 on http://r4stats.com/popularity.

    • I should have replied to Laura O'Grady but started a separate thread by accident, so here I am replying to my own post. We looked at PASW excluding SPSS and got a small number of messy hits. They included Plant Available Soil Water and Pluent Abdominal Segment Width. So we left both graphs (here and in the popularity article) unchanged. It was definitely worth a try though.

      • I'm not a programmer, but I wonder if there is a way to write something that can exclude "Plant available soil water" etc., like:

        If PASW near 'Plant available soil water' then skip.

        Probably not worth the effort other than as an intellectual exercise, since it has already been pointed out that the likely use of PASW is limited.

        I've had research articles sent back from an editor because I didn't state the version number of SPSS I was using (in case a bug is revealed later, which could call my analysis into question).

        • I don't think Google Scholar has a "near" function as some software does, but it uses the minus sign to exclude things. That would work fine in this case, but we learned that you have to be careful not to make Google Scholar queries too complex. The logic seems to fall apart eventually. You can test this by adding a very large number of "or" conditions. The hit counts should never decrease as you add them, but eventually they can. That may have only happened when there were also some "and" conditions, so the test may not be as easy as that. We were totally surprised, though, that the logic ever failed, given the popularity of Google.

          In this case the number of additional hits was tiny though.
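
The minus-sign exclusion and the "or" test described in this reply can be sketched roughly as follows. This is only an illustration in Python, not the script used for the article: the helper names `build_query` and `first_decrease` are hypothetical, quoting each excluded phrase is an assumption about how the exclusion would be written, and the hit counts are assumed to be read off the results page by hand rather than fetched by the code.

```python
def build_query(term, exclude_phrases=()):
    """Return a search string for `term` with each phrase excluded.

    Assumption: a leading minus sign followed by a quoted phrase
    excludes results that contain that exact phrase.
    """
    parts = [term]
    parts.extend(f'-"{phrase}"' for phrase in exclude_phrases)
    return " ".join(parts)


def first_decrease(hit_counts):
    """Return the index of the first count lower than the one before it.

    `hit_counts` holds the hit counts noted by hand after each extra
    "or" condition was added. Adding "or" conditions should never
    lower the count, so any index returned here marks a query where
    the logic appears to have broken down. Returns None otherwise.
    """
    for i in range(1, len(hit_counts)):
        if hit_counts[i] < hit_counts[i - 1]:
            return i
    return None


if __name__ == "__main__":
    # The two stray expansions of PASW mentioned earlier in this thread.
    print(build_query(
        "PASW",
        ["Plant Available Soil Water", "Pluent Abdominal Segment Width"],
    ))
```

Keeping the counting step manual avoids assuming anything about how Google Scholar can be queried automatically; `first_decrease` only checks the numbers it is given.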

  2. I love your "market share" chart - I prefer that as a way of representing the data to the overlaid time series chart. I wondered why SAS wasn't represented though, until I realised the "JMP" segment represents both SAS and JMP. I'm surprised at the early dominance of Systat though -- and do you know why it doesn't appear in the chart at r4stats.com/popularity? Anyway, thanks for providing this background information on how the data were collected.

  3. I think there is an erratum in your bash script: it should be BMDP instead of BDMP. BDMP stands for 2,6-dimethylphenyl or other chemicals.

    • Nice catch, Julio! Since we only plotted the top 6 packages, then dropped SAS and SPSS to plot the next 6, BMDP didn't show up. In 2011 it's in last place with only 554 articles.

  4. Wow! I am so happy to have gotten the Excel file containing the raw data. I am a statistics scholar in my finals, embarking on a project on the topic "comparative analysis of the use of statistical packages over the years". I have been working so hard towards getting this data. I am so grateful to whoever made it possible for me this day, 20 March 2015. I also ask for your suggestions and assistance towards making my project work a success. Email me at bartholomewdesmond@gmail.com; I am also on Facebook as Desmond Decency. Your contributions and directions on how to go about this project will be highly appreciated. God bless.
