I just updated RStudio to the latest version, v.0.94.92 (will this asymptotically approach 1, or actually get there?). It was nice to see how many improvements the development team has implemented, based, I'm sure, on community feedback. In my experience the team has been extraordinarily responsive to its users, and I'm sure that responsiveness has shaped the direction of development.
First and foremost, I was happy to see most of my wish list addressed in this version:
I've often selected columns or rows of a data frame using grep or which, based on some property. That is sound in itself; the trouble comes when you use the result of that grep or which call to remove rows or columns, e.g.,
dat <- dat[, -grep("\\.1", names(dat))]
which removes the columns with .1 in their names. This is fine the first time around, but if you forget and re-run the code, grep("\\.1", names(dat)) now matches nothing and returns integer(0), so dat[, -integer(0)] selects no columns at all and leaves you with an empty data frame.
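A small sketch of the pitfall and one way around it. The toy data frame and its column names are made up for illustration; the fix shown (logical indexing with grepl(), which is safe to re-run) is one common approach, not necessarily the one the original post settled on.

```r
# Toy data frame; the column names are hypothetical
dat <- data.frame(a = 1:3, a.1 = 4:6, b = 7:9, b.1 = 10:12)

# Negative integer indexing works the first time...
once <- dat[, -grep("\\.1", names(dat))]
names(once)  # "a" "b"

# ...but on a re-run grep() matches nothing and returns integer(0);
# -integer(0) is still integer(0), so no columns are selected at all
twice <- once[, -grep("\\.1", names(once))]
ncol(twice)  # 0

# Logical indexing with grepl() is idempotent: when nothing matches,
# every element of the index is TRUE and every column is kept
safe <- dat[, !grepl("\\.1", names(dat)), drop = FALSE]
safe_again <- safe[, !grepl("\\.1", names(safe)), drop = FALSE]
identical(safe, safe_again)  # TRUE
```

The drop = FALSE argument keeps the result a data frame even if only one column survives.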
One frustrating problem in SAS (which I need for PROC MIXED in some analyses) is recoding categorical variables to use a particular reference category. In R, my usual tool, the reference level is easy both to set and to change with the relevel function, which ships with R in the stats package. My understanding is that this is actually easy in SAS for GLM, PHREG and some others, but not in PROC MIXED.
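A minimal sketch of relevel() in action. The treatment factor and response values are invented for illustration; the point is that moving a level to the front changes which category the model coefficients are compared against under R's default treatment contrasts.

```r
# Hypothetical treatment factor; levels sort alphabetically by default,
# so "high" would otherwise be the reference category
trt <- factor(c("placebo", "low", "high", "placebo", "low", "high"))
levels(trt)  # "high" "low" "placebo"

# relevel() moves the chosen level to the front, making it the reference
trt <- relevel(trt, ref = "placebo")
levels(trt)  # "placebo" "high" "low"

# With default treatment contrasts, the non-intercept coefficients are
# now differences from placebo
y <- c(1.0, 2.1, 3.2, 0.9, 2.0, 3.1)  # made-up response values
names(coef(lm(y ~ trt)))  # "(Intercept)" "trthigh" "trtlow"
```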
We often see, in publications, a Kaplan-Meier survival plot, with a table of the number of subjects at risk at different time points aligned below the figure. I needed this type of plot (or really, matrices of such plots) for an upcoming publication. Of course, my preferred toolbox was R and the ggplot2 package.
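The at-risk table under such a plot is just the number of subjects still under observation at a few chosen times. A sketch of computing those counts with the survival package (a recommended package shipped with R), using its bundled lung data and arbitrarily chosen time points; this is not the post's own code, only the data step that a ggplot2 table panel would sit on.

```r
library(survival)  # provides survfit(), Surv() and the lung data

# Kaplan-Meier fit by sex on the lung cancer data bundled with survival
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Number at risk per stratum at chosen times (days, chosen for illustration);
# extend = TRUE keeps rows for times beyond a stratum's last observation
s <- summary(fit, times = c(0, 250, 500, 750), extend = TRUE)
risk_table <- data.frame(strata = s$strata, time = s$time, n.risk = s$n.risk)
risk_table
```

These counts can then be drawn as text (for example with geom_text() in ggplot2) in a separate panel that shares the survival curve's time axis.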
Others have attempted this type of plot in ggplot2, notably Gary Collins and an anonymous author on the ggplot2 mailing list.
As most followers of R-bloggers.com and the #rstats hashtag on Twitter know by now, RStudio is a new open-source IDE for R that was beta-released yesterday. I have started putting it through its paces within my R workflow, and my impressions are more than favorable. I also tried it out on my home Linux server in server mode.
RStudio is obviously designed by people who actually use R and code in R for their data analyses.
Last night at the DC R Users meetup, which was our largest meetup to date, I gave an introductory presentation on data munging, and spent a bit of time on the split-apply-combine paradigm that I use almost daily in my work. I talked mainly about the packages plyr and doBy, which I use a lot now. David Smith posted a link on the Revolution blog to this article by Steve Miller, talking about the virtues of the data.
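The split-apply-combine idea can be spelled out in base R without plyr or doBy, which makes each step visible. A sketch using the built-in mtcars data (chosen only for illustration), summarising fuel economy by number of cylinders:

```r
# Split: one data frame per number of cylinders
pieces <- split(mtcars, mtcars$cyl)

# Apply: summarise each piece independently
summaries <- lapply(pieces, function(d) {
  data.frame(cyl = d$cyl[1], n = nrow(d), mean_mpg = mean(d$mpg))
})

# Combine: stack the per-group summaries into one data frame
result <- do.call(rbind, summaries)
result
```

In plyr, a call along the lines of ddply(mtcars, "cyl", summarise, n = length(mpg), mean_mpg = mean(mpg)) wraps these three steps into one.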
I've been working on a long-term (25+ year) longitudinal study of rheumatoid arthritis with my boss. He just walked in and asked if I could create a plot showing the trajectory of pain scores over time for each subject, separated by educational level (4 groups). Having worked with ggplot2 for a while now, and having learned more at the last two DC useR meetups, I realized that I could formulate this in ggplot2 quickly and easily.
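A sketch of this kind of plot in ggplot2: one line per subject, one panel per educational group. The study data are not shown, so the data frame below is simulated, and the column names (id, year, edu, pain) and education labels are hypothetical.

```r
library(ggplot2)

# Simulated stand-in for the study data: 40 hypothetical subjects,
# each measured at 6 yearly visits, in one of 4 education groups
set.seed(1)
n_subj <- 40
visits <- 0:5
edu_levels <- c("< High school", "High school", "Some college", "College+")
pain_df <- data.frame(
  id   = rep(seq_len(n_subj), each = length(visits)),
  year = rep(visits, times = n_subj),
  edu  = factor(rep(sample(edu_levels, n_subj, replace = TRUE),
                    each = length(visits)), levels = edu_levels)
)
pain_df$pain <- 50 + 2 * pain_df$year + rnorm(nrow(pain_df), sd = 8)

# group = id draws one trajectory per subject; facet_wrap splits by education
p <- ggplot(pain_df, aes(x = year, y = pain, group = id)) +
  geom_line(alpha = 0.4) +
  facet_wrap(~ edu) +
  labs(x = "Years since baseline", y = "Pain score")
p
```

The group aesthetic is what turns a scatter of points into per-subject trajectories; without it, geom_line() would join all points in a panel into a single line.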
Forest plots are most commonly used in reporting meta-analyses, but can be profitably used to summarise the results of a fitted model. They essentially display the estimates for model parameters and their corresponding confidence intervals.
Matt Shotwell just posted a message to the R-help mailing list with his lattice-based solution to the problem of creating forest plots in R. I just figured out how to create a forest plot for a consulting report using ggplot2.
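A sketch of a forest plot for a fitted model in ggplot2, assuming only that the model has coef() and confint() methods. The linear model on the built-in mtcars data is an arbitrary example, not the model from the consulting report.

```r
library(ggplot2)

# Any model with coef() and confint() methods would do; this one is illustrative
fit <- lm(mpg ~ wt + hp + am, data = mtcars)

# Collect point estimates and 95% confidence intervals, dropping the intercept
ci <- confint(fit)
forest_df <- data.frame(
  term     = names(coef(fit)),
  estimate = unname(coef(fit)),
  lower    = unname(ci[, 1]),
  upper    = unname(ci[, 2])
)
forest_df <- forest_df[forest_df$term != "(Intercept)", ]

# One point with its interval per parameter; coord_flip() lays the
# intervals out horizontally and the dashed line marks zero effect
p <- ggplot(forest_df, aes(x = term, y = estimate, ymin = lower, ymax = upper)) +
  geom_pointrange() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  coord_flip() +
  labs(x = NULL, y = "Estimate (95% CI)")
p
```

Putting the terms on the x axis and flipping the coordinates works across ggplot2 versions; newer releases can also draw horizontal ranges directly with xmin and xmax.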
The useR! 2010 R users conference just finished up this afternoon with a thought-provoking, controversial, and sometimes hilarious talk by Richard Stallman of GNU fame. It started on Tuesday with great tutorials (I took the ones on MICE for multiple imputation and Frank Harrell's excellent regression modeling). Between these bookends was a wonderful conference where I got the chance to put faces to names I knew from their online presence, make many new friends (and, hopefully, no enemies), and learn quite a bit.