Open Source

R is open source software created over 20 years ago by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. Its history is even longer, however, as its lineage goes back to the S programming language created by John Chambers at Bell Labs in the 1970s.1 R is actually a combination of S with lexical scoping semantics inspired by Scheme.2 Whereas the resulting language is very similar in appearance to S, the underlying implementation and semantics are derived from Scheme. Unbeknownst to many, the S language has been a popular vehicle for research in statistical methodology, and R provides an open source route to participate in that activity.

Although the history of S and R is interesting3, the principal point to note is that R is open source software. Although some contend that open source software is merely a “craze,”4 most evidence suggests that open source is here to stay and represents a new5 norm for programming languages. Open source software such as R blurs the distinction between developer and user, which gives you the ability to extend and modify the analytic functionality to fit your, or your organization’s, needs. The data analysis process is rarely restricted to just a handful of tasks with predictable inputs and outputs that can be pre-defined by a fixed user interface, as is common in proprietary software. Rather, as previously mentioned in the introduction, data analysis involves unique, varied, and often multiple requirements regarding the specific tasks involved. Open source software gives you, the data analyst, more flexibility to manage how data are being transformed, manipulated, and modeled “under the hood” of the software rather than relying on “stiff” point-and-click interfaces. Open source also allows you to operate on every major platform rather than being restricted to what your personal budget allows or to the idiosyncratic purchases of organizations.

This flexibility invariably leads to new expectations for data analysts. However, organizations are proving to greatly value the increased technical abilities of open source data analysts, as evidenced by a recent O’Reilly survey revealing that data analysts focusing on open source technologies make more money than those still dealing in proprietary technologies.


  1. Consequently, R is named partly after the first names of its authors (Ross and Robert) and partly as a play on the name of S.

  2. Morandat, Floréal; Hill, Brandon; Osvald, Leo; Vitek, Jan (2012). Evaluating the design of the R language: objects and functions for data analysis. ECOOP’12: Proceedings of the 26th European Conference on Object-Oriented Programming.

  3. See Roger Peng’s R Programming for Data Science for further, yet concise, details on the history of S and R.

  4. This was recently argued by Pollack et al. and appropriately rebutted by Boehmke & Jackson. See my post, which provides both articles.

  5. Open source is far from new, as it has been around for decades (e.g., A-2 in the 1950s, IBM’s ACP in the ’60s, Tiny BASIC in the ’70s), but it has gained prominence since the late 1990s.