Data Science: Is Python Better than R?


For years, researchers and developers have debated whether Python or R is the better language for data science and analytics. Data science has grown rapidly across a variety of industries, including biotech, finance and social media. Its importance is recognized not only by people working in those industries, but also by academic institutions, which are now beginning to offer data science degrees. As open source technologies rapidly displace traditional, closed-source commercial tools, Python and R have become extremely popular among data scientists and analysts.

“Data science job growth chart” — Indeed.com

A (Very) Short Introduction

Invented by Guido van Rossum, Python was first released in 1991. Python 2.0 was released in 2000, and Python 3.0 followed eight years later. Python 3 introduced some major syntax revisions and is not backward-compatible with Python 2. However, tools such as 2to3 automate the translation of Python 2 code to Python 3. Python 2 is currently scheduled for retirement in 2020.
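To make the incompatibility concrete, here is a minimal sketch of two of the best-known changes, written as Python 3 code. The helper names are hypothetical and exist only for illustration; the comments describe how the same expressions behaved under Python 2.

```python
# Two well-known Python 2 -> Python 3 changes, shown as Python 3 code.
# The helper names below are illustrative, not part of any library.

def true_division(a, b):
    # Python 3: "/" on two integers returns a float (7 / 2 is 3.5).
    # Python 2: the same expression floored the result to an integer (3).
    return a / b


def floor_division(a, b):
    # "//" returns the floored quotient in both Python 2 and Python 3.
    return a // b


# Python 3: print is a function and requires parentheses.
# Python 2 also accepted the statement form: print "hello"
print("7 / 2  =", true_division(7, 2))
print("7 // 2 =", floor_division(7, 2))
```

Code that relied on the old integer division or on the print statement is the kind of thing a translation tool has to rewrite when moving from Python 2 to Python 3.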

R was invented in 1995 by Ross Ihaka and Robert Gentleman. It began as an implementation of the S programming language, which was invented by John Chambers in 1976. The first stable version, 1.0.0, was released in 2000. R is currently maintained by the R Development Core Team, and the latest stable version is 3.5.1. Unlike Python, R has not undergone a major change that required syntax conversion.

Guido van Rossum (left), Ross Ihaka (middle), Robert Gentleman (right)

Both Python and R have large user communities and strong support. A 2017 survey by Stack Overflow found that almost 45% of data scientists used Python as their main programming language. R, on the other hand, was used by 11.2% of data scientists.

“Developer Survey Results 2017” — Stack Overflow

It is worth noting that Python, and the Jupyter Notebook in particular, has gained tremendous popularity in recent years. While Jupyter Notebook can be used with languages other than Python, it is mostly used to document and showcase Python programs in the browser, for example in data science competitions hosted on Kaggle. An analysis by Ben Frederickson shows that Jupyter Notebook's share of Monthly Active Users (MAU) on GitHub rose sharply after 2015.

“Ranking Programming Languages by GitHub Users” — Ben Frederickson

As Python has gained popularity in recent years, the percentage of GitHub MAU coding in R has declined slightly. Nevertheless, both languages remain incredibly popular among data scientists, engineers and analysts.

Finally, I highly recommend reading Karlijn Willems's Choosing R or Python for Data Analysis? An Infographic. It provides a great visual summary of what we've discussed so far.