My deal is HFT, so what I care about is:
1. reading/loading data from a file or DB into memory quickly
2. performing very efficient data-munging operations (group, transform)
3. visualizing the data easily
I think it is pretty clear that 3. goes to R: the graphics package, ggplot2 and others allow you to plot anything from scratch with little effort.
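As a rough illustration of the "little effort" point, here is a minimal ggplot2 sketch. The data are simulated random walks and the symbols `AAA` and `BBB` are made up for the example; nothing here comes from real market data.

```r
library(ggplot2)

set.seed(1)
# Simulated random-walk mid prices for two hypothetical symbols
ticks <- data.frame(
  t      = rep(1:500, times = 2),
  symbol = rep(c("AAA", "BBB"), each = 500),
  mid    = c(100 + cumsum(rnorm(500, sd = 0.05)),
             50  + cumsum(rnorm(500, sd = 0.03)))
)

# One line per symbol, one panel per symbol, each with its own y scale
p <- ggplot(ticks, aes(x = t, y = mid, colour = symbol)) +
  geom_line() +
  facet_wrap(~ symbol, scales = "free_y") +
  labs(x = "tick", y = "mid price")

print(p)
```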
About 1. and 2., I am amazed, reading the previous posts, to see that people are advocating Python based on pandas and that no one cites data.table.
data.table is a fantastic package that allows blazing fast grouping/transforming of tables with tens of millions of rows. From this bench you can see that data.table is several times faster than pandas and much more stable (pandas tends to crash on massive tables).
R) library(data.table)
R) DT = data.table(x=rnorm(2e7), y=rnorm(2e7), z=sample(letters, 2e7, replace=TRUE))
R) tables()
     NAME       NROW NCOL  MB COLS  KEY
[1,] DT   20,000,000    3 458 x,y,z
   user  system elapsed
  0.226   0.037   0.264
   user  system elapsed
  0.118   0.022   0.140
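To give an idea of the kind of grouping/transforming I mean, here is a minimal sketch on a smaller table. The specific aggregation (sum of `x`, mean of `y` per `z`), the use of `setkey`, and the column name `x_demeaned` are my choices for illustration, not necessarily the calls behind the timings above.

```r
library(data.table)

set.seed(42)
n <- 1e6  # smaller than the 2e7 rows above so it runs quickly on any machine
DT <- data.table(x = rnorm(n), y = rnorm(n),
                 z = sample(letters, n, replace = TRUE))

# Group: one output row per distinct value of z
agg <- DT[, .(sum_x = sum(x), mean_y = mean(y), n = .N), by = z]

# Setting a key sorts DT by z in place; later grouping and joins on z can use it
setkey(DT, z)
agg_keyed <- DT[, .(sum_x = sum(x), mean_y = mean(y), n = .N), by = z]

# Transform: add a per-group demeaned column by reference (no copy of DT)
DT[, x_demeaned := x - mean(x), by = z]
```

Wrapping either grouped call in `system.time()` on your own data is the simplest way to check the speed claim for your workload.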
Then there is speed: as I work in HFT, neither R nor Python can be used in production. But the Rcpp package allows you to write efficient C++ code and integrate it into R trivially (literally adding 2 lines). I doubt R is fading, given the number of new packages created every day and the momentum the language has...
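For what it is worth, here is a small sketch of what the Rcpp integration can look like, using `cppFunction` to compile a C++ function inline. The `vwap` function and the sample prices are hypothetical, chosen only to show the mechanics, and a working C++ toolchain is assumed.

```r
library(Rcpp)

# Compile a C++ function and expose it to R in one call.
# `vwap` is a made-up example; it assumes both vectors have the same length.
cppFunction('
double vwap(NumericVector price, NumericVector volume) {
  int n = price.size();
  double pv = 0.0, v = 0.0;
  for (int i = 0; i < n; ++i) {
    pv += price[i] * volume[i];  // accumulate price * volume
    v  += volume[i];             // accumulate total volume
  }
  return pv / v;
}')

price  <- c(100.0, 100.5, 101.0)
volume <- c(200, 100, 100)
res <- vwap(price, volume)  # same value as weighted.mean(price, volume)
```

Once compiled, `vwap` is called like any other R function, so the C++ part stays invisible to the rest of the analysis code.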
A few years later, I am amazed by how the R ecosystem has evolved. For in-memory computation you get unmatched tools, from fst for blazing fast binary read/write to fork or cluster parallelism in one-liners. C++ integration is incredibly easy with Rcpp. You get interactive graphics with classics like plotly, and crazy features like ggplotly (it just makes your ggplot2 plots interactive).
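A quick sketch of the fst and parallelism points, assuming the fst package is installed; the data frame, the chunking into 4 pieces and the worker counts are arbitrary choices for the example.

```r
library(fst)
library(parallel)

set.seed(7)
df <- data.frame(id = 1:1e5, px = rnorm(1e5))

# Binary write/read with fst
path <- tempfile(fileext = ".fst")
write_fst(df, path)
df2 <- read_fst(path)

# fst can also read a subset of columns and rows without loading the whole file
head_px <- read_fst(path, columns = "px", from = 1, to = 1000)

# Split the prices into 4 chunks to process in parallel
chunks <- split(df2$px, rep(1:4, length.out = nrow(df2)))

# Fork-based parallelism in one line (forking is unavailable on Windows, so fall back to 1 core there)
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
means_fork <- mclapply(chunks, mean, mc.cores = n_cores)

# Cluster-based parallelism, which works on all platforms
cl <- makeCluster(2)
means_cluster <- parLapply(cl, chunks, mean)
stopCluster(cl)
```

Fork is the lighter option on Linux/macOS because workers share the parent's memory; the cluster variant starts fresh R processes, so larger objects have to be shipped to them.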
Having tried Python with pandas, I honestly do not understand how there could even be a match. The syntax is clunky and the performance is poor; I must be too used to R, I guess.
Another thing that is really missing in Python is literate programming: nothing comes close to rmarkdown (the best I could find in Python was Jupyter, but that does not even come close).
With all the fuss surrounding the R vs Python language war, I realize that the vast majority of people are simply uninformed: they do not know what data.table is, or that it has nothing to do with a data.frame; they do not know that R fully supports tensorflow and keras...
To conclude, I think both tools can do everything, and it seems that the Python language has very good PR...