What Does a Data Scientist Do?

BoscoR is a solution for scientists using the open-source R software who want to expand their computing resources beyond a desktop or a single cluster while keeping the best possible user experience. Today, many scientists do not know how to move to a cluster easily and then work there as comfortably as they do on a laptop.
Q: Do I need to be a computer genius to set up a cloud cluster?
A: If you want a whole cluster of machines to parallelize your jobs, you will probably need a little more technical expertise than for a single machine, certainly for the initial setup. Once the cluster is up and running, however, many tools and packages make parallelizing jobs easier for the end user.
We hope that Bosco R, using the GridR package, will become one of the simplest of these to use. Why do we believe this?

Just read Derek Weitzel's blog on R.

But who are the Data Scientists today, and what is the role of statisticians in the Big Data hoopla?


Big Data [sorry] and Data Science: What Does a Data Scientist Do? (from Data Science London)

There is a big debate about who the Data Scientists are. Are they statisticians, or are they computer scientists? For now, only a tiny percentage of statisticians are literate in supercomputing, high-throughput computing, and clusters.

Here is an excerpt from an article by Larry Wasserman that is worth reading:
Generally speaking, statisticians have limited computational skills. I saw a talk a few weeks ago in the machine learning department where the speaker dealt with a dataset of size 10 billion. And each data point had dimension 10,000. It was very impressive. Few statisticians have the skills to do calculations like this.
What do we do about it? Whining won’t help. We can complain that “data scientists” are ignoring biases, not computing standard errors, not stating and checking assumptions and so on. No one is listening.
First of all, we need to make sure our students are competitive. They need to be able to do serious computing, which means they need to understand data structures, distributed computing and multiple programming languages.
Second, we need to hire CS people to be on the faculty in statistics department. This won’t be easy: how do we create incentives for computer scientists to take jobs in statistics departments?
Statistics needs a separate division at NSF. Simply renaming DMS (Division of Mathematical Sciences) as has been debated, isn’t enough. We need our own pot of money. (I realize this isn’t going to happen.)
This is changing fast. Here is a quote from Joseph Rickert of Revolution Analytics:
R is a fundamental tool of modern computational statistics that provides the very bridge to data science... In recent years, survey after survey has singled out R as being one of the top tools for data scientists
See slide 26, on the ten things Data Scientists do. The last bullet, #10, reads: "Tell Relevant Business Stories from Data." No one can tell these stories without a creative mind that knows the nuances and finesse of mathematical insight; this is an art statisticians possess.

Wonders will happen as soon as R users no longer stumble for lack of computing resources.