The dirty little secret of Hadoop has been just how dull many of its tasks have been. By far the biggest use for Hadoop to date has been as a "poor person's ETL" (that is, a form of data integration, at the risk of oversimplifying) rather than all the big, sexy data science we see constantly hyped.
The Big Data "hyped" is not trivial
Bradley Voytek, a UCSD neuroscience professor and Uber data evangelist, is one of the speakers in a webinar published on the skytree.net site, Machine Learning: How to Make it Work in Your Organization. He taught me a vivid lesson: ordinary people are not able to make sense of big data by themselves.
- It is foolish to believe that my data have a better understanding of the world than I do
- It is arrogant to believe that the person who best knows what to do with my data is me.
|Professor Bradley Voytek, Ph.D., and family|
- The more advanced the statistical method used, the fewer critics are available to be properly skeptical
- The more advanced the statistical method used, the more likely the data analysts will be to use math as a shield
To illustrate, Bradley shows his calculation of how many people were born in the British Empire between September 4, 1752 and September 13, 1752. He extracted the world's birth data for that period, extrapolated, and then applied a percentage proportional to the British Empire's share of the then-known world population.
- Any sufficiently advanced statistics can trick people into believing the results reflect truth
However, it was impossible for any citizen of the British Empire to be born between September 4, 1752 and September 13, 1752. From Wikipedia:
Year 1752 (MDCCLII) was a leap year starting on Saturday of the Gregorian calendar, and a leap year starting on Wednesday of the 11-day slower Julian calendar. In the British Empire, it was the only year with 355 days, as September 3 through September 13 were skipped.
Sometimes even the great algorithms we have can fail if we have no knowledge of the real world. It is important to know when our models work, but it is equally important to know when our models break.
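The calendar trap can be sketched in a few lines of Python. This is only an illustration, not the calculation shown in the webinar: the function names, the daily birth rate, and the population share are all hypothetical placeholders. Python's `datetime.date` uses the proleptic Gregorian calendar, so it accepts September 5, 1752 as a valid date, and a naive model built on it will produce a confident number for days that never appeared on a British calendar.

```python
from datetime import date, timedelta

# Dates dropped from the British calendar in 1752, per the Wikipedia
# passage quoted above (September 3 through September 13).
SKIPPED_START = date(1752, 9, 3)
SKIPPED_END = date(1752, 9, 13)


def existed_in_british_calendar(d: date) -> bool:
    """Return False for the dates skipped in the British Empire in 1752."""
    return not (SKIPPED_START <= d <= SKIPPED_END)


def naive_births(start: date, end: date,
                 births_per_day: float, empire_share: float) -> float:
    """Naive estimate: days in range x world births per day x population share."""
    days = (end - start).days + 1  # inclusive range
    return days * births_per_day * empire_share


def calendar_aware_births(start: date, end: date,
                          births_per_day: float, empire_share: float) -> float:
    """Same estimate, but counting only dates that existed in the British calendar."""
    total_days = (end - start).days + 1
    valid_days = sum(
        existed_in_british_calendar(start + timedelta(days=i))
        for i in range(total_days)
    )
    return valid_days * births_per_day * empire_share


if __name__ == "__main__":
    start, end = date(1752, 9, 4), date(1752, 9, 13)
    # Hypothetical inputs, chosen only to make the arithmetic visible.
    print(naive_births(start, end, births_per_day=1000.0, empire_share=0.02))
    print(calendar_aware_births(start, end, births_per_day=1000.0, empire_share=0.02))
```

The naive version returns a plausible-looking positive number, while the calendar-aware version returns zero for the same inputs. The math is identical; only the knowledge of the real world differs.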
By the way, Hadoop does not do any predictive analytics. It just collects the data, ready to be analyzed.
Skytree's CTO, Alexander Gray, says there is no single ML (Machine Learning) algorithm that is universally valid. Is your analysis parametric or non-parametric? Frequentist or Bayesian? If you rush to look up the definitions of these terms, you have proved Bradley Voytek right.
|Alexander Gray, Ph.D., Skytree CTO|
They are the big data science elite, able to use open source R for predictive analysis on clusters, or to hire Revolution Analytics (who actually use open source R to deliver more solid and easier-to-use predictive analytics). Or maybe they use Skytree Server, "The Machine Learning Server", which according to the web site is:
The Hadoop ready enterprise Machine Learning platform. Delivering high performance Advanced Analytics for critical business issues, such as: churn prediction, fraud detection, lead scoring, customer segmentation, recommendations and more.
This product embodies the popular perception that we can press buttons to discover money-making opportunities from Machine Learning. Perhaps, for a reduced set of capabilities, this could be a solution that eliminates the need to consult the data science elite for some of the more day-to-day tasks.
Woody Allen says one of the secrets of success is just showing up. So, in spite of the rumored difficulties, Skytree at least showed up with a product that works for mere mortals and not just elites.