The Machine Learning prophecy from Apcera is proven true

Hadoop's future is changing

At Strata + Hadoop World 2016, held March 28-31, Hadoop turned 10 years old. As Doug Cutting, its co-creator, says:
 [I think] the continuing changes it helped unleash likely will result in a diminished role for the Hadoop core technology itself in future big data applications... [I see Spark] as a replacement for MapReduce.
This was a bombshell for many people who consider Hadoop the sine qua non tool for Big Data. In 2012, Kuldip Pabla and I wrote a Hadoop 101 paper explaining it to the layman.

I discovered it was much easier to write a popular article on Hadoop than to actually use Hadoop.
Todd Papaioanou, Chief Cloud Architect at Yahoo, and his 120-person team were tasked with setting up 45,000 Hadoop servers in Yahoo’s 400,000-node private cloud. His verdict:
 “Hadoop is hard – let’s make no bones about it. It’s damn hard to use. It’s low-level infrastructure software, and most people out there are not used to using low-level infrastructure software.”

Apcera 2016 Predictions

Among Apcera's 2016 Predictions for the Future of Enterprise Cloud, Derek Collison, the founder and CEO, makes prediction #1:
...  Machine Learning, and not what we think of as Big Data today, will provide the insights, predictions, causation and correlations that will drive the modern enterprise within the next two years.
This tweet from November 23, 2015 summarizes why:
The vision (a prediction is, after all, a vision) surfaced in an interview at the Structure Show 2015. George Gilbert (George), the analyst, asks very sharp questions; Derek Collison (Derek) replies.

If you are like me, listening to the tape is not enough. I transcribed some key segments from the video, keeping the casual conversational flow.

Google's do-over


A lot of what is now manifesting itself in the enterprise ecosystem, which was the initial MapReduce, and doing laws, and let's slap SQL in front of it, and add more memory - you see all these parallels in the industry. Eventually Google came and said: let's start a do-over.
(Miha's note: Google is no longer a search company. It’s a machine-learning company. For example, Google Dream:
  1. It uses probabilities rather than the true/false binary.
  2. Humans accept a loss of control and precision over the details of the algorithm.
  3. The algorithm is refined and modified through a feedback process.)
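The three characteristics in the note can be sketched with a toy online learner. This is my own minimal illustration in plain Python (not Google's code, and far simpler than anything behind Google Dream): it outputs a probability rather than a hard true/false, its weights are not hand-tuned by a human, and it refines itself through a feedback loop.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class OnlineLearner:
    """A tiny logistic learner illustrating the three points above."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features  # weights start uninformed
        self.lr = lr

    def predict_proba(self, x):
        # Point 1: the answer is a probability, not a binary verdict.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def feedback(self, x, label):
        # Point 3: the model refines itself from an error signal;
        # point 2: no human sets the weight values directly.
        error = label - self.predict_proba(x)
        self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]

# Train on a trivial pattern: label is 1 when the first feature is high.
model = OnlineLearner(n_features=2)
for _ in range(200):
    model.feedback([1.0, 0.0], 1)
    model.feedback([0.0, 1.0], 0)

print(model.predict_proba([1.0, 0.0]))  # close to 1.0
print(model.predict_proba([0.0, 1.0]))  # close to 0.0
```

The point is not the algorithm (this one is decades old) but the workflow: you never inspect or hand-edit the weights, you only feed the loop more examples.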
What we see, right now, is that notion of "whatever Hadoop 3.0 becomes" - you know, the evolution of Spark, the automation of setting things up, you know, one click and you spin this thing up - maybe in containers, orchestration, whatever - and you tear it down maybe 10 minutes later.

Something is coming very quickly in our rear-view mirror

What is more interesting, though - as we struggle with how to get the data, spinning up the back-end systems and making sense of it - I think there is something coming up very quickly in our rear-view mirror that passes us before anyone knows what is going on. This is Machine Learning.

No one will be talking about Big Data in 24 months (November 2017)

It is going to get so good, so fast, that in 24 months, I mean literally in less than 24 months, no one will be talking about Big Data. We will just spit in data, like using a fire hose, and then spit out patterns, predictions, correlations, causation that we could never have understood. These technologies are compressing things so hard, and our brains are built linearly. That is why we cannot see it, we cannot see this.

We will not bother with Hadoop 3.0. We will use "this thing"

I do believe that the notion of Hadoop 3.0 will simply be, "we will not even bother with it." We are going to plug our data into the Google Brain project or other things coming out from Amazon Machine Learning or IBM Watson. Whatever this thing is, we don't have to operate it or worry about it. We just simply pump data into it and get an amazing amount of value. I truly believe this will happen faster than people think.

Profound Insight and Profoundly Unsettling


This is a profound insight, and profoundly unsettling. Along those lines, my impression is that we will always add more data feeds to improve the context of those automated decisions, and that's a manual process. You can't tell IBM Watson: "Look at all the data feeds that are in the world and figure out which ones are relevant for improving a false positive I have."


Right now the big problem is how you model the data correctly to put into these systems. But what I am saying, and I may be wrong, is that the ability for them to auto-figure that stuff out is coming faster than we think. We don't have to teach (the systems) how to do it. 98% of our learning, even as children, is not supervised. I may sound outlandish, and I might be wrong, but my gut tells me this wave of computing, and where we are going, is coming faster than our ability to predict when.
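Unsupervised learning, which Derek invokes here, is easy to demonstrate on a toy scale. The sketch below (my own illustration, not from the interview) runs k-means clustering in plain Python: no one labels the numbers or teaches the algorithm what the groups mean, yet it discovers the two clusters hidden in the data on its own.

```python
import random

def kmeans_1d(points, k=2, iterations=20):
    """Discover k clusters in unlabeled 1-D data - no supervision."""
    centers = random.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Unlabeled data: two hidden groups, around 10 and around 40.
random.seed(0)
data = ([random.gauss(10, 1) for _ in range(50)] +
        [random.gauss(40, 1) for _ in range(50)])

print(kmeans_1d(data))  # two centers, one near 10 and one near 40
```

Nobody told the algorithm there were two groups around 10 and 40; it recovered that structure from the raw numbers, which is the sense in which most learning needs no teacher.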

Read Part 2 of this blog

