Friday, June 22, 2012

When will CIOs stop offering IT services for free?

I can't understand how IT data centers can offer services for free. The company pays rent, because the offices are not free; it pays salaries, because the employees are not volunteers; it pays for all compute resources (hardware, software, network); and it pays outside cloud providers. How can an IT organization with a budget in the tens of millions offer its services internally at no cost? And even where there is an internal cost, nothing says the price charged cannot be marked up.
The cloud is a business model that has disrupted the corporate data center model, but the disruption did not go all the way through. The data center cloud(s) and the outsourced cloud elements are both owned by corporate IT and must be managed as a whole, in the most optimal way. As the cloud offers pay-per-use services, someone must pay for them.

Each time a corporate user buys compute resources from a public provider like AWS, Rackspace and similar, he receives a utility-style invoice with all the details. Monitoring customer use of public clouds is a growing cottage industry, with companies like Cloudyn, Cloudability, Newvem and others helping customers extract the maximum value per buck. They are SaaS cost optimization companies. Indirectly, they feed more business to Amazon, promoting the belief that Amazon does not charge customers for resources they don't need.

The question is: why can't the data center, now selling services as a private cloud, offer its internal corporate users the same clear invoices that AWS and Rackspace issue each time they receive a credit card payment?
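To make the idea concrete, here is a minimal sketch of what such an internal utility-style invoice could look like in Python. Everything in it is an assumption made for illustration: the resource names, the rates in `RATES`, and the `UsageRecord` and `build_invoice` names are hypothetical and do not come from AWS, Rackspace, or any real chargeback product.

```python
from dataclasses import dataclass

# Hypothetical internal price list in USD per unit (assumed for illustration).
RATES = {
    "vm_hours": 0.12,
    "storage_gb_month": 0.10,
    "network_gb": 0.05,
}


@dataclass
class UsageRecord:
    department: str
    resource: str    # must be a key of RATES
    quantity: float


def build_invoice(records, department):
    """Return (line_items, total) for one department, utility-bill style.

    Each line item is (resource, quantity, unit_rate, amount).
    """
    usage = {}
    for record in records:
        if record.department == department:
            usage[record.resource] = usage.get(record.resource, 0.0) + record.quantity

    line_items = []
    for resource in sorted(usage):
        quantity = usage[resource]
        amount = round(quantity * RATES[resource], 2)
        line_items.append((resource, quantity, RATES[resource], amount))

    total = round(sum(item[3] for item in line_items), 2)
    return line_items, total


def format_invoice(department, line_items, total):
    """Render the invoice as plain text, one line per resource."""
    lines = [f"Invoice for: {department}"]
    for resource, quantity, rate, amount in line_items:
        lines.append(f"  {resource:<18} {quantity:>10.1f} x {rate:.2f} = {amount:>10.2f}")
    lines.append(f"  {'TOTAL':<18} {total:>33.2f}")
    return "\n".join(lines)
```

Calling `build_invoice(records, "marketing")` on a list of metered `UsageRecord` entries and printing `format_invoice(...)` gives each internal user the same kind of itemized statement a public provider sends.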

IaaS, PaaS and SaaS are geek words. We need reports in the language of the CEO: a user satisfaction report, and a report from the CIO to the CEO showing the profit, in black, in the P&L of the enterprise cloud.

Last week IDC released a new Cloud Decision Framework Tool. It is free to use after a registration process. The output of the tool is summarized in these two screens:

Sample Assessment Score Results from IDC Cloud Decision Framework Tool.
Sample Financial Savings Results from IDC Cloud Decision Framework Tool.
The IDC tool is nice, but why does it not show the revenue side of the IT operation?
We are at a stage in cloud adoption where we need to run both the private and the public side of a corporate cloud as one business, with internal and external users. The first goal should be maximizing profit while maintaining a specified level of service.

Today, June 22, 2012, no single company offers even one product with this capability.

The only way for a CIO to keep the CEO's attention is to talk about operating IT as a business, i.e. being able to show an IT P&L statement at any time.
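As a rough illustration of what "an IT P&L at any time" means in arithmetic terms, the sketch below adds up revenue from internal and external users and subtracts the cost categories mentioned earlier in this post. The function name `it_profit_and_loss` and all category names and figures are hypothetical, chosen only for the example.

```python
def it_profit_and_loss(revenue, costs):
    """Summarize an IT operation as a small P&L.

    revenue and costs are dicts mapping a category name to an amount.
    """
    total_revenue = sum(revenue.values())
    total_costs = sum(costs.values())
    profit = total_revenue - total_costs
    # Margin is undefined without revenue; report 0.0 in that case.
    margin = profit / total_revenue if total_revenue else 0.0
    return {
        "revenue": total_revenue,
        "costs": total_costs,
        "profit": profit,
        "margin": margin,
    }


# Hypothetical yearly figures in USD, for illustration only.
example_revenue = {
    "internal_chargeback": 900_000,
    "external_customers": 300_000,
}
example_costs = {
    "salaries": 500_000,
    "rent": 100_000,
    "hardware_software_network": 250_000,
    "public_cloud_providers": 150_000,
}

statement = it_profit_and_loss(example_revenue, example_costs)
```

With free internal services, the `internal_chargeback` line would be zero and the same statement would show a loss, which is the point of the question in the title.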

Wednesday, June 06, 2012

What distinguishes the best product managers from the very good

In a previous blog entry, I said Ian McAllister from Amazon could be the most significant hands-on product management guru I have come across. Here is another stellar contribution from Ian:

The top 10% of product managers excel at a few of these things. The top 1% excel at most or all of them:
  • Think big - A 1% PM's thinking won't be constrained by the resources available to them today or today's market environment. They'll describe large disruptive opportunities, and develop concrete plans for how to take advantage of them.
  • Communicate - A 1% PM can make a case that is impossible to refute or ignore. They'll use data appropriately, when available, but they'll also tap into other biases, beliefs, and triggers that can convince the powers that be to part with headcount, money, or other resources and then get out of the way.
  • Simplify - A 1% PM knows how to get 80% of the value out of any feature or project with 20% of the effort. They do so repeatedly, launching more and achieving compounding effects for the product or business.
  • Prioritize - A 1% PM knows how to sequence projects. They balance quick wins vs. platform investments appropriately. They balance offense and defense projects appropriately. Offense projects are ones that grow the business. Defense projects are ones that protect and remove drag on the business (operations, reducing technical debt, fixing bugs, etc.).
  • Forecast and measure - A 1% PM is able to forecast the approximate benefit of a project, and can do so efficiently by applying past experience and leveraging comparable benchmarks. They also measure benefit once projects are launched, and factor those learnings into their future prioritization and forecasts.
  • Execute - A 1% PM grinds it out. They do whatever is necessary to ship. They recognize no specific bounds to the scope of their role. As necessary, they recruit, they produce buttons, they do bizdev, they escalate, they tussle with internal counsel, they *.
  • Understand technical trade-offs - A 1% PM does not need to have a CS degree. They do need to be able to roughly understand the technical complexity of the features they put on the backlog, without any costing input from devs. They should partner with devs to make the right technical trade-offs (i.e. compromise).
  • Understand good design - A 1% PM doesn't have to be a designer, but they should appreciate great design and be able to distinguish it from good design. They should also be able to articulate the difference to their design counterparts, or at least articulate directions to pursue to go from good to great.
  • Write effective copy - A 1% PM should be able to write concise copy that gets the job done. They should understand that each additional word they write dilutes the value of the previous ones. They should spend time and energy trying to find the perfect words for key copy (button labels, nav, calls-to-action, etc.), not just words that will suffice.

I'm not sure I've ever met a 1% PM, certainly not one that I identified as such prior to hiring. Instead of trying to hire one, you're better off trying to hire a 10% PM who strives to develop and improve along these dimensions.
I voted up Ian McAllister's answer, and over a thousand people did the same. What Ian says, in essence, is: don't hire on past achievements and qualifications, hire on potential. This requires good instinct, which means the hiring manager must be at least a top 0.5% PM. Mediocre hiring managers will not be able to offer a 1% PM the environment he needs to perform.
The 1% product managers, if they can identify themselves as such, are usually founders of a startup. We are in the top 1% only after we are successful. Some of the star PMs are "not for hire", and even if they were, they rarely fit into a classic corporate culture.

Sunday, June 03, 2012

Hadoop 101 paper by Miha Ahronovitz and Kuldip Pabla

Originally written for Cloud Tutorial 


What is Hadoop?

Miha Ahronovitz, Ahrono & Associates
Kuldip Pabla, Ahrono & Associates

Hadoop is a highly scalable, fault-tolerant distributed system for data storage and processing. The scalability is the result of self-healing, high-bandwidth clustered storage, known by the acronym HDFS (Hadoop Distributed File System), and a specific form of fault-tolerant distributed processing, known as MapReduce.
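For readers new to MapReduce, the processing model can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle and reduce phases only; it is not the Hadoop API, and it has none of the distribution or fault tolerance that HDFS and Hadoop provide. The log format and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def status_mapper(log_line):
    # Hypothetical log format: "<path> <http_status>"
    _path, status = log_line.split()
    yield status, 1


def count_reducer(_key, values):
    return sum(values)


logs = ["/home 200", "/cart 200", "/missing 404"]
status_counts = run_job(logs, status_mapper, count_reducer)
```

In a real cluster the records would be blocks of files spread across many nodes, and each mapper would run on the node that already holds its block.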

Why make Hadoop an integral part of enterprise IT?

It processes and analyzes a variety of new and older data to extract meaningful wisdom about business operations. Traditionally, data moves to the computation node. In Hadoop, data is processed where it resides. The types of questions Hadoop helps answer are:

  • Event analytics: what series of steps leads to a purchase or registration
  • Large scale web click stream analytics
  • Revenue assurance and price optimizations
  • Financial risk management and affinity engine
  • Many others...
The Hadoop cluster or cloud is disruptive in the data center. Some grid software resource managers can be integrated with Hadoop. The main advantage is that Hadoop jobs can be submitted in an orderly way from within the data center. See below the integration with Oracle Grid Engine.

What types of data do we handle today?

Human-generated data that fits well into relational tables or arrays. Examples are conventional transactions – purchase/sale, inventory/manufacturing, employment status change, etc. This is the core data managed by OLTP relational DBMS everywhere. In the last decade, humans generated other kinds of data as well, like text, documents (text or otherwise), pictures, videos, slideware. Traditional relational databases are a poor home for this kind of data because:
  • It often deals with opinions or aesthetic judgments – there is little concept of perfect accuracy.
  • There is little concept of perfect completeness.
  • There’s also little concept of perfectly, unarguably accurate query results –
    • Different people will have different opinions as to what comprises good results for a search.
  • No clear cut binary answers; documents can have differing degrees of relevancy
Another type of data is machine-generated data: machines that humans created produce unstoppable streams of data, such as:
  1. Computer logs
  2. Satellite telemetry (espionage or science)
  3. GPS outputs
  4. Temperature and environmental sensors
  5. Industrial sensors
  6. Video from security cameras
  7. Outputs from medical devices
  8. Seismic and geophysical sensors
  9. Other
According to Gartner, enterprise data will grow 650% by 2014. 85% of this data will be “unstructured data”, a segment with a CAGR of 62%, far higher than that of transactional data.

Example of Hadoop usage

Netflix (NASDAQ: NFLX) is a service offering flat-rate online DVD and Blu-ray disc rental by mail and video streaming in the United States. It has over 100,000 titles and 10 million subscribers. The company has 55 million discs and, on average, ships 1.9 million DVDs to customers each day. Netflix offers Internet video streaming, enabling the viewing of films directly on a PC or TV at home. Netflix's movie recommendation algorithm uses Hive (which runs on top of Hadoop, HDFS and MapReduce) for query processing and business intelligence. Netflix collects all the logs from its website as streaming logs, using Honu.
They parse 0.6 TB of data running on Amazon S3 with 50 nodes. All data are processed for business intelligence using MicroStrategy software.

Hadoop challenges

Traditionally, Hadoop was open to developers. But the wide adoption and success of Hadoop depends on business users, not developers. Commercial distributions will have to make it even easier for business analysts to use Hadoop.

Templates for business scripts are a start, but getting away from scripting altogether should be the long term goal for the business user segment. This has not happened yet. Nevertheless Cloudera is trying to win the business user segment, and if they succeed they will create an enterprise Hadoop market.

To illustrate, here is a quote from the Yahoo! Hadoop development team:
“The way Yahoo! uses Hadoop is changing. Previously, most Hadoop users at Yahoo! were researchers. Researchers are usually hungry for scalability and features, but they are fairly tolerant of failures. Few scientists even know what "SLA" means, and they are not in the habit of counting the number of nines in your uptime. Today, more and more of Yahoo! production applications have moved to Hadoop. These mission-critical applications control every aspect of Yahoo!'s operation, from personalizing user experience to optimizing ad placement. They run 24/7, processing many terabytes of data per day. They must not fail. So we are looking for software engineers who want to help us make sure Hadoop works for Yahoo! and the numerous Hadoop users outside Yahoo!”

Hadoop Integration with resource management cloud software

One such example is Oracle Grid Engine 6.2 Update 5; Cycle Computing also announced an integration with Hadoop. The Grid Engine integration reduces the cost of running Apache Hadoop applications by enabling them to share resources with other data center applications, rather than having to maintain a dedicated cluster for running Hadoop applications. Here is a relevant customer quote: “The Grid Engine software has dramatically lowered for us the cost of data intensive, Hadoop centered, computing. With its native understanding of HDFS data locality and direct support for Hadoop job submission, Grid Engine allows us to run Hadoop jobs within exactly the same scheduling and submission environment we use for traditional scalar and parallel loads. Before we were forced to either dedicate specialized clusters or to make use of convoluted, adhoc, integration schemes; solutions that were both expensive to maintain and inefficient to run. Now we have the best of both worlds: high flexibility within a single, consistent and robust, scheduling system.”

Getting Started with Hadoop

Hadoop is an open source implementation of the MapReduce algorithm and of a distributed file system. Hadoop is primarily developed in Java. Writing a Java application will, obviously, give you much more control and presumably better performance. However, Hadoop can also be used from other environments, including scripting languages, by means of “streaming”. Streaming applications simply read data from stdin and write their output to stdout.
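The stdin/stdout contract is easy to see in a small word count written in Python. This is a minimal sketch, not an official example: the script name `wc.py` and the `map`/`reduce` command-line argument are assumptions made for illustration. It relies on the streaming convention of one record per line with key and value separated by a tab, and on the reducer receiving its input sorted by key.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts per word; the input lines must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Run as "python wc.py map" or "python wc.py reduce" (hypothetical script name).
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for output_line in stage(sys.stdin):
        print(output_line)
```

Before submitting anything to a cluster, the two stages can be tried locally with an ordinary shell pipeline, where `sort` stands in for the shuffle step: `cat input.txt | python wc.py map | sort | python wc.py reduce`.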

Installing Hadoop

To install Hadoop, you will need to download Hadoop Common (also referred to as Hadoop Core). The binaries are available as open source under the Apache License. Once you have downloaded Hadoop Common, follow the installation and configuration instructions.

Hadoop With Virtual Machine

If you have no experience with Hadoop, there is an easier way to install it and experiment with it. Rather than installing a local copy of Hadoop, install the virtual machine from Yahoo!. The virtual machine comes with Hadoop pre-installed and pre-configured and is almost ready to use. It is available from the Yahoo! Hadoop tutorial, which includes well-documented instructions for running the virtual machine and running Hadoop applications. In addition to Hadoop, the virtual machine includes the Eclipse IDE for writing Java-based Hadoop applications.

Hadoop Cluster

By default, Hadoop distributions are configured to run on a single machine, and the Yahoo! virtual machine is a good way to get going. However, the power of Hadoop comes from its inherently distributed nature, and deploying distributed computing on a single machine misses the very point. For any serious processing with Hadoop, you'll need many more machines. Amazon's Elastic Compute Cloud (EC2) is perfect for this. An alternative option to running Hadoop on EC2 is to use the Cloudera distribution. And of course, you can set up your own Hadoop cluster by following the Apache instructions.

Resources

There is a large, active developer community that has created many related projects and scripting languages, such as HBase, Hive, Pig and others. Cloudera has a supported distribution.

Friday, June 01, 2012

Cloud Slam '12, May 31 2012. A personal view

One of the first nice impressions is parking your car at the South San Francisco Convention Center about 20 yards from the main entrance. Once inside, there is no crowded registration booth; in less than a minute, one has the badge. The event is small, but most of the quality cloud personalities and cloud people are there.

Cloud Slam is already four years old, but only now has it added an actual exhibition with booths. There is nothing like face-to-face meetings, as our Gigaom gurus observe.

Khazret Sapenov
Khazret Sapenov is the soul of Cloud Slam. He comes from what I call the Bohemian cloud movement from Toronto, Canada. This is an analogy to Montparnasse in Paris, which became famous at the beginning of the 20th century, in the period referred to as les Années Folles (the Crazy Years), when it was the heart of intellectual and artistic life in Paris.

One of the other group members is Reuven Cohen, who founded Enomaly, now part of Virtustream, and has a widely read cloud column in Forbes magazine.

Reuven Cohen
I sat next to Khazret during the presentation of Michael McCarthy, VP of Cloud Services at IBM. You can see his presentation in the live-streaming video for day 2, 1:00 pm. The first thing I noticed is that he said he was a 26-year IBM veteran. IBM still builds people's careers in the classic way. When Michael joined IBM, Sun Micro was 3 years old, and Google, Yahoo and Facebook did not exist yet. Now he was talking to an audience of self-employed people and startups. He admitted IBM is not the only cloud technology available to IT worldwide; they are open to third parties. He subtly addressed CIO fears: Have you thought about security? Do you know which applications to migrate to the cloud first? He said IBM RC2 (Research Compatibility Cloud) tests all the applications recommended as cloud compatible.
Michael McCarthy

His slides gave the feeling of a solid offering, the you-can't-go-wrong-with-IBM sort. He was talking about reducing costs. This classic IT paradigm of reducing TCO is not applicable, IMO, in the cloud. In a cloud, one must increase profits. Can IBM tell a customer how much money they lose each day they do not acquire an IBM cloud solution? If a service provider, corporate or public, spends double to make 4x the profits, it makes sense, doesn't it? I noticed the slides carry a 2009 IBM copyright. Perhaps a newer (2012) version of this presentation can add these important details.

The most impressive presentation was from Michael Lock, VP Google Americas. It is memorable, and I decided to embed it here.

Michael Lock
Google wants to transfer the incomparable consumer cloud behavior to the corporate world. Why is Apple a great company? Because of Steve Jobs? To some extent, but they are great because they cater to consumer IT. There are now more than 6 billion (billions!) mobile subscriptions, and 1.2 billion of them are 3G subscriptions. There are a total of 2.2 billion subscribers at higher speeds, so 3G subscriptions are more than half of those. The cloud is not about moving old applications to the cloud (which IBM considered the central issue), but about creating new applications that take advantage of the social and other services available out there. He gave the example of the Android translator. It enables us to hold conversations in languages we don't know, like, say, Spanish and Japanese. The app is accurate only 65% of the time now, but human translation is accurate only 73% of the time. Mr. Lock talks a language of the cloud that 90% of the audience identifies with. Corporate IT cannot ignore this massive ownership of PDAs, Androids, iPads and iPhones. They will have to accommodate them. They have to assimilate the productivity potential into their businesses.

To illustrate what Michael Lock says, I visited the Cirrus Insight booth. You can manage apps right from the familiar Gmail user interface on Google. Watch the video.

Mike Hoskins
Michael Hoskins is the GM of Pervasive Software Big Data, a new spin-off with only 15 people. He showed a demo of RushAnalyzer.

Their product is, in essence, a humanized tool that lets "normal" people use Hadoop (known for its complexity) for data extraction.

Cloud Slam originated from the Cloud Computing Group, founded and moderated by Khazret, and many of the concepts now widely adopted were first debated there. The group has over 161,000 members. So, compared to other big cloud shows, it is the true blue-blood event of the cloud "serial provocateurs". It's the real thing, not the blah-blah thing.
