Interview with Monika Madlen Vetter, Ph.D.

Monika Madlen Vetter, PhD, works at the University of Chicago, Department of Ecology and Evolution. Marco Mambelli and I interviewed her as part of our "Going out of the door" strategy to learn what users of the R statistical project do. The product we delivered in Beta is Bosco R. We wanted to discover the data scientist, but Dr. Vetter is much more than just a data scientist. I always look around for what fascinates, and her story is amazing. She explains science in simple words, but the genomic study of plants is a complex science, normally hidden from the casual observer.

How important is the usage of statistics in your research?

MV: Very important!
My work has two components: conducting an experiment in the laboratory or greenhouse and later analyzing the gathered data. I use R for most of my statistical and graphical analysis. More specifically, I use a method called a genome-wide association (GWA) study to identify the genetic loci controlling the interactions between plants and bacteria.

What is a GWA?

MV: A genome-wide association study aims to identify genes controlling the variation in a trait. Most traits, however, are complex – most diseases for instance. Many genes contribute to hypertension or diabetes in humans. Statistical methods help elucidate these complex genetic traits. Scientific knowledge has progressed a lot since the first human genome was sequenced in 2000. We have begun to understand the genetic basis of many diseases using genome-wide association studies.
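To make the idea of an association scan concrete, here is a minimal sketch in Python. It is not Dr. Vetter's method and not her R workflow: it assumes the simplest possible setup, in which each plant has an allele count (0, 1 or 2) at each locus and one measured trait value, and it ranks loci by how strongly the allele counts correlate with the trait. The locus names and the numbers are invented for illustration. Real GWA studies use far larger data sets and more careful statistics, for example corrections for population structure and for multiple testing.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a locus that never varies cannot explain trait variation
    return cov / sqrt(var_x * var_y)


def scan_loci(genotypes, trait):
    """Score every locus against the trait and rank by absolute correlation."""
    scores = {locus: pearson_r(alleles, trait) for locus, alleles in genotypes.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: six plants, three loci, one growth measurement per plant.
genotypes = {
    "locus_A": [0, 0, 1, 1, 2, 2],
    "locus_B": [1, 0, 2, 1, 0, 2],
    "locus_C": [2, 2, 2, 2, 2, 2],
}
trait = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

ranking = scan_loci(genotypes, trait)
for locus, r in ranking:
    print(f"{locus}: r = {r:+.2f}")
```

In this toy data set, locus_A tracks the trait closely and comes out on top, while locus_C is identical in every plant and therefore scores zero.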

You work on the innate immunity of plants. We live in a world where particle physics dominates the headlines, and this is not a widely covered theme in the media.

MV: *laughs* Yes, if I talk about my research, many people react with surprise when realizing that plants actually DO HAVE an immune system. Plants cannot run away to escape pathogens, which constantly threaten their survival and reproduction. They do not have antibodies and we therefore often describe the plant immune system as simple. Yet it does a pretty good job, as is evident from the green world around us.
I investigate the evolution of innate immunity in plants. Immune receptors of the plant model species Arabidopsis thaliana recognize molecular signals that are unique to bacteria. The perception of these signals triggers a general and effective defense response but is also accompanied by a reduction in plant growth. My current work identified several genetic loci that control these growth changes upon stimulation of the immune system. Another project investigates how plants shape the bacterial community within their leaves.

What is your biggest challenge on a daily basis?

MV: [thinking a bit] Perhaps the biggest challenge is to stay focused on the problem and on one specific research question. So many interesting possibilities and questions distract me. I guess having many new thoughts and a creative mind is also what makes a good scientist.

Does your work move in the direction of pharmaceutical research?

MV: I am especially fascinated by how plants modulate their immune responses and growth in response to biotic and abiotic environments, but my work does not directly aim at developing an application or product. Basic knowledge does lead to innovation in the long run. A crop breeder might use this knowledge to make a plant more resistant to pests while maintaining yield, for instance.

What motivated you to select this career?

MV: I like to get to the bottom of things and I was interested in plant biology early in my childhood. My parents would have liked me to be a physician but I could not get around cutting someone open – even for the prospect of helping them. I was interested in lichens instead. Three totally different organisms come together to create a form of life with properties which none of them has by itself. How cool is that!

All creatures struggle to let in nutrients and vent wastes. We know a lot at human level. What about the plant level?

MV: We declare waste as unwanted materials but one’s waste is another’s necessity. The photosynthesis of plants produces sugars from water and sunlight. What they release – their waste so to speak, is oxygen, which is crucial to most other life forms on the planet. Otherwise plants do not consume living matter so they do not have unwanted by-products they would need to get rid of.
Heavy metals can be a problem in plants. They either need an excretion system or a high tolerance when growing in soil contaminated by heavy metals such as cadmium, arsenic, mercury or lead. If the plant accumulates those metals, humans can harvest and dispose of the plants to clean the soil. However, it can also be a problem for human health if we eat these plants. Some plants accumulate heavy metals to become resistant to herbivores. There is a lot of fun research ongoing.
In terms of nutrients, plants struggle just as much as other organisms. Their growth will be limited if they lack certain minerals. You might know that from a ‘sad looking’ plant on your windowsill. It might not get all nutrients from its regular water supply. You need to fertilize or re-pot it, too.

How has the University of Chicago stimulated your work?

MV: The University of Chicago supplies fantastic research facilities, helps with bureaucracy and provides a stimulating research environment. My co-workers come from diverse (biological) disciplines, which leads to different viewpoints and lively discussions.

What would be, in your opinion, your biggest achievement as a scientist?

MV: *laughs* Perhaps I am not idealistic enough to think that my research can solve the grand problems of humanity. However, my research has relevance to food safety, pathogen resistance and stability of yield in crops. On a smaller scale I am happy to share my passion for biological processes with students or lay people.

August 8, 2013 in Chicago

Highly Functional Autistic young people employment plan and performance computing


This is a plan to create quality employment for Highly Functioning Autistic (HFA) young people in the very short term, leveraging my knowledge as Product Manager in High Throughput Computing at Open Science Grid and the Center for High Throughput Computing. It shows science can have the most unexpected benefits.

This article has been on my mind for over a week, day and night, asleep and awake. The idea is very clear, but how do I best put it in words to reach every reader and make her a supporter?

There is a newer blog entry that complements the description of the plan:
The New Dandelion: a plan for highly functional autistic employment

Why HFA employment?

There are three million people in the US who are autistic. This is 1% of the population. The number of people with disabilities is much larger, in excess of fifty million.

We began to incorporate deaf-mute people into productive US society in 1885. This is the stage we are at now with HFA. We have discovered they can be productive and, in many cases, more productive than mainstream employees like me.

Who supports HFA employment?

Absolutely everybody who is a mensch. Just read Autism Speaks at any time. The lobby for HFA employment has the intelligence, know-how and creativity, and includes business leaders, movie stars, academia and top politicians. But there is no plan of action.

What about the Federal Government?

Senator Tom Harkin
Senator Tom Harkin's team published three days ago, in September 2013, the report High Expectations: Transforming the American Workforce as the ADA Generation Comes of Age. Harkin recommended four steps to increase employment opportunities:
1. Increase support for high school students as they transition into the workforce
The age when school districts are required to start transition planning should be lowered from 16 to 14, and state vocational rehabilitation agencies should be required to allocate 15 percent of their budgets to transition planning, Harkin said. In addition, he urged the creation of more employment and internship opportunities during school years. 
2. Improve the transition of the ADA Generation into postsecondary education and the job market
Young adults should be better matched with appropriate postsecondary opportunities, whether certificate programs, career and technical training opportunities, community colleges, or two- and four-year degree programs.
3. Eliminate requirements in disability benefit programs that discourage young people with disabilities from working
The Social Security Disability Insurance (SSDI) and Supplemental Security Income (SSI) programs need to be modernized by eliminating the requirement that young adults with disabilities cannot and will not be able to work in order to receive benefits. "This requirement creates disincentives for young people with disabilities who are compelled to choose between forgoing benefits and launching a career, or forgoing work in order to receive benefits and attend to their daily support and healthcare needs," Harkin said. 
4. Leverage employer demand, correct misconceptions about employing people with disabilities, build strong pipelines from school to the competitive workforce, and establish supportive workplaces
"At the same time the business community is looking for skilled employees, we are graduating members of the ADA Generation who have had more access to and more success in education than any previous generation."
This is a lucid, high-level document. But how can we implement the recommendations of this study in practice?

Private organizations for HFA employment

I described them in this article. Specialisterne is well known. They require one million dollars to educate and place fifty HFAs in jobs in a year. They claim they want to reach a goal of 100,000 Autism Spectrum Disorder employees in the US alone. A simple calculation shows they will need two billion dollars in donations (about 50% of the special education budget of California) and it will take between one hundred and two hundred years to reach this goal. It is simply too expensive and doesn't scale.
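The simple calculation can be written out as a short Python sketch. It uses only the three figures quoted above and assumes, for illustration, that cost scales linearly with the number of people placed; the function and variable names are my own.

```python
COST_PER_COHORT_USD = 1_000_000   # cost quoted to educate and place one cohort
PLACEMENTS_PER_COHORT = 50        # people placed per cohort, in one year
GOAL_PLACEMENTS = 100_000         # stated goal for the US


def cohorts_needed(goal, per_cohort):
    """Number of cohorts required to reach the goal (ceiling division)."""
    return -(-goal // per_cohort)


def cohorts_per_year(goal, per_cohort, years):
    """Cohorts that must run in parallel each year to finish within `years`."""
    return -(-cohorts_needed(goal, per_cohort) // years)


cohorts = cohorts_needed(GOAL_PLACEMENTS, PLACEMENTS_PER_COHORT)
total_cost = cohorts * COST_PER_COHORT_USD
cost_per_placement = COST_PER_COHORT_USD // PLACEMENTS_PER_COHORT

print(f"cohorts needed:      {cohorts:,}")
print(f"total cost:          ${total_cost:,}")
print(f"cost per placement:  ${cost_per_placement:,}")
for years in (100, 200):
    rate = cohorts_per_year(GOAL_PLACEMENTS, PLACEMENTS_PER_COHORT, years)
    print(f"to finish in {years} years: {rate} cohorts per year")
```

Under these assumptions the goal takes 2,000 cohorts and two billion dollars, or $20,000 per placement. Finishing in one hundred to two hundred years would mean running ten to twenty such cohorts every year; with a single cohort per year it would take far longer.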

High Throughput Computing (HTC), User Experience (UX) and HFA employment

At first, one does not see the connection. Working as a Product Manager consultant for Open Science Grid, the Bosco job submission scheduler and HTC made it click for me. Using the principles of these technologies, we can have a much more efficient, fast and natural way to engage the HFA population in productive employment.
High-throughput computing (HTC) is a computer science term to describe the use of many computing resources over long periods of time to accomplish a computational task.
We can perhaps reword it like this:

Using student tutors for HFA in colleges and universities, we can create in the medium term a network of professionals who will act as advocates and mentors, to change the social fabric of Corporate America. Instead of companies like Specialisterne placing fifty HFA with one million dollars, we can place tens of thousands of HFA, growing exponentially, without any additional cost. The salaries and bonuses of the professionals who are able to make a new breed of employees productive pay for the system. These professionals are the former students who worked as tutors for HFA colleagues.

If employing an HFA is our mission (we call this a job in HTC), then any tutor who gets a position in a good organization becomes a resource, in this case a processing node.

According to the National Center for Education Statistics, in 2009-2010 there were over 11,000 degree-granting colleges, 4-year colleges, 2-year colleges and non-degree-granting postsecondary schools.

If we can have, for illustration's sake, 10 tutors per college, we have a system of 110,000 processing nodes to help HFA students graduate and apply for jobs.
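The analogy can be sketched in a few lines of Python. This is only an illustration of the vocabulary, not a real scheduler such as the ones used in Open Science Grid: each HFA student is treated as a queued "job", each tutor as a "processing node" with room for a limited number of students, and the matching rule (first tutor with room) is an assumption of mine. All names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

COLLEGES = 11_000          # figure quoted above
TUTORS_PER_COLLEGE = 10    # illustrative assumption from the text


@dataclass
class TutorNode:
    """A tutor, modeled as a processing node with limited capacity."""
    name: str
    capacity: int = 1
    assigned: list = field(default_factory=list)

    def has_room(self):
        return len(self.assigned) < self.capacity


def match(jobs, nodes):
    """Assign each queued job to the first node with room; return jobs left waiting."""
    queue = deque(jobs)
    waiting = []
    while queue:
        job = queue.popleft()
        node = next((n for n in nodes if n.has_room()), None)
        if node is None:
            waiting.append(job)
        else:
            node.assigned.append(job)
    return waiting


total_nodes = COLLEGES * TUTORS_PER_COLLEGE
print(f"processing nodes nationwide: {total_nodes:,}")

# A tiny example: three students, two tutors with room for one student each.
nodes = [TutorNode("tutor_1"), TutorNode("tutor_2")]
waiting = match(["student_a", "student_b", "student_c"], nodes)
print("still waiting:", waiting)
```

As in HTC, the point is throughput: the more nodes join the pool, the fewer jobs stay in the waiting list.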

But most important, we will have created a network for HFA to find and apply for jobs. I wish we could incorporate the entire spectrum of HFA, but this is not possible on day one. We are not even able to define what a Highly Functional Autistic is in terms of the ability to be accepted into higher-paid, better-quality jobs. But with 110,000 pairs of HFA students and tutors, we can perform some Big Data research.

Take Story-driven Data Analysis: as the name says, we build a story first. For example, we can follow step by step the story of an HFA from the moment of intake until a year or two into employment. We analyze data in this context, and we don't just crunch numbers expecting the data to speak to us.

This is just one possibility.

The Experiment

David, my son, enrolled in Mechatronics at Sierra College in Rocklin, California, in the Fall 2012 semester. He has an ability to build Legos from drawings. He failed every single class test. But he had as a professor an extraordinary man: Steven Gillette.
Prof. Steven Gillette
He wrote:
"David clearly has intellectual and social abilities that lend themselves to academic and to employment opportunities. He displayed a strong desire to work independently, particularly with the programming. He was able to work with other students, and seemed to enjoy his interactions with them. I believe that David can succeed in academic and technology endeavors, with sufficient levels of support. Academically, that support must provide him with focus on timely submittal of labs and quizzes. I found I had to directly request his lab reports and quizzes. Without direct oversight he did not submit the assignments required by the class."
Alta California made an exception and offered funding for a student tutor for David. Roman Sitaruk, an outstanding young man and a good student, became David's tutor. David passed the Spring class, with grades up to 80%.

Now, in Fall 2013, David has two student tutors: Corbin Boyle and Scott Boughton.

The importance of students as tutors 

Student tutors are in the same age group. Most of the time, bonds of friendship are created, and these will become the most important assets for an HFA certificate holder seeking employment. We need to pay the tutors' salaries as students. They can be found in every community college and university throughout the nation.

They are the ones who bring the User Experience of the class to a level an HFA will comprehend.

As I write this article, we have perhaps hundreds of thousands of excellent students who would be delighted to tutor HFA and be paid for it, although there is a desire to do good that overrides other motivations. Right now they are unused. How can we use this potential as soon as possible?

The New Dandelion Project

The more colleges and universities we can engage, the better. In Open Science Grid, we call the organizations that provide resources "VOs", virtual organizations. Every college and university is a potential VO. Inside them are tens of thousands of unused resources: the potential tutors.

We need to create a non-profit organization which will:
  1. Prepare programs ready to implement in each University or College
  2. Raise funds to start the process, step 1 is to pay for student tutors
  3. Lobby and network so that corporations hire, on a priority basis, every graduate who was an HFA tutor at school
Once hired, the new HFA employee still needs a mentor. They need someone to tell them what to do and to reward their efforts in ways they understand. Very few HFA will become 100% autonomous. Their bosses must act as mentors in the workplace.

Peter Drucker: "We grow by feeding the opportunities and starving the problems"
We can make Senator Harkin's recommendations a reality.

Summarizing in a few slides

A project promoted by Ahrono Associates

Tweet October 25, 2012 : People like California - including our billionaires - because we are all Asperger, or indistinguishable from them 

Waiting for the Apple Interview

My dear Apple Computer,

David Ahronovitz, my son, is ready for an interview at the Apple Store in the Galleria Mall, Roseville, California. So far no one has called.

Note added October 22, 2013

David started working at Best Buy. Thank you, Best Buy, you are a great company!

Apple never reacted. Apple made me think: "It is the certainty that they possess the truth that makes men cruel," said Anatole France, winner of the 1921 Nobel Prize in Literature. Arrogance will never replace Steve Jobs' legacy.

Who buys a blog?

Someone asked me whether my blog is for sale.

Years ago, I met a self-made man from Southern Africa who said: "Everything I have is for sale, except my wife and children."

This made me curious and I googled "who buys a blog". This is what I found.

Warner Bros buys "40 Days Of Dating"

A wild auction has ended with Warner Bros acquiring screen rights to 40 Days Of Dating, a running blog compiled by New York designers Jessica Walsh and Timothy Goodman.
The blog was an experiment in romance by two friends who found themselves single at the same time. Walsh is a serial monogamist, while Goodman has commitment issues and dates a lot of girls. They dated exclusively for 40 days to try and overcome their relationship issues. The couple was bound by a set of rules that included the requirement they see each other every day and go on dates three times a week and take one weekend trip together; they also saw a couples therapist each week and agreed to a no outside booty call regimen for the 40 days. They bared all the results each day as they tried to make it work. The blog went viral and became popular. Even before the 40-day term was up — read the blog if you want to know how it ended — producers were all over this one. 

Disney buys "Mom" blogs publisher

Disney paid between $40 million and $45 million for Babble, according to a person familiar with the matter.

Bollywood: Anurag Kashyap’s online script shopping, buys a blog post to make a film

Kashyap has bought the rights of a story Solanki posted on his blog. The three-part story is titled Hisar Mein Hahakar. Solanki blogs in Hindi. Here is Solanki's blog. This is how he describes it:
'Yeh Gaurav Solanki Ki Kitab Hai. Isme kuch chizee hai jo kahi aur nahi ho sakti. Aap aisa bhi samajh sakte hai ki unhe kahi aur jagah nahi mili' (This is Gaurav Solanki’s book. There are things in it that cannot be found anywhere else. You may also think that these things never found a place anywhere).

Why buy blogs?

From The Golden Circle creator, Simon Sinek
I dug up an article by Danny Sullivan from more than a decade ago, in February 2003. The title is "Google Buys Blogging Company - But Why?"
News broke earlier this week that Google has purchased Pyra Labs, the company behind the popular weblog creation tool and Blogspot, a weblog hosting service. So far, Google is staying quiet about what it hopes to gain by such a purchase, leaving plenty of people speculating.
Is anyone speculating today? It is clear proof that some people can play chess seeing a few more moves ahead of the crowd. Google is one of them.

And looking at there are many  of blogs that have a latent potential to create equity for the right buyer. may one day look like this

Apple and Autistic employment

The Experiment, a blog article I wrote, is a follow-up to the many articles in this blog about Highly Functional Autistic (HFA) employment and the Dandelion project. See Silicon Valley and Autism. A creative approach.
Ahronovitz eyes robotic future - Rocklin, from August 2012, mentioned that David, my HFA son and a Whitney High grad, would attend Sierra College's Mechatronics Program.
On August 27, 2013, the US Office of Federal Contract Compliance Programs (OFCCP) published as law the Final Rule to Improve Job Opportunities for Individuals with Disabilities.

OFCCP Final Rule: 7% of the total workforce of companies doing business with the Federal Government must be IWDs (individuals with disabilities)
This involves almost all Fortune 5,000 corporations in the USA.
David attends, in parallel to Sierra College, the Transitions program run by Rocklin Unified School District. The special education program here is superb. RUSD was one of the main reasons for us to move out of the Bay Area and relocate 100 miles northeast, to the greater Sacramento area. David has a bus pass and goes to movies with other colleagues in the program, and these are mini-miracles.

Christina Connelly runs a program named WorkAbility, helped by Holly Gotwals, special teacher extraordinaire. In this program, David will need to pass an interview and, if successful, he will be employed for two hours a week. His salary will be paid by WorkAbility, not by the employer.

How Apple (not) Responded 

I thought the Apple Store in the Galleria Mall, Roseville, was a good fit. It is on the bus route. On their web site, they do recommend that disabled people
"Provide your name and contact information to Apple’s Employee Relations department at  or 408-862-1160 . Your request will be responded to as soon as possible."
Two emails to the address above got us no reply. Calling the number, we got a recorded message saying this number will return calls only to people who need accommodations as disabled. No one returned the messages.

This is one experience that those of us facing the task of employing highly functional autistic young persons know only too well. For the public image, companies make themselves up as defenders of people with disabilities. But internally they provide no resources and no points of contact.

Note that all our team wanted was an interview at the Apple Store.

Best Buy

A kind junior manager at the Roseville, California store agreed to forward an email about David to Human Resources.

It appears that Best Buy Human Resources, like Apple Human Resources, deals with candidates who hide any imperfections, as most of us do when we apply for jobs. Unfortunately, young candidates on the autistic spectrum cannot fake it or pretend to be what an employer wants to hear.

The last two courses David takes are Electronics and a lab on computer repair. Why does he need to work in a supermarket?

All of us, at some time in our lives, had lower-rated jobs, and so did Ashton Kutcher and Jack London. We are in the normal spectrum, and we are able to handle a situation like this. We know deep in our hearts this is temporary, and we have aspirations and hopes that make us feel rich, even if we don't have a penny. We can mobilize inside us the will to get out of it.

Autistic kids cannot. As a team, we must also observe and choose work that dignifies them, because by themselves they cannot make the choices we have as mainstream people.

Apple, do you hear us? 

You happily sell Autism Apps on iTunes for iEverything. The kids who bought these apps are growing up and are now young people. They are exactly what Steve Jobs was: a round peg in a square hole.

My vote for HPCwire Readers Choice Awards 2013

I voted for The UberCloud Project and the UberCloud Compendium of Case Studies

HPCwire is the best HPC publication on the planet, and has the most diverse readership and interests, difficult to reduce to a common denominator.

As I looked at the list of nominees, I asked myself: is this or that nominated solution or product easy to sell to other people who need a similar remedy? To answer, I looked at the eight bullets from my previous blog entry, How to sell performance computing in 2013.

See also Figure 2 from the entry mentioned in the previous paragraph. I tried to mentally fill in the quadrants by answering these questions on behalf of each nominee:

  1. Who needs our solution or product?
  2. Where do these people live? (Demographics)
  3. What do they normally do? (Behavior) 
  4. What are their goals in life? (that we can help them achieve somehow)
The UberCloud (UC) project answers these questions best. Almost any IT organization wants to manage and profit from a cloud operation. Here are the details:

1. Offer Open Ended Performance Solutions, not products

UC offers HPC as a Service consisting of ISV software, HPC Cloud platform, and HPC expertise, to industry end-users.

2. Listen

The end-users provide a detailed profile of their application, software, and expertise. This profile is the basis for the perfect match with resources and expertise.

3. Know who you want to please

UC has a user-friendly 22-step end-to-end process and guidance that end users agree to.

4. Be different: resist the hype temptations

UC's sole goal is to help the industry end-users port and run their applications in the HPC Cloud, and overcome the roadblocks, no other promises.

5. Offer the best there is in technology, wherever you find it

In the UC HPC Experiment, about 50 HPC Cloud providers and about 60 software providers participate (among the 575 organizations). An automatic matching algorithm provides the perfect match for the end-user, with the best suited resource and expertise.

6. Make User Experience a priority

The 22 steps of the end-to-end process of accessing and using remote Cloud resources and the team building around the end-user's application are designed for an optimal user experience.

7. Generate and collect "Aha" testimonials

Each end-user's team is writing a case study at the end of their team project (so far 112). These case studies describe the application, present the benefits of using HPC Cloud resources, and describe lessons learned and recommendations.

8. Seed for follow up business

After the end of their HPC experiment end-users are encouraged to continue with business relationships with their previous team members.

There is no other nominated product that can so easily be adopted by an HPC consultancy. The system providers (IBM, HP, etc.) each sell their brands of HPC-aspirin, expecting to calm down every headache a customer can have.

They should look carefully at the UberCloud approach.

Big Data Mysticism

In 2000, I compiled an internal Sun Microsystems document. We noticed that the share prices of Silicon Valley companies increased with the number of CPUs they had in their grids. I called this the "Faraone curve", after the first CEO of Gridware Inc., the company of which I was one of the co-founders and which was acquired by Sun.

Fig. 1: Faraone Curve
When we came out with this curve, many company men laughed; Hah, Hah, Hah.

"Haughtiness is a terrible trait that we must flee from."
(Rabbi Nachman, Likutey Moharan I, 10)
Great mystical thinkers say we should always focus on the inner intelligence of every matter, and
"we must bind ourselves to the wisdom and intelligence that is to be found in each thing. The inner intelligence is a great light that clarifies all our decisions and illuminates our actions and deeds"
These are words from "Rebbe Nachman, great-grandson of the holy sage The Baal Shem Tov in Likutei Moharan , printed in Ostrog the Province of Volhyn, under the rule  of our master, the exalted and pious Czar Alexander Pavlovitch in the year 1808."

David Ungar is a seeker of inner intelligence, as he showed when he investigated how to program manycore CPUs:
The obstacle we shall have to overcome, if we are to successfully program manycore systems, is our cherished assumption that we write programs that always get the exactly right answers. This assumption is deeply embedded in how we think about programming. The folks who build web search engines already understand, but for the rest of us, to quote Firesign Theatre: Everything You Know Is Wrong!
Those who know are not surprised that Google, Facebook, Amazon and Yahoo became the largest Silicon Valley companies as they perfected the technology of managing and extracting value from the largest number of processors available.

This generated the hunger for storage and Big Data. According to the latest gurus, Viktor Mayer-Schönberger and Kenneth Cukier:
At its core, big data is about predictions. Though it is described as part of the branch of computer science called artificial intelligence, and more specifically, an area called machine learning, this characterization is misleading. Big data is not about trying to “teach” a computer to “think” like humans.
So what is it?
  1. Big data gives us an especially clear view of the granular: subcategories and submarkets that samples can’t assess.
  2. It permits us to loosen up our desire for exactitude, the second shift. We don’t give up on exactitude entirely; we only give up our devotion to it.
  3. A move away from the age-old search for causality. As humans we have been conditioned to look for causes, even though searching for causality is often difficult and may lead us down the wrong paths. In a big-data world, by contrast, we won’t have to be fixated on causality; instead we can discover patterns and correlations in the data that offer us novel and invaluable insights. The correlations may not tell us precisely why something is happening, but they alert us that it is happening.
Google has a white paper by Kazunori Sato, An Inside Look at Google BigQuery. Why does Google use BigQuery rather than Hadoop/MapReduce?
MapReduce was only a partial solution, capable of handling about a third of my problem. I couldn’t use it when I needed nearly instantaneous results because it was too slow. Even the simplest job would take several minutes to finish, and longer jobs would take a day or more. ...
So to discover the inner intelligence of things, scientists should not have to go through the torture of setting up a Hadoop installation. This is not simple. This is not trivial. This is an obstacle to creative analysis.

No wonder Cloudera, the dean of Hadoop companies, came out with Cloudera Search. Unlike Google, "Cloudera has contributed its innovations and IP around the integration of Apache Solr and Apache Lucene with CDH back to the respective upstream projects."

Other tools that must change are the resource managers. Right now they all assume they know every node where they run, but soon they will not.

Once these easy-to-use tools are accessible, Big Data will explode. Others will have the technology, ready for us to use, the same way Google has all we need to email and start an IT center.

When Rebbe Nachman dictated one of his books of secrets to his favorite student, Reb Noson, he stopped and said: "If only you knew what you are writing." Reb Noson replied: "I really have no idea at all." Rebbe Nachman then said to him: "You don't know what it is that you don't know."

"By eliminating haughtiness, our wisdom is repaired."
(Rabbi Nachman, Likutey Moharan I, 10)

An interview with Dr. Frank Wuerthwein. Can high energy particle physics change the way we do mainstream big data?

Frank Wuerthwein, Ph.D. (Cornell, 1995), teaches at the University of California, San Diego (UCSD). He is an expert in new phenomena in particle physics at the high energy frontier, working with the CMS detector at the LHC (Large Hadron Collider at CERN).

He is "developing, deploying, and now operating a worldwide distributed computing system for high throughput computing with large data volumes. In 2010, "large" data volumes are measured in Petabytes. By 2020, he expects this to grow to Exabytes." He is a key management member of Open Science Grid (OSG).

He will present a talk titled Dynamically Creating Big Data Processing Centers – a Large Hadron Collider Case Study at ISC Big Data'13 in Heidelberg, Germany, on September 25, 2013. We chatted about it and below is our conversation. Here are the slides:

Dynamic Data Center concept 

Distributed Human Resources

M (Miha): What is the most significant thing about the paper you present at ISC Big Data'13?  

FW (Frank Wuerthwein): The audience will be half university and half industry. I am trying to present the logic of what we do in particle physics, to describe the 30,000 ft picture and why we are doing that. When I start with the "why do we do what we do", I start with the question: "what is the most valuable asset we have?" Usually in supercomputing it is having the biggest computers money can buy. But for us the most valuable assets are human. This is the most important commodity we have. We did a study in CMS on how much money we spent on salaries, compared to the cost of computing resources. We spent about five times more on people than on computing resources per se.

So how do you maximize output, given that human effort is our most valuable commodity? I look at where all these people live. What are the organizational principles of our collaboration? How can technology support these principles, and thus support the productivity of the collaboration, rather than put up barriers?

M: Are you referring to a positive user experience? What do you mean?

FW: It is not so much about user experience (of course we satisfy perhaps up to the 90th percentile, but not more); it is about how to organize humans around the resources for maximum productivity. CMS includes 2,000 people or so across 180 institutions in 40 countries. The computing infrastructure needs to both support centralized operations for the good of the collaboration as a whole and reflect the distributed nature of the human capital.

M: How can you manage to be distributed and centralized at the same time?

FW: We must be able to (1) dynamically add resources, because people all over the world may temporarily have access to resources they can contribute, or want to use for their local needs. We must be able to (2) dynamically switch allocations of resources from global to local and vice versa. There must be incentives to donate resources to the common good. And (3) it must be possible to use the software tools the collaboration provides on resources that are locally controlled. The allocations must ultimately be under local control in order to allow rapid changes without bureaucratic overhead. Humans who are local should not have to wait for a centralized resource allocation decision. Otherwise, they will hold on to their local resources, rather than sharing them freely when they don’t need them.

M: So what you are saying is that the workflow is centralized, and everybody must fit into it, wherever they are?

FW: Yes and no. We have both centralized and local workflows. However, the switch between local and centralized workflows should be seamless. One should be able to donate resources and get them back in a very short time when needed.
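The allocation rules Frank describes (add resources freely, switch them between local and global use, keep the decision with the local owner) can be sketched in a few lines of Python. This is only an illustration of the principle, not the software CMS or OSG actually runs; the class names, rack names, and site names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A rack or cluster owned by one local group (hypothetical model)."""
    name: str
    owner: str
    shared: bool = False  # True while donated to the global pool


@dataclass
class Collaboration:
    resources: dict = field(default_factory=dict)

    def add(self, name, owner):
        # (1) Anyone can add a resource at any time; no central approval step.
        self.resources[name] = Resource(name, owner)

    def donate(self, name, requester):
        # (2) Only the owner switches a resource from local to global use.
        self._owned_by(name, requester).shared = True

    def reclaim(self, name, requester):
        # The owner gets the resource back immediately, without waiting
        # for a centralized allocation decision.
        self._owned_by(name, requester).shared = False

    def global_pool(self):
        # Resources currently available to centralized workflows.
        return sorted(r.name for r in self.resources.values() if r.shared)

    def _owned_by(self, name, requester):
        res = self.resources[name]
        if res.owner != requester:
            raise PermissionError(f"{requester} does not control {name}")
        return res


cms = Collaboration()
cms.add("ucsd-rack-1", owner="ucsd")
cms.add("fnal-rack-7", owner="fnal")
cms.donate("ucsd-rack-1", requester="ucsd")
print(cms.global_pool())  # ['ucsd-rack-1']
cms.reclaim("ucsd-rack-1", requester="ucsd")
print(cms.global_pool())  # []
```

The point of the sketch is that there is no code path through which a central authority can hold on to a donated resource: donating and reclaiming are both owner-only operations.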

Fig. 1: A design  illustrating how raw data originating from CERN are processed at peak and at steady state
OSG = Open Science Grid, FNAL = Fermi National Accelerator Laboratory, SDSC = San Diego Supercomputer Center

Dynamic Data Centers

M: How does this tie in with the idea of Dynamic Data Centers?

FW: This is something the future will bring. It’s a natural next step given that we have already created dynamic compute centers. We will show an example of what we have actually done. There is a bigger picture. It answers the question: "Why do we have a distributed infrastructure in the first place?" People sometimes ask: "Why don't you have all resources in one place on the planet?" It does not make sense to have a single big building with all the computing resources, just like it doesn't make sense to bring all the people into one place. Having a distributed architecture allows more resources, both people and computers, to be more effective participants in the global CMS collaboration. Am I making sense?

M: Very much so. What you are saying is that not only the machines but the people too are distributed. This distribution of humans is as important as the distribution of machines. Right?

FW: Even if it were possible to put all computing resources in one place, it would not be desirable, because the skilled people do not live in one single place. You want to add more resources to the system without having to ask a central authority for the green light. You should be able to come with a rack of hardware and say: "Now I want to add this to the global system, while I can still use my rack in whichever way I want to use it locally. But when I am not using it, you can have it." The transitions in and out of the infrastructure must be seamless.

M: So what is the difference between High Throughput Computing (HTC) and what you propose?

FW: HTC is the technology that makes this possible. I am trying to give people a sense of why HTC is a predominant computing paradigm in our field (high energy particle physics). Some people ask: "Why don't you use the powerful supercomputers you have access to?" "How can it be that these distributed resources are better for you?" I will address these questions in my talk. In essence, the short answer is that a distributed system maximizes human productivity. The ability to connect resources from all over the world is then a tremendous advantage. This way of organizing resources is highly desirable, at least in our field. Once you accept that, the idea of Dynamic Computer Centers makes a lot of sense.

M: What would be a more formal definition of a Dynamic Computer Center?

FW: I would describe everything you need to have a Dynamic Computer Center. You need disks (very large ones for huge data sets), you need networked access to them to stage in the data, and you need an output configuration where you place your processed data. All of these can be created out of the existing resources on the fly.

M: Why call it "Dynamic"?

FW: I mean you can take a resource at any time and use it, without needing pre-installed software on it, or with only a very limited amount of such software. So when I go away, the resource is "clean" for anybody else to have.

M: Is your work at UCSD on processing CMS data with the Gordon supercomputer as a node in OSG a good example of a Dynamic Data Center? Can one repeat this model in a different location?

FW: One can replicate this experience, because it is a matter of (1) making the basic APIs available and (2) making it easy for the hardware operator to support the APIs. It is essential to use the supercomputer's existing login mechanism (this is ssh) for distributed access to the machine, with all its cooling, electricity consumption, etc. We export the batch system to the outside world by interfacing with ssh. We used Bosco for this. Then we needed an interface to move data in and out. We used GridFTP. For applications we use things that work with the HTTP protocol. And so on. We can provide all the technical details to anyone interested, but the principles are very simple. Everything under the hood is abstracted away so we can mix resources as they become available. We only expose the HTCondor submission, which everyone in our world understands.
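To make "we only expose the HTCondor submission" a little more concrete, here is a minimal Python sketch that writes out the kind of HTCondor submit description used to route a job to a remote batch system over ssh, in the style Bosco uses. It is not the configuration used at UCSD: the login, batch system type, and file names are hypothetical placeholders, and HTCondor's built-in file transfer stands in for the data movement that the interview says was done with GridFTP.

```python
def bosco_submit_description(remote_login, batch_type, executable, input_files):
    """Return the text of an HTCondor submit file for a remote batch system.

    All argument values are supplied by the caller; nothing here is taken
    from the actual CMS/Gordon setup.
    """
    lines = [
        # The grid universe hands the job to a remote resource.
        "universe = grid",
        # "batch <type> <user@host>" reaches the remote batch system via ssh.
        f"grid_resource = batch {batch_type} {remote_login}",
        f"executable = {executable}",
        # Let HTCondor stage files in and out, so nothing needs to be
        # pre-installed on the remote machine.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    if input_files:
        lines.append("transfer_input_files = " + ", ".join(input_files))
    lines += [
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example values, for illustration only.
print(bosco_submit_description(
    remote_login="physicist@supercomputer.example.org",
    batch_type="pbs",
    executable="analyze.sh",
    input_files=["config.json"],
))
```

From the user's point of view the remote machine is just one more line in a submit file, which is what lets resources be mixed as they become available.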

M: Will you talk about this at ISC Big Data'13?

FW: I will stay away from technical details and talk about the high level ideas I described in this interview.

M: Who do you think can benefit most from the Dynamic Data Center concept you helped develop?

FW: Once we implemented and deployed this dynamically "open" architecture for CMS, we realized that it is easily opened in a second dimension. Not only is it open to participants within the CMS collaboration from all over the world, but it is also easily made open across the entire range of scientific endeavors, and the basic principles transcend particle physics. Biologists, engineers, mathematicians, chemists, sociologists, etc., all benefit from the basic structure. Once you have this deployed for us, it can easily be opened up for others as well.

And so today my biggest fascination is finding "new customers" outside of particle physics.

ISC Big Data,  Heidelberg and  Southern California

M: You were born in the Heidelberg area in Germany. What made you come to the US?

FW: I originally came to California as a post-doc at Caltech for a couple of years. But when you are born in a cold country and have lived in Southern California once, it is very hard not to want to come back. I like surfing; this is how I became a "native" in San Diego.

Miha's Note: Frank has the surfing weather forecast on his web site.

I never finished my university degree in Germany. I was supposed to be in the US for only one year. In that year, on a scholarship to Cornell, I met my wife, and well, I never went back to Germany. I got my Ph.D. at Cornell, then went to Caltech for a couple of years, then to MIT for four years, and then I came to UCSD (University of California, San Diego). I crossed the USA three times, coast to coast.

M: ISC Big Data'13 is the first Big Data conference of such magnitude in Europe. Half of the attendees are from the commercial world. What do you have to tell them?

FW: I don't know. When I heard about this conference, I thought it was a very interesting idea. I expect to learn more than I bring in. It is not obvious to me how much we have in common. I want to discover what they do, they will discover what we do, and then we can define what common ground we share. Some of the most interesting things are not the talks, but the conversations during dinner, coffee breaks, or in the corridors. Does that make sense?

M: It makes a lot of sense. This is how TOP500 was born a few decades ago.

FW: I want the broadest possible exposure, so that there is a maximum of avenues through which they can engage with me. If I talk only about Dynamic Data Centers, I may miss ten other conversations that are worth having. For example, we have this dichotomy between structured and unstructured data. On one side you have Oracle-like structured data, and at the other extreme unstructured data, where you don't even know what to look for until you actually look for a specific purpose. I want to position my hundreds of Petabytes of data from particle physics in this continuum. I don't see this as an either/or. There is a lot of grey in between.

M: It seems the amount of data you process in CMS (the Compact Muon Solenoid experiment at the CERN physics laboratory) is way above what they are accustomed to today in the commercial world.

FW: To my knowledge, there are only two other places on the planet that have our volume of data: the National Security Agency (NSA) and Google. I am just guessing, as I don't know exactly how Google manages its data.

M: What would have to happen for you to consider ISC Big Data'13 a success?

FW: One of my collaborators will attend the conference to see what attracts him as a career: academia or industry. We want to discover how the commercial world values our work. If he comes away with a clear picture, this would be a great success. For me personally, I am always looking for “new customers” as well as inspiration for doing things a different way.

M: How do you feel about going back to a conference in Heidelberg, in the country where you were born, after so many years in the US?

FW: I have never actually given a conference talk in Heidelberg. In all these years, I only went back once, to give a seminar talk. For me, Heidelberg University is very prestigious, and in that sense this is a very special occasion for me.

