Monday, June 2, 2008

IT Operations Needs a "River of News"

Over time, user interfaces for a class of tools tend to settle on a common paradigm. User expectations and vendors' copycat nature form a self-correcting loop that settles somewhere around a comfortable middle. For a fun example, try to spot the differences between the Yahoo and AOL homepages. User interface paradigms in the systems management world have followed this same herd mentality.



Monitoring has always been the dominant feature of systems management, so it is little surprise that the flavor of monitoring has dominated the user interfaces and dashboards of the major enterprise systems management tools. Show me my things. Show me the state of my things. Show me some rollup numbers about my things and their states. For answering questions about things (usually servers) and the state of those things, this classic monitoring paradigm tends to work quite well.



But this point of view falls short in an area that is gaining importance: visibility into IT operations processes. As systems become increasingly distributed and IT operations moves from a back-of-the-house function to a revenue-producing core competency, visibility into process becomes all the more important. Who has carried out what actions? What actions were performed on what machines? What's the history of a set of packages as they have moved from development to production? How can your organization know when a complex update process has been completed? How do you know when changes have been made outside of approved change windows? The inventory-and-state paradigm of traditional monitoring tools doesn't help you much when you are asking these kinds of questions.

A new kind of monitoring paradigm is needed to answer these questions. The most promising concept I've seen of late comes not from the systems management world but from the blogosphere. Dave Winer popularized the concept of a "River of News," and the concept applies here. Simply put, Dave describes the flow of information that comes into his feed reader as more like a river of information going by his door than a set of messages being delivered to his mailbox.




The activity that takes place within IT operations can similarly be likened to a river. Today, most of that river of activity takes place in the shadows and little, if any, of it is captured by a central system. Usually, the only way to find out about these events is by word of mouth or person-to-person email chains. Anyone who is more than one or two degrees of separation away from the event is usually flying in the dark.

So how would a river of news style tool for enterprise systems management processes work?

1. You need to create the river. All of the scripts and tools you use to build your deployment artifacts and manipulate your environments need to dump events into the river. As a side benefit, creating the river is an incentive to enforce the rule that changes must be made only through change management tooling.

2. You need to create filters for the river. Filters allow you to view the river from a certain point of view (like a package, user, node, environment, etc.) or to view only events that happened between certain points in time. You're going to need both common views and the ability to set up and share ad-hoc views.

3. You need to set up notifications. You can't watch the river at all times and, unlike the blogosphere, there are some pieces of news that you just can't miss. You need to be able to set traps that watch for events or a series of events that match particular patterns. When those traps are hit, you then have to make sure the right people are notified and sent the relevant information.

4. You need to introduce management dashboards and auditing reports. Keep the senior managers, bean counters, and compliance auditors happy and your life will be happier.
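
To make the four steps above a little more concrete, here is a minimal Python sketch. It is an illustration only, not how ReportCenter is or will be implemented. Every name in it (`Event`, `River`, `emit`, `view`, `add_trap`, the sample users and nodes, and the 02:00 to 04:00 UTC change window) is a hypothetical chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One item in the river. Field names are illustrative assumptions."""
    timestamp: datetime
    user: str
    action: str
    node: str
    environment: str
    package: Optional[str] = None


class River:
    """In-memory, append-only event stream with filtered views and traps."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._traps: List[Tuple[Callable[[Event], bool],
                                Callable[[Event], None]]] = []

    def emit(self, event: Event) -> None:
        """Step 1: scripts and tools dump their events here."""
        self._events.append(event)
        # Step 3: check every trap as the event flows by.
        for matches, notify in self._traps:
            if matches(event):
                notify(event)

    def view(self, since: Optional[datetime] = None,
             until: Optional[datetime] = None, **fields: str) -> List[Event]:
        """Step 2: filter by time range and by any event field."""
        result = []
        for event in self._events:
            if since is not None and event.timestamp < since:
                continue
            if until is not None and event.timestamp > until:
                continue
            if all(getattr(event, k) == v for k, v in fields.items()):
                result.append(event)
        return result

    def add_trap(self, matches: Callable[[Event], bool],
                 notify: Callable[[Event], None]) -> None:
        """Step 3: register a pattern and what to do when it is hit."""
        self._traps.append((matches, notify))


def outside_change_window(event: Event) -> bool:
    # Hypothetical policy: production changes are approved 02:00-03:59 UTC.
    in_window = 2 <= event.timestamp.hour < 4
    return event.environment == "production" and not in_window


river = River()
alerts: List[Event] = []  # stand-in for email or pager notification
river.add_trap(outside_change_window, alerts.append)

utc = timezone.utc
river.emit(Event(datetime(2008, 6, 2, 2, 30, tzinfo=utc),
                 "alice", "deploy", "web01", "production", "storefront-1.4"))
river.emit(Event(datetime(2008, 6, 2, 14, 5, tzinfo=utc),
                 "bob", "restart", "web01", "production"))
river.emit(Event(datetime(2008, 6, 2, 14, 10, tzinfo=utc),
                 "bob", "deploy", "qa03", "staging", "storefront-1.5"))

print([e.action for e in river.view(user="bob")])  # ['restart', 'deploy']
print([e.user for e in alerts])                    # ['bob']

# Step 4: a trivial rollup of the kind a management report might start from.
events_per_environment = Counter(e.environment for e in river.view())
```

A real tool would persist events and deliver notifications to people rather than append to a list, but the shape is the same: one place to emit events, views that slice the stream by field or time, and traps evaluated as events arrive.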

This whole idea is still a pretty fresh one. In upcoming releases of ControlTier's ReportCenter, we are going to be introducing these features and seeing where they take us. Any and all comments or suggestions are more than welcome.

4 comments:

Berkay said...

Hi Damon,

I'd say the river of news does exist in IT operations, more typically referred to as a manager of managers, where events from various sources are consolidated. Netcool is the most commonly used tool for this in NOCs.

In typical operations centers, these types of tools are used as the primary tool more commonly than monitoring tools of the type you describe that show states, summary charts, etc.

However, your point is well taken. The answers to the type of questions you've outlined are hardly ever pushed to these systems, which contributes to the gap between the developers and the operations. Operations typically stop at "tell me when you're going to mess around so I know not to react to alarms".

Some organizations capture the type of information you mention in tools like Remedy as part of the process flow, but in this case it is mostly manual.

There are also change and config management tools, such as Voyence and ConfigureSoft, that track changes in network devices and servers and provide event streams (they also integrate with MoMs).

However, I'm not aware of any solution that focuses specifically on the problem you've identified. Looking forward to what you come up with in ReportCenter!

Damon Edwards said...

The manager-of-managers type tools are only as good as the info they roll up. Since most of them roll up classic state and inventory data, their usefulness for providing visibility into change activity is limited. Not saying they couldn't do it, but I've yet to encounter a real-world implementation that includes the change activity data I'm talking about.

Berkay said...

MoMs like Netcool are event agnostic. Typically they don't pull in data but receive events from whatever source sends to them, so there is no technical limitation on receiving whatever events one would like to send.
I fully agree that you rarely see the type of events you mention being sent to the MoMs, but I think the problem is more cultural/organizational than technical.
There are very few people that have the crossover skills. Developers who have operations experience/knowhow and operations people who have development/deployment experience is rare. Further there are organizational silos enforcing this divide.
Having a primarily operations background, I follow this blog with high interest. A lot of what you guys have to say is a very different and fresh perspective for operations.

Damon Edwards said...

"There are very few people that have the crossover skills. Developers who have operations experience/knowhow and operations people who have development/deployment experience is rare. Further there are organizational silos enforcing this divide."

Very true. Very true... hmmm, I think I'll have to write a post about that. :)