dev2ops: delivering application change: January 2008

Thursday, January 31, 2008

Nick Carr's "The Big Switch" - Review of Part 2

You may want to read my review of Part 1 first.

Part 1 of Nick Carr's "The Big Switch" was much more relevant to this blog so I'll be brief in discussing Part 2...

While Part 1 was a well-thought-out analysis of economic history and its relevance to the trends we are seeing today, Part 2 just seems like a scare piece that you'd see in a major city newspaper looking to cook up some ratings.

Part 2 has lots of doom and gloom about the social ills of living our lives online and how there is a new breed of digital raiders (that's my term) who are getting rich off all of our online backs. Nick is very quick to pile on the examples of the negative impact of our new connected world, but somehow manages to miss almost all of the examples of how this new connected world improves millions of lives socially and economically.

It's my non-expert opinion that "The Big Switch" is really a collection of unfinished essays that were put together to make a book. Buy the book for Part 1, but stick around and read Part 2. If anything, Part 2 will give you great material to stir up some sensationalistic conversation at your next cocktail party.

Wednesday, January 30, 2008

Beware of the forklift

The idea of the "forklift" upgrade/replacement is an attractive option for high volume hardware environments. The idea of adding or moving things as one pre-configured mass is compelling, but shouldn't be applied overzealously to other domains... especially application deployment.

I remember when word first got out that Google just tossed servers in the trash on the first sign of failure. Then it was news when it leaked out that Google had pre-configured units of infrastructure - servers, switches, racks, etc. - that they just "popped" into place when new capacity was needed. Many found this new paradigm shocking, but it did shine an early light on the forklift concept.

Jump forward to today and the forklift idea has matured even further. Sun is now shipping the Modular Datacenter (formerly known by its sexier name, Project Blackbox), a cargo container with an entire pre-configured datacenter inside. Add power, connectivity, and water (for cooling) and the thing hums. Need more capacity? The truck arrives with another unit.

But the forklift paradigm isn't limited to just hardware these days. With the rise of virtualization, these same concepts are being applied to the software domain. Check out this recent press release from VMware. They are advocating the forklift model for production deployment: tinker with an image until you like it, snapshot it, copy it a bunch of times, and then move the whole lot of server images en masse to the production environment. It sounds like a reasonable theory, but in practice this often leads to more problems than it solves.

"Snapshot and copy" (the forklift of the software world) works well for OS provisioning. But when it comes to application provisioning, it's just not so simple. The key to any deployment process is the configuration and coordination of actions across software components (which naturally means across OS instances, physical or virtual). This has to be done even if you've utilized the snapshot and copy approach. This means you are still writing your own complex scripts or doing a lot of actions by hand.

I've seen several situations lately where the promise of virtualization and its snapshot and copy paradigm has been embraced by an organization, but in time they've only found themselves with a more complex and difficult environment to manage. What worked well in the small scale eventually ends up creating the Service Monolith hairball that Alex was discussing in his previous post. The primary culprit? The idea that the snapshot and copy paradigm replaces the need for tooling that allows you to build any OS or integrated application from a specification-driven, fully automated process.
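As a minimal sketch of what "specification driven" could mean for the coordination problem, here is a bit of Python that derives the start and stop order of components from a declared dependency map rather than from knowledge baked into an image. The component names and the map itself are hypothetical, and a real deployment tool would of course track much more than ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical specification: each component maps to the components
# that must be running before it can start.
DEPENDS_ON = {
    "database": set(),
    "app-server": {"database"},
    "web-server": {"app-server"},
}


def deployment_plan(depends_on):
    """Return (start_order, stop_order) derived from the specification."""
    # static_order() yields each component after all of its dependencies.
    start_order = list(TopologicalSorter(depends_on).static_order())
    # Stopping is done in the reverse order of starting.
    stop_order = list(reversed(start_order))
    return start_order, stop_order


start, stop = deployment_plan(DEPENDS_ON)
print("start:", start)
print("stop: ", stop)
```

The point is that a stack of identical image copies carries none of this information. The ordering has to live somewhere, and a specification is easier to review and version than a hand-written script.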

Don't get me wrong, I'm a big fan of virtualization. We use it extensively ourselves and build it into solutions for our clients. It solves a lot of problems, just not all problems. So remember, when designing your application deployment process, bring the rest of your tools along with your forklift.

Tuesday, January 29, 2008

Service Monolith: Proposed OMC Anti-Pattern #2

Most software services developed and operated these days are increasingly heterogeneous and distributed, and therefore complex and hard to set up and manage. On top of all this, the rate of change seems to only go up. These trends can lead to an anti-pattern I've recently documented at the Open Management Consortium called "Service Monolith":

Complex integrated software systems end up being maintained as a single opaque mass with no-one understanding entirely how it was put together, or of what elements it is comprised, and how they interact.

I describe some common rationales for using this anti-pattern and some of its consequences. There are positive design patterns to avoid or mitigate the anti-pattern, too. I find one interesting solution that seems to be becoming more ubiquitous and sort of sidesteps the deeper problem: virtualization. Virtualization is one strategy for dealing with producing these complicated systems. Using this strategy you get the whole conglomeration working on one host, cobbling the pieces together using the preferred recipe, then "freeze dry" it as an operating system image that you can instantiate on another host. With the image in hand an organization can "stamp it out" as needed. This technique seems to work for small scale deployments but I haven't seen the approach work for maintaining large scale environments. Is this the sign of another emerging anti-pattern?

Edit (Damon 1/31/08):
This is a great quote by Kris Buytaert on his first booting of a vmware instance:
"And thus we joined the era of transferring an unmanagable image that everyone will copy around wile slightly modifying things and never placing them in version control . hence ending up one day with something nobody knows how we got there."

Wednesday, January 23, 2008

Configuration Bird Nest: Proposed OMC Anti-Pattern

Sometimes it is more interesting (and entertaining) to talk about things not to do. Let's face it, we have more experience doing things wrong and learning the hard way. For these reasons I decided to begin the OMC's Design Pattern work with Anti-Patterns.

I had a clear "favorite" in mind when it came to writing the first anti-pattern. I call it Configuration Bird Nest:

A network of circuitous indirections that are used to manage configuration and that seem to intertwine like a labyrinth of straw in a bird nest. People often construct a bird nest in order to provide a consistent location for an external dependency.

The bird nest metaphor seems so apropos as it conveys the idea that the nest cradles something important, typically something crucial to supporting an application. The pattern is so typical and manifests itself inside so many IT disciplines, I look forward to hearing about its many forms.
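To illustrate, here is a small and entirely hypothetical Python sketch of what such a nest can look like: a configuration table in which most values are just pointers to other keys. The key names, the "@" reference convention, and the database URL are all made up for the example and are not taken from the OMC write-up.

```python
# Hypothetical configuration entries: a value starting with "@" is a
# reference to another key rather than a real setting.
CONFIG = {
    "app.db.url": "@env.db.url",
    "env.db.url": "@shared.db.primary",
    "shared.db.primary": "@dc1.db.vip",
    "dc1.db.vip": "jdbc:postgresql://db01.example.com/orders",
}


def resolve(key, config):
    """Follow references until a real value is found.

    Returns (value, path), where path lists every key visited.
    """
    seen = []
    value = config[key]
    while value.startswith("@"):
        if key in seen:
            raise ValueError("circular reference: " + " -> ".join(seen + [key]))
        seen.append(key)
        key = value[1:]  # strip the "@" to get the next key
        value = config[key]
    return value, seen + [key]


value, path = resolve("app.db.url", CONFIG)
print(" -> ".join(path))
print(value)
```

Each hop was probably added for a sensible reason, such as giving the application one stable name for an external dependency, but anyone who needs to know where the application really points has to walk the whole chain.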

The (unscientific) results are in... ESM = Monitoring

So I'm sure there was a wise person who was quoted as saying something like "If you ask people what is important they will tell you one thing. But observe their behavior and you'll often learn something very different". I had a bit of an experience with this principle at BarCampESM in Austin this weekend.

Ask any of the attendees (and there was decently broad representation from big vendors, little vendors, and independent consultants) what "Enterprise Systems Management" (ESM) is all about and you get some great answers about topics like business service management, resource optimization, system control, business alignment, etc.

But as I watched the presentations and joined the lively discussions, there was one constant topic from which all other topics flowed... good old monitoring. It quickly became obvious to me that, if you boil things down to their core, the term "ESM" is all about monitoring. Everything else is an add on or an upsell! It makes sense on some level since, like the old adage says, "you can't manage what you can't see". But it still came as a bit of a surprise how much monitoring was the end point, if not the starting point, of most trains of thought.

For better or for worse, it appears that today's common world view goes something like this: How do you do Business Service Management? Correlation of monitoring events. Control? That's what you do in reaction to monitoring events. Resource optimization? That's when you set up your monitoring to gauge how well you run things. Business alignment? That's when you make sure the output of your monitoring tool is organized according to business concerns.

Monitoring. Monitoring. Monitoring. If you aren't a monitoring tool's "goesinta" (input) or "goesouta" (output) you really aren't ready to fit into today's ESM ecosystem. Hey, I'm not saying that there is anything intrinsically wrong with that. It's simply something about which we have to be honest with ourselves.

Now, if you would excuse me, I have to go integrate with some monitoring tools.

Wednesday, January 16, 2008

Nick Carr's "The Big Switch" - Review of Part 1

I just finished reading Part 1 of Nick Carr's new book, The Big Switch. Let me just say that I'm both excited and disappointed thus far.

First, the excited part...
In my opinion, Nick Carr is a great writer. His way of explaining the future by retelling the past from an enlightened point of view reminds me a lot of George Gilder's seminal Telecosm essays. Both make for a fascinating read. Part 1 is really a retelling of how electrical power generation went from being a custom piece of local infrastructure to a commodity delivered almost exclusively through a variable-rate utility grid. Through some very accessible storytelling, Nick makes an obvious case for how the same economic factors that drove "the big switch" for the electrical power industry are about to hit the IT industry. If you know someone who can't clearly see the SaaS/cloud/outsourcing writing on the wall... give them this book. The economic argument is highly compelling.

Now for the disappointed part...
The vast majority of this first part is about the electrical power industry. While it makes for a compelling argument for a certain economic model, it doesn't really say what the impact is going to be or what the response should be, other than the fact that (1) most hardware vendors and on-premises software vendors are toast and (2) everyone had better quickly figure out how to plug into the new grid. I thought that Part 2 was going to cover this... but nope. I skimmed through Part 2 and it looks like it is all about what the impact of the new grid will be on our personal lives, government, etc. It feels like I'm watching a compelling PBS miniseries and I missed an episode. Perhaps this book should have been called "Warning: a big switch is coming".

The other disappointing part is how he glosses over the fact that while computer cycles are a commodity, computing services are not. Electricity is electricity but not all computing needs are the same. You can't just pick AC or DC and a voltage and tell everyone to fall in line. Nick jumps a little too easily between things like Google Search, Amazon Web Services, and Salesforce.com. Those are all very different things and, aside from the need for power, computing cycles, and bandwidth, they all have very different technical requirements. While power, computing cycles, and bandwidth are commodities that can be grid delivered just like electricity, the various levels of computing services that can be delivered on top of them are not. As I've said before, manufacturing is a much better model to study for how those services are going to play out.

But all in all, this has been a fun read thus far. For the uninitiated masses who don't really know what we do in the IT trenches all day long, this book should be a startling wake up call for how fast the world is going to change.

I'm fired up to read Part 2 as soon as I get a chance. I'll be sure to post my $0.02 when I'm done.

BarCampESM... the start of something interesting?

A couple of us from ControlTier are headed to Austin for BarCampESM. It promises to be an interesting gathering of scrappy open source and big closed source ("but oh we so badly want to be viewed as 'open'") vendors all focused on the systems management space. The idea for this gathering came from the Open Management Consortium and it is being organized by folks from BMC, Zenoss, and Zabovo (including the tasty sounding free food). This is the first BarCamp of this type so where this all leads is anyone's guess.

For those unfamiliar with the BarCamp concept, it can best be described as a self-forming "unconference". The idea is to encourage as much brainstorming and networking as possible. While sessions are proposed on a wiki ahead of time, much of the actual structure forms at the start of the conference and while it progresses. Oh and they are free and open to all.

A fun side observation:
Check out these videos of past general technology BarCamps in San Francisco and Austin. I guess stereotypes about cities exist for a reason!

Monday, January 14, 2008

New IT Management / ESM podcast is worth a listen

I just listened to Redmonk's latest podcast, "IT Management":
http://www.redmonk.com/cote/2008/01/11/it-management-podcast-001-barcampesm-monitoring-the-cloud-2008-predictions-and-more/

The hosts are Redmonk's Michael Cote and independent consultant John Willis.

This podcast shows real potential. Hopefully they'll feature other voices on the show as it progresses and avoid the all too common trap of just rehashing big vendor press releases (especially given John's deep immersion in the Tivoli universe). If they deliver on the promise of doing real analysis on the day-to-day reality of the IT Management / Enterprise Systems Management world... this is going to be a great addition to all of our iPods.

The "what is and isn't SaaS" debate comes up again

This question seems to pop back up every couple of months:

Does a SaaS relationship have to be between two different companies or can it be between internal IT and business users within the same company?

Now it's Todd Biske and Joe McKendrick re-igniting the debate.

My view hasn't changed much since the last time this went around. SaaS formalizes and standardizes what business and IT leaders have been discussing for years. The business agrees to specific business requirements and to a minimum platform (in most cases, a standard browser). IT (or an external provider) agrees to meet those requirements with a minimum SLA. How the bill is paid or what corporate entity owns the datacenter are secondary logistical concerns and not central to grasping the importance of what the SaaS model brings to the enterprise.

Wednesday, January 9, 2008

Nick Carr's "The Big Switch"... the big dream?

Vinnie, The Deal Architect, has an interesting early review of The Big Switch, the latest book from Nick "IT Doesn't Matter" Carr. The book sounds like an interesting read and I just ordered a copy from Amazon.

Vinnie has a great take on how the reality of utility computing just doesn't match up with the dream the pundits are selling.

My comment on his blog sums up my $0.02 on the matter:

But I would add to your analysis that the root problem isn't scale. The problem is that the business visions have jumped light years ahead of their internal capability to "deliver". Simply put, the technical tooling and technical processes are woefully inadequate. Poke around in how the large outsourcers, managed services providers, or even large e-commerce and SaaS providers manage their infrastructure and applications and I think you'll be shocked at how manual and ad-hoc things really are.

A good metaphor to use is manufacturing. The business minds behind the utility computing push are talking about things that are the equivalent to "mass customization" and "just in time delivery" while the technology and process model available to deliver those dreams is little more than the master craftsman and apprentice model of the pre-Ford Motors days (or maybe an early Ford assembly line, to be fair).

There are some interesting things under the radar in the open source community like ControlTier (plug) and Puppet, but the general interest in the problem space seems to be limited to the relatively limited pool of engineers who have tried to scale significant operations and know that a better way is out there. Unfortunately most of the technical fanfare in this area seems to be focused around "sexier" things like faster grid fabrics and hardware vendor wars. In general, automating and optimizing technical operations is a neglected field.

And for the time being, forget about help from the big 4 systems management vendors. Their state of the art is not much more than 15 year old Desktop/LAN management technology wrapped with a new marketing veneer.

So this is a problem that isn't going away soon and is a real impediment to all who don't have high profit margins or large pools of cheap labor to throw at the problem.

Monday, January 7, 2008

"What does Bob want?" - an amusing lesson about figuring out what actually matters to the business

This is a great episode of Redmonk's People Over Process podcast:
http://redmonk.com/cote/2007/11/02/open-source-in-it-management-with-john-willis-redmonk-radio-44/

For anyone interested in systems management or automating operations this one is not to be missed. The interview is with John Willis (master independent Tivoli consultant) on the state of the enterprise systems management world.

The most impressive part is John's retelling of his conference favorite "What does Bob want?" story. This modern business fable (based on a true story) really should strike a nerve in anyone who has been involved in systems management implementations. We've all heard terms like "business and IT alignment"... but how often does it really happen? What may seem like a success to the guys in the trenches will seem like a letdown (at best) or failure (at worst) to the business leader who signed the check.

Or as the interviewer, Coté from Redmonk, puts it:
This is about "understanding what it is that the company wants to accomplish with the software, not just making the software do what it does"

Thursday, January 3, 2008

Teaming with the Open Management Consortium on a Software Operations Design Pattern Repository

After Alex's post yesterday on the need for design patterns, he contacted the Open Management Consortium (OMC) about setting up a Design Pattern Repository specifically for those who are creating Operations solutions.

Whurley (fearless leader of the OMC) liked the idea:

"Well, as you all know this is exactly how we want the OMC to operate; community lead. So we have created a new workspace under the "Open Standards" section of the website called "OMC Design Patterns". Thanks to ahonor for the idea and for volunteering to kick things off and help manage the workspace. You can link directly to the workspace (from your blog or other sites) using the following URL:

http://beta.openmanagement.org/community/open_standards/omc_design_patterns

It will be very interesting to see how much adoption this idea picks up. I for one will be participating heavily in the workspace as ahonor has a great idea/perspective that I hope others join in support of."

Be sure to subscribe to that section of the OMC site and join in the discussion.

Wednesday, January 2, 2008

Where are the design patterns for software operations?

In the world of software development, application developers are accustomed to drawing from the wealth of design patterns that address common programming problems, codify best practices, and establish proven reusable solutions. There are several well known design pattern repositories that catalog solutions into various categories from fundamental ones described by the GangOfFour to architecture specific ones like J2EE Patterns, even ones for social organization. An Anti-pattern is a pattern that tells how to go from a problem to a bad solution. Design patterns help avoid re-inventing solutions and when combined together can form the basis of a problem solving "play book." When used effectively, design patterns become a common problem solving language and can lead to better written software.

But what happens after the code is written? For most organizations today, software operations - the acts of deploying, configuring, and operating software (and all of its related code and data artifacts) - is arguably as important as writing the software itself. If such an organization can't efficiently and reliably operate the software, the quality of the software will not matter. But if one looks for design patterns that codify best practices for automating software operations, nothing turns up. Where is the catalog of design patterns that address the problems encountered when managing environments of software deployments and the overall life cycle of the business service?

Anyone who has managed software operations for different organizations will recognize the same kinds of problems and will often re-invent solutions that were successful in the past. Others who work closer to the bleeding edge will encounter problems that other groups will face later. If these problems could be discussed in terms of design patterns (or failures as anti-patterns), solutions and best practices for managing software operations would be more consistent across organizations.

Here are two specific problem areas that everyone can identify with:

Packages: Depending on the application and infrastructure, one will find multiple package formats in use. Operating systems use their own (eg, .rpm, .deb, .pkg, .msi, etc) and so do software runtime environments (eg, java, .net). Each format has its own way (to greater or lesser extents) of being created, extracted, and described (including dependencies). These differences lead to multiple package silos and administrative gray areas (cumbersome handoffs between dev and admin groups). It would be preferable to have a common repository that can host any kind of package type, and a homogeneous interface for controlling their life cycle (creation, installation and removal).
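As a rough illustration of what a homogeneous interface over package formats might look like, here is a minimal Python sketch. The class and function names are hypothetical, only two formats are covered, and the sketch merely builds the command lines (the basic install and erase/remove options of rpm and dpkg) without running them.

```python
from abc import ABC, abstractmethod


class PackageFormat(ABC):
    """Hypothetical common interface over format-specific tools."""

    @abstractmethod
    def install_command(self, path):
        """Return the command line that installs the package file at path."""

    @abstractmethod
    def remove_command(self, name):
        """Return the command line that removes the installed package name."""


class Rpm(PackageFormat):
    def install_command(self, path):
        return ["rpm", "-i", path]

    def remove_command(self, name):
        return ["rpm", "-e", name]


class Deb(PackageFormat):
    def install_command(self, path):
        return ["dpkg", "-i", path]

    def remove_command(self, name):
        return ["dpkg", "-r", name]


# Registry keyed by file suffix; other formats would be added here.
FORMATS = {".rpm": Rpm(), ".deb": Deb()}


def format_for(path):
    """Pick the handler for a package file based on its suffix."""
    for suffix, fmt in FORMATS.items():
        if path.endswith(suffix):
            return fmt
    raise ValueError(f"unsupported package type: {path}")


for pkg in ("myapp-1.0.rpm", "myapp_1.0.deb"):
    print(format_for(pkg).install_command(pkg))
```

The caller never needs to know which tool sits underneath, which is the property a common repository would depend on.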

Services: At a certain level, one can view applications as a set of interacting long running processes. Again, depending on the application architecture, these processes might be standalone unix-style daemons, or windows services. Each service has its own way of being started or stopped, as well as a procedure for checking its current runtime state. Oftentimes, shutting down a service is not a simple matter of just invoking a single command. Things go wrong at shutdown, requiring other logic to figure out the next course of action. Besides coping with these differences, the deployment process is also difficult because change of runtime state and software package installation are intertwined. Software operations would benefit from a body of design patterns that described proven strategies for managing runtime state and a common model for describing these states.
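The shutdown problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Service class is a hypothetical in-memory stand-in for a real daemon or Windows service, the two states ("running" and "stopped") are a deliberately tiny state model, and the escalation policy (ask, poll, then force) is just one possible strategy.

```python
import time


class Service:
    """Hypothetical stand-in for a daemon or Windows service."""

    def __init__(self, name, stops_gracefully=True):
        self.name = name
        self.state = "running"
        self._stops_gracefully = stops_gracefully

    def status(self):
        return self.state

    def request_stop(self):
        # A misbehaving service simply ignores the polite request.
        if self._stops_gracefully:
            self.state = "stopped"

    def force_stop(self):
        self.state = "stopped"


def stop(service, attempts=3, delay=0.0):
    """Ask politely, poll the state, then escalate.

    Returns the list of actions that were needed.
    """
    actions = []
    if service.status() == "stopped":
        return actions  # nothing to do
    service.request_stop()
    actions.append("request_stop")
    for _ in range(attempts):
        if service.status() == "stopped":
            return actions
        time.sleep(delay)
    # The graceful path failed, so escalate.
    service.force_stop()
    actions.append("force_stop")
    if service.status() != "stopped":
        raise RuntimeError(f"{service.name} did not stop")
    return actions


print(stop(Service("well-behaved")))
print(stop(Service("stubborn", stops_gracefully=False)))
```

Even this toy version needs a status check, a retry loop, and a fallback, which is why a single stop command is rarely the whole story.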

Here is a sampling of general recurring problems in the world of software operations:

Complex application deployments: Applications are based on technologies from different vendors, are spread out over numerous machines in multiple environments, and use different architectures.

Inconsistent management interfaces: Every application component and supporting piece of infrastructure has a different way of being managed. This includes both how components are controlled and how they are configured.

Hard to scale administrative management: As the layers of software components increase, so does the difficulty of coordinating actions across them. This is especially difficult when the same application can be set up to run in a minimal footprint in one place and designed to support massive load and redundancy in another.

Incoherent life cycles: Applications are typically multi-tiered, where each tier may be on its own development track and use its own release paradigm and requisite tools.

Generally, these problems are found in combination which means coping with them on the whole is a difficult challenge.

What's needed: Domain specific patterns for software operations

The body of existing design patterns can and should be used to analyze and solve some of the above problems. To make the design patterns more readily useful to software operations, we need a set of domain specific patterns. These patterns would be expressed in terms of concepts familiar to software operations groups (eg, package, service, process, node, etc) and would be geared to coping with typical problems they face (eg, various startup, shutdown strategies for services among many others). Ideally, these patterns can be composed into a system of patterns that help solve larger scale problems.

Developing patterns is a bit of an organic process but the most durable patterns are ones that have been proven over and over again in different contexts. The first step is to establish a repository to which various patterns can be contributed and a supporting forum where their merits can be discussed. Ultimately, the software operations community will find consensus about some of these patterns, thus establishing some common vocabulary and a basis for framework development.

External links:
PortlandPatternRepository
Hillside