Friday, August 31, 2007

Dev sees the world one way while Ops sees it a different way

There is an old saying that if all you have is a hammer, everything begins to look like a nail.

Much of the dev2ops problem comes from Dev seeing the world one way while Ops sees it a different way. When it comes to developing, deploying, and supporting what the business owners would call an application, these two world views often spectacularly collide.

For the dev folks, the world is all about the build (and its related dependencies) along with data/content requirements (schema, standing data, catalog/user data, web assets, etc.). They see the application from the inside looking out, fully aware of its internal components and inner workings. The lifecycle they care about begins with business requirements, continues through a set of related builds, and then on to an integrated running service. Their goal is to then promote this service from one dev and test environment to another until it becomes someone else's problem (that's usually the ill-fated handoff to ops). Supporting, and in many ways enforcing, this point of view are the day-to-day Dev tools: software version control systems, build tools, IDEs, requirements management tools, etc.

For the ops folks, the world is all about the physical architecture, the network, and the box. Applications? Ops doesn't want to know about the inner workings of the applications. They see applications from the outside looking in, as just another set of files that are part of the puzzle they manage. Environments? Sets of distinct boxes wherever possible (sometimes divided by datacenter, VLAN, racks, etc.). The lifecycle Ops cares about takes a much different form than the one that Dev cares about. Quality of service, capacity, and availability are the driving factors for Ops. Ops establishes the needed hardware and software platform profiles in a lab setting and then uses those profiles to build and refresh the different environments they control (generally staging, production, and disaster recovery). The preferred Ops toolsets, usually EMS systems (Tivoli, Opsware, OpenView, OpenNMS, BladeLogic, etc.), support and enforce this point of view.

Unfortunately, aligning these two world views is just as difficult as aligning their respective toolsets.

It is all too common that users (and by extension, vendors) from both groups want to force their tools and point of view on the other group. But no matter which direction the management mandate ultimately comes from, the truth lies just beneath the surface. Dev wants nothing to do with the Ops tools and Ops wants nothing to do with the Dev tools. They see little value in the other group's tooling of choice because they think it will add complexity to their lives and won't help them complete their core jobs. Because of these dramatically differing points of view, the conversations in the trenches will often revolve around how it's "the other group's fault" and why "they just don't get it".

Fundamentally, the Dev and Ops groups are at odds because their primary concerns differ. Dev doesn't want to know about networks and boxes. In most situations Dev doesn't really care where something is deployed until they get a late-night phone call from Ops to debug a mysterious problem. Ops, on the other hand, just wants "operations ready" application releases and is frustrated that it never seems to get them.

Sadly, too much corporate blood has been shed over a collision of differing points of view that are both equally valid.

2 comments:

Alex said...

Application deployment is a good case that shows how the differing dev and ops viewpoints guide how each group applies its toolset to a problem.
A developer might envision using an SCM to store all the release artifacts, building an automated tool that checks them out (perhaps using a tagging convention tied to the release plan) to the target machines, and then using a build tool to finish the installation. They use SCM and build tools to conduct their daily work and understand their strengths, so it's no wonder they see them as useful outside of their typical context.
An ops person, on the other hand, may be accustomed to copying files from a central file server to the target machines using tools like rsync for distribution. They may script a utility that relies on the directory structure of an NFS server to maintain release sets and use sed and awk commands to customize files after distribution. Those are familiar tools and a familiar approach for an ops person.
These are just examples of how toolsets can reflect and enforce a viewpoint.

claymation said...

Dev and Ops certainly do see the world differently. Maybe it's too much to ask for the two groups to work together to support software releases when dev and ops have essentially opposite incentives: dev wants to release as many new features as it can, while ops strives to maintain stability and uptime.

Maybe what's needed is a mediator between the groups--a team that understands the software and the infrastructure. Google calls it Site Reliability Engineering, but I don't think the concept is limited to web shops.

If you're interested, I've outlined the case for SRE here:

http://daemons.net/~clay/2009/04/02/engineering-and-operations-bridging-the-divide/