The Technology Directors' Office: March 2010

Sunday, March 28, 2010

From Heroes to Process: The IT Growth Challenge

I recently finished reading How to Castrate a Bull by Dave Hitz, one of the founders of NetApp. One concept in particular stuck in my head: as companies get larger, they naturally gravitate away from needing heroes and toward needing more process. Dave tells the story of how, during the early years of NetApp, one support engineer went above and beyond to satisfy the needs of a customer. The support engineer took a call late in the evening and determined that the customer's NetApp unit needed to be replaced. The engineer went into manufacturing, took a new unit off the line, hopped a flight to the customer's location and worked all night to install the new unit. Once word got back to Dave about the engineer's heroics, the engineer was nowhere to be found. It turns out the engineer was asleep in the customer's parking lot, inside the van he had rented. Dave points out that at the time this engineer was hailed as a hero, but that now he hopes this kind of thing never happens again.

Why? When NetApp was a small company, its customer base was generally small organizations with very little gear. They were very nimble themselves and acted on failures in their enterprise very swiftly. The catch is that, given their small size and the small amount of IT gear in their enterprise, they usually did not encounter a large number of failures. When failures are rare, organizations can afford to rely on heroes to step up on those rare occasions when duty calls. As enterprises get larger, the amount of IT gear and the complexity of the environment that gear supports also grow. Failures become more commonplace, and the degree to which you can rely on heroes diminishes. For example, for simplicity, say a piece of IT gear fails on average once every 365 days. A small organization with only one of these devices can expect a failure once a year. A larger organization with 365 of these devices can expect, on average, a failure every day!

Larger IT enterprises need process to deal with these repeated occurrences, not heroes to step up on rare occasions. Heroes come and go; process is permanent. Dave's point is that the same behavior that won the engineer and NetApp accolades from the small customer years ago would likely lose business in a large enterprise.
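
The arithmetic behind the 365-device example can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes every device fails independently at the same average rate, and the function names are mine, not anything from Dave's book.

```python
def expected_failures_per_day(device_count: int, days_between_failures: float) -> float:
    """Average failures per day across a fleet of identical, independent devices."""
    if days_between_failures <= 0:
        raise ValueError("days_between_failures must be positive")
    return device_count / days_between_failures


def days_between_fleet_failures(device_count: int, days_between_failures: float) -> float:
    """Average number of days between failures somewhere in the fleet."""
    if device_count <= 0:
        raise ValueError("device_count must be positive")
    return days_between_failures / device_count


if __name__ == "__main__":
    # Fleet sizes other than 1 and 365 are illustrative only.
    for fleet_size in (1, 50, 365):
        per_day = expected_failures_per_day(fleet_size, 365)
        gap = days_between_fleet_failures(fleet_size, 365)
        print(f"{fleet_size:>3} devices: {per_day:.3f} failures/day, one every {gap:.1f} days")
```

Plugging in your own device counts makes it easier to see roughly where your organization sits between "rare event" and "daily routine".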

This evolution from needing heroes to needing process is one of the most subtle yet important changes an IT leader must make as his or her enterprise grows larger and more complex. Tearing down technical fiefdoms, redefining reward systems, purposefully slowing down to gain control and potentially even making staffing adjustments are a daunting set of goals. This evolution inside of IT is a natural part of organizational growth and, if handled correctly, can be a very exciting managerial challenge.

Saturday, March 20, 2010

Considerations for Implementing SIP

In a recent Computerworld article I discussed how the real cost savings associated with VoIP in the corporate enterprise come from the implementation of Session Initiation Protocol (SIP). With most organizations paying very low long distance rates, justifying VoIP investments on long distance savings becomes a real challenge. SIP, on the other hand, allows you to get rid of local loop charges by reducing the number of PRIs you have dedicated to voice at remote and field locations. There are big savings to be had in SIP, but in order to understand what your net savings will be you need to think through the following basic factors.

SIP Provider Pricing

Unlike traditional TDM services, pricing for SIP is far less standardized. Different carriers will charge for the service in different ways. For example, PAETEC charges a flat fee for the bandwidth of the MPLS connection used to provide the SIP service plus $1.50 per "virtual DID". On the other hand, Time Warner Telecom charges more on an actual call volume basis. There really is no "best" way for a carrier to price its SIP service; what works best for you will depend on the dynamics of your particular organization. Make sure you understand the pricing scheme of the SIP providers you are evaluating and match that model to your organization's typical call dynamics.
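
To make the comparison concrete, here is a rough Python sketch of the two pricing styles described above. Only the $1.50 per virtual DID figure comes from the example in this post; the bandwidth fee, per-minute rate, DID count and call minutes are made-up placeholders, so substitute the numbers from your own quotes.

```python
def flat_plus_did_cost(bandwidth_fee: float, did_count: int, per_did_fee: float = 1.50) -> float:
    """Monthly cost under a flat bandwidth fee plus a per-virtual-DID charge."""
    return bandwidth_fee + did_count * per_did_fee


def usage_based_cost(monthly_minutes: float, per_minute_rate: float) -> float:
    """Monthly cost under a simplified call-volume model (rate is hypothetical)."""
    return monthly_minutes * per_minute_rate


def break_even_minutes(bandwidth_fee: float, did_count: int,
                       per_minute_rate: float, per_did_fee: float = 1.50) -> float:
    """Monthly minutes above which the flat model becomes the cheaper one."""
    return flat_plus_did_cost(bandwidth_fee, did_count, per_did_fee) / per_minute_rate


if __name__ == "__main__":
    # All inputs below are illustrative assumptions, not carrier quotes.
    flat = flat_plus_did_cost(bandwidth_fee=1000.0, did_count=200)
    usage = usage_based_cost(monthly_minutes=40_000, per_minute_rate=0.02)
    print(f"flat + DID: ${flat:,.2f}  usage-based: ${usage:,.2f}")
    print(f"break-even: {break_even_minutes(1000.0, 200, 0.02):,.0f} minutes/month")
```

A site with heavy call volume tends to favor the flat model, while a lightly used site may do better paying by usage.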

Understand Where You Can Truly Remove Services

The implementation of SIP in your network is not the end of your PRI-based TDM service. Several factors will likely limit your ability to remove PRIs in certain areas. For example, make sure you understand for which of your remote locations your SIP provider can provide virtual DIDs. Not every SIP provider can provide local numbers in all markets; for those markets not covered by your SIP provider, you will need to keep your PRI service in order to keep that location's local DIDs. Also, consider services like faxing that may be riding on PRIs through FXO ports today. What will you do about this? How many local lines will you need to maintain for systems such as building security, fire alarms, etc.? It may be that the need to support an antiquated technology such as faxing (my opinion) hampers your ability to leverage modern network designs for financial gain. You may find that your SIP implementation is a good time to rationalize the continued support of older communications mediums.

How Much Bandwidth Will You Need At Remote Sites?

Your current data link at each of your remote sites is running at some percent of utilization today. What will that be once you add voice traffic over the circuit? You need to make some estimates about a location's call volume, combined with the type of voice compression you plan to use, in order to determine whether you will need to increase the bandwidth on that location's data pipe. This is significant because you could find yourself simply shifting costs around, from PRI to data circuit. For SIP to represent a true cost savings at your field site, make sure the overall net spend decreases after accounting for PRI removal, additional services for local lines and any increase in data bandwidth.
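
As a rough illustration of that estimate, the Python sketch below works out per-call bandwidth from the codec bit rate plus packet headers and then projects link utilization. It assumes 20 ms packetization, 40 bytes of IP/UDP/RTP headers and 18 bytes of Ethernet overhead per packet; your WAN encapsulation and codec settings may differ, and the site figures in the example are invented.

```python
# Nominal codec bit rates in bits per second.
CODEC_BITRATE_BPS = {"G.711": 64_000, "G.729": 8_000}
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def per_call_bandwidth_bps(codec: str, packet_ms: int = 20,
                           l2_overhead_bytes: int = 18) -> float:
    """One-way bandwidth of a single call, including header overhead.

    l2_overhead_bytes=18 assumes Ethernet framing; adjust for your link type.
    """
    payload_bytes = CODEC_BITRATE_BPS[codec] * packet_ms / 1000 / 8
    packets_per_second = 1000 / packet_ms
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + l2_overhead_bytes
    return packet_bytes * 8 * packets_per_second


def projected_utilization(link_bps: float, current_utilization: float,
                          concurrent_calls: int, codec: str) -> float:
    """Fraction of the link in use once peak voice traffic is added."""
    voice_bps = concurrent_calls * per_call_bandwidth_bps(codec)
    return (current_utilization * link_bps + voice_bps) / link_bps


def net_monthly_savings(pri_removed: float, local_lines_added: float,
                        bandwidth_added: float) -> float:
    """Positive means the site really saves money after all the shifts."""
    return pri_removed - local_lines_added - bandwidth_added


if __name__ == "__main__":
    # Hypothetical site: 1.544 Mbps link, 40% busy today, 6 calls at peak.
    for codec in CODEC_BITRATE_BPS:
        util = projected_utilization(1_544_000, 0.40, 6, codec)
        print(f"{codec}: {per_call_bandwidth_bps(codec) / 1000:.1f} kbps/call, "
              f"projected utilization {util:.0%}")
```

If the projected utilization gets uncomfortably high, price the bandwidth upgrade and feed it into net_monthly_savings along with the local lines you must keep before you count the PRI removal as a win.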

Understand the Best Technical Architecture for your Organization

The technical details of a network design for SIP can be daunting. At the end of the day, there are really two major types of designs you will likely choose from. In a centralized model, all remote locations receive their SIP service, and thus their call trunks, through a centralized connection to the SIP provider. The SIP provider will deliver some type of connection to your corporate HQ or data center, where you will connect to them through a CUBE (Cisco Unified Border Element). You can see an image of this here.

A distributed model requires each location to peer with the SIP provider. The distributed model provides a slightly more robust design in terms of backup and redundancy, but at a higher CAPEX and OPEX cost. You can see the distributed model diagram here.

This is not an exhaustive list of considerations, but thinking through these things will help you to solidify your thinking on both the technical and financial benefits of SIP in your organization. I recommend that, for evaluating SIP providers and doing your cost/benefit analysis, you use a third party who specializes in telco analysis. To make good decisions about SIP, it is critical that you understand your current state. The SIP market is still young and services are still often tailored to your specific needs. Make sure you truly understand the net of it all before jumping in.

Sunday, March 14, 2010

Zen and the Art of Converged and Efficient Data Centers

What a few weeks it has been. Over the last month I have been fortunate enough to meet with the CTO from Frito-Lay and the CTO from NetApp, attend a joint SAP and NetApp executive briefing at SAP’s North American HQ in Philadelphia, and tour two world-class IT support centers (PepsiCo in Dallas and Dimension Data in Boston). I have spent days poring over technical documentation geared towards architecting my organization’s next-generation data center, centered around 10Gb Ethernet, virtualization, blade systems and efficient energy practices. So this post is probably as much for myself as anyone else, meant simply to document a few of the key learnings I have taken away from the flurry of activity over the last few weeks. Hey, maybe someone else will find it interesting too?

PUE & The Green Grid

In a one-on-one conversation with Dave Robbins, the CTO of NetApp Information Technology, he asked what my data center space provider's PUE is. My response was an inquisitive "What?" PUE stands for Power Usage Effectiveness and is a measure of how effectively a data center is using its energy resources. Essentially, PUE is the total amount of electricity used by a data center (the IT load plus cooling, mechanical and other overhead) divided by the actual IT load, so a perfect score would be 1.0. Efficient data centers run at around 1.6. The concept of PUE and its measurement was created by an organization known as The Green Grid, and you can find all kinds of great resources at their web site. This is an excellent tool for you to use when negotiating power costs with a hosting provider. You should know their PUE and insist that you will not pay for their inefficiency. You can also find a cool tool for PUE calculation at 42U.com.
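
For anyone who wants to run the numbers themselves, here is a small Python sketch using The Green Grid's definition of PUE as total facility power divided by IT equipment power. The kilowatt figures in the example are invented for illustration, and the function names are my own.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("it_load_kw must be positive")
    return total_facility_kw / it_load_kw


def overhead_kw(it_load_kw: float, pue_value: float) -> float:
    """Power spent on cooling, power distribution and other non-IT overhead."""
    return it_load_kw * (pue_value - 1.0)


if __name__ == "__main__":
    # Hypothetical hosting provider: 100 kW of IT load, 210 kW at the meter.
    provider_pue = pue(total_facility_kw=210.0, it_load_kw=100.0)
    print(f"Provider PUE: {provider_pue:.2f}")

    # Extra overhead you would be paying for compared with a 1.6 facility.
    extra = overhead_kw(100.0, provider_pue) - overhead_kw(100.0, 1.6)
    print(f"Overhead beyond a 1.6 PUE facility: {extra:.1f} kW")
```

The second figure is the kind of number worth bringing to a power cost negotiation.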

It is Time to Converge

The introduction of 10Gb Ethernet in data centers (and, perhaps even more important, lossless Ethernet) has truly created an opportunity to collapse Ethernet and Fibre Channel networks in the data center backbone, cutting huge costs in Fibre Channel infrastructure. 10Gb Ethernet and lossless Ethernet serve as enablers for protocols such as FCoE (Fibre Channel over Ethernet) and FIP (FCoE Initialization Protocol), which allow Fibre Channel frames to be encapsulated and carried across Ethernet backbones. There are a few watch-outs when adopting FCoE that you need to be aware of. First, make sure your storage vendor has a CNA (Converged Network Adapter) that supports BOTH FCoE and other IP-based traffic. Some of the early “converged” adapters only support FCoE; not much real convergence there. Second, put some effort into understanding Cisco’s current support of FCoE and FIP in their Nexus line of switches. You will find some good resources here. The details are too complex for me to go into in this post, but suffice it to say you need to think long and hard about your data center switch layout in order to get full FCoE support across your 10Gb backbone. Also, remember that lossless Ethernet and data center bridging are keys to FCoE success but are fairly new. So, when you hear people tell you they knew someone who tried FCoE a couple of years ago but found it lacking, take it with a grain of salt.

The FUD around Cisco UCS

Let me get one thing out of the way up front: the Cisco Unified Computing System (UCS) is sexy. Cisco’s tight relationship with VMware, stateless computing and a seemingly end-to-end vision for the data center combine for a powerful allure. Competitors such as IBM and HP are quick to point out that their blade products perform the same functions as Cisco’s UCS but with a proven track record. In general, these claims are true. That said, I have been exposed to some competitive claims against the UCS that were simply meant to plant the seed of Fear, Uncertainty and Doubt (FUD) in the mind of technology managers. What if Cisco changes their chassis design, is your blade investment covered? UCS is meant for VMware only (not true). The list goes on.

I have been heavily comparing the Cisco UCS to IBM’s BladeCenter H series. I had originally convinced myself that the difference between these two offerings was all about the network. Cisco’s UCS does offer some interesting ways to scale across chassis and provides some great management tools. For a mid-sized organization, however, the ability to scale across chassis becomes less important when you can get a concentrated amount of compute power inside one or maybe two chassis. Some new technology coming from IBM in the form of their MAX5 blades is going to allow for some massive compute power inside a two-socket blade. If you are a large organization planning on adding many UCS chassis, the networking innovations in the UCS will likely fit your needs well. For a mid-sized company, consider getting more compute power inside fewer chassis by using some hefty blades. This not only reduces your need to scale across many chassis, it also helps lower your VMware costs. VMware is licensed by the socket, so fewer sockets with more cores, on blades with higher memory capacity, ultimately drive down your VMware licensing needs.

Also, before you completely convince yourself that the Cisco UCS has a stronghold on the networking space in the data center, spend some time understanding IBM’s Virtual Fabric technology. This offers similar features to the VIC cards in the Cisco UCS. The point is this: don’t be immediately sucked in by the sexy UCS. Cisco has come to the blade market with some cool innovation and, in some circumstances, it will be exactly what you need. Make the investment in time to really understand competing products. Avoid FUD in all directions.
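
The per-socket licensing point made earlier in this post is easy to check with a quick Python sketch. It assumes the simplest possible model, one VMware license per populated CPU socket with no other limits, and the core counts and workload size are invented for illustration rather than taken from any vendor's spec sheet.

```python
import math


def blades_needed(required_cores: int, sockets_per_blade: int, cores_per_socket: int) -> int:
    """Blades required to reach a target core count, rounded up to whole blades."""
    cores_per_blade = sockets_per_blade * cores_per_socket
    return math.ceil(required_cores / cores_per_blade)


def socket_licenses(blade_count: int, sockets_per_blade: int) -> int:
    """Licenses needed, assuming one license per populated socket."""
    return blade_count * sockets_per_blade


if __name__ == "__main__":
    required_cores = 96  # hypothetical workload
    for cores_per_socket in (4, 8):
        blades = blades_needed(required_cores, sockets_per_blade=2,
                               cores_per_socket=cores_per_socket)
        licenses = socket_licenses(blades, sockets_per_blade=2)
        print(f"{cores_per_socket} cores/socket: {blades} blades, {licenses} licenses")
```

Doubling the cores per socket halves both the blade count and the license count in this simplified model, which is the effect described above. Check your actual VMware agreement for any limits this sketch ignores.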