


 

Maintain a Resilient IT Infrastructure  
January 23, 2008 - By Bob Yang, Catherine Anderson and George Westerman 

When so many business processes that determine your success rely on technology, you'll want to make sure your IT team remains resilient and failure free. 
 
During the last decade, the increasing pace of business—and the growing dependence on IT for its operations—has forced organizations to invest in technology to manage and protect their critical information assets. But keeping today's businesses operational and resilient requires more than leading-edge technologies—it requires a significant and continual investment in the people and processes that operate and support these technologies.  
 
As dependence increases, the potential for an IT failure to disrupt business operations becomes a serious management concern. Organizations must find a way to reduce exposure to IT risks, decrease costs and build greater capacity for IT to drive business innovation. 
 
The Man Behind IT  
 
Despite all the leading-edge technology, it's ultimately people who are responsible for maintaining a resilient IT infrastructure and business continuity. Managing multiple job sites, upgrading or patching systems, and storing and securing data all take manpower, so the root cause of IT failure frequently lies in process and skills issues.
 
According to a recent study conducted by Symantec and researchers at the University of Maryland and MIT, 53 percent of IT failures were linked to process issues involving asset management, testing, change control and patching. In addition, more than 40 percent of IT failures analyzed were tied to gaps in end-user expertise and product knowledge.
 
Master the Basics  
 
The solution lies in establishing processes for these regular and routine tasks. Processes enable workers to treat all components the same way, reducing the effort and potential risk that would be entailed if each component were managed differently.
 
And, with today's often tumultuous workforce turnover, process is needed to fill in the gaps left in the knowledge base. In the event of an employee's permanent absence, or even a temporary one such as sickness or vacation, a lack of processes can prove devastating if information is not passed along accordingly.
 
A telecommunications carrier recently learned the value of processes the hard way. Without protocols in place for rotating and reusing backup tapes, the wrong backup tapes were erased and prepared for reuse. IT staff realized they had selected the wrong tapes only after the tapes were already cleared. Establishing and following set processes for rotating backup tapes would have saved the data, which was never recovered.
 
Processes also must be established for disseminating information across teams of all types. The size and geographic location of IT departments often impact the flow of communication; however, sharing best practices and lessons learned with cross-functional groups is vital for increasing productivity and eliminating further IT headaches.
 
For instance, take a healthcare provider with three major sites. After two sites were infected with a virus, alerting the third site of the pending danger would have prevented an infection from the same virus six months later. Why was information not shared? The answer: no communication processes were in place to share experiences and learning across locations. 
 
Reacting to the Unexpected  
 
Processes provide two key benefits to IT personnel responding to incidents. First, established processes leave behind an audit trail of changes and activities that can be referred to when determining the source of a crisis. Second, depending on the needs of each individual situation, personnel can customize pre-determined protocols instead of creating new ones on the fly, saving significant time, effort and potential for error.  
 
The processes define a checklist of critical tasks to be performed and questions to be asked, allowing people to focus their attention on identifying additional tasks rather than trying to remember all of the basics. When unexpected events occur, it's nice to know that certain standards will be kept and staff can spend time effectively addressing the most critical and unique elements of the problem.  
 
Recently, a financial institution rolled out a weekend upgrade to its cluster environment. As the rollout progressed, a configuration issue cropped up. Although the institution had processes in place to roll out an upgrade to its environment, there was no protocol to follow for an unsuccessful rollout. The problem was compounded because the key systems architect was on vacation at the time. Recognizing the potential for problems with the upgrade would have enabled the institution to better prepare for and respond to the issue by having the resources available to support problem resolution in a timely manner.
 
More than Words on a Page  
 
Even when processes are in place, organizations struggle with getting IT staff to follow established procedures. Unfortunately, pages of notes or thick binders with step-by-step processes for handling routine or crisis situations will not guarantee success.
 
For many IT departments, processes for handling change are either not comprehensive enough or organizations do not have the right pieces in place to keep them resilient. In fact, of the 53 percent of cases caused by process issues, 11 percent were due to poor execution rather than poor or missing processes.  
 
Although no process can adequately address all incidents, ITIL and Six Sigma practices provide solid starting frameworks and disciplines for implementing and reliably using processes in a variety of circumstances. Adopting such practices will also help mitigate many incidents.
 
While processes can play important roles in handling unexpected events and ensuring mistakes don’t happen, it's people who help ensure the right steps occur. For example, a recent study conducted by IDC showed that well-trained teams were twice as likely to properly protect their PCs from security threats and were 60 percent more likely to successfully complete backup jobs. With more than 40 percent of IT failures stemming from a lack of IT staff skill and training, the need for proper instruction is evident.
 
 
It All Comes Down to Culture  
 
Part of creating a resilient infrastructure is building a high-performance culture that can manage change effectively. In addition to training, holding IT staff to the highest operational standards, such as those held by other critical business operations within a company, will help streamline the implementation of proper procedures. Much like the manufacturing industry, which tolerates little or no downtime, IT organizations should strive to minimize their tolerance for downtime by adhering to stricter policies and procedures.
 
To make this paradigm shift successfully, organizations should do the following:
 
• Recognize the value and need for investing in training, certification and expertise amongst staff.  
 
• Provide a Six Sigma-like level of attention to IT operations around process definition, documentation, performance measurement, and continuous improvement.
 
• Focus on understanding the true root cause of issues rather than settling for convenient explanations, separating near-term incident management from longer-term problem management.
 
• Recognize warning signs and learn from near misses. Become preoccupied with small failures as a signal of deeper process or skills issues that should be addressed before larger failures occur. 
 
• Build a culture of resilience so that everyone in the organization can react appropriately when inevitable problems occur. 
 
Although there will never be a process for every situation, IT teams can eliminate the root cause of failures—and identify the cause of failures more easily—by establishing and following a standard set of protocols and equipping people with the knowledge to manage and adapt them properly. Only then can organizations build a culture and skill set that addresses the issues standard protocols cannot.  
 
 
Bob Yang, senior director, Symantec Services. Catherine Anderson, Smith School of Business, University of Maryland. George Westerman, Center for Information Systems Research, MIT Sloan School of Management.

 

