Reducing Operational Costs and Improving IT Ops Productivity in a Post-COVID World

The COVID-19 pandemic caused a seismic shift in IT operations worldwide. It forced businesses to reevaluate ways to cut expenses, the most significant among them being operational costs. Traditionally, IT operations teams huddled in person, but the pandemic forced many teams to work remotely. Leaders had to operate with skeleton staff while keeping enterprises running 24×7 with virtually no downtime. In this article, we look at the effects of the pandemic on IT operations and the outcomes that can help enterprises implement cost-cutting measures effectively.

Effect #1: Leaner teams

Most jobs in network operations centers are contractual in nature. The first step organizations took in 2020 was to downsize the variable talent pool. Leaner teams were expected to perform a similar quantum of work, which in turn emphasized the need to move to greater automation in IT operations management (ITOM).

Challenges: Attracting the right talent, doing more with less and introducing automation in business processes to make do with smaller teams.

Outcome: Cutting back on non-discretionary resource costs

To help reduce the contractual operations workforce, there is growing interest in three kinds of tooling: intelligent tools for notification and escalation; artificial intelligence and machine learning (AI/ML) solutions that minimize alert fatigue; and automation of ticketing workflows via IT service management (ITSM) integrations. Further, some in-house work can be shifted to consultants and contractors on an as-needed basis, so the resource is no longer on the payroll after the project is complete.

Effect #2: Remote management

Businesses were forced to give up leased and rented office spaces to cut back on expenses. Skeleton staff were deployed in the remaining offices, with others moving to a work-from-home mode of operation almost overnight.

Challenges: The need for greater security; availability of the right talent with the required software and hardware resources; and the need for collaboration among geographically dispersed teams.

Outcome: The rise of DaaS and remote collaboration tools

One of the largest areas of cloud computing to see increased demand was desktop-as-a-service (DaaS), an inexpensive option for organizations looking to give their workers secure remote access to enterprise applications. Tool integrations for notifications and collaboration (like Slack and Trello) also rose to prominence.

Effect #3: Automated deployments

Organizations needed mechanisms for remote and automated deployments due to staff shortages and the absence of a centrally located workforce. This necessitated agile practices for breaking down organizational silos between software developers and IT operations personnel.

Challenges: Adoption and continued use of DevOps and agile practices, and increased automation in the application deployment and maintenance process.

Outcome: Moving towards 100% data center automation

Automation replaces labor costs with software and configuration costs. Dedicated automation architects can ensure that DevOps and agile practices are implemented across the enterprise. Data center automation (DCA) reduces the need for manual configuration, monitoring and maintenance tasks.

Effect #4: Revamping application infrastructure

Infrastructural cost-cutting measures included a review of organizational spend on dedicated hardware and an assessment of software solutions deployed in the enterprise to see if a switch to open source was possible. Virtualization, i.e., moving to the cloud (microservice and container-based architectures), reduces the number of physical servers required in the enterprise and can significantly reduce the cost of maintaining applications.

Challenges: Deployment of virtualization management systems, evaluating software tool replacement options and co-sourcing environment management to cut costs.

Outcome: Infrastructure as a service for intelligent scaling

Cost savings in cloud services have a real, immediate and perceptible cash impact. If organizations deploy a virtualization management system, they will enable faster adoption of cloud platforms. Co-sourcing environment management functions provides the added advantage of having the right talent managing the environment with technical know-how and service guarantees in place.

Moving to the cloud reduces capital expenditures for servers and related network equipment, transforming one-time capital costs into monthly operating expenses. Cloud providers can also provision additional resources like disk space, CPU, memory and communication lines faster and more cheaply than on-premises servers and infrastructure. Intelligent workload trend-based capacity forecasting can help provision resources accurately and avoid unnecessary expenditure.
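To illustrate what workload trend-based capacity forecasting can look like in its simplest form, here is a minimal Python sketch. It assumes evenly spaced historical usage samples and a straight-line trend, which is only one of many possible forecasting models; the function names and the headroom parameter are hypothetical and do not come from any particular product.

```python
from math import ceil


def fit_linear_trend(samples):
    """Least-squares slope and intercept for evenly spaced samples (x = 0, 1, 2, ...)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_capacity(samples, periods_ahead, headroom=0.2):
    """Project usage `periods_ahead` periods past the last sample and add headroom.

    `headroom` is an assumed safety margin (0.2 means 20% extra capacity).
    The result is rounded up to a whole unit and never goes below zero.
    """
    slope, intercept = fit_linear_trend(samples)
    future_x = len(samples) - 1 + periods_ahead
    projected = slope * future_x + intercept
    return ceil(max(projected, 0) * (1 + headroom))


# Example: monthly disk usage in GB, projected two months ahead.
monthly_disk_gb = [100, 110, 120, 130]
print(forecast_capacity(monthly_disk_gb, periods_ahead=2))
```

A real forecasting tool would also account for seasonality and sudden workload shifts; the straight-line fit is used here only because it keeps the idea easy to follow.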

Software licenses for new and existing tools can be re-examined to ensure that the cost of onboarding and integration with the existing toolset does not include hidden expenses. Eliminating unnecessary tools will also reduce the annual maintenance bill and staff time to keep the systems up and running.

Effect #5: Higher level of automation for maximum uptime

As businesses moved to more digital transactions and saw a marked increase in online traffic due to storefronts being shut, the primary challenge was to provide close to 24×7 uptime with reduced IT operations personnel, something made possible by automation. Enterprises adopted artificial intelligence for IT operations (AIOps) solutions that provide proactive incident detection and autonomous resolution capabilities, coupled with ITSM integrations, so the entire ticketing process was automated without the need for human intervention.

Challenges in 2021: Adoption of preventive healing solutions for zero downtime and complete automation, and intelligent scaling of businesses for increased online presence and traffic.

Outcome: Minimizing mean time to repair (MTTR) and moving from AIOps to preventive healing

Preventive healing solutions use patented techniques for the true predictive detection of issues before they occur and allow for remedial steps to be put in place so the issue can be averted. Some modes of preventive healing include dynamically optimizing or shaping the workload so the underlying system behavior remains unaffected; provisioning additional resources in cloud environments so the system can handle workload surges; and projecting resource requirements based on a what-if analysis of future workload trends so businesses can perform app-aware scaling. Automation of ticketing workflows can be achieved by integrating notification and ITSM platforms.
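The general idea of acting before an issue occurs can be sketched in a few lines of Python. The example below is not the patented technique referred to above; it is a deliberately simple stand-in that estimates, from the average growth rate of a metric, how many sampling periods remain before a threshold is crossed, and recommends provisioning when that estimate falls within an assumed lead time. All names, thresholds and action labels are hypothetical.

```python
from math import ceil


def periods_until_breach(samples, threshold):
    """Estimate sampling periods left before `threshold` is reached.

    Returns 0 if the latest sample already meets the threshold, and
    None if the metric is flat or falling (no breach expected).
    Uses the average change per period between first and last sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a rate")
    current = samples[-1]
    if current >= threshold:
        return 0
    rate = (samples[-1] - samples[0]) / (len(samples) - 1)
    if rate <= 0:
        return None
    return ceil((threshold - current) / rate)


def plan_action(samples, threshold, lead_time):
    """Map the breach estimate to a hypothetical remedial action label."""
    eta = periods_until_breach(samples, threshold)
    if eta is None:
        return "none"
    if eta <= lead_time:
        return "provision"  # act now: breach expected within the lead time
    return "watch"          # breach expected, but not yet urgent


# Example: CPU utilization (%) sampled at regular intervals.
cpu_percent = [60, 65, 70, 75]
print(plan_action(cpu_percent, threshold=90, lead_time=5))
```

In practice the "provision" decision would call a cloud provider's scaling interface or open a ticket through an ITSM integration; those calls are left out because they depend on the platform in use.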

Despite predictive alerting, some issues may still occur due to sudden network or storage outages, hardware glitches or third-party dependencies being unavailable. In such cases, accelerated root cause analysis with event correlations and suggestions on where the error originated can significantly reduce MTTR. In the hands of a skilled IT operations analyst, time-synchronized contextual data comprising logs, diagnostic data, business error codes and code-level traces prove invaluable in establishing the chain of causation and closing the incident with minimal time and effort, leading to a more cost- and resource-efficient data center.
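As a rough illustration of event correlation on time-synchronized data, the Python sketch below groups events whose timestamps fall close together and suggests the earliest event in each group as a starting point for root cause analysis. This is a simplifying assumption: real correlation engines also use topology and dependency information, and the earliest event is not always the cause. The Event fields and the 60-second window are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since epoch, assumed clock-synchronized
    source: str       # component that emitted the event
    message: str


def correlate(events, window_seconds=60.0):
    """Group events where each one occurs within `window_seconds` of the previous one."""
    groups = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if groups and event.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


def probable_origin(group):
    """Suggest the source of the earliest event in a group as the likely origin."""
    return group[0].source


# Example: three related events, then an unrelated one much later.
events = [
    Event(150.0, "api", "HTTP 500 rate above baseline"),
    Event(100.0, "storage", "volume latency spike"),
    Event(130.0, "database", "query timeouts"),
    Event(1000.0, "cache", "eviction rate high"),
]
for group in correlate(events):
    print(probable_origin(group), [e.source for e in group])
```

Grouping by time alone keeps the example short; an analyst would still confirm the suggested origin against logs and traces before closing the incident.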
