While the trend toward cloud infrastructure and hyperscale data centers means enterprises are growing more dependent on third parties for their IT operations, a recent Uptime Institute survey found that 48% of North American organizations still rely on on-premises data centers.
For those organizations, investing in and maintaining high availability is critical to ensuring that mission-critical systems and services run as expected.
As a business imperative, high availability is vital to maintaining business continuity, maximizing customer satisfaction, and minimizing financial losses. Whether you are starting from scratch or are responsible for existing systems and critical infrastructure, three key steps must be mastered to achieve high availability:
- Protecting the physical plant
- Architecting a resilient infrastructure
- Choosing the right operational tools
Physical Data Center Security
Addressing vulnerabilities in the facility housing an organization’s data center is often an overlooked aspect of high availability.
Whether that data center is a standalone structure or dedicated space within a larger campus, investments in resilient IT architecture, excellent operational tools, and a meticulous response strategy are moot if your IT infrastructure is subject to issues like malicious human intrusion, environmental failures, power outages, or other disasters.
Guarding against avoidable non-cyber incidents like these, and minimizing their risk, requires physical security measures, including:
- Security cameras for real-time monitoring
- Strong access controls to limit entry to authorized individuals
- Reliable power infrastructure, including a generator and an uninterruptible power supply (UPS)
- Gas-based fire suppression systems, such as FM-200
- Environmental monitoring with temperature and humidity controls
Resilient IT Architecture
A cornerstone of high availability is the redundancy of IT infrastructure. By identifying potential critical single points of failure and, where possible, ensuring there is an option for failover to a secondary resource, you can reduce the risk of downtime in the event of an incident. Redundancy should extend across both hardware and software layers.
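To make the effect of redundancy concrete, here is a minimal Python sketch that estimates the availability of N redundant components when any one of them can carry the load. It assumes failures are independent and failover is instantaneous, neither of which is guaranteed in practice; the function names and the 99% figure are illustrative assumptions, not measurements.

```python
"""Estimate how redundancy changes availability (assumes independent failures)."""

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def parallel_availability(availabilities: list[float]) -> float:
    """Availability of redundant components where any single one can carry the load.

    The service is down only if every component is down at the same time,
    which, under the independence assumption, has probability prod(1 - A_i).
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one server
    pair = parallel_availability([single, single])
    print(f"single: {single:.4f}, ~{annual_downtime_hours(single):.1f} h/yr down")
    print(f"pair:   {pair:.4f}, ~{annual_downtime_hours(pair):.2f} h/yr down")
```

With two components that are each 99% available, the pair comes out at roughly 99.99%, cutting expected downtime from about 87.6 hours to under one hour per year. Shared dependencies, such as a common power feed, break the independence assumption, which is why removing them matters as much as adding hardware.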
Implementing failover clusters, resilient networking paths, storage redundancy using RAID, and offsite data replication for disaster recovery are proven strategies. Adopting a hybrid or multi-cloud approach can also reduce reliance on any single service provider.
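RAID levels such as RAID 5 survive a single disk failure by storing parity. The toy Python sketch below shows only the underlying XOR idea; it assumes equal-sized blocks, a single parity block, and exactly one failed disk, whereas real RAID 5 rotates parity across all member disks and works on stripes. The helper names are hypothetical.

```python
"""Toy illustration of RAID 5-style parity: XOR lets one lost block be rebuilt."""

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    # zip(*blocks) yields one tuple of byte values per position across all blocks.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def compute_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block that would be written to an additional disk."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


if __name__ == "__main__":
    disks = [b"ACCT", b"LOGS", b"HR01"]  # three hypothetical data blocks
    parity = compute_parity(disks)
    # Simulate losing the second disk and rebuilding its contents.
    recovered = rebuild_missing([disks[0], disks[2]], parity)
    print(recovered)  # b'LOGS'
```

Because XOR is its own inverse, combining the survivors with the parity cancels everything except the lost block. Parity protects against disk failure, not against accidental deletion or data corruption, so it complements offsite replication and backups rather than replacing them.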
If you operate an offsite data center, make sure it does not depend on the same power source as your main campus. You should also have a disaster recovery and business continuity plan that includes both local and offsite backup storage.
High Availability Operational Tools
You’ve protected your data center and built a resilient IT infrastructure. Now it’s time to make sure everything works the way you need it to. That means choosing tools that help you respond to incidents and execute response plans as intended, automate where possible, and make good decisions under pressure when things go wrong.
Because good decisions require good data, the first step is investing in IT operations management tools that excel at discovering network assets, ingesting their data, and updating a configuration management database (CMDB).
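The reconciliation step that keeps a CMDB accurate can be sketched roughly as follows. This is a hypothetical Python example, not any product’s API: it assumes each asset has a stable unique ID (here, a hostname) and that a discovery scan returns a dictionary of attributes per asset.

```python
"""Hypothetical sketch: reconcile discovery results against a CMDB snapshot."""

from dataclasses import dataclass, field


@dataclass
class ReconcileResult:
    added: list[str] = field(default_factory=list)    # discovered, not yet in the CMDB
    updated: list[str] = field(default_factory=list)  # in both, discovered attributes changed
    missing: list[str] = field(default_factory=list)  # in the CMDB, not seen by discovery


def reconcile(cmdb: dict[str, dict], discovered: dict[str, dict]) -> ReconcileResult:
    """Update cmdb in place from discovered, keyed by a unique asset ID.

    Only attributes reported by discovery are compared, so manually maintained
    fields (such as an owner) are preserved. Assets that discovery did not see
    are flagged rather than deleted, so a person can confirm whether they were
    decommissioned or merely unreachable.
    """
    result = ReconcileResult()
    for asset_id, attrs in discovered.items():
        if asset_id not in cmdb:
            cmdb[asset_id] = dict(attrs)
            result.added.append(asset_id)
            continue
        changed = {k: v for k, v in attrs.items() if cmdb[asset_id].get(k) != v}
        if changed:
            cmdb[asset_id].update(changed)
            result.updated.append(asset_id)
    result.missing = sorted(set(cmdb) - set(discovered))
    return result


if __name__ == "__main__":
    cmdb = {
        "srv-01": {"os": "RHEL 8", "owner": "finance"},
        "srv-02": {"os": "Windows Server 2019", "owner": "hr"},
    }
    discovered = {
        "srv-01": {"os": "RHEL 9"},        # upgraded since the last scan
        "srv-03": {"os": "Ubuntu 22.04"},  # newly found host
    }
    print(reconcile(cmdb, discovered))
    # ReconcileResult(added=['srv-03'], updated=['srv-01'], missing=['srv-02'])
```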
Building on that foundation of accurate data, application performance monitoring (APM) tools provide a precise picture of the health of the systems that make up the network. APM and network monitoring platforms give IT management the information it needs to make timely decisions about operational issues such as maintenance, load balancing, and incident response. That matters for maintaining high availability (HA), because poor decisions increase the risk of service outages caused by preventable system failures.
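As an illustration of how monitoring data turns into an operational decision, the hypothetical Python sketch below classifies a service from one window of request latencies and an error count. The thresholds (500 ms at the 95th percentile, 1% errors) and the status labels are assumptions chosen for the example, not recommendations from any APM vendor.

```python
"""Hypothetical sketch: turn a window of APM samples into a health status."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    p95_latency_ms: float = 500.0  # assumed latency target for the example
    max_error_rate: float = 0.01   # assumed: more than 1% failed requests is critical


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method (integer math avoids float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def health(latencies_ms: list[float], errors: int,
           thresholds: Thresholds = Thresholds()) -> str:
    """Classify one monitoring window; errors counts failed requests in the window."""
    if not latencies_ms:
        return "unknown"  # missing data is itself worth alerting on
    error_rate = errors / len(latencies_ms)
    if error_rate > thresholds.max_error_rate:
        return "critical"
    if p95(latencies_ms) > thresholds.p95_latency_ms:
        return "degraded"
    return "healthy"


if __name__ == "__main__":
    window = [120.0] * 18 + [850.0, 900.0]  # hypothetical latencies in ms
    print(health(window, errors=0))  # degraded
```

Checking the error rate before latency is a design choice in this sketch: failing requests are treated as more urgent than slow ones.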
Whether your infrastructure is on-premises, cloud-based, or hybrid, the other key component of high availability is a failover cluster that facilitates, and can even automate, the movement of services and workloads to a secondary resource. Whether hardware-based (SAN) or software-based (SANless), clusters support seamless failover of services to backup resources and maintain continuity when performance is severely degraded or an outage occurs.
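At the heart of automated failover is a heartbeat check: if the active node stops reporting within a timeout and the standby is still healthy, the standby takes over. The Python sketch below assumes a two-node active/standby cluster and a clock value (now, in seconds) supplied by the caller; Node and FailoverMonitor are hypothetical names, not part of any clustering product.

```python
"""Hypothetical sketch of heartbeat-driven failover in a two-node cluster."""

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # seconds, on whatever clock the monitor uses


class FailoverMonitor:
    """Promotes the standby when the active node misses its heartbeat window."""

    def __init__(self, active: Node, standby: Node, timeout_s: float = 3.0):
        self.active = active
        self.standby = standby
        self.timeout_s = timeout_s

    def record_heartbeat(self, node_name: str, now: float) -> None:
        """Note that a node reported in at time now; unknown names are ignored."""
        for node in (self.active, self.standby):
            if node.name == node_name:
                node.last_heartbeat = now

    def check(self, now: float) -> str:
        """Return the name of the node that should own the service at time now."""
        active_silent = now - self.active.last_heartbeat > self.timeout_s
        standby_alive = now - self.standby.last_heartbeat <= self.timeout_s
        if active_silent and standby_alive:
            # Swap roles: the standby takes over the workload.
            self.active, self.standby = self.standby, self.active
        return self.active.name


if __name__ == "__main__":
    monitor = FailoverMonitor(Node("node-a", 0.0), Node("node-b", 0.0), timeout_s=3.0)
    print(monitor.check(now=2.0))  # node-a: still within its timeout
    monitor.record_heartbeat("node-b", 4.0)  # node-a has gone quiet
    print(monitor.check(now=5.0))  # node-b: standby promoted
```

Production cluster software must also prevent split-brain, where both nodes believe they are active, typically through quorum and fencing; this sketch leaves that out.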
Enterprises today tend to favor SANless high availability clusters because they adapt well to IT environments that depend heavily on cloud systems and services, virtual machines, and software. SANless clusters offer the same functionality as legacy SAN clusters, but at lower cost. They also support on-premises, cloud, and hybrid infrastructure and can span geographically distributed data centers, a key consideration in network resiliency and disaster planning.
Keeping Services Online
With trends like hyperscale data centers, cloud workload repatriation, and digital transformation in full bloom, much is changing for today’s IT operations managers.
Still, one requirement remains constant: keeping services available to users and avoiding downtime. With planning that covers physical security, resilient architecture, and the right operational tools, you can keep your users and customers happy.