This objective covers Capacity and Performance Management in an environment built on VMware technology. It is important to understand the business demand and whether there is enough capacity to fulfill it; this information can be sourced from Infrastructure Delivery leads or an IT director. The Capacity Management process focuses on meeting long-term demand, whereas short-term demand can be handled by making full use of DRS, vMotion, VM right-sizing and anything else that helps balance the workload. Any such reallocation of resources should go through Configuration and Change Management so that it can be tracked.
Service Capacity Management: Under Service Capacity Management, it is the Service Level Manager's duty to ensure that response time targets are defined in the agreed SLA. Apart from CPU and memory utilization metrics, it is important to measure the overall response time of an application when it is moved to a VM. This also provides a valuable business metric showing the benefit of moving to a VMware environment, e.g. “We have consolidated Service A onto VMware and it has reduced the user response time by 25%”.
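The 25% figure in that example is simply the relative drop in response time before and after the move. A minimal sketch, assuming you have response-time samples in milliseconds from before and after migration and that the SLA target is expressed as a 95th-percentile value (the use of p95, the function names and the sample values are all assumptions for illustration, not part of any VMware tool):

```python
from statistics import quantiles

def p95(samples_ms):
    """95th-percentile response time in ms (needs at least two samples)."""
    # n=20 yields 19 cut points; the last one (index 18) is the 95th percentile.
    return quantiles(samples_ms, n=20, method="inclusive")[18]

def percent_reduction(before_ms, after_ms):
    """Relative reduction in p95 response time after moving the service to a VM."""
    before, after = p95(before_ms), p95(after_ms)
    return (before - after) / before * 100.0

def meets_sla(samples_ms, target_ms):
    """True if the p95 response time is within the (hypothetical) SLA target."""
    return p95(samples_ms) <= target_ms

if __name__ == "__main__":
    before = [200.0] * 20  # hypothetical measurements on the physical server
    after = [150.0] * 20   # hypothetical measurements after virtualization
    print(f"p95 reduced by {percent_reduction(before, after):.0f}%")  # p95 reduced by 25%
    print(meets_sla(after, target_ms=180.0))                          # True
```

A percentile is used rather than a plain average because SLA response-time targets are usually about the slow tail that users notice, but the same comparison works with any statistic you agree with the business.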
Component Capacity Management: This sub-process is concerned with the performance and capacity of the underlying components that support the IT services, namely CPU, memory, disks, etc. It checks whether they are running at optimal performance and have enough capacity to meet the near-future growth of the business.
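Checking whether a component has enough capacity for near-future growth usually means projecting its usage trend forward. A minimal sketch, assuming roughly linear growth, one usage sample per month and a hypothetical planning ceiling of 80% of total capacity; real growth is often not linear, so treat the result as a rough indicator only:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def months_until_ceiling(monthly_usage, capacity, ceiling=0.8):
    """Months after the last sample until projected usage reaches ceiling * capacity.

    Needs at least two samples. Returns None when usage is flat or shrinking.
    """
    xs = list(range(len(monthly_usage)))
    slope, intercept = fit_line(xs, monthly_usage)
    if slope <= 0:
        return None
    crossing = (ceiling * capacity - intercept) / slope
    # Clamp at zero: a negative value means the ceiling has already been passed.
    return max(0.0, crossing - xs[-1])

if __name__ == "__main__":
    used_tb = [10, 12, 14, 16, 18]  # hypothetical datastore usage, one value per month
    print(months_until_ceiling(used_tb, capacity=40))  # 7.0
```

The same function applies to CPU (GHz) or memory (GB) as long as usage and capacity are in the same unit.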
Key metrics to monitor:
- %CPU Ready time, especially when there is resource contention
- Memory swapping and reclamation metrics; although VMware uses TPS to remove redundant memory pages, swapping and ballooning still signal memory pressure
- Storage capacity in terms of TB available
- Storage overallocation, if you are using Thin Provisioning
- Disk I/O
- Network I/O
- CPU/Memory Utilization of both Virtual and Physical (ESX/ESXi) Machines
- VM/Host Availability
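Two of these metrics need a small conversion before they can be compared across VMs or datastores. vCenter reports CPU Ready as a summation in milliseconds per sample interval (20 seconds for real-time charts, 5 minutes for the past-day rollup), so it has to be converted to a percentage; thin-provisioning overallocation is simply provisioned space divided by physical capacity. A minimal sketch; the function names and sample values are illustrative:

```python
REALTIME_INTERVAL_S = 20   # real-time performance charts sample every 20 seconds
PAST_DAY_INTERVAL_S = 300  # past-day statistics are rolled up to 5-minute samples

def cpu_ready_percent(ready_ms, interval_s=REALTIME_INTERVAL_S, vcpus=1):
    """Convert a CPU Ready summation (ms per interval) to a percentage.

    Pass vcpus > 1 to get the average per vCPU when ready_ms is the
    VM-level total across all of its vCPUs.
    """
    return ready_ms * 100 / (interval_s * 1000 * vcpus)

def overallocation_ratio(provisioned_gb, capacity_gb):
    """Space provisioned to thin disks divided by physical datastore capacity."""
    return provisioned_gb / capacity_gb

if __name__ == "__main__":
    print(cpu_ready_percent(1000))                                          # 5.0
    print(cpu_ready_percent(6000, interval_s=PAST_DAY_INTERVAL_S, vcpus=2))  # 1.0
    print(overallocation_ratio(3000, 2000))                                 # 1.5
```

An overallocation ratio above 1.0 means more space has been promised to VMs than physically exists, so free space on that datastore needs to be watched closely.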
How often should we monitor the above metrics?
- If you are gathering data for performance monitoring in an application sizing project, or to analyze a particular performance issue, we would usually suggest a sample interval of between 2 and 5 minutes
- If you are looking more at capacity planning, we would suggest a capture period of around 10 minutes for dynamic metrics such as CPU and memory, and perhaps 60 minutes for largely static metrics such as disk utilization
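The trade-off between these two recommendations shows up when short samples are rolled up into longer periods: averaging smooths out brief spikes that matter when troubleshooting, while longer intervals keep the data volume manageable for trend analysis. A minimal sketch, assuming evenly spaced samples with no gaps; the sample values are hypothetical:

```python
from statistics import mean

def samples_per_day(interval_min):
    """Number of data points collected per metric per day at a given interval."""
    return 24 * 60 // interval_min

def rollup(samples, source_min, target_min):
    """Aggregate evenly spaced samples into longer buckets of (average, peak).

    A trailing partial bucket is kept as-is.
    """
    if target_min % source_min:
        raise ValueError("target interval must be a multiple of the source interval")
    per_bucket = target_min // source_min
    return [
        (mean(samples[i:i + per_bucket]), max(samples[i:i + per_bucket]))
        for i in range(0, len(samples), per_bucket)
    ]

if __name__ == "__main__":
    print(samples_per_day(5), samples_per_day(10), samples_per_day(60))  # 288 144 24
    # One hour of hypothetical 5-minute CPU % samples with a single 90% spike
    cpu = [20, 20, 20, 20, 20, 90, 20, 20, 20, 20, 20, 20]
    avg, peak = rollup(cpu, source_min=5, target_min=60)[0]
    print(f"hourly average {avg:.1f}%, peak {peak}%")  # hourly average 25.8%, peak 90%
```

Keeping the peak alongside the average in each bucket is one way to retain some of the spike information that longer capture periods would otherwise hide.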
Which tools should we use?
- Before virtualization, tools such as VMware Capacity Planner and Novell PlateSpin can be used to assess the existing workload
- After virtualization, tools such as vCenter CapacityIQ, Veeam Monitor, vKernel and Quest can be used
- vCenter alarms are a good starting point: each alarm defines an event or condition to watch, a trigger, and an action to take, such as sending an email, sending an SNMP trap or running a script
- Some applications consume little CPU and others consume a great deal, so the thresholds that apply in your environment really depend on your functional requirements
- Whenever alerts are generated, make sure they are sent to the correct team so that the appropriate action can be taken
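The alarm logic described above (a trigger evaluated against thresholds, different thresholds for different kinds of application, and the resulting action routed to the right team) can be modelled in a few lines. This is a hypothetical sketch of the logic only, not the vCenter alarm API: the metric key, threshold values and team names are invented for illustration, and in practice the equivalent settings live on the alarm definition in vCenter.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AlarmRule:
    metric: str      # hypothetical metric key, e.g. "cpu.usage.percent"
    warning: float   # value at which a warning is raised
    critical: float  # value at which a critical alert is raised
    team: str        # team that should receive the notification

@dataclass(frozen=True)
class Notification:
    severity: str
    team: str
    message: str

def evaluate(rule: AlarmRule, vm: str, value: float) -> Optional[Notification]:
    """Return a notification routed to the rule's team, or None if below thresholds."""
    if value >= rule.critical:
        severity = "critical"
    elif value >= rule.warning:
        severity = "warning"
    else:
        return None
    return Notification(severity, rule.team, f"{vm}: {rule.metric}={value} ({severity})")

# Thresholds differ by application profile: a batch workload is expected to run hot,
# a latency-sensitive web tier is not.
RULES = {
    "batch": AlarmRule("cpu.usage.percent", warning=90, critical=98, team="batch-ops"),
    "web": AlarmRule("cpu.usage.percent", warning=70, critical=85, team="web-platform"),
}

if __name__ == "__main__":
    for profile, vm, cpu in [("batch", "batch01", 80), ("web", "web01", 80)]:
        print(evaluate(RULES[profile], vm, cpu))
```

The same 80% CPU reading is silent for the batch VM but raises a warning for the web VM, and the notification carries the team it should be routed to.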
From the ITIL perspective, there is certainly scope to expand your KPIs to include metrics that reflect the new virtualized environment, and you may also need to consider how you interact with both the business and the other processes.