Google Compute Engine

Compute Engine instances can run the public images for Linux and Windows Server that Google provides and private custom images that you can create or import from your existing systems. You can also deploy Docker containers, which are automatically launched on instances running the Container-Optimized OS public image.

You can choose the machine properties of your instances, such as the number of virtual CPUs and the amount of memory, by using a set of predefined machine types or by creating your own custom machine types.

Each instance belongs to a Google Cloud Console project, and a project can have one or more instances.

When you create an instance in a project, you specify the zone, operating system, and machine type of that instance.
When you delete an instance, it is removed from the project.

Setup

To set up the Google integration and discover the Google service, go to Google Integration Discovery Profile, select Instances > Perform Actions, and check Manage Devices.

Supported metrics

OpsRamp Metric	Google Metric	Metric Display Name	Unit	Description
google_compute_firewall_dropped_bytes_count	compute.googleapis.com/firewall/dropped_bytes_count	Dropped bytes	bytes	Count of incoming bytes dropped by the firewall. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_firewall_dropped_packets_count	compute.googleapis.com/firewall/dropped_packets_count	Dropped packets	count	Count of incoming packets dropped by the firewall. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_guest_cpu_runnable_task_count	compute.googleapis.com/guest/cpu/runnable_task_count	Runnable task count	count	The average number of runnable tasks in the run-queue. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_cpu_usage_time	compute.googleapis.com/guest/cpu/usage_time	CPU usage	CPU seconds	CPU usage, in seconds. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_bytes_used	compute.googleapis.com/guest/disk/bytes_used	Disk usage	bytes	Number of bytes used on disk for file systems. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_io_time	compute.googleapis.com/guest/disk/io_time	IO Time	ms	The cumulative time spent on the I/O operations that are in progress; that is, the actual time in queue and when disks were busy. Requests issued in parallel are counted as a single one. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_disk_merged_operation_count	compute.googleapis.com/guest/disk/merged_operation_count	Merged disk operations	count	Merged disk operations count. Disk operations which are adjacent to each other may be merged by the kernel for efficiency. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_operation_bytes_count	compute.googleapis.com/guest/disk/operation_bytes_count	Disk bytes transferred	bytes	Bytes transferred in disk operations. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_operation_count	compute.googleapis.com/guest/disk/operation_count	Disk operations	count	Disk operations count. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_operation_time	compute.googleapis.com/guest/disk/operation_time	Disk operation time	ms	Amount of time spent on the disk operations, by direction. This metric only includes time spent on completed operations. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_percent_used	compute.googleapis.com/guest/disk/percent_used	Percent Used	%	Percentage of total disk capacity currently in use.
google_compute_guest_disk_queue_length	compute.googleapis.com/guest/disk/queue_length	Queue Length	count	The queue length on the disk averaged over the last 60 seconds. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_disk_weighted_io_time	compute.googleapis.com/guest/disk/weighted_io_time	IO Time	ms	The cumulative weighted IO time spent on the disk. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_memory_anonymous_used	compute.googleapis.com/guest/memory/anonymous_used	Anonymous memory usage in Bytes	bytes	Anonymous memory usage, in Bytes. Summing values of all states yields the total anonymous memory used. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_memory_bytes_used	compute.googleapis.com/guest/memory/bytes_used	Memory usage in Bytes	bytes	Memory usage by each memory state, in Bytes. Summing values of all states yields the total memory on the machine. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_memory_dirty_used	compute.googleapis.com/guest/memory/dirty_used	Dirty pages usage in Bytes	bytes	Dirty pages usage, in Bytes. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_memory_page_cache_used	compute.googleapis.com/guest/memory/page_cache_used	Page cache memory usage in Bytes	bytes	Page cache memory usage, in Bytes. Summing values of all states yields the total page cache memory used. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_memory_percent_used	compute.googleapis.com/guest/memory/percent_used	Percent Used	%	Percentage of total system memory actively in use. Calculated as (Total Memory - Free Memory - Buffers - Cached - Slab) / Total Memory * 100.
google_compute_guest_memory_unevictable_used	compute.googleapis.com/guest/memory/unevictable_used	Unevictable memory usage in Bytes	bytes	Unevictable memory usage, in Bytes. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_system_os_feature_enabled	compute.googleapis.com/guest/system/os_feature_enabled	OS Feature	count	OS features such as GPU support, KTD kernel, and third-party modules as unknown modules. The value is 1 if the feature is enabled and 0 if it is disabled. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_system_problem_count	compute.googleapis.com/guest/system/problem_count	Problem Count	count	Number of times a machine problem has happened. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_system_problem_state	compute.googleapis.com/guest/system/problem_state	Problem State	count	Whether a problem is affecting the system or not. The problem is affecting the system when set to 1 and is not affecting the system when set to 0. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_system_uptime	compute.googleapis.com/guest/system/uptime	Uptime	seconds	Number of seconds that the operating system has been running for. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_instance_clock_accuracy_ptp_kvm_nanosecond_accuracy	compute.googleapis.com/instance/clock_accuracy/ptp_kvm/nanosecond_accuracy	Clock Accuracy	ns	Accuracy of the host clock in nanoseconds.
google_compute_instance_cpu_guest_visible_vcpus	compute.googleapis.com/instance/cpu/guest_visible_vcpus	Guest Visible vCPUs	count	Number of vCPUs visible inside the guest. For many GCE machine types, the number of vCPUs visible inside the guest is equal to the `compute.googleapis.com/instance/cpu/reserved_cores` metric. For shared-core machine types, the number of guest-visible vCPUs differs from the number of reserved cores. For example, e2-small instances have two vCPUs visible inside the guest and 0.5 fractional vCPUs reserved. Therefore, for an e2-small instance, `compute.googleapis.com/instance/cpu/guest_visible_vcpus` has a value of 2 and `compute.googleapis.com/instance/cpu/reserved_cores` has a value of 0.5. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_cpu_reserved_cores	compute.googleapis.com/instance/cpu/reserved_cores	Reserved vCPUs	count	Number of vCPUs reserved on the host of the instance. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_cpu_scheduler_wait_time	compute.googleapis.com/instance/cpu/scheduler_wait_time	Scheduler Wait Time	idle seconds	Wait time is the time a vCPU is ready to run, but unexpectedly not scheduled to run. The wait time returned here is the accumulated value for all vCPUs. The time interval for which the value was measured is returned by Monitoring in whole seconds as start_time and end_time. This metric is only available for VMs that belong to the e2 family or to overcommitted VMs on sole-tenant nodes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_cpu_usage_time	compute.googleapis.com/instance/cpu/usage_time	CPU usage	CPU seconds	Delta vCPU usage for all vCPUs, in vCPU-seconds. To compute the per-vCPU utilization fraction, divide this value by (end-start)*N, where end and start define this value's time interval and N is `compute.googleapis.com/instance/cpu/reserved_cores` at the end of the interval. This value is reported by the hypervisor for the VM and can differ from `agent.googleapis.com/cpu/usage_time`, which is reported from inside the VM. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_cpu_utilization	compute.googleapis.com/instance/cpu/utilization	CPU utilization	%	Fractional utilization of allocated CPU on this instance. Values are typically numbers between 0.0 and 1.0 (but some machine types allow bursting above 1.0). Charts display the values as a percentage between 0% and 100% (or more). This metric is reported by the hypervisor for the VM and can differ from `agent.googleapis.com/cpu/utilization`, which is reported from inside the VM. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_average_io_latency	compute.googleapis.com/instance/disk/average_io_latency	Disk average latency	microseconds	Disk's average io latency in the last 60s. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_average_io_queue_depth	compute.googleapis.com/instance/disk/average_io_queue_depth	Disk average io queue depth	count	Disk's average io queue depth in the last 60s. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_max_read_bytes_count	compute.googleapis.com/instance/disk/max_read_bytes_count	Peak disk read bytes	bytes	Disk's maximum per-second read throughput over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_max_read_ops_count	compute.googleapis.com/instance/disk/max_read_ops_count	Peak disk read ops	count	Disk's maximum per-second read requests count over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_max_write_bytes_count	compute.googleapis.com/instance/disk/max_write_bytes_count	Peak disk write bytes	bytes	Disk's maximum per-second write throughput over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_max_write_ops_count	compute.googleapis.com/instance/disk/max_write_ops_count	Peak disk write ops	count	Disk's maximum per-second write requests count over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_performance_status	compute.googleapis.com/instance/disk/performance_status	Disk performance status	count	Whether the disk performance is normal or could potentially be impacted by an issue within Compute Engine during the period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_provisioning_iops	compute.googleapis.com/instance/disk/provisioning/iops	Provisioned disk IOPS	count	Disk's provisioned IOPS specified by the user. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_provisioning_size	compute.googleapis.com/instance/disk/provisioning/size	Provisioned disk size	bytes	Disk's provisioned size specified by the user. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_provisioning_throughput	compute.googleapis.com/instance/disk/provisioning/throughput	Instance Disk Provisioning Throughput	count	Disk's provisioned throughput (bytes/sec) specified by the user. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_read_bytes_count	compute.googleapis.com/instance/disk/read_bytes_count	Disk read bytes	bytes	Count of bytes read from disk. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_read_ops_count	compute.googleapis.com/instance/disk/read_ops_count	Disk read operations	count	Count of disk read IO operations. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_write_bytes_count	compute.googleapis.com/instance/disk/write_bytes_count	Disk write bytes	bytes	Count of bytes written to disk. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_write_ops_count	compute.googleapis.com/instance/disk/write_ops_count	Disk write operations	count	Count of disk write IO operations. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_gpu_accumulated_context_utilization_seconds	compute.googleapis.com/instance/gpu/accumulated_context_utilization_seconds	Accumulated Context Utilization Seconds	count	Accumulated context utilization time (in seconds).
google_compute_instance_gpu_cache_correctable_ecc_error_count	compute.googleapis.com/instance/gpu/cache_correctable_ecc_error_count	Correctable Cache ECC Errors	count	The number of correctable ECC errors in cache memory.
google_compute_instance_gpu_cache_uncorrectable_ecc_error_count	compute.googleapis.com/instance/gpu/cache_uncorrectable_ecc_error_count	Uncorrectable Cache ECC Errors	count	The number of uncorrectable ECC errors in cache memory.
google_compute_instance_gpu_dram_correctable_ecc_error_count	compute.googleapis.com/instance/gpu/dram_correctable_ecc_error_count	Correctable DRAM ECC Errors	count	The number of correctable ECC errors in GPU DRAMs.
google_compute_instance_gpu_dram_correctable_row_remapping_count	compute.googleapis.com/instance/gpu/dram_correctable_row_remapping_count	Correctable DRAM Row Remapping Count	count	The number of row remappings from correctable errors in GPU DRAMs.
google_compute_instance_gpu_dram_row_remapping_failed	compute.googleapis.com/instance/gpu/dram_row_remapping_failed	DRAM Row Remapping Failed	count	Whether row remapping failed previously. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_dram_row_remapping_pending	compute.googleapis.com/instance/gpu/dram_row_remapping_pending	DRAM Row Remapping Pending	count	Whether row remapping is set to occur at the next GPU reset. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_dram_uncorrectable_ecc_error_count	compute.googleapis.com/instance/gpu/dram_uncorrectable_ecc_error_count	Uncorrectable DRAM ECC Errors	count	The number of uncorrectable ECC errors in GPU DRAMs.
google_compute_instance_gpu_dram_uncorrectable_row_remapping_count	compute.googleapis.com/instance/gpu/dram_uncorrectable_row_remapping_count	Uncorrectable DRAM Row Remapping Count	count	The number of row remappings from uncorrectable errors in GPU DRAMs.
google_compute_instance_gpu_failure_prediction_status	compute.googleapis.com/instance/gpu/failure_prediction_status	VM Degradation Status	count	This metric indicates the probability of a VM entering a degraded state within the next 5 hours, as predicted by a proprietary Google algorithm. The value label for this metric is one of NO_DEGRADATION_PREDICTED, DEGRADATION_PREDICTED, or POSSIBLE_DEGRADATION_PREDICTED. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_gpu_nvlink_active_speed	compute.googleapis.com/instance/gpu/gpu_nvlink_active_speed	GPU NVLink Port Active Speed	count	Current NVLink port speed in Gbps. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_effective_ber	compute.googleapis.com/instance/gpu/gpu_nvlink_effective_ber	GPU NVLink Effective BER	count	Effective bit error rate (BER) is the error rate of the port after a forward error correction (FEC). The value indicates the overall average BER since the last counter reset. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_link_downed_counter	compute.googleapis.com/instance/gpu/gpu_nvlink_link_downed_counter	GPU NVLink Port Link Downed Counter	count	The number of link-down events on the port. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_link_error_recovery_count	compute.googleapis.com/instance/gpu/gpu_nvlink_link_error_recovery_count	GPU NVLink Port Link Error Recovery Counter	count	The number of successful link recovery processes. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_physical_effective_errors	compute.googleapis.com/instance/gpu/gpu_nvlink_physical_effective_errors	GPU NVLink Port Physical Effective Errors	count	Effective error count is the number of bit errors that the port receives post-Forward Error Correction (FEC). Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_rcv_data	compute.googleapis.com/instance/gpu/gpu_nvlink_port_rcv_data	GPU NVLink Port Rcv Data	count	Total number of bytes received, measured as bps. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_rcv_errors	compute.googleapis.com/instance/gpu/gpu_nvlink_port_rcv_errors	GPU NVLink Port RCV Errors	count	Total number of packets containing an error that were received on the port. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_xmit_data	compute.googleapis.com/instance/gpu/gpu_nvlink_port_xmit_data	GPU NVLink Port Xmit Data	count	Total number of bytes transmitted, measured as bps. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_xmit_discards	compute.googleapis.com/instance/gpu/gpu_nvlink_port_xmit_discards	GPU NVLink Port Xmit Discards	count	Total number of outbound packets that were discarded by the port. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_xmit_wait	compute.googleapis.com/instance/gpu/gpu_nvlink_port_xmit_wait	GPU NVLink Port Xmit Wait	count	The number of transmitted packets that incurred a transmit wait. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_vl15_dropped	compute.googleapis.com/instance/gpu/gpu_nvlink_vl15_dropped	GPU NVLink Port VL15 Dropped	count	The number of management (VL15) packets that were dropped due to a lack of resources on the port. Supported for A4X VMs only.
google_compute_instance_gpu_infra_health	compute.googleapis.com/instance/gpu/infra_health	VM Infra Health	count	This metric captures the infrastructure health of the VM as a string. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_inter_block_tx	compute.googleapis.com/instance/gpu/inter_block_tx	Network Traffic at Inter-Block	bytes	This metric represents network traffic at the inter-block level as an INT64 value, calculated as the delta of observed traffic within a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_inter_subblock_tx	compute.googleapis.com/instance/gpu/inter_subblock_tx	Network Traffic at Inter-Subblock	bytes	This metric represents network traffic at the inter-subblock level as an INT64 value, calculated as the delta of observed traffic within a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_intra_subblock_tx	compute.googleapis.com/instance/gpu/intra_subblock_tx	Network Traffic at Intra-Subblock	bytes	This metric represents network traffic at the intra-subblock level as an INT64 value, calculated as the delta of observed traffic within a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_link_carrier_changes	compute.googleapis.com/instance/gpu/link_carrier_changes	Link Carrier Changes	count	This metric captures network link carrier changes as a delta value computed at 1-minute granularity. This metric is available for all GPU VM machine types starting with A3 mega, A3 ultra, A4 and all future GPU VM families except Spot VMs.
google_compute_instance_gpu_nccl_latency_tx	compute.googleapis.com/instance/gpu/nccl/latency_tx	NCCL Send Latency	ns	The metric measures the latency distribution of NCCL send operations.
google_compute_instance_gpu_nccl_latency_variance	compute.googleapis.com/instance/gpu/nccl/latency_variance	NCCL Send Latency Variance	ns	The metric measures the latency variance distribution of NCCL send operations.
google_compute_instance_gpu_nccl_message_size_tx	compute.googleapis.com/instance/gpu/nccl/message_size_tx	NCCL Send Message Size	bytes	The metric measures the message size distribution of NCCL send operations.
google_compute_instance_gpu_network_rtt	compute.googleapis.com/instance/gpu/network_rtt	Network RTT	microseconds	This metric measures network round-trip time in your GPU VMs. This metric is available for GPU VM machine types starting with A3 mega, A3 ultra, A4 and all future GPU VM families except Spot VMs.
google_compute_instance_gpu_nvlink_active_speed	compute.googleapis.com/instance/gpu/nvlink_active_speed	NVLink Active Speed	count	Current access link port speed in Gb/s. Supported for A4X VMs only.
google_compute_instance_gpu_nvlink_port_state	compute.googleapis.com/instance/gpu/nvlink_port_state	NVLink Port State	count	Logical and Physical port states for NVswitch ports as defined in the OpenConfig YANG model. Supported for A4X VMs only.
google_compute_instance_gpu_nvlink_runtime_error	compute.googleapis.com/instance/gpu/nvlink_runtime_error	NVLink Runtime Error	count	Whether an NVLink Runtime Error occurred. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_nvswitch_effective_ber	compute.googleapis.com/instance/gpu/nvswitch_effective_ber	NVSwitch Effective BER	count	Effective BER (Bit Error Rate) is the error rate of the port after FEC (Forward Error Correction). The value indicates the overall average BER since the last counter reset. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_effective_errors	compute.googleapis.com/instance/gpu/nvswitch_effective_errors	NVSwitch Effective Errors	count	Effective error count is the number of bit errors that the port receives after FEC (Forward Error Correction). Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_link_downed_counter	compute.googleapis.com/instance/gpu/nvswitch_link_downed_counter	NVSwitch Link Downed Counter	count	The count of link-down events on the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_link_error_recovery_counter	compute.googleapis.com/instance/gpu/nvswitch_link_error_recovery_counter	NVSwitch Link Error Recovery Counter	count	The count of successful link recovery processes on the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_rcv_data	compute.googleapis.com/instance/gpu/nvswitch_port_rcv_data	NVSwitch Port Rcv Data	count	Total number of bytes received, measured as bps. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_rcv_errors	compute.googleapis.com/instance/gpu/nvswitch_port_rcv_errors	NVSwitch Port Rcv Errors	count	Total number of packets containing an error that were received on the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_xmit_constraint_errors	compute.googleapis.com/instance/gpu/nvswitch_port_xmit_constraint_errors	NVSwitch Port Xmit Constraint Errors	count	Total number of packets not transmitted from the switch physical port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_xmit_data	compute.googleapis.com/instance/gpu/nvswitch_port_xmit_data	NVSwitch Port Xmit Data	count	Total number of bytes transmitted, measured as bps. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_xmit_discards	compute.googleapis.com/instance/gpu/nvswitch_port_xmit_discards	NVSwitch Port Xmit Discards	count	Total number of outbound packets that were discarded by the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_xmit_wait	compute.googleapis.com/instance/gpu/nvswitch_port_xmit_wait	NVSwitch Port Xmit Wait	count	The number of transmitted packets that incurred a transmit wait. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_status	compute.googleapis.com/instance/gpu/nvswitch_status	NV Switch Status	count	This metric represents the health of an individual NV Switch on the host as a string. If a machine has multiple NV Switches attached, the metric provides each NV Switch health status on the host. The possible values for this metric are provided by NVIDIA BMC. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_nvswitch_symbol_errors	compute.googleapis.com/instance/gpu/nvswitch_symbol_errors	NVSwitch Symbol Errors	count	Symbol error count is the number of bit errors that the port receives after FEC (Forward Error Correction) and PLR (Physical Layer Retransmission). Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_vl15_dropped	compute.googleapis.com/instance/gpu/nvswitch_vl15_dropped	NVSwitch VL15 Dropped	count	The number of management (VL15) packets that were dropped due to a lack of resources on the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_zero_hist	compute.googleapis.com/instance/gpu/nvswitch_zero_hist	NVSwitch Zero Histogram FEC	count	First FEC histogram bin with a value of 0. Tracks the maximum number of bit errors in an FEC block observed up to the time of measurement. Supported for A4X VMs only.
google_compute_instance_gpu_packet_retransmission_count	compute.googleapis.com/instance/gpu/packet_retransmission_count	Packet Retransmission Count	count	This metric, representing the packet retransmission count observed by network interface cards (NICs) attached to GPUs on the host, is a single INT64 value per timestamp. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_pcie_correctable_error_count	compute.googleapis.com/instance/gpu/pcie_correctable_error_count	Correctable PCIe Errors	count	The number of correctable PCIe errors.
google_compute_instance_gpu_pcie_fatal_error_count	compute.googleapis.com/instance/gpu/pcie_fatal_error_count	Fatal PCIe Errors	count	The number of fatal PCIe errors.
google_compute_instance_gpu_pcie_l0_to_recovery_count	compute.googleapis.com/instance/gpu/pcie_l0_to_recovery_count	PCIe L0 To Recovery Count	count	The number of times the PCIe link entered the recovery state from the L0 state.
google_compute_instance_gpu_pcie_nak_received_count	compute.googleapis.com/instance/gpu/pcie_nak_received_count	PCIe NAK Received Count	count	The number of NAKs the host root complex issued on the PCIe link.
google_compute_instance_gpu_pcie_nak_sent_count	compute.googleapis.com/instance/gpu/pcie_nak_sent_count	PCIe NAK Sent Count	count	The number of NAKs the GPU issued on the PCIe link.
google_compute_instance_gpu_pcie_non_fatal_error_count	compute.googleapis.com/instance/gpu/pcie_non_fatal_error_count	Non Fatal PCIe Errors	count	The number of non-fatal PCIe errors.
google_compute_instance_gpu_pcie_replay_count	compute.googleapis.com/instance/gpu/pcie_replay_count	PCIe Replays	count	The number of replays the GPU issued on the PCIe link.
google_compute_instance_gpu_pcie_replay_rollover_count	compute.googleapis.com/instance/gpu/pcie_replay_rollover_count	PCIe Replay Rollovers	count	The number of replay rollovers the GPU issued on the PCIe link.
google_compute_instance_gpu_power_consumption	compute.googleapis.com/instance/gpu/power_consumption	GPU Power Consumption	count	This metric represents power consumption observed on individual GPUs on the host as a double value. If a machine has multiple GPUs attached, the metric provides each GPU power consumption on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_sm_utilization	compute.googleapis.com/instance/gpu/sm_utilization	SM Utilization	%	This metric represents the Streaming Multiprocessor (SM) utilization of an individual GPU on the host as a percentage value. If a machine has multiple GPUs attached, the metric provides each GPU SM utilization on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_straggler_status	compute.googleapis.com/instance/gpu/straggler_status	Straggler Status	count	This metric indicates if a VM is identified as a Straggler node affecting the performance of an AI/ML job. This metric is supported for the A3-mega, A3-ultra and A4* VM families. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_tcpxo_receive_chunk_latency	compute.googleapis.com/instance/gpu/tcpxo_receive_chunk_latency	TCPXO Receive Chunk Latency	ns	The metric measures TCPXO received chunk latency in VM. This metric is available only for A3 mega VMs.
google_compute_instance_gpu_tcpxo_send_chunk_latency	compute.googleapis.com/instance/gpu/tcpxo_send_chunk_latency	TCPXO Send Chunk Latency	ns	The metric measures TCPXO send chunk latency in VM. This metric is available only for A3 mega VMs.
google_compute_instance_gpu_temperature	compute.googleapis.com/instance/gpu/temperature	GPU Temperature	count	This metric represents the temperature of an individual GPU on the host, as a double value. If a machine has multiple GPUs attached, the metric provides each GPU temperature on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_throughput_rx_bytes	compute.googleapis.com/instance/gpu/throughput_rx_bytes	Throughput Rx Bytes	bytes	This metric represents network throughput as an INT64 value, calculated as the delta of received bytes at a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_throughput_tx_bytes	compute.googleapis.com/instance/gpu/throughput_tx_bytes	Throughput Tx Bytes	bytes	This metric represents network throughput as an INT64 value, calculated as the delta of transferred bytes at a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_tlimit	compute.googleapis.com/instance/gpu/tlimit	GPU Thermal Margin	count	This metric represents the thermal margin of an individual GPU on the host: the temperature margin, in degrees Celsius, from a software slowdown event, as a double value. For an `n-GPU` machine, each timestamp has `n` values, one for the thermal margin of each GPU on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_integrity_early_boot_validation_status	compute.googleapis.com/instance/integrity/early_boot_validation_status	Early Boot Validation	count	The validation status of early boot integrity policy. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_integrity_late_boot_validation_status	compute.googleapis.com/instance/integrity/late_boot_validation_status	Late Boot Validation	count	The validation status of late boot integrity policy. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_interruption_count	compute.googleapis.com/instance/interruption_count	Interruption Count	count	Interruptions are system evictions of infrastructure while the customer is in control of that infrastructure. This metric is the current count of interruptions by type and reason. The stream is often undefined when the count is zero. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
google_compute_instance_memory_balloon_ram_size	compute.googleapis.com/instance/memory/balloon/ram_size	VM Memory Total	bytes	The total amount of memory in the VM. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_memory_balloon_ram_used	compute.googleapis.com/instance/memory/balloon/ram_used	VM Memory Used	bytes	Memory currently used in the VM. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_memory_balloon_swap_in_bytes_count	compute.googleapis.com/instance/memory/balloon/swap_in_bytes_count	VM Swap In	bytes	The amount of memory read into the guest from its own swap space. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_memory_balloon_swap_out_bytes_count	compute.googleapis.com/instance/memory/balloon/swap_out_bytes_count	VM Swap Out	bytes	The amount of memory written from the guest to its own swap space. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_network_received_bytes_count	compute.googleapis.com/instance/network/received_bytes_count	Received bytes	bytes	Count of bytes received from the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_network_received_packets_count	compute.googleapis.com/instance/network/received_packets_count	Received packets	count	Count of packets received from the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_network_sent_bytes_count	compute.googleapis.com/instance/network/sent_bytes_count	Sent bytes	bytes	Count of bytes sent over the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_network_sent_packets_count	compute.googleapis.com/instance/network/sent_packets_count	Sent packets	count	Count of packets sent over the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_tpu_infra_health	compute.googleapis.com/instance/tpu/infra_health	TPU Instance Health	count	Indicates the overall health status of a TPU instance. The metric labels help identify the specific health status and reasons for issues on degraded or unhealthy TPU instances, primarily focusing on TPU hardware and system health. Health status changes may take several minutes to be reflected in this metric. Sampled every 60 seconds. After sampling, data is not visible for up to 420 seconds.
google_compute_instance_uptime	compute.googleapis.com/instance/uptime	Uptime	seconds	Delta of how long the VM has been running, in seconds. For the total elapsed time since the VM was started, see compute.googleapis.com/instance/uptime_total. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_uptime_total	compute.googleapis.com/instance/uptime_total	Uptime Total	seconds	Elapsed time since the VM was started, in seconds. When the VM is stopped (https://cloud.google.com/compute/docs/instances/stop-start-instance#stop-vm-google-cloud), the time is not counted. When the VM is started again, the timer resets to 0 for that VM. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
google_compute_intercept_intercepted_bytes_count	compute.googleapis.com/intercept/intercepted_bytes_count	Intercepted bytes	bytes	The number of intercepted bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_intercept_intercepted_packets_count	compute.googleapis.com/intercept/intercepted_packets_count	Intercepted packets	count	The number of intercepted packets. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_mirroring_dropped_packets_count	compute.googleapis.com/mirroring/dropped_packets_count	Dropped packets	count	Count of dropped mirrored packets. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_mirroring_mirrored_bytes_count	compute.googleapis.com/mirroring/mirrored_bytes_count	Mirrored bytes	bytes	Count of mirrored bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_mirroring_mirrored_packets_count	compute.googleapis.com/mirroring/mirrored_packets_count	Mirrored packets	count	Count of mirrored packets. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_nat_allocated_ports	compute.googleapis.com/nat/allocated_ports	Allocated ports	ports	Number of ports allocated to a VM by the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_closed_connections_count	compute.googleapis.com/nat/closed_connections_count	Closed connections count	connections	Count of connections closed over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_dropped_received_packets_count	compute.googleapis.com/nat/dropped_received_packets_count	Received packets dropped count	packets	Count of received packets dropped by the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_dropped_sent_packets_count	compute.googleapis.com/nat/dropped_sent_packets_count	Sent packets dropped count	packets	Count of sent packets dropped by the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_new_connections_count	compute.googleapis.com/nat/new_connections_count	New connections count	connections	Count of new connections created over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_open_connections	compute.googleapis.com/nat/open_connections	Open connections	connections	Number of connections currently open on the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_port_usage	compute.googleapis.com/nat/port_usage	Port usage	ports	Maximum number of connections from a VM to a single internet endpoint (IP:port). Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_received_bytes_count	compute.googleapis.com/nat/received_bytes_count	Received bytes count	bytes	Count of bytes received (destination -> VM) via the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_received_packets_count	compute.googleapis.com/nat/received_packets_count	Received packets count	packets	Count of packets received (destination -> VM) via the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_sent_bytes_count	compute.googleapis.com/nat/sent_bytes_count	Sent bytes count	bytes	Count of bytes sent (VM -> destination) over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_sent_packets_count	compute.googleapis.com/nat/sent_packets_count	Sent packets count	packets	Count of packets sent (VM -> destination) over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
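
The Opsramp metric names in the table above appear to follow a simple pattern: the `compute.googleapis.com/` prefix of the Google metric becomes `google_compute_`, and each remaining `/` becomes `_`. This is an observation inferred from the rows listed here, not a documented rule, so treat the sketch below as a convenience for looking up rows. The function name `to_opsramp_name` is hypothetical.

```python
# Assumption: the naming pattern is inferred from the table rows only.
GOOGLE_PREFIX = "compute.googleapis.com/"
OPSRAMP_PREFIX = "google_compute_"


def to_opsramp_name(google_metric: str) -> str:
    """Derive the Opsramp metric name from a Compute Engine metric type."""
    if not google_metric.startswith(GOOGLE_PREFIX):
        raise ValueError(f"not a Compute Engine metric type: {google_metric!r}")
    # Drop the service prefix, then flatten the path separators.
    path = google_metric[len(GOOGLE_PREFIX):]
    return OPSRAMP_PREFIX + path.replace("/", "_")


if __name__ == "__main__":
    print(to_opsramp_name("compute.googleapis.com/nat/allocated_ports"))
    print(to_opsramp_name("compute.googleapis.com/instance/gpu/temperature"))
```

A metric type from another Google service (one that does not start with `compute.googleapis.com/`) raises `ValueError` rather than producing a guessed name.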

Event support

Not Supported

External reference

Google Compute Engine - Virtual Machine Instances