Compute Engine instances can run the public images for Linux and Windows Server that Google provides and private custom images that you can create or import from your existing systems. You can also deploy Docker containers, which are automatically launched on instances running the Container-Optimized OS public image.

You can choose the machine properties of your instances, such as the number of virtual CPUs and the amount of memory, by using a set of predefined machine types or by creating your own custom machine types.

Each instance belongs to a Google Cloud Console project, and a project can have one or more instances.

  • When you create an instance in a project, you specify the zone, operating system, and machine type of that instance.
  • When you delete an instance, it is removed from the project.
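The properties named above (zone, operating system image, and machine type) can be collected into a single request description. The following is a minimal sketch, assuming the field layout of the Compute Engine REST `instances.insert` request body and the `custom-<vCPUs>-<memory MB>` naming pattern for custom machine types. The helper names `custom_machine_type` and `build_instance_body`, the default network, and the example image path are illustrative assumptions and are not part of the OpsRamp integration.

```python
def custom_machine_type(vcpus: int, memory_mb: int, family: str = "") -> str:
    """Return a custom machine type name such as 'n2-custom-4-8192'.

    An empty family yields the unprefixed form 'custom-4-8192'.
    """
    if vcpus < 1 or memory_mb < 1:
        raise ValueError("vcpus and memory_mb must be positive")
    prefix = f"{family}-" if family else ""
    return f"{prefix}custom-{vcpus}-{memory_mb}"


def build_instance_body(name: str, zone: str, machine_type: str,
                        source_image: str) -> dict:
    """Assemble a dict shaped like an instances.insert request body.

    The layout is an assumption for illustration; check it against the
    Compute Engine API reference before sending it anywhere.
    """
    return {
        "name": name,
        # The machine type is referenced relative to the zone of the instance.
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        "disks": [
            {
                "boot": True,
                "autoDelete": True,
                # The operating system comes from a public or custom image.
                "initializeParams": {"sourceImage": source_image},
            }
        ],
        "networkInterfaces": [{"network": "global/networks/default"}],
    }


# Hypothetical example values.
body = build_instance_body(
    name="demo-instance",
    zone="us-central1-a",
    machine_type=custom_machine_type(4, 8192, family="n2"),
    source_image="projects/debian-cloud/global/images/family/debian-12",
)
```

The sketch only builds the description; it does not call any API. A predefined machine type name such as `e2-small` can be passed as `machine_type` in place of the custom name.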

Setup

To set up the Google integration and discover the Google service, go to Google Integration Discovery Profile, select Instances > Perform Actions, and check Manage Devices.

Supported metrics

Opsramp MetricGoogle MetricMetric Display NameUnitDescription
google_compute_firewall_dropped_bytes_countcompute.googleapis.com/firewall/dropped_bytes_countDropped bytesbytesCount of incoming bytes dropped by the firewall. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_firewall_dropped_packets_countcompute.googleapis.com/firewall/dropped_packets_countDropped packetscountCount of incoming packets dropped by the firewall. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_guest_cpu_runnable_task_countcompute.googleapis.com/guest/cpu/runnable_task_countRunnable task count.countThe average number of runnable tasks in the run-queue. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_cpu_usage_timecompute.googleapis.com/guest/cpu/usage_timeCPU usageCPU secondsCPU usage, in seconds. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_bytes_usedcompute.googleapis.com/guest/disk/bytes_usedDisk usagebytesNumber of bytes used on disk for file systems. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_io_timecompute.googleapis.com/guest/disk/io_timeIO TimemsThe cumulative time spent on the I/O operations that are in progress; that is, the actual time in queue and when disks were busy. Requests issued in parallel are counted as a single one. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_disk_merged_operation_countcompute.googleapis.com/guest/disk/merged_operation_countMerged disk operationscountMerged disk operations count. Disk operations which are adjacent to each other may be merged by the kernel for efficiency. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_operation_bytes_countcompute.googleapis.com/guest/disk/operation_bytes_countDisk bytes transferredbytesBytes transferred in disk operations. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_operation_countcompute.googleapis.com/guest/disk/operation_countDisk operationscountDisk operations count. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_operation_timecompute.googleapis.com/guest/disk/operation_timeDisk operation timemsAmount of time spent on the disk operations, by direction. This metric only includes time spent on completed operations. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_disk_percent_usedcompute.googleapis.com/guest/disk/percent_usedPercent Used%Percentage of total disk capacity currently in use.
google_compute_guest_disk_queue_lengthcompute.googleapis.com/guest/disk/queue_lengthQueue LengthcountThe queue length on the disk averaged over the last 60 seconds. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_disk_weighted_io_timecompute.googleapis.com/guest/disk/weighted_io_timeIO TimemsThe cumulative weighted IO time spent on the disk. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_memory_anonymous_usedcompute.googleapis.com/guest/memory/anonymous_usedAnonymous memory usage in BytesbytesAnonymous memory usage, in Bytes. Summing values of all states yields the total anonymous memory used. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_memory_bytes_usedcompute.googleapis.com/guest/memory/bytes_usedMemory usage in BytesbytesMemory usage by each memory state, in Bytes. Summing values of all states yields the total memory on the machine. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_memory_dirty_usedcompute.googleapis.com/guest/memory/dirty_usedDirty pages usage in Bytes.bytesDirty pages usage, in Bytes. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_memory_page_cache_usedcompute.googleapis.com/guest/memory/page_cache_usedPage cache memory usage in BytesbytesPage cache memory usage, in Bytes. Summing values of all states yields the total anonymous memory used. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_memory_percent_usedcompute.googleapis.com/guest/memory/percent_usedPercent Used%Percentage of total system memory actively in use. Calculated as (Total Memory - Free Memory - Buffers - Cached - Slab) / Total Memory * 100.
google_compute_guest_memory_unevictable_usedcompute.googleapis.com/guest/memory/unevictable_usedUnevictable memory usage in BytesbytesUnevictable memory usage, in Bytes. For Container-Optimized OS, or Ubuntu running GKE.
google_compute_guest_system_os_feature_enabledcompute.googleapis.com/guest/system/os_feature_enabledOS FeaturecountOS Features like GPU support, KTD kernel, third party modules as unknown modules. 1 if the feature is enabled and 0, if disabled. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_system_problem_countcompute.googleapis.com/guest/system/problem_countProblem CountcountNumber of times a machine problem has happened. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_system_problem_statecompute.googleapis.com/guest/system/problem_stateProblem StatecountWhether a problem is affecting the system or not. The problem is affecting the system when set to 1 and is not affecting the system when set to 0. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_guest_system_uptimecompute.googleapis.com/guest/system/uptimeUptimesecondsNumber of seconds that the operating system has been running for. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds.
google_compute_instance_clock_accuracy_ptp_kvm_nanosecond_accuracycompute.googleapis.com/instance/clock_accuracy/ptp_kvm/nanosecond_accuracyClock AccuracynsAccuracy of the host clock in nanoseconds.
google_compute_instance_cpu_guest_visible_vcpuscompute.googleapis.com/instance/cpu/guest_visible_vcpusGuest Visible vCPUscountNumber of vCPUs visible inside the guest. For many GCE machine types, the number of vCPUs visible inside the guest is equal to the `compute.googleapis.com/instance/cpu/reserved_cores` metric. For shared-core machine types, the number of guest-visible vCPUs differs from the number of resereved cores. For example, e2-small instances have two vCPUs visible inside the guest and 0.5 fractional vCPUs reserved. Therefore, for an e2-small instance, `compute.googleapis.com/instance/cpu/guest_visible_vcpus` has a value of 2 and `compute.googleapis.com/instance/cpu/reserved_cores` has a value of 0.5. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_cpu_reserved_corescompute.googleapis.com/instance/cpu/reserved_coresReserved vCPUscountNumber of vCPUs reserved on the host of the instance. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_cpu_scheduler_wait_timecompute.googleapis.com/instance/cpu/scheduler_wait_timeScheduler Wait Timeidle secondsWait time is the time a vCPU is ready to run, but unexpectedly not scheduled to run. The wait time returned here is the accumulated value for all vCPUs. The time interval for which the value was measured is returned by Monitoring in whole seconds as start_time and end_time. This metric is only available for VMs that belong to the e2 family or to overcommitted VMs on sole-tenant nodes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_cpu_usage_timecompute.googleapis.com/instance/cpu/usage_timeCPU usageCPU secondsDelta vCPU usage for all vCPUs, in vCPU-seconds. To compute the per-vCPU utilization fraction, divide this value by (end-start)*N, where end and start define this value's time interval and N is `compute.googleapis.com/instance/cpu/reserved_cores` at the end of the interval. This value is reported by the hypervisor for the VM and can differ from `agent.googleapis.com/cpu/usage_time`, which is reported from inside the VM. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_cpu_utilizationcompute.googleapis.com/instance/cpu/utilizationCPU utilization%Fractional utilization of allocated CPU on this instance. Values are typically numbers between 0.0 and 1.0 (but some machine types allow bursting above 1.0). Charts display the values as a percentage between 0% and 100% (or more). This metric is reported by the hypervisor for the VM and can differ from `agent.googleapis.com/cpu/utilization`, which is reported from inside the VM. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_average_io_latencycompute.googleapis.com/instance/disk/average_io_latencyDisk average latencymicrosecondsDisk's average io latency in the last 60s. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_average_io_queue_depthcompute.googleapis.com/instance/disk/average_io_queue_depthDisk average io queue depthcountDisk's average io queue depth in the last 60s. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_max_read_bytes_countcompute.googleapis.com/instance/disk/max_read_bytes_countPeak disk read bytesbytesDisk's maximum per-second read throughput over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_max_read_ops_countcompute.googleapis.com/instance/disk/max_read_ops_countPeak disk read opscountDisk's maximum per-second read requests count over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_max_write_bytes_countcompute.googleapis.com/instance/disk/max_write_bytes_countPeak disk write bytesbytesDisk's maximum per-second write throughput over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_max_write_ops_countcompute.googleapis.com/instance/disk/max_write_ops_countPeak disk write opscountDisk's maximum per-second write requests count over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_performance_statuscompute.googleapis.com/instance/disk/performance_statusDisk performance statuscountWhether the disk performance is normal or could potentially be impacted by an issue within Compute Engine during the period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_provisioning_iopscompute.googleapis.com/instance/disk/provisioning/iopsProvisioned disk IOPScountDisk's provisioned IOPS specified by the user. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_provisioning_sizecompute.googleapis.com/instance/disk/provisioning/sizeProvisioned disk sizebytesDisk's provisioned size specified by the user. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_provisioning_throughputcompute.googleapis.com/instance/disk/provisioning/throughputInstance Disk Provisioning ThroughputcountDisk's provisioned throughput (bytes/sec) specified by the user. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_read_bytes_countcompute.googleapis.com/instance/disk/read_bytes_countDisk read bytesbytesCount of bytes read from disk. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_read_ops_countcompute.googleapis.com/instance/disk/read_ops_countDisk read operationscountCount of disk read IO operations. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_write_bytes_countcompute.googleapis.com/instance/disk/write_bytes_countDisk write bytesbytesCount of bytes written to disk. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_disk_write_ops_countcompute.googleapis.com/instance/disk/write_ops_countDisk write operationscountCount of disk write IO operations. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_gpu_accumulated_context_utilization_secondscompute.googleapis.com/instance/gpu/accumulated_context_utilization_secondsAccumulated Context Utilization SecondscountAccumulated context utilization time (in seconds).
google_compute_instance_gpu_cache_correctable_ecc_error_countcompute.googleapis.com/instance/gpu/cache_correctable_ecc_error_countCorrectable Cache ECC ErrorscountThe number of correctable ECC errors in cache memory.
google_compute_instance_gpu_cache_uncorrectable_ecc_error_countcompute.googleapis.com/instance/gpu/cache_uncorrectable_ecc_error_countUncorrectable Cache ECC ErrorscountThe number of uncorrectable ECC errors in cache memory.
google_compute_instance_gpu_dram_correctable_ecc_error_countcompute.googleapis.com/instance/gpu/dram_correctable_ecc_error_countCorrectable DRAM ECC ErrorscountThe number of correctable ECC errors in GPU DRAMs.
google_compute_instance_gpu_dram_correctable_row_remapping_countcompute.googleapis.com/instance/gpu/dram_correctable_row_remapping_countCorrectable DRAM Row Remapping CountcountThe number of row remappings from correctable errors in GPU DRAMs.
google_compute_instance_gpu_dram_row_remapping_failedcompute.googleapis.com/instance/gpu/dram_row_remapping_failedDRAM Row Remapping FailedcountWhether row remapping failed previously. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_dram_row_remapping_pendingcompute.googleapis.com/instance/gpu/dram_row_remapping_pendingDRAM Row Remapping PendingcountWhether row remapping is set to occur at the next GPU reset. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_dram_uncorrectable_ecc_error_countcompute.googleapis.com/instance/gpu/dram_uncorrectable_ecc_error_countUncorrectable DRAM ECC ErrorscountThe number of uncorrectable ECC errors in GPU DRAMs.
google_compute_instance_gpu_dram_uncorrectable_row_remapping_countcompute.googleapis.com/instance/gpu/dram_uncorrectable_row_remapping_countUncorrectable DRAM Row Remapping CountcountThe number of row remappings from uncorrectable errors in GPU DRAMs.
google_compute_instance_gpu_failure_prediction_statuscompute.googleapis.com/instance/gpu/failure_prediction_statusVM Degradation StatuscountThis metric indicates the probability of a VM entering a degraded state within the next 5 hours as predicted by our proprietary algorithm. Value label for this metric would be NO_DEGRADATION_PREDICTED, DEGRADATION_PREDICTED, POSSIBLE_DEGRADATION_PREDICTED. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_gpu_nvlink_active_speedcompute.googleapis.com/instance/gpu/gpu_nvlink_active_speedGPU NVLink Port Active SpeedcountCurrent NVLink port speed in Gbps. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_effective_bercompute.googleapis.com/instance/gpu/gpu_nvlink_effective_berGPU NVLink Effective BERcountEffective bit error rate (BER) is the error rate of the port after a forward error correction (FEC). The value indicates the overall average BER since the last counter reset. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_link_downed_countercompute.googleapis.com/instance/gpu/gpu_nvlink_link_downed_counterGPU NVLink Port Link Downed CountercountThe number of link-down events on the port. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_link_error_recovery_countcompute.googleapis.com/instance/gpu/gpu_nvlink_link_error_recovery_countGPU NVLink Port Link Error Recovery CountercountThe number of successful link recovery processes. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_physical_effective_errorscompute.googleapis.com/instance/gpu/gpu_nvlink_physical_effective_errorsGPU NVLink Port Physical Effective ErrorscountEffective error count is the number of bit errors that the port receives post-Forward Error Correction (FEC). Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_rcv_datacompute.googleapis.com/instance/gpu/gpu_nvlink_port_rcv_dataGPU NVLink Port Rcv DatacountTotal number of bytes received, measured as bps. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_rcv_errorscompute.googleapis.com/instance/gpu/gpu_nvlink_port_rcv_errorsGPU NVLink Port RCV ErrorscountTotal number of packets containing an error that were received on the port. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_xmit_datacompute.googleapis.com/instance/gpu/gpu_nvlink_port_xmit_dataGPU NVLink Port Xmit DatacountTotal number of bytes transmitted, measured as bps. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_xmit_discardscompute.googleapis.com/instance/gpu/gpu_nvlink_port_xmit_discardsGPU NVLink Port Xmit DiscardscountTotal number of outbound packets that were discarded by the port. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_port_xmit_waitcompute.googleapis.com/instance/gpu/gpu_nvlink_port_xmit_waitGPU NVLink Port Xmit WaitcountThe number of transmitted packets incurred transmit wait. Supported for A4X VMs only.
google_compute_instance_gpu_gpu_nvlink_vl15_droppedcompute.googleapis.com/instance/gpu/gpu_nvlink_vl15_droppedGPU NVLink Port VL15 DroppedcountThe number of management (VL15) packets that were dropped due to a lack of resources on the port. Supported for A4X VMs only.
google_compute_instance_gpu_infra_healthcompute.googleapis.com/instance/gpu/infra_healthVM Infra HealthcountThis metric captures the infrastructure health of the VM as a string. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_inter_block_txcompute.googleapis.com/instance/gpu/inter_block_txNetwork Traffic at Inter-BlockbytesThis metric represents network traffic at the inter-block level as an INT64 value, calculated as the delta of observed traffic within a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_inter_subblock_txcompute.googleapis.com/instance/gpu/inter_subblock_txNetwork Traffic at Inter-SubblockbytesThis metric represents network traffic at the inter-subblock level as an INT64 value, calculated as the delta of observed traffic within a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_intra_subblock_txcompute.googleapis.com/instance/gpu/intra_subblock_txNetwork Traffic at Intra-SubblockbytesThis metric represents network traffic at the intra-subblock level as an INT64 value, calculated as the delta of observed traffic within a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_link_carrier_changescompute.googleapis.com/instance/gpu/link_carrier_changesLink Carrier ChangescountThis metric captures the network link carrier change as delta value computed at 1 minute granularity. This metric is available for all GPU VM machine types starting with A3 mega, A3 ultra, A4 and all future GPU VM families except Spot VMs.
google_compute_instance_gpu_nccl_latency_txcompute.googleapis.com/instance/gpu/nccl/latency_txNCCL Send LatencynsThe metric measures the latency distribution of NCCL send operations.
google_compute_instance_gpu_nccl_latency_variancecompute.googleapis.com/instance/gpu/nccl/latency_varianceNCCL Send Latency VariancensThe metric measures the latency variance distribution of NCCL send operations.
google_compute_instance_gpu_nccl_message_size_txcompute.googleapis.com/instance/gpu/nccl/message_size_txNCCL Send Message SizebytesThe metric measures the message size distribution of NCCL send operations.
google_compute_instance_gpu_network_rttcompute.googleapis.com/instance/gpu/network_rttNetwork RTTmicrosecondsThis metric measures network round-trip time in your GPU VMs. This metric is available for GPU VM machine types starting with A3 mega, A3 ultra, A4 and all future GPU VM families except Spot VMs.
google_compute_instance_gpu_nvlink_active_speedcompute.googleapis.com/instance/gpu/nvlink_active_speedNVLink Active SpeedcountCurrent access link port speed in Gb/s. Supported for A4X VMs only.
google_compute_instance_gpu_nvlink_port_statecompute.googleapis.com/instance/gpu/nvlink_port_stateNVLink Port StatecountLogical and Physical port states for NVswitch ports as defined in the OpenConfig YANG model. Supported for A4X VMs only.
google_compute_instance_gpu_nvlink_runtime_errorcompute.googleapis.com/instance/gpu/nvlink_runtime_errorNVLink Runtime ErrorcountWhether an NVLink Runtime Error occurred. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_nvswitch_effective_bercompute.googleapis.com/instance/gpu/nvswitch_effective_berNVSwitch Effective BERcountEffective BER (Bit Error Rate) is the error rate of the port after FEC (Forward Error Correction). The value indicates the overall average BER since the last counter reset. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_effective_errorscompute.googleapis.com/instance/gpu/nvswitch_effective_errorsNVSwitch Effective ErrorscountEffective error count is the number of bit errors that the port receives after FEC (Forward Error Correction). Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_link_downed_countercompute.googleapis.com/instance/gpu/nvswitch_link_downed_counterNVSwitch Link Downed CountercountThe count of link-down events on the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_link_error_recovery_countercompute.googleapis.com/instance/gpu/nvswitch_link_error_recovery_counterNVSwitch Link Error Recovery CountercountThe count of successful link recovery processes on the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_rcv_datacompute.googleapis.com/instance/gpu/nvswitch_port_rcv_dataNVSwitch Port Rcv DatacountTotal number of bytes received, measured as bps. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_rcv_errorscompute.googleapis.com/instance/gpu/nvswitch_port_rcv_errorsNVSwitch Port Rcv ErrorscountTotal number of packets containing an error that were received on the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_xmit_constraint_errorscompute.googleapis.com/instance/gpu/nvswitch_port_xmit_constraint_errorsNVSwitch Port Xmit Constraint ErrorscountTotal number of packets not transmitted from the switch physical port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_xmit_datacompute.googleapis.com/instance/gpu/nvswitch_port_xmit_dataNVSwitch Port Xmit DatacountTotal number of bytes transmitted, measured as bps. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_xmit_discardscompute.googleapis.com/instance/gpu/nvswitch_port_xmit_discardsNVSwitch Port Xmit DiscardscountTotal number of outbound packets that were discarded by the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_port_xmit_waitcompute.googleapis.com/instance/gpu/nvswitch_port_xmit_waitNVSwitch Port Xmit WaitcountThe number of transmitted packets incurred transmit wait. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_statuscompute.googleapis.com/instance/gpu/nvswitch_statusNV Switch StatuscountThis metric represents the health of an individual NV Switch on the host as a string. If a machine has multiple NV Switches attached, the metric provides each NV Switch health status on the host. The possible values for this metric are provided by NVIDIA BMC. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_nvswitch_symbol_errorscompute.googleapis.com/instance/gpu/nvswitch_symbol_errorsNVSwitch Symbol ErrorscountSymbol error count is the number of bit errors that the port receives after FEC (Forward Error Correction) and PLR (Physical Layer Retransmission). Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_vl15_droppedcompute.googleapis.com/instance/gpu/nvswitch_vl15_droppedNVSwitch VL15 DroppedcountThe number of management (VL15) packets that were dropped due to a lack of resources on the port. Supported for A4X VMs only.
google_compute_instance_gpu_nvswitch_zero_histcompute.googleapis.com/instance/gpu/nvswitch_zero_histNVSwitch Zero Histogram FECcountFirst FEC histogram bin with value of 0. Monitor max of bits errors in the FEC block occurred up to the time of measurement. Supported for A4X VMs only.
google_compute_instance_gpu_packet_retransmission_countcompute.googleapis.com/instance/gpu/packet_retransmission_countPacket Retransmission CountcountThis metric, representing the packet retransmission count observed by network interface cards (NICs) attached to GPUs on the host, is a single INT64 value per timestamp. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_pcie_correctable_error_countcompute.googleapis.com/instance/gpu/pcie_correctable_error_countCorrectable PCIe ErrorscountThe number of correctable PCIe errors.
google_compute_instance_gpu_pcie_fatal_error_countcompute.googleapis.com/instance/gpu/pcie_fatal_error_countFatal PCIe ErrorscountThe number of fatal PCIe errors.
google_compute_instance_gpu_pcie_l0_to_recovery_countcompute.googleapis.com/instance/gpu/pcie_l0_to_recovery_countPCIe L0 To Recovery CountcountThe number of times the PCIe link entered the recovery state from the L0 state.
google_compute_instance_gpu_pcie_nak_received_countcompute.googleapis.com/instance/gpu/pcie_nak_received_countPCIe NAK Received CountcountThe number of NAKs the host root complex issued on the PCIe link.
google_compute_instance_gpu_pcie_nak_sent_countcompute.googleapis.com/instance/gpu/pcie_nak_sent_countPCIe NAK Sent CountcountThe number of NAKs the GPU issued on the PCIe link.
google_compute_instance_gpu_pcie_non_fatal_error_countcompute.googleapis.com/instance/gpu/pcie_non_fatal_error_countNon Fatal PCIe ErrorscountThe number of non-fatal PCIe errors.
google_compute_instance_gpu_pcie_replay_countcompute.googleapis.com/instance/gpu/pcie_replay_countPCIe ReplayscountThe number of replays the GPU issued on the PCIe link.
google_compute_instance_gpu_pcie_replay_rollover_countcompute.googleapis.com/instance/gpu/pcie_replay_rollover_countPCIe Replay RolloverscountThe number of replay rollovers the GPU issued on the PCIe link.
google_compute_instance_gpu_power_consumptioncompute.googleapis.com/instance/gpu/power_consumptionGPU Power ConsumptioncountThis metric represents power consumption observed on individual GPUs on the host as a double value. If a machine has multiple GPUs attached, the metric provides each GPU power consumption on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_sm_utilizationcompute.googleapis.com/instance/gpu/sm_utilizationSM Utilization%This metric represents the Streaming Multiprocessor (SM) utilization of an individual GPU on the host as a percentage value. if a machine has multiple GPUs attached, the metric provides each GPU SM utilization on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_straggler_statuscompute.googleapis.com/instance/gpu/straggler_statusStraggler StatuscountThis metric indicates if a VM is identified as a Straggler node affecting the performance of an AI/ML job. This metric is supported for the A3-mega, A3-ultra and A4* VM families. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_tcpxo_receive_chunk_latencycompute.googleapis.com/instance/gpu/tcpxo_receive_chunk_latencyTCPXO Receive Chunk LatencynsThe metric measures TCPXO received chunk latency in VM. This metric is available only for A3 mega VMs.
google_compute_instance_gpu_tcpxo_send_chunk_latencycompute.googleapis.com/instance/gpu/tcpxo_send_chunk_latencyTCPXO Send Chunk LatencynsThe metric measures TCPXO send chunk latency in VM. This metric is available only for A3 mega VMs.
google_compute_instance_gpu_temperaturecompute.googleapis.com/instance/gpu/temperatureGPU TemperaturecountThis metric represents the temperature of an individual GPU on the host, as a double value. If a machine has multiple GPUs attached, the metric provides each GPU temperature on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_throughput_rx_bytescompute.googleapis.com/instance/gpu/throughput_rx_bytesThroughput Rx BytesbytesThis metric represents network throughput as an INT64 value, calculated as the delta of received bytes at a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_throughput_tx_bytescompute.googleapis.com/instance/gpu/throughput_tx_bytesThroughput Tx BytesbytesThis metric represents network throughput as an INT64 value, calculated as the delta of transferred bytes at a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_gpu_tlimitcompute.googleapis.com/instance/gpu/tlimitGPU Thermal MargincountRepresents the thermal margin of an individual GPU on the host, that is, the number of degrees Celsius remaining before a software slowdown event, as a double value. For a machine with `n` GPUs, each timestamp has `n` values, one for the thermal margin of each GPU on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds.
google_compute_instance_integrity_early_boot_validation_statuscompute.googleapis.com/instance/integrity/early_boot_validation_statusEarly Boot ValidationcountThe validation status of early boot integrity policy. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_integrity_late_boot_validation_statuscompute.googleapis.com/instance/integrity/late_boot_validation_statusLate Boot ValidationcountThe validation status of late boot integrity policy. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_interruption_countcompute.googleapis.com/instance/interruption_countInterruption CountcountInterruptions are system evictions of infrastructure while the customer is in control of that infrastructure. This metric is the current count of interruptions by type and reason. The stream is often undefined when the count is zero. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
google_compute_instance_memory_balloon_ram_sizecompute.googleapis.com/instance/memory/balloon/ram_sizeVM Memory TotalbytesThe total amount of memory in the VM. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_memory_balloon_ram_usedcompute.googleapis.com/instance/memory/balloon/ram_usedVM Memory UsedbytesMemory currently used in the VM. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_memory_balloon_swap_in_bytes_countcompute.googleapis.com/instance/memory/balloon/swap_in_bytes_countVM Swap InbytesThe amount of memory read into the guest from its own swap space. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_memory_balloon_swap_out_bytes_countcompute.googleapis.com/instance/memory/balloon/swap_out_bytes_countVM Swap OutbytesThe amount of memory written from the guest to its own swap space. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_network_received_bytes_countcompute.googleapis.com/instance/network/received_bytes_countReceived bytesbytesCount of bytes received from the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_network_received_packets_countcompute.googleapis.com/instance/network/received_packets_countReceived packetscountCount of packets received from the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_network_sent_bytes_countcompute.googleapis.com/instance/network/sent_bytes_countSent bytesbytesCount of bytes sent over the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_network_sent_packets_countcompute.googleapis.com/instance/network/sent_packets_countSent packetscountCount of packets sent over the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_tpu_infra_healthcompute.googleapis.com/instance/tpu/infra_healthTPU Instance HealthcountIndicates the overall health status of a TPU instance. The metric labels help identify the specific health status and reasons for issues on degraded or unhealthy TPU instances, primarily focusing on TPU hardware and system health. Health status changes may take several minutes to be reflected in this metric. Sampled every 60 seconds. After sampling, data is not visible for up to 420 seconds.
google_compute_instance_uptimecompute.googleapis.com/instance/uptimeUptimeuptimeDelta of how long the VM has been running, in seconds. For the total elapsed time since the VM was started, see compute.googleapis.com/instance/uptime_total. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_instance_uptime_totalcompute.googleapis.com/instance/uptime_totalUptime TotalsecondsElapsed time since the VM was started, in seconds. When the VM is stopped (https://cloud.google.com/compute/docs/instances/stop-start-instance#stop-vm-google-cloud), the time is not counted. When the VM is started again, the timer resets to 0 for that VM. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
google_compute_intercept_intercepted_bytes_countcompute.googleapis.com/intercept/intercepted_bytes_countIntercepted bytesbytesThe number of intercepted bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_intercept_intercepted_packets_countcompute.googleapis.com/intercept/intercepted_packets_countIntercepted packetscountThe number of intercepted packets. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_mirroring_dropped_packets_countcompute.googleapis.com/mirroring/dropped_packets_countDropped packetscountCount of dropped mirrored packets. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_mirroring_mirrored_bytes_countcompute.googleapis.com/mirroring/mirrored_bytes_countMirrored bytesbytesCount of mirrored bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_mirroring_mirrored_packets_countcompute.googleapis.com/mirroring/mirrored_packets_countMirrored packetscountCount of mirrored packets. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
google_compute_nat_allocated_portscompute.googleapis.com/nat/allocated_portsAllocated portsportsNumber of ports allocated to a VM by the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_closed_connections_countcompute.googleapis.com/nat/closed_connections_countClosed connections countconnectionsCount of connections closed over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_dropped_received_packets_countcompute.googleapis.com/nat/dropped_received_packets_countReceived packets dropped countpacketsCount of received packets dropped by the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_dropped_sent_packets_countcompute.googleapis.com/nat/dropped_sent_packets_countSent packets dropped countpacketsCount of sent packets dropped by the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_new_connections_countcompute.googleapis.com/nat/new_connections_countNew connections countconnectionsCount of new connections created over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_open_connectionscompute.googleapis.com/nat/open_connectionsOpen connectionsconnectionsNumber of connections currently open on the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_port_usagecompute.googleapis.com/nat/port_usagePort usageportsMaximum number of connections from a VM to a single internet endpoint (IP:port). Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_received_bytes_countcompute.googleapis.com/nat/received_bytes_countReceived bytes countbytesCount of bytes received (destination -> VM) via the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_received_packets_countcompute.googleapis.com/nat/received_packets_countReceived packets countpacketsCount of packets received (destination -> VM) via the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_sent_bytes_countcompute.googleapis.com/nat/sent_bytes_countSent bytes countbytesCount of bytes sent (VM -> destination) over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
google_compute_nat_sent_packets_countcompute.googleapis.com/nat/sent_packets_countSent packets countpacketsCount of packets sent (VM -> destination) over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds.
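
In the rows listed above, the OpsRamp metric name appears to be derived from the Google metric type by replacing the `compute.googleapis.com/` prefix with `google_compute_` and each `/` with `_`. The Python sketch below illustrates that observed pattern only; the function name `to_opsramp_name` is hypothetical, and the rule is an assumption inferred from this table, not a documented OpsRamp API.

```python
GOOGLE_PREFIX = "compute.googleapis.com/"
OPSRAMP_PREFIX = "google_compute_"


def to_opsramp_name(google_metric_type: str) -> str:
    """Derive an OpsRamp-style name from a Compute Engine metric type.

    Assumption: the naming rule is inferred from the table rows, not
    taken from an official specification.
    """
    if not google_metric_type.startswith(GOOGLE_PREFIX):
        raise ValueError(f"not a Compute Engine metric type: {google_metric_type}")
    # Drop the service prefix, then flatten the remaining path.
    path = google_metric_type[len(GOOGLE_PREFIX):]
    return OPSRAMP_PREFIX + path.replace("/", "_")


print(to_opsramp_name("compute.googleapis.com/nat/allocated_ports"))
# prints: google_compute_nat_allocated_ports
```

A lookup against the table itself remains the safer choice if a metric is ever named differently.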

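Most descriptions above state a sampling interval (60 seconds) and a maximum delay before data becomes visible (120 to 540 seconds, depending on the metric). As a rough sketch of what those two numbers imply when you query or alert on a metric, the Python helpers below compute a query end time that avoids the not-yet-visible window and an upper bound on how old the newest visible point can be. The function names are hypothetical, and treating the bound as delay plus one interval is an assumption based on the wording of the descriptions.

```python
from datetime import datetime, timedelta, timezone


def safe_query_end(now: datetime, visibility_delay_s: int) -> datetime:
    """Latest timestamp for which samples should already be visible."""
    return now - timedelta(seconds=visibility_delay_s)


def max_staleness_s(visibility_delay_s: int, sample_interval_s: int = 60) -> int:
    """Upper bound, in seconds, on the age of the newest visible sample.

    Assumption: one sample per interval, each visible within the stated delay.
    """
    return visibility_delay_s + sample_interval_s


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(safe_query_end(now, 240).isoformat())  # 2024-01-01T11:56:00+00:00
print(max_staleness_s(540))                  # 600
```

For example, an alert on a GPU metric with a 540-second delay should tolerate roughly ten minutes without a new point before treating the data as missing.
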
Event support

  • Not Supported

External reference