Compute Engine instances can run the public Linux and Windows Server images that Google provides, as well as private custom images that you create or import from your existing systems. You can also deploy Docker containers, which are launched automatically on instances running the Container-Optimized OS public image.
You can choose the machine properties of your instances, such as the number of virtual CPUs and the amount of memory, by using a set of predefined machine types or by creating your own custom machine types.
Each instance belongs to a Google Cloud project, and a project can have one or more instances.
- When you create an instance in a project, you specify the zone, operating system, and machine type of that instance.
- When you delete an instance, it is removed from the project.
Setup
To set up the Google integration and discover the Google service, go to the Google Integration Discovery Profile, select Instances > Perform Actions, and check Manage Devices.
Supported metrics
| OpsRamp Metric | Google Metric | Metric Display Name | Unit | Description |
|---|---|---|---|---|
| google_compute_firewall_dropped_bytes_count | compute.googleapis.com/firewall/dropped_bytes_count | Dropped bytes | bytes | Count of incoming bytes dropped by the firewall. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_firewall_dropped_packets_count | compute.googleapis.com/firewall/dropped_packets_count | Dropped packets | count | Count of incoming packets dropped by the firewall. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_guest_cpu_runnable_task_count | compute.googleapis.com/guest/cpu/runnable_task_count | Runnable task count | count | The average number of runnable tasks in the run-queue. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_cpu_usage_time | compute.googleapis.com/guest/cpu/usage_time | CPU usage | CPU seconds | CPU usage, in seconds. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_disk_bytes_used | compute.googleapis.com/guest/disk/bytes_used | Disk usage | bytes | Number of bytes used on disk for file systems. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_disk_io_time | compute.googleapis.com/guest/disk/io_time | IO Time | ms | The cumulative time spent on the I/O operations that are in progress; that is, the actual time in queue and when disks were busy. Requests issued in parallel are counted as a single one. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds. |
| google_compute_guest_disk_merged_operation_count | compute.googleapis.com/guest/disk/merged_operation_count | Merged disk operations | count | Merged disk operations count. Disk operations which are adjacent to each other may be merged by the kernel for efficiency. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_disk_operation_bytes_count | compute.googleapis.com/guest/disk/operation_bytes_count | Disk bytes transferred | bytes | Bytes transferred in disk operations. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_disk_operation_count | compute.googleapis.com/guest/disk/operation_count | Disk operations | count | Disk operations count. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_disk_operation_time | compute.googleapis.com/guest/disk/operation_time | Disk operation time | ms | Amount of time spent on the disk operations, by direction. This metric only includes time spent on completed operations. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_disk_percent_used | compute.googleapis.com/guest/disk/percent_used | Percent Used | % | Percentage of total disk capacity currently in use. |
| google_compute_guest_disk_queue_length | compute.googleapis.com/guest/disk/queue_length | Queue Length | count | The queue length on the disk averaged over the last 60 seconds. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds. |
| google_compute_guest_disk_weighted_io_time | compute.googleapis.com/guest/disk/weighted_io_time | IO Time | ms | The cumulative weighted IO time spent on the disk. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds. |
| google_compute_guest_memory_anonymous_used | compute.googleapis.com/guest/memory/anonymous_used | Anonymous memory usage in Bytes | bytes | Anonymous memory usage, in Bytes. Summing values of all states yields the total anonymous memory used. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_memory_bytes_used | compute.googleapis.com/guest/memory/bytes_used | Memory usage in Bytes | bytes | Memory usage by each memory state, in Bytes. Summing values of all states yields the total memory on the machine. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_memory_dirty_used | compute.googleapis.com/guest/memory/dirty_used | Dirty pages usage in Bytes | bytes | Dirty pages usage, in Bytes. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_memory_page_cache_used | compute.googleapis.com/guest/memory/page_cache_used | Page cache memory usage in Bytes | bytes | Page cache memory usage, in Bytes. Summing values of all states yields the total page cache memory used. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_memory_percent_used | compute.googleapis.com/guest/memory/percent_used | Percent Used | % | Percentage of total system memory actively in use. Calculated as (Total Memory - Free Memory - Buffers - Cached - Slab) / Total Memory * 100. |
| google_compute_guest_memory_unevictable_used | compute.googleapis.com/guest/memory/unevictable_used | Unevictable memory usage in Bytes | bytes | Unevictable memory usage, in Bytes. For Container-Optimized OS, or Ubuntu running GKE. |
| google_compute_guest_system_os_feature_enabled | compute.googleapis.com/guest/system/os_feature_enabled | OS Feature | count | OS features such as GPU support, KTD kernel, and third-party modules (reported as unknown modules). The value is 1 if the feature is enabled and 0 if it is disabled. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds. |
| google_compute_guest_system_problem_count | compute.googleapis.com/guest/system/problem_count | Problem Count | count | Number of times a machine problem has happened. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds. |
| google_compute_guest_system_problem_state | compute.googleapis.com/guest/system/problem_state | Problem State | count | Whether a problem is affecting the system or not. The problem is affecting the system when set to 1 and is not affecting the system when set to 0. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds. |
| google_compute_guest_system_uptime | compute.googleapis.com/guest/system/uptime | Uptime | seconds | Number of seconds that the operating system has been running for. For Container-Optimized OS, or Ubuntu running GKE. Sampled every 60 seconds. |
| google_compute_instance_clock_accuracy_ptp_kvm_nanosecond_accuracy | compute.googleapis.com/instance/clock_accuracy/ptp_kvm/nanosecond_accuracy | Clock Accuracy | ns | Accuracy of the host clock in nanoseconds. |
| google_compute_instance_cpu_guest_visible_vcpus | compute.googleapis.com/instance/cpu/guest_visible_vcpus | Guest Visible vCPUs | count | Number of vCPUs visible inside the guest. For many GCE machine types, the number of vCPUs visible inside the guest is equal to the `compute.googleapis.com/instance/cpu/reserved_cores` metric. For shared-core machine types, the number of guest-visible vCPUs differs from the number of reserved cores. For example, e2-small instances have two vCPUs visible inside the guest and 0.5 fractional vCPUs reserved. Therefore, for an e2-small instance, `compute.googleapis.com/instance/cpu/guest_visible_vcpus` has a value of 2 and `compute.googleapis.com/instance/cpu/reserved_cores` has a value of 0.5. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_cpu_reserved_cores | compute.googleapis.com/instance/cpu/reserved_cores | Reserved vCPUs | count | Number of vCPUs reserved on the host of the instance. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_cpu_scheduler_wait_time | compute.googleapis.com/instance/cpu/scheduler_wait_time | Scheduler Wait Time | idle seconds | Wait time is the time a vCPU is ready to run, but unexpectedly not scheduled to run. The wait time returned here is the accumulated value for all vCPUs. The time interval for which the value was measured is returned by Monitoring in whole seconds as start_time and end_time. This metric is only available for VMs that belong to the e2 family or to overcommitted VMs on sole-tenant nodes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_cpu_usage_time | compute.googleapis.com/instance/cpu/usage_time | CPU usage | CPU seconds | Delta vCPU usage for all vCPUs, in vCPU-seconds. To compute the per-vCPU utilization fraction, divide this value by (end-start)*N, where end and start define this value's time interval and N is `compute.googleapis.com/instance/cpu/reserved_cores` at the end of the interval. This value is reported by the hypervisor for the VM and can differ from `agent.googleapis.com/cpu/usage_time`, which is reported from inside the VM. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_cpu_utilization | compute.googleapis.com/instance/cpu/utilization | CPU utilization | % | Fractional utilization of allocated CPU on this instance. Values are typically numbers between 0.0 and 1.0 (but some machine types allow bursting above 1.0). Charts display the values as a percentage between 0% and 100% (or more). This metric is reported by the hypervisor for the VM and can differ from `agent.googleapis.com/cpu/utilization`, which is reported from inside the VM. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_average_io_latency | compute.googleapis.com/instance/disk/average_io_latency | Disk average latency | microseconds | Disk's average I/O latency over the last 60 seconds. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_average_io_queue_depth | compute.googleapis.com/instance/disk/average_io_queue_depth | Disk average io queue depth | count | Disk's average I/O queue depth over the last 60 seconds. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_max_read_bytes_count | compute.googleapis.com/instance/disk/max_read_bytes_count | Peak disk read bytes | bytes | Disk's maximum per-second read throughput over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_max_read_ops_count | compute.googleapis.com/instance/disk/max_read_ops_count | Peak disk read ops | count | Disk's maximum per-second read requests count over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_max_write_bytes_count | compute.googleapis.com/instance/disk/max_write_bytes_count | Peak disk write bytes | bytes | Disk's maximum per-second write throughput over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_max_write_ops_count | compute.googleapis.com/instance/disk/max_write_ops_count | Peak disk write ops | count | Disk's maximum per-second write requests count over a period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_performance_status | compute.googleapis.com/instance/disk/performance_status | Disk performance status | count | Whether the disk performance is normal or could potentially be impacted by an issue within Compute Engine during the period of time specified by the user. The period must be one minute or longer. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_provisioning_iops | compute.googleapis.com/instance/disk/provisioning/iops | Provisioned disk IOPS | count | Disk's provisioned IOPS specified by the user. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_provisioning_size | compute.googleapis.com/instance/disk/provisioning/size | Provisioned disk size | bytes | Disk's provisioned size specified by the user. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_provisioning_throughput | compute.googleapis.com/instance/disk/provisioning/throughput | Instance Disk Provisioning Throughput | count | Disk's provisioned throughput (bytes/sec) specified by the user. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_read_bytes_count | compute.googleapis.com/instance/disk/read_bytes_count | Disk read bytes | bytes | Count of bytes read from disk. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_read_ops_count | compute.googleapis.com/instance/disk/read_ops_count | Disk read operations | count | Count of disk read IO operations. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_write_bytes_count | compute.googleapis.com/instance/disk/write_bytes_count | Disk write bytes | bytes | Count of bytes written to disk. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_disk_write_ops_count | compute.googleapis.com/instance/disk/write_ops_count | Disk write operations | count | Count of disk write IO operations. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_gpu_accumulated_context_utilization_seconds | compute.googleapis.com/instance/gpu/accumulated_context_utilization_seconds | Accumulated Context Utilization Seconds | count | Accumulated context utilization time (in seconds). |
| google_compute_instance_gpu_cache_correctable_ecc_error_count | compute.googleapis.com/instance/gpu/cache_correctable_ecc_error_count | Correctable Cache ECC Errors | count | The number of correctable ECC errors in cache memory. |
| google_compute_instance_gpu_cache_uncorrectable_ecc_error_count | compute.googleapis.com/instance/gpu/cache_uncorrectable_ecc_error_count | Uncorrectable Cache ECC Errors | count | The number of uncorrectable ECC errors in cache memory. |
| google_compute_instance_gpu_dram_correctable_ecc_error_count | compute.googleapis.com/instance/gpu/dram_correctable_ecc_error_count | Correctable DRAM ECC Errors | count | The number of correctable ECC errors in GPU DRAMs. |
| google_compute_instance_gpu_dram_correctable_row_remapping_count | compute.googleapis.com/instance/gpu/dram_correctable_row_remapping_count | Correctable DRAM Row Remapping Count | count | The number of row remappings from correctable errors in GPU DRAMs. |
| google_compute_instance_gpu_dram_row_remapping_failed | compute.googleapis.com/instance/gpu/dram_row_remapping_failed | DRAM Row Remapping Failed | count | Whether row remapping failed previously. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_dram_row_remapping_pending | compute.googleapis.com/instance/gpu/dram_row_remapping_pending | DRAM Row Remapping Pending | count | Whether row remapping is set to occur at the next GPU reset. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_dram_uncorrectable_ecc_error_count | compute.googleapis.com/instance/gpu/dram_uncorrectable_ecc_error_count | Uncorrectable DRAM ECC Errors | count | The number of uncorrectable ECC errors in GPU DRAMs. |
| google_compute_instance_gpu_dram_uncorrectable_row_remapping_count | compute.googleapis.com/instance/gpu/dram_uncorrectable_row_remapping_count | Uncorrectable DRAM Row Remapping Count | count | The number of row remappings from uncorrectable errors in GPU DRAMs. |
| google_compute_instance_gpu_failure_prediction_status | compute.googleapis.com/instance/gpu/failure_prediction_status | VM Degradation Status | count | This metric indicates the probability of a VM entering a degraded state within the next 5 hours, as predicted by Google's proprietary algorithm. Possible value labels are NO_DEGRADATION_PREDICTED, DEGRADATION_PREDICTED, and POSSIBLE_DEGRADATION_PREDICTED. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_gpu_nvlink_active_speed | compute.googleapis.com/instance/gpu/gpu_nvlink_active_speed | GPU NVLink Port Active Speed | count | Current NVLink port speed in Gbps. Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_effective_ber | compute.googleapis.com/instance/gpu/gpu_nvlink_effective_ber | GPU NVLink Effective BER | count | Effective bit error rate (BER) is the error rate of the port after a forward error correction (FEC). The value indicates the overall average BER since the last counter reset. Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_link_downed_counter | compute.googleapis.com/instance/gpu/gpu_nvlink_link_downed_counter | GPU NVLink Port Link Downed Counter | count | The number of link-down events on the port. Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_link_error_recovery_count | compute.googleapis.com/instance/gpu/gpu_nvlink_link_error_recovery_count | GPU NVLink Port Link Error Recovery Counter | count | The number of successful link recovery processes. Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_physical_effective_errors | compute.googleapis.com/instance/gpu/gpu_nvlink_physical_effective_errors | GPU NVLink Port Physical Effective Errors | count | Effective error count is the number of bit errors that the port receives post-Forward Error Correction (FEC). Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_port_rcv_data | compute.googleapis.com/instance/gpu/gpu_nvlink_port_rcv_data | GPU NVLink Port Rcv Data | count | Total number of bytes received, measured as bps. Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_port_rcv_errors | compute.googleapis.com/instance/gpu/gpu_nvlink_port_rcv_errors | GPU NVLink Port RCV Errors | count | Total number of packets containing an error that were received on the port. Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_port_xmit_data | compute.googleapis.com/instance/gpu/gpu_nvlink_port_xmit_data | GPU NVLink Port Xmit Data | count | Total number of bytes transmitted, measured as bps. Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_port_xmit_discards | compute.googleapis.com/instance/gpu/gpu_nvlink_port_xmit_discards | GPU NVLink Port Xmit Discards | count | Total number of outbound packets that were discarded by the port. Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_port_xmit_wait | compute.googleapis.com/instance/gpu/gpu_nvlink_port_xmit_wait | GPU NVLink Port Xmit Wait | count | The number of transmitted packets that incurred a transmit wait. Supported for A4X VMs only. |
| google_compute_instance_gpu_gpu_nvlink_vl15_dropped | compute.googleapis.com/instance/gpu/gpu_nvlink_vl15_dropped | GPU NVLink Port VL15 Dropped | count | The number of management (VL15) packets that were dropped due to a lack of resources on the port. Supported for A4X VMs only. |
| google_compute_instance_gpu_infra_health | compute.googleapis.com/instance/gpu/infra_health | VM Infra Health | count | This metric captures the infrastructure health of the VM as a string. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_inter_block_tx | compute.googleapis.com/instance/gpu/inter_block_tx | Network Traffic at Inter-Block | bytes | This metric represents network traffic at the inter-block level as an INT64 value, calculated as the delta of observed traffic within a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_inter_subblock_tx | compute.googleapis.com/instance/gpu/inter_subblock_tx | Network Traffic at Inter-Subblock | bytes | This metric represents network traffic at the inter-subblock level as an INT64 value, calculated as the delta of observed traffic within a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_intra_subblock_tx | compute.googleapis.com/instance/gpu/intra_subblock_tx | Network Traffic at Intra-Subblock | bytes | This metric represents network traffic at the intra-subblock level as an INT64 value, calculated as the delta of observed traffic within a one-minute interval. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_link_carrier_changes | compute.googleapis.com/instance/gpu/link_carrier_changes | Link Carrier Changes | count | This metric captures network link carrier changes as a delta value computed at one-minute granularity. This metric is available for all GPU VM machine types starting with A3 mega, A3 ultra, A4 and all future GPU VM families except Spot VMs. |
| google_compute_instance_gpu_nccl_latency_tx | compute.googleapis.com/instance/gpu/nccl/latency_tx | NCCL Send Latency | ns | The metric measures the latency distribution of NCCL send operations. |
| google_compute_instance_gpu_nccl_latency_variance | compute.googleapis.com/instance/gpu/nccl/latency_variance | NCCL Send Latency Variance | ns | The metric measures the latency variance distribution of NCCL send operations. |
| google_compute_instance_gpu_nccl_message_size_tx | compute.googleapis.com/instance/gpu/nccl/message_size_tx | NCCL Send Message Size | bytes | The metric measures the message size distribution of NCCL send operations. |
| google_compute_instance_gpu_network_rtt | compute.googleapis.com/instance/gpu/network_rtt | Network RTT | microseconds | This metric measures network round-trip time in your GPU VMs. This metric is available for GPU VM machine types starting with A3 mega, A3 ultra, A4 and all future GPU VM families except Spot VMs. |
| google_compute_instance_gpu_nvlink_active_speed | compute.googleapis.com/instance/gpu/nvlink_active_speed | NVLink Active Speed | count | Current access link port speed in Gb/s. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvlink_port_state | compute.googleapis.com/instance/gpu/nvlink_port_state | NVLink Port State | count | Logical and physical port states for NVSwitch ports as defined in the OpenConfig YANG model. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvlink_runtime_error | compute.googleapis.com/instance/gpu/nvlink_runtime_error | NVLink Runtime Error | count | Whether an NVLink Runtime Error occurred. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_nvswitch_effective_ber | compute.googleapis.com/instance/gpu/nvswitch_effective_ber | NVSwitch Effective BER | count | Effective BER (Bit Error Rate) is the error rate of the port after FEC (Forward Error Correction). The value indicates the overall average BER since the last counter reset. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_effective_errors | compute.googleapis.com/instance/gpu/nvswitch_effective_errors | NVSwitch Effective Errors | count | Effective error count is the number of bit errors that the port receives after FEC (Forward Error Correction). Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_link_downed_counter | compute.googleapis.com/instance/gpu/nvswitch_link_downed_counter | NVSwitch Link Downed Counter | count | The count of link-down events on the port. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_link_error_recovery_counter | compute.googleapis.com/instance/gpu/nvswitch_link_error_recovery_counter | NVSwitch Link Error Recovery Counter | count | The count of successful link recovery processes on the port. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_port_rcv_data | compute.googleapis.com/instance/gpu/nvswitch_port_rcv_data | NVSwitch Port Rcv Data | count | Total number of bytes received, measured as bps. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_port_rcv_errors | compute.googleapis.com/instance/gpu/nvswitch_port_rcv_errors | NVSwitch Port Rcv Errors | count | Total number of packets containing an error that were received on the port. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_port_xmit_constraint_errors | compute.googleapis.com/instance/gpu/nvswitch_port_xmit_constraint_errors | NVSwitch Port Xmit Constraint Errors | count | Total number of packets not transmitted from the switch physical port. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_port_xmit_data | compute.googleapis.com/instance/gpu/nvswitch_port_xmit_data | NVSwitch Port Xmit Data | count | Total number of bytes transmitted, measured as bps. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_port_xmit_discards | compute.googleapis.com/instance/gpu/nvswitch_port_xmit_discards | NVSwitch Port Xmit Discards | count | Total number of outbound packets that were discarded by the port. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_port_xmit_wait | compute.googleapis.com/instance/gpu/nvswitch_port_xmit_wait | NVSwitch Port Xmit Wait | count | The number of transmitted packets that incurred a transmit wait. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_status | compute.googleapis.com/instance/gpu/nvswitch_status | NV Switch Status | count | This metric represents the health of an individual NV Switch on the host as a string. If a machine has multiple NV Switches attached, the metric provides each NV Switch health status on the host. The possible values for this metric are provided by NVIDIA BMC. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_nvswitch_symbol_errors | compute.googleapis.com/instance/gpu/nvswitch_symbol_errors | NVSwitch Symbol Errors | count | Symbol error count is the number of bit errors that the port receives after FEC (Forward Error Correction) and PLR (Physical Layer Retransmission). Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_vl15_dropped | compute.googleapis.com/instance/gpu/nvswitch_vl15_dropped | NVSwitch VL15 Dropped | count | The number of management (VL15) packets that were dropped due to a lack of resources on the port. Supported for A4X VMs only. |
| google_compute_instance_gpu_nvswitch_zero_hist | compute.googleapis.com/instance/gpu/nvswitch_zero_hist | NVSwitch Zero Histogram FEC | count | The first FEC histogram bin with a value of 0. Use it to monitor the maximum number of bit errors in an FEC block that have occurred up to the time of measurement. Supported for A4X VMs only. |
| google_compute_instance_gpu_packet_retransmission_count | compute.googleapis.com/instance/gpu/packet_retransmission_count | Packet Retransmission Count | count | This metric, representing the packet retransmission count observed by network interface cards (NICs) attached to GPUs on the host, is a single INT64 value per timestamp. This metric is available for GPU VM machine types starting with A3 mega and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_pcie_correctable_error_count | compute.googleapis.com/instance/gpu/pcie_correctable_error_count | Correctable PCIe Errors | count | The number of correctable PCIe errors. |
| google_compute_instance_gpu_pcie_fatal_error_count | compute.googleapis.com/instance/gpu/pcie_fatal_error_count | Fatal PCIe Errors | count | The number of fatal PCIe errors. |
| google_compute_instance_gpu_pcie_l0_to_recovery_count | compute.googleapis.com/instance/gpu/pcie_l0_to_recovery_count | PCIe L0 To Recovery Count | count | The number of times the PCIe link entered the recovery state from the L0 state. |
| google_compute_instance_gpu_pcie_nak_received_count | compute.googleapis.com/instance/gpu/pcie_nak_received_count | PCIe NAK Received Count | count | The number of NAKs the host root complex issued on the PCIe link. |
| google_compute_instance_gpu_pcie_nak_sent_count | compute.googleapis.com/instance/gpu/pcie_nak_sent_count | PCIe NAK Sent Count | count | The number of NAKs the GPU issued on the PCIe link. |
| google_compute_instance_gpu_pcie_non_fatal_error_count | compute.googleapis.com/instance/gpu/pcie_non_fatal_error_count | Non Fatal PCIe Errors | count | The number of non-fatal PCIe errors. |
| google_compute_instance_gpu_pcie_replay_count | compute.googleapis.com/instance/gpu/pcie_replay_count | PCIe Replays | count | The number of replays the GPU issued on the PCIe link. |
| google_compute_instance_gpu_pcie_replay_rollover_count | compute.googleapis.com/instance/gpu/pcie_replay_rollover_count | PCIe Replay Rollovers | count | The number of replay rollovers the GPU issued on the PCIe link. |
| google_compute_instance_gpu_power_consumption | compute.googleapis.com/instance/gpu/power_consumption | GPU Power Consumption | count | This metric represents power consumption observed on individual GPUs on the host as a double value. If a machine has multiple GPUs attached, the metric provides each GPU power consumption on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_sm_utilization | compute.googleapis.com/instance/gpu/sm_utilization | SM Utilization | % | This metric represents the Streaming Multiprocessor (SM) utilization of an individual GPU on the host as a percentage value. If a machine has multiple GPUs attached, the metric provides each GPU SM utilization on the host. This metric is available for all GPU VM machine types starting with A3 mega, A3 edge, A3 high and all future GPU VM families except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_straggler_status | compute.googleapis.com/instance/gpu/straggler_status | Straggler Status | count | Indicates whether a VM has been identified as a straggler node that is affecting the performance of an AI/ML job. Supported for the A3 Mega, A3 Ultra, and A4* VM families. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_tcpxo_receive_chunk_latency | compute.googleapis.com/instance/gpu/tcpxo_receive_chunk_latency | TCPXO Receive Chunk Latency | ns | TCPXO receive chunk latency in the VM. Available only for A3 Mega VMs. |
| google_compute_instance_gpu_tcpxo_send_chunk_latency | compute.googleapis.com/instance/gpu/tcpxo_send_chunk_latency | TCPXO Send Chunk Latency | ns | TCPXO send chunk latency in the VM. Available only for A3 Mega VMs. |
| google_compute_instance_gpu_temperature | compute.googleapis.com/instance/gpu/temperature | GPU Temperature | count | The temperature of an individual GPU on the host, as a double value. If a machine has multiple GPUs attached, the metric reports the temperature of each GPU on the host. Available for all GPU VM machine types starting with A3 Mega, A3 Edge, and A3 High, and for all future GPU VM families, except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_throughput_rx_bytes | compute.googleapis.com/instance/gpu/throughput_rx_bytes | Throughput Rx Bytes | bytes | Network receive throughput as an INT64 value, calculated as the delta of received bytes over a one-minute interval. Available for GPU VM machine types starting with A3 Mega and for all future GPU VM families, except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_throughput_tx_bytes | compute.googleapis.com/instance/gpu/throughput_tx_bytes | Throughput Tx Bytes | bytes | Network transmit throughput as an INT64 value, calculated as the delta of transmitted bytes over a one-minute interval. Available for GPU VM machine types starting with A3 Mega and for all future GPU VM families, except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_gpu_tlimit | compute.googleapis.com/instance/gpu/tlimit | GPU Thermal Margin | count | The thermal margin of an individual GPU on the host: the temperature headroom, in °C, remaining before a software slowdown event, reported as a double value. For a machine with `n` GPUs, each timestamp has `n` values, one for the thermal margin of each GPU on the host. Available for all GPU VM machine types starting with A3 Mega, A3 Edge, and A3 High, and for all future GPU VM families, except Spot VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 540 seconds. |
| google_compute_instance_integrity_early_boot_validation_status | compute.googleapis.com/instance/integrity/early_boot_validation_status | Early Boot Validation | count | The validation status of the early boot integrity policy. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_integrity_late_boot_validation_status | compute.googleapis.com/instance/integrity/late_boot_validation_status | Late Boot Validation | count | The validation status of the late boot integrity policy. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_interruption_count | compute.googleapis.com/instance/interruption_count | Interruption Count | count | Interruptions are system evictions of infrastructure while the customer is in control of that infrastructure. This metric is the current count of interruptions, by type and reason. The time series is often absent when the count is zero. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_compute_instance_memory_balloon_ram_size | compute.googleapis.com/instance/memory/balloon/ram_size | VM Memory Total | bytes | The total amount of memory in the VM. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_memory_balloon_ram_used | compute.googleapis.com/instance/memory/balloon/ram_used | VM Memory Used | bytes | Memory currently used in the VM. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_memory_balloon_swap_in_bytes_count | compute.googleapis.com/instance/memory/balloon/swap_in_bytes_count | VM Swap In | bytes | The amount of memory read into the guest from its own swap space. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_memory_balloon_swap_out_bytes_count | compute.googleapis.com/instance/memory/balloon/swap_out_bytes_count | VM Swap Out | bytes | The amount of memory written from the guest to its own swap space. This metric is only available for VMs that belong to the e2 family. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_network_received_bytes_count | compute.googleapis.com/instance/network/received_bytes_count | Received bytes | bytes | Count of bytes received from the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_network_received_packets_count | compute.googleapis.com/instance/network/received_packets_count | Received packets | count | Count of packets received from the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_network_sent_bytes_count | compute.googleapis.com/instance/network/sent_bytes_count | Sent bytes | bytes | Count of bytes sent over the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_network_sent_packets_count | compute.googleapis.com/instance/network/sent_packets_count | Sent packets | count | Count of packets sent over the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_tpu_infra_health | compute.googleapis.com/instance/tpu/infra_health | TPU Instance Health | count | Indicates the overall health status of a TPU instance. The metric labels help identify the specific health status and reasons for issues on degraded or unhealthy TPU instances, primarily focusing on TPU hardware and system health. Health status changes may take several minutes to be reflected in this metric. Sampled every 60 seconds. After sampling, data is not visible for up to 420 seconds. |
| google_compute_instance_uptime | compute.googleapis.com/instance/uptime | Uptime | seconds | Delta of how long the VM has been running, in seconds. For the total elapsed time since the VM started, use compute.googleapis.com/instance/uptime_total. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_instance_uptime_total | compute.googleapis.com/instance/uptime_total | Uptime Total | seconds | Elapsed time since the VM was started, in seconds. While the VM is stopped (https://cloud.google.com/compute/docs/instances/stop-start-instance#stop-vm-google-cloud), this time is not counted; when the VM is started again, the counter resets to 0. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_compute_intercept_intercepted_bytes_count | compute.googleapis.com/intercept/intercepted_bytes_count | Intercepted bytes | bytes | The number of intercepted bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_intercept_intercepted_packets_count | compute.googleapis.com/intercept/intercepted_packets_count | Intercepted packets | count | The number of intercepted packets. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_mirroring_dropped_packets_count | compute.googleapis.com/mirroring/dropped_packets_count | Dropped packets | count | Count of dropped mirrored packets. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_mirroring_mirrored_bytes_count | compute.googleapis.com/mirroring/mirrored_bytes_count | Mirrored bytes | bytes | Count of mirrored bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_mirroring_mirrored_packets_count | compute.googleapis.com/mirroring/mirrored_packets_count | Mirrored packets | count | Count of mirrored packets. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_compute_nat_allocated_ports | compute.googleapis.com/nat/allocated_ports | Allocated ports | ports | Number of ports allocated to a VM by the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_closed_connections_count | compute.googleapis.com/nat/closed_connections_count | Closed connections count | connections | Count of connections closed over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_dropped_received_packets_count | compute.googleapis.com/nat/dropped_received_packets_count | Received packets dropped count | packets | Count of received packets dropped by the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_dropped_sent_packets_count | compute.googleapis.com/nat/dropped_sent_packets_count | Sent packets dropped count | packets | Count of sent packets dropped by the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_new_connections_count | compute.googleapis.com/nat/new_connections_count | New connections count | connections | Count of new connections created over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_open_connections | compute.googleapis.com/nat/open_connections | Open connections | connections | Number of connections currently open on the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_port_usage | compute.googleapis.com/nat/port_usage | Port usage | ports | Maximum number of connections from a VM to a single internet endpoint (IP:port). Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_received_bytes_count | compute.googleapis.com/nat/received_bytes_count | Received bytes count | bytes | Count of bytes received (destination -> VM) via the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_received_packets_count | compute.googleapis.com/nat/received_packets_count | Received packets count | packets | Count of packets received (destination -> VM) via the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_sent_bytes_count | compute.googleapis.com/nat/sent_bytes_count | Sent bytes count | bytes | Count of bytes sent (VM -> destination) over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
| google_compute_nat_sent_packets_count | compute.googleapis.com/nat/sent_packets_count | Sent packets count | packets | Count of packets sent (VM -> destination) over the NAT gateway. Sampled every 60 seconds. After sampling, data is not visible for up to 165 seconds. |
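
Each OpsRamp metric name in the table appears to be derived from its Google metric type by dropping the `compute.googleapis.com/` prefix, replacing every `/` with `_`, and prefixing `google_compute_`. The Python sketch below assumes that pattern holds for every row; it is inferred from the table, not taken from OpsRamp documentation. It also shows how a one-minute byte delta (such as `throughput_rx_bytes`) can be turned into an average bytes-per-second rate. The helper names `to_opsramp_name` and `delta_to_rate` are hypothetical and are not part of any OpsRamp or Google API.

```python
# Minimal sketch, assuming the naming pattern inferred from the metrics table.
# These helpers are hypothetical illustrations, not an OpsRamp or Google API.

GOOGLE_PREFIX = "compute.googleapis.com/"
OPSRAMP_PREFIX = "google_compute_"


def to_opsramp_name(metric_type: str) -> str:
    """Map a Compute Engine metric type to its (assumed) OpsRamp metric name,
    e.g. 'compute.googleapis.com/nat/allocated_ports'
    -> 'google_compute_nat_allocated_ports'."""
    if not metric_type.startswith(GOOGLE_PREFIX):
        raise ValueError(f"not a Compute Engine metric type: {metric_type}")
    path = metric_type[len(GOOGLE_PREFIX):]
    # Each path separator in the Google metric type becomes an underscore.
    return OPSRAMP_PREFIX + path.replace("/", "_")


def delta_to_rate(delta_bytes: int, interval_seconds: int = 60) -> float:
    """Convert a per-interval byte delta into an average bytes-per-second rate.
    The default of 60 seconds matches the one-minute delta described for the
    GPU throughput metrics."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return delta_bytes / interval_seconds


if __name__ == "__main__":
    # Prints: google_compute_instance_memory_balloon_ram_used
    print(to_opsramp_name("compute.googleapis.com/instance/memory/balloon/ram_used"))
    # Prints: 100000.0 (6,000,000 bytes received in one minute)
    print(delta_to_rate(6_000_000))
```

If a row in the integration ever deviates from this pattern, the table itself remains the authoritative mapping.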
Event support
- Not Supported