Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning.
Cloud Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you do not need them. With less time and money spent on administration, you can focus on your jobs and your data.
Setup
To set up the Google integration and discover the Google service, go to Google Integration Discovery Profile and select GOOGLE/Dataproc Cluster.
Supported metrics
| New OpsRamp Metric | Google Metric | Metric Display Name | Unit | Description |
|---|---|---|---|---|
| google_dataproc_cluster_capacity_deviation | dataproc.googleapis.com/cluster/capacity_deviation | Cluster capacity deviation | count | Difference between the expected node count in the cluster and the actual active YARN node managers. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_hdfs_datanodes | dataproc.googleapis.com/cluster/hdfs/datanodes | HDFS DataNodes | count | Indicates the number of HDFS DataNodes that are running inside a cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_hdfs_storage_capacity | dataproc.googleapis.com/cluster/hdfs/storage_capacity | HDFS capacity | GB | Indicates the capacity, in GB, of the HDFS system running on the cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_hdfs_storage_utilization | dataproc.googleapis.com/cluster/hdfs/storage_utilization | HDFS storage utilization | count | The percentage of HDFS storage currently used. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_hdfs_unhealthy_blocks | dataproc.googleapis.com/cluster/hdfs/unhealthy_blocks | Unhealthy HDFS blocks by status | count | Indicates the number of unhealthy blocks inside the cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_job_completion_time | dataproc.googleapis.com/cluster/job/completion_time | Job duration | seconds | The time jobs took to complete from the time the user submits a job to the time Dataproc reports it is completed. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_job_duration | dataproc.googleapis.com/cluster/job/duration | Job state duration | seconds | The time jobs have spent in a given state. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_job_failed_count | dataproc.googleapis.com/cluster/job/failed_count | Failed jobs | count | Indicates the number of jobs that have failed on a cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_job_running_count | dataproc.googleapis.com/cluster/job/running_count | Running jobs | count | Indicates the number of jobs that are running on a cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_job_submitted_count | dataproc.googleapis.com/cluster/job/submitted_count | Submitted jobs | count | Indicates the number of jobs that have been submitted to a cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_mig_instances_failed_count | dataproc.googleapis.com/cluster/mig_instances/failed_count | Dataproc Managed Instance Group Instance Errors | count | Indicates the number of instance failures for a managed instance group. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_nodes_expected | dataproc.googleapis.com/cluster/nodes/expected | Expected Nodes | count | Indicates the number of nodes that are expected in a cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_nodes_failed_count | dataproc.googleapis.com/cluster/nodes/failed_count | Failed Nodes | count | Indicates the number of nodes that have failed in a cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_nodes_recovered_count | dataproc.googleapis.com/cluster/nodes/recovered_count | Recovered Nodes | count | Indicates the number of nodes that were detected as failed and have been successfully removed from the cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_nodes_running | dataproc.googleapis.com/cluster/nodes/running | Running Nodes | count | Indicates the number of nodes in the running state. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_operation_completion_time | dataproc.googleapis.com/cluster/operation/completion_time | Operation duration | seconds | The time operations took to complete, from the time the user submits an operation to the time Dataproc reports it is completed. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_operation_duration | dataproc.googleapis.com/cluster/operation/duration | Operation state duration | seconds | The time operations have spent in a given state. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_operation_failed_count | dataproc.googleapis.com/cluster/operation/failed_count | Failed operations | count | Indicates the number of operations that have failed on a cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_operation_running_count | dataproc.googleapis.com/cluster/operation/running_count | Running operations | count | Indicates the number of operations that are running on a cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_operation_submitted_count | dataproc.googleapis.com/cluster/operation/submitted_count | Submitted operations | count | Indicates the number of operations that have been submitted to a cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_yarn_allocated_memory_percentage | dataproc.googleapis.com/cluster/yarn/allocated_memory_percentage | YARN allocated memory percentage | count | The percentage of YARN memory that is allocated. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_yarn_apps | dataproc.googleapis.com/cluster/yarn/apps | YARN active applications | count | Indicates the number of active YARN applications. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_yarn_containers | dataproc.googleapis.com/cluster/yarn/containers | YARN containers | count | Indicates the number of YARN containers. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_yarn_memory_size | dataproc.googleapis.com/cluster/yarn/memory_size | YARN memory size | GB | Indicates the YARN memory size in GB. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_yarn_nodemanagers | dataproc.googleapis.com/cluster/yarn/nodemanagers | YARN NodeManagers | count | Indicates the number of YARN NodeManagers running inside the cluster. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_yarn_pending_memory_size | dataproc.googleapis.com/cluster/yarn/pending_memory_size | YARN pending memory size | GB | The current memory request, in GB, that is pending to be fulfilled by the scheduler. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
| google_dataproc_cluster_yarn_virtual_cores | dataproc.googleapis.com/cluster/yarn/virtual_cores | YARN virtual cores | count | Indicates the number of virtual cores in YARN. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds. |
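
The OpsRamp metric names in the table appear to follow a regular pattern derived from the Google metric type: the service prefix `dataproc.googleapis.com/` becomes `google_dataproc_`, and each remaining `/` becomes `_`. The following Python sketch illustrates that observed pattern for cross-referencing the two columns. The function name `opsramp_metric_name` is hypothetical and is not part of any OpsRamp or Google API; the mapping is an assumption inferred from the rows above, not a documented rule.

```python
def opsramp_metric_name(google_metric: str) -> str:
    """Derive an OpsRamp-style metric name from a Google metric type.

    Assumption: this mirrors the naming pattern seen in the table above;
    it is an illustration, not an official conversion rule.
    """
    # Split "dataproc.googleapis.com/cluster/..." into host and metric path.
    service_host, _, path = google_metric.partition("/")
    # Keep only the service label, e.g. "dataproc" from "dataproc.googleapis.com".
    service = service_host.split(".")[0]
    # Join with underscores and flatten the remaining path separators.
    return "google_" + service + "_" + path.replace("/", "_")


if __name__ == "__main__":
    print(opsramp_metric_name("dataproc.googleapis.com/cluster/hdfs/datanodes"))
```

For any name the sketch produces, confirm it against the table, since the table remains the authoritative list of supported metrics.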
Event support
- Not supported