Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes with equal reliability and expressiveness, so complex workarounds and compromises are no longer needed. Its serverless approach to resource provisioning and management gives you access to virtually limitless capacity for your biggest data processing challenges, and you pay only for what you use.

Cloud Dataflow unlocks transformational use cases across industries, including:

  • Clickstream, point-of-sale, and segmentation analysis in retail.
  • Fraud detection in financial services.
  • Personalized user experiences in gaming.
  • IoT analytics in manufacturing, healthcare, and logistics.

Setup

To set up the Google integration and discover the Dataflow service, go to the Google Integration Discovery Profile and select GOOGLE/Dataflow Job.

Supported metrics

| New OpsRamp Metric | Google Metric | Metric Display Name | Unit | Description |
|---|---|---|---|---|
| google_dataflow_job_backlog_bytes | dataflow.googleapis.com/job/backlog_bytes | Per-stage backlog in bytes | bytes | Amount of known, unprocessed input for a stage, in bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_backlog_elements | dataflow.googleapis.com/job/backlog_elements | Per-stage backlog in elements | count | Amount of known, unprocessed input for a stage, in elements. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_billable_shuffle_data_processed | dataflow.googleapis.com/job/billable_shuffle_data_processed | Billable shuffle data processed | bytes | The billable bytes of shuffle data processed by this Dataflow job. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_bundle_user_processing_latencies | dataflow.googleapis.com/job/bundle_user_processing_latencies | Bundle user processing latencies | ms | Bundle user processing latencies from a particular stage. Available for jobs running on Streaming Engine. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_current_num_vcpus | dataflow.googleapis.com/job/current_num_vcpus | Current number of vCPUs in use | count | The number of vCPUs currently being used by this Dataflow job. This is the current number of workers times the number of vCPUs per worker. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_current_shuffle_slots | dataflow.googleapis.com/job/current_shuffle_slots | Current shuffle slots in use | count | The number of shuffle slots currently used by this Dataflow job. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_data_watermark_age | dataflow.googleapis.com/job/data_watermark_age | Data watermark lag | seconds | The age (time since event timestamp) up to which all data has been processed by the pipeline. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_disk_space_capacity | dataflow.googleapis.com/job/disk_space_capacity | Disk Space Capacity | bytes | The amount of persistent disk currently allocated to all workers associated with this Dataflow job. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_dofn_latency_average | dataflow.googleapis.com/job/dofn_latency_average | Average message processing time per DoFn. | ms | The average processing time for a single message in a given DoFn (over the past 3-minute window). This includes time spent in GetData calls. Available for jobs running on Streaming Engine on the Legacy Runner. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_dofn_latency_max | dataflow.googleapis.com/job/dofn_latency_max | Maximum message processing time per DoFn. | ms | The maximum processing time for a single message in a given DoFn (over the past 3-minute window). This includes time spent in GetData calls. Available for jobs running on Streaming Engine on the Legacy Runner. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_dofn_latency_min | dataflow.googleapis.com/job/dofn_latency_min | Minimum message processing time per DoFn. | ms | The minimum processing time for a single message in a given DoFn (over the past 3-minute window). Available for jobs running on Streaming Engine on the Legacy Runner. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_dofn_latency_num_messages | dataflow.googleapis.com/job/dofn_latency_num_messages | Number of messages processed per DoFn. | count | The number of messages processed by a given DoFn (over the past 3-minute window). Available for jobs running on Streaming Engine on the Legacy Runner. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_dofn_latency_total | dataflow.googleapis.com/job/dofn_latency_total | Total message processing time per DoFn. | ms | The total processing time for all messages in a given DoFn (over the past 3-minute window). This includes time spent in GetData calls. Available for jobs running on Streaming Engine on the Legacy Runner. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_duplicates_filtered_out_count | dataflow.googleapis.com/job/duplicates_filtered_out_count | Duplicate message count per stage | count | The number of messages being processed by a particular stage that have been filtered out as duplicates. Available for jobs running on Streaming Engine. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_elapsed_time | dataflow.googleapis.com/job/elapsed_time | Elapsed time | seconds | Duration, in seconds, that the current run of this pipeline has been in the Running state so far. When a run completes, this value stays at the duration of that run until the next run starts. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_element_count | dataflow.googleapis.com/job/element_count | Element count | count | Number of elements added to the PCollection so far. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_estimated_byte_count | dataflow.googleapis.com/job/estimated_byte_count | Estimated byte count | bytes | An estimated number of bytes added to the PCollection so far. Dataflow calculates the average encoded size of elements in a PCollection and multiplies it by the number of elements. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_estimated_bytes_active | dataflow.googleapis.com/job/estimated_bytes_active | Active Size | bytes | Estimated number of bytes active in this stage of the job. |
| google_dataflow_job_estimated_bytes_consumed_count | dataflow.googleapis.com/job/estimated_bytes_consumed_count | Throughput | bytes | Estimated number of bytes consumed by this stage of the job. |
| google_dataflow_job_estimated_bytes_produced_count | dataflow.googleapis.com/job/estimated_bytes_produced_count | Estimated Bytes Produced | count | The estimated total byte size of elements produced by each PTransform. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_is_failed | dataflow.googleapis.com/job/is_failed | Failed | count | A value of 1 indicates that the job has failed. This metric isn't recorded for jobs that fail before launch. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_max_worker_instances_limit | dataflow.googleapis.com/job/max_worker_instances_limit | Autoscaling worker instances ceiling | count | The maximum number of workers autoscaling is allowed to request. |
| google_dataflow_job_memory_capacity | dataflow.googleapis.com/job/memory_capacity | Memory Capacity | bytes | The amount of memory currently allocated to all workers associated with this Dataflow job. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_min_worker_instances_limit | dataflow.googleapis.com/job/min_worker_instances_limit | Autoscaling worker instances flooring | count | The minimum number of workers autoscaling is allowed to request. |
| google_dataflow_job_oldest_active_message_age | dataflow.googleapis.com/job/oldest_active_message_age | Oldest active message processing time per DoFn. | ms | How long the oldest active message in a DoFn has been processing. Available for jobs running on Streaming Engine on the Legacy Runner. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_per_stage_data_watermark_age | dataflow.googleapis.com/job/per_stage_data_watermark_age | Per-stage data watermark lag | seconds | The age (time since event timestamp) up to which all data has been processed by this stage of the pipeline. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_per_stage_system_lag | dataflow.googleapis.com/job/per_stage_system_lag | Per-stage system lag | seconds | The current maximum duration, in seconds, that an item of data has been processing or awaiting processing, per pipeline stage. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_processing_parallelism_keys | dataflow.googleapis.com/job/processing_parallelism_keys | The approximate number of parallel processing keys | count | Approximate number of keys in use for data processing for each stage. Processing for any given key is serialized, so the total number of keys for a stage represents the maximum available parallelism at that stage. Available for jobs running on Streaming Engine. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_pubsub_late_messages_count | dataflow.googleapis.com/job/pubsub/late_messages_count | Job Pubsub Late Messages Count | count | The number of messages from Pub/Sub with timestamps older than the estimated watermark. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_pubsub_published_messages_count | dataflow.googleapis.com/job/pubsub/published_messages_count | Job Pubsub Published Messages Count | count | The number of Pub/Sub messages published, broken down by topic and status. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_pubsub_pulled_message_ages | dataflow.googleapis.com/job/pubsub/pulled_message_ages | Job Pubsub Pulled Message Ages | ms | The distribution of ages of Pub/Sub messages that have been pulled but not yet acknowledged. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_pubsub_read_count | dataflow.googleapis.com/job/pubsub/read_count | PubsubIO.Read requests from Dataflow jobs | count | Pub/Sub Pull requests. For Streaming Engine, this metric is deprecated. See the "Using the Dataflow monitoring interface" page for upcoming changes. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_pubsub_streaming_pull_connection_status | dataflow.googleapis.com/job/pubsub/streaming_pull_connection_status | Job Pubsub Streaming Pull Connection Status | % | Percentage of all Streaming Pull connections that are either active (OK status) or terminated because of an error (non-OK status). When a connection is terminated, Dataflow waits some time before attempting to reconnect. For Streaming Engine only. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_pubsub_write_count | dataflow.googleapis.com/job/pubsub/write_count | Job Pubsub Write Count | count | Pub/Sub Publish requests from PubsubIO.Write in Dataflow jobs. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_pubsub_write_latencies | dataflow.googleapis.com/job/pubsub/write_latencies | Job Pubsub Write Latencies | ms | Pub/Sub Publish request latencies from PubsubIO.Write in Dataflow jobs. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_streaming_engine_key_processing_availability | dataflow.googleapis.com/job/streaming_engine/key_processing_availability | Current processing key-range availability | % | Percentage of streaming processing keys that are assigned to workers and available to perform work. Work for unavailable keys is deferred until the keys become available. |
| google_dataflow_job_streaming_engine_persistent_state_read_bytes_count | dataflow.googleapis.com/job/streaming_engine/persistent_state/read_bytes_count | Storage bytes read | count | Storage bytes read by a particular stage. Available for jobs running on Streaming Engine. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_streaming_engine_persistent_state_stored_bytes | dataflow.googleapis.com/job/streaming_engine/persistent_state/stored_bytes | Current persistence state usage | bytes | Current bytes stored in persistent state for the job. |
| google_dataflow_job_streaming_engine_persistent_state_write_bytes_count | dataflow.googleapis.com/job/streaming_engine/persistent_state/write_bytes_count | Storage bytes written | count | Storage bytes written by a particular stage. Available for jobs running on Streaming Engine. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_streaming_engine_persistent_state_write_latencies | dataflow.googleapis.com/job/streaming_engine/persistent_state/write_latencies | Storage write latencies | ms | Storage write latencies from a particular stage. Available for jobs running on Streaming Engine. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_streaming_engine_stage_end_to_end_latencies | dataflow.googleapis.com/job/streaming_engine/stage_end_to_end_latencies | Per stage end to end latencies. | ms | Distribution of time spent by Streaming Engine in each stage of the pipeline. This time includes shuffling messages, queueing them for processing, processing, queueing for the persistent state write, and the write itself. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_system_lag | dataflow.googleapis.com/job/system_lag | System lag | seconds | The current maximum duration, in seconds, that an item of data has been processing or awaiting processing. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_target_worker_instances | dataflow.googleapis.com/job/target_worker_instances | Target Worker Instances | count | The desired number of worker instances. |
| google_dataflow_job_timers_pending_count | dataflow.googleapis.com/job/timers_pending_count | Timers pending count per stage | count | The number of timers pending in a particular stage. Available for jobs running on Streaming Engine. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_timers_processed_count | dataflow.googleapis.com/job/timers_processed_count | Timers processed count per stage | count | The number of timers completed by a particular stage. Available for jobs running on Streaming Engine. Sampled every 60 seconds. After sampling, data is not visible for up to 60 seconds. |
| google_dataflow_job_total_dcu_usage | dataflow.googleapis.com/job/total_dcu_usage | Total DCU usage | compute units | The total number of Data Compute Units (DCUs) used by the Dataflow job since it was launched. |
| google_dataflow_job_total_memory_usage_time | dataflow.googleapis.com/job/total_memory_usage_time | Total memory usage time | GB-seconds | The total GB-seconds of memory allocated to this Dataflow job. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_total_pd_usage_time | dataflow.googleapis.com/job/total_pd_usage_time | Total PD usage time | GB-seconds | The total GB-seconds for all persistent disk used by all workers associated with this Dataflow job. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_total_shuffle_data_processed | dataflow.googleapis.com/job/total_shuffle_data_processed | Total shuffle data processed | bytes | The total bytes of shuffle data processed by this Dataflow job. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_total_streaming_data_processed | dataflow.googleapis.com/job/total_streaming_data_processed | Total streaming data processed | bytes | The total bytes of streaming data processed by this Dataflow job. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_total_vcpu_time | dataflow.googleapis.com/job/total_vcpu_time | Total vCPU time | seconds | The total vCPU seconds used by this Dataflow job. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
| google_dataflow_job_user_counter | dataflow.googleapis.com/job/user_counter | User Counter | count | A user-defined counter metric. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds. |
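Every row in the metrics table follows the same naming pattern: the OpsRamp metric name is the Google metric type with the `dataflow.googleapis.com/` prefix removed, each `/` replaced by `_`, and `google_dataflow_` prepended. The Python sketch below reproduces that pattern so you can cross-reference Google Cloud Monitoring documentation with OpsRamp metric names. The rule is inferred from the rows listed here, not taken from an OpsRamp specification, and the function name `to_opsramp_metric_name` is hypothetical.

```python
# Sketch of the naming pattern observed in the metrics table (an inference,
# not an official OpsRamp rule).
GOOGLE_PREFIX = "dataflow.googleapis.com/"


def to_opsramp_metric_name(google_metric_type: str) -> str:
    """Map a Dataflow metric type to the OpsRamp-style metric name."""
    if not google_metric_type.startswith(GOOGLE_PREFIX):
        raise ValueError(f"not a Dataflow metric type: {google_metric_type!r}")
    # Keep only the metric path, for example "job/pubsub/read_count".
    path = google_metric_type[len(GOOGLE_PREFIX):]
    # Path separators become underscores; the result gets the OpsRamp prefix.
    return "google_dataflow_" + path.replace("/", "_")


if __name__ == "__main__":
    for metric in (
        "dataflow.googleapis.com/job/system_lag",
        "dataflow.googleapis.com/job/pubsub/late_messages_count",
        "dataflow.googleapis.com/job/streaming_engine/persistent_state/stored_bytes",
    ):
        print(metric, "->", to_opsramp_metric_name(metric))
```

The mapping only works in this direction. Going from an OpsRamp name back to a Google metric type is ambiguous, because Google metric names also contain underscores (for example `backlog_bytes`), so you cannot tell which underscores were originally slashes. Use the table itself for reverse lookups.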

Event support

  • Supported
  • Configurable in the OpsRamp Google Integration Discovery Profile.

External reference