The Secret Of Google Professional-Data-Engineer Testing Software

Ucertify offers a free demo for the Professional-Data-Engineer exam. "Google Professional Data Engineer Exam", also known as the Professional-Data-Engineer exam, is a Google certification. This set of posts, Passing the Google Professional-Data-Engineer exam, will help you answer those questions. The Professional-Data-Engineer Questions & Answers cover all the knowledge points of the real exam: 100% real Google Professional-Data-Engineer questions, revised by experts!

Free demo questions for Google Professional-Data-Engineer Exam Dumps Below:


All Google Cloud Bigtable client requests go through a front-end server ______ they are sent to a Cloud Bigtable node.

  • A. before
  • B. after
  • C. only if
  • D. once

Answer: A

In a Cloud Bigtable architecture all client requests go through a front-end server before they are sent to a Cloud Bigtable node.
The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, which is a container for the cluster. Each node in the cluster handles a subset of the requests to the cluster.
When additional nodes are added to a cluster, you can increase the number of simultaneous requests that the cluster can handle, as well as the maximum throughput for the entire cluster.
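
The scaling behavior described above can be sketched in a few lines of plain Python. This is a conceptual illustration only (round-robin routing, not Bigtable's actual key-range-based routing): each node serves a subset of requests, so adding nodes raises both concurrency and per-cluster throughput.

```python
def assign_requests(request_ids, num_nodes):
    """Round-robin requests across nodes -- a simplification of how a
    Bigtable cluster spreads load over its nodes."""
    nodes = {n: [] for n in range(num_nodes)}
    for i, req in enumerate(request_ids):
        nodes[i % num_nodes].append(req)
    return nodes

requests = list(range(12))
# With 3 nodes, each handles 4 requests; with 4 nodes, only 3 apiece.
print(len(assign_requests(requests, 3)[0]))  # 4
print(len(assign_requests(requests, 4)[0]))  # 3
```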


Which of the following IAM roles does your Compute Engine account require to be able to run pipeline jobs?

  • A. dataflow.worker
  • B. dataflow.compute
  • C. dataflow.developer
  • D. dataflow.viewer

Answer: A

The dataflow.worker role provides the permissions necessary for a Compute Engine service account to execute work units for a Dataflow pipeline.


You have a data pipeline with a Cloud Dataflow job that aggregates and writes time series metrics to Cloud Bigtable. This data feeds a dashboard used by thousands of users across the organization. You need to support additional concurrent users and reduce the amount of time required to write the data. Which two actions should you take? (Choose two.)

  • A. Configure your Cloud Dataflow pipeline to use local execution
  • B. Increase the maximum number of Cloud Dataflow workers by setting maxNumWorkers in PipelineOptions
  • C. Increase the number of nodes in the Cloud Bigtable cluster
  • D. Modify your Cloud Dataflow pipeline to use the Flatten transform before writing to Cloud Bigtable
  • E. Modify your Cloud Dataflow pipeline to use the CoGroupByKey transform before writing to Cloud Bigtable

Answer: BC


You plan to deploy Cloud SQL using MySQL. You need to ensure high availability in the event of a zone failure. What should you do?

  • A. Create a Cloud SQL instance in one zone, and create a failover replica in another zone within the same region.
  • B. Create a Cloud SQL instance in one zone, and create a read replica in another zone within the same region.
  • C. Create a Cloud SQL instance in one zone, and configure an external read replica in a zone in a different region.
  • D. Create a Cloud SQL instance in a region, and configure automatic backup to a Cloud Storage bucket in the same region.

Answer: A


You are designing the database schema for a machine learning-based food ordering service that will predict what users want to eat. Here is some of the information you need to store:
  • The user profile: What the user likes and doesn’t like to eat
  • The user account information: Name, address, preferred meal times
  • The order information: When orders are made, from where, to whom
The database will be used to store all the transactional data of the product. You want to optimize the data schema. Which Google Cloud Platform product should you use?

  • A. BigQuery
  • B. Cloud SQL
  • C. Cloud Bigtable
  • D. Cloud Datastore

Answer: B


You are migrating your data warehouse to BigQuery. You have migrated all of your data into tables in a dataset. Multiple users from your organization will be using the data. They should only see certain tables based on their team membership. How should you set user permissions?

  • A. Assign the users/groups data viewer access at the table level for each table
  • B. Create SQL views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the SQL views
  • C. Create authorized views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the authorized views
  • D. Create authorized views for each team in datasets created for each team. Assign the authorized views data viewer access to the dataset in which the data resides. Assign the users/groups data viewer access to the datasets in which the authorized views reside.

Answer: D


Cloud Bigtable is Google's Big Data database service.

  • A. Relational
  • B. mySQL
  • C. NoSQL
  • D. SQL Server

Answer: C

Cloud Bigtable is Google's NoSQL Big Data database service. It is the same database that Google uses for services, such as Search, Analytics, Maps, and Gmail.
It is used for requirements that are low latency and high throughput including Internet of Things (IoT), user analytics, and financial data analysis.


You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filter on timestamp and ID selects a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?

  • A. Create a separate table for each ID.
  • B. Use the LIMIT keyword to reduce the number of rows returned.
  • C. Recreate the table with a partitioning column and clustering column.
  • D. Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.

Answer: C
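
Partitioning pays off because a filter on the partitioning column lets the engine skip entire partitions instead of scanning every row. The sketch below is a conceptual illustration, not the BigQuery engine; the table layout and `rows_scanned` helper are hypothetical.

```python
from datetime import date

# Hypothetical table laid out as one partition per day.
partitions = {
    date(2023, 1, 1): ["row-a", "row-b"],
    date(2023, 1, 2): ["row-c"],
    date(2023, 1, 3): ["row-d", "row-e"],
}

def rows_scanned(start, end):
    """Scan only partitions whose date falls inside the filter range."""
    scanned = []
    for day, rows in partitions.items():
        if start <= day <= end:  # partition pruning: skip non-matching days
            scanned.extend(rows)
    return scanned

# A one-day filter touches a single partition, not the full table.
print(rows_scanned(date(2023, 1, 2), date(2023, 1, 2)))  # ['row-c']
```

Clustering then sorts rows inside each partition (e.g., by ID), narrowing the scan further within the partitions that survive pruning.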


Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?

  • A. Redefine the schema by evenly distributing reads and writes across the row space of the table.
  • B. The performance issue should be resolved over time as the size of the Cloud Bigtable cluster is increased.
  • C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.
  • D. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.

Answer: A
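
One common way to distribute reads and writes evenly across the row space is to prefix keys with a short, stable salt, since monotonically increasing keys funnel all writes to one tablet. The helper below is a hypothetical sketch of that technique, not a Bigtable API.

```python
import hashlib

def salted_row_key(user_id: str, n_prefixes: int = 8) -> str:
    """Derive a stable salt from the ID so the same ID always maps to the
    same prefix, keeping point reads cheap while spreading writes."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    salt = int(digest, 16) % n_prefixes
    return f"{salt:02d}#{user_id}"

# Sequential IDs now land on different key prefixes (different tablets)
# instead of piling up on a single hot tablet.
keys = [salted_row_key(f"user{i:04d}") for i in range(4)]
print(keys)
```

The trade-off is that full-range scans must fan out over all `n_prefixes` key ranges, so the prefix count should stay small.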


You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?

  • A. Use Cloud Dataproc to run your transformation. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.
  • B. Use Cloud Dataproc to run your transformation. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.
  • C. Use Cloud Dataflow to run your transformation. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
  • D. Use Cloud Dataflow to run your transformation. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.

Answer: C


Cloud Dataproc is a managed Apache Hadoop and Apache service.

  • A. Blaze
  • B. Spark
  • C. Fire
  • D. Ignite

Answer: B

Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you use open source data tools for batch processing, querying, streaming, and machine learning.


You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? Choose 2 answers.

  • A. Denormalize the data as much as possible.
  • B. Preserve the structure of the data as much as possible.
  • C. Use BigQuery UPDATE to further reduce the size of the dataset.
  • D. Develop a data pipeline where status updates are appended to BigQuery instead of updated.
  • E. Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery’s support for external data sources to query.

Answer: AD


You are deploying MariaDB SQL databases on GCE VM Instances and need to configure monitoring and alerting. You want to collect metrics including network connections, disk IO and replication status from MariaDB with minimal development effort and use StackDriver for dashboards and alerts.
What should you do?

  • A. Install the OpenCensus Agent and create a custom metric collection application with a StackDriver exporter.
  • B. Place the MariaDB instances in an Instance Group with a Health Check.
  • C. Install the StackDriver Logging Agent and configure fluentd in_tail plugin to read MariaDB logs.
  • D. Install the StackDriver Agent and configure the MySQL plugin.

Answer: D


Your company needs to upload their historic data to Cloud Storage. The security rules don’t allow access from external IPs to their on-premises resources. After an initial upload, they will add new data from existing
on-premises applications every day. What should they do?

  • A. Execute gsutil rsync from the on-premises servers.
  • B. Use Cloud Dataflow and write the data to Cloud Storage.
  • C. Write a job template in Cloud Dataproc to perform the data transfer.
  • D. Install an FTP server on a Compute Engine VM to receive the files and move them to Cloud Storage.

Answer: A


Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?

  • A. Use Google Stackdriver Audit Logs to review data access.
  • B. Get the Identity and Access Management (IAM) policy of each table.
  • C. Use Stackdriver Monitoring to see the usage of BigQuery query slots.
  • D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.

Answer: A


Which Java SDK class can you use to run your Dataflow programs locally?

  • A. LocalRunner
  • B. DirectPipelineRunner
  • C. MachineRunner
  • D. LocalPipelineRunner

Answer: B

DirectPipelineRunner allows you to execute operations in the pipeline directly, without any optimization. Useful for small local execution and tests


When running a pipeline that has a BigQuery source on your local machine, you continue to get permission-denied errors. What could be the reason for that?

  • A. Your gcloud does not have access to the BigQuery resources
  • B. BigQuery cannot be accessed from local machines
  • C. You are missing gcloud on your machine
  • D. Pipelines cannot be run locally

Answer: A

When reading from a Dataflow source or writing to a Dataflow sink using DirectPipelineRunner, the Cloud Platform account that you configured with the gcloud executable will need access to the corresponding source/sink.


Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery. Which three approaches can you take? (Choose three.)

  • A. Disable writes to certain tables.
  • B. Restrict access to tables by role.
  • C. Ensure that the data is encrypted at all times.
  • D. Restrict BigQuery API access to approved users.
  • E. Segregate data across multiple tables or databases.
  • F. Use Google Stackdriver Audit Logging to determine policy violations.

Answer: BDF


When a Cloud Bigtable node fails, ______ is lost.

  • A. all data
  • B. no data
  • C. the last transaction
  • D. the time dimension

Answer: B

A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload of queries. Tablets are stored on Colossus, Google's file system, in SSTable format. Each tablet is associated with a specific Cloud Bigtable node.
Data is never stored in Cloud Bigtable nodes themselves; each node has pointers to a set of tablets that are stored on Colossus. As a result:
Rebalancing tablets from one node to another is very fast, because the actual data is not copied. Cloud Bigtable simply updates the pointers for each node.
Recovery from the failure of a Cloud Bigtable node is very fast, because only metadata needs to be migrated to the replacement node.
When a Cloud Bigtable node fails, no data is lost.
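
The pointer-based recovery described above can be sketched with plain Python data structures. This is a simplified, hypothetical model (the real assignment is by key range): nodes hold only pointers to tablets stored on Colossus, so recovering from a node failure just reassigns pointers and no row data is copied or lost.

```python
# Tablet data lives in the distributed file system, not on the nodes.
tablets = {"t1": "rows 1-100", "t2": "rows 101-200", "t3": "rows 201-300"}

# Nodes merely point at the tablets they currently serve.
assignments = {"node-a": ["t1", "t2"], "node-b": ["t3"]}

def fail_node(assignments, failed, replacement):
    """Move the failed node's tablet pointers to a replacement node --
    a metadata-only operation."""
    moved = assignments.pop(failed)
    assignments.setdefault(replacement, []).extend(moved)
    return assignments

fail_node(assignments, "node-a", "node-c")

# Every tablet is still reachable; only the pointers moved.
print(sorted(t for ts in assignments.values() for t in ts))  # ['t1', 't2', 't3']
```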


You have a job that you want to cancel. It is a streaming pipeline, and you want to ensure that any data that is in-flight is processed and written to the output. Which of the following commands can you use on the Dataflow monitoring console to stop the pipeline job?

  • A. Cancel
  • B. Drain
  • C. Stop
  • D. Finish

Answer: B

Using the Drain option to stop your job tells the Dataflow service to finish your job in its current state. Your job will immediately stop ingesting new data from input sources, but the Dataflow
service will preserve any existing resources (such as worker instances) to finish processing and writing any buffered data in your pipeline.


What Dataflow concept determines when a Window's contents should be output based on certain criteria being met?

  • A. Sessions
  • B. OutputCriteria
  • C. Windows
  • D. Triggers

Answer: D

Triggers control when the elements for a specific key and window are output. As elements arrive, they are put into one or more windows by a Window transform and its associated WindowFn, and then passed to the associated Trigger to determine if the Window's contents should be output.
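
The behavior can be modeled with a toy trigger. This is a conceptual sketch of the idea, not the Beam API: elements accumulate in a window, and the trigger decides when the window's contents (a "pane") are emitted. Here a hypothetical element-count trigger fires after every 3 elements.

```python
class AfterCountTrigger:
    """Toy trigger: emit the buffered window contents every `count` elements."""

    def __init__(self, count):
        self.count = count
        self.buffer = []   # elements waiting in the current pane
        self.fired = []    # panes that have been emitted so far

    def add(self, element):
        self.buffer.append(element)
        if len(self.buffer) >= self.count:  # criteria met: output the pane
            self.fired.append(list(self.buffer))
            self.buffer.clear()

trigger = AfterCountTrigger(3)
for value in [1, 2, 3, 4, 5]:
    trigger.add(value)

print(trigger.fired)   # [[1, 2, 3]] -- 4 and 5 wait for the next firing
```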


MJTelco’s Google Cloud Dataflow pipeline is now ready to start receiving data from the 50,000 installations. You want to allow Cloud Dataflow to scale its compute power up as required. Which Cloud Dataflow pipeline configuration setting should you update?

  • A. The zone
  • B. The number of workers
  • C. The disk size per worker
  • D. The maximum number of workers

Answer: D


Which of these statements about exporting data from BigQuery is false?

  • A. To export more than 1 GB of data, you need to put a wildcard in the destination filename.
  • B. The only supported export destination is Google Cloud Storage.
  • C. Data can only be exported in JSON or Avro format.
  • D. The only compression option available is GZIP.

Answer: C

Data can be exported in CSV, JSON, or Avro format. If you are exporting nested or repeated data, then CSV format is not supported.
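
On the wildcard mentioned in option A: for exports over 1 GB, BigQuery shards the output into multiple files, expanding the `*` into a zero-padded 12-digit counter. The helper and bucket name below are hypothetical, shown only to illustrate the naming pattern.

```python
def expand_wildcard(pattern: str, num_shards: int) -> list[str]:
    """Expand a '*' destination pattern into sharded file names, using the
    zero-padded 12-digit counter BigQuery applies to multi-file exports."""
    return [pattern.replace("*", f"{i:012d}") for i in range(num_shards)]

print(expand_wildcard("gs://my-bucket/export-*.csv", 2))
# ['gs://my-bucket/export-000000000000.csv', 'gs://my-bucket/export-000000000001.csv']
```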


Recommended! Get the full Professional-Data-Engineer dumps in VCE and PDF. Welcome to download: (New 239 Q&As Version)