Top Tips for the Updated Professional-Data-Engineer Free Practice Test

Top-quality Professional-Data-Engineer materials and answers for the Google certification, with real success guaranteed using updated Professional-Data-Engineer PDF and VCE dumps. 100% pass the Google Professional Data Engineer exam today!

Free Professional-Data-Engineer Demo Online For Google Certification:

NEW QUESTION 1

Your company produces 20,000 files every hour. Each data file is formatted as a comma-separated values (CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to transmit the CSV files as is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even though the bandwidth utilization is rather low.
You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (Choose two.)

  • A. Introduce data compression for each file to increase the rate of file transfer.
  • B. Contact your Internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.
  • C. Redesign the data ingestion process to use the gsutil tool to send the CSV files to a storage bucket in parallel.
  • D. Assemble 1,000 files into a tape archive (TAR) file. Transmit the TAR files instead, and disassemble the CSV files in the cloud upon receiving them.
  • E. Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer the on-premises data to the designated storage bucket.

Answer: CE
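
Option C relies on gsutil's parallel mode (for example, "gsutil -m cp *.csv gs://my-bucket/") to hide the per-file round-trip latency of many small uploads. As a rough, hypothetical illustration of the same idea with the Cloud Storage client library (bucket name, prefix, and directory are placeholders, not part of the question):

```python
# Hypothetical sketch: upload many small CSV files to Cloud Storage in parallel.
# The exam option refers to gsutil's parallel copy; this shows the same idea
# with the google-cloud-storage client library and a thread pool.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from google.cloud import storage

BUCKET_NAME = "my-ingest-bucket"  # placeholder bucket name


def upload_one(client: storage.Client, path: Path) -> str:
    bucket = client.bucket(BUCKET_NAME)
    blob = bucket.blob(f"incoming/{path.name}")
    blob.upload_from_filename(str(path))
    return path.name


def upload_all(csv_dir: str = "./csv_out") -> None:
    client = storage.Client()
    files = list(Path(csv_dir).glob("*.csv"))
    # Many concurrent uploads hide the per-request latency of small files.
    with ThreadPoolExecutor(max_workers=32) as pool:
        for name in pool.map(lambda p: upload_one(client, p), files):
            print("uploaded", name)


if __name__ == "__main__":
    upload_all()
```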

NEW QUESTION 2

Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.
Which approach should you take?

  • A. Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received.
  • B. Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Cloud Pub/Sub.
  • C. Use the NOW() function in BigQuery to record the event’s time.
  • D. Use the automatically generated timestamp from Cloud Pub/Sub to order the data.

Answer: B
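
Option B amounts to stamping each outbound message with the event time and package ID at the device, so analysis in BigQuery can order events by when they happened rather than when they arrived. A minimal, hypothetical publisher sketch using the Pub/Sub client library (project, topic, and field names are assumptions):

```python
# Hypothetical sketch: each tracking device publishes the event timestamp and
# package ID with the message, both in the payload and as message attributes.
import datetime
import json

from google.cloud import pubsub_v1

PROJECT_ID = "my-project"       # placeholder
TOPIC_ID = "package-tracking"   # placeholder


def publish_tracking_event(package_id: str, status: str) -> None:
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
    event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()
    payload = json.dumps(
        {"package_id": package_id, "status": status, "event_time": event_time}
    ).encode("utf-8")
    # Attributes let subscribers read the fields without parsing the payload.
    future = publisher.publish(
        topic_path, payload, package_id=package_id, event_time=event_time
    )
    print("published message id:", future.result())
```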

NEW QUESTION 3

You’ve migrated a Hadoop job from an on-premises cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB each). You see some degradation in performance after the migration to Dataproc, so you’d like to optimize for it. You need to keep in mind that your organization is very cost-sensitive, so you’d like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload.
What should you do?

  • A. Increase the size of your Parquet files to ensure they are at least 1 GB each.
  • B. Switch to the TFRecords format (approximately 200 MB per file) instead of Parquet files.
  • C. Switch from HDDs to SSDs, copy the initial data from GCS to HDFS, run the Spark job, and copy the results back to GCS.
  • D. Switch from HDDs to SSDs, and override the preemptible VMs configuration to increase the boot disk size.

Answer: C
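
Options C and D both involve switching the cluster's persistent disks to SSDs and sizing the disks so shuffle-heavy Spark stages are not I/O bound. A hedged sketch of what such a cluster definition might look like with the Dataproc client library (all names, sizes, and machine types are placeholders, not a recommendation):

```python
# Hypothetical sketch: create a Dataproc cluster with SSD boot disks and larger
# boot disks on the preemptible (secondary) workers. All values are placeholders.
from google.cloud import dataproc_v1


def create_cluster(project_id: str, region: str) -> None:
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    cluster = {
        "project_id": project_id,
        "cluster_name": "spark-shuffle-cluster",
        "config": {
            "master_config": {
                "num_instances": 1,
                "machine_type_uri": "n1-standard-8",
                "disk_config": {"boot_disk_type": "pd-ssd", "boot_disk_size_gb": 500},
            },
            "worker_config": {
                "num_instances": 2,  # the 2 non-preemptible workers
                "machine_type_uri": "n1-standard-8",
                "disk_config": {"boot_disk_type": "pd-ssd", "boot_disk_size_gb": 500},
            },
            "secondary_worker_config": {
                "num_instances": 10,  # preemptible workers
                "disk_config": {"boot_disk_type": "pd-ssd", "boot_disk_size_gb": 500},
            },
        },
    }
    operation = client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    print("Cluster created:", operation.result().cluster_name)
```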

NEW QUESTION 4

Which Google Cloud Platform service is an alternative to Hadoop with Hive?

  • A. Cloud Dataflow
  • B. Cloud Bigtable
  • C. BigQuery
  • D. Cloud Datastore

Answer: C

Explanation:
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data summarization, query, and analysis.
Google BigQuery is an enterprise data warehouse.
Reference: https://en.wikipedia.org/wiki/Apache_Hive
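
For context, the kind of aggregation you might express in HiveQL maps directly to standard SQL in BigQuery. A small, hypothetical example with the BigQuery client library (project, dataset, and table names are made up):

```python
# Hypothetical sketch: a HiveQL-style aggregation expressed as standard SQL
# against BigQuery, run with the Python client library.
from google.cloud import bigquery


def daily_event_counts() -> None:
    client = bigquery.Client()
    query = """
        SELECT DATE(event_ts) AS day, COUNT(*) AS events
        FROM `my-project.analytics.events`
        GROUP BY day
        ORDER BY day
    """
    for row in client.query(query).result():
        print(row.day, row.events)
```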

NEW QUESTION 5

You have developed three data processing jobs. One executes a Cloud Dataflow pipeline that transforms data uploaded to Cloud Storage and writes results to BigQuery. The second ingests data from on-premises servers and uploads it to Cloud Storage. The third is a Cloud Dataflow pipeline that gets information from third-party data providers and uploads the information to Cloud Storage. You need to be able to schedule and monitor the execution of these three workflows and manually execute them when needed. What should you do?

  • A. Create a Directed Acyclic Graph (DAG) in Cloud Composer to schedule and monitor the jobs.
  • B. Use Stackdriver Monitoring and set up an alert with a Webhook notification to trigger the jobs.
  • C. Develop an App Engine application to schedule and request the status of the jobs using GCP API calls.
  • D. Set up cron jobs in a Compute Engine instance to schedule and monitor the pipelines using GCP API calls.

Answer: D
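
For reference, option A describes orchestrating the workflows from a Cloud Composer (Airflow) DAG, which can be scheduled, monitored in the Airflow UI, and triggered manually. A minimal, hypothetical skeleton is shown below; the operator choice, template paths, and schedule are assumptions, and the on-premises upload task is omitted:

```python
# Hypothetical sketch of option A: a Cloud Composer (Airflow) DAG that chains
# two of the Dataflow workflows. Template paths and names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

with DAG(
    dag_id="daily_ingest_and_transform",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",  # scheduled daily; can also be triggered manually
    catchup=False,
) as dag:
    third_party_ingest = DataflowTemplatedJobStartOperator(
        task_id="third_party_to_gcs",
        template="gs://my-bucket/templates/third_party_ingest",
        location="us-central1",
    )
    transform_to_bq = DataflowTemplatedJobStartOperator(
        task_id="gcs_to_bigquery",
        template="gs://my-bucket/templates/transform_to_bq",
        location="us-central1",
    )
    third_party_ingest >> transform_to_bq
```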

NEW QUESTION 6

You work for a bank. You have a labelled dataset that contains information on already granted loan applications and whether these applications have defaulted. You have been asked to train a model to predict default rates for credit applicants.
What should you do?

  • A. Increase the size of the dataset by collecting additional data.
  • B. Train a linear regression to predict a credit default risk score.
  • C. Remove the bias from the data and collect applications that have been declined loans.
  • D. Match loan applicants with their social profiles to enable feature engineering.

Answer: B

NEW QUESTION 7

You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops your team has implemented. These ops are used inside your main training loop and are performing bulky matrix multiplications. It currently takes up to several days to train a model. You want to decrease this time significantly and keep the cost low by using an accelerator on Google Cloud. What should you do?

  • A. Use Cloud TPUs without any additional adjustment to your code.
  • B. Use Cloud TPUs after implementing GPU kernel support for your custom ops.
  • C. Use Cloud GPUs after implementing GPU kernel support for your custom ops.
  • D. Stay on CPUs, and increase the size of the cluster you’re training your model on.

Answer: B
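
Options B and C hinge on giving the custom ops a GPU kernel so the heavy matrix multiplications can run on the accelerator. As a toy illustration of explicit device placement in TensorFlow (this is not the custom-op kernel itself, which would be C++/CUDA):

```python
# Hypothetical sketch: run a matmul-heavy step on a GPU if one is available.
# A real solution would also register a GPU kernel for the custom C++ ops.
import tensorflow as tf

tf.config.set_soft_device_placement(True)  # fall back to CPU if no GPU is present

a = tf.random.normal((4096, 4096))
b = tf.random.normal((4096, 4096))

with tf.device("/GPU:0"):
    c = tf.matmul(a, b)

print("result computed on:", c.device)
```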

NEW QUESTION 8

You are the head of BI at a large enterprise company with multiple business units that each have different priorities and budgets. You use on-demand pricing for BigQuery with a quota of 2K concurrent on-demand slots per project. Users at your organization sometimes don’t get slots to execute their queries, and you need to correct this. You’d like to avoid introducing new projects to your account.
What should you do?

  • A. Convert your batch BQ queries into interactive BQ queries.
  • B. Create an additional project to overcome the 2K on-demand per-project quota.
  • C. Switch to flat-rate pricing and establish a hierarchical priority model for your projects.
  • D. Increase the amount of concurrent slots per project at the Quotas page at the Cloud Console.

Answer: C

Explanation:
Reference https://cloud.google.com/blog/products/gcp/busting-12-myths-about-bigquery

NEW QUESTION 9

You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?

  • A. Export the records from the database as an Avro file. Upload the file to GCS using gsutil, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
  • B. Export the records from the database as an Avro file. Copy the file onto a Transfer Appliance and send it to Google, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
  • C. Export the records from the database into a CSV file. Create a public URL for the CSV file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the CSV file into BigQuery using the BigQuery web UI in the GCP Console.
  • D. Export the records from the database as an Avro file. Create a public URL for the Avro file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.

Answer: A
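
Every option ends by loading the exported file into BigQuery; that web UI step can equally be scripted. A hypothetical sketch of loading an Avro export from Cloud Storage with the BigQuery client library (URIs and table names are placeholders):

```python
# Hypothetical sketch: load an Avro export from Cloud Storage into BigQuery.
from google.cloud import bigquery


def load_avro(gcs_uri: str = "gs://my-bucket/patients/export-*.avro") -> None:
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO)
    load_job = client.load_table_from_uri(
        gcs_uri, "my-project.clinical.patient_records", job_config=job_config
    )
    load_job.result()  # wait for the load job to finish
    table = client.get_table("my-project.clinical.patient_records")
    print(f"Loaded {table.num_rows} rows.")
```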

NEW QUESTION 10

You are responsible for writing your company’s ETL pipelines to run on an Apache Hadoop cluster. The pipeline will require some checkpointing and splitting of pipelines. Which method should you use to write the pipelines?

  • A. PigLatin using Pig
  • B. HiveQL using Hive
  • C. Java using MapReduce
  • D. Python using MapReduce

Answer: D

NEW QUESTION 11

Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (AUC) of 0.87 on the validation set. You want to increase the AUC of the model. What should you do?

  • A. Perform hyperparameter tuning
  • B. Train a classifier with deep neural networks, because neural networks would always beat SVMs
  • C. Deploy the model and measure the real-world AUC; it’s always higher because of generalization
  • D. Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC

Answer: D
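
Option A refers to hyperparameter tuning; for illustration, a small, hypothetical grid search over SVM parameters scored by AUC with scikit-learn (the synthetic data and parameter grid are made up):

```python
# Hypothetical sketch: tune SVM hyperparameters with AUC as the objective.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(SVC(), param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("validation AUC:", search.score(X_val, y_val))
```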

NEW QUESTION 12

Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during their campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all the streaming inserts. What is the most likely cause of this problem?

  • A. They have not assigned the timestamp, which causes the job to fail
  • B. They have not set the triggers to accommodate the data coming in late, which causes the job to fail
  • C. They have not applied a global windowing function, which causes the job to fail when the pipeline is created
  • D. They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created

Answer: C
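
Whichever option you pick, the underlying issue is how the streaming pipeline windows unbounded Pub/Sub data before grouping it. A minimal, hypothetical Apache Beam snippet applying a non-global fixed window (topic name, window size, and aggregation are assumptions):

```python
# Hypothetical sketch: apply fixed one-minute windows to a Pub/Sub stream
# before a grouping/aggregation step, so the streaming job can emit results.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/campaign-events"
        )
        | "WindowInto1Min" >> beam.WindowInto(beam.window.FixedWindows(60))
        | "KeyByMessage" >> beam.Map(lambda msg: (msg, 1))
        | "CountPerWindow" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```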

NEW QUESTION 13

You are integrating one of your internal IT applications and Google BigQuery, so users can query BigQuery from the application’s interface. You do not want individual users to authenticate to BigQuery and you do not want to give them access to the dataset. You need to securely access BigQuery from your IT application.
What should you do?

  • A. Create groups for your users and give those groups access to the dataset.
  • B. Integrate with a single sign-on (SSO) platform, and pass each user’s credentials along with the query request.
  • C. Create a service account and grant dataset access to that account. Use the service account’s private key to access the dataset.
  • D. Create a dummy user and grant dataset access to that user. Store the username and password for that user in a file on the file system, and use those credentials to access the BigQuery dataset.

Answer: C
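
Option C, the given answer, means the application authenticates as a service account rather than as end users. A hypothetical sketch of querying BigQuery with a service-account key file (the key path, project, and query are placeholders; where possible, attached service accounts are preferable to downloaded keys):

```python
# Hypothetical sketch: the IT application queries BigQuery as a service account.
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "/secrets/bq-reader-sa.json",  # placeholder path to the service account key
    scopes=["https://www.googleapis.com/auth/bigquery.readonly"],
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

query = "SELECT COUNT(*) AS n FROM `my-project.sales.orders`"
for row in client.query(query).result():
    print(row.n)
```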

NEW QUESTION 14

You need to compose visualizations for operations teams with the following requirements:
  • Telemetry must include data from all 50,000 installations for the most recent 6 weeks (sampling once every minute).
  • The report must not be more than 3 hours delayed from live data.
  • The actionable report should only show suboptimal links.
  • Most suboptimal links should be sorted to the top.
  • Suboptimal links can be grouped and filtered by regional geography.
  • User response time to load the report must be <5 seconds.
You create a data source to store the last 6 weeks of data, and create visualizations that allow viewers to see multiple date ranges, distinct geographic regions, and unique installation types. You always show the latest data without any changes to your visualizations. You want to avoid creating and updating new visualizations each month. What should you do?

  • A. Look through the current data and compose a series of charts and tables, one for each possible combination of criteria.
  • B. Look through the current data and compose a small set of generalized charts and tables bound to criteria filters that allow value selection.
  • C. Export the data to a spreadsheet, compose a series of charts and tables, one for each possible combination of criteria, and spread them across multiple tabs.
  • D. Load the data into relational database tables, write a Google App Engine application that queries all rows, summarizes the data across each criteria, and then renders results using the Google Charts and visualization API.

Answer: B

NEW QUESTION 15

MJTelco is building a custom interface to share data. They have these requirements:
  • They need to do aggregations over their petabyte-scale datasets.
  • They need to scan specific time range rows with a very fast response time (milliseconds).
Which combination of Google Cloud Platform products should you recommend?

  • A. Cloud Datastore and Cloud Bigtable
  • B. Cloud Bigtable and Cloud SQL
  • C. BigQuery and Cloud Bigtable
  • D. BigQuery and Cloud Storage

Answer: C

NEW QUESTION 16

You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?

  • A. Make a call to the Stackdriver API to list all logs, and apply an advanced filter.
  • B. In the Stackdriver logging admin interface, enable a log sink export to BigQuery.
  • C. In the Stackdriver logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.
  • D. Using the Stackdriver API, create a project sink with advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.

Answer: B
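
Options C and D both come down to a log sink whose filter matches insert jobs against one table, exporting to Pub/Sub for the monitoring tool to consume. A hedged sketch with the Cloud Logging client library; the filter below only approximates the legacy BigQuery audit-log fields, and all names are placeholders:

```python
# Hypothetical sketch: create a log sink that exports BigQuery job audit logs
# for a single destination table to a Pub/Sub topic the monitoring tool watches.
from google.cloud import logging

FILTER = (
    'resource.type="bigquery_resource" '
    'protoPayload.methodName="jobservice.jobcompleted" '
    'protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.'
    'destinationTable.tableId="my_table"'  # placeholder table; adjust to your audit-log schema
)

client = logging.Client()
sink = client.sink(
    "bq-insert-notifications",
    filter_=FILTER,
    destination="pubsub.googleapis.com/projects/my-project/topics/bq-inserts",
)
sink.create()
print("created sink:", sink.name)
```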

NEW QUESTION 17

You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?

  • A. Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query.
  • B. Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query.
  • C. Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query.
  • D. Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.

Answer: D

NEW QUESTION 18

Google Cloud Bigtable indexes a single value in each row. This value is called the _______.

  • A. primary key
  • B. unique key
  • C. row key
  • D. master key

Answer: C

Explanation:
Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, allowing you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key.
Reference: https://cloud.google.com/bigtable/docs/overview
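
Because the row key is the only indexed value, reads are addressed by it. A tiny, hypothetical lookup with the Bigtable client library (the project, instance, table, and key are made up):

```python
# Hypothetical sketch: fetch a single row from Cloud Bigtable by its row key.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=False)
table = client.instance("my-instance").table("package-events")

row = table.read_row(b"device123#20240101T120000")  # the row key is the only index
if row is not None:
    for family, columns in row.cells.items():
        for qualifier, cells in columns.items():
            print(family, qualifier.decode(), cells[0].value.decode())
```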

NEW QUESTION 19

An online retailer has built their current application on Google App Engine. A new initiative at the company mandates that they extend their application to allow their customers to transact directly via the application.
They need to manage their shopping transactions and analyze combined data from multiple datasets using a business intelligence (BI) tool. They want to use only a single database for this purpose. Which Google Cloud database should they choose?

  • A. BigQuery
  • B. Cloud SQL
  • C. Cloud BigTable
  • D. Cloud Datastore

Answer: C

NEW QUESTION 20

Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well on the training data. However, when tested against new data, it performs poorly. What method can you employ to address this?

  • A. Threading
  • B. Serialization
  • C. Dropout Methods
  • D. Dimensionality Reduction

Answer: C

Explanation:
Reference: https://medium.com/mlreview/a-simple-deep-learning-model-for-stock-price-prediction-using-tensorflow-30505
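
Dropout randomly zeroes a fraction of activations during training, which is a standard way to reduce the overfitting described here. A small, hypothetical Keras sketch (layer sizes and dropout rates are arbitrary):

```python
# Hypothetical sketch: add Dropout layers to a Keras model to reduce overfitting.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dropout(0.5),   # randomly drop 50% of units during training
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```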

NEW QUESTION 21

The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error. What should you do?

  • A. Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.
  • B. Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.
  • C. Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.
  • D. Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.

Answer: A
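
Option D describes staging the CSV in a new table and merging it, which applies all 1 million changes in a single DML job rather than many row-level UPDATE statements. A hypothetical sketch of such a MERGE (table names and join key are placeholders):

```python
# Hypothetical sketch: merge staged marketing updates into the main table
# with one MERGE statement instead of many row-level UPDATEs.
from google.cloud import bigquery

MERGE_SQL = """
MERGE `my-project.crm.customers` AS t
USING `my-project.crm.customer_updates` AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET t.segment = s.segment, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (customer_id, segment, updated_at)
  VALUES (s.customer_id, s.segment, s.updated_at)
"""

client = bigquery.Client()
job = client.query(MERGE_SQL)
job.result()  # wait for the DML job to finish
print("rows affected:", job.num_dml_affected_rows)
```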

NEW QUESTION 22

You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?

  • A. Modify the transform MapReduce jobs to apply sensor calibration before they do anything else.
  • B. Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this.
  • C. Add sensor calibration data to the output of the ETL process, and document that all users need to apply sensor calibration themselves.
  • D. Develop an algorithm through simulation to predict variance of data output from the last MapReduce job based on calibration factors, and apply the correction to all data.

Answer: A

NEW QUESTION 23

Dataproc clusters contain many configuration files. To update these files, you will need to use the --properties option. The format for the option is: file_prefix:property=_______.

  • A. details
  • B. value
  • C. null
  • D. id

Answer: B

Explanation:
To make updating files and properties easy, the --properties command uses a special format to specify the configuration file and the property and value within the file that should be updated. The formatting is as follows: file_prefix:property=value.
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-properties#formatting
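
The same file_prefix:property=value pairs can also be supplied programmatically through the cluster's SoftwareConfig. A hedged sketch of the relevant fragment of a cluster definition (the property choices and names are examples only), roughly equivalent to passing --properties spark:spark.executor.memory=4g on the gcloud command line:

```python
# Hypothetical sketch: pass file_prefix:property=value pairs via SoftwareConfig,
# the API-level counterpart of the --properties option.
from google.cloud import dataproc_v1

cluster = {
    "project_id": "my-project",
    "cluster_name": "etl-cluster",
    "config": {
        "software_config": {
            "properties": {
                "spark:spark.executor.memory": "4g",  # written to spark-defaults.conf
                "core:io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec",  # core-site.xml
            }
        }
    },
}

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)
operation = client.create_cluster(
    request={"project_id": "my-project", "region": "us-central1", "cluster": cluster}
)
print("cluster:", operation.result().cluster_name)
```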

NEW QUESTION 24
......

Recommended! Get the full Professional-Data-Engineer dumps in VCE and PDF from Dumps-hub.com. Welcome to download: https://www.dumps-hub.com/Professional-Data-Engineer-dumps.html (New 239 Q&As Version)