GCP-PDE Free Practice Questions — Page 2

Professional Data Engineer • 5 questions • Answers & explanations included

Question 6

Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

A. Issue a command to restart the database servers.
B. Retry the query with exponential backoff, up to a cap of 15 minutes.
C. Retry the query every second until it comes back online to minimize staleness of data.
D. Reduce the query frequency to once every hour until the database comes back online.

Correct Answer: B. Retry the query with exponential backoff, up to a cap of 15 minutes.

Exponential backoff prevents thundering herd problems — millions of App Engine instances all retrying simultaneously would overwhelm a recovering database. Capping at 15 minutes aligns with the query frequency, so no extra staleness is introduced. Option A (restart DB) is not the frontend's responsibility. Option C (retry every second) creates massive load on a struggling DB. Option D (reduce to hourly) introduces unnecessary data staleness when the DB may recover quickly.
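A minimal sketch of this retry policy in Python (the function and parameter names are illustrative, not part of any Google API): the delay doubles on each failed attempt, is capped at 15 minutes, and is jittered so that millions of clients do not retry in lockstep.

```python
import random
import time

def query_with_backoff(query_fn, base_delay=1.0, cap=15 * 60, max_attempts=10):
    """Retry query_fn with jittered exponential backoff, capped at 15 minutes.

    query_fn, base_delay, and max_attempts are hypothetical names used
    only for this sketch.
    """
    for attempt in range(max_attempts):
        try:
            return query_fn()
        except ConnectionError:
            # Double the delay each attempt, but never exceed the cap;
            # add jitter so clients don't all retry at the same instant.
            delay = min(cap, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("database still unavailable after retries")
```

The cap matters: without it, a long outage would push the delay far past the app's natural 15-minute refresh interval and add staleness for no benefit.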

Question 7

You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?

A. Linear regression
B. Logistic classification
C. Recurrent neural network
D. Feedforward neural network

Correct Answer: A. Linear regression

Housing price prediction is a continuous value regression problem. Linear regression is lightweight, computationally cheap, and runs well under resource constraints. Logistic classification (B) is for categorical/binary outcomes, not price prediction. Recurrent neural networks (C) are for sequential/time-series data and are computationally expensive. Feedforward neural networks (D) can work for regression but require significantly more compute than linear regression on a constrained VM.
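To illustrate why this fits on a resource-constrained VM: ordinary least squares has a direct closed-form solution, with no iterative training loop or accelerator required. The feature values below are made up for illustration.

```python
import numpy as np

# Hypothetical features: square footage and number of bedrooms.
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]], dtype=float)
y = np.array([245000, 312000, 279000, 308000, 405000], dtype=float)

# Add an intercept column and solve the least-squares problem directly.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Continuous price estimates -- a regression output, not class labels.
predicted = A @ coef
```

Solving a small linear system like this takes milliseconds and a few kilobytes of memory, which is exactly what the budget constraint in the question calls for.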

Question 8

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will be sent only once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

A. Include ORDER BY DESC on timestamp column and LIMIT to 1.
B. Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
C. Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.
D. Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.

Correct Answer: D. Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.

BigQuery streaming inserts can produce at-least-once delivery, meaning duplicates are possible. Using ROW_NUMBER() OVER (PARTITION BY unique_id ORDER BY event_timestamp) and filtering WHERE row_number = 1 keeps only the first occurrence per unique ID, effectively deduplicating. Option A (ORDER BY + LIMIT 1) only works on the full result set, not per-ID. Option B (GROUP BY + SUM) distorts non-aggregatable fields. Option C (LAG function) detects consecutive duplicates but doesn't reliably deduplicate across the full dataset.
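A sketch of what that query might look like (the project, table, and column names are hypothetical), together with a pure-Python model of the same keep-the-first-row-per-ID semantics:

```python
# Hypothetical names: my_project.my_dataset.events, unique_id, event_timestamp.
dedup_sql = """
SELECT * EXCEPT (rn)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY unique_id
                            ORDER BY event_timestamp) AS rn
  FROM `my_project.my_dataset.events`
)
WHERE rn = 1
"""

def dedup(rows):
    """Pure-Python equivalent: keep the earliest row per unique_id."""
    first = {}
    for row in sorted(rows, key=lambda r: r["event_timestamp"]):
        first.setdefault(row["unique_id"], row)
    return list(first.values())
```

Unlike the GROUP BY approach, this keeps every column of the surviving row intact, and unlike LAG it deduplicates across the whole partition rather than only between adjacent rows.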

Question 9

Your company is using WILDCARD tables to query data across multiple tables with similar names. The SQL statement is currently failing with the error: Expected end of statement but got "-" at line 4. Which table name will make the SQL statement work correctly?

[Screenshot for question 9: the failing query and error message]
A. `bigquery-public-data.noaa_gsod.gsod`
B. bigquery-public-data.noaa_gsod.gsod*
C. `bigquery-public-data.noaa_gsod.gsod`*
D. `bigquery-public-data.noaa_gsod.gsod*`

Correct Answer: D. `bigquery-public-data.noaa_gsod.gsod*`

Why option D is correct:

Wildcard syntax: to query multiple tables with a wildcard, append an asterisk (*) to the common table-name prefix.

Backticks for the table path: because the project ID contains hyphens (bigquery-public-data), the entire table path must be enclosed in backticks (`). Without them, the SQL parser treats the hyphens as subtraction operators, which explains the specific error message: Expected end of statement but got "-" at line 4.

Enabling _TABLE_SUFFIX: the _TABLE_SUFFIX pseudo-column becomes available in the WHERE clause only when the FROM clause contains a valid wildcard pattern.
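Putting those points together, the corrected statement might look like the following (the column list and date range are illustrative; the query is shown as a string and not executed here, since running it requires BigQuery credentials):

```python
# Corrected wildcard query: backticks around the hyphenated path, an
# asterisk for the wildcard, and _TABLE_SUFFIX to restrict which yearly
# NOAA GSOD tables are scanned.
wildcard_sql = """
SELECT temp, mo, da
FROM `bigquery-public-data.noaa_gsod.gsod*`
WHERE _TABLE_SUFFIX BETWEEN '2010' AND '2015'
"""
```

Note that the backticks wrap the entire path, including the asterisk; option C fails because the asterisk sits outside the quoted identifier.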

Question 10

Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery. Which three approaches can you take? (Choose three.)

A. Disable writes to certain tables.
B. Restrict access to tables by role.
C. Ensure that the data is encrypted at all times.
D. Restrict BigQuery API access to approved users.
E. Segregate data across multiple tables or databases.
F. Use Google Stackdriver Audit Logging to determine policy violations.

Correct Answers: B. Restrict access to tables by role.; D. Restrict BigQuery API access to approved users.; E. Segregate data across multiple tables or databases.

The principle of least privilege requires limiting access to only what's necessary. Restricting table access by role (B) uses BigQuery IAM to control who can read/write specific tables. Restricting BigQuery API access (D) prevents unauthorized users from even calling the API. Segregating data across multiple tables or databases (E) ensures users only have access to their relevant dataset. Option A (disable writes) doesn't restrict reads. Option C (encryption) is always on by default in BigQuery and doesn't control access scope. Option F (Stackdriver logging) detects violations after the fact — it doesn't enforce access control.

Ready for the Full GCP-PDE Experience?

Access all 55 pages of practice questions and simulate the real exam with timed mode.
