GCP GCP-PDE Free Practice Questions — Page 3

Professional Data Engineer • 5 questions • Answers & explanations included

Question 11

You are designing a basket abandonment system for an ecommerce company. The system will send a message to a user based on these rules:

- No interaction by the user on the site for 1 hour
- Has added more than $30 worth of products to the basket
- Has not completed a transaction

You use Google Cloud Dataflow to process the data and decide if a message should be sent. How should you design the pipeline?

A. Use a fixed-time window with a duration of 60 minutes.
B. Use a sliding time window with a duration of 60 minutes.
C. Use a session window with a gap time duration of 60 minutes.
D. Use a global window with a time-based trigger with a delay of 60 minutes.

Correct Answer: C. Use a session window with a gap time duration of 60 minutes.

A session window groups events by user activity and closes after a defined period of inactivity — exactly matching the rule "no interaction for 1 hour." When the session closes after 60 minutes of inactivity, Dataflow can evaluate the basket state and trigger a message. Fixed windows (A) split time into equal chunks regardless of user activity. Sliding windows (B) overlap and don't represent inactivity gaps. Global windows (D) collect all data and don't naturally model per-user inactivity.
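The gap-based grouping behind session windows can be illustrated with a small pure-Python sketch. This is not Beam itself (in a Beam pipeline you would apply `beam.WindowInto(window.Sessions(3600))` to a keyed collection); it only shows how a 60-minute inactivity gap splits one user's event timestamps into sessions:

```python
# Sketch of session-window grouping: events closer together than the
# gap belong to one session; a gap of >= 1 hour closes the session.
GAP_SECONDS = 60 * 60  # 60-minute inactivity gap

def sessionize(timestamps, gap=GAP_SECONDS):
    """Group one user's event timestamps (in seconds) into sessions."""
    sessions = []
    current = []
    for ts in sorted(timestamps):
        if current and ts - current[-1] >= gap:
            sessions.append(current)  # gap reached: close the session
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

# Events at 0s and 10min form one session; an event 2 hours after the
# last one starts a new session.
print(sessionize([0, 600, 7800]))  # [[0, 600], [7800]]
```

When a session closes, the pipeline can inspect the basket state for that user and decide whether to send the abandonment message.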

Question 12

Your company handles data processing for a number of different clients. Each client prefers to use their own suite of analytics tools, with some allowing direct query access via Google BigQuery. You need to secure the data so that clients cannot see each other's data. You want to ensure appropriate access to the data. Which three steps should you take? (Choose three.)

A. Load data into different partitions.
B. Load data into a different dataset for each client.
C. Put each client's BigQuery dataset into a different table.
D. Restrict a client's dataset to approved users.
E. Only allow a service account to access the datasets.
F. Use the appropriate identity and access management (IAM) roles for each client's users.

Correct Answers: B. Load data into a different dataset for each client.; D. Restrict a client's dataset to approved users.; F. Use the appropriate identity and access management (IAM) roles for each client's users.

Loading each client's data into a separate dataset (B) provides natural isolation boundaries in BigQuery. Restricting each dataset to approved users (D) ensures clients can't cross-access. Using proper IAM roles (F) enforces permissions at the dataset or project level. Option A (partitions) doesn't isolate clients — partitions exist within one table. Option C is backwards — datasets contain tables, not the other way. Option E (service account only) would block clients from querying their own data directly.
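The per-client isolation pattern can be sketched as a helper that builds one dataset definition per client with read access restricted to that client's approved users. The dictionary shape below loosely mirrors a BigQuery dataset's `access` entry list, but the function name and client identifiers are illustrative, not a real API:

```python
# Hypothetical sketch: one dataset per client, restricted to that
# client's approved users via IAM-style access entries. The entry shape
# loosely mirrors BigQuery's dataset "access" list; names are illustrative.
def client_dataset_access(client_id, approved_users):
    """Build a per-client dataset definition with read-only access entries."""
    return {
        "datasetId": f"client_{client_id}",
        "access": [
            {"role": "READER", "userByEmail": email}
            for email in approved_users
        ],
    }

acl = client_dataset_access("acme", ["analyst@acme.example"])
print(acl["datasetId"])          # client_acme
print(acl["access"][0]["role"])  # READER
```

Because each client has a separate dataset, granting a role at the dataset level gives that client's users query access without exposing any other client's data.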

Question 13

You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling. Which Google database service should you use?

A. Cloud SQL
B. BigQuery
C. Cloud Bigtable
D. Cloud Datastore

Correct Answer: D. Cloud Datastore

Cloud Datastore (now Firestore in Datastore mode) is a fully managed, serverless NoSQL database that scales automatically — no infrastructure management needed. It suits transactional, high-growth applications. Cloud SQL (A) requires instance sizing and manual scaling. BigQuery (B) is an analytics warehouse, not a transactional database. Cloud Bigtable (C) is for massive analytical/time-series workloads and requires cluster management.

Question 14

You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

A. There are very few occurrences of mutations relative to normal samples.
B. There are roughly equal occurrences of both normal and mutated samples in the database.
C. You expect future mutations to have different features from the mutated samples in the database.
D. You expect future mutations to have similar features to the mutated samples in the database.
E. You already have labels for which samples are mutated and which are normal in the database.

Correct Answers: A. There are very few occurrences of mutations relative to normal samples.; C. You expect future mutations to have different features from the mutated samples in the database.

Unsupervised anomaly detection works best when mutations are rare (A) — the model learns "normal" and flags deviations. It also works when future mutations are different from known mutations (C), because the method doesn't rely on labeled examples — it detects novelty. Option B (equal occurrences) favors supervised learning. Option D (similar future mutations to known ones) favors supervised classification. Option E (labeled data exists) is the defining characteristic of supervised learning, not unsupervised.
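The idea can be made concrete with a minimal unsupervised detector: fit a threshold on the (mostly normal) data and flag anything far from the mean. This is a deliberately simple z-score-style sketch, not a recommendation for real tissue-sample classification; note it uses no labels and catches novel deviations, matching characteristics A and C:

```python
import statistics

# Minimal unsupervised anomaly detector: learn the distribution of the
# (mostly normal) samples, then flag points far from the mean. No labels
# are used, and novel anomalies are flagged simply because they deviate
# from "normal".
def fit_threshold(values, k=3.0):
    """Return (mean, cutoff): points beyond k standard deviations are anomalous."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return mean, k * stdev

def is_anomaly(x, mean, cutoff):
    return abs(x - mean) > cutoff

normal = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
mean, cutoff = fit_threshold(normal)
print(is_anomaly(10.05, mean, cutoff))  # False
print(is_anomaly(25.0, mean, cutoff))   # True
```

A supervised classifier, by contrast, would need the labeled examples of option E and would generalize best when future mutations resemble known ones (option D).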

Question 15

You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. You initially designed the application to use streaming inserts for individual postings, and the application performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?

A. Re-write the application to load accumulated data every 2 minutes.
B. Convert the streaming insert code to batch load for individual messages.
C. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.
D. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.

Correct Answer: A. Re-write the application to load accumulated data every 2 minutes.

BigQuery streaming inserts have a known limitation: data is not immediately available for queries due to the streaming buffer. Aggregations right after inserts will miss in-flight rows. Switching to micro-batch loads every 2 minutes ensures data is fully committed before queries run, achieving near-real-time with strong consistency. Option B (batch per individual message) defeats the purpose of batching. Option C (Cloud SQL export) adds unnecessary complexity and latency. Option D (waiting twice the latency) is unreliable and not a design solution.
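The micro-batch pattern can be sketched as a buffer that accumulates messages and flushes them as one load every couple of minutes. Here `load_fn` is a hypothetical stand-in for a BigQuery batch load job; the class and its names are illustrative, not a real client-library API:

```python
import time

# Sketch of micro-batching: buffer messages and flush them as one load
# every `interval` seconds instead of streaming each row individually.
# `load_fn` stands in for a BigQuery batch load job (hypothetical).
class MicroBatcher:
    def __init__(self, load_fn, interval=120, clock=time.monotonic):
        self.load_fn = load_fn
        self.interval = interval
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()

    def add(self, message):
        self.buffer.append(message)
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        if self.buffer:
            self.load_fn(self.buffer)  # one fully committed batch load
            self.buffer = []
        self.last_flush = self.clock()

# Demo with a fake clock so the 2-minute interval is easy to exercise.
loads = []
fake_time = [0.0]
b = MicroBatcher(loads.append, interval=120, clock=lambda: fake_time[0])
b.add("post-1")
fake_time[0] = 121.0
b.add("post-2")          # interval elapsed: both rows flushed together
print(loads)             # [['post-1', 'post-2']]
```

Injecting the clock keeps the flush logic testable; after each flush, everything in the batch is committed, so aggregation queries that run between flushes see a consistent snapshot.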
