Comprehensive study notes covering Bigtable architecture, schema design, operational best practices, and configuration. Everything you need to pass with confidence.
Designing a Bigtable schema differs from designing one for a typical RDBMS. There are no joins and no foreign keys. Each table has exactly one index: the row key (up to 4 KB). Operations are atomic (ACID) only at the row level; there are no multi-row transactions. A good schema distributes both reads and writes evenly across the table.
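To make the constraints above concrete, here is a minimal in-memory sketch. `ToyTable` and its methods are hypothetical and are not the Bigtable client API. It models a table whose only index is the sorted row key, rejects keys over 4 KB, applies each mutation to exactly one row as a whole, and has no way to query by anything other than the key short of scanning every row.

```python
from __future__ import annotations

import bisect

MAX_ROW_KEY_BYTES = 4 * 1024  # row keys are limited to 4 KB


class ToyTable:
    """Hypothetical stand-in for a Bigtable table (not the real API)."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []  # sorted row keys: the only index
        self._rows: dict[bytes, dict[str, bytes]] = {}

    def mutate_row(self, key: bytes, cells: dict[str, bytes]) -> None:
        # Everything is validated before anything is written, so a failed
        # mutation leaves the row untouched. There is deliberately no
        # multi-row mutation: atomicity stops at the row boundary.
        if len(key) > MAX_ROW_KEY_BYTES:
            raise ValueError("row key exceeds 4 KB")
        if not all(isinstance(v, bytes) for v in cells.values()):
            raise TypeError("cell values must be bytes")
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(cells)

    def read_row(self, key: bytes) -> dict[str, bytes] | None:
        # Point lookup by row key: the efficient access path.
        return self._rows.get(key)

    def scan_prefix(self, prefix: bytes) -> list[bytes]:
        # Keys sharing a prefix are contiguous in sorted order, so a
        # prefix scan is a cheap range read over the single index.
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for k in self._keys[start:]:
            if not k.startswith(prefix):
                break
            out.append(k)
        return out

    def find_by_value(self, column: str, value: bytes) -> list[bytes]:
        # No secondary indexes and no joins: filtering on a cell value
        # means visiting every row.
        return [k for k in self._keys if self._rows[k].get(column) == value]


t = ToyTable()
t.mutate_row(b"user#42#2024-01-01", {"name": b"ada"})
t.mutate_row(b"user#42#2024-01-02", {"name": b"ada"})
t.mutate_row(b"user#7#2024-01-01", {"name": b"bob"})
print(t.scan_prefix(b"user#42#"))
# [b'user#42#2024-01-01', b'user#42#2024-01-02']
```

The design choice worth noting is that `scan_prefix` costs a binary search plus the matching rows, while `find_by_value` touches the whole table. That asymmetry is why the row key has to encode the access pattern you query by.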
| Row Key Design | Characteristics / Examples |
|---|---|
| Good Row Keys | Spread load evenly across nodes. Order fields from a common value to a granular value. Include a timestamp, but not at the beginning. |
| Poor Row Keys | Sequential numeric IDs. Timestamps alone or at the start. Hashed values (they discard the natural sort order). Values expressed as raw bytes rather than human-readable strings. |
| Hot-spotting Avoidance | Avoid auto-incrementing keys and bare timestamps: consecutive writes sort next to each other, so they repeatedly hit the same node. |
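The hot-spotting rule in the table can be illustrated with a small simulation. This is a sketch under stated assumptions: the split points below are hypothetical and fixed, whereas real Bigtable chooses and rebalances tablet boundaries itself, and the `sensor#...` key format is an illustrative choice, not a prescribed one. It compares a timestamp-first key with a key that puts a common value (the sensor ID) before the timestamp, and counts how many new writes land in each key range.

```python
from __future__ import annotations

import bisect
from collections import Counter


def tablet_for(key: bytes, splits: list[bytes]) -> int:
    # Tablet i serves keys in [splits[i-1], splits[i]); tablet 0 serves
    # everything below splits[0], the last tablet everything above.
    return bisect.bisect_right(splits, key)


def ts_first_key(ts: int, device: int) -> bytes:
    # Poor: the timestamp leads, so new writes always sort at the end.
    return f"{ts:010d}#{device:03d}".encode()


def promoted_key(device: int, ts: int) -> bytes:
    # Better: common value first, timestamp included but not at the start.
    # Zero-padding keeps numeric fields sorting correctly as strings.
    return f"sensor#{device:03d}#{ts:010d}".encode()


def distribution(keys: list[bytes], splits: list[bytes]) -> Counter:
    return Counter(tablet_for(k, splits) for k in keys)


# Hypothetical split points derived from data already in each table.
ts_splits = [ts_first_key(t, 0) for t in (250, 500, 750)]
pk_splits = [promoted_key(d, 0) for d in (25, 50, 75)]

# New writes: 100 sensors reporting at 4 later timestamps.
new_ts = range(1000, 1004)
devices = range(100)

hot = distribution([ts_first_key(t, d) for t in new_ts for d in devices], ts_splits)
even = distribution([promoted_key(d, t) for t in new_ts for d in devices], pk_splits)
print(sorted(hot.items()))   # [(3, 400)]
print(sorted(even.items()))  # [(0, 100), (1, 100), (2, 100), (3, 100)]
```

With the timestamp first, every new key is larger than every existing split, so all 400 writes go to the last tablet. With the sensor ID first, the same writes spread evenly across the four key ranges.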
Q1. What happens if you use a sequential numeric ID or a timestamp as the beginning of your row key?
Q2. You initially provisioned an HDD Bigtable instance for batch analytics, but now want to serve real-time app traffic requiring lower latency. How do you switch to SSD?
Q3. At what level are operations guaranteed to be atomic (ACID) in Bigtable?
Q4. Because garbage collection is asynchronous and can take up to a week, how do you ensure that data eligible for deletion isn't returned by your queries?
Q5. What is a common way to avoid hot-spotting for row keys when writing time-series data?