Study Notes — Security

Cloud DLP
Google Cloud

Definition: Cloud DLP offers a range of de-identification transformations to help protect sensitive data while preserving its utility for analysis.

Author: Michaël Bettan
01

Overview & Details

Availability
Not specified
SLA
Not specified
Use Cases
Not specified
Billing
Not specified
02

De-identification Transformations

Redaction

This involves completely removing the sensitive data. It's the most straightforward method but can significantly reduce the data's usefulness. Think of this as permanently erasing the information.
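To make the idea concrete, here is a minimal local sketch of redaction. It is not the Cloud DLP API; the regular expression and the `redact` helper are hypothetical names chosen for illustration, and the pattern is a deliberately simple email matcher.

```python
import re

# Hypothetical, simplified email pattern used only for this illustration.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def redact(text: str) -> str:
    """Delete every match outright, leaving nothing in its place."""
    return EMAIL_PATTERN.sub("", text)

redacted = redact("Contact: jane.doe@example.com for details")
print(redacted)
```

Nothing of the email survives in the output, which is why redaction costs the most analytical value.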

Masking

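This obscures sensitive data by replacing some or all of its characters with a fixed character, such as 'X' for letters or '0' for digits. The length and layout of the value are preserved, along with some utility (for example, leaving the last four digits of a card number visible), while the original value stays hidden.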
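A minimal sketch of character masking, written as plain Python rather than a Cloud DLP call. The `mask` function and its `masking_char` and `keep_last` parameters are assumptions made for this example.

```python
def mask(value: str, masking_char: str = "#", keep_last: int = 4) -> str:
    """Replace letters and digits with masking_char, except in the last
    keep_last characters of the string. Separators such as '-' are kept
    so the overall format stays recognizable."""
    cutoff = len(value) - keep_last
    return "".join(
        masking_char if i < cutoff and ch.isalnum() else ch
        for i, ch in enumerate(value)
    )

masked = mask("4111-1111-1111-1234")
print(masked)  # ####-####-####-1234
```

Keeping the separators and the trailing digits is a design choice: it lets a support agent confirm "the card ending in 1234" without ever seeing the full number.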

Tokenization

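This replaces sensitive data with a surrogate value (a token) that does not reveal the original. When the token is reversible, an authorized party can recover the original value, either by looking it up in a separately stored mapping or, with cryptographic tokenization, by decrypting the token with the key. This makes it well suited to situations that require both privacy and the ability to reverse the de-identification process. Tokenization can be deterministic (the same input always yields the same token, so joins and counts still work) or format-preserving (the token keeps the length and character set of the original).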
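The sketch below illustrates deterministic, reversible tokenization using a separately stored mapping. It is a toy under stated assumptions, not how Cloud DLP implements tokenization: the `TokenVault` class, the `tok_` prefix, and the demo key are all hypothetical.

```python
import hashlib
import hmac

class TokenVault:
    """Toy token vault: deterministic tokens plus a lookup table for reversal."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._originals = {}  # token -> original value, kept apart from the data

    def tokenize(self, value: str) -> str:
        # A keyed hash makes the token deterministic: same input, same token.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = "tok_" + digest[:16]
        self._originals[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Reversal is a lookup, so only holders of the vault can recover the value.
        return self._originals[token]

vault = TokenVault(secret_key=b"demo-key-do-not-use")
token_a = vault.tokenize("123-45-6789")
token_b = vault.tokenize("123-45-6789")
print(token_a == token_b, vault.detokenize(token_a))
```

Note that this token is deterministic but not format-preserving: it does not look like the nine-digit value it replaces.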

Pseudonymization

Similar to tokenization, pseudonymization substitutes identifiers with pseudonyms or aliases. While it protects individual identities, it can still allow linking of records across datasets if the same pseudonym is used consistently for the same individual.
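The linking property described above can be sketched with a one-way keyed hash. The `pseudonym` helper, the sample rows, and the keys are assumptions for illustration only.

```python
import hashlib
import hmac

def pseudonym(identifier: str, key: bytes) -> str:
    """Derive a stable alias from an identifier; not reversible without a lookup."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:12]

KEY = b"demo-key-do-not-use"
visits = [{"patient": "alice@example.com", "visit": "2024-01-05"}]
billing = [{"patient": "alice@example.com", "amount": 120}]

# The same key is applied to both datasets, so the aliases line up.
visits_p = [{**row, "patient": pseudonym(row["patient"], KEY)} for row in visits]
billing_p = [{**row, "patient": pseudonym(row["patient"], KEY)} for row in billing]

linkable = visits_p[0]["patient"] == billing_p[0]["patient"]
print(linkable)  # True
```

Using a different key per dataset would break this linkage, which is one way to limit cross-dataset correlation when it is not wanted.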

Date Shifting

This method shifts dates by a random number of days. As long as the same shift is applied to every date belonging to the same individual or record, the intervals between those dates are preserved while the actual dates are obscured. This can be useful for analyzing time-based trends without revealing precise dates.
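A minimal sketch of date shifting in which the offset is derived from a key and an entity identifier, so every date for one entity moves by the same amount. The function names, the 100-day bound, and the `patient-42` identifier are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

def shift_days(entity_id: str, key: bytes, max_days: int = 100) -> int:
    """Derive a stable per-entity offset in the range [-max_days, max_days]."""
    digest = hmac.new(key, entity_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days

def shift_date(d: date, entity_id: str, key: bytes) -> date:
    """Apply the entity's offset to a single date."""
    return d + timedelta(days=shift_days(entity_id, key))

KEY = b"demo-key-do-not-use"
admitted = date(2024, 3, 1)
discharged = date(2024, 3, 8)

shifted_admitted = shift_date(admitted, "patient-42", KEY)
shifted_discharged = shift_date(discharged, "patient-42", KEY)

# The length of stay survives the shift even though both dates moved.
print((shifted_discharged - shifted_admitted).days)  # 7
```

Deriving the offset from a key, rather than drawing a fresh random number per date, is what keeps intervals intact within one entity.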

Bucketing

This groups data into ranges or buckets (e.g., age ranges instead of specific ages). It simplifies data representation and offers some level of privacy while retaining some analytical value.
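As a small illustration of the age-range example, the sketch below maps exact ages to fixed-width buckets. The `bucket_age` helper and the 10-year width are assumptions.

```python
def bucket_age(age: int, width: int = 10) -> str:
    """Map an exact age to a fixed-width range label such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

ages = [23, 37, 41, 68]
buckets = [bucket_age(a) for a in ages]
print(buckets)  # ['20-29', '30-39', '40-49', '60-69']
```

Wider buckets hide more but also blur the analysis, so the width is the main privacy versus utility knob here.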

Generalization (k-anonymization)

This technique aims to make individuals indistinguishable within a group of 'k' individuals by suppressing or generalizing quasi-identifiers. The goal is to ensure that at least 'k' records share the same combination of quasi-identifiers, making it difficult to single out any individual.
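One way to check the property described above is to count how many records share each combination of quasi-identifiers and take the smallest group. The `k_anonymity` function and the sample records are hypothetical, and the function assumes a non-empty dataset.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    quasi-identifier values. The dataset is k-anonymous for that k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

records = [
    {"age": "30-39", "zip": "941**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "941**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "100**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "100**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "100**", "diagnosis": "flu"},
]

k = k_anonymity(records, ["age", "zip"])
print(k)  # 2
```

If the result is below the target k, the usual remedies are to generalize further (wider age ranges, shorter ZIP prefixes) or to suppress the records in the smallest groups.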

Homomorphic Encryption

This allows computations on encrypted data without requiring decryption. While not strictly de-identification, it's a privacy-enhancing technique that allows data processing while keeping the data encrypted throughout the process.
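To show what "computing on encrypted data" means, here is a toy additively homomorphic scheme in the style of Paillier. It is an illustration only: the primes are tiny and insecure, the names are hypothetical, and nothing here is a Cloud DLP feature. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

# Toy parameters: far too small for real use.
P, Q = 61, 53
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)  # modular inverse of LAMBDA mod N

def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c: int) -> int:
    """Recover the plaintext using the private values LAMBDA and MU."""
    return ((pow(c, LAMBDA, N_SQ) - 1) // N * MU) % N

def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ

total = decrypt(add_encrypted(encrypt(120), encrypt(45)))
print(total)  # 165
```

The party performing `add_encrypted` never sees 120 or 45; only the key holder can decrypt the sum.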