Definition: Cloud DLP offers a range of de-identification transformations to help protect sensitive data while preserving its utility for analysis.
Redaction removes the sensitive data entirely. It is the most straightforward method, but it can significantly reduce the data's usefulness, because nothing of the original value remains.
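As a rough illustration of complete removal, the sketch below deletes email-like substrings from free text. It is plain Python rather than the Cloud DLP API; the function name `redact_emails` and the simplified pattern are assumptions made for this example.

```python
import re

# Simplified email pattern, assumed for illustration only; it is not a
# complete or production-grade email detector.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Remove every email-like substring from the text."""
    return EMAIL_PATTERN.sub("", text)


if __name__ == "__main__":
    print(redact_emails("Contact jane.doe@example.com now"))
```

Note that nothing marks where the value used to be, which is exactly why this approach costs the most utility.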
Masking obscures sensitive data by replacing its characters with a fixed character, such as an 'X' for text or a '0' for digits. This preserves the data's format and some of its utility while hiding the original value.
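A minimal sketch of character replacement, assuming letters and digits are masked while separators such as dashes and spaces are kept so the format stays recognizable. The helper `mask_characters` and its `keep_last` parameter are hypothetical names for this example, not part of any Cloud DLP client library.

```python
def mask_characters(value: str, mask_char: str = "X", keep_last: int = 0) -> str:
    """Replace letters and digits with mask_char, leaving separators intact.

    The final keep_last characters are left unmasked, which is a common way
    to retain a little utility (for example, the last four digits).
    """
    cutoff = len(value) - keep_last
    return "".join(
        mask_char if ch.isalnum() and i < cutoff else ch
        for i, ch in enumerate(value)
    )


if __name__ == "__main__":
    print(mask_characters("555-12-3456"))
    print(mask_characters("555-12-3456", keep_last=4))
```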
Tokenization replaces sensitive data with a surrogate value (a token) that stands in for the original without revealing its meaning. The original data is stored separately and can be retrieved using the token, which makes this approach well suited to situations that require both privacy and the ability to reverse the de-identification. Tokenization can be deterministic (the same input always yields the same token) or format-preserving (the token keeps the length and character set of the input).
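The separate-storage idea can be sketched as an in-memory lookup table. This is an assumption-laden toy: the `TokenVault` class, its method names, and the `tok_` prefix are invented for illustration, and a real deployment would keep the mapping in a secured store or derive tokens cryptographically instead. The sketch is deterministic but not format-preserving.

```python
import secrets


class TokenVault:
    """Toy token vault: maps values to random tokens and back (hypothetical)."""

    def __init__(self) -> None:
        self._token_by_value: dict[str, str] = {}
        self._value_by_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return the token for value, creating one on first use."""
        if value in self._token_by_value:
            return self._token_by_value[value]  # deterministic: reuse token
        token = "tok_" + secrets.token_hex(8)
        while token in self._value_by_token:  # extremely unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; raises KeyError for unknown tokens."""
        return self._value_by_token[token]


if __name__ == "__main__":
    vault = TokenVault()
    t = vault.tokenize("4111-1111-1111-1111")
    print(t, vault.detokenize(t))
```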
Similar to tokenization, pseudonymization substitutes identifiers with pseudonyms or aliases. While it protects individual identities, it can still allow linking of records across datasets if the same pseudonym is used consistently for the same individual.
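One common way to produce a consistent pseudonym is a keyed hash, sketched here with HMAC-SHA256 from the Python standard library. This is an illustrative choice, not necessarily how any particular product implements it; the function name `pseudonymize` and the truncation length are assumptions.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes, length: int = 16) -> str:
    """Derive a stable pseudonym from an identifier using a secret key.

    The same identifier and key always give the same pseudonym, so records
    remain linkable across datasets that share the key.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    key = b"example-secret-key"  # assumption: in practice, load from a key manager
    print(pseudonymize("alice@example.com", key))
```

Using a different key per dataset is one way to prevent the cross-dataset linking described above, at the cost of losing joins between them.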
Date shifting moves dates by a random number of days. When the same shift is applied to every date belonging to one individual or record, the intervals between those dates are preserved while the actual dates are obscured. This can be useful for analyzing time-based trends without revealing precise dates.
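A sketch of interval-preserving shifting, assuming one offset per individual. Here the offset is derived from a secret key and a context identifier, so it is pseudo-random yet repeatable; the names `shift_for`, `shift_dates`, `context_id`, and the 100-day bound are all assumptions for this example.

```python
import hashlib
import hmac
from datetime import date, timedelta


def shift_for(context_id: str, key: bytes, max_days: int = 100) -> int:
    """Derive a repeatable offset in [-max_days, max_days] for one context."""
    digest = hmac.new(key, context_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def shift_dates(dates: list[date], context_id: str, key: bytes,
                max_days: int = 100) -> list[date]:
    """Apply the same offset to every date in the context."""
    offset = timedelta(days=shift_for(context_id, key, max_days))
    return [d + offset for d in dates]


if __name__ == "__main__":
    visits = [date(2020, 1, 1), date(2020, 3, 15)]
    shifted = shift_dates(visits, "patient-1", b"demo-key")
    print(shifted, shifted[1] - shifted[0])
```

Because both dates move by the same offset, the gap between the two visits is unchanged after shifting.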
Bucketing (generalization) groups values into ranges or buckets, for example age ranges instead of specific ages. It trades precision for privacy: individual values are hidden, while distributions and trends across buckets remain available for analysis.
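The age-range example can be sketched with fixed-width buckets. The helper name `bucket_age` and the default width of 10 are assumptions chosen for illustration.

```python
def bucket_age(age: int, width: int = 10) -> str:
    """Map an exact age to a fixed-width range label such as '30-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


if __name__ == "__main__":
    for age in (7, 37, 40):
        print(age, "->", bucket_age(age))
```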
k-anonymity aims to make each individual indistinguishable from at least k - 1 others by suppressing or generalizing quasi-identifiers (attributes such as ZIP code or birth date that can identify someone in combination). The goal is that every combination of quasi-identifier values in the dataset is shared by at least 'k' records, making it difficult to single out any individual.
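The condition on quasi-identifier combinations can be checked directly by counting group sizes. The sketch below only tests whether a dataset already satisfies the property; it does not perform the suppression or generalization needed to reach it. The function name and the sample column names are assumptions.

```python
from collections import Counter


def is_k_anonymous(records: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


if __name__ == "__main__":
    rows = [
        {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
        {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
        {"zip": "100**", "age": "40-49", "diagnosis": "flu"},
    ]
    print(is_k_anonymous(rows, ["zip", "age"], k=2))
```

In the sample rows, the third record is alone in its group, so the dataset is not 2-anonymous until that record is suppressed or generalized further.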
Homomorphic encryption allows computations on encrypted data without requiring decryption. While not strictly a de-identification method, it is a privacy-enhancing technique: the data stays encrypted throughout processing, and only the key holder can read the result.
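To make computing on ciphertexts concrete, here is a toy version of the Paillier cryptosystem, one well-known additively homomorphic scheme chosen purely as an example. The primes are tiny so the arithmetic is easy to follow, which makes this completely insecure; all names are assumptions, and it requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

# Toy key material: tiny primes for readability only. Real deployments use
# primes of at least 1024 bits and a vetted cryptographic library.
P, Q = 61, 53
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # modular inverse of LAM mod N (valid because G = N + 1)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    if not 0 <= m < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1  # candidate in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext from a ciphertext."""
    u = pow(c, LAM, N_SQ)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiply ciphertexts, which adds the underlying plaintexts mod N."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    total = add_encrypted(encrypt(12), encrypt(30))
    print(decrypt(total))
```

The party calling `add_encrypted` never sees 12, 30, or their sum; only the holder of `LAM` and `MU` can decrypt the result.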