Best practices to protect sensitive data
Introduction
The shift from on-premises data centers to cloud infrastructure requires new secrets management techniques for the cloud's dynamic environments, applications, machines, and user credentials. Securing infrastructure, data, and access across clouds requires careful planning. You must identify and categorize your organization's data based on its sensitivity to decide how to secure it. You must also apply different practices to protect data in transit and at rest.
Protect data in-transit
Data in-transit is any data moving between systems, such as passwords, secrets, and keys. In-transit data includes data moving between resources within your organization, and incoming and outgoing data with services outside your organization. By protecting your data in-transit, you protect the confidentiality and integrity of the data within your organization.
TLS for client to server communication
Human client to machine communication is the first hop of data in-transit. TLS/SSL certificates are used to encrypt such communication - in most cases via browsers - using HTTPS instead of HTTP. TLS can also wrap FTP (FTPS, not to be confused with SFTP which uses the SSH protocol), IMAP (IMAPS), POP3 (POP3S), and SMTP (SMTPS), among others.
HTTP is dangerous because an attacker positioned between the client and the server (a man in the middle) can intercept the traffic and insert malicious code before forwarding it to the user's browser. TLS (Transport Layer Security) solves this problem by encrypting the traffic and authenticating either the server or both client and server to each other. You should use the latest TLS version (presently v1.3) to gain the highest security, because cryptographic algorithms can be broken over time. TLS 1.3 removed older options such as static RSA key exchange and CBC-mode cipher suites. It keeps only forward-secret key exchange (for example, ECDHE with X25519) and AEAD ciphers such as AES_128_GCM. Microsoft enabled TLS 1.3 by default in 2020.
A way to protect yourself is to verify that your browser supports TLS v1.3. Additionally, use the Qualys SSL Server Test to check whether the site supports HTTP Strict Transport Security (HSTS), which protects against man-in-the-middle attacks. On browsers, a lock icon appears next to the address to show the use of TLS encryption.
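On the server side, the recommendation to use the latest TLS version can be enforced in configuration. As a minimal sketch, a Vault server listener can refuse anything older than TLS 1.3. The listen address and certificate paths below are assumptions for illustration, not values from this guide.

```hcl
# Vault server configuration fragment (sketch).
# The address and file paths are hypothetical placeholders.
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/tls/server.crt"
  tls_key_file  = "/etc/vault/tls/server.key"

  # Reject clients that cannot negotiate TLS 1.3.
  tls_min_version = "tls13"
}
```

Raising the minimum version can lock out older clients, so confirm client support before applying a setting like this in production.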
Consul for universal networking
Unencrypted cross-application communications are as susceptible to attacks as client-to-application communications. An application can protect itself against malicious activities by requiring the use of mTLS (mutual TLS) on both ends of the application to application communications.
HashiCorp Consul automates the enablement of mTLS for all communication between application services (machine-to-machine). Even legacy apps can use mTLS through local Consul proxies that intercept network traffic as part of a service mesh. A service mesh architecture lets Consul enforce mTLS across clouds and platforms. Consul generates signed certificates automatically and lets you rapidly and comprehensively upgrade TLS versions and cipher suites in the future. This automation avoids the typically slow process of updating the TLS version in each application.
Consul automatically encrypts communications within the service mesh with mTLS. Outside traffic entering the service mesh, however, should also be secured. Two common entry points for traffic into the Consul service mesh are the ingress gateway and the API gateway. To secure inbound traffic to these gateways, you can enable TLS on ingress gateways and on the API gateway listeners.
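As an illustration of upgrading TLS versions centrally rather than per application, the following sketch shows a Consul mesh configuration entry that sets a minimum TLS version for sidecar proxy traffic. The choice of TLS 1.3 as the floor is an assumption. Every proxy in the mesh must support the version you pick.

```hcl
# Consul "mesh" configuration entry (sketch).
# Applies mesh-wide, so individual applications need no code changes.
Kind = "mesh"

TLS {
  # Connections arriving at sidecar proxies from other mesh services.
  Incoming {
    TLSMinVersion = "TLSv1_3"
  }

  # Connections that sidecar proxies open to other mesh services.
  Outgoing {
    TLSMinVersion = "TLSv1_3"
  }
}
```

A file like this would typically be applied with consul config write.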
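To make the gateway guidance concrete, here is a minimal sketch of an ingress gateway configuration entry with TLS enabled on its listeners. The gateway name, port, and the upstream service named web are hypothetical.

```hcl
# Consul "ingress-gateway" configuration entry (sketch).
# "ingress-service", port 8443, and the "web" service are hypothetical.
Kind = "ingress-gateway"
Name = "ingress-service"

# Serve TLS to clients outside the mesh on every listener of this gateway.
TLS {
  Enabled = true
}

Listeners = [
  {
    Port     = 8443
    Protocol = "tcp"
    Services = [
      {
        Name = "web"
      }
    ]
  }
]
```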
Vault for securing specific types of content
A common practice to protect highly-sensitive data is to encrypt the data you are sending across the public network. However, managing the encryption key introduces operational overhead. An organization may require a specific type of encryption key. Vault's Transit secrets engine supports a number of key types to encrypt and decrypt your data, and manage the lifecycle of those keys.
The Vault Transit secrets engine handles cryptographic functions on data in-transit. Vault doesn't store the data sent to the secrets engine. It can also be viewed as "cryptography as a service" or "encryption as a service". The Transit secrets engine can also sign and verify data, generate hashes and HMACs of data, and act as a source of random bytes.
For more advanced use cases (e.g. encoding credit card numbers), data transformation and tokenization are more desirable data protection methods. Vault's Transform secrets engine provides a data encryption service similar to the Transit secrets engine. The key difference is that users can specify the format of the resulting ciphertext using the Transform secrets engine's format-preserving encryption (FPE) feature.
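One possible way to set up the Transit secrets engine is with Terraform's Vault provider. The sketch below mounts the engine, creates a named key, and defines a policy that lets an application encrypt and decrypt without ever reading the key. The mount path, key name orders, and policy name are assumptions for illustration.

```hcl
# Sketch using the Terraform Vault provider. Names are hypothetical.
resource "vault_mount" "transit" {
  path = "transit"
  type = "transit"
}

# Vault generates and holds the key material. Applications never see it.
resource "vault_transit_secret_backend_key" "orders" {
  backend = vault_mount.transit.path
  name    = "orders"
  type    = "aes256-gcm96"
}

# Applications holding this policy can encrypt and decrypt with the key.
resource "vault_policy" "orders_crypto" {
  name   = "orders-crypto"
  policy = <<-EOT
    path "transit/encrypt/orders" {
      capabilities = ["update"]
    }

    path "transit/decrypt/orders" {
      capabilities = ["update"]
    }
  EOT
}
```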
In addition to FPE, the Transform secrets engine provides data tokenization capability. See the Vault Tokenization section to learn how the Transform secrets engine tokenizes data for secure in-transit data transmission.
Note
Transform secrets engine is a Vault Enterprise feature.
Protect data at-rest
Data at-rest represents any data you maintain in non-volatile storage in your environment. Encrypting data at-rest, and implementing secure access to your data, are two ways you can protect your applications from security threats.
Vault
Vault uses a security barrier for all requests made to its API endpoints. This security barrier automatically encrypts all data leaving Vault using a 256-bit Advanced Encryption Standard (AES) cipher in the Galois Counter Mode (GCM) with 96-bit nonces. So, regardless of where your Vault is configured to persist the data (Consul, Integrated Storage, etc.), the data is encrypted by the barrier. In other words, Vault's storage backend stores only ciphertext. To use Vault's cryptographic barrier, use Vault's Key/Value (KV) secrets engine as your secret storage. All data sent to the KV secrets engine gets encrypted by the barrier. To access that data, the requesting client must be authorized by Vault.
In some cases, you may not want to store your data in Vault due to its volume. (Keep in mind that Vault is not a database.) Think of a scenario where your data is persisted in an MS SQL database that uses TDE (Transparent Data Encryption). The DEK (data encryption key) encrypts the data in the database and usually resides alongside the database. Vault can protect the DEK by encrypting it with a KEK (key encryption key), which you can store in the KV secrets engine as shown in the diagram below.
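As a sketch of the KV approach, the following Terraform configuration mounts a KV version 2 secrets engine and defines a read-only policy for one application's secrets. The mount path secret, the app/ prefix, and the policy name are assumptions.

```hcl
# Sketch using the Terraform Vault provider. Names are hypothetical.
resource "vault_mount" "kv" {
  path = "secret"
  type = "kv"

  options = {
    version = "2"
  }
}

# Clients must present a token carrying this policy to read the secrets.
# KV v2 serves secret data under the "data/" segment of the mount path.
resource "vault_policy" "app_read" {
  name   = "app-read"
  policy = <<-EOT
    path "secret/data/app/*" {
      capabilities = ["read"]
    }
  EOT
}
```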
Another layer for protecting data is to control access to the data. Vault can secure access to your external data at-rest through dynamic credentials. These dynamic credentials have a lifecycle attached to them, and are invalid after a predefined period of time. We recommend using dynamic secrets when accessing your external data.
For example, you can use Vault to issue your CI/CD pipeline dynamic credentials to an external service, such as a PostgreSQL database. This allows your CI/CD pipelines to access your data at-rest, and then once the pipeline finishes, Vault invalidates the credentials. The next time your pipeline runs, Vault issues your pipeline new credentials.
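The CI/CD scenario above could be configured roughly as follows with the Terraform Vault provider. This is a sketch under stated assumptions: the database host, database name, role name ci, TTL values, and SQL statement are all hypothetical.

```hcl
# Sketch using the Terraform Vault provider. Names and values are hypothetical.
variable "db_admin_user" {
  type      = string
  sensitive = true
}

variable "db_admin_password" {
  type      = string
  sensitive = true
}

resource "vault_mount" "db" {
  path = "database"
  type = "database"
}

# Connection Vault uses to create and revoke short-lived database users.
resource "vault_database_secret_backend_connection" "postgres" {
  backend       = vault_mount.db.path
  name          = "postgres"
  allowed_roles = ["ci"]

  postgresql {
    connection_url = "postgresql://{{username}}:{{password}}@db.example.internal:5432/app"
    username       = var.db_admin_user
    password       = var.db_admin_password
  }
}

# Each read of database/creds/ci returns a new user that expires on its own.
resource "vault_database_secret_backend_role" "ci" {
  backend     = vault_mount.db.path
  name        = "ci"
  db_name     = vault_database_secret_backend_connection.postgres.name
  default_ttl = 900  # seconds
  max_ttl     = 3600 # seconds

  creation_statements = [
    "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
  ]
}
```

Note that the administrator password passed in here is recorded in Terraform state, which is one reason to protect the state file and rotate the root credential with Vault.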
To learn more about how your CI/CD pipelines can pull secrets from Vault, see our Well-Architected Retrieving CI/CD secrets from Vault guide.
Resources
- Enabling transparent data encryption for Microsoft SQL with Vault
- Why you should use ephemeral credentials
- Retrieving CI/CD secrets from Vault
Terraform
We recommend setting encryption standards in your infrastructure-as-code. Terraform can help you secure your data at-rest by deploying infrastructure from code that specifies resource and data encryption, along with access control policies.
As an example of using Terraform to create infrastructure that securely stores data, consider enabling server-side encryption by default in an AWS S3 bucket. Terraform can create a KMS key using the aws_kms_key resource. It can then create an S3 bucket, enable default server-side encryption for the S3 bucket, and then use the KMS key to encrypt objects.
The following example creates a KMS key and enforces S3 object encryption server-side:
resource "aws_kms_key" "mykey" {
  description             = "This key is used to encrypt bucket objects"
  deletion_window_in_days = 10
}

resource "aws_s3_bucket" "mybucket" {
  bucket = "mybucket"

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        kms_master_key_id = aws_kms_key.mykey.arn
        sse_algorithm     = "aws:kms"
      }
    }
  }
}
Deprecation notice
The parameter server_side_encryption_configuration is deprecated. Use the resource aws_s3_bucket_server_side_encryption_configuration instead.
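Following the deprecation notice, the same encryption setup can be expressed with the standalone resource. This sketch reuses the key and bucket names from the example in this section. The resource label example is an arbitrary choice.

```hcl
resource "aws_kms_key" "mykey" {
  description             = "This key is used to encrypt bucket objects"
  deletion_window_in_days = 10
}

resource "aws_s3_bucket" "mybucket" {
  bucket = "mybucket"
}

# Default encryption now lives in its own resource that references the bucket.
resource "aws_s3_bucket_server_side_encryption_configuration" "example" {
  bucket = aws_s3_bucket.mybucket.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.mykey.arn
      sse_algorithm     = "aws:kms"
    }
  }
}
```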
The AWS S3 module provides an example of Terraform code that creates a KMS key and encrypts objects stored in the S3 bucket with the KMS key.
This same pattern applies to RDS and database systems in other cloud environments such as GCP, Azure, and on-premises.
Tokenize Critical Data
Tokenization converts sensitive data into nonsensitive data called tokens. Tokens are helpful when sensitive data must be sent to remote systems, for example during client authentication (such as GitHub login), when handling credit card numbers or banking credentials, or in any other system that requires external authentication or data exchange.
Organizations that wish to create tokens to secure their data can use HashiCorp Vault. Vault's Transform secrets engine can tokenize data to replace highly sensitive data (e.g. credit card numbers) with unique values (tokens) that are unrelated to the original value in any algorithmic sense. Therefore, the tokens do not risk exposing the critical data, which satisfies the Payment Card Industry Data Security Standard (PCI-DSS) guidance.
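One way to limit exposure further is to separate who may create tokens from who may recover the original values. The sketch below defines two Vault policies through Terraform. It assumes the Transform secrets engine is mounted at transform and that a role named payments already exists. Both names are hypothetical.

```hcl
# Sketch using the Terraform Vault provider. Mount path and role are hypothetical.

# Front-end services can turn card numbers into tokens but cannot reverse them.
resource "vault_policy" "tokenize_only" {
  name   = "payments-tokenize"
  policy = <<-EOT
    path "transform/encode/payments" {
      capabilities = ["update"]
    }
  EOT
}

# Only the payment processor can exchange a token for the original value.
resource "vault_policy" "detokenize" {
  name   = "payments-detokenize"
  policy = <<-EOT
    path "transform/decode/payments" {
      capabilities = ["update"]
    }
  EOT
}
```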
The following diagram shows how Vault can take sensitive data, such as a customer's credit card number, encrypt the value, and allow the application to use that credit card securely.
Protect Sensitive Data Used by Terraform
Terraform stores, creates, uses, and manages data that can be considered sensitive. This data includes, but is not limited to, Terraform state, input variables, plan/apply output, and logs.
Practitioners implementing the Security pillar must secure this data, whether using HCP Terraform/Enterprise or self-hosted OSS.
Storing Terraform state
Terraform records data about the infrastructure it manages in a state file. The backend block defines where to store the state file. By default, Terraform uses the local backend, which stores the state file as plaintext in the directory where you run Terraform. While this might be acceptable in development environments, using the local backend in production environments can lead to sensitive data stored as plaintext in insecure locations, which may compromise your systems [CWE-256].
We recommend storing your state file in secure remote storage. Both HCP Terraform and Terraform Enterprise store state in a secure backend and encrypt the state with Vault in-transit encryption. For practitioners who are using AWS S3 as a backend and wish to move to HCP Terraform, we provide a tutorial to walk you through the process.
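As a minimal sketch, a configuration can point at HCP Terraform with a cloud block so that state is no longer written to the local directory. The organization and workspace names are hypothetical placeholders.

```hcl
terraform {
  cloud {
    # Hypothetical names. Replace with your own organization and workspace.
    organization = "example-org"

    workspaces {
      name = "example-workspace"
    }
  }
}
```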
Warning
Never store your Terraform state file in a remote code repository such as GitHub or GitLab.
Terraform can also store your state in backends hosted by multiple cloud providers or on-prem solutions.
The Amazon S3 backend supports encryption at rest when the encrypt option is enabled. AWS IAM policies can control access to the state file, and logging can be used to identify any access requests.
The Google Cloud backend uses a bucket to store Terraform state. Google Cloud offers a tutorial on using a GCP bucket for storing Terraform state, and how to configure the cloud resources to do so.
The azurerm backend stores the state as a Blob with the given Key within the Blob Container within the Blob Storage Account. Azure provides documentation and a walkthrough on setting up Azure Storage to store Terraform state files.
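For the S3 backend described above, a minimal backend block with encryption at rest enabled might look like the following. The bucket name, key path, and region are assumptions for illustration.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket name
    key    = "prod/terraform.tfstate"  # hypothetical object path
    region = "us-east-1"               # hypothetical region

    # Ask S3 to encrypt the state object at rest.
    encrypt = true
  }
}
```

Access to this bucket should still be restricted with IAM policies, since encryption at rest does not by itself limit who can read the state.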
Regardless of the provider you choose, we recommend enabling versioning in your state backend of choice. Versioning allows you to recover state lost due to accidental deletions or human error. We also recommend encrypting the state file via technologies that encrypt data at rest and in transit, such as S3 encryption using KMS, Azure Storage encryption, or Google Cloud Storage encryption.
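If the state bucket itself is managed with Terraform, versioning can be declared alongside it. This sketch assumes an S3 bucket with a hypothetical name.

```hcl
# Hypothetical bucket that holds Terraform state.
resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-state"
}

# Keep prior versions of the state object so an accidental overwrite
# or deletion can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}
```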
Sensitive data in Terraform state
The state file may contain the initial password for resources such as databases or users and other resource-identifying data. Therefore, it is best practice to securely store your Terraform state file remotely and treat the state with similar controls as a secret.
For example, when Terraform provisions an RDS PostgreSQL database on AWS, Terraform saves the root password in the state file. We recommend immediately using HashiCorp Vault to rotate the root credential for RDS. Rotating the root credential with Vault renders the password in the state file useless as the root password is now stored and managed with Vault.
Vault's database root credential rotation feature is available on other database secrets engines. For a full list of supported databases, see the database capabilities chart.
Tutorials
- Use HCP Terraform for state
- Store Terraform state in a cloud storage bucket
- Store Terraform state in Azure Storage
- Migrate remote S3 backend to HCP Terraform
Input Variables
It is common for Terraform practitioners to input variable values containing a password or other data that, in the wrong hands, could negatively impact business operations. Terraform allows users to mask these values when running terraform apply, terraform plan, or terraform output.
By default, the Terraform variable block does not mask the input values assigned to it. Use the sensitive argument on the variable block to show that the variable holds a sensitive value.
variable "user_information" {
  type = object({
    name    = string
    address = string
  })
  sensitive = true
}
Note
Masking these variables masks their values in the output of Terraform commands. Terraform still stores them as plaintext in your Terraform state file.
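Masking also carries over to values derived from a sensitive variable. As a sketch, an output that exposes part of the user_information variable must itself be marked sensitive. The output name user_name is a hypothetical choice.

```hcl
variable "user_information" {
  type = object({
    name    = string
    address = string
  })
  sensitive = true
}

# Terraform reports an error if a root module output derived from a
# sensitive value is not also marked sensitive.
output "user_name" {
  value     = var.user_information.name
  sensitive = true
}
```

The value still appears in plaintext in the state file, so state protection remains necessary.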