A practical guide to implementing sensitive data encryption using Postgres Pgcrypto
In the previous blog, we discussed the importance of securing PII and sensitive data points captured by enterprises as part of their business apps, analyzed different techniques of securing these data points namely, Encryption and Tokenization, and finally, compared these different techniques.
Continuing in this blog series, we will now cover the database-side encryption technique of sensitive data with Spring Boot and Hibernate Technologies. Information on this topic is limited, incomplete and fragmented across different websites, documentation, and blogs. This blog consolidates all these fragmented pieces of information to aid developers in implementing Database side Encryption with Postgres & Spring Boot effectively.
The link to the complete code base of this project can be found on GitHub at the following link:
First, we will start with a few concepts of Postgres Database Level Encryption technique before we deep dive into the code:
- Postgres provides Pgcrypto Extension which supports the encryption and decryption algorithm function mentioned above. This module is disabled by default and needs to be enabled explicitly.
- Pgcrypto Encryption mechanism — PGCrypto uses the PGP Encryption technique to encrypt data which supports both Symmetric and Asymmetric encryption techniques. In the Asymmetric Encryption technique, PGP generates a session key. Data is encrypted with a session key for every encryption instance while the session key is encrypted with the public key. Thus, it uses symmetric key encryption in some way. This is mostly to aid the speed of encryption or decryption for large text since public key encryption is slower than symmetric key encryption.
- pgp_pub_encrypt(msgToEncrypt: text, dearmor(publicKey)): Function provided by PgCrypto module of Postgres to encrypt messages. It takes the message/text to be encrypted along with a public key used to encrypt the message.
- pgp_pub_decrypt(msgToDecrypt: bytea, dearmor(privateKey), privateKeyPassword: text): Function used to decrypt the encrypted text.
- dearmor(data: text): This function converts the text format of the public key into the binary format required by pgp_pub_encrypt and pgp_pub_decrypt function.
- Hibernate @ColumnTransformer: Used to customize the default SQL generated by Hibernate to read and write value for a specific entity field to which this annotation is applied. For example, the encryption function can be applied during insert and decryption during select calls generated from Hibernate.
- SET LOCAL — Postgres SQL command to set configuration values for a specific transaction context. Values are retained for the duration of the transaction.
- current_setting() — Postgres SQL Function used to get the value of configuration parameters from a transaction context. For our use case, it reads the value set by the “Set Local” command.
The ‘SET LOCAL’ and ‘current_setting()’ functions are needed since the @ColumnTransformer annotation cannot read the value from our Spring Application Context. The values of public and private keys are passed to @ColumnTransformer annotation via the current_setting function
In our sample Example, we would encrypt PII (Personal Identifiable Information) CIBIL Score in a CustomerDetails Table.
brew install --cask gpg-suite
gpg
Check if gpg is properly installed
Generate the key pair. Respond to a couple of Questions asked as part of the Key Generation
gpg --gen-key
gpg will also ask for a passphrase <Test@1234> to protect your key. This is a password that you quote to access your private key. Enter a strong password
Keys would be generated once you enter the passphrase.
You can now list the Keys generated in your system and which are part of gpg registry.
gpg --list-secret-keys
You can now export the generated keys in a file using the KeyID of the generated key which you can see when you list the generated keys.You would need those keys to be configured in your Spring Boot project/Deployment Environment and in our case Ansible-vault. You can also use cloud services like AWS Secret Manager to maintain the keys securely
You can now export the keys in a text File
gpg -a --export BA53E92BB901B4C44B82152B98BDA42B512C636F > pgCrypto_public.txt
gpg -a --export-secret-keys BA53E92BB901B4C44B82152B98BDA42B512C636F > pgCrypto_public.txt
When you export the private Key, enter the passphrase added as part of the Key generation process.
Step 2: To check if Keys generated work fine, perform the following dry run in the database:
First, encrypt a sample message with the public key with the pgp_pub_encrypt function. Decrypt the same message with the pgp_pub_decrypt function and private key. It should spit out the original encrypted message:
select pgp_pub_encrypt('my secret message', dearmor('<public key Content>')) as msg into table encrypted_message1;
select pgp_pub_decrypt((select msg from encrypted_message1), dearmor('<Private Key content'), 'Test@1234') as "decrypted message";
Step 3: Create a Spring Boot Project with the required dependencies.
Step 4: Create a table named Customer_Details. I am using Flyway to set up the table. Setup the following DB scripts
CREATE TABLE customer_details (
id uuid NOT NULL,
cibil_score bytea NOT NULL,
name character varying(255) ,
is_active boolean default true,
CONSTRAINT customer_details_pkey PRIMARY KEY (id)
);
CREATE EXTENSION IF NOT EXISTS pgcrypto;
Note that the PII data — cibil_score has bytea format (byte format — stream data) representing binary data holding the encrypted CIBIL score for the customer.
Step 5: Setup the private key, public key, and private key password generated in Step 1 in the application.properties File. As a best practice, generate a separate set of keys for Production, UAT, and Development Environments. In the Production Environment, key values would come from Environment variables. You can use an ansible vault setup to store this key information in Encrypted format in your Deployment pipeline scripts. During deployment, these keys would be deployed in your Sandbox environment
Step 6:
Now, set up your domain Entity class as follows. Notice the use of pgcrypto Encryption and Decryption functions. Also, the Data type for the “cibilScore” field is “String” while in Database it’s “bytea” (columnDefinition property). Conversion from String to Bytea and vice versa is taken care by @ColumnTransformer annotation.
Step 7: Add a wrapper over the core Entity Repository Function. The purpose is that before we insert and get the CIBIL Score details, public and private key credentials are set in the transaction context. These context configuration parameters are then used by the pgcrypto function to encrypt and decrypt the CIBIL score details. The Session Object of EntityManager is retrieved to carry out this operation.
Note that getCustomerDetails() is also annotated with @Transactional since the Setting of keys and decryption of data should happen as part of the same transactional context.
This is how our Controller method would look like:
Find below the screenshot from Database, showing the Cibil score in Encrypted format:
There is also a Test Class that tests the Service layer class — CustomerDetailsService. It checks if what was inserted in the Database is same as received from DB — an unencrypted CIBIL score.
Following precautions need to be followed while using the Encryption approach for sensitive data:
- Private Key and private passphrase have to be maintained outside application.properties. It can be injected via Environment variables.
- If Key rotation is a compliance requirement, it would require one of the following approaches to be implemented:
- Write a migration script to migrate values encrypted with Older Keys with newer keys.
- Maintain different versions of keys. Mark each record with the version of the key it was encrypted with. Use the corresponding version to decrypt the key.
We have only addressed concerns surrounding data protection at rest. We need to further address concerns around the authorization of consumers of data. Consumers can be other systems or system users. Only authorized systems or users should view the PII data points. For example, credit approval teams can view PAN and Cibil details of the loan applicant. The Sales representatives have no business viewing these data points. We can implement Data Masking in API implementation. Basis identity of the principal wanting access to PII Data and his authorization rights, masked or unmasked data can be shared as part of JSON response.
In the next blog post, we will cover PII protection with tokenisation implementation using Spring Boot.
References:
https://www.postgresql.org/docs/current/pgcrypto.html
https://www.percona.com/resources/technical-presentations/introduction-pgcrypto-percona-tech-days