Expedite Data Masking in Salesforce -- Approach A

 Objective: 

Welcome to this chapter, where we explore approaches to expedite data masking in a Salesforce Full Copy sandbox that holds millions of records.

Given GDPR regulations, it is imperative for a Salesforce org to mask Restricted and PII data in an instance. But as the data in the instance grows, the Dev/Admin team will eventually see the data masking operation take longer (sometimes days, sometimes weeks).

This delays sandbox readiness and, in turn, feature delivery for any org whose development life cycle involves SIT/UAT/performance test sign-off in a Full Copy sandbox before features are deployed to production.

With that, let's look at the Universal Builders instance, where Eric wears the hat of Salesforce administrator and Smita is the Solution Architect. Let's look at their use case in detail.

Use case:

Universal Builders has roughly 20k Salesforce users and 150k external Partner Users. They follow an Agile methodology and, as per their development life cycle, SIT, BUILD and UAT are done in Full Copy sandboxes. Once sign-off is obtained, features get deployed to production.

Universal Builders was also compliant with GDPR regulations, as they used an ETL tool (for instance Informatica or Boomi) to mask data in sandboxes (Partial Copy and Full Copy). Their sprint deliverables had a decent velocity, but as the Contact volume grew from 20M to 40M, Universal Builders started seeing delays in sandbox readiness, which hurt their sprint velocity. This was because data masking in the sandbox was averaging 4M records/day, which means masking 40M Contacts was taking 10 days.

Eric approached Smita about this situation, because the reduction in sprint velocity had made leadership unhappy. Smita was thus tasked with finding a way to expedite the data masking process. Her analysis is as below.

Analysis:

  1. Smita checked the org and found that hardly any custom sharing rules had been created for the Contact object.
  2. Smita checked the ETL user used for data masking, as well as the existing triggers and validation rules. The best practice of skipping automations for the ETL user while performing data masking was being followed.
  3. Defer Sharing was also enabled while the masking operation was performed.
  4. The ETL tool was performing an UPSERT operation for data masking, leveraging Bulk API capabilities.
The UPSERT operation was consistently yielding 4M records per day. But with the data grown to 40M and the whole process taking 10 days, an UPSERT operation against such a huge volume was not advisable.
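To make the UPSERT versus UPDATE distinction concrete, here is a minimal sketch of how the two job definitions differ, assuming the Bulk API 2.0 ingest job format (the article does not say which Bulk API version the ETL tool used). The helper `ingest_job_request` and the external ID field `Legacy_Id__c` are hypothetical names used only for illustration.

```python
import json


def ingest_job_request(operation, sobject="Contact", external_id_field=None):
    """Build the JSON body for creating a Bulk API 2.0 ingest job (sketch)."""
    body = {
        "object": sobject,
        "operation": operation,
        "contentType": "CSV",
        "lineEnding": "LF",
    }
    if operation == "upsert":
        # An upsert job matches rows on an external ID field, so one must be named.
        if not external_id_field:
            raise ValueError("upsert needs an external ID field to match on")
        body["externalIdFieldName"] = external_id_field
    return body


# Hypothetical field name; the original job's matching field is not given in the article.
upsert_job = ingest_job_request("upsert", external_id_field="Legacy_Id__c")
# An update job carries no external ID; each CSV row is matched on its Id column.
update_job = ingest_job_request("update")

print(json.dumps(upsert_job, indent=2))
print(json.dumps(update_job, indent=2))
```

An update job only works when the record Id of every row is already known, which is the property the new approach relies on.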

Considering that, Smita came up with a new (custom) approach for data masking that proved very rewarding.

Aren’t you curious? Let’s Cut to the Chase  😊

Solution:

Considering the best practices followed in the current implementation, the new solution approach is as below:

  1. A common database (or a separate sandbox) that always hosts the masked data.
  2. Re-design the ETL tool to have 3 more jobs.
  • Job A pulls the masked data from Sandbox 1 and uploads it to the common database/sandbox.
  • Job B performs a simple UPDATE operation against the newly refreshed Full Copy sandbox (the one to be masked), using the masked data from the common database/sandbox. Job B relies on out-of-the-box Salesforce behavior for Full Copy sandboxes, whereby record IDs match between production and the sandbox.
** If the job tries to mask a record (via UPDATE) that doesn't exist in the newly refreshed sandbox, a "Record Not available" error is thrown, and such records are then removed from the common database/sandbox as part of exception handling.

(A better approach will be covered in the next article.)
  • Job C queries the data still left to be masked (the "delta") from Sandbox 2 after the UPDATE operation, applies data masking to this batch and uploads the masked data to the common database/sandbox.
For example, if Sandbox 1 is refreshed in January and Sandbox 2 is refreshed in February, then Job C masks the fresh data that was created between the two sandbox refreshes.
The above approach is repeated for future Full Copy sandboxes.
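The interplay of Job B and Job C can be sketched with plain in-memory dictionaries standing in for the common database and the refreshed sandbox. This is only an illustration under assumptions: `mask_contact`, `job_b`, `job_c`, the short record IDs and the masking rule itself are hypothetical, and a real ETL job would go through the Bulk API rather than Python dictionaries.

```python
import hashlib


def mask_contact(record):
    """Hypothetical masking rule: derive replacement values from the record Id."""
    token = hashlib.sha256(record["Id"].encode()).hexdigest()[:8]
    return {
        "Id": record["Id"],
        "LastName": f"Masked-{token}",
        "Email": f"{token}@example.invalid",
    }


def job_b(masked_store, sandbox):
    """UPDATE sandbox rows from the store; drop store rows the sandbox lacks."""
    missing = []
    for record_id, masked in masked_store.items():
        if record_id in sandbox:
            sandbox[record_id] = dict(masked)
        else:
            missing.append(record_id)  # stands in for a failed UPDATE row
    for record_id in missing:
        del masked_store[record_id]  # exception handling: forget deleted records
    return missing


def job_c(masked_store, sandbox):
    """Mask the delta: sandbox rows that have no entry in the store yet."""
    delta_ids = [rid for rid in sandbox if rid not in masked_store]
    for rid in delta_ids:
        masked = mask_contact(sandbox[rid])
        sandbox[rid] = masked
        masked_store[rid] = dict(masked)  # reused at the next refresh
    return delta_ids


# Masked data captured from Sandbox 1 (placeholder Ids and values).
masked_store = {
    "003A": {"Id": "003A", "LastName": "Masked-1", "Email": "m1@example.invalid"},
    "003B": {"Id": "003B", "LastName": "Masked-2", "Email": "m2@example.invalid"},
}
# Newly refreshed Sandbox 2: 003B was deleted in production, 003C is new.
sandbox2 = {
    "003A": {"Id": "003A", "LastName": "Real", "Email": "real@corp.example"},
    "003C": {"Id": "003C", "LastName": "Newer", "Email": "newer@corp.example"},
}

removed = job_b(masked_store, sandbox2)
delta = job_c(masked_store, sandbox2)
print(removed, delta)
```

In this sketch the delta is found by comparing IDs against the store; the article's Job C instead queries Sandbox 2 for unmasked data, and how that query identifies them is not specified.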

The flow of this concept, step by step, is as below:

  1. The ETL tool pulls the data meant for masking from Sandbox 1.
  2. The ETL tool masks the data in the sandbox based on the rules defined in the ETL tool.
  3. The ETL tool also publishes the masked data to a database or another Full Copy sandbox. This masked data is used for future masking runs.
  4. The ETL tool performs an UPDATE operation for data masking (leveraging the Bulk API) against the newly refreshed Full Copy sandbox 2, matching on the record IDs that already exist.
  5. Any "delta" of data left to be masked is processed by the data masking rules originally followed by the ETL tool.
  6. The ETL tool then performs an UPDATE operation for data masking, using the Bulk API, against the newly refreshed Full Copy sandbox 3. Any "delta" left to be masked is handled by the rules in the ETL tool, and that masked data is pushed to the common database.
  7. The cycle continues for each newly refreshed Full Copy sandbox.
With this approach, Eric observed a remarkable improvement in data masking performance: from 4M/day to 20M/day.
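As a quick back-of-the-envelope check on what those rates mean for sandbox readiness, the arithmetic can be sketched as below. It assumes, for simplicity, that all 40M Contacts are processed at the stated daily rate; `masking_days` is a hypothetical helper.

```python
import math


def masking_days(total_records, records_per_day):
    """Whole days needed to mask total_records at a steady daily rate."""
    return math.ceil(total_records / records_per_day)


contacts = 40_000_000
days_before = masking_days(contacts, 4_000_000)   # UPSERT-based masking
days_after = masking_days(contacts, 20_000_000)   # UPDATE plus delta masking
print(days_before, days_after)  # prints "10 2"
```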

Conclusion:

Eric was ecstatic with such a marked improvement and elated that he was able to meet and exceed business expectations. He thanked Smita for her suggestions.

Let's give both of them a big cheer for their accomplishment. 👋👋

And so, we conclude Approach A for expediting data masking. I hope this article helps you resolve similar issues at your end. I am eager to hear your feedback and inputs.

Article Links/References

  1. Sandbox Types (https://www.salesforceben.com/salesforce-sandbox/)
  2. Bulk API details (https://developer.salesforce.com/docs/atlas.en-us.220.0.api_asynch.meta/api_asynch/asynch_api_intro.htm)
