Expedite Data Masking in Salesforce -- Approach B

Objective:

Welcome to this chapter, where we explore another approach to expedite Data Masking in Salesforce. This is a continuation of the Data Masking approach discussed in our previous article (https://manojn2sf.blogspot.com/2022/01/expedite-data-masking-in-salesforce.html). The approach discussed in that article had:

  • A common database/sandbox that is synced with masked data as and when a sandbox is refreshed.
  • The masked data in this common database/sandbox is then re-used to apply Data Masking to a newly refreshed sandbox.
In this article, the alternate approach provides performance similar to the previous one, but re-arranges the data flow to suit the use case discussed below. As you read through, you will see how this approach serves multiple benefits.

Usecase:

Universal Builders has roughly 20k Salesforce users and 150k external Partner users. They follow an Agile methodology, and per their development life cycle, SIT, build, and UAT are done in Full Copy sandboxes. Once sign-off is obtained, features get deployed to production.

Universal Builders is also compliant with GDPR regulations: they use an ETL tool (for instance Informatica, Boomi, et al.) to mask data in sandboxes (Partial Copy and Full Copy). Their sprint deliverables had a decent velocity, but as the Contact volume grew from 20M to 40M, Universal Builders started observing latency in sandbox readiness, which impacted their sprint velocity. This was because Data Masking in a sandbox was averaging 4M records/day, meaning the masking of 40M Contacts took roughly 10 days.

Eric approached Smita regarding this situation, because the resulting reduction in sprint velocity had made leadership unhappy. Smita was thus tasked with finding a solution to expedite the Data Masking process.

Smita thus introduces us to the alternative approach for Data Masking, described below.

Solution:

A pictorial representation of the approach is below:



The overall approach has three parts.

Part A: Initial Data Masking

  1. We use an ETL tool to first mask data in a recently refreshed Full Copy sandbox, so as to comply with GDPR regulations.
  2. The ETL tool leverages the PK chunking method to first query the data to be masked, and then leverages Bulk API capabilities to perform Data Masking in an expedited fashion (see the sketch after this list). ETL tools that support PK chunking and Bulk API capabilities include Boomi, Informatica, MuleSoft, etc.
  3. With this, we get a masked Full Copy sandbox, which can be referred to as the "Common Masked Database/Sandbox".
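To make step 2 concrete, here is a minimal Python sketch of creating a PK-chunked Bulk API (1.0) query job directly over REST. The instance URL, session ID, chunk size, and field names are illustrative assumptions; in practice, your ETL tool issues these calls for you under the hood.

```python
import requests

# Hypothetical credentials -- an ETL tool (Boomi, Informatica, MuleSoft, etc.)
# normally handles authentication and job management for you.
INSTANCE = "https://yourInstance.my.salesforce.com"
SESSION_ID = "<session id from OAuth or SOAP login>"
API_VERSION = "52.0"

def create_pk_chunked_query_job(sobject: str, chunk_size: int = 100_000) -> str:
    """Create a Bulk API (1.0) query job with PK chunking enabled.

    Salesforce splits the query into batches of `chunk_size` records keyed
    on the record Id, so very large tables can be extracted in parallel.
    """
    resp = requests.post(
        f"{INSTANCE}/services/async/{API_VERSION}/job",
        headers={
            "X-SFDC-Session": SESSION_ID,
            "Content-Type": "application/json",
            # This header is what turns on PK chunking for the job.
            "Sforce-Enable-PKChunking": f"chunkSize={chunk_size}",
        },
        json={"operation": "query", "object": sobject, "contentType": "CSV"},
    )
    resp.raise_for_status()
    return resp.json()["id"]

job_id = create_pk_chunked_query_job("Contact")
# A batch containing the SOQL (e.g. "SELECT Id, Email, Phone FROM Contact")
# would then be POSTed to .../job/{job_id}/batch with Content-Type text/csv.
```

The chunk size defaults to 100,000 records per batch (maximum 250,000), which is what lets the extraction of tens of millions of Contacts proceed in parallel rather than as one serial scan.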
Part B: Data Sync
  1. Data Sync is the next part of the solution, where we continuously sync data between production and the Common sandbox via the ETL tool.
  2. The ETL tool also masks the fresh data synced from production, based on the masking rules (an illustrative rule is sketched after this list).
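The masking rules themselves live inside the ETL tool; as an illustration only, a hypothetical rule in Python might look like the sketch below. The field names and masking choices are assumptions, not prescriptions.

```python
import hashlib

def mask_contact(record: dict) -> dict:
    """Apply illustrative masking rules to a synced Contact record.

    Deterministic hashing keeps masked values stable across sync runs,
    so the same production record always masks to the same value.
    """
    digest = hashlib.sha256(record["Id"].encode()).hexdigest()[:8]
    return {
        "Id": record["Id"],                          # key used for the UPDATE
        "FirstName": f"Masked{digest}",              # scramble personal names
        "LastName": "User",
        "Email": f"user.{digest}@example.invalid",   # non-routable domain
        "Phone": "000-000-0000",                     # fixed dummy value
    }

# e.g. mask_contact({"Id": "0035g00000XyZzAAA", "FirstName": "Ada",
#                    "LastName": "Lovelace", "Email": "ada@real.com"})
```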
A few of the benefits of the Data Sync process include:

  • Having an environment with the latest production data, which can be used for performance-test use cases.
  • One can also explore establishing the Common sandbox/database as a secondary environment to Salesforce production, since data gets replicated from production via the Data Sync process. As a result, one can use the secondary environment to run the business (or redirect API transactions) in case Salesforce is having a downtime. Though this aspect excites us, it's beyond the scope of this article.

Part C: Continuous Data Masking

Now that the Data Sync process keeps an environment enriched with production data, we move on to the Continuous Data Masking capability, whereby:

  1. The production-equivalent data in the common database/sandbox acts as the source data for our Data Masking process for any Full/Partial Copy sandboxes refreshed thereafter.
  2. An ETL tool then performs the Data Masking as a simple "UPDATE" operation via Bulk API, masking the respective PII/personal-information fields in the newly refreshed sandbox (a sketch follows below).
We call this the Continuous Data Masking process because, with this approach, we can spin up as many Full Copy sandboxes as we want and run the Data Masking process for each sandbox in parallel, using the (masked) data in the Common sandbox/database as the source.
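As a minimal sketch of that "UPDATE" step, here is what a Bulk API 2.0 ingest job performing the update might look like in Python. The sandbox URL, token, API version, and CSV columns are assumptions for illustration; the ETL tool would normally orchestrate this.

```python
import requests

INSTANCE = "https://yourSandbox.my.salesforce.com"  # the newly refreshed sandbox
ACCESS_TOKEN = "<OAuth access token>"
API_VERSION = "52.0"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

def bulk_update(sobject: str, csv_payload: str) -> str:
    """Run a Bulk API 2.0 ingest job that UPDATEs masked field values."""
    base = f"{INSTANCE}/services/data/v{API_VERSION}/jobs/ingest"

    # 1. Create the ingest job with operation "update".
    job = requests.post(
        base,
        headers={**HEADERS, "Content-Type": "application/json"},
        json={"object": sobject, "operation": "update", "contentType": "CSV"},
    ).json()

    # 2. Upload the CSV of Ids + masked values sourced from the Common sandbox.
    requests.put(
        f"{base}/{job['id']}/batches",
        headers={**HEADERS, "Content-Type": "text/csv"},
        data=csv_payload,
    ).raise_for_status()

    # 3. Close the job so Salesforce starts processing the uploaded data.
    requests.patch(
        f"{base}/{job['id']}",
        headers={**HEADERS, "Content-Type": "application/json"},
        json={"state": "UploadComplete"},
    ).raise_for_status()
    return job["id"]

# Each refreshed sandbox can run its own job concurrently -- the "parallel
# mode" that makes this Continuous Data Masking.
csv_body = "Id,Email,Phone\n0035g00000XyZzAAA,user.1a2b3c4d@example.invalid,000-000-0000\n"
job_id = bulk_update("Contact", csv_body)
```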

With this approach too, Eric observed a remarkable improvement in Data Masking performance: 4M records/day improved to 20M records/day, cutting the 40M-Contact masking window from roughly 10 days to about 2.

Conclusion:

Eric was ecstatic with such a marked improvement and elated that he was able to meet and exceed the business expectations. He thanked Smita for her suggestions.

This is because:

  1. Eric exceeded business expectations by masking data with improved performance.
  2. Eric now has a secondary environment to production that can be used for the performance-testing use case.
  3. Eric has to worry less about errors like "Record Not Available" or missed "delta" records, since this approach solves those issues compared to Approach A.
Let's give both of them a big cheer for their accomplishment. 👋👋

And so, we conclude Approach B for expediting Data Masking. I hope this article helps you resolve similar issues at your end. I am eager to hear your feedback and inputs.

Article Links/References

  1. Sandbox Types (https://www.salesforceben.com/salesforce-sandbox/)
  2. Bulk API details (https://developer.salesforce.com/docs/atlas.en-us.220.0.api_asynch.meta/api_asynch/asynch_api_intro.htm)
  3. PK Chunking in Salesforce (https://developer.salesforce.com/blogs/engineering/2015/03/use-pk-chunking-extract-large-data-sets-salesforce)



