Bulk Data Processing in Salesforce CRM through Batch Jobs -- How to Expedite? Tips n Tricks, Part 2
Welcome to Part 2 of the Expediting Bulk Data Processing blog series. In this installment, we look at some smart ways to improve the performance of a Batch Job that updates or processes a large volume of data.
Objective:
As a recap, Part 1 of this series looked at the impact of Ownership Data Skew on the performance of bulk data processing and how we can mitigate it (I encourage you to visit Part 1 of this series if you haven't already). In this article, we will look at some techniques for improving the performance of an existing Batch Job.
I am sure some of our Salesforce Admins/Developers have observed Batch Job slowness while processing large volumes of data. Triggers or back-end customizations can contribute to that slowness, but what if we need those customizations to execute as part of the data updates and still expect the Batch Job to perform faster?
This is exactly the situation facing Mike, a Salesforce Developer with the System Administrator profile at Universal Containers. Let's look at his use case in detail.
Use case:
Universal Containers stores each Account's country on the Account object. There are 3M Accounts in the system, with almost 8M associated Contacts. Mike has developed a Batch Job that populates Region__c (a custom field) on each Contact record based on the associated Account's country. The country-to-region mapping is defined in a custom table for the code to refer to. The Batch Job was deployed on a launch weekend, and the business expectation is to have the Region value populated for all 8M Contacts within that weekend.
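To make the use case concrete, here is a minimal sketch of what such a batch class might look like. The class name, the Region_Mapping__c custom mapping object, and the use of the standard Account.BillingCountry field are my own illustrative assumptions, not Mike's actual implementation:

```apex
// Sketch only: object and field names are assumed for illustration.
global class ContactRegionBatch implements Database.Batchable<SObject> {

    global Database.QueryLocator start(Database.BatchableContext bc) {
        // Pull each Contact along with its parent Account's country
        return Database.getQueryLocator(
            'SELECT Id, Account.BillingCountry FROM Contact WHERE Region__c = null'
        );
    }

    global void execute(Database.BatchableContext bc, List<Contact> scope) {
        // Load the country-to-region mapping from the custom table
        Map<String, String> regionByCountry = new Map<String, String>();
        for (Region_Mapping__c m : [SELECT Country__c, Region__c FROM Region_Mapping__c]) {
            regionByCountry.put(m.Country__c, m.Region__c);
        }
        // Stamp the mapped region onto each Contact in this chunk
        for (Contact c : scope) {
            c.Region__c = regionByCountry.get(c.Account.BillingCountry);
        }
        update scope;
    }

    global void finish(Database.BatchableContext bc) {}
}
```

Running something of this shape across 8M Contacts is exactly the scenario where the tuning discussed in the rest of this article applies.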
Prior to the launch weekend, Mike had tested the Batch Job's performance in a sandbox. The job took some time to query the data but eventually processed 100k Contacts in 2 hours. That is decent enough performance on its own, but if we factor in 8M records, the math works out to 80 such runs, roughly 160 hours, or more than 6 days. Mike would clearly not be able to meet the business expectation, and his situation looked very bad.
What approach will Mike take to tackle this situation, and what action did he eventually take? Aren't you curious? Let's cut to the chase.
Approach:
Mike took a 3-phased approach to the situation: Query Analysis, Data Analysis, and Admin Settings. Let's dive into them.
1. Query Analysis: Mike looked at the query of the Batch Job and spotted a couple of things:

i. Field indexing: Though the query had a proper WHERE clause with a date range defined, the custom fields in the WHERE clause were not indexed. Custom indexing plays a major role in improving SOQL query performance, and one needs to raise a support case with Salesforce to enable it. Please refer to the Salesforce Help article for more details (https://help.salesforce.com/articleView?id=000325247&type=1&mode=1). Another interesting article on indexing is listed in the references below.
ii. Need for a Skinny Table: Mike observed that the Batch Job query included both standard and custom fields. Apart from indexing, a skinny table also lends a helping hand in improving query performance. A skinny table is very useful when you have a large volume of data to process and the query refers to both custom and standard fields: a normal query referring to both types of fields requires a JOIN across the underlying database tables, whereas the same query against a skinny table avoids that join, since the skinny table hosts all of those fields in a single table. Please refer to the Salesforce documentation for more details (https://developer.salesforce.com/docs/atlas.en-us.salesforce_large_data_volumes_bp.meta/salesforce_large_data_volumes_bp/ldv_deployments_infrastructure_skinny_tables.htm).
One positive observation: Mike had the overall query stored in a custom label, with the label referenced in the Batch Job. This is one of the recommended practices in Salesforce development, as sketched below.
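Here is a minimal sketch of that pattern. Region_Batch_Query is an assumed custom label name, and the query text in the comment is only a guess at the general shape Mike describes (a date range plus custom-field filters, mixing standard and custom fields):

```apex
// Sketch only: Region_Batch_Query is an assumed custom label holding the SOQL string,
// e.g. SELECT Id, Account.BillingCountry FROM Contact
//      WHERE Region__c = null AND CreatedDate < 2020-01-01T00:00:00Z
global Database.QueryLocator start(Database.BatchableContext bc) {
    // Reading the query from a label lets it be tuned without a code deployment
    return Database.getQueryLocator(System.Label.Region_Batch_Query);
}
```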
Mike then proceeded to the 2nd phase of the approach: Data Analysis.
2. Data Analysis: Mike checked the organization-wide default (OWD) setting of the Contact object, and it was set to "Controlled by Parent" (meaning Contact sharing is controlled by the associated Account). Mike then proceeded to understand the distribution of the 8M Contacts and found the breakdown below, which helped him plan the execution of the jobs.
| Region | Volume |
| ------ | ------ |
| APJ    | 2.5M   |
| EMEA   | 1.5M   |
| LATAM  | 1M     |
| US     | 3M     |
| TOTAL  | 8M     |
3. Admin Settings: Mike learned about two capabilities that ended up helping his situation.
i. Defer Sharing Calculation: The Defer Sharing feature of Salesforce suspends the sharing rule recalculations that Salesforce initiates as part of record updates. In an org with a huge volume of data, suspending those recalculations during a mass update can meaningfully improve performance. Please refer to the Salesforce Help article for more details (https://help.salesforce.com/articleView?id=security_sharing_rule_recalculation.htm&type=5).
ii. User Pinning: This capability is controlled by the Salesforce Product/Support team and can only be enabled by them, for requests pertaining to Large Data Volumes. Salesforce does not entertain such requests frequently unless there is a business urgency; it will only pin a minimal number of users, for a limited duration, and one has to contact Salesforce Support for it. To my understanding, when a user gets pinned by Salesforce, all operations performed by that user are treated as priority and assigned additional back-end resources. As a result, a job run by a pinned user performs better.
With the above, let's look at the actions Mike took during the launch weekend.
Action:
1. Mike raised a Salesforce Support case for the purposes below, and Salesforce did help, considering the case severity:

i. Enabling a skinny table to improve query performance.

ii. Indexing the fields present in the WHERE clause of the query.

iii. Pinning the user that would be used to run the Batch Job.
2. As Mike had the Batch Job query in a custom label, he adjusted the query to be region-specific and maintained 4 labels, one query per region.
3. Mike enabled Defer Sharing once the deployment was over.
4. He then ran the Batch Job whose query targeted the APJ region, using the pinned user. Mike followed it up by launching the Batch Job again at a 10-minute interval from the first job, this time with the query targeting EMEA.
5. Mike repeated the above step for the subsequent regions. In other words, Mike invoked the power of parallelism to expedite Large Data Volume processing; a sketch of this region-by-region kickoff follows below. Parallelism is one of Salesforce's recommended practices and can be used in situations where the Bulk API doesn't fit.
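Here is a minimal sketch of how such a staggered, region-by-region kickoff could look from Execute Anonymous. The region-specific label names are assumptions, the batch class is assumed here to accept its SOQL string through a constructor (a small variation on the earlier sketch), and System.scheduleBatch is one way to get the 10-minute stagger; Mike may equally have launched each job manually:

```apex
// Sketch only: label names and the batch constructor are illustrative assumptions.
List<String> regionQueries = new List<String>{
    System.Label.Region_Batch_Query_APJ,
    System.Label.Region_Batch_Query_EMEA,
    System.Label.Region_Batch_Query_LATAM,
    System.Label.Region_Batch_Query_US
};

Integer delayMinutes = 0;
for (String regionQuery : regionQueries) {
    if (delayMinutes == 0) {
        // Kick off the first region's job immediately
        Database.executeBatch(new ContactRegionBatch(regionQuery), 2000);
    } else {
        // Queue each subsequent region's job 10 minutes after the previous one
        System.scheduleBatch(new ContactRegionBatch(regionQuery),
                             'Region job +' + delayMinutes + ' min',
                             delayMinutes, 2000);
    }
    delayMinutes += 10;
}
```

Because each query filters on a different region, the four jobs touch disjoint sets of Contacts (and their parent Accounts), so they can run side by side without contending for locks on the same records.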
And guess what?
With the above actions, Mike was able to process all 8M Contacts within a day and successfully met the business expectation. The overall performance improved roughly 8X, and Mike was elated!
Conclusion:
With appropriate analysis and decision making, Mike pulled off a great accomplishment. Let's give a big cheer to Mike.
And so we conclude Part 2 of this blog series. I hope this article helps you resolve similar issues at your end. I am eager to hear your feedback and inputs.
Our next article will focus on improving performance while updating the ownership of Accounts.
Article Links/References:
- Salesforce field indexing: https://help.salesforce.com/articleView?id=000325247&type=1&mode=1
- Salesforce indexing performance gotchas: https://developer.salesforce.com/blogs/engineering/2013/03/force-com-formula-fields-indexes-and-performance-gotchas.html
- Skinny tables for LDV: https://developer.salesforce.com/docs/atlas.en-us.salesforce_large_data_volumes_bp.meta/salesforce_large_data_volumes_bp/ldv_deployments_infrastructure_skinny_tables.htm
- Defer sharing calculations: https://help.salesforce.com/articleView?id=security_sharing_rule_recalculation.htm&type=5