The intent of this guide is to explain the need for data purging, describe data purging techniques, and share best practices for achieving optimal response times.
The purpose of this guide is not to prescribe a single data purging guideline that works for everyone, but rather to present techniques you can use to determine the need for purging and ways to purge the data. The right purging approach differs with each customer's business requirements; however, this framework helps you work proactively on data volume and its impact on performance.
Because no two implementations are the same, Salesforce does not have a single magic number that answers recurring sizing questions.
Salesforce Commerce Cloud allows customers to extend the platform by adding custom objects or attributes. Custom objects and attributes are just one aspect of customization; another is the custom logic and integrations, which can affect the stability of the platform. Badly written logic can degrade performance irrespective of data volume.
Website efficiency is influenced by many other factors, including integration with external systems, the processing time spent in the web adapter, application server, and database tiers, and the complexity of the HTML pages the browser renders to create the customer experience.
The platform scales to meet traffic and order peaks most effectively when the web tier handles the majority of transaction requests. As you design your site, it’s important to minimize the number of transactions that pass through the web tier to the application tier and then from the application tier to the database.
A million customers of Client A != a million customers of Client B. If we know the bottleneck in the system, we can adjust the data model to achieve acceptable performance far above the current limit. The underlying database has no difficulty maintaining large sets of data; it is the extension model that can introduce problems, and it is different for every implementation.
From a scalability standpoint, it is important to purge obsolete or stale data periodically. A good starting point is to analyze the business objects and determine their churn pattern over time.
If data is frequently updated, it might not be a good candidate for purging; if objects are not updated over time and become stale, they might be good candidates. However, there is no hard-and-fast rule that applies to every customer; it all depends on the data churn rate and the data volume of a given client.
It is advisable to purge older orders. For some long-term clients this helped a lot: they kept only the orders from the last 90 days on our platform, which meant fewer than 500k orders in their system at any time, and searches, exports, and similar operations performed noticeably better. This can, however, bring other disadvantages to your online business: you might not be able to reliably set up promotions for recurring buyers, first-time buyers, and so on if you only have a history of the most recent x days for each customer.
The problem is not so much having that many records in the database; it is really the query performance. With the Search APIs for customers and orders (which use Elasticsearch to retrieve the data), it is now feasible to maintain a high number of customer and order records in the database with consistent response times. However, Salesforce Commerce Cloud is not designed as a system of record for maintaining a very high volume of data. To achieve an optimal eCommerce experience, it is advised to maintain only the required records in the database.
You will see constant references to Order and Customer objects; however, the concepts explained are applicable to other object types as well, including but not limited to Custom Objects.
Salesforce Commerce Cloud cleans up obsolete customer data based on the settings merchants and administrators apply in Business Manager.
You can configure the lifetime of the following data. In Business Manager, go to Administration > Global Preferences > Retention Settings to specify retention settings.
There is a system job called "PurgeObsoleteData" that checks the last-visited timestamp (falling back to the last login and creation dates if it is not available) to find customers that are eligible for removal according to the retention preference configured in Business Manager.
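The eligibility check described above can be modeled in plain JavaScript. This is a simplified illustrative sketch, not the platform's actual job code; the field names and the exact fallback order are assumptions based on the description above.

```javascript
// Simplified model of the "PurgeObsoleteData" eligibility check for customers.
// Falls back from last visit to last login to creation date.
// Illustrative sketch only; not the actual platform job implementation.
function isCustomerPurgeable(customer, retentionDays, now) {
  var referenceDate = customer.lastVisit || customer.lastLogin || customer.created;
  if (!referenceDate) {
    return false; // no usable timestamp: keep the record
  }
  var ageMs = now.getTime() - referenceDate.getTime();
  var ageDays = ageMs / (24 * 60 * 60 * 1000);
  return ageDays > retentionDays;
}

var now = new Date('2024-06-01T00:00:00Z');
var stale = { lastVisit: new Date('2023-01-15T00:00:00Z') };
var active = { lastVisit: new Date('2024-05-20T00:00:00Z') };
var legacy = { created: new Date('2020-03-01T00:00:00Z') }; // never visited or logged in

isCustomerPurgeable(stale, 365, now);  // true: last visit is over a year old
isCustomerPurgeable(active, 365, now); // false: visited recently
isCustomerPurgeable(legacy, 365, now); // true: falls back to creation date
```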
In general, it is best to keep only as many objects in the database as needed. So if you keep your orders in a third-party OMS and don't require them in Salesforce Commerce Cloud, there is no reason not to purge them. Other customers, however, may use Commerce Cloud as the system of record, or may want to display the order history of the last 7 years to consumers. There might also be legal obligations forcing them to keep that data.
Salesforce cleans up obsolete order data based on the settings merchants and administrators apply in Business Manager. You can configure the lifetime of an order in Business Manager at the following path: Merchant Tools > Site Preferences > Order > Order Data Settings
The system job "PurgeObsoleteData" uses the specified number of days as its configuration. Orders older than the specified number of days are automatically removed from the system. Leave the setting blank if orders should never be purged.
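The order-retention rule above can be sketched the same way: orders past the configured age are selected for removal, and a blank setting disables purging entirely. Again, this is a plain-JavaScript model for illustration, not the platform's implementation; the object shape is assumed.

```javascript
// Illustrative sketch of the order-retention rule: orders older than the
// configured number of days are removed; a blank (null) setting means never purge.
// Not the actual platform job implementation.
function selectOrdersToPurge(orders, retentionDays, now) {
  if (retentionDays === null || retentionDays === undefined) {
    return []; // blank setting: orders are never purged from the system
  }
  var cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return orders
    .filter(function (order) { return order.creationDate.getTime() < cutoff; })
    .map(function (order) { return order.orderNo; });
}

var now = new Date('2024-06-01T00:00:00Z');
var orders = [
  { orderNo: '00001', creationDate: new Date('2024-01-10T00:00:00Z') },
  { orderNo: '00002', creationDate: new Date('2024-05-25T00:00:00Z') }
];

selectOrdersToPurge(orders, 90, now);   // ['00001'] -- older than 90 days
selectOrdersToPurge(orders, null, now); // []        -- blank setting, never purge
```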
The use of Custom Objects is often inevitable, and they are one of the core features of Salesforce Commerce Cloud; however, retention details for Custom Objects should always be provided. Retention information can be specified on the "General" tab of the custom object definition.
The "PurgeObsoleteData" system job reads this value to determine whether the object may be purged on any given day.
You can extend the Salesforce Commerce Cloud platform by creating custom attributes on system objects and by defining Custom Objects.
The system stores custom attributes and localizable system attributes in database tables alongside the System Objects. You access these using a compound key of the attribute ID, the locale, and the corresponding system object ID. Accessing these attributes in the database is expensive, especially as the data set grows over time. Defining many custom attributes and creating object queries with many attribute conditions impedes performance.
The system processes Custom Objects similarly, so use Custom Objects carefully, as well. For example, if you implement custom analytics using custom objects, you’ll have to write to these objects for each request. This implementation might seem fine in your sandbox, but on production, customers can generate hundreds or thousands of these custom objects, degrading performance. A valid use of custom objects is to store small, temporary data sets or data that an administrator manages using Business Manager. A common use case for custom objects is to store temporary data to configure analytics integrations.
Be sure to access Custom Objects and System Object attributes using primary keys rather than secondary keys. You can determine the primary key for each object in Business Manager. The primary key is the object’s attribute shown with the key icon.
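To illustrate the difference, here is a sketch in Salesforce Commerce Cloud server-side script (not runnable outside the platform). The custom object type `NewsletterSubscription`, its key value, and the `custom.status` attribute are hypothetical examples; the `CustomObjectMgr` calls are from the platform's script API.

```javascript
var CustomObjectMgr = require('dw/object/CustomObjectMgr');

// Fast: direct lookup by the object's primary key (the key attribute shown
// with the key icon in Business Manager). Type and key value are hypothetical.
var subscription = CustomObjectMgr.getCustomObject('NewsletterSubscription', 'jdoe@example.com');

// Slower as data grows: querying on a non-key custom attribute forces the
// database to join and scan the attribute tables.
var iter = CustomObjectMgr.queryCustomObjects('NewsletterSubscription',
    'custom.status = {0}', 'creationDate desc', 'pending');
while (iter.hasNext()) {
    var co = iter.next();
    // process co ...
}
iter.close();
```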
The main problem is always caused by the extension model: the required join between the order or customer table and the corresponding attribute table. The attribute table is easily 30 to 50 times larger than the actual object table, and this is where the problems start. If you know and think about this upfront, you can design your extension model and queries accordingly.
There are various interaction points where a customer search may take place. It is important to understand that high data volume can drastically increase customer search response times or the number of search timeouts.
You should use the searchProfiles() and processProfiles() methods of the CustomerMgr class to retrieve and manipulate customer objects. These methods use the latest Search service; Elasticsearch nodes are the actual search backbone for these new APIs.
Please note: the queryProfile() method will be deprecated in future releases because it fetches records directly from the database. Its performance is fine as long as you query only indexed attributes; if you use a custom attribute, the database must join the attribute tables, which can degrade performance. Therefore, it is recommended to use the searchProfiles() and processProfiles() methods to get consistent performance.
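A sketch of the recommended usage in Salesforce Commerce Cloud server-side script (not runnable outside the platform); the query strings, attribute names, and callback logic are illustrative assumptions:

```javascript
var CustomerMgr = require('dw/customer/CustomerMgr');

// Search-service-backed retrieval (Elasticsearch) instead of queryProfile().
var profiles = CustomerMgr.searchProfiles('email = {0}', 'lastLoginTime desc', 'jdoe@example.com');
while (profiles.hasNext()) {
    var profile = profiles.next();
    // work with profile ...
}
profiles.close();

// For bulk work in a job, process each matching profile via a callback:
var cutoff = new Date(Date.now() - 365 * 24 * 60 * 60 * 1000); // illustrative cutoff
CustomerMgr.processProfiles(function (profile) {
    // work with profile ...
}, 'lastLoginTime < {0}', cutoff);
```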
The recommended methods are searchOrder(String, Object…), searchOrders(Map, String), and searchOrders(String, String, Object…) to search for orders, and processOrders(Function, String, Object…) to search and process orders in jobs. See the OrderMgr class for more details.
Please note that the queryOrders() methods will be deprecated in future releases. This API fetches Order records directly from the database, which can respond slowly if the query is not performed over indexed columns. Therefore, it is always recommended to use the searchOrders() methods, which use the indexed order data.
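A corresponding sketch for orders, again in Salesforce Commerce Cloud server-side script (not runnable outside the platform); the query strings and the processing logic are illustrative assumptions:

```javascript
var OrderMgr = require('dw/order/OrderMgr');
var Order = require('dw/order/Order');

// Index-backed order search instead of queryOrders().
var orders = OrderMgr.searchOrders('status = {0}', 'creationDate desc',
    Order.ORDER_STATUS_OPEN);
while (orders.hasNext()) {
    var order = orders.next();
    // work with order ...
}
orders.close();

// In a job step, search and process matching orders via a callback:
OrderMgr.processOrders(function (order) {
    // e.g. export the order to an external archive before purging ...
}, 'status = {0}', Order.ORDER_STATUS_COMPLETED);
```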
The current Salesforce Commerce Cloud platform is purpose-built for commerce and is not presently architected to be the system of record for orders. This has to do with the design of the underlying core business objects (such as orders) and of the API methods and associated functions (such as search) for accessing these core objects in real time and at high volume.
For that reason, it is a best practice to periodically archive orders outside of the Commerce Cloud system, improving performance for the storefront and for other tools (such as Business Manager) that access this data.