System Job Maintenance: Deleting System Jobs with ALM Toolkit PowerShell Cmdlets

I was working on a CRM 2011 system that contained several million system job records, and the number was growing rapidly. The database had reached over 120 gigabytes, and the ever-increasing volume of system jobs was becoming both an administrative and a performance issue. In this blog post I explain how I analyzed the problem, prevented the accumulation of new system jobs, and deleted the existing ones. All of this information is still relevant to newer versions of CRM.

The first step I took was to figure out why so many system jobs were in the CRM in the first place. There were several contributing factors:

  1. System jobs with a status reason of Succeeded came from plugin steps and workflows that weren’t configured to automatically delete the associated system job record upon completion.
  2. System jobs with a status reason of Cancelled came from plugins and workflows modifying records and triggering a cyclical chain of other plugins and workflows. The asynchronous service detected these infinite loops and cancelled the jobs with the error message “This workflow job was canceled because the workflow that started it included an infinite loop. Correct the workflow logic and try again.”
  3. System jobs with a status reason of Failed came from plugins and workflows with logic errors in them (the query sketch after this list shows how to pull the underlying error messages).
  4. The recurring bulk deletion jobs that normally delete old system jobs were failing to run because there were too many system jobs in the CRM.
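
If you want to reproduce this kind of analysis, the error text behind failed and cancelled jobs is stored on the asyncoperation records themselves. Here is a minimal Fetch XML sketch that pulls the most recent failed jobs along with their error messages (status reason 31 is Failed under the Completed state; substitute 32 for cancelled jobs). It can be plugged into the same Get-CrmEntity call used in the deletion script later in this post, or run through any other tool that executes Fetch XML:

# sketch: retrieve the 50 most recent failed system jobs along with their error messages
$failedJobsQuery = '
<fetch version="1.0" output-format="xml-platform" mapping="logical" distinct="false" count="50">
  <entity name="asyncoperation">
    <attribute name="name" />
    <attribute name="message" />
    <attribute name="startedon" />
    <order attribute="startedon" descending="true" />
    <filter type="and">
      <condition attribute="statecode" operator="eq" value="3" />
      <condition attribute="statuscode" operator="eq" value="31" />
    </filter>
  </entity>
</fetch>
'

Reviewing the name and message values returned by a query like this quickly shows whether the volume is coming from a handful of noisy plugins and workflows or is spread across many of them.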

Having identified the sources of the system jobs, I next took steps to prevent new ones from accumulating. I performed all of the following steps in a test environment first, so that changes were only promoted to production once they had been validated.

  1. For successful system jobs, I updated the associated plugin steps and workflows so that their system job records are deleted upon completion. A plugin step can be configured to do this in the plugin registration tool by enabling the ‘Delete AsyncOperation if StatusCode = Successful’ checkbox (a query for auditing which asynchronous steps still lack this setting follows this list).
    A workflow can be configured to do so by editing the workflow in the CRM and enabling the ‘Automatically delete completed workflow jobs (to save disk space)’ checkbox.
  2. For failed system jobs, I reviewed the error messages recorded on the jobs, fixed the bugs that triggered them, and published new versions of the associated plugin assemblies and process workflows.
  3. For cancelled system jobs, I had to trace the sequence in which plugins and workflows executed, identify which records were being modified, and determine the triggering conditions of the other plugins and workflows. The audit log proved very helpful for identifying which fields were being modified on records and finding the plugin or workflow causing the changes. The final solution was to fine-tune the plugin logic to update only the fields that had actually changed (rather than saving the entire record after modifying a few fields on it), and to modify the workflows so they trigger only on the minimum set of necessary fields.

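As a follow-up check after making these changes, it can be useful to list the asynchronous plugin steps that still keep their completed system jobs. The auto-delete checkbox in the plugin registration tool is stored in the asyncautodelete attribute of the sdkmessageprocessingstep entity, so a Fetch XML query along the following lines will surface any steps that were missed. This is only a sketch that relies on the standard attribute names (mode, asyncautodelete, statecode), so validate it against your CRM version; it can be run with the same Get-CrmEntity cmdlet used in the deletion script below:

# sketch: list enabled asynchronous plugin steps that do not delete their completed system jobs
$stepsQuery = '
<fetch version="1.0" output-format="xml-platform" mapping="logical" distinct="false">
  <entity name="sdkmessageprocessingstep">
    <attribute name="name" />
    <filter type="and">
      <condition attribute="mode" operator="eq" value="1" />
      <condition attribute="asyncautodelete" operator="eq" value="0" />
      <condition attribute="statecode" operator="eq" value="0" />
    </filter>
  </entity>
</fetch>
'

Any step this returns can then be updated in the plugin registration tool by enabling the ‘Delete AsyncOperation if StatusCode = Successful’ checkbox.
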
The next step was to delete the massive number of existing system jobs and get the recurring bulk deletion jobs running again. Again, I performed these steps in a test environment first. My first attempts were all in vain, but they led me to the final solution.

  1. I tried to set up more granular bulk deletion jobs, but they still failed to start.
  2. Since this was a CRM 2011 database, I tried running the SQL scripts published by Microsoft to clean up the AsyncOperationBase table, but after some time the script would fail and leave the database in a corrupted state. I tried this a few times without success.
  3. I tried deleting system jobs incrementally through the CRM web interface. While this worked, it was only possible to delete 250 records at a time, and deleting millions of records this way would have taken far too long to do manually. At this point I realized I could use the ALM Toolkit to automate the deletion process.

ALM Toolkit is a standalone commercial software product that provides PowerShell scripts and cmdlets for interacting with CRM. We most often use the ALM Toolkit for migrating data and solutions between environments, but it can also perform general-purpose operations against the CRM. I wrote the following PowerShell script, which retrieves the system jobs and deletes them using cmdlets included in the ALM Toolkit. I’ve added comments to explain each step.

# import the ALM module to access its cmdlets
& "C:\Program Files (x86)\Adxstudio\ALM Toolkit\1.0.0016\Scripts\ImportAlmModule.ps1"

# create a connection to the CRM. Update Url= using a connection string format documented at https://msdn.microsoft.com/en-us/library/gg695810.aspx
$crm = Get-CrmConnection "Url=http://crm.contoso.com/xrmContoso; Domain=CONTOSO; Username=jsmith; Password=passcode;"

# define the Fetch XML query to retrieve the system jobs to delete. Use the advanced find window to help construct this query
$jobsQuery = '
<fetch version="1.0" output-format="xml-platform" mapping="logical" distinct="false" count="250">
  <entity name="asyncoperation">
    <attribute name="asyncoperationid" />
    <order attribute="startedon" descending="true" />
    <filter type="and">
      <condition attribute="startedon" operator="on-or-before" value="2015-07-07" />
      <condition attribute="statecode" operator="eq" value="3" />
      <condition attribute="recurrencestarttime" operator="null" />
    </filter>
  </entity>
</fetch>
'

Write-Host "$(Get-Date) Starting"

# keep track of the number of deleted records for reporting
$countDeleted = 0

# retrieve the first batch of system job records
$jobs = Get-CrmEntity -OrganizationServiceConnection $crm -FetchXml $jobsQuery

# continue if any records were found
while($jobs.Count -gt 0) {

    # iterate through the records and delete each record by its ID
    foreach ($job in $jobs) {
        Remove-CrmEntity -OrganizationServiceConnection $crm -EntityLogicalName "asyncoperation" -EntityId $job.Attributes["asyncoperationid"].Guid.ToString() | Out-Null
        $countDeleted++
    }

    # show the total accumulated number of deleted records
    Write-Host "$(Get-Date) Deleted $countDeleted"

    # retrieve the next batch of system jobs
    $jobs = Get-CrmEntity -OrganizationServiceConnection $crm -FetchXml $jobsQuery
}

Write-Host "$(Get-Date) Done"

A few notes about this code:

  • The Fetch XML specifies a batch size of 250 because it provided a reasonable 10 to 15 second feedback interval per iteration; feel free to tweak this number.
  • Only the asyncoperationid attribute is returned, which keeps network traffic and memory usage to a minimum during the script’s execution and gives the best possible performance.
  • The results are ordered by the startedon date descending so that the queries execute quickly. I found this specific sort order was necessary when dealing with a very large set of records; other sort orders caused the queries to time out.
  • The statecode condition controls which state of system jobs is deleted. 3 means Completed; the other state code values are 0 (Ready), 1 (Suspended), and 2 (Locked), and you can substitute them for your purposes. To target a specific status reason within the Completed state, add a statuscode condition as well: 30 is Succeeded, 31 is Failed, and 32 is Cancelled.
  • The recurrencestarttime condition set to null ensures recurring jobs are not deleted.
  • The startedon condition ensures that only jobs started on or before a certain date are deleted. In my case I didn’t want to delete new system jobs created while the script was running, so I set it to the day before the current date. If you have a data retention policy for how far back system jobs must be kept, set this value to adhere to it; a snippet after these notes shows one way to compute the date instead of hard-coding it.
  • When running this code without a license key, ALM Toolkit runs in trial mode and displays a prompt to continue before each CRM query. You need to purchase a license to use the PowerShell code without seeing the prompt. Pricing options are shown on our corporate site’s Pricing & Licensing page.
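
If you would rather not hard-code the cutoff date, you can compute it before running the script and substitute it into the Fetch XML. This is a small sketch assuming the same “day before today” cutoff I used; adjust the number of days to match your retention policy:

# compute a cutoff date, e.g. the day before today (use a larger number of days to match a retention policy)
$cutoffDate = (Get-Date).AddDays(-1).ToString("yyyy-MM-dd")

# swap the hard-coded date in the query defined earlier for the computed one
$jobsQuery = $jobsQuery -replace "2015-07-07", $cutoffDate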

The script took several days to run, and I ran it during off hours to ensure system performance wasn’t impacted. Once it completed, shrinking the database reduced its size by about 90 gigabytes.

If you use or modify the code in this sample script, make sure you run and validate it in a test environment before running it in production.