Handling CRM Online micro-outages in a web portal

CRM Online occasionally experiences temporary outages, and if it weren’t for applications that connect to CRM repeatedly through its web services, you might never know they occurred. One of our clients recently ran into exactly this situation and asked us to look into the problem.

We’ve been deploying websites that connect to CRM Online through its web services for several years now, and every so often, out of the blue, we receive a batch of intermittent exceptions. They’re so short-lived that we can’t reproduce them, which is why I call them “micro” outages in the title of this post. Examples include:

System.Data.SqlClient.SqlException: Microsoft Dynamics CRM has experienced an error.

The underlying connection was closed: An unexpected error occurred on a receive.

An error occurred while receiving the HTTP response to https://<site>.crm.dynamics.com/XRMServices/2011/Organization.svc. This could be due to the service endpoint binding not using the HTTP protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the service shutting down). See server logs for more details.

Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.

The common theme in all of these exceptions is that the error occurs on the CRM side of the connection. On top of that, because CRM Online is a software as a service (SaaS) offering, there’s nothing you can do to improve the reliability of CRM itself. So the big question is: how do you build a reliable application on top of an unreliable connection? The answer: use transient fault handling in your portal.

What are transient faults? To borrow the description from Microsoft’s Transient Fault Handling Application Block, they’re “…errors that occur because of some temporary condition such as network connectivity issues or service unavailability. Typically, if you retry the operation that resulted in a transient error a short time later, you find that the error has disappeared”.
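The following minimal Python sketch shows the basic shape of a transient fault retry policy: decide whether an error looks transient, wait, and try again with an exponentially increasing delay. It is not the Transient Fault Handling Application Block or Adxstudio Portals code; the function names, attempt count, delays, and the message-matching detection strategy are assumptions chosen for illustration, with the message fragments borrowed from the exceptions quoted earlier.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical detection strategy: treat an exception as transient if its
# message contains a fragment seen during the micro-outages described above.
TRANSIENT_FRAGMENTS = (
    "The underlying connection was closed",
    "An error occurred while receiving the HTTP response",
    "An existing connection was forcibly closed by the remote host",
)


def is_transient(exc: Exception) -> bool:
    """Return True if the exception message matches a known transient error."""
    message = str(exc)
    return any(fragment in message for fragment in TRANSIENT_FRAGMENTS)


def retry_transient(
    operation: Callable[[], T],
    max_attempts: int = 4,
    initial_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Re-raise immediately for non-transient errors, or once the
            # final attempt has also failed.
            if not is_transient(exc) or attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2  # 0.5s, 1s, 2s, ...
    raise RuntimeError("unreachable")
```

Passing `sleep` in as a parameter keeps the sketch easy to test. A production policy would typically also cap the maximum delay and add some random jitter so that many clients don’t retry in lockstep.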

Thankfully, Adxstudio Portals already includes built-in support for handling transient faults in the form of an IOrganizationService implementation, CrmOnlineOrganizationService, which provides fault handling through the Transient Fault Handling Application Block mentioned above. With a simple web.config change, your portal will selectively retry failed operations. I say “selectively” because not all operations are safe to retry. For example, creating a record isn’t safe to repeat: the create operation may have actually succeeded on the server before the error occurred, so trying it again could create a duplicate record.

This is a larger topic than I’ll cover here, but in general, idempotent operations (such as queries, which can be executed any number of times without changing the state of the system you’re dealing with) are the ones covered by the transient fault handling included in Adxstudio Portals.
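To illustrate why retries must be selective, here is a hedged Python sketch of a wrapper that retries only the operations it treats as idempotent. The operation names mirror IOrganizationService methods, but the wrapper class, the `TransientFault` exception, and the choice of which operations count as idempotent are hypothetical; this is not how CrmOnlineOrganizationService is actually implemented. Backoff delays are omitted for brevity.

```python
from typing import Any, Callable

# Hypothetical policy: only read-only calls are considered safe to retry.
# The names mirror IOrganizationService methods, but this is an illustration,
# not the rule CrmOnlineOrganizationService uses.
IDEMPOTENT_OPERATIONS = frozenset({"Retrieve", "RetrieveMultiple"})


class TransientFault(Exception):
    """Hypothetical stand-in for a transient connection error."""


class SelectiveRetryService:
    """Wraps a service object and retries only idempotent operations."""

    def __init__(self, inner: Any, max_attempts: int = 3) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._inner = inner
        self._max_attempts = max_attempts

    def call(self, operation: str, *args: Any) -> Any:
        method: Callable[..., Any] = getattr(self._inner, operation)
        # Non-idempotent operations (e.g. Create) get exactly one attempt,
        # because a failure may hide a server-side success and a retry
        # could create a duplicate record.
        if operation in IDEMPOTENT_OPERATIONS:
            attempts = self._max_attempts
        else:
            attempts = 1
        for attempt in range(1, attempts + 1):
            try:
                return method(*args)
            except TransientFault:
                if attempt == attempts:
                    raise
        raise RuntimeError("unreachable")
```

Giving Create a single attempt means a lost response surfaces as an error rather than a silent duplicate; the caller can then check whether the record exists before deciding to try again.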

The provided transient fault handling should be enabled in every deployment that uses CRM Online. One tip: despite its name, CrmOnlineOrganizationService isn’t limited to CRM Online and can be used in any deployment where the connection may be unreliable. Any portal that connects to a CRM organization over the internet is a good candidate for this functionality.

For full details on the transient fault handling functionality included in Adxstudio Portals and instructions for incorporating it into your portal, see the knowledge base article Transient Fault Handling Organization Service.