Tuesday, October 1, 2013

Unable to display this Web Part, only affecting some users, load balancing partially to blame?

This was a particularly strange one.  I can't imagine anyone having the exact same issue as we had this morning, but you never know.

Some users this morning reported the following error:

Unable to display this Web Part. To troubleshoot the problem, open this Web page in a Microsoft SharePoint Foundation-compatible HTML editor such as Microsoft SharePoint Designer. If the problem persists, contact your Web server administrator.

Looking into the SharePoint error logs, I found this:

Error while executing web part: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.Xslt.MethodCollection.ResolveMemberRef(Int32 tokenNum)
   at Microsoft.Xslt.MethodCollection.ResolveToken(Int32 token)
   at Microsoft.Xslt.MethodCollection.MethodDescription.DeclareDynamicMethod(MethodCollection methodColl)
   at Microsoft.Xslt.MethodCollection.CreateDynamicMethods()
   at Microsoft.Xslt.MethodCollection.GetMethodInfoInternal(Int32 methodNumber)
   at Microsoft.Xslt.MethodCollection.GetMethodInfo(Int32 methodNumber)
   at Microsoft.Xslt.STransform.GetCompiledTransform()
   at Microsoft.SharePoint.WebPartPages.BaseXsltListWebPart.LoadXslCompiledTransform(WSSXmlUrlResolver someXmlResolver)
   at Microsoft.SharePoint.WebPartPages.DataFormWebPart.GetXslCompiledTransform()
   at Microsoft.SharePoint.WebPartPages.DataFormWebPart.PrepareAndPerformTransform(Boolean bDeferExecuteTransform)

Now, some Googling found many references to a Windows Security Update that caused a problem back in July.  However, there was no record of this update being installed on any of the SharePoint servers, and certainly nothing had changed overnight (this problem wasn't occurring yesterday).

The strange thing is, this problem wasn't affecting me, no matter which account I used to try to reproduce it.  But it was affecting my colleague.  In fact, the problem followed his laptop rather than his account - when I logged into his laptop with my account, I had the same problem.  His laptop runs Windows 7, and my machine runs Windows Server 2012 (it is not one of the SharePoint servers).

I initially thought it was just him, because he was able to fix each individual list view by opening it in SharePoint Designer and saving the view (without making any changes), after which the list view worked again.  That lasted until he restarted his laptop, and then the problem came back.  Eventually, I had a few emails from other people receiving the same error message.

Very strange indeed, and it still wasn't affecting me.

Some more Googling, and I finally found someone else on TechCenter - Recycling app pool causes webparts to fail - who had the same errors as shown above, but whose problem had occurred before the Security Update was released, so it obviously wasn't caused by that.  They figured out that the application pool used by SharePoint hadn't recycled correctly, so recycling it manually fixed the problem (for a while, although it came back some days later).

As we have two web front ends, using Windows Network Load Balancing, I checked the theory - I accessed one of the offending list views from one web front end, and the error happened.  On the other web front end, the error didn't happen.  Clearly one of my web front ends was broken.
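If you want to repeat this check without clicking through a browser, something like the following rough Python sketch could do it.  It requests the same list view from each web front end directly (bypassing the NLB address) and looks for the error text in the returned HTML.  The server names and page path in the comment are made up, and the sketch assumes the page can be fetched without Windows authentication and that each server answers on its own host name, which may well not be true on a real farm (you might need NTLM credentials or an extra alternate access mapping).

```python
import urllib.error
import urllib.request

# Text SharePoint renders in place of the failed web part (taken from the error above).
ERROR_MARKER = "Unable to display this Web Part"


def page_shows_web_part_error(html):
    """Return True if the rendered page contains the web part error text."""
    return ERROR_MARKER.lower() in html.lower()


def build_url(base_url, page_path):
    """Join a front-end base URL and a page path with exactly one slash."""
    return base_url.rstrip("/") + "/" + page_path.lstrip("/")


def check_front_end(base_url, page_path, timeout=15):
    """Fetch the page from one front end and report 'ok', 'BROKEN' or the failure."""
    url = build_url(base_url, page_path)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as exc:
        # Covers HTTP errors too (e.g. 401 when authentication is required).
        return "request failed: {}".format(exc)
    return "BROKEN" if page_shows_web_part_error(body) else "ok"


# Hypothetical usage; WFE1/WFE2 and the list path are placeholders:
#   for server in ("http://wfe1", "http://wfe2"):
#       print(server, check_front_end(server, "/Lists/Tasks/AllItems.aspx"))
```

The point is only to talk to each server by its own name instead of the load-balanced name, so you can see which one is serving the broken page.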

I went into Network Load Balancing, took the broken web front end offline, and recycled the SharePoint application pool.  Problem solved.  Restored NLB, and everything is back to normal again.

But what's puzzling me is why this only affected the Windows 7 users (when they went through the NLB) and not my Windows Server 2012 machine.  It seemed that the Windows 7 machines were constantly being sent to WFE1, while the Windows Server 2012 machine was being sent to WFE2.  Surely that defeats the point of load balancing?
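One possible explanation, which is an assumption and not something confirmed on this farm: if the NLB port rule uses Single affinity, the host that handles a client is derived from the client's source IP address, so the same machine lands on the same web front end every time.  In that case the operating system would be a coincidence and the client IP would be what matters.  The Python sketch below only illustrates the idea of source-IP affinity; it uses SHA-256 as a stand-in, which is not the hash NLB really uses, and the host names and IP address are placeholders.

```python
import hashlib
import ipaddress
from typing import Sequence


def pick_host(client_ip: str, hosts: Sequence[str]) -> str:
    """Map a client IP to one host, the same way on every call.

    Illustrative only: real NLB has its own hashing scheme. SHA-256 is used
    here just to get a stable, evenly spread number from the address.
    """
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:4], "big") % len(hosts)
    return hosts[index]


# Hypothetical front ends and client address.
FRONT_ENDS = ["WFE1", "WFE2"]
first = pick_host("10.0.0.5", FRONT_ENDS)
second = pick_host("10.0.0.5", FRONT_ENDS)
print(first == second)  # the same client IP always maps to the same host
```

With this kind of affinity the load is balanced across the whole population of clients, not across the requests of a single client, which would fit a given laptop hitting the broken server every single time.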
