We have recently developed a RESTful .NET 4 service which was deployed to a server running Windows Server AppFabric. We wanted to take advantage of all the Windows Server AppFabric goodness and its WCF monitoring capabilities.
Deployment could not have been easier: after creating the zip file and release instructions, our admin deployed the service using the MSDeploy add-on to IIS to import the zip file. The only thing to do next was test it. When we called our RESTful service, it worked as expected. But the “WCF Call History” counts on the AppFabric Dashboard were not increasing. All the counts were zero.
After some time getting our heads around it, we fixed the issue. The resolution turned out to be pretty simple, but before I share it, here are some of the steps you might want to take when troubleshooting Windows Server AppFabric.
Start by checking that application monitoring is enabled and set to write the Event Tracing for Windows (ETW) events to the Windows Server AppFabric monitoring database. To do this, right-click your service, select “Manage WCF and WF Services” and then select “Configure…”:
Then select the “Monitoring” tab and check that “Write events to database” is checked, that the connection string is configured, and that the “Level” is something other than “Off” or “Errors Only”:
After that, check that the connection string selected in the previous step is correct. To do this, select your service and double-click “Connection Strings” on the “Features View” pane in IIS, then double-click the connection string:
Note how integrated security is being used. One important thing to notice here is that the identity of the application pool used by the WCF service has nothing to do with Windows Server AppFabric monitoring. Monitoring connects to the database as the user configured on the AppFabric Event Collection Service when the connection string is set to integrated security, or as the SQL user defined in the connection string when SQL authentication is used. In our case, as you can see above, we were using integrated security, so the user set up on the service should be the one connecting to SQL Server:
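The rule described above can be written down as a small check. This is only a sketch: `monitoring_identity` is a hypothetical helper, the parser is deliberately naive (it does not handle quoted values containing semicolons), and the server, database, domain and SQL user names in the usage note are made up for illustration.

```python
def monitoring_identity(connection_string, collector_account):
    """Return the account that will be presented to SQL Server.

    connection_string: the monitoring connection string as shown in IIS.
    collector_account: the account the Event Collection Service runs as.
    """
    # Split "key=value;key=value" pairs; keys are case-insensitive.
    settings = {}
    for part in connection_string.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            settings[key.strip().lower()] = value.strip()

    # "Integrated Security" and "Trusted_Connection" are synonyms.
    integrated = settings.get(
        "integrated security", settings.get("trusted_connection", "")
    ).lower()

    if integrated in ("true", "yes", "sspi"):
        # Integrated security: the Windows service account connects,
        # not the application pool identity.
        return collector_account

    # SQL authentication: the user named in the connection string connects.
    return settings.get("user id", settings.get("uid"))
```

For example, calling it with `"Data Source=SQL01;Initial Catalog=Monitoring;Integrated Security=True"` and `r"DOMAIN\AS_Administrator"` returns the service account, while a connection string carrying `User ID=appfabric_writer` returns that SQL user instead.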
It all seemed normal: the AS_Administrator account had write access to the monitoring database. So it was time to kick it up a notch. As Ron Jacobs has mentioned on endpoint.tv, .NET 4.0 takes advantage of Event Tracing for Windows (ETW) to create traces of WCF services and workflows. These can be seen in the “Application Server-Applications” part of the Windows event log. There is another part of the event log that gets created when Windows Server AppFabric is installed: “Application Server-System Services”. These are the logs used by the two AppFabric Windows services. So we enabled the Admin log in this folder to see why the Event Collection Service did not seem to be writing the events to the monitoring database:
After a few seconds we started seeing errors show up in the log, thrown by the two AppFabric services (notice the title bar, “System Service Event Collector” and “System Service Workflow Management Service”):
All the errors pointed at the same exception: “Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’”. What this means is that the Windows Server AppFabric services were not passing their service account through to SQL Server but were connecting as the anonymous logon instead, and, of course, this account does not have access to the AppFabric databases. This error is also visible in the SQL Server logs, which you can also use for this kind of troubleshooting. The error messages in the SQL Server log were (I masked the IP address):
“Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors. [CLIENT: XXX.XX.XXX.XX]”
“Error: 18456, Severity: 14, State: 11.”
Usually this error means that the SQL Server Service Principal Name (SPN) was not configured and that NTLM was not being used as the authentication mechanism.
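If you have to go through a long SQL Server log, it can help to pull the interesting fields out of these two lines automatically. The following is a minimal sketch that assumes the log lines look exactly like the two quoted above (with straight quotes around the user name); `summarize_login_failure` and `is_anonymous_logon` are hypothetical helper names, not part of any SQL Server tooling.

```python
import re

# Patterns modeled on the two log lines quoted in this post.
LOGIN_FAILED = re.compile(r"Login failed for user '(?P<user>[^']+)'")
ERROR_LINE = re.compile(
    r"Error:\s*(?P<error>\d+),\s*Severity:\s*(?P<severity>\d+),\s*State:\s*(?P<state>\d+)"
)


def summarize_login_failure(lines):
    """Collect the user, error number, severity and state from log lines."""
    summary = {}
    for line in lines:
        match = LOGIN_FAILED.search(line)
        if match:
            summary["user"] = match.group("user")
        match = ERROR_LINE.search(line)
        if match:
            # Store the numeric fields as integers.
            summary.update({k: int(v) for k, v in match.groupdict().items()})
    return summary


def is_anonymous_logon(summary):
    """True when the failed login came in as the anonymous logon."""
    return summary.get("user", "").upper().endswith("ANONYMOUS LOGON")


sample = [
    "Login failed for user 'NT AUTHORITY\\ANONYMOUS LOGON'. Reason: Token-based "
    "server access validation failed with an infrastructure error. "
    "Check for previous errors. [CLIENT: XXX.XX.XXX.XX]",
    "Error: 18456, Severity: 14, State: 11.",
]
summary = summarize_login_failure(sample)
print(summary, is_anonymous_logon(summary))
```

A failed login that arrives as the anonymous logon tells you the service account never made it to SQL Server, which is a different problem from the service account simply lacking permissions.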
Now that we knew that this was the issue, the first thing we tried was what fixed it: we simply restarted the two Windows Server AppFabric services. That simple. Maybe some changes were made to SQL Server after AppFabric had been configured and the Windows services got out of sync with SQL Server. While they could find the server, they could not authenticate against it properly. If the restart had not fixed it, we would have gone down the path of ensuring the SQL Server service’s SPN was registered correctly.
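We restarted the services by hand, but the same thing can be scripted. The sketch below is an assumption-heavy illustration: the two service names are what I believe the AppFabric services are registered as, so confirm them in the Services console first, and the script has to run from an elevated prompt on the AppFabric server. The `runner` parameter is only there so the function can be exercised without touching real services.

```python
import subprocess

# Assumed Windows service names for the Event Collection and Workflow
# Management services; verify them in services.msc before relying on this.
APPFABRIC_SERVICES = [
    "AppFabricEventCollectionService",
    "AppFabricWorkflowManagementService",
]


def restart_services(services=APPFABRIC_SERVICES, runner=subprocess.run):
    """Stop and then start each Windows service using the `net` command."""
    for name in services:
        # Stopping may fail if the service is already stopped; ignore that.
        runner(["net", "stop", name], check=False)
        # Starting must succeed, so let a failure raise CalledProcessError.
        runner(["net", "start", name], check=True)
```

Calling `restart_services()` with no arguments restarts both services in turn; if either one fails to start, the exception tells you which command failed.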
Once the services were restarted, we could see the number of completed WCF calls start to rise and everything started working properly: