Wednesday 18 July 2012

Investigating issues with locks

Tadammm

Last post from my "Windbg bible" series :) Let's finish it and then I'll go for something new and more complicated.
What does "lock" mean? In simple words: I call a "lock" a situation where several threads request one and the same resource (an object, a file) while this resource is already locked by some other thread. While that thread holds the lock and works on the resource, the other threads cannot touch it and just spend their time waiting.
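To make the definition concrete, here is a minimal sketch of such contention. It is written in Python purely for illustration (the process in this post is .NET), and every name in it is made up for the example: one "owner" thread takes a lock and keeps it, and each waiter records that the lock was already held before it starts waiting.

```python
import threading


def run_contention_demo(waiter_count=3):
    """Return, per waiter, whether it saw the lock already held by the owner."""
    lock = threading.Lock()                 # the contended resource
    owner_has_lock = threading.Event()      # set once the owner holds the lock
    everyone_ready = threading.Barrier(waiter_count + 1)
    saw_lock_held = []
    results_guard = threading.Lock()

    def owner():
        with lock:
            owner_has_lock.set()
            # Keep holding the lock until every waiter has looked at it.
            everyone_ready.wait()

    def waiter():
        owner_has_lock.wait()
        held = lock.locked()                # True while the owner holds it
        with results_guard:
            saw_lock_held.append(held)
        everyone_ready.wait()
        with lock:                          # waits here if the lock is still held
            pass

    threads = [threading.Thread(target=owner)]
    threads += [threading.Thread(target=waiter) for _ in range(waiter_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return saw_lock_held
```

Calling run_contention_demo(3) returns one flag per waiter. In a real dump, the waiters are the threads you see parked on the lock while a single owner does the slow work.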

What are the symptoms of a lock issue?


Symptoms:
  • A lot of threads are created, so when you run !threads on the dump, it can list hundreds of threads.
  • If you run !threadpool, you'll notice that CPU usage is low while the number of running requests is pretty high.
  • In another lock issue I had, the situation was the opposite: the processor was busy with something (possibly related to the lock), but no threads were running. Most likely, they were all waiting for something.
Steps to handle a lock-related issue:

1. !threadpool. As we did above. I'll use the second case as the example:

2. !syncblk. The main command for finding out what is going on:




Using this command, we can easily find out which thread is holding the main lock while the other threads are waiting for it. In this case, our owner is thread #59.
The object used for locking is of class AdvancedPublishing.Hooks.PublishBegin.LocalWebDeploy.
There are 197 monitors watching the state of the locked object. One of them is the lock owner itself, and there are two for each waiting thread, so 197 = 1 + 2 × 98 means 98 threads are waiting. Why two, I haven't found out yet :) I suppose that one of them checks the lock state and the second one tries to obtain the lock, but I may be mistaken.

But how do we find out which other threads are taking part in this contention on the lock? For that, we need to do some additional configuration of our debugger's workspace.

First, use the .chain and .unload commands for sos.dll, exactly as specified in the Pocket-Size vocabulary.
Then download the psscor2 and psscor4 dlls and put the appropriate versions into the folder where Windbg is installed: the x86 versions go to the x86 folder (C:\Program Files (x86)), the x64 versions go to the x64 folder (C:\Program Files\Debugging Tools for Windows (x64)).

Then, instead of the SOS dll, load psscor2 (or psscor4, depending on the .NET version of the process you are debugging):

3. .load psscor2. Now we get somewhat better output from the !syncblk command:


Look, now we have a number not only for the holding thread, but also for every thread that is waiting on it. Let's take a closer look at them.

4. Go to thread #59 and run !clrstack for it:

I have marked the important lines in light green. So, our thread is stuck creating some file, and this was triggered by Sitecore's WebDeploy task. Not much information.

5. Now let's switch to thread 20 - one of the threads that are waiting for the lock to be released - and take a look at its !clrstack output:



Now let's go to Sitecore.Publishing.WebDeploy.PublishHandler.OnPublish in Reflector:

Remember that the real code looks a bit different from what Reflector shows. But we can definitely see our lock here. When code calls lock(this), any other thread that tries to take a lock on the same object has to wait until the lock is released. Usually lock(this) is not good practice, because any outside code that holds a reference to the object can lock on it too, but in this case it could be acceptable, in my opinion. By using such a lock, WebDeploy prevents other WebDeploy threads from using PublishHandler, which does only one thing: it uses Microsoft WebDeploy to transfer files between two locations.
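Why is locking on a publicly reachable object considered risky? Here is a rough sketch of the idea, again in Python just for illustration. The class below only borrows the PublishHandler name; its body, the public lock attribute and the timeout parameter are assumptions made up for the example, not Sitecore code. The public lock plays the role of lock(this): anyone holding a reference to the handler can take it.

```python
import threading


class PublishHandler:
    """Hypothetical handler whose lock is reachable by any caller."""

    def __init__(self):
        self.lock = threading.Lock()    # public, like locking on the object itself

    def on_publish(self, timeout):
        # Give up after `timeout` seconds instead of waiting forever.
        if not self.lock.acquire(timeout=timeout):
            return False
        try:
            return True                 # the real work (file transfer) would go here
        finally:
            self.lock.release()


def publish_from_worker(handler, timeout=0.05):
    """Run on_publish on a separate thread and return its result."""
    result = []
    worker = threading.Thread(
        target=lambda: result.append(handler.on_publish(timeout))
    )
    worker.start()
    worker.join()
    return result[0]


handler = PublishHandler()
with handler.lock:                      # unrelated code grabs the same lock
    blocked_result = publish_from_worker(handler)
free_result = publish_from_worker(handler)
```

While the unrelated code holds the lock, the publish attempt cannot get in; once the lock is released, it goes through. A lock object that only the class itself can see avoids this kind of surprise.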

But what about the custom object we've seen in !syncblk, AdvancedPublishing.Hooks.PublishBegin.LocalWebDeploy? How does it relate to our Sitecore code? Let's get the custom dlls to find out why this exact object is locked even though none of the clrstacks have any reference to it. Yes, it is possible!


6. !sam c:\temp. This command exports all dlls from the current dump to the location you've specified as a parameter. Cool, huh? Note that this is a command from the psscor dlls, so if you still have sos loaded, it won't work.




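Now you can open all the exported dlls in Reflector and find our AdvancedPublishing.Hooks.PublishBegin.LocalWebDeploy class. It turned out that this class is just derived from Sitecore's WebDeploy with slight changes, so in all the "important" places Sitecore's WebDeploy code is called. That presumably explains the !syncblk output as well: lock(this) in inherited code locks the actual instance, and its runtime type is the derived class.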


I just wanted to show you that it is possible to get custom dlls from a memory dump :)


Conclusion.


After the investigation above, I just pointed the customer to the likely source of the issue. It turned out that their solution needed to transfer a large number of files every time a publish happened. We found out how this could be reworked so that WebDeploy doesn't need to go through a wide file structure and perform a replace action on each publish.

Sure, in each specific case the solution will be different, but in general the steps should be similar.

Take your cupcake, it's great if you have read this article to the end!



Credits for the images to:
me
