Today we are talking about heavy CPU load. How does it all begin? Usually a customer complains that their solution has suddenly started to eat 100% CPU. Actually, it does not have to be exactly 100% - any value noticeably higher than the usual CPU consumption of the process counts. Requests are served slowly or not served at all.
How to gather dumps in these situations is described here. But often it is even easier to gather the dumps simply by right-clicking the w3wp process in Task Manager and creating a dump from the context menu.
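If Task Manager is not convenient (or you prefer the command line), Sysinternals ProcDump can capture the pair of full dumps used in the steps below in one go; the 90-second interval here is just an example:

procdump -ma -s 90 -n 2 w3wp

-ma writes full memory dumps, -n 2 takes two of them, and -s 90 spaces them 90 seconds apart. If several w3wp processes are running, pass the PID of the right application pool instead of the process name.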
What should be your steps in investigating such issues:
1. Take 2 dumps within 1-3 minutes of each other. Why do you need 2 dumps instead of one, as we usually take for high-memory-usage issues? The answer is logical: a memory dump shows only a snapshot of memory at one discrete moment in time. The thread that is occupying the processor right now may finish its work and be replaced by another one at the next moment. So, to understand which thread is the most time-consuming, we need to observe the process over some period of time.
2. Once you have the two dumps, open them in two different WinDBG windows. Then determine which of them is earlier and which is later (and also how much time passed between them) by using the .time command:
As can be seen from the screenshots, the blue one is the first dump and the pink one is the second. The time between the dumps is about 1.5 minutes.
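If the screenshots are hard to read, the relevant part of the .time output looks roughly like this (all values below are illustrative, not taken from the actual dumps):

.time
Debug session time: Mon Nov 14 12:03:17.000 2011 (UTC + 1:00)
System Uptime: 4 days 2:11:52.421
Process Uptime: 0 days 3:41:05.000
  Kernel time: 0 days 0:04:12.000
  User time: 0 days 0:38:54.000

Debug session time tells you when each dump was taken (and therefore which one is earlier); the difference between the two values is the interval between the dumps. Comparing the Kernel time and User time lines between the dumps is also a quick sanity check that the process really did burn CPU during that interval.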
3. Run the following command in both of them:
!threadpool
This command lets us make sure that the dumps really do show high CPU usage, and also gives a first look at the current situation with the threads:
OK, so CPU utilization is high in both dumps - 100% and 89%. The next things we can pay attention to are the total number of threads and how many of them are Running and Idle.
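For reference, the !threadpool output in the first dump looked along these lines; apart from the CPU utilization figure, the numbers here are illustrative:

CPU utilization: 100%
Worker Thread: Total: 53 Running: 49 Idle: 4 MaxLimit: 400 MinLimit: 4
Work Request in Queue: 7
--------------------------------------
Number of Timers: 11
--------------------------------------
Completion Port Thread:Total: 2 Free: 2 MaxFree: 8 CurrentLimit: 2 MaxLimit: 400 MinLimit: 4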
4. !runaway. The main command today. It shows all threads sorted by the time each thread has spent on the CPU:
Now it is time to determine the most time-consuming threads by calculating the difference between the first output and the second one. In this particular dump pair it is not that obvious - the time is divided almost equally between the first ~10 threads. For example, threads 23, 34, 33, 32, 36, 40 and 41 have each taken ~11 seconds.
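To illustrate the kind of comparison being made (the numbers are made up, but they follow the ~11-second pattern described above), the top of the two !runaway outputs could look like this:

First dump:
 User Mode Time
  Thread       Time
  23:1a2c      0 days 0:02:14.078
  34:0f80      0 days 0:02:02.406
  33:1bb4      0 days 0:01:58.812

Second dump (~1.5 minutes later):
 User Mode Time
  Thread       Time
  23:1a2c      0 days 0:02:25.171
  34:0f80      0 days 0:02:13.500
  33:1bb4      0 days 0:02:09.937

Thread 23 has gained ~11 seconds of CPU time between the dumps, and so on down the list.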
5. Now we take the most time-consuming threads one by one and run !clrstack and !dso for them to find out what is going on in each particular thread. You can run these commands on either of the dumps, as the second one will most likely hold the same values.
Important note:
It often happens that the threads with the largest !runaway times are actually sitting in waiting mode. !runaway shows how much time each thread has spent on the processor since the application started, so it doesn't distinguish between sleeping threads and actual workers. This is where having 2-3 dumps helps: always compare the difference in time taken between the dumps, not the absolute value. For example, a thread showing 1:30:00 in both dumps did no work at all during the interval, while one going from 0:02:14 to 0:02:25 burned ~11 seconds of CPU in that window.
One more important note:
In Sitecore memory dumps you will often notice stacks like:
Sitecore.Threading.Semaphore.P()
Sitecore.Threading.Semaphore.WaitOne()
Sitecore.Threading.ManagedThreadPool.ProcessQueuedItems()
This is how Sitecore manages its own thread pool, and you shouldn't pay attention to these call stacks - such threads spend most of their life in waiting mode.
Let's get back to our stacks.
~23s (switch to 23rd thread)
!clrstack:
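The screenshot is not reproduced here, but the stack was essentially of this shape (frame names are approximate, addresses omitted):

OS Thread Id: 0x1a2c (23)
System.Threading.Thread.Sleep(Int32)
Sitecore.IO.FileWatcher.Worker()
System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
System.Threading.ExecutionContext.Run(...)
System.Threading.ThreadHelper.ThreadStart()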
Ok, something related to Sitecore's watchers.
!dso:
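Again, the screenshot is missing here; the interesting part of the !dso output was a column of repeated exception objects on the stack, roughly like this (addresses are illustrative):

OS Thread Id: 0x1a2c (23)
ESP/REG  Object   Name
0dcce7a0 020d38a4 System.Threading.ThreadAbortException
0dcce7d4 020d4150 System.Threading.ThreadAbortException
0dcce810 020d4aa8 System.Threading.ThreadAbortException
0dcce84c 020d52f0 System.String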
Something interesting: a lot of ThreadAbortException.
Now, about the causes of the problem. There was an issue in early Sitecore 6.x related to performance troubles caused by the watchers: http://sdn.sitecore.net/products/sitecore%20v5/sitecore%20cms%206/releasenotes/changelog.aspx. However, on my Sitecore 6.4-6.5 this should not be the case.
I've taken a look at FileWatcher.Worker() in Reflector:
It looks like if an exception is thrown, we still go on to call Thread.Sleep and may well receive an exception again. I can't say exactly why the thread is being aborted. By the way, if you run !do on any of the strings in the !dso output above, you will see the standard message "Thread was being aborted.":
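To make the observation concrete, here is a minimal C# sketch of the loop shape I mean. This is my own approximation of the pattern, not the actual Sitecore source:

using System;
using System.Threading;

// Approximation of the watcher worker loop pattern - NOT the real Sitecore code.
internal class WatcherLoopSketch
{
    public void Worker()
    {
        while (true)
        {
            try
            {
                // check the watched files / process pending change notifications
            }
            catch (Exception)
            {
                // the exception is swallowed; nothing fixes the underlying fault
            }

            // after the catch the loop still calls Sleep; if the thread is being
            // aborted, Sleep is where the next ThreadAbortException shows up
            Thread.Sleep(500);
        }
    }
}

The point is only that a recurring exception turns into a catch-sleep-retry cycle instead of stopping the thread, and the Sleep call is where an abort would surface again - which at least matches the ThreadAbortException objects sitting on the stack.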
What I can say is that this issue started happening on my site right after switching its AppPool to .NET 4.0.
Taking into account that the other threads (34, 33, 32, 36, 40, 41...) also contain mentions of Sitecore's watchers (ConfigWatcher, XslWatcher etc.), I simply disabled the watchers in web.config by commenting them out.
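In practice that means commenting out the watcher registrations in the <httpModules> section (and, for IIS integrated pipeline mode, the matching entries under <system.webServer>/<modules>). The exact module names and types vary between Sitecore versions, so treat the snippet below as an illustration and check your own web.config:

<httpModules>
  ...
  <!-- <add type="Sitecore.Configuration.ConfigWatcher, Sitecore.Kernel" name="SitecoreConfigWatcher" /> -->
  <!-- <add type="Sitecore.Web.XslWatcher, Sitecore.Kernel" name="SitecoreXslWatcher" /> -->
  <!-- <add type="Sitecore.Web.LayoutWatcher, Sitecore.Kernel" name="SitecoreLayoutWatcher" /> -->
  ...
</httpModules>

Keep in mind that with the watchers disabled, changes to configuration, XSL renderings and layouts are no longer detected automatically until the application is restarted.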
After that the CPU problem described above went away. I think I would not have experienced this issue at all if I had done some kind of "deep restart" of my solution after switching to .NET 4.0, but I just needed a fast switch, and after noticing the high CPU I took these dumps for this article. :)
From my practice I would say that high CPU is mostly caused by custom classes and code, and usually in the "top runaway" threads you'll find custom code, XSL function calls or the code-behind of some sublayouts.
Credits for the images to: