Hello,
I've a rather odd issue I was hoping someone could shed
some light on.
We have 6 Windows 2003 Web Edition servers set up with
Network Load Balancing (Unicast, ports 80 and 443 with no
affinity, plugged into a switch, dual-NIC'd with the NICs
going to separate subnets).
On these 6 servers I have 30 websites, each with its own
application pool.
For a reason I've yet to determine, at random times we get
sporadic losses of connectivity to sites on the load-balanced
cluster, often for hours. When this happens and I connect to
the servers via Terminal Services, I notice that on whichever
server is the primary one (the one with the highest priority
in the NLB manager) the CPU is at 100%.
When I look at the running processes, it appears the
W3WP.EXE processes are all consuming as much CPU as they
can.
By comparison, all 5 of the other servers have moderate to
low CPU utilization. Spread across all 6 there are maybe
100 active connections.
If I set the primary host to a "stopped" state, the next
highest priority server immediately pegs at 100% CPU
utilization.
Looking at the IIS Manager on the primary server, the
application pools all eventually stop and say "Unspecified
Error" next to them.
Looking in the Event Viewer on the primary server, I see
the following while this is happening:
In the Application Log:
Source: W3SVC-WP
Event ID: 2269
Description: The worker process failed to initialize
the http.sys communication or the w3svc communication
layer and therefore could not be started. The data field
contains the error number.
In the System Log:
Source: W3SVC
Event ID: 1127
Description: A worker process '<some number>' serving
application pool '<app pool name>' is no longer trusted by
the World Wide Web Publishing Service, based on ill-formed
data the worker process sent to the service.
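
To compare hosts, the two events above could be tallied per server from an exported log. This is only a minimal sketch: the CSV layout (columns host, source, event_id), the host names, and the function name are assumptions made for illustration, not the output format of any particular export tool.

```python
import csv
import io
from collections import Counter

# The two (source, event ID) pairs quoted in this post.
WATCHED = {("W3SVC-WP", 2269), ("W3SVC", 1127)}


def count_watched_events(csv_text):
    """Count the watched events per host in a CSV export.

    Assumes a header row with the columns host, source and event_id.
    That layout is hypothetical; adjust it to match the real export.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["source"].strip(), int(row["event_id"]))
        if key in WATCHED:
            counts[(row["host"].strip(), key[0], key[1])] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows; WEB1 and WEB2 are placeholder host names
    # and 9999 stands in for any unrelated event.
    sample = (
        "host,source,event_id\n"
        "WEB1,W3SVC-WP,2269\n"
        "WEB1,W3SVC,1127\n"
        "WEB2,W3SVC,9999\n"
        "WEB1,W3SVC-WP,2269\n"
    )
    for (host, source, event_id), n in sorted(count_watched_events(sample).items()):
        print(host, source, event_id, n)
```

If the counts cluster on whichever host held the highest priority at the time, that would match what I am seeing by hand in the Event Viewer.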
If I switch the priority of the servers, whichever becomes
the primary has these errors, but only for the duration of
being the primary.
This doesn't happen often, but it does recur every few days,
and thus far my only resolution has been to reboot all 6
servers at the same time. Doing a drainstop and rebooting
individually doesn't seem to resolve it.
I'd be grateful for any assistance anyone can provide.
Thanks.