Sudden Onset of High Utilization / Sluggish Performance

Welchia Worm - Sept. 1, 2003

In the week leading up to September 1, 2003, I saw at least three systems that all displayed the same symptoms, with the same cause: very sluggish Internet access, very sluggish server performance, and high server utilization while the filters were up.

In each case, the problem was caused by W32.Welchia.worm or by a worm with a similar profile.

BorderManager servers whose filters allow ICMP from the private interface to the public interface will have a serious problem with this worm. The problem occurs with either an ICMP-ST (stateful) or an ICMP (non-stateful) filter exception, and switching to a non-stateful exception will not give you better results than a stateful one.

If you remove the filter exception that allows ICMP through the filters, the server utilization percentage shown in MONITOR.NLM should drop immediately (for instance, from 50% to 10%), and the server will become much more responsive.

The Welchia worm tries to ping thousands of IP addresses, and because the BorderManager server is the default gateway, most of that traffic ends up at the server.
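To see why the gateway bears the load, consider how an infected host picks a next hop: any destination outside its own subnet is handed to the default gateway. The Python sketch below is only an illustration under stated assumptions: it models a host with one directly connected subnet plus a default route, and the 192.168.1.0/24 LAN and the 192.168.0.0/16 sweep are hypothetical values, not Welchia's actual scanning pattern.

```python
import ipaddress


def split_by_next_hop(local_net, destinations):
    """Count destinations reached on-link versus via the default gateway.

    Assumes a simple host routing table: one directly connected subnet
    plus a default route pointing at the BorderManager private address.
    """
    net = ipaddress.ip_network(local_net)
    on_link = sum(1 for d in destinations if ipaddress.ip_address(d) in net)
    via_gateway = len(destinations) - on_link
    return on_link, via_gateway


# Hypothetical ping sweep across every address in 192.168.0.0/16.
sweep = [str(ip) for ip in ipaddress.ip_network("192.168.0.0/16")]
print(split_by_next_hop("192.168.1.0/24", sweep))  # (256, 65280)
```

Even in this sweep of a neighboring range, over 99% of the pings leave the local subnet and are sent to the gateway.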

Once you remove any exceptions allowing outbound ICMP, you should see a very large disparity between transmitted and received packets on the server's private interface. In MONITOR, go to LAN/WAN Drivers, select the private interface with frame type Ethernet_II, and press the TAB key. Under normal traffic, the transmitted and received packet counts are roughly equal. With Welchia traffic being filtered, however, I have seen more than 150 times as many packets received as transmitted, because the ICMP packets are received and filtered but never forwarded.
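The disparity is easy to put into numbers. This is a minimal sketch, assuming you copy the received and transmitted packet counts from the MONITOR screen by hand; the sample counts and the 10:1 alert threshold are hypothetical values, not BorderManager settings.

```python
import math


def rx_tx_ratio(received, transmitted):
    """Received packets per transmitted packet on one interface."""
    if transmitted == 0:
        # Nothing sent: treat any received traffic as an unbounded ratio.
        return math.inf if received else 0.0
    return received / transmitted


def looks_like_flood(received, transmitted, threshold=10.0):
    """Flag an interface whose receive count dwarfs its transmit count.

    The default threshold is a hypothetical cut-off, not a vendor figure.
    """
    return rx_tx_ratio(received, transmitted) >= threshold


print(rx_tx_ratio(3_000_000, 20_000))       # 150.0
print(looks_like_flood(500_000, 480_000))   # False
print(looks_like_flood(3_000_000, 20_000))  # True
```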

If you turn on filter debugging to watch ICMP discards, the server will be so overwhelmed by the display that you will lose control of the console and be unable to type commands. For this reason, I advise against enabling ICMP filter debugging. If you insist on doing it anyway, put the commands that enable debugging in an NCF file, and add a final line that disables debugging, with a ? in front of that disable command so that it runs automatically after a 10-second delay. If you have already turned filter debugging on manually, disconnect the private interface cable and wait up to 2 minutes for the buffered display to finish so that you can regain control of the server console.
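The NCF approach can be sketched as a small Python helper that builds the file's text for you. Treat it as a hypothetical illustration: the ENABLE and DISABLE strings are placeholders rather than real console commands, so substitute the filter debug commands for your BorderManager version, and check the exact ? prefix syntax against your NetWare documentation before relying on it.

```python
# Placeholders only: replace with your version's actual filter debug commands.
ENABLE = ["<filter debug ON command>"]
DISABLE = ["<filter debug OFF command>"]


def build_debug_ncf(enable_cmds, disable_cmds):
    """Return NCF text that enables debugging, then disables it again.

    Each disable command gets a leading '?', which (per the text above)
    delays it by about 10 seconds so debugging switches itself off
    without anyone having to type at the console.
    """
    lines = list(enable_cmds)
    lines += ["?" + cmd for cmd in disable_cmds]
    return "\n".join(lines) + "\n"


# Save this output under a name of your choosing with an .NCF extension.
print(build_debug_ncf(ENABLE, DISABLE), end="")
```

Keeping the disable line in the same file is the point of the design: the shutdown of debugging is already queued before the flood of console output begins.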

Bad Hardware on LAN

In two other cases in the past week, the symptoms were similar to those caused by Welchia: the server was very 'slow', and in one case the filtering modules showed high utilization. In both cases, the problem was bad hardware on the private LAN flooding the BorderManager server with bad packets. One of the servers showed thousands of miscellaneous errors per second on the private NIC (in MONITOR, LAN/WAN Drivers); I did not see the statistics for the other server. In one case, a bad RSM module on a Cisco switch was sending the error packets. In the other, a 3Com switch was malfunctioning after a recent lightning strike and power outage, and needed to be rebooted.
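A per-second figure like the one above comes from comparing two readings of a counter. A minimal sketch, assuming the miscellaneous-errors value in MONITOR is a running total that you note twice by hand; the sample readings are hypothetical.

```python
def errors_per_second(first_count, second_count, elapsed_seconds):
    """Average error rate between two readings of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = second_count - first_count
    if delta < 0:
        # A reset or reloaded driver would make the counter go backwards.
        raise ValueError("counter decreased; take a fresh pair of readings")
    return delta / elapsed_seconds


# Hypothetical readings taken 30 seconds apart.
print(errors_per_second(1_204_000, 1_264_000, 30))  # 2000.0
```

A healthy NIC should normally show a rate at or near zero; a figure in the thousands points at something on the LAN generating bad frames.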

It is worth noting that in all five cases I saw in the past week, none of the problems originated on the BorderManager server itself; every one was caused by one or more internal hosts sending excessive amounts of problem traffic to the server.
