
Killer Packets

Here's a case in point for how filtering applies to troubleshooting a busy server. I saw a problem in which a UNIX server started to have trouble sending print jobs to a Novell server. All of a sudden, and at seemingly random times, the Novell server would generate errors on its UNIX services screen (PLPD) and stop processing. Only a reload of the PLPD.NLM file would make the server start processing UNIX print jobs again.

Our first question was, "Who changed something on the Novell server?" The answer was...nobody. Nothing had changed on the Novell server. No interrogation or torture was spared to verify this; we were absolutely certain that nobody had changed anything in the time frame that we were talking about.

This was a really tough problem to troubleshoot: A search on the Novell support site for the particular PLPD error message revealed nothing, and the problem was still popping up intermittently. We needed an answer relatively quickly, because this print gateway was responsible for processing print jobs for a time-sensitive function.

Because we were relatively certain that nothing had changed on either the Novell server or the UNIX server (in fact, the UNIX server was printing fine to other Novell servers), we decided to see what was happening on the network. Maybe some errant evil packet was causing the PLPD server some mental illness. We connected a sniffer to the server's segment (because we suspected something bad was happening to the server) and considered what we wanted to filter on:

o Because we knew something was happening to the Novell server, we would only capture packets destined for the Novell server's MAC address.

o Because we knew that this was a very busy file and print server, it wasn't feasible to capture all packets destined for this server.

o Because we knew that the problem was with PLPD (and knew that PLPD accepted UNIX print services via TCP/IP), we would only accept TCP/IP packets.
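The filter criteria above amount to a simple test applied to every frame. The following is only a rough sketch of that test in Python, not the analyzer's actual filter language. It assumes Ethernet II framing and treats "TCP/IP packets" as frames whose EtherType is IPv4 or ARP; the function names and the sample MAC addresses are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # IPv4 payload
ETHERTYPE_ARP = 0x0806   # ARP payload (assumed here to count as "TCP/IP")


def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def keep_frame(frame, server_mac):
    """Return True if an Ethernet II frame is addressed to server_mac
    and carries IPv4 or ARP; everything else (such as IPX) is dropped."""
    if len(frame) < 14:          # too short to hold an Ethernet header
        return False
    dst = frame[0:6]             # destination MAC comes first
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if dst != server_mac:
        return False
    return ethertype in (ETHERTYPE_IPV4, ETHERTYPE_ARP)


# Hypothetical addresses, for illustration only.
server = parse_mac("02:00:00:00:00:01")
client = parse_mac("02:00:00:00:00:02")
ipx_frame = server + client + b"\x81\x37" + bytes(20)  # 0x8137 = IPX
ip_frame = server + client + b"\x08\x00" + bytes(20)
print(keep_frame(ipx_frame, server), keep_frame(ip_frame, server))
```

Note that matching only on the server's own destination MAC also excludes broadcast frames, which keeps the capture small on a busy segment.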
Filtering this way eliminated most of the packets destined for this server, which were Novell file and print IPX/SPX packets. This left us with a test setup that looked something like what's shown in Figure 21.4.

Figure 21.4 The test setup for a tough NetWare-to-UNIX printing problem.

As soon as the problem occurred again, we looked at the packet capture. There are two important concepts here: First, we kept the analyzer running and stopped it right after the trouble report. Second, we synchronized the clock on the network analyzer to the network time before we started capturing, and we asked the user who reported the problem to also report the time of the problem. Because this was a pretty busy print service, we were sure that the reported time was accurate to within two or three minutes, so we now only had to consider packets around the time of the report, thus limiting how much junk we had to wade through.

Skipping to the end of the trace, we first filtered on the LPD TCP port, number 515. We did see a problem: The server stopped responding to the LPD requests from the UNIX host at the end. Well, we knew that without taking a trace. Still, this was useful: It let us know where in the packet list the problem occurred. Therefore, we got rid of the LPD filter, jumped to the packet where the problem occurred, and looked at the packets right before the problem.

Apparently, right before the problem occurred, there was an ARP request (ARP is TCP/IP's Address Resolution Protocol). Remember, each TCP/IP address must have a corresponding MAC address in order for two network cards to talk. The reply to the ARP request I saw carried the wrong MAC address. An ARP packet with the wrong MAC address typically means that someone else has used a TCP/IP address that's the same as yours, thus interrupting communications, but that was not the case here. We tried to find the MAC address reported in the ARP reply, but there was no such network card on our network.
Not only that, but I couldn't find the OUI of the MAC address in my OUI table, which was also suspicious. Furthermore, this was a network where only one or two well-known vendors' cards were in use.

Because there was no such device on the network, we next looked at the switch configuration (remember from Hour 14, "Router and Switch Basics," that devices on different sides of a switch do not actually talk directly to each other). Because there was a MAC-level problem, we naturally suspected the switch. We asked the person responsible for switch configuration if anything had changed in the last couple of days, and, in fact, something had. He therefore changed the configuration back to the way it used to be, and the problem went away. Tough problem solved!

Two things still bothered me, though. First, why could I ping the Novell server at all if the ARP was incorrect? Well, because ARP is "redone" every couple of minutes, by the time I was on the scene troubleshooting, the ARP was correct again; therefore, I could ping the server without a problem. The switch was only sometimes messing up the ARP; usually, it was just fine.

Second, why did a bad ARP mess up the LPD service? That's a tougher question, and one I wasn't going to find the answer to, mostly because it didn't matter. The PLPD.NLM file (and for that matter, the TCPIP.NLM file) on the Novell server in question was somewhat old, and an interruption in the data stream was apparently driving it berserk. After the switch configuration was fixed and the ARP problem went away, everything was okay once more (and that, after all, is what's really important).
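The checks applied to the suspicious ARP packet in this story (does the MAC match what we expect for that IP address, does such a card exist on our network, and is the OUI one we recognize) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the payload is a standard 28-byte Ethernet/IPv4 ARP message, and the inventory dictionary, OUI set, addresses, and function names are all hypothetical stand-ins for whatever records a site actually keeps.

```python
import struct


def parse_arp(payload):
    """Unpack the sender fields of a 28-byte Ethernet/IPv4 ARP payload."""
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", payload[:28])
    return {
        "oper": oper,  # 1 = request, 2 = reply
        "sender_mac": ":".join(f"{b:02x}" for b in sha),
        "sender_ip": ".".join(str(b) for b in spa),
    }


def check_arp(payload, known_macs, known_ouis):
    """Return a list of reasons the ARP sender looks suspicious.

    known_macs maps IP address -> expected MAC (hypothetical inventory);
    known_ouis is a set of vendor prefixes such as '02:00:00'.
    """
    arp = parse_arp(payload)
    mac = arp["sender_mac"]
    reasons = []
    expected = known_macs.get(arp["sender_ip"])
    if expected is not None and expected != mac:
        reasons.append("MAC does not match inventory for this IP")
    if mac not in known_macs.values():
        reasons.append("no such card in inventory")
    if mac[:8] not in known_ouis:  # first three octets are the OUI
        reasons.append("unknown OUI")
    return reasons


# Hypothetical inventory and a made-up ARP reply with a bogus sender MAC.
known_macs = {"10.0.0.5": "02:00:00:00:00:01"}
known_ouis = {"02:00:00"}
bogus = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, 2,
                    bytes.fromhex("deadbe000001"), bytes([10, 0, 0, 5]),
                    bytes(6), bytes([10, 0, 0, 9]))
for reason in check_arp(bogus, known_macs, known_ouis):
    print(reason)
```

An empty result means the sender matched the inventory; in the case described here, all three checks would have flagged the packet.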
