Chapter 41. Network Troubleshooting (Topic 214)



This document was created by an unregistered ChmMagic, please go to http://www.bisenter.com to register it. Thanks



41.1. Network Troubleshooting Essentials



One of the things that the creators of the LPI Exams knew was that you need to take a systematic approach to troubleshooting. When

faced with a networking problem, inexperienced administrators often begin clicking the mouse button and pecking away at the keyboard

without first clearly identifying the problem.

You need a methodology before you can begin altering files and running commands. The LPI Exam will not necessarily ask you what

that method should be. However, the questions that deal with troubleshooting will often be phrased in a way that suggests a problem

exists for some mysterious reason.

If you take the time to think about the situation being presented, you will find that the problem is not that mysterious and that you can

make an educated guess at the correct answer. Once you are equipped with the proper perspective, you can then focus on the

commands to use and the files to edit. Table 41-1 discusses essential troubleshooting steps to take when approaching a problem.



Table 41-1. Essential network troubleshooting steps

Troubleshooting step
Description

Gather all of the facts.
Carefully observe the problem. Many times, you will be presented with different ways to survey the issue. Read log files, analyze screen output, and use applications such as strace and ltrace to gain a more informed perspective.

Listen to your first impressions.
If you have experience with system administration, your intuition can be useful.

Remain flexible.
Do not stubbornly stick to one idea. After a few failed attempts, consider taking a new approach to the problem, at least for a while. If this new approach does not solve the problem, return to your original idea, or take another direction.

Categorize the problem.
Try to determine if the problem is hardware- or software-related. Then, determine if the client or the server is experiencing issues. Remember to be flexible. A problem might seem to belong in one category at first, but may belong in another.

Make educated guesses.
An educated guess is not a wild stab at a solution. Form a hypothesis, then conduct experiments. Make sure that the changes you make are not harmful.

Document your impressions and attempts.
Writing down the steps you take will help you be more systematic and will help you recover from mistakes. Other administrators who come after you will appreciate it. In many cases, troubleshooting can take considerable time. Notes can help you remember exactly what steps you took, what files you reviewed and altered, and the result of each step.

Create and verify backups.
Before you alter a file to solve a problem, make sure that you have stored a backup of that file.






41.2. Common Troubleshooting Commands



The following sections describe commands you can use to resolve network problems.



41.2.1. ping

ping can do more than just determine basic connectivity. You can also use it to discover the quality of a network connection. If users are

complaining about a spotty network connection, using ping in the right way can give you a reasonably accurate idea of how much of a

problem exists.

Running ping in the standard way will not usually reveal an intermittent connection. However, if you use ping to
generate a flood of packets, you can get a fairly accurate idea of how intermittent the connection really is. As root, use the -f option to
generate a ping flood, as shown here:

root@james:~ # ping -f albion

PING albion.stangernet.com (192.168.2.57) 56(84) bytes of data.

.........................................................................................

.........................................................................................

.........................................................................................

.............

--- albion.stangernet.com ping statistics ---

433 packets transmitted, 153 received, 64% packet loss, time 4833ms

rtt min/avg/max/mdev = 2.470/2.859/6.359/0.623 ms, ipg/ewma 11.189/3.012 ms

root@james:~ #



Notice that the output says that 64% of the packets were lost. Generally, a packet loss rate between 1% and 2% is tolerable, except by

the most sensitive applications. Rates higher than even 2%, and especially 5%, are generally too high for a reliable network connection

that is doing any real work (e.g., an X Window System session or a database connection).

So far, ping has not yet helped you determine if this system is experiencing a hardware problem or a software problem. But now that you

know some sort of problem exists, you can begin hypothesizing. Steps to take might include:

1. Make sure that other systems are not experiencing the same problem. This may involve verifying that the hub or switch is working properly.

2. Send a flood of packets to additional systems to make sure that the problem does not reside on the remote system.

3. Check the physical connection on the local system, as well as to the hub or switch.

4. Make sure that the driver on the local system matches the hardware.

5. Check the NIC's subnet mask.



Tip: Intrusion detection systems (IDS), described in Chapter 40, cannot tell the difference between an authorized,
well-intended ping flood and one that is intended as an attack. If necessary, warn your security team that you are
conducting ping floods before you create one.



If the system can't connect to a remote network such as the Internet, ping the router. Doing so involves more than a ping of the interface

for the subnet. Ping the interface on the far side of the router. Then move to pinging hosts on the other side of the router. Understand,

however, that many system administrators use access control lists to disable pinging across routers and switches.

Finally, when using ping, consider the following:



Use the -I option to choose the correct interface
Many systems have multiple Ethernet interfaces. Make sure that the ping leaves through the interface you intend to test.

Use the -n option if name resolution has failed
Doing so helps ensure that only IP address information is used and returned.
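The packet-loss figures discussed above can also be checked programmatically. The following is a minimal Python sketch, not part of the original text, that reads the statistics line from ping output and compares the loss rate against the 2% guideline mentioned earlier. The function names and the SAMPLE constant are illustrative assumptions; the sample text is taken from the flood shown above.

```python
import re

# Statistics lines copied from the ping flood shown earlier in this section.
SAMPLE = """--- albion.stangernet.com ping statistics ---
433 packets transmitted, 153 received, 64% packet loss, time 4833ms"""


def parse_ping_summary(output):
    """Return transmitted and received counts plus the computed loss percentage."""
    match = re.search(r"(\d+) packets transmitted, (\d+) received", output)
    if match is None:
        raise ValueError("no ping statistics line found")
    transmitted = int(match.group(1))
    received = int(match.group(2))
    # Compute the loss ourselves instead of trusting the rounded figure ping prints.
    loss = 100.0 * (transmitted - received) / transmitted if transmitted else 0.0
    return {"transmitted": transmitted, "received": received, "loss_percent": loss}


def loss_is_tolerable(loss_percent, threshold=2.0):
    """Apply the rule of thumb from the text: up to about 2% loss is tolerable."""
    return threshold >= loss_percent


if __name__ == "__main__":
    stats = parse_ping_summary(SAMPLE)
    print(stats, loss_is_tolerable(stats["loss_percent"]))
```

The 2% default is only the guideline quoted in this chapter; adjust the threshold for more sensitive applications.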



41.2.2. telnet and netcat



You have already learned that you can use telnet and netcat (nc on some systems) to query ports and gather information. It is important,

however, to understand that you will be presented with different types of messages and errors in the context of a troubleshooting

situation. Not all responses and errors are equally meaningful. But most of the responses can be quite useful. Table 41-2 provides a

useful list of the most common responses.



Table 41-2. Responses to telnet and netcat queries

Response
Explanation

"Name or service not known" or "No route to host"
No system exists with that IP address or name.

"Connection refused"
Confirms that a remote system is listening. However, the port you have attempted to connect to is not open or is blocked by an iptables or ipchains rule. You nevertheless have found a live system.

"Name or service not known" or "Forward host lookup failed: Unknown host"
A DNS error indicating that no host by this name exists. This does not mean that the host does not exist at all. The name server is simply reporting that this name does not exist. Try connecting to this host by IP address. Possibly useful when troubleshooting DNS.

Connection hangs for a moment, then is dropped with no explanation
An application such as TCP wrappers has processed the connection, then dropped it. Useful when troubleshooting TCP wrappers configuration or in determining problems with nonworking services.

Connection seems to hang indefinitely
Usually implies that telnet or netcat has connected to a port. Note that in some cases, if you do not wait long enough, you can mistake a connection for a failed connection. Wait 4 or 5 seconds before you conclude that you have made a connection.
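The responses in Table 41-2 correspond to distinct error conditions that a script can tell apart. The sketch below is an illustration in Python rather than anything taken from the original text: it maps the exceptions raised by the standard socket module onto explanations similar to those in the table. The function names, the wording of the messages, and the five-second default timeout are assumptions.

```python
import errno
import socket


def explain_connect_error(exc):
    """Translate a connection failure into a troubleshooting hint."""
    if isinstance(exc, socket.gaierror):
        # Raised when the name cannot be resolved ("Name or service not known").
        return "name lookup failed: try the IP address instead"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused: the host is alive, but the port is closed or filtered"
    if isinstance(exc, socket.timeout):
        return "no reply before the timeout expired"
    if isinstance(exc, OSError) and exc.errno == errno.EHOSTUNREACH:
        return "no route to host: nothing answers at that address"
    return "unclassified error: %r" % (exc,)


def probe(host, port, timeout=5.0):
    """Try a TCP connection and describe the outcome in one line."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return explain_connect_error(exc)
```

For example, probe("mail.company.com", 110) would return "connected" if the POP-3 port accepts the connection, or one of the hints above otherwise.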






Once you have made a connection using telnet or netcat, you can then type in commands and send them to the listening port.

Sometimes, that port may not respond at all. At other times, the port drops the connection immediately or returns gibberish.

In some cases, the port may allow an interactive session. SMTP, POP-3, and IMAP servers allow you to open a session and send

commands. Many system administrators have memorized the necessary commands to send and receive email using nothing more than

a telnet client or netcat. Following is an example of how you can use netcat to read e-mail from a POP-3 server:

# netcat mail.company.com 110

Trying 214.27.208.3...

Connected to mail.company.com.

Escape character is '^]'.

+OK (rwcrpxc59) Maillennium POP3/PROXY server #65



USER lpicprofessional

+OK



PASS passedexam1

+OK ready



LIST

+OK 1 messages (31227)

1 31227

.



RETR 1

From: certification@lpi.org

Subject: Congratulations

Congratulations upon achieving LPIC 2 status.



QUIT



In this sequence, netcat was used to connect to port 110 (the standard POP-3 port), and the user proceeded to enter a series of

commands to read an email. First, the user issued the USER and PASS commands to authenticate to the remote system. Then, the LIST

command was issued to see if any emails were waiting. In this particular session, one message was waiting.

To read the email message, the user simply typed RETR 1. The contents of the message were then displayed, giving good news in this

case. If multiple email messages existed, the user could have typed RETR 3 to read the third message, or RETR 41 to read the 41st

message. To end the session, the user typed QUIT. A session using telnet would use the identical POP-3 commands.
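The LIST reply in the session above follows a simple pattern: a status line, one line per message, and a terminating dot. As a hedged illustration (not part of the original text), the following Python sketch parses such a reply into message numbers and sizes. The function name and the sample constant are assumptions; the sample reply is copied from the session above.

```python
# Reply to the LIST command, as shown in the POP-3 session above.
SAMPLE_LIST_REPLY = "+OK 1 messages (31227)\r\n1 31227\r\n.\r\n"


def parse_pop3_list(reply):
    """Return (message number, size in bytes) pairs from a LIST reply."""
    lines = reply.splitlines()
    if not lines or not lines[0].startswith("+OK"):
        raise ValueError("server did not answer +OK")
    messages = []
    for line in lines[1:]:
        if line == ".":
            # A line holding only a dot ends a multi-line POP-3 reply.
            break
        number, size = line.split()
        messages.append((int(number), int(size)))
    return messages


if __name__ == "__main__":
    print(parse_pop3_list(SAMPLE_LIST_REPLY))
```

Python's standard poplib module wraps the same USER, PASS, LIST, RETR, and QUIT commands, so a script rarely needs to parse replies by hand; the sketch is only meant to show what the raw protocol looks like.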

You can also communicate with web servers using telnet or netcat. Following is a simple HTTP session using netcat:

$ netcat stageserver.company.com 80



GET /





Web site





Placeholder for Web site.







$



First, netcat was used to connect to the web server named stageserver.company.com. The HTTP GET / command was then used to

return the default web page that would normally be read by a standard web client. Instead of using the GET / command, you can

simply type in gibberish. Many web servers, especially if they are still using default settings, will reveal the server version and other

information:

$ netcat james 80



asdf








501 Method Not Implemented



Method Not Implemented



asdf to /index.html not supported.







Apache/2.0.53 (Ubuntu) PHP/4.3.10-10ubuntu4.3 Server at james.stangernet.com

Port 80




$



Here, netcat was used to connect to a private web server maintained by the author at the host system james. In response to the gibberish

entered by the user, the server issued a response that included not only the version of Apache Server, but also the server operating

system and the fact that PHP is enabled. Not all daemons will respond with useful information, however.
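When you collect banners from many servers, it helps to split them into product names and versions. The following Python sketch is an assumption-laden illustration rather than a standard tool: it pulls the name/version pairs and the parenthesized platform string out of a banner like the one returned above. The function name and regular expressions are hypothetical, and real banners vary widely.

```python
import re

# Signature line returned by the server in the session above.
BANNER = "Apache/2.0.53 (Ubuntu) PHP/4.3.10-10ubuntu4.3 Server at james.stangernet.com Port 80"


def parse_server_banner(banner):
    """Split a banner into {product: version} pairs and an optional platform."""
    # Each product token looks like Name/Version, for example Apache/2.0.53.
    products = dict(re.findall(r"([A-Za-z][\w.-]*)/(\S+)", banner))
    platform = re.search(r"\(([^)]+)\)", banner)
    return {
        "products": products,
        "platform": platform.group(1) if platform else None,
    }


if __name__ == "__main__":
    print(parse_server_banner(BANNER))
```

A daemon that reveals nothing simply yields an empty products dictionary and a platform of None.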



41.2.3. ifconfig



The ifconfig command can be quite helpful during troubleshooting if you take the time to read all the information it provides. In addition to

standard networking information (e.g., the IP address and subnet mask), the typical ifconfig output tells you the following:



Whether or not the interface is up (the UP flag)

If it is in broadcast and multicast mode

The number of packets received and transmitted since last activation

The number of errors and overruns

The number of bytes received and transmitted

The interrupt used, and base address



Here is an example of ifconfig output:

$ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:80:5F:EA:86:8F
          inet addr:24.17.140.230  Bcast:255.255.255.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:44354070 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3078006 errors:0 dropped:0 overruns:0 carrier:0
          collisions:113575 txqueuelen:100
          RX bytes:1730626695 (1650.4 Mb)  TX bytes:553335663 (527.7 Mb)
          Interrupt:11 Base address:0x6100



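The information gathered here can help you narrow down both hardware and software errors. For example, climbing error, overrun, or collision counts tend to point toward hardware or cabling, whereas a wrong address or subnet mask points toward configuration.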
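Reading the counters by eye works for one interface, but a script can extract them from many systems. The sketch below, written in Python as an illustration only, parses the packet, error, and collision counters from the older ifconfig output format shown above and computes collisions as a fraction of transmitted packets. The function names are assumptions, and newer ifconfig versions print a different layout that this sketch does not handle.

```python
import re

# Counter lines copied from the ifconfig example above.
SAMPLE = """eth0      Link encap:Ethernet  HWaddr 00:80:5F:EA:86:8F
          RX packets:44354070 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3078006 errors:0 dropped:0 overruns:0 carrier:0
          collisions:113575 txqueuelen:100"""


def parse_counters(output):
    """Return counters such as rx_packets, tx_errors, and collisions as integers."""
    counters = {}
    for line in output.splitlines():
        match = re.match(r"\s*(RX|TX) packets:", line)
        if not match and "collisions:" not in line:
            continue  # skip lines that carry no packet counters
        prefix = match.group(1).lower() + "_" if match else ""
        for key, value in re.findall(r"(\w+):(\d+)", line):
            counters[prefix + key] = int(value)
    return counters


def collision_rate(counters):
    """Collisions as a fraction of transmitted packets (0.0 if nothing was sent)."""
    sent = counters.get("tx_packets", 0)
    return counters.get("collisions", 0) / sent if sent else 0.0


if __name__ == "__main__":
    parsed = parse_counters(SAMPLE)
    print(parsed["rx_errors"], parsed["tx_errors"], collision_rate(parsed))
```

For the sample above, the collision rate works out to roughly 3.7% of transmitted packets. Whether that is acceptable depends on the network, so the sketch reports the figure without judging it.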






41.2.4. traceroute

Don't underestimate the usefulness of the traceroute command. Don't be too confident that you know everything about traceroute, either.

For the exam, be able to identify each element of traceroute output. Consider the following example:

# traceroute 213.236.195.41

traceroute to 213.236.195.41 (213.236.195.41), 30 hops max, 38 byte packets

1 linpro-intra-gw (80.232.36.129) 0.212 ms 0.154 ms 0.133 ms

2 tott (80.232.38.218) 0.931 ms 0.783 ms 1.209 ms

3 tdc-A100M-0225-hsrp.linpro.net (80.232.38.220) 1.471 ms 1.505 ms 1.678 ms

4 212.37.252.2 (212.37.252.2) 1.469 ms 1.834 ms 2.457 ms

5 pos3-0.622M.osl-nyd-cr1.ip.teledanmark.no (213.236.195.41) 2.043 ms * 2.906 ms



In the output, notice that each hop shows three latency times, one for each probe that traceroute sends. If you were to ping these systems, you would see similar round-trip times.

If the routing is randomized through some routing daemon, subsequent uses of traceroute could discover new hosts.

An asterisk represents either a lost packet or a router that has been configured not to respond to traceroute probes within the
timeout period you have specified. The default timeout period for traceroute is five seconds.

Sometimes, you may see the !H, !N, or !X flags in place of the latency information traceroute usually provides. The !H and !N flags
mean that the host or network, respectively, cannot be reached. The !X flag means that the administrator of the remote system has
administratively prohibited the communication, but was kind enough to configure the router to send a message informing traceroute
about the prohibition.
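To practice identifying each element of a traceroute line, it can help to see the line split into its parts. The following Python sketch is illustrative only: it breaks one hop line into the hop number, host name, address, per-probe latencies, and any ! flags, with an asterisk recorded as None. The pattern and function name are assumptions, and lines that contain only asterisks (no host at all) are deliberately not handled.

```python
import re

# hop number, host name, (address), then the probe results
HOP_PATTERN = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")


def parse_hop(line):
    """Split one traceroute output line into its elements."""
    match = HOP_PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognized traceroute line: %r" % line)
    hop, name, address, rest = match.groups()
    latencies, flags = [], []
    for token in rest.split():
        if token == "*":
            latencies.append(None)       # probe lost or unanswered
        elif token.startswith("!"):
            flags.append(token)          # for example !N or !X
        elif token != "ms":
            latencies.append(float(token))
    return {
        "hop": int(hop),
        "name": name,
        "address": address,
        "latencies_ms": latencies,
        "flags": flags,
    }


if __name__ == "__main__":
    last = "5 pos3-0.622M.osl-nyd-cr1.ip.teledanmark.no (213.236.195.41) 2.043 ms * 2.906 ms"
    print(parse_hop(last))
```

Applied to the fifth hop of the example above, the second probe appears as None, which matches the asterisk in the output.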



41.2.5. netstat and route



You already know that netstat is useful for checking open connections, as well as viewing the routing table. We also discussed the
route command in Chapter 19. The output of each command is slightly different, and this difference might be important in a
troubleshooting situation.

Consider the following netstat output:

$ netstat -r
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
localnet        *               255.255.255.0   U       500 0          0 eth0
default         system1234.stan 0.0.0.0         UG     5000 0          0 eth0
$



For the sake of comparison, consider the following route output:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
localnet        *               255.255.255.0   U     0      0        0 eth0
default         system1234.stan 0.0.0.0         UG    0      0        0 eth0
$






The information from the two commands seems identical, but there are subtle differences. The output for netstat contains information for

both the Maximum Segment Size (MSS) and Initial Round Trip Time (IRTT). The route command does not report these values by

default.

The MSS value indicates the largest amount of data (in bytes) that the system can handle without fragmenting the packet. Generally, you

want the MSS value to be less than the Maximum Transmission Unit (MTU), which is 1500 bytes for Ethernet systems. A value of 0 means that

the default is used, which is 536 bytes for Linux systems. The previous output shows that the MSS is 500, so the system is likely

functioning well in this regard.

The IRTT value displays (in milliseconds) the amount of time allowed for initial TCP connections to complete. On our system there is a 0

value, which means that the system is using the default value (300 milliseconds).

The route command provides the routing metric and the Use field, which netstat does not. The Metric field indicates the distance to a

destination target. It is no longer used by modern systems, though if you use a routing daemon, you may need to read this value.

Knowledge of routing daemons is not required for the LPI Exams.

The Use field indicates the number of lookups for the particular route. If you use route's -C option, you will see the number of times the

cache has correctly looked up the route. If you use route's -F option, you will see the number of misses.
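Because netstat -r and route share the first columns but differ in the rest, a parser that keys each row by the header line works for both. The Python sketch below is a hedged illustration, not an established tool: it turns either table into a list of dictionaries and picks out the default route. The function names are assumptions, and the sample is the netstat -r output shown earlier.

```python
# Routing table copied from the netstat -r example earlier in this section.
SAMPLE = """Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
localnet        *               255.255.255.0   U       500 0          0 eth0
default         system1234.stan 0.0.0.0         UG     5000 0          0 eth0
"""


def parse_routing_table(output):
    """Return one dict per route, keyed by the column names in the header line."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = next(i for i, line in enumerate(lines) if line.startswith("Destination"))
    columns = lines[header].split()
    routes = []
    for line in lines[header + 1:]:
        fields = line.split()
        if len(fields) != len(columns):
            continue  # skip prompts or truncated rows
        routes.append(dict(zip(columns, fields)))
    return routes


def default_route(routes):
    """Return the default route entry, or None if the table has none."""
    for entry in routes:
        if entry["Destination"] in ("default", "0.0.0.0"):
            return entry
    return None


if __name__ == "__main__":
    table = parse_routing_table(SAMPLE)
    print(default_route(table))
```

With route output, the same function yields Metric, Ref, and Use keys in place of MSS, Window, and irtt, which makes the difference between the two commands easy to see side by side.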






41.3. Hardware Problems



One of the truisms of networking is that the majority of problems occur at the physical layer. Problems can include a failed NIC, a hub

that has lost power, or bad cabling. The sections that follow discuss the relevant tools for troubleshooting physical networking issues.



41.3.1. Physical Connection Issues

Although the LPI Exam focuses mostly on local configuration issues that can be discovered and resolved by using applications such as

ifconfig and hostname, it is important to understand that other devices may be causing problems.



41.3.1.1. Cabling



Although it is relatively unusual, cables can become weakened and wires can sever. Some offices may have wiring that is routed beneath

carpets that receive substantial traffic. Users may also be able to roll over the cables with their chairs. When it comes to troubleshooting

such cabling problems, consider the following steps:

1. Obtain a working system and attach it to the cable in question. If the cable works, you know you have a problem with the Linux system.

2. Check for loose or broken cable connectors. If a connector is broken, the cable may be only partially inserted, causing failed or intermittent connections.



41.3.1.2. Failed networking devices



Hubs, switches, and routers are generally quite reliable, but it is possible that an intervening device, rather than the Linux system, has

failed. One way to confirm your suspicion that a hub or switch has failed is to use a crossover cable and connect the affected system to

another working system. A crossover cable is essentially the same thing as a standard LAN cable, but with four of the pins reversed. As a

result, two systems can communicate directly with each other. If the two systems can communicate, it is likely that a hub or switch

servicing the system you once suspected is experiencing a problem.

When inspecting a hub or switch, look for the following:



Whether the device has power

Most hubs and all switches in a larger network are active devices. If they do not have power, you won't have a network

connection.






Disconnected cables

The system's cable may have been simply disconnected and needs to be reconnected.



Steadily blinking lights

If lights are blinking steadily and continuously or remain on steadily, the device is experiencing a problem. Sometimes power

surges cause devices to fail. Try powering it down and back on or simply replacing it.



Activated warning lights

Some hubs and switches have warning lights that will indicate a problem. Look for them, then take steps to solve the problem.



Misconfigured hardware settings

In one case, a Linux administrator noticed that a hub had a button pressed that caused one of its ports to effectively act as a

straight through connection, rather than as a proper hub. Simply deselecting the button solved the problem. Look on the hub

to see if there are any other improperly selected switches.



41.3.2. Problems with the Interface Card



If a NIC completely fails, it will fail to initialize at boot or respond to the ifup command. The lspci and usbview commands can help you

determine whether the NIC has been recognized as valid hardware. If these commands do not show that the system recognizes the NIC,

consider installing a new one. But problems can also exist elsewhere than the NIC. Make sure that you know what your system is telling

you by consulting log files and reviewing screen output.

Finally, you can inspect the lights that ship with most NICs. These lights may seem to be useless, but they can help you determine

whether the NIC is receiving power. If the lights are flashing randomly, it is likely that the device is receiving traffic. If you find that the

lights are blinking steadily, it is likely the device has a configuration problem. If you find that one or more lights are simply staying on

constantly, you likely have a hardware configuration problem. Of course, if the lights are not turned on at all, the NIC has not been

recognized by your system's bus or has completely failed.

To solve such problems, make sure that the NIC is recognized by the Linux server. Consult the distribution's Hardware Compatibility List

(HCL). You may find that you will have to get a new NIC.



41.3.3. Reviewing Screen Output

Do not simply focus on the NIC's lights or on the system's log files. Some systems are configured to report critical problems directly to

the screen. In other cases, problems experienced by the NIC can cause warning messages to be printed on the screen, even though the

system is not specially configured for this.

In most cases, the messages you see printed to the screen will be seen when the system boots. Messages can indicate that the system

is delaying the interface's initialization or can report errors in transmission and reception.






41.3.4. Changes to the Kernel and /etc/modules



When a Linux system scans for PCI devices at boot time, it recognizes the devices it finds in the order that it finds them. The first card

recognized becomes eth0, the second card recognized becomes eth1, and so forth. Recognition involves the act of detecting the

hardware and assigning any drivers and modules. Sometimes, a seemingly innocuous change to the system can cause problems with

PCI-based network devices.

In some cases, changes to the /etc/modules.conf file can cause devices to go undetected or to be detected in a different order. In one

case, an application completely unrelated to networking rewrote the /etc/modules.conf file and inadvertently changed the order that

modules were installed for a dual-NIC Linux router. The update to /etc/modules.conf changed the order in which the NICs were

recognized. With the change, the NIC that used to be recognized as eth0 for two years was suddenly recognized as eth1. Because this

system was a router, the eth0 device was configured to masquerade connections, whereas the eth1 device was not.

Normally, this would not be a problem, except that the company's ISP required all Internet-facing network interfaces to register their MAC

addresses. Now the system was recognizing an unregistered card as eth0, and the ISP would not recognize the new eth1 card as an

Internet-addressable device. So a seemingly simple change to the /etc/modules.conf file caused serious networking problems for the

company.

Updating the kernel can also sometimes affect the order in which PCI devices are scanned, similarly to the previous example.

Finally, if for some reason a NIC's driver is loaded at a different time from previous boots, this NIC may be recognized earlier or later than

before. As a result, the NIC may be assigned a different name.

Solving the problems discussed in this section is relatively trivial once you know what caused the problems in the first place. In the first

instance, simply editing the /etc/modules.conf file and specifying the previous module installation order solved the problem. For the

second problem, physically swapping the NICs would work. For the third problem, you can either swap the NICs or change the time

when the drivers are assigned during the boot process. You would have to either reconfigure some boot scripts or possibly use an

application such as YaST or netconfig.

Whatever solution you choose, it is important to understand that a seemingly unrelated change to the system can cause a ripple effect.
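One way to notice this kind of silent renaming is to record which MAC address belongs to which interface name and compare that record after a change. The Python sketch below is an illustration under stated assumptions, not a method from the original text: it assumes a Linux system that exposes interface addresses under /sys/class/net, and the function names and the second MAC address in the usage example are hypothetical.

```python
import os


def read_interface_macs(sysfs_root="/sys/class/net"):
    """Return {interface name: MAC address} as currently seen by the kernel."""
    macs = {}
    for name in sorted(os.listdir(sysfs_root)):
        path = os.path.join(sysfs_root, name, "address")
        try:
            with open(path) as handle:
                macs[name] = handle.read().strip().lower()
        except OSError:
            continue  # entry without a readable address file
    return macs


def find_renamed(expected, actual):
    """Return {old name: new name} for cards whose MAC now appears under another name."""
    by_mac = {mac.lower(): name for name, mac in actual.items()}
    renamed = {}
    for name, mac in expected.items():
        current = by_mac.get(mac.lower())
        if current is not None and current != name:
            renamed[name] = current
    return renamed


if __name__ == "__main__":
    # Recorded before the change; the eth1 address is a made-up example value.
    recorded = {"eth0": "00:80:5F:EA:86:8F", "eth1": "00:11:22:33:44:55"}
    print(find_renamed(recorded, read_interface_macs()))
```

An empty result means every recorded card still has its old name; a result such as {'eth0': 'eth1', 'eth1': 'eth0'} would indicate the swap described in this section.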



41.3.5. Checking Log Files

If you have an interface experiencing problems, you likely will be able to read about it in the /var/log/messages or /var/log/syslog files.

You can also review dmesg output to view the contents of the kernel message buffer. Understand, however, that this buffer can be

overwritten, resulting in incomplete information from the kernel.

This is because the buffer is a "round-robin buffer" (also called a ring buffer), meaning that if the kernel needs more space, it
deletes log entries, starting with the oldest. Even though the oldest entries at the beginning of the buffer can be overwritten, the
latest messages are stored at the end of the buffer. So, you will at least be able to read the most current messages.
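The behavior of such a buffer is easy to model. The following Python sketch is only an analogy for the kernel message buffer, not the kernel's implementation: it uses a fixed-capacity deque to show that once the buffer is full, each new message pushes out the oldest one while the newest always remains readable. The class name and the capacity of three are arbitrary assumptions.

```python
from collections import deque


class MessageRing:
    """A fixed-size message buffer that discards the oldest entry when full."""

    def __init__(self, capacity):
        # A deque with maxlen drops items from the opposite end on append.
        self._entries = deque(maxlen=capacity)

    def log(self, message):
        self._entries.append(message)

    def dump(self):
        """Return the surviving messages, oldest first, like dmesg output."""
        return list(self._entries)


if __name__ == "__main__":
    ring = MessageRing(capacity=3)
    for number in range(1, 6):
        ring.log("message %d" % number)
    print(ring.dump())  # prints ['message 3', 'message 4', 'message 5']
```

Messages 1 and 2 are gone, which mirrors the incomplete dmesg output the text warns about, while the most recent messages are still available.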


