16 Linux Troubleshooting Steps
Lesson Introduction
While working with a Red Hat® Enterprise Linux® operating system, users may
experience unexpected technical issues. To provide uninterrupted service to
users, you need to be able to solve problems that arise while the system is running. In this
lesson, you will troubleshoot Linux-related issues.
Lesson Objectives
In this lesson, you will troubleshoot Linux system issues. You will:
• Use the Linux rescue environment to troubleshoot Linux system issues.
• Troubleshoot hardware issues.
• Troubleshoot network connection and security issues.
Troubleshooting Strategies
Troubleshooting is the recognition, diagnosis, and resolution of problems.
Troubleshooting begins with the identification of a problem, and it does not end until
services have been restored and the problem no longer adversely affects users.
Troubleshooting can take many forms, but all approaches have the same goal: to solve
a problem efficiently with a minimal interruption of service. A troubleshooting
strategy is a plan of action for identifying the causes and resolving the effects of a
system-related issue. Various guidelines have to be considered while troubleshooting.
Guideline Description
Analyze the problem. Before attempting to troubleshoot an issue, try to identify the problem through its sym... and configuration files. Also, check if the relevant services are working properly.
Back up data. Before experimenting with issues in configuration files, log files, or any other import... further complication of the issues.
Eliminate possible causes. Observe whether the issue is related to the hardware, an application, a process, or a... cause. Eliminating the root cause will rectify all the related issues.
Adopt fundamental problem-solving approaches. After identifying the underlying causes, try out the fundamental methods of resolving...
Some companies have developed troubleshooting processes that are systematic and logical.
Following these guidelines will help you find and correct problems on your network
quickly and efficiently.
One troubleshooting model divides the troubleshooting process into the following
steps.
1. Identify the problem. This stage includes:
• Gathering information.
• Duplicating the problem, if possible.
• Questioning users to gain experiential information.
• Identifying the symptoms.
• Determining if anything has changed.
• Approaching multiple problems individually.
2. Establish a theory of probable cause. This stage includes:
• Questioning the obvious.
• Considering multiple approaches, such as examining the OSI (Open Systems Interconnection) mo... conquering.
3. Test the theory to determine the cause.
a. When the theory is confirmed, determine the next steps to resolve the problem.
b. If the theory is not confirmed, establish a new theory or escalate the issue.
4. Establish a plan of action to resolve the problem, while identifying the potential effects of your
5. Implement the solution, or escalate the issue.
6. Verify full system functionality and, if applicable, implement preventative measures.
7. Document your findings, actions, and the outcomes.
Troubleshooting can be a difficult process. It is not likely that anyone can develop a
complete and accurate approach to troubleshooting, because troubleshooting is often
done through intuitive guesses based on experience.
Note: In some cases, when system directories cannot be mounted on the /mnt/sysimage directory, the prompt will
chroot Mode
The chroot mode shifts the root (/) directory to a different location for recovery. It is
also known as jail mode because it can be used in production scenarios to ensure a
user will not be able to access any other file or directory except this directory and its
subdirectories.
The following table can help you troubleshoot the boot process.
Cause: If the boot loader screen does not appear, then GRUB (GRand Unified Bootloader) may not be properly configured.
Solution: Reconfigure the /bo...
Cause: If the grub> prompt appears, then GRUB may be corrupted.
Solution: Install GRUB again.
Cause: If the kernel does not load, then the kernel image may be corrupted.
Solution: Install a new kernel.
Cause: If the kernel does not load, then the parameter passed during the system startup may be wrong.
Solution: Specify the correct...
Cause: The /etc/inittab file is misconfigured, or the systemd configuration is incorrect or incomplete.
Solution: In rescue m...
Cause: The root filesystem is misconfigured.
Solution: In rescue m...
Cause: If the kernel loads, but /etc/rc.d (or systemd settings) causes an issue, then the /etc/fstab file may have an error.
Solution: In rescue mode, fix...
Cause: If the kernel loads, but /etc/rc.d (or systemd settings) causes an issue, then the fsck utility may have failed.
Solution: In rescue mode, run...
Cause: If the services do not start correctly, then they may not have been configured properly.
Solution: Configure the servic...
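Because an error in /etc/fstab is one of the causes listed above, a quick structural check of the file can be useful before rebooting from rescue mode. The following Python sketch is an illustration only and is not part of the lesson's toolset: the function name check_fstab_lines and the sample entries are hypothetical, and the check is limited to the field count (four to six whitespace-separated fields) and to the optional dump and pass fields being numeric. It does not verify that devices or mount points exist.

```python
def check_fstab_lines(lines):
    """Return a list of (line_number, message) for entries that look malformed."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Blank lines and comment lines are skipped.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if not 4 <= len(fields) <= 6:
            problems.append((number, f"expected 4 to 6 fields, found {len(fields)}"))
            continue
        # Fields 5 (dump) and 6 (fsck pass) are optional but must be integers.
        for name, value in zip(("dump", "pass"), fields[4:]):
            if not value.isdigit():
                problems.append((number, f"{name} field is not a number: {value!r}"))
    return problems


# Hypothetical sample data; the last two entries are deliberately broken.
sample = [
    "# /etc/fstab (sample data for illustration)",
    "/dev/sda1  /boot  xfs   defaults  0 0",
    "/dev/sda2  /      xfs   defaults",
    "/dev/sda3  swap   swap",
    "/dev/sda5  /home  ext4  defaults  0 x",
]
for number, message in check_fstab_lines(sample):
    print(f"line {number}: {message}")
```

To check a real file from inside the chroot environment, the same function could be given the lines read from /etc/fstab instead of the sample list.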
Category Utility
• Software RAID (Redundant Array of Independent Disks) utility such as mdadm.
• Disk partitioning and swap utilities such as fdisk, sfdisk, gdisk, mount, umount, and mkswap.
• Filesystem utilities such as mkfs, tune2fs, fsck, e2fsck, and XFS utilities.
Networking utilities
• Network debugging utilities such as ip, ifconfig, route, dig, ...
• Network connectivity utilities such as ssh, ftp, and scp.
Other utilities
• Shell commands such as chroot and bash.
• Process management tools such as ps and kill.
• Editors such as vi and nano.
• File management commands such as cd, ls, cp, rm, and mv.
• Kernel management utility such as sysctl.
• Package management tools such as rpm and yum.
• Archiving and compression utilities such as tar and gzip.
Symptom Cause and Solution
The user is unable to create a user or a group account.
Cause: The user does not have admin privileges, or the system is u...
Solution: Check whether the required privileges are granted to the account.
The user is unable to log in.
Cause: The settings in the user account or the group account could...
Solution: Check the user or group account settings, including the p...
The user is unable to access files and directories.
Cause: The required permission is not granted to the user.
Solution: Check the user or group quota and the privileges granted...
The user is unable to execute basic commands or applications.
Cause: The environment variable is not properly set.
Solution: Check the environment variables and the library files o...
The scheduled jobs are not executed.
Cause: The crond daemon has not started or stopped due to the inv...
Solution: Check whether the crond daemon is running. Otherwise, check whether the configuration set in the crontab file...
The user is unable to switch between the runlevels.
Cause: The PATH variable is not set properly or permission is not...
Solution: Check whether the user is granted the necessary privileg... PATH variable.
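Several of the symptoms above trace back to the PATH variable. As a rough illustration, the following Python sketch reports whether a command can be resolved on the current PATH and which PATH entries do not exist as directories. The function names resolve_command and missing_path_dirs are hypothetical, and the commands checked are arbitrary examples. The sketch relies only on the standard library function shutil.which.

```python
import os
import shutil


def resolve_command(name, path=None):
    """Return the full path of an executable found on PATH, or None."""
    # shutil.which searches the given path string, or the PATH
    # environment variable when path is None.
    return shutil.which(name, path=path)


def missing_path_dirs(path=None):
    """List PATH entries that do not exist as directories."""
    if path is None:
        path = os.environ.get("PATH", "")
    return [d for d in path.split(os.pathsep) if d and not os.path.isdir(d)]


print("PATH =", os.environ.get("PATH", "(not set)"))
for command in ("ls", "cp", "crontab"):
    print(command, "->", resolve_command(command) or "not found on PATH")
print("nonexistent PATH entries:", missing_path_dirs())
```

If a basic command is reported as not found, the PATH setting in the user's shell startup files is a reasonable first place to look.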
Core system variables affect the behavior of applications and commands. Some of the
system variables and their functions are given in the following table.
Single-User Mode
Single-user mode in Linux can be entered by changing the runlevel to 1 (rescue.target
on systemd-based systems). It is used when the system does not allow you to log in
after booting. Networking is disabled in single-user mode, which makes it an ideal
mode to troubleshoot network problems. Single-user mode can be used for filesystem
checks, because most of the partitions are not mounted in runlevel 1. This mode can
even be used to recover the root password.
Figure 16-2: Changing the root user password in single-user mode.
Boot Disks
A boot disk contains operating system files, such as init, klogd, and syslogd, required
to start a system.
It can be a hard disk, floppy disk, CD-ROM, DVD-ROM, or USB (Universal Serial
Bus) drive. The boot disk contains configuration files, startup files, and programs. The
boot disk is used to boot a system following a hard disk crash. Some distributions use
the first CD in the installation set as the boot disk. Other distributions allow you to
create a floppy disk that can be used to boot the system.
The ramdisk word is a keyword that specifies the location of the root
filesystem. The ramdisk word can be set and accessed using the rdev command.
Root Disks
A root disk contains directories, such as etc, bin, home, and so on, which contain files
required to run a Linux system. It need not contain a kernel or a boot loader. The root
disk can run a system without depending on any other disk.
Figure 16-4: Components of the root disk.
Zero-Filled Files
There are times when you might need to create a filesystem that does not contain any
data or partition table. One of these times might be when you need to build a
compressed root filesystem.
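A zero-filled file of a chosen size is the usual starting point for such a filesystem image. The following Python sketch shows one way to produce it. The function name create_zero_filled_file, the file name rootfs.img, and the 4 MiB size are assumptions made for illustration; the lesson does not prescribe a particular tool or size.

```python
import os
import tempfile


def create_zero_filled_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of zero bytes to path, one chunk at a time."""
    chunk = bytes(chunk_size)  # bytes(n) yields n zero bytes
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            step = min(chunk_size, remaining)
            handle.write(chunk[:step])
            remaining -= step
    return os.path.getsize(path)


# Example: a 4 MiB image in a temporary directory (the size is arbitrary).
with tempfile.TemporaryDirectory() as workdir:
    image = os.path.join(workdir, "rootfs.img")
    written = create_zero_filled_file(image, 4 * 1024 * 1024)
    print(f"{image}: {written} bytes written")
```

Because the file contains only zeros, it compresses very well once a filesystem has been created on it and populated, which is why this approach suits a compressed root filesystem.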
Kernel Panic
If a user is unable to boot a system, it may be due to disk errors caused by hardware
devices. When the "Kernel Panic" message is displayed, the filesystem may be corrupted or
inaccessible. To resolve this issue, boot into rescue mode and perform an integrity
check on the filesystem.
1. To boot from the recovery disc, ensure that your system is set to boot from your DVD drive, mo
2. Insert the CentOS Installation DVD into the DVD drive and boot the system.
3. To view the Troubleshooting menu, at the boot menu, press the Down Arrow once to select Trou
4. To enter rescue mode, on the Troubleshooting menu, press the Down Arrow once to select
Rescue a CentOS system and press Enter.
5. To enter rescue mode, on the Rescue menu, press the Tab key once to select Continue and press
6. A message is displayed, stating that the root partition will be mounted in the /mnt/sysimage direc
select OK.
7. A message is displayed, stating that your system has been mounted under the /mnt/sysimage dire
8. The root directory is now mounted on the ramdisk and a shell prompt is displayed. To change th
the /mnt/sysimage directory, enter chroot /mnt/sysimage.
9. Troubleshoot to find the cause of system failure and make the necessary changes to recover the
10. To exit the chroot environment, enter exit.
11. Enter sync so that the changes you made are reflected in the filesystem on the hard disk.
12. To exit from rescue mode, enter exit. The system will now reboot.
Troubleshooting Tools
There are many troubleshooting tools that you can use, depending on the type of
problem you are facing and the environment in which you are working. Some of these
tools are described in the following table.
Tool Description
dmesg: A system administration command that is used to examine and control the kernel ring buffer. It is u... during kernel initialization. Status messages can also be accessed from the /var/log/dmesg file.
GNU Parted: A program that allows you to create, destroy, resize, move, and copy hard disk partitions.
KNOPPIX: A bootable CD (or DVD) that contains GNU/Linux software, which includes automatic hardware detection a...
ifconfig: A command that is used to view the IP address and subnet mask and verify that they are allocated. It can also...
/proc and /sys: The /proc and /sys filesystems are pseudo-filesystems that are used as an interface to the kernel data structure...
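Because entries under /proc are exposed as ordinary text files, they can be read with any language. The Python sketch below parses text in the "Key: value" layout used by /proc/meminfo. The function name parse_meminfo and the sample figures are made up for illustration; on a Linux system the same function is then applied to the live pseudo-file if it exists.

```python
import os


def parse_meminfo(text):
    """Parse 'Key:   value [unit]' lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            # Most entries are reported in kB; a few are plain counts.
            values[key.strip()] = int(parts[0])
    return values


# Made-up sample figures in the /proc/meminfo layout.
sample = "MemTotal:        2048000 kB\nMemFree:          512000 kB\n"
info = parse_meminfo(sample)
print("sample MemFree (kB):", info["MemFree"])

# On a Linux system the same function can read the live pseudo-file.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        live = parse_meminfo(handle.read())
    print("live MemTotal (kB):", live.get("MemTotal"))
```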
The pgrep command is used to look up or signal processes based on their names or
other attributes. It looks through the running processes and lists PIDs that match the
criteria you specify.
For instance, the pgrep -u root sshd command lists only processes called sshd that
are owned by the root user. The command pgrep -u root,daemon lists all processes
owned by root or daemon.
The pkill command can be used in conjunction with the pgrep command to stop
processes.
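To make the lookup concrete, the following Python sketch imitates a small part of what pgrep does by scanning the numeric directories under /proc and comparing each process's command name. It is a simplified illustration under stated assumptions, not a replacement for pgrep: the function name find_pids_by_name is hypothetical, it matches the name exactly (pgrep treats its argument as a pattern), and it does not filter by owner.

```python
import os


def find_pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose command name (from <pid>/comm) equals name."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories represent processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() == name:
                    matches.append(int(entry))
        except OSError:
            continue  # the process exited or is not readable
    return sorted(matches)


# Only meaningful on a system that provides /proc.
if os.path.isdir("/proc"):
    print("sshd PIDs:", find_pids_by_name("sshd"))
```

The proc_root parameter exists only so that the function can be exercised against a test directory; in normal use it stays at /proc.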
Starting and stopping processes is just one more way to troubleshoot problems. When
you see a certain symptom, such as a process taking too long, you should first check
on the process using the ps or pgrep command; then if necessary, end the process
using the kill or pkill command. You should next examine the process (the script or
other command sequences associated with that process) and check for any problems.
After fixing the problems, you should try running the command or script again. Check
on it periodically to see if it is working properly.
Hardware Problems
Hardware devices may experience failures at any time while the system is being used.
The user is unable to hear sound from the speakers.
Cause: The speaker or the sound card is not functioning proper...
Solution: Check the speaker and its corresponding driver. If yo...
A system connected to the UPS shuts down abruptly.
Cause: The UPS is malfunctioning, or there is a mismatch betw...
Solution: Check the serial ports, the cable, and the configuratio...
The user is unable to move the pointer in GUI mode.
Cause: The mouse does not function properly due to the config...
Solution: Unplug and reconnect the mouse, then restart the sys...
The user is unable to access the CD/DVD drive.
Cause: The drive is not mounted or there is some problem with...
Solution: Check whether the read/write indicator is on. Otherwise, check the power cable connected to the drive.
Viewing Hardware Details
Some commands that are frequently used for viewing hardware details are listed in the
table.
Command Used To
/bin/uname View system information such as the kernel name, release and version numbers, hardware platform
Note: You can also load the module using the modprobe or insmod command. If you want to use the modprobe
• Verify that the power connector to the drive is connected and working.
• If the connection is not powered on, then there is a problem with the power connector.
1. Verify that the drive access light indicator is glowing.
2. If it is not glowing, the power connector needs to be checked and replaced.
• If the power connector is working and the access issue persists, then there is a problem with the D
1. With your hardware engineer's help, verify that the DVD drive is functioning properly.
2. If the DVD drive is functional, verify that your DVD is functioning properly.
Troubleshoot Printing Problems
• Verify that the printer cables are connected properly and the power source is switched on.
• Verify that the paper trays are stocked.
• To verify that the printer daemon is running, enter systemctl status cups.service and, if the daemon
• Check the status of the print job in the queue.
• In the CLI or in the GUI terminal window, enter lpq -P {print queue name}.
• To restart the CUPS service, enter service cups restart.
• To verify that the print job is getting executed, enter lpr {file name}.
You need to continually identify and prepare for vulnerabilities. In this topic, you will
troubleshoot network connection and security issues.
Network Issues
If users are unable to connect to a network, they will not be able to log in to their
systems or access the services or shared resources. Network problems can be
categorized as hardware-related issues and service-related issues. Hardware-related
network issues can be solved by checking the network devices, including the network
cable and the network card. Service-related network issues can be fixed by checking
the network settings of a system or the server.
Network Troubleshooting Utilities
The traceroute, ping, and arp utilities are very useful in troubleshooting issues related
to remote network services.
Utility Used To
traceroute: Track the route that data takes to get to its destination. Utilizing the Time to Live (TTL) field of the IP protoco... (ICMP) Time_Exceeded response from each gateway encountered on the path between the sender and the final d... User Datagram Protocol (UDP) probe packets are sent with a short TTL. The traceroute utility then listens for an ICMP Port_Unreachable response, which means that you either got to the host or reached the default maximum n... pass through) is printed to your screen; if no response is received within five seconds, an asterisk ( * ) is printed f...
ping: Verify that a system can be reached on a network. It checks the hostname, the IP address, and whether the remote... ping uses the ICMP Echo_Request datagram to check connections among hosts, by sending echo packets and the...
arp: Display information, such as the hardware address, the hostname, and the network interfaces, about the Address...
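The utilities above test reachability at the IP level. When the question is whether a particular service is reachable, a TCP connection attempt is a common complementary check. The Python sketch below illustrates that substitute technique; it does not use ICMP and is not how ping or traceroute work. The function name tcp_port_reachable is hypothetical, and the demonstration connects only to a listener that the script itself opens on the loopback address.

```python
import socket


def tcp_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Demonstration against a local listener, so no outside host is assumed.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
listener.listen(1)
port = listener.getsockname()[1]
print("listener reachable:", tcp_port_reachable("127.0.0.1", port))
listener.close()
print("after close:", tcp_port_reachable("127.0.0.1", port))
```

A host that answers ping but fails this check points toward a service-related issue rather than a hardware-related one.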
ARP
Tool Description
System Log Files: There are three types of system log files that can help in monitoring system security:
• Log: This file contains information about connections established and files transferred.
• Stats: This file lists file transfer statistics.
• Debug: This file contains debugging information and login and password information for rem...
Central Network Log Server: The reports generated from the server contain useful information on server logs and online ale...
chkconfig and systemctl: These commands can be used to check configuration files and update and query runlevel infor...
Vulnerabilities include:
Software Vulnerabilities
Software vulnerabilities account for many successful attacks because attackers are
opportunistic. They exploit well-known flaws using the most effective and widely
available attack tools. They also count on organizations not fixing known problems,
and they scan the Internet for vulnerable systems.
BIND Attack
In a BIND attack, an intruder can erase your system logs and install tools to gain
administrative access. In addition, once the attacker has gained access, he or she uses
the attacked system to scan for and attack other network systems running vulnerable
versions of BIND. In effect, the intruder uses the compromised system to attack
hundreds of remote systems, resulting in additional successful compromises.
Sendmail Flaws
Over the years, flaws have been found in Sendmail. In one of the most common
intrusions, the attacker sends a crafted mail message to a machine running Sendmail.
Sendmail, in turn, interprets the message as instructions requiring it to send the
password file to the attacker’s machine.
SNMP Flaws
• To provide a lure so that attackers stay away from other equipment. You want the attackers to see
access to. This system needs to be such that the attacker focuses his or her energy on exploiting
sitting right next to it.
• To know that the honeypot system will be attacked, so that you can take extra measures when logg
frequently—perhaps hourly or daily if your network is a high-profile target.
• To increase the ability to detect and respond to incidents. The theory is that if you are aware of wh
prepared to defend or, if possible, prevent the attack on your production systems.
Legal Issues Regarding Honeypots
Be aware that there may be legal issues surrounding the use of honeypot
technology. The intentional setup of a honeypot may be considered entrapment, and
therefore the same rules apply as in the real world.
Another issue is that of privacy. If an attacker were to set up an IRC server on the
honeypot, it would be possible to log all conversations on that server. No law
currently addresses this subject explicitly. However, it should be noted that an
attorney could make privacy a viable defense argument.
ACTIVITY 16-1
Troubleshooting Linux Systems Review
Scenario
Answer the following review questions.
1. How does troubleshooting in Linux differ from the troubleshooting approach you’ve taken with other systems?
2. Provide an example of a recent problem you encountered in your environment and how you were able to resolve
Summary
In this lesson, you acquainted yourself with the various troubleshooting strategies in
Linux. This will enable you to effectively tackle most of the issues that may arise
while working with Linux-based systems.