1016 Appendix E The Linux 2.6 Kernel
Native POSIX Thread Library (NPTL)
Classically, a program starts execution at the beginning of a series of instructions and
executes them in sequence. While this technique works well for simple programs
running on single-CPU systems, it is often better to allow a program to execute
different parts of itself in parallel. Most programs with a GUI benefit from this
capability because it can keep the user interface from freezing while the program
performs computations.
The traditional way of writing parallel code under UNIX is to execute a fork() system
call, which creates a copy of the running program in memory and starts it executing at
the same point as the original. At the point fork() is called, the two copies of the program are indistinguishable, except for the fact that they receive different return values
from their fork() call. One disadvantage of this approach is that each time fork() is
called, the system must create a complete copy of the process. This copying takes a
relatively long time and causes parallel applications to use a lot of memory. (This
description is not quite accurate: Copy-on-write functionality in a modern operating
system copies only those parts of memory that would be different.)
A more efficient solution to this problem is to allow a single process to run multiple
threads. A thread exists in the same memory space as other threads and so has a
much smaller overhead than a program running as multiple processes. The disadvantage of this strategy is that multithreaded applications must be designed more
carefully, and thus take more time to write, than multiprocess ones. Operating systems such as Solaris rely heavily on threads to provide scalability on very large
SMP (symmetric multiprocessing) systems. The new threading support in the Linux
2.6 kernel uses the same industry-standard POSIX APIs as Solaris for implementing
threads and provides high-performance processing.
Internet Protocol Security (IPSec)
IPSec is a network-layer protocol suite that secures Internet connections by encrypting IP packets. IPSec is an optional part of IPv4 (page 1043) and a required part of
IPv6 (page 1043). See page 999 for more information on IPSec.
Kernel integration of IPSec means that any kernel module or application can use
IPSec in the same way that it would use unsecured IP.
Asynchronous I/O (AIO)
Without AIO, when an application needs to get data from a hardware device or a
network connection, it can either poll the connection until the data becomes available or spawn a thread for the connection that waits for the data. Neither of these
techniques is particularly efficient.
Asynchronous I/O allows the kernel to notify an application when it has data ready
to be read. This feature is most useful to large servers but can provide moderate performance gains in almost any application.
O(1) Scheduler
One of the responsibilities of the kernel is to make sure that each execution thread
gets a reasonable amount of time on the CPU(s). The scheduling algorithm used in
the Linux 2.4 kernel gradually decreased in performance as more processes were
added and additional CPUs were brought online, making it hard to use Linux on
large SMP systems. The 2.6 scheduling algorithm runs in O(1) (constant) time: the
scheduler takes the same amount of time to choose the next process to run no matter
how many processes exist, making Linux better able to run large numbers of
processes and scale to large systems.
OProfile
It is often said that a program spends 90 percent of its time executing 10 percent of
its code. Programmers use profiling tools to identify these bottlenecks and target
that 10 percent for optimization. OProfile is an advanced profiling tool that identifies common programming inefficiencies. Thanks to its close relationship with the
kernel, OProfile can also identify hardware-specific efficiency problems, such as
cache misses, which often cannot be found by examining source code alone.
kksymoops
When something goes wrong in the kernel, it generates an error message called an
OOPS. This message is an in-joke from the Linux Kernel Mailing List, where developers would start bug reports with “Oops, we’ve found a bug in the kernel.” An
OOPS provides debugging information that can help kernel developers track down
the offending code or indicate that the OOPS was caused by hardware failure.
The kksymoops functionality provides detailed debugging information, allowing a
developer to determine the line of code in the kernel that caused the OOPS. While
this feature does not directly benefit the end user, it allows developers to find kernel
bugs more quickly, resulting in a more stable kernel.
Reverse Map Virtual Memory (rmap VM)
Virtual memory (VM) allows each process to exist in its own memory space. Every
time a process attempts to access a portion of memory, the kernel translates the
memory location from an address in the process’s own address space to one in real
memory. The reverse map enables the kernel to perform this process in reverse:
Given a location in physical memory, the kernel can determine which process owns
it. The reverse map allows pages to be deallocated quickly, giving the system more
free memory, fewer page faults, and less overhead when a program exits.
HugeTLBFS: Translation Look-Aside Buffer File System
The kernel allocates memory in units of pages. Virtual memory uses these pages to
map between the virtual and real memory address spaces. Older versions of the
Linux kernel set the size of these pages to 4 kilobytes. In cases where a lot of virtual
memory is used, such as in large database servers, this small size can place a heavy
load on the VM subsystem. HugeTLBFS allows for much larger pages, which significantly improves performance under heavy VM load conditions.
remap_file_pages()
When retrieving data from or writing data to a file, it is common practice to map
the file on disk to an area of memory. The system then translates accesses to that
area of memory directly into accesses to disk.
For additional flexibility, large database systems map different parts of a file to different parts of memory. Each mapping results in an additional load on the kernel
and VM subsystems. The remap_file_pages() system call can perform a nonuniform
mapping, meaning that a file needs to be mapped only once, which significantly
improves the performance of large database servers.
2.6 Network Stack Features (IGMPv3, IPv6, and Others)
The Linux 2.6 kernel includes a large number of improvements in the area of networking, including support for IPv6 (page 1043) and enhanced multicast
(page 1049) support. Although these features do not immediately benefit end users,
they do permit the development and deployment of network services that will not
require significant modification for integration with future technologies.
Internet Protocol Virtual Server (IPVS)
IPVS implements transport layer switching inside the kernel for load balancing.
This feature enables a single machine to distribute connections to a server farm,
allowing transparent load balancing.
Access Control Lists (ACLs)
The traditional UNIX permission system allows three sets of permissions to be assigned to
each file: one controlling access by the owner, one by a single group, and one by everyone else.
ACLs provide much finer-grained access control. In theory, ACLs can increase security. However, they make setting correct permissions more complicated, which may
encourage administrators to establish weaker controls than they should.
4GB-4GB Memory Split: Physical Address Extension (PAE)
32-bit CPUs are limited in that they can address only 2^32 bytes (4 gigabytes) of
memory. With the Pentium Pro, Intel introduced a work-around to this limitation
called Physical Address Extension (PAE), which permits the operating system to
address up to 64 gigabytes of memory. Because they are limited to addressing 4
gigabytes each, 32-bit programs cannot access this much memory. A Linux kernel
from the main tree is able to allocate up to 1 gigabyte for the kernel and 3 gigabytes
for each userspace (page 1067) process.
Scheduler Support for HyperThreaded CPUs
The Linux 2.6 kernel supports Intel’s HyperThreading. The 2.6 kernel treats each
virtual CPU as the equivalent of a physical CPU.
Block I/O (BIO) Block Layer
The 2.6 kernel includes a completely redesigned interface to drivers for block
devices (page 569). While this conveys a number of benefits, it also means that these
device drivers need to be rewritten and tested.