Chapter 3. Working with C++ Data and Expressions




In this chapter, we will study how to work with C++ data: what data types are available, what

operations over values of these data types are supported, and what pitfalls the C++ programmer

should be aware of. As with many other aspects of the language, C++ combines opposites. Its set of

numeric data types is very small, and differences between existing types are not that drastic, so that

the choice between them is not always clear-cut. Its set of operators is very large. Some C++

operators are quite complex; others have unusual notation. What is common to both C++ data types

and operators is potential for portability problems. Things very often do not work the same way on

different machines.

C++ inherits from C exceptional flexibility for converting the values from one type to another and

for combining them into sophisticated expressions. Let us take a look at what is available.



Values and Their Types

In C++, every value, at every moment in its lifetime (during program execution), is characterized

by its type. C++ variables are associated with their types at the time of definition. The type

describes three characteristics of the value:

• the size of the values of that type in computer memory

• the set of values that are legal for the type (the method of interpretation of the bit pattern that represents the value of that type)

• the set of operations that are legal on the values of that type



For example, the values of type int on my machine are allocated four bytes, and the set of legal

values ranges from -2,147,483,648 to +2,147,483,647. The set of legal operations includes

assignment, comparisons, shifts, arithmetic operations, and some others. The values of the type

TimeOfDay that I defined in the section "Classes," in Chapter 2, "Getting Started Quickly: A Brief

Overview of C++," are allocated the size of two int values (unless the compiler adds more space to

align values in memory for faster access). The set of TimeOfDay legal values is any combination of

values for the first integer (from 0 to 23) and for the second integer (from 0 to 59). The set of

TimeOfDay legal operations includes setTime(), displayTime(), and

displayMilitaryTime(); it includes assignment but not comparisons. Sure, TimeOfDay

components can be compared (they are integers, and the rules of int apply to them) but not the

TimeOfDay values: You should distinguish between properties of the type and properties of its

components. If the client code has to compare TimeOfDay values, class TimeOfDay has to support

this by implementing functions such as isLater() or compareTime() or something like that.

(Again, notice the client-server terminology I am using here.)
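As a rough sketch of what such support might look like, here is a stripped-down class with a comparison function. The Chapter 2 definition of TimeOfDay is not reproduced in this section, so the data member names hours and minutes, the body of isLater(), and the helper laterThan() are assumptions made only for illustration.

```cpp
// Hypothetical sketch: the data members and isLater() are assumptions,
// not the Chapter 2 definition of TimeOfDay.
class TimeOfDay {
    int hours;      // 0 to 23
    int minutes;    // 0 to 59
public:
    void setTime(int h, int m) { hours = h; minutes = m; }

    // Comparison is a property of the type, so the class itself provides it.
    bool isLater(const TimeOfDay& other) const {
        if (hours != other.hours)
            return hours > other.hours;
        return minutes > other.minutes;
    }
};

// Helper for client code: true if time h1:m1 is later than h2:m2.
bool laterThan(int h1, int m1, int h2, int m2) {
    TimeOfDay a, b;
    a.setTime(h1, m1);
    b.setTime(h2, m2);
    return a.isLater(b);
}
```

The client code calls isLater() on the object; it never compares the integer components itself.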






Every C++ variable has to be defined by specifying the type of its values. In addition, type also

characterizes the values of constants, functions, and expressions. This means that you can combine

typed values into expressions that give other typed values as results, and these values can be used in

other expressions and so on.

In most cases, the type is denoted by an identifier, that is, the type has a name (e.g., int or

TimeOfDay). This is common and natural, but this is not the only way to define types. C++ allows

so-called anonymous types that do not have specific names. These types are not common.

Type names of built-in C++ types are reserved words: int, char, bool, float, double, and

void (actually, this is it, this is the whole list). In this list void denotes the absence of the value that

can be manipulated in an expression. We use it to indicate that further use of the value in other

expressions is not appropriate. For example, the function computeSquare() in the section

"Functions and Function Calls," in Chapter 2, returns the value that can be used in expressions, and

the function displayResults() in the same section cannot be used this way: It returns no value. If

you try to use it incorrectly, the compiler will tell you that this is an error.

int a, b;
a = computeSquare(x,y) * 5;          // this is legal C++
b = displayResults(PI*PI) * 5;       // this is an error



Other languages do not have this special "type" because they distinguish between functions (that

return values) and procedures (that do not return values). C++ inherited from C the function syntax

that doubles both as a function and as a procedure. Logically, the absence of the specified return

type should be interpreted as the absence of the return value; not so in C. To add insult to injury,

the absence of type specification in C denotes the integer type and requires a return statement that

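returns an integer value. Older, pre-standard C++ compilers implement a compromise. If you do not specify the return type, such a compiler does not go after you and does not demand that the function return an integer value (as the C compiler does); it assumes that you want the void return type. Standard C++ is stricter: a function defined without a return type is an error, so the behavior described here applies to legacy compilers only.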

displayResults(double y)             // C++: it is void
{
  cout << "In that world, pi square is " << y << endl;
  cout << "Have a nice day!" << endl;
}                                    // no error in C++



However, if you use this function as an operand in an expression, C++ assumes that you are using

an old C convention and want to return an integer. At run time, displayResults() silently returns

junk. As they say, the compiler "does not second-guess the programmer" and removes compile time

protection.

b = displayResults(PI*PI) * 5;       // not a syntax error






If you supply the return statement, the function with no return type is treated as if it returns an

integer value.

displayResults(double y)             // C++ assumes it is int
{
  cout << "In that world, pi square is " << y << endl;
  cout << "Have a nice day!" << endl;
  return 0;                          // no syntax error
}



The client code can use the return value as it sees fit.

b = displayResults(PI*PI) * 5;       // this is legitimate



The use of int as a default return type goes back to the days when most C functions were designed

to return values and saving the programmer three keystrokes was viewed as a nontrivial advantage.

Avoid this practice. If the return type is integer, say int. If a function returns no value, declare its return type as void.

ALERT

Always specify the return type of a function. If the function returns no value, specify type void. Do

not rely on C++ default.

The types defined by the program in addition to built-in C++ types are called user-defined types. I

do not like this terminology, because users do not define types. A user is a person or an

organization that uses the implemented system to achieve the stated objectives. It is the

programmer who defines the type composition and the name of the type, similar to the type

TimeOfDay in the section "Classes," in Chapter 2. This is why I prefer to call these types

programmer-defined types.

Although different types in C++ are of different sizes, there is nothing unusual for values of

different types to have the same size in memory. For different types, it is the interpretation of the

bit pattern that distinguishes the values. For example, the bit pattern 01000001 is interpreted as

value 65 if it is stored in an integer variable; the same bit pattern is interpreted as A if it is stored in

a variable of character type.
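A minimal sketch of this double interpretation, assuming an ASCII-compatible character set. The function names asNumber() and asCharacter() are made up for this illustration; the bit pattern 01000001 is written as hexadecimal 0x41.

```cpp
// The same 8-bit pattern, 01000001 (hexadecimal 41), interpreted two ways.
// Assumes an ASCII-compatible character set.

// Numeric interpretation of the bits.
int asNumber(unsigned char bits) { return bits; }

// Character interpretation of the same bits.
char asCharacter(unsigned char bits) { return static_cast<char>(bits); }
```

Calling asNumber(0x41) gives the integer 65, while asCharacter(0x41) gives the letter A: the bits are identical, only the type differs.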

In the old days, programmers had to know how to read binary numbers, octal numbers,

hexadecimal numbers, ASCII codes, EBCDIC codes, remember by heart the powers of 2 to the

16th power (sometimes to the 20th or even 32nd power), understand one's-complement and two's-complement representation for negative numbers, and whatnot. Today, most programmers do not

need that. Still, the computer hardware is built in sizes that are increments of 8 bits. A byte has 8




bits, a half-word has 16 bits, a word has 32 bits. On some machines, it is a word that has 16 bits,

and a double word has 32 bits. This is why it is a good idea to know at least the ranges of values

that can be stored in memory of different sizes.

So, 4 bits can represent 16 different combinations of bits (one hexadecimal digit). Usually, these 16

combinations are assigned to integer numbers from zero to 15. Similarly, 8 bits can represent 256

values (2 to the power of 8). These 256 bit combinations are assigned to integer numbers from zero

to 255. What if we want to represent both positive and negative numbers, not just positive? We still

have only 256 bit combinations at our disposal. The range from -128 to +128 would not do because

this range has 257 values, not 256. The common solution is to represent numbers from -128 to

+127.

Two bytes (16 bits) can represent 65,536 bit combinations (this magic number is 2 to the power of

16). For positive numbers, the range is from zero to 65,535. For signed values (positive and

negative numbers) the range is from -32,768 (minus 2 to the power of 15) to +32,767 (2 to the power of 15 minus 1). Similarly, 32 bits (four bytes) can represent 4,294,967,296 values. For signed numbers, four bytes cover the range from -2,147,483,648 (minus 2 to the power of 31) to +2,147,483,647.

This is probably all that you should know about binary numbers.
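The arithmetic above can be checked with a few one-line functions. This is only a sketch: the function names are invented for the illustration, it assumes two's-complement representation for the signed ranges, and it uses the long long type so that the 32-bit results fit.

```cpp
// Number of distinct bit combinations for a given width (bits must be below 63).
long long combinations(int bits) { return 1LL << bits; }

// Largest unsigned value: the combinations are numbered starting from zero.
long long unsignedMax(int bits) { return combinations(bits) - 1; }

// Two's-complement signed range: half of the combinations are negative.
long long signedMin(int bits) { return -combinations(bits - 1); }
long long signedMax(int bits) { return combinations(bits - 1) - 1; }
```

For example, combinations(8) is 256, and signedMin(8) and signedMax(8) give -128 and 127.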



Integral Types

On all computer architectures, the C++ integer type represents the most basic type. What does

"basic" mean? It simply means that the values of this type are always the fastest to operate on the

given platform. The keyword int is used to denote this type.

int cnt;



The size of int defines the range of values available for representation (2 to the power of the

number of bits). The industry is now shifting from 16-bit architectures to 32-bit architectures, but

both architectures are going to be used for some time. Most stationary installations will use 32-bit

computers, but embedded systems and communications systems will continue to use 16-bit

computers, and the number of these systems is going to grow as computers find their way into cars,

major appliances, or even toasters.

This means that programs written for one architecture might not run exactly the same way on

another architecture.

What happens if a value does not fit into an integer? The answer is: nothing much. C++ does not detect arithmetic overflow at run time. You want to add 1 to 32,767 on a 16-bit machine? Go ahead and do it. On typical two's-complement hardware, the result will be -32,768. You want to add another one? Go ahead and do it. The result will be -32,767. (Strictly speaking, the language leaves the result of signed overflow undefined, so a program should not rely on this wraparound.)
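Since the language does not catch the overflow, the program has to test for it before doing the addition. Here is one possible sketch of such a test; the function name additionOverflows() is an assumption, and INT_MAX and INT_MIN come from the climits header.

```cpp
#include <climits>

// Returns true if a + b would not fit in an int.
// The comparison is made before adding, so no overflow actually takes place.
bool additionOverflows(int a, int b) {
    if (b > 0) return a > INT_MAX - b;   // the sum would exceed INT_MAX
    if (b < 0) return a < INT_MIN - b;   // the sum would fall below INT_MIN
    return false;                        // adding zero never overflows
}
```

Client code would call additionOverflows(num, 1) before executing num = num + 1 and decide what to do when the answer is true.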




Listing 3.1 shows a program that I ran on a 16-bit platform (it was a 32-bit machine with a 16-bit compiler). The climits header file contains library constants for implementation-dependent numeric values for the given platform. The constant INT_MAX is one such value (32767). In this example, I am using the while loop similar to the one discussed in Chapter 2, and the iostream library. The output of this program is presented in Figure 3-1. The variable num happily goes around the clock and assumes negative values. Each element in the cout statement has its own output operator <<; even the separator (in double quotes) between the printed values of variables cnt and num.



Figure 3-1. Integer overflow does not terminate the program; it silently produces

incorrect results.



Example 3.1. Demonstration of integer overflow.

#include <iostream>
#include <climits>                   // INT_MAX
using namespace std;

int main(void)
{
  int num = INT_MAX - 2;
  int cnt = 0;
  cout << "Integer overflow in C++:" << endl;
  cout << "Incrementing from " << num << endl;
  while (cnt < 5)
  {
    num = num + 1;                   // goes past INT_MAX on the third pass
    cnt = cnt + 1;
    cout << cnt << "    " << num << endl;
  }
  cout << "Thank you for worrying about integer limits" << endl;
  return 0;
}



Earlier versions of C++ (and C) did not allow run-time values to be used to initialize variables; they

had to be computed at compile time. However, you could always initialize variables not only to a




specific value, but also to an expression (for example, INT_MAX-2 in Listing 3.1). In modern C++,

the initializing expression can be of arbitrary complexity. It can even include the run-time return

values of function calls. For example, this is legal in C++:

int a = computeSquare(x,y) * 5;      // this is legal C++



This is quite a feat from the point of view of compiler design. This is why older C and C++

compilers do not support this feature. Now see if you think that I am repeating myself. Did I not

say in the section, "Values and Their Types," that you can use the return value of a function in

computations?

a = computeSquare(x,y) * 5;          // legal in C and C++



Make sure you see the difference. The example from the section, "Values and Their Types"

demonstrates assignment. It is always possible in C++, C, or any other language. The example from

this section demonstrates initialization. Although the code is quite similar, they mean two different

things. Initialization allocates memory and sets its value. Assignment deals with the object

(variable) that is already allocated, has its address in memory (and probably some initial value at

that address), and the value at this address is being replaced. I mentioned that difference in Chapter

2, and you will see further implications later.
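For built-in types the difference is invisible, but it can be made visible with a small class that counts what happens to its objects. The sketch below is an illustration only: the Tracer type, its counters, and the function demo() are hypothetical names that do not appear elsewhere in this book.

```cpp
// Hypothetical tracer type: counts how its objects receive their values.
struct Tracer {
    static int initializations;      // calls to the constructor
    static int assignments;          // calls to the assignment operator
    int value;

    Tracer(int v) : value(v) { ++initializations; }

    Tracer& operator=(const Tracer& other) {
        value = other.value;
        ++assignments;
        return *this;
    }
};

int Tracer::initializations = 0;
int Tracer::assignments = 0;

void demo() {
    Tracer a = 5;    // initialization: the object is created with its value
    Tracer b(7);     // initialization again, with the function-call syntax
    a = b;           // assignment: a already exists, its value is replaced
}
```

After one call to demo(), the counters show two initializations and one assignment, even though the first and the third statements look almost alike.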

Despite this progress in compiler writing, C++ does not expect its compilers to be two-pass

compilers. They are all one-pass compilers and have no ability to see forward. This is why they

cannot use a value that is not defined yet even if it is defined on the next line. For example, this is

an error:

int a = b, b(5);                     // error in C++



Here, variable b cannot be used to initialize variable a. The inverse order is legitimate. (Notice the

syntax for initialization that is similar to a function call: It is not allowed in C but is acceptable in

C++.)

int b(5), a = b;                     // this is acceptable



Integer Type Qualifiers

C++ inherits from C a technique for fine-tuning integer ranges: the use of qualifiers. These are

keywords that change either the size of memory allocated for integers or the interpretation of the bit

pattern: signed, unsigned, short, and long.

The signed qualifier we have been using all along is a default, and it does not have to be specified.

This definition for the variable cnt means exactly the same thing as the previous one:




signed int cnt;                      // signed is default



The unsigned qualifier can be used for variables that cannot take on negative values (indices,

counters, tallies, inventory quantities, etc.). This qualifier does not change the size of the memory

allocated for the value (16 bits or 32 bits) but it changes the interpretation of the bit pattern. The

legal range of unsigned integers is not from -32,768 to +32,767 but from zero to 65,535 on a 16-bit machine, and from zero to 4,294,967,295 on a 32-bit machine. Listing 3.2 shows the previous

example where an unsigned integer is used instead of signed. The output of this version is

presented in Figure 3-2. We see that the problem disappears. Well, it disappears at this stage. Of

course, it will manifest itself at the upper range of unsigned integers, but it will manifest itself

differently. When the unsigned number overflows, its value silently goes back to zero rather than

to a large negative value. I am not sure this is much better.
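The wraparound at the upper end of the unsigned range can be sketched in a couple of lines. The function names addOne() and incrementWraps() are assumptions made for this illustration; UINT_MAX is the library constant for the largest unsigned int value.

```cpp
#include <climits>

// Unsigned arithmetic wraps around modulo 2 to the power of N,
// so adding 1 to the largest value gives zero.
unsigned int addOne(unsigned int n) { return n + 1u; }

// True when adding 1 to n would wrap around to zero.
bool incrementWraps(unsigned int n) { return n == UINT_MAX; }
```

Calling addOne(UINT_MAX) returns zero without any warning; incrementWraps() is the kind of check the program has to make for itself.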



Figure 3-2. For unsigned integer values, the overflow happens at larger values than for

plain integers with the same memory size.



Example 3.2. Demonstration of unsigned int type.



#include <iostream>
#include <climits>                   // INT_MAX
using namespace std;

int main(void)
{
  int unsigned num = INT_MAX - 2;    // same bits, unsigned interpretation
  int cnt = 0;
  cout << "Integer overflow in C++:" << endl;
  cout << "Incrementing from " << num << endl;
  while (cnt < 5)
  {
    num = num + 1;
    cnt = cnt + 1;
    cout << cnt << "    " << num << endl;
  }
  cout << "Thank you for worrying about integer limits" << endl;
  return 0;
}




Using unsigned numbers is a nice idea, not so much for extending the range of the values as for passing on to the maintainer the designer's knowledge that the value of the particular variable cannot be negative. On the other hand, if this intent somehow gets lost on the maintainer, and the unsigned variable is used for negative values, the results will be quite disastrous. Listing 3.3 shows the previous version of the program where the variable num is initialized to 2 and is unwittingly decremented in a loop. The output of the program is shown in Figure 3-3.



Figure 3-3. Unsigned variables cannot hold negative values; when decremented, they

assume large positive values without warning.



Example 3.3. Negative values in an unsigned variable.

#include <iostream>
using namespace std;

int main(void)
{
  int unsigned num = 2;
  int cnt = 0;
  cout << "Negative values in an unsigned variable" << endl;
  cout << "Count down starting with +1" << endl;
  while (cnt < 5)
  {
    num = num - 1;                   // wraps around after reaching zero
    cnt = cnt + 1;
    cout << cnt << "    " << num << endl;
  }
  cout << "Thank you for worrying about integer limits" << endl;
  return 0;
}
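One way to protect the loop of Listing 3.3 is to test the value before decrementing it. The following sketch shows the idea; the function names decrementOrZero() and countDown() are hypothetical, and stopping at zero is only one of several reasonable policies.

```cpp
// Decrements an unsigned value but stops at zero instead of wrapping around.
unsigned int decrementOrZero(unsigned int n) {
    return n > 0 ? n - 1 : 0;
}

// Applies the guarded decrement the given number of times,
// the way the loop in Listing 3.3 decrements num.
unsigned int countDown(unsigned int start, int times) {
    unsigned int num = start;
    for (int cnt = 0; cnt < times; ++cnt)
        num = decrementOrZero(num);
    return num;
}
```

With this guard, countDown(2, 5) ends at zero rather than at a huge positive number.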



Two qualifiers that control the amount of memory allocated to an integer are long and short:

int cnt;



short int short_cnt;



long int long_cnt;



The goal here is not only to provide a larger range for integers, but also to save space where it can

be saved. C++ programmers are supposed to be concerned with performance, both in terms of

execution time and in terms of space. Using signed integers (without qualifiers) provides the




fastest data type, using long integers protects from overflow at the expense of memory, and using

short integers allows the programmer to avoid wasting memory. For example, the variable cnt in

the previous examples changes from 0 to 5. Why should it be allocated 32 bits on a modern

machine? Actually, a byte would be more than enough. Using the short integer data type for this

variable might sound like a good option for a machine with scarce memory.

How important is it to use the short qualifier to save memory and the long qualifier to expand the

range of values? These size qualifiers make the program more complex. Many programmers use

them only if they do know that the problem of overflow (or memory scarcity) exists and they know

that the use of qualifiers resolves the problem (often neither one is the case). Otherwise, most

programmers use regular integers without qualifiers and do not worry about these issues. This is

especially true on modern 32-bit machines. The use of four bytes for regular integers protects the

program from early overflow. The abundance of memory makes savings from using short integer

irrelevant.

As is often the case with C++ features inherited from C, the situation is not exactly what you think.

Logically, a short integer should be allocated less memory than an integer, and a long integer

should be allocated more memory than an integer. The C (and C++) standard, however, requires

from the compiler designers only that the short int not be longer than the regular int and that the long int not be shorter than the regular int. This is not as confusing as it sounds. On 16-bit machines, both short int and int variables are allocated the same amount of memory, 16 bits, and long int variables are allocated 32 bits. On 32-bit machines, this is quite the opposite. How is it opposite? Simple: short int values are allocated 16 bits, and both int and long int are allocated

32 bits.

C++ has the sizeof operator that can be used to compute the size of data in bytes; its argument can

be either a variable name or a type name. For any platform, the following relation between returned

values of the sizeof operator holds.

sizeof(short int) <= sizeof(int) <= sizeof(long int)
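This relation can be verified on any given platform. In the sketch below, the function names are assumptions made for the illustration; the second function shows that sizeof accepts a variable name as well as a type name.

```cpp
// Checks the ordering of the integer sizes on the current platform.
bool sizesAreOrdered() {
    return sizeof(short int) <= sizeof(int) &&
           sizeof(int) <= sizeof(long int);
}

// sizeof applied to a variable gives the size of the variable's type.
bool sameSizeAsType() {
    long int long_cnt = 0;
    return sizeof(long_cnt) == sizeof(long int);
}
```

Both functions should return true whatever the actual sizes are.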



There is an interesting consequence of this design: on the platforms discussed here, short int and long int have the same size no matter whether the machine is 16 bits or 32 bits. On both, short int is 16 bits, and long int is 32 bits. The standard itself guarantees these sizes only as minimums (at least 16 bits for short int and at least 32 bits for long int), and on many 64-bit platforms long int is 64 bits. This is why those programmers who are concerned with the issues of portability do not use plain vanilla integers. They use short int for relatively small values and long int for all other values that might not fit into short int. These are usually the

programmers who design embedded and communications systems. In these systems, computer

memory is often at a premium because of size and price limitations, and the same code should be

able to run on multiple hardware platforms.






TIP

Integers are either 16 bits or 32 bits depending on the hardware; on these platforms, short integers are 16 bits, and long integers are 32 bits. Their use reduces the portability issue, but the language guarantees these sizes only as minimums.
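Where exact sizes matter, one alternative that this chapter does not use is the set of fixed-width type names from the cstdint header, available since C++11. The sketch below assumes a platform that provides std::int16_t and std::int32_t; the function names are invented for the illustration.

```cpp
#include <climits>
#include <cstdint>

// Width in bits of the exact-width types, computed from sizeof and CHAR_BIT.
int bitsInSmall() { return static_cast<int>(sizeof(std::int16_t) * CHAR_BIT); }
int bitsInLarge() { return static_cast<int>(sizeof(std::int32_t) * CHAR_BIT); }

// The largest values that fit into each of the two types.
std::int16_t largestSmall() { return 32767; }
std::int32_t largestLarge() { return 2147483647; }
```

Unlike short and long, these names state the size directly, so the code does not depend on what a particular compiler chooses for the qualifiers.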

Is it possible to combine the unsigned qualifier with the short and long qualifiers, as in these

examples?

unsigned short int short_cnt;



long unsigned int long_cnt;



Yes, it is possible. (Notice that the order of qualifiers does not matter.) This can be found, for

example, in hard disk controllers, where the size of the file or the number of cylinders requires

large integers that can never be negative. Still, for most applications, it is a good idea to avoid extra

complexity and use regular integers.

One more comment. At the beginning of this chapter, I mentioned the old rule that when the type

name is omitted the default type is integer. This rule applies to this situation too. When you use

long and short data types, there is no need to specify the keyword int.

int cnt;   short short_cnt;   long long_cnt;       // same meaning



Integer literal values can be represented as decimal, octal, and hexadecimal values. For example, decimal 64 can be represented as octal 100 or as hexadecimal 40. To avoid confusion, the integer literal starts with 0 to denote the octal system and with 0x (or 0X) to denote the hexadecimal system. So, 100 means 100 (decimal), but 0100 is octal and means 64 in decimal, and 0x100 is hexadecimal and means 256 in decimal.
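A short sketch that spells out the three notations; the function names are made up for the illustration.

```cpp
// The same digits, 100, written in three number systems.
int decimalValue() { return 100; }     // decimal 100
int octalValue()   { return 0100; }    // octal 100, that is, decimal 64
int hexValue()     { return 0x100; }   // hexadecimal 100, that is, decimal 256
```

The leading zero is easy to overlook: a literal that is padded with zeros for alignment silently becomes an octal number.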

Literal values are allocated memory pretty much in the same way as variables are; the only difference is that we cannot manipulate their addresses and hence cannot change the values that are stored there. A literal such as 63 has type int by default. To request a long or an unsigned literal, we can add a suffix in upper- or in lowercase: 63l or 63L for long, 63u or 63U for unsigned, and a combination such as 63ul or 63UL for unsigned long. There is no suffix for short. This is rarely of practical importance.



Characters

The character type is treated by C++ as just another kind of integer. Its size is 1 byte (8 bits). It can

represent any ASCII symbol: a letter, a digit, or a nonprintable control character. Here are examples

of definitions for character variables.

char c, ch;



char first, last;






There are no short and long qualifiers for characters. However, signed or unsigned qualifiers are

allowed for characters. Unfortunately, the default is not standardized; on some machines, the type

char means unsigned char; on other machines, the type char means signed char.

Why do you care? Usually, you do not, and this is why it has not been standardized. The difference

becomes important, however, when you treat char values as integers in computations. For

example, a signed char can contain the "end-of-file" library constant EOF whose value is defined

as -1; unsigned char can contain positive values only. So, if you try to put -1 into an unsigned

char, you will find there the code for 255, not -1. Since the char type can be signed or unsigned

implicitly, this can introduce portability issues.
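The effect can be sketched as follows, assuming 8-bit characters. The function names are assumptions for this illustration; std::numeric_limits<char>::is_signed from the limits header reports which default the platform uses.

```cpp
#include <limits>

// Storing -1 in an unsigned char wraps around to 255 (for 8-bit characters).
int storeInUnsignedChar(int value) {
    unsigned char uc = static_cast<unsigned char>(value);
    return uc;
}

// Storing -1 in a signed char preserves the value.
int storeInSignedChar(int value) {
    signed char sc = static_cast<signed char>(value);
    return sc;
}

// Reports whether plain char is signed on this platform.
bool plainCharIsSigned() { return std::numeric_limits<char>::is_signed; }
```

Spelling out signed char or unsigned char, as these functions do, removes the dependence on the platform default.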

As any variable, a char variable can be initialized at definition or assigned a value later. Small

integers can be used for initialization and assignment, and their values will be interpreted as

character codes. Character literals are enclosed in single quotes and can be characters, octal or

hexadecimal numeric values, or escape sequences. It is important not to confuse single quotes and

double quotes. Single quotes are used to denote character literals; double quotes are used to denote

string literals (sequences or arrays of characters).

char c = 'A',  ch = 65;              // both c and ch contain 'A'



This is an example of using a character literal in quotes and using a decimal literal number. Other

character representations start with the escape character '\'. The escape character is not treated as an

ordinary character; it is a signal to the compiler to treat what follows in a special way, for example,

as an octal or a hexadecimal value.

c = '\101';   ch = '\x41';           // octal and hex values for 'A'



Here, the quotes and the escape characters are not really necessary. You can use octal and hex

values directly, similar to the way the decimal value 65 was used above, by starting an octal literal

with a 0 and a hexadecimal literal with a 0x (or 0X).

c = 0101;   ch = 0x41;               // octal and hex values for 'A'



The escape characters are necessary only if these values are embedded in a string. Also, the use of

the escape character indicates to the maintenance programmer that we are dealing with characters

and not with numbers. The most common escape sequence is the new line character '\n'. Other

escape sequences are '\r' (carriage return), '\f' (form feed or new page), '\t' (tab character), '\v'

(vertical tab character), and '\a' (sound bell).
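The different notations can be put side by side in a small sketch. It assumes the ASCII character set, and the function names sameCharacter() and greeting() are invented for the illustration.

```cpp
// Four ways of writing the character 'A' (assuming the ASCII character set).
bool sameCharacter() {
    char c1 = 'A';       // character literal
    char c2 = 65;        // decimal character code
    char c3 = '\101';    // octal escape sequence
    char c4 = '\x41';    // hexadecimal escape sequence
    return c1 == c2 && c2 == c3 && c3 == c4;
}

// Inside a string, numeric codes can be written only as escape sequences.
const char* greeting() { return "\x41\tB\n"; }   // 'A', tab, 'B', new line
```

Note that an octal escape sequence takes at most three digits, and a hexadecimal one starts with \x rather than \0x.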

Since single and double quotes have a special significance in C++, we have to use the escape

character to represent them too, '\'' and '\"'. The same is true of the escape character itself. If you
