
C++ FAQ (part 12 of 14)

Archive-name: C++-faq/part12
Posting-Frequency: monthly
Last-modified: Jun 17, 2002
URL: http://www.parashift.com/c++-faq-lite/

AUTHOR: Marshall Cline / cline@parashift.com / 972-931-9470

COPYRIGHT: This posting is part of "C++ FAQ Lite."  The entire "C++ FAQ Lite"
document is Copyright(C)1991-2002 Marshall Cline, Ph.D., cline@parashift.com.
All rights reserved.  Copying is permitted only under designated situations.
For details, see section [1].

NO WARRANTY: THIS WORK IS PROVIDED ON AN "AS IS" BASIS.  THE AUTHOR PROVIDES NO
WARRANTY WHATSOEVER, EITHER EXPRESS OR IMPLIED, REGARDING THE WORK, INCLUDING
WARRANTIES WITH RESPECT TO ITS MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR
PURPOSE.

C++-FAQ-Lite != C++-FAQ-Book: This document, C++ FAQ Lite, is not the same as
the C++ FAQ Book.  The book (C++ FAQs, Cline and Lomow, Addison-Wesley) is 500%
larger than this document, and is available in bookstores.  For details, see
section [3].

==============================================================================

SECTION [28]: Newbie Questions / Answers


[28.1] What is this "newbie section" all about? [NEW!]

[Recently created (in 6/02).]

It's a randomly ordered collection containing a few questions newbies might
ask.
 * This section doesn't pretend to be organized.  Think of it as random.  In
   truth, think of it as a hurried, initial cut by a busy guy.
 * This section doesn't pretend to be complete.  Think of it as offering a
   little help to a few people.  It won't help everyone and it might not help
   you.

Hopefully someday I'll be able to improve this section, but for now, it is
incomplete and unorganized.  If that bothers you, my suggestion is to click
that little x on the extreme upper right of your browser window :-).

==============================================================================

[28.2] Where do I start? Why do I feel so confused, so stupid? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

Read the FAQ, especially the section on learning C++[27], read comp.lang.c++,
read books[27.4], plural.

But if everything still seems too hard, if you're feeling bombarded with
mysterious terms and concepts, if you're wondering how you'll ever grasp
anything, do this:

 1. Type in some C++ code from any of the sources listed above.

 2. Get it to compile and run.

 3. Repeat.

That's it.  Just practice and play.  Hopefully that will give you a foothold.

==============================================================================

[28.3] What are the criteria for choosing between short / int / long data
       types? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

Other related questions: If a short int is the same size as an int on my
particular implementation, why choose one or the other? If I start taking the
actual size in bytes of the variables into account, won't I be making my code
unportable (since the size in bytes may differ from implementation to
implementation)? Or should I simply go with sizes much larger than I actually
need, as a sort of safety buffer?

Answer: It's usually a good idea to write code that can be ported to a
different operating system and/or compiler.  After all, if you're successful at
what you do, someone else might want to use it somewhere else.  This can be a
little tricky with built-in types like int and short, since C++ doesn't give
guaranteed sizes.  However C++ does give you guaranteed minimum sizes, and that
will usually be all you need to know.

C++ guarantees a char is exactly one byte[25.1], short is at least 2 bytes, int
is at least 2 bytes, and long is at least 4 bytes.  It also guarantees the
unsigned version of each of these is the same size as the original, for
example, sizeof(unsigned short) == sizeof(short).

When writing portable code, you shouldn't make additional assumptions about
these sizes.  For example, don't assume int has 4 bytes.  If you have an
integral variable that needs at least 4 bytes, use a long or unsigned long even
if sizeof(int) == 4 on your particular implementation.  On the other hand, if
you have an integral variable quantity that will always fit within 2 bytes and
if you want to minimize the use of data memory, use a short or unsigned short
even if you know sizeof(int) == 2 on your particular implementation.

Note that there are some subtle tradeoffs here.  In some cases, your computer
might be able to manipulate smaller things faster than bigger things, but in
other cases it is exactly the opposite: int arithmetic might be faster than
short arithmetic on some implementations.  Another tradeoff is data-space
against code-space: int arithmetic might generate less binary code than short
arithmetic on some implementations.  Don't make simplistic assumptions.  Just
because a particular variable can be declared as short doesn't necessarily mean
it should, even if you're trying to save space.

==============================================================================

[28.4] What the heck is a const variable? Isn't that a contradiction in terms?
       [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

If it bothers you, call it a "const identifier" instead.

The main issue is to figure out what it is; we can figure out what to call it
later.  For example, consider the symbol max in the following function:

 void f()
 {
   const int max = 107;
   ...
   float array[max];
   ...
 }

It doesn't matter whether you call max a const variable or a const identifier.
What matters is that you realize it is like a normal variable in some ways
(e.g., you can take its address or pass it by const-reference), but it is
unlike a normal variable in that you can't change its value.

Here is another even more common example:

 class Fred {
 public:
   ...
 private:
   static const int max_ = 107;
   ...
 };

In this example, you would need to add the line const int Fred::max_; in
exactly one .cpp file, typically in Fred.cpp.

It is generally considered good programming practice to give each "magic
number" (like 107) a symbolic name and use that name rather than the raw magic
number[28.9].

==============================================================================

[28.5] Why would I use a const variable / const identifier as opposed to
       #define? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

const identifiers are often better than #define because:
 * they obey the language's scoping rules
 * you can see them in the debugger
 * you can take their address if you need to
 * you can pass them by const-reference if you need to
 * they don't create new "keywords" in your program.

In short, const identifiers act like they're part of the language because they
are part of the language.  The preprocessor can be thought of as a language
layered on top of C++.  You can imagine that the preprocessor runs as a
separate pass through your code, which would mean your original source code
would be seen only by the preprocessor, not by the C++ compiler itself.  In
other words, you can imagine the preprocessor sees your original source code
and replaces all #define symbols with their values, then the C++ compiler
proper sees the modified source code after the original symbols got replaced by
the preprocessor.

There are cases where #define is needed, but you should generally avoid it when
you have the choice.  You should evaluate whether to use const vs. #define
based on business value: time, money, risk.  In other words, one size does not
fit all.  Most of the time you'll use const rather than #define for constants,
but sometimes you'll use #define.  But please remember to wash your hands
afterwards.

==============================================================================

[28.6] Are you saying that the preprocessor is evil? [NEW!]

[Recently created (in 6/02).]

Yes, that's exactly what I'm saying: the preprocessor is evil[6.14].

Every #define macro effectively creates a new keyword in every source file and
every scope until that symbol is #undefd.  The preprocessor lets you create a
#define symbol that is always replaced independent of the {...} scope where
that symbol appears.

Sometimes we need the preprocessor, such as the #ifndef/#define wrapper within
each header file, but it should be avoided when you can.  "Evil" doesn't mean
"never use."[6.14] You will use evil things sometimes, particularly when they
are "the lesser of two evils." But they're still evil :-)

==============================================================================

[28.7] What is the "standard library"? What is included / excluded from it?
       [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

Most (not all) implementations have a "standard include" directory, sometimes
directories plural.  If your implementation is like that, the headers in the
standard library are probably a subset of the files in those directories.  For
example, iostream and string are part of the standard library, as is cstring
and cstdio.  There are a bunch of .h files that are also part of the standard
library, but not every .h file in those directories is part of the standard
library.  For example, stdio.h is but windows.h is not.

You include headers from the standard library like this:

 #include <iostream>

 int main()
 {
   std::cout << "Hello world!\n";
   return 0;
 }

==============================================================================

[28.8] How should I lay out my code? When should I use spaces, tabs, and/or
       newlines in my code? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

The short answer is: Just like the rest of your team.  In other words, the team
should use a consistent approach to whitespace, but otherwise please don't
waste a lot of time worrying about it.

Here are a few details:

There is no universally accepted coding standard when it comes to whitespace.
There are a few popular whitespace standards, such as the "one true brace"
style, but there is a lot of contention over certain aspects of any given
coding standard.

Most whitespace standards agree on a few points, such as putting a space around
infix operators like x * y or a - b.  Most (not all) whitespace standards do
not put spaces around the [ or ] in a[i], and similar comments for ( and ) in
f(x).  However there is a great deal of contention over vertical whitespace,
particularly when it comes to { and }.  For example, here are a few of the many
ways to lay out if (foo()) { bar(); baz(); }:

 if (foo()) {
   bar();
   baz();
 }

 if (foo())
 {
   bar();
   baz();
 }

 if (foo())
   {
     bar();
     baz();
   }

 if (foo())
   {
   bar();
   baz();
   }

 if (foo()) {
   bar();
   baz();
   }

...and others...

IMPORTANT: Do NOT email me with reasons your whitespace approach is better than
the others.  I don't care.  Plus I won't believe you.  There is no objective
standard of "better" when it comes to whitespace so your opinion is just that:
your opinion.  If you write me an email in spite of this paragraph, I will
consider you to be a hopeless geek who focuses on nits.  Don't waste your time
worrying about whitespace: as long as your team uses a consistent whitespace
style, get on with your life and worry about more important things.

For example, things you should be worried about include design issues like when
ABCs[22.3] should be used, whether inheritance should be an implementation or
specification technique, what testing and inspection strategies should be used,
whether interfaces should uniformly have a get() and/or set() member function
for each data member, whether interfaces should be designed from the outside-in
or the inside-out, whether errors should be handled by try/catch/throw or by
return
codes, etc.  Read the FAQ for some opinions on those important questions, but
please don't waste your time arguing over whitespace.  As long as the team is
using a consistent whitespace strategy, drop it.

==============================================================================

[28.9] Is it okay if a lot of numbers appear in my code? [NEW!]

[Recently created (in 6/02).]

Probably not.

In many (not all) cases, it's best to name your numbers so each number appears
only once in your code.  That way, when the number changes there will only be
one place in the code that has to change.

For example, suppose your program is working with shipping crates.  The weight
of an empty crate is 5.7.  The expression 5.7 + contentsWeight probably means
the weight of the crate including its contents, meaning the number 5.7 probably
appears many times in the software.  All these occurrences of the number 5.7
will
be difficult to find and change when (not if) somebody changes the style of
crates used in this application.  The solution is to make sure the value 5.7
appears exactly once, usually as the initializer for a const identifier.
Typically this will be something like const double crateWeight = 5.7;.  After
that, 5.7 + contentsWeight would be replaced by crateWeight + contentsWeight.

Now that's the general rule of thumb.  But unfortunately there is some fine
print.

Some people believe one should never have numeric literals scattered in the
code.  They believe all numeric values should be named in a manner similar to
that described above.  That rule, however noble in intent, just doesn't work
very well in practice.  It is too tedious for people to follow, and ultimately
it costs companies more than it saves them.  Remember: the goal of all
programming rules is to reduce time, cost and risk.  If a rule actually makes
things worse, it is a bad rule, period.

A more practical rule is to focus on those values that are likely to change.
For example, if a numeric literal is likely to change, it should appear only
once in the software, usually as the initializer of a const identifier.  This
rule lets unchanging values, such as some occurrences of 0, 1, -1, etc., get
coded directly in the software so programmers don't have to search for the one
true definition of one or zero.  In other words, if a programmer wants to loop
over the indices of a vector, he can simply write
for (int i = 0; i < v.size(); ++i).  The "extremist" rule described earlier
would require the programmer to poke around asking if anybody else has defined
a const identifier initialized to 0, and if not, to define his own
const int zero = 0; then replace the loop with
for (int i = zero; i < v.size(); ++i).  This is all a waste of time since the
loop will always start with 0.  It adds cost without adding any value to
compensate for that cost.

Obviously people might argue over exactly which values are "likely to change,"
but that kind of judgment is why you get paid the big bucks: do your job and
make a decision.  Some people are so afraid of making a wrong decision that
they'll adopt a one-size-fits-all rule such as "give a name to every number."
But if you adopt rules like that, you're guaranteed to have made the wrong
decision: those rules cost your company more than they save.  They are bad
rules.

The choice is simple: use a flexible rule even though you might make a wrong
decision, or use a one-size-fits-all rule and be guaranteed to make a wrong
decision.

There is one more piece of fine print: where the const identifier should be
defined.  There are three typical cases:
 * If the const identifier is used only within a single function, it can be
   local to that function.
 * If the const identifier is used throughout a class and no where else, it can
   be static within the private part of that class.
 * If the const identifier is used in numerous classes, it can be static within
   the public part of the most appropriate class, or perhaps private in that
   class with a public static access method.

As a last resort, make it static within a namespace or perhaps put it in the
unnamed namespace.  Try very hard to avoid using #define since the preprocessor
is evil[28.6].  If you absolutely must use #define, wash your hands when you're
done.  And please ask some friends if they know of a better alternative.

(As used throughout the FAQ, "evil" doesn't mean "never use it."[6.14] There
are times when you will use something that is "evil" since it will be, in that
particular case, "the lesser of two evils.")

==============================================================================

[28.10] What's the point of the L, U and f suffixes on numeric literals? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

You should use these suffixes when you need to force the compiler to treat the
numeric literal as if it were the specified type.  For example, if x is of type
float, the expression x + 5.7 is of type double: it first promotes the value of
x to a double, then performs the arithmetic using double-precision
instructions.  If that is what you want, fine; but if you really wanted it to
do the arithmetic using single-precision instructions, you can change that code
to x + 5.7f.  Note: it is even better to "name" your numeric literals,
particularly those that are likely to change[28.9].  That would require you to
say x + crateWeight where crateWeight is a const float that is initialized to
5.7f.

The U suffix is similar.  It's probably a good idea to use unsigned integers
for variables that are always >= 0.  For example, if a variable represents an
index into an array, that variable would typically be declared as an unsigned.
The main reason for this is it requires less code, at least if you are careful
to check your ranges.  For example, to check if a variable is both >= 0 and <
max requires two tests if everything is signed: if (n >= 0 && n < max), but can
be done with a single comparison if everything is unsigned: if (n < max).

If you end up using unsigned variables, it is generally a good idea to force
your numeric literals to also be unsigned.  That makes it easier to see that
the compiler will generate "unsigned arithmetic" instructions.  For example:
if (n < 256U) or if ((n & 255u) < 32u).  Mixing signed and unsigned values in a
single arithmetic expression is often confusing for programmers -- the compiler
doesn't always do what you expect it to do.

The L suffix is not as common, but it is occasionally used for similar reasons
as above: to make it obvious that the compiler is using long arithmetic.

The bottom line is this: it is a good discipline for programmers to force all
numeric operands to be of the right type, as opposed to relying on the C++
rules for promoting/demoting numeric expressions.  For example, if x is of type
int and y is of type unsigned, it is a good idea to change x + y so the next
programmer knows whether you intended to use unsigned arithmetic, e.g.,
unsigned(x) + y, or signed arithmetic: x + int(y).  The other possibility is
long arithmetic: long(x) + long(y).  By using those casts, the code is more
explicit and that's good in this case, since a lot of programmers don't know
all the rules for implicit promotions.

==============================================================================

[28.11] I can understand the and (&&) and or (||) operators, but what's the
        purpose of the not (!) operator? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

Some people are confused about the ! operator.  For example, they think that
!true is the same as false, or that !(a < b) is the same as a >= b[28.12], so
in both cases the ! operator doesn't seem to add anything.

Answer: The ! operator is useful in boolean expressions, such as occur in an if
or
while statement.  For example, let's assume A and B are boolean expressions,
perhaps simple method-calls that return a bool.  There are all sorts of ways to
combine these two expressions:

 if ( A &&  B) ...
 if (!A &&  B) ...
 if ( A && !B) ...
 if (!A && !B) ...
 if (!( A &&  B)) ...
 if (!(!A &&  B)) ...
 if (!( A && !B)) ...
 if (!(!A && !B)) ...

Along with a similar group formed using the || operator.

Note: boolean algebra can be used to transform each of the &&-versions into an
equivalent ||-version, so from a truth-table standpoint there are only 8
logically distinct if statements.  However, since readability is so important
in software, programmers should consider both the &&-version and the logically
equivalent ||-version.  For example, programmers should choose between !A && !B
and !(A || B) based on which one is more obvious to whoever will be maintaining
the code.  In that sense there really are 16 different choices.

The point of all this is simple: the ! operator is quite useful in boolean
expressions.  Sometimes it is used for readability, and sometimes it is used
because expressions like !(a < b) actually are not[28.12] equivalent to a >= b
in spite of what your grade school math teacher told you.

==============================================================================

[28.12] Is !(a < b) logically the same as a >= b? [NEW!]

[Recently created (in 6/02).]

No!

Despite what your grade school math teacher taught you, these equivalences
don't always work in software, especially with floating point expressions or
user-defined types.

Example: if a is a floating point NaN[28.13], then both a < b and a >= b will
be false.  That means !(a < b) will be true and a >= b will be false.

Example: if a is an object of class Foo that has overloaded operator< and
operator>=, then it is up to the creator of class Foo if these operators will
have opposite semantics.  They probably should have opposite semantics, but
that's up to whoever wrote class Foo.

==============================================================================

[28.13] What is this NaN thing? [NEW!]

[Recently created (in 6/02).]

NaN means "not a number," and is used for floating point operations.

There are lots of floating point operations that don't make sense, such as
dividing by zero, taking the log of zero or a negative number, taking the
square root of a negative number, etc.  Depending on your compiler, some of
these operations may produce special floating point values such as infinity
(with distinct values for positive vs. negative infinity) and the not a number
value, NaN.

If your compiler produces a NaN, it has the unusual property that it is not
equal to any value, including itself.  For example, if a is NaN, then a == a is
false.  That is the usual way to check if you are dealing with a NaN:

 void funct(double x)
 {
   if (x == x) {
     // x is a normal value
     ...
   } else {
     // x is NaN
     ...
   }
 }

Similarly, if a is NaN and b is some arbitrary floating point value, a will be
neither less than, equal to, nor greater than b.  In other words, a < b,
a <= b, a > b, a >= b, and a == b will all return false.

==============================================================================

[28.14] What is the type of an enumeration such as enum Color? Is it of type
        int? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

An enumeration such as enum Color { red, white, blue }; is its own type.  It is
not of type int.

When you create an object of an enumeration type, e.g., Color x;, we say that
the object x is of type Color.  Object x isn't of type "enumeration," and it's
not of type int.

An expression of an enumeration type can be converted to a temporary int.  An
analogy may help here.  An expression of type float can be converted to a
temporary double, but that doesn't mean float is a subtype of double.  For
example, after the declaration float y;, we say that y is of type float, and
the expression y can be converted to a temporary double.  When that happens, a
brand new, temporary double is created by copying something out of y.  In the
same way, a Color object such as x can be converted to a temporary int, in
which case a brand new, temporary int is created by copying something out of x.
(Note: the only purpose of the float / double analogy in this paragraph is to
help explain how expressions of an enumeration type can be converted to
temporary ints; do not try to use that analogy to imply any other behavior!)

The above conversion is very different from a subtype relationship, such as the
relationship between derived class Car and its base class Vehicle.  For
example, an object of class Car, such as Car z;, actually is an object of class
Vehicle, therefore you can bind a Vehicle& to that object, e.g.,
Vehicle& v = z;.  Unlike the previous paragraph, the object z is not copied to
a temporary; reference v binds to z itself.  So we say an object of class Car
is a Vehicle, but an object of class "Color" simply can be copied/converted
into a temporary int.  Big difference.

Final note, especially for C programmers: the C++ compiler will not
automatically convert an int expression to a temporary Color[28.15].  Since
that sort of conversion is unsafe, it requires a cast, e.g.,
Color x = Color(2);.

==============================================================================

[28.15] If an enumeration type is distinct from any other type, what good is
        it? What can you do with it? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

Let's consider this enumeration type: enum Color { red, white, blue };.

The best way to look at this (C programmers: hang on to your seats!!) is that
the values of this type are red, white, and blue, as opposed to merely thinking
of those names as constant int values.  The C++ compiler provides an automatic
conversion from Color to int, and the converted values will be, in this case,
0, 1, and 2 respectively.  But you shouldn't think of blue as a fancy name for
2.  blue is of type Color and there is an automatic conversion from blue to 2,
but the inverse conversion, from int to Color, is not provided automatically by
the C++ compiler.

Here is an example that illustrates the conversion from Color to int:

 enum Color { red, white, blue };

 void f()
 {
   int n;
   n = red;    // n will now have value 0
   n = white;  // n will now have value 1
   n = blue;   // n will now have value 2
 }

The following example also demonstrates the conversion from Color to int:

 void f()
 {
   Color x = red;
   Color y = white;
   Color z = blue;

   int n;
   n = x;   // n will now have value 0
   n = y;   // n will now have value 1
   n = z;   // n will now have value 2
 }

However the inverse conversion, from int to Color, is not automatically
provided by the C++ compiler:

 void f()
 {
   Color x;
   x = blue;  // okay: change x to blue
   x = 2;     // compile-time error: can't convert int to Color
 }

The last line above shows that enumeration types are not ints in disguise.  You
can think of them as int types if you want to, but if you do, you must remember
that the C++ compiler will not implicitly convert an int to a Color.  If you
really want that, you can use a cast:

 void f()
 {
   Color x;
   x = red;      // okay: x will now have the value red
   x = Color(1); // okay: x will now have the value white
   x = Color(2); // okay: x will now have the value blue
   x = 2;        // compile-time error: can't convert int to Color
 }

There are other ways that enumeration types are unlike int.  For example,
enumeration types don't have a ++ operator:

 void f()
 {
   int n = red;    // n will now have value 0
   Color x = red;  // x will now have value red

   n++;   // okay: n will now have value 1
   x++;   // compile-time error: can't ++ an enumeration
 }

==============================================================================

SECTION [29]: Learning C++ if you already know Smalltalk


[29.1] What's the difference between C++ and Smalltalk?

Both fully support the OO paradigm.  Neither is categorically and universally
"better" than the other[6.4].  But there are differences.  The most important
differences are:
 * Static typing vs. dynamic typing[29.2]
 * Whether inheritance must be used only for subtyping[29.5]
 * Value vs. reference semantics[30]

Note: Many new C++ programmers come from a Smalltalk background.  If that's
you, this section will tell you the most important things you need to know to
make
your transition.  Please don't get the notion that either language is somehow
"inferior" or "bad"[6.4], or that this section is promoting one language over
the other (I am not a language bigot; I serve on both the ANSI C++ and ANSI
Smalltalk standardization committees[6.11]).  Instead, this section is designed
to help you understand (and embrace!) the differences.

==============================================================================

[29.2] What is "static typing," and how is it similar/dissimilar to Smalltalk?

Static typing says the compiler checks the type safety of every operation
statically (at compile-time), rather than generating code that checks things
at run-time.  For example, with static typing, the signature matching
things at run-time.  For example, with static typing, the signature matching
for function arguments is checked at compile time, not at run-time.  An
improper match is flagged as an error by the compiler, not by the run-time
system.

In OO code, the most common "typing mismatch" is invoking a member function
against an object which isn't prepared to handle the operation.  E.g., if class
Fred has member function f() but not g(), and fred is an instance of class
Fred, then fred.f() is legal and fred.g() is illegal.  C++ (statically typed)
catches the error at compile time, and Smalltalk (dynamically typed) catches
the error at run-time.  (Technically speaking, C++ is like Pascal --pseudo
statically typed-- since pointer casts and unions can be used to violate the
typing system; which reminds me: use pointer casts[26.10] and unions only as
often as you use gotos).

==============================================================================

[29.3] Which is a better fit for C++: "static typing" or "dynamic typing"?

[For context, please read the previous FAQ[29.2]].

If you want to use C++ most effectively, use it as a statically typed language.

C++ is flexible enough that you can (via pointer casts, unions, and #define
macros) make it "look" like Smalltalk.  But don't.  Which reminds me: try to
avoid #define: it is evil[6.14] in 4 different ways: evil#1[9.3], evil#2[36.2],
evil#3[36.3], and evil#4[36.4].

There are places where pointer casts and unions are necessary and even
wholesome, but they should be used carefully and sparingly.  A pointer cast
tells the compiler to believe you.  An incorrect pointer cast might corrupt
your heap, scribble into memory owned by other objects, call nonexistent member
functions, and cause general failures.  It's not a pretty sight[26.10].  If you
avoid these and related constructs, you can make your C++ code both safer and
faster, since anything that can be checked at compile time is something that
doesn't have to be done at run-time.

If you're interested in using a pointer cast, use the new style pointer casts.
The most common example of these is to change old-style pointer casts such as
(X*)p into new-style dynamic casts such as dynamic_cast<X*>(p), where p is a
pointer and X is a type.  In addition to dynamic_cast, there is static_cast and
const_cast, but dynamic_cast is the one that simulates most of the advantages
of dynamic typing (the other is the typeid() construct; for example,
typeid(*p).name() will return the name of the type of *p).

==============================================================================

[29.4] How do you use inheritance in C++, and is that different from Smalltalk?

Some people believe that the purpose of inheritance is code reuse.  In C++,
this is wrong.  Stated plainly, "inheritance is not for code reuse."

The purpose of inheritance in C++ is to express interface compliance
(subtyping), not to get code reuse.  In C++, code reuse usually comes via
composition rather than via inheritance.  In other words, inheritance is mainly
a specification technique rather than an implementation technique.

This is a major difference from Smalltalk, where there is only one form of
inheritance (C++ provides private inheritance to mean "share the code but don't
conform to the interface", and public inheritance to mean "kind-of").  The
Smalltalk language proper (as opposed to coding practice) allows you to have
the effect of "hiding" an inherited method by providing an override that calls
the "does not understand" method.  Furthermore Smalltalk allows a conceptual
"is-a" relationship to exist apart from the inheritance hierarchy (subtypes
don't have to be derived classes; e.g., you can make something that is-a Stack
yet doesn't inherit from class Stack).

In contrast, C++ is more restrictive about inheritance: there's no way to make
a "conceptual is-a" relationship without using inheritance (the C++ work-around
is to separate interface from implementation via ABCs[22.3]).  The C++ compiler
exploits the added semantic information associated with public inheritance to
provide static typing.

==============================================================================

[29.5] What are the practical consequences of differences in Smalltalk/C++
       inheritance?

[For context, please read the previous FAQ[29.4]].

Smalltalk lets you make a subtype that isn't a derived class, and allows you to
make a derived class that isn't a subtype.  This allows Smalltalk programmers
to be very carefree in putting data (bits, representation, data structure) into
a class (e.g., you might put a linked list into class Stack).  After all, if
someone wants an array-based-Stack, they don't have to inherit from Stack; they
could inherit such a class from Array if desired, even though an
ArrayBasedStack is not a kind-of Array!

In C++, you can't be nearly as carefree.  Only mechanism (member function
code), not representation (data bits), can be overridden in derived classes.
Therefore you're usually better off not putting the data structure in a class.
This leads to a stronger reliance on abstract base classes[22.3].

I like to think of the difference between an ATV and a Maserati.  An ATV (all
terrain vehicle) is more fun, since you can "play around" by driving through
fields, streams, sidewalks, and the like.  A Maserati, on the other hand, gets
you there faster, but it forces you to stay on the road.  My advice to C++
programmers is simple: stay on the road.  Even if you're one of those people
who like the "expressive freedom" to drive through the bushes, don't do it in
C++; it's not a good fit.

==============================================================================

SECTION [30]: Reference and value semantics


[30.1] What is value and/or reference semantics, and which is best in C++?

With reference semantics, assignment is a pointer-copy (i.e., a reference).
Value (or "copy") semantics mean assignment copies the value, not just the
pointer.  C++ gives you the choice: use the assignment operator to copy the
value (copy/value semantics), or use a pointer-copy to copy a pointer
(reference semantics).  C++ allows you to override the assignment operator to
do anything your heart desires; however, the default (and most common) choice
is to copy the value.

Pros of reference semantics: flexibility and dynamic binding (you get dynamic
binding in C++ only when you pass by pointer or pass by reference, not when you
pass by value).

Pros of value semantics: speed.  "Speed" seems like an odd benefit for a
feature that requires an object (vs. a pointer) to be copied, but the fact of
the matter is that one usually accesses an object more often than one copies
it, so the cost of the occasional copies is (usually) more than offset by the
benefit of having an actual object rather than a pointer to an object.

There are three cases when you have an actual object as opposed to a pointer to
an object: local objects, global/static objects, and fully contained member
objects in a class.  The most important of these is the last ("composition").

More info about copy-vs-reference semantics is given in the next FAQs.  Please
read them all to get a balanced perspective.  The first few have intentionally
been slanted toward value semantics, so if you only read the first few of the
following FAQs, you'll get a warped perspective.

Assignment has other issues (e.g., shallow vs. deep copy) which are not covered
here.

==============================================================================

[30.2] What is "virtual data," and how-can / why-would I use it in C++?

virtual data allows a derived class to change the exact class of a base class's
member object.  virtual data isn't strictly "supported" by C++, however it can
be simulated in C++.  It ain't pretty, but it works.

To simulate virtual data in C++, the base class must have a pointer to the
member object, and the derived class must provide a new object to be pointed to
by the base class's pointer.  The base class would also have one or more normal
constructors that provide their own referent (again via new), and the base
class's destructor would delete the referent.

For example, class Stack might have an Array member object (using a pointer),
and derived class StretchableStack might override the base class member data
from Array to StretchableArray.  For this to work, StretchableArray would have
to inherit from Array, so Stack would have an Array*.  Stack's normal
constructors would initialize this Array* with a new Array, but Stack would
also have a (possibly protected) constructor that would accept an Array* from a
derived class.  StretchableStack's constructor would provide a
new StretchableArray to this special constructor.

Pros:
 * Easier implementation of StretchableStack (most of the code is inherited)
 * Users can pass a StretchableStack as a kind-of Stack

Cons:
 * Adds an extra layer of indirection to access the Array
 * Adds some extra freestore allocation overhead (both new and delete)
 * Adds some extra dynamic binding overhead (reason given in next FAQ)

In other words, we succeeded in making our job easier as the implementer of
StretchableStack, but all our users pay for it[30.5].  Unfortunately the extra
overhead was imposed on both users of StretchableStack and on users of Stack.

Please read the rest of this section.  (You will not get a balanced perspective
without the others.)

==============================================================================

[30.3] What's the difference between virtual data and dynamic data?

The easiest way to see the distinction is by an analogy with virtual
functions[20]: A virtual member function means the declaration (signature) must
stay the same in derived classes, but the definition (body) can be overridden.
The overriddenness of an inherited member function is a static property of the
derived class; it doesn't change dynamically throughout the life of any
particular object, nor is it possible for distinct objects of the derived class
to have distinct definitions of the member function.

Now go back and re-read the previous paragraph, but make these substitutions:
 * "member function" --> "member object"
 * "signature" --> "type"
 * "body" --> "exact class"

After this, you'll have a working definition of virtual data.

Another way to look at this is to distinguish "per-object" member functions
from "dynamic" member functions.  A "per-object" member function is a member
function that is potentially different in any given instance of an object, and
could be implemented by burying a function pointer in the object; this pointer
could be const, since the pointer will never be changed throughout the object's
life.  A "dynamic" member function is a member function that will change
dynamically over time; this could also be implemented by a function pointer,
but the function pointer would not be const.

Extending the analogy, this gives us three distinct concepts for data members:
 * virtual data: the definition (class) of the member object is overridable in
   derived classes provided its declaration ("type") remains the same, and this
   overriddenness is a static property of the derived class
 * per-object-data: any given object of a class can instantiate a different
   conformal (same type) member object upon initialization (usually a "wrapper"
   object), and the exact class of the member object is a static property of
   the object that wraps it
 * dynamic-data: the member object's exact class can change dynamically over
   time

The reason they all look so much the same is that none of this is "supported"
in C++.  It's all merely "allowed," and in this case, the mechanism for faking
each of these is the same: a pointer to a (probably abstract) base class.  In a
language that made these "first class" abstraction mechanisms, the difference
would be more striking, since they'd each have a different syntactic variant.

==============================================================================

[30.4] Should I normally use pointers to freestore allocated objects for my
       data members, or should I use "composition"?

Composition.

Your member objects should normally be "contained" in the composite object (but
not always; "wrapper" objects are a good example of where you want a
pointer/reference; also the N-to-1-uses-a relationship needs something like a
pointer/reference).

There are three reasons why fully contained member objects ("composition")
have better performance than pointers to freestore-allocated member objects:
 * Extra layer of indirection every time you need to access the member object
 * Extra freestore allocations (new in constructor, delete in destructor)
 * Extra dynamic binding (reason given below)

==============================================================================

[30.5] What are the relative costs of the 3 performance hits associated with
       allocating member objects from the freestore?

The three performance hits are enumerated in the previous FAQ:
 * By itself, an extra layer of indirection is small potatoes
 * Freestore allocations can be a performance issue (the performance of the
   typical implementation of malloc() degrades when there are many allocations;
   OO software can easily become "freestore bound" unless you're careful)
 * The extra dynamic binding comes from having a pointer rather than an object.
   Whenever the C++ compiler can know an object's exact class, virtual[20]
   function calls can be statically bound, which allows inlining.  Inlining
   allows zillions (would you believe half a dozen :-) optimization
   opportunities such as procedural integration, register lifetime issues, etc.
   The C++ compiler can know an object's exact class in three circumstances:
   local variables, global/static variables, and fully-contained member objects

Thus fully-contained member objects allow significant optimizations that
wouldn't be possible under the "member objects-by-pointer" approach.  This is
the main reason that languages which enforce reference-semantics have
"inherent" performance challenges.

Note: Please read the next three FAQs to get a balanced perspective!

==============================================================================

[30.6] Are "inline virtual" member functions ever actually "inlined"?

Occasionally...

When the object is referenced via a pointer or a reference, a call to a
virtual[20] function cannot be inlined, since the call must be resolved
dynamically.  Reason: the compiler can't know which actual code to call until
run-time (i.e., dynamically), since the code may be from a derived class that
was created after the caller was compiled.

Therefore the only time an inline virtual call can be inlined is when the
compiler knows the "exact class" of the object which is the target of the
virtual function call.  This can happen only when the compiler has an actual
object rather than a pointer or reference to an object.  I.e., either with a
local object, a global/static object, or a fully contained object inside a
composite.

Note that the difference between inlining and non-inlining is normally much
more significant than the difference between a regular function call and a
virtual function call.  For example, the difference between a regular function
call and a virtual function call is often just two extra memory references, but
the difference between an inline function and a non-inline function can be as
much as an order of magnitude (for zillions of calls to insignificant member
functions, loss of inlining virtual functions can result in 25X speed
degradation! [Doug Lea, "Customization in C++," proc Usenix C++ 1990]).

A practical consequence of this insight: don't get bogged down in the endless
debates (or sales tactics!) of compiler/language vendors who compare the cost
of a virtual function call on their language/compiler with the same on another
language/compiler.  Such comparisons are largely meaningless when compared with
the ability of the language/compiler to "inline expand" member function calls.
I.e., many language implementation vendors make a big stink about how good
their dispatch strategy is, but if these implementations don't inline member
function calls, the overall system performance would be poor, since it is
inlining --not dispatching-- that has the greatest performance impact.

Note: Please read the next two FAQs to see the other side of this coin!

==============================================================================

[30.7] Sounds like I should never use reference semantics, right?

Wrong.

Reference semantics are A Good Thing.  We can't live without pointers.  We just
don't want our s/w to be One Gigantic Rats Nest Of Pointers.  In C++, you can
pick and choose where you want reference semantics (pointers/references) and
where you'd like value semantics (where objects physically contain other
objects etc).  In a large system, there should be a balance.  However if you
implement absolutely everything as a pointer, you'll get enormous speed hits.

Objects near the problem skin are larger than higher level objects.  The
identity of these "problem space" abstractions is usually more important than
their "value." Thus reference semantics should be used for problem-space
objects.

Note that these problem space objects are normally at a higher level of
abstraction than the solution space objects, so the problem space objects
normally have a relatively lower frequency of interaction.  Therefore C++ gives
us an ideal situation: we choose reference semantics for objects that need
unique identity or that are too large to copy, and we can choose value
semantics for the others.  Thus the highest frequency objects will end up with
value semantics, since we install flexibility only where it doesn't hurt us,
and we install performance where we need it most!

These are some of the many issues that come into play with real OO design.
OO/C++ mastery takes time and high quality training.  If you want a powerful
tool, you've got to invest.

Don't stop now! Read the next FAQ too!!

==============================================================================

[30.8] Does the poor performance of reference semantics mean I should
       pass-by-value?

Nope.

The previous FAQs were talking about member objects, not parameters.  Generally,
objects that are part of an inheritance hierarchy should be passed by reference
or by pointer, not by value, since only then do you get the (desired) dynamic
binding (pass-by-value doesn't mix with inheritance, since larger derived class
objects get sliced[20.6] when passed by value as a base class object).

Unless compelling reasons are given to the contrary, member objects should be
by value and parameters should be by reference.  The discussion in the previous
few FAQs indicates some of the "compelling reasons" for when member objects
should be by reference.

==============================================================================
