
C++ Workshop - C++ Keywords, Variables, & Constants (Ch. 3)

Started by
65 comments, last by Dbproguy 16 years, 1 month ago
Emmanuel Deloget

First I have to say I appreciate what everyone is doing here. I have one question about your post before Emmanuel's. You have three files: main.cpp, file.h, and file.cpp.

I was wondering what the recommended way to include files is, especially if you have a lot of files.
//Let's say I have a main.h
#ifndef MAIN_H
#define MAIN_H
#include <iostream>
#include "file.h"
class main
{
public:
    void run();
};
#endif


Then my main.cpp would look like this

#include "main.h"
void main::run()
{
    //Code here
}


Now let's take file.cpp.
I usually do it this way so I have access to all the other included files:
#include "main.h"
int somefunction()
{
    //Code here
    return 0;
}


I've found that this prevents a lot of my preprocessor errors. The main thing I have found is that the order they are included matters, but I have never had any other problems doing it this way. I typed this in here, so there may be some errors, but I hope you understand my question. In file.h I wouldn't have any includes.

What I think is happening is that the declarations from file.h are being pasted into main.h, and then all of that is pasted into file.cpp because of the include in file.cpp.
Adam
http://www.allgamedevelopment.com
The topic of headers, and when to include them, is an often misunderstood one. I find that the best way to understand headers is to have a deeper knowledge of declarations vs. definitions, and of what happens during the compile and link stages of building an application. So let me go over those things...

What is the compiled unit of a C++ program?

First, people are often confused about what the compiled unit is in a C++ program. Is it the header file, the source file, or both? The answer is: The Source File.

When your compiler is instructed to build your application it looks through its project settings for any source (*.cpp) files and attempts to compile each of them as a separate entity. The symbols identified within that source file exist only within the scope of that file.

Additionally, your compiler has no problem building applications that contain no header files at all. Finally, many IDEs, such as Visual Studio, allow you to add header files to the project. Doing this causes the header file to show up in the Solution Explorer for easy access, but it's important to note that unless the header file is actually #include'd in a *.cpp file, its contents aren't contained within any object file.

What is necessary for a successful compilation, and what is created?

As we learned in week 1, the first stage of the two-stage process of building an executable is the "Compile" stage. In this stage, your compiler turns each source file into an object file. To do this, it parses your file, checks for correct syntax, looks for well-formed, matching identifiers, etc., and then converts it into a sort of semi-final binary object.

As part of the process, the compiler tries to match each identifier it encounters with its declaration (not its definition). A definition also counts as a declaration. For example, when you place a function in its entirety above function main(), that definition acts as both a declaration AND a definition. But for a successful compile stage, all that is required is that the compiler is able to match each identifier used with its corresponding declaration. As I mentioned before, declarations (and definitions) are only valid for the scope of a single source file, since that is the compiled unit. Which leads us to the next point....
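To make the declaration/definition distinction concrete, here is a minimal sketch in a single source file. The function names square, cube, and square_plus_cube are made up for illustration.

```cpp
// A definition placed above its first use also acts as its declaration.
int square(int x) { return x * x; }

// A declaration (prototype) on its own is enough for the compile stage...
int cube(int x);

int square_plus_cube(int v)
{
    // Both identifiers are declared at this point, so this compiles.
    return square(v) + cube(v);
}

// ...but the link stage still needs exactly one definition somewhere.
// Here it happens to live further down the same source file.
int cube(int x) { return x * x * x; }
```

If the definition of cube were removed entirely, this file would still compile, but linking would fail with an unresolved symbol.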

What is the point of header files?

Because symbols are only valid within the scope of a source file, any symbols which we need to use in multiple source files must be found in EACH source file. Rather than duplicating the function, variable, or data type declaration at the top of each source file, we put them in include files. This does two things for us....first, it means less typing, which is good. Second, it means that our declaration will be IDENTICAL in every source file. This is essential during the link stage as there can be problems if the declaration doesn’t match the definition or if we've got multiply defined identifiers with different declarations.

The two key things to note about the above paragraph are that headers are for shared declarations, not definitions, and that the contents of a header file should be only the symbols we need, not every symbol that happens to be available. It's important to keep header files as small as possible. The primary reason for this is that every source file that includes a header must be compiled along with that header's full contents, so small headers keep the amount of code the compiler processes for each source file small. As well, the fewer things included in header files, the less likely we are to encounter the common problems which occur with header files. Let's explore those now...

Common problems resulting from header file misuse

There are two ways to use header files incorrectly. The first is what you put into the header files, and the second is how you include your header files. Let's look at both:

What you put in header files: People's first response when they learn about header files is to put everything in a header file so they have access to its contents everywhere. As we discussed above, header files are for declarations of identifiers you want available in multiple source files. Often people forget this and put "Definitions" in their header files. This can lead to "multiply defined symbol" or "function redefinition" errors. During the compile stage your compiler simply checks for the existence of a declaration, but in the link stage your linker attempts to match each use of an identifier with its definition. Whereas the compile stage deals with each source file as a separate entity, the link stage combines the contents of all object files into a single symbol pool. If you've provided a definition within your header file, then every source file that includes it will contain a matching definition, and the symbol pool will have the same symbol defined more than once. The linker then reports an error, because it has no way of knowing which definition to use.
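A minimal sketch of the split described above, assuming a hypothetical counter.h / counter.cpp pair. Both files are shown in one listing, with comments marking where each would begin; in a real project every source file that needs the counter would #include counter.h, and only counter.cpp would contain the definitions.

```cpp
// ----- counter.h (hypothetical) -----
// Only declarations belong here, so every source file that includes
// this header sees identical declarations and nothing is defined twice.
extern int g_counter;       // declaration only: no storage allocated
int increment_counter();    // function prototype

// ----- counter.cpp (hypothetical) -----
// Exactly one source file provides the definitions.
int g_counter = 0;          // definition: storage is allocated here
int increment_counter()
{
    return ++g_counter;
}
```

If int g_counter = 0; were moved into counter.h instead, every source file including it would define its own g_counter, which is exactly the multiply defined symbol situation described above.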

How you include your header files: There are two common questions people have when including header files, or rather, two observations they seem to make.

Observation 1

First, people often have the impression that the order in which they include header files matters...it doesn't, if they're used correctly. The reason order *appears* to matter is that people often forget that the compiler never actually sees the header files. Header files are "copied" into a source file by the preprocessor just before the source file is compiled. "So?" you ask...Well, this means that header files must obey all the same rules as source files. In particular, every symbol used within a header file must be declared before it's used. Often what occurs is that people have declared a necessary symbol within one header file for use in a source file. They forget, however, that the symbol was declared in a header file, and then they attempt to use the symbol within another header file. The problem is that now the second header file MUST be included after the first header file, or the compiler will complain that the symbol is undeclared...and it is, at least at the point where THAT header is included. There are three solutions to this problem which many people use.

Solution A: The first solution is to just declare the symbol again at the top of the second header file. Although multiple definitions are not allowed, multiple declarations are just fine, so long as all the declarations match. So including a second declaration within the second header file successfully detaches it from the first header, meaning order no longer matters.
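Here is a small sketch of Solution A, assuming two hypothetical headers that both declare the same function. After preprocessing, both declarations land in the same source file, which compiles because they match. The name clamp_to_byte is made up for illustration.

```cpp
// Declaration as it would appear in the hypothetical "first.h"
int clamp_to_byte(int value);

// The same declaration repeated in the hypothetical "second.h" (Solution A).
// Matching declarations may appear any number of times.
int clamp_to_byte(int value);

// The single definition, which would live in exactly one .cpp file.
int clamp_to_byte(int value)
{
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}
```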

Solution B: The second solution people often try is to pull the symbol declaration out of BOTH header files, and instead put it in the source file ABOVE the #include of either header file. This does remove the dependency between header files, but you must be careful with this approach as now anywhere either of those header files are included, a declaration must be made in the source file, just above the #include.

Solution C: The third solution people often try is to pull the declaration out of either header file, and instead put it in a 3rd header file which is then included in both header files. This is what is sometimes referred to as a header chain, which is the subject of the next observation people make.

Observation 2

People often feel like they need to include multiple header files within each header file, just so they can get their code to compile. This is a bad sign and is an indication they've got header chains, or that they're not doing a very good job making sure their symbols are defined locally within the header. We'll get into this more in later chapters when we begin covering classes.

Ultimately, this observation is a sign that you've got too much 'going on' in your header files. Remember that your source files are where the action is supposed to go. Put your definitions in your source files, and try to include as few header files as necessary within your header files. Your source files should be doing the including, not your header files. It's better for your source files to include a large number of header files than for your header files to include multiple header files. There are two primary reasons for this:

1. Performance: By including things into header files, you're causing all source files which include those header files to ALSO include anything that the header file included...phew, did you get that? This makes each source file unnecessarily large after preprocessing, which slows down compilation.

2. Dependencies: A feature of most compilers and build tools is dependency checking. Whenever you build your application, timestamps are used to determine which source files need to be rebuilt. However, if your source file includes a header file, the header's timestamp is checked as well. If a header file is more recent than the source file which includes it, the source file must be recompiled. And of course, this timestamp philosophy travels all the way up the dependency chain. So if you modify a header file 4 levels up in a header chain, then all headers beneath it become invalidated, which means any header or source file which includes one of THOSE header files also becomes invalidated...blah blah blah. As you can see, even with a small dependency chain it becomes possible to force a rebuild of your entire code base just by making a relatively small change.

By keeping your header files small, and including them directly in source files, you guarantee the minimal rebuild for changes you make in your headers.

Well, I hope this has been informative and helpful. Please ask any specific questions you might have. If you need examples, don’t hesitate to ask.

Cheers!

[Edited by - jwalsh on June 14, 2006 5:45:54 PM]
Jeromy Walsh
Sr. Tools & Engine Programmer | Software Engineer
Microsoft Windows Phone Team
Chronicles of Elyria (An In-development MMORPG)
GameDevelopedia.com - Blog & Tutorials
GDNet Mentoring: XNA Workshop | C# Workshop | C++ Workshop
"The question is not how far, the question is do you possess the constitution, the depth of faith, to go as far as is needed?" - Il Duche, Boondock Saints
@Oluseyi: you are right. As a professor told me some years ago, the devil (always) lies in the details, and the one who forgets to consider the details is a fool. In this case, I used imprecise terminology in order to give a somewhat precise definition of two important words, and I forgot to explain the details. My bad. Apologies to everyone.

Quote: Original post by adam23
I have one question about your post before Emmanuel. You have three files, main.cpp, file.h, and file.cpp.

I was wondering what the recommended way to include files is especially if you have a lot of files.

What I think is happening is that the declaration for file.h is being posted in main, then all of that is being posted in file because of the include in file.


You can verify your assumption using the "/P /EP" technique that I described earlier. But if by main you meant main.h, then you are technically right: everything in main.h will be copied into file.cpp, and since everything in file.h is 'copied' into main.h, you'll have everything you need in the preprocessed file.cpp.

However, the question you ask is not an easy one. I have to leave the field of pure C++ programming and enter the larger field of software engineering to give you the beginning of an answer.

In order to be efficient (as in "I type less, thus I have fewer possible bugs"), you'd better try to limit your #include directives to what's really needed. Moreover, it is good practice to always include file.h in file.cpp (if file.cpp implements what's declared in file.h), even if another file included in file.cpp already includes file.h. The main reason behind this is that the object-oriented paradigm tells you that while the implementation of a class might change, the interface of this class is less subject to change than its implementation. In C++, header files do more than describe an interface - they also describe the private part of a class, and that's part of the implementation. It means that whenever you change the implementation of a particular class, you might also change the header file for this class.

On day D, your class A uses class B internally (for example, a private member of A is a B instance). Thus, A.h includes B.h. B.cpp instantiates an A object, so B.cpp includes A.h. You feel that's enough: since A.h includes B.h, you don't have to include B.h in B.cpp. Later (day D+1), you change the implementation of A and figure out that you don't need a member of type B. As a consequence, you remove the #include "B.h" directive from "A.h". Suddenly, B.cpp refuses to compile, despite the fact that you didn't change anything in B.cpp.

Remember that you are using header guards in your headers. Those guards prevent a header file from being included more than once in a compilation unit (such multiple inclusion might result in multiple declarations of the same symbol, something that a compiler doesn't like very much). As a consequence, there is no problem in #including "B.h" twice in B.cpp (once directly, once through A.h). You can take advantage of this to avoid the error I just explained.
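To see why including B.h twice is harmless, here is a sketch of what the preprocessor output looks like when a guarded header ends up in the same compilation unit twice (once directly, once through A.h). The header contents are pasted in by hand; the class body and read_b are hypothetical.

```cpp
// First inclusion of the hypothetical "B.h", as pasted by the preprocessor.
#ifndef B_H
#define B_H
class B
{
public:
    int value() const { return 42; }
};
#endif // B_H

// Second inclusion of the same header (e.g. via A.h). B_H is already
// defined, so the preprocessor skips everything up to #endif and the
// compiler never sees a second definition of class B.
#ifndef B_H
#define B_H
class B
{
public:
    int value() const { return 42; }
};
#endif // B_H

int read_b()
{
    B b;
    return b.value();
}
```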

Correct usage of header files is hard, but fortunately, some rules might help:
  • The implementation file of a class should explicitly include the file that declares this class (A.cpp always includes A.h).

  • The fewer files you include, the better.

  • Your code should not depend on the order of the #include directives.

  • Whenever you use a symbol that is declared in header1.h in another file (either another header file or a cpp file), #include "header1.h" in this file.

The benefit of the first rule is easy to understand: you don't have to search for the file that really includes the declaration of your class, and you avoid the error I described earlier.

The benefit of the second rule is less visible and has to do with the reduction of dependencies. A file should be included only if there is a direct dependency between the includer and the included file. Since more dependencies mean more coupling, which means less code reusability, avoiding dependencies will probably (not always) increase reusability - when it comes to software design, this is considered a Good Thing.
Be aware that this rule doesn't mean "don't include any header file in your cpp". One big mistake in software design is to create what is called a god class, i.e. a class that does everything. Of course, it means that this class won't depend on any other class - but it also means that there is no abstraction in your project, and no abstraction leads to no possible reuse. God classes are not OO programming - they are procedural programming in disguise.

The third rule is a bit more complex to understand. The goal is also to ease both code reading and code writing. There is nothing more insane than having to include foo.h before bar.h whenever you want to use the class "bar". If bar needs to know foo, then, please, include foo.h in bar.h.

The last rule is a temporary one - you'll see that you can bypass it when you learn about forward declarations. For the moment, consider it an axiom, because it is the only way to correctly follow the third rule. The other benefit is that you can tell what will be used in a file by reading its list of included files.

Let's apply these rules to a simple example: main.cpp defines main(); main() instantiates class A, and class A contains a member of type B. We have 5 files: main.cpp, A.h, A.cpp, B.h, B.cpp.

Rule 1 means that A.cpp always includes A.h and B.cpp includes B.h.
Rule 2 tells you that since B.cpp doesn't need A.h, you don't have to #include A.h in B.cpp (rather obvious).
Rule 3 means that since a B object is declared in the interface of A, A.h needs to include B.h. If we fail to do this, we have to #include "B.h" before "A.h" whenever we want to use an A object - thus, the #include order matters.
Rule 4 means that since A.h references class B, we need to include B.h in A.h (conveniently, rule 3 already told us to do so, although for a very different reason).
Conclusion:
  • main.cpp includes A.h

  • B.cpp includes B.h

  • A.cpp includes A.h

  • A.h includes B.h
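Here is one way those four rules could play out, written as a single listing with comments marking where each hypothetical file would begin. The member names and the run_example function (standing in for main()) are illustrative assumptions, not part of the original example.

```cpp
// ----- B.h -----
#ifndef B_H
#define B_H
class B
{
public:
    explicit B(int v) : value_(v) {}
    int value() const;
private:
    int value_;
};
#endif // B_H

// ----- A.h -----
#ifndef A_H
#define A_H
// #include "B.h"  <- A has a B member, so A.h includes B.h (rules 3 and 4).
//                    Already pasted above in this single listing.
class A
{
public:
    A();
    int doubled() const;
private:
    B b_;
};
#endif // A_H

// ----- B.cpp (would #include "B.h": rule 1) -----
int B::value() const { return value_; }

// ----- A.cpp (would #include "A.h": rule 1) -----
A::A() : b_(21) {}
int A::doubled() const { return 2 * b_.value(); }

// ----- main.cpp (would #include only "A.h") -----
// run_example stands in for main() so the listing can be tested.
int run_example()
{
    A a;
    return a.doubled();
}
```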


In your example, main.cpp includes main.h, main.h includes file.h, and file.cpp includes file.h. If file.cpp needs to instantiate (or just needs to know about) a symbol that is declared in main.h, then it should also include main.h.

There is something even less obvious in your example, because it introduces what is called a circular dependency (A depends on B, which depends on A). Generally speaking, this is something to avoid. There are some cases where it is difficult to get rid of a circular dependency, but in most situations it is a software design mistake.

I won't discuss circular dependencies any further for now - experience has taught me that software design is something that you must learn later [smile]

I hope I'm clear (and precise enough [smile])

And I added those </li> tags! [smile]
Thank you guys for taking the time to respond in so much detail to my question. I basically have two more questions, (as if I didn't ask enough huh :) ).

1. jwalsh, you mentioned forward declarations for classes, but if you do it this way, are you restricted to only using references to objects of that class?

2. I am working on a game engine that encapsulates everything, DirectInput, DirectSound, Direct3D, and so on. Right now I have seven header files and five cpp files. What I was trying to accomplish was one point of contact with the engine. Here is a basic example of what I have set up.
//==========================================================
//Engine.h
//Created by Adam Larson
//==========================================================
//-----------------------------------------------------------------------------
// DirectInput Version Define
//-----------------------------------------------------------------------------
#define DIRECTINPUT_VERSION 0x0800
//------------------------------------------
//System Includes
//------------------------------------------
#include <stdio.h>
#include <tchar.h>
#include <windowsx.h>
//-----------------------------------------------------------------------------
// DirectX Includes
//-----------------------------------------------------------------------------
#include <d3dx9.h>
#include <dinput.h>
//------------------------------------------
//Engine Includes
//------------------------------------------
#include "LinkedList.h"
#include "ResourceManager.h"
#include "Input.h"
#include "Geometry.h"
#include "Font.h"
#include "State.h"


See, in this situation ResourceManager needs LinkedList, and Font needs Geometry.h. Am I doing this correctly by having all my includes in the engine's header and pulling everything together there? How do I get around worrying about the order? I could include LinkedList in ResourceManager and Geometry in Font, but I also need these classes for variables in Engine.h. For example, I am creating a linked list of states as a private member of the Engine class.

Thanks again for all the help, we all really appreciate it. Includes have always been one thing that has confused me because I have been told so many different things. I do remember in college my professor telling me to always include the header in the cpp file that implements the header. Then this year in college my professor is telling me the opposite, by saying that everything is placed in the order you put them in, so you have it included in Engine.h, and have Engine.h included in the source.
Quote: Original post by Emmanuel Deloget
Remember that you are using header guards in your headers. Those guards avoid multiple inclusions of a header file in a compilation unit (such multiple inclusion might result in multiple declaration of the same symbol, something that a compiler don't like very much).

Not quite accurate. Compilers don't necessarily care about multiple declarations:
extern int a;
extern int a;

The above code compiles perfectly.

What compilers choke on is multiple competing declarations:
void g()
{
    extern int a;
    extern int a;
    int a = 5; // ERROR: conflicts with the extern declarations above
}

The problem here lies in the fact that, within this block, the compiler has been told that a variable named a with type int exists and is defined elsewhere with namespace-level visibility (we can assume the global namespace, for simplicity). However, we then go ahead and introduce a new local variable with the same identifier and type in the same scope, which is a collision - a competing declaration. (At namespace scope, by contrast, extern int a; followed by int a = 5; is fine: the extern declarations simply refer to that one definition.)

In the case of a function declaration, we don't have that problem:

#include <iostream>

int f();
int f();

int main()
{
	extern int a;
	extern int a;
	std::cout << f() << std::endl;
	return 0;
}

int f()
{
	return 1;
}


The above compiles just fine, no errors.

The difference lies in the implication of the extern keyword and the fact that functions are not first-class objects in C++, but really a sort of meta-object.

(These, incidentally, are the reasons C++ constitutes a terrible beginner language. Explaining seemingly simple things quickly leads into byzantine explorations of the unintuitive. [smile])
Quote: Original post by adam23
1. jwalsh you mentioned forward declarations for classes, but if you do it this way are you restricted to only using references of objects in that class?

Yes. Until a complete declaration for the class is presented, you are restricted to using only pointers and references to the class in your code. Since forward class declarations typically appear in header files, which should only describe interfaces and not behaviors, this is generally not a problem:

class X;

class Y
{
  //...
  X * x;
  X & returnAnX() const;
  X anotherX; // ERROR: X is an incomplete type here
};
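A slightly longer sketch of the same idea, assuming the usual split where the header only forward-declares X and the source file (after including X's full definition) implements the member that needs it. Everything is in one listing here, so X's definition simply appears after Y's; the member and function names are hypothetical.

```cpp
// Forward declaration: the compiler only learns that X is a class type.
class X;

class Y
{
public:
    explicit Y(X* x) : x_(x) {}
    int valueOfX() const;   // declared only; defined once X is complete
private:
    X* x_;                  // OK: pointer to an incomplete type
    // X anotherX;          // would not compile: X is incomplete here
};

// Later (typically in Y.cpp, after #include "X.h"), X becomes complete.
class X
{
public:
    int value = 7;
};

// Now that X is complete, its members can be accessed.
int Y::valueOfX() const { return x_->value; }

int demo_forward_declaration()
{
    X x;
    Y y(&x);
    return y.valueOfX();
}
```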


Quote: How do I get around worrying about the order? I could include LinkedList in ResourceManager and Geometry in Font, but I also need these classes for variables in Engine.h. For example I am creating a linked list of states as a private member of the Engine class.

If you need it directly, include it.

Quote: Then this year in college my professor is telling me the opposite, by saying that everything is placed in the order you put them in, so you have it included in Engine.h, and have Engine.h included in the source.

Having Geometry.cpp include Engine.h is just stupid. The engine is at a higher level of abstraction than the Geometry routines - the Geometry routines don't need to know about the Engine, but the Engine needs to know about the Geometry routines.

Of course, one must question the wisdom of having a class named Geometry. Sounds like someone needed a namespace...
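As a sketch of what that might look like, assuming the geometry routines are free functions operating on a small vector type (Vec2, dot, and add are hypothetical names, not taken from adam23's engine):

```cpp
// A namespace groups related routines without forcing them into a class
// that would never hold any state of its own.
namespace Geometry
{
    struct Vec2
    {
        float x;
        float y;
    };

    // inline lets these definitions live in a header without causing
    // multiply defined symbols at link time.
    inline float dot(const Vec2& a, const Vec2& b)
    {
        return a.x * b.x + a.y * b.y;
    }

    inline Vec2 add(const Vec2& a, const Vec2& b)
    {
        return Vec2{a.x + b.x, a.y + b.y};
    }
}
```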
Quote: Original post by Emmanuel Deloget
C++ beginners often have trouble seeing when they should use the word "declare" or the word "define", so you are not alone. As I already stated in the very first thread of this workshop, I don't own the book, so I don't know if it contains a simple definition of these words.

Let's remember what a C++ compiler does:
  1. first, it preprocesses the C++ file
  2. then it compiles the preprocessed file
  3. then it links all the compiled C++ files into one big executable file


The goal of the declaration of a C++ symbol is to tell the compiler that the symbol exists somewhere. Essentially, it says "this symbol exists somewhere, you don't have to know exactly where it is, and it has that name and that signature". Once a compiler knows every symbol that is used in a particular C++ file, it can compile the file and produce the object (.o or .obj) file.

The goal of the definition is to put something behind the symbol itself.


// ----------------------- file1.h
#ifndef FILE_1_H
#define FILE_1_H

// this is the function ** declaration **
int function_plus(int a, int b);

#endif // FILE_1_H


Do the terms prototype and declaration mean the same thing? Because what I see in the above code snippet is a function prototype.

Thank you all.
Have we sent the "Don't shoot, we're pathetic" transmission yet?
Quote: Original post by _EpcH_
Do the terms prototype and declaration mean the same thing? Because what I see in the above code snippet is a function prototype.

Prototype is only relevant to functions; declaration is relevant to functions, classes and objects. A prototype is a declaration, but not all declarations are prototypes.
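A short sketch showing a prototype alongside other declarations that are not prototypes. The names Widget and g_total are made up for illustration; function_plus matches the snippet quoted above.

```cpp
// A function prototype: a declaration that gives the parameter types.
int function_plus(int a, int b);

// Declarations that are not prototypes:
class Widget;        // class (forward) declaration
extern int g_total;  // object declaration: no storage allocated here

// Definitions (each of these is also a declaration):
int g_total = 0;
int function_plus(int a, int b) { return a + b; }
class Widget
{
public:
    int id = 0;
};
```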
Thank you Oluseyi, I understand now :)

One more question:

You underlined that "Introducing another variable with the same identifier and signature causes a collision - a symbol redefinition and a competing declaration."

This may be a very noob question, and I hope it hasn't been answered before and I'm not stealing your time, but what are the identifier and the signature of a variable?

I guess for the declaration :

extern int a;

"extern int" is the signature? and "a" is the identifier?

And I guess we can use variables with the same identifier and signature within different scopes in a source file without symbol redefinition, like:

#include <iostream>

int f();
int f();

int main()
{
	extern int a;
	extern int a;
	{
		int a = 5; //Different scope
	}
	std::cout << f() << std::endl;
	return 0;
}

int f()
{
	return 1;
}


Thanks all of you again,

Cheers!
Quote: Original post by _EpcH_:
This may be a very noob question, and I hope it hasn't been answered before and I'm not stealing your time, but what are the identifier and the signature of a variable?

I guess for the declaration :

extern int a;

"extern int" is the signature? and "a" is the identifier?


Not quite. For variables the signature consists of two parts - the data type, and the identifier. The data type is necessary to know the amount of memory to allocate, and the identifier is necessary to know what symbol to associate with that location in memory.

Signature = Type Identifier;
i.e. "int a"

The "extern" at the beginning merely tells the compiler that the definition of the variable will be established at link time. In essence this "reserves" a definition for a variable with that identifier. More technically, when the compiler sees an "extern" it creates a symbol without allocating any memory. The understanding is that the memory for that variable is allocated elsewhere, and at link time a reference to that memory is created.

The compiler complains in Oluseyi's example because, within the same scope, following an extern declaration with a declaration of the same identifier that does NOT use extern tells the compiler that the variable lives right here - hence, to create a variable with that identifier AND allocate memory for it. Now you've got a single identifier with conflicting meanings in one scope...one whose address will be determined later at link time, and one whose storage is allocated locally. Because the conflict is visible within a single source file, the compiler is smart enough to catch it early, before making it to the link phase.

Quote: Original post by _EpcH_:
And I guess we can use variables with the same identifier and signature within different scopes in a source file without symbol redefinition...


You are correct. A variable declared in a different (inner) scope is a new, distinct variable that hides the outer one, and its storage is allocated separately (for local variables, on the stack, which we will talk about more later). Suffice it to say, the problem I mentioned earlier is not present, because the compiler can now allocate memory for the new variable with the same identifier without confusing it with the one declared extern.
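A small sketch of that scoping rule, assuming a namespace-scope definition of a exists so the program links. shadow_demo is a hypothetical helper that packs the three values it reads (before, inside, and after the inner block) into one number.

```cpp
int a = 1; // definition at namespace scope: storage is allocated here

int shadow_demo()
{
    extern int a;      // block-scope redeclaration of the namespace-scope a
    int before = a;    // reads the namespace-scope a: 1
    int inner = 0;
    {
        int a = 5;     // a brand-new local variable that hides the outer a
        inner = a;     // reads the local a: 5
    }
    int after = a;     // the inner a is gone; this is the outer a again: 1
    return before * 100 + inner * 10 + after;
}
```

Calling shadow_demo() returns 151, showing that the inner a never touched the variable declared extern.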

Cheers!

