Understanding the `extern` keyword in the C language

Introduction

Almost every programmer knows C, since it is often the first programming language they learn. However, because C is so low-level, gaining a deeper understanding of it is hard. In this article, we will focus on the extern keyword and find out what it really does. Before explaining extern, we need to understand some basic concepts. OK, let's begin.

Translation Unit

The text of a program is kept in units called source files. A source file together with all the headers and source files it includes via the preprocessing directive #include is known as a preprocessing translation unit.

Example-1

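snippet.c:

int add(int x, int y) { return x + y; }

example.h:

int add(int x, int y); /* declaration of add */

example.c:

#include "example.h" /* quoted form: looked up next to example.c first */
#include "snippet.c" /* Though including a C source file is not a good practice, it's valid. */

int main(void)
{
    return add(1, 2) == 3 ? 0 : 1;
}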

example.c uses #include to include example.h and snippet.c, so these three files together constitute one preprocessing translation unit.

When the compiler compiles example.c, the first step is preprocessing: the preprocessor recursively replaces every #include directive with the literal contents of the named file (usually a header, but possibly another source file), expands the macros defined with #define, and performs conditional compilation for directives such as #ifdef. The result of preprocessing a preprocessing translation unit is called a translation unit.
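The macro expansion and conditional compilation steps can be seen in a small sketch. The names SQUARE, USE_FAST_PATH and scale are hypothetical, chosen only for illustration; running gcc -E on a file like this should show the macro call expanded and only the first branch of the #ifdef kept.

```c
/* Hypothetical example: SQUARE, USE_FAST_PATH and scale are illustrative names. */
#define SQUARE(x) ((x) * (x))   /* macro: replaced textually during preprocessing */
#define USE_FAST_PATH           /* comment this out to select the other branch */

int scale(int v)
{
#ifdef USE_FAST_PATH
    /* kept by the preprocessor because USE_FAST_PATH is defined */
    return SQUARE(v);           /* becomes ((v) * (v)) after expansion */
#else
    /* dropped from the translation unit when USE_FAST_PATH is defined */
    return v;
#endif
}
```

Because USE_FAST_PATH is defined, the translation unit that reaches the compiler contains essentially return ((v) * (v)); as the body of scale, and none of the # directives.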

Example-2

# Use gcc or cpp to get the output of the preprocessor.
gcc -E example.c -o example.i
or
cpp example.c > example.i

example.i is the translation unit. It contains everything the compiler needs for the later compilation steps, so if we feed it to gcc, gcc compiles it to an object file.

Example-3

gcc -c example.i -o example.o

Since example.i already contains all the necessary information, no other files are needed. The output file example.o is the object file for the translation unit produced from example.c.

The next step is linking. The linker (such as ld) combines object files into a single executable (or a library, or another object file). Linking involves symbol resolution, relocation, and so on.

Example-4

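gcc example.o -o example # gcc invokes ld automatically, adding the C startup files and the C library.
# Invoking ld directly (ld example.o -o example) normally does not produce a working program,
# because those startup files and libraries are not linked in unless you list them yourself.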

References for this section:
C standard links
C99 Standard

Linkage

So you want to understand what linkage really is? I won't cite the definitions from the C99 standard because they are too abstract. Once you know the concept of a translation unit, the meaning of linkage is quite clear.

There are three kinds of linkage: external, internal, and none.

For Variables

A variable declared outside any function (at file scope) without the static keyword has external linkage. It can be accessed by functions in every translation unit of the program.

A variable declared at file scope with the static keyword has internal linkage. It can only be accessed by functions in the same translation unit.

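A variable declared inside a block has no linkage (unless it is declared with extern). For example, variables defined in function bodies, inside curly braces, or in the parentheses of a for loop have no linkage, even when they are declared static: each such declaration denotes its own, separate object.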
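A minimal single-file sketch of the three cases for variables follows; the names shared_count, file_local_count and bump are hypothetical. It also shows that static inside a block changes how long calls lives, not its linkage.

```c
/* Hypothetical names: shared_count, file_local_count and bump are illustrative. */
int shared_count = 0;          /* file scope, no static: external linkage */
static int file_local_count;   /* file scope, static: internal linkage (zero-initialized) */

int bump(void)
{
    int local = 1;             /* block scope: no linkage, a fresh object on every call */
    static int calls;          /* block scope + static: still no linkage, but keeps its value */
    calls += local;
    shared_count += local;
    file_local_count += local;
    return calls;              /* number of times bump has been called */
}
```

Another translation unit could write extern int shared_count; and use that variable, but it has no way to name file_local_count or calls.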

For Functions

A function declared without the static keyword has external linkage. It can be called from functions in any translation unit of the program.

A function declared with the static keyword has internal linkage. It can only be called from functions in the same translation unit.
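The same distinction for functions, as a sketch with hypothetical names: clamp_to_byte is a static helper private to its translation unit, while normalize is visible to the linker.

```c
/* Hypothetical names: clamp_to_byte is internal, normalize is external. */
static int clamp_to_byte(int v)   /* internal linkage: callable only in this translation unit */
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

int normalize(int v)              /* external linkage: callable from any translation unit */
{
    return clamp_to_byte(v);      /* fine: same translation unit */
}
```

Another file could declare int normalize(int v); and call it. A declaration of clamp_to_byte in another file would still compile, but linking would fail with an undefined reference, because a static function is not exported from its object file.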

extern keyword

Although a variable or function with external linkage can be accessed from other translation units, you can't just use it there: the compiler processes one translation unit at a time, so it must first see a declaration that gives the identifier's type and says that its definition lives elsewhere. The linker later resolves the reference to the actual definition. Providing that declaration is what extern does. When you want to use a variable or a function with external linkage outside its own translation unit, declare it with the extern keyword before using it.
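A sketch of the declaration/definition split, assuming hypothetical names counter and read_counter. In a real program the definition would normally sit in another .c file; here it is placed at the end of the same file only so the snippet compiles and links on its own.

```c
/* Hypothetical names: counter and read_counter are illustrative. */
extern int counter;        /* declaration only: no storage is allocated here */

int read_counter(void)
{
    return counter;        /* OK: the declaration above tells the compiler the type */
}

int counter = 7;           /* the one definition, which allocates storage */
```

If the definition of counter moved to a different file, compiling this file with gcc -c would still succeed; the reference to counter would only be resolved, or reported as undefined, when the linker runs.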

Example-5

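utils.c:

int a;          /* definition of a: external linkage */

void sort()     /* definition of sort: external linkage */
{
    /* sorting code goes here */
}

main.c:

extern int a;       /* declaration before use: a is defined in utils.c */
extern void sort(); /* declaration before use: sort is defined in utils.c */

int main(void)
{
    a = 1;
    sort();
    return 0;
}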

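You may think extern void sort(); looks odd, because you usually write just void sort(); without an explicit extern keyword. For functions, these two declarations are equivalent. The C99 standard (6.2.2, paragraph 5) says:

If the declaration of an identifier for a function has no storage-class specifier, its linkage is determined exactly as if it were declared with the storage-class specifier extern.

So even if you don't write extern explicitly, the compiler treats void sort(); as extern void sort();. Variables are different: at file scope, int a; is a (tentative) definition rather than a mere declaration, so main.c must keep the extern in extern int a; to refer to the a defined in utils.c.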
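A single-file sketch contrasting functions and variables (twice, limit and total are hypothetical names): the two declarations of twice are interchangeable, while for objects the presence or absence of extern decides whether a line is a declaration or a definition.

```c
/* Hypothetical names: twice, limit and total are illustrative. */
int twice(int x);          /* no storage-class specifier ... */
extern int twice(int x);   /* ... equivalent redeclaration: both have external linkage */

int twice(int x)           /* the definition */
{
    return 2 * x;
}

extern int limit;          /* declaration only: limit must be defined somewhere */
int limit = 10;            /* the definition, in this file for the sake of the sketch */

int total;                 /* tentative definition: with no other definition in this
                              translation unit it behaves like int total = 0; */
```

Since GCC 10 the default is -fno-common, so if two translation units each contain a tentative definition such as int total;, linking typically fails with a multiple-definition error; older GCC versions merged such definitions silently.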

Other References:
Understanding extern keyword in C
Translation unit
linkage-in-c
external variable
