Naming Conventions (Hungarian)
January 18, 1988
1. Introduction
This document describes a set of naming conventions used by the Applications
Development group. These conventions commonly go by the name "Hungarian",
referring both to the nationality of their original developer, Charles Simonyi, and to
the fact that to an uninitiated programmer they are somewhat confusing. Once you have
gained familiarity with Hungarian, however, we believe that you will find that the clarity
of code is enhanced. For convenience, this memo first describes how to use Hungarian,
and then describes why it is useful; the general approach is from a programming
viewpoint, rather than a mathematical one. For a more theoretical approach, you are
invited to read Chapter 2 of Simonyi's "Meta-Programming" thesis.
2. The Rules
Hungarian is largely language independent; it is equally applicable to a
microprocessor assembly language and to a fourth-generation database application
language (and has been used in both). However, there is a little flavor of C, in that arrays
and pointers to arrays are not clearly distinguished. While this may sound confusing, in
practice there is little ambiguity.
2.1. Variables
The most common type of identifier is a variable name. All variable names are
composed of three elements: prefixes, base type, and qualifier. (These are also referred
to as constructors, tag, and qualifier). Not all elements are present in all variable names;
the only part that is always present is the base type. This type should not be confused
with the types supported directly by the programming language; most types are
application specific. For example, an lbl type could refer to a structure containing symbol
information; a co could be a value specifying a color.
2.1.1. Base types (tags)
As the above examples indicate, tags should be short (typically two or three
letters) and somewhat mnemonic. Because of the brevity, the mnemonic value will be
useful only as a reminder to someone who knows the application, and has been told what
the basic types are; the name will not be sufficient to inform (by itself) a casual viewer
what is being referred to. For example, a co could just as easily refer to a geometric
coordinate, or to a commanding officer. Within the context of a given application,
however, a co would always have a specific meaning; all co's would refer to the same
type of object, and all references to such an object would use the term co.
One should resist the natural first impulse to use a short descriptive generic
English term as a type name. This is almost always a mistake. One should not preempt
the most useful English phrases for the provincial purposes of any given version of a
given program. Chances are that the same generic term could be equally applicable to
many more types in the same program. How will we know which is the one with the
pretty "logical" name, and which have the more arbitrary variants typically obtained by
omitting various vowels or by other disfigurement? Also, in communicating with other
programmers, how do we distinguish the generic use of the common term from the
reserved technical usage? In practice, it seems best to use some abbreviated form of
the generic term, or perhaps an acronym. In speech, the tag may be spelled out, or a
pronounceable nickname may be used. In time, the exact derivation of the tag may be
forgotten, but its meaning will still be clear.
As is probably obvious from the above, it is essential that all tags used in a given
application be clearly documented. This is extremely useful in helping a new
programmer learn the code; it not only enables him (or her) to decode the otherwise
cryptic names, but it also serves to describe the underlying concepts of the program,
since the data types tend to determine how the program works. It is also worth pointing
out that this is not nearly as onerous as it sounds; while there may be tens of thousands of
variables in a program, the number of types is likely to be quite small.
Although most types are particular to a given application, there are a few standard
types that appear in many different applications; synonyms for these types should never be used:
f a flag (boolean, logical). The qualifier (see below) should describe the
condition that will cause this flag to be set (e.g. fError would be clear
if there were no error, and set if there were one). This tag may refer to a single
bit, a byte, or a word; often it will be an object of type BOOL (defined
by the application, usually as int). Usually the object referred to will
contain either 1 (fTrue, TRUE) or 0 (fFalse, FALSE). In some
instances, other values may be used, either for efficiency or historical
reasons; such a use usually indicates that another type may be more
appropriate.
ch a one-byte character. Note that this is not adequate for Kanji.
st a Pascal-type string (first byte is count, remainder is the actual
characters). Typically refers to a pointer to the actual memory. This
should be the most common type of string used in the Applications
group; it is more efficient than an sz (below).
sz a zero-terminated string, or a pointer to it. These are most often used
to interface to an operating system (or equivalent) that requires them;
for most other uses, an st is preferable. Unfortunately, C string
constants are normally zero-terminated, so it takes a little more effort
to use st's; the effort is worth it. The Applications Development
compiler provides ways to make string constants st's.
fn a function. Since about the only thing you can do with a function is
take its address, this almost always has a "p" prefix (see below). For
this reason, in some applications fn is itself used to mean pointer to a
function.
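As a rough illustration of the standard tags above, the following C sketch converts an sz into an st. The BOOL typedef, the fTrue and fFalse constants, the routine name FSzToSt, and the cchMax parameter are assumptions made for this example; they are not taken from any Applications code.

```c
#include <string.h>

typedef int BOOL;               /* assumed: BOOL defined by the application as int */
#define fTrue  1
#define fFalse 0

/* Copy the zero-terminated string sz into st as a Pascal-type string:
   st[0] receives the count (a cch) and the characters follow.
   cchMax is the largest count st has room for.
   Returns fTrue on success, fFalse if sz does not fit. */
BOOL FSzToSt(const char *sz, unsigned char *st, unsigned cchMax)
{
    size_t cch = strlen(sz);

    if (cch > cchMax || cch > 255)      /* the count must fit in one byte */
        return fFalse;
    st[0] = (unsigned char)cch;
    memcpy(st + 1, sz, cch);            /* no terminating zero is stored */
    return fTrue;
}
```

Note that the st needs one byte more than the count it holds, so a 16-byte buffer can be passed with a cchMax of 15.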
There are some more types that appear in many applications; they should only be
used for the most generic purposes:
w a word (typically 16 bits). For most purposes, this is an incorrect
usage, since the usage of the word is specific to a particular type of
word, and should be so distinguished. Correct usages are generally
limited to generic subroutines (e.g. sort an array of words) that can
deal with a number of different types; another common use is in
conjunction with the prefix c (see below), to produce a count of words
(the size) for some object. The exact meaning of w is also somewhat
loose; it sometimes means a signed quantity and sometimes unsigned.
b a byte (typically 8 bits). The same warnings apply to this as to w.
l a long (typically 32 bits). The same warnings apply to this as to w.
u an unsigned word (typically 16 bits). The same as w, except this is
always unsigned.
bit a single bit. Typically used to specify bits used within other types.
This concept is usually better handled with the "f" and "sh" prefixes
(see below).
v a void. This corresponds to the C definition of void, meaning that the
type is not specified. This type will never be used without a "p" prefix
since it is not possible to have an unspecified type for a variable;
conceivably there are additional prefixes (e.g. ppv), but such a usage is
unlikely. It is perfectly valid to assign a pv to a pointer of any other
type, or vice versa. The major use of this type is for generic
subroutines (such as allocate and free) which return or take as
arguments pointers of various types.
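A minimal sketch of the pv usage described above, assuming a standard C library underneath. The routine names PvAlloc and FreePv are hypothetical, chosen only to show a pv being returned and accepted by generic subroutines.

```c
#include <stdlib.h>

/* Allocate cb bytes and return them as a pv; the caller assigns the
   result to a pointer of whatever type it needs.  Returns NULL on failure. */
void *PvAlloc(size_t cb)
{
    return malloc(cb);
}

/* Release memory obtained from PvAlloc.  A pointer of any type may be
   passed, since any object pointer converts to a pv. */
void FreePv(void *pv)
{
    free(pv);
}
```

A caller might write pw = PvAlloc(4 * sizeof(int)) and later FreePv(pw), with no cast needed in C in either direction.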
There are a few types that are used widely within the Applications group, but may
not be applicable to others:
env an environment. Used to implement non-local goto's (SetJmp and
DoJmp). The exact format of an env (including size), varies from
system to system.
sb a segment base. The part of a segmented pointer that determines the
segment. The exact implementation varies from system to system.
These are used directly in some applications for efficiency; the same
results can be obtained (less efficiently) through the use of far or huge
pointers.
ib an offset. The part of a pointer that determines the offset within a
segment. These are used directly in some applications for efficiency;
the same results can be obtained (less efficiently) through the use of
far or huge pointers. For the literal-minded, ib is not really a new type
at all; it is simply the prefix i (index) applied to the type b (byte), with
the viewpoint that a segment is just an array of bytes. Many people
prefer to consider it a true indivisible base type.
There are a number of other basic types used widely for applications that run
under Windows or the Macintosh:
[This list has not been compiled yet]
2.1.2. Prefixes (constructors)
Base types are not by themselves sufficient to fully describe the type of a
variable, since variables often refer to more complex items. The more complex items are
always derived from some combination of simple items, with a few operations. For
example, there may be a pointer to an lbl, or an array of them, or a count of co's. These
operations are represented in Hungarian by prefixes; the combination of the prefixes and
the base type represents the complete type of an entity. Note that a type may consist of
multiple prefixes in addition to the base type (e.g. a pointer to a count of co's); the
prefixes are read right to left, with each prefix applying to the remainder of the type (see
examples below). The term constructor is used because a new type is constructed from
the combination of the operation and the base type.
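To make the right-to-left reading concrete, here is a small C sketch built around the "pointer to a count of co's" example. The CO typedef, the routine name CountCo, and the qualifiers Total and Match are assumptions made for illustration only.

```c
typedef int CO;                 /* assumed base type: co, a color value */

/* Count the entries of rgco (an array of co's) that are equal to co, and
   store the result through pcco.  Reading pcco right to left gives:
   co, count of co's, pointer to a count of co's. */
void CountCo(const CO *rgco, int ccoTotal, CO co, int *pcco)
{
    int ico;                    /* index into an array of co's */
    int ccoMatch = 0;           /* count of co's that matched */

    for (ico = 0; ico < ccoTotal; ico++)
        if (rgco[ico] == co)
            ccoMatch++;
    *pcco = ccoMatch;
}
```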
In theory, new prefixes can be created, just as new types are routinely created for
each application. In practice, very few new prefixes have been created over the years, as
the set that already exists is rather comprehensive for operations likely to be applied to
types. Prefixes that have been added tend to deal with the specifics of machine
architecture, and are variations on existing prefixes (i.e. different flavors of pointers).
One can go overboard in refusing to create a new prefix, however; some new concepts
really are logically expressed as prefixes, not types. A couple of the examples of incorrect
usage in the list below derive from a reluctance to create a new prefix.
The standard prefixes are:
p a pointer. Within the Applications group, this is usually used to refer
to a near pointer (16 bit). Note that a pointer is not itself a type, it is an
operation applied to a type. For example, a pch is a pointer to a
character.
lp a far pointer. This is a 32 bit direct pointer; the actual implementation
is machine-dependent, and represents the native pointer format for that
machine. These usages are becoming rarer as we deal more and more
with moveable heaps and true segmentation. Some applications use q
instead of lp; a few decided (erroneously) that this was a type rather
than an operation, and used pa as the type tag. l can itself be viewed as
a prefix (applied in this case to the "p" prefix); it can only be applied
to pointer prefixes or types. For example, lrg is a far array and lst is a
far pointer to a Pascal-type string.
hp a huge pointer. This is a 32 bit pointer, composed of an sb and an ib.
Some level of machine- and environment-dependent indirection is
used to handle segmentation. These are now used extensively for non-
local data structures. Some older applications use k instead of hp; Mac
Excel decided (erroneously) that this was a type rather than an
operation, and used ptr as the type tag. As with l, h can be viewed as a
prefix, though ambiguity occurs when h is also being used as a prefix
meaning handle (see below).
np a near pointer. This is a 16 bit pointer. This prefix is not used within
the Applications Development Group; p refers to near pointers. This
prefix would be used to explicitly refer to a near pointer when
compiling a large model program with a conventional compiler. As
with l, n can be viewed as a prefix.
rg an array, or a pointer to it. The name comes from a mathematical
viewpoint of an array as the range of a function (see mp and dn
below). For example, an rgch is an array of characters; a pch could
point to one of the characters in the array. Note that it is perfectly
reasonable to assign an rgch to a pch; pch points to the first character
in the array.
i an index into an array. For example, an ich is used to index an rgch.
c a count. For example, the first byte of an st is a count of characters, or
a cch.
d a difference between two instances of a type. This is often confused
with a count, but is in reality quite separate. For example, a cch could
refer to the number of characters in a string, whereas a dch could refer
to the difference between the values 'a' and 'A'. The confusion arises
when dealing with indices; a dich (difference between indices into a
character array) is equivalent to a cch (count of characters); which one
to use depends on the viewpoint. This gets most confusing when
dealing with base types that are in effect indices, though not
specifically labelled as such. For example, a spreadsheet could have a
rw type that indicates a row in the spreadsheet; it does not contain the
actual data for the row, but is simply a one-word integer specifying the
row number. A type specifying a count of rows (not rw's) would
correctly be a drw (difference between row numbers), not a crw (count
of row numbers).
h a handle. This is often a pointer to a pointer (used to allow moveable
heap objects). The types of the pointers may vary among applications;
the two most common cases are a near pointer to a near pointer (h is
equivalent to pp) and a far pointer to a far pointer (h is equivalent to
lplp). Most commonly used for interface to an operating system;
within applications, moveable objects can be handled through huge
pointers. In some systems (e.g. Windows) a handle is not a pointer to a
pointer. To avoid confusion it may be best to use pp (or lplp) as
prefixes when the application is actually going to do the indirection,
and reserve h for instances in which the handle is just passed on to the
system. Doing this prevents the most common misuse of h in defining
a handle to an array (or other implicit pointer type); uses of hsz to
imply two indirections to obtain a character are incorrect. This should
properly be done as a psz or, if h must be used, as an hasz (see 'a'
prefix below).
hh a huge handle. This is a huge pointer to a 16-bit pointer within the
same segment pointed to by the huge pointer; it is useful for managing
heaps. This can be viewed as an hpib, where the sb to be used with the
ib is obtained from the hp.
gr a group, or a pointer to it. This is similar to an rg, but is used for
variable size objects. In this case an index (i) is not particularly useful,
since it cannot be used directly to obtain an object (one can, of course,
write a routine that will take the gr and i, walk through the data in a
type-specific manner, and derive a pointer to the object desired). This
is a rarely used prefix, and in some code, grp has been used instead of
gr.
b an offset. This is typically used in conjunction with a gr, in place of an
i, in order to get around the problem mentioned above. This offset is
in terms of bytes, so pfoo=(BYTE *)grfoo+bfoo. As with gr, this is a
somewhat rare usage in current code. b originally stood for base-
relative pointer, but should really be considered to be an offset within
a data structure; true base-relative pointers are just near pointers (p);
the base is the segment they are within.
mp an array. This prefix is followed by two types, rather than the standard
one, and represents the most general case of an array. From a
mathematical viewpoint, an array is simply a function mapping the
index to the value stored in the array (hence mp as an abbreviation of
map). In the construct mpxy, x is the type of the index and y is the
type of the value stored in the array. In most cases, the only type that
is important is the type of the value; the index is always an integer
with no other meaning. In this case, an rg is used; this means that an
rgx is equivalent to an mpixx. (This also explains the weird prefix rg;
it is an abbreviation for range).
dn an array. This is used in the rare case that the important part of the
array mapping is the index, not the value. dn is an abbreviation for
domain. Only a few of these are used in the entire Applications group;
an example of a plausible use is given in the discussion of e, below.
e an element of an array. This is used in conjunction with a dn (and is
thus just as rare); it is the type of the value stored in a dn. Just as rgx is
equivalent to mpixx, dnx is equivalent to mpxex. An example of use is
in the native code generation part of the CS compiler; there is a type
vr (an acronym for virtual register). A vr is just a simple integer,
specifying which register to use for various pieces of code output.
However, there is quite a bit more information than just a number that
is associated with each register. This additional data is stored in a
structure called an evr; there is an array of them called dnvr. Thus, the
information for a given register can be found with the expression
dnvr[vr].
f a bit within a type. This is a new prefix that is currently used only by a
few projects, but is now the approved method for dealing with bits. It
is typically used for overloading an integer type with one or more bit
flags, in otherwise unused portions of the integer. This should not be
confused with the f type, in which the entire value is used to contain
the flag. An example is a scan mode (type sm), with possible values
smForward and smBackwards. Since the basic mode only requires a
few bits (in this case only one bit), the remainder of a word can be
used to encode other information. One bit is used for fsmWrap,
another for fsmCaseInsens. Here the f is a prefix to the sm type,
specifying that only a single bit is used.
sh a shift amount. This is another new prefix used to deal with bits within
other types (complementing the "f" prefix); it specifies the location
within the type by a bit number (rather than the bit mask which the "f"
prefix specifies). It actually is followed by two types; the first type is
the type being shifted (almost always an f), and the second type is the
type the bits are stored within. Continuing the above example of scan
modes, if fsmWrap has a value of 4000 hex, shfsmWrap would have
the value of 14.
u a union. This is a rarely used prefix; it is used for variables that can
hold one of several types. In practice this becomes unwieldy. An
example is a urwcol, which can hold either a rw type or a col type.
a an allocation. This is a rarely used prefix; it is used to distinguish
between an array and a pointer to it. Thus, sz is a pointer to a null-
terminated string and asz is the actual allocated space. a is almost
invariably used in conjunction with a pointer-type prefix, in order to
allow the pointer to be explicit (rather than implicit, as with an sz). It
is essentially the inverse of a p prefix, so pasz is equivalent to sz. Its
best use is with the h prefix; hasz is a handle to a null-terminated
string. Most of the current Applications code (incorrectly) omits the a.
v a global. This is really not a correct usage of Hungarian, but you may
see it used in some applications anyway. It really should be a qualifier
(see below), if it is present at all. Except in extremely bizarre cases, it
must be the first prefix.
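The relationship between the "f" and "sh" prefixes in the scan mode example can be sketched in C as follows. The value 4000 hex for fsmWrap and 14 for shfsmWrap come from the text; the SM typedef, the bit position chosen for fsmCaseInsens, the values of smForward and smBackwards, and the routine names are assumptions for illustration.

```c
typedef unsigned SM;                 /* assumed: sm, a scan mode held in one word */

#define smForward        0x0000u     /* assumed: basic mode occupies bit 0 */
#define smBackwards      0x0001u

#define fsmWrap          0x4000u     /* bit mask (value given in the text) */
#define shfsmWrap        14          /* bit number of the same flag */
#define fsmCaseInsens    0x2000u     /* assumed position; the text gives none */
#define shfsmCaseInsens  13

/* Return 1 if the wrap bit is set in sm, 0 otherwise, using the mask. */
int FWrapFromSm(SM sm)
{
    return (sm & fsmWrap) != 0;
}

/* The same test written with the shift amount instead of the mask. */
int FWrapFromSmSh(SM sm)
{
    return (int)((sm >> shfsmWrap) & 1u);
}
```

Either form extracts the same bit; the mask is convenient for testing and setting, while the shift amount is convenient when the bit must be moved to or from bit 0.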
2.1.2.1. Some examples
Since the prefixes and base types both appear in lower case, with no separating
punctuation, ambiguity can arise. Is pfc a tag of its own (e.g. for a private first class), or
is it a pointer to an fc? Such questions can be answered only if one is familiar with the
specific types used in a program. To avoid problems like this it is often wise to avoid
creating base type names that begin with any of the common prefixes. In practice,
ambiguity does not seem to be a problem. The idea of additional punctuation to remove
the ambiguity has been shown to be impractical.
The following list contains both common and rarer usages:
pch a pointer to a character.
ich an index into an array of characters.
rgst an array of Pascal-type strings. Hungarian is not sufficient in itself to
indicate whether this is an array of characters or an array of pointers;
since strings are usually variable length, it is probably a safe bet that
this is an array of pointers to the actual characters.
grst a group of Pascal-type strings. As with the above example, this could
be either an array of characters or of pointers; since it is a gr, not an
rg, it is probably safe to assume that it is an array of characters.
bst an offset to a particular Pascal-type string in a grst.
phpx a near pointer to a huge pointer to an object of type x.
pich a near pointer to an index into a character array. A common use for
something like this is passing a pointer as a parameter to a function so
that a return value can be stored through the pointer; pich would be
extremely unlikely to be used in an expression without indirection
(pich+=2 is probably gibberish; (*pich)+=2 may well be meaningful).
en probably a base type (such as an entry). Conceivably it is an element
of an array indexed by an n; only knowledge of the application can tell
for certain.
hrgn handle to a region. Again there is ambiguity; this could be interpreted
as a handle to an array of n's or a huge pointer to an array of n's.
dx length of a horizontal line (difference between x coordinates).
rgrgx a two-dimensional array of x's (an array of arrays of x's).
mpmipfn an array of pointers to functions, indexed by mi's. For example, an mi
could be a menu item, and this array could be used for a command
dispatch. Again, context makes the parsing clear; this could equally
well be interpreted as an array of fn's (perhaps friendly nukes),
indexed by mip's (perhaps missile placements).
pv pointer to a void. Could be used as an argument to Free.
hrgch huge pointer to an array of characters. Could instead be interpreted as
a handle to an array of characters, depending on the application.
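Two of the entries above, mpmipfn and pich, can be sketched in C as follows. The MI and PFN typedefs, the menu item constants, and all routine names are hypothetical; only the shape of the names follows the list.

```c
typedef int MI;                      /* assumed: mi, a menu item number */
typedef int (*PFN)(int);             /* pointer to a command function */

#define miDouble 0
#define miNegate 1
#define miLim    2                   /* one past the last valid mi */

static int WDouble(int w) { return 2 * w; }
static int WNegate(int w) { return -w; }

/* mpmipfn: an array of pointers to functions, indexed by mi's. */
static const PFN mpmipfn[miLim] = { WDouble, WNegate };

/* Dispatch the command for mi on w.  Returns 0 if mi is out of range. */
int WDispatch(MI mi, int w)
{
    if (mi < 0 || mi >= miLim)
        return 0;
    return (*mpmipfn[mi])(w);
}

/* Advance *pich past any blanks in rgch.  The index is returned through
   the pointer, so pich is only ever used with indirection. */
void SkipBlanks(const char *rgch, int cch, int *pich)
{
    while (*pich < cch && rgch[*pich] == ' ')
        (*pich)++;
}
```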
2.1.3. Qualifiers
While the prefixes and base type are sufficient to fully specify the type of a
variable, this may not be sufficient to distinguish the variable. If there are two variables
of the same type within the same context, further specification is required to
disambiguate. This is done with qualifiers. A qualifier is a short descriptive word (or
facsimile; good English is not required) that indicates what the variable is used for. In
some cases, multiple words may be used. Some distinctive punctuation should be used to
separate the qualifier from the type; in C and other languages that support it, this is done
by making the first letter of the qualifier upper-case. (If multiple words are used, the first
letter of each should be upper-case; the remainder of the name, both type and qualifier, is
always lower-case. There is one special case to watch out for; defined constants
specifying the size of a type are often of the form cbFOO or cwFOO, where foo is the
type. Strictly speaking only the F in FOO should be capitalized, but the incorrect usage is
fairly common.)
Exactly what constitutes a naming context is language specific; within C the
contexts are individual blocks (compound statements), procedures, data structures (for
naming fields), or the entire program (globals). As a matter of good programming style,
hiding of names is not recommended; this means that any context should
be considered to include all of its subcontexts. (In other words, don't give a local the
same name as a global.) If there is no conflict within a given context (only one variable
of a given type), it is not necessary to use a qualifier; the type alone serves to identify the
variable. In small contexts (data structures or small procedures), a qualifier should not be
used except in case of conflict; in larger contexts it is often a good idea to use a qualifier
even when not necessary, since later modification of the code may make it necessary. In
cases of ambiguity, one of the variables may be left with no qualifier; this should only be
done if it is clearly more important than the other variables of the same type (no qualifier
implies primary usage).
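A small C sketch of these rules, under the assumption that Src and Dest are acceptable qualifiers for this purpose; the routine name CopyRgch is likewise hypothetical.

```c
/* Copy cch characters from pchSrc to pchDest.  Both pointers have the
   same type (pch), so the qualifiers Src and Dest are needed to tell them
   apart.  cch is the only count in this small context, so it carries no
   qualifier. */
void CopyRgch(const char *pchSrc, char *pchDest, int cch)
{
    while (cch-- > 0)
        *pchDest++ = *pchSrc++;
}
```

If a second count were later added to the routine, cch would need a qualifier of its own, which is why larger contexts often use qualifiers from the start.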
Since many uses of variables fall into the same basic categories, there are several
standard qualifiers. If applicable, one of these should be used, since they specify meaning
with no chance of confusion. In the case of multiple word qualifiers, the order of the
words is not crucial, and should be chosen for clarity; if one of the words is a standard
qualifier, it should probably come last (unfortunately, this suggestion is by no means
uniformly followed). The standard qualifiers are:
First the first element in a set. This is usually used with an index or a
pointer (e.g. pchFirst), referring to the first element of an array to be
dealt with. The index may be an implied index (as with a rw type in a
spreadsheet).
Last the last element in a set. This is usually used with an index or a pointer
(e.g. pchLast), referring to the last element of an array to be dealt
with. Both First and Last represent valid values (compare with Lim
below); they are often paired, as in this common loop:
for (ich=ichFirst; ich<=ichLast; ich++)
Lim the upper limit of elements in a set. This is not a valid value; for all
valid values of x, x