[33] Compiler dependencies
(Part of C++ FAQ Lite, Copyright © 1991-98, Marshall Cline, cline@parashift.com)

FAQs in section [33]:

[33.1] Where can I get more information on using MFC and Visual C++?
[33.2] How do I display text in the status bar using MFC?
[33.3] How can I decompile an executable program back into C++ source
code?
[33.4] Where can I get information about the C++ compiler
from {Borland, IBM, Microsoft, Symantec, Sun, etc.}?
[33.5] How do compilers use "over-allocation"
to remember the number of elements in an allocated array?
[33.6] How do compilers use an "associative
array" to remember the number of elements in an allocated array?
[33.7] If name mangling were standardized, could I link code
compiled with compilers from different compiler vendors?
[33.8] GNU C++ (g++) produces big executables for tiny programs;
Why?
[33.9] Is there a yacc-able C++ grammar?
[33.10] What is C++ 1.2? 2.0? 2.1? 3.0?



[33.1] Where can I get more information on using MFC and Visual C++?
The MFC/Visual C++ FAQ (http://www.stingray.com/mfc_faq) is
maintained by Scot Wingo.


[33.2] How do I display text in the status bar using MFC?
Use the following code snippet:

    CString s = "Text";
    CStatusBar* p =
     (CStatusBar*)AfxGetApp()->m_pMainWnd->GetDescendantWindow(AFX_IDW_STATUS_BAR);
    p->SetPaneText(1, s);

This works with MFC v.1.00, which hopefully means it will work with other
versions as well.


[33.3] How can I decompile an executable program back into C++ source
code?
You gotta be kidding, right?
Here are a few of the many reasons this is not even remotely feasible:

What makes you think the program was written in C++ to begin
with?
Even if you are sure it was originally written (at least partially)
in C++, which one of the gazillion C++ compilers produced it?
Even if you know the compiler, which particular version of the
compiler was used?
Even if you know the compiler's manufacturer and version number,
what compile-time options were used?
Even if you know the compiler's manufacturer and version number and
compile-time options, what third party libraries were linked-in, and what was
their version?
Even if you know all that stuff, most executables have had their
debugging information stripped out, so the resulting decompiled code will be
totally unreadable.
Even if you know everything about the compiler, manufacturer,
version number, compile-time options, third party libraries, and debugging
information, the cost of writing a decompiler that works with even one
particular compiler and has even a modest success rate at generating code would
be a monumental effort, on a par with writing the compiler itself from
scratch.

But the biggest question is not how you can decompile someone's code,
but why do you want to do this? If you're trying to reverse-engineer
someone else's code, shame on you; go find honest work. If you're trying to
recover from losing your own source, the best suggestion I have is to make
better backups next time.


[33.4] Where can I get information about the C++ compiler
from {Borland, IBM, Microsoft, Symantec, Sun, etc.}?
In alphabetical order by vendor name:

Borland C++ 5.0 FAQs:
http://www.turbopower.com/bcpp/
DJ C++ ("DJGPP"):
http://www.delorie.com/
GNU C++ ("g++" or "GCC"):
http://www.cygnus.com/
HP C++:
http://www.hp.com/lang/cpp/
IBM VisualAge C++:
http://www.software.ibm.com/ad/cset/
Intel Reference C++:
http://developer.intel.com/design/perftool/icl24/
KAI C++:
http://www.allys.cie.fr/ALLYS/KCC-english/v3.2/
Metrowerks C++: http://metrowerks.com or
http://www.metrowerks.com
Microsoft Visual C++:
http://www.microsoft.com/visualc/
Portland Group C++:
http://www.pgroup.com
Silicon Graphics C++:
http://www.sgi.com/Products/DevMagic/products/cplusplus.html
Sun Visual WorkShop™ for C++:
http://www.sun.com/workshop/visual
Symantec C++:
http://www.symantec.com/scpp/index_product.html
Watcom C++:
http://www.powersoft.com/products/languages/watccpl.html

[If anyone has other suggestions that should go into this list, please let me
know; thanks; (cline@parashift.com)].


[33.5] How do compilers use "over-allocation"
to remember the number of elements in an allocated array?
Recall that when you delete[] an array, the runtime system magically
knows how many destructors to run. This FAQ
describes a technique used by some C++ compilers to do this (the other common
technique is to use an associative
array).
If the compiler uses the "over-allocation" technique, the code for p = new Fred[n] looks something like the following. Note that WORDSIZE is an
imaginary machine-dependent constant that is at least sizeof(size_t),
possibly rounded up for any alignment constraints. On many machines, this
constant will have a value of 4 or 8. It is not a real C++ identifier that
will be defined for your compiler.

    // Original code: Fred* p = new Fred[n];
    char* tmp = (char*) operator new[] (WORDSIZE + n * sizeof(Fred));
    Fred* p = (Fred*) (tmp + WORDSIZE);
    *(size_t*)tmp = n;
    size_t i;
    try {
      for (i = 0; i < n; ++i)
        new(p + i) Fred();           // Placement new
    } catch (...) {
      while (i-- != 0)
        (p + i)->~Fred();            // Explicit call to the destructor
      operator delete[] ((char*)p - WORDSIZE);
      throw;
    }

Then the delete[] p statement becomes:

    // Original code: delete[] p;
    size_t n = * (size_t*) ((char*)p - WORDSIZE);
    while (n-- != 0)
      (p + n)->~Fred();
    operator delete[] ((char*)p - WORDSIZE);

Note that the address passed to operator delete[] is not the
same as p.
Compared to the associative array
technique, this technique is faster,
but more sensitive to the problem of programmers saying delete p rather than
delete[] p. For example, if you make a programming error by saying delete p where you should have said delete[] p, the address that is passed to
operator delete(void*) is not the address of any valid heap
allocation. This will probably corrupt the heap. Bang! You're dead!


[33.6] How do compilers use an "associative
array" to remember the number of elements in an allocated array?
Recall that when you delete[] an array, the runtime system magically
knows how many destructors to run. This FAQ
describes a technique used by some C++ compilers to do this (the other common
technique is to over-allocate).
If the compiler uses the associative array technique, the code for p = new Fred[n] looks something like this (where arrayLengthAssociation is
the imaginary name of a hidden, global associative array that maps from void*
to "size_t"):

    // Original code: Fred* p = new Fred[n];
    Fred* p = (Fred*) operator new[] (n * sizeof(Fred));
    size_t i;
    try {
      for (i = 0; i < n; ++i)
        new(p + i) Fred();           // Placement new
    } catch (...) {
      while (i-- != 0)
        (p + i)->~Fred();            // Explicit call to the destructor
      operator delete[] (p);
      throw;
    }
    arrayLengthAssociation.insert(p, n);

Then the delete[] p statement becomes:

    // Original code: delete[] p;
    size_t n = arrayLengthAssociation.lookup(p);
    while (n-- != 0)
      (p + n)->~Fred();
    operator delete[] (p);

Cfront uses this technique (it uses an AVL tree to implement the associative
array).
Compared to the over-allocation
technique, the associative array
technique is slower, but less sensitive to the problem of programmers saying
delete p rather than delete[] p. For example, if you make a programming
error by saying delete p where you should have said delete[] p, only the
first Fred in the array gets destructed, but the heap may survive
(unless you've replaced operator delete[] with something that doesn't
simply call operator delete, or unless the destructors for the other
Fred objects were necessary).


[33.7] If name mangling were standardized, could I link code
compiled with compilers from different compiler vendors?
Short answer: Probably not.
In other words, some people would like to see a name mangling standard
incorporated into the proposed ANSI C++ standard, in an attempt to avoid
having to purchase different versions of class libraries for different
compiler vendors. However, name mangling differences are one of the smallest
differences between implementations, even on the same platform.
Here is a partial list of other differences:

Number and type of hidden arguments to member functions:
    is this handled specially?
    where is the return-by-value pointer passed?
Assuming a v-table is used:
    what are its contents and layout?
    where/how is the adjustment to this made for multiple and/or
    virtual inheritance?
How are classes laid out, including:
    location of base classes?
    handling of virtual base classes?
    location of v-pointers, if they are used at all?
Calling convention for functions, including:
    where are the actual parameters placed?
    in what order are the actual parameters passed?
    how are registers saved?
    where does the return value go?
    does the caller or the callee pop the stack after the call?
    special rules for passing or returning structs or doubles?
    special rules for saving registers when calling leaf functions?
How is run-time type identification laid out?
How does the runtime exception handling system know which local
objects need to be destructed during an exception throw?



[33.8] GNU C++ (g++) produces big executables for tiny programs;
Why?
libg++ (the library used by g++) was probably compiled with debug info
(-g). On some machines, recompiling libg++ without debugging can save
lots of disk space (approximately 1 MB; the down-side: you'll be unable to
trace into libg++ calls). Merely running strip on the executable doesn't
reclaim as much as recompiling without -g and then stripping the resulting
a.out.
Use size a.out to see how big the program code and data segments really
are, rather than ls -s a.out, which includes the symbol table.


[33.9] Is there a yacc-able C++ grammar?
There used to be a yacc grammar that was pretty close to C++. As far
as I am aware, it has not kept up with the evolving C++ standard. For example,
the grammar doesn't handle templates, exceptions, or run-time type
identification, and it deviates from the rest of the language in some subtle
ways.
It is available at
http://srawgw.sra.co.jp/.a/pub/cmd/c++grammar2.0.tar.gz


[33.10] What is C++ 1.2? 2.0? 2.1? 3.0?
These are not versions of the language, but rather versions of cfront, which
was the original C++ translator implemented by AT&T. It has become generally
accepted to use these version numbers as if they were versions of the language
itself.
Very roughly speaking, these are the major features:

2.0 includes multiple/virtual inheritance and
pure virtual functions
2.1 includes semi-nested classes and
delete[] pointerToArray
3.0 includes fully-nested classes, templates and i++ vs.
++i
4.0 will include exceptions



Revised Jun 29, 1998