Document: Compiler Commentary
Current version: 1
Date of last update: 2003/10/09
=======================================================================
History of interface specification changes
Version 1 2003/10/09
Added a requirement to align the .compcom section on an 8-byte boundary.
Version 0 2000/12/21
Original specification.
=======================================================================
This source directory contains the files used to generate and process
compiler commentary. The file named INDEX contains a list of
the files here, and some information about which are generated, and how.
Messages can -never- be deleted from this list. The same file is used
to read compiler commentary from older versions of the compilers,
so existing messages cannot disappear.
To add additional messages to the compiler commentary list, care must
be taken to not change the meaning of any message name. To add a message:
1. Do an "sccs edit" of the file named comp_com.prototype
2. Edit that file to add the new messages, respecting the
format for all messages, and being careful about
where to put the new message (See below).
3. Check in the file, and do a putback.
The next build of the component will have the new messages defined.
Each message has a tag, CCM_XXXXXXX, and one or more visibility
indicators CCMV_XXX, with all of the indicators or'd together.
The tags must be unique, and are elements in an enum, used
in processing the messages.
Following the line with the tag and visibility indicators are one
or more comment lines, bracketed with /* */ comment indicators. The
text inside the comment lines is used as the text of the commentary.
Special fields <p1>, <s2>, etc., within that text refer to
substitutable parameters; the <p*> ones are numbers, while the
<s*> ones are strings.
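The substitution described above can be sketched as follows. This is
only an illustration, not the code used by the tools: the function
name ex_expand and the sample message text are hypothetical, and the
sketch assumes that the N in <pN> or <sN> is the 1-based position of
the parameter in the message's parameter list, and that a string
parameter is a byte offset into the string table.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Expand <pN> and <sN> fields of tmpl into out (capacity outlen).
 * ASSUMPTION: N is the 1-based position in params[] for both kinds,
 * and an <sN> parameter is a byte offset into strings.
 * Returns 0 on success, -1 on a bad field number or if out is too small.
 */
static int ex_expand(const char *tmpl, const int32_t *params, int nparam,
                     const char *strings, char *out, size_t outlen)
{
    size_t used = 0;

    while (*tmpl != '\0') {
        char kind = 0;
        char *end = NULL;
        long n = 0;
        int w;

        /* Recognize "<p" or "<s" followed by digits and a closing ">". */
        if (tmpl[0] == '<' && (tmpl[1] == 'p' || tmpl[1] == 's')
            && isdigit((unsigned char)tmpl[2])) {
            n = strtol(tmpl + 2, &end, 10);
            if (*end == '>')
                kind = tmpl[1];
        }
        if (kind == 0) {                 /* ordinary character: copy it */
            if (used + 1 >= outlen)
                return -1;
            out[used++] = *tmpl++;
            continue;
        }
        if (n < 1 || n > nparam)
            return -1;
        if (kind == 'p')
            w = snprintf(out + used, outlen - used, "%d",
                         (int)params[n - 1]);
        else
            w = snprintf(out + used, outlen - used, "%s",
                         strings + params[n - 1]);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;
        used += (size_t)w;
        tmpl = end + 1;                  /* skip past the closing '>' */
    }
    if (used >= outlen)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With the hypothetical text "Loop <s2> unrolled <p1> times", the
parameters {4, 0} and a string table holding "foo" at offset 0, the
sketch produces "Loop foo unrolled 4 times".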
The messages are organized into groups, and the first message tag
in each group has an explicit setting of the numerical value for
the message tag enum. All message tags are assigned numerical
values when the comp_com component is built, and the assignment
is made in numerical order within the group. To add a new
message, you -must- add it at the end of a group, and not in the
middle of a group, or the new tag will change the value of
the tags following it, and will break the commentary for code
compiled with older compilers.
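The numbering rule can be illustrated with a small sketch. The tag
names and the starting value 0x100 below are hypothetical and are not
taken from comp_com.prototype; only the rule itself (explicit value
on the first tag, sequential values after it) comes from the text
above.

```c
/* Hypothetical tags; the real ones live in comp_com.prototype. */

/* The group as seen by an older compiler. */
enum ex_old_id {
    EX_OLD_FIRST = 0x100,   /* first tag in the group: explicit value */
    EX_OLD_SECOND,          /* 0x101 */
    EX_OLD_THIRD            /* 0x102 */
};

/* Safe: the new tag is appended at the end of the group. */
enum ex_appended_id {
    EX_APP_FIRST = 0x100,
    EX_APP_SECOND,          /* still 0x101 */
    EX_APP_THIRD,           /* still 0x102 */
    EX_APP_NEW              /* 0x103, a value no older compiler used */
};

/* Broken: the new tag is inserted in the middle of the group. */
enum ex_inserted_id {
    EX_INS_FIRST = 0x100,
    EX_INS_NEW,             /* takes 0x101 */
    EX_INS_SECOND,          /* shifted to 0x102 */
    EX_INS_THIRD            /* shifted to 0x103 */
};
```

In the last enum, commentary written by an older compiler with value
0x101 would now be decoded as the new message rather than as the
second one.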
The remainder of this file contains the original proposal
for the commentary. It should be more-or-less correct, except
that two distinct compcom sections may be independently
generated. The first is named ".compcom1" and the second
is named ".compcom". For any one line, the .compcom1 messages
are shown before the .compcom messages. In addition, the tools
will read older commentary sections, ".info", ".loops",
and ".loopview", although none should be generated any longer.
Original description follows: [Feel free to stop reading here]
-----------------------------
This file describes a proposed format for the new compiler-commentary
section to be added to .o's and propagated to the a.out. It reflects
information the compiler can expose to the user about his or her
program. The section should be generated for all compiles where
the user has specified -g (or -g0 for C++) on the compile line,
along with optimization. If no optimization is specified,
or no -g/-g0 is specified, the section will not be generated.
[RFEs 4385862 and 4385863 have been filed, asking for a new flag to have
the compiler dump out the commentary to stdout or to a listing file.]
In the analyzer, display of the messages will be governed by a user UI
that sets a vis_bits bitmap, and matches it against a show_bits
bitmap table, which is maintained separately from the producer
code. For any message, if (vis_bits&show_bits) is non-zero, the
message is shown. If zero, the message is not shown. A similar
mechanism would be used for a stand-alone source or disassembly browser.
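The visibility test above amounts to a single bitwise AND. A minimal
sketch follows; the function name and the bit values are hypothetical
(the real values are assigned in the generated comp_com.h).

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define EX_CCMV_LOOP   0x01
#define EX_CCMV_INLINE 0x02
#define EX_CCMV_UNIMPL 0x00   /* zero bitmask: never selects a message */

/*
 * vis_bits:  what the user has asked to see.
 * show_bits: the classes a particular message belongs to.
 * Returns nonzero if the message is shown.
 */
static int ex_msg_is_shown(int32_t vis_bits, int32_t show_bits)
{
    return (vis_bits & show_bits) != 0;
}
```

Under this rule a message whose show_bits are zero is never shown,
whatever the user selects.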
The .compcom Section
--------------------
The section will be named ".compcom"; it is generated for each
.o, and aggregated into a single section in the a.out. In that
section, each .o's data is separate, and the tools will loop
over the data for each .o in order to find the subsection for
the particular .o being annotated.
The section is the same for 32- and 64-bit .o files, and is:
struct compcom {
// Header describing the section
struct compcomhdr {
int32_t srcname; // offset into strings of source file path
int32_t version; // specification version number for the .compcom format
// used by producers as defined in this README file
int32_t msgcount; // count of messages in the section
int32_t paramcount; // count of parameters in the section
int32_t stringcount; // count of strings in the section
int32_t stringlen; // count of total bytes in strings
}
// the array of messages
struct compmsg msgs[msgcount];
// the parameters used in the messages
// parameters are either integers or string-indices
int32_t param[paramcount];
// the strings used in the messages
char msgstrings[stringlen];
}
Since the header is fixed-length, and the total size of the section
can be easily determined as:
sizeof(struct compcomhdr)
+ paramcount * sizeof(int32_t)
+ stringlen
+ msgcount * sizeof(struct compmsg)
there is no need to have the size in the header.
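The size computation can be written out as a small sketch. The
function name is hypothetical, and the message type is declared here
as int32_t rather than as the COMPMSG_ID enum so that the example is
self-contained; that substitution is an assumption about the enum's
size, not part of the specification.

```c
#include <stddef.h>
#include <stdint.h>

struct compcomhdr {
    int32_t srcname;
    int32_t version;
    int32_t msgcount;
    int32_t paramcount;
    int32_t stringcount;
    int32_t stringlen;
};

struct compmsg {
    int64_t instaddr;
    int32_t lineno;
    int32_t msg_type;      /* ASSUMPTION: stands in for enum COMPMSG_ID */
    int32_t nparam;
    int32_t param_index;
};

/* Total size in bytes of one .o's commentary, derived from its header. */
static size_t ex_section_size(const struct compcomhdr *hdr)
{
    return sizeof(struct compcomhdr)
         + (size_t)hdr->msgcount * sizeof(struct compmsg)
         + (size_t)hdr->paramcount * sizeof(int32_t)
         + (size_t)hdr->stringlen;
}
```

Note that stringcount does not enter the computation; only the total
byte length of the strings does.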
The section is always aligned on an 8-byte boundary.
It has been proposed that the section be compressed by the
producer, and then uncompressed by the consumer. If so,
there would need to be a size for both the compressed section
and the uncompressed section (so that the consumer can know
how much space to allocate), as a preface to the compressed data.
But no decision has been reached on whether or not to do the
compression. We will decide after we have a better feel
for the sizes of typical sections, and the size reduction
from compression.
The strings in the msgstrings part will normally not need I18n;
if they do, they will have already been I18n'd before putting them
into the section.
The Message Structure
---------------------
Each message is a fixed-length structure, the same for 32- and 64-bit apps,
as follows:
struct compmsg {
int64_t instaddr; // the PC offset, relative to the .o
// .text section
int32_t lineno; // the source line to which it refers
enum COMPMSG_ID msg_type; // the specific message index
int32_t nparam; // number of parameters to this message
int32_t param_index; // the index of the first parameter
// other parameters follow
};
instaddr is an instruction address, relative to the .text section
in the object file. To initialize a field with the address of an instruction,
that instruction should have a label, and a .word (32-bit) or .xword (64-bit)
datum with that label should be generated. Since the .compcom section is
not allocatable, ld.so will not attempt to relocate the address, and ld
will not complain about impure text. instaddr is used only for annotating
the disassembly output, not the source. If instaddr is 0xFF...FF,
the lineno will be used to place the message before the source line as it
is interpolated into the disassembly. If instaddr is not 0xFF...FF, the message
will be inserted into the disassembly immediately prior to the instruction
at that offset, rather than before the source line given.
lineno is the line number of the source file to which the message pertains.
If it is 0, the message will appear at the top of the source file.
If the lineno is negative, the message will not appear in the source,
but instaddr (called pcoffset in the producer API) should be positive so
that the message is in the disassembly.
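The placement rules for instaddr and lineno can be summarized in a
short sketch. All names below are hypothetical. The sketch also
assumes that a message with instaddr 0xFF...FF and a negative lineno
is simply not displayed; the text above does not say so explicitly.

```c
#include <stdint.h>

#define EX_NO_INSTADDR ((int64_t)-1)   /* 0xFF...FF */

enum ex_placement {
    EX_SKIP,                 /* message not displayed in this view */
    EX_TOP_OF_FILE,          /* lineno == 0 */
    EX_BEFORE_LINE,          /* before the source line lineno */
    EX_BEFORE_INSTRUCTION    /* before the instruction at instaddr */
};

/* Source view: instaddr is ignored, only lineno matters. */
static enum ex_placement ex_place_in_source(int32_t lineno)
{
    if (lineno < 0)
        return EX_SKIP;
    if (lineno == 0)
        return EX_TOP_OF_FILE;
    return EX_BEFORE_LINE;
}

/* Disassembly view: a real instaddr wins over lineno. */
static enum ex_placement ex_place_in_disasm(int64_t instaddr, int32_t lineno)
{
    if (instaddr != EX_NO_INSTADDR)
        return EX_BEFORE_INSTRUCTION;
    return ex_place_in_source(lineno);
}
```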
COMPMSG_ID is a global enum, using the names listed below, identifying
each specific message.
nparam is a count of the number of parameters to the message.
param_index is the index of the first parameter to the message.
Each parameter may be a number, in which case it is referred to
below as <p1>, <p2>, etc.; it may also be an offset into the
string array, in which case it is referred to below as <s1>,
<s2>, etc. All parameters are 32-bit values. If a particular
message needs a 64-bit value, or a quad-float, the value
would be broken up into two or four 32-bit parameters, and then
concatenated and cast back to the required type in externalizing
the message. A 64-bit parameter would be expressed below as
<p1><p2>, although there are none needed yet.
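Reassembling a 64-bit value from two parameters might look like the
sketch below. The function name is hypothetical, and the word order
is an assumption: the specification does not say whether <p1> or <p2>
carries the high half, so the sketch takes <p1> as the high 32 bits.

```c
#include <stdint.h>

/*
 * Join two 32-bit parameters into one 64-bit value.
 * ASSUMPTION: hi is <p1> (upper 32 bits), lo is <p2> (lower 32 bits).
 * The casts through uint32_t keep a negative lo from sign-extending
 * into the upper half.
 */
static int64_t ex_join64(int32_t hi, int32_t lo)
{
    uint64_t bits = ((uint64_t)(uint32_t)hi << 32) | (uint64_t)(uint32_t)lo;
    return (int64_t)bits;
}
```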
The Actual Messages
-------------------
The actual messages are described in file .../src/comp_com.prototype
That file is used to automatically generate a message catalog,
a table of visibility bits, and a .h file for the CCR
component.
Message Classes and Visualization Bits
--------------------------------------
Each of the messages above may belong to zero or more visualization
classes, governed by a table using zero or more of the following symbolic
names for the classes:
CCMV_UNIMPL Unimplemented messages -- this bitmask is zero, but the
setting can be used for human-readability in the file
CCMV_WANT A message requested by a user -- also a zero bitmask
CCMV_OBS A message that should no longer be generated -- another zero
The real classes are:
CCMV_VER Versioning messages
CCMV_WARN Warning messages
CCMV_PAR Parallelization messages
CCMV_QUERY Compiler queries
CCMV_LOOP Loop detail messages
CCMV_PIPE Pipelining messages
CCMV_INLINE Inlining information
CCMV_MEMOPS Messages concerning memory operations
CCMV_FE Front-end messages (all compilers)
CCMV_CG Code-generator messages (all compilers)
CCMV_BASIC Messages that are on by default
CCMV_ALL All messages
The numerical values for the implied bitmask for visualization
will be defined in the generated include file, comp_com.h
The Consumer API
----------------
This consumer API assumes that some other component has
opened the object file containing a particular comp-com section,
and has mapped or read the section into memory, at the symbolic
location compcom.
Note that in the future this API, and the ultimate usage for
displaying the annotated source and disassembly should be enhanced
to support hyperlinking to some sort of help, much in the
same way the current compilers support error-browsing.
/*
* preprocesses the header structure, builds a table of messages with
* the line numbers, PCoffsets, original index, and compmsg pointer
* for each message.
* If the show_bits field is not in the message, this routine would
* fill it in from the mapping from COMPMSG_ID
*
* E.g.,
* Elf_Scn *CommSec = Elf ".compcom" section;
* ino_t src_inode = Inode number of source file;
* nl_catd catd = Message catalog descriptor;
* int32_t vis_bits = visible bits;
* compcomhdr *hdr;
* compmsg *msg;
* int index;
* char *str;
*
* hdr = compcom_c_open(CommSec, src_inode, catd, vis_bits);
* if (hdr != NULL) {
* for (index = 0; index < hdr->msgcount; index++) {
* str = compcom_c_format(index, &msg);
* if (str == NULL)
* continue;
*
* // read/use str and msg here
* }
* }
*/
compcomhdr *
compcom_c_open(Elf_Scn *compcom, ino_t src_inode, nl_catd catd,
int32_t vis_bits);
/*
* takes the message, and returns the I18N string for the message.
*/
char *
compcom_c_format(int index, compmsg **m);
Internal Compiler APIs
----------------------
The internal compiler APIs will not be discussed here. They are needed
for the various front-ends to generate IR for messages, and for iropt
and cg to agree on how the messages will be passed through to the point
at which the producer API is invoked.
==> AI: Sid -- design the IR for front-end messages to be passed through
==> AI: Sergei and Raja -- design the cg-iropt message handling protocol
The Producer API
----------------
The producer API is used to walk the internal structures in the compiler,
and generate the section.
compcom_p_open(char *srcname, int32_t version) {
/*
Initializes the data structures, converts
the source name to a string, and fills in
srcname and version in the header
*/
}
int32_t
compcom_p_string(char *s) {
/*
Finds or enters the string s into the string table,
and returns the index of the string
*/
}
compcom_p_putmsg(enum compmsg_phase p,
int32_t lineno, int32_t pcoffset, enum COMPMSG_ID m,
int32_t nparams, ... ) {
/*
Enter the single message. Any string parameters
should have been converted to int32_t's by calling
compcom_p_string()
*/
}
compcom_p_finalize() {
/*
Whatever is needed to close the section and write
it out to the .o
*/
}
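The find-or-enter behavior described for compcom_p_string() could be
sketched as below. This is not the producer's implementation: the
names, the fixed-size table, and the linear search are all
hypothetical, and the sketch assumes that the returned "index" is the
byte offset of the string within the string table.

```c
#include <stdint.h>
#include <string.h>

#define EX_STRTAB_MAX 4096

static char    ex_strtab[EX_STRTAB_MAX];  /* NUL-terminated strings, back to back */
static int32_t ex_strtab_len = 0;         /* bytes used so far */

/*
 * Return the byte offset of s in the table, entering it if it is
 * not already present. Returns -1 if the table is full.
 */
static int32_t ex_p_string(const char *s)
{
    size_t  len = strlen(s) + 1;          /* include the terminating NUL */
    int32_t off = 0;

    /* Look for an existing copy, stepping from string to string. */
    while (off < ex_strtab_len) {
        if (strcmp(ex_strtab + off, s) == 0)
            return off;
        off += (int32_t)strlen(ex_strtab + off) + 1;
    }

    if (len > (size_t)(EX_STRTAB_MAX - ex_strtab_len))
        return -1;

    memcpy(ex_strtab + ex_strtab_len, s, len);
    off = ex_strtab_len;
    ex_strtab_len += (int32_t)len;
    return off;
}
```

Entering the same string twice returns the same offset, so repeated
names cost no extra space in the section.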
The CCR Component
-----------------
[This is VERY preliminary -- just a sketch of what's needed.]
Contents:
comp_com.h -- defining the structures to be used
comp_com.msg -- the I18N catalog component for the
messages
comp_com_p.c -- source for the producer API
comp_com_c.c -- source for the consumer API
comp_com_showbits.c -- source for the table mapping
each message to its show_bits