Samuel Jacob's Weblog

Just another technical blog

Archive for the ‘gcc’ Category

Internals of GNU Code Coverage – gcov

with one comment

A few years ago I worked on a small project to extract the code coverage information that gcc generates from a FreeBSD-based kernel. At that time I couldn’t find any good documentation on the internals of gcov, so here I am posting what I learned. Before jumping into the internals of gcov, here is an example from the man page.

$ gcov -b tmp.c
87.50% of 8 source lines executed in file tmp.c
80.00% of 5 branches executed in file tmp.c
80.00% of 5 branches taken  at  least  once  in  file tmp.c
50.00% of 2 calls executed in file tmp.c
Creating tmp.c.gcov.
Here is a sample of a resulting tmp.c.gcov file:

      main()
      {
    1        int i, total;
    1        total = 0;
   11        for (i = 0; i < 10; i++)
branch 0 taken = 91%
branch 1 taken = 100%
branch 2 taken = 100%
   10        total += i;
   1         if (total != 45)
branch 0 taken = 100%
  ######          printf ("Failure\n");
call 0 never executed
branch 1 never executed
             else
    1             printf ("Success\n");
call 0 returns = 100%
    1    }

Note: gcov has a nice graphical front-end on Linux called lcov.
As shown above, gcov reports which code paths were executed and how many times each one ran.
Want to try it? Here is a quick example.

$ gcc -fprofile-arcs -ftest-coverage your_program.c
$ ./a.out
$ gcov your_program.c

When compiling with the -ftest-coverage option, gcc generates a “.gcno” file, which describes the basic blocks and branches in your code. When it finishes executing, ./a.out creates one or more .gcda files, which record how many times each branch was taken (basic block entry/exit). Using these three files, .c (source), .gcno (block info) and .gcda (block execution counts), the gcov command prints the code coverage information in a human-readable format.

You might wonder how ./a.out creates the .gcda files when the program exits. It is because “-fprofile-arcs” automatically links in libgcov. Libgcov registers itself to be invoked at program exit by using atexit(). (Yes, this means no .gcda files are generated if the program exits abnormally.) At program exit it simply dumps all the branch information into one or more .gcda files.

The coverage information is “just” dumped into files by libgcov. So who collects the coverage information at run time? The program itself does. In fact only the program can collect it, because only it knows which code paths it takes. The coverage information is collected on the fly at run time, by keeping a counter for each branch. For example, consider the following program.

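#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int if_counter = 0, else_counter = 0;
char filename[32] = "example";	/* placeholder base name for the output file */

/* Plays the role of libgcov: writes the counters out at program exit. */
void dump_counters(void)
{
	int fd;

	fd = open(strcat(filename, ".gcda"), O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return;
	write(fd, &if_counter, sizeof(if_counter));
	write(fd, &else_counter, sizeof(else_counter));
	close(fd);
}

int main(int argc, char *argv[])
{
	atexit(dump_counters);

	if (argc > 1) {
		if_counter++;		/* the kind of increment gcc inserts */
		printf("Arguments provided\n");
	} else {
		else_counter++;
		printf("No arguments\n");
	}
	return 0;
}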

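If you map the above example onto gcov, the dump_counters() function and its atexit() registration correspond to the code provided by libgcov (at link/load time), while the counters and their increment statements correspond to the code inserted into your executable by gcc (at compile time). The original post highlighted these two parts in green and blue respectively.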

It is easy to guess how gcc implants the increment operation in your code: it simply inserts an “increment counter” instruction sequence on each branch path. It should be noted that this instrumentation can have side effects on programs that use asm inside C. For example, on x86 the inserted add instruction (see the disassembly below) modifies the flags register, including the carry flag. Some assembly code might depend on the carry flag being preserved, and if gcc inserts a counter increment in between, the result will be wrong. I had a hard time figuring this out: when compiled with -fprofile-arcs, a kernel booted but could not receive any network packets (it discarded every packet because the network stack found the checksum was wrong).

Here is a simple C program’s disassembly:

int main()
{
  4004b4:       55                      push   %rbp
  4004b5:       48 89 e5                mov    %rsp,%rbp
    int a = 1;
  4004b8:       c7 45 fc 01 00 00 00    movl   $0x1,-0x4(%rbp)

    if (a) {
  4004bf:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  4004c3:       74 06                   je     4004cb <main+0x17>
        a++;
  4004c5:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004c9:       eb 04                   jmp    4004cf <main+0x1b>
    } else {
        a--;
  4004cb:       83 6d fc 01             subl   $0x1,-0x4(%rbp)
    }

    return a;
  4004cf:       8b 45 fc                mov    -0x4(%rbp),%eax
}

The listing above was generated from the following source, test.c:

int main()
{
    int a = 1;

    if (a) {
        a++;
    } else {
        a--;
    }

    return a;
}

using these commands:

$ gcc -g3 test.c
$ objdump -S -d ./a.out

When the same program is compiled with -fprofile-arcs, the disassembly looks like this:

int main()
{
  400c34:       55                      push   %rbp
  400c35:       48 89 e5                mov    %rsp,%rbp
  400c38:       48 83 ec 10             sub    $0x10,%rsp
    int a = 1;
  400c3c:       c7 45 fc 01 00 00 00    movl   $0x1,-0x4(%rbp)

    if (a) {
  400c43:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  400c47:       74 18                   je     400c61 
        a++;
  400c49:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  400c4d:       48 8b 05 3c 25 20 00    mov    0x20253c(%rip),%rax        # 603190 
  400c54:       48 83 c0 01             add    $0x1,%rax
  400c58:       48 89 05 31 25 20 00    mov    %rax,0x202531(%rip)        # 603190 
  400c5f:       eb 16                   jmp    400c77 
    } else {
        a--;
  400c61:       83 6d fc 01             subl   $0x1,-0x4(%rbp)
  400c65:       48 8b 05 2c 25 20 00    mov    0x20252c(%rip),%rax        # 603198 
  400c6c:       48 83 c0 01             add    $0x1,%rax
  400c70:       48 89 05 21 25 20 00    mov    %rax,0x202521(%rip)        # 603198 
    }

    return a;
  400c77:       8b 45 fc                mov    -0x4(%rbp),%eax
}
  400c7a:       c9                      leaveq
  400c7b:       c3                      retq

From the above disassembly it might seem that inserting the increment instructions at compile time is easy. But how and where is the storage for the counters created (the memory locations 0x603190 and 0x603198 in the above example, which objdump reported relative to the symbol dtor_idx.6460)? GCC uses statically allocated memory. Dynamically allocating the space is another way, but it would complicate the code (memory allocation during init) and might slow down execution of the program (an extra pointer dereference on every increment). To avoid that, gcc allocates the storage in a loadable section.
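
As a rough illustration of the statically allocated approach, the sketch below keeps one counter per branch in a file-scope array, so no allocation and no pointer dereference is needed at run time. The names arc_counters, classify and arc_count are made up for this example; they are not what gcc actually emits.

```c
#include <stdint.h>

/* Hypothetical stand-in for the counters gcc emits: a file-scope array is
 * zero-initialized and laid out at link time, so nothing is allocated at
 * run time. */
static uint64_t arc_counters[2];

/* Same shape as the if/else in the disassembly, with the increments
 * written by hand instead of being inserted by the compiler. */
int classify(int a)
{
    if (a) {
        arc_counters[0]++;   /* "then" path taken */
        return 1;
    } else {
        arc_counters[1]++;   /* "else" path taken */
        return 0;
    }
}

/* Accessor so callers can inspect a counter. */
uint64_t arc_count(int index)
{
    return arc_counters[index];
}
```

Calling classify() a few times and then reading arc_count(0) and arc_count(1) gives the number of times each path ran, which is the same information gcov later reads from the .gcda file.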

The compiler keeps track of all the counters of a source file in one place. The original post outlined the data structure in a picture here (figure: gcov data structures).
There is a single gcov_info structure per C file, along with multiple gcov_fn_info and gcov_ctr_info structures. During program exit() these structures are dumped into the .gcda file. For a project with multiple C files, each C file has its own gcov_info structure. These gcov_info structures must be linked together so that during exit() the program can generate a .gcda file for every C file. This is done by using constructors and destructors.
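
To make the relationship between the three structures easier to picture, here is a simplified sketch. The sketch_ names and the fields are assumptions chosen for illustration; the real definitions in libgcov have more fields and change between gcc versions.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical mirrors of the libgcov structures. */
struct sketch_ctr_info {
    unsigned num;        /* number of counters in values[] */
    uint64_t *values;    /* the statically allocated counters */
};

struct sketch_fn_info {
    unsigned ident;      /* identifies the function */
    unsigned n_ctrs;     /* how many of the file's counters belong to it */
};

struct sketch_info {
    struct sketch_info *next;               /* links the per-file records */
    const char *filename;                   /* where the .gcda data goes */
    unsigned n_functions;
    const struct sketch_fn_info *functions;
    struct sketch_ctr_info counts;
};

/* Walk one file's record and add up all of its counters. */
uint64_t sketch_total(const struct sketch_info *info)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < info->counts.num; i++)
        total += info->counts.values[i];
    return total;
}
```

The point of the layout is that one record per source file is enough to find every function and every counter belonging to that file.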

Generic C constructor:
gcc generates constructors for every program. C constructors are implemented using the “.ctors” section of the ELF file. This section contains an array of function pointers. The array is iterated and each function is invoked by _init()->__do_global_ctors_aux() during program start. _init() is placed in the “.init” section, so it is called during program initialization. A function can be declared as a constructor by using a function attribute.
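
A minimal sketch of the function attribute in question: gcc's constructor attribute makes a function run before main(). The names before_main and constructor_has_run are invented for this example. Newer toolchains typically place the pointer in .init_array rather than .ctors, but the effect is the same.

```c
static int ctor_ran;   /* set by the constructor, read later */

/* gcc places a pointer to this function in the constructor table
 * (.ctors or .init_array), so it runs before main(). */
__attribute__((constructor))
static void before_main(void)
{
    ctor_ran = 1;
}

/* Returns 1 if the constructor has already run. */
int constructor_has_run(void)
{
    return ctor_ran;
}
```

By the time main() starts, constructor_has_run() already returns 1, even though nothing in the program calls before_main() explicitly.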

The coverage instrumentation (“-fprofile-arcs”) creates a constructor per file. This constructor calls __gcov_init() and passes the file's gcov_info as the argument.

samuel@ubuntu:~$objdump  -t ./a.out  | grep -i _GLOBAL__
0000000000400c7c l     F .text  0000000000000010              _GLOBAL__sub_I_65535_0_main

And disassembly of _GLOBAL__sub_I_65535_0_main

0000000000400c7c <_GLOBAL__sub_I_65535_0_main>:
  400c7c:       55                      push   %rbp
  400c7d:       48 89 e5                mov    %rsp,%rbp
  400c80:       bf 00 31 60 00          mov    $0x603100,%edi
  400c85:       e8 a6 12 00 00          callq  401f30 <__gcov_init>
  400c8a:       5d                      pop    %rbp
  400c8b:       c3                      retq
  400c8c:       90                      nop
  400c8d:       90                      nop
  400c8e:       90                      nop
  400c8f:       90                      nop

__gcov_init(), implemented in libgcov, stores every gcov_info structure passed to it in a linked list. This linked list is used to walk through all the gcov_info structures during program termination.
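
The registration step can be sketched as follows. This is not libgcov's code: sketch_node, sketch_init and sketch_walk are hypothetical names, and the walk only counts the records instead of writing .gcda files.

```c
#include <stddef.h>

/* Hypothetical, cut-down record; the real gcov_info carries much more. */
struct sketch_node {
    struct sketch_node *next;
    const char *filename;
};

static struct sketch_node *sketch_list;   /* head of the list */

/* What a per-file constructor would call: push the record onto the list. */
void sketch_init(struct sketch_node *info)
{
    info->next = sketch_list;
    sketch_list = info;
}

/* What the exit handler would do: visit every registered record.
 * Here it only counts them. */
int sketch_walk(void)
{
    int n = 0;
    for (struct sketch_node *p = sketch_list; p != NULL; p = p->next)
        n++;
    return n;
}
```

Because each record carries its own next pointer, registration needs no memory allocation, which matters when it runs from a constructor very early in program start.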

Written by samueldotj

March 31st, 2012 at 4:21 pm

Posted in C,gcc,Tools

USB image creation using dd command

without comments

Wrote this post while waiting for a “dd” command to finish USB image creation. So this post is mainly for dd tips.

How to find where my usb drive is attached?

Attach the USB drive, run the following command, and look for ‘/dev/sd??’ in the output.
dmesg | less

How to create a USB drive clone?

sudo dd if=/dev/xxx of=clone.img
The output image will be in clone.img.
If you want to restore to the original disk or another USB drive then use
sudo dd of=/dev/xxx if=clone.img

How long will it take for an X GB drive?

It depends on your USB drive's transfer rate, and also on the buffer size used by dd. I use
sudo dd of=/dev/xxx if=clone.img bs=10M

iostat will give you the transfer rate of your devices

Don't know whether dd is dead or not?

If you send USR1 signal to dd it will print the status.
sudo kill -USR1 `pidof dd`
If you want auto update every 10 sec, try
watch -n 10 sudo kill -USR1 `pidof dd`
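
For the curious, here is a sketch of how a copy program could respond to USR1 in this way. It is not dd's source code; the function names and the output format are assumptions made for illustration. The handler only sets a flag, and the copy loop prints the status the next time it checks.

```c
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t status_requested;   /* set by the handler */
static unsigned long long bytes_copied;          /* updated by the copy loop */

/* Signal handler: only set a flag, since printf() is not async-signal-safe. */
static void on_usr1(int signo)
{
    (void)signo;
    status_requested = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
int install_status_handler(void)
{
    return signal(SIGUSR1, on_usr1) == SIG_ERR ? -1 : 0;
}

/* Called from the copy loop after each block; prints progress if a
 * signal arrived. Returns 1 if a status line was printed. */
int report_if_requested(unsigned long long block_size)
{
    bytes_copied += block_size;
    if (!status_requested)
        return 0;
    status_requested = 0;
    fprintf(stderr, "%llu bytes copied\n", bytes_copied);
    return 1;
}
```

A copy loop would call install_status_handler() once and then report_if_requested() after every block it writes.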

Written by samueldotj

February 9th, 2011 at 3:22 am

Posted in C,gcc,Tools

Size of all data structures in a C program

without comments

There are cases when you want to find the size of all the data structures in your program or project. For a single program it is easy, because we can calculate the sizes manually or add a printf of sizeof() for a few structures. But for a project with a few hundred files it is difficult. As far as I know, there is no way to make gcc dump all the structure sizes while it compiles.

Here is a way to do it using DWARF debugging info, if you are using the GNU compiler tools. Compile your project with the -g option, which generates debugging information in the output file. Then run objdump to dump the debug info. This debugging information describes everything in your program; filter the output for “DW_TAG_structure_type” and you will get only the structure information.

Here is how I did it:

objdump -W kernel.debug.reloc > debug.info
grep -e "DW_TAG_structure_type" debug.info --after-context=5 --mmap > structs.txt
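
To check a few entries of the dump by hand, the small sketch below prints the same numbers from inside the program. The structure sample is hypothetical; for a structure like it, the DWARF entry's DW_AT_byte_size should match sizeof and each member's DW_AT_data_member_location should match offsetof.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure; in the DWARF dump it would appear as a
 * DW_TAG_structure_type entry named "sample". */
struct sample {
    char tag;
    int value;
    void *next;
};

/* The value the compiler records as DW_AT_byte_size. */
size_t sample_size(void)
{
    return sizeof(struct sample);
}

/* Print the size and member offsets, for comparison with the dump. */
void print_sample_layout(void)
{
    printf("struct sample: %zu bytes\n", sizeof(struct sample));
    printf("  tag   at offset %zu\n", offsetof(struct sample, tag));
    printf("  value at offset %zu\n", offsetof(struct sample, value));
    printf("  next  at offset %zu\n", offsetof(struct sample, next));
}
```

The exact numbers depend on the target's alignment rules, which is one reason the debug info is a convenient source: it already reflects the padding the compiler chose.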

Written by samueldotj

November 19th, 2009 at 3:58 am

Posted in C,Debugger,gcc

Call Trace without modifying the source

without comments

While investigating the gcc flag “-fprofile-arcs”, I came to know about a gcc flag that was new to me, and this blog entry is about it. For any large C project it is hard to learn the call graph through a code walk. From a C perspective, unless you put a printf at each function's entry and exit, it is hard to find the call trace.

GCC and ICC have a wonderful option, “-finstrument-functions”, to solve this. This option instructs the compiler to emit calls to an external function on each function’s entry and exit. Defining these two functions as follows and adding the -finstrument-functions option to your makefile will do the magic.

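#include <stdio.h>

/* Keep the hooks themselves from being instrumented, or they would recurse. */
#define noinstrument __attribute__((no_instrument_function))

void noinstrument __cyg_profile_func_enter(void *this_fn, void *call_site)
{
  printf("%p called from %p\n", this_fn, call_site);
}

void noinstrument __cyg_profile_func_exit(void *this_fn, void *call_site)
{
  printf("%p returns\n", this_fn);
}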

Of course, you can do anything in these functions; for simplicity's sake I just used printf calls.
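
As one example of “anything”, the variant below keeps a nesting depth and indents each line by it, so the output reads like a call tree. The depth variable and the trace_depth() helper are additions of mine, not part of the compiler's interface; only the two __cyg_profile_func_* names are fixed by the option.

```c
#include <stdio.h>

/* Keep the hooks and the helper from being instrumented themselves. */
#define noinstrument __attribute__((no_instrument_function))

static int depth;   /* current call nesting level */

void noinstrument __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    /* Indent by two spaces per nesting level. */
    fprintf(stderr, "%*s-> %p (from %p)\n", depth * 2, "", this_fn, call_site);
    depth++;
}

void noinstrument __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;
    depth--;
    fprintf(stderr, "%*s<- %p\n", depth * 2, "", this_fn);
}

/* Helper for inspecting the current nesting level. */
int noinstrument trace_depth(void)
{
    return depth;
}
```

The printed addresses can usually be turned back into function names with addr2line -f -e ./a.out followed by the address, provided the executable is not position independent and was built with debug info.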

Written by samueldotj

October 3rd, 2009 at 1:04 pm

GCC debugging switches

without comments

While compiling a FreeBSD kernel I encountered the following error message from GNU assembler

Error: suffix or operands invalid for 'mov'

I was not sure whether the error was caused by my changes or not. Either way I had to debug and solve the problem, so I decided to find out what was wrong with the mov instruction. But since gcc created a temporary file and invoked as to assemble it, I couldn’t find the temp file anymore.

I googled and found the following gcc options, which show what gcc does.

gcc -v is verbose mode, which prints all the commands gcc runs.
gcc -### is similar to -v but won't execute any commands.
gcc -save-temps saves all the temporary files generated by gcc. Useful if you want to see the assembly output passed to the GNU assembler.
gcc -E just preprocesses the C file.
gcc -S generates an assembly file.

With gcc -v -save-temps, I got the temporary assembly file passed to the assembler. Since reading a machine-generated assembly file is difficult, I tried to find the C source code corresponding to the error. From the nearest .file assembler directive I found the C file name, and from the .loc directive I found the line number. The following was the inline assembly in the C file.
__asm __volatile("movl %%ss,%0" : "=rm" (sel));

I was surprised, and wondered what was wrong with this and why my assembler was giving an error. I googled again and found this link: http://kerneltrap.org/node/5785. It seems newer GNU assembler versions (>= 2.16) won't accept long (32-bit) mov operations on segment registers (ds, ss, …). So I changed the above source to movw %%ss,%0 and it compiled perfectly. 🙂
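
Here is a minimal sketch of the corrected instruction wrapped in a function. The function name read_ss and the use of uint16_t for the destination are my own choices for illustration, not the FreeBSD source; the non-x86 branch exists only so the file compiles elsewhere.

```c
#include <stdint.h>

/* Read the stack segment selector. Segment registers are 16 bits wide,
 * so a 16-bit destination with the "w" suffix matches the operand size. */
uint16_t read_ss(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint16_t sel;
    __asm__ __volatile__("movw %%ss, %0" : "=rm" (sel));
    return sel;
#else
    return 0;   /* not an x86 target; nothing to read */
#endif
}
```

Declaring the destination as a 16-bit variable also lets the compiler pick a 16-bit register or memory operand for %0, so the suffix and the operand always agree.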

Written by samueldotj

August 28th, 2009 at 2:23 am

Unreferenced symbols – EXPORT_SYMBOL

without comments

I wrote an ACPI driver for Ace and tried to load it into the kernel. It failed with a “symbol not found” error message.

The log file says that AcpiGetName was not found.

$ cat log.txt
Driver for id acpi : acpi.sys
../src/kernel/pm/elf.c:343:FindElfSymbolByName(): Symbol not found AcpiGetName(c01a2148:1869) string table c01aa619

After exploring the Acpi-CA source code and Ace kernel code, I couldn’t figure out why the symbol is missing. Then I tried to dump the symbol table for Acpi library and kernel.sys.
$ nm build/default/src/kernel/libacpi.a | grep GetName
U AcpiExGetNameString
000001e3 T AcpiExGetNameString
000001b7 T AcpiPsGetName
000000f4 T AcpiGetName

$ nm build/default/src/kernel/kernel.sys | grep GetName
c0135163 T AcpiExGetNameString
c0137bfb T AcpiPsGetName

It was obvious that during the kernel link the linker was somehow losing AcpiGetName. Within a few seconds I realized that the linker was optimizing the output by removing unreferenced sections. So I googled for how to avoid this and found that --no-gc-sections is the ld option for it. However, even with this option the linker kept removing the unreferenced symbols. I kept trying to find a way to make the linker include unreferenced symbols in the output, but I didn’t find one.
Then I started thinking about how the Linux kernel avoids this problem, and it flashed on me that I had seen an EXPORT_SYMBOL kind of macro in some source. So I googled for it and got the answer: use the symbol in storage, so that it is referenced, so that the linker includes it in the output.

struct kernel_symbol
{
   unsigned long symbol;
   const char * name;
};

#ifndef MODULE_SYMBOL_PREFIX
#define MODULE_SYMBOL_PREFIX ""
#endif

#define __EXPORT_SYMBOL(sym, sec) \
static const char __kstrtab_##sym[] \
__attribute__((section("__ksymtab_strings"))) \
= MODULE_SYMBOL_PREFIX#sym; \
static const struct kernel_symbol __ksymtab_##sym \
__attribute__((section("__ksymtab" sec), unused)) \
= { (unsigned long)&sym, __kstrtab_##sym }

#define EXPORT_SYMBOL(sym) \
__EXPORT_SYMBOL(sym, "")
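
The sketch below shows how such a table can be used at run time. It assumes an ELF target linked with GNU ld or a compatible linker, which defines __start_ and __stop_ symbols for any section whose name is a valid C identifier. The macro is a cut-down version of the one above, and find_symbol and demo_add are hypothetical names; I also use the used attribute so the compiler does not drop the otherwise unreferenced table entry.

```c
#include <stddef.h>
#include <string.h>

struct kernel_symbol {
    unsigned long symbol;
    const char *name;
};

/* Cut-down export macro: one table entry per exported symbol. */
#define EXPORT_SYMBOL(sym) \
    static const struct kernel_symbol __ksymtab_##sym \
    __attribute__((section("__ksymtab"), used)) \
    = { (unsigned long)&sym, #sym }

/* Provided by the linker: the bounds of the __ksymtab section. */
extern const struct kernel_symbol __start___ksymtab[];
extern const struct kernel_symbol __stop___ksymtab[];

/* Look a symbol up by name; returns its address, or 0 if not exported. */
unsigned long find_symbol(const char *name)
{
    const struct kernel_symbol *s;

    for (s = __start___ksymtab; s < __stop___ksymtab; s++)
        if (strcmp(s->name, name) == 0)
            return s->symbol;
    return 0;
}

/* An exported function that nothing else references directly. */
int demo_add(int a, int b)
{
    return a + b;
}
EXPORT_SYMBOL(demo_add);
```

A module loader can then resolve an undefined symbol such as demo_add by calling find_symbol("demo_add") instead of relying on the linker's own symbol table.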

Written by samueldotj

April 9th, 2009 at 4:22 am

Posted in Ace,C,gcc,Programming,Tools

Porting to “Intel C Compiler”

without comments

In this post, I will explain how easy it is to take C programs and makefiles designed for GCC (GNU Compiler Collection) and compile them with ICC (Intel C Compiler).

The need to use ICC arose because gcc was generating object files that were too big. Ace uses ACPI-CA (http://acpica.org/) as its ACPI driver. After compilation the ACPI library is around 7M in the debug version and around 500K in the non-debug version. But according to http://acpica.org/download/changes.txt, the Microsoft C compiler produces smaller code.

  Non-Debug Version:  81.2K Code, 17.0K Data,  98.2K Total
  Debug Version:     155.8K Code, 49.1K Data, 204.9K Total

So I decided to try the Intel compiler as a replacement for GCC. Things were easy, since icc has options which are similar to gcc's.
You can get the details of the portability options from the icc documentation: http://www.intel.com/software/products/compilers/docs/clin/main_cls/index.htm.

I just changed my make.conf file to check which compiler is in use and pick the appropriate options. I removed -fno-leading-underscore and -Wall. I removed -Wall because it was giving a lot of warnings; we will have to fix our code sometime soon. I removed -fno-leading-underscore because there is no equivalent option in icc; however, icc does not add underscores to symbols anyway.

I added the -O1 option because the kernel panicked during boot with an invalid opcode exception; I believe it was because of some MMX register usage by icc. So I have settled on -O1 for now.

ifeq ($(CC),gcc)
    DEBUG_FLAGS= -gdwarf-2 -g3
    CFLAGS+= -Wall -Wno-multichar $(DEFINES) -nostartfiles -ffreestanding -funsigned-char -fno-leading-underscore -c -fno-stack-protector $(DEBUG_FLAGS) $(CUSTOM_FLAG)
else
    DEBUG_FLAGS= -gdwarf-2 -g
    CFLAGS+= -O1 -Wno-multichar $(DEFINES) -nostartfiles -ffreestanding -funsigned-char -c -fno-stack-protector $(DEBUG_FLAGS) $(CUSTOM_FLAG)
endif

I also changed some static variables in kernel/i386/gdb_stub.c to non-static, because icc appends .0 to the names of static variables but does not use the renamed symbol when the variable is referenced from inline assembly code in the same file. After removing the static qualifier it works fine.

Compilation time:
gcc compiler took the following time to compile Ace source code.

real    0m51.125s
user    0m36.975s
sys     0m11.185s

icc compiler took the following time to compile Ace source code.

real    1m44.366s
user    0m59.107s
sys     0m20.946s

Size:
gcc produced a kernel.sys more than double the size of the one produced by icc (1.3M versus 528K). (I used the GNU linker (ld) and ar in both cases.)
gcc output size:

$ ls -lh /home/samuel/Projects/Ace3/obj/
-rw-rw-r-- 1 samuel samuel 372K 2008-11-22 18:17 acpi.a
-rw-rw-r-- 1 samuel samuel 545K 2008-11-22 18:17 arch.a
-rwxrwxr-x 1 samuel samuel 1.3M 2008-11-22 18:17 kernel.sys
-rw-rw-r-- 1 samuel samuel  62K 2008-11-22 18:17 libds.a
-rw-rw-r-- 1 samuel samuel  40K 2008-11-22 18:17 libheap.a
-rw-rw-r-- 1 samuel samuel  39K 2008-11-22 18:17 libstring.a
-rw-rw-r-- 1 samuel samuel 9.5K 2008-11-22 18:17 libsync.a
-rw-rw-r-- 1 samuel samuel  91K 2008-11-22 18:17 pic.a
-rw-rw-r-- 1 samuel samuel  23K 2008-11-22 18:17 pit.a
-rw-rw-r-- 1 samuel samuel  21K 2008-11-22 18:17 rtc.a

icc output:
$ ls -lh /home/samuel/Projects/Ace3/obj/
-rw-rw-r-- 1 samuel samuel 352K 2008-11-22 16:25 acpi.a
-rw-rw-r-- 1 samuel samuel 230K 2008-11-22 16:24 arch.a
-rwxrwxr-x 1 samuel samuel 528K 2008-11-22 16:25 kernel.sys
-rw-rw-r-- 1 samuel samuel  38K 2008-11-22 16:23 libds.a
-rw-rw-r-- 1 samuel samuel  22K 2008-11-22 16:23 libheap.a
-rw-rw-r-- 1 samuel samuel  21K 2008-11-22 16:23 libstring.a
-rw-rw-r-- 1 samuel samuel  11K 2008-11-22 16:23 libsync.a
-rw-rw-r-- 1 samuel samuel  19K 2008-11-22 16:24 pic.a
-rw-rw-r-- 1 samuel samuel 8.4K 2008-11-22 16:24 pit.a
-rw-rw-r-- 1 samuel samuel  13K 2008-11-22 16:24 rtc.a

Conclusion: So it is easy to switch to icc instead of gcc.

Written by samueldotj

November 22nd, 2008 at 5:36 am

Posted in Ace,C,Compiler,gcc