{"id":6,"date":"2012-03-31T16:21:00","date_gmt":"2012-03-31T23:21:00","guid":{"rendered":"http:\/\/samueldotj.com\/blog\/?p=6"},"modified":"2013-09-08T16:18:45","modified_gmt":"2013-09-08T23:18:45","slug":"gcov-internals-overview","status":"publish","type":"post","link":"http:\/\/samueldotj.com\/blog\/gcov-internals-overview\/","title":{"rendered":"Internals of GNU Code Coverage &#8211; gcov"},"content":{"rendered":"<p>A few years ago I worked on a small project to extract <a href=\"http:\/\/en.wikipedia.org\/wiki\/Code_coverage\">code coverage<\/a> information generated by gcc from a FreeBSD-based <a href=\"http:\/\/www.freebsd.org\/\">kernel<\/a>. At that time I could not find any good documentation of the internals of <a href=\"http:\/\/gcc.gnu.org\/onlinedocs\/gcc\/Gcov.html\">gcov<\/a>, so here is what I learned. Before jumping into the internals of gcov, here is an example from the <a href=\"http:\/\/nixdoc.net\/man-pages\/openbsd\/man1\/gcov.1.html\">man<\/a> page.<\/p>\n<pre class=\"code\">\r\n$ gcov -b tmp.c\r\n<span style='color:green'>87.50% of 8 source lines executed in file tmp.c\r\n80.00% of 5 branches executed in file tmp.c\r\n80.00% of 5 branches taken at least once in file tmp.c\r\n50.00% of 2 calls executed in file tmp.c<\/span>\r\nCreating tmp.c.gcov.\r\nHere is a sample of a resulting tmp.c.gcov file:\r\n\r\n      main()\r\n      {\r\n<span style='color:blue'>    1<\/span>        int i, total;\r\n<span style='color:blue'>    1<\/span>        total = 0;\r\n<span style='color:blue'>   11<\/span>        for (i = 0; i &lt; 10; i++)\r\n<span style='color:green'>branch 0 taken = 91%\r\nbranch 1 taken = 100%\r\nbranch 2 taken = 100%<\/span>\r\n<span style='color:blue'>   10<\/span>        total += i;\r\n<span style='color:blue'>    1<\/span>        if (total != 45)\r\n<span style='color:green'>branch 0 taken = 100%<\/span>\r\n<span style='color:red'>  ######<\/span>          printf (\"Failure\\n\");\r\n<span style='color:red'>call 0 never 
executed\r\nbranch 1 never executed<\/span>\r\n             else\r\n<span style='color:blue'>    1<\/span>             printf (\"Success\\n\");\r\n<span style='color:orange'>call 0 returns = 100%<\/span>\r\n<span style='color:blue'>    1<\/span>    }\r\n<\/pre>\n<p>Note: gcov has a cool graphical front-end on Linux, <a href=\"http:\/\/ltp.sourceforge.net\/coverage\/lcov\/output\/example\/example.c.gcov.frameset.html\">lcov<\/a>.<br \/>\nAs shown above, gcov reports which code paths were executed and how many times each one was executed.<br \/>\nWant to try? Here is a quick example.<br \/>\n[shell]<br \/>\n$ gcc -fprofile-arcs -ftest-coverage your_program.c<br \/>\n$ .\/a.out<br \/>\n$ gcov your_program.c<br \/>\n[\/shell]<\/p>\n<p>When compiling with the <a href=\"http:\/\/gcc.gnu.org\/onlinedocs\/gcc\/Debugging-Options.html\">-ftest-coverage<\/a> option, gcc generates a <a href=\"http:\/\/gcc.gnu.org\/onlinedocs\/gcc-4.1.2\/gcc\/Gcov-Data-Files.html\">&#8220;.gcno&#8221;<\/a> file. It contains information about each branch in your code. When it finishes executing, .\/a.out creates one or more <b>.gcda<\/b> files, which record which branches were actually taken (<a href='http:\/\/en.wikipedia.org\/wiki\/Basic_block'>basic block<\/a> entry\/exit). Using these three files, .c (source), .gcno (block info) and .gcda (block execution counts), the gcov command prints the code coverage information in a human-readable format.<\/p>\n<p>You might wonder how your .\/a.out manages to create the .gcda file while exiting. It is because &#8220;-fprofile-arcs&#8221; automatically links in <a href=\"http:\/\/gcc.gnu.org\/viewcvs\/trunk\/libgcc\/libgcov.c?view=markup\">libgcov<\/a>. libgcov registers itself to be invoked at program exit by using <a href=\"http:\/\/pubs.opengroup.org\/onlinepubs\/009604599\/functions\/atexit.html\">atexit()<\/a>. (Yes, that means no .gcda files are generated if the program exits abnormally.) 
And during program <a href=\"http:\/\/pubs.opengroup.org\/onlinepubs\/000095399\/functions\/exit.html\">exit<\/a> it just dumps all the branch information to one or more .gcda files.<\/p>\n<p>The coverage information is &#8220;just&#8221; dumped into files by libgcov. So who collects the coverage information at run time? The program itself does. In fact, only the program can collect it, because only the program knows which code paths it takes. The code coverage information is collected at <strong>run-time<\/strong>, on the fly, by keeping a counter for each branch. For example, consider the following program.<\/p>\n<pre class=\"code\">\r\n#include &lt;fcntl.h&gt;\r\n#include &lt;stdio.h&gt;\r\n#include &lt;stdlib.h&gt;\r\n#include &lt;unistd.h&gt;\r\n\r\n<span style='color:blue'>int if_counter = 0, else_counter = 0;<\/span>\r\n<span style='color:green'>\r\n\/* Illustration only: \"example.gcda\" is a placeholder name and the real .gcda format is more involved. *\/\r\nvoid dump_counters(void)\r\n{\r\n\tint fd;\r\n\t\r\n\tfd = open(\"example.gcda\", O_WRONLY | O_CREAT | O_TRUNC, 0644);\r\n\tif (fd &lt; 0)\r\n\t\treturn;\r\n\twrite(fd, &amp;if_counter, sizeof(if_counter));\r\n\twrite(fd, &amp;else_counter, sizeof(else_counter));\r\n\tclose(fd);\r\n}\r\n<\/span>\r\nint main(int argc, char *argv[])\r\n{\r\n\t<span style='color:green'>atexit(dump_counters);<\/span>\r\n\t\r\n\tif (argc &gt; 1) {\r\n\t\t<span style='color:blue'>if_counter++;<\/span>\r\n\t\tprintf(\"Arguments provided\\n\");\r\n\t} else {\r\n\t\t<span style='color:blue'>else_counter++;<\/span>\r\n\t\tprintf(\"No arguments\\n\");\r\n\t}\r\n\treturn 0;\r\n}\r\n<\/pre>\n<p>If you map the above example onto gcov, the green code corresponds to what libgcov provides (at link\/load time) and the blue code to what gcc inserts into your executable (at compile time).<\/p>\n<p>It is easy to guess how gcc implants the increment operation in your code: conceptually, gcc just inserts an <b>&#8220;inc <i>x-counter<\/i>&#8221;<\/b> machine instruction before and after every branch. Note that such an extra instruction can have side effects on programs that use asm inside C. 
For example, on x86 the inserted counter update modifies the flags register: &#8220;<a href=\"http:\/\/maven.smith.edu\/~thiebaut\/ArtOfAssembly\/CH06\/CH06-2.html#HEADING2-117\">inc<\/a>&#8221; changes every arithmetic flag except the <a href=\"http:\/\/en.wikibooks.org\/wiki\/X86_Assembly\/X86_Architecture#EFLAGS_Register\">carry flag<\/a>, and the &#8220;add&#8221; that gcc emits in the profiled disassembly later in this post changes the carry flag as well. Some assembly code depends on the flags surviving from one instruction to the next, and if gcc inserts a counter update in between, the result is wrong. I had a hard time figuring this out: a kernel compiled with -fprofile-arcs booted but could not receive any network packets (it discarded every packet because the network stack found that the <a href=\"http:\/\/www.netfor2.com\/checksum.html\">checksum<\/a> was wrong).<\/p>\n<p>Here is the disassembly of a simple C program:<br \/>\n[codegroup]<br \/>\n[c tab='disassembly']<br \/>\nint main()<br \/>\n{<br \/>\n  4004b4:       55                      push   %rbp<br \/>\n  4004b5:       48 89 e5                mov    %rsp,%rbp<br \/>\n    int a = 1;<br \/>\n  4004b8:       c7 45 fc 01 00 00 00    movl   $0x1,-0x4(%rbp)<\/p>\n<p>    if (a) {<br \/>\n  4004bf:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)<br \/>\n  4004c3:       74 06                   je     4004cb &lt;main+0x17&gt;<br \/>\n        a++;<br \/>\n  4004c5:       83 45 fc 01             addl   $0x1,-0x4(%rbp)<br \/>\n  4004c9:       eb 04                   jmp    4004cf &lt;main+0x1b&gt;<br \/>\n    } else {<br \/>\n        a--;<br \/>\n  4004cb:       83 6d fc 01             subl   $0x1,-0x4(%rbp)<br \/>\n    }<\/p>\n<p>    return a;<br \/>\n  4004cf:       8b 45 fc                mov    -0x4(%rbp),%eax<br \/>\n}<br \/>\n[\/c]<\/p>\n<p>[c tab='source']<br \/>\nint main()<br \/>\n{<br \/>\n    int a = 1;<\/p>\n<p>    if (a) {<br \/>\n        a++;<br \/>\n    } else {<br \/>\n        a--;<br \/>\n    }<\/p>\n<p>    return a;<br \/>\n}<br \/>\n[\/c]<\/p>\n<p>[shell tab='command']<br \/>\ngcc -g3 test.c<br \/>\nobjdump -S -d .\/a.out<br 
\/>\n[\/shell]<\/p>\n<p>[\/codegroup]<\/p>\n<p>When the <strong>same program is compiled with -fprofile-arcs<\/strong>, the disassembly looks like this:<\/p>\n<pre class=\"code\">\r\nint main()\r\n{\r\n  400c34:       55                      push   %rbp\r\n  400c35:       48 89 e5                mov    %rsp,%rbp\r\n  400c38:       48 83 ec 10             sub    $0x10,%rsp\r\n    int a = 1;\r\n  400c3c:       c7 45 fc 01 00 00 00    movl   $0x1,-0x4(%rbp)\r\n\r\n    if (a) {\r\n  400c43:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)\r\n  400c47:       74 18                   je     400c61 &lt;main+0x2d&gt;\r\n        a++;\r\n  400c49:       83 45 fc 01             addl   $0x1,-0x4(%rbp)\r\n<span style='color:red'>  400c4d:       48 8b 05 3c 25 20 00    mov    0x20253c(%rip),%rax        # 603190 &lt;dtor_idx.6460+0x8&gt;\r\n  400c54:       48 83 c0 01             add    $0x1,%rax\r\n  400c58:       48 89 05 31 25 20 00    mov    %rax,0x202531(%rip)        # 603190 &lt;dtor_idx.6460+0x8&gt;<\/span>\r\n  400c5f:       eb 16                   jmp    400c77 &lt;main+0x43&gt;\r\n    } else {\r\n        a--;\r\n  400c61:       83 6d fc 01             subl   $0x1,-0x4(%rbp)\r\n<span style='color:red'>  400c65:       48 8b 05 2c 25 20 00    mov    0x20252c(%rip),%rax        # 603198 &lt;dtor_idx.6460+0x10&gt;\r\n  400c6c:       48 83 c0 01             add    $0x1,%rax\r\n  400c70:       48 89 05 21 25 20 00    mov    %rax,0x202521(%rip)        # 603198 &lt;dtor_idx.6460+0x10&gt;<\/span>\r\n    }\r\n\r\n    return a;\r\n  400c77:       8b 45 fc                mov    -0x4(%rbp),%eax\r\n}\r\n  400c7a:       c9                      leaveq\r\n  400c7b:       c3                      retq\r\n<\/pre>\n<p>From the above disassembly it might seem that inserting the increment instructions at compile time is easy. But how and where is the storage for the counters (shown by objdump as dtor_idx.6460+0x8 and dtor_idx.6460+0x10 in the example above) created? GCC uses statically allocated memory. 
Dynamically allocating the space is one option, but it would complicate the code (memory allocation during init) and might slow down the program (every counter access would have to dereference a pointer). To avoid that, gcc allocates the storage in a loadable section.<\/p>\n<p>The compiler keeps track of all the counters that belong to a single source file. The data structure is outlined in the picture below.<br \/>\n<a href=\"http:\/\/samueldotj.com\/blog\/wp-content\/uploads\/2012\/03\/gcov-1.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/samueldotj.com\/blog\/wp-content\/uploads\/2012\/03\/gcov-1.png\" alt=\"gcov\" width=\"577\" height=\"423\" class=\"aligncenter size-full wp-image-53\" srcset=\"http:\/\/samueldotj.com\/blog\/wp-content\/uploads\/2012\/03\/gcov-1.png 577w, http:\/\/samueldotj.com\/blog\/wp-content\/uploads\/2012\/03\/gcov-1-300x219.png 300w\" sizes=\"auto, (max-width: 577px) 100vw, 577px\" \/><\/a><br \/>\nThere is a single gcov_info structure per C file, along with multiple gcov_fn_info and gcov_ctr_info structures. During program exit() these structures are dumped into the .gcda file. In a project with multiple C files, each C file has its own gcov_info structure. These gcov_info structures have to be linked together so that during exit() the program can generate a .gcda file for every C file. This is done by using constructors and destructors.<\/p>\n<p><strong>Generic C constructor<\/strong>:<br \/>\ngcc generates constructors for every program. C constructors are implemented using the &#8220;.ctors&#8221; section of the ELF file. This section contains an array of function pointers. During program start, _init()-&gt;__do_global_ctors_aux() iterates over this array and invokes each function. _init() is placed in the &#8220;.init&#8221; section, so it is called during program initialization. 
A function can be declared as a constructor by using the constructor function <a href=\"http:\/\/gcc.gnu.org\/onlinedocs\/gcc-4.1.1\/gcc\/Function-Attributes.html\">attribute<\/a>.<\/p>\n<p>&#8220;-fprofile-arcs&#8221; creates one constructor per file. This constructor calls __gcov_init() and passes the file's gcov_info as the argument.<\/p>\n<pre class=\"code\">\r\nsamuel@ubuntu:~$ objdump  -t .\/a.out  | grep -i _GLOBAL__\r\n0000000000400c7c l     F .text  0000000000000010              _GLOBAL__sub_I_65535_0_main\r\n<\/pre>\n<p>And here is the disassembly of _GLOBAL__sub_I_65535_0_main:<\/p>\n<pre class=\"code\">\r\n0000000000400c7c &lt;_GLOBAL__sub_I_65535_0_main&gt;:\r\n  400c7c:       55                      push   %rbp\r\n  400c7d:       48 89 e5                mov    %rsp,%rbp\r\n  400c80:       bf 00 31 60 00          mov    $0x603100,%edi\r\n  400c85:       e8 a6 12 00 00          callq  401f30 &lt;__gcov_init&gt;\r\n  400c8a:       5d                      pop    %rbp\r\n  400c8b:       c3                      retq\r\n  400c8c:       90                      nop\r\n  400c8d:       90                      nop\r\n  400c8e:       90                      nop\r\n  400c8f:       90                      nop\r\n<\/pre>\n<p>__gcov_init(), implemented in libgcov, stores every gcov_info passed to it in a linked list. This linked list is used to walk through all the gcov_info structures during program termination.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few years ago I worked on a small project to extract code coverage information generated by gcc from a FreeBSD-based kernel. At that time I could not find any good documentation of the internals of gcov, so here is what I learned. Before jumping into the internals of gcov, here is an example from the man page. 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,9,4],"tags":[],"class_list":["post-6","post","type-post","status-publish","format-standard","hentry","category-c","category-gcc","category-tools"],"_links":{"self":[{"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/posts\/6","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/comments?post=6"}],"version-history":[{"count":5,"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/posts\/6\/revisions"}],"predecessor-version":[{"id":225,"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/posts\/6\/revisions\/225"}],"wp:attachment":[{"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/media?parent=6"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/categories?post=6"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/samueldotj.com\/blog\/wp-json\/wp\/v2\/tags?post=6"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}