Archive for the ‘Tools’ Category
Internals of GNU Code Coverage – gcov
A few years ago I worked on a small project to extract the code coverage information generated by gcc from a FreeBSD-based kernel. At the time I couldn’t find any good documentation on gcov internals, so here I post what I learned. Before jumping into the internals of gcov, here is an example from the man page.
$ gcov -b tmp.c
87.50% of 8 source lines executed in file tmp.c
80.00% of 5 branches executed in file tmp.c
80.00% of 5 branches taken at least once in file tmp.c
50.00% of 2 calls executed in file tmp.c
Creating tmp.c.gcov.

Here is a sample of a resulting tmp.c.gcov file:

                main()
                {
           1      int i, total;
           1      total = 0;
          11      for (i = 0; i < 10; i++)
branch 0 taken = 91%
branch 1 taken = 100%
branch 2 taken = 100%
          10        total += i;
           1      if (total != 45)
branch 0 taken = 100%
      ######        printf ("Failure\n");
call 0 never executed
branch 1 never executed
                  else
           1        printf ("Success\n");
call 0 returns = 100%
           1    }
Note – gcov has a cool graphical front-end in Linux – lcov.
As shown above, gcov can show which code paths were executed and how many times each one was executed.
Want to try? Here is a quick example.
[shell]
$ gcc -fprofile-arcs -ftest-coverage your_program.c
$ ./a.out
$ gcov your_program.c
[/shell]
When compiling with the -ftest-coverage option, gcc generates a “.gcno” file, which contains information about each branch (basic block) in your code. When it finishes executing, ./a.out creates .gcda file(s), which record which branches were actually taken (basic block entry/exit counts). Using these three files, .c (source), .gcno (block info) and .gcda (block execution counts), the gcov command prints the code coverage information in a human-readable format.
You might wonder how ./a.out creates the .gcda file when the program exits. It is because “-fprofile-arcs” automatically links in libgcov. Libgcov registers itself to be invoked at program exit by using atexit(). (Yes, it won’t generate .gcda files if the program exits abnormally.) At program exit it simply dumps all the branch information into one or more .gcda files.
The coverage information is “just” dumped into files by libgcov. So who collects the coverage information at run time? The program itself does. In fact, only the program can collect it, because only it knows which code paths it takes. The coverage information is collected on the fly at run time by keeping a counter for each branch. For example, consider the following program.
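#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* One counter per branch. */
static int if_counter = 0, else_counter = 0;

/* Dumps the counters to a file at exit ("example.gcda" is only an illustrative name). */
static void dump_counters(void)
{
    int fd = open("example.gcda", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return;
    write(fd, &if_counter, sizeof(if_counter));
    write(fd, &else_counter, sizeof(else_counter));
    close(fd);
}

int main(int argc, char *argv[])
{
    atexit(dump_counters);
    if (argc > 1) {
        if_counter++;
        printf("Arguments provided\n");
    } else {
        else_counter++;
        printf("No arguments\n");
    }
    return 0;
}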
If you map the above example onto gcov, the dump_counters() function and its atexit() registration are what libgcov provides (at link/load time), while the counters and their increments are what gcc inserts into your executable (at compile time).
It is easy to guess how gcc implants the increment operation in your code: it simply inserts an “inc counter” style machine instruction on the branch paths. Note that the inserted instruction can have side effects on programs that use asm inside C. On x86, for example, the increment is emitted as an add (as in the profiled disassembly further down), which modifies the flags register, including the carry flag; a plain inc leaves the carry flag alone but still changes the other arithmetic flags. Some assembly code depends on the flags being preserved, and if gcc inserts a counter increment in the middle of it the result is an error. I had a hard time figuring this out when a kernel compiled with -fprofile-arcs booted but could not receive any network packets (it discarded every packet because the network stack found the checksum to be wrong).
Here is a simple C program’s disassembly:
[codegroup]
[c tab=’disassembly’]
int main()
{
4004b4: 55 push %rbp
4004b5: 48 89 e5 mov %rsp,%rbp
int a = 1;
4004b8: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
if (a) {
4004bf: 83 7d fc 00 cmpl $0x0,-0x4(%rbp)
4004c3: 74 06 je 4004cb
a++;
4004c5: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4004c9: eb 04 jmp 4004cf
} else {
a--;
4004cb: 83 6d fc 01 subl $0x1,-0x4(%rbp)
}
return a;
4004cf: 8b 45 fc mov -0x4(%rbp),%eax
}
[/c]
[c tab=’source’]
int main()
{
int a = 1;
if (a) {
a++;
} else {
a--;
}
return a;
}
[/c]
[shell tab=’command’]
gcc -g3 test.c
objdump -S -d ./a.out
[/shell]
[/codegroup]
When the same program is compiled with -fprofile-arcs, the disassembly looks like this:
int main()
{
400c34: 55 push %rbp
400c35: 48 89 e5 mov %rsp,%rbp
400c38: 48 83 ec 10 sub $0x10,%rsp
int a = 1;
400c3c: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
if (a) {
400c43: 83 7d fc 00 cmpl $0x0,-0x4(%rbp)
400c47: 74 18 je 400c61
a++;
400c49: 83 45 fc 01 addl $0x1,-0x4(%rbp)
400c4d: 48 8b 05 3c 25 20 00 mov 0x20253c(%rip),%rax # 603190
400c54: 48 83 c0 01 add $0x1,%rax
400c58: 48 89 05 31 25 20 00 mov %rax,0x202531(%rip) # 603190
400c5f: eb 16 jmp 400c77
} else {
a--;
400c61: 83 6d fc 01 subl $0x1,-0x4(%rbp)
400c65: 48 8b 05 2c 25 20 00 mov 0x20252c(%rip),%rax # 603198
400c6c: 48 83 c0 01 add $0x1,%rax
400c70: 48 89 05 21 25 20 00 mov %rax,0x202521(%rip) # 603198
}
return a;
400c77: 8b 45 fc mov -0x4(%rbp),%eax
}
400c7a: c9 leaveq
400c7b: c3 retq
From the above disassembly it might seem that inserting the increment instructions at compile time is easy. But how and where is the storage for the counters created (the locations 0x603190 and 0x603198 in the above example, which objdump labels relative to dtor_idx.6460)? GCC uses statically allocated memory. Allocating the space dynamically would be another option, but it would complicate the code (memory allocation during init) and might slow down the program (an extra pointer dereference on every counter update). To avoid that, gcc allocates the counter storage in a loadable section of the executable.
The compiler keeps track of all the counters in a single file. The data structures are outlined in the picture below.
There is a single gcov_info structure per C file, along with multiple gcov_fn_info and gcov_ctr_info structures. At program exit these structures are dumped into the .gcda file. In a project with multiple C files, each C file has its own gcov_info structure. These gcov_info structures must be linked together so that at exit the program can generate a .gcda file for every C file. This is done using constructors and destructors.
Generic C constructor:
gcc generates constructors for every program. C constructors are implemented using the “.ctors” section of the ELF file. This section contains an array of function pointers. During program start, _init()->__do_global_ctors_aux() iterates over this array and invokes each function. _init() is placed in the “.init” section, so it is called during program initialization. A function can be declared as a constructor by using a function attribute.
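As a minimal sketch of the function attribute mentioned above (GCC/Clang syntax; the names init_order, early_setup, late_teardown and constructor_has_run are made up for illustration), a function marked as a constructor runs before main() and one marked as a destructor runs after main() returns:

```c
#include <stdio.h>

/* Hypothetical flag, used only to show that the constructor ran. */
static int init_order = 0;

/* Runs before main(): the compiler records its address in the
 * constructor table of the object file. */
__attribute__((constructor))
static void early_setup(void)
{
    init_order = 1;
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor))
static void late_teardown(void)
{
    printf("destructor ran, init_order=%d\n", init_order);
}

/* Returns 1 if the constructor has already run. */
int constructor_has_run(void)
{
    return init_order == 1;
}
```

Calling constructor_has_run() from main() should return 1, because the startup code has already invoked early_setup() by then.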
“-ftest-coverage” creates a constructor per file. This constructor calls __gcov_init() and passes the gcov_info as argument.
samuel@ubuntu:~$ objdump -t ./a.out | grep -i _GLOBAL__
0000000000400c7c l F .text 0000000000000010 _GLOBAL__sub_I_65535_0_main
And disassembly of _GLOBAL__sub_I_65535_0_main
0000000000400c7c <_GLOBAL__sub_I_65535_0_main>:
400c7c: 55 push %rbp
400c7d: 48 89 e5 mov %rsp,%rbp
400c80: bf 00 31 60 00 mov $0x603100,%edi
400c85: e8 a6 12 00 00 callq 401f30 <__gcov_init>
400c8a: 5d pop %rbp
400c8b: c3 retq
400c8c: 90 nop
400c8d: 90 nop
400c8e: 90 nop
400c8f: 90 nop
__gcov_init(), implemented in libgcov, stores every gcov_info passed to it in a linked list. This linked list is used to walk through all the gcov_info structures during program termination.
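The registration scheme can be sketched roughly as follows. This is a simplified stand-in, not libgcov's actual code: the structure demo_gcov_info, its fields, and the functions demo_gcov_init and demo_gcov_total are all hypothetical names, and the "dump" step only sums the counters instead of writing .gcda files.

```c
#include <stddef.h>

/* Simplified stand-in for libgcov's per-file descriptor. The real
 * gcov_info has more fields; these names are illustrative only. */
struct demo_gcov_info {
    const char *filename;          /* name of the .gcda file to write  */
    unsigned long *counters;       /* statically allocated counters    */
    size_t n_counters;
    struct demo_gcov_info *next;   /* link to the next registered file */
};

/* Head of the list of all registered files. */
static struct demo_gcov_info *demo_gcov_list = NULL;

/* What a per-file constructor would call: push the file's descriptor
 * onto the global list. */
void demo_gcov_init(struct demo_gcov_info *info)
{
    info->next = demo_gcov_list;
    demo_gcov_list = info;
}

/* What the exit handler would do: walk the list and visit every file.
 * Here we only sum the counters instead of writing .gcda files. */
unsigned long demo_gcov_total(void)
{
    unsigned long total = 0;
    for (struct demo_gcov_info *p = demo_gcov_list; p != NULL; p = p->next)
        for (size_t i = 0; i < p->n_counters; i++)
            total += p->counters[i];
    return total;
}
```

Pushing at the head keeps registration O(1), which matters little here but keeps the constructor as small as the one in the disassembly above.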
Plot your data using gnuplot
System statistics are hard to interpret, since they are usually collected in large quantities and sometimes involve large numbers. Recently I was working on a prototype and wanted to measure how much damage it would do to the main project (in terms of performance), so I used the processor’s performance counters to measure some events (cache misses, memory reads, etc.) with and without my code change. After looking at the numbers I realized that such data is difficult to analyze, because each number had 8 digits and I had 16 columns (16 CPUs) and 100 rows of data (100 seconds of run). So I decided to graph the data to make it easier to understand.
I googled for a GNU graphing tool and found gnuplot. This post shows how good it is and how easy it is to use. Consider using it if you have a lot of numbers. For this post I took some sample data from my Ubuntu machine while running the stress command.
[codegroup]
[shell tab=’output’]
cat stat.txt
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 251124 1827388 4720 32704 6 34 19 38 42 74 1 0 98 1
0 0 251124 1827388 4720 32708 0 0 0 0 104 71 0 0 100 0
13 3 251108 1349912 4728 322540 4 0 4 20 683 1789 42 12 47 0
11 3 251008 1382620 4728 322520 180 0 180 0 1604 1233 89 12 0 0
11 3 251008 1432052 4728 322520 0 0 0 0 1361 1237 90 10 0 0
11 3 251008 1298352 4728 322668 0 0 0 0 1392 1275 90 10 0 0
2 3 251008 1512576 4728 323524 0 0 0 0 20077 14827 59 16 13 12
0 0 251008 1826388 4728 32756 0 0 0 0 45069 25566 0 4 25 71
0 0 251008 1826444 4728 32708 0 0 0 0 59 46 0 0 100 0
…
[/shell]
[shell tab=’stress’]
stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --hdd 2 --timeout 200s
[/shell]
[shell tab=’vmstat’]
sudo vmstat -n 1 200 > stat.txt
[/shell]
[/codegroup]
The following example shows how to create a line graph of the us and sy columns above against time (in seconds).
This graph might not look impressive, because it deals only with numbers in the range 0-100 and the numbers are very steady. When the values range from 0 to 99999999 and fluctuate heavily, a graph helps much more. The above graph was created by running “gnuplot” with the following commands:
[shell]
set title 'CPU usage'
#set terminal svg butt enhanced dynamic
set terminal jpeg
set output 'output.jpg'
set xlabel 'seconds'
#set logscale y
set ylabel 'cpu'
set key below
plot \
"stat.txt" using :13 title 'Application' with lines lc rgb 'blue', \
"stat.txt" using :14 title 'Kernel' with lines lc rgb 'green'
[/shell]
You can also mix two or more data files. The following example shows how to graph two different samples collected during different time periods.
[shell]
set title 'CPU usage'
#set terminal svg butt enhanced dynamic
set terminal jpeg
set output 'output.jpg'
set xlabel 'seconds'
#set logscale y
set ylabel 'cpu'
set key below
plot \
"stat.txt" using :13 title 'Application' with lines lc rgb 'light-green', \
"stat.txt" using :14 title 'Kernel' with lines lc rgb 'light-red', \
"stat1.txt" using :13 title 'Application1' with lines lc rgb 'dark-green', \
"stat1.txt" using :14 title 'Kernel1' with lines lc rgb 'dark-red'
[/shell]
The stat1.txt file is generated by running vmstat while the system was stressed by the following command
[shell]stress --cpu 4 --io 2 --vm 4 --vm-bytes 1M --hdd 2 --hdd-bytes 4096 --timeout 200s[/shell]
The nice thing about gnuplot is that it skips a row (line) in the data file if it can’t recognize the columns. It also supports SVG and PDF output. See what else gnuplot can do at the official demo page.
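To illustrate the row-skipping behaviour described above, here is a small sketch that treats a vmstat line the same way: numeric rows yield the us and sy values (columns 13 and 14), and header rows are rejected. It is not how gnuplot itself is implemented; parse_vmstat_row is a hypothetical helper written only for this illustration.

```c
#include <stdio.h>

/* Parses one vmstat line and extracts columns 13 (us) and 14 (sy).
 * Returns 1 on success, 0 if the line is not a numeric data row
 * (for example a header line), in which case it should be skipped. */
int parse_vmstat_row(const char *line, long *us, long *sy)
{
    long col[14];
    int n = sscanf(line,
                   "%ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld",
                   &col[0], &col[1], &col[2], &col[3], &col[4], &col[5],
                   &col[6], &col[7], &col[8], &col[9], &col[10], &col[11],
                   &col[12], &col[13]);
    if (n < 14)
        return 0;   /* not enough numeric columns: skip this row */
    *us = col[12];
    *sy = col[13];
    return 1;
}
```

Feeding it the two header lines of stat.txt returns 0, so only the data rows would reach a plot.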
USB image creation using dd command
I wrote this post while waiting for a “dd” command to finish creating a USB image, so it is mainly a collection of dd tips.
How to find where my usb drive is attached?
Attach the USB drive, issue the following command, and look for ‘/dev/sd??’
[shell]dmesg | less[/shell]
How to create a USB drive clone?
[shell]sudo dd if=/dev/xxx of=clone.img[/shell]
The output image will be in clone.img.
If you want to restore the image to the original disk or to another USB drive, use
[shell]sudo dd of=/dev/xxx if=clone.img[/shell]
How long will it take for an X GB drive?
It depends on your USB drive’s transfer rate, but also on the buffer size used by dd. I use
[shell]sudo dd of=/dev/xxx if=clone.img bs=10M[/shell]
iostat will give you the transfer rate of your devices
Don’t know whether dd is dead or not?
If you send the USR1 signal to dd, it will print its status.
[shell]sudo kill -USR1 `pidof dd`[/shell]
If you want an automatic update every 10 seconds, try
[shell]watch -n 10 sudo kill -USR1 `pidof dd`[/shell]
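For the curious, here is a rough sketch of how a long-running copy tool could implement this kind of on-demand status report. It is not dd's actual source; the names status_requested, on_usr1, install_status_handler and report_if_requested are invented for this example. The handler only sets a flag, and the copy loop checks the flag between blocks.

```c
#include <signal.h>
#include <stdio.h>

/* Set by the signal handler, polled by the copy loop. */
static volatile sig_atomic_t status_requested = 0;

static void on_usr1(int signo)
{
    (void)signo;
    status_requested = 1;   /* only set a flag: async-signal-safe */
}

/* Installs the SIGUSR1 handler; returns 0 on success. */
int install_status_handler(void)
{
    return signal(SIGUSR1, on_usr1) == SIG_ERR ? -1 : 0;
}

/* Called from the copy loop: prints progress if a status request
 * arrived since the last call. Returns 1 if it printed. */
int report_if_requested(unsigned long long bytes_copied)
{
    if (!status_requested)
        return 0;
    status_requested = 0;
    fprintf(stderr, "%llu bytes copied\n", bytes_copied);
    return 1;
}
```

Keeping the handler down to a single flag assignment avoids calling non-reentrant functions such as fprintf() from signal context.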
PS3 File Conversion(mkv to vob)
Eva
Eva is an expression evaluator developed as a cross-platform utility. It is written using ANTLR (targeting C) and gcc. Eva supports all C expressions, including bitwise operations. It was created to help developers with hex-to-binary conversions and operations. The main goal was a very simple code base of ~1000 lines.
The UI was initially coded using wxWidgets; Eva now has a simple command-line interface, a wxWidgets interface and a Windows-specific GUI. Here is a video demonstrating the different operations possible with Eva.
http://www.youtube.com/watch?v=m7ieVDEkdZ4
Here is the source code – https://bitbucket.org/samueldotj/eva
GCC debugging switches
While compiling a FreeBSD kernel I encountered the following error message from GNU assembler
[shell]
Error: suffix or operands invalid for ‘mov’
[/shell]
I was not sure whether the error was caused by my changes or not, but I still had to debug and solve the problem, so I decided to find out what was wrong with the mov instruction. But since gcc created a temporary file and invoked as to assemble it, the temp file was gone by the time I looked for it.
I googled and found the following gcc options for seeing what gcc does.
gcc -v is verbose mode, which prints all the commands gcc runs.
gcc -### is similar to -v but won’t execute any commands.
gcc -save-temps saves all the temporary files generated by gcc. Useful if you want to see the assembly output passed to the GNU assembler.
gcc -E just preprocesses the C file.
gcc -S generates an assembly file.
With gcc -v -save-temps, I got the temporary assembly file passed to the assembler. Since reading a machine-generated assembly file is difficult, I tried to find the C source code corresponding to the error. From the nearest .file assembler directive I found the C file name, and from the .loc directive I found the line number. The following was the inline assembly in the C file.
[c]__asm __volatile("movl %%ss,%0" : "=rm" (sel));[/c]
I was surprised: what was wrong with this, and why was my assembler giving an error? I googled again and found this link: http://kerneltrap.org/node/5785. It seems newer versions of the GNU assembler (>= 2.16) won’t accept long (movl) reads/writes on segment registers (ds, ss, …). So I changed the above source to movw %%ss, %0
and it compiled perfectly. 🙂
Unreferenced symbols – EXPORT_SYMBOL
I wrote an ACPI driver for Ace and tried to load it into the kernel. It failed with a “symbol not found” error message.
The log file says AcpiGetName was not found.
[shell]
$ cat log.txt
Driver for id acpi : acpi.sys
../src/kernel/pm/elf.c:343:FindElfSymbolByName(): Symbol not found AcpiGetName(c01a2148:1869) string table c01aa619
[/shell]
After exploring the Acpi-CA source code and the Ace kernel code, I couldn’t figure out why the symbol was missing. Then I dumped the symbol tables of the ACPI library and kernel.sys.
[shell]
$ nm build/default/src/kernel/libacpi.a | grep GetName
U AcpiExGetNameString
000001e3 T AcpiExGetNameString
000001b7 T AcpiPsGetName
000000f4 T AcpiGetName
$ nm build/default/src/kernel/kernel.sys | grep GetName
c0135163 T AcpiExGetNameString
c0137bfb T AcpiPsGetName
[/shell]
Obviously the linker was somehow losing AcpiGetName while linking the kernel. Within a few seconds I realized that the linker optimizes the output by removing unreferenced sections. So I googled for how to avoid this and found that --no-gc-sections is the ld option for it. However, even with this option the linker kept removing the unreferenced symbols. I kept trying to find a way to make the linker include unreferenced symbols in the output… and didn’t find one.
Then I started thinking about how the Linux kernel avoids this problem, and I remembered seeing an EXPORT_SYMBOL kind of macro in some source. So I googled for it and got the answer: use the symbol in storage, so that it is referenced and the linker includes it in the output.
[c]
struct kernel_symbol
{
unsigned long symbol;
char * name;
};
#ifndef MODULE_SYMBOL_PREFIX
#define MODULE_SYMBOL_PREFIX ""
#endif
#define __EXPORT_SYMBOL(sym, sec) \
static const char __kstrtab_##sym[] \
__attribute__((section("__ksymtab_strings"))) \
= MODULE_SYMBOL_PREFIX#sym; \
static const struct kernel_symbol __ksymtab_##sym \
__attribute__((section("__ksymtab" sec), unused)) \
= { (unsigned long)&sym, __kstrtab_##sym }
#define EXPORT_SYMBOL(sym) \
__EXPORT_SYMBOL(sym, "")
[/c]
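The idea behind the macro can be shown with a much simpler sketch: store each exported function's address in a table next to its name, and let the module loader look names up in that table. Everything here is hypothetical (demo_symbol, demo_get_name, demo_symtab, demo_find_symbol); it uses an ordinary array instead of dedicated ELF sections, and a generic function pointer instead of unsigned long, to stay portable.

```c
#include <stddef.h>
#include <string.h>

/* Generic function pointer type used to store exported addresses. */
typedef void (*demo_fn)(void);

/* Same idea as the kernel_symbol structure above: address plus name. */
struct demo_symbol {
    demo_fn address;
    const char *name;
};

/* Hypothetical exported function standing in for AcpiGetName. */
int demo_get_name(void)
{
    return 42;
}

/* Storing the address in a table references the function, so the
 * linker cannot treat it as unreferenced. */
static const struct demo_symbol demo_symtab[] = {
    { (demo_fn)demo_get_name, "demo_get_name" },
};

/* What a module loader would do: resolve a name to an address,
 * or return NULL if the symbol was never exported. */
demo_fn demo_find_symbol(const char *name)
{
    for (size_t i = 0; i < sizeof(demo_symtab) / sizeof(demo_symtab[0]); i++)
        if (strcmp(demo_symtab[i].name, name) == 0)
            return demo_symtab[i].address;
    return NULL;
}
```

A caller casts the result back to the function's real type before invoking it, for example (int (*)(void))demo_find_symbol("demo_get_name").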
Source code browsing tool
In UNIX environments there is the famous “cscope” for browsing source code and making changes. With cscope it is easy to browse and edit the source, and to cross-reference things such as all the functions calling a function or all the files including a file.
On Windows, I was using Notepad++’s find-in-files option to browse C source code. Today I decided to find a full-fledged open source source navigator for Windows.
I started with a Notepad++ plugin, “Function List”, but it is just not enough for browsing a project’s source. Google searches gave me the Windows port of cscope (http://cscope.sourceforge.net/). However, I wanted more: a GUI. With a few more searches I found Source Navigator – http://sourcenav.sourceforge.net/
Source Navigator is very useful. I just added my project and it gave me the full list of symbols. Editing a function is just a click away. Since I use Notepad++ for editing everything, I changed the editor preference to Notepad++. To do this, go to Source Navigator->File menu->Project Preference->Edit->External editor and enter the following line
"C:\Program Files\Notepad++\notepad++.exe" -n%l
Now I can easily browse the source code in Source Navigator and edit it in Notepad++ with a few keystrokes.
Network Booting
After creating the Ace bootable CD, I decided to make Ace bootable from the network as well, to avoid creating disk images altogether. Network booting also makes it easy to boot a test machine (real machine) on the same network.
Earlier I had grub installed (on a disk) on a test machine (real machine) with network support. Grub can fetch the kernel using tftp and boot the system.
Now I wanted to avoid installing grub on a disk, which left only one option: diskless boot. So I decided to use PXE (Preboot eXecution Environment).
To make PXE work, I needed a DHCP server and a TFTP server. There are a lot of tools available for Linux, but very few for Windows. After some internet searches, I found and downloaded the perfect tools: Dual Server http://sourceforge.net/projects/dhcp-dns-server/ and TFTP Server http://sourceforge.net/projects/tftp-server/. Both projects are developed and maintained by Achal Dhir. The other tftp/dhcp servers available for Windows don’t have the large set of options provided by Dual Server and TFTP Server.
Since I wanted to try PXE boot on emulators, I needed to create a software network tap adapter. Without a software tap adapter, the emulated network card can’t talk to the physical network. So I downloaded OpenVPN, which includes a tool to create tap devices, from http://openvpn.net/index.php/downloads.html
If you are using a real machine, you may need to download and install PXE ROMs from etherboot.org http://rom-o-matic.net/
To boot from the network, grub must be compiled with the appropriate network driver and other flags. Since grub has some problems compiling under Windows, I used a Linux machine to compile it.
cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/sources/grub co grub
cd grub
./configure --enable-diskless --enable-pxe --enable-rtl8139
You might want to compile in the preset menu also using the --preset option.
Now configure the Dual DHCP server to provide the IP address and boot file information to qemu:
#qemu-ne2k-pci
[52:54:00:12:34:56]
IP_Addr=192.168.2.200
HostName=QemuBox
DNS_Server=192.168.2.10
Router=192.168.2.10
Boot_File=pxegrub-ne2k-pci
Next_Server=192.168.2.10
Then configure TFTP server for the correct tftp home folder.
Now it is time to run Qemu. Note: the Windows port of Qemu 0.9.0 has some problems obtaining a DHCP address, so install Qemu 0.9.1 and run the following command; it will boot from the network.
qemu -L "$QEMU_BIOS_DIR" -M pc -m 32 -no-kqemu -hdc c.img -boot n -net nic,model=ne2k_pci,macaddr=52:54:00:12:34:56,vlan=0 -net tap,vlan=0,ifname=tap0
I am able to PXE boot qemu only with NE2000 cards; with other cards it fails. PXE boot on vmware works only if I configure grub using the ifconfig command. Otherwise vmware machines get assigned a wrong gateway address. No luck PXE booting Ace on bochs.
To boot my test machine (real machine), I set up my router (a DD-WRT box) with the dhcp-boot option in DNSMasq.
http://www.dd-wrt.com/phpBB2/viewtopic.php?t=4662&highlight=pxe
Bootable CD with Grub
Last week, I decided to boot Ace from a CD image rather than a floppy image, because booting was slow when a floppy image was used in bochs.
To create a bootable CD with grub, stage2_eltorito must be compiled from the grub source. mkisofs is also needed to create the image. http://smithii.com/cdrtools contains a Windows (cygwin) port of cdrtools, which includes mkisofs.
[shell]
mkdir -p $ACE_ROOT/img/iso/boot/grub
cp $ACE_ROOT/img/boot/grub/stage2_eltorito $ACE_ROOT/img/iso/boot/grub
cp $ACE_ROOT/img/boot/grub/menu.lst $ACE_ROOT/img/iso/boot/grub
cp $ACE_ROOT/obj/kernel.sys $ACE_ROOT/img/iso/
mkisofs -R -b boot/grub/stage2_eltorito -no-emul-boot -boot-load-size 4 -boot-info-table -o $ACE_ROOT/img/bootcd.iso $ACE_ROOT/img/iso
[/shell]
This image can be written to a CD using any CD burning software.