Samuel Jacob's Weblog

Just another technical blog

SWIG and Complex C structures


I had to use SWIG to access a kernel module’s chardev interface from Python and found that the existing SWIG examples were not enough, so I am adding my own.

Let's take the following example header file.
I will explain how to access all the members of complex_struct_t from Python,
and also how to extend these structures so that the Python code looks a little better.

/* test.h */

#ifndef _TEST_STRUCT_H
#define _TEST_STRUCT_H

/* a simple structure - no problem with SWIG */
typedef struct simple_struct {
   int int_var;
   long long_var;
   float float_var;
} simple_struct_t;

typedef struct tlv_base {
   int type;
   int length;
   unsigned char value[];
} tlv_base_t;

typedef struct tlv_type1 {
   tlv_base_t base;
   int stat;
   int info;
   long something;
} tlv_type1_t;

/* relatively complex C structure. */
typedef struct complex_struct {
   char string[10];           //SWIG treats this as a null-terminated string
   unsigned char bytes[10];   //SWIG won't treat this as a string

   simple_struct_t embedded;

   int pointer_array_count;
   simple_struct_t *pointer_array; //SWIG supports accessing only the first element

   tlv_base_t tlv;   //How do we cast this to tlv_type1_t?
} complex_struct_t;

complex_struct_t * alloc_complex_struct(int array_count);
void free_complex_struct(complex_struct_t *cs);
void print_complex_struct(complex_struct_t *cs);

#endif


/* test.c */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include "test.h"

complex_struct_t *
alloc_complex_struct(int array_count)
{
   complex_struct_t *result;
   size_t size;

   /* Extra room after the struct so the trailing tlv can hold a tlv_type1_t. */
   result = (complex_struct_t *)malloc(sizeof(complex_struct_t) + sizeof(tlv_type1_t));
   if (result == NULL) {
      return NULL;
   }
   memset(result, 0, sizeof(complex_struct_t) + sizeof(tlv_type1_t));

   result->tlv.type = 1;
   result->tlv.length = sizeof(tlv_type1_t);

   size = sizeof(simple_struct_t) * array_count;
   result->pointer_array = (simple_struct_t *)malloc(size);
   if (result->pointer_array == NULL) {
      free(result);   /* don't leak the outer struct */
      return NULL;
   }
   memset(result->pointer_array, 0, size);
   result->pointer_array_count = array_count;

   return result;
}

void
free_complex_struct(complex_struct_t *cs)
{
   free(cs->pointer_array);
   free(cs);
}

static inline void
print_simple_struct(simple_struct_t *ss)
{
   printf("int %d long %ld float %f\n", ss->int_var, ss->long_var, ss->float_var);
}

void
print_complex_struct(complex_struct_t *cs)
{
   int i;

   printf("String = %s\n", cs->string);
   printf("Embedded : ");
   print_simple_struct(&cs->embedded);
   printf("Pointer Array :\n");
   for (i = 0; i < cs->pointer_array_count; i++) {
      printf("%d) ", i + 1);
      print_simple_struct(&cs->pointer_array[i]);
   }
}

// test_swig.i
%module test_struct
%{
   #include "../test.h"
%}

%include "test.h"
# Commands to make the shared library
$ mkdir -p _build
$ gcc -I /usr/include/python2.7/ -fPIC -c -o _build/test.o test.c
$ swig -python -outdir _build -o _build/test_swig_wrap.c test_swig.i
$ gcc -I /usr/include/python2.7/ -fPIC -c -o _build/test_swig_wrap.o _build/test_swig_wrap.c
$ ld -shared  _build/test.o  _build/test_swig_wrap.o -o _build/
$ rm _build/test_swig_wrap.c

# Makefile
BLD_DIRECTORY = _build
SWIG_SRCS = test_swig.i
C_SRCS = test.c
CFLAGS = -I /usr/include/python2.7/ -fPIC -c -Wall

OBJS = $(patsubst %.c, $(BLD_DIRECTORY)/%.o, $(C_SRCS)) \
		 $(patsubst %.i, $(BLD_DIRECTORY)/%_wrap.o, $(SWIG_SRCS))

all: $(BLD_DIRECTORY) $(OBJS)
	ld -shared $(OBJS) -o $(BLD_DIRECTORY)/

$(BLD_DIRECTORY)/%.o: %.c %.h
	gcc $(CFLAGS)  -o $@ $<

$(BLD_DIRECTORY)/%.o: $(BLD_DIRECTORY)/%.c
	gcc $(CFLAGS) -o $@ $<

$(BLD_DIRECTORY)/%_wrap.c: %.i
	swig -python -outdir $(BLD_DIRECTORY) -o $@ $<
	cp $@ $@.bak

$(BLD_DIRECTORY):
	mkdir -p $(BLD_DIRECTORY)

clean:
	rm -rf $(BLD_DIRECTORY)

.PHONY: all clean


With this simple interface file, SWIG can generate the Python module and the C wrapper that goes into the shared library, which is perfect for most cases.

%module test_struct
%{
   #include "../test.h"
%}

%include "test.h"

import test_struct as ts
cs = ts.alloc_complex_struct(1)
cs.string = "Hello"
cs.embedded.int_var = 9
cs.embedded.long_var = 10
cs.embedded.float_var = 11.23
ts.print_complex_struct(cs)
ts.free_complex_struct(cs)

This shows SWIG’s ability to convert a C string to a Python string and vice versa.
Accessing primitive structure members is similarly easy.
Here is the output of the program when run from the _build directory:

String = Hello
Embedded : int 9 long 10 float 11.230000
Pointer Array :
1) int 0 long 0 float 0.000000

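It also shows how to call into the C library. You may have noticed that this program looks more like a C program than a Python program, mainly because it manages the memory allocation and free itself. SWIG's %newobject directive tells Python that alloc_complex_struct() returns a new object owned by the caller; the Python garbage collector will then free the object when there are no more references to it. But SWIG does not know how to free a complex_struct_t properly: its default destructor for a C struct only calls free() on the struct itself, which would leak pointer_array. A destructor added with %extend can call free_complex_struct() instead. (The newfree typemap is not the right tool here: it runs right after the return value has been converted, so it would free the struct while the Python object still points to it.)

By adding the following to test_swig.i, we can avoid calling free_complex_struct() in the Python program. The %newobject line must appear before %include "test.h", because SWIG applies features only to declarations that follow them.

%newobject alloc_complex_struct;
%extend complex_struct_t {
    ~complex_struct_t() {
        free_complex_struct($self);
    }
}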

Let's modify the program a little and access the pointer_array elements.

import test_struct as ts
cs = ts.alloc_complex_struct(5)
print 'Pointer array count ', cs.pointer_array_count
print cs.pointer_array[0]

This will fail with the following error:

Pointer array count  5
Traceback (most recent call last):
  File "./", line 4, in <module>
    print cs.pointer_array[0]
TypeError: 'simple_struct_t' object does not support indexing

The reason is that SWIG does not know that simple_struct_t *pointer_array actually points to an array of simple_struct_t; it safely assumes that it points to a single entry. If pointer_array were an “array of simple_struct_t pointers”, the carrays.i macros would have helped. But pointer_array is actually a “pointer to an array of simple_struct_t”, so carrays.i won’t help.

The easiest way is to extend complex_struct_t and add a new (kind of) member function to it.

%extend complex_struct_t {
    simple_struct_t *get_array_element(int i) {
        return &$self->pointer_array[i];
    }
}

This way cs.get_array_element(4) returns the element at index 4, i.e. the fifth element of the array.
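The body of an %extend method is ordinary C, with $self standing for the struct pointer. The %extend version above does no bounds check, so cs.get_array_element(5) on a 5-element array would hand Python a pointer just past the end of the array. Below is a minimal sketch of the same accessor written as a plain C function with a bounds check, so it can be compiled and tested on its own. It assumes trimmed copies of the types from test.h, and the name complex_struct_get_array_element and its NULL-on-out-of-range behavior are my own additions.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed copies of the types in test.h (only the fields used here). */
typedef struct simple_struct {
   int int_var;
   long long_var;
   float float_var;
} simple_struct_t;

typedef struct complex_struct {
   int pointer_array_count;
   simple_struct_t *pointer_array;
} complex_struct_t;

/* Hypothetical bounds-checked accessor: returns a pointer into the array,
 * so the caller can modify the element in place, or NULL if i is out of range. */
simple_struct_t *
complex_struct_get_array_element(complex_struct_t *cs, int i)
{
   assert(cs != NULL);
   if (i < 0 || i >= cs->pointer_array_count)
      return NULL;
   return &cs->pointer_array[i];
}
```

Inside %extend the same check would use $self in place of cs; SWIG hands a NULL pointer result back to Python as None.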
The tlv bytes can be accessed in a similar way, but this time I decided to override the indexing operator ([]).

%extend complex_struct_t {
    unsigned char __getitem__(int i) {
        return $self->tlv.value[i];
    }
}

However, this is not very useful, since Python can't cast a struct tlv_base * to a struct tlv_type1 *. To cast, either a C function can be written or SWIG’s cpointer.i can be used.
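As a sketch of the C-function option: a helper that checks the type tag and length before returning the derived view. The TLV types from test.h are repeated so the block compiles on its own. The name tlv_as_type1 and the assumption that tag 1 means tlv_type1_t (alloc_complex_struct() sets tlv.type = 1) are mine; unlike %pointer_cast, it refuses a mismatched cast by returning NULL. The cast itself is valid because base is the first member of tlv_type1_t. Note that embedding a struct with a flexible array member inside another struct, as test.h does, is a GCC extension.

```c
#include <assert.h>
#include <stddef.h>

/* Copies of the TLV types from test.h. */
typedef struct tlv_base {
   int type;
   int length;
   unsigned char value[];
} tlv_base_t;

typedef struct tlv_type1 {
   tlv_base_t base;     /* must stay the first member for the cast below */
   int stat;
   int info;
   long something;
} tlv_type1_t;

#define TLV_TYPE1 1      /* assumed tag, matching alloc_complex_struct() */

/* Hypothetical checked downcast from the generic header to tlv_type1_t.
 * Returns NULL if the tag or the recorded length does not match. */
tlv_type1_t *
tlv_as_type1(tlv_base_t *tlv)
{
   if (tlv == NULL || tlv->type != TLV_TYPE1)
      return NULL;
   if (tlv->length < (int)sizeof(tlv_type1_t))
      return NULL;
   return (tlv_type1_t *)tlv;
}
```

If declared in test.h, it should be wrapped like any other function and could be used from Python the same way as cast_to_tlv_type1.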

Here is the full test_swig.i:

%module test_struct
%{
   #include "../test.h"
%}

/* Features such as %newobject must come before the declarations they apply to. */
%newobject alloc_complex_struct;

%include "test.h"

%extend complex_struct_t {
    simple_struct_t *get_array_element(int i) {
        return &$self->pointer_array[i];
    }
    ~complex_struct_t() {
        free_complex_struct($self);
    }
}

%include <cpointer.i>
%pointer_cast(tlv_base_t *,  tlv_type1_t *, cast_to_tlv_type1);

And the test code:

import test_struct as ts
cs = ts.alloc_complex_struct(5)
cs.string = 'Hello'
print 'Pointer array count ', cs.pointer_array_count
for i in range(cs.pointer_array_count):
   simple_struct = cs.get_array_element(i)
   simple_struct.int_var = i * 10
   simple_struct.long_var = i * 20
   simple_struct.float_var = i * 3.3

tlv = ts.cast_to_tlv_type1(cs.tlv)
print tlv.stat, tlv.info, tlv.something

Written by samueldotj

September 11th, 2015 at 9:51 pm

Posted in C,Programming

I2C 20×4 LCD module and Arduino


Recently I was working on a ZigBee project using XBee. Since the XBee occupied the Arduino UART port, I decided to use a character LCD for debug logs. I got the SainSmart 20×4 LCD module with an I2C interface. The connections are simple:

Arduino LCD
5v VCC

But I couldn't make it work with the sample code provided. After some googling I found that the sample code had the wrong I2C address. Even after using the correct I2C address (0x3f) it didn't work; I was getting only two horizontal black bars on the screen.

I confirmed that I2C was working by using I2C Explorer. Finally, after updating the I2C library and using the following code, it started working. I found this code in an Amazon review and modified it for my purpose.


Written by samueldotj

September 8th, 2013 at 4:12 pm

Posted in Uncategorized


Fix loop in a linked list


The following picture shows a cycle (loop) in a linked list with n nodes.

Loop in a singly linked list


In this picture the number of nodes in the list is n, and the last node points to the node at position x (which creates the cycle). From the picture it is obvious that to fix the loop we need to:

  1. Find the last element
  2. Change its next pointer to the end (NULL)

In the above steps, step 1 (finding the last element) is a little difficult if the total number of nodes in the list is unknown. In this post I will describe a method I discovered to find the last node. I drew a couple of pictures to explain the complexity of the problem. The following pictures show the same loop condition in a different form and explain why it is hard to find the last node.

The illustration also tells us the following:

  n = x + y

where
  n is the total number of nodes in the list,
  x is the number of nodes before the cycle starts, and
  y is the number of nodes forming the cycle.

If we know the values of x and y, then the equation can be solved, which gives the last node’s location.

Finding y

  1. Detect the cycle using Floyd’s cycle detection algorithm (two pointers moving at different speeds).
  2. When the cycle is detected:
    1. Store the current node as H.
    2. Move to the next node and increment y (y++).
    3. If the current node is H, then exit.
    4. Go to sub-step 2.
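The steps above can be sketched in C as follows. This is my own minimal version, assuming a bare node type with only a next pointer; node_t, floyd_meet and cycle_length are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked list node, assumed for this sketch. */
typedef struct node {
    struct node *next;
} node_t;

/* Step 1: Floyd's tortoise and hare. Returns the node where the two
 * pointers meet, or NULL if the list ends (no cycle). */
node_t *floyd_meet(node_t *head)
{
    node_t *slow = head, *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return slow;
    }
    return NULL;
}

/* Step 2: remember the meeting node H and walk once around the cycle,
 * counting nodes. Returns y, or 0 if there is no cycle. */
size_t cycle_length(node_t *head)
{
    node_t *h = floyd_meet(head);
    node_t *cur;
    size_t y = 0;

    if (h == NULL)
        return 0;
    cur = h;
    do {
        cur = cur->next;
        y++;
    } while (cur != h);
    return y;
}
```

For a 7-node list whose last node points back to node 3 (x = 3), cycle_length() returns 4.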

Finding x
Let’s first see the easiest way to solve this:

  1. Have two pointers (p1, p2).
  2. p1 = node 0
  3. p2 = p1 + y nodes
  4. If p1 == p2, then we have found the result: x is the position of p1.
  5. Advance p1 and p2 by one node each.
  6. Go to step 4.

The cost of this algorithm depends linearly on x. For example, if x is 10K and y is 2, then the number of node visits is more than 20K. In other words, the complexity is O(2x + y).
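A sketch of the "Finding x" procedure, again assuming the minimal node_t (all names hypothetical). Both pointers advance one node at a time after p2 gets a head start of y nodes, which is what gives the O(2x + y) visit count. Once x and y are known, the last node is the one at position x + y - 1; fix_loop() finds it as the cycle node whose next pointer is the cycle start and sets that pointer to NULL, which completes the two steps from the beginning of the post.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
} node_t;

/* Start p2 y nodes ahead of p1, then advance both one node at a time.
 * They first meet at the node where the cycle starts, after x steps.
 * y must be the cycle length (y >= 1). Optionally reports x via x_out. */
node_t *cycle_start(node_t *head, size_t y, size_t *x_out)
{
    node_t *p1 = head, *p2 = head;
    size_t x = 0;

    assert(y >= 1);
    for (size_t i = 0; i < y; i++)
        p2 = p2->next;
    while (p1 != p2) {
        p1 = p1->next;
        p2 = p2->next;
        x++;
    }
    if (x_out != NULL)
        *x_out = x;
    return p1;
}

/* Fix the loop: the last node (position x + y - 1) is the cycle node whose
 * next pointer is the cycle start; terminate the list there. */
void fix_loop(node_t *head, size_t y)
{
    node_t *start = cycle_start(head, y, NULL);
    node_t *last = start;

    while (last->next != start)     /* y - 1 steps around the cycle */
        last = last->next;
    last->next = NULL;
}
```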

I believe it can be done in O(n), i.e. O(x + y), with the following method. The following diagram captures the state when the “tortoise and the hare” algorithm detects the cycle.

Let's define a few more variables at the time of cycle detection:
T is the total number of nodes the hare traveled.
H is the total number of nodes the tortoise traveled.
r is the number of nodes before the ‘last node’.

$$
\begin{aligned}
T &= 2H \\
x + cy - r &= T \\
x + y - r &= H
\end{aligned}
$$
where c is the number of complete cycles the hare covered.
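Also T > H and c > 1. Because the tortoise is already inside the cycle when the two meet, and meets the hare before finishing its first lap of the cycle, $x \le H < x + y$, so from $x + y - r = H$:
$$
T > H, \qquad c > 1, \qquad 1 \le r \le y
$$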
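From these we can derive c: subtracting the third equation from the second gives $T - H = (c - 1)y$, and since $T = 2H$ this means $H = (c - 1)y$, so
$$
c = \frac{H}{y} + 1
$$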

Using the value of c, r can be derived:
$$
\begin{aligned}
& x + y - r = H \;\Rightarrow\; r = x + y - H \\
& x + cy - r = T \;\Rightarrow\; x = T - cy + r \\
& r = T - cy + r + y - H \\
& r = H - cy + r + y \quad (\text{using } T = 2H)
\end{aligned}
$$

Using this formula, r can be found. With this value of r, the last node can be found by traveling r - 1 nodes from the node where T and H met.
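To check these relations on concrete lists, the sketch below records T and H while running Floyd's algorithm. Following the equations above (T = 2H), T counts the hare's nodes and H the tortoise's. It takes x and y as already known from the earlier steps and only verifies that T = 2H, that c = H/y + 1 satisfies x + cy - r = T, and that walking r - 1 nodes from the meeting node, with r = x + y - H, lands on the last node; it does not compute r on its own. The names meeting_t, floyd_counts and advance are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
} node_t;

/* State when Floyd's algorithm detects the cycle. */
typedef struct {
    node_t *meet;   /* node where the two pointers met (NULL: no cycle) */
    size_t T;       /* nodes traveled by the hare */
    size_t H;       /* nodes traveled by the tortoise */
} meeting_t;

meeting_t floyd_counts(node_t *head)
{
    meeting_t m = { NULL, 0, 0 };
    node_t *slow = head, *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        m.H += 1;
        m.T += 2;
        if (slow == fast) {
            m.meet = slow;
            break;
        }
    }
    return m;
}

/* Walk k nodes forward from p. */
node_t *advance(node_t *p, size_t k)
{
    while (k-- > 0)
        p = p->next;
    return p;
}
```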

Written by samueldotj

August 29th, 2013 at 3:35 pm

Posted in Uncategorized

LLDB Backtrace formatting


lldb can be configured to print backtraces with syntax highlighting. Here is how to set up lldb to do that.

Consider the following source-level debugging session:

$ cat test.c
static void crash_me()
{
    char *c = 0;
    *c = 0;
}

static void recursive_call(int value)
{
    if (value == 0) {
        crash_me();
        return;
    }
    recursive_call(value - 1);
}

int main(int argc, char *argv[])
{
    recursive_call(argc);
    return 0;
}
$ gcc -g3 -O3 test.c
$ lldb a.out
(lldb) run 0 1 2 3 4 5

Without color, the backtrace looks like the following.

Normal backtrace without any makeup


Since lldb supports ANSI escape sequences, they can be used to color the backtrace output, which makes it more readable. Here is the link to the official lldb page describing this feature:

Here are my backtrace settings and an example:

(lldb) settings set frame-format "frame #${frame.index}: ${frame.pc}{ \x1b\x5b36m${module.file.basename}\x1b\x5b39m{` \x1b\x5b33m${} \x1b\x5b39m${function.pc-offset}}}{ at ${line.file.basename}:${line.number}}\n"

Backtrace with color


Similarly, the thread format can be colorized so that ‘thread list’ looks neat.

(lldb) settings set thread-format "\x1b\x5b42;1mthread #${thread.index}: tid = ${}{, ${frame.pc}}{ \x1b\x5b31m${module.file.basename}\x1b\x5b39m{`${}${function.pc-offset}}}{ at ${line.file.basename}:${line.number}}{, name = '\x1b\x5b34m${}}\x1b\x5b39m{, queue = '${thread.queue}}{, stop reason = ${thread.stop-reason}}{\nReturn value: ${thread.return-value}}\x1b\x5b0m\n"

Reference for ANSI escape sequences:
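To see what those escape sequences are, here is a small C sketch that produces one backtrace-style line with the same colors as my frame-format: 36 (cyan) for the module, 33 (yellow) for the function, and 39 to restore the default foreground color. The helper format_frame and its output layout are only an approximation of what lldb prints. In lldb's format string \x5b is just '['; in C source "\x1b[" is written directly, because a C hex escape keeps consuming hex digits and "\x1b\x5b36m" would not mean ESC [ 36 m.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ESC [ <n> m sequences; lldb's "\x1b\x5b36m" is the same bytes as "\x1b[36m". */
#define ANSI_CYAN     "\x1b[36m"
#define ANSI_YELLOW   "\x1b[33m"
#define ANSI_DEFAULT  "\x1b[39m"   /* default foreground color */

/* Hypothetical helper: format "frame #N: module`function" with the module
 * in cyan and the function in yellow. Returns the snprintf() result. */
int format_frame(char *buf, size_t len, int index,
                 const char *module, const char *function)
{
    return snprintf(buf, len,
                    "frame #%d: " ANSI_CYAN "%s" ANSI_DEFAULT
                    "`" ANSI_YELLOW "%s" ANSI_DEFAULT,
                    index, module, function);
}
```

Printing the buffer in a terminal that understands ANSI colors shows the module and function names colored, just like the lldb output.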

Written by samueldotj

July 22nd, 2013 at 9:20 pm

Posted in C,Debugger,lldb

Self Balancing Tree as Heap


Here are my thoughts on how to combine a heap and an AVL tree and get the benefits of both from a single data structure.

A self-balancing binary search tree such as an AVL tree can look up an item in O(log n). Heaps are mainly used to implement priority queues, which need to find the min/max elements quickly; a heap does this in O(1) time.

Let's revisit a basic property of a binary search tree: the min element is at the leftmost position, and similarly the max element is at the rightmost position. So finding the min/max element in a BST is O(h), where h is the depth of the tree. If the BST is balanced, O(h) is O(log n) in the worst case (otherwise the worst case is O(n); since we are considering only balanced trees here, let's ignore the unbalanced case). The following diagram illustrates this basic property of the BST: the min element is always the leftmost node.

Cached pointer to Min element at the root


Using this basic property, the remove-min operation can be optimized. The first and simplest optimization that comes to mind is to store pointers to the min/max elements. This caching gives O(1) time complexity for finding the min/max elements. However, it increases the cost of node insertion/deletion, because the min/max pointers have to be updated on every insert and delete. The cost of finding and deleting the min node is still O(log(n)), the same as without the cached pointers. The picture on the right shows the advantage of having a cached pointer to find the min element. Obviously, this method doesn't help priority queues, where find-min and delete are used together.

The problem with the above method is that finding the next smallest element from the min element is O(log n). So if every node had a pointer to the next smallest element, the find and delete operations would be O(1).
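A minimal sketch of that idea, under my own assumptions: an unbalanced BST (rebalancing is left out to keep it short) whose nodes also carry a parent pointer and a next pointer to the in-order successor, plus a cached min pointer. This is a simplified illustration, not the Ace OS structure described below, and all names are hypothetical. Because the minimum never has a left child, removing it needs only its parent and right child, and the new minimum is simply the old minimum's next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: a plain BST node that is also threaded into a
 * sorted list through `next`, its in-order successor. */
typedef struct node {
    int key;
    struct node *left, *right, *parent;
    struct node *next;
} node_t;

typedef struct {
    node_t *root;
    node_t *min;        /* cached pointer to the smallest node */
} tree_t;

/* O(h) insert. While descending, the last node where we went left is the
 * new node's successor and the last node where we went right is its
 * predecessor, so the sorted list is patched without another walk. */
void tree_insert(tree_t *t, node_t *n)
{
    node_t *parent = NULL, *cur = t->root;
    node_t *pred = NULL, *succ = NULL;

    while (cur != NULL) {
        parent = cur;
        if (n->key < cur->key) {
            succ = cur;
            cur = cur->left;
        } else {
            pred = cur;
            cur = cur->right;
        }
    }
    n->left = n->right = NULL;
    n->parent = parent;
    if (parent == NULL)
        t->root = n;
    else if (n->key < parent->key)
        parent->left = n;
    else
        parent->right = n;

    n->next = succ;
    if (pred != NULL)
        pred->next = n;
    else
        t->min = n;     /* nothing smaller: n is the new minimum */
}

/* O(1) find-min-and-delete: the minimum has no left child, so it is
 * replaced by its right subtree, and its successor becomes the new min. */
node_t *tree_pop_min(tree_t *t)
{
    node_t *m = t->min;

    if (m == NULL)
        return NULL;
    if (m->parent == NULL)
        t->root = m->right;
    else
        m->parent->left = m->right;
    if (m->right != NULL)
        m->right->parent = m->parent;
    t->min = m->next;
    return m;
}
```

In a balanced version, deleting the min could still trigger rebalancing, which this sketch ignores.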

Let's look at the BST from a slightly different angle. The usual declaration of a BST in C is as follows:

struct binary_tree
{
   struct binary_tree *left;
   struct binary_tree *right;
};

When we (Dilip Simha and I) were implementing ADTs for Ace OS, we decided to experiment with BSTs in a different way: we saw a tree as recursive lists rather than recursive pointers.

In the following picture you can see 6 lists (not counting sub-lists):
A list is highlighted

  1. (300, 200, 100, 50)
  2. (300, 400, 500, 600)
  3. (200, 250, 275)
  4. (100, 150)
  5. (400, 350)
  6. (500, 450)

Now consider each of these lists to be a doubly linked circular list, as illustrated in the following figure. You may argue that this turns the BST into a cyclic directed graph, but for the sake of simplicity let's continue to call it a balanced BST. I left out a few arrows in the picture to keep it clean.

Binary Search Tree with nodes having pointer to parent also


typedef struct list LIST, * LIST_PTR;
struct list {
   LIST_PTR next;
   LIST_PTR prev;
};

typedef struct binary_tree
{
   LIST left;
   LIST right;
} BINARY_TREE, * BINARY_TREE_PTR;

typedef struct avl_tree
{
   int height;    /*! height of the node*/
   BINARY_TREE bintree; /*! low level binary tree*/
} AVL_TREE, * AVL_TREE_PTR;

The AVL tree in Ace OS is implemented this way; the data structure declarations are shown above. Initially we did it to reuse code, but after implementing it we found some interesting properties. This balanced tree/graph can find any node in O(log(n)) and also offers a findmin operation with O(1) complexity. It also reduces the complexity of the delete operation (since we can find the right node's leftmost child in O(1)). However, a delete might still require rebalancing the tree.
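Here is a small sketch of why findmin can be O(1) in this layout, under my reading of the figures: if the left spine of the root (300, 200, 100, 50) is kept as a circular doubly linked list through the embedded left LIST, the entry just before the root in that list is the leftmost node. The NODE type with a key field, the NODE_FROM_LEFT macro and the helper names are hypothetical, not Ace OS code; insertion and rebalancing are omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list LIST, * LIST_PTR;
struct list {
   LIST_PTR next;
   LIST_PTR prev;
};

/* Hypothetical node for illustration: a key plus the two embedded lists. */
typedef struct node {
   int key;
   LIST left;    /* links this node into its left-spine circular list */
   LIST right;   /* links this node into its right-spine circular list */
} NODE;

/* Recover the NODE that embeds a given `left` LIST (container-of idiom). */
#define NODE_FROM_LEFT(lp) ((NODE *)((char *)(lp) - offsetof(NODE, left)))

static void list_init(LIST_PTR head)
{
   head->next = head->prev = head;
}

/* Append `item` at the tail of the circular list that starts at `head`. */
static void list_add_tail(LIST_PTR head, LIST_PTR item)
{
   item->prev = head->prev;
   item->next = head;
   head->prev->next = item;
   head->prev = item;
}

/* The left spine of `root` is root, its left child, that node's left child,
 * and so on. Because the list is circular, the entry just before root is the
 * last node on the spine, i.e. the minimum of root's subtree. */
static NODE *find_min(NODE *root)
{
   return NODE_FROM_LEFT(root->left.prev);
}
```

A node with no left child has an empty list that points back to itself, so find_min() returns the node itself, which is the correct minimum.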

Written by samueldotj

May 11th, 2013 at 11:59 am

Posted in Ace

Internals of GNU Code Coverage – gcov


A few years ago I worked on a small project to extract the code coverage information created by gcc from a FreeBSD-based kernel. At that time I didn't find any good documentation about gcov internals, so here I post what I learned. Before jumping into the internals of gcov, here is an example from the man page.

$ gcov -b tmp.c
87.50% of 8 source lines executed in file tmp.c
80.00% of 5 branches executed in file tmp.c
80.00% of 5 branches taken  at  least  once  in  file tmp.c
50.00% of 2 calls executed in file tmp.c
Creating tmp.c.gcov.
Here is a sample of a resulting tmp.c.gcov file:

    1        int i, total;
    1        total = 0;
   11        for (i = 0; i < 10; i++)
branch 0 taken = 91%
branch 1 taken = 100%
branch 2 taken = 100%
   10        total += i;
   1         if (total != 45)
branch 0 taken = 100%
  ######          printf ("Failure\n");
call 0 never executed
branch 1 never executed
    1             printf ("Success\n");
call 0 returns = 100%
    1    }

Note – gcov has a cool graphical front-end in Linux – lcov.
As shown above, gcov can show which code paths were executed and how many times.
Want to try? Here is a quick example.

$ gcc -fprofile-arcs -ftest-coverage your_program.c
$ ./a.out
$ gcov your_program.c

During compilation with the -ftest-coverage option, gcc generates a “.gcno” file, which contains information about each branch in your code. When ./a.out finishes execution, it creates .gcda file(s), which record the branches actually taken (basic block entry/exit counts). Using these three files, .c (source), .gcno (block info) and .gcda (block execution counts), the gcov command prints the code coverage information in a human-readable format.

You might wonder how your ./a.out creates the .gcda files while exiting. It is because “-fprofile-arcs” automatically links in libgcov. Libgcov registers itself to be invoked during program exit by using atexit() (yes, it won't generate .gcda files if you exit abnormally), and at exit it just dumps all the branch information to one or more .gcda files.

The coverage information is “just” dumped into files by libgcov. So who collects the coverage information at run time? The program itself. In fact, only the program can collect it, because only it knows which code paths it takes. The coverage information is collected at run time, on the fly, by keeping a counter for each branch. For example, consider the following program.

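#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int if_counter = 0, else_counter = 0;

void dump_counters(void)    /* libgcov's job in the real thing */
{
	int fd = open("test.gcda", O_WRONLY | O_CREAT | O_TRUNC, 0644); /* hypothetical name */
	write(fd, &if_counter, sizeof(if_counter));
	write(fd, &else_counter, sizeof(else_counter));
	close(fd);
}

int main(int argc, char *argv[])
{
	atexit(dump_counters);     /* libgcov registers itself like this */
	if(argc > 1) {
		if_counter++;          /* inserted by gcc */
		printf("Arguments provided\n");
	} else {
		else_counter++;        /* inserted by gcc */
		printf("No arguments\n");
	}
	return 0;
}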

With gcov, the dump_counters() part (run at exit) is provided by libgcov at link/load time, while the counter increments are inserted into your executable by gcc at compile time.

It is easy to guess how gcc implants the increment operation inside your code: it just inserts an “inc x-counter” style instruction sequence on every branch. It should be noted that these inserted instructions might have side effects on programs that use asm inside C. For example, on x86 the add instruction gcc uses for the counter update (see the disassembly below) modifies the flags register, including the carry flag. Some assembly code might depend on those flags, and if gcc inserts a counter increment in between, the result will be wrong. I had a hard time figuring this out: when compiled with -fprofile-arcs, a kernel booted but could not receive any network packets (it discarded all packets because the network stack found the checksum was wrong).

Here is a simple C program’s disassembly:

int main()
  4004b4:       55                      push   %rbp
  4004b5:       48 89 e5                mov    %rsp,%rbp
    int a = 1;
  4004b8:       c7 45 fc 01 00 00 00    movl   $0x1,-0x4(%rbp)

    if (a) {
  4004bf:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  4004c3:       74 06                   je     4004cb <main+0x17>
  4004c5:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004c9:       eb 04                   jmp    4004cf <main+0x1b>
    } else {
  4004cb:       83 6d fc 01             subl   $0x1,-0x4(%rbp)

    return a;
  4004cf:       8b 45 fc                mov    -0x4(%rbp),%eax
/* test.c */
int main()
{
    int a = 1;

    if (a) {
        a++;
    } else {
        a--;
    }

    return a;
}
gcc -g3 test.c
objdump -S -d ./a.out

When the same program is compiled with -fprofile-arcs, the disassembly looks like this:

int main()
  400c34:       55                      push   %rbp
  400c35:       48 89 e5                mov    %rsp,%rbp
  400c38:       48 83 ec 10             sub    $0x10,%rsp
    int a = 1;
  400c3c:       c7 45 fc 01 00 00 00    movl   $0x1,-0x4(%rbp)

    if (a) {
  400c43:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  400c47:       74 18                   je     400c61 
  400c49:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  400c4d:       48 8b 05 3c 25 20 00    mov    0x20253c(%rip),%rax        # 603190 
  400c54:       48 83 c0 01             add    $0x1,%rax
  400c58:       48 89 05 31 25 20 00    mov    %rax,0x202531(%rip)        # 603190 
  400c5f:       eb 16                   jmp    400c77 
    } else {
  400c61:       83 6d fc 01             subl   $0x1,-0x4(%rbp)
  400c65:       48 8b 05 2c 25 20 00    mov    0x20252c(%rip),%rax        # 603198 
  400c6c:       48 83 c0 01             add    $0x1,%rax
  400c70:       48 89 05 21 25 20 00    mov    %rax,0x202521(%rip)        # 603198 

    return a;
  400c77:       8b 45 fc                mov    -0x4(%rbp),%eax
  400c7a:       c9                      leaveq
  400c7b:       c3                      retq

From the above disassembly it might seem that inserting the increment instructions while compiling is easy. But how and where is the storage for the counters (dtor_idx.6460 and dtor_idx.6460 in the above example) created? GCC uses statically allocated memory. Dynamically allocating the space is one option, but it would complicate the code (memory allocation during init) and might slow down the execution of the program (an extra pointer dereference). To avoid that, gcc allocates the storage as a loadable section.

The compiler keeps track of all the counters of a file in a single data structure, outlined in the picture below.
There is a single gcov_info structure per C file, plus multiple gcov_fn_info and gcov_ctr_info structures. During program exit() these structures are dumped into the .gcda file. In a project with multiple C files, each C file has its own gcov_info structure. These gcov_info structures have to be linked together so that during exit() the program can generate .gcda files for all the C files. This is done using constructors and destructors.

Generic C constructor:
gcc supports constructors in any program. C constructors are implemented using the “.ctors” section of the ELF file. This section contains an array of function pointers, which is iterated during program start, invoking each function through _init() -> __do_global_ctors_aux(). _init() is placed in the “.init” section, so it is called during program initialization. A function can be declared as a constructor by using the function attribute __attribute__((constructor)).

“-fprofile-arcs” creates a constructor per file. This constructor calls __gcov_init() and passes the file's gcov_info as the argument.

samuel@ubuntu:~$objdump  -t ./a.out  | grep -i _GLOBAL__
0000000000400c7c l     F .text  0000000000000010              _GLOBAL__sub_I_65535_0_main

And disassembly of _GLOBAL__sub_I_65535_0_main

0000000000400c7c <_GLOBAL__sub_I_65535_0_main>:
  400c7c:       55                      push   %rbp
  400c7d:       48 89 e5                mov    %rsp,%rbp
  400c80:       bf 00 31 60 00          mov    $0x603100,%edi
  400c85:       e8 a6 12 00 00          callq  401f30 <__gcov_init>
  400c8a:       5d                      pop    %rbp
  400c8b:       c3                      retq
  400c8c:       90                      nop
  400c8d:       90                      nop
  400c8e:       90                      nop
  400c8f:       90                      nop

__gcov_init(), implemented in libgcov, stores every gcov_info passed to it in a linked list. This linked list is used to walk through all the gcov_info structures during program termination.
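The registration scheme can be imitated in plain C. The sketch below is a heavily simplified, hypothetical stand-in: struct cov_info plays the role of gcov_info, cov_init() the role of __gcov_init(), and two hand-written constructors (using GCC's constructor attribute) play the role of the per-file constructors gcc emits. The real libgcov structures have more fields and differ between gcc versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for gcov_info (one per C file). */
struct cov_info {
    const char *filename;        /* where this file's counters would go */
    unsigned long *counters;     /* statically allocated arc counters */
    size_t n_counters;
    struct cov_info *next;       /* link used by the runtime's list */
};

/* Head of the list of registered files. */
static struct cov_info *cov_list;

/* Plays the role of __gcov_init(): remember this file's info. */
void cov_init(struct cov_info *info)
{
    info->next = cov_list;
    cov_list = info;
}

/* Plays the role of the exit-time dump: walk every registered file. */
void cov_dump(void)
{
    for (struct cov_info *i = cov_list; i != NULL; i = i->next)
        for (size_t c = 0; c < i->n_counters; c++)
            printf("%s counter[%zu] = %lu\n", i->filename, c, i->counters[c]);
}

/* What gcc emits for each instrumented file, written by hand for two
 * pretend files: static counters, a static info record, and a
 * constructor that registers the record before main() runs. */
static unsigned long file_a_counters[2];
static struct cov_info file_a_info = { "a.gcda", file_a_counters, 2, NULL };

__attribute__((constructor)) static void file_a_ctor(void)
{
    cov_init(&file_a_info);
}

static unsigned long file_b_counters[1];
static struct cov_info file_b_info = { "b.gcda", file_b_counters, 1, NULL };

__attribute__((constructor)) static void file_b_ctor(void)
{
    cov_init(&file_b_info);
}
```

By the time main() runs, both records are on cov_list; a real runtime would then arrange for cov_dump() to run at exit.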

Written by samueldotj

March 31st, 2012 at 4:21 pm

Posted in C,gcc,Tools

Plot your data using gnuplot


System statistics are hard to interpret, since they are usually collected in large quantities and sometimes represent large numbers. Recently I was doing a prototype and wanted to measure how much damage it would do to the main project (in terms of performance), so I used the performance counter feature of the processor to measure some events (cache misses, memory reads, etc.) with and without my code change. But after looking at the numbers I realized it is difficult to analyze such data: each number had 8 digits, and I had 16 columns (16 CPUs) and 100 rows of data (100 seconds of run). So I decided to use a graph to make the data easier to understand.

I googled for a GNU graphing tool and found gnuplot. This post shows how good it is and how easy it is to use; consider using it if you have a lot of numbers. For this post I took some sample data from my Ubuntu machine while running the stress command.

cat stat.txt
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 251124 1827388   4720  32704    6   34    19    38   42   74  1  0 98  1
 0  0 251124 1827388   4720  32708    0    0     0     0  104   71  0  0 100  0
13  3 251108 1349912   4728 322540    4    0     4    20  683 1789 42 12 47  0
11  3 251008 1382620   4728 322520  180    0   180     0 1604 1233 89 12  0  0
11  3 251008 1432052   4728 322520    0    0     0     0 1361 1237 90 10  0  0
11  3 251008 1298352   4728 322668    0    0     0     0 1392 1275 90 10  0  0
 2  3 251008 1512576   4728 323524    0    0     0     0 20077 14827 59 16 13 12
 0  0 251008 1826388   4728  32756    0    0     0     0 45069 25566  0  4 25 71
 0  0 251008 1826444   4728  32708    0    0     0     0   59   46  0  0 100  0
# stat.txt was recorded with vmstat (one sample per second, 200 samples) while stress was running:
stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --hdd 2  --timeout 200s
sudo vmstat -n 1 200 > stat.txt

The following example shows how to create a line graph of the us and sy columns above against time (seconds).

This graph might not be impressive, because it deals only with numbers ranging from 0-100 and the numbers are very steady. Consider numbers in the range 0-99999999 that fluctuate a lot; that is where a graph really helps. The above graph was created by running “gnuplot” with the following commands:

set title 'CPU usage'
#set terminal svg butt enhanced dynamic
set terminal jpeg
set output 'output.jpg'
set xlabel 'seconds'
#set logscale y
set ylabel 'cpu'
set key below
plot \
    "stat.txt" using :13 title 'Application' with lines lc rgb 'blue', \
    "stat.txt" using :14 title 'Kernel' with lines lc rgb 'green'

You can also intermix two or more data files. The following example shows how to graph two different samples collected during different time periods.

set title 'CPU usage'
#set terminal svg butt enhanced dynamic
set terminal jpeg
set output 'output.jpg'
set xlabel 'seconds'
#set logscale y
set ylabel 'cpu'
set key below
plot \
    "stat.txt" using :13 title 'Application' with lines lc rgb 'light-green', \
    "stat.txt" using :14 title 'Kernel' with lines lc rgb 'light-red', \
    "stat1.txt" using :13 title 'Application1' with lines lc rgb 'dark-green', \
    "stat1.txt" using :14 title 'Kernel1' with lines lc rgb 'dark-red'

The stat1.txt file was generated by running vmstat while the system was stressed by the following command:
stress --cpu 4 --io 2 --vm 4 --vm-bytes 1M --hdd 2 --hdd-bytes 4096 --timeout 200s


The nice thing about gnuplot is that it skips a row (line) in the data file if it can't recognize the columns. It also supports SVG and PDF output. See what gnuplot can do at the official demo page.

Written by samueldotj

February 27th, 2012 at 8:27 pm

Posted in C,Programming,Tools





This post describes the steps needed to make NGX’s USB ARM JTAG work with OpenOCD on Windows 7. This JTAG is compatible with the colink JTAG and works with IAR Workbench and Keil uVision. For these IDEs, well-defined methods and plug-ins are available on the product page and on the internet. However, resources on using this JTAG with OpenOCD are scarce.

OpenOCD can be used for low-level debugging, source-level debugging (through GDB) and flashing. OpenOCD exposes a command line interface which can be accessed through telnet. It also provides a remote GDB server, which can be reached through a TCP connection.

Steps needed for Windows:

  1. Plug the JTAG into an available USB connector
  2. Download libusb-win32
  3. Extract libusb-win32 to a folder and run “inf-wizard.exe”
  4. Select “USB Serial Converter A” and install the driver
  5. Download and install OpenOCD
  6. Attach the JTAG probe to your target ARM board and power on the target board
  7. Create an OpenOCD configuration file (see the end of this post)
  8. Run openocd.exe -f with the configuration file
  9. Run PuTTY or telnet and connect to localhost:4444

After this the target board will respond to JTAG commands which can be issued through the telnet session.

For GDB debugging, you need a cross compiled GDB(arm-none-eabi-gdb).
After launching arm-none-eabi-gdb.exe run target remote localhost:3333 to start remote debugging.
You can execute low-level JTAG commands from GDB by using the monitor command.

Flashing can be done using the following commands:

sleep 200
flash probe 0
flash info 0
flash write_image erase unlock
sleep 200
reset run

OpenOCD configuration file:

# openocd configurations
telnet_port 4444

# gdb configuration
gdb_port 3333

# cpu configuration
source [find target/lpc1768.cfg]

# interface configuration
interface ft2232
ft2232_vid_pid 0x0403 0x6010
ft2232_device_desc "NGX JTAG"
ft2232_layout "oocdlink"
ft2232_latency 2

Written by samueldotj

May 22nd, 2011 at 6:04 am

Posted in C,Compiler,Debugger

DIY – Wireless Router and NAS: Software Pieces


This is the follow-up post to DIY – RCN. Here I document the different software used to build my RCN.

Operating System

There are two open source choices: BSD (FreeBSD) or Linux (Ubuntu). After a few days of analysis I decided to go with Linux, because I use FreeBSD at work. In either case I did not want to use FreeNAS, OpenFiler or any other ready-made distro. Since I am familiar with Ubuntu, I decided to use the Ubuntu server version.

File System

I wanted to use ZFS on my main storage disk, but it is not available on Linux yet, so I decided to go with XFS. I used EXT3/4 on the boot disk because it is natively supported and needs no extra package. The boot media is an 8GB flash disk.


Since there is no optical disk drive, installation has to be through the network or USB. Since most Linux distributions support that, I decided to use USB.

  1. Download Ubuntu 10.10 server
  2. Download Universal USB installer
  3. Create bootable install media using the installer
  4. Boot the system with boot media

Partitioning

Although no user data will be stored on the boot media, it is good to have separate partitions for the config files and the home directory; otherwise a reinstallation would wipe out all that data.

I chose to create 5 partitions:

/     - EXT4 - 2GB
/usr  - EXT4 - 2GB
/var  - EXT4 - 2GB
/home - EXT4 - 1GB
swap  -      - 1GB

Remote Administration

Since this device runs headless, the only way to communicate with it is through the network. SSH access is good, but a web interface for common administration tasks is better. A few Linux applications provide this; my choice is Webmin.

sudo vi /etc/apt/sources.list
sudo apt-key add jcameron-key.asc
sudo apt-get update
sudo apt-get install webmin

After this, the machine can be administered from the local network at https://hostname:10000/

Power Button

Shutting down the system should be easy. Since storage is attached, the system can't simply be powered off: file system data has to be synced first, and logging in through the command line or the web interface just to shut down is not realistic. So handling the ATX power switch is the only practical way, and acpid does that.

sudo apt-get install acpid
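acpid matches incoming ACPI events against rule files in /etc/acpi/events and runs the action of every rule that matches. Installing the package is normally enough, because it comes with a default power-button rule; the sketch below only illustrates what a minimal rule looks like. The helper name write_powerbtn_rule and the rule file name powerbtn-shutdown are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write an acpid rule that performs a clean shutdown
# (file systems synced and unmounted) when the ATX power button is pressed.
write_powerbtn_rule() {
    dir="${1:-/etc/acpi/events}"   # acpid rule directory
    # event= is a regular expression matched against the event string
    # (both "button/power ..." and "button power ..." forms match);
    # action= is the command acpid runs when the event matches.
    cat > "$dir/powerbtn-shutdown" <<'EOF'
event=button[ /]power
action=/sbin/shutdown -h now
EOF
}
```

If you add such a rule, check /etc/acpi/events for an existing power-button rule first so the button is not handled twice, then restart acpid so it rereads its rules.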

Storage

The goal was to create file-based storage accessible from my home network. The NAS should be big enough for at least the next 2 years (1TB). It should be fast enough to play videos from it without stuttering (64MB on-disk buffer). It should have hardware fault tolerance (RAID).

Although a few of my desktop boards had a RAID option in the BIOS menu, I had never used or explored it. I thought a RAID chipset on a motherboard was equivalent to a RAID controller/adapter. It was one of the deciding factors in favour of the Gigabyte (GA-D425TUD) motherboard with a JMicron RAID chipset over the Intel (D525MO) motherboard.

After configuring RAID in the BIOS and starting Linux, I realized it is not true hardware RAID: Linux recognized it as fakeraid. In simple terms, fakeraid is firmware-based RAID; all the work is still done in software by the host CPU, so there is no performance benefit. The advantage of fakeraid is that multiple operating systems on the same box can use the same RAID array. Since my setup won't be multiboot, I didn't want fakeraid and decided to go with pure software RAID 1 (mirroring). Here are the steps to create a software RAID 1 array.

  1. Create a software RAID 1 array using the multiple devices (md) interface:
         mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  2. The initial sync takes some time (around 6 hours) because the contents of both disks have to be synchronized. It runs in the background, and its progress can be checked with:
         cat /proc/mdstat
  3. Create an XFS file system on the md device:
         mkfs.xfs /dev/md0
  4. Store the configuration:
         mdadm --detail --scan > /etc/mdadm/mdadm.conf
  5. Create the mount point and add the mount information to /etc/fstab:
         mkdir /mnt/raid
         echo "/dev/md0        /mnt/raid       xfs     defaults            1       2" >> /etc/fstab
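While the initial sync is running, the interesting part of /proc/mdstat is its progress line. A minimal sketch of a helper that extracts just the percentage; the function name md_sync_progress is hypothetical, and it assumes the usual progress line format (for example "resync =  5.0% (...)" or "recovery = ...").

```shell
#!/bin/sh
# Hypothetical helper: print the resync/recovery percentage reported in
# /proc/mdstat (or in a copy of it passed as the first argument).
# Prints nothing when no sync is in progress.
md_sync_progress() {
    mdstat="${1:-/proc/mdstat}"
    sed -nE 's/.*(resync|recovery)[[:space:]]*=[[:space:]]*([0-9.]+%).*/\1 \2/p' "$mdstat"
}

# Example: poll once a minute until the sync finishes.
# while [ -n "$(md_sync_progress)" ]; do md_sync_progress; sleep 60; done
```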

Windows File Sharing

After this, /mnt/raid can be made accessible to remote machines through either NFS or Windows File Sharing. Windows File Sharing requires the Samba service; the following command installs the Samba server:
sudo apt-get install samba

After installing the Samba server, it can be configured using Webmin: go to “Servers” -> “Samba File sharing” and add the storage mount point there.
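Webmin stores Samba shares as sections in Samba's configuration file (/etc/samba/smb.conf on Ubuntu). For reference, a minimal share section for the RAID mount point looks roughly like the output of the sketch below; the helper name samba_share, the share name raid and the chosen options are assumptions, not necessarily what Webmin writes.

```shell
#!/bin/sh
# Hypothetical helper: print an smb.conf share section for a directory.
# The options are illustrative; adjust access control for your network.
samba_share() {
    name="$1"   # share name as seen from Windows, e.g. raid
    path="$2"   # directory to export, e.g. /mnt/raid
    cat <<EOF
[$name]
   path = $path
   browseable = yes
   read only = no
EOF
}

# Example (as root): samba_share raid /mnt/raid >> /etc/samba/smb.conf
```

testparm, which ships with Samba, can then check the resulting smb.conf for syntax errors.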

Routing

The routing functionality is simple: route between all 3 interfaces, with some limitations.

  • The first interface, eth0, is a Gigabit Ethernet interface directly connected to a desktop computer.
  • The second interface, eth1, is a Fast Ethernet interface connected to the internet (through an ADSL modem).
  • The third interface, wlan0, is an 802.11n wireless network.

Network and IP

All interfaces are on different networks. Each interface gets a static IPv4 address at boot, and the router provides dynamic IP addresses (DHCP) to the other machines.

Modify the network interface and DHCP configurations.

/etc/network/interfaces:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
post-up iptables-restore < /etc/iptables.up.rules
up /etc/init.d/dhcp3-server start

#wireless network
auto wlan0
iface wlan0 inet static
up /etc/init.d/dhcp3-server start

#wan interface
auto eth1
iface eth1 inet static

/etc/dhcp3/dhcpd.conf:

subnet netmask {
    option domain-name-servers,;
    option routers;
    option broadcast-address;
    default-lease-time 600;
    max-lease-time 7200;
}

subnet netmask {
    option domain-name-servers,;
    option routers;
    option broadcast-address;
    default-lease-time 600;
    max-lease-time 7200;
}
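Each subnet declaration needs the network number, netmask and addresses of its interface. As an illustration only, the sketch below prints a complete ISC dhcpd subnet declaration, including a range statement, which defines the pool of addresses dhcpd hands out dynamically. The function name dhcp_subnet and all example addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an ISC dhcpd subnet declaration.
# Arguments: network netmask router broadcast pool_start pool_end dns
dhcp_subnet() {
    cat <<EOF
subnet $1 netmask $2 {
    range $5 $6;
    option domain-name-servers $7;
    option routers $3;
    option broadcast-address $4;
    default-lease-time 600;
    max-lease-time 7200;
}
EOF
}

# Example with hypothetical values for the eth0 network:
# dhcp_subnet 192.168.1.0 255.255.255.0 192.168.1.1 192.168.1.255 \
#     192.168.1.100 192.168.1.200 8.8.8.8
```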

Finally, enable IP forwarding in the Linux kernel by setting a sysctl tunable:
echo 1 > /proc/sys/net/ipv4/ip_forward

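To make this setting persistent across reboots, set net.ipv4.ip_forward=1 in /etc/sysctl.conf (Ubuntu ships this line commented out); running sysctl -p applies the file without a reboot.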
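If the router setup is scripted, a small idempotent helper can set net.ipv4.ip_forward=1 in /etc/sysctl.conf without piling up duplicate lines on every run. A minimal sketch; the function name enable_ip_forward_conf is hypothetical, and it relies only on grep and sed (with -E for extended regular expressions).

```shell
#!/bin/sh
# Hypothetical helper: make sure a sysctl.conf-style file enables IPv4
# forwarding. An existing active setting is rewritten to 1; otherwise the
# setting is appended. Commented-out lines are left untouched.
enable_ip_forward_conf() {
    conf="${1:-/etc/sysctl.conf}"
    pattern='^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*='
    if grep -Eq "$pattern" "$conf"; then
        # rewrite through a temporary copy; "cat >" keeps the file's permissions
        sed -E "s/${pattern}.*/net.ipv4.ip_forward=1/" "$conf" > "$conf.tmp" &&
            cat "$conf.tmp" > "$conf" && rm -f "$conf.tmp"
    else
        printf '%s\n' 'net.ipv4.ip_forward=1' >> "$conf"
    fi
}

# Example (as root): enable_ip_forward_conf && sysctl -p
```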

NAT - Network Address Translation

NAT is required on eth1 to translate the addresses of outgoing packets and of the incoming replies. This needs the iptables rules to be set correctly; the following script does that.


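#!/bin/sh
# interfaces as described above: eth1 = WAN, eth0 = desktop LAN, wlan0 = wireless
EXTIF=eth1
INTIF=eth0
INTIF1=wlan0

#set default policies and flush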
iptables -P INPUT ACCEPT
iptables -F INPUT
iptables -F OUTPUT
iptables -P FORWARD DROP
iptables -F FORWARD
#setup NAT
iptables -t nat -F
iptables -t nat -A POSTROUTING -o $EXTIF -j MASQUERADE

iptables -A FORWARD -i $EXTIF -o $INTIF -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i $INTIF -o $EXTIF -j ACCEPT

iptables -A FORWARD -i $EXTIF -o $INTIF1 -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i $INTIF1 -o $EXTIF -j ACCEPT

iptables -A FORWARD -i $INTIF -o $INTIF1 -j ACCEPT
iptables -A FORWARD -i $INTIF1 -o $INTIF -j ACCEPT

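#unblock certain services (port 10000 is Webmin)
iptables -A INPUT -p tcp -m tcp --dport 10000 -j ACCEPT

# save the rules so the post-up line in /etc/network/interfaces restores them at boot
iptables-save > /etc/iptables.up.rules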

Now it is time to set up the wireless interface, assuming the wireless drivers are present in the kernel. The other tool required is hostapd, which implements IEEE 802.11 access point management.

hostapd configuration:

wpa_pairwise=TKIP CCMP
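hostapd needs a few more settings than the cipher line above. For orientation, the sketch below prints a minimal configuration for an 802.11n access point on wlan0 using WPA and WPA2 with a pre-shared key (wpa=3 enables both). The helper name hostapd_conf, the nl80211 driver, the default channel 6 and the example SSID are assumptions. WPA passphrases must be 8 to 63 characters long, so the helper rejects anything else.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal hostapd.conf for an 802.11n
# WPA/WPA2-PSK access point on wlan0 (nl80211 driver assumed).
hostapd_conf() {
    ssid="$1"
    passphrase="$2"
    channel="${3:-6}"
    # WPA-PSK passphrases must be 8..63 characters
    len=${#passphrase}
    if [ "$len" -lt 8 ] || [ "$len" -gt 63 ]; then
        echo "hostapd_conf: passphrase must be 8-63 characters" >&2
        return 1
    fi
    cat <<EOF
interface=wlan0
driver=nl80211
ssid=$ssid
hw_mode=g
channel=$channel
ieee80211n=1
wpa=3
wpa_passphrase=$passphrase
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP CCMP
rsn_pairwise=CCMP
EOF
}
```

For example, hostapd_conf MyHomeAP 'long secret passphrase' > /etc/hostapd/hostapd.conf writes the file, and hostapd -B /etc/hostapd/hostapd.conf starts the access point in the background (MyHomeAP and the passphrase are placeholders).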

Written by samueldotj

March 12th, 2011 at 8:43 am

DIY – Wireless Router and NAS: Hardware bits

I have replaced my Buffalo WRT-G54 wireless router with my own custom-built router. This post explains the details involved in that process. I call this device RCN (Router cum NAS).


I began my hardware hunt 45 days ago; it was difficult because of my requirements and the limited availability of parts in the Indian market.

Here is my initial list of requirements (strike-through indicates requirements I gave up because the items were too costly or not available in India):

  • Total board TDP should be less than 40 watts
  • Passively cooled
  • Mini-ITX form factor
  • Onboard RAID
  • DDR3 support
  • Dual core
  • Hardware should be available in India

Here are the final products I purchased:

Here are some pictures of my RCN:

An interesting fact about the WD10EARS is that it uses 4KB physical sectors instead of the conventional 512-byte sectors. Since this is a new technology, software support is still limited. These drives reportedly perform poorly if their partitions are not aligned to the 4KB sectors; since my setup needs only a single partition, I purchased this drive, and it performs well.
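Partition alignment can be checked from sysfs: /sys/block/<disk>/<partition>/start holds the partition's first sector in 512-byte units, so a partition aligned to the 4KB physical sectors starts on a multiple of 8. A minimal sketch; the function names are hypothetical, and SYSFS_ROOT exists only so the helper can be tried against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helpers: check that a partition starts on a 4KB boundary.
# sysfs reports partition start offsets in 512-byte sectors, and
# 4096 / 512 = 8, so aligned partitions start on a multiple of 8.
is_4k_aligned() {
    [ $(( $1 % 8 )) -eq 0 ]
}

check_partition_alignment() {
    disk="$1"   # e.g. sda
    part="$2"   # e.g. sda1
    start=$(cat "${SYSFS_ROOT:-/sys}/block/$disk/$part/start") || return 2
    if is_4k_aligned "$start"; then
        echo "$part: starts at sector $start (4KB aligned)"
    else
        echo "$part: starts at sector $start (NOT 4KB aligned)"
        return 1
    fi
}

# Example: check_partition_alignment sda sda1
```

Partitions created with the old DOS-compatible layout start at sector 63 and fail this check, while a start at sector 2048 (1MiB) passes.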

Written by samueldotj

February 14th, 2011 at 8:20 am

Posted in Router NAS Linux