Code Profiling

To use gprof, compile your code with the -pg option. For example, to compile ex32.f90 with Intel MKL:

user@host> ifort -pg ex32.f90 stokeslet2d.f90 -L$MKLROOT/lib/intel64/ \
-I$MKLROOT/mkl/include \
-Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a \
$MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a \
-Wl,--end-group -lpthread

Then run the program to generate the profiling information:

user@host> ./a.out 5
Total # of Particles=        2400
setting matrix   0.518922030925751      (sec)
Sovle linear ststem   15.6426219940186      (sec)
|b - Ax|=  4.422628714702772E-013
Compute internal velocity   3.97039604187012      (sec)

When the program exits normally, it writes the profiling data to gmon.out in the current directory.

user@host> ls
a.out   ex32.f90   ex32.o  gmon.out  gmres.f90~  particle.dat  res.dat        stokeslet2d_dist.c  stokeslet2d.f90   stokeslet2d.h    stokeslet2d.o
ex32.c  ex32.f90~  ex43.c  gmres.c   gmres.h     README.txt    stokeslet2d.c  stokeslet2d_dist.h  stokeslet2d.f90~  stokeslet2d.mod

To view the profiling results, run gprof on the executable; it reads gmon.out from the current directory by default:

user@host> gprof a.out
Flat profile:

Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
65.36     13.17    13.17                             LN12_M2_LOOPgas_1
 7.84     14.75     1.58 36240000     0.00     0.00  stokeslet2d_mp_term2_
 4.71     15.70     0.95        1     0.95     2.77  stokeslet2d_mp_slet2d_velocity_
 4.22     16.55     0.85                             mkl_blas_def_dgemm_copyan
 3.82     17.32     0.77                             log.A
 2.90     17.91     0.59 36240000     0.00     0.00  stokeslet2d_mp_term1_
 1.96     18.30     0.40        2     0.20     0.37  stokeslet2d_mp_slet2d_mkmatrix_

The flat profile shows that most of the time (about 65%) is spent in "LN12_M2_LOOPgas_1", which is a routine in the Intel math library.
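
When the flat profile is long, it can help to print only the routines above some share of the run time. The following is a minimal sketch, not part of gprof itself: hotspots is a hypothetical helper name, and the default threshold of 5 percent is an arbitrary choice. It assumes its input is only the flat profile, and that routine names contain no spaces (true for the Fortran symbols shown above).

```shell
# hotspots [MIN_PERCENT]
# Hypothetical helper (not a gprof feature): reads a gprof flat profile on
# standard input and prints "percent name" for each routine whose "% time"
# column is at or above MIN_PERCENT (default 5, an arbitrary choice).
hotspots() {
    awk -v min="${1:-5}" '
        # Data rows begin with a decimal percentage; header and legend
        # lines do not, so they are skipped. The routine name is taken
        # from the last field of the row.
        $1 ~ /^[0-9]+\.[0-9]+$/ && $1 + 0 >= min + 0 { print $1, $NF }
    '
}
```

One way to use it would be `gprof -b -p a.out gmon.out | hotspots 5`, where -p restricts gprof's output to the flat profile and -b suppresses the explanatory text. For the run above, this would list LN12_M2_LOOPgas_1 and stokeslet2d_mp_term2_.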

See here for details.

Last modified on May 15, 2015 11:30:24 AM