Changes between Initial Version and Version 1 of cypress/Programming/CodeProfiling


Timestamp:
05/15/15 11:30:24 (10 years ago)
Author:
cmaggio
= Code Profiling =
To use [[http://en.wikipedia.org/wiki/Gprof|gprof]], compile your code with the ''-pg'' option. For example, to compile '''ex32.f90''' with Intel MKL:
{{{
user@host> ifort -pg ex32.f90 stokeslet2d.f90 -L$MKLROOT/lib/intel64/ \
-I$MKLROOT/mkl/include \
-Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a \
$MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a \
-Wl,--end-group -lpthread
}}}
Then run the executable to generate the profiling information:
{{{
user@host> ./a.out 5
Total # of Particles=        2400
setting matrix   0.518922030925751      (sec)
Sovle linear ststem   15.6426219940186      (sec)
|b - Ax|=  4.422628714702772E-013
Compute internal velocity   3.97039604187012      (sec)
}}}
When the run finishes, '''gmon.out''' is created in the working directory:
{{{
user@host> ls
a.out   ex32.f90   ex32.o  gmon.out  gmres.f90~  particle.dat  res.dat        stokeslet2d_dist.c  stokeslet2d.f90   stokeslet2d.h    stokeslet2d.o
ex32.c  ex32.f90~  ex43.c  gmres.c   gmres.h     README.txt    stokeslet2d.c  stokeslet2d_dist.h  stokeslet2d.f90~  stokeslet2d.mod
}}}
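gprof writes '''gmon.out''' only when the program exits normally, and a file left over from an earlier build can silently describe the wrong executable. The following is a minimal sketch of a guard for that case. The function name `profile_is_fresh` is hypothetical, and the `-nt` file comparison is an assumption about the shell: it is supported by bash, dash and ksh but is not required by POSIX.

```shell
#!/bin/sh
# Hypothetical helper (not part of gprof): succeed only if the profile data
# file exists, is non-empty, and is not older than the executable.
# Usage: profile_is_fresh EXECUTABLE [GMON_FILE]
profile_is_fresh() {
    exe=$1
    gmon=${2:-gmon.out}

    # The profile must exist and contain data.
    [ -s "$gmon" ] || return 1

    # If the executable was rebuilt after the profile was written,
    # the profile is stale. (-nt: "newer than", widely supported.)
    if [ "$exe" -nt "$gmon" ]; then
        return 1
    fi
    return 0
}
```

One possible use is `profile_is_fresh a.out && gprof a.out gmon.out`, so that gprof runs only when the profile matches the current build.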
To see the profiling results, run '''gprof''' on the executable:
{{{
user@host> gprof a.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 65.36     13.17    13.17                             LN12_M2_LOOPgas_1
  7.84     14.75     1.58 36240000     0.00     0.00  stokeslet2d_mp_term2_
  4.71     15.70     0.95        1     0.95     2.77  stokeslet2d_mp_slet2d_velocity_
  4.22     16.55     0.85                             mkl_blas_def_dgemm_copyan
  3.82     17.32     0.77                             log.A
  2.90     17.91     0.59 36240000     0.00     0.00  stokeslet2d_mp_term1_
  1.96     18.30     0.40        2     0.20     0.37  stokeslet2d_mp_slet2d_mkmatrix_
}}}
Most of the time (65.36%) is spent in '''LN12_M2_LOOPgas_1''', which is a routine in the Intel math library.
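For longer profiles it can help to pull out only the top entries. The following is a minimal sketch that reads a flat profile on standard input and prints the percentage and symbol name of the first few rows. The function name `top_hotspots` is hypothetical. It assumes that symbol names contain no spaces (true for the Fortran and C symbols above, not necessarily for demangled C++ names) and that data rows begin with a percentage such as `65.36`.

```shell
#!/bin/sh
# Hypothetical helper (not part of gprof): print "percent name" for the
# first N data rows of a gprof flat profile read from standard input.
# The flat profile is already sorted by self time, so no re-sorting is done.
# Usage: top_hotspots [N]   (N defaults to 3)
top_hotspots() {
    n=${1:-3}
    awk -v n="$n" '
        # Stop before the call graph section, if it is present.
        /Call graph/ { exit }
        # Data rows start with a numeric percentage; the last field is the name.
        $1 ~ /^[0-9]+\.[0-9]+$/ && NF >= 4 {
            print $1, $NF
            if (++count >= n) exit
        }
    '
}
```

For example, `gprof -b -p a.out gmon.out | top_hotspots 3` asks gprof for a brief listing (`-b`) restricted to the flat profile (`-p`). For the profile shown above, this prints `65.36 LN12_M2_LOOPgas_1`, `7.84 stokeslet2d_mp_term2_` and `4.71 stokeslet2d_mp_slet2d_velocity_`, one per line.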
See the [[http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html|gprof manual]] for details.