= Code Profiling =

To use [[http://en.wikipedia.org/wiki/Gprof|gprof]], compile your code with the ''-pg'' option. For example, to compile '''ex32.f90''' with Intel MKL:
{{{
user@host> ifort -pg ex32.f90 stokeslet2d.f90 -L$MKLROOT/lib/intel64/ \
    -I$MKLROOT/mkl/include \
    -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a \
    $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a \
    -Wl,--end-group -lpthread
}}}
Then run the program to generate the profiling information:
{{{
user@host>./a.out 5
 Total # of Particles= 2400
 setting matrix 0.518922030925751 (sec)
 Sovle linear ststem 15.6426219940186 (sec)
 |b - Ax|= 4.422628714702772E-013
 Compute internal velocity 3.97039604187012 (sec)
}}}
When it finishes, '''gmon.out''' has been created in the working directory:
{{{
user@host>ls
a.out   ex32.f90   ex32.o  gmon.out  gmres.f90~  particle.dat  res.dat        stokeslet2d_dist.c  stokeslet2d.f90   stokeslet2d.h    stokeslet2d.o
ex32.c  ex32.f90~  ex43.c  gmres.c   gmres.h     README.txt    stokeslet2d.c  stokeslet2d_dist.h  stokeslet2d.f90~  stokeslet2d.mod
}}}
To see the profiling results, run '''gprof''' on the executable:
{{{
user@host>gprof a.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 65.36     13.17    13.17                             LN12_M2_LOOPgas_1
  7.84     14.75     1.58 36240000     0.00     0.00  stokeslet2d_mp_term2_
  4.71     15.70     0.95        1     0.95     2.77  stokeslet2d_mp_slet2d_velocity_
  4.22     16.55     0.85                             mkl_blas_def_dgemm_copyan
  3.82     17.32     0.77                             log.A
  2.90     17.91     0.59 36240000     0.00     0.00  stokeslet2d_mp_term1_
  1.96     18.30     0.40        2     0.20     0.37  stokeslet2d_mp_slet2d_mkmatrix_
}}}
Most of the run time (65.36%) is spent in "LN12_M2_LOOPgas_1", which is a routine in the Intel math library. The routines from '''stokeslet2d.f90''' account for much less: the largest of them, "stokeslet2d_mp_term2_", takes 7.84%.

See the [[http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html|gprof manual]] for details.
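
The full '''gprof''' report is long, so it can help to save only the flat profile and then list the entries above some share of the run time. The following is a minimal sketch, not part of the original workflow: the function names `save_flat_profile` and `hotspots`, the file name `flat.txt`, and the threshold are all made up for illustration. The options `-b` (omit the explanatory text) and `-p` (print only the flat profile) are standard GNU gprof options. The filter assumes symbol names without spaces, as in the Fortran example here.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; names and defaults are assumptions.

# Write only the flat profile to a file.
#   $1 = executable (default a.out)
#   $2 = profile data (default gmon.out)
#   $3 = output file (default flat.txt)
save_flat_profile() {
    gprof -b -p "${1:-a.out}" "${2:-gmon.out}" > "${3:-flat.txt}"
}

# Print "<% time> <name>" for every flat-profile row whose first column
# is at least the given percentage.
#   $1 = saved flat profile, $2 = minimum "% time"
# Data rows are recognized by a decimal number in the first column;
# the routine name is taken to be the last field on the line.
hotspots() {
    awk -v min="$2" \
        '$1 ~ /^[0-9]+\.[0-9]+$/ && $1 + 0 >= min + 0 { print $1, $NF }' "$1"
}
```

After sourcing these definitions, `save_flat_profile` followed by `hotspots flat.txt 5` would list only the routines that take at least 5% of the sampled time; for the profile shown above, those are "LN12_M2_LOOPgas_1" and "stokeslet2d_mp_term2_".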