Code Profiling
To use gprof, compile your code with the -pg option. For example, to compile ex32.f90 with Intel MKL:
user@host> ifort -pg ex32.f90 stokeslet2d.f90 -L$MKLROOT//lib/intel64/ \
    -I$MKLROOT/mkl/include \
    -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a \
    $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a \
    -Wl,--end-group -lpthread
Then run the program to generate the profiling information:
user@host> ./a.out 5
Total # of Particles= 2400
setting matrix 0.518922030925751 (sec)
Sovle linear ststem 15.6426219940186 (sec)
|b - Ax|= 4.422628714702772E-013
Compute internal velocity 3.97039604187012 (sec)
When the run finishes, a file named gmon.out is created in the current directory:
user@host> ls
a.out   ex32.f90   ex32.o  gmon.out  gmres.f90~  particle.dat  res.dat        stokeslet2d_dist.c  stokeslet2d.f90   stokeslet2d.h    stokeslet2d.o
ex32.c  ex32.f90~  ex43.c  gmres.c   gmres.h     README.txt    stokeslet2d.c  stokeslet2d_dist.h  stokeslet2d.f90~  stokeslet2d.mod
To see the profiling results, run gprof on the executable:
user@host> gprof a.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 65.36     13.17    13.17                             LN12_M2_LOOPgas_1
  7.84     14.75     1.58 36240000     0.00     0.00  stokeslet2d_mp_term2_
  4.71     15.70     0.95        1     0.95     2.77  stokeslet2d_mp_slet2d_velocity_
  4.22     16.55     0.85                             mkl_blas_def_dgemm_copyan
  3.82     17.32     0.77                             log.A
  2.90     17.91     0.59 36240000     0.00     0.00  stokeslet2d_mp_term1_
  1.96     18.30     0.40        2     0.20     0.37  stokeslet2d_mp_slet2d_mkmatrix_

The flat profile shows that most of the time (65.36%) is spent in "LN12_M2_LOOPgas_1", which is a routine in the Intel math library.
See here for details.
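For longer profiles it can help to post-process the flat profile rather than read it by eye. The following is a minimal sketch in Python, not part of gprof itself: the names ROW, SAMPLE, parse_flat_profile and top_hotspots are made up for this illustration, and the parser assumes the column layout shown in the listing above (three numeric columns, an optional calls / s/call / s/call group, then the symbol name).

```python
import re

# One row of the flat profile: % time, cumulative seconds, self seconds,
# then (optionally) calls, self s/call, total s/call, then the symbol name.
# Rows for routines compiled without -pg have no call counts, so the
# middle group is optional.
ROW = re.compile(
    r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)"
    r"(?:\s+(\d+)\s+(\d+\.\d+)\s+(\d+\.\d+))?"
    r"\s+(\S.*?)\s*$"
)

# Sample input copied from the gprof listing on this page.
SAMPLE = """\
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 65.36     13.17    13.17                             LN12_M2_LOOPgas_1
  7.84     14.75     1.58 36240000     0.00     0.00  stokeslet2d_mp_term2_
  4.71     15.70     0.95        1     0.95     2.77  stokeslet2d_mp_slet2d_velocity_
  4.22     16.55     0.85                             mkl_blas_def_dgemm_copyan
  3.82     17.32     0.77                             log.A
  2.90     17.91     0.59 36240000     0.00     0.00  stokeslet2d_mp_term1_
  1.96     18.30     0.40        2     0.20     0.37  stokeslet2d_mp_slet2d_mkmatrix_
"""


def parse_flat_profile(text):
    """Return one dict per data row; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # not a data row (title, header, blank line)
        pct, cumulative, self_s, calls, _self_call, _total_call, name = match.groups()
        rows.append({
            "percent_time": float(pct),
            "cumulative_seconds": float(cumulative),
            "self_seconds": float(self_s),
            "calls": int(calls) if calls is not None else None,
            "name": name,
        })
    return rows


def top_hotspots(rows, n=3):
    """Return the n rows with the largest self time, largest first."""
    return sorted(rows, key=lambda r: r["self_seconds"], reverse=True)[:n]


if __name__ == "__main__":
    for row in top_hotspots(parse_flat_profile(SAMPLE)):
        print(f'{row["percent_time"]:6.2f}%  {row["self_seconds"]:7.2f}s  {row["name"]}')
```

In practice the text would come from gprof rather than an embedded string, for example by saving the output of "gprof -b -p a.out gmon.out" to a file (-b suppresses the explanatory blurbs and -p restricts the output to the flat profile) and passing its contents to parse_flat_profile.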