Code Debugging
Ideally, code should be debugged on your desktop computer before it is moved to a cluster environment. There are many debugging techniques, and tutorials for them are widely available online.
Insert 'print' statements into the source code.
in C/C++
/* check */
#ifdef DEBUG
if (info == 0) printf("successfully done\n");
#endif
in Fortran
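#ifdef DEBUG
if (info == 0) then
  print *,"successfully done"
endif
#endif
Compile with the -DDEBUG option to enable the debug output. The macro name is case-sensitive, so the name tested by #ifdef must match the name given to -D. Fortran sources also need preprocessing enabled, for example with the -fpp option of ifort or an uppercase file extension such as .F90.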
icc -g -pg -DDEBUG -c stokeslet2d.c
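The guarded print above can be wrapped in a macro so that each call site stays on one line. The following is a minimal sketch, not part of the workshop code: the names DEBUG_LOG and report_status are hypothetical, and info is assumed to be a LAPACK-style status code where 0 means success.

```c
#include <stdio.h>

/* DEBUG_LOG prints to stderr only when the file is compiled with -DDEBUG.
   Without -DDEBUG it expands to a no-op, so release builds stay silent. */
#ifdef DEBUG
#define DEBUG_LOG(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_LOG(...) ((void)0)
#endif

/* Hypothetical helper: translate a status code into 0 (ok) or -1 (error),
   logging the outcome in debug builds only. */
int report_status(int info)
{
    if (info == 0) {
        DEBUG_LOG("successfully done\n");
        return 0;
    }
    DEBUG_LOG("failed: info = %d\n", info);
    return -1;
}
```

Building once with -DDEBUG and once without it should show that the messages appear only in the debug build, while the return values are the same in both.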
Makefile
#
# CCS WORKSHOP
# Stokes Flow in a Cavity
#
# Makefile
#
#
TARGET = ex32s ex32m
#
ALL: $(TARGET)
#
CC = icc
#CFLAGS = -O3
CFLAGS = -g -pg -DDEBUG
#
#
#
SRC_EX32c = ex32.c stokeslet2d.c gmres.c
#
#
MKL_SQ_LIBS = -L$(MKLROOT)/lib/intel64/ \
	-I$(MKLROOT)/mkl/include \
	-Wl,--start-group \
	$(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
	$(MKLROOT)/lib/intel64/libmkl_sequential.a \
	$(MKLROOT)/lib/intel64/libmkl_core.a \
	-Wl,--end-group \
	-lpthread
#
MKL_MT_LIBS = -L$(MKLROOT)/lib/intel64/ \
	-I$(MKLROOT)/mkl/include \
	-Wl,--start-group \
	$(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
	$(MKLROOT)/lib/intel64/libmkl_intel_thread.a \
	$(MKLROOT)/lib/intel64/libmkl_core.a \
	-Wl,--end-group \
	-liomp5 \
	-lpthread
#
#
#
OBJ_EX32c = $(SRC_EX32c:.c=.o)
#
#
ex32s : $(OBJ_EX32c)
	$(CC) $(CFLAGS) -o $@ $(OBJ_EX32c) $(MKL_SQ_LIBS)

ex32m : $(OBJ_EX32c)
	$(CC) $(CFLAGS) -o $@ $(OBJ_EX32c) $(MKL_MT_LIBS)
#
#
%.o : %.c
	$(CC) $(CFLAGS) -c $<
#
clean:
	rm -f *.o $(TARGET)
GDB
GDB, the GNU Project debugger, is the standard debugger on Linux. Documentation: http://www.gnu.org/software/gdb/documentation/
To debug with GDB, submit an interactive job. See here
Compile with the -g option so that the executable contains debugging symbols.
icc -g -pg -DDEBUG -c stokeslet2d.c
Run gdb
user@host>gdb ./ex32s
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /ccs-autofs/u01/fuji/LabWork/FlowInCavity/ex32s...done.
(gdb)
Show the source around a line with the command, list line#
(gdb) list 44
39          printf("Usage:%s [Depth of Cavity]\n",argv[0]);
40          exit(-1);
41      }
42
43      /* get inputed depth */
44      dp = atof(argv[1]);
45
46      /* # of particles in depth */
47      numpdepth = (int)(dp / EPSILON + 0.5);
48
(gdb)
Set a breakpoint with the command, b line#
(gdb) b 47
Breakpoint 1 at 0x4044c2: file ex32.c, line 47.
(gdb)
Start the program with the command, run [command line arguments]
(gdb) run 5
Starting program: /ccs-autofs/u01/fuji/LabWork/FlowInCavity/ex32s 5
[Thread debugging using libthread_db enabled]

Breakpoint 1, main (argc=2, argv=0x7fffffffd5c8) at ex32.c:47
47        numpdepth = (int)(dp / EPSILON + 0.5);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.3.x86_64
(gdb)
Print the values of variables with the command, p variable
(gdb) p dp
$1 = 5
(gdb) p numpdepth
$2 = 0
(gdb)
Execute the current line and stop at the next one with the command, next
(gdb) next
50        numpwidth = (int)(1.0 / EPSILON + 0.5);
(gdb) p numpdepth
$3 = 1000
(gdb)
Exit with the command, quit
(gdb) quit
A debugging session is active.

        Inferior 1 [process 6222] will be killed.

Quit anyway? (y or n) y
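In the session above, the breakpoint stops before line 47 has run, so numpdepth still holds the value it had before the assignment (0); after next it holds 1000. The arithmetic on that line can be checked in isolation with the sketch below. The value of EPSILON is an assumption inferred from the printed numbers (5 / EPSILON rounds to 1000) and is not shown in the session, and the function name particles_in_depth is hypothetical.

```c
/* Assumed value: chosen so that a depth of 5 yields 1000 particles,
   as printed by gdb. The real ex32.c may define it differently. */
#define EPSILON 0.005

/* Round dp / EPSILON to the nearest integer, as on line 47 of the listing.
   Adding 0.5 before the cast rounds instead of truncating. */
int particles_in_depth(double dp)
{
    return (int)(dp / EPSILON + 0.5);
}
```

The + 0.5 matters because the floating-point quotient can land slightly below the exact integer, in which case a plain cast would truncate to one less.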
Valgrind
Valgrind tools can detect many memory management and threading bugs, and profile your programs in detail.
Detect Invalid Access
Example code: (this code has a bug)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char * foo() {
    char a[200];
    strcpy(a, "hello world cup\n");
    return a;
}

int main() {
    char * a = foo();
    char c = a[0];
    printf("a[0] = %c\n", c);
    printf("a = %s\n", a);
    return 0;
}
Start an interactive session,
[fuji@cypress2 ~]$ idev -c 1 --gres=mic:0
Requesting 1 node(s) task(s) to workshop queue of workshop partition
1 task(s)/node, 1 cpu(s)/task, mic:0 MIC device(s)/node
Time: 0 (hr) 60 (min).
Submitted batch job 52605
JOBID=52605 begin on cypress01-089
--> Creating interactive terminal session (login) on node cypress01-089.
--> You have 0 (hr) 60 (min).
Last login: Wed Aug 19 21:05:45 2015 from cypress2.cm.cluster
[fuji@cypress01-089 ~]$
Compile and run,
[fuji@cypress01-089 ~]$ module load intel-psxe/2015-update1
[fuji@cypress01-089 ~]$ icc off_stack.c
off_stack.c(8): warning #1251: returning pointer to local variable
    return a;
           ^

[fuji@cypress01-089 ~]$ ./a.out
a[0] = h
a = hello world cup

[fuji@cypress01-089 ~]$ icc -O0 -g off_stack.c
off_stack.c(8): warning #1251: returning pointer to local variable
    return a;
           ^

[fuji@cypress01-089 ~]$ valgrind ./a.out
==33367== Memcheck, a memory error detector
==33367== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==33367== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==33367== Command: ./a.out
==33367==
==33367== Invalid read of size 1
==33367==    at 0x4005C5: main (off_stack.c:13)
==33367==  Address 0x7feffdd50 is just below the stack ptr.  To suppress, use: --workaround-gcc296-bugs=yes
==33367==
a[0] = h
a =
==33367==
==33367== HEAP SUMMARY:
==33367==     in use at exit: 0 bytes in 0 blocks
==33367==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==33367==
==33367== All heap blocks were freed -- no leaks are possible
==33367==
==33367== For counts of detected and suppressed errors, rerun with: -v
==33367== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)
[fuji@cypress01-089 ~]$
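The invalid read is reported because foo() returns the address of a local array, which stops being valid as soon as the function returns. One possible repair, sketched below, is to allocate the string on the heap and let the caller free it. The function name make_greeting is hypothetical and this is only one of several reasonable fixes (passing a caller-owned buffer is another).

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of the greeting, or NULL if malloc fails.
   The caller owns the result and must release it with free(). */
char *make_greeting(void)
{
    const char *text = "hello world cup\n";
    char *a = malloc(strlen(text) + 1);   /* +1 for the terminating '\0' */
    if (a != NULL) {
        strcpy(a, text);
    }
    return a;
}
```

A caller would print the string and then call free() on the returned pointer; forgetting the free() would trade the invalid read for a memory leak.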
Detect Uninitialized Data Access
Example code: (this code has a bug)
#include <stdio.h>
#include <stdlib.h>

int main() {
    double * p = malloc(sizeof(double) * 10);
    if (p[0] < 1) {
        printf("p[0] < 1\n");
    } else {
        printf("p[0] >= 1\n");
    }
    return 0;
}
[fuji@cypress01-089 Valgrind]$ icc uninit.c
[fuji@cypress01-089 Valgrind]$ ./a.out
p[0] < 1
[fuji@cypress01-089 Valgrind]$ icc -O0 -g uninit.c
[fuji@cypress01-089 Valgrind]$ valgrind ./a.out
==34643== Memcheck, a memory error detector
==34643== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==34643== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==34643== Command: ./a.out
==34643==
==34643== Conditional jump or move depends on uninitialised value(s)
==34643==    at 0x4005A9: main (uninit.c:6)
==34643==
==34643== Conditional jump or move depends on uninitialised value(s)
==34643==    at 0x4005AB: main (uninit.c:6)
==34643==
p[0] < 1
==34643==
==34643== HEAP SUMMARY:
==34643==     in use at exit: 80 bytes in 1 blocks
==34643==   total heap usage: 1 allocs, 0 frees, 80 bytes allocated
==34643==
==34643== LEAK SUMMARY:
==34643==    definitely lost: 80 bytes in 1 blocks
==34643==    indirectly lost: 0 bytes in 0 blocks
==34643==      possibly lost: 0 bytes in 0 blocks
==34643==    still reachable: 0 bytes in 0 blocks
==34643==         suppressed: 0 bytes in 0 blocks
==34643== Rerun with --leak-check=full to see details of leaked memory
==34643==
==34643== For counts of detected and suppressed errors, rerun with: -v
==34643== Use --track-origins=yes to see where uninitialised values come from
==34643== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6)
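Valgrind flags the comparison because malloc() returns memory with indeterminate contents. A minimal sketch of one fix is to allocate with calloc(), which zero-fills the block. The helper names alloc_zeroed and first_below_one are hypothetical, and the sketch assumes an IEEE-754 platform, where an all-zero bit pattern represents 0.0.

```c
#include <stdlib.h>

/* Allocate n doubles with every byte set to zero (0.0 on IEEE-754 systems).
   Returns NULL on failure; the caller must free() the result. */
double *alloc_zeroed(size_t n)
{
    return calloc(n, sizeof(double));
}

/* The same test as in uninit.c, now reading a defined value. */
int first_below_one(const double *p)
{
    return p[0] < 1.0;
}
```

Freeing the block before the program exits would also remove the 80 bytes that the leak summary reports as definitely lost.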
Detect Memory Leaks
Example code: (this code has a bug)
#include <iostream>
#include <cstring>

char * foo() {
    char *a = new char[200];
    std::strcpy(a, "hello workshop");
    return a;
}

int main() {
    char * a = foo();
    char * b = foo();
    std::cout << "a = " << a << std::endl;
    std::cout << "b = " << b << std::endl;
    return 0;
}
[fuji@cypress1 TestCodes]$ icpc -g mleak.cpp
[fuji@cypress1 TestCodes]$ valgrind --leak-check=full ./a.out
==10272== Memcheck, a memory error detector
==10272== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==10272== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==10272== Command: ./a.out
==10272==
a = hello workshop
b = hello workshop
==10272==
==10272== HEAP SUMMARY:
==10272==     in use at exit: 400 bytes in 2 blocks
==10272==   total heap usage: 2 allocs, 0 frees, 400 bytes allocated
==10272==
==10272== 200 bytes in 1 blocks are definitely lost in loss record 1 of 2
==10272==    at 0x4C28192: operator new[](unsigned long) (vg_replace_malloc.c:363)
==10272==    by 0x4009D8: foo() (mleak.cpp:5)
==10272==    by 0x400A0F: main (mleak.cpp:11)
==10272==
==10272== 200 bytes in 1 blocks are definitely lost in loss record 2 of 2
==10272==    at 0x4C28192: operator new[](unsigned long) (vg_replace_malloc.c:363)
==10272==    by 0x4009D8: foo() (mleak.cpp:5)
==10272==    by 0x400A20: main (mleak.cpp:12)
==10272==
==10272== LEAK SUMMARY:
==10272==    definitely lost: 400 bytes in 2 blocks
==10272==    indirectly lost: 0 bytes in 0 blocks
==10272==      possibly lost: 0 bytes in 0 blocks
==10272==    still reachable: 0 bytes in 0 blocks
==10272==         suppressed: 0 bytes in 0 blocks
==10272==
==10272== For counts of detected and suppressed errors, rerun with: -v
==10272== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6)
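The report shows two allocations and no frees, so both 200-byte blocks are lost. In the C++ example the fix is a matching delete[] for each new[]. The sketch below is a C analogue, not the original code: it pairs every allocation with a free and keeps a hand-rolled count of live blocks to make the bookkeeping visible. The tracked_* helpers are hypothetical and only illustrate the idea; they are not how Valgrind works internally.

```c
#include <stdlib.h>
#include <string.h>

static long live_blocks = 0;   /* allocations not yet freed */

/* malloc wrapper that counts successful allocations. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        live_blocks++;
    }
    return p;
}

/* free wrapper that keeps the count in step; NULL is ignored like free(). */
void tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

long tracked_live_blocks(void)
{
    return live_blocks;
}

/* C counterpart of foo() from mleak.cpp: the caller owns the result. */
char *foo(void)
{
    char *a = tracked_malloc(200);
    if (a != NULL) {
        strcpy(a, "hello workshop");
    }
    return a;
}
```

Calling foo() twice leaves two live blocks, which corresponds to "in use at exit: 400 bytes in 2 blocks"; releasing both brings the count back to zero.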
Intel® Inspector XE
Memory and Thread Debugger:
- Debug memory errors like leaks and allocation errors and threading errors like data races and deadlocks.
Setting Environment and Compiling your code
Load the module to set up the Intel compilers and tools.
[fuji@cypress1 ~]$ module load intel-psxe/2015-update1
Compile the code with the '-g' option, which tells the compiler to generate full debugging information in the object file.
[fuji@cypress1 ~]$ icc -g -o mytest mytest.c
Run and Collect Information
Start an interactive job,
[fuji@cypress1 ~]$ idev
To collect information, run the code, for example,
[fuji@cypress1 ~]$ inspxe-cl -collect=mi2 -app-working-dir=$PWD -result-dir=$PWD/results $PWD/mytest
-collect= options
Memory error analysis types
Analysis type | Description |
---|---|
mi1 | Detect memory leaks |
mi2 | Detect memory leaks and memory access problems |
mi3 | Find locations of memory leaks and memory access problems |
Threading error analysis types
Analysis type | Description |
---|---|
ti1 | Detect deadlocks |
ti2 | Detect deadlocks and data races |
ti3 | Find locations of deadlocks and data races |
To show results, for example,
[fuji@cypress1 ~]$ inspxe-cl -R problems -r $PWD/results
See here for details.
Intel® Advisor XE
Threading design and prototyping tool for software architects:
- Analyze, design, tune and check your threading design before implementation
- Explore and test threading options without disrupting normal development
- Predict threading errors & performance scaling on systems with more cores
Survey
Survey the application to determine hotspots. Typically an optimized
(non-debug) version of the application is used when surveying an application.
Run and Collect info.
$ icc -g -O3 mycode.c
$ advixe-cl --collect survey --project-dir ./advi ./a.out
Show report
$ advixe-cl --report survey --project-dir ./advi ./a.out
Add Annotations
Add annotations to the application source code, and rebuild the application.
Please see the Getting Started Tutorial for more information.
For C/C++
#include "advisor-annotate.h"
.....
ANNOTATE_SITE_BEGIN(sitename1);
for ( .... {
    ANNOTATE_TASK_BEGIN(taskname1);
    ...
    ANNOTATE_TASK_END();
}
ANNOTATE_SITE_END();
Fortran
use advisor_annotate
.....
call annotate_site_begin(sitename1)
do .....
   call annotate_task_begin(taskname1)
   ....
   call annotate_task_end()
enddo
call annotate_site_end()
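As a concrete illustration of the C/C++ pattern above, the sketch below annotates a loop whose iterations are independent, which makes it a plausible candidate for a parallel site. The function scale_array, the site and task names, and the USE_ADVISOR_ANNOTATIONS switch are all hypothetical. The fallback macros are an assumption added here so that the file also builds where advisor-annotate.h is not on the include path; they are not part of the Intel header.

```c
#include <stddef.h>

#ifdef USE_ADVISOR_ANNOTATIONS          /* hypothetical build switch */
#include "advisor-annotate.h"
#else
/* Fallback: annotations expand to nothing when Advisor is not in use. */
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END()
#endif

/* Multiply each element of x by factor. Every iteration touches only x[i],
   so the iterations do not depend on one another. */
void scale_array(double *x, size_t n, double factor)
{
    ANNOTATE_SITE_BEGIN(scale_site);
    for (size_t i = 0; i < n; i++) {
        ANNOTATE_TASK_BEGIN(scale_task);
        x[i] *= factor;
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();
}
```

With the switch defined and the Advisor include directory passed to the compiler, the same source would carry real annotations; without it, the loop behaves as if the annotations were absent.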
Suitability
Collect suitability data. Note that annotations must be present in the source
code for this collection to be successful. Typically an optimized (non-debug) version of the application is used when collecting suitability data.
$ icc -g -O3 mycode.c -I $ADVISOR_XE_2015_DIR/include
$ advixe-cl --collect suitability --project-dir ./advi ./a.out
$ advixe-cl --report suitability --project-dir ./advi ./a.out
Correctness
Collect correctness data. Note that annotations must be present in the source
code for this collection to be successful. Typically an application with debug symbols is used when collecting correctness data.
$ icc -g -O0 mycode.c
$ advixe-cl --collect correctness --project-dir ./advi ./a.out
$ advixe-cl --report correctness --project-dir ./advi ./a.out
Display a list of annotations present.
$ advixe-cl --report annotations --project-dir ./advi ./a.out
Update the application using the chosen parallel coding constructs. Rebuild the application and test.
Intel® VTune™ Amplifier 2015
- Intuitive CPU & GPU performance tuning, multi-core scalability, bandwidth and more
- Quick performance insight with advanced data visualization
- Automate regression tests and collect data remotely
Compile the code with the '-g' option, which tells the compiler to generate full debugging information in the object file.
[fuji@cypress1 ~]$ icc -g -o mytest mytest.c
Run and Collect Information
Start an interactive job,
[fuji@cypress1 ~]$ idev
To collect information, run the code, for example,
[fuji@cypress1 ~]$ amplxe-cl -collect hotspots ./mytest
This will create a directory like r000hs.
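To have something for a hotspots collection to find, the profiled program needs a function that dominates the run time. The sketch below is a hypothetical workload, not the workshop's mytest.c: a naive triple-loop matrix multiply, whose innermost loop would be expected to account for most of the CPU time when it is called on large matrices.

```c
#include <stddef.h>

/* Compute c = a * b for n x n matrices stored in row-major order.
   The O(n^3) triple loop is intentionally unoptimized so that it shows up
   clearly as the hot function in a profile. */
void matmul_naive(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

A driver that fills two matrices of a few hundred rows and calls this function a handful of times should run long enough for the collector to gather samples.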
-collect options
Analysis type | Description |
---|---|
concurrency | Concurrency analysis |
hotspots | Hotspots analysis |
lightweight-hotspots | Lightweight Hotspots analysis |
locksandwaits | Locks and Waits analysis |
To show results, for example,
[fuji@cypress1 ~]$ amplxe-cl -report hotspots -r r000hs
-report options
Report type | Description |
---|---|
summary | Display data for the overall performance of the target. |
hotspots | Display functions with the highest CPU time. |
wait-time | Display Wait time. |
perf | Display performance data for each module of the target. |
perf-detail | Display performance data for each function of the target. |
callstacks | Display CPU or Wait time for call stacks. |
top-down | Display a call tree for your target application and provide CPU and Wait time for each function. |
gprof-cc | Display CPU or Wait time in the gprof-like format. |