
Code Debugging

Ideally, code should be debugged on your desktop computer before it is moved to a cluster environment. This page covers a few common techniques: inserting print statements, GDB, Valgrind, and the Intel tools (Inspector, Advisor, and VTune Amplifier).


Insert print statements into the source code.

in C/C++

/* check */
#ifdef DEBUG
  if (info == 0) printf("successfully done\n");
#endif

in Fortran

#ifdef DEBUG
    if (info == 0) then
       print *,"successfully done"
    end if
#endif

Compile with the -DDEBUG option to enable the check:

icc -g -pg -DDEBUG -c stokeslet2d.c
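When a code has many such checks, wrapping the #ifdef in a single macro keeps the call sites short. The following is a minimal sketch, not part of the Stokes flow code: DEBUG_PRINT and check_status are hypothetical names introduced here for illustration, and the convention that info == 0 means success is taken from the snippets above.

```c
#include <stdio.h>

/* Hypothetical helper macro: prints to stderr only when compiled with -DDEBUG.
   Without -DDEBUG it expands to an empty statement. */
#ifdef DEBUG
#define DEBUG_PRINT(...) \
  do { \
    fprintf(stderr, "[%s:%d] ", __FILE__, __LINE__); \
    fprintf(stderr, __VA_ARGS__); \
  } while (0)
#else
#define DEBUG_PRINT(...) do { } while (0)
#endif

/* Hypothetical stand-in for checking a solver's status code.
   Returns 1 when info == 0 (success) and 0 otherwise. */
int check_status(int info)
{
  if (info == 0) {
    DEBUG_PRINT("successfully done\n");
    return 1;
  }
  DEBUG_PRINT("failed with info = %d\n", info);
  return 0;
}
```

Calling check_status(info) after a library call prints the file and line of the check in a -DDEBUG build and prints nothing otherwise; the return value is the same in both builds.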


# Stokes Flow in a Cavity
# Makefile
TARGET = ex32s ex32m
CC = icc
SRC_EX32c = ex32.c stokeslet2d.c gmres.c
MKL_SQ_LIBS = -L$(MKLROOT)/lib/intel64/ \
        -I$(MKLROOT)/mkl/include \
        -Wl,--start-group \
        $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
        $(MKLROOT)/lib/intel64/libmkl_sequential.a \
        $(MKLROOT)/lib/intel64/libmkl_core.a \
        -Wl,--end-group
MKL_MT_LIBS = -L$(MKLROOT)/lib/intel64/ \
        -I$(MKLROOT)/mkl/include \
        -Wl,--start-group \
        $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
        $(MKLROOT)/lib/intel64/libmkl_intel_thread.a \
        $(MKLROOT)/lib/intel64/libmkl_core.a \
        -Wl,--end-group \
        -liomp5
OBJ_EX32c = $(SRC_EX32c:.c=.o)
# Recipe lines must begin with a tab character.
all : $(TARGET)
ex32s : $(OBJ_EX32c)
	$(CC) $(CFLAGS) -o $@ $(OBJ_EX32c) $(MKL_SQ_LIBS)
ex32m : $(OBJ_EX32c)
	$(CC) $(CFLAGS) -o $@ $(OBJ_EX32c) $(MKL_MT_LIBS)
%.o : %.c
	$(CC) $(CFLAGS) -c $<
clean :
	rm -f *.o $(TARGET)


GDB is the standard debugger.

To debug with GDB on the cluster, first start an interactive job (for example, with idev).

Compile with the -g option

icc -g -pg -DDEBUG -c stokeslet2d.c

Run gdb

user@host>gdb ./ex32s 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /ccs-autofs/u01/fuji/LabWork/FlowInCavity/ex32s...done.

Show the source around a given line with the command list line#

(gdb) list 44
39	    printf("Usage:%s [Depth of Cavity]\n",argv[0]);
40	    exit(-1);
41	  }
43	  /* get inputed depth */
44	  dp = atof(argv[1]);
46	  /* # of particles in depth */
47	  numpdepth = (int)(dp / EPSILON + 0.5);

Set a breakpoint at a given line with the command b line#

(gdb) b 47
Breakpoint 1 at 0x4044c2: file ex32.c, line 47.

Start the program with the command run [command line options]

(gdb) run 5
Starting program: /ccs-autofs/u01/fuji/LabWork/FlowInCavity/ex32s 5
[Thread debugging using libthread_db enabled]

Breakpoint 1, main (argc=2, argv=0x7fffffffd5c8) at ex32.c:47
47	  numpdepth = (int)(dp / EPSILON + 0.5);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.3.x86_64

Print the values of variables with the command p

(gdb) p dp
$1 = 5
(gdb) p numpdepth
$2 = 0

Execute the current line and stop at the next one with the command next

(gdb) next
50	  numpwidth = (int)(1.0 / EPSILON + 0.5);
(gdb) p numpdepth
$3 = 1000
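The first p numpdepth printed 0 because the program was stopped before line 47 had executed, so the variable had not been assigned yet (the 0 is simply what happened to be in memory). After next, it holds the rounded quotient. A minimal sketch of that rounding step, assuming EPSILON is 0.005 (this value is inferred from dp = 5 giving 1000; its definition is not shown on this page) and using the hypothetical function name count_particles:

```c
/* Round length / spacing to the nearest integer, in the same way as
   line 47 of ex32.c: adding 0.5 and truncating with (int) rounds
   non-negative values to the nearest integer. */
int count_particles(double length, double spacing)
{
  return (int)(length / spacing + 0.5);
}
```

With length = 5.0 and spacing = 0.005 this returns 1000, matching the value gdb printed for numpdepth.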


Leave GDB with the command quit

(gdb) quit
A debugging session is active.

	Inferior 1 [process 6222] will be killed.

Quit anyway? (y or n) y


Valgrind tools can detect many memory management and threading bugs, and profile your programs in detail.

Detect Invalid Access

Example code: (this code has a bug)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char * foo() {
  char a[200];
  strcpy(a, "hello world cup\n");
  return a;   /* bug: a is a local array that no longer exists after return */
}

int main() {
  char * a = foo();
  char c = a[0];
  printf("a[0] = %c\n", c);
  printf("a = %s\n", a);
  return 0;
}

Start an interactive session,

[fuji@cypress2 ~]$ idev -c 1 --gres=mic:0
Requesting 1 node(s)  task(s) to workshop queue of workshop partition
1 task(s)/node, 1 cpu(s)/task, mic:0 MIC device(s)/node
Time: 0 (hr) 60 (min).
Submitted batch job 52605
JOBID=52605 begin on cypress01-089
--> Creating interactive terminal session (login) on node cypress01-089.
--> You have 0 (hr) 60 (min).
Last login: Wed Aug 19 21:05:45 2015 from
[fuji@cypress01-089 ~]$

Compile and run,

[fuji@cypress01-089 ~]$ module load intel-psxe/2015-update1
[fuji@cypress01-089 ~]$ icc off_stack.c
off_stack.c(8): warning #1251: returning pointer to local variable
    return a;

[fuji@cypress01-089 ~]$ ./a.out
a[0] = h
a = hello world cup
[fuji@cypress01-089 ~]$ icc -O0 -g off_stack.c
off_stack.c(8): warning #1251: returning pointer to local variable
    return a;

[fuji@cypress01-089 ~]$ valgrind ./a.out
==33367== Memcheck, a memory error detector
==33367== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==33367== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==33367== Command: ./a.out
==33367== Invalid read of size 1
==33367==    at 0x4005C5: main (off_stack.c:13)
==33367==  Address 0x7feffdd50 is just below the stack ptr.  To suppress, use: --workaround-gcc296-bugs=yes
a[0] = h
a =
==33367== HEAP SUMMARY:
==33367==     in use at exit: 0 bytes in 0 blocks
==33367==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==33367== All heap blocks were freed -- no leaks are possible
==33367== For counts of detected and suppressed errors, rerun with: -v
==33367== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)
[fuji@cypress01-089 ~]$
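Valgrind reports the read at off_stack.c:13 because the array a lives on the stack of foo() and is gone once foo() returns. One possible fix, shown here as a sketch rather than the only option, is to allocate the buffer on the heap and let the caller free it. The name foo_fixed is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the message.
   The caller owns the buffer and must free() it.
   Returns NULL if the allocation fails. */
char *foo_fixed(void)
{
  char *a = malloc(200);
  if (a == NULL) {
    return NULL;
  }
  strcpy(a, "hello world cup\n");
  return a;
}
```

The caller then uses the result as before and calls free(a) when finished; forgetting that call turns the invalid read into a memory leak instead.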

Detect Uninitialized Data Access

Example code: (this code has a bug)

#include <stdio.h>
#include <stdlib.h>

int main() {
  double * p = malloc(sizeof(double) * 10);
  if (p[0] < 1) {   /* bug: p[0] is read before it is ever written */
    printf("p[0] < 1\n");
  } else {
    printf("p[0] >= 1\n");
  }
  return 0;
}

[fuji@cypress01-089 Valgrind]$ icc uninit.c
[fuji@cypress01-089 Valgrind]$ ./a.out
p[0] < 1
[fuji@cypress01-089 Valgrind]$ icc -O0 -g uninit.c
[fuji@cypress01-089 Valgrind]$ valgrind ./a.out
==34643== Memcheck, a memory error detector
==34643== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==34643== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==34643== Command: ./a.out
==34643== Conditional jump or move depends on uninitialised value(s)
==34643==    at 0x4005A9: main (uninit.c:6)
==34643== Conditional jump or move depends on uninitialised value(s)
==34643==    at 0x4005AB: main (uninit.c:6)
p[0] < 1
==34643== HEAP SUMMARY:
==34643==     in use at exit: 80 bytes in 1 blocks
==34643==   total heap usage: 1 allocs, 0 frees, 80 bytes allocated
==34643== LEAK SUMMARY:
==34643==    definitely lost: 80 bytes in 1 blocks
==34643==    indirectly lost: 0 bytes in 0 blocks
==34643==      possibly lost: 0 bytes in 0 blocks
==34643==    still reachable: 0 bytes in 0 blocks
==34643==         suppressed: 0 bytes in 0 blocks
==34643== Rerun with --leak-check=full to see details of leaked memory
==34643== For counts of detected and suppressed errors, rerun with: -v
==34643== Use --track-origins=yes to see where uninitialised values come from
==34643== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6)
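Both messages point at uninit.c:6, where p[0] is compared before anything has been stored in it, and the leak summary shows that the 80 bytes are never freed. A sketch of one way to address both, assuming zero is an acceptable initial value: allocate with calloc, which zero-fills the block, and free it before returning. The function name first_is_small is hypothetical.

```c
#include <stdlib.h>

/* Returns 1 if the first element is < 1, 0 if it is not,
   and -1 if the allocation fails. */
int first_is_small(void)
{
  /* calloc zero-fills the block; on IEEE 754 platforms an all-zero
     bit pattern is the double value 0.0. */
  double *p = calloc(10, sizeof(double));
  int result;

  if (p == NULL) {
    return -1;
  }
  result = (p[0] < 1) ? 1 : 0;
  free(p);   /* release the block so Valgrind reports no leak */
  return result;
}
```

If the data should hold something other than zero, assigning each element explicitly after malloc works equally well; the point is that every element is written before it is read.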

Detect Memory Leaks

Example code: (this code has a bug)

#include <iostream>
#include <cstring>

char * foo() {
  char *a = new char[200];   // bug: this block is never released with delete[]
  std::strcpy(a, "hello workshop");
  return a;
}

int main() {
  char * a = foo();
  char * b = foo();
  std::cout << "a = " <<  a << std::endl;
  std::cout << "b = " <<  b << std::endl;
  return 0;
}

[fuji@cypress1 TestCodes]$ icpc -g mleak.cpp
[fuji@cypress1 TestCodes]$ valgrind --leak-check=full ./a.out
==10272== Memcheck, a memory error detector
==10272== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==10272== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==10272== Command: ./a.out
a = hello workshop
b = hello workshop
==10272== HEAP SUMMARY:
==10272==     in use at exit: 400 bytes in 2 blocks
==10272==   total heap usage: 2 allocs, 0 frees, 400 bytes allocated
==10272== 200 bytes in 1 blocks are definitely lost in loss record 1 of 2
==10272==    at 0x4C28192: operator new[](unsigned long) (vg_replace_malloc.c:363)
==10272==    by 0x4009D8: foo() (mleak.cpp:5)
==10272==    by 0x400A0F: main (mleak.cpp:11)
==10272== 200 bytes in 1 blocks are definitely lost in loss record 2 of 2
==10272==    at 0x4C28192: operator new[](unsigned long) (vg_replace_malloc.c:363)
==10272==    by 0x4009D8: foo() (mleak.cpp:5)
==10272==    by 0x400A20: main (mleak.cpp:12)
==10272== LEAK SUMMARY:
==10272==    definitely lost: 400 bytes in 2 blocks
==10272==    indirectly lost: 0 bytes in 0 blocks
==10272==      possibly lost: 0 bytes in 0 blocks
==10272==    still reachable: 0 bytes in 0 blocks
==10272==         suppressed: 0 bytes in 0 blocks
==10272== For counts of detected and suppressed errors, rerun with: -v
==10272== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6)

Intel® Inspector XE

Memory and Thread Debugger:

  • Debug memory errors like leaks and allocation errors and threading errors like data races and deadlocks.

Setting Environment and Compiling your code

Load the module to set up the Intel compilers and tools.

[fuji@cypress1 ~]$ module load intel-psxe/2015-update1

Compile your code with the '-g' option, which tells the compiler to generate full debugging information in the object file.

[fuji@cypress1 ~]$ icc -g -o mytest mytest.c

Run and Collect Information

Start an interactive job,

[fuji@cypress1 ~]$ idev

To collect information, run the code, for example,

[fuji@cypress1 ~]$ inspxe-cl -collect=mi2 -app-working-dir=$PWD -result-dir=$PWD/results $PWD/mytest

-collect options

Memory error analysis types

  mi1   Detect memory leaks
  mi2   Detect memory leaks and memory access problems
  mi3   Find locations of memory leaks and memory access problems

Threading error analysis types

  ti1   Detect deadlocks
  ti2   Detect deadlocks and data races
  ti3   Find locations of deadlocks and data races

To show results, for example,

[fuji@cypress1 ~]$ inspxe-cl -R problems -r $PWD/results

See here for details.

Inspector Brief Tutorial

Intel® Advisor XE

Threading design and prototyping tool for software architects:

  • Analyze, design, tune and check your threading design before implementation
  • Explore and test threading options without disrupting normal development
  • Predict threading errors & performance scaling on systems with more cores


Survey the application to determine hotspots. Typically an optimized (non-debug) version of the application is used when surveying an application.

Run and Collect info.

$ icc -g -O3 mycode.c
$ advixe-cl --collect survey --project-dir ./advi ./a.out

Show report

$ advixe-cl --report survey --project-dir ./advi ./a.out

Add Annotations

Add annotations to the application source code, and rebuild the application.

Please see the Getting Started Tutorial for more information.

For C/C++

#include "advisor-annotate.h"
    for ( .... 


For Fortran

use advisor_annotate
call annotate_site_begin(sitename1)
do .....
    call annotate_task_begin(taskname1)
    call annotate_task_end()
end do
call annotate_site_end()
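For C/C++, the same site/task structure can be sketched as follows. This is an illustration under assumptions, not code from the tutorial: the macro names ANNOTATE_SITE_BEGIN, ANNOTATE_TASK_BEGIN, ANNOTATE_TASK_END and ANNOTATE_SITE_END are the C counterparts of the Fortran calls above as provided by advisor-annotate.h (check the header shipped with your Advisor version), and USE_ADVISOR and sum_of_squares are hypothetical names. When USE_ADVISOR is not defined, no-op fallbacks are used so that the sketch compiles without the Intel header.

```c
#ifdef USE_ADVISOR
#include "advisor-annotate.h"   /* needs -I $ADVISOR_XE_2015_DIR/include */
#else
/* Fallback no-op definitions so the sketch compiles without Advisor. */
#define ANNOTATE_SITE_BEGIN(site) ((void)0)
#define ANNOTATE_SITE_END()       ((void)0)
#define ANNOTATE_TASK_BEGIN(task) ((void)0)
#define ANNOTATE_TASK_END()       ((void)0)
#endif

/* Sums the squares of x[0..n-1]. The loop is marked as a candidate
   parallel site, and each iteration body as a task. */
double sum_of_squares(const double *x, int n)
{
  double total = 0.0;
  int i;

  ANNOTATE_SITE_BEGIN(sitename1);
  for (i = 0; i < n; i++) {
    ANNOTATE_TASK_BEGIN(taskname1);
    total += x[i] * x[i];   /* shared accumulator across tasks */
    ANNOTATE_TASK_END();
  }
  ANNOTATE_SITE_END();

  return total;
}
```

The annotations do not change what the program computes; they only mark regions for the tool. The shared accumulator total is the kind of data-sharing pattern one would expect a correctness analysis to examine before the loop is actually parallelized.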


Collect suitability data. Note that annotations must be present in the source code for this collection to be successful. Typically an optimized (non-debug) version of the application is used when collecting suitability data.

$ icc -g -O3 mycode.c -I $ADVISOR_XE_2015_DIR/include
$ advixe-cl --collect suitability --project-dir ./advi ./a.out
$ advixe-cl --report suitability --project-dir ./advi ./a.out


Collect correctness data. Note that annotations must be present in the source code for this collection to be successful. Typically an application with debug symbols is used when collecting correctness data.

$ icc -g -O0 mycode.c -I $ADVISOR_XE_2015_DIR/include
$ advixe-cl --collect correctness --project-dir ./advi ./a.out
$ advixe-cl --report correctness --project-dir ./advi ./a.out

Display a list of annotations present.

$ advixe-cl --report annotations --project-dir ./advi ./a.out

Update the application using the chosen parallel coding constructs. Rebuild the application and test.

Advisor Brief Tutorial

Intel® VTune™ Amplifier 2015

  • Intuitive CPU & GPU performance tuning, multi-core scalability, bandwidth and more
  • Quick performance insight with advanced data visualization
  • Automate regression tests and collect data remotely

Compile your code with the '-g' option, which tells the compiler to generate full debugging information in the object file.

[fuji@cypress1 ~]$ icc -g -o mytest mytest.c

Run and Collect Information

Start an interactive job,

[fuji@cypress1 ~]$ idev

To collect information, run the code, for example,

[fuji@cypress1 ~]$ amplxe-cl -collect hotspots ./mytest

This will create a directory like r000hs.

-collect options

  concurrency            Concurrency analysis
  hotspots               Hotspots analysis
  lightweight-hotspots   Lightweight Hotspots analysis
  locksandwaits          Locks and Waits analysis

To show results, for example,

[fuji@cypress1 ~]$ amplxe-cl -report hotspots -r r000hs

-report options

  summary       Display data for the overall performance of the target.
  hotspots      Display functions with the highest CPU time.
  wait-time     Display Wait time.
  perf          Display performance data for each module of the target.
  perf-detail   Display performance data for each function of the target.
  callstacks    Display CPU or Wait time for call stacks.
  top-down      Display a call tree for your target application and provide CPU and Wait time for each function.
  gprof-cc      Display CPU or Wait time in the gprof-like format.

VTune Brief Tutorial

Last modified on 08/20/15 21:45:29