
Version 12 (modified by fuji, 8 years ago)

Running Python on Cypress

Python Modules

As of August 18th, 2015, there are three (3) versions of Python available on Cypress:

  • Python 2.6.6 is loaded by default for all users
  • Python 2.7.8 is available as part of the Anaconda module
  • Python 2.7.10 is available as a stand alone module

We currently do not have Python 3 installed on Cypress as we have had no requests for Python 3.

Anaconda

As the name implies, Anaconda is a larger version of Python. In addition to Python 2.7.8, Anaconda includes over 300 of the most requested Python packages. These include:

  • NumPy
  • Pandas
  • SciPy
  • Matplotlib
  • IPython

A complete list of packages available through Anaconda can be found here.
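To check which of the packages listed above (and which versions) a given Python installation actually provides, a short script like the following could be run after loading the anaconda module. This is a minimal sketch, not an official Cypress tool: the package list is copied from the bullets above, and it assumes each package exposes the common `__version__` attribute, reporting "unknown" when it does not. The `from __future__` import lets it run under the Python 2 interpreters on Cypress as well as under Python 3.

```python
from __future__ import print_function  # lets this also run under Python 2.7

import importlib

# Packages named in the list above; edit as needed.
PACKAGES = ["numpy", "pandas", "scipy", "matplotlib", "IPython"]


def check_packages(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in sorted(check_packages(PACKAGES).items()):
        print("%-12s %s" % (name, version or "NOT FOUND"))
```

Running it with the default Python 2.6.6 and again after `module load anaconda` is one way to see the difference between the two installations.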

Install Packages to Anaconda Python

See here.

'conda' is the command for adding packages to the Anaconda Python distribution. However, because you do not have write permission in /share/apps/anaconda, running it directly will fail. There is a workaround. As an example, suppose we want to add the 'vtk' Python wrapper library:

conda install vtk

This will fail with an error message that includes instructions for a workaround. Following those instructions, clone the Python environment into your home directory with

conda create -n my_root --clone=/share/apps/anaconda/2/2.5.0

Here 'my_root' is an environment name of your choice. Then activate it:

source activate my_root

Finally, install the package:

conda install vtk

If you encounter "Error: 'conda' can only be installed into the root environment", try

conda remove conda-build
conda remove conda-env
conda update --all
conda install vtk

In later sessions, use Anaconda Python with your environment by running

module load anaconda
source activate my_root

Include the above commands in your job script when you run Python in batch jobs.
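To confirm from inside a job that the cloned environment is the one actually in use, a script can report the interpreter's prefix. This is a minimal sketch that assumes the environment is named my_root as above and that conda places the clone in a directory named after the environment (typically something like ~/.conda/envs/my_root, though the exact location is an assumption). It also prints CONDA_DEFAULT_ENV, which `source activate` is assumed to set; depending on the conda version it may hold a name, a path, or be unset.

```python
from __future__ import print_function  # also runs under Python 2.7

import os
import sys


def describe_environment(expected_env="my_root"):
    """Return (prefix, conda_env, looks_right) for the running interpreter.

    prefix      -- sys.prefix, the root directory of the running Python
    conda_env   -- value of CONDA_DEFAULT_ENV, or None if it is not set
    looks_right -- True if the last component of prefix equals expected_env
    """
    prefix = sys.prefix
    # Assumption: `source activate` exports CONDA_DEFAULT_ENV; this may vary by conda version.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    looks_right = os.path.basename(os.path.normpath(prefix)) == expected_env
    return prefix, conda_env, looks_right


if __name__ == "__main__":
    prefix, conda_env, looks_right = describe_environment()
    print("sys.prefix        :", prefix)
    print("CONDA_DEFAULT_ENV :", conda_env)
    if not looks_right:
        # Warn rather than exit, so the rest of the job still runs.
        print("Warning: this does not look like the 'my_root' environment",
              file=sys.stderr)
```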

Running Python Interactively

Start an interactive session using idev

[tulaneID@cypress1 pp-1.6.4]$ idev 
Requesting 1 node(s)  task(s) to normal queue of defq partition
1 task(s)/node, 20 cpu(s)/task, 2 MIC device(s)/node
Time: 0 (hr) 60 (min).
Submitted batch job 52311
Seems your requst is pending.
JOBID=52311 begin on cypress01-035
--> Creating interactive terminal session (login) on node cypress01-035.
--> You have 0 (hr) 60 (min).
[tulaneID@cypress01-035 pp-1.6.4]$ 

Load the desired Python module

[tulaneID@cypress01-035 pp-1.6.4]$ module load anaconda
[tulaneID@cypress01-035 pp-1.6.4]$ module list
Currently Loaded Modulefiles:
  1) git/2.4.1           3) idev                5) anaconda/2.1.0
  2) slurm/14.03.0       4) bbcp/amd64_rhel60

Run Python in the command line window

[tulaneID@cypress01-035 pp-1.6.4]$ python
Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Aug 21 2014, 18:22:21) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>>
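A quick way to confirm that the interpreter at the prompt is the one from the loaded module, rather than the system default Python 2.6.6, is to print its path and version. This is a small sketch using only the standard `sys` module; the 2.7 threshold simply reflects the versions listed at the top of this page. The lines can be typed at the >>> prompt (press Enter on a blank line after each indented block) or saved to a file.

```python
from __future__ import print_function  # also runs under Python 2.6/2.7

import sys


def interpreter_info():
    """Return the running interpreter's path and (major, minor, micro) version."""
    return sys.executable, tuple(sys.version_info[:3])


executable, version = interpreter_info()
print("Interpreter:", executable)
print("Version    : %d.%d.%d" % version)
if version < (2, 7):
    # The Cypress default is 2.6.6, so this suggests no Python module was loaded.
    print("This looks like the default Python, not the module you loaded.")
```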

Running a Python script in Batch mode

You can also submit your Python job to the batch nodes (compute nodes) on Cypress. Inside your SLURM script, include a command to load the desired Python module, then run your script with python:

#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda
python mypythonscript.py
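mypythonscript.py is a placeholder name. The hypothetical contents below are a minimal sketch that only reports where the job ran, which helps confirm that the script ran on a compute node (for example cypress01-035) rather than a login node. It uses only the standard library and reads SLURM_JOB_ID and SLURM_CPUS_PER_TASK, which SLURM sets inside a job (the latter only when --cpus-per-task is given, as in the script above).

```python
from __future__ import print_function  # also runs under Python 2.7

import os
import socket


def job_summary():
    """Collect a few facts about where this process is running.

    Outside a SLURM job the SLURM_* variables are absent and reported as None.
    """
    return {
        "host": socket.gethostname(),
        "job_id": os.environ.get("SLURM_JOB_ID"),
        "cpus_per_task": os.environ.get("SLURM_CPUS_PER_TASK"),
    }


if __name__ == "__main__":
    for key, value in sorted(job_summary().items()):
        print("%-14s %s" % (key, value))
```

By default, SLURM writes the printed lines to the job's output file.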

Running a Parallel Python Job

The exact configuration of your parallel job script will depend on the flavor of parallelism that you choose for your Python script.

As an example, we will use the Parallel Python (pp) package that we installed above. Parallel Python uses the shared memory model of parallelism (analogous to OpenMP). Let's run the sum of primes example from the Parallel Python website.

We need to tell our script how many cores to use. The syntax is

python sum_primes.py [ncpus]
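Inside the script, the optional [ncpus] argument might be read as shown below. This is a hedged sketch of the command-line convention, not the Parallel Python example's actual code. The helper name get_ncpus is hypothetical, and falling back to multiprocessing.cpu_count() when no argument is given is an assumption of this sketch.

```python
from __future__ import print_function  # also runs under Python 2.7

import multiprocessing
import sys


def get_ncpus(argv):
    """Return the core count from argv[1] if given, else all local cores.

    Mirrors `python sum_primes.py [ncpus]`: the argument is optional.
    """
    if len(argv) > 1:
        ncpus = int(argv[1])
        if ncpus < 1:
            raise ValueError("ncpus must be a positive integer, got %d" % ncpus)
        return ncpus
    # cpu_count() reports every core on the node, which can exceed what
    # SLURM allocated to the job, so passing ncpus explicitly is safer.
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print("Using", get_ncpus(sys.argv), "worker(s)")
```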

We can pass the SLURM parameters to the script through the corresponding SLURM environment variable; here $SLURM_CPUS_PER_TASK holds the value given to --cpus-per-task.

#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=15      # Number of threads per task (OMP threads)

module load anaconda
python sum_primes.py $SLURM_CPUS_PER_TASK
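If pp is not available, the same sum-of-primes idea can be sketched with the standard library's multiprocessing module instead. This is a swapped-in technique, not the Parallel Python example itself: it uses a multiprocessing.Pool rather than pp's job server, and the input values are arbitrary illustrative choices. It reads the worker count from the first command-line argument, so it can be launched with the same `python sum_primes.py $SLURM_CPUS_PER_TASK` line as above.

```python
from __future__ import print_function  # also runs under Python 2.7

import math
import multiprocessing
import sys


def isprime(n):
    """Return True if n is prime (trial division by odd numbers up to sqrt(n))."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for d in range(3, int(math.sqrt(n)) + 1, 2):
        if n % d == 0:
            return False
    return True


def sum_primes(n):
    """Return the sum of all primes below n."""
    return sum(x for x in range(2, n) if isprime(x))


def main(argv):
    # Number of worker processes: first argument (e.g. $SLURM_CPUS_PER_TASK), default 1.
    ncpus = int(argv[1]) if len(argv) > 1 else 1
    inputs = (100000, 100100, 100200, 100300, 100400, 100500, 100600, 100700)
    pool = multiprocessing.Pool(processes=ncpus)
    try:
        # Each input is an independent task; the pool runs up to ncpus at once.
        results = pool.map(sum_primes, inputs)
    finally:
        pool.close()
        pool.join()
    for n, total in zip(inputs, results):
        print("Sum of primes below %d is %d" % (n, total))
    return results


if __name__ == "__main__":
    main(sys.argv)
```

The pool is managed with try/finally rather than a `with` statement because Pool only became a context manager in Python 3.3, and the interpreters on Cypress are Python 2.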

Installing Packages

See here.
