A script for running processes in parallel in Bash
--------------------------------------------------------------------------------------------------------------
In Bash you can start new processes in the background simply by running a command followed by an ampersand (&). The wait command can be used to wait until all background processes have finished (to wait for a particular process, use wait PID, where PID is a process ID). So here’s some simple pseudocode for parallel processing:

    NPROC=0
    for ARG in "$@"; do
        command "$ARG" &
        NPROC=$((NPROC + 1))
        if [ "$NPROC" -ge 4 ]; then
            wait      # block until the whole batch has finished
            NPROC=0
        fi
    done
    wait              # wait for the last, possibly partial, batch

I.e. you run four processes at a time and wait until all of them have finished before starting the next four. This is sufficient if all of the processes take about equally long to finish. However, it is suboptimal if the running times of the processes vary a lot.
A better solution is to track the process IDs and poll whether the processes are still running. In Bash, $! holds the ID of the most recently started background process. While a process is running, its PID shows up as a directory under /proc/.
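The polling idea can be tried out on a single job. This is only a minimal sketch, assuming a Linux-style /proc filesystem; the helper name is_running, the placeholder job and the poll interval are made up for illustration.

```shell
#!/bin/bash
# Sketch: start a background job, remember its PID, and poll until it exits.

# Succeeds while a process with the given PID exists.
# Assumes a Linux-style /proc filesystem (hypothetical helper name).
is_running() {
    [ -d "/proc/$1" ]
}

sleep 0.3 &        # placeholder job
PID=$!             # PID of the most recently started background process

while is_running "$PID"; do
    sleep 0.1      # poll interval, chosen arbitrarily
done

wait "$PID"        # collect the exit status of the finished job
echo "Job $PID finished with status $?"
```

Polling costs a short sleep per check, but it lets the shell react as soon as any single job is gone instead of waiting for a whole batch.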
Based on the ideas given in an Ubuntu forum thread and a template on command line parsing, I wrote a simple script, “parallel”, that allows you to run virtually any simple command concurrently.
Assume that you have a program proc and you want to run something like proc *.jpg using three concurrent processes. Then simply do

    parallel -j 3 proc *.jpg

The script takes care of dividing the task. Obviously, -j 3 stands for three simultaneous jobs. If you need command line options, use quotes to separate the command from the variable arguments, e.g.

    parallel -j 3 "proc -r -A=40" *.jpg

Furthermore, -r allows even more sophisticated commands by replacing asterisks in the command string with the argument:

    parallel -j 6 -r "convert -scale 50% * small/small_*" *.jpg

I.e. for each of the jpg files, such as file1.jpg, this executes

    convert -scale 50% file1.jpg small/small_file1.jpg

This is a real-life example for scaling down images by 50% (requires ImageMagick).
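The asterisk replacement can be reproduced with Bash's pattern substitution. This is a small sketch of that one step; expand_template is a hypothetical helper name used only here.

```shell
#!/bin/bash
# Sketch: substitute every literal asterisk in a command template.

# expand_template TEMPLATE ARG prints TEMPLATE with each * replaced by ARG.
# (expand_template is a hypothetical helper name used for illustration.)
expand_template() {
    local template=$1 arg=$2 result
    # Quoting "*" makes the pattern match a literal asterisk, not a glob.
    result=${template//"*"/$arg}
    printf '%s\n' "$result"
}

expand_template "convert -scale 50% * small/small_*" "file1.jpg"
# prints: convert -scale 50% file1.jpg small/small_file1.jpg
```

Because the command is later split on whitespace when it is executed, this approach suits file names without spaces.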
Finally, here’s the script. It can easily be adapted to handle different jobs, too. Just write your command between # DEFINE COMMAND and # DEFINE COMMAND END.

    #!/bin/bash
    NUM=0
    QUEUE=""
    MAX_NPROC=2 # default
    REPLACE_CMD=0 # no replacement by default
    USAGE="A simple wrapper for running processes in parallel.
    Usage: $(basename "$0") [-h] [-r] [-j nb_jobs] command arg_list
        -h          Shows this help
        -r          Replace asterisk * in the command string with argument
        -j nb_jobs  Set number of simultaneous jobs [2]
    Examples:
        $(basename "$0") somecommand arg1 arg2 arg3
        $(basename "$0") -j 3 \"somecommand -r -p\" arg1 arg2 arg3
        $(basename "$0") -j 6 -r \"convert -scale 50% * small/small_*\" *.jpg"

    # Add a PID to the queue of running processes
    function queue {
        QUEUE="$QUEUE $1"
        NUM=$((NUM + 1))
    }

    # Rebuild the queue, keeping only the PIDs that are still running
    function regeneratequeue {
        OLDREQUEUE=$QUEUE
        QUEUE=""
        NUM=0
        for PID in $OLDREQUEUE
        do
            if [ -d "/proc/$PID" ]; then
                QUEUE="$QUEUE $PID"
                NUM=$((NUM + 1))
            fi
        done
    }

    # Rebuild the queue as soon as one queued PID is no longer running
    function checkqueue {
        OLDCHQUEUE=$QUEUE
        for PID in $OLDCHQUEUE
        do
            if [ ! -d "/proc/$PID" ]; then
                regeneratequeue # at least one PID has finished
                break
            fi
        done
    }

    # parse command line
    if [ $# -eq 0 ]; then # must be at least one arg
        echo "$USAGE" >&2
        exit 1
    fi

    while getopts j:rh OPT; do # "j:" waits for an argument, "h" doesn't
        case $OPT in
        h)  echo "$USAGE"
            exit 0 ;;
        j)  MAX_NPROC=$OPTARG ;;
        r)  REPLACE_CMD=1 ;;
        \?) # getopts issues an error message
            echo "$USAGE" >&2
            exit 1 ;;
        esac
    done

    # Main program
    echo "Using $MAX_NPROC parallel threads"
    shift $((OPTIND - 1)) # shift input args, ignore processed args
    COMMAND=$1
    shift

    for INS in "$@" # for the rest of the arguments
    do
        # DEFINE COMMAND
        if [ $REPLACE_CMD -eq 1 ]; then
            CMD=${COMMAND//"*"/$INS}
        else
            CMD="$COMMAND $INS" # append args
        fi
        echo "Running $CMD"
        $CMD &
        # DEFINE COMMAND END

        PID=$!
        queue $PID

        while [ $NUM -ge $MAX_NPROC ]; do
            checkqueue
            sleep 0.4
        done
    done
    wait # wait for all processes to finish before exit

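As an aside, newer Bash versions offer wait -n, which returns as soon as any one background job exits. The sketch below is not the script above: it is a different technique that shows the same job-limiting loop without /proc polling. It assumes Bash 4.3 or later, and run_limited is a hypothetical function name.

```shell
#!/bin/bash
# Sketch: cap the number of concurrent jobs using wait -n (Bash 4.3+).

# run_limited MAX CMD ARG... runs "CMD ARG" once per ARG, at most MAX at a time.
# (run_limited is a hypothetical name used for illustration.)
run_limited() {
    local max=$1 cmd=$2 arg
    shift 2
    for arg in "$@"; do
        # When all slots are taken, block until any one job exits.
        while [ "$(jobs -rp | wc -l)" -ge "$max" ]; do
            wait -n
        done
        "$cmd" "$arg" &
    done
    wait   # wait for the remaining jobs
}

run_limited 2 sleep 0.3 0.1 0.2
```

Unlike the /proc check, this does not depend on Linux, but it does not support the asterisk replacement of the -r option.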
Content from: http://pebblesinthesand.wordpress.com/