Task -- a task scheduler for distributed computations
-----------------------------------------------------

OVERVIEW
--------

A Task network needs two types of instances: one server and a set of
clients. The server dispatches commands, typically shell scripts, from a
queue maintained locally, and saves the resulting output locally. The
clients receive these commands, run them, and send the output back to
the server.

Clients connect to the server and communicate over TCP/IP. Clients can
connect and disconnect dynamically. As a result, commands may be
interrupted, in which case the server requeues them until they have run
to completion on a client. Commands can likewise be added to the queue
or removed from it at any moment, but the output of completed commands
is never removed from the server.

There is no built-in security or authentication of any form. However, by
default the server is bound to the loopback interface and accepts
connections only from localhost. Trust is achieved by running the server
on a controlled machine and establishing connections via ssh forwarding.

The server is controlled by special clients that send instructions and
return immediately upon acknowledgment or completion.

RUNNING THE SERVER
------------------

task server [-h host] [-p port] [-A] [-L]

  -h host       Specify hostname to bind to (default: localhost)
  -p port       Specify port to bind to (default: 8000)
  -A            Bind to all the network addresses on this host
  -L            Bind to localhost (default)

RUNNING A CLIENT
----------------

task client [-h host] [-p port] [-n processes] [-ws file|-wc command]

  -h host       Specify server hostname (default: localhost)
  -p port       Specify server port (default: 8000)
  -n processes  Specify the maximum number of tasks to be run
                simultaneously by this client instance (default: number
                of online CPUs)
  -ws file
  -wc command   Run 'command' (or the script 'file') and continuously
                read from its output.
                Every time the character '0' is read, the client is
                suspended and the tasks running on it are cancelled.
                Every time '1' is read, the client signals the server
                that it is back up and resumes normal operation.

CONTROL
-------

task add   [-h host] [-p port] [-o dir] {-s file|-c command} {-a file|name...}
task rm    [-h host] [-p port] [-o dir] [-f d|r|p...] [-a file|name...]
task exec  [-h host] [-p port] {-s file|-c command}
task list  [-h host] [-p port] [-l] [done] [running] [pending] [d|r|p...]
task nodes [-h host] [-p port]

  -h host       Specify server hostname (default: localhost)
  -p port       Specify server port (default: 8000)
  -o dir        Specify the output directory on the server side
  -s file
  -c command    Specify a command, or a script file containing the
                command
  -a file
  name...       Specify a set of tasks by their name, or a file
                containing the task names
  -t            Display task durations
  -l            Long form (task start and stop times)
  -f d|r|p...   Select only tasks that have a given status:
                  d  done    - successfully completed
                  r  running - currently running on a client
                  p  pending - still in the queue

CONTROLLING TASKS
-----------------

A single command may be used to run several tasks, by specifying several
task names. Tasks sharing the same command are told apart because each
command is prepended with the string task=<name> before being executed,
where <name> is the name of the task.

'task add' queues a series of tasks, all using a single specified
command.

'task rm' stops and removes the tasks matching the provided server-side
output directory, task status, and task name. If a parameter is not
provided, any value for that parameter matches. Note that if no
parameter at all is provided, for safety reasons the help is displayed
and nothing is done. Use 'task rm -f drp' to remove all tasks.

'task exec' queues one synchronizing command for every connected client.
Each command is synchronizing in that
  (a) it is started only after all previous normal tasks are done or
      running,
  (b) it is started on a client that is not executing any other task,
  (c) no later task can be assigned to the same client until it has
      completed.
It is useful, for example, to schedule a code update or a recompilation,
then immediately start queuing tasks for the updated code. Note that
'task exec' is blocking: it waits until all the tasks it schedules have
completed, displaying their output. You can run it in the background
using & to avoid blocking.

'task list' shows the content of the server queues.

'task nodes' shows a summary of the connected clients.

SSH EXAMPLE
-----------

On the server host, we begin by starting the dispatcher:

    (task server -h localhost 2>task.log ) &

If we can connect to the clients from the server, we can use the
following script:

    hlist="client1.domain client2.domain client3.domain"
    for h in $hlist; do
        # forward the client's port 8000 back to the local server
        ssh -R 8000:localhost:8000 "user@$h" \
            "mkdir -p /tmp/task; cd /tmp/task; task client" &
    done

If instead we can connect to the server from the clients, then we can
run on each client:

    ssh -L 8000:localhost:8000 user@server &

then

    task client &

Note that here we started the server specifying "-h localhost", as the
server may otherwise listen only for IPv6 connections (localhost is
usually bound to the IPv4 address 127.0.0.1).

WATCH EXAMPLE
-------------

We can use -ws to interrupt the client whenever a given user (say, root)
needs the computer.

    while true; do
        sleep 1
        # print 0 while root is logged in, 1 otherwise
        if who | grep -q root; then
            echo 0
        else
            echo 1
        fi
    done

This way, every time root logs in, the client on the same machine sees
it, cancels the running tasks, and stops being available for further
computations until root logs out.

COMPILATION
-----------

Running 'make' is enough.

REQUIREMENTS
------------

A libc with _BSD_SOURCE (for gethostname()) and _POSIX_SOURCE (for
kill() and getaddrinfo()) available.
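
TASK NAME EXAMPLE
-----------------

As described under CONTROLLING TASKS, one command can serve several
tasks because the task name is prepended to it as a variable
assignment. The sketch below imitates this locally, without a Task
server, to show how a command can use $task. It assumes that the
assignment is executed as its own statement ahead of the command; the
exact form used by the client is not stated here. The helper
run_as_task, the task names a01 to a03, and the input file pattern are
hypothetical.

```shell
#!/bin/sh
# Command text shared by all tasks. The single quotes keep $task
# unexpanded here, so it is only resolved when the command runs.
cmd='echo "processing input-$task.dat"'

# Hypothetical stand-in for what a client does with one task:
# run "task=<name>" followed by the command in a fresh shell.
# Assumption: the assignment is a separate statement, not an
# environment prefix on the same line.
run_as_task() {
    # $1: task name, $2: command text
    sh -c "task=$1
$2"
}

# One command, three tasks, three different inputs.
for name in a01 a02 a03; do
    run_as_task "$name" "$cmd"
done
```

Following the synopsis of 'task add', the same command text would
presumably be queued for these three names with -c followed by the
task names, with the command kept in single quotes so that $task is
not expanded by the submitting shell.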