
Platform Lava Man Pages

Platform Computing provides a commercially supported distribution of the popular NPACI Rocks cluster management software, in much the same way that Red Hat provides a commercially supported distribution of the Linux operating system. A job scheduling system distributes computing tasks efficiently across the compute nodes of a cluster. Platform Lava is Platform Computing's job scheduling system.

Most-Used Commands At-A-Glance

I want to...                           Command
See what jobs are running              bjobs
See what jobs have completed           bhist
Submit a job to the cluster            bsub
Suspend a job                          bstop
Resume a job                           bresume
Delete a job                           bkill
See the status of the compute nodes    bhosts

Examples

Here are a few of the most common use cases.

Submitting a Few Jobs With E-mail Notification

Use this option when you are submitting a few (fewer than 100) jobs which should be processed relatively quickly.

bsub [-B -N -u <my_email@sfsu.edu>] [-o <my_output_file>] [-e <my_error_file>] [-q short] <my_command> [<my_command_options>]

  • -B tells Lava to send you e-mail when your job Begins
  • -N tells Lava to send you e-mail when your job eNds, and also sends you the output
  • -u tells Lava which e-mail address to send the notifications to
  • -o tells Lava to save the output (from standard output) to the given file; be sure to use an absolute path (for example: /home/mikewong/my_output_file.txt )
  • -e tells Lava to save the error messages (from standard error) to the given file; again, be sure to use an absolute path.
  • -q tells Lava to put the job into the given queue. The short queue has high priority, the normal queue (the default) has moderate priority, and the idle queue has low priority.
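
As a concrete illustration of the options above, the following sketch assembles a full bsub command line and prints it instead of running it. The helper name build_bsub and the file paths are hypothetical; the blast command is the example job used elsewhere on this page.

```shell
# Hypothetical helper: print (do not run) a bsub command line for the
# short queue with begin/end e-mail notification.
#   $1 = e-mail address, $2 = output file, $3 = error file,
#   remaining arguments = the command to run and its options
build_bsub() {
  email=$1; out=$2; err=$3
  shift 3
  printf '%s\n' "bsub -B -N -u $email -o $out -e $err -q short $*"
}

# Absolute paths, as recommended for -o and -e.
build_bsub my_email@sfsu.edu \
  /home/mikewong/my_output_file.txt \
  /home/mikewong/my_error_file.txt \
  tblastn amel feature01.fst
```

Once the printed line looks right, paste it at the prompt on the cluster to submit the job.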

Submitting Many Jobs with a Single E-mail Notification on Completion

Use this option when you are submitting many (100 or more) jobs which should be processed when the resources become available. Be sure to contact Mike Wong before submitting many jobs. As stated in the CCLS Computing Resource Use Policy, processes that lock computing resources or otherwise prevent equitable distribution of computing cycles to all users will be identified and owners notified immediately for action. Such usage is strongly discouraged, and aforementioned processes may be terminated without notice, at the discretion of the CCLS staff.

bsub [-o <my_output_file>] [-e <my_error_file>] <my_command> [<my_command_options>]

  • -o tells Lava to save the output (from standard output) to the given file; be sure to use an absolute path (for example: /home/mikewong/my_output_file.txt )
  • -e tells Lava to save the error messages (from standard error) to the given file; again, be sure to use an absolute path.

Sample Perl Code to Submit Many Jobs

A job is any process you would normally type at the command line. For example, a blast search (tblastn amel feature01.fst) is a job. Submitting many jobs is best handled with a script. Because of the limits of the cluster, submit no more than 2500 jobs at a time; larger batches make it hard for others to use the cluster simultaneously.

Because each research project has different requirements, most solutions will need to be customized to your research. Contact Mike Wong to request assistance in creating your script. A generic solution is outlined below.

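Perl script to submit many jobs using the application mycommand over several different input files (all ending with .job)
use strict;
use warnings;
use Cwd;
use File::Basename;
use Proc::Daemon;

# Record the starting directory first: Proc::Daemon::Init() changes the
# working directory, so every path below is built from $cwd.
our $cwd = getcwd();

Proc::Daemon::Init(); # Runs this script as a stand-alone background process, so you can log out

# Find all the *.job files in the directory
my @jobs = glob("$cwd/*.job");
mkdir "$cwd/log";
mkdir "$cwd/results";
open my $log, '>', "$cwd/log/submission.log" or die "Cannot open submission log: $!";
foreach my $jobfile (@jobs) {
  my $name = basename($jobfile); # file name without the directory part
  # \@ keeps Perl from interpolating @sfsu as an array
  my $command = "bsub -u myemail\@sfsu.edu -N -o $cwd/results/$name.out -e $cwd/log/$name.log mycommand $jobfile";
  print $log scalar( localtime ) . "\t$command\n";
  `$command`;
  sleep( 1 );
}
close $log;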
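
Before running a submission script, it can help to check how many jobs it would submit. The sketch below counts the *.job files in a directory and compares the count with the 2500-job guideline mentioned above. The function names and the MAX_JOBS variable are hypothetical, and the check only looks at files directly inside the given directory.

```shell
# Hypothetical pre-flight check: count the *.job files in a directory and
# refuse to continue when there are more than the suggested 2500.
MAX_JOBS=2500

count_jobs() {
  # Prints the number of regular files named *.job directly inside $1.
  find "$1" -maxdepth 1 -type f -name '*.job' | wc -l | tr -d ' '
}

check_job_limit() {
  n=$(count_jobs "$1")
  if [ "$n" -gt "$MAX_JOBS" ]; then
    echo "$n jobs exceeds the limit of $MAX_JOBS; split the batch" >&2
    return 1
  fi
  echo "$n jobs: OK to submit"
}
```

For example, check_job_limit . run in the directory that holds the job files prints the count, or fails with a message when the batch is too large.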

Dell Whitepapers

Platform Lava Command-Line Interface

  • Platform Lava User's Guide (PDF)
  • Man pages
    • bbot - Reduce the priority of a job
    • bhist - See the historical progress of a job
    • bjobs - See what jobs are running on the cluster
    • bkill - Delete a job
    • bqueues - See a list of queues and status for each queue
    • brequeue - Put a job back into a queue
    • bresume - Resume a suspended job
    • brun - Force a pending job to run
    • bstop - Suspend a job
    • bsub - Submit a job to the cluster
    • bswitch - Move a job to another queue
    • btop - Increase the priority of a job



bbot(1)								      bbot(1)



NAME
       bbot  - moves a pending job relative to the last job in the queue


SYNOPSIS




       bbot job_ID | job_ID [position]


       bbot [-h | -V]


DESCRIPTION




       Changes	the  queue  position of a pending job, or a pending job array
       element, to affect the order in which jobs  are	considered  for	 dis-
       patch.


       By  default,  Lava  dispatches jobs in a queue in the order of arrival
       (that is, first-come-first-served), subject to availability  of	suit-
       able server hosts.


       The  bbot  command allows users and the Lava administrator to manually
       change the order in which jobs are considered for dispatch. Users  can
       only  operate  on  their	 own jobs, whereas the Lava administrator can
       operate on any user's jobs. Users can only change the  relative	posi-
       tion of their own jobs.


       If  invoked  by	the  Lava  administrator, bbot moves the selected job
       after the last job with the same priority submitted to the queue.  The
       positions  of  all users' jobs in the queue can be changed by the Lava
       administrator.


       If invoked by a regular user, bbot moves the selected  job  after  the
       last job with the same priority submitted by the user to the queue.


       Pending jobs are displayed by bjobs in the order in which they will be
       considered for dispatch.


OPTIONS




       job_ID


	      Required. Job ID of the job or job array on which to operate.




       position



	      Optional. The position argument can be  specified	 to  indicate
	      where in the queue the job is to be placed. position is a posi-
	      tive number that indicates the target position of the job	 from
	      the  end	of  the queue. The positions are relative to only the
	      applicable jobs in the queue, depending on whether the  invoker
	      is  a regular user or the Lava administrator. The default value
	      of 1 means the position is after all other jobs with  the	 same
	      priority.
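
The effect of the position argument can be pictured with a small model that reorders a list of job IDs. This is only a sketch of the rule as described above, not a call into Lava: the function name bbot_model is hypothetical, and it assumes every job in the list belongs to the same user and has the same priority.

```shell
# Model only: reorder a space-separated list of pending job IDs the way
# the bbot description reads. Does not call Lava.
#   $1 = queue (first job is dispatched first), $2 = job to move,
#   $3 = position counted from the end of the queue (default 1)
bbot_model() {
  queue=$1; job=$2; pos=${3:-1}
  rest=""
  for j in $queue; do
    if [ "$j" != "$job" ]; then rest="$rest $j"; fi
  done
  set -- $rest
  # Number of jobs that stay ahead of the moved job.
  ahead=$(( $# - pos + 1 ))
  if [ "$ahead" -lt 0 ]; then ahead=0; fi
  out=""; i=0
  for j in "$@"; do
    if [ "$i" -eq "$ahead" ]; then out="$out $job"; fi
    out="$out $j"
    i=$((i + 1))
  done
  if [ "$i" -le "$ahead" ]; then out="$out $job"; fi
  echo $out
}

bbot_model "101 102 103 104" 102      # default position 1: moved to the end
bbot_model "101 102 103 104" 102 2    # second from the end
```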




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




SEE ALSO




       bjobs(1), bswitch(1), btop(1)




		      November 2004   Platform Lava 6.1		      bbot(1)
bhist(1)							     bhist(1)



NAME
       bhist  - displays historical information about jobs


SYNOPSIS




       bhist   [-a  |  -d  |  -p  |  -r	 |  -s]	 [-b  |	 -w]  [-l]  [-t]  [-C
       start_time,end_time] [-D start_time,end_time] [-S start_time,end_time]
       [-T start_time,end_time] [-f logfile_name | -n number_logfiles | -n 0]
       [-J job_name] [-m host_name]  [-N  host_name  |	-N  host_model	|  -N
       CPU_factor] [-q queue_name] [-u user_name | -u all]


       bhist  [-J  job_name]  [-N  host_name | -N host_model | -N CPU_factor]
       [job_ID ...]


       bhist [-h | -V]


DESCRIPTION




       By default:



	      o Displays information about your own pending, running and sus-
		 pended jobs.  Groups information by job


	      o CPU time is not normalized


	      o	 Searches  the event log file currently used by the Lava sys-
		 tem:	 $LSB_SHAREDIR/cluster_name/logdir/lsb.events	 (see
		 lsb.events(5))


	      o	 Displays  events occurring in the past week, but this can be
		 changed by setting the environment variable  LSB_BHIST_HOURS
		 to an alternate number of hours





       If  neither -l nor -b is present, the default is to display the fields
       in OUTPUT only (see below).
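
Because LSB_BHIST_HOURS is expressed in hours, a small conversion helps when thinking in days. The sketch below sets the variable for the current shell session; the helper name days_to_hours is hypothetical.

```shell
# Hypothetical helper: convert a number of days into the hour count that
# LSB_BHIST_HOURS expects.
days_to_hours() {
  echo $(( $1 * 24 ))
}

# Look back two days instead of the default week.
LSB_BHIST_HOURS=$(days_to_hours 2)
export LSB_BHIST_HOURS
echo "LSB_BHIST_HOURS=$LSB_BHIST_HOURS"
```

Any bhist command started from the same shell afterwards inherits the exported value.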


OPTIONS




       -a


	      Displays information about both finished and unfinished jobs.



	      This option overrides -d, -p, -s, and -r.




       -b


	      Brief format. Displays the information in a  brief  format.  If
	      used with the -s option, shows the reason why each job was sus-
	      pended.




       -d


	      Only displays information about finished jobs.




       -l


	      Long format. Displays additional information. If used with  -s,
	      shows the reason why each job was suspended.



	      If  you submitted a job using the OR (||) expression to specify
	      alternate resources, this option displays the successful rusage
	      string that caused the job to run.



	      bhist  -l	 can display job exit codes. A job with exit code 131
	      means that the job exceeded a configured resource	 usage	limit
	      and Lava killed the job.




       -p


	      Only displays information about pending jobs.




       -r


	      Only displays information about running jobs.




       -s


	      Only displays information about suspended jobs.




       -t


	      Displays job events chronologically.




       -w


	      Wide format. Displays the information in a wide format.




       -C start_time,end_time



	      Only displays jobs that completed or exited during the specified
	      time interval. Specify the span of time for which you want to
	      display the history. If you do not specify a start time, the
	      start time is assumed to be the time of the first occurrence.
	      If you do not specify an end time, the end time is assumed to
	      be now.



	      Specify  the  times  in  the  format "yyyy/mm/dd/HH:MM". Do not
	      specify spaces in the time interval string.



	      The time interval can be specified in many ways. For more	 spe-
	      cific  syntax  and  examples of time formats, see TIME INTERVAL
	      FORMAT.




       -D start_time,end_time



	      Only displays jobs dispatched during the specified time
	      interval. Specify the span of time for which you want to
	      display the history. If you do not specify a start time, the
	      start time is assumed to be the time of the first occurrence.
	      If you do not specify an end time, the end time is assumed to
	      be now.



	      Specify the times in  the	 format	 "yyyy/mm/dd/HH:MM".  Do  not
	      specify spaces in the time interval string.



	      The  time interval can be specified in many ways. For more spe-
	      cific syntax and examples of time formats,  see  TIME  INTERVAL
	      FORMAT.




       -S start_time,end_time



	      Only displays information about jobs submitted during the
	      specified time interval. Specify the span of time for which you
	      want to display the history. If you do not specify a start
	      time, the start time is assumed to be the time of the first
	      occurrence. If you do not specify an end time, the end time is
	      assumed to be now.



	      Specify  the  times  in  the  format "yyyy/mm/dd/HH:MM". Do not
	      specify spaces in the time interval string.



	      The time interval can be specified in many ways. For more	 spe-
	      cific  syntax  and  examples of time formats, see TIME INTERVAL
	      FORMAT.




       -T start_time,end_time



	      Used together with -t.



	      Only displays information about job events within the specified
	      time interval. Specify the span of time for which you want to
	      display the history. If you do not specify a start time, the
	      start time is assumed to be the time of the first occurrence.
	      If you do not specify an end time, the end time is assumed to
	      be now.



	      Specify the times in  the	 format	 "yyyy/mm/dd/HH:MM".  Do  not
	      specify spaces in the time interval string.



	      The  time interval can be specified in many ways. For more spe-
	      cific syntax and examples of time formats,  see  TIME  INTERVAL
	      FORMAT.




       -f logfile_name



	      Searches the specified event log. Specify either an absolute or
	      a relative path.



	      Useful for analysis directly on the file.




       -J job_name



	      Only displays the jobs that have the specified job_name.




       -m host_name



	      Only displays jobs dispatched to the specified host.




       -n number_logfiles | -n 0



	      Searches the specified number of event logs, starting with  the
	      current  event log and working through the most recent consecu-
	      tively numbered logs. The maximum number of logs you can search
	      is  100.	Specify	 0  to	specify	 all  the  event log files in
	      $(LSB_SHAREDIR)/cluster_name/logdir (up to  a  maximum  of  100
	      files).



	      If  you delete a file, you break the consecutive numbering, and
	      older files will be inaccessible to bhist.



	      For example,  if	you  specify  3,  Lava	searches  lsb.events,
	      lsb.events.1, and lsb.events.2. If you specify 4, Lava searches
	      lsb.events, lsb.events.1, lsb.events.2, and lsb.events.3.	 How-
	      ever,  if	 lsb.events.2  is missing, both searches will include
	      only lsb.events and lsb.events.1.
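
The consecutive-numbering rule can be modelled with a short function that lists which files would be searched. This is a sketch of the rule as described above, not Lava's own code; the function name logs_searched is hypothetical and the log directory is passed in as a plain argument.

```shell
# Model of the rule described for bhist -n: start with lsb.events, then
# lsb.events.1, lsb.events.2, ... and stop at the first missing file.
#   $1 = log directory, $2 = number of logs requested (0 means up to 100)
logs_searched() {
  dir=$1; want=$2
  if [ "$want" -eq 0 ] || [ "$want" -gt 100 ]; then want=100; fi
  if [ ! -f "$dir/lsb.events" ]; then return 0; fi
  echo "lsb.events"
  i=1
  while [ "$i" -lt "$want" ] && [ -f "$dir/lsb.events.$i" ]; do
    echo "lsb.events.$i"
    i=$((i + 1))
  done
}
```

With lsb.events.2 missing, logs_searched returns the same two files whether 3 or 4 logs are requested, which matches the example above.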




       -N host_name | -N host_model | -N CPU_factor



	      Normalizes CPU time by the specified CPU factor, or by the  CPU
	      factor of the specified host or host model.



	      If  you  use bhist directly on an event log, you must specify a
	      CPU factor.



	      Use lsinfo to get host model and CPU factor information.




       -q queue_name



	      Only displays information about jobs submitted to the specified
	      queue.




       -u user_name | -u all



	      Displays	information  about  jobs  submitted  by the specified
	      user, or by all users if the keyword all is specified.




       job_ID


	      Searches all event log  files  and  only	displays  information
	      about the specified jobs.



	      This  option overrides all other options except -J, -N, -h, and
	      -V. When it is used with -J, only those jobs listed  here	 that
	      have the specified job name are displayed.




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




OUTPUT




   Default Format




       Statistics  of  the  amount  of	time  that a job has spent in various
       states:


	      PEND



		     The total waiting time  excluding	user  suspended	 time
		     before the job is dispatched.




	      PSUSP



		     The total user suspended time of a pending job.




	      RUN



		     The total run time of the job.




	      USUSP



		     The  total	 user  suspended  time	after the job is dis-
		     patched.




	      SSUSP



		     The total system suspended time after the	job  is	 dis-
		     patched.




	      UNKWN



		     The  total	 unknown  time of the job (job status becomes
		     unknown if sbatchd on the execution host is  temporarily
		     unreachable).




	      TOTAL



		     The total time that the job has spent in all states; for
		     a finished job, it is the turnaround time (that is,  the
		     time interval from job submission to job completion).




   Long Format (-l)





	      The -l option displays a long format listing with the following
	      additional fields:



	      Command



		     The job command.





	      Detailed history includes the date and time the  job  was	 for-
	      warded  and  the	name of the cluster to which the job was for-
	      warded.


FILES




       Reads lsb.events.


SEE ALSO




       lsb.events(5), bsub(1), bjobs(1), lsinfo(1)


TIME INTERVAL FORMAT




       You use the time interval to define a start and end time for  collect-
       ing the data to be retrieved and displayed. While you can specify both
       a start and an end time, you can also let one of the  values  default.
       You can specify either of the times as an absolute time, by specifying
       the date or time, or you can specify  them  relative  to	 the  current
       time.


       Specify the time interval as follows:


       start_time,end_time|start_time,|,end_time|start_time

       Specify start_time or end_time in the following format:


       [year/][month/][day][/hour:minute|/hour:]|.|.-relative_int

       Where:



	      o year is a four-digit number representing the calendar year.


	      o	 month is a number from 1 to 12, where 1 is January and 12 is
		 December.


	      o day is a number from 1 to 31, representing  the	 day  of  the
		 month.


	      o hour is an integer from 0 to 23, representing the hour of the
		 day on a 24-hour clock.


	      o minute is an integer from 0 to 59, representing the minute of
		 the hour.


	      o . (period) represents the current month/day/hour:minute.


	      o	 .-relative_int is a number, from 1 to 31, specifying a rela-
		 tive start or end time prior to now.



	      start_time,end_time



		     Specifies both the start and end times of the  interval.




	      start_time,



		     Specifies a start time, and lets the end time default to
		     now.




	      ,end_time



		     Specifies to start with the first logged occurrence, and
		     end at the time specified.




	      start_time



		     Starts at the beginning of the most specific time period
		     specified, and ends at the maximum	 value	of  the	 time
		     period specified. For example, 2/ specifies the month of
		     February--start February 1 at 00:00 a.m. and end at  the
		     last  possible  minute  in	 February:  February  28th at
		     midnight.






   ABSOLUTE TIME EXAMPLES





	      Assume the current time is May 9 17:06 2004:



	      1,8 = May 1 00:00 2004 to May 8 23:59 2004



	      ,4 = the time of the first occurrence to May 4 23:59 2004



	      6 = May 6 00:00 2004 to May 6 23:59 2004



	      2/ = Feb 1 00:00 2004 to Feb 28 23:59 2004



	      /12: = May 9 12:00 2004 to May 9 12:59 2004



	      2/1 = Feb 1 00:00 2004 to Feb 1 23:59 2004



	      2/1, = Feb 1 00:00 to the current time



	      ,. = the time of the first occurrence to the current time



	      ,2/10: = the time of the first occurrence to May 2 10:59 2004



	      2001/12/31,2004/5/1 = from Dec 31, 2001  00:00:00	 to  May  1st
	      2004 23:59:59
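
When a script builds these interval strings, a small helper can enforce the no-spaces rule and the comma placement. The sketch below is one way to do it; the function name bhist_interval is hypothetical, and it does not validate the individual date fields.

```shell
# Hypothetical helper: join a start and an end time into the
# start_time,end_time form. Either side may be empty to take the default.
# Rejects spaces, which the time interval string must not contain.
bhist_interval() {
  case "$1$2" in
    *" "*) echo "time interval must not contain spaces" >&2; return 1 ;;
  esac
  printf '%s,%s\n' "$1" "$2"
}

bhist_interval 2001/12/31 2004/5/1   # both ends given
bhist_interval 2/1 ""                # start given, end defaults to now
bhist_interval "" 4                  # start defaults to the first occurrence
```

The result can be passed to an option that takes a time interval, for example bhist -C "$(bhist_interval 2/1 "")".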




   RELATIVE TIME EXAMPLES








	      ,.-2/ = the time of the first occurrence to Mar 7 17:06 2004







		      November 2004   Platform Lava 6.1		     bhist(1)
bjobs(1)							     bjobs(1)



NAME
       bjobs  - displays information about Lava jobs


SYNOPSIS




       bjobs  [-a] [-w | -l] [-J job_name] [-m host_name ] [-N host_name | -N
       host_model | -N CPU_factor] [-q queue_name] [-u user_name  |  -u	 all]
       job_ID ...


       bjobs [-d] [-p] [-r] [-s] [-A] [-w | -l] [-J job_name] [-m host_name ]
       [-N host_name | -N host_model | -N  CPU_factor]	[-q  queue_name]  [-u
       user_name | -u all] job_ID ...


       bjobs [-h | -V]


DESCRIPTION




       By  default,  displays information about your own pending, running and
       suspended jobs.


       To display older historical information, use bhist.


OPTIONS




       -a


	      Displays information about jobs in all states, including jobs
	      that finished recently, within an interval specified by
	      CLEAN_PERIOD in lsb.params (the default period is 1 hour).




       -d


	      Displays information about jobs that finished recently,  within
	      an  interval  specified  by  CLEAN_PERIOD	 in  lsb.params	 (the
	      default period is 1 hour).




       -l


	      Long format. Displays detailed information for each  job	in  a
	      multiline format.



	      The  -l  option  displays the following additional information:
	      job command, current working directory on the submission	host,
	      pending  and  suspending	reasons,  job status, resource usage,
	      resource usage limits information.




       -p


	      Displays pending jobs, together with the pending	reasons	 that
	      caused  each  job not to be dispatched during the last dispatch
	      turn. The pending reason shows the number	 of  hosts  for	 that
	      reason, or names the hosts if -l is also specified.



	      Each pending reason is associated with one or more hosts and it
	      states the cause why these hosts are not allocated to  run  the
	      job. In situations where the job requests specific hosts (using
	      bsub -m), users may see reasons for unrelated hosts also	being
	      displayed,  together  with  the  reasons	associated  with  the
	      requested hosts.



	      The life cycle of a pending reason ends after  the  time	indi-
	      cated by PEND_UPDATE_INTERVAL in lsb.params.




       -r


	      Displays running jobs.




       -s


	      Displays	suspended  jobs,  together with the suspending reason
	      that caused each job to become suspended.



	      The suspending reason may not remain the	same  while  the  job
	      stays suspended. For example, a job may have been suspended due
	      to the paging rate, but after the paging rate  dropped  another
	      load  index  could prevent the job from being resumed. The sus-
	      pending reason will be updated according to the load index. The
	      reasons  could  be  as  old  as  the time interval specified by
	      SBD_SLEEP_TIME in lsb.params. So	the  reasons  shown  may  not
	      reflect the current load situation.




       -w


	      Wide   format.  Displays	job  information  without  truncating
	      fields.




       -J job_name



	      Displays information about jobs that have the specified job name.




       -m host_name ...



	      Only displays jobs dispatched to the specified  hosts.  To  see
	      the available hosts, use bhosts.




       -N host_name | -N host_model | -N CPU_factor



	      Displays	the  normalized CPU time consumed by the job. Normal-
	      izes using the CPU factor specified, or the CPU factor  of  the
	      host or host model specified.




       -q queue_name



	      Only displays jobs in the specified queue.



	      The  command bqueues returns a list of queues configured in the
	      system, and  information	about  the  configurations  of	these
	      queues.




       -u user_name...| -u all



	      Only  displays  jobs  that have been submitted by the specified
	      users. The keyword all specifies all users.




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




OUTPUT




       Pending jobs are displayed in the order in which they will be  consid-
       ered  for  dispatch.   Jobs  in	higher	priority queues are displayed
       before those in lower priority queues.  Pending jobs in the same	 pri-
       ority  queues  are displayed in the order in which they were submitted
       but this order can be changed by using the commands btop or  bbot.  If
       more  than  one job is dispatched to a host, the jobs on that host are
       listed in the order in which they will be considered for scheduling on
       this  host by their queue priorities and dispatch times. Finished jobs
       are displayed in the order in which they were completed.


   Default Display




       A listing of jobs is displayed with the following fields:


JOBID



       The job ID that Lava assigned to the job.




USER



       The user who submitted the job.




STAT



       The current status of the job (see JOB STATUS below).




QUEUE



       The name of the job queue to which the job belongs. If  the  queue  to
       which  the  job	belongs	 has been removed from the configuration, the
       queue name will be displayed as lost_and_found. Use bhist to  get  the
       original	 queue	name. Jobs in the lost_and_found queue remain pending
       until they are switched with the bswitch command into another queue.




FROM_HOST



       The name of the host from which the job was submitted.




EXEC_HOST



       The name of one or more hosts on which  the  job	 is  executing	(this
       field  is  empty	 if  the job has not been dispatched). If the host on
       which the job is running has been removed from the configuration,  the
       host  name  will	 be displayed as lost_and_found. Use bhist to get the
       original host name.




JOB_NAME



       The job name assigned by the user, or the command string	 assigned  by
       default	(see  bsub  (1)).  If the job name is too long to fit in this
       field, then only the latter part of the job name is displayed.




SUBMIT_TIME



       The submission time of the job.
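
Since STAT is the third column of the default display, the listing can be summarised with standard text tools. The sketch below tallies jobs per status; the function name tally_stat is hypothetical, and it assumes the first line of input is the header row and that columns are separated by whitespace.

```shell
# Hypothetical helper: read bjobs default-format output on stdin and print
# "STAT count" for each status, sorted by status name.
# Assumes STAT is the third whitespace-separated column and that the first
# line is the header row.
tally_stat() {
  awk 'NR > 1 && NF >= 3 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use on the cluster:  bjobs -u all | tally_stat
```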




   -l output




       The -l option displays a long format listing with the following	addi-
       tional fields:


Command



       The job command.




CWD



       The current working directory on the submission host.




PENDING REASONS



       The  reason  the	 job  is in the PEND or PSUSP state. The names of the
       hosts associated with each reason will be displayed when both  -p  and
       -l options are specified.




SUSPENDING REASONS



       The reason the job is in the USUSP or SSUSP state.



       loadSched


	      The load scheduling thresholds for the job.



       loadStop


	      The load suspending thresholds for the job.




JOB STATUS



       Possible values for the status of a job include:



       PEND


	      The job is pending, that is, it has not yet been started.



       PSUSP


	      The  job	has  been  suspended, either by its owner or the Lava
	      administrator, while pending.



       RUN


	      The job is currently running.



       USUSP


	      The job has been suspended, either by its	 owner	or  the	 Lava
	      administrator, while running.



       SSUSP


	      The job has been suspended by Lava due to either of the
	      following two causes:




       o The load conditions on the execution host or hosts have  exceeded  a
	  threshold  according to the loadStop vector defined for the host or
	  queue.


       o The run window	 of  the  job's	 queue	is  closed.  See  bqueues(1),
	  bhosts(1), and lsb.queues(5).



	  DONE


		 The job has terminated with status of 0.



	  EXIT


		 The  job has terminated with a non-zero status - it may have
		 been aborted due to an error in its execution, or killed  by
		 its owner or the Lava administrator.



		 For  example,	exit  code  131 means that the job exceeded a
		 configured resource usage limit and Lava killed the job.



	  UNKWN


		 mbatchd has lost contact with the sbatchd  on	the  host  on
		 which the job runs.



	  ZOMBI


		 A job will become ZOMBI if:



		 -  A non-rerunnable job is killed by bkill while the sbatchd
		 on the execution host is unreachable and the job is shown as
		 UNKWN.



		 -  The host on which a rerunnable job is running is unavail-
		 able and the job has been requeued by Lava with  a  new  job
		 ID, as if the job were submitted as a new job.



		 After the execution host becomes available, Lava will try to
		 kill the ZOMBI job.   Upon  successful	 termination  of  the
		 ZOMBI job, the job's status will be changed to EXIT.



	  RESOURCE LIMITS



		 The  hard resource usage limits that are imposed on the jobs
		 in the queue (see  getrlimit(2)  and  lsb.queues(5)).	These
		 limits are imposed on a per-job and a per-process basis.



		 The possible per-job resource usage limits are:



		 CPULIMIT



		 MEMLIMIT



		 SWAPLIMIT



		 PROCESSLIMIT



		 The possible UNIX per-process resource usage limits are:



		 RUNLIMIT



		 FILELIMIT



		 DATALIMIT



		 STACKLIMIT



		 CORELIMIT



		 If  a	job  submitted	to  the queue has any of these limits
		 specified (see bsub(1)), then the lower of the corresponding
		 job limits and queue limits are used for the job.



		 If  no	 resource limit is specified, the resource is assumed
		 to be unlimited.
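
The way a job limit and a queue limit combine can be written as a small function. This is a sketch of the rule as stated above, with a hypothetical function name; an empty argument stands for a limit that was not specified.

```shell
# Model of the rule above: the lower of the job limit and the queue limit
# applies; an empty value stands for "no limit specified".
#   $1 = job limit, $2 = queue limit (non-negative integers or empty)
effective_limit() {
  job=$1; queue=$2
  if [ -z "$job" ] && [ -z "$queue" ]; then
    echo "unlimited"
  elif [ -z "$job" ]; then
    echo "$queue"
  elif [ -z "$queue" ]; then
    echo "$job"
  elif [ "$job" -lt "$queue" ]; then
    echo "$job"
  else
    echo "$queue"
  fi
}
```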







EXAMPLES




       % bjobs -pl



	      Displays detailed information about all  pending	jobs  of  the
	      invoker.




       % bjobs -ps



	      Displays only pending and suspended jobs.




       % bjobs -u all -a



	      Displays all jobs of all users.




       % bjobs -d -q short -m hostA -u user1



	      Displays	all  the recently finished jobs submitted by user1 to
	      the queue short, and executed on the host hostA.




       % bjobs 101 102 203 509



	      Displays jobs with job_ID 101, 102, 203, and 509.




SEE ALSO




       bsub(1),	 bkill(1),  bhosts(1),	bqueues(1),   bhist(1),	  bresume(1),
       bstop(1), lsb.params(5)




		      November 2004   Platform Lava 6.1		     bjobs(1)
bkill(1)							     bkill(1)



NAME
       bkill  - sends signals to kill, suspend, or resume unfinished jobs


SYNOPSIS




       bkill  [-l]  [-J	 job_name]  [-m	 host_name ] [-q queue_name] [-r | -s
       (signal_value | signal_name)] [-u user_name| -u all] [job_ID ...	 |  0
       ...]


       bkill [-h | -V]


DESCRIPTION




       By  default,  sends  a  set  of signals to kill the specified jobs. On
       UNIX, SIGINT and SIGTERM are sent to give the job a chance to clean up
       before  termination,  then  SIGKILL  is sent to kill the job. The time
       interval between sending each signal is defined by the
       JOB_TERMINATE_INTERVAL parameter in lsb.params(5).


       You  must  specify a job ID or -g, -J, -m, -u, or -q. Specify job ID 0
       (zero) to kill multiple jobs.


       Exit code 130 is returned when a dispatched job is killed with  bkill.
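
Exit code 130 is consistent with the common shell convention of reporting 128 plus the signal number, since SIGINT is signal 2. The sketch below decodes a status on that basis. Whether Lava follows this convention for every exit code is an assumption that this man page does not confirm, and the function name exit_to_signal is hypothetical.

```shell
# Decode an exit status using the usual shell convention that values above
# 128 mean "terminated by signal (status - 128)". Whether Lava always
# follows this convention is an assumption.
exit_to_signal() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo "none"
  fi
}

exit_to_signal 130
```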


       Users  can only operate on their own jobs. Only root and Lava adminis-
       trators can operate on jobs submitted by other users.


       If a signal request fails to reach the job execution host, Lava	tries
       the  operation later when the host becomes reachable. Lava retries the
       most recent signal request.


       If the job cannot be killed, use bkill -r to remove the job  from  the
       Lava  system  without  waiting  for the job to terminate, and free the
       resources of the job.


OPTIONS




       0


	      Kill all the jobs that satisfy other options (-g,	 -m,  -q,  -u
	      and -J).




       -l


	      Displays	the signal names supported by bkill. This is a subset
	      of signals supported by /bin/kill and is platform-dependent.




       -J job_name



	      Operates only on jobs  with  the	specified  job_name.  The  -J
	      option  is ignored if a job ID other than 0 is specified in the
	      job_ID option.




       -m host_name



	      Operates only on jobs dispatched to the specified host.



	      If job_ID is not specified, only the  most  recently  submitted
	      qualifying  job  is  operated on. The -m option is ignored if a
	      job ID other than 0 is specified in  the	job_ID	option.	  See
	      bhosts(1) for more information about hosts.




       -q queue_name



	      Operates only on jobs in the specified queue.



	      If  job_ID  is  not specified, only the most recently submitted
	      qualifying job is operated on.



	      The -q option is ignored if a job ID other than 0 is  specified
	      in the job_ID option.



	      See bqueues(1) for more information about queues.




       -r


	      Removes  a job from the Lava system without waiting for the job
	      to terminate in the operating system.



	      Sends the same series of signals as bkill	 without  -r,  except
	      that the job is removed from the system immediately, the job is
	      marked as EXIT, and the job resources that  Lava	monitors  are
	      released as soon as Lava receives the first signal.



	      Also operates on jobs for which a bkill command has been issued
	      but which cannot be reached to be acted on by sbatchd (jobs  in
	      ZOMBI  state).  If  sbatchd  recovers  before the jobs are com-
	      pletely removed, Lava ignores the zombi jobs killed with	bkill
	      -r.



	      Use  bkill -r only on jobs that cannot be killed in the operat-
	      ing system, or on jobs that cannot be otherwise  removed	using
	      bkill.



	      The -r option cannot be used with the -s option.




       -s (signal_value | signal_name)



	      Sends  the  specified signal to specified jobs. You can specify
	      either a name, stripped of the SIG prefix (such as KILL), or  a
	      number (such as 9).



	      Eligible signal names are listed by bkill -l.



	      The -s option cannot be used with the -r option.



	      Use bkill -s to suspend and resume jobs by using the appropriate
	      signal instead of using bstop or bresume. Sending the SIGCONT
	      signal is the same as using bresume.



	      Sending the SIGSTOP signal to sequential jobs or the SIGTSTP to
	      parallel jobs is the same as using bstop.



	      You cannot suspend a job that is already suspended, or resume a
	      job  that	 is  not suspended. Using SIGSTOP or SIGTSTP on a job
	      that is in the USUSP state has no effect and using SIGCONT on a
	      job  that	 is not in either the PSUSP or the USUSP state has no
	      effect. See bjobs(1) for more information about job states.




       -u user_name | -u all



	      Operates only on jobs submitted by the specified	user,  or  by
	      all users if the reserved user name all is specified.



	      If  job_ID  is  not specified, only the most recently submitted
	      qualifying job is operated on.  The -u option is ignored	if  a
	      job ID other than 0 is specified in the job_ID option.




       job_ID ... | 0



	      Operates only on jobs that are specified by job_ID.



	      Jobs  submitted by any user can be specified here without using
	      the -u option. If you use the reserved job ID 0, all  the	 jobs
	      that  satisfy  other  options  (that is, -m, -q, -u and -J) are
	      operated on; all other job IDs are ignored.



	      The options -u, -q, -m and -J have no effect if a job ID	other
	      than  0  is  specified.  Job IDs are returned at job submission
	      time (see bsub(1)) and may be obtained with the  bjobs  command
	      (see bjobs(1)).




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




EXAMPLES





       % bkill -s 17 -q night




	      Sends  signal  17	 to  the  last	job that was submitted by the
	      invoker to queue night.




       % bkill -q short -u all 0




	      Kills all the jobs that are in the queue short.




       % bkill -r 1045




	      Forces the removal of unkillable job 1045.
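
The selection rules described under OPTIONS (filter options, the reserved job ID 0, and the default of acting on the most recent qualifying job) can be summarised in a short Python model. This is only a sketch of the documented behaviour, not Lava's implementation: the Job record and the select_jobs function are hypothetical names, job IDs are assumed to increase with submission order, and permission checks (only root and Lava administrators may act on other users' jobs) are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Job:
    """Hypothetical job record; the field names are illustrative only."""
    job_id: int
    user: str
    queue: str
    host: str
    name: str


def select_jobs(jobs: Sequence[Job], invoker: str, job_ids: Sequence[int] = (),
                user: Optional[str] = None, queue: Optional[str] = None,
                host: Optional[str] = None, name: Optional[str] = None) -> List[Job]:
    """Model which jobs a bkill-style command would operate on."""
    explicit = [i for i in job_ids if i != 0]
    if explicit and 0 not in job_ids:
        # A job ID other than 0 makes -u, -q, -m and -J irrelevant.
        return [j for j in jobs if j.job_id in explicit]

    owner = user if user is not None else invoker  # default: the invoker's jobs
    matches = [
        j for j in jobs
        if (owner == "all" or j.user == owner)
        and (queue is None or j.queue == queue)
        and (host is None or j.host == host)
        and (name is None or j.name == name)
    ]
    if 0 in job_ids:
        # Reserved job ID 0: every qualifying job; other job IDs are ignored.
        return matches
    # No job ID: only the most recently submitted qualifying job
    # (assumes a larger job ID means a later submission).
    return [max(matches, key=lambda j: j.job_id)] if matches else []


jobs = [
    Job(101, "alice", "night", "hostA", "sim"),
    Job(102, "bob", "short", "hostB", "fit"),
    Job(103, "alice", "night", "hostB", "sim"),
    Job(104, "carol", "short", "hostA", "fit"),
]
# Like "bkill -q night": the invoker's last job in queue night.
last_night = select_jobs(jobs, invoker="alice", queue="night")
# Like "bkill -q short -u all 0": every job in queue short.
all_short = select_jobs(jobs, invoker="alice", user="all", queue="short", job_ids=[0])
# An explicit job ID wins over the -q filter.
by_id = select_jobs(jobs, invoker="alice", queue="night", job_ids=[102])
```

The three calls at the end mirror the first two examples above and the rule that filter options are ignored once a non-zero job ID is given.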





SEE ALSO




       bsub(1), bjobs(1), bqueues(1), bhosts(1), bresume(1), bstop(1),
       kill(1)






		      November 2004   Platform Lava 6.1		     bkill(1)
bqueues(1)							   bqueues(1)



NAME
       bqueues	- displays information about queues


SYNOPSIS




       bqueues [-w | -l | -r] [-m host_name | -m all] [-u user_name | -u all]
       [queue_name ...]


       bqueues [-h | -V]


DESCRIPTION




       Displays information about queues.


       By default, returns the following information about all queues:	queue
       name, queue priority, queue status, job slot statistics, and job state
       statistics.


       Batch queue names and characteristics are set up by the Lava  adminis-
       trator (see lsb.queues(5) and mbatchd(8)).


       CPU time is normalized.


OPTIONS




       -w


	      Displays	queue  information  in a wide format. Fields are dis-
	      played without truncation.




       -l


	      Displays queue information in a long multiline format.  The  -l
	      option  displays	the  following	additional information: queue
	      description, queue characteristics and  statistics,  scheduling
	      parameters,  resource usage limits, scheduling policies, users,
	      hosts, associated commands, dispatch and run windows,  and  job
	      controls.




       -m host_name | -m all



	      Displays the queues that can run jobs on the specified host. If
	      the keyword all is specified, displays the queues that can  run
	      jobs on all hosts.




       -u user_name | -u all



	      Displays	the  queues  that  can accept jobs from the specified
	      user. If the keyword all is specified, displays the queues that
	      can accept jobs from all users.




       queue_name ...



	      Displays information about the specified queues.




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




OUTPUT




   Default Output




       Displays the following fields:


QUEUE_NAME



       The  name  of the queue. Queues are named to correspond to the type of
       jobs usually submitted to them,	or  to	the  type  of  services	 they
       provide.



       lost_and_found


	      If  the Lava administrator removes queues from the system, Lava
	      creates a queue called lost_and_found and places the jobs	 from
	      the  removed  queues into the lost_and_found queue. Jobs in the
	      lost_and_found queue  will  not  be  started  unless  they  are
	      switched to other queues (see bswitch).




PRIO



       The priority of the queue. The larger the value, the higher the
       priority. If job priority is not configured, the queue priority
       determines the queue search order at job dispatch, suspension and
       resumption time. Jobs from higher priority queues are dispatched
       first (this is contrary to UNIX process priority ordering), and jobs
       from lower priority queues are suspended first when hosts are
       overloaded.




STATUS



       The current status of the queue. The possible values are:



       Open


	      The queue is able to accept jobs.



       Closed


	      The queue is not able to accept jobs.



       Active


	      Jobs in the queue may be started.



       Inactive


	      Jobs in the queue cannot be started for the time being.



       At any moment, each queue is either Open	 or  Closed,  and  is  either
       Active  or  Inactive. The queue can be opened, closed, inactivated and
       re-activated by the Lava administrator using badmin (see badmin(8)).



       Jobs submitted to a queue that is later closed are still dispatched as
       long  as	 the queue is active. The queue can also become inactive when
       either its dispatch window is closed or its run window is closed	 (see
       DISPATCH_WINDOWS	 in  the "Output for the -l Option" section). In this
       case, the queue cannot be activated using badmin.  The  queue  is  re-
       activated  by Lava when one of its dispatch windows and one of its run
       windows are open again. The initial state of a queue at Lava boot time
       is  set	to  open, and either active or inactive depending on its win-
       dows.
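
The two status pairs are independent, which is easy to miss. The following Python sketch models the rules stated above; it is an illustration under stated assumptions, not Lava code. The function name and its four boolean inputs are hypothetical.

```python
from typing import Tuple


def queue_status(opened_by_admin: bool, activated_by_admin: bool,
                 dispatch_window_open: bool, run_window_open: bool) -> Tuple[bool, bool]:
    """Return (accepts_jobs, starts_jobs) for a queue.

    accepts_jobs corresponds to Open/Closed, starts_jobs to Active/Inactive.
    """
    accepts_jobs = opened_by_admin
    # A queue is Active only if the administrator has not inactivated it
    # and both a dispatch window and a run window are currently open.
    starts_jobs = activated_by_admin and dispatch_window_open and run_window_open
    return accepts_jobs, starts_jobs


# Closed but Active: no new submissions, yet queued jobs are still dispatched.
closed_active = queue_status(False, True, True, True)
# Open but outside its run window: submissions accepted, nothing started.
open_inactive = queue_status(True, True, True, False)
```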




MAX



       The maximum number of job slots that can be used by the jobs from  the
       queue.  These job slots are used by dispatched jobs which have not yet
       finished, and by pending jobs which have slots reserved for them.



       A sequential job uses one job slot when it is dispatched to a host,
       while a parallel job uses as many job slots as were requested with
       bsub -n when it is dispatched. See bsub(1) for details. If '-' is
       displayed, there is no limit.




NJOBS



       The  total  number  of  job slots held currently by jobs in the queue.
       This includes pending, running, suspended and reserved  job  slots.  A
       parallel	 job  that  is	running	 on  n processors is counted as n job
       slots, since it takes n job slots in the queue. See  bjobs(1)  for  an
       explanation of batch job states.




PEND



       The number of job slots used by pending jobs in the queue.




RUN



       The number of job slots used by running jobs in the queue.




SUSP



       The number of job slots used by suspended jobs in the queue.
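
As a worked illustration of how the slot counters relate, the Python sketch below totals slots per state for a small, made-up job list. It assumes each job is already classified as PEND, RUN or SUSP and that reserved slots are supplied separately; the function name is hypothetical and this is not how Lava computes the figures internally.

```python
from typing import Dict, Iterable, Tuple


def slot_counts(jobs: Iterable[Tuple[str, int]], reserved: int = 0) -> Dict[str, int]:
    """Sum job slots per state; each job is a (state, slots) pair.

    A sequential job contributes 1 slot, a parallel job on n processors n slots.
    """
    totals = {"PEND": 0, "RUN": 0, "SUSP": 0}
    for state, slots in jobs:
        totals[state] += slots
    # NJOBS covers pending, running, suspended and reserved slots.
    totals["NJOBS"] = totals["PEND"] + totals["RUN"] + totals["SUSP"] + reserved
    return totals


# One sequential and one 4-way parallel job running, a 2-way job pending,
# and one sequential job suspended.
counts = slot_counts([("RUN", 1), ("RUN", 4), ("PEND", 2), ("SUSP", 1)])
```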




   Output for -l Option




       In addition to the above fields, the -l option displays the following:


Description



       A description of the typical use of the queue.




Default queue indication



       Indicates that this is the default queue.




PARAMETERS/STATISTICS


       NICE


	      The nice value at which jobs in the queue will be run. This  is
	      the  UNIX	 nice  value  for  reducing the process priority (see
	      nice(1)).



       STATUS



	      Inactive


		     The long format for the -l	 option	 gives	the  possible
		     reasons for a queue to be inactive:




	      Inact_Win


		     The  queue is out of its dispatch window or its run win-
		     dow.




	      Inact_Adm


		     The queue has been inactivated by the  Lava  administra-
		     tor.



	      SSUSP


		     The  number  of job slots in the queue allocated to jobs
		     that are suspended by Lava because of load levels or run
		     windows.



	      USUSP


		     The  number  of job slots in the queue allocated to jobs
		     that are suspended by the job submitter or by  the	 Lava
		     administrator.



	      RSV


		     The  number  of job slots in the queue that are reserved
		     by Lava for pending jobs.



       Migration threshold



	      The length of time in seconds that a job	dispatched  from  the
	      queue  will remain suspended by the system before Lava attempts
	      to migrate the job to another host.  See the MIG	parameter  in
	      lsb.queues and lsb.hosts.




       Interval for a host to accept two jobs



	      The  length  of time in seconds to wait after dispatching a job
	      to a host before dispatching a second job to the same host.  If
	      the  job	accept	interval is zero, a host may accept more than
	      one job in each dispatching interval. See the
	      JOB_ACCEPT_INTERVAL parameter in lsb.queues and lsb.params.




       RESOURCE LIMITS



	      The  hard resource usage limits that are imposed on the jobs in
	      the queue (see getrlimit(2) and  lsb.queues(5)).	These  limits
	      are imposed on a per-job and a per-process basis.



	      The possible per-job limits are:



	      CPULIMIT


		     The maximum CPU time a job can use, in minutes, relative
		     to the CPU factor of the named host. CPULIMIT is  scaled
		     by the CPU factor of the execution host so that jobs are
		     allowed more time on slower hosts.



		     When the job-level CPULIMIT is reached, a SIGXCPU signal
		     is	 sent  to  all processes belonging to the job. If the
		     job has no signal handler for SIGXCPU, the job is killed
		     immediately.  If the SIGXCPU signal is handled, blocked,
		     or ignored by the	application,  then  after  the	grace
		     period  expires, Lava sends SIGINT, SIGTERM, and SIGKILL
		     to the job to kill it.



	      MEMLIMIT


		     The maximum resident set size (RSS) of a process, in KB.
		     If	 a  process uses more than MEMLIMIT kilobytes of mem-
		     ory, its priority is reduced so that other processes are
		     more  likely  to  be  paged in to available memory. This
		     limit is enforced by the setrlimit	 system	 call  if  it
		     supports the RLIMIT_RSS option.



	      SWAPLIMIT


		     The swap space limit that a job may use. If SWAPLIMIT is
		     reached, the  system  sends  the  following  signals  in
		     sequence  to  all processes in the job: SIGINT, SIGTERM,
		     and SIGKILL.



		     The possible UNIX per-process resource limits are:



	      RUNLIMIT


		     The maximum wall clock time a process can use,  in	 min-
		     utes. RUNLIMIT is scaled by the CPU factor of the execu-
		     tion host. When a job has been in the RUN	state  for  a
		     total  of	RUNLIMIT minutes, Lava sends a SIGUSR2 signal
		     to the job. If the job does not exit within 10  minutes,
		     Lava sends a SIGKILL signal to kill the job.



	      FILELIMIT


		     The  maximum  file	 size  a process can create, in kilo-
		     bytes. This limit is enforced by the UNIX setrlimit sys-
		     tem  call if it supports the RLIMIT_FSIZE option, or the
		     ulimit  system  call  if  it  supports  the  UL_SETFSIZE
		     option.



	      DATALIMIT


		     The  maximum  size	 of the data segment of a process, in
		     kilobytes. This restricts the amount of memory a process
		     can  allocate.  DATALIMIT	is  enforced by the setrlimit
		     system call if it supports the RLIMIT_DATA	 option,  and
		     unsupported otherwise.



	      STACKLIMIT


		     The  maximum  size of the stack segment of a process, in
		     kilobytes. This restricts the amount of memory a process
		     can use for local variables or recursive function calls.
		     STACKLIMIT is enforced by the setrlimit system  call  if
		     it supports the RLIMIT_STACK option.



	      CORELIMIT


		     The  maximum  size	 of a core file, in KB. This limit is
		     enforced by the setrlimit system call if it supports the
		     RLIMIT_CORE option.



	      If a job submitted to the queue has any of these limits
	      specified (see bsub(1)), then the lower of the corresponding job
	      limit and queue limit is used for the job.



	      If  no  resource limit is specified, the resource is assumed to
	      be unlimited.
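
The rule for combining a job-level limit with a queue-level limit can be written down in a few lines of Python. This is a minimal sketch of the rule as stated above, with None standing in for "no limit specified"; the function name is hypothetical.

```python
from typing import Optional


def effective_limit(job_limit: Optional[int], queue_limit: Optional[int]) -> Optional[int]:
    """Return the limit applied to a job; None means unlimited."""
    specified = [v for v in (job_limit, queue_limit) if v is not None]
    # The lower of the two limits applies; with neither set, the
    # resource is treated as unlimited.
    return min(specified) if specified else None


job_tighter = effective_limit(30, 60)
queue_tighter = effective_limit(120, 60)
queue_only = effective_limit(None, 60)
unlimited = effective_limit(None, None)
```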




       SCHEDULING PARAMETERS



	      The scheduling and suspending thresholds for the queue.



	      The scheduling threshold loadSched and the suspending threshold
	      loadStop	are  used  to control batch job dispatch, suspension,
	      and resumption. The queue thresholds are	used  in  combination
	      with  the	 thresholds  defined  for  hosts  (see	bhosts(1) and
	      lsb.hosts(5)). If both queue level and  host  level  thresholds
	      are configured, the most restrictive thresholds are applied.



	      The  loadSched  and  loadStop  thresholds	 have  the  following
	      fields:



	      r15s


		     The 15-second exponentially averaged effective  CPU  run
		     queue length.



	      r1m


		     The  1-minute  exponentially  averaged effective CPU run
		     queue length.



	      r15m


		     The 15-minute exponentially averaged effective  CPU  run
		     queue length.



	      ut


		     The CPU utilization exponentially averaged over the last
		     minute, expressed as a fraction between 0 and 1.



	      pg


		     The memory paging rate exponentially averaged  over  the
		     last minute, in pages per second.



	      io


		     The  disk	I/O rate exponentially averaged over the last
		     minute, in kilobytes per second.



	      ls


		     The number of current login users.



	      it


		     On UNIX, the idle time of the host (keyboard not touched
		     on all logged in sessions), in minutes.



	      tmp


		     The amount of free space in /tmp, in megabytes.



	      swp


		     The   amount  of  currently  available  swap  space,  in
		     megabytes.



	      mem


		     The amount of currently available memory, in  megabytes.



	      In  addition  to	these  internal indices, external indices are
	      also  displayed  if  they	 are  defined  in   lsb.queues	 (see
	      lsb.queues(5)).



	      The  loadSched  threshold	 values	 specify  the job dispatching
	      thresholds for the corresponding load indices. If '-'  is	 dis-
	      played  as the value, it means the threshold is not applicable.
	      Jobs in the queue may be dispatched to a host if the values  of
	      all  the	load  indices of the host are within (below or above,
	      depending on the meaning of the load index)  the	corresponding
	      thresholds  of  the queue and the host. The same conditions are
	      used to resume jobs dispatched from the queue  that  have	 been
	      suspended on this host.



	      Similarly, the loadStop threshold values specify the thresholds
	      for job suspension.  If any of the load index values on a	 host
	      go beyond the corresponding threshold of the queue, jobs in the
	      queue will be suspended.
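
The threshold checks can be sketched in Python as follows. This is an illustration only, under assumptions that the text above does not spell out: that r15s, r1m, r15m, ut, pg, io and ls indicate a busier host when they rise, that it, tmp, swp and mem indicate a busier host when they fall, and that a value exactly equal to its threshold counts as within it. The function names are hypothetical, and None stands for a '-' (not applicable) threshold.

```python
from typing import Dict, Optional

# Assumption: for these indices a larger value means a more heavily loaded host.
# For the remaining indices (it, tmp, swp, mem) a smaller value does.
HIGHER_IS_BUSIER = {"r15s", "r1m", "r15m", "ut", "pg", "io", "ls"}


def beyond(index: str, value: float, threshold: Optional[float]) -> bool:
    """True if the load value has crossed the threshold for this index."""
    if threshold is None:  # '-' in the bqueues output: not applicable
        return False
    if index in HIGHER_IS_BUSIER:
        return value > threshold
    return value < threshold


def may_dispatch(load: Dict[str, float], load_sched: Dict[str, Optional[float]]) -> bool:
    """Dispatch (or resume) needs every index to be within loadSched."""
    return not any(beyond(i, load[i], t) for i, t in load_sched.items())


def must_suspend(load: Dict[str, float], load_stop: Dict[str, Optional[float]]) -> bool:
    """Suspension is triggered when any index goes beyond loadStop."""
    return any(beyond(i, load[i], t) for i, t in load_stop.items())


load = {"r1m": 0.8, "ut": 0.5, "mem": 200.0, "pg": 3.0}
load_sched = {"r1m": 1.0, "ut": None, "mem": 100.0}
load_stop = {"r1m": 2.0, "pg": 10.0}

ok_to_dispatch = may_dispatch(load, load_sched)
busy = dict(load, r1m=2.5)       # run queue too long
low_mem = dict(load, mem=50.0)   # too little free memory
```

Note the asymmetry: all indices must be within loadSched before a job is dispatched, whereas a single index beyond loadStop is enough to suspend.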




       SCHEDULING POLICIES



	      Scheduling policies of the queue. Optionally, one	 or  more  of
	      the following policies may be configured:



	      NO_INTERACTIVE


		     This queue does not accept batch interactive jobs (see
		     the -I option of bsub(1)). The default is to accept both
		     interactive and non-interactive jobs.



	      ONLY_INTERACTIVE


		     This queue only accepts batch interactive jobs. Jobs
		     must be submitted using the -I, -Is, or -Ip option of
		     bsub(1). The default is to accept both interactive and
		     non-interactive jobs.




       DEFAULT HOST SPECIFICATION



	      The default host or host model that will be used	to  normalize
	      the CPU time limit of all jobs.



	      If  you  want to view a list of the CPU factors defined for the
	      hosts in your cluster, see lsinfo(1). The CPU factors are	 con-
	      figured in lsf.shared(5).



	      The appropriate CPU scaling factor of the host or host model is
	      used to adjust the actual CPU time limit at the execution	 host
	      (see CPULIMIT in lsb.queues(5)).	The DEFAULT_HOST_SPEC parame-
	      ter in lsb.queues overrides the system DEFAULT_HOST_SPEC param-
	      eter  in	lsb.params  (see lsb.params(5)). If a user explicitly
	      gives a host specification when submitting a job using bsub  -c
	      cpu_limit[/host_name  |  /host_model],  the  user specification
	      overrides the values defined in both lsb.params and lsb.queues.
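
The precedence of host specifications and the effect of CPU factors can be illustrated with a short Python sketch. The precedence order follows the paragraph above. The scaling formula (limit multiplied by the CPU factor of the specification host and divided by the CPU factor of the execution host) is an assumption chosen to match the statement that jobs are allowed more time on slower hosts; check lsb.queues(5) before relying on it. Function names, host names and CPU factors here are made up.

```python
from typing import Optional


def pick_host_spec(job_spec: Optional[str] = None,
                   queue_spec: Optional[str] = None,
                   system_spec: Optional[str] = None) -> Optional[str]:
    """bsub -c host spec beats lsb.queues, which beats lsb.params."""
    for spec in (job_spec, queue_spec, system_spec):
        if spec is not None:
            return spec
    return None


def scaled_cpu_limit(limit_minutes: float, spec_factor: float, exec_factor: float) -> float:
    """Assumed normalization: a slower execution host gets more minutes."""
    return limit_minutes * spec_factor / exec_factor


chosen = pick_host_spec(None, "hostQ", "hostS")
# 60-minute limit defined relative to a host with CPU factor 2.0.
on_slow_host = scaled_cpu_limit(60, spec_factor=2.0, exec_factor=1.0)
on_fast_host = scaled_cpu_limit(60, spec_factor=2.0, exec_factor=4.0)
```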




       RUN_WINDOWS



	      The  time	 windows in a week during which jobs in the queue may
	      run.



	      When a queue is out of its window or windows, no	job  in	 this
	      queue  will  be  dispatched. In addition, when the end of a run
	      window is reached, any running jobs from this  queue  are	 sus-
	      pended  until  the  beginning of the next run window, when they
	      are resumed. The default is no restriction, or always open.




       DISPATCH_WINDOWS



	      Dispatch windows are the time windows in a  week	during	which
	      jobs in the queue may be dispatched.



	      When  a  queue is out of its dispatch window or windows, no job
	      in this queue will be dispatched. Jobs already  dispatched  are
	      not  affected  by	 the  dispatch	windows.   The	default is no
	      restriction, or always open (that is, twenty-four hours a	 day,
	      seven  days a week). Note that such windows are only applicable
	      to batch jobs. Interactive jobs scheduled by LIM are controlled
	      by  another  set	of dispatch windows (see lshosts(1)). Similar
	      dispatch windows may be configured for  individual  hosts	 (see
	      bhosts(1)).



	      A	 window	 is displayed in the format begin_time-end_time. Time
	      is specified  in	the  format  [day:]hour[:minute],  where  all
	      fields  are  numbers  in	their respective legal ranges: 0(Sun-
	      day)-6 for day, 0-23 for hour, and 0-59 for minute. The default
	      value  for minute is 0 (on the hour). The default value for day
	      is every day of the week. The  begin_time	 and  end_time	of  a
	      window  are  separated  by '-', with no blank characters (SPACE
	      and TAB) in between.  Both  begin_time  and  end_time  must  be
	      present  for  a  window. Windows are separated by blank charac-
	      ters.
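
The window format lends itself to a small parser. The Python sketch below is one possible reading of the format described above, not Lava's own parser. One point is an assumption: a time with two fields is read as hour:minute rather than day:hour, since the text does not say which optional field is dropped first. All names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class WindowTime(NamedTuple):
    day: Optional[int]  # 0 (Sunday) to 6, or None for every day of the week
    hour: int           # 0 to 23
    minute: int         # 0 to 59, defaults to 0 (on the hour)


def parse_time(text: str) -> WindowTime:
    """Parse one [day:]hour[:minute] value."""
    fields = [int(f) for f in text.split(":")]
    if len(fields) == 1:
        day, hour, minute = None, fields[0], 0
    elif len(fields) == 2:
        # Assumption: two fields mean hour:minute, not day:hour.
        day, hour, minute = None, fields[0], fields[1]
    elif len(fields) == 3:
        day, hour, minute = fields
    else:
        raise ValueError(f"too many fields in {text!r}")
    if day is not None and not 0 <= day <= 6:
        raise ValueError(f"day out of range in {text!r}")
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range in {text!r}")
    if not 0 <= minute <= 59:
        raise ValueError(f"minute out of range in {text!r}")
    return WindowTime(day, hour, minute)


def parse_windows(spec: str) -> List[Tuple[WindowTime, WindowTime]]:
    """Parse blank-separated begin_time-end_time windows."""
    windows = []
    for window in spec.split():
        begin, sep, end = window.partition("-")
        if not sep or not begin or not end:
            raise ValueError(f"window needs begin and end time: {window!r}")
        windows.append((parse_time(begin), parse_time(end)))
    return windows


# Friday 19:00 to Monday 08:30, plus 20:00 to 08:30 every day.
parsed = parse_windows("5:19:00-1:8:30 20:00-8:30")
```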




       USERS



	      A list of users allowed to submit	 jobs  to  this	 queue.	 Lava
	      administrators  can  submit  jobs to the queue even if they are
	      not listed here.




       HOSTS



	      A list of hosts where jobs in the queue can be dispatched.




       PRE_EXEC



	      The queue's pre-execution command. The pre-execution command is
	      executed	before	each job in the queue is run on the execution
	      host (or on the first host selected for a parallel batch	job).
	      See lsb.queues(5) for more information.




       POST_EXEC



	      The  queue's post-execution command. The post-execution command
	      is run on	 the  execution	 host  when  a	job  terminates.  See
	      lsb.queues(5) for more information.




       REQUEUE_EXIT_VALUES



	      Jobs  that  exit	with these values are automatically requeued.
	      See lsb.queues(5) for more information.




       RES_REQ



	      Resource requirements of the queue. Only the hosts that satisfy
	      these resource requirements can be used by the queue.




       RESUME_COND



	      The conditions that must be satisfied to resume a suspended job
	      on a host. See lsb.queues(5) for more information.




       STOP_COND



	      The conditions which determine whether a job running on a	 host
	      should be suspended. See lsb.queues(5) for more information.




       JOB_STARTER



	      An  executable  file  that  runs immediately prior to the batch
	      job, taking the batch job file as an input argument.  All	 jobs
	      submitted	 to  the  queue are run via the job starter, which is
	      generally used  to  create  a  specific  execution  environment
	      before  processing  the  jobs themselves. See lsb.queues(5) for
	      more information.




       RERUNNABLE



	      If the RERUNNABLE field displays yes, jobs  in  the  queue  are
	      rerunnable.  That	 is,  jobs  in	the  queue  are automatically
	      restarted or rerun if the execution host	becomes	 unavailable.
	      However, a job in the queue will not be restarted if you
	      have removed the rerunnable option from the job. See
	      lsb.queues(5) for more information.




       CHECKPOINT



	      If  the  CHKPNTDIR  field	 is  displayed, jobs in the queue are
	      checkpointable. Jobs will use the default checkpoint  directory
	      and period unless you specify other values.  Note that a job in
	      the queue will not be checkpointed  if  you  have	 removed  the
	      checkpoint  option  from	the  job.  See lsb.queues(5) for more
	      information.



	      CHKPNTDIR


		     Specifies the checkpoint directory using an absolute  or
		     relative path name.



	      CHKPNTPERIOD


		     Specifies the checkpoint period in seconds.



		     Although  the  output  of bqueues reports the checkpoint
		     period in seconds, the checkpoint period is  defined  in
		     minutes  (the  checkpoint	period is defined through the
		     bsub -k "checkpoint_dir [checkpoint_period]" option,  or
		     in lsb.queues).




       JOB CONTROLS



	      The configured actions for job control. See JOB_CONTROLS param-
	      eter in lsb.queues.



	      The  configured	actions	  are	displayed   in	 the   format
	      [action_type,  command]  where  action_type  is either SUSPEND,
	      RESUME, or TERMINATE.




SEE ALSO




       lsb.queues(5), bsub(1), bjobs(1), bhosts(1), badmin(8)






		      November 2004   Platform Lava 6.1		   bqueues(1)

brequeue(1)							  brequeue(1)



NAME
       brequeue	 - Kills and requeues a job


SYNOPSIS




       brequeue [-J job_name] [-u user_name | -u all] [job_ID] [-d] [-e]
       [-r] [-a] [-H]


       brequeue [-h | -V]


DESCRIPTION




       You can only use brequeue on a job you own, unless you are root or the
       Lava administrator.


       Kills  a	 running  (RUN),  user-suspended (USUSP), or system-suspended
       (SSUSP) job and returns it to the queue. A  job	that  is  killed  and
       requeued	 retains  its  submit time but is dispatched according to its
       requeue time. When the job is requeued, it is assigned the PEND status
       or  PSUSP  if  the  -H option is used. Once dispatched, the job starts
       over from the beginning. The requeued job keeps the same job ID.


       Use brequeue to requeue job arrays or elements of them.


       By default, kills and requeues your most recently submitted  job	 when
       no job_ID is specified.


OPTIONS




       -J job_name



	      Operates on the specified job.




       -u user_name | -u all



	      Operates on the specified user's jobs or all jobs.



	      Only root and Lava administrators can requeue jobs submitted by
	      other users.




       job_ID


	      Operates on the specified job.



	      The value of 0 for job_ID is ignored.




       -d


	      Requeues jobs that have finished running with DONE job  status.




       -e


	      Requeues	jobs  that  have  terminated abnormally with EXIT job
	      status.




       -r


	      Requeues jobs that are running.




       -a


	      Requeues all jobs, including running jobs, suspended jobs, and
	      jobs with EXIT or DONE status.




       -H


	      Requeues jobs to PSUSP job status.




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




LIMITATIONS




       brequeue cannot requeue interactive batch jobs; it only kills them
       and does not restart them.




		      November 2004   Platform Lava 6.1		  brequeue(1)
bresume(1)							   bresume(1)



NAME
       bresume	- resumes one or more suspended jobs


SYNOPSIS




       bresume [-J job_name] [-m host_name] [-q queue_name] [-u user_name |
       -u all] [0]


       bresume [job_ID] ...


       bresume [-h | -V]


DESCRIPTION




       Sends the SIGCONT signal to resume one or more of your suspended jobs.


       Only root and Lava administrators can operate on jobs submitted by
       other users. You cannot resume a job that is not suspended. Using
       bresume on a job that is not in either the PSUSP or the USUSP state
       has no effect.


       You must specify a job ID or -J, -m, -u, or -q. Specify job ID 0
       (zero) to resume multiple jobs.


       You can also use bkill -s CONT to send the resume signal to a job.


       If a signal request fails to reach the job execution host,  Lava	 will
       retry  the  operation  later  when  the	host  becomes reachable. Lava
       retries the most recent signal request.


       Jobs that are suspended by the administrator can only  be  resumed  by
       the  administrator  or  root; users do not have permission to resume a
       job suspended by another user or the administrator. Administrators  or
       root can resume jobs suspended by users or administrators.


OPTIONS




       0


	      Resumes  all  the	 jobs that satisfy other options (-m, -q, -u,
	      and -J).




       -J job_name



	      Resumes only jobs with the specified name.




       -m host_name



	      Resumes only jobs dispatched to the specified host.




       -q queue_name



	      Resumes only jobs in the specified queue.




       -u user_name | -u all



	      Resumes only jobs owned by the specified user, or all users  if
	      the reserved user name all is specified.




       job_ID ...



	      Resumes only the specified jobs. Jobs submitted by any user can
	      be specified here without using the -u option.




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




EXAMPLES





       % bresume -q night 0




       Resumes all of the user's suspended jobs that are in the night  queue.
       If  the	user is the Lava administrator, resumes all suspended jobs in
       the night queue.


SEE ALSO




       bsub(1),	 bjobs(1),   bqueues(1),   bhosts(1),	bstop(1),   bkill(1),
       lsb.params(5)






		      November 2004   Platform Lava 6.1		   bresume(1)
brun(8)								      brun(8)



NAME
       brun  - forces a job to run immediately


SYNOPSIS




       brun [-c] -m "host_name ..." job_ID

       brun [-h | -V]


DESCRIPTION




       This command can only be used by Lava administrators.



       Forces a pending job to run immediately on specified hosts.


       A job that has been forced to run is counted as a running job; this
       may violate the user, queue, or host job limits.


       By default, after the job is started, it is still subject to run	 win-
       dows and suspending conditions.


OPTIONS




       -c


	      Distributes job slots for a multi-host parallel job according
	      to free CPUs.



	      By default, if a parallel job spans more than one host, Lava
	      distributes the slots based on the static CPU counts of each
	      host listed in the -m option. Use -c to distribute the slots
	      based on the free CPUs of each host instead of the static
	      CPUs.



	      The -c option can be applied only to hosts whose total slot
	      count equals their total CPU count. MXJ in lsb.hosts must
	      be less than or equal to the number of CPUs, and PJOB_LIMIT=1
	      must be specified in the queue (lsb.queues).



	      For example, a 6-CPU job is submitted to hostA and hostB, which
	      have 4 CPUs each. Without -c, Lava lets the job take 4 slots
	      from hostA first and then 2 slots from hostB, regardless of
	      the status or slot usage on hostA and hostB. If any slots on
	      hostA are in use, the job remains pending. With -c, Lava takes
	      into consideration that hostA has 2 slots in use and hostB is
	      completely free, so Lava is able to dispatch the job using the
	      2 free slots on hostA and all 4 slots on hostB.
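
The 6-CPU example can be worked through in Python. This is a sketch of the arithmetic only, assuming slots are taken from hosts in the order they are listed; the function names are hypothetical and Lava's real placement logic may differ.

```python
from typing import Dict, List, Optional


def distribute_slots(needed: int, hosts: List[str],
                     capacity: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Take slots host by host, in listed order, up to each host's capacity.

    Pass static CPU counts to model the default, free CPU counts to model -c.
    Returns None if the hosts cannot supply enough slots.
    """
    plan: Dict[str, int] = {}
    remaining = needed
    for host in hosts:
        take = min(remaining, capacity[host])
        if take > 0:
            plan[host] = take
        remaining -= take
        if remaining == 0:
            return plan
    return None


def fits(plan: Dict[str, int], free: Dict[str, int]) -> bool:
    """A plan can start only if every host really has that many free slots."""
    return all(slots <= free[host] for host, slots in plan.items())


hosts = ["hostA", "hostB"]
static_cpus = {"hostA": 4, "hostB": 4}
free_cpus = {"hostA": 2, "hostB": 4}   # hostA has 2 slots in use

without_c = distribute_slots(6, hosts, static_cpus)
with_c = distribute_slots(6, hosts, free_cpus)
```

Without -c the plan asks hostA for 4 slots it does not have free, so the job stays pending; with -c the plan matches what is actually available.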




       -m host_name ...



	      Required. Specify one or more hosts on which to run the job.




       job_ID


	      Required. Specify the job to run.




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




LIMITATIONS




       You cannot force a job in SSUSP or USUSP state.


       brun does not guarantee a job will run; it just forces  Lava  to	 dis-
       patch the job.






		      November 2004   Platform Lava 6.1		      brun(8)
bstop(1)							     bstop(1)



NAME
       bstop  - suspends unfinished jobs


SYNOPSIS







       bstop  [-a]  [-d]  [-J  job_name]  [-m  host_name] [-q queue_name] [-u
       user_name | -u all] [0] [job_ID] ...


       bstop [-h | -V]


DESCRIPTION




       Suspends unfinished jobs.


       Sends the SIGSTOP signal to sequential jobs and the SIGTSTP signal  to
       parallel jobs to suspend them.


       You  must  specify a job ID or -J, -m, -u, or -q. You cannot suspend a
       job that is already suspended. Specify job ID 0 (zero) to stop  multi-
       ple jobs.


       Only  root  and	Lava  administrators can operate on jobs submitted by
       other users.


       Use bresume to resume suspended jobs.


       Using bstop on a job that is in the USUSP state has no effect.


       You can also use bkill -s STOP to send the suspend signal to a job or
       use bkill -s TSTP to suspend one or more parallel jobs. Use bkill -s
       CONT to send a resume signal to a job.


       If a signal request fails to reach the job execution host,  Lava	 will
       retry  the  operation  later  when  the	host  becomes reachable. Lava
       retries the most recent signal request.


OPTIONS




       0


	      Suspends all the jobs that satisfy other options (-m,  -q,  -u,
	      and -J).




       -a


	      Suspends all jobs.




       -d


	      Suspends only finished jobs (with a DONE or EXIT status).




       -J job_name



	      Suspends only jobs with the specified name.




       -m host_name



	      Suspends only jobs dispatched to the specified host.




       -q queue_name



	      Suspends only jobs in the specified queue.




       -u user_name | -u all



	      Suspends	only jobs owned by the specified user or all users if
	      the keyword all is specified.




       job_ID ...



	      Suspends only the specified jobs. Jobs submitted	by  any	 user
	      can be specified here without using the -u option.




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




EXAMPLES





       % bstop 314




       Suspends job number 314.



       % bstop -m hostA




       Suspends the invoker's last job that was dispatched to host hostA.



       % bstop -u jsmith 0




       Suspends all the jobs submitted by user jsmith.



       % bstop -u all




       Suspends the last submitted job in the Lava system.



       % bstop -u all 0




       Suspends all jobs for all users in the Lava system.


SEE ALSO




       bsub(1),	  bjobs(1),   bqueues(1),  bhosts(1),  bresume(1),  bkill(1),
       kill(1), lsb.params(5)






		      November 2004   Platform Lava 6.1		     bstop(1)
bsub(1)								      bsub(1)



NAME
       bsub  - submits a batch job to Lava


SYNOPSIS




       bsub [options] command [arguments]


       bsub [-h | -V]


OPTION LIST


       -B
       -H
       -I
       -K
       -N
       -r
       -a esub_parameters
       -b [[month:]day:]hour:minute
       -c [hour:]minute[/host_name | /host_model]

       -C core_limit
       -D data_limit
       -e err_file
       -E "pre_exec_command [arguments ...]"
       -f "local_file operator [remote_file]" ...

       -F file_limit
       -i input_file
       -J job_name
       -k "checkpoint_dir [checkpoint_period][method=method_name]"
       -L login_shell
       -m "host_name..."
       -M mem_limit
       -n number_proc
       -o out_file
       -q "queue_name ..."
       -R "res_req"
       -s signal
       -S stack_limit
       -sp priority
       -t [[month:]day:]hour:minute
       -u mail_user
       -v swap_limit
       -w 'dependency_expression'
       -W [hour:]minute[/host_name | /host_model]

       -h
       -V


DESCRIPTION




       Submits	a  job	for batch execution and assigns it a unique numerical
       job ID.


       Runs the job on a host that satisfies all  requirements	of  the	 job,
       when  all  conditions  on the job, host, queue, and cluster are satis-
       fied. If Lava cannot run all jobs immediately, Lava  scheduling	poli-
       cies  determine	the order of dispatch. Jobs are started and suspended
       according to the current system load.


       Sets the user's execution environment for the job, including the	 cur-
       rent  working directory, file creation mask, and all environment vari-
       ables, and sets Lava environment variables before starting the job.


       When a job is run, the command  line  and  stdout/stderr	 buffers  are
       stored in the directory home_directory/.lsbatch on the execution host.
       If this directory is not accessible, /tmp/.lsbtmpuser_ID	 is  used  as
       the  job's  home	 directory. If the current working directory is under
       the home directory on the submission host, then	the  current  working
       directory is also set to be the same relative directory under the home
       directory on the execution host. The job is run in /tmp if the current
       working directory is not accessible on the execution host.


       If no command is supplied, bsub prompts for the command from the stan-
       dard input.  On UNIX, the input is terminated by entering CTRL-D on  a
       new  line. On Windows, the input is terminated by entering CTRL-Z on a
       new line.


       Use -n to submit a parallel job.


       Use -I to submit an  interactive job.


       Use -J to assign a name to your job.


       Use -k to specify a checkpointable job.


       To kill a batch job submitted with bsub, use bkill.


       Use bmod to modify  jobs	 submitted  with  bsub.	 bmod  takes  similar
       options to bsub.


   DEFAULT BEHAVIOR





	      Lava  assumes  that uniform user names and user ID spaces exist
	      among all the hosts in the cluster. That is, a job submitted by
	      a given user will run under the same user's account on the exe-
	      cution host. For situations where	 nonuniform  user  names  and
	      user ID spaces exist, account mapping must be used to determine
	      the account used to run a job.



	      The job is not checkpointable.



	      bsub automatically selects an appropriate queue. If you defined
	      a	 default queue list by setting LSB_DEFAULTQUEUE, the queue is
	      selected from your list. If LSB_DEFAULTQUEUE  is	not  defined,
	      the queue is selected from the system default queue list speci-
	      fied by the Lava administrator (see the parameter DEFAULT_QUEUE
	      in lsb.params(5)).



	      bsub assumes only one processor is requested.



	      bsub  does  not start a login shell but runs the job file under
	      the execution environment from which the job was submitted.



	      The input file for the batch job is /dev/null (no input).



	      bsub sends mail to you when the job is done. The default desti-
	      nation  is  defined by LSB_MAILTO in lsf.conf. The mail message
	      includes the job report, the job output (if any), and the error
	      message (if any).




OPTIONS




       -B


	      Sends  mail to you when the job is dispatched and begins execu-
	      tion.




       -H


	      Holds the job in the PSUSP state when the job is submitted. The
	      job  will	 not be scheduled until you tell the system to resume
	      the job (see bresume(1)).




       -I


	      Submits a batch interactive job. A new job cannot be  submitted
	      until the interactive job is completed or terminated.



	      Sends the job's standard output (or standard error) to the ter-
	      minal. Does not send mail to you when the job  is	 done  unless
	      you specify the -N option.



	      If  the  -i input_file option is specified, you cannot interact
	      with the job's standard input via the terminal.



	      If the -o out_file option is specified, sends the	 job's	stan-
	      dard  output  to	the specified output file. If the -e err_file
	      option is specified, sends the  job's  standard  error  to  the
	      specified error file.



	      You cannot use -I with the -K option.



	      Interactive jobs cannot be checkpointed.



	      Interactive jobs cannot be rerunnable (bsub -r).




       -K


	      Submits  a  batch	 job and waits for the job to complete. Sends
	      the message "Waiting for dispatch" to  the  terminal  when  you
	      submit the job. Sends the message "Job is finished" to the ter-
	      minal when the job is done.



	      You will not be able to submit another job  until	 the  job  is
	      completed.  This	is  useful  when  completion  of  the  job is
	      required in order to proceed, such as in a job script. If the  job
	      needs to be rerun due to transient failures, bsub returns after
	      the job finishes successfully. bsub will	exit  with  the	 same
	      exit  code  as the job so that job scripts can take appropriate
	      actions based on the exit codes. bsub exits with value  126  if
	      the job was terminated while pending.
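
       As a rough sketch of how a job script might act on the exit code
       returned by bsub -K, the helper below maps a status to a message.
       Only the meaning of 126 comes from the text above; the function name,
       the messages, and the commented-out paths are illustrative
       assumptions. A job that itself exits with 126 would look the same.

```shell
# Hypothetical helper: interpret the exit status returned by "bsub -K".
# Only the meaning of 126 is taken from the man page text.
describe_bsub_k_status() {
    case "$1" in
        0)   echo "job finished successfully" ;;
        126) echo "job was terminated while pending" ;;
        *)   echo "job exited with code $1" ;;
    esac
}

# Possible use in a job script (commented out so the sketch runs without Lava):
#   bsub -K -o /home/user/out.%J my_job
#   describe_bsub_k_status $?
describe_bsub_k_status 126
```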



	      You cannot use the -K option with the -I option.




       -N


	      Sends the job report to you by mail when the job finishes. When
	      used without  any	 other	options,  behaves  the	same  as  the
	      default.



	      Use only with the -o and -I options, which do not send mail, to
	      force Lava to send you a mail message when the job is done.




       -r


	      If the execution host becomes unavailable while a job  is	 run-
	      ning,  specifies	that the job will rerun on another host. Lava
	      requeues the job in the same job queue with the  same  job  ID.
	      When an available execution host is found, reruns the job as if
	      it were submitted new, even if the job has  been	checkpointed.
	      You  receive  a  mail message informing you of the host failure
	      and requeuing of the job.



	      If the system goes down while a job is running, specifies	 that
	      the job will be requeued when the system restarts.



	      Reruns a job if the execution host or the system fails; it does
	      not rerun a job if the job itself fails.



	      Interactive jobs (bsub -I) cannot be rerunnable.




       -a esub_parameters



	      String format parameter containing the name of an	 application-
	      specific esub program to be passed to the master esub. The mas-
	      ter esub program (LSF_SERVERDIR/mesub) handles  job  submission
	      requirements  of	the  applications.  Application-specific esub
	      programs can specify their own job submission requirements. The
	      value  of	 -a  is	 set  in the LSB_SUB_ADDITIONAL option in the
	      LSB_SUB_PARM file used by esub.



	      Use the -a option to specify which application-specific esub is
	      invoked by mesub.



	      For example, to submit a job to hostA that invokes two
	      application-specific esub programs named esub.license and
	      esub.fluent:





       % bsub -a "license fluent" -m hostA my_job



	      mesub  uses  the	method	name license to invoke the esub named
	      LSF_SERVERDIR/esub.license,  and	the  method  name  fluent  to
	      invoke the esub named LSF_SERVERDIR/esub.fluent.



	      The  value  of  -a  is passed to esub, but it does not directly
	      affect the other bsub parameters or behavior. The value  of  -a
	      must  correspond	to  an	actual esub file. For example, to use
	      bsub  -a	fluent,	 the   file   esub.fluent   must   exist   in
	      LSF_SERVERDIR.



	      Mandatory	 esub  methods specified by LSB_ESUB_METHOD (environ-
	      ment variable or set in lsf.conf), are invoked before any	 esub
	      programs specified by -a.



	      The  name of the esub program must be a valid file name. It can
	      contain only alphanumeric characters, underscore (_) and hyphen
	      (-).





   Compatibility note




       After LSF version 5.1, the value of -a and LSB_ESUB_METHOD must corre-
       spond to an actual esub file in LSF_SERVERDIR.  For  example,  to  use
       bsub -a fluent, the file esub.fluent must exist in LSF_SERVERDIR.



       -b [[month:]day:]hour:minute



	      Dispatches the job for execution on or after the specified date
	      and  time.  The  date   and   time   are	 in   the   form   of
	      [[month:]day:]hour:minute	 where	the number ranges are as fol-
	      lows: month 1-12, day 1-31, hour 0-23, minute 0-59.



	      At least two fields must be specified. These fields are assumed
	      to  be hour:minute. If three fields are given, they are assumed
	      to be day:hour:minute,  and  four	 fields	 are  assumed  to  be
	      month:day:hour:minute.
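
       The field rules above can be illustrated with a small sketch. The
       function name is hypothetical and the helper only labels the fields;
       it does not check the numeric ranges and is not part of Lava.

```shell
# Hypothetical helper: label the fields of a [[month:]day:]hour:minute value.
explain_time_spec() {
    # Split the argument on ':' into the function's positional parameters.
    old_ifs=$IFS
    IFS=:
    set -- $1
    IFS=$old_ifs
    case $# in
        2) echo "hour=$1 minute=$2" ;;
        3) echo "day=$1 hour=$2 minute=$3" ;;
        4) echo "month=$1 day=$2 hour=$3 minute=$4" ;;
        *) echo "invalid: need 2 to 4 fields" >&2; return 1 ;;
    esac
}

explain_time_spec 20:30
explain_time_spec 15:20:30
explain_time_spec 12:15:20:30
```

       With this reading, bsub -b 20:30 my_job would mean "dispatch at or
       after 20:30", while 12:15:20:30 names December 15 at 20:30.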




       -c [hour:]minute[/host_name | /host_model]



	      Limits  the total CPU time the job can use. This option is use-
	      ful for preventing runaway jobs or jobs that use	up  too	 many
	      resources.  When	the  total  CPU	 time  for  the whole job has
	      reached the limit, a SIGXCPU signal is first sent to  the	 job,
	      then SIGINT, SIGTERM, and SIGKILL.



	      If  LSB_JOB_CPULIMIT in lsf.conf is set to n, Lava-enforced CPU
	      limit is disabled and Lava passes the limit  to  the  operating
	      system.  When one process in the job exceeds the CPU limit, the
	      limit is enforced by the operating system.



	      The CPU limit is in the form of [hour:]minute. The minutes  can
	      be  specified  as	 a number greater than 59. For example, three
	      and a half hours can either be specified as 3:30, or 210.
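
       To check that two spellings of a limit are equivalent, the conversion
       can be sketched as follows. The helper name is an assumption made for
       illustration; Lava does this conversion itself.

```shell
# Hypothetical helper: convert a [hour:]minute limit to total minutes.
limit_to_minutes() {
    echo "$1" | awk -F: '{
        if (NF == 2) print $1 * 60 + $2   # hour:minute form
        else         print $1 + 0         # minutes only
    }'
}

limit_to_minutes 3:30
limit_to_minutes 210
```

       Both calls print the same number of minutes, matching the example in
       the text.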



	      The CPU time you specify is the normalized CPU  time.  This  is
	      done so that the job does approximately the same amount of
	      processing for a given CPU limit, even if it is sent to a host with
	      a	 faster	 or  slower  CPU.  Whenever  a normalized CPU time is
	      given, the actual time on the execution host is  the  specified
	      time  multiplied	by  the	 CPU factor of the normalization host
	      then divided by the CPU factor of the execution host.
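
       The normalization rule can be worked through numerically. The sketch
       below assumes made-up CPU factors (1.0 for the normalization host,
       2.0 for the execution host); the real factors are whatever the Lava
       administrator has configured.

```shell
# actual minutes = limit * CPU factor (normalization host)
#                        / CPU factor (execution host)
# The function name and the sample factors are illustrative assumptions.
actual_cpu_minutes() {
    awk -v limit="$1" -v norm="$2" -v execf="$3" \
        'BEGIN { printf "%.1f\n", limit * norm / execf }'
}

# A 210-minute limit on an execution host twice as fast as the
# normalization host:
actual_cpu_minutes 210 1.0 2.0
```

       Under these assumed factors the job gets half as many actual CPU
       minutes on the faster host, which is the intent of normalization.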



	      Optionally, you can supply a host name or	 a  host  model	 name
	      defined  in  Lava.  You must insert a slash (/) between the CPU
	      limit and the host name or model name.  If a host name or model
	      name is not given, Lava uses the default CPU time normalization
	      host  defined  at	 the  queue   level   (DEFAULT_HOST_SPEC   in
	      lsb.queues)  if  it  has	been  configured,  otherwise uses the
	      default CPU time normalization  host  defined  at	 the  cluster
	      level  (DEFAULT_HOST_SPEC in lsb.params) if it has been config-
	      ured, otherwise uses the submission host.




       -C core_limit



	      Sets a per-process (soft) core file size limit for all the pro-
	      cesses  that  belong  to this batch job (see getrlimit(2)). The
	      core limit is specified in KB.



	      The behavior of this option depends on  platform-specific	 UNIX
	      systems.



	      In  some cases, the process is sent a SIGXFSZ signal if the job
	      attempts to create a core file larger than the specified limit.
	      The SIGXFSZ signal normally terminates the process.



	      In  other cases, the writing of the core file terminates at the
	      specified limit.




       -D data_limit



	      Sets a per-process (soft) data segment size limit for  each  of
	      the  processes that belong to the batch job (see getrlimit(2)).
	      The data limit is specified in KB. A sbrk call  to  extend  the
	      data segment beyond the data limit will return an error.




       -e err_file



	      Specify  a  file path. Appends the standard error output of the
	      job to the specified file.



	      If you use the special character %J in the name  of  the	error
	      file, then %J is replaced by the job ID of the job.



	      If  the current working directory is not accessible on the exe-
	      cution host after the job	 starts,  Lava	writes	the  standard
	      error output file to /tmp/.




       -E "pre_exec_command [arguments ...]"



	      Runs  the	 specified pre-exec command on the batch job's execu-
	      tion host before actually running the job. For a parallel	 job,
	      the  pre-exec  command  runs on the first host selected for the
	      parallel job.



	      If the pre-exec command exits with 0 (zero), then the real  job
	      is  started on the selected host. Otherwise, the job (including
	      the pre-exec command) goes back to PEND status and is  resched-
	      uled.



	      If  your job goes back into PEND status, Lava will keep on try-
	      ing to run the pre-exec command and the real  job	 when  condi-
	      tions  permit. For this reason, be sure that your pre-exec com-
	      mand can be run many times without having side effects.
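
       One way to meet this requirement is to write the pre-exec command so
       that repeating it leaves the same state behind. The sketch below is a
       hypothetical pre-exec step that prepares a scratch directory; the
       function name, the directory, and the marker file are assumptions
       made for illustration.

```shell
# Hypothetical pre-exec step: prepare a scratch directory for the job.
# "mkdir -p" succeeds whether or not the directory already exists, and
# truncating the marker file gives the same result on every run.
prepare_scratch() {
    mkdir -p "$1" && : > "$1/.ready"
}

demo_dir="${TMPDIR:-/tmp}/lava_preexec_demo.$$"
prepare_scratch "$demo_dir" && prepare_scratch "$demo_dir" && echo "pre-exec ok"
rm -rf "$demo_dir"
```

       A step like "mkdir" without -p, or one that appends to a file, would
       fail or pile up side effects when Lava retries it.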



	      The standard input and output  for  the  pre-exec	 command  are
	      directed	to  the	 same files as for the real job. The pre-exec
	      command runs under the same user	ID,  environment,  home,  and
	      working  directory  as the real job. If the pre-exec command is
	      not in the user's normal execution path (the  $PATH  variable),
	      the full path name of the command must be specified.




       -f "local_file operator [remote_file]" ...



	      Copies  a	 file  between	the  local  (submission) host and the
	      remote (execution) host.	Specify absolute or  relative  paths,
	      including the file names. You should specify the remote file as
	      a file name with no path when running in non-shared systems.



	      If the remote file is not specified, it defaults to  the	local
	      file,  which  must be given. Use multiple -f options to specify
	      multiple files.



operator



       An operator that specifies whether the file is copied  to  the  remote
       host,  or whether it is copied back from the remote host. The operator
       must be surrounded by white space.



       The following describes the operators:



       > Copies the local file to the remote  file  before  the	 job  starts.
       Overwrites the remote file if it exists.



       <  Copies  the  remote file to the local file after the job completes.
       Overwrites the local file if it exists.



       << Appends the remote file to the local file after the job  completes.
       The local file must exist.



       ><  Copies  the	local  file to the remote file before the job starts.
       Overwrites the remote file if it exists. Then copies the	 remote	 file
       to  the local file after the job completes. Overwrites the local file.



       <> Copies the local file to the remote file  before  the	 job  starts.
       Overwrites  the	remote file if it exists. Then copies the remote file
       to the local file after the job completes. Overwrites the local	file.
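
       As a sketch of how a -f argument is put together, the helper below
       builds the quoted string with white space around the operator and
       rejects operators not listed above. The helper name and the file
       names are hypothetical, and the bsub command is only printed, not
       run.

```shell
# Hypothetical helper: build the string passed to "bsub -f".
# Usage: make_copy_spec local_file operator [remote_file]
make_copy_spec() {
    case "$2" in
        ">"|"<"|"<<"|"><"|"<>") ;;
        *) echo "unknown operator: $2" >&2; return 1 ;;
    esac
    if [ -n "${3:-}" ]; then
        echo "$1 $2 $3"
    else
        echo "$1 $2"
    fi
}

# Copy data.in to the execution host before the job starts, and copy
# results.out back after it completes (file names are examples only).
in_spec=$(make_copy_spec data.in ">" data.in)
out_spec=$(make_copy_spec results.out "<" results.out)
echo "bsub -f \"$in_spec\" -f \"$out_spec\" my_job"
```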




       If  you	use the -i input_file option, then you do not have to use the
       -f option to copy the specified input file to the execution host. Lava
       does  this for you, and removes the input file from the execution host
       after the job completes.



       If you use the -o out_file, or -e err_file option, and  you  want  the
       specified  file	to be copied back to the submission host when the job
       completes, then you must use the -f option.



       If the submission and execution hosts have different directory  struc-
       tures, you must make sure that the directory where the remote file and
       local file will be placed exists.



       If the local and remote hosts have different  file  name	 spaces,  you
       must always specify relative path names. If the local and remote hosts
       do not share the same file system, you must make sure that the  direc-
       tory  containing	 the  remote file exists. It is recommended that only
       the file name be given for the remote file when running	in  heteroge-
       neous  file systems. This places the file in the job's current working
       directory. If the file is shared between the submission and  execution
       hosts, then no file copy is performed.



       Lava  uses  lsrcp to transfer files (see lsrcp(1) command). lsrcp con-
       tacts RES on the remote host to perform the file transfer. If  RES  is
       not  available, rcp is used (see rcp(1)). The user must make sure that
       the rcp binary is in the user's $PATH on the execution host.



       Jobs that are submitted from Lava client hosts should specify  the  -f
       option  only  if	 rcp  is  allowed.  Similarly, rcp must be allowed if
       account mapping is used.




-F file_limit



       Sets a per-process (soft) file size limit for each  of  the  processes
       that  belong  to the batch job (see getrlimit(2)). The file size limit
       is specified in KB. If a job process attempts to write to a file	 that
       exceeds	the file size limit, then that process is sent a SIGXFSZ sig-
       nal. The SIGXFSZ signal normally terminates the process.




-i input_file



       Gets the standard input for the job from specified  file.  Specify  an
       absolute	 or  relative  path.  The input file can be any type of file,
       though it is typically a shell script text file.



       If the file exists on the execution host,  Lava	uses  it.  Otherwise,
       Lava  attempts to copy the file from the submission host to the execu-
       tion host. For the file copy to be successful, you must	allow  remote
       copy (rcp) access, or you must submit the job from a server host where
       RES is running. The file is copied from the submission host to a	 tem-
       porary  file  in	 $HOME/.lsbatch directory on the execution host. Lava
       removes this file when the job completes.




-J job_name



       Assigns the specified name to the job.



       The job name need not be unique.



       After a job is submitted, you can use the job name to identify the job.




-k "checkpoint_dir [checkpoint_period][method=method_name]"



       Makes a job checkpointable and specifies the checkpoint directory.  If
       you omit the checkpoint period, the quotes are not required. Specify a
       relative or absolute path name.



       When a job is checkpointed, the checkpoint information  is  stored  in
       checkpoint_dir/job_ID/file_name. Multiple jobs can checkpoint into the
       same directory.	The system can create multiple files.



       The checkpoint directory is used for restarting the job.



       Optionally, specifies a checkpoint period in minutes. Specify a	posi-
       tive  integer.  The  running  job  is checkpointed automatically every
       checkpoint  period.  The	 checkpoint  period  can  be  changed	using
       bchkpnt(1).  Because  checkpointing  is	a  heavyweight operation, you
       should choose a checkpoint period greater than half an hour.



       Optionally, specifies a custom checkpoint and restart  method  to  use
       with  the  job. Use method=default to indicate to use the default Lava
       checkpoint and restart  programs	 for  the  job,	 echkpnt.default  and
       erestart.default.



       The  echkpnt.method_name	 and erestart.method_name programs must be in
       LSF_SERVERDIR or in the directory specified by  LSB_ECHKPNT_METHOD_DIR
       (environment variable or set in lsf.conf).



       If  a  custom  checkpoint and restart method is already specified with
       LSB_ECHKPNT_METHOD (environment variable or in lsf.conf),  the  method
       you specify with bsub -k overrides this.



       Process checkpointing is not available on all host types, and may
       require linking programs with special libraries (see libckpt.a(3)).
       Lava invokes echkpnt (see echkpnt(8)) found in LSF_SERVERDIR to check-
       point the job. You can override the default echkpnt  for	 the  job  by
       defining	 as  environment  variables or in lsf.conf LSB_ECHKPNT_METHOD
       and LSB_ECHKPNT_METHOD_DIR to point to your own echkpnt.	 This  allows
       you to use other checkpointing facilities, including application-level
       checkpointing.



       The checkpoint method directory should be accessible by all users  who
       need to run the custom echkpnt and erestart programs.




-L login_shell



       Initializes the execution environment using the specified login shell.
       The specified login shell must be an absolute path. This is not neces-
       sarily the shell under which the job will be executed.




-m "host_name..."



       Runs the job on one of the specified hosts.



       By  default,  if	 multiple  hosts  are candidates, runs the job on the
       least-loaded host.



       If you also use -q, the specified queue must be configured to include
       all the hosts in your host list. Otherwise, the job is not submitted.
       To find out what hosts are configured for the queue, use bqueues -l.




-M mem_limit



       Sets  a	per-process  (soft)  memory  limit for all the processes that
       belong to this batch job (see getrlimit(2)). The memory limit is spec-
       ified in KB.




-n number_proc



       Submits a parallel job and specifies the number of processors required
       to run the job (some of the processors may be on the same multiproces-
       sor host).



       Once the required number of processors is available, the job is
       dispatched to the first host selected. The list of selected host names
       for the job is specified in the environment variables LSB_HOSTS and
       LSB_MCPU_HOSTS. The job itself is expected to start parallel
       components on these hosts and establish communication among them,
       optionally using RES.
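
       A job script might inspect LSB_HOSTS before starting its parallel
       components. The sketch below assumes that LSB_HOSTS is a
       space-separated list in which a host name is repeated once per
       processor allocated on that host; that format, the function name, and
       the sample value are assumptions, so check the variable in a real job
       before relying on it.

```shell
# Assumption: LSB_HOSTS looks like "hostA hostA hostB", one entry per
# allocated processor.
count_slots_per_host() {
    # One host name per line, then count the repeats.
    printf '%s\n' $LSB_HOSTS | sort | uniq -c | awk '{ print $2 " " $1 }'
}

# Use a sample value only when not running under Lava.
: "${LSB_HOSTS:=hostA hostA hostB}"
count_slots_per_host
```

       The per-host counts could then be used to decide how many components
       to start on each host.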




-o out_file



       Specify a file path. Appends the standard output of  the	 job  to  the
       specified  file.	 Sends the output by mail if the file does not exist,
       or the system has trouble writing to it.



       If only a file name is specified, Lava writes the output file  to  the
       current	working	 directory.  If	 the current working directory is not
       accessible on the execution host after the job starts, Lava writes the
       standard output file to /tmp/.



       If  you	use -o without -e, the standard error of the job is stored in
       the output file.



       If you use -o without -N, the job report is stored in the output	 file
       as the file header.



       If you use both -o and -N, the output is stored in the output file and
       the job report is sent by mail. The job report itself does not contain
       the  output, but the report will advise you where to find your output.



       If you use the special character %J in the name of  the	output	file,
       then %J is replaced by the job ID of the job.




-q "queue_name ..."



       Submits	the  job  to one of the specified queues. Quotes are optional
       for a single queue.  The specified queues  must	be  defined  for  the
       local  cluster.	For a list of available queues in your local cluster,
       use bqueues.



       When a list of queue names is specified, Lava selects the most  appro-
       priate queue in the list for your job based on the job's resource lim-
       its, and other restrictions, such as the requested hosts, your  acces-
       sibility	 to a queue, queue status (closed or open), etc. The order in
       which the queues are considered is  the	same  order  in	 which	these
       queues are listed. The queue listed first is considered first.




-R "res_req"



       Runs the job on a host that meets the specified resource requirements.
       A resource requirement string describes the  resources  a  job  needs.
       Lava  uses  resource requirements to select hosts for remote execution
       and job execution.



       The size of the resource requirement string is limited to 512  charac-
       ters.



       Any  run-queue-length-specific  resource,  such	as r15s, r1m or r15m,
       specified in the resource requirements refers to	 the  normalized  run
       queue length.



       A resource requirement string is divided into the following sections:





       o  A  selection	section (select). The selection section specifies the
	  criteria for selecting hosts from the system.


       o An ordering section (order). The ordering section indicates how  the
	  hosts that meet the selection criteria should be sorted.




	  If no section name is given, then the entire string is treated as a
	  selection string. The select keyword may be omitted if  the  selec-
	  tion string is the first string in the resource requirement.



	  The resource requirement string has the following syntax:







select[selection_string] order[order_string]



       The square brackets must be typed as shown.



       The section names are select and order. Sections that do not apply for
       a command are ignored.



       Each section has a different syntax.



       For example, to submit a job which will run on Solaris 7 or Solaris 8:




% bsub -R "sol7 || sol8" myjob



       The  following command runs the job called myjob on an HP-UX host that
       is lightly loaded (CPU utilization) and has at least  15	 MB  of	 swap
       memory available.




% bsub -R "swp > 15 && hpux order[cpu]" myjob



       You  defined  a resource called bigmem in lsf.shared and defined it as
       an exclusive resource for hostE in lsf.cluster.mycluster. Use the fol-
       lowing command to submit a job that will run on hostE:




% bsub -R "bigmem" myjob



or



% bsub -R "defined(bigmem)" myjob



       You  configured	a static shared resource for licenses for the Verilog
       application as a resource called verilog_lic. To	 submit	 a  job	 that
       will run on a host when there is a license available:




% bsub -R "select[defined(verilog_lic)]" myjob



-s signal



       Send the specified signal when a queue-level run window closes.



       By  default, when the window closes, Lava suspends jobs running in the
       queue (job state becomes SSUSP) and stops dispatching  jobs  from  the
       queue.



       Use -s to specify a signal number; when the run window closes, the job
       is signalled by this signal instead of being suspended.




-S stack_limit



       Sets a per-process (soft) stack segment size limit  for	each  of  the
       processes  that	belong to the batch job (see getrlimit(2)). The limit
       is specified in KB.




-sp priority



       Specifies a user-assigned job priority, which allows users to order their
       jobs  in a queue. Valid values for priority are any integers between 1
       and MAX_USER_PRIORITY. Job priorities that are not valid are rejected.
       Lava   and   queue   administrators   can  specify  priorities  beyond
       MAX_USER_PRIORITY.



       The job owner can change the priority of	 their	own  jobs.  Lava  and
       queue administrators can change the priority of all jobs in a queue.



       Job  order is the first consideration to determine job eligibility for
       dispatch. Jobs are still subject to all scheduling policies regardless
       of  job	priority.  Jobs with the same priority are ordered first come
       first served.




-t [[month:]day:]hour:minute



       Specifies the job termination deadline.



       If a UNIX job is still running at the termination  time,	 the  job  is
       sent  a	SIGUSR2 signal, and is killed if it does not terminate within
       ten minutes.



       In the queue definition, a TERMINATE action can be configured to over-
       ride  the  bkill	 default  action  (see	the JOB_CONTROLS parameter in
       lsb.queues(5)).



       The format for the termination time is [[month:]day:]hour:minute where
       the  number  ranges  are	 as follows: month 1-12, day 1-31, hour 0-23,
       minute 0-59.



       At least two fields must be specified. These fields are assumed to  be
       hour:minute.  If	 three	fields	are  given,  they  are	assumed to be
       day:hour:minute,	  and	 four	 fields	   are	  assumed    to	   be
       month:day:hour:minute.




-u mail_user



       Sends mail to the specified email destination.




-v swap_limit



       Set the total process virtual memory limit to swap_limit in KB for the
       whole job. The default is no limit. Exceeding the limit causes the job
       to terminate.




-w 'dependency_expression'



       Lava  will  not place your job unless the dependency expression evalu-
       ates to TRUE. If you specify a dependency on a job  that	 Lava  cannot
       find (such as a job that has not yet been submitted), your job submis-
       sion fails.



       The dependency expression is a logical expression composed of one or
       more dependency conditions. To build a dependency expression from
       multiple conditions, use the following logical operators:



       && (AND)



       || (OR)



       ! (NOT)



       Use parentheses to indicate the order of operations, if necessary.



       Enclose the dependency expression in single quotes (') to prevent  the
       shell from interpreting special characters (space, any logic operator,
       or parentheses). If you use single quotes for the  dependency  expres-
       sion, use double quotes for quoted items within it, such as job names.
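
       The quoting rule can be checked without submitting anything. The
       sketch below stores a dependency expression that uses the done and
       ended conditions described in this section, confirms that the shell
       passes it as a single argument, and prints the command. The job ID
       312, the job name pattern, and the helper name are hypothetical.

```shell
# Single quotes keep the shell from interpreting &&, parentheses, and
# spaces; double quotes inside mark the job name.
dep='done(312) && ended("jobA*")'

# Hypothetical helper: report how many arguments it received.
count_args() {
    echo $#
}

count_args "$dep"                          # the expression stays one argument
printf 'bsub -w %s my_job\n' "'$dep'"      # command shown for illustration
```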



       In dependency conditions, job names specify only your own jobs, unless
       you  are	 the Lava administrator. If you use the job name to specify a
       dependency condition, and more than one of  your	 jobs  has  the	 same
       name,  all of your jobs that have that name must satisfy the test. Use
       double quotes (") around job names that begin with a  number.  In  the
       job  name, specify the wildcard character asterisk (*) at the end of a
       string, to indicate all jobs whose name begins with  the	 string.  For
       example,	 if  you  use  jobA* as the job name, it specifies jobs named
       jobA, jobA1, jobA_test, jobA.log, etc.
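
       The trailing-asterisk rule amounts to a prefix match. The Python
       sketch below mimics it on a list of made-up job names; the function
       matching_jobs is hypothetical and does not query Lava.

```python
def matching_jobs(pattern, job_names):
    """Return the names selected by a job-name pattern with an optional trailing *."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [name for name in job_names if name.startswith(prefix)]
    # Without the wildcard, only an exact name matches.
    return [name for name in job_names if name == pattern]


names = ["jobA", "jobA1", "jobA_test", "jobA.log", "jobB", "myjobA"]
print(matching_jobs("jobA*", names))
```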



       In dependency conditions, the variable op represents one of  the	 fol-
       lowing relational operators:



       >



       >=



       <



       <=



       ==



       !=



       Use the following conditions to form the dependency expression.



done(job_ID | "job_name" ...)



       The job state is DONE.



       Lava refers to the oldest job of job_name in memory.




ended(job_ID | "job_name")



       The job state is EXIT or DONE.




exit(job_ID | "job_name" [,[operator] exit_code])



       The  job state is EXIT, and the job's exit code satisfies the compari-
       son test.



       If you specify an exit code with no operator, the test is for equality
       (== is assumed).



       If you specify only the job, any exit code satisfies the test.
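
       The three cases of the exit condition can be modeled in a few lines
       of Python. This is only a sketch of the rules stated above, with a
       hypothetical function name; it does not reflect how Lava implements
       the test internally.

```python
import operator

# The relational operators accepted in dependency conditions.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def exit_condition_met(state, exit_code, op=None, wanted=None):
    """Evaluate exit(job [,[operator] exit_code]) for one job."""
    if state != "EXIT":
        return False
    if wanted is None:
        return True                      # only the job given: any exit code
    return OPS[op or "=="](exit_code, wanted)   # no operator: == is assumed


print(exit_condition_met("EXIT", 3))             # exit(job)
print(exit_condition_met("EXIT", 3, None, 3))    # exit(job, 3)
print(exit_condition_met("EXIT", 3, ">", 5))     # exit(job, > 5)
```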




job_ID | "job_name"



       If  you	specify a job without a dependency condition, the test is for
       the DONE state  (Lava  assumes  the  "done"  dependency	condition  by
       default).




post_done(job_ID | "job_name")



       The  job	 state is POST_DONE (the post-processing of specified job has
       completed without errors).




post_err(job_ID | "job_name")



       The job state is POST_ERR (the post-processing of  the  specified  job
       has completed with errors).




started(job_ID | "job_name")



       The job state is:



       - RUN, DONE, or EXIT



       -  PEND	or  PSUSP,  and the job has a pre-execution command (bsub -E)
       that is running.





-W [hour:]minute[/host_name | /host_model]



       Sets the run time limit of the batch job. If a UNIX  job	 runs  longer
       than the specified run limit, the job is sent a SIGUSR2 signal, and is
       killed if it does not terminate within ten minutes.  (For  a  detailed
       description  of	how  these  jobs are killed, see bkill.) In the queue
       definition, a TERMINATE action can be configured to override the bkill
       default action (see the JOB_CONTROLS parameter in lsb.queues(5)).



       The  run	 limit	is  in	the form of [hour:]minute. The minutes can be
       specified as a number greater than 59. For example, three and  a	 half
       hours can either be specified as 3:30, or 210.
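
       A short Python sketch shows that the two spellings are equivalent.
       The function name run_limit_minutes is hypothetical.

```python
def run_limit_minutes(spec):
    """Convert a [hour:]minute run limit into total minutes."""
    hours, sep, minutes = spec.rpartition(":")
    # When there is no colon, the whole string is the minute count.
    return (int(hours) if sep else 0) * 60 + int(minutes)


print(run_limit_minutes("3:30"))
print(run_limit_minutes("210"))
```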



       The  run limit you specify is the normalized run time. This is done so
       that the job does approximately the same amount of processing, even if
       it is sent to a host with a faster or slower CPU. Whenever a normalized
       run time is given, the actual time on the execution host is the speci-
       fied  time multiplied by the CPU factor of the normalization host then
       divided by the CPU factor of the execution host.
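
       The normalization formula can be worked through in Python. The CPU
       factors used here (2.0, 4.0, 1.0) are invented for illustration; real
       values come from the cluster configuration.

```python
def actual_run_limit(specified_minutes, norm_cpu_factor, exec_cpu_factor):
    """Run limit in wall-clock minutes on the execution host."""
    return specified_minutes * norm_cpu_factor / exec_cpu_factor


# A 60-minute limit normalized to a host with CPU factor 2.0:
print(actual_run_limit(60, 2.0, 4.0))  # execution host twice as fast
print(actual_run_limit(60, 2.0, 1.0))  # execution host half as fast
```

       Under these assumed factors the faster host gets 30 minutes and the
       slower host gets 120 minutes, so both do roughly the same amount of
       processing.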



       Optionally, you can supply a host name or a host model name defined in
       Lava.  You  must insert '/' between the run limit and the host name or
       model name.



       If no host or host model is given, Lava uses the default run time nor-
       malization  host	 defined  at  the  queue  level (DEFAULT_HOST_SPEC in
       lsb.queues) if it  has  been  configured;  otherwise,  Lava  uses  the
       default	CPU  time  normalization  host	defined	 at the cluster level
       (DEFAULT_HOST_SPEC in lsb.params) if it has  been  configured;  other-
       wise, Lava uses the submission host.
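
       The fallback order described above can be summarized as a sketch in
       Python. The function and the host names are hypothetical; an empty
       value stands for a setting that has not been configured.

```python
def normalization_host(option_host, queue_default, cluster_default, submission_host):
    """Pick the normalization host in the order the run limit text describes."""
    # 1. host or model given after '/' in -W
    # 2. DEFAULT_HOST_SPEC in lsb.queues
    # 3. DEFAULT_HOST_SPEC in lsb.params
    for candidate in (option_host, queue_default, cluster_default):
        if candidate:
            return candidate
    # 4. nothing configured: the submission host
    return submission_host


print(normalization_host(None, None, "hostC", "login1"))
```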




-h


       Prints command usage to stderr and exits.




-V


       Prints Lava release version to stderr and exits.




command [argument]



       The  job	 can  be  specified  by	 a  command line argument command, or
       through the standard input if the command is not present on  the	 com-
       mand  line.  The	 command  can  be anything that is provided to a UNIX
       Bourne shell (see sh(1)). command is assumed to begin with  the	first
       word that is not part of a bsub option. All arguments that follow com-
       mand are provided as the arguments to the command.



       If the batch job is not given on the command line, bsub reads the  job
       commands	 from  standard input. If the standard input is a controlling
       terminal, the user is prompted with bsub> for the commands of the job.
       The input is terminated by entering CTRL-D on a new line. You can sub-
       mit multiple commands through standard input.



       The commands are executed in the order in which they are	 given.	 bsub
       options can also be specified in the standard input if the line begins
       with #BSUB; e.g., #BSUB -x. If an option is given  on  both  the	 bsub
       command line, and in the standard input, the command line option over-
       rides the option in the standard input. The user can specify the shell
       to  run	the  commands  by specifying the shell path name in the first
       line of the standard input, such as #!/bin/csh. If the  shell  is  not
       given  in the first line, the Bourne shell is used. The standard input
       facility can be used to spool a user's job  script;  such  as  bsub  <
       script.
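
       The override rule can be sketched in Python. This is a simplified
       model, not bsub's own parser: it assumes every embedded option takes
       exactly one value (real options such as -x take none), and the
       function names are hypothetical.

```python
import shlex


def embedded_options(script_text):
    """Collect option/value pairs from #BSUB lines of a job script."""
    options = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith("#BSUB"):
            continue
        # comments=True drops a trailing "# ..." remark on the line.
        words = shlex.split(line[len("#BSUB"):], comments=True)
        for flag, value in zip(words[0::2], words[1::2]):
            options[flag] = value
    return options


def effective_options(command_line_options, script_text):
    """Merge both sources; the command line overrides the script."""
    merged = embedded_options(script_text)
    merged.update(command_line_options)
    return merged


script = """#!/bin/sh
#BSUB -q test
#BSUB -o outfile -e errfile # my default stdout, stderr files
sim1.exe
"""
print(effective_options({"-q": "night"}, script))
```

       With -q night on the command line, the merged options name the night
       queue while the output and error files still come from the script.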



       See  EXAMPLES  below for examples of specifying commands through stan-
       dard input.




OUTPUT




       If the job is successfully submitted, displays  the  job	 ID  and  the
       queue to which the job has been submitted.


EXAMPLES




       % bsub sleep 100


	      Submit the UNIX command sleep together with its argument 100 as
	      a batch job.




       % bsub -q short -o my_output_file "pwd; ls"


	      Submit the UNIX commands pwd and ls as a batch job to the queue
	      named short and store the job output in my_output_file.




       % bsub -m "host1 host3 host8 host9" my_program


	      Submit  my_program to run on one of the candidate hosts: host1,
	      host3, host8 and host9.




       % bsub -q "queue1 queue2 queue3" -c 5 my_program


	      Submit my_program to  one	 of  the  candidate  queues:  queue1,
	      queue2, and queue3 which are selected according to the CPU time
	      limit specified by -c 5.




       % bsub -I ls


	      Submit a batch interactive job which displays the output of  ls
	      at the user's terminal.




       % bsub -b 20:00 -J my_job_name my_program


	      Submit  my_program  to  run  after 8 p.m. and assign it the job
	      name my_job_name.




       % bsub my_script


	      Submit my_script as a batch job. Since my_script	is  specified
	      as  a command line argument, the my_script file is not spooled.
	      Later changes to the my_script file before  the  job  completes
	      may affect this job.




       % bsub < default_shell_script


	      where default_shell_script contains:




	      sim1.exe
	      sim2.exe

	      The file default_shell_script is spooled, and the commands will
	      be run under the Bourne shell since a shell specification is not
	      given in the first line of the script.




       % bsub < csh_script


	      where csh_script contains:

	      #!/bin/csh
	      sim1.exe
	      sim2.exe

	      csh_script is spooled and the commands will be run under
	      /bin/csh.




       % bsub -q night < my_script


	      where my_script contains:

	      #!/bin/sh
	      #BSUB -q test
	      #BSUB -o outfile -e errfile # my default stdout, stderr files
	      #BSUB -m "host1 host2" # my default candidate hosts
	      #BSUB -f "input > tmp" -f "output << tmp"
	      #BSUB -D 200 -c 10/host1
	      #BSUB -t 13:00
	      #BSUB -k "dir 5"
	      sim1.exe
	      sim2.exe

	      The job is submitted to the night queue instead of test, because
	      the command line overrides the script.




       % bsub -b 20:00 -J my_job_name


	      bsub> sleep 1800
	      bsub> my_program
	      bsub> CTRL-D

	      The job commands are entered interactively.




LIMITATIONS




       File transfer via the -f option to bsub(1) requires rcp(1) to be work-
       ing  between  the submission and execution hosts. Use the -N option to
       request mail, and/or the -o and -e options to specify an	 output	 file
       and error file, respectively.


SEE ALSO




       bjobs(1),   bkill(1),   bqueues(1),  bhosts(1),	bmod(1),  bchkpnt(1),
       lsb.queues(5), lsb.params(5), lsb.hosts(5)






		       January 2005   Platform Lava 6.1		      bsub(1)
bswitch(1)							   bswitch(1)



NAME
       bswitch	- switches unfinished jobs from one queue to another


SYNOPSIS




       bswitch	[-J  job_name] [-m host_name] [-q queue_name] [-u user_name |
       -u all] destination_queue [0]


       bswitch destination_queue [job_ID] ...


       bswitch [-h | -V]


DESCRIPTION




       Switches one or more of your unfinished jobs to the  specified  queue.
       Lava administrators and root can switch jobs submitted by other users.


       By default, switches one job, the most recently submitted job, or  the
       most  recently  submitted  job  that  also  satisfies  other specified
       options (-m, -q, -u, or -J). Specify 0 (zero) to switch multiple
       jobs.
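
       The default selection rule can be modeled with a small Python sketch.
       The job records and the function jobs_to_switch are hypothetical; the
       list is assumed to be in submission order, oldest first, and the
       switch_all flag stands for the zero argument.

```python
def jobs_to_switch(jobs, job_name=None, host=None, queue=None, switch_all=False):
    """Select the jobs a bswitch invocation would act on (simplified model)."""
    selected = [
        job for job in jobs
        if (job_name is None or job["name"] == job_name)
        and (host is None or job["host"] == host)
        and (queue is None or job["queue"] == queue)
    ]
    if switch_all:
        return selected       # zero given: every matching job
    return selected[-1:]      # default: only the most recently submitted match


jobs = [
    {"id": 101, "name": "sim", "host": "host1", "queue": "normal"},
    {"id": 102, "name": "sim", "host": "host2", "queue": "normal"},
    {"id": 103, "name": "post", "host": "host1", "queue": "short"},
]
print([job["id"] for job in jobs_to_switch(jobs, queue="normal")])
print([job["id"] for job in jobs_to_switch(jobs, queue="normal", switch_all=True)])
```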


       The switch operation can be done only if a specified job is acceptable
       to the new queue as if it were submitted to it, and, in case  the  job
       has  been  dispatched  to  a  host, if the host can be used by the new
       queue. If the switch operation is unsuccessful, the job stays where it
       is.


       If  a  switched job has not been dispatched, then its behavior will be
       as if it were submitted to the new queue in the first place.


       If a switched job has been dispatched, then it will be  controlled  by
       the  loadSched and loadStop vectors and other configuration parameters
       of the new queue, but its nice value and resource limits	 will  remain
       the same.


       Also,  if a switched job has been dispatched, it will be controlled by
       the PRIORITY and RUN_WINDOW configuration parameters of the new queue.


       The  bswitch  command is useful to change a job's attributes inherited
       from the queue.


OPTIONS




       0


	      (Zero). Switches multiple jobs. Switches all the jobs that sat-
	      isfy other specified options (-m, -q, -u and -J).




       -J job_name



	      Only switches jobs that have the specified job name.




       -m host_name



	      Only switches jobs dispatched to the specified host.




       -q queue_name



	      Only switches jobs in the specified queue.




       -u user_name | -u all



	      Only  switches  jobs  submitted  by  the specified user, or all
	      users if you specify the keyword all.




       destination_queue



	      Required. Specify the queue to which the job is to be moved.




       job_ID ...



	      Switches only the specified jobs.




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




SEE ALSO




       bqueues(1), bhosts(1), bsub(1), bjobs(1)






		      November 2004   Platform Lava 6.1		   bswitch(1)
btop(1)								      btop(1)



NAME
       btop  - moves a pending job relative to the first job in the queue


SYNOPSIS




       btop job_ID [position]


       btop [-h | -V]


DESCRIPTION




       Changes	the  queue  position of a pending job, to affect the order in
       which jobs are considered for dispatch.


       By default, Lava dispatches jobs in a queue  in	the  order  of	their
       arrival (that is, first-come-first-served), subject to availability of
       suitable server hosts.


       The btop command allows users and the Lava administrator	 to  manually
       change  the order in which jobs are considered for dispatch. Users can
       only operate on their own jobs, whereas	the  Lava  administrator  can
       operate	on  any user's jobs. Users can only change the relative posi-
       tion of their own jobs.


       If invoked by the Lava administrator,  btop  moves  the	selected  job
       before  the  first  job with the same priority submitted to the queue.
       The positions of all users' jobs in the queue can be  changed  by  the
       Lava administrator.


       If  invoked  by a regular user, btop moves the selected job before the
       first job with the same priority submitted by the user to  the  queue.
       Pending jobs are displayed by bjobs in the order in which they will be
       considered for dispatch.


OPTIONS




       job_ID


	      Required. Job ID of the job on which to operate.




       position



	      Optional. The position argument can be  specified	 to  indicate
	      where in the queue the job is to be placed. position is a posi-
	      tive number that indicates the target position of the job	 from
	      the  beginning of the queue. The positions are relative to only
	      the applicable jobs in the  queue,  depending  on	 whether  the
	      invoker  is  a  regular  user  or	 the  Lava administrator. The
	      default value of 1 means the position is before all  the	other
	      jobs in the queue that have the same priority.
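
	      The effect of position can be pictured with a Python sketch
	      that reorders a list of pending job IDs. It is a simplified
	      model with a hypothetical function name: it assumes all
	      listed jobs share one priority and belong to the invoker.

```python
def move_to_position(pending, job_id, position=1):
    """Return a new dispatch order with job_id at the 1-based position."""
    if job_id not in pending:
        raise ValueError(f"job {job_id} is not pending in this queue")
    order = [job for job in pending if job != job_id]
    order.insert(position - 1, job_id)
    return order


pending = [101, 102, 103, 104]
print(move_to_position(pending, 104))      # default position 1
print(move_to_position(pending, 104, 2))
```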




       -h


	      Prints command usage to stderr and exits.




       -V


	      Prints Lava release version to stderr and exits.




SEE ALSO




       bbot(1), bjobs(1), bswitch(1)




		      November 2004   Platform Lava 6.1		      btop(1)