Basic Usage

DynRM provides a few high-level tools for basic usage. For further customization, refer to the Advanced Usage chapter.

Basic DynRM runs

DynRM offers a basic command to run a dedicated DynRM instance for executing a single job or job mix:

dynrm_run
    --topology              Path to the topology used in this run
    --submission            Path to submission file (job batch file or job mix file) executed by this run
    --system                System class name to be used in this run
    --policy                Scheduling policy to be used in this run
    --policy_params         Parameters for the scheduling policy
    --output                Path to directory where output files should be written to
    --verbosity             Verbosity level to be used in this run (0 - 11)
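
A run can also be scripted. The following is a minimal sketch that assembles a dynrm_run command line from Python using only the options listed above. The file names, the system class name and the policy name are hypothetical placeholders, and --policy_params is passed through as an opaque string because its format is not described here.

```python
import shlex


def build_dynrm_run_command(topology, submission, system, policy,
                            output, policy_params=None, verbosity=0):
    """Assemble the argument list for a dynrm_run invocation."""
    cmd = [
        "dynrm_run",
        "--topology", topology,
        "--submission", submission,
        "--system", system,
        "--policy", policy,
        "--output", output,
        "--verbosity", str(verbosity),
    ]
    # Only pass policy parameters when the caller supplied some.
    if policy_params is not None:
        cmd += ["--policy_params", policy_params]
    return cmd


# All values below are placeholders, not names shipped with DynRM.
cmd = build_dynrm_run_command(
    topology="topology.yaml",
    submission="example_submission.batch",
    system="MySystemClass",
    policy="MyPolicyClass",
    output="./output",
    verbosity=2,
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) to start the run.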

Topologies

DynRM requires information about the topology to be managed. The default topology creation module supports yaml files with the following syntax:

topology:
    nodes:
        [NODE_NAME]:
            num_cores: [NUM_CORES]
    ...

Example of a topology consisting of two nodes, each with 8 cores:

topology:
    nodes:
        n1:
            num_cores: 8
        n2:
            num_cores: 8
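
For larger homogeneous systems, writing such a file by hand becomes tedious. The sketch below generates a topology in the syntax shown above. The helper name make_topology_yaml and the node naming scheme (a prefix followed by a running index) are assumptions made for illustration; they are not part of DynRM.

```python
def make_topology_yaml(num_nodes, cores_per_node, prefix="n"):
    """Return a topology description with identical nodes as YAML text."""
    lines = ["topology:", "    nodes:"]
    for i in range(1, num_nodes + 1):
        lines.append(f"        {prefix}{i}:")
        lines.append(f"            num_cores: {cores_per_node}")
    return "\n".join(lines) + "\n"


# Reproduces the two-node example: n1 and n2 with 8 cores each.
topology_text = make_topology_yaml(2, 8)
print(topology_text)
```

The returned text can be written to a file and passed to dynrm_run via --topology.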

Job Submissions

DynRM requires the submission of a “Task Graph”, consisting of jobs and job dependencies.

To specify resource (re-)assignments, Output Space Generators are used.

The yaml submission module expects a yaml file with the .batch file extension and the following structure:

tasks:
    -   name :  [Task Name]                      # e.g. "task1"
        executable : [Name of executable]       # Absolute or relative path, or binary name
        arguments :                             # Arguments to be passed to executable
            - arg1                              # First <String> argument
            - arg2                              # Second <String> argument
            - ...
        launch_generator:                       # Information about job launch (see Output Space Generators)
            model: [PSetModel class]            # [OPTIONAL] PsetModel class to use for output PSet. E.g. "AmdahlPsetModel"
            model_params:                       # [OPTIONAL] Parameters of the PsetModel
                key1 : value1                   # [OPTIONAL] default =  t_s : 1
                key2 : value2                   # [OPTIONAL] default =  t_p : 200
            mapping : "[NUM_PROCS]:node"        # [OPTIONAL] Mapping of output PSets (Number of processes per node)
            num_procs : [NUM_PROCS]             # [OPTIONAL] Fixed number of processes to start


        generators:                                 # [OPTIONAL] Define Output Space Generators to be referenced by the applications
            -   key : [generator_key]               # Key to be used to reference this Output Space Generator
                function : [generator_function]     # See Output Space Generators "output_space_generator_replace"
                model: [PSetModel class]            # [OPTIONAL] PsetModel class to use for output PSet. E.g. "AmdahlPsetModel"
                model_params:                       # [OPTIONAL] Parameters of the PsetModel
                    key1 : [value1]                 # [OPTIONAL] default =  t_s : 1
                    key2 : [value2]                 # [OPTIONAL] default =  t_p : 200
                mapping : "[NUM_PROCS]:node"        # [OPTIONAL] Mapping of output PSets (Number of processes per node)
                num_procs_add : [NUM_PROCS]         # [OPTIONAL] Fixed number of processes to add
                num_procs_sub : [NUM_PROCS]         # [OPTIONAL] Fixed number of processes to remove
                max_procs : [NUM_PROCS]             # [OPTIONAL] Maximum number of processes after reconfiguration
                min_procs : [NUM_PROCS]             # [OPTIONAL] Minimum number of processes after reconfiguration
                power2 : [true/false]               # [OPTIONAL] Allow only power-of-2 numbers of processes
                factor : [FACTOR]                   # [OPTIONAL] Fixed factor between number of processes of input and output

runtime : [SECONDS]                                 # Estimated runtime of task graph (not considered by all policies)
num_nodes : [NUM_NODES]                             # Fixed number of nodes to allocate (Only for static scheduling)

Example of a job starting with 8 processes on 1 node. During runtime, processes can be added or removed, up to a maximum of 64 processes, allowing only power-of-2 process counts. That is, the valid configurations are 8, 16, 32, or 64 processes running on 1, 2, 4, or 8 nodes, respectively:

example_submission.batch
tasks:
    -   name :  "Example Task"
        executable : /path/to/my_executable
        arguments :
            - "a_positional_arg"
            - "--key"
            - "value"
        launch_generator:
            model: "AmdahlPsetModel"
            model_params:
                t_s : 1
                t_p : 200
            mapping : "8:node"
            num_procs : 8


        generators:
            -   key : "power2_reconf"
                function : "output_space_generator_replace"
                model: "AmdahlPsetModel"
                model_params:
                    t_s : 1
                    t_p : 200
                mapping : "8:node"
                max_procs : 64
                power2 : true

runtime : 3600
num_nodes : 1
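
The set of valid configurations for this example can be checked with a few lines of Python. This is only an illustration of the arithmetic, not DynRM's Output Space Generator: it assumes a lower bound of 8 processes (the launch size) and that the "8:node" mapping places exactly 8 processes on every node. The function names are hypothetical.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0


def valid_configurations(min_procs, max_procs, procs_per_node):
    """List (num_procs, num_nodes) pairs allowed under power2 and the mapping."""
    configs = []
    for procs in range(min_procs, max_procs + 1):
        # Keep only power-of-2 sizes that fill whole nodes.
        if is_power_of_two(procs) and procs % procs_per_node == 0:
            configs.append((procs, procs // procs_per_node))
    return configs


configs = valid_configurations(min_procs=8, max_procs=64, procs_per_node=8)
print(configs)  # [(8, 1), (16, 2), (32, 4), (64, 8)]
```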

Job Reconfiguration

Following the DPP design principles, job reconfigurations are expressed as Process Set Operations (PSet Operations). The Dynamic Open-MPI and Dynamic OpenPMIx libraries provide corresponding functions to specify PSet Operations.

These interfaces support the following arguments:

  • input_psets: The list of input PSets of the operation

  • psetop_type: The type of the PSet operation, e.g. ADD, SUB, GROW, SHRINK, REPLACE, SPLIT, UNION, DIFFERENCE, …

In addition, further information for DynRM can be provided, e.g. via an MPI_Info object:

  • model: The PSetOp model. The default models are named Default[Operation]Model, e.g. DefaultReplaceModel

  • generator_key: <String> Key to reference an output space generator specified in job submission script, e.g. “power2_reconf”

  • output_space_generator: <String> specifying the parameter bindings for the Output Space Generator function (using partial from the functools module). See Advanced Usage.

  • output_space_generator_json: A <json string> containing parameters for Output Space Generators

{
  "function": "[generator_function]",       // Output Space Generator function (e.g., "output_space_generator_replace")
  "model": "[PSetModel class]",             // OPTIONAL: PSetModel class used for output (e.g., "AmdahlPsetModel")

  "model_params": {                         // OPTIONAL: Parameters for the PSetModel
    "key1": "[value1]",                     // OPTIONAL: default = t_s : 1
    "key2": "[value2]"                      // OPTIONAL: default = t_p : 200
  },

  "mapping": "[NUM_PROCS]:node",            // OPTIONAL: Mapping of output PSets (processes per node)
  "num_procs_add": "[NUM_PROCS]",           // OPTIONAL: Fixed number of processes to add
  "num_procs_sub": "[NUM_PROCS]",           // OPTIONAL: Fixed number of processes to remove

  "max_procs": "[NUM_PROCS]",               // OPTIONAL: Maximum number of processes after reconfiguration
  "min_procs": "[NUM_PROCS]",               // OPTIONAL: Minimum number of processes after reconfiguration

  "power2": "[true/false]",                 // OPTIONAL: Restrict to powers of 2
  "factor": "[FACTOR]"                      // OPTIONAL: Fixed factor between input and output process counts
}
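
Since a misspelled key in this JSON string is easy to overlook, it can help to build the string through a small helper that only accepts the keys listed above. The following sketch is not part of DynRM; the helper name build_generator_json and the choice to raise ValueError on unknown keys are assumptions.

```python
import json

# Optional keys documented for output_space_generator_json.
ALLOWED_OPTIONS = {
    "model", "model_params", "mapping",
    "num_procs_add", "num_procs_sub",
    "max_procs", "min_procs", "power2", "factor",
}


def build_generator_json(function, **options):
    """Serialize Output Space Generator parameters, rejecting unknown keys."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown generator options: {sorted(unknown)}")
    payload = {"function": function}
    # Leave out options that were explicitly set to None.
    payload.update({k: v for k, v in options.items() if v is not None})
    return json.dumps(payload)


generator_json = build_generator_json(
    "output_space_generator_replace",
    model="AmdahlPsetModel",
    model_params={"t_s": 1, "t_p": 200},
    mapping="8:node",
    max_procs=64,
    power2=True,
)
print(generator_json)
```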

Example using MPI4py:

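import json

from mpi4py import MPI  # assumes the mpi4py build that accompanies Dynamic Open-MPI

# `session` is assumed to be an already initialized MPI Session object
input_psets = ['mpi://WORLD']
op = MPI.PSETOP_REPLACE

# Parameters for the Output Space Generator, serialized as a JSON string
generator_json = json.dumps(
    {
      "function": "output_space_generator_replace",
      "model": "AmdahlPsetModel",
      "model_params": {
        "t_s": 1,
        "t_p": 2
      },
      "mapping": "8:node",
      "max_procs": 64,
      "power2": True,
    }
)

# Attach the PSetOp model and the generator parameters to the request
info = MPI.Info.Create()
info.Set('model', 'DefaultReplaceModel')
info.Set('output_space_generator_json', generator_json)

req = session.Dyn_v2a_psetop_nb(op, input_psets, info)

# PSetOp is forwarded to DynRM. Use req.Test() to check for reconfiguration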
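
Because the PSet operation is non-blocking, the application typically keeps computing and polls the request between work steps. The sketch below shows one possible polling loop. It assumes that req.Test() returns a truthy value once the operation has completed, as Request.Test() does in standard mpi4py; the return value in the dynamic extension may differ. The names wait_for_psetop and FakeRequest are hypothetical, and FakeRequest only stands in for a real request so that the loop can be run without MPI.

```python
import time


def wait_for_psetop(req, do_work, poll_interval=0.0):
    """Run do_work() repeatedly until req.Test() reports completion.

    Returns the number of work steps executed while waiting.
    """
    steps = 0
    while not req.Test():
        do_work()
        steps += 1
        if poll_interval > 0:
            time.sleep(poll_interval)
    return steps


class FakeRequest:
    """Stand-in for a request: reports completion after a few polls."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    def Test(self):
        self._remaining -= 1
        return self._remaining < 0


work_log = []
steps = wait_for_psetop(FakeRequest(3), lambda: work_log.append("iteration"))
print(f"completed after {steps} work steps")
```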

Job Mix Submissions

Job mix files have a “.mix” file extension and are structured as csv files.

CSV structure

Job mix CSV columns

arrival_time_s    Arrival time (in seconds after job mix submission)
submission_path   Path to submission file (.batch file)
parameters        Optional dict of parameters

Supported parameters

  • terminate_soon: True/False - When True, after submission of this job the resource manager will terminate when all jobs have finished.

Example:

example_job_mix.mix
1,/path/to/executable1,
10,/path/to/executable10,
11,/path/to/executable11,
12,/path/to/executable12,
13,/path/to/executable13, {"terminate_soon": True}
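
To inspect or sanity-check a job mix before submitting it, the file can be read with a few lines of Python. The sketch below is not DynRM's own parser. It assumes that each line has the three fields described above, that the parameters field may be empty, and that the dict is written as a Python literal (note True rather than the JSON true), which is why ast.literal_eval is used instead of a JSON parser.

```python
import ast


def parse_mix_line(line):
    """Split one job mix line into (arrival_time_s, submission_path, parameters)."""
    # Split on the first two commas only: the dict itself may contain commas.
    fields = [field.strip() for field in line.split(",", 2)]
    fields += [""] * (3 - len(fields))
    arrival, path, params = fields
    return float(arrival), path, ast.literal_eval(params) if params else {}


def parse_mix(text):
    """Parse all non-empty lines of a job mix file."""
    return [parse_mix_line(line) for line in text.splitlines() if line.strip()]


example_mix = """\
1,/path/to/executable1,
10,/path/to/executable10,
13,/path/to/executable13, {"terminate_soon": True}
"""

jobs = parse_mix(example_mix)
for arrival, path, params in jobs:
    print(arrival, path, params)
```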