Configuration Schema

MariusConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| model | ModelConfig | Defines model architecture, learning task, optimizers and loss function. | Yes |
| storage | StorageConfig | Defines the input graph and how to store the graph (edges, features) and learned model (embeddings). | Yes |
| training | TrainingConfig | Hyperparameters for training. | For training |
| evaluation | EvaluationConfig | Hyperparameters for evaluation. | For evaluation |

Below is a sample end-to-end configuration file for link prediction on the fb15k_237 dataset. The model consists of an embedding layer in the encoder phase whose output is fed directly to the DISTMULT decoder. Both embeddings and edges are stored in CPU memory.

model:
  learning_task: LINK_PREDICTION
  encoder:
    layers:
      - - type: EMBEDDING
          output_dim: 50
          bias: true
          init:
            type: GLOROT_NORMAL
  decoder:
    type: DISTMULT
  loss:
    type: SOFTMAX_CE
    options:
      reduction: SUM
  dense_optimizer:
    type: ADAM
    options:
      learning_rate: 0.01
  sparse_optimizer:
    type: ADAGRAD
    options:
      learning_rate: 0.1
storage:
  full_graph_evaluation: true
  device_type: cpu
  dataset:
    dataset_dir: /home/data/datasets/fb15k_237/
  edges:
    type: DEVICE_MEMORY
    options:
      dtype: int
  embeddings:
    type: DEVICE_MEMORY
    options:
      dtype: float
training:
  batch_size: 1000
  negative_sampling:
    num_chunks: 10
    negatives_per_positive: 10
    degree_fraction: 0
    filtered: false
  num_epochs: 10
  pipeline:
    sync: true
  epochs_per_shuffle: 1
  logs_per_epoch: 10
  resume_training: false
evaluation:
  batch_size: 1000
  negative_sampling:
    filtered: true
  epochs_per_eval: 1
  pipeline:
    sync: true

Model Configuration

ModelConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| random_seed | Int | Random seed used to initialize, train, and evaluate the model. If not given, a seed will be generated. | No |
| learning_task | String | Learning task for which the model is used. Valid values are [“LINK_PREDICTION”, “NODE_CLASSIFICATION”] (case insensitive). “LP” and “NC” can be used as shorthand. | Yes |
| encoder | EncoderConfig | Defines the architecture of the encoder and the configuration of neighbor samplers. | Yes |
| decoder | DecoderConfig | Denotes the decoder to apply to the output of the encoder. The decoder is learning-task specific. | Yes |
| loss | LossConfig | Loss function to apply over the output of the decoder. | For training |
| dense_optimizer | OptimizerConfig | Optimizer to use for the dense model parameters, i.e. all parameters besides the node embeddings (node embeddings are handled by the sparse_optimizer). | For training |
| sparse_optimizer | OptimizerConfig | Optimizer to use for the node embedding parameters. Currently only ADAGRAD is supported. | No |

Below is a full view of the model attribute and the corresponding parameters that can be set in the model configuration. It consists of an embedding layer in the encoder phase and a DISTMULT decoder.

model:
  random_seed: 456356765463
  learning_task: LINK_PREDICTION
  encoder:
    layers:
      - - type: EMBEDDING
          output_dim: 50
          bias: true
          init:
            type: GLOROT_NORMAL
          optimizer:
            type: DEFAULT
            options:
              learning_rate: 0.1
  decoder:
    type: DISTMULT
    options:
      inverse_edges: true
      use_relation_features: false
      edge_decoder_method: CORRUPT_NODE
    optimizer:
      type: ADAGRAD
      options:
        learning_rate: 0.1
  loss:
    type: SOFTMAX_CE
    options:
      reduction: SUM
  dense_optimizer:
    type: ADAM
    options:
      learning_rate: 0.01
  sparse_optimizer:
    type: ADAGRAD
    options:
      learning_rate: 0.1

Encoder Configuration

EncoderConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| layers | List[List[LayerConfig]] | Defines the architecture of the encoder. Layers are grouped into stages, where the layers within a stage are executed in parallel and the output of each stage is the input to the next stage. | Yes |
| train_neighbor_sampling | List[NeighborSamplingConfig] | Sets the neighbor sampling configuration for each GNN layer for training (and for evaluation if eval_neighbor_sampling is not set). Defined as a list of neighbor sampling configurations, where the size of the list must match the number of GNN layers in the encoder. | Only for GNNs |
| eval_neighbor_sampling | List[NeighborSamplingConfig] | Sets the neighbor sampling configuration for each GNN layer for evaluation, with the same list-length requirement as train_neighbor_sampling. If this field is not set, the sampling configuration used for training is also used for evaluation. | No |

The below example depicts a configuration where there is one embedding layer, followed by three GNN layers.

encoder:
  train_neighbor_sampling:
    - type: ALL
    - type: ALL
    - type: ALL
  eval_neighbor_sampling:
    - type: ALL
    - type: ALL
    - type: ALL
  layers:
    - - type: EMBEDDING
        output_dim: 10
        bias: true
        init:
          type: GLOROT_NORMAL

    - - type: GNN
        options:
          type: GAT
        input_dim: 10
        output_dim: 10
        bias: true
        init:
          type: GLOROT_NORMAL

    - - type: GNN
        options:
          type: GAT
        input_dim: 10
        output_dim: 10
        bias: true
        init:
          type: GLOROT_NORMAL

    - - type: GNN
        options:
          type: GAT
        input_dim: 10
        output_dim: 10
        bias: true
        init:
          type: GLOROT_NORMAL
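
Stages allow multiple layers to run in parallel over the same input. As an illustration only (a hypothetical configuration, not taken verbatim from the Marius documentation), the sketch below places an EMBEDDING layer and a FEATURE layer in the same stage and combines their 50-dimensional outputs with a LINEAR reduction layer; it assumes the dataset provides 50-dimensional node features.

encoder:
  layers:
    - - type: EMBEDDING
        output_dim: 50
      - type: FEATURE
        output_dim: 50
    - - type: REDUCTION
        input_dim: 100
        output_dim: 50
        options:
          type: LINEAR
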
NeighborSamplingConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | Denotes the type of the neighbor sampling layer. Options: [“ALL”, “UNIFORM”, “DROPOUT”]. | Yes |
| options | NeighborSamplingOptions | Specific options depending on the type of sampling layer. | No |

In the following configuration snippet, the GNN layer samples all neighbors for a given node during training. All neighbors with incoming edges to the given node are sampled while the outgoing edges are ignored.

train_neighbor_sampling:
  - type: ALL
    use_incoming_nbrs: true
    use_outgoing_nbrs: false

UniformSamplingOptions[NeighborSamplingOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| max_neighbors | Int | Number of neighbors to sample in a given uniform sampling layer. | Yes |

The below configuration fits an encoder with two GNN layers. It specifies that at most 10 neighbors will be sampled for any given node during training.

train_neighbor_sampling:
  - type: UNIFORM
    options:
      max_neighbors: 10
  - type: UNIFORM
    options:
      max_neighbors: 10

DropoutSamplingOptions[NeighborSamplingOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| rate | Float | The dropout rate for a dropout layer. | Yes |

DROPOUT-mode neighbor sampling randomly drops rate * 100 percent of the neighbors during sampling. With the rate of 0.05 below, roughly 5 percent of each node's neighbors are dropped.

train_neighbor_sampling:
  - type: DROPOUT
    options:
      rate: 0.05

Layer Configuration

LayerConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | Denotes the type of layer. Options: [“EMBEDDING”, “FEATURE”, “GNN”, “REDUCTION”]. | Yes |
| options | LayerOptions | Layer-specific options depending on the type. | No |
| input_dim | Int | The dimension of the input to the layer. | GNN and Reduction layers |
| output_dim | Int | The output dimension of the layer. | Yes |
| init | InitConfig | Initialization method for the layer parameters. (Default GLOROT_UNIFORM) | No |
| optimizer | OptimizerConfig | Optimizer to use for the parameters of this layer. If not given, the dense_optimizer is used. | No |
| bias | Bool | Enable a bias to be applied to the output of the layer. (Default False) | No |
| bias_init | InitConfig | Initialization method for the bias. The default initialization is zeros. | No |
| activation | String | Activation function to apply to the output of the layer. Options: [“RELU”, “SIGMOID”, “NONE”]. (Default “NONE”) | No |

Below is a configuration for creating an embedding layer with output dimension 50. Its weights are initialized with GLOROT_NORMAL, its bias is initialized with zeros, and no activation is set.

layers:
- - type: EMBEDDING
    input_dim: -1
    output_dim: 50
    init:
      type: GLOROT_NORMAL
    optimizer:
      type: DEFAULT
      options:
        learning_rate: 0.1
    bias: true
    bias_init:
      type: ZEROS
    activation: NONE

A GNN layer of type GAT (Graph Attention) with input and output dimension of 50 is as follows.

layers:
- - type: GNN
    options:
      type: GAT
    input_dim: 50
    output_dim: 50
    bias: true
    init:
      type: GLOROT_NORMAL

A Reduction layer of type LINEAR, with an input dimension of 100 and an output dimension of 50, is as follows. Such a layer can, for example, project the concatenation of a 50-dimensional embedding and a 50-dimensional feature vector back down to 50 dimensions.

layers:
- - type: REDUCTION
    input_dim: 100
    output_dim: 50
    bias: true
    options:
      type: LINEAR

Below is a simple Feature layer with output dimension of 50. The input dimension is set to -1 by default since both Feature and Embedding layers do not have any input.

layers:
- - type: FEATURE
    output_dim: 50
    bias: true

Layer Options

GNN Layer Options

GraphSageLayerOptions[LayerOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | The type of the GNN layer; for GraphSage this must be equal to “GRAPH_SAGE”. | Yes |
| aggregator | String | Aggregation to use for GraphSage. Options are [“GCN”, “MEAN”]. (Default “MEAN”) | No |

A GNN layer of type GRAPH_SAGE with the aggregator set to MEAN. Another possible option is GCN (Graph Convolution).

- - type: GNN
    options:
      type: GRAPH_SAGE
      aggregator: MEAN

GATLayerOptions[LayerOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | The type of the GNN layer; for GAT this must be equal to “GAT”. | Yes |
| num_heads | Int | Number of attention heads to use. (Default 10) | No |
| average_heads | Bool | If true, the attention heads will be averaged; otherwise they will be concatenated. (Default True) | No |
| negative_slope | Float | Negative slope to use for LeakyReLU. (Default 0.2) | No |
| input_dropout | Float | Dropout rate to apply to the input to the layer. (Default 0.0) | No |
| attention_dropout | Float | Dropout rate to apply to the attention weights. (Default 0.0) | No |

A GNN layer of type GAT (Graph Attention) with 50 attention heads. input_dropout is set to 0.1, implying that 10 percent of the input tensor values will be randomly dropped.

- - type: GNN
    options:
      type: GAT
      num_heads: 50
      average_heads: True
      input_dropout: 0.1

Reduction Layer Options

ReductionLayerOptions[LayerOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | The type of the reduction layer. Options are: [“CONCAT”, “LINEAR”]. (Default “CONCAT”) | Yes |

A reduction layer of type LINEAR. Another possible type for the reduction layer is CONCAT.

- - type: REDUCTION
    options:
      type: LINEAR

Initialization Configuration

InitConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | The type of the initialization. Options are: [“GLOROT_UNIFORM”, “GLOROT_NORMAL”, “UNIFORM”, “NORMAL”, “ZEROS”, “ONES”, “CONSTANT”]. (Default “GLOROT_UNIFORM”) | Yes |
| options | InitOptions | Initialization-specific options depending on the type. | No |

init:
  type: GLOROT_NORMAL
  options: {}
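
The Glorot (Xavier) schemes scale the initialization by the layer's fan-in and fan-out. For reference, the standard definitions (textbook formulas, not specific to Marius) are:

  GLOROT_UNIFORM: \( W \sim \mathcal{U}(-a, a), \quad a = \sqrt{6 / (\mathrm{fan\_in} + \mathrm{fan\_out})} \)
  GLOROT_NORMAL:  \( W \sim \mathcal{N}(0, \sigma^2), \quad \sigma = \sqrt{2 / (\mathrm{fan\_in} + \mathrm{fan\_out})} \)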

Uniform Init Options

UniformInitOptions[InitOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| scale_factor | Float | The scale factor of the uniform distribution. (Default 1) | No |

The below configuration is used to initialize a layer with values drawn uniformly from [-scale_factor, +scale_factor].

init:
  type: UNIFORM
  options:
    scale_factor: 1

Normal Init Options

NormalInitOptions[InitOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| mean | Float | The mean of the distribution. (Default 0.0) | No |
| std | Float | The standard deviation of the distribution. (Default 1.0) | No |

The below configuration is used to initialize a layer with values drawn from a normal distribution with mean 0.5 and standard deviation 0.1.

init:
  type: NORMAL
  options:
    mean: 0.5
    std: 0.1

Constant Init Options

ConstantInitOptions[InitOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| constant | Float | The value to which all parameters are set. (Default 0.0) | No |

CONSTANT initialization mode initializes all parameters of the layer to the specified constant value.

init:
  type: CONSTANT
  options:
    constant: 0.4

Decoder Configuration

DecoderConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | Denotes the type of decoder. Options: [“DISTMULT”, “TRANSE”, “COMPLEX”, “NODE”]. The first three are decoders for link prediction; the “NODE” decoder is used for node classification. | Yes |
| options | DecoderOptions | Decoder-specific options depending on the type. | No |
| optimizer | OptimizerConfig | Optimizer to use for the parameters of the decoder (if any). If not given, the dense_optimizer is used. | No |

Below is a DISTMULT decoder with an Adagrad optimizer that optimizes the loss function over edges as well as their inverses (dst->rel->src).

decoder:
  type: DISTMULT
  options:
    inverse_edges: true
  optimizer:
    type: ADAGRAD
    options:
      learning_rate: 0.1
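
For reference, the standard DistMult scoring function (not specific to this configuration) scores an edge (s, r, d) as the trilinear product of the source, relation, and destination embeddings:

  \( \mathrm{score}(s, r, d) = \langle \theta_s, \theta_r, \theta_d \rangle = \sum_i \theta_{s,i} \, \theta_{r,i} \, \theta_{d,i} \)

With inverse_edges enabled, a second embedding per relation is used to additionally score the inverse triples (dst->rel->src), as described in the options table below.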

Decoder Options

Edge Decoder Options

EdgeDecoderOptions[DecoderOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| inverse_edges | Bool | If true, the decoder will use two embeddings per edge-type (relation): one applied to the source node of an edge and the other to the destination node. Furthermore, the scores of the inverses of the edges (dst->rel->src) will be computed and used in the loss. (Default True) | No |
| edge_decoder_method | String | Specifies how to apply the decoder to a given set of edges and negatives. Options are [“infer”, “train”]. (Default “train”) | No |

decoder:
  type: DISTMULT
  options:
    inverse_edges: true
    edge_decoder_method: CORRUPT_NODE

Loss Configuration

LossConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | Denotes the type of the loss function. Options: [“SOFTMAX_CE”, “RANKING”, “CROSS_ENTROPY”, “BCE_AFTER_SIGMOID”, “BCE_WITH_LOGITS”, “MSE”, “SOFTPLUS”]. | Yes |
| options | LossOptions | Loss function-specific options depending on the type. | No |

Below is the configuration for a SOFTMAX_CE loss function with SUM as the reduction method.

loss:
  type: SOFTMAX_CE
  options:
    reduction: SUM
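
As a point of reference (the exact formulation in Marius may differ in details), softmax cross-entropy for link prediction typically treats each positive edge together with its sampled negatives as one multi-class problem, with the positive as the correct class:

  \( \mathcal{L} = -\sum_{e} \log \frac{\exp(\mathrm{score}(e))}{\exp(\mathrm{score}(e)) + \sum_{e' \in \mathrm{Neg}(e)} \exp(\mathrm{score}(e'))} \)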

Loss Options

LossOptions

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| reduction | String | The reduction to use for the loss. Options are [“SUM”, “MEAN”]. (Default “SUM”) | No |

Below is the configuration for a SOFTMAX_CE loss function with MEAN as the reduction method.

loss:
  type: SOFTMAX_CE
  options:
    reduction: MEAN

RankingLossOptions[LossOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| reduction | String | The reduction to use for the loss. Options are [“SUM”, “MEAN”]. (Default “SUM”) | No |
| margin | Float | The margin for the ranking loss function. (Default 0.1) | No |

Below is the configuration for a RANKING loss function with margin set to 1.

loss:
  type: RANKING
  options:
    reduction: SUM
    margin: 1
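
For intuition, a common form of margin-based ranking loss (the precise form used by Marius may vary) penalizes negatives that score within the margin \( \gamma \) of their positive:

  \( \mathcal{L} = \sum_{e} \sum_{e' \in \mathrm{Neg}(e)} \max(0, \ \gamma - \mathrm{score}(e) + \mathrm{score}(e')) \)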

Optimizer Configuration

OptimizerConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | Denotes the type of the optimizer. Options: [“SGD”, “ADAM”, “ADAGRAD”]. | Yes |
| options | OptimizerOptions | Optimizer-specific options depending on the type. | No |

The configuration for an ADAGRAD optimizer with a learning rate of 0.1 is as follows.

optimizer:
  type: ADAGRAD
  options:
    learning_rate: 0.1

SGD Options

SGDOptions[OptimizerOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| learning_rate | Float | SGD learning rate. (Default 0.1) | No |

optimizer:
  type: SGD
  options:
    learning_rate: 0.1

Adagrad Options

AdagradOptions[OptimizerOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| learning_rate | Float | Adagrad learning rate. (Default 0.1) | No |
| eps | Float | Term added to the denominator to improve numerical stability. (Default 1e-10) | No |
| init_value | Float | Initial accumulator value. (Default 0.0) | No |
| lr_decay | Float | Learning rate decay. (Default 0.0) | No |
| weight_decay | Float | Weight decay (L2 penalty). (Default 0.0) | No |

The below configuration shows the options that can be set for the ADAGRAD optimizer.

optimizer:
  type: ADAGRAD
  options:
    learning_rate: 0.1
    eps: 1.0e-10
    init_value: 0.0
    lr_decay: 0.0
    weight_decay: 0.0
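
For reference, these options parameterize the standard Adagrad update (a textbook description, not Marius-specific), which accumulates squared gradients and scales each step accordingly:

  \( G_t = G_{t-1} + g_t^2, \qquad \theta_t = \theta_{t-1} - \frac{lr_t}{\sqrt{G_t} + \epsilon} \, g_t \)

where the accumulator starts at init_value and \( lr_t \) decays from learning_rate according to lr_decay.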

Adam Options

AdamOptions[OptimizerOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| learning_rate | Float | Adam learning rate. (Default 0.1) | No |
| amsgrad | Bool | Whether to use the AMSGrad variant of ADAM. | No |
| beta_1 | Float | Coefficient used for computing the running average of the gradient. (Default 0.9) | No |
| beta_2 | Float | Coefficient used for computing the running average of the squared gradient. (Default 0.999) | No |
| eps | Float | Term added to the denominator to improve numerical stability. (Default 1e-8) | No |
| weight_decay | Float | Weight decay (L2 penalty). (Default 0.0) | No |

The below configuration shows the options that can be set for the ADAM optimizer.

optimizer:
  type: ADAM
  options:
    learning_rate: 0.01
    amsgrad: false
    beta_1: 0.9
    beta_2: 0.999
    eps: 1.0e-08
    weight_decay: 0.0
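
For reference, beta_1 and beta_2 enter the standard Adam update (again a textbook description, not Marius-specific) as the decay rates of the first- and second-moment estimates:

  \( m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \)
  \( \theta_t = \theta_{t-1} - lr \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \)

where \( \hat{m}_t \) and \( \hat{v}_t \) are the bias-corrected moment estimates.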

Storage Configuration

StorageConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| device_type | String | Whether to use CPU or GPU training. Options are [“CPU”, “CUDA”]. (Default “CPU”) | No |
| dataset | DatasetConfig | Contains information about the input dataset. | Yes |
| edges | StorageBackendConfig | Storage backend of the edges. (Default edges.type = DEVICE_MEMORY, edges.options.dtype = int32) | No |
| embeddings | StorageBackendConfig | Storage backend of the node embeddings. (Default embeddings.type = DEVICE_MEMORY, embeddings.options.dtype = float32) | No |
| features | StorageBackendConfig | Storage backend of the node features. (Default features.type = DEVICE_MEMORY, features.options.dtype = float32) | No |
| prefetch | Bool | If true and the nodes/features storage configuration uses a partition buffer, node partitions and edge buckets will be prefetched. Note that this introduces additional memory overheads. (Default True) | No |
| full_graph_evaluation | Bool | If true and the nodes/features storage configuration uses a partition buffer, evaluation will be performed with the full graph in memory (if there is enough memory). This is useful for fair comparisons across different storage configurations. (Default False) | No |
| model_dir | String | Directory in which to save the model parameters. If not specified, the model is stored in a model_x directory within dataset_dir, where x increments from 0 to 10; at most 11 models are stored this way, after which the contents of the model_10/ directory are overwritten with the latest parameters. | No |

Below is a storage configuration that contains the path to the pre-processed data and specifies storage backends to be used for edges, features and embeddings.

storage:
  device_type: cpu
  dataset:
    dataset_dir: /home/data/datasets/fb15k_237/
  edges:
    type: DEVICE_MEMORY
    options:
      dtype: int
  nodes:
    type: DEVICE_MEMORY
    options:
      dtype: int
  embeddings:
    type: DEVICE_MEMORY
    options:
      dtype: float
  features:
    type: DEVICE_MEMORY
    options:
      dtype: float
  prefetch: true
  shuffle_input: true
  full_graph_evaluation: true
  export_encoded_nodes: true
  log_level: info
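
model_dir can also be set explicitly to control where the trained parameters are written; the path in the sketch below is purely illustrative:

storage:
  dataset:
    dataset_dir: /home/data/datasets/fb15k_237/
  model_dir: /home/data/models/fb15k_237_model/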

Dataset Configuration

DatasetConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| dataset_dir | String | Directory containing the preprocessed dataset. Also used to store model parameters and the embedding table. | Yes |
| num_edges | Int | Number of edges in the input graph. For link prediction, this should be set to the number of training edges. | No |
| num_nodes | Int | Number of nodes in the input graph. | No |
| num_relations | Int | Number of relations (edge-types) in the input graph. (Default 1) | No |
| num_train | Int | Number of training examples. In link prediction the examples are edges; in node classification they are nodes. | No |
| num_valid | Int | Number of validation examples. If not given, no validation will be performed. | No |
| num_test | Int | Number of test examples. If not given, only training will occur. | No (Evaluation) |
| node_feature_dim | Int | Dimension of the node features, if any. | No |
| num_classes | Int | Number of class labels. | No (Node classification) |

For built-in Marius datasets, these statistics are retrieved from the output of marius_preprocess. For custom user datasets, a file with the dataset statistics mentioned above must be present in dataset_dir. Below is the configuration for the fb15k_237 dataset.

storage:
  dataset:
    dataset_dir: /home/data/datasets/fb15k_237/

Storage Backend Configuration

StorageBackendConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| type | String | The type of storage backend to use. The valid options depend on the data being stored. For edges, the valid backends are [“FLAT_FILE”, “HOST_MEMORY”, “DEVICE_MEMORY”]. For embeddings and features, the valid choices are [“PARTITION_BUFFER”, “HOST_MEMORY”, “DEVICE_MEMORY”]. | Yes |
| options | StorageOptions | Storage backend options depending on the type of storage. | No |

The below configuration specifies that the edges be stored in DEVICE_MEMORY, i.e., CPU or GPU memory depending on device_type.

edges:
  type: DEVICE_MEMORY
  options:
    dtype: int

Storage Backend Options

StorageOptions

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| dtype | String | The datatype of the storage. Valid options: [“FLOAT”, “FLOAT32”, “DOUBLE”, “FLOAT64”, “INT”, “INT32”, “LONG”, “INT64”]. The default depends on the data being stored: for edges it is “INT32”, otherwise it is “FLOAT32”. | No |

A configuration defining the datatype of the input edges as int.

edges:
  options:
    dtype: int

PartitionBufferOptions[StorageOptions]

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| dtype | String | The datatype of the storage. Valid options: [“FLOAT”, “FLOAT32”, “DOUBLE”, “FLOAT64”]. (Default “FLOAT32”) | No |
| num_partitions | Int | Number of node partitions. | Yes |
| buffer_capacity | Int | Number of partitions which can fit in the buffer. | Yes |
| prefetching | Bool | If true, partitions will be prefetched and written to storage asynchronously. This prevents IO wait times at the cost of additional memory overheads. (Default True) | No |

Below is a disk-based storage configuration, where at most buffer_capacity embedding partitions are stored in memory at any given time. The dataset must be partitioned using marius_preprocess with --num_partitions set accordingly.

embeddings:
  type: PARTITION_BUFFER
  options:
    dtype: float
    num_partitions: 10
    buffer_capacity: 5
    prefetching: true
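
As a rough sanity check on memory (an estimate, ignoring optimizer state and buffer bookkeeping): with num_partitions: 10 and buffer_capacity: 5, at most half of the embedding table is resident in memory at once, so the buffer needs on the order of (5/10) * num_nodes * output_dim * 4 bytes for float32 embeddings, plus extra headroom when prefetching is enabled.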

Training Configuration

TrainingConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| batch_size | Int | Number of training examples per batch. (Default 1000) | No |
| negative_sampling | NegativeSamplingConfig | Negative sampling configuration for link prediction. | Link prediction |
| num_epochs | Int | Number of epochs to train. | Yes |
| pipeline | PipelineConfig | Advanced configuration of the training pipeline. Defaults to synchronous training. | No |
| epochs_per_shuffle | Int | Sets how often to shuffle the training data. (Default 1) | No |
| logs_per_epoch | Int | Sets how often to report progress during an epoch. (Default 10) | No |
| save_model | Bool | If true, the model will be saved at the end of training. (Default True) | No |
| resume_training | Bool | If true, the training procedure will resume from the previous state and train num_epochs further epochs. (Default False) | No |
| resume_from_checkpoint | String | If set, loads the model from the given directory and resumes the training procedure. Trains num_epochs further epochs and stores the new model parameters in model_dir. | No |

A training configuration with a batch size of 1000 and a total of 10 epochs is as follows. pipeline.sync is set to true, which makes training synchronous and does not allow staleness. Marius groups edges into chunks and reuses negative samples within each chunk; num_chunks * negatives_per_positive negative edges are sampled for each positive edge.

training:
  batch_size: 1000
  negative_sampling:
    num_chunks: 10
    negatives_per_positive: 10
    degree_fraction: 0.0
    filtered: false
  num_epochs: 10
  pipeline:
    sync: true
  epochs_per_shuffle: 1
  logs_per_epoch: 10
  save_model: true
  resume_training: false
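
To make the arithmetic concrete: with the settings above, each batch of 1000 positive edges is grouped into num_chunks = 10 chunks, and each positive edge is scored against num_chunks * negatives_per_positive = 10 * 10 = 100 negatives.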

Evaluation Configuration

EvaluationConfig

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| batch_size | Int | Number of evaluation examples per batch. (Default 1000) | No |
| negative_sampling | NegativeSamplingConfig | Negative sampling configuration for link prediction. | Link prediction |
| pipeline | PipelineConfig | Advanced configuration of the evaluation pipeline. Defaults to synchronous evaluation. | No |
| epochs_per_eval | Int | Sets how often to evaluate the model. (Default 1) | No |

An evaluation configuration with a batch size of 1000 is as follows. num_chunks * negatives_per_positive negative edges are sampled for each positive edge.

evaluation:
  batch_size: 1000
  negative_sampling:
    num_chunks: 1
    negatives_per_positive: 1000
    degree_fraction: 0.0
    filtered: true
  pipeline:
    sync: true
  epochs_per_eval: 1
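
Here each test edge is ranked against num_chunks * negatives_per_positive = 1 * 1000 = 1000 sampled negatives. filtered: true applies the standard filtered setting for link prediction evaluation, in which sampled negatives that correspond to true edges in the graph are excluded from the ranking.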