Configuration Schema

MariusConfig
Key	Type	Description	Required
model	ModelConfig	Defines model architecture, learning task, optimizers and loss function.	Yes
storage	StorageConfig	Defines the input graph and how to store the graph (edges, features) and learned model (embeddings).	Yes
training	TrainingConfig	Hyperparameters for training.	Required for training
evaluation	EvaluationConfig	Hyperparameters for evaluation.	Required for evaluation

Below is a sample end-to-end configuration file for link prediction on the fb15k_237 dataset. The model consists of an embedding layer in the encoder, whose output is fed directly to the DISTMULT decoder. Both embeddings and edges are stored in CPU memory.

model:
  learning_task: LINK_PREDICTION
  encoder:
    layers:
      - - type: EMBEDDING
          output_dim: 50
          bias: true
          init:
            type: GLOROT_NORMAL
  decoder:
    type: DISTMULT
  loss:
    type: SOFTMAX_CE
    options:
      reduction: SUM
  dense_optimizer:
    type: ADAM
    options:
      learning_rate: 0.01
  sparse_optimizer:
    type: ADAGRAD
    options:
      learning_rate: 0.1
storage:
  full_graph_evaluation: true
  device_type: cpu
  dataset:
    dataset_dir: /home/data/datasets/fb15k_237/
  edges:
    type: DEVICE_MEMORY
    options:
      dtype: int
  embeddings:
    type: DEVICE_MEMORY
    options:
      dtype: float
training:
  batch_size: 1000
  negative_sampling:
    num_chunks: 10
    negatives_per_positive: 10
    degree_fraction: 0
    filtered: false
  num_epochs: 10
  pipeline:
    sync: true
  epochs_per_shuffle: 1
  logs_per_epoch: 10
  resume_training: false
evaluation:
  batch_size: 1000
  negative_sampling:
    filtered: true
  epochs_per_eval: 1
  pipeline:
    sync: true

Model Configuration

ModelConfig
Key	Type	Description	Required
random_seed	Int	Random seed used to initialize, train, and evaluate the model. If not given, a seed will be generated.	No
learning_task	String	Learning task for which the model is used. Valid values are [“LINK_PREDICTION”, “NODE_CLASSIFICATION”] (case insensitive). “LP” and “NC” can be used for shorthand.	Yes
encoder	EncoderConfig	Defines the architecture of the encoder and configuration of neighbor samplers.	Yes
decoder	DecoderConfig	Denotes the decoder to apply to the output of the encoder. The decoder is learning task specific.	Yes
loss	LossConfig	Loss function to apply over the output of the decoder.	Required for training
dense_optimizer	OptimizerConfig	Optimizer to use for dense model parameters, i.e. all parameters besides the node embeddings. The node embeddings are handled by the sparse_optimizer.	Required for training
sparse_optimizer	OptimizerConfig	Optimizer to use for the node embedding parameters. Currently only ADAGRAD is supported.	No

Below is a full view of the model attribute and the corresponding parameters that can be set in the model configuration. It consists of an embedding layer in the encoder phase and a DISTMULT decoder.

model:
  random_seed: 456356765463
  learning_task: LINK_PREDICTION
  encoder:
    layers:
      - - type: EMBEDDING
          output_dim: 50
          bias: true
          init:
            type: GLOROT_NORMAL
          optimizer:
            type: DEFAULT
            options:
              learning_rate: 0.1
  decoder:
    type: DISTMULT
    options:
      inverse_edges: true
      use_relation_features: false
      edge_decoder_method: CORRUPT_NODE
    optimizer:
      type: ADAGRAD
      options:
        learning_rate: 0.1
  loss:
    type: SOFTMAX_CE
    options:
      reduction: SUM
  dense_optimizer:
    type: ADAM
    options:
      learning_rate: 0.01
  sparse_optimizer:
    type: ADAGRAD
    options:
      learning_rate: 0.1

Encoder Configuration

EncoderConfig
Key	Type	Description	Required
layers	List[List[LayerConfig]]	Defines the architecture of the encoder. Layers of the encoder are grouped into stages, where the layers within a stage are executed in parallel and the output of a stage is the input to the successive stage.	Yes
train_neighbor_sampling	List[NeighborSamplingConfig]	Sets the neighbor sampling configuration for each GNN layer for training (and evaluation if eval_neighbor_sampling is not set). Defined as a list of neighbor sampling configurations, where the size of the list must match the number of GNN layers in the encoder.	Only for GNNs
eval_neighbor_sampling	List[NeighborSamplingConfig]	Sets the neighbor sampling configuration for each GNN layer for evaluation. Defined as a list of neighbor sampling configurations, where the size of the list must match the number of GNN layers in the encoder. If this field is not set then the sampling configuration used for training will be used for evaluation.	No

The below example depicts a configuration where there is one embedding layer, followed by three GNN layers.

encoder:
  train_neighbor_sampling:
    - type: ALL
    - type: ALL
    - type: ALL
  eval_neighbor_sampling:
    - type: ALL
    - type: ALL
    - type: ALL
  layers:
    - - type: EMBEDDING
        output_dim: 10
        bias: true
        init:
          type: GLOROT_NORMAL

    - - type: GNN
        options:
          type: GAT
        input_dim: 10
        output_dim: 10
        bias: true
        init:
          type: GLOROT_NORMAL

    - - type: GNN
        options:
          type: GAT
        input_dim: 10
        output_dim: 10
        bias: true
        init:
          type: GLOROT_NORMAL

    - - type: GNN
        options:
          type: GAT
        input_dim: 10
        output_dim: 10
        bias: true
        init:
          type: GLOROT_NORMAL
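
Training and evaluation can also use different sampling settings. The following is a minimal sketch built only from keys documented in this section; it assumes a single GRAPH_SAGE layer for brevity, and the dimensions are illustrative rather than recommended values. It samples at most 10 neighbors per node during training and all neighbors during evaluation.

```yaml
encoder:
  train_neighbor_sampling:
    - type: UNIFORM        # one entry per GNN layer
      options:
        max_neighbors: 10
  eval_neighbor_sampling:
    - type: ALL            # omit this key to reuse the training setting
  layers:
    - - type: EMBEDDING
        output_dim: 10
    - - type: GNN
        options:
          type: GRAPH_SAGE
        input_dim: 10
        output_dim: 10
```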

NeighborSamplingConfig
Key	Type	Description	Required
type	String	Denotes the type of the neighbor sampling layer. Options: [“ALL”, “UNIFORM”, “DROPOUT”].	Yes
options	NeighborSamplingOptions	Specific options depending on the type of sampling layer.	No

In the following configuration snippet, the GNN layer samples all neighbors for a given node during training. All neighbors with incoming edges to the given node are sampled while the outgoing edges are ignored.

train_neighbor_sampling:
  - type: ALL
    use_incoming_nbrs: true
    use_outgoing_nbrs: false

UniformSamplingOptions[NeighborSamplingOptions]
Key	Type	Description	Required
max_neighbors	Int	Number of neighbors to sample in a given uniform sampling layer.	Yes

The configuration below applies to an encoder with 2 GNN layers. It specifies that at most 10 neighboring nodes will be sampled for any given node during training.

train_neighbor_sampling:
  - type: UNIFORM
    options:
      max_neighbors: 10
  - type: UNIFORM
    options:
      max_neighbors: 10

DropoutSamplingOptions[NeighborSamplingOptions]
Key	Type	Description	Required
rate	Float	The dropout rate for a dropout layer.	Yes

DROPOUT mode neighbor sampling randomly drops rate * 100 percent of the neighbors during sampling.

train_neighbor_sampling:
  - type: DROPOUT
    options:
      rate: 0.05

Layer Configuration

LayerConfig
Key	Type	Description	Required
type	String	Denotes the type of layer. Options: [“EMBEDDING”, “FEATURE”, “GNN”, “REDUCTION”].	Yes
options	LayerOptions	Layer specific options depending on the type.	No
input_dim	Int	The dimension of the input to the layer.	GNN and Reduction layers
output_dim	Int	The dimension of the output of the layer.	Yes
init	InitConfig	Initialization method for the layer parameters. (Default GLOROT_UNIFORM).	No
optimizer	OptimizerConfig	Optimizer to use for the parameters of this layer. If not given, the dense_optimizer is used.	No
bias	Bool	Enable a bias to be applied to the output of the layer. (Default False)	No
bias_init	InitConfig	Initialization method for the bias. The default initialization is zeroes.	No
activation	String	Activation function to apply to the output of the layer. Options [“RELU”, “SIGMOID”, “NONE”]. (Default “NONE”)	No

Below is a configuration for creating an embedding layer with output dimension 50. Its parameters are initialized with GLOROT_NORMAL, its bias is initialized with zeros, and no activation is set.

layers:
- - type: EMBEDDING
    input_dim: -1
    output_dim: 50
    init:
      type: GLOROT_NORMAL
    optimizer:
      type: DEFAULT
      options:
        learning_rate: 0.1
    bias: true
    bias_init:
      type: ZEROS
    activation: NONE

A GNN layer of type GAT (Graph Attention) with input and output dimension of 50 is as follows.

layers:
- - type: GNN
    options:
      type: GAT
    input_dim: 50
    output_dim: 50
    bias: true
    init:
      type: GLOROT_NORMAL

A Reduction layer of type Linear, with input dimension of 100 and output dimension of 50 is as follows.

layers:
- - type: REDUCTION
    input_dim: 100
    output_dim: 50
    bias: true
    options:
      type: LINEAR

Below is a simple Feature layer with an output dimension of 50. The input dimension is set to -1 by default, since neither Feature nor Embedding layers take any input.

layers:
- - type: FEATURE
    output_dim: 50
    bias: true
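
Because layers within a stage run in parallel, one stage can hold both an Embedding layer and a Feature layer, with a Reduction layer in the next stage combining their outputs. The sketch below assumes that the Reduction layer's input_dim equals the sum of the two output_dim values (50 + 50 = 100); treat that as an assumption of this example rather than a documented rule.

```yaml
layers:
  - - type: EMBEDDING      # stage 1, first parallel layer
      output_dim: 50
    - type: FEATURE        # stage 1, second parallel layer
      output_dim: 50
  - - type: REDUCTION      # stage 2, combines both outputs
      input_dim: 100       # assumed: 50 + 50
      output_dim: 50
      options:
        type: LINEAR
```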

Layer Options

GNN Layer Options

GraphSageLayerOptions[LayerOptions]
Key	Type	Description	Required
type	String	The type of the GNN layer, for GraphSage, this must be equal to “GRAPH_SAGE”.	Yes
aggregator	String	Aggregation to use for graph sage, options are [“GCN”, “MEAN”]. (Default “MEAN”)	No

Below is a GNN layer of type GRAPH_SAGE with the aggregator set to MEAN. Another possible option is GCN (Graph Convolution).

- - type: GNN
    options:
      type: GRAPH_SAGE
      aggregator: MEAN

GATLayerOptions[LayerOptions]
Key	Type	Description	Required
type	String	The type of the GNN layer, for GAT, this must be equal to “GAT”.	Yes
num_heads	Int	Number of attention heads to use. (Default 10)	No
average_heads	Bool	If true, the attention heads will be averaged, otherwise they will be concatenated. (Default True)	No
negative_slope	Float	Negative slope to use for LeakyReLU. (Default .2)	No
input_dropout	Float	Dropout rate to apply to the input to the layer. (Default 0.0)	No
attention_dropout	Float	Dropout rate to apply to the attention weights. (Default 0.0)	No

A GNN layer of type GAT (Graph Attention) with 50 attention heads. input_dropout is set to 0.1 implying that 10 percent of the input tensor values will be randomly dropped.

- - type: GNN
    options:
      type: GAT
      num_heads: 50
      average_heads: true
      input_dropout: 0.1
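
The remaining GAT options go in the same options block. The sketch below writes out the documented defaults for num_heads, average_heads, negative_slope and input_dropout, and adds attention dropout; the rate 0.1 and the layer dimensions are arbitrary illustrations.

```yaml
- - type: GNN
    options:
      type: GAT
      num_heads: 10            # documented default
      average_heads: true      # average the heads instead of concatenating them
      negative_slope: 0.2      # slope used by LeakyReLU
      input_dropout: 0.0
      attention_dropout: 0.1   # illustrative value
    input_dim: 50
    output_dim: 50
```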

Reduction Layer Options

ReductionLayerOptions[LayerOptions]
Key	Type	Description	Required
type	String	The type of the reduction layer. Options are: [“CONCAT”, “LINEAR”]. (Default “CONCAT”)	Yes

A reduction layer of type LINEAR. Another possible type for the reduction layer is CONCAT.

- - type: REDUCTION
    options:
      type: LINEAR

Initialization Configuration

InitConfig
Key	Type	Description	Required
type	String	The type of the initialization. Options are: [“GLOROT_UNIFORM”, “GLOROT_NORMAL”, “UNIFORM”, “NORMAL”, “ZEROS”, “ONES”, “CONSTANT”]. (Default “GLOROT_UNIFORM”)	Yes
options	InitOptions	Initialization specific options depending on the type.	No

init:
  type: GLOROT_NORMAL
  options: {}

Uniform Init Options

UniformInitOptions[InitOptions]
Key	Type	Description	Required
scale_factor	Float	The scale factor of the uniform distribution. (Default 1)	No

The configuration below initializes a layer with values drawn from a uniform distribution over the range [-scale_factor, +scale_factor].

init:
  type: UNIFORM
  options:
    scale_factor: 1

Normal Init Options

NormalInitOptions[InitOptions]
Key	Type	Description	Required
mean	Float	The mean of the distribution. (Default 0.0)	No
std	Float	The standard deviation of the distribution. (Default 1.0)	No

The configuration below initializes a layer with values drawn from a normal distribution with mean 0.5 and standard deviation 0.1.

init:
  type: NORMAL
  options:
    mean: 0.5
    std: 0.1

Constant Init Options

ConstantInitOptions[InitOptions]
Key	Type	Description	Required
constant	Float	The value to set all parameters. (Default 0.0)	No

CONSTANT initialization mode initializes all parameters of the layer to the specified constant value.

init:
  type: CONSTANT
  options:
    constant: 0.4

Decoder Configuration

DecoderConfig
Key	Type	Description	Required
type	String	Denotes the type of decoder. Options: [“DISTMULT”, “TRANSE”, “COMPLEX”, “NODE”]. The first three are decoders for link prediction and the “NODE” decoder is used for node classification.	Yes
options	DecoderOptions	Decoder specific options depending on the type.	No
optimizer	OptimizerConfig	Optimizer to use for the parameters of the decoder (if any). If not given, the dense_optimizer is used.	No

Below is a DISTMULT decoder with an ADAGRAD optimizer that optimizes the loss function over edges as well as their inverses (dst->rel->src).

decoder:
  type: DISTMULT
  options:
    inverse_edges: true
  optimizer:
    type: ADAGRAD
    options:
      learning_rate: 0.1
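
For node classification, the decoder type is NODE. The fragment below is a minimal sketch using only keys listed in this document; pairing the NODE decoder with the CROSS_ENTROPY loss is an assumption made for illustration, and the encoder is omitted for brevity.

```yaml
model:
  learning_task: NODE_CLASSIFICATION
  # encoder omitted for brevity
  decoder:
    type: NODE
  loss:
    type: CROSS_ENTROPY    # assumed pairing, not prescribed by the tables
    options:
      reduction: SUM
```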

Decoder Options

Edge Decoder Options

EdgeDecoderOptions[DecoderOptions]
Key	Type	Description	Required
inverse_edges	Bool	If true, the decoder will use two embeddings per edge-type (relation): one is applied to the source node of an edge, and the other is applied to the destination node. Furthermore, the scores of the inverse of the edges will be computed (dst->rel->src) and used in the loss. (Default True)	No
edge_decoder_method	String	Specifies how to apply the decoder to a given set of edges, and negatives. Options are [“infer”, “train”]. (Default “train”)	No

decoder:
  type: DISTMULT
  options:
    inverse_edges: true
    edge_decoder_method: CORRUPT_NODE

Loss Configuration

LossConfig
Key	Type	Description	Required
type	String	Denotes the type of the loss function. Options: [“SOFTMAX_CE”, “RANKING”, “CROSS_ENTROPY”, “BCE_AFTER_SIGMOID”, “BCE_WITH_LOGITS”, “MSE”, “SOFTPLUS”].	Yes
options	LossOptions	Loss function specific options depending on the type.	No

Below is the configuration for a SOFTMAX_CE loss function with SUM as the reduction method.

loss:
  type: SOFTMAX_CE
  options:
    reduction: SUM

Loss Options

LossOptions
Key	Type	Description	Required
reduction	String	The reduction to use for the loss. Options are [“SUM”, “MEAN”]. (Default “SUM”)	No

Below is the configuration for a SOFTMAX_CE loss function with MEAN as the reduction method.

loss:
  type: SOFTMAX_CE
  options:
    reduction: MEAN

RankingLossOptions[LossOptions]
Key	Type	Description	Required
reduction	String	The reduction to use for the loss. Options are [“SUM”, “MEAN”]. (Default “SUM”)	No
margin	Float	The margin for the ranking loss function. (Default .1)	No

Below is the configuration for a RANKING loss function with margin set to 1.

loss:
  type: RANKING
  options:
    reduction: SUM
    margin: 1

Optimizer Configuration

OptimizerConfig
Key	Type	Description	Required
type	String	Denotes the type of the optimizer. Options: [“SGD”, “ADAM”, “ADAGRAD”].	Yes
options	OptimizerOptions	Optimizer specific options depending on the type.	No

The configuration for an ADAGRAD optimizer with a learning rate of 0.1 is as follows.

optimizer:
  type: ADAGRAD
  options:
    learning_rate: 0.1

SGD Options

SGDOptions[OptimizerOptions]
Key	Type	Description	Required
learning_rate	Float	SGD learning rate. (Default .1)	No

optimizer:
  type: SGD
  options:
    learning_rate: 0.1

Adagrad Options

AdagradOptions[OptimizerOptions]
Key	Type	Description	Required
learning_rate	Float	Adagrad learning rate. (Default .1)	No
eps	Float	Term added to the denominator to improve numerical stability. (Default 1e-10)	No
init_value	Float	Initial accumulator value. (Default 0.0)	No
lr_decay	Float	Learning rate decay. (Default 0.0)	No
weight_decay	Float	Weight decay (L2 penalty). (Default 0.0)	No

The below configuration shows the options that can be set for ADAGRAD optimizer.

optimizer:
  type: ADAGRAD
  options:
    learning_rate: 0.1
    eps: 1.0e-10
    init_value: 0.0
    lr_decay: 0.0
    weight_decay: 0.0

Adam Options

AdamOptions[OptimizerOptions]
Key	Type	Description	Required
learning_rate	Float	Adam learning rate. (Default .1)	No
amsgrad	Bool	Whether to use the AMSGrad variant of ADAM.	No
beta_1	Float	Coefficient used for computing the running average of the gradient. (Default .9)	No
beta_2	Float	Coefficient used for computing the running average of the squared gradient. (Default .999)	No
eps	Float	Term added to the denominator to improve numerical stability. (Default 1e-8)	No
weight_decay	Float	Weight decay (L2 penalty). (Default 0.0)	No

The below configuration shows the options that can be set for ADAM optimizer.

optimizer:
  type: ADAM
  options:
    learning_rate: 0.01
    amsgrad: false
    beta_1: 0.9
    beta_2: 0.999
    eps: 1.0e-08
    weight_decay: 0.0

Storage Configuration

StorageConfig
Key	Type	Description	Required
device_type	String	Whether to train on the CPU or the GPU. Options are [“CPU”, “CUDA”]. (Default “CPU”)	No
dataset	DatasetConfig	Contains information about the input dataset.	Yes
edges	StorageBackendConfig	Storage backend of the edges. (Default edges.type = DEVICE_MEMORY, edges.options.dtype = int32)	No
embeddings	StorageBackendConfig	Storage backend of the node embedding. (Default embeddings.type = DEVICE_MEMORY, embeddings.options.dtype = float32)	No
features	StorageBackendConfig	Storage backend of the node features. (Default features.type = DEVICE_MEMORY, features.options.dtype = float32)	No
prefetch	Bool	If true and the nodes/features storage configuration uses a partition buffer, then node partitions and edge buckets will be prefetched. Note that this introduces additional memory overheads. (Default True)	No
full_graph_evaluation	Bool	If true and the nodes/features storage configuration uses a partition buffer, evaluation will be performed with the full graph in memory (if there is enough memory). This is useful for fair comparisons across different storage configurations. (Default False)	No
model_dir	String	Saves the model parameters in the given directory. If not specified, the model is stored in a model_x directory within dataset_dir, where x is incremented from 0 to 10. At most 11 models are stored when model_dir is not specified, after which the contents of the model_10/ directory are overwritten with the latest parameters.	No

Below is a storage configuration that contains the path to the pre-processed data and specifies storage backends to be used for edges, features and embeddings.

storage:
  device_type: cpu
  dataset:
    dataset_dir: /home/data/datasets/fb15k_237/
  edges:
    type: DEVICE_MEMORY
    options:
      dtype: int
  nodes:
    type: DEVICE_MEMORY
    options:
      dtype: int
  embeddings:
    type: DEVICE_MEMORY
    options:
      dtype: float
  features:
    type: DEVICE_MEMORY
    options:
      dtype: float
  prefetch: true
  shuffle_input: true
  full_graph_evaluation: true
  export_encoded_nodes: true
  log_level: info
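
To keep model parameters outside of dataset_dir, model_dir can be set explicitly. The sketch below shows only the relevant keys; the model_dir path is hypothetical.

```yaml
storage:
  device_type: cpu
  dataset:
    dataset_dir: /home/data/datasets/fb15k_237/
  model_dir: /home/data/models/fb15k_237_distmult/   # hypothetical path
```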

Dataset Configuration

DatasetConfig
Key	Type	Description	Required
dataset_dir	String	Directory containing the preprocessed dataset. Also used to store model parameters and embedding table.	Yes
num_edges	Int	Number of edges in the input graph. If link prediction, this should be set to the number of training edges.	No
num_nodes	Int	Number of nodes in the input graph.	No
num_relations	Int	Number of relations (edge-types) in the input graph. (Default 1)	No
num_train	Int	Number of training examples. In link prediction the examples are edges, in node classification they are nodes.	No
num_valid	Int	Number of validation examples. If not given, no validation will be performed.	No
num_test	Int	Number of test examples. If not given, only training will occur.	No (Evaluation)
node_feature_dim	Int	Dimension of the node features, if any.	No
num_classes	Int	Number of class labels.	No (Node classification)

For Marius built-in datasets, the statistics above are retrieved from the output of marius_preprocess. For custom user datasets, a file with the dataset statistics mentioned above should be present in dataset_dir. Below is the configuration for the fb15k_237 dataset.

storage:
  dataset:
    dataset_dir: /home/data/datasets/fb15k_237/
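
Since the statistics are keys of DatasetConfig, they can presumably also be written out in the configuration itself. The following sketch describes a hypothetical node classification dataset; the path and every number are placeholders, not statistics of a real dataset.

```yaml
storage:
  dataset:
    dataset_dir: /path/to/custom_dataset/   # hypothetical path
    num_nodes: 1000          # placeholder values throughout
    num_edges: 5000
    num_relations: 1
    num_train: 600           # training nodes
    num_valid: 200
    num_test: 200
    node_feature_dim: 64
    num_classes: 7
```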

Storage Backend Configuration

StorageBackendConfig
Key	Type	Description	Required
type	String	The type of storage backend to use. The valid options depend on the data being stored. For edges, the valid backends are [“FLAT_FILE”, “HOST_MEMORY”, “DEVICE_MEMORY”]. For embeddings and features, the valid choices are [“PARTITION_BUFFER”, “HOST_MEMORY”, “DEVICE_MEMORY”].	Yes
options	StorageOptions	Storage backend options depending on the type of storage.	No

The configuration below specifies that the edges are stored in DEVICE_MEMORY, i.e. CPU or GPU memory depending on device_type.

edges:
  type: DEVICE_MEMORY
  options:
    dtype: int

Storage Backend Options

StorageOptions
Key	Type	Description	Required
dtype	String	The datatype of the storage. Valid options [“FLOAT”, “FLOAT32”, “DOUBLE”, “FLOAT64”, “INT”, “INT32”, “LONG”, “INT64”]. The default value depends on the data being stored. For edges, the default is “INT32”, otherwise the default is “FLOAT32”.	No

A configuration defining the datatype of the input edges as int.

edges:
  options:
    dtype: int

PartitionBufferOptions[StorageOptions]
Key	Type	Description	Required
dtype	String	The datatype of the storage. Valid options [“FLOAT”, “FLOAT32”, “DOUBLE”, “FLOAT64”]. (Default “FLOAT32”)	No
num_partitions	Int	Number of node partitions.	Yes
buffer_capacity	Int	Number of partitions which can fit in the buffer.	Yes
prefetching	Bool	If true, partitions will be prefetched and written to storage asynchronously. This prevents IO wait times at the cost of additional memory overheads. (Default True)	No

Below is a disk-based storage configuration, where at most buffer_capacity embedding partitions are held in memory at any given time. The dataset must be partitioned using marius_preprocess with --num_partitions set accordingly.

embeddings:
  type: PARTITION_BUFFER
  options:
    dtype: float
    num_partitions: 10
    buffer_capacity: 5
    prefetching: true
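
Putting the storage options together, a disk-based setup might keep the edges in a flat file and the embeddings in a partition buffer. This is a sketch assembled from the backends listed above; the partition counts are illustrative and must match how the dataset was partitioned.

```yaml
storage:
  device_type: cpu
  dataset:
    dataset_dir: /home/data/datasets/fb15k_237/
  edges:
    type: FLAT_FILE
    options:
      dtype: int
  embeddings:
    type: PARTITION_BUFFER
    options:
      dtype: float
      num_partitions: 10     # must match the preprocessing step
      buffer_capacity: 5     # partitions held in memory at once
      prefetching: true
  prefetch: true
  full_graph_evaluation: true
```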

Training Configuration

TrainingConfig
Key	Type	Description	Required
batch_size	Int	Number of training examples per batch. (Default 1000)	No
negative_sampling	NegativeSamplingConfig	Negative sampling configuration for link prediction.	Link Prediction
num_epochs	Int	Number of epochs to train.	Yes
pipeline	PipelineConfig	Advanced configuration of the training pipeline. Defaults to synchronous training.	No
epochs_per_shuffle	Int	Sets how often to shuffle the training data. (Default 1)	No
logs_per_epoch	Int	Sets how often to report progress during an epoch. (Default 10)	No
save_model	Bool	If true, the model will be saved at the end of training. (Default True)	No
resume_training	Bool	If true, the training procedure will resume from the previous state and will train num_epochs further epochs. (Default False)	No
resume_from_checkpoint	String	If set, loads the model from the given directory and resumes training procedure. Will train num_epochs further epochs and store the new model parameters in model_dir.	No

A training configuration with a batch size of 1000 and a total of 10 epochs is as follows. pipeline.sync is set to true, which ensures that training is synchronous and does not allow staleness. Marius groups edges into chunks and reuses negative samples within a chunk. num_chunks * negatives_per_positive negative edges are sampled for each positive edge.

training:
  batch_size: 1000
  negative_sampling:
    num_chunks: 10
    negatives_per_positive: 10
    degree_fraction: 0.0
    filtered: false
  num_epochs: 10
  pipeline:
    sync: true
  epochs_per_shuffle: 1
  logs_per_epoch: 10
  save_model: true
  resume_training: false
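
To continue training from a previously saved model, resume_from_checkpoint can point at the directory that holds it. In the sketch below the checkpoint path is hypothetical (it follows the model_x naming described under model_dir), and training runs for 5 further epochs.

```yaml
training:
  batch_size: 1000
  negative_sampling:
    num_chunks: 10
    negatives_per_positive: 10
    degree_fraction: 0.0
    filtered: false
  num_epochs: 5      # number of further epochs to train
  resume_from_checkpoint: /home/data/datasets/fb15k_237/model_0/   # hypothetical path
```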

Evaluation Configuration

EvaluationConfig
Key	Type	Description	Required
batch_size	Int	Number of evaluation examples per batch. (Default 1000)	No
negative_sampling	NegativeSamplingConfig	Negative sampling configuration for link prediction.	Link Prediction
pipeline	PipelineConfig	Advanced configuration of the evaluation pipeline. Defaults to synchronous evaluation.	No
epochs_per_eval	Int	Sets how often to evaluate the model. (Default 1)	No

An evaluation configuration with a batch size of 1000 is as follows. num_chunks * negatives_per_positive negative edges are sampled for each positive edge.

evaluation:
  batch_size: 1000
  negative_sampling:
    num_chunks: 1
    negatives_per_positive: 1000
    degree_fraction: 0.0
    filtered: true
  pipeline:
    sync: true
  epochs_per_eval: 1