Overview
======================

The configuration interface allows for high-performance training and evaluation of models without the need to write code. Configuration files are defined in YAML format and are grouped into four sections:

- Model: Defines the architecture of the model, neighbor sampling configuration, loss, and optimizer(s).
- Storage: Specifies the input dataset and how to store the graph, features, and embeddings.
- Training: Sets options for the training procedure and hyperparameters, e.g. batch size and negative sampling.
- Evaluation: Sets options for the evaluation procedure (if any). The options here are similar to those in the training section.

Link Prediction Example
-----------------------

In this example, we show how to define a configuration file for training a :doc:`3-layer GraphSage GNN <../examples/config/lp_fb15k237>` for link prediction on :doc:`fb15k_237 <../examples/config/lp_fb15k237>`. This example assumes that Marius has been installed with :doc:`pip <../build>` and that the dataset has been preprocessed with the following command:

``marius_preprocess --dataset fb15k_237 --output_dir /home/data/datasets/fb15k_237/``

1. Define the model:
^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    model:
      encoder:
        train_neighbor_sampling:
          - type: ALL
          - type: ALL
          - type: ALL
        layers:
          - - type: EMBEDDING
              output_dim: 50
              bias: true

            - type: FEATURE
              output_dim: 50
              bias: true

          - - type: REDUCTION
              input_dim: 100
              output_dim: 50
              bias: true
              options:
                type: LINEAR

          - - type: GNN
              options:
                type: GRAPH_SAGE
                aggregator: MEAN
              input_dim: 50
              output_dim: 50
              bias: true
              init:
                type: GLOROT_NORMAL

          - - type: GNN
              options:
                type: GRAPH_SAGE
                aggregator: MEAN
              input_dim: 50
              output_dim: 50
              bias: true
              init:
                type: GLOROT_NORMAL

          - - type: GNN
              options:
                type: GRAPH_SAGE
                aggregator: MEAN
              input_dim: 50
              output_dim: 50
              bias: true
              init:
                type: GLOROT_NORMAL

      decoder:
        type: DISTMULT
      loss:
        type: SOFTMAX_CE
        options:
          reduction: SUM
      dense_optimizer:
        type: ADAM
        options:
          learning_rate: 0.01
      sparse_optimizer:
        type: ADAGRAD
        options:
          learning_rate: 0.1

.. image:: ../assets/configuration_lp.png
    :width: 700

The above model configuration has 5 stages in the encoder section, with each new stage beginning with a nested list item (``- -``). The first stage has 2 layers: an embedding layer with output dimension 50 and a feature layer with output dimension 50. The reduction layer in stage 2 takes as input the combined 100-dimensional vector and outputs a 50-dimensional vector. It is followed by 3 stages of GNN layers. The output of the encoder is fed to a decoder of type DISTMULT. The loss function is SoftmaxCrossEntropy with sum as the reduction method. The dense optimizer is used for all model parameters except the node embeddings; node embeddings are optimized by the sparse optimizer.

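The `model` block above fills in the first of the four top-level sections listed in the overview; the remaining steps fill in the others. As a rough sketch of the overall file layout (section bodies elided), a complete configuration file has the following shape:

.. code-block:: yaml

    model:       # step 1: encoder, decoder, loss, and optimizers
    storage:     # step 2: dataset location and storage backends
    training:    # step 3: training hyperparameters
    evaluation:  # step 3: evaluation settings
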
2. Set storage and dataset:
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    storage:
      device_type: cpu
      dataset:
        dataset_dir: /home/data/datasets/fb15k_237/
      edges:
        type: DEVICE_MEMORY
        options:
          dtype: int
      embeddings:
        type: DEVICE_MEMORY
        options:
          dtype: float

The storage configuration provides information on the location and statistics of the pre-processed dataset. It also specifies where to store the embeddings and edges during training. The `device_type` is set to `cpu` here; `cuda` can be used for GPU training. Because the device is `cpu`, `DEVICE_MEMORY` in this case means that the embeddings are stored in CPU memory.

3. Configure training and evaluation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    training:
      batch_size: 1000
      negative_sampling:
        num_chunks: 10
        negatives_per_positive: 10
        degree_fraction: 0
        filtered: false
      num_epochs: 10
      pipeline:
        sync: true
      epochs_per_shuffle: 1
      logs_per_epoch: 10

    evaluation:
      batch_size: 1000
      negative_sampling:
        filtered: true
      epochs_per_eval: 1
      pipeline:
        sync: true

The training configuration specifies the number of data samples in each batch and the total number of epochs to train the model for. Marius groups edges into chunks and reuses negative samples within each chunk: `num_chunks`*`negatives_per_positive` negative edges are sampled for each positive edge, so with the values above each positive edge is scored against 100 negatives. Marius also uses pipelining to overlap data movement with training, which introduces bounded staleness into the system. We can explicitly set `sync` to true if we want every mini-batch to see the latest embeddings.

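With the model, storage, and training/evaluation sections defined, the configuration can be saved to a single YAML file and used to launch training. For example, assuming the sections above are saved together as ``lp_fb15k237.yaml`` (the filename is arbitrary) and that the Marius command-line tools are installed, training can be started with:

``marius_train lp_fb15k237.yaml``
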
Node Classification Example
---------------------------

In this example, we show how to define a configuration file for training a :doc:`3-layer GAT GNN <../examples/config/nc_ogbn_arxiv>` for node classification on :doc:`ogbn_arxiv <../examples/config/nc_ogbn_arxiv>`. This example assumes that Marius has been installed with :doc:`pip <../build>` and that the dataset has been preprocessed with the following command:

``marius_preprocess --dataset ogbn_arxiv --output_dir /home/data/datasets/ogbn_arxiv/``

1. Define the model:
^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    model:
      learning_task: NODE_CLASSIFICATION
      encoder:
        train_neighbor_sampling:
          - type: ALL
        layers:
          - - type: FEATURE
              output_dim: 128
              bias: false
              init:
                type: GLOROT_NORMAL

          - - type: GNN
              options:
                type: GRAPH_SAGE
                aggregator: MEAN
              input_dim: 128
              output_dim: 40
              bias: true
              init:
                type: GLOROT_NORMAL

      decoder:
        type: NODE
      loss:
        type: CROSS_ENTROPY
        options:
          reduction: SUM
      dense_optimizer:
        type: ADAM
        options:
          learning_rate: 0.01
      sparse_optimizer:
        type: ADAGRAD
        options:
          learning_rate: 0.1

.. image:: ../assets/configuration_nc.png

The above node classification model has 2 stages in the encoder section: a feature layer followed by a GNN layer. The number of training/evaluation sampling layers should be equal to the number of GNN stages in the model, so `train_neighbor_sampling` has a single entry here. The model uses a node classification decoder (type NODE). The loss function is CrossEntropy with sum as the reduction method.

2. Set storage and dataset:
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    storage:
      device_type: cuda
      dataset:
        dataset_dir: /home/data/datasets/ogbn_arxiv/
      edges:
        type: DEVICE_MEMORY
      nodes:
        type: DEVICE_MEMORY
      features:
        type: DEVICE_MEMORY
      embeddings:
        type: DEVICE_MEMORY
        options:
          dtype: float
      prefetch: true
      shuffle_input: true
      full_graph_evaluation: true

The storage configuration here is very similar to the one shown above for link prediction, except that `device_type` is set to `cuda` for GPU training and node features are stored in addition to edges and embeddings.

3. Configure training and evaluation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    training:
      batch_size: 1000
      num_epochs: 5
      pipeline:
        sync: true
      epochs_per_shuffle: 1
      logs_per_epoch: 1

    evaluation:
      batch_size: 1000
      pipeline:
        sync: true
      epochs_per_eval: 1

The above training configuration specifies a training batch size of 1000 and a total of 5 epochs. The `logs_per_epoch` attribute sets how often progress is reported during training. `epochs_per_eval` sets how often the model is evaluated.

Defining Encoder Architectures
------------------------------

The interface enables users to define complex model architectures. The `layers` field is a list of lists: a list of stages, where each stage is itself a list of layers. The total output dimension of a stage must equal the net input dimension of the next stage. The following conditions must be met when stacking the layers of a model (a sketch illustrating them follows the list):

#. Embedding/Feature layers only have an output dimension; their `input_dim` is set to -1 by default.
#. A Reduction layer can take inputs from multiple layers in the previous stage and has a single output.
#. The number of training/evaluation sampling layers should be equal to the number of GNN stages in the model.

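For illustration, below is a small hedged sketch of an encoder that satisfies these conditions; the dimensions are arbitrary and chosen only for the example, and the block is shown without the surrounding `model:` section. The first stage concatenates a 32-dimensional embedding with a 32-dimensional feature vector, the reduction stage maps the combined 64-dimensional input back to 32 dimensions, and the single GNN stage matches the single entry in `train_neighbor_sampling`.

.. code-block:: yaml

    encoder:
      train_neighbor_sampling:
        - type: ALL              # one sampling entry for the one GNN stage (condition 3)
      layers:
        - - type: EMBEDDING      # output dimension only; input_dim defaults to -1 (condition 1)
            output_dim: 32
            bias: true

          - type: FEATURE        # second layer in the same stage
            output_dim: 32
            bias: true

        - - type: REDUCTION      # combines the outputs of the previous stage (condition 2)
            input_dim: 64        # 32 + 32 from the embedding and feature layers
            output_dim: 32
            bias: true
            options:
              type: LINEAR

        - - type: GNN
            options:
              type: GRAPH_SAGE
              aggregator: MEAN
            input_dim: 32        # must equal the previous stage's total output dimension
            output_dim: 32
            bias: true
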
Advanced Configuration
----------------------

Pipeline
^^^^^^^^

Marius uses a pipelined training architecture that interleaves data access, transfer, and computation to achieve high utilization. This introduces the possibility of a few mini-batches using stale parameters during training. If `sync` is set to true, training is synchronous and there is no staleness. Below is a sample configuration where training is asynchronous and there is bounded staleness in the system.

.. code-block:: yaml

    pipeline:
      sync: false
      staleness_bound: 16
      batch_host_queue_size: 4
      batch_device_queue_size: 4
      gradients_device_queue_size: 4
      gradients_host_queue_size: 4
      batch_loader_threads: 4
      batch_transfer_threads: 2
      compute_threads: 1
      gradient_transfer_threads: 2
      gradient_update_threads: 4

.. image:: ../assets/marius_arch.png
    :width: 700
    :align: center

Marius follows a 5-stage pipeline architecture: 4 stages are responsible for data movement and the remaining one performs model computation and in-GPU parameter updates. The `pipeline` field has options for setting the thread counts for each of these stages. `staleness_bound` sets the maximum number of mini-batches that can be present in the pipeline at any time; it implies that after a set of node embedding updates, at most 16 mini-batches use stale node embeddings.

Partition Buffer
^^^^^^^^^^^^^^^^

One of the storage backends supported for node embeddings is the `PARTITION_BUFFER` mode, where the nodes are bucketed into p partitions and every edge falls into one of the p^2 edge buckets. When the dataset is pre-processed in partitioned mode, the edges are ordered in a way that reduces the number of node-embedding partition swaps in and out of the buffer. The following command pre-processes the fb15k_237 dataset into 10 partitions, as required by Marius for training in `PARTITION_BUFFER` mode.

``marius_preprocess --dataset fb15k_237 --num_partitions 10 --output_dir /home/data/datasets/fb15k_237_partitioned/``

Now, we can set the storage backend for node embeddings to `PARTITION_BUFFER` mode:

.. code-block:: yaml

    embeddings:
      type: PARTITION_BUFFER
      options:
        dtype: float
        num_partitions: 10
        buffer_capacity: 5
        prefetching: true

`num_partitions` should hold the same value that was supplied earlier to `marius_preprocess`. `buffer_capacity` sets the maximum number of node embedding partitions that can be held in memory at any given time. Setting `prefetching` enables the system to prefetch partitions asynchronously, reducing IO wait times at the cost of additional memory overhead.

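Putting this together with the link prediction storage section from earlier, a hedged sketch of a full storage section for the partitioned dataset might look as follows; the `dataset_dir` corresponds to the output directory passed to `marius_preprocess` above, and the remaining values are carried over from the earlier example.

.. code-block:: yaml

    storage:
      device_type: cpu
      dataset:
        dataset_dir: /home/data/datasets/fb15k_237_partitioned/
      edges:
        type: DEVICE_MEMORY
        options:
          dtype: int
      embeddings:
        type: PARTITION_BUFFER
        options:
          dtype: float
          num_partitions: 10     # must match --num_partitions passed to marius_preprocess
          buffer_capacity: 5     # at most 5 embedding partitions held in memory at once
          prefetching: true      # prefetch partitions asynchronously at the cost of extra memory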