Sample Files ====================== Model Configs ------------- DistMult ^^^^^^^^ +-----------------------------------------------+---------------------------------------------+ | | | |.. code-block:: yaml |.. image:: ../assets/samples_dismult.png | | | | | model: | | | learning_task: LINK_PREDICTION | | | encoder: | | | layers: | | | - - type: EMBEDDING | | | output_dim: 50 | | | bias: true | | | init: | | | type: GLOROT_NORMAL | | | decoder: | | | type: DISTMULT | | | loss: | | | type: SOFTMAX_CE | | | options: | | | reduction: SUM | | | dense_optimizer: | | | type: ADAM | | | options: | | | learning_rate: 0.01 | | | sparse_optimizer: | | | type: ADAGRAD | | | options: | | | learning_rate: 0.1 | | | | | +-----------------------------------------------+---------------------------------------------+ The above configuration has a simple embedding layer whose output is fed to the decoder layer, which uses a SoftmaxCrossEntropy loss function to optimize the loss value. An Adagrad sparse optimizer is used for the node embeddings and Adam Optimizer for all other model parameters. Graph Sage (3-layer) ^^^^^^^^^^^^^^^^^^^^ +----------------------------------------+--------------------------------------+ | | | |.. code-block:: yaml |.. image:: ../assets/samples_gs.png | | | :width: 700 | | model: | | | learning_task: LINK_PREDICTION | | | encoder: | | | train_neighbor_sampling: | | | - type: ALL | | | - type: ALL | | | - type: ALL | | | layers: | | | - - type: EMBEDDING | | | output_dim: 50 | | | bias: true | | | init: | | | type: GLOROT_NORMAL | | | - - type: GNN | | | options: | | | type: GRAPH_SAGE | | | aggregator: MEAN | | | input_dim: 50 | | | output_dim: 50 | | | bias: true | | | init: | | | type: GLOROT_NORMAL | | | - - type: GNN | | | options: | | | type: GRAPH_SAGE | | | aggregator: MEAN | | | input_dim: 50 | | | output_dim: 50 | | | bias: true | | | init: | | | type: GLOROT_NORMAL | | | - - type: GNN | | | options: | | | type: GRAPH_SAGE | | | aggregator: MEAN | | | input_dim: 50 | | | output_dim: 50 | | | bias: true | | | init: | | | type: GLOROT_NORMAL | | | decoder: | | | type: DISTMULT | | | loss: | | | type: SOFTMAX_CE | | | options: | | | reduction: SUM | | | dense_optimizer: | | | type: ADAM | | | options: | | | learning_rate: 0.01 | | | sparse_optimizer: | | | type: ADAGRAD | | | options: | | | learning_rate: 0.1 | | | | | +----------------------------------------+--------------------------------------+ Graph Sage (3 layer) has an initial stage consisting of an embedding layer. It is connected to 3 stages of GraphSage GNN layers. The number of training/evaluation neighbor sampling layers is equal to the GNN stages defined in the model. GAT (3-layer) ^^^^^^^^^^^^^ .. code-block:: yaml model: learning_task: LINK_PREDICTION encoder: train_neighbor_sampling: - type: ALL - type: ALL - type: ALL layers: - - type: EMBEDDING output_dim: 50 bias: true init: type: GLOROT_NORMAL - - type: GNN options: type: GAT input_dim: 50 output_dim: 50 bias: true init: type: GLOROT_NORMAL - - type: GNN options: type: GAT input_dim: 50 output_dim: 50 bias: true init: type: GLOROT_NORMAL - - type: GNN options: type: GAT input_dim: 50 output_dim: 50 bias: true init: type: GLOROT_NORMAL decoder: type: DISTMULT loss: type: SOFTMAX_CE options: reduction: SUM dense_optimizer: type: ADAM options: learning_rate: 0.01 sparse_optimizer: type: ADAGRAD options: learning_rate: 0.1 GAT (3 layer) has an initial stage consisting of an embedding layer. It is connected to 3 stages of GAT GNN layers. The number of training/evaluation neighbor sampling layers is equal to the GNN stages defined in the model. Embeddings + Features + Edges ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The supported storage backends for embeddings and features are `PARTITION_BUFFER`, `DEVICE_MEMORY` and `HOST_MEMORY`. For edges, the supported backends are `FLAT_FILE`, `DEVICE_MEMORY`, `HOST_MEMORY`. Storage Configs --------------- GPU Memory ^^^^^^^^^^ .. code-block:: yaml storage: device_type: cuda dataset: dataset_dir: /home/data/datasets/fb15k_237/ edges: type: DEVICE_MEMORY options: dtype: int embeddings: type: DEVICE_MEMORY options: dtype: float In the above configuration, both edges and embeddings are stored in GPU memory. Mixed CPU-GPU ^^^^^^^^^^^^^ .. code-block:: yaml storage: device_type: cuda dataset: dataset_dir: /home/data/datasets/fb15k_237/ edges: type: HOST_MEMORY options: dtype: int embeddings: type: HOST_MEMORY options: dtype: float This configuration places the edge data in the CPU memory and maintains the embedding data in GPU memory. Disk-Based ^^^^^^^^^^ .. code-block:: yaml storage: device_type: cuda dataset: dataset_dir: /home/data/datasets/fb15k_237/ edges: type: FLAT_FILE options: dtype: int embeddings: type: DEVICE_MEMORY options: dtype: float In this configuration, the edge data is stored in a flat file, on disk. FLAT_FILE storage backend is supported for edges alone, because there is no need for an index lookup. Instead, edges are traversed sequentially. Marius supports `PARTITION_BUFFER` mode to store embedding data, where all data is stored on disk and only necessary chunks are fetched and kept in the buffer. The edges are traversed in an order that minimizes bukcet swaps in the buffer. It can be configured as follows .. code-block:: yaml storage: device_type: cuda dataset: dataset_dir: /home/data/datasets/fb15k_237_partitioned/ edges: type: FLAT_FILE options: dtype: int embeddings: type: PARTITION_BUFFER options: dtype: float num_partitions: 10 buffer_capacity: 5 The above configuration states that at most 5 node embedding buckets can be present in memory at any given time. Training Configs ---------------- Synchronous Training ^^^^^^^^^^^^^^^^^^^^ To speed up training, Graph Learning systems use pipelined architecture and try to overlap data movement with computation. This introduces bounded staleness in the system, wherein after a set of updates to the node embeddings, the existing mini-batches in the pipeline use stale node embeddings. Marius provides an explicit option to turn off asynchronous training and ensure that every mini-batch sees the latest updated node embeddings. The following can be used the set training as synchronous .. code-block:: yaml training: batch_size: 1000 negative_sampling: num_chunks: 10 negatives_per_positive: 10 degree_fraction: 0 filtered: false num_epochs: 10 pipeline: sync: true Pipelined Training ^^^^^^^^^^^^^^^^^^ Marius uses pipelining training architecture that can interleave data access, transfer, and computation to achieve high utilization. This introduces the possibility of a few mini-batches using stale parameters during training. Below is a sample configuration where the training is async, and the staleness is set to 16 i.e. at most 16 mini-batches use stale node embeddings after any set of node embeddings are updated. .. code-block:: yaml pipeline: sync: false gpu_sync_interval: 16 gpu_model_average: true staleness_bound: 16 batch_host_queue_size: 4 batch_device_queue_size: 4 gradients_device_queue_size: 4 gradients_host_queue_size: 4 batch_loader_threads: 4 batch_transfer_threads: 2 compute_threads: 1 gradient_transfer_threads: 2 gradient_update_threads: 4 Marius follows a 5-staged pipeline architecture, 4 of which are responsible for data movement and the other is for model computation and in-GPU parameter updates. The `pipeline` field has options for setting thread counts for each of these stages. Evaluation Configs ------------------- Link Prediction Filtered ^^^^^^^^^^^^^^^^^^^^^^^^ An Evaluation configuration for Link Prediction with a batchsize of 1000. When `filtered` is set to true, false negative sampled edges will be filtered out. .. code-block:: yaml evaluation: batch_size: 1000 negative_sampling: num_chunks: 1 negatives_per_positive: 1000 degree_fraction: 0.0 filtered: true pipeline: sync: true epochs_per_eval: 1 Link Prediction Unfiltered ^^^^^^^^^^^^^^^^^^^^^^^^^^ Unfiltered Evaluation configuration for Link Prediction with a batchsize of 1000. False negative sampled edges will not be filtered out. .. code-block:: yaml evaluation: batch_size: 1000 negative_sampling: num_chunks: 10 negatives_per_positive: 100 filtered: false pipeline: sync: true epochs_per_eval: 1 Node Classification ^^^^^^^^^^^^^^^^^^^ Sample Evaluation configuration for a Node Classification tasks. .. code-block:: yaml evaluation: batch_size: 1000 pipeline: sync: true epochs_per_eval: 1