Introduction

Marius is a system for scaling graph learning on a single machine. Marius supports training and evaluation of GNNs and graph embedding models for link prediction or node classification. See our papers Marius and Marius++ for technical details.

Feature Overview

  • Billion-scale training and evaluation for link prediction and node classification

  • High-performance, configuration-file-based execution

  • PyTorch-compatible Python API for custom training and evaluation routines

Define a 3-layer GraphSage model in Python:

import marius as m
import torch

# One entry per GNN layer; -1 samples all neighbors at that layer.
nbr_sampler = m.nn.LayeredNeighborSampler([-1, -1, -1])

feature_dim = 128
num_classes = 40

device = torch.device("cuda")

# Input layer that supplies the node features.
feature_layer = m.nn.layers.FeatureLayer(dimension=feature_dim,
                                         device=device)

graph_sage_layer1 = m.nn.layers.GraphSageLayer(input_dim=feature_dim,
                                               output_dim=feature_dim,
                                               device=device)

graph_sage_layer2 = m.nn.layers.GraphSageLayer(input_dim=feature_dim,
                                               output_dim=feature_dim,
                                               device=device)

# The last layer projects to one output per class.
graph_sage_layer3 = m.nn.layers.GraphSageLayer(input_dim=feature_dim,
                                               output_dim=num_classes,
                                               device=device)

encoder = m.encoders.GeneralEncoder(layers=[[feature_layer],
                                            [graph_sage_layer1],
                                            [graph_sage_layer2],
                                            [graph_sage_layer3]])

decoder = m.nn.decoders.node.NoOpNodeDecoder()
loss = m.nn.CrossEntropyLoss(reduction="sum")

model = m.nn.Model(encoder, decoder, loss)
model.optimizers = [m.nn.AdamOptimizer(model.named_parameters(),
                                       lr=.01)]

Or with a YAML configuration:

model:
  learning_task: node_classification
  encoder:
    train_neighbor_sampling:
      - type: all
      - type: all
      - type: all
    layers:
      - - type: feature
          output_dim: 128
      - - type: gnn
          options:
            type: graph_sage
          input_dim: 128
          output_dim: 128
      - - type: gnn
          options:
            type: graph_sage
          input_dim: 128
          output_dim: 128
      - - type: gnn
          options:
            type: graph_sage
          input_dim: 128
          output_dim: 40
  decoder:
    type: node
  loss:
    type: cross_entropy
    options:
      reduction: sum
  dense_optimizer:
    type: adam
    options:
      learning_rate: 0.01
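
As a rough illustration of how the layer dimensions in the configuration above chain together, the sketch below mirrors the encoder layers as a plain Python list and checks that each layer's input_dim equals the previous layer's output_dim, and that the final output_dim equals the number of classes (40). The helper check_layer_dims is hypothetical and written only for this example; it is not part of Marius, and Marius may perform its own validation differently.

```python
# Flattened mirror of the `layers` section above (one layer per stage).
layers = [
    {"type": "feature", "output_dim": 128},
    {"type": "gnn", "options": {"type": "graph_sage"}, "input_dim": 128, "output_dim": 128},
    {"type": "gnn", "options": {"type": "graph_sage"}, "input_dim": 128, "output_dim": 128},
    {"type": "gnn", "options": {"type": "graph_sage"}, "input_dim": 128, "output_dim": 40},
]


def check_layer_dims(layers, num_classes):
    """Hypothetical helper: return a list of dimension mismatches (empty if consistent)."""
    problems = []
    prev_out = layers[0]["output_dim"]
    for i, layer in enumerate(layers[1:], start=1):
        if layer["input_dim"] != prev_out:
            problems.append(
                f"layer {i}: input_dim {layer['input_dim']} != previous output_dim {prev_out}"
            )
        prev_out = layer["output_dim"]
    if prev_out != num_classes:
        problems.append(f"final output_dim {prev_out} != num_classes {num_classes}")
    return problems


problems = check_layer_dims(layers, num_classes=40)
print(problems or "layer dimensions are consistent")
```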

Preprocessing

  • Performant preprocessing of raw datasets in CSV format

  • 13 built-in datasets for link prediction or node classification

  • Custom dataset support
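
To give a feel for what preprocessing a raw CSV edge list involves, the sketch below maps string node and relation labels to contiguous integer IDs. It is a simplified illustration using only the Python standard library, not the marius_preprocess implementation. The column order (source, relation, destination), the sample rows, and the function name remap_edges are assumptions made for this example.

```python
import csv
import io


def remap_edges(rows):
    """Map raw node/relation labels to contiguous integer IDs in first-seen order."""
    node_ids, rel_ids, edges = {}, {}, []
    for src, rel, dst in rows:
        s = node_ids.setdefault(src, len(node_ids))
        r = rel_ids.setdefault(rel, len(rel_ids))
        d = node_ids.setdefault(dst, len(node_ids))
        edges.append((s, r, d))
    return edges, node_ids, rel_ids


# Made-up raw data standing in for a CSV file on disk.
raw_csv = "alice,follows,bob\nbob,follows,carol\nalice,likes,carol\n"
edges, node_ids, rel_ids = remap_edges(csv.reader(io.StringIO(raw_csv)))
print(edges)
```

For a graph without edge types, the relation column would simply be dropped.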

Training & Evaluation

  • CPU-GPU pipeline to mitigate data movement overheads

  • Optimized neighborhood sampling and data structures for GNN aggregation

  • Scale beyond CPU memory with a partition buffer
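
The idea behind layered neighborhood sampling can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Marius's optimized implementation. The function sample_layered_neighbors, the adjacency-dict format, and the toy graph are assumptions made for this example. A fanout of -1 keeps every neighbor.

```python
import random


def sample_layered_neighbors(adj, seeds, fanouts, rng=None):
    """Return, per layer, the (neighbor, node) edges sampled for the current frontier.

    adj maps each node to a list of its incoming neighbors.
    A fanout of -1 keeps all neighbors of a node.
    """
    rng = rng or random.Random(0)
    frontier = list(seeds)
    layers = []
    for fanout in fanouts:
        edges = []
        for node in frontier:
            nbrs = adj.get(node, [])
            if fanout != -1 and len(nbrs) > fanout:
                nbrs = rng.sample(nbrs, fanout)
            edges.extend((nbr, node) for nbr in nbrs)
        layers.append(edges)
        # Next frontier: current nodes plus newly reached neighbors, de-duplicated in order.
        frontier = list(dict.fromkeys(frontier + [nbr for nbr, _ in edges]))
    return layers


# Toy graph: node -> incoming neighbors.
adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: [0]}
all_nbrs = sample_layered_neighbors(adj, seeds=[0], fanouts=[-1, -1])
print(all_nbrs)
```

Each additional layer widens the frontier, which is why the cost of multi-hop sampling grows quickly on dense graphs.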

Supported Input Graphs

  • Formats: CSV/TSV files, PyTorch tensors, NumPy arrays

  • Graphs with or without edge-types or node features

  • Scales to graphs with billions of edges and hundreds of millions of nodes

Supported Models

  • Tasks: Link prediction, node classification

  • GNN layers: GraphSage, GCN, RGCN, GAT

  • Link prediction decoders: ComplEx, DistMult, TransE
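
For reference, the three link prediction decoders score a (head, relation, tail) triple from its embeddings using their standard formulations. The sketch below writes them out in plain Python. It is illustrative only and is not Marius's decoder code; the function names are assumptions made for this example. TransE is shown with the L2 norm, although the L1 norm is also commonly used.

```python
import math


def distmult_score(h, r, t):
    """DistMult: sum of the element-wise product h * r * t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def transe_score(h, r, t):
    """TransE: negative L2 distance between h + r and t (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def complex_score(h, r, t):
    """ComplEx: real part of sum(h * r * conj(t)) over complex-valued embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


print(distmult_score([1, 2], [1, 1], [3, 4]))
print(transe_score([1, 0], [0, 1], [1, 1]))
print(complex_score([1 + 1j], [1j], [1 - 1j]))
```

With purely real embeddings the ComplEx score reduces to the DistMult score.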

Upcoming Features

  • Configuration file optimizer and generator (in testing)

  • SQL database to graph conversion tool (in testing)

  • Multi-GPU training (in progress)

  • Model checkpointing (in progress)

  • KNN inference module

  • Parquet file support in marius_preprocess

  • Remote storage for graph data and embeddings

  • Additional encoder layers and decoder layers