Getting Started
Build and Install
Requirements
- CUDA >= 10.1
- cuDNN >= 7
- PyTorch >= 1.8
- Python >= 3.6
- GCC >= 7 (on Linux) or Clang 12.0 (on macOS)
- CMake >= 3.12
- Make >= 3.8
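The PyTorch-related requirements can be checked from Python. This is a minimal sketch; GCC, CMake, and Make versions still need to be verified from the shell:

```python
import sys
import torch

print("Python: ", sys.version.split()[0])         # needs >= 3.6
print("PyTorch:", torch.__version__)              # needs >= 1.8
print("CUDA:   ", torch.version.cuda)             # needs >= 10.1
print("cuDNN:  ", torch.backends.cudnn.version()) # needs >= 7
print("CUDA available:", torch.cuda.is_available())
```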
Pip installation
First check that the required software is installed (see above).
```bash
git clone https://github.com/marius-team/marius.git
cd marius
pip3 install .
```
The Python API can be accessed with `import marius`.
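A quick smoke test that the bindings were installed:

```python
# if the pip install succeeded, importing the package should work
import marius as m
print(m)
```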
The following command line tools will be installed:

- marius_train: Train models using configuration files and the command line
- marius_eval: Command line model evaluation
- marius_preprocess: Built-in dataset downloading and preprocessing
- marius_predict: Batch inference tool for link prediction or node classification
CMake build (No Python API)
This builds only the C++ sources and the marius_train executable; the Python API is not included.
```bash
git clone https://github.com/marius-team/marius.git
cd marius

# installs only marius.tools (required)
MARIUS_NO_BINDINGS=1 pip3 install .

mkdir build
cd build
cmake ../ -DUSE_CUDA=1
make marius_train -j
cd ..

# run with build/marius_train config.yaml
```
Configuration Interface
See configuration examples for detailed examples and the configuration schema for all options.
Preprocess & Configuration
Preprocess the dataset: this downloads and preprocesses ogbn_arxiv into the arxiv_example/ directory.
```bash
marius_preprocess --dataset ogbn_arxiv --output_dir arxiv_example/
```
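To inspect what the preprocessor produced, a minimal sketch that simply lists the output directory (whatever the file names are):

```python
import os

# list the files marius_preprocess wrote to the output directory
for name in sorted(os.listdir("arxiv_example/")):
    print(name)
```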
Define the configuration file (saved here as arxiv_config.yaml): a 1-layer GraphSAGE GNN.
```yaml
model:
  learning_task: NODE_CLASSIFICATION
  encoder:
    train_neighbor_sampling:
      - type: ALL
    layers:
      - - type: FEATURE
          output_dim: 128
      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          input_dim: 128
          output_dim: 40
  decoder:
    type: NODE
  loss:
    type: CROSS_ENTROPY
    options:
      reduction: SUM
  dense_optimizer:
    type: ADAM
    options:
      learning_rate: 0.01
storage:
  device_type: cuda
  dataset:
    dataset_dir: arxiv_example/
    num_edges: 1166243
    num_train: 90941
    num_nodes: 169343
    num_relations: 1
    num_valid: 29799
    num_test: 48603
    node_feature_dim: 128
    num_classes: 40
  edges:
    type: DEVICE_MEMORY
    options:
      dtype: int
  features:
    type: DEVICE_MEMORY
    options:
      dtype: float
training:
  batch_size: 1000
  num_epochs: 10
  pipeline:
    sync: true
evaluation:
  batch_size: 1000
  pipeline:
    sync: true
```
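As a sanity check on the dataset statistics above, the train, validation, and test splits partition the node set exactly:

```python
# splits from the config sum to num_nodes
num_train, num_valid, num_test = 90941, 29799, 48603
assert num_train + num_valid + num_test == 169343  # num_nodes
```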
Training
Train the model described in the configuration file for 10 epochs.
```bash
marius_train arxiv_config.yaml
```

The output will look similar to:
```
[04/08/22 01:12:10.693] ################ Starting training epoch 1 ################
[04/08/22 01:12:10.721] Nodes processed: [10000/90941], 11.00%
[04/08/22 01:12:10.741] Nodes processed: [20000/90941], 21.99%
[04/08/22 01:12:10.762] Nodes processed: [30000/90941], 32.99%
[04/08/22 01:12:10.800] Nodes processed: [40000/90941], 43.98%
[04/08/22 01:12:10.820] Nodes processed: [50000/90941], 54.98%
[04/08/22 01:12:10.840] Nodes processed: [60000/90941], 65.98%
[04/08/22 01:12:10.863] Nodes processed: [70000/90941], 76.97%
[04/08/22 01:12:10.883] Nodes processed: [80000/90941], 87.97%
[04/08/22 01:12:10.916] Nodes processed: [90000/90941], 98.97%
[04/08/22 01:12:10.918] Nodes processed: [90941/90941], 100.00%
[04/08/22 01:12:10.918] ################ Finished training epoch 1 ################
[04/08/22 01:12:10.918] Epoch Runtime: 224ms
[04/08/22 01:12:10.918] Nodes per Second: 405986.6
[04/08/22 01:12:10.918] Evaluating validation set
[04/08/22 01:12:11.005] =================================
Node Classification: 29799 nodes evaluated
Accuracy: 58.669754%
=================================
[04/08/22 01:12:11.005] Evaluating test set
[04/08/22 01:12:11.133] =================================
Node Classification: 48603 nodes evaluated
Accuracy: 57.936753%
=================================
...
```
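The reported throughput is simply the number of training nodes divided by the epoch runtime:

```python
# "Nodes per Second" from the log: training nodes / epoch runtime
nodes = 90941
runtime_s = 0.224  # "Epoch Runtime: 224ms"
print(nodes / runtime_s)  # ~405986.6, matching the log
```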
Inference
Evaluate the test set for the dataset after 10 epochs have completed.
```bash
marius_eval arxiv_config.yaml
```

Output:
```
[04/08/22 02:06:25.330] Evaluating test set
[04/08/22 02:06:25.585] =================================
Node Classification: 48603 nodes evaluated
Accuracy: 64.963068%
=================================
```
Python API
See the Python examples and API docs (under construction) for more details.
Preprocess Dataset and load graph data
Import marius and preprocess ogbn_arxiv for node classification.
```python
import torch
import marius as m
from marius.tools.preprocess.datasets.ogbn_arxiv import OGBNArxiv

# initialize and preprocess dataset
dataset = OGBNArxiv("arxiv_example/")
dataset.download()
dataset_stats = dataset.preprocess()
```
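The returned dataset_stats object carries the same statistics that appear in the configuration file above, for example:

```python
# statistics recorded during preprocessing; values match the YAML config
print(dataset_stats.num_nodes)         # 169343
print(dataset_stats.num_train)         # 90941
print(dataset_stats.node_feature_dim)  # 128
print(dataset_stats.num_classes)       # 40
```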
Load dataset tensors into GPU memory
```python
device = torch.device("cuda")

edges = m.storage.tensor_from_file(filename=dataset.edge_list_file,
                                   shape=[dataset_stats.num_edges, -1],
                                   dtype=torch.int32,
                                   device=device)
train_nodes = m.storage.tensor_from_file(filename=dataset.train_nodes_file,
                                         shape=[dataset_stats.num_train],
                                         dtype=torch.int32,
                                         device=device)
test_nodes = m.storage.tensor_from_file(filename=dataset.test_nodes_file,
                                        shape=[dataset_stats.num_test],
                                        dtype=torch.int32,
                                        device=device)
features = m.storage.tensor_from_file(filename=dataset.node_features_file,
                                      shape=[dataset_stats.num_nodes, -1],
                                      dtype=torch.float32,
                                      device=device)
labels = m.storage.tensor_from_file(filename=dataset.node_labels_file,
                                    shape=[dataset_stats.num_nodes],
                                    dtype=torch.int32,
                                    device=device)
```
Define Model
Define a single-layer GraphSAGE model.
```python
feature_dim = dataset_stats.node_feature_dim
num_classes = dataset_stats.num_classes

# encoder: a feature lookup stage followed by one GraphSAGE stage
feature_layer = m.nn.layers.FeatureLayer(dimension=feature_dim,
                                         device=device)
graph_sage_layer = m.nn.layers.GraphSageLayer(input_dim=feature_dim,
                                              output_dim=num_classes,
                                              device=device)
encoder = m.encoders.GeneralEncoder(layers=[[feature_layer],
                                            [graph_sage_layer]])

# node classification uses the encoder output directly
decoder = m.nn.decoders.node.NoOpNodeDecoder()
loss = m.nn.CrossEntropyLoss(reduction="sum")

# report categorical accuracy during evaluation
reporter = m.report.NodeClassificationReporter()
reporter.add_metric(m.report.CategoricalAccuracy())

model = m.nn.Model(encoder, decoder, loss, reporter)
model.optimizers = [m.nn.AdamOptimizer(model.named_parameters(), lr=.01)]

# -1 = sample all neighbors, mirroring "type: ALL" in the YAML config
nbr_sampler = m.data.samplers.LayeredNeighborSampler(num_neighbors=[-1])
```
Training and Evaluation
Set up the training and evaluation dataloaders.
```python
train_loader = m.data.DataLoader(edges=edges,
                                 batch_size=1000,
                                 nodes=train_nodes,
                                 nbr_sampler=nbr_sampler,
                                 learning_task="nc")

eval_loader = m.data.DataLoader(edges=edges,
                                batch_size=1000,
                                nodes=test_nodes,
                                nbr_sampler=nbr_sampler,
                                learning_task="nc")
```
Train for 10 epochs.
```python
num_epochs = 10
for i in range(num_epochs):
    train_loader.initializeBatches()
    while train_loader.hasNextBatch():
        batch = train_loader.getBatch()
        model.train_batch(batch)
```
Evaluate Test Set
```python
eval_loader.initializeBatches()
while eval_loader.hasNextBatch():
    batch = eval_loader.getBatch()
    model.evaluate_batch(batch)

model.reporter.report()
```
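Putting the two loops together, per-epoch evaluation in the style of marius_train can be sketched using only the calls shown above:

```python
# sketch: train and evaluate once per epoch, mirroring the per-epoch
# reports that marius_train prints
for epoch in range(num_epochs):
    train_loader.initializeBatches()
    while train_loader.hasNextBatch():
        model.train_batch(train_loader.getBatch())

    eval_loader.initializeBatches()
    while eval_loader.hasNextBatch():
        model.evaluate_batch(eval_loader.getBatch())
    model.reporter.report()
```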