Training

This document details how to use TuGraph for training Graph Neural Networks (GNNs).

1. Training

When using TuGraph’s graph learning module for training, it can be divided into two categories: full graph training and mini-batch training.

Full graph training involves loading the entire graph from the TuGraph db into memory and training the GNN. Mini-batch training uses the sampling operator of the TuGraph graph learning module to sample the entire graph data and then input it into the training framework for training.

2. Mini-Batch Training

Mini-batch training requires the use of TuGraph’s graph learning module’s sampling operators, which currently support Neighbor Sampling, Edge Sampling, Random Walk Sampling, and Negative Sampling. The sampling result of the TuGraph graph learning module’s sampling operator is returned in the form of a List.

The following takes Neighbor Sampling as an example to introduce how to convert the sampled results into the training framework for format conversion. The user needs to provide a Sample class:

class TuGraphSample(object):
    def __init__(self, args=None):
        super(TuGraphSample, self).__init__()
        self.args = args
    def sample(self, g, seed_nodes):
        args = self.args
        # 1. Load graph data
        galaxy = PyGalaxy(args.db_path)
        galaxy.SetCurrentUser(args.username, args.password)
        db = galaxy.OpenGraph(args.graph_name, False)
        sample_node = seed_nodes.tolist()
        length = args.randomwalk_length
        NodeInfo = []
        EdgeInfo = []
        # 2. Sampling method, the result is stored in NodeInfo and EdgeInfo
        if args.sample_method == 'randomwalk':
            randomwalk.Process(db, 100, sample_node, length, NodeInfo, EdgeInfo)
        elif args.sample_method == 'negative':
            negativesample.Process(db, 100)
        else:
            neighborsample(db, 100, sample_node, args.nbor_sample_num, NodeInfo, EdgeInfo)
        del db
        del galaxy
        # 3. Format conversion of the result to fit the training format
        remap(EdgeInfo[0], EdgeInfo[1], NodeInfo[0])
        g = dgl.graph((EdgeInfo[0], EdgeInfo[1]))
        g.ndata['feat'] = torch.tensor(NodeInfo[1])
        g.ndata['label'] = torch.tensor(NodeInfo[2])
        return g

As shown in the code, the graph data is first loaded into memory. Then use the sampling operator to sample the graph data, and the results are stored in NodeInfo and EdgeInfo. NodeInfo and EdgeInfo are python list results, and the stored information results are as follows:

Graph Data	Storage Position
Edge source	EdgeInfo[0]
Edge destination	EdgeInfo[1]
Vertex ID	NodeInfo[0]
Vertex features	NodeInfo[1]
Vertex label	NodeInfo[2]

Finally, we format the result to fit the training format. Here we use the DGL training framework, so we construct a DGL Graph using the result data, and then return the DGL Graph.

Once we provide the TuGraphSample class, we can use it for Mini-Batch training. Let the data loading part of DGL use the sampler instance of TuGraphSample as follows:

    sampler = TugraphSample(args)
    fake_g = construct_graph() # just make dgl happy
    dataloader = dgl.dataloading.DataLoader(fake_g,
        torch.arange(train_nids),
        sampler,
        batch_size=batch_size,
        device=device,
        use_ddp=True,
        num_workers=0,
        drop_last=False,
        )

Use DGL for model training:

def train(dataloader, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=5e-4)
    model.train()
    s = time.time()
    for graph in dataloader:
        load_time = time.time()
        graph = dgl.add_self_loop(graph)
        logits = model(graph, graph.ndata['feat'])
        loss = loss_fcn(logits, graph.ndata['label'])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_time = time.time()
        print('load time', load_time - s, 'train_time', train_time - load_time)
        s = time.time()
    return float(loss)

3. Full-Batch training

Full-Batch training of GNNs (Graph Neural Networks) is a type of training that involves processing the entire training dataset at once. It is one of the simplest and most straightforward training methods for GNNs, where the entire graph is treated as a single instance. In full-batch training, the entire dataset is loaded into memory and the model is trained on the entire graph. This type of training is especially useful for small to medium-sized graphs, and is mainly used for static graphs that do not change over time.

getdb. Process(db, olapondb, feature_len, NodeInfo, EdgeInfo)

The full graph are then fed into the training framework for training. Full code: learn/examples/train_full_cora.py。