Sampling API
This document provides a detailed guide on using the Sampling API of TuGraph.
1. Overview
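This manual introduces TuGraph's Sampling API: how to load graph data for sampling, the built-in sampling operators, and how to implement a user-defined sampling algorithm.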
2. Graph Data Instantiation
Before sampling, load the graph data from the database path and map it into an OlapOnDB graph analysis instance (olapondb), as follows:
from lgraph_db_python import PyGalaxy, PyOlapOnDB # TuGraph Python bindings; args is assumed to come from argparse
galaxy = PyGalaxy(args.db_path) # Create a galaxy instance from the database path
galaxy.SetCurrentUser(args.username, args.password) # Log in as the given user
db = galaxy.OpenGraph('default', False) # Open the graph named 'default' (False: not read-only)
txn = db.CreateReadTxn() # Create a read transaction
olapondb = PyOlapOnDB('Empty', db, txn) # Instantiate OlapOnDB from the graph loading method, the graph database instance, and the transaction instance
del txn
del db
del galaxy
3. Introduction to Sampling Operators
The graph sampling operators are implemented in the Cython layer and sample the input graph. Each operator fills NodeInfo with node information, such as feature and label attributes, and EdgeInfo with edge information. This metadata can be used for tasks such as feature extraction and network embedding. The TuGraph graph learning module currently supports five sampling operators: GetDB, NeighborSampling, EdgeSampling, RandomWalkSampling, and NegativeSampling.
3.1. RandomWalk Operator
Starting from the sample nodes, random walks of a specified number of steps are performed; the visited nodes and edges form the sampled subgraph.
Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, sample_node: list, step: size_t, NodeInfo: list, EdgeInfo: list)
Parameter list:
db_: An instance of the graph database.
olapondb: The graph analysis class (OlapOnDB) instance.
feature_num: The length of the node feature vectors.
sample_node: The list of sample nodes from which the walks start.
step: The number of random-walk steps taken from each sample node.
NodeInfo: A list of dictionaries containing metadata for the nodes.
EdgeInfo: A list of dictionaries containing metadata for the edges.
Return value: This function does not return anything.
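To make the RandomWalk description concrete, here is a minimal pure-Python sketch of random-walk subgraph sampling. It is not TuGraph's Cython implementation: it assumes the graph is a plain adjacency dict (node id to list of out-neighbors) and that step is the walk length per sample node. The function name random_walk_sample is hypothetical, and feature handling (feature_num) is omitted.

```python
import random


def random_walk_sample(adj, sample_node, step, rng=None):
    """Hypothetical sketch of random-walk subgraph sampling.

    adj maps each node id to a list of out-neighbor ids. From every node in
    sample_node, take up to `step` uniformly random hops; every visited node
    and every traversed edge is added to the sampled subgraph.
    """
    rng = rng or random.Random()
    nodes, edges = set(), set()
    for start in sample_node:
        cur = start
        nodes.add(cur)
        for _ in range(step):
            neighbors = adj.get(cur, [])
            if not neighbors:  # dead end: stop this walk early
                break
            nxt = rng.choice(neighbors)
            edges.add((cur, nxt))
            nodes.add(nxt)
            cur = nxt
    return nodes, edges


adj = {0: [1, 2], 1: [2], 2: [0, 3], 3: []}
nodes, edges = random_walk_sample(adj, [0], step=3, rng=random.Random(42))
print(sorted(nodes), sorted(edges))
```

Passing a seeded random.Random makes a run reproducible, which is convenient when comparing sampled subgraphs across experiments.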
3.2. NeighborSampling Operator
For each sample node, a specified number of nodes (nei_num) are sampled from its first-degree (one-hop) neighbors to obtain the sampled subgraph.
Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, sample_node: list, nei_num: size_t, NodeInfo: list, EdgeInfo: list)
Parameter list:
db_: An instance of the graph database.
olapondb: The graph analysis class (OlapOnDB) instance.
feature_num: The length of the node feature vectors.
sample_node: The list of sample nodes.
nei_num: The number of neighbor nodes to sample for each node.
NodeInfo: A list of dictionaries containing metadata for the nodes.
EdgeInfo: A list of dictionaries containing metadata for the edges.
Return value: This function does not return anything.
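A comparable sketch of one-hop neighbor sampling, under the same assumptions (adjacency dict input instead of db_/olapondb). The hypothetical neighbor_sample draws neighbors uniformly without replacement and caps the draw at a node's degree when nei_num exceeds it; the actual operator may use a different policy.

```python
import random


def neighbor_sample(adj, sample_node, nei_num, rng=None):
    """Hypothetical sketch: for each sample node, draw up to nei_num distinct
    one-hop neighbors uniformly without replacement and keep those edges."""
    rng = rng or random.Random()
    nodes, edges = set(sample_node), set()
    for src in sample_node:
        neighbors = adj.get(src, [])
        k = min(nei_num, len(neighbors))  # cannot draw more neighbors than exist
        for dst in rng.sample(neighbors, k):
            nodes.add(dst)
            edges.add((src, dst))
    return nodes, edges


adj = {0: [1, 2, 3, 4], 1: [0], 2: [], 3: [0], 4: []}
nodes, edges = neighbor_sample(adj, [0, 1], nei_num=2, rng=random.Random(7))
print(sorted(nodes), sorted(edges))
```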
3.3. NegativeSampling Operator
The negative sampling algorithm generates a subgraph of non-existent edges, that is, node pairs that are not connected in the original graph.
Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, num_samples: size_t, NodeInfo: list, EdgeInfo: list)
Parameter list:
db_: An instance of the graph database.
olapondb: The graph analysis class (OlapOnDB) instance.
feature_num: The length of the node feature vectors.
num_samples: The number of non-existent (negative) edges to generate.
NodeInfo: A list of dictionaries containing metadata for the nodes.
EdgeInfo: A list of dictionaries containing metadata for the edges.
Return value: This function does not return anything.
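The idea behind negative sampling can be sketched with rejection sampling: draw random node pairs and keep only those that are not edges. This sketch assumes directed edges, node ids 0..num_nodes-1, and excludes self-loops; negative_sample is a hypothetical name, and the TuGraph operator's exact policy may differ.

```python
import random


def negative_sample(num_nodes, edges, num_samples, rng=None):
    """Hypothetical sketch: draw num_samples distinct (src, dst) pairs that
    are NOT edges of the graph (self-loops excluded) by rejection sampling."""
    rng = rng or random.Random()
    existing = {(u, v) for u, v in edges if u != v}
    # number of ordered non-self pairs that are not already edges
    capacity = num_nodes * (num_nodes - 1) - len(existing)
    if num_samples > capacity:
        raise ValueError("not enough non-existent edges to sample")
    negatives = set()
    while len(negatives) < num_samples:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in existing:
            negatives.add((u, v))
    return negatives


edges = [(0, 1), (1, 2), (2, 0)]
neg = negative_sample(4, edges, 5, rng=random.Random(0))
print(sorted(neg))
```

The capacity check guards against an endless loop when more negative edges are requested than the graph can supply.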
3.4. EdgeSampling Operator
The edge sampling algorithm samples edges from the graph at the given sampling rate (sample_rate) to generate a subgraph.
Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, sample_rate: double, NodeInfo: list, EdgeInfo: list)
Parameter list:
db_: An instance of the graph database.
olapondb: The graph analysis class (OlapOnDB) instance.
feature_num: The length of the node feature vectors.
sample_rate: The sampling rate used to select edges.
NodeInfo: A list of dictionaries containing metadata for the nodes.
EdgeInfo: A list of dictionaries containing metadata for the edges.
Return value: This function does not return anything.
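A minimal sketch of edge sampling, assuming sample_rate is the probability of keeping each edge (an independent Bernoulli draw per edge); the operator might instead sample an exact fraction of the edges. edge_sample is a hypothetical name, and the edge list stands in for the graph read through olapondb.

```python
import random


def edge_sample(edges, sample_rate, rng=None):
    """Hypothetical sketch: keep each edge independently with probability
    sample_rate and return the nodes they touch plus the kept edges."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be in [0, 1]")
    rng = rng or random.Random()
    kept = [(u, v) for u, v in edges if rng.random() < sample_rate]
    nodes = {n for edge in kept for n in edge}  # endpoints of the kept edges
    return nodes, kept


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
nodes, kept = edge_sample(edges, 0.5, rng=random.Random(1))
print(sorted(nodes), kept)
```

Because random.random() returns values in [0, 1), a rate of 1.0 keeps every edge and a rate of 0.0 keeps none.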
3.5. GetDB Operator
Retrieves the graph data from the database and converts it into the required NodeInfo and EdgeInfo data structures.
Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, NodeInfo: list, EdgeInfo: list)
Parameter list:
db_: An instance of the graph database.
olapondb: The graph analysis class (OlapOnDB) instance.
feature_num: The length of the node feature vectors.
NodeInfo: A list of dictionaries containing metadata for the nodes.
EdgeInfo: A list of dictionaries containing metadata for the edges.
Return value: This function does not return anything.
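Because each Process function returns nothing, the parameter lists suggest that results are delivered by appending to the NodeInfo and EdgeInfo lists the caller passes in. The sketch below illustrates that out-parameter convention for a GetDB-style full-graph conversion. The dictionary keys ("id", "feature", "label", "src", "dst"), the padding or truncation of features to feature_num, and the function name get_db_sketch are all hypothetical.

```python
def get_db_sketch(node_features, node_labels, edges, feature_num, NodeInfo, EdgeInfo):
    """Hypothetical sketch of the GetDB calling convention: nothing is
    returned; the caller's NodeInfo and EdgeInfo lists are filled in place."""
    for vid, feat in node_features.items():
        # pad with zeros / truncate so every vector has exactly feature_num entries
        vec = (list(feat) + [0.0] * feature_num)[:feature_num]
        NodeInfo.append({"id": vid, "feature": vec, "label": node_labels.get(vid)})
    for src, dst in edges:
        EdgeInfo.append({"src": src, "dst": dst})


NodeInfo, EdgeInfo = [], []  # output containers are created by the caller
result = get_db_sketch({0: [0.5, 1.0], 1: [2.0]}, {0: 1, 1: 0}, [(0, 1)], 3, NodeInfo, EdgeInfo)
print(result, NodeInfo, EdgeInfo)
```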
4. User-Defined Sampling Algorithm
Users can also implement a custom sampling algorithm through the TuGraph OLAP interface; for details, see the TuGraph OLAP interface documentation. This section focuses on the interfaces of the functions used when designing a graph sampling algorithm.
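As an illustration of a user-defined sampler, the hypothetical induced_node_sample below picks nodes uniformly at random and keeps only the edges whose endpoints were both picked, writing into NodeInfo and EdgeInfo in the same out-parameter style as the built-in operators. A real implementation would read the graph through the TuGraph OLAP interface rather than from an adjacency dict, and the dictionary keys are assumptions.

```python
import random


def induced_node_sample(adj, num_nodes_to_keep, NodeInfo, EdgeInfo, rng=None):
    """Hypothetical custom sampler: pick nodes uniformly at random and keep
    every edge whose two endpoints were both picked (induced subgraph).
    Results are written into NodeInfo/EdgeInfo; nothing is returned."""
    rng = rng or random.Random()
    all_nodes = sorted(adj)
    k = min(num_nodes_to_keep, len(all_nodes))
    picked = set(rng.sample(all_nodes, k))
    for vid in sorted(picked):
        NodeInfo.append({"id": vid})
    for src in sorted(picked):
        for dst in adj[src]:
            if dst in picked:  # keep only edges inside the sampled node set
                EdgeInfo.append({"src": src, "dst": dst})


adj = {0: [1, 2], 1: [2], 2: [0], 3: [0]}
NodeInfo, EdgeInfo = [], []
induced_node_sample(adj, 3, NodeInfo, EdgeInfo, rng=random.Random(3))
print(NodeInfo, EdgeInfo)
```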