# Sampling API >This document provides a detailed guide on using the Sampling API of TuGraph. ## 1. Overview This manual introduces the Sampling API using TuGraph. ## 2. Graph Data Instantiation Before the sampling operation, load the graph data according to the graph data path, and map it into the olapondb graph analysis class, the code is as follows: ```python galaxy = PyGalaxy(args.db_path) # Create a galaxy instance based on the path galaxy.SetCurrentUser(args.username, args.password) # Set the current user db = galaxy.OpenGraph('default', False) # Open the graph database specified by db txn = db.CreateReadTxn() # Create a transaction instance olapondb = PyOlapOnDB('Empty', db, txn) # Instantiate OlapOnDB based on the graph loading method, graph database instance, and transaction instance del txn del db del galaxy ``` ## 3. Introduction to Sampling Operators The graph sampling operator is implemented in the cython layer and is used to sample the input graph. The generated NodeInfo is used to save point information such as feature attributes and label attributes, and EdgeInfo is used to save edge information. These metadata information can be used for features Extraction, network embedding and other tasks. Currently, the TuGraph graph learning module supports five sampling operators: GetDB, NeighborSampling, EdgeSampling, RandomWalkSampling, and NegativeSampling. ### 3.1.RandomWalk Operator Random walks are performed a specified number of times around the sampling nodes to obtain the sampling subgraph. ```python Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, sample_node: list, step: size_t, NodeInfo: list, EdgeInfo: list) ``` Parameter list: db_: An instance of the graph database. olapondb: Graph analysis class. feature_num: The length of the feature vectors for the nodes. sample_node: A list of nodes to be sampled. nei_num: The number of neighbor nodes to be sampled for each node. NodeInfo: A list of dictionaries containing metadata information for the nodes. EdgeInfo: A list of dictionaries containing metadata information for the edges. Return value: This function does not return anything. ### 3.2.NeighborSampling Operator A certain number of nodes are sampled from the first-degree neighbors of the sampling nodes to obtain the sampling subgraph. ```python Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, sample_node: list, nei_num: size_t, NodeInfo: list, EdgeInfo: list) ``` Parameter list: db_: An instance of the graph database. olapondb: Graph analysis class. feature_num: The length of the feature vectors for the nodes. sample_node: A list of nodes to be sampled. nei_num: The number of neighbor nodes to be sampled for each node. NodeInfo: A list of dictionaries containing metadata information for the nodes. EdgeInfo: A list of dictionaries containing metadata information for the edges. Return value: This function does not return anything. ### 3.3.NegativeSampling Operator The negative sampling algorithm is used to generate a subgraph of non-existent edges. ```python Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, num_samples: size_t, NodeInfo: list, EdgeInfo: list) ``` Parameter list: db_: An instance of the graph database. olapondb: Graph analysis class. feature_num: The length of the feature vectors for the nodes. num_samples: The number of false edges to be generated. NodeInfo: A list of dictionaries containing metadata information for the nodes. EdgeInfo: A list of dictionaries containing metadata information for the edges. Return value: This function does not return anything. ### 3.4.EdgeSampling Operator The edge sampling algorithm is used to generate a subgraph of sampled edges. ```python Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, sample_rate: double, NodeInfo: list, EdgeInfo: list, EdgeInfo: list) ``` Parameter list: db_: An instance of the graph database. olapondb: Graph analysis class. feature_num: The length of the feature vectors for the nodes. sample_rate: The sampling rate of the edges to be selected. NodeInfo: A list of dictionaries containing metadata information for the nodes. EdgeInfo: A list of dictionaries containing metadata information for the edges. Return value: This function does not return anything. ### 3.5.GetDB Operator Get the graph data from the database and convert it into the required data structure. ```python Process(db_: lgraph_db_python.PyGraphDB, olapondb: lgraph_db_python.PyOlapOnDB, feature_num: size_t, NodeInfo: list, EdgeInfo: list) ``` Parameter list: db_: An instance of the graph database. olapondb: Graph analysis class. feature_num: The length of the feature vectors for the nodes. NodeInfo: A list of dictionaries containing metadata information for the nodes. EdgeInfo: A list of dictionaries containing metadata information for the edges. Return value: This function does not return anything. ## 4. User-Defined Sampling Algorithm Users can also implement a custom sampling algorithm through the TuGraph Olap interface. For the interface document, see [here](../2.olap/5.python-api.md). This document mainly introduces the interface of related functions used by the graph sampling algorithm design.