# Traversal API

> This document mainly explains the Traversal API in the stored procedure of TuGraph.

1. Introduction
The powerful online analytical processing (OLAP) capability of TuGraph is an important feature that sets it apart from other graph databases. With the help of the C++ OLAP API (olap_on_db.h), users can quickly export a subgraph that needs to be analyzed, and then run iterative graph computing processes such as PageRank, connected components, and community detection on it, and then make corresponding decisions based on the results. The export and computation processes can be accelerated through parallel processing, achieving almost real-time analysis and processing, and avoiding the lengthy steps of traditional solutions that require exporting, transforming, and reimporting (ETL) data into dedicated analytical systems for offline processing.

TuGraph has built-in many commonly used graph analysis algorithms and rich auxiliary interfaces, so users hardly need to implement specific graph computing processes themselves. They just need to include the header files (.h files) of the corresponding algorithm library in their own programs when implementing their own storage procedures, and link the corresponding dynamic library files (.so) during compilation. In general, the only process that users need to implement themselves is the extraction of the subgraph to be analyzed.

Currently, the Traversal API only supports C++.

## 2. Interface

### 2.1. Snapshot

The Snapshot template class in C++ OLAP API is used to represent extracted static subgraphs. The EdgeData is used to represent the data type of the weight used for each edge in the subgraph. If the edges do not require weights, Empty is used as the EdgeData.

The extracted subgraph can be described using the constructor of the Snapshot class.

```c
Snapshot::Snapshot(
    GraphDB & db,
    Transaction & txn,
    size_t flags = 0,
    std::function<bool(VertexIterator &)> vertex_filter = nullptr,
    std::function<bool(OutEdgeIterator &, EdgeData &)> out_edge_filter = nullptr
);
```

In the above text, "db" represents the database handle, "txn" represents the transaction handle, and "flags" represents the options used during generation, with the optional values including the following combinations: SNAPSHOT_PARALLEL indicates that multiple threads are used for parallel extraction during export; SNAPSHOT_UNDIRECTED indicates that the exported graph needs to be transformed into an undirected graph.

"vertex_filter" is a user-defined filtering function for vertices, where a return value of true indicates that the vertex needs to be included in the extracted subgraph, and vice versa.

"out_edge_filter" is a user-defined filtering function for edges, where a return value of true indicates that the edge needs to be included in the extracted subgraph, and vice versa.

When the filtering functions are set to default values, it means that all vertices/edges should be included.

For other methods provided by the Snapshot class, please refer to the detailed C++ API documentation (olap_on_db.h).

### 2.2. Traversal

A common type of analysis in graph databases is to start from one or more vertices and iteratively expand and access their neighbors. Although this type of analysis can be done using Cypher, its performance is limited by the serial interpretation and execution when the depth of traversal is large. Writing stored procedures using the C++ Core API eliminates the overhead of interpretation but is still limited by the processing power of a single thread.

In order to enable users to accelerate these types of analysis tasks through parallel processing, we have wrapped a Traversal framework based on the C++ OLAP API. Users can directly use the FrontierTraversal and PathTraversal classes in this framework to perform iterative traversal analysis tasks. For specific usage instructions, please refer to the corresponding C++ API documentation (lgraph_traversal.h).

```c
ParallelVector<size_t> FindVertices(
    GraphDB & db,
    Transaction & txn,
    std::function<bool(VertexIterator &)> filter,
    bool parallel = false
);
```

This method can be used to find all vertices that satisfy a certain condition (when the filter returns true). When the "parallel" parameter is set to true, the search process will be executed in parallel.

```c
template <typename VertexData>
ParallelVector<VertexData> ExtractVertexData(
    GraphDB & db,
    Transaction & txn,
    ParallelVector<size_t> & frontier,
    std::function<void(VertexIterator &, VertexData &)> extract,
    bool parallel = false
);
```

This method can be used to extract (VertexData type) properties from a specified set of vertices (frontier) using the extract method. When the "parallel" parameter is set to true, the extraction process will be executed in parallel.

FrontierTraversal is suitable for cases where only the traversed set of vertices is of interest. When a user needs to access information along the path (vertices/edges along the path) in the traversal process or result, PathTraversal needs to be used. Both types of traversal have four parameters in their constructor, namely the database handle db, transaction handle txn, options flags and capacity. The available options include the following combinations: TRAVERSAL_PARALLEL indicates that multiple threads are used for parallel traversal; TRAVERSAL_ALLOW_REVISITS indicates that vertices can be visited repeatedly during traversal (PathTraversal implicitly includes this option).
capacity indicates the capacity of the path collection during initialization.

```c
void SetFrontier(size_t root_vid);
void SetFrontier(ParallelVector<size_t> & root_vids);
void SetFrontier(std::function<bool(VertexIterator &)> root_vertex_filter);
```

Both types of traversal have three ways to set the starting vertex/vertex set for traversal. The first two methods directly specify the vertex ID, while the last method is similar to FindVertices.

Both types of traversal start from the set of vertices in the current layer. They use the extension function to access each outgoing edge/incoming edge/outgoing and incoming edges, and use a user-defined filter function to determine if the extension is successful. If successful, the neighboring vertex/appended path with the edge is added to the set of vertices/paths in the next layer.

```c
void ExpandOutEdges(
    std::function<bool(OutEdgeIterator &)> out_edge_filter = nullptr,
    std::function<bool(VertexIterator &)> out_neighbour_filter = nullptr
);
void ExpandInEdges(
    std::function<bool(InEdgeIterator &)> in_edge_filter = nullptr,
    std::function<bool(VertexIterator &)> in_neighbour_filter = nullptr
);
void ExpandEdges(
    std::function<bool(OutEdgeIterator &)> out_edge_filter = nullptr,
    std::function<bool(InEdgeIterator &)> in_edge_filter = nullptr,
    std::function<bool(VertexIterator &)> out_neighbour_filter = nullptr,
    std::function<bool(VertexIterator &)> in_neighbour_filter = nullptr
);
```

The above describes the three traversal methods of FrontierTraversal. It starts from the current set of vertices and, for each vertex in the set, iterates through each outgoing edge/incoming edge/outgoing and incoming edges. If the edge or neighbor vertex satisfies the user-defined filter conditions (where edge_filter is the filter function for edges and neighbour_filter is the filter function for neighbor vertices), the neighbor vertex is added to the new set of vertices.

```c
ParallelVector<size_t> & GetFrontier();
```

After the expansion of the current set of vertices is finished, the new set of vertices can be obtained using the above method.

```c
void ExpandOutEdges(
    std::function<bool(OutEdgeIterator &, Path &, IteratorHelper &)> out_edge_filter = nullptr,
    std::function<bool(VertexIterator &, Path &, IteratorHelper &)> out_neighbour_filter = nullptr
);
void ExpandInEdges(
    std::function<bool(InEdgeIterator &, Path &, IteratorHelper &)> in_edge_filter = nullptr,
    std::function<bool(VertexIterator &, Path &, IteratorHelper &)> in_neighbour_filter = nullptr
);
void ExpandEdges(
    std::function<bool(OutEdgeIterator &, Path &, IteratorHelper &)> out_edge_filter = nullptr,
    std::function<bool(InEdgeIterator &, Path &, IteratorHelper &)> in_edge_filter = nullptr,
    std::function<bool(VertexIterator &, Path &, IteratorHelper &)> out_neighbour_filter = nullptr,
    std::function<bool(VertexIterator &, Path &, IteratorHelper &)> in_neighbour_filter = nullptr
);
```

The three traversal methods of PathTraversal are similar to FrontierTraversal, except that the user-defined filter function adds two additional parameters: Path, which contains the path before the new edge is expanded, and IteratorHelper, which can be used to convert the vertices/edges in the path to corresponding iterators in the database. The relevant documentation can be found in the corresponding C++ API documentation.