TuGraph Stored Procedure Guide

This guide explains how to use TuGraph stored procedures.

1. Introduction

When the query or update logic is too complex to express in Cypher, or has demanding performance requirements, a TuGraph stored procedure is a more concise and efficient choice than issuing multiple requests and completing the whole processing flow on the client.

Similar to traditional databases, the TuGraph stored procedure runs on the server side. Users can encapsulate the processing logic (i.e., multiple operations) into a procedure call, and further improve performance by parallel processing (such as using relevant C++ OLAP interfaces and built-in algorithms) during implementation.

There is a special type of API in the stored procedure for parallel data operations, which we call the Traversal API. Please refer to the documentation for more information.

2. Stored Procedure Version

Currently, TuGraph supports two versions of stored procedures, suited to different scenarios. Versions before 3.5 support only v1, which is called directly through the REST or RPC interface. Starting from version 3.5, v2 is also supported; it can be embedded in graph query languages (such as Cypher). We call it POG (Procedure On Graph query language, APOC).

| Feature | Procedure v1 | Procedure v2 |
|---|---|---|
| Applicable scenarios | Extreme performance or complex multi-transaction management scenarios | General scenarios, highly integrated with Cypher |
| Transaction | Created inside the function; multiple transactions can be freely controlled | Passed into the function from outside; single transaction |
| Signature (parameter definition) | Not required | Required |
| Input and output parameter types | Not required to specify | Must be specified |
| Cypher standalone call | Supported | Supported |
| Cypher embedded call | Not supported | Supported |
| Language | C++/Python/Rust | C++ |
| Calling mode | Pass a string directly, usually in JSON format | Through variables in Cypher statements |

In TuGraph, v1 and v2 stored procedures are managed separately, and each supports create, delete, and query operations. Even so, giving the same name to more than one stored procedure is not recommended.

3. Supported Languages

In TuGraph, users can dynamically load, update, and delete stored procedures. TuGraph supports the use of C++, Python, and Rust languages to write stored procedures. Among them, C++ language has the most complete support and the best performance.

Note that the stored procedure is the logic compiled and executed on the server side, which is independent of the language support on the client side.

4. Procedure v1 Interface

4.1. Write stored procedures

4.1.1. Write C++ stored procedure

Users can write C++ stored procedures by using the core API or the Traversal API. An example of a simple C++ stored procedure is as follows:

#include <iostream>
#include <string>
#include "lgraph.h"
using namespace lgraph_api;

extern "C" LGAPI bool Process(GraphDB& db, const std::string& request, std::string& response) {
    auto txn = db.CreateReadTxn();
    size_t n = 0;
    for (auto vit = txn.GetVertexIterator(); vit.IsValid(); vit.Next()) {
        if (vit.GetLabel() == "student") {
            auto age = vit.GetField("age");
            // Count all students whose age is 10
            if (!age.is_null() && age.integer() == 10) n++;
        }
    }
    // Return the count to the caller through the response string
    response = std::to_string(n);
    return true;
}

From the code, we can see the entry of a TuGraph C++ stored procedure is the Process function, with three parameters:

  • db: the TuGraph database instance

  • request: the input data, which can be a binary byte array, or any other format such as JSON string.

  • response: the output data, which can be a string or directly return binary data.

The return value of the Process function is a boolean value. When it returns true, it means that the request is successfully completed, otherwise it means that the stored procedure found an error during execution, and the user can return an error message through response to facilitate debugging.

After the C++ stored procedure is written, it must be compiled into a dynamic link library. TuGraph provides a compile.sh script to help users compile stored procedures automatically. The script takes a single parameter, the name of the stored procedure, which is age_10 in the example above. The compile command is as follows:

g++ -fno-gnu-unique -fPIC -g --std=c++14 -I/usr/local/include/lgraph -rdynamic -O3 -fopenmp -o age_10.so age_10.cpp /usr/local/lib64/liblgraph.so -shared

If the compilation goes well, age_10.so will be generated, which can then be loaded into the server by the user.

4.1.2. Write Python stored procedure

The following snippet does the same thing as the above C++ stored procedure, but via TuGraph Python API:

def Process(db, input):
    txn = db.CreateReadTxn()
    it = txn.GetVertexIterator()
    n = 0
    # Count all students whose age is 10
    while it.IsValid():
        if it.GetLabel() == 'student' and it['age'] and it['age'] == 10:
            n = n + 1
        it.Next()
    return (True, str(n))

The Python stored procedure returns a tuple, the first element of which is a Boolean value indicating whether the stored procedure was successfully executed; the second element is a str, which contains the result to be returned.
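As a slightly more general sketch, the procedure below reads its filter from the request instead of hard-coding it. It assumes the request is a JSON string with two hypothetical keys, `label` and `age` (these names are illustrative and not part of TuGraph), and it reports a malformed request through the second tuple element. It relies only on the iterator calls already used in the example above.

```python
import json


def Process(db, input):
    """Count vertices with a given label and age, both taken from the request."""
    # 'label' and 'age' are hypothetical request keys chosen for this sketch.
    try:
        params = json.loads(input)
        label = params["label"]
        age = params["age"]
    except (ValueError, KeyError, TypeError) as e:
        # Returning False signals failure; the string carries the reason.
        return (False, "bad request: " + repr(e))

    txn = db.CreateReadTxn()
    it = txn.GetVertexIterator()
    n = 0
    while it.IsValid():
        # The age field is only read for vertices whose label matches.
        if it.GetLabel() == label and it["age"] == age:
            n += 1
        it.Next()
    return (True, str(n))
```

With this version, a request such as {"label": "student", "age": 10} would reproduce the count from the fixed example.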

Python stored procedures do not need to be compiled and can be loaded directly.

4.2. How to use stored procedures

4.2.1. Install Stored Procedures

Users can load stored procedures through the REST API or RPC. Taking the REST API as an example, the Python code to load age_10.so is as follows:

import requests
import json
import base64

data = {'name':'age_10'}
# Read the compiled shared library as raw bytes
with open('./age_10.so','rb') as f:
    content = f.read()
# JSON cannot carry raw bytes, so send the library as base64 text
data['code_base64'] = base64.b64encode(content).decode()
data['description'] = 'Custom Page Rank Procedure'
data['read_only'] = True
data['code_type'] = 'so'
js = json.dumps(data)
r = requests.post(url='http://127.0.0.1:7071/db/school/cpp_plugin', data=js,
                  headers={'Content-Type':'application/json'})
print(r.status_code)    # returns 200 on success

Note that data['code_base64'] is a base64-encoded string, because the binary code in age_10.so cannot be transmitted directly through JSON. In addition, only users with administrator privileges can load and delete stored procedures.
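To make the encoding step concrete, here is a minimal sketch that builds the request body and checks that the base64 text decodes back to the original bytes. The helper name `build_plugin_payload` is hypothetical; the field names are the ones used in the request above, and the byte string is a stand-in for the real content of a .so file.

```python
import base64
import json


def build_plugin_payload(name, code_bytes, description, read_only=True, code_type="so"):
    """Return the JSON request body for loading a stored procedure."""
    payload = {
        "name": name,
        # Raw bytes are not valid JSON, so encode them as base64 text.
        "code_base64": base64.b64encode(code_bytes).decode("ascii"),
        "description": description,
        "read_only": read_only,
        "code_type": code_type,
    }
    return json.dumps(payload)


# Stand-in bytes; in practice this would be the content of age_10.so.
body = build_plugin_payload("age_10", b"\x7fELF\x00\x01", "Count students aged 10")
decoded = base64.b64decode(json.loads(body)["code_base64"])
print(decoded == b"\x7fELF\x00\x01")  # True: the round trip is lossless
```

Keeping payload construction in one function also makes it easy to update the description each time a new version of the procedure is loaded.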

After the stored procedure is loaded, it will be saved in the database, and it will be automatically loaded after the server restarts. Also, if an update to the stored procedure is required, the same REST API is called. It is recommended that users update the corresponding descriptions when updating stored procedures, so as to distinguish stored procedures of different versions.

4.2.2. List Stored Procedures

While the server is running, users can obtain the list of stored procedures at any time, as follows:

>>> r = requests.get('http://127.0.0.1:7071/db/school/cpp_plugin')
>>> r.status_code
200
>>> r.text
'{"plugins":[{"description":"Custom Page Rank Procedure", "name":"age_10", "read_only":true}]}'
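The response text is JSON, so it can be parsed on the client. The following is a minimal sketch that extracts the procedure names from a list response; `plugin_names` is a hypothetical helper, and the sample text is copied from the response shown above.

```python
import json


def plugin_names(response_text, read_only=None):
    """Return plugin names from a list response, optionally filtered by read_only."""
    plugins = json.loads(response_text).get("plugins", [])
    return [
        p["name"]
        for p in plugins
        if read_only is None or p.get("read_only") == read_only
    ]


# Sample text copied from the response shown above.
sample = '{"plugins":[{"description":"Custom Page Rank Procedure", "name":"age_10", "read_only":true}]}'
print(plugin_names(sample))                   # ['age_10']
print(plugin_names(sample, read_only=False))  # []
```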

4.2.3. Retrieve Stored Procedure Details

While the server is running, users can obtain the details of a single stored procedure, including its code, at any time, as follows:

>>> r = requests.get('http://127.0.0.1:7071/db/school/cpp_plugin/age_10')
>>> r.status_code
200
>>> r.text
'{"description":"Custom Page Rank Procedure", "name":"age_10", "read_only":true, "code_base64":<CODE>, "code_type":"so"}'
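Because the code comes back base64-encoded, a client that wants the original binary has to decode it. The sketch below assumes the detail response has the fields shown above; `extract_plugin_code` is a hypothetical helper, and the sample response is constructed locally with made-up content rather than taken from a real server.

```python
import base64
import json


def extract_plugin_code(detail_text):
    """Return (code_type, code_bytes) from a stored procedure detail response."""
    detail = json.loads(detail_text)
    return detail["code_type"], base64.b64decode(detail["code_base64"])


# Hypothetical response: the code_base64 value is made up for this sketch.
sample = json.dumps({
    "description": "Custom Page Rank Procedure",
    "name": "age_10",
    "read_only": True,
    "code_base64": base64.b64encode(b"binary-code").decode("ascii"),
    "code_type": "so",
})
code_type, code = extract_plugin_code(sample)
print(code_type, len(code))
```

The returned bytes could then be written to a file opened in binary mode to recover the library.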

4.2.4. Call stored procedure

Example code for calling a stored procedure is as follows:

>>> r = requests.post(url='http://127.0.0.1:7071/db/school/cpp_plugin/age_10', data='',
                headers={'Content-Type':'application/json'})
>>> r.status_code
200
>>> r.text
9

4.2.5. Uninstall Stored Procedures

Deleting a stored procedure requires only the following call:

>>> r = requests.delete(url='http://127.0.0.1:7071/db/school/cpp_plugin/age_10')
>>> r.status_code
200

Similar to loading stored procedures, only admin users can delete stored procedures.

4.2.6. Upgrade Stored Procedures

You can upgrade a stored procedure with the following two steps:

  1. Uninstall the existing one.

  2. Install the new one.

TuGraph carefully manages the concurrency of stored procedure operations. Upgrading stored procedures will not affect concurrent runs of existing ones.
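The two steps above can be sketched as a single client-side helper. This is an illustration under assumptions, not a TuGraph utility: `upgrade_plugin` is a hypothetical name, `http` is any object with requests-style `delete()` and `post()` methods (the `requests` module itself fits), and stopping when the delete fails is a design choice made for this sketch.

```python
def upgrade_plugin(http, plugin_url, name, payload_json):
    """Uninstall the stored procedure `name`, then install the new version.

    `http` is any object with requests-style delete() and post() methods,
    for example the `requests` module itself.
    """
    headers = {"Content-Type": "application/json"}
    # Step 1: uninstall the existing stored procedure.
    r = http.delete(url=plugin_url + "/" + name)
    if r.status_code != 200:
        # Assumption of this sketch: do not install if the uninstall failed.
        return False
    # Step 2: install the new stored procedure.
    r = http.post(url=plugin_url, data=payload_json, headers=headers)
    return r.status_code == 200
```

With the REST endpoint used in this section, a call might look like upgrade_plugin(requests, 'http://127.0.0.1:7071/db/school/cpp_plugin', 'age_10', js), where js is the JSON body built when installing the procedure.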

5. Procedure v2 Interface

5.1. Write stored procedures

Users can write C++ stored procedures by using lgraph API. A simple C++ stored procedure example is as follows:

// peek_some_node_salt.cpp
#include <cstdlib>
#include "lgraph/lgraph.h"
#include "lgraph/lgraph_types.h"
#include "lgraph/lgraph_result.h"

#include "tools/json.hpp"

using json = nlohmann::json;
using namespace lgraph_api;

extern "C" LGAPI bool GetSignature(SigSpec &sig_spec) {
    sig_spec.input_list = {
        {.name = "limit", .index = 0, .type = LGraphType::INTEGER},
    };
    sig_spec.result_list = {
        {.name = "node", .index = 0, .type = LGraphType::NODE},
        {.name = "salt", .index = 1, .type = LGraphType::FLOAT}
    };
    return true;
}

extern "C" LGAPI bool ProcessInTxn(Transaction &txn,
                                   const std::string &request,
                                   Result &response) {
    int64_t limit;
    try {
        json input = json::parse(request);
        limit = input["limit"].get<int64_t>();
    } catch (std::exception &e) {
        response.ResetHeader({
            {"errMsg", LGraphType::STRING}
        });
        response.MutableRecord()->Insert(
            "errMsg",
            FieldData::String(std::string("error parsing json: ") + e.what()));
        return false;
    }

    response.ResetHeader({
        {"node", LGraphType::NODE},
        {"salt", LGraphType::FLOAT}
    });
    for (size_t i = 0; i < limit; i++) {
        auto r = response.MutableRecord();
        auto vit = txn.GetVertexIterator(i);
        r->Insert("node", vit);
        r->Insert("salt", FieldData::Float(20.23*float(i)));
    }
    return true;
}

From the code we can see:

  • The stored procedure defines a method GetSignature to get the signature. This method returns the signature of the stored procedure, which includes input parameter names and their types, and return parameters and their types. This enables the Cypher query statement to use the signature information to verify whether the input data and the returned data are reasonable when calling the stored procedure.

  • The entry function is ProcessInTxn, which takes three parameters:

      • txn: the transaction in which the stored procedure runs. Generally, this is the transaction of the Cypher statement that calls the stored procedure.

      • request: the input data. Its content is a JSON-serialized string that combines the input parameters defined in GetSignature with the values passed in the Cypher query statement, e.g. {"limit": 10} for the signature above.

      • response: the output data. To ensure compatibility with Cypher, users write the data processed by the stored procedure to an lgraph_api::Result, which is finally serialized into JSON format with lgraph_api::Result::Dump.

The return value of the ProcessInTxn function is a boolean value. When it returns true, it means that the request was successfully completed, otherwise it means that the stored procedure found an error during execution.
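To picture how a signature and the call arguments combine into the request string, here is a conceptual sketch. It is not TuGraph's serializer: `INPUT_SIGNATURE` and `build_request` are hypothetical names that mirror the `limit` parameter declared in GetSignature above, and the strict type check stands in for the verification that the signature makes possible.

```python
import json

# Hypothetical mirror of the input_list declared in GetSignature above.
INPUT_SIGNATURE = [("limit", int)]


def build_request(signature, args):
    """Pair positional arguments with signature names and serialize to JSON."""
    if len(args) != len(signature):
        raise ValueError("expected %d argument(s), got %d" % (len(signature), len(args)))
    request = {}
    for (name, expected_type), value in zip(signature, args):
        # Strict match, so True is not accepted where an int is declared.
        if type(value) is not expected_type:
            raise TypeError("argument %r must be %s" % (name, expected_type.__name__))
        request[name] = value
    return json.dumps(request)


print(build_request(INPUT_SIGNATURE, [10]))  # {"limit": 10}
```

A string of this shape is what the json::parse call in ProcessInTxn reads the `limit` value from.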

After the C++ stored procedure is written, it must be compiled into a dynamic link library. TuGraph provides a compile.sh script to help users compile stored procedures automatically. The script takes a single parameter, the name of the stored procedure, which in this example is custom_pagerank. The compile command is as follows:

g++ -fno-gnu-unique -fPIC -g --std=c++14 -I/usr/local/include/lgraph -rdynamic -O3 -fopenmp -o custom_pagerank.so custom_pagerank.cpp /usr/local/lib64/liblgraph.so -shared

If the compilation goes well, custom_pagerank.so will be generated, which can then be loaded into the server by the user.

5.2. Load stored procedure

Users can load stored procedures through the REST API or RPC. Taking the REST API as an example, the Python code to load custom_pagerank.so is as follows:

import requests
import json
import base64

data = {'name':'custom_pagerank'}
# Read the compiled shared library as raw bytes
with open('./custom_pagerank.so','rb') as f:
    content = f.read()
# JSON cannot carry raw bytes, so send the library as base64 text
data['code_base64'] = base64.b64encode(content).decode()
data['description'] = 'Custom Page Rank Procedure'
data['read_only'] = True
data['code_type'] = 'so'
js = json.dumps(data)
r = requests.post(url='http://127.0.0.1:7071/db/school/cpp_plugin', data=js,
                  headers={'Content-Type':'application/json'})
print(r.status_code)    # returns 200 on success

Note that data['code_base64'] is a base64-encoded string, because the binary code in custom_pagerank.so cannot be transmitted directly through JSON. In addition, only users with administrator privileges can load and delete stored procedures.

After the stored procedure is loaded, it will be saved in the database, and it will be automatically loaded after the server restarts. Also, if an update to the stored procedure is required, the same REST API is called. It is recommended that users update the corresponding descriptions when updating stored procedures, so as to distinguish stored procedures of different versions.

5.2.1. List loaded stored procedures

While the server is running, users can obtain the list of stored procedures at any time, as follows:

>>> r = requests.get('http://127.0.0.1:7071/db/school/cpp_plugin')
>>> r.status_code
200
>>> r.text
'{"plugins":[{"description":"Custom Page Rank Procedure", "name":"custom_pagerank", "read_only":true}]}'

5.2.2. Get stored procedure details

While the server is running, users can obtain the details of a single stored procedure, including its code, at any time, as follows:

>>> r = requests.get('http://127.0.0.1:7071/db/school/cpp_plugin/custom_pagerank')
>>> r.status_code
200
>>> r.text
'{"description":"Custom Page Rank Procedure", "name":"custom_pagerank", "read_only":true, "code_base64":<CODE>, "code_type":"so"}'

5.2.3. Call stored procedure

Example code for calling a stored procedure is as follows:

CALL plugin.cpp.custom_pagerank(10)
YIELD node, pr WITH node, pr
MATCH(node)-[r]->(n) RETURN node, r, n, pr

5.2.4. Delete stored procedure

Deleting a stored procedure requires only the following call:

>>> r = requests.delete(url='http://127.0.0.1:7071/db/school/cpp_plugin/custom_pagerank')
>>> r.status_code
200

Similar to loading stored procedures, only admin users can delete stored procedures.

5.2.5. Update stored procedure

Updating a stored procedure requires the following two steps:

  1. Delete the existing stored procedure

  2. Install the new stored procedure

TuGraph carefully manages the concurrency of stored procedure operations, and updating stored procedures will not affect the operation of existing stored procedures.