TuGraph Stored Procedure Guide
This document explains how to use TuGraph stored procedures.
1. Introduction
When the query/update logic that users need to express is complex (for example, when it cannot be described in Cypher or has demanding performance requirements), a TuGraph stored procedure is a more concise and efficient choice than issuing multiple requests and completing the whole processing flow on the client.
Similar to traditional databases, TuGraph stored procedures run on the server side. Users can encapsulate processing logic (i.e., multiple operations) into a single procedure call, and can further improve performance in the implementation through parallel processing (for example, by using the relevant C++ OLAP interfaces and built-in algorithms).
The stored procedure API also includes a special set of interfaces for parallel data operations, called the Traversal API. Please refer to the Traversal API documentation for more information.
2. Stored Procedure Version
Currently, TuGraph supports two versions of stored procedures, suited to different scenarios. Versions before 3.5 support only v1, which is called directly through the REST or RPC interface. Starting from version 3.5, v2 is also supported, which allows procedures to be called from within a graph query language such as Cypher. We call this POG (Procedure On Graph query language, APOC).
| | Procedure v1 | Procedure v2 |
|---|---|---|
| Applicable scenarios | Extreme performance or complex multi-transaction management | General scenarios, highly integrated with Cypher |
| Transaction | Created inside the function; multiple transactions can be freely controlled | Passed in from outside the function; single transaction |
| Signature (parameter definition) | Not required | Required |
| Input and output parameter types | Not required to specify | Must be specified |
| Cypher standalone call | Supported | Supported |
| Cypher embedded call | Not supported | Supported |
| Language | C++/Python/Rust | C++ |
| Calling mode | Pass a string directly, usually in JSON format | Through variables in Cypher statements |
In TuGraph, v1 and v2 stored procedures are managed separately, and each supports create, delete, and query operations. Even so, giving two stored procedures the same name is not recommended.
3. Supported Languages
In TuGraph, users can dynamically load, update, and delete stored procedures. Stored procedures can be written in C++, Python, or Rust, with C++ offering the most complete support and the best performance.
Note that a stored procedure is logic that runs on the server side, so the choice of procedure language is independent of the languages supported on the client side.
4. Procedure v1 Interface
4.1. Writing stored procedures
4.1.1. Writing C++ stored procedures
Users can write C++ stored procedures using the core API or the Traversal API. A simple C++ stored procedure is shown below:
#include <string>
#include "lgraph.h"
using namespace lgraph_api;
// v1 entry point: counts all vertices labeled "student" whose age is 10
extern "C" LGAPI bool Process(GraphDB& db, const std::string& request, std::string& response) {
    auto txn = db.CreateReadTxn();
    size_t n = 0;
    for (auto vit = txn.GetVertexIterator(); vit.IsValid(); vit.Next()) {
        if (vit.GetLabel() == "student") {
            auto age = vit.GetField("age");
            if (!age.is_null() && age.integer() == 10) n++;  // count students aged 10
        }
    }
    response = std::to_string(n);  // the result is returned through `response`
    return true;
}
From the code, we can see that the entry point of a TuGraph C++ stored procedure is the Process function, which takes three parameters:
db: the TuGraph database instance.
request: the input data, which can be a binary byte array or any other format, such as a JSON string.
response: the output data, which can be a string or raw binary data.
The return value of Process is a boolean. Returning true means the request completed successfully; returning false means the stored procedure encountered an error during execution, in which case the user can return an error message through response to facilitate debugging.
After the C++ stored procedure is written, it must be compiled into a dynamic link library. TuGraph provides a compile.sh script to help users compile stored procedures automatically; it takes a single argument, the name of the stored procedure (age_10 in the example above). The procedure can also be compiled directly with g++, as follows:
g++ -fno-gnu-unique -fPIC -g --std=c++14 -I/usr/local/include/lgraph -rdynamic -O3 -fopenmp -o age_10.so age_10.cpp /usr/local/lib64/liblgraph.so -shared
If compilation succeeds, age_10.so is generated and can then be loaded into the server.
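If you prefer to drive the build from a Python script rather than compile.sh, the manual command above can be assembled programmatically. This is a minimal sketch: build_compile_cmd and compile_procedure are hypothetical helper names, and the include and library paths are simply the defaults from the command above, which may differ on your installation.

```python
import shlex
import subprocess

# Default locations copied from the g++ command above (assumed; adjust as needed)
INCLUDE_DIR = "/usr/local/include/lgraph"
LIB_PATH = "/usr/local/lib64/liblgraph.so"


def build_compile_cmd(name):
    """Return the g++ argument list that turns <name>.cpp into <name>.so."""
    return [
        "g++", "-fno-gnu-unique", "-fPIC", "-g", "--std=c++14",
        f"-I{INCLUDE_DIR}", "-rdynamic", "-O3", "-fopenmp",
        "-o", f"{name}.so", f"{name}.cpp", LIB_PATH, "-shared",
    ]


def compile_procedure(name):
    """Run the compiler; raises CalledProcessError if compilation fails."""
    subprocess.run(build_compile_cmd(name), check=True)


if __name__ == "__main__":
    # Print the shell form of the command for inspection
    print(shlex.join(build_compile_cmd("age_10")))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a procedure name or path ever contains special characters.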
4.1.2. Writing Python stored procedures
The following snippet does the same thing as the C++ stored procedure above, but uses the TuGraph Python API:
def Process(db, input):
    txn = db.CreateReadTxn()
    it = txn.GetVertexIterator()
    n = 0
    while it.IsValid():
        # count vertices labeled 'student' whose age is 10
        if it.GetLabel() == 'student' and it['age'] and it['age'] == 10:
            n = n + 1
        it.Next()
    return (True, str(n))
A Python stored procedure returns a tuple: the first element is a boolean indicating whether the procedure executed successfully, and the second element is a str containing the result to be returned.
Python stored procedures do not need to be compiled and can be loaded directly.
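Because a Python procedure is just a function, its logic can be exercised offline before it is loaded. The sketch below uses hypothetical FakeDB, FakeTxn, and FakeVertexIterator classes that mimic only the calls used above; they are not part of the TuGraph API. Its Process is likewise a hypothetical variant that reads the target age from a JSON request and reports malformed input through the first tuple element.

```python
import json


class FakeVertexIterator:
    """Hypothetical stand-in for the vertex iterator used by Process."""

    def __init__(self, vertices):
        self._vertices = vertices  # list of (label, {field: value}) pairs
        self._pos = 0

    def IsValid(self):
        return self._pos < len(self._vertices)

    def Next(self):
        self._pos += 1

    def GetLabel(self):
        return self._vertices[self._pos][0]

    def __getitem__(self, field):
        return self._vertices[self._pos][1].get(field)  # None if field missing


class FakeTxn:
    def __init__(self, vertices):
        self._vertices = vertices

    def GetVertexIterator(self):
        return FakeVertexIterator(self._vertices)


class FakeDB:
    def __init__(self, vertices):
        self._vertices = vertices

    def CreateReadTxn(self):
        return FakeTxn(self._vertices)


def Process(db, input):
    # Hypothetical variant: target age comes from a request like '{"age": 10}'
    try:
        target = json.loads(input)["age"] if input else 10
    except (ValueError, KeyError, TypeError) as e:
        return (False, "bad request: " + str(e))
    txn = db.CreateReadTxn()
    it = txn.GetVertexIterator()
    n = 0
    while it.IsValid():
        if it.GetLabel() == 'student' and it['age'] == target:
            n += 1
        it.Next()
    return (True, str(n))


db = FakeDB([('student', {'age': 10}), ('student', {'age': 11}),
             ('teacher', {'age': 10}), ('student', {'age': 10})])
print(Process(db, ''))             # (True, '2')
print(Process(db, '{"age": 11}'))  # (True, '1')
print(Process(db, 'not json'))     # (False, 'bad request: ...')
```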
4.2. How to use stored procedures
4.2.1. Install Stored Procedures
Users can load stored procedures through the REST API or RPC. Taking the REST API as an example, the Python code to load age_10.so is as follows:
import base64
import json
import requests
data = {'name': 'age_10'}
with open('./age_10.so', 'rb') as f:
    content = f.read()
# The binary library must be base64-encoded to travel inside JSON
data['code_base64'] = base64.b64encode(content).decode()
data['description'] = 'Custom Page Rank Procedure'
data['read_only'] = True
data['code_type'] = 'so'
js = json.dumps(data)
r = requests.post(url='http://127.0.0.1:7071/db/school/cpp_plugin', data=js,
                  headers={'Content-Type': 'application/json'})
print(r.status_code)  # returns 200 on success
Note that data['code_base64'] holds a base64-encoded string, because the binary code in age_10.so cannot be transmitted directly in JSON. In addition, only users with administrator privileges can load or delete stored procedures.
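The reason for the base64 step can be seen in a small, self-contained sketch. build_load_payload is a hypothetical helper that assembles the same fields as the loading example above, and the bytes are a stand-in rather than a real .so file.

```python
import base64
import json


def build_load_payload(name, code_bytes, description, read_only=True, code_type='so'):
    """Return the JSON body for POST /db/<graph>/cpp_plugin, as in the example above."""
    return json.dumps({
        'name': name,
        # JSON has no bytes type, so the library is sent as base64 text
        'code_base64': base64.b64encode(code_bytes).decode('ascii'),
        'description': description,
        'read_only': read_only,
        'code_type': code_type,
    })


fake_so = b'\x7fELF\x02\x01\x01\x00'  # stand-in bytes, not a real library

# Embedding the raw bytes directly fails, which is why base64 is needed
try:
    json.dumps({'code': fake_so})
except TypeError as e:
    print('raw bytes rejected:', e)

body = build_load_payload('age_10', fake_so, 'Counts students aged 10')
# The server side can recover the exact original bytes from the base64 field
assert base64.b64decode(json.loads(body)['code_base64']) == fake_so
```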
Once loaded, a stored procedure is saved in the database and is automatically reloaded after the server restarts. To update a stored procedure, call the same REST API. When updating, it is recommended to update the description as well, so that different versions of the stored procedure can be distinguished.
4.2.2. List Stored Procedures
While the server is running, users can obtain the list of stored procedures at any time as follows:
>>> r = requests.get('http://127.0.0.1:7071/db/school/cpp_plugin')
>>> r.status_code
200
>>> r.text
'{"plugins":[{"description":"Custom Page Rank Procedure", "name":"age_10", "read_only":true}]}'
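Because the listing is returned as JSON, a deployment script can check it before, for example, installing or calling a procedure. A minimal sketch, assuming the body has the plugins array shown above; plugin_names and is_installed are hypothetical helpers, and sample is the response text copied from the example.

```python
import json


def plugin_names(list_response_text):
    """Return the procedure names listed by GET /db/<graph>/cpp_plugin."""
    return [p['name'] for p in json.loads(list_response_text)['plugins']]


def is_installed(list_response_text, name):
    """True if a procedure with this name appears in the listing."""
    return name in plugin_names(list_response_text)


# Body copied from the listing example above
sample = ('{"plugins":[{"description":"Custom Page Rank Procedure", '
          '"name":"age_10", "read_only":true}]}')
print(plugin_names(sample))  # ['age_10']
```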
4.2.3. Retrieve Stored Procedure Details
While the server is running, users can obtain the details of a single stored procedure, including its code, at any time as follows:
>>> r = requests.get('http://127.0.0.1:7071/db/school/cpp_plugin/age_10')
>>> r.status_code
200
>>> r.text
'{"description":"Custom Page Rank Procedure", "name":"age_10", "read_only":true, "code_base64":<CODE>, "code_type":"so"}'
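Since the detail response carries the installed code as base64, one way to confirm which build is actually loaded (for instance, after an update) is to decode it and compare it with a local file. A minimal sketch, assuming the response body is JSON with a code_base64 field as shown above; installed_code_matches is a hypothetical helper and the bytes are a stand-in.

```python
import base64
import json


def installed_code_matches(detail_response_text, local_bytes):
    """Compare code_base64 from GET /db/<graph>/cpp_plugin/<name> with local bytes."""
    detail = json.loads(detail_response_text)
    return base64.b64decode(detail['code_base64']) == local_bytes


# In practice local_bytes would come from open('./age_10.so', 'rb').read()
local = b'\x7fELF stand-in bytes'
detail = json.dumps({
    'name': 'age_10',
    'code_base64': base64.b64encode(local).decode('ascii'),
    'code_type': 'so',
})
print(installed_code_matches(detail, local))         # True
print(installed_code_matches(detail, local + b'!'))  # False
```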
4.2.4. Call Stored Procedures
An example of calling a stored procedure is as follows:
>>> r = requests.post(url='http://127.0.0.1:7071/db/school/cpp_plugin/age_10', data='',
headers={'Content-Type':'application/json'})
>>> r.status_code
200
>>> r.text
9
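The call above can be wrapped so that the request body, which becomes the request string received by Process, is built from a Python object and HTTP errors are surfaced. A minimal sketch: call_procedure is a hypothetical helper, and the HTTP function is injected so the example runs offline (pass requests.post against a real server; a fake is used here).

```python
import json


def call_procedure(post, base_url, graph, name, request_obj=None):
    """POST to a v1 stored procedure and return the raw response text.

    `post` is any callable accepting the same keyword arguments as
    requests.post; injecting it keeps this sketch testable offline.
    """
    url = f'{base_url}/db/{graph}/cpp_plugin/{name}'
    # The body becomes the `request` string that Process receives
    body = '' if request_obj is None else json.dumps(request_obj)
    r = post(url=url, data=body, headers={'Content-Type': 'application/json'})
    if r.status_code != 200:
        raise RuntimeError(f'{name} failed with HTTP {r.status_code}: {r.text}')
    return r.text


class FakeResponse:
    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text


sent = []


def fake_post(url, data, headers):
    sent.append((url, data))  # record what would go over the wire
    return FakeResponse(200, '9')


result = call_procedure(fake_post, 'http://127.0.0.1:7071', 'school', 'age_10')
```

With a running server, the same call would be call_procedure(requests.post, 'http://127.0.0.1:7071', 'school', 'age_10').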
4.2.5. Uninstall Stored Procedures
Deleting a stored procedure requires only the following call:
>>> r = requests.delete(url='http://127.0.0.1:7071/db/school/cpp_plugin/age_10')
>>> r.status_code
200
Similar to loading stored procedures, only admin users can delete stored procedures.
4.2.6. Upgrade Stored Procedures
You can upgrade a stored procedure in the following two steps:
1. Uninstall the existing one.
2. Install the new one.
TuGraph carefully manages the concurrency of stored procedure operations. Upgrading stored procedures will not affect concurrent runs of existing ones.
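The two upgrade steps can be scripted so that the install only runs if the uninstall succeeded. A minimal sketch: upgrade_procedure is a hypothetical helper, the HTTP functions are injected (requests.delete and requests.post in practice, fakes here), and it assumes the administrator credentials required for both operations are handled by whatever callables you pass in.

```python
def upgrade_procedure(delete, post, base_url, graph, name, load_body):
    """Uninstall `name`, then install the new build described by `load_body`.

    `load_body` is the JSON body used for installation (name, code_base64,
    description, read_only, code_type).
    """
    base = f'{base_url}/db/{graph}/cpp_plugin'
    r = delete(url=f'{base}/{name}')
    if r.status_code != 200:
        raise RuntimeError(f'uninstall of {name} failed: HTTP {r.status_code}')
    r = post(url=base, data=load_body, headers={'Content-Type': 'application/json'})
    if r.status_code != 200:
        raise RuntimeError(f'install of {name} failed: HTTP {r.status_code}')


class FakeResponse:
    status_code = 200


log = []


def fake_delete(url):
    log.append(('DELETE', url))
    return FakeResponse()


def fake_post(url, data, headers):
    log.append(('POST', url))
    return FakeResponse()


upgrade_procedure(fake_delete, fake_post, 'http://127.0.0.1:7071', 'school',
                  'age_10', '{"name": "age_10"}')
```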
5. Procedure v2 Interface
5.1. Writing stored procedures
Users can write C++ stored procedures using the lgraph API. A simple C++ stored procedure is shown below:
// peek_some_node_salt.cpp
#include <cstdlib>
#include "lgraph/lgraph.h"
#include "lgraph/lgraph_types.h"
#include "lgraph/lgraph_result.h"
#include "tools/json.hpp"
using json = nlohmann::json;
using namespace lgraph_api;
// Declares input parameters and result columns so Cypher can check calls
extern "C" LGAPI bool GetSignature(SigSpec &sig_spec) {
    sig_spec.input_list = {
        {.name = "limit", .index = 0, .type = LGraphType::INTEGER},
    };
    sig_spec.result_list = {
        {.name = "node", .index = 0, .type = LGraphType::NODE},
        {.name = "salt", .index = 1, .type = LGraphType::FLOAT}
    };
    return true;
}
// Entry point: runs inside the transaction of the calling Cypher statement
extern "C" LGAPI bool ProcessInTxn(Transaction &txn,
                                   const std::string &request,
                                   Result &response) {
    int64_t limit;
    try {
        json input = json::parse(request);
        limit = input["limit"].get<int64_t>();
    } catch (std::exception &e) {
        // Report malformed input through a single-column error result
        response.ResetHeader({
            {"errMsg", LGraphType::STRING}
        });
        response.MutableRecord()->Insert(
            "errMsg",
            FieldData::String(std::string("error parsing json: ") + e.what()));
        return false;
    }
    response.ResetHeader({
        {"node", LGraphType::NODE},
        {"salt", LGraphType::FLOAT}
    });
    // Emit one record per existing vertex id in [0, limit)
    for (int64_t i = 0; i < limit; i++) {
        auto vit = txn.GetVertexIterator(i);
        if (!vit.IsValid()) continue;  // skip ids that do not exist
        auto r = response.MutableRecord();
        r->Insert("node", vit);
        r->Insert("salt", FieldData::Float(20.23f * static_cast<float>(i)));
    }
    return true;
}
From the code we can see:
The stored procedure defines a GetSignature function that returns the procedure's signature: the names and types of its input parameters and of its result columns. When a Cypher query calls the stored procedure, this signature lets it verify that the input data and the returned data are valid.
The entry function is ProcessInTxn, which takes three parameters:
txn: the transaction in which the stored procedure runs, generally the transaction of the Cypher statement that calls it.
request: the input data, a JSON-serialized string pairing the input parameters defined in GetSignature with the values passed in the Cypher query, e.g. {"limit": 10} for the example above.
response: the output data. To stay compatible with Cypher, the procedure writes its results into an lgraph_api::Result, which is finally serialized to JSON with lgraph_api::Result::Dump.
The return value of ProcessInTxn is a boolean. Returning true means the request completed successfully; returning false means the stored procedure encountered an error during execution.
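To make the relationship between the signature and the request concrete, the sketch below pairs positional CALL arguments with the input_list declared in GetSignature and serializes them into the JSON string that ProcessInTxn parses. It is an illustration only, not TuGraph's implementation: build_request and the PY_TYPES mapping from LGraphType names to Python types are hypothetical.

```python
import json

# Mirrors the input_list declared in GetSignature above: (name, index, type)
INPUT_LIST = [("limit", 0, "INTEGER")]

# Hypothetical mapping from LGraphType names to Python types
PY_TYPES = {"INTEGER": int, "FLOAT": float, "STRING": str}


def build_request(input_list, args):
    """Pair positional CALL arguments with signature names and serialize them."""
    if len(args) != len(input_list):
        raise ValueError(f"expected {len(input_list)} arguments, got {len(args)}")
    request = {}
    for name, index, type_name in input_list:
        value = args[index]
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(value, PY_TYPES[type_name]) or isinstance(value, bool):
            raise TypeError(f"{name} must be {type_name}, got {value!r}")
        request[name] = value
    return json.dumps(request)


print(build_request(INPUT_LIST, [10]))  # {"limit": 10}
try:
    build_request(INPUT_LIST, ["10"])
except TypeError as e:
    print("rejected:", e)
```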
After the C++ stored procedure is written, it must be compiled into a dynamic link library. TuGraph provides a compile.sh script to help users compile stored procedures automatically; it takes a single argument, the name of the stored procedure. The remaining examples in this section use a procedure named custom_pagerank (source file custom_pagerank.cpp), which can also be compiled directly with g++, as follows:
g++ -fno-gnu-unique -fPIC -g --std=c++14 -I/usr/local/include/lgraph -rdynamic -O3 -fopenmp -o custom_pagerank.so custom_pagerank.cpp /usr/local/lib64/liblgraph.so -shared
If compilation succeeds, custom_pagerank.so is generated and can then be loaded into the server.
5.2. Load stored procedure
Users can load stored procedures through the REST API or RPC. Taking the REST API as an example, the Python code to load custom_pagerank.so is as follows:
import base64
import json
import requests
data = {'name': 'custom_pagerank'}
with open('./custom_pagerank.so', 'rb') as f:
    content = f.read()
# The binary library must be base64-encoded to travel inside JSON
data['code_base64'] = base64.b64encode(content).decode()
data['description'] = 'Custom Page Rank Procedure'
data['read_only'] = True
data['code_type'] = 'so'
js = json.dumps(data)
r = requests.post(url='http://127.0.0.1:7071/db/school/cpp_plugin', data=js,
                  headers={'Content-Type': 'application/json'})
print(r.status_code)  # returns 200 on success
Note that data['code_base64'] holds a base64-encoded string, because the binary code in custom_pagerank.so cannot be transmitted directly in JSON. In addition, only users with administrator privileges can load or delete stored procedures.
Once loaded, a stored procedure is saved in the database and is automatically reloaded after the server restarts. To update a stored procedure, call the same REST API. When updating, it is recommended to update the description as well, so that different versions of the stored procedure can be distinguished.
5.2.1. List loaded stored procedures
While the server is running, users can obtain the list of stored procedures at any time as follows:
>>> r = requests.get('http://127.0.0.1:7071/db/school/cpp_plugin')
>>> r.status_code
200
>>> r.text
'{"plugins":[{"description":"Custom Page Rank Procedure", "name":"custom_pagerank", "read_only":true}]}'
5.2.2. Get stored procedure details
While the server is running, users can obtain the details of a single stored procedure, including its code, at any time as follows:
>>> r = requests.get('http://127.0.0.1:7071/db/school/cpp_plugin/custom_pagerank')
>>> r.status_code
200
>>> r.text
'{"description":"Custom Page Rank Procedure", "name":"custom_pagerank", "read_only":true, "code_base64":<CODE>, "code_type":"so"}'
5.2.3. Call stored procedure
An example of calling a stored procedure from Cypher is as follows:
CALL plugin.cpp.custom_pagerank(10)
YIELD node, pr WITH node, pr
MATCH (node)-[r]->(n) RETURN node, r, n, pr
5.2.4. Delete stored procedure
Deleting a stored procedure requires only the following call:
>>> r = requests.delete(url='http://127.0.0.1:7071/db/school/cpp_plugin/custom_pagerank')
>>> r.status_code
200
Similar to loading stored procedures, only admin users can delete stored procedures.
5.2.5. Update stored procedure
Updating a stored procedure requires the following two steps:
1. Delete the existing stored procedure.
2. Install the new stored procedure.
TuGraph carefully manages the concurrency of stored procedure operations, so updating a stored procedure does not affect the operation of existing stored procedures.