Bootstrap program

This document is a bootstrap program designed for TuGraph users. Before reading the detailed documents, users should read this document first to have a general understanding of TuGraph’s graph computing operation process, and it will be more convenient to read the detailed documents later. The bootstrap program is a simple instance of a bfs (breadth-first search) program based on Tugraph, and we will focus on how it is used.

1.Introduction to TuGraph-Graph Analysis Engine

TuGraph’s graph analysis engine is mainly used for full graph/full data analysis tasks. With the help of TuGraph’s C++ / Python graph analysis engine API, users can quickly derive a complex subgraph from different data sources, and then run iterative graph algorithms such as PageRank, LPA, WCC on the subgraph, and finally make corresponding countermeasures according to the running results.

In TuGraph, the process of both exporting and computing can be accelerated by in-memory parallel processing, resulting in near-real-time processing and analysis. Compared to traditional methods, this approach can avoid the overhead of data export and disk drop, and can achieve ideal performance by using a compact graph data structure.

TuGraph has 6 built-in algorithms in the community version and 25 built-in algorithms in the commercial version. Users hardly need to implement the specific graph calculation process themselves. For details, please refer to algorithms.md.

According to different data sources and implementations, it can be divided into Procedure, Embed and Standalone, which all inherit from OlapBase API. OlapBase API interface documentation can be found in OlAPBase-APi.md.

The data source of Procedure and Embed is the db data preloaded in the graph database, which can be compiled to generate the.so file used by TuGraph-Web loading and the embed file used by the background terminal respectively. The input graph data are loaded in the form of db. The interface document can refer to olapondb-api.md.

Standalone is used to compile and generate a standalone file. Different from the former, the input graph data of the file is loaded in the form of txt, binary, and ODPS files, and the interface document can refer to olapondisk-api.md.

2.Procedure compile and run

This mode is mainly used for visual loading and running on the TuGraph-web interface. The usage method is as follows:

2.1.C++

Run bash make_so.sh bfs in the TuGraph/plugins directory to the BFs.so file in the TuGraph/plugins directory, upload the file as a plug-in to Tugraph-web, input parameters and then execute.

2.2.Python

Upload the python file as a plugin to TuGraph-web, and execute it after inputting parameters.

Example: Compile the.so algorithm file at TuGraph/plugins bash make_so.sh bfs

After loading the bfs.so (or TuGraph/plugins/cython/bfs.py) file as a plug-in to TuGraph-web, enter the following json parameters:

{
  "root_id": "0",
  "label": "node",
  "field": "id"
}

The following result

result:”{“core_cost”:0.013641119003295898, “found_vertices”:3829, “num_edges”:88234, “num_vertices”:4039, “output_cost”:8.821487426757813e-06, “prepare_cost”:0.03479194641113281, “total_cost”:0.04844188690185547}”

The output :

  • num_edges: the number of edges in the graph

  • num_vertices: the number of nodes in the graph

  • prepare_cost: Represents the time required for the preprocessing phase. The preprocessing phase works: loading parameters, graph data loading, index initialization.

  • core_cost: Represents the time required for the algorithm to run.

  • found_vertices: Number of vertices found.

  • output_cost: the time it takes for the algorithm result to be written back to db.。

  • total_cost: The overall running time of the algorithm is executed.

make_so.sh This file is used to compile the graph algorithm files involved in TuGraph-OLAP into a.so file available for TuGraph-web.

3.Embed compile and run

This way is mainly used for TuGraph in the background program on the preloaded db graph data algorithm analysis. Its use method is as follows:

To complete the embed_main.cpp file in TuGraph/plugins directory, add the data name, input parameters, data path and other information, as shown in the following example:

3.1.C++

#include <iostream>
#include "lgraph/lgraph.h"
#include "lgraph/olap_base.h"
using namespace std;

extern "C" bool Process(lgraph_api::GraphDB &db, const std::string &request, std::string &response);

int main(int argc, char **argv) {
    // db_path Specifies the path for saving the preloaded graph data
    std::string db_path = "../fb_db/";
    if (argc > 1)
        db_path = argv[1];
    lgraph_api::Galaxy g(db_path);
    g.SetCurrentUser("admin", "73@TuGraph");
    // Specifies the name of the graph
    lgraph_api::GraphDB db = g.OpenGraph("fb_db");
    std::string resp;
    // Enter the algorithm parameters in json format
    bool r = Process(db, "{\"root_id\":\"0\", \"label\":\"node\",\"field\":\"id\"}", resp);
    cout << r << endl;
    cout << resp << endl;
    return 0;
}

Run bash make_so.sh bfs in TuGraph/plugins to bfs_procedure in TuGraph/plugins/cpp. bash make_embed.sh bfs

Do this in the TuGraph/plugins folder ./cpp/bfs_procedure The result is returned.

Input: {“root_id”:”0”, “label”:”node”,”field”:”id”} found_vertices = 3829 {“core_cost”:0.025603055953979492, “found_vertices”:3829, “num_edges”:88234, “num_vertices”:4039, “output_cost”:9.059906005859375e-06, “prepare_cost”:0.056738853454589844, “total_cost”:0.0823509693145752}

Input indicates the input parameters.The other parameters are described as above.

3.2.Python

To compile bfs.so, run bash make_cython_so.sh bfs in TuGraph/plugins or python3 setup.py build_ext -i in TuGraph/plugins/cython.

After obtaining bfs.so, it can be imported and used in Python, as shown in TuGraph/plugins/embed_main.py:

# TuGraph/plugins/embed_main.py
from lgraph_db_python import *

import bfs as python_plugin

if __name__ == "__main__":
    galaxy = PyGalaxy("../build/output/lgraph_db")
    galaxy.SetCurrentUser("admin", "73@TuGraph")
    db = galaxy.OpenGraph("default", False)
    res = python_plugin.Process(db, "{\"root_id\":\"0\", \"label\":\"node\",\"field\":\"id\"}".encode('utf-8'))
    print(res)
    del db
    del galaxy

# shell
python3 embed_main.py

The output is the same as C++

4.Standalone compile and run

This file is mainly used to load the graph data directly at the terminal and run the printout results. Here’s how to use it:

In TuGraph/build directory make bfs_standalone (need to include boost/sort/sort.hpp in the g++ default include path) can get bfs_standalone file, the file is generated with TuGraph/build/output/algo folder.

Run: Go to the TuGraph/build directory and do./output/algo/bfs_standalone -- type [type] -- input_dir [input_dir] --id_mapping [id_mapping] -- vertices [vertices] --root (root) - output_dir [output_dir]

Ready to run.

  • [type] : indicates the type source of the input graph file, including text file, BINARY_FILE binary file, and ODPS source.

  • [input_dir] : indicates the path of the input graph file folder, which may contain one or more input files. TuGraph will read all the files under [input_dir] when reading the input file. It is required that [input_dir] can only contain the input file and cannot contain other files. Parameters cannot be omitted.

  • [id_mapping]:When reading the side table, whether to do id mapping for the input data to make it conform to the form of algorithm operation. 1 is required for id mapping, 0 is not required. This process will take some time. The parameter can be omitted, and the default value is 0.

  • [vertices] : represents the number of vertices in the graph. 0 indicates the number of vertices that the user wants the system to automatically recognize. When is non-zero, it indicates the number of vertices that the user wants to customize. The number of user-defined vertices must be greater than the maximum vertex ID. This parameter can be omitted. The default value is 0.

  • [root] : indicates the starting vertex id for bfs. The argument cannot be omitted.

  • [output_dir] : indicates the path of the folder where the output data is saved. The output content is saved to this file, and the parameter cannot be omitted.

Example

4.1.C++

Compile the standalone algorithm program at TuGraph/build

make bfs_standalone

Run the text source file in TuGraph/build/output

    ./output/algo/bfs_standalone --type text --input_dir ../test/integration/data/algo/fb_unweighted --root 0

Result:

prepare_cost = 0.10(s)
core_cost = 0.02(s)
found_vertices = 3829
output_cost = 0.00(s)
total_cost = 0.11(s)
DONE.

Result Parameter description is the same as above.

If you do not know the required parameters of a new algorithm, run the./output/algo/bfs_standalone -h to view the parameters.

4.2.Python

The bfs extension compilation process of the Python language is the same as that of the embed mode. It is called through the Standalone interface at runtime. The example is as follows:

# TuGraph/plugins/standalone_main.py
import bfs as python_plugin

if __name__ == "__main__":
    python_plugin.Standalone(
        input_dir="../test/integration/data/algo/fb_unweighted",
        root=0)

# shell
python3 standalone_main.py

At this point, the process of bfs operation on the figure above by TuGraph is complete.