TuGraph DataX

This document describes how to compile, install, and use TuGraph DataX, with usage examples.

1.Introduction

TuGraph DataX builds on Alibaba's open-source DataX. It adds a TuGraph writer plugin and support for the jsonline data format, so that other data sources can write data into TuGraph through DataX. The TuGraph DataX repository is at https://github.com/TuGraph-family/DataX. Supported features include:

  • Import data into TuGraph from various heterogeneous data sources such as MySQL, SQL Server, Oracle, PostgreSQL, HDFS, Hive, HBase, OTS, ODPS, Kafka and so on.

  • Export data from TuGraph to the corresponding target data source (to be developed).

For an introduction to the original DataX project, see https://github.com/alibaba/DataX

2.Compile and Install

git clone https://github.com/TuGraph-family/DataX.git
cd DataX
yum install maven
mvn -U clean package assembly:assembly -Dmaven.test.skip=true

The compiled DataX package is placed in the target directory.
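
The later sections launch jobs through bin/datax.py, so it can be handy to confirm that the build actually produced it. The following is a minimal sketch, not part of DataX itself: find_datax_launcher is a hypothetical helper, and the exact nesting of directories under target is not assumed (the search is recursive).

```python
from pathlib import Path
from typing import Optional


def find_datax_launcher(repo_dir: str) -> Optional[Path]:
    """Return the first bin/datax.py found under <repo_dir>/target, or None."""
    target = Path(repo_dir) / "target"
    if not target.is_dir():
        return None
    # rglob walks the whole build output, so the exact layout does not matter.
    matches = sorted(p for p in target.rglob("datax.py") if p.parent.name == "bin")
    return matches[0] if matches else None


launcher = find_datax_launcher("DataX")
print(launcher if launcher else "datax.py not found; check that the build succeeded")
```

If the helper prints a path, that path can be used in place of datax/bin/datax.py in the commands shown later.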

3.Import TuGraph

3.1.Text data imported into TuGraph with DataX

Using the data from the lgraph_import section of the TuGraph manual as an example, we have three CSV data files, as follows:

actors.csv


nm015950,Stephen Chow
nm0628806,Man-Tat Ng
nm0156444,Cecilia Cheung
nm2514879,Yuqi Zhang

movies.csv


tt0188766,King of Comedy,1999,7.3
tt0286112,Shaolin Soccer,2001,7.3
tt4701660,The Mermaid,2016,6.3

roles.csv


nm015950,Tianchou Yin,tt0188766
nm015950,Steel Leg,tt0286112
nm0628806,,tt0188766
nm0628806,coach,tt0286112
nm0156444,PiaoPiao Liu,tt0188766
nm2514879,Ruolan Li,tt4701660
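
Each row of roles.csv refers to an actor id in its first column and a movie id in its third column. Before importing, it may be worth checking that every such id also appears in actors.csv and movies.csv. The sketch below embeds the sample data from this section; dangling_roles and first_column are hypothetical helper names, not part of TuGraph or DataX.

```python
import csv
import io

ACTORS = """nm015950,Stephen Chow
nm0628806,Man-Tat Ng
nm0156444,Cecilia Cheung
nm2514879,Yuqi Zhang
"""

MOVIES = """tt0188766,King of Comedy,1999,7.3
tt0286112,Shaolin Soccer,2001,7.3
tt4701660,The Mermaid,2016,6.3
"""

ROLES = """nm015950,Tianchou Yin,tt0188766
nm015950,Steel Leg,tt0286112
nm0628806,,tt0188766
nm0628806,coach,tt0286112
nm0156444,PiaoPiao Liu,tt0188766
nm2514879,Ruolan Li,tt4701660
"""


def first_column(text):
    """Collect the ids found in the first column of a CSV text."""
    return {row[0] for row in csv.reader(io.StringIO(text)) if row}


def dangling_roles(actors, movies, roles):
    """Return role rows whose actor id or movie id has no matching vertex row."""
    actor_ids = first_column(actors)
    movie_ids = first_column(movies)
    return [
        row
        for row in csv.reader(io.StringIO(roles))
        if row and (row[0] not in actor_ids or row[2] not in movie_ids)
    ]


print(dangling_roles(ACTORS, MOVIES, ROLES))
```

For the sample data the list is empty, since every role row points at an actor and a movie that are present in the vertex files.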

Then create three DataX job configuration files:

job_actors.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["actors.csv"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "actor",
                "type": "VERTEX",
                "properties": [
                  { "name": "aid", "type": "STRING" },
                  { "name": "name", "type": "STRING" }
                ],
                "primary": "aid"
              }
            ],
            "files": [
              {
                "label": "actor",
                "format": "JSON",
                "columns": ["aid", "name"]
              }
            ]
          }
        }
      }
    ]
  }
}

job_movies.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["movies.csv"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              },
              {
                "index": 2,
                "type": "string"
              },
              {
                "index": 3,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "movie",
                "type": "VERTEX",
                "properties": [
                  { "name": "mid", "type": "STRING" },
                  { "name": "name", "type": "STRING" },
                  { "name": "year", "type": "STRING" },
                  { "name": "rate", "type": "FLOAT", "optional": true }
                ],
                "primary": "mid"
              }
            ],
            "files": [
              {
                "label": "movie",
                "format": "JSON",
                "columns": ["mid", "name", "year", "rate"]
              }
            ]
          }
        }
      }
    ]
  }
}

job_roles.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["roles.csv"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              },
              {
                "index": 2,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "play_in",
                "type": "EDGE",
                "properties": [{ "name": "role", "type": "STRING" }]
              }
            ],
            "files": [
              {
                "label": "play_in",
                "format": "JSON",
                "SRC_ID": "actor",
                "DST_ID": "movie",
                "columns": ["SRC_ID", "role", "DST_ID"]
              }
            ]
          }
        }
      }
    ]
  }
}
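
The two vertex jobs above (job_actors.json and job_movies.json) differ only in the CSV path, the label, the property list and the primary key. As an illustration, they could be generated rather than written by hand. This is a sketch under that assumption: vertex_job and save_job are hypothetical helpers, and the connection defaults are simply the values used in the examples above.

```python
import json


def vertex_job(csv_path, label, properties, primary,
               host="127.0.0.1", port=7071,
               username="admin", password="73@TuGraph", graph_name="default"):
    """Build a txtfilereader -> tugraphwriter job for one vertex label."""
    columns = [p["name"] for p in properties]
    reader = {
        "name": "txtfilereader",
        "parameter": {
            "path": [csv_path],
            "encoding": "UTF-8",
            # One string column per property, in CSV order.
            "column": [{"index": i, "type": "string"} for i in range(len(columns))],
            "fieldDelimiter": ",",
        },
    }
    writer = {
        "name": "tugraphwriter",
        "parameter": {
            "host": host,
            "port": port,
            "username": username,
            "password": password,
            "graphName": graph_name,
            "schema": [{"label": label, "type": "VERTEX",
                        "properties": properties, "primary": primary}],
            "files": [{"label": label, "format": "JSON", "columns": columns}],
        },
    }
    return {"job": {"setting": {"speed": {"channel": 1}},
                    "content": [{"reader": reader, "writer": writer}]}}


def save_job(job, path):
    """Write the job dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(job, f, indent=2)


actor_job = vertex_job(
    "actors.csv", "actor",
    [{"name": "aid", "type": "STRING"}, {"name": "name", "type": "STRING"}],
    primary="aid",
)
```

Calling save_job(actor_job, "job_actors.json") would then write a file equivalent to the hand-written one. The edge job is left out because its files entry has a different shape (SRC_ID and DST_ID).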

Start TuGraph with ./lgraph_server -c lgraph_standalone.json -d 'run', then run the following commands in sequence:

python3 datax/bin/datax.py  job_actors.json
python3 datax/bin/datax.py  job_movies.json
python3 datax/bin/datax.py  job_roles.json
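
The three commands can also be driven from a small script that stops at the first failure, so that the edge job is not attempted if a vertex job did not complete. This is a sketch only: run_jobs is a hypothetical helper, and it assumes that datax.py exits with a non-zero status when a job fails.

```python
import subprocess


def run_jobs(job_files, launcher="datax/bin/datax.py", run=subprocess.run):
    """Run each DataX job in order and stop at the first failing one.

    `run` is injectable so the function can be exercised without DataX installed.
    Returns the list of job files that completed.
    """
    done = []
    for job in job_files:
        result = run(["python3", launcher, job])
        # Assumption: a non-zero exit status means the job failed.
        if result.returncode != 0:
            raise RuntimeError(f"DataX job failed: {job} (exit code {result.returncode})")
        done.append(job)
    return done
```

A typical call would be run_jobs(["job_actors.json", "job_movies.json", "job_roles.json"]), which keeps the vertex jobs ahead of the edge job as in the commands above.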

3.2.MySQL data imported into TuGraph with DataX

Create the following movies table in the test database:

CREATE TABLE `movies` (
  `mid`  varchar(200) NOT NULL,
  `name` varchar(100) NOT NULL,
  `year` int(11) NOT NULL,
  `rate` float(5,2) unsigned NOT NULL,
  PRIMARY KEY (`mid`)
);

Insert some data into the table:

insert into
test.movies (mid, name, year, rate)
values
('tt0188766', 'King of Comedy', 1999, 7.3),
('tt0286112', 'Shaolin Soccer', 2001, 7.3),
('tt4701660', 'The Mermaid',   2016,  6.3);

Create a DataX job configuration file named job_mysql_to_tugraph.json. The reader can be configured in either of two ways.

Configure the table and its columns:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "column": ["mid", "name", "year", "rate"],
            "splitPk": "mid",
            "connection": [
              {
                "table": ["movies"],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "movie",
                "type": "VERTEX",
                "properties": [
                  { "name": "mid", "type": "STRING" },
                  { "name": "name", "type": "STRING" },
                  { "name": "year", "type": "STRING" },
                  { "name": "rate", "type": "FLOAT", "optional": true }
                ],
                "primary": "mid"
              }
            ],
            "files": [
              {
                "label": "movie",
                "format": "JSON",
                "columns": ["mid", "name", "year", "rate"]
              }
            ]
          }
        }
      }
    ]
  }
}

Alternatively, write a simple SQL query:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "querySql": [
                  "select mid, name, year, rate from test.movies where year > 2000;"
                ],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "movie",
                "type": "VERTEX",
                "properties": [
                  { "name": "mid", "type": "STRING" },
                  { "name": "name", "type": "STRING" },
                  { "name": "year", "type": "STRING" },
                  { "name": "rate", "type": "FLOAT", "optional": true }
                ],
                "primary": "mid"
              }
            ],
            "files": [
              {
                "label": "movie",
                "format": "JSON",
                "columns": ["mid", "name", "year", "rate"]
              }
            ]
          }
        }
      }
    ]
  }
}
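
The two configurations above differ only in the reader: one names a table with column and splitPk, the other supplies querySql. The sketch below builds either form and rejects a mix of the two. mysql_reader is a hypothetical helper, and the rule that exactly one mode is chosen is an assumption made for this illustration, modeled on the two examples.

```python
def mysql_reader(jdbc_url, username, password, *,
                 table=None, columns=None, split_pk=None, query_sql=None):
    """Build a mysqlreader block in table/column mode or in querySql mode."""
    if (table is None) == (query_sql is None):
        raise ValueError("specify exactly one of table or query_sql")

    connection = {"jdbcUrl": [jdbc_url]}
    parameter = {"username": username, "password": password,
                 "connection": [connection]}

    if query_sql is not None:
        # querySql mode: the statement itself selects columns and filters rows.
        connection["querySql"] = [query_sql]
    else:
        if not columns:
            raise ValueError("table mode needs a non-empty column list")
        connection["table"] = [table]
        parameter["column"] = list(columns)
        if split_pk:
            parameter["splitPk"] = split_pk

    return {"name": "mysqlreader", "parameter": parameter}


URL = "jdbc:mysql://127.0.0.1:3306/test?useSSL=false"
by_table = mysql_reader(URL, "root", "root", table="movies",
                        columns=["mid", "name", "year", "rate"], split_pk="mid")
by_query = mysql_reader(
    URL, "root", "root",
    query_sql="select mid, name, year, rate from test.movies where year > 2000;",
)
```

Either dictionary can be placed under "reader" in the job file; the tugraphwriter block stays the same in both cases.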

Start TuGraph with ./lgraph_server -c lgraph_standalone.json -d 'run', then run the following command:

python3 datax/bin/datax.py  job_mysql_to_tugraph.json

4.Export TuGraph

4.1. Configuration example

TuGraph supports exporting data using DataX. Use the following configuration to export data to a text file:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "tugraphreader",
          "parameter": {
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "Movie_8C5C",
            "queryCypher": "match (n:person) return n.id,n.name,n.born;",
            "url": "bolt://100.83.30.35:27687"
          }
        },
        "writer": {
          "name": "txtfilewriter",
          "parameter": {
            "path": "./result",
            "fileName": "luohw",
            "writeMode": "truncate"
          }
        }
      }
    ]
  }
}

With this configuration file, DataX exports the id, name and born properties of every person node in the Movie_8C5C subgraph to the result directory under the current directory. The output file name is luohw followed by a random suffix.
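
Because the file name ends in a random suffix, a consumer has to match on the luohw prefix rather than on a fixed name. The sketch below collects the exported rows that way. read_export is a hypothetical helper, and it assumes the files are UTF-8 text with one comma-separated record per line; adjust the delimiter if the writer is configured differently.

```python
import csv
from pathlib import Path


def read_export(directory="./result", prefix="luohw", delimiter=","):
    """Read every exported file whose name starts with `prefix` into a list of rows."""
    rows = []
    for path in sorted(Path(directory).glob(prefix + "*")):
        if path.is_file():
            with path.open(newline="", encoding="utf-8") as f:
                rows.extend(csv.reader(f, delimiter=delimiter))
    return rows
```

Each returned row holds the values of n.id, n.name and n.born as strings, in the order given by the queryCypher statement.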

4.2. Parameter Description

When using DataX to export TuGraph data, you need to set the reader to tugraphreader and configure the following 5 parameters:

  • url

    • Description: TuGraph’s bolt server address

    • Required: Yes

    • Default value: None

  • username

    • Description: TuGraph’s username

    • Required: Yes

    • Default value: None

  • password

    • Description: TuGraph’s password

    • Required: Yes

    • Default value: None

  • graphName

    • Description: The selected TuGraph subgraph to be synchronized

    • Required: Yes

    • Default value: None

  • queryCypher

    • Description: The Cypher statement used to read data from TuGraph

    • Required: No

    • Default value: None
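
The parameter list above can be turned into a quick pre-flight check that a tugraphreader block carries every required parameter before the job is submitted. This is a sketch based only on the list in this section; missing_reader_params is a hypothetical helper, not part of DataX.

```python
# Required and optional parameters, as listed in section 4.2.
REQUIRED = ("url", "username", "password", "graphName")
OPTIONAL = ("queryCypher",)


def missing_reader_params(reader):
    """Return the required tugraphreader parameters that are absent or empty."""
    if reader.get("name") != "tugraphreader":
        raise ValueError("reader name must be tugraphreader")
    parameter = reader.get("parameter", {})
    return [key for key in REQUIRED if not parameter.get(key)]


example_reader = {
    "name": "tugraphreader",
    "parameter": {
        "username": "admin",
        "password": "73@TuGraph",
        "graphName": "Movie_8C5C",
        "queryCypher": "match (n:person) return n.id,n.name,n.born;",
        "url": "bolt://100.83.30.35:27687",
    },
}
print(missing_reader_params(example_reader))
```

For the reader from section 4.1 the result is an empty list, since all four required parameters are present.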