TuGraph schema Instructions

1.The data model

1.1.Graph model

TuGraph is a strong schema, directed property graph database with multi-graph capability.

  • Graph Project: Each database service can host multiple graph projects (multi-graphs), and each graph project can have its own access control configuration. The database administrator can create or delete specified graph projects.

  • Vertex: Refers to entity, generally used to express real-world entities, such as a movie or an actor.

    • Primary Key: User-defined vertex data primary key, unique in the corresponding graph project and vertex type.

    • VID: Refers to the auto-generated unique ID of the vertex, which cannot be modified by the user.

    • Upper Limit: Each graph project can store up to 2^(40) vertex data.

  • Edge: Used to express the relationship between vertexs, such as an actor appears in a movie.

    • Directed Edge: The edge is a directed edge. If you want to simulate an undirected edge, the user can create two edges with opposite directions.

    • Duplicate Edge: TuGraph currently supports duplicate edges. If you want to ensure the uniqueness of the edge, you need to implement it through business policies.

    • Upper Limit: Up to 2^(32) edge data can be stored between two vertex data.

  • Property Graph: vertexs and edges can have properties associated with them, and each property can have a different type.

  • Strong-typed: Each vertex and edge has only one label, and after creating a label, there is a cost to modify the number and type of attributes.

    • Specify the starting/ending vertex type of the edge: You can limit the starting and ending vertex types of the edge, and support different vertex types of the starting and ending vertexs of the same type of edge, such as individuals transferring money to companies, companies transferring money to companies. After specifying the starting/ending vertex type of the edge, you can add multiple sets of starting/ending vertex types, but you cannot delete the restricted starting/ending vertex types.

    • Unrestricted Mode: Supports creating edge data of this type between any two vertex types without specifying the starting and ending vertex types of the edge. Note: After specifying the starting/ending vertex type of the edge, the unrestricted mode cannot be used again.

1.2.The data type

TuGraph Supports a variety of data types that can be used as attributes, the specific supported data types are as follows:

Table 1. TuGraph supported data types

Type

Min

Max

Description

BOOL

false

true

Boolean

INT8

-128

127

8-bit int

INT16

-32768

32767

16-bit int

INT32

- 2^31

2^31 - 1

32-bit int

INT64

- 2^63

2^63 - 1

64-bit int

DATE

0000-00-00

9999-12-31

“YYYY-MM-DD” Date of format

DATETIME

0000-00-00 00:00:00.000000

9999-12-31 23:59:59.999999

“YYYY-MM-DD hh:mm:ss[.ffffff]”Format of the date and time

FLOAT

32-bit float

DOUBLE

64-bit float

STRING

A string of variable length

BLOB

Binary data

POINT

EWKB format data of point

LINESTRING

EWKB format data of linestring

POLYGON

EWKB format data of polygon

FLOAT_VECTOR

The dynamic vector containing 32-bit float numbers

BLOB data is BASE64 encoded in input and output

1.3.Index

TuGraph supports indexing vertex fields.

Indexes can be unique or non-unique. If a unique index is created for a vertex label, TuGraph will perform a data integrity check to ensure the uniqueness of the index before modifying the vertex of the label.

Each index built on a single field of a label, and multiple fields can be indexed using the same label.

BLOB fields cannot be indexed.

TuGraph supports creating indexes on node or edge attributes to improve query efficiency. Its characteristics are as follows:

  • Indexes include single indexes and composite indexes. A single index is created based on a single property of a node or an edge, while a composite index is created based on multiple properties of a node or an edge (no more than 16). Indexes can be created on multiple sets of properties for the same node or edge.

  • If a unique index is created for a node label, when modifying a node for that label, a data integrity check is first performed to ensure the uniqueness of the index.

  • Attributes of type BLOB cannot be indexed.

There are multiple index types for TuGraph’s vertices and edges. Different index types have different functions and restrictions, as follows:

1.3.1 Single index

1.3.1.1 Node index

1.3.1.1.1 unique index

The unique index of a node refers to a globally unique index. That is, if a unique index is set for an attribute, in the same graph, the attribute of nodes with the same label will not have the same value. The maximum length of a unique index key is 480 bytes. Unique indexes cannot be created for attributes exceeding 480 bytes. Primary is a special unique index, so the maximum key length is also 480 bytes.

1.3.1.1.2 non_unique index

The non_unique index of a node refers to a non-global unique index, that is, if an attribute is set with a non_unique index, In the same graph, nodes with the same label can have the same value for this attribute. Since a key in a non_unique index may be mapped to multiple values, in order to speed up search and writing, The maximum value of a group of vids with the same index key is added after the user-specified key. Each vid is 5 bytes long, so the maximum length of a non_unique index key is 475 bytes. However, unlike unique indexes, non_unique indexes can also be established if they exceed 475 bytes. However, when indexing such an attribute, only the first 475 bytes will be intercepted as the index key (the value stored in the attribute itself will not be affected). Moreover, when traversing through an iterator, the first 475 bytes of the query value are automatically intercepted before traversing. Therefore, the results may be inconsistent with expectations and require users to filter again.

1.3.1.2 Edge index

1.3.1.2.1 unique index

Similar to node, the unique index of an edge refers to a globally unique index. That is, if an attribute is set to a unique index, in the same graph, the attribute of the edge with the same label will not have the same value. The maximum length of a unique index key is 480 bytes. Unique indexes cannot be created for attributes exceeding 480 bytes.

1.3.1.2.2 pair_unique index

pair_unique index refers to the unique index between two nodes, that is, if an attribute sets a unique index, between the same set of the starting node and the ending node in the same graph, Edges with the same label will not have the same value for this attribute. In order to ensure that the pair_unique index key does not repeat between the starting node and the end node of the same group, The index adds the starting and ending vids after the user-specified key. Each vid is 5 bytes in length. Therefore, the maximum key length is 470 bytes, and a pair_unique index cannot be created for attributes exceeding 470 bytes.

1.3.1.2.3 non_unique index

Similar to node, the non_unique index of an edge refers to a non-global unique index, that is, if an attribute sets a non_unique index, In the same graph, edges with the same label can have the same value for this attribute. Since a key in a non_unique index may be mapped to multiple values, in order to speed up search and writing, The maximum value of a group of eids with the same index key is added after the user-specified key. Each eid is 24 bytes in length, so the maximum length of a non_unique index key is 456 bytes. However, unlike unique indexes, non_unique indexes can also be established if they exceed 456 bytes. However, when indexing such an attribute, only the first 456 bytes will be intercepted as the index key (the value stored in the attribute itself will not be affected). Moreover, when traversing through an iterator, the first 475 bytes of the query value are automatically intercepted before traversing. Therefore, the results may be inconsistent with expectations and require users to filter again.

1.3.2 Composite index

Currently, composite indexes are only supported for multiple properties of a vertex, and not supported for properties of an edge. There are two types of composite indexes: unique indexes and non-unique indexes. The requirements for creating a composite index are as follows:

  1. The number of properties for creating a composite index should be between 2 and 16 (inclusive).

  2. For a unique composite index, the sum of the lengths of the properties cannot exceed 480 - 2 * (number of properties - 1) bytes, while for a non-unique composite index, it cannot exceed 475 - 2 * (number of properties - 1) bytes.

1.3.2.1 Unique index

Similar to a vertex’s single unique index, a vertex’s composite unique index refers to a globally unique index, meaning that for a set of properties with a unique index, there will not be another vertex with the same label in the same graph that has the same value for that group of properties. Due to the underlying storage design, the composite index key needs to retain the length of the properties; therefore, the maximum length for a composite unique index key is 480 - 2 * (number of properties - 1) bytes. Properties exceeding this length cannot be indexed uniquely.

1.3.2.2 non_unique index

Similar to a vertex’s single non-unique index, a vertex’s non-unique index refers to a non-globally unique index, meaning that for a set of properties with a non-unique index, different vertices with the same label in the same graph may have the same value for that group of properties. Since a non-unique index key may map to multiple values to accelerate lookups and writing, the maximum value from a group of vertex IDs (vid) has been appended to the user-specified key where the index keys are identical. Each vid is 5 bytes in length; thus, the maximum length for a non-unique index key is 475 - 2 * (number of properties - 1) bytes. Properties exceeding this length cannot be indexed non-uniquely.

2. Graph Project, Vertex, Edge, and Attribute Naming Conventions and suggestions

2.1 Naming Rules

Graph projects, vertices, edges, and attributes are identifiers. This section describes the allowed syntax for identifiers in TuGraph. The table below describes the maximum length and allowed characters for each type of identifier.

Identifier

Length

Allowed Characters

User, role, graph project

1-64 characters

Chinese, letters, numbers, underscore, and the first character cannot be a number

Vertex type, edge type, attribute

1-256 characters

Chinese, letters, numbers, underscore, and the first character cannot be a number

2.2 Usage Restrictions

Description

Maximum number

Number of users, number of roles

65536

Number of graphs

4096

Number of vertex and edge types per graph

4096

Number of attributes per type

1024

Note: 1.Special characters and keywords: When using special characters or keywords, they need to be enclosed in backquotes (``) for reference;

Example: match (`match`:match) return `match`.id limit 1

2.Case sensitivity: TuGraph is case-sensitive;

3.Graph project, vertex/edge, and attribute names can be reused, but attribute names under the same vertex or edge cannot be duplicated;

4.Reserved keywords for attribute names: SRC_ID / DST_ID / SKIP.

2.3 Naming Suggestions

Identifier

Description

Suggestions

Graph project

Start with a letter or Chinese character

Examples: graph123, project123, etc.

Vertex/edge type

Start with a letter or Chinese character and use underscores to separate words

Examples: person, act_in, etc.

Attribute

Letters or Chinese characters

Examples: name, age, etc.