TuGraph Schema Instructions

1. The data model

1.1 Graph model

TuGraph is a strong-schema, directed property graph database with multi-graph capability.

Graph Project: Each database service can host multiple graph projects (multi-graphs), and each graph project can have its own access control configuration. The database administrator can create or delete specified graph projects.
Vertex: Represents an entity, generally a real-world one such as a movie or an actor.
- Primary Key: A user-defined primary key for the vertex data, unique within its graph project and vertex type.
- VID: The auto-generated unique ID of the vertex, which the user cannot modify.
- Upper Limit: Each graph project can store up to 2^40 vertices.
Edge: Expresses a relationship between vertices, such as an actor appearing in a movie.
- Directed Edge: Every edge is directed. To simulate an undirected edge, create two edges with opposite directions.
- Duplicate Edge: TuGraph currently supports duplicate edges. To guarantee that an edge is unique, enforce it in your business logic.
- Upper Limit: Up to 2^32 edges can be stored between any two vertices.
Property Graph: Vertices and edges can have properties associated with them, and each property can have a different type.
Strong-typed: Each vertex and edge has exactly one label. Once a label is created, changing the number or types of its attributes is costly.
- Specify the starting/ending vertex type of the edge: You can restrict which vertex types an edge may start from and end at. A single edge type can have several allowed pairs of starting/ending vertex types, for example individuals transferring money to companies and companies transferring money to companies. Once restrictions are specified, further pairs can be added, but existing pairs cannot be deleted.
- Unrestricted Mode: If no starting and ending vertex types are specified, edges of this type can be created between any two vertex types. Note: Once starting/ending vertex types are specified for an edge type, the unrestricted mode can no longer be used.
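
The endpoint rules above can be illustrated with a small in-memory model. This is only a sketch of the described behavior, not TuGraph's API: the `EdgeLabel` class, its methods, and `undirected_pair` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeLabel:
    """Hypothetical model of an edge label's starting/ending vertex type rules."""
    name: str
    # An empty set stands for unrestricted mode: any pair of vertex types is allowed.
    constraints: set = field(default_factory=set)

    def add_constraint(self, src_label: str, dst_label: str) -> None:
        # Pairs can be added; the model offers no way to remove them,
        # mirroring the rule that restrictions cannot be deleted.
        self.constraints.add((src_label, dst_label))

    def allows(self, src_label: str, dst_label: str) -> bool:
        if not self.constraints:
            return True
        return (src_label, dst_label) in self.constraints


def undirected_pair(a: str, b: str) -> list:
    """Simulate an undirected edge as two directed edges with opposite directions."""
    return [(a, b), (b, a)]


transfer = EdgeLabel("transfer")
print(transfer.allows("movie", "actor"))      # unrestricted mode accepts any pair
transfer.add_constraint("person", "company")
transfer.add_constraint("company", "company")
print(transfer.allows("person", "company"))   # an allowed pair
print(transfer.allows("company", "person"))   # direction matters, so this pair is rejected
print(undirected_pair("v1", "v2"))
```

Once the first pair is added, the model leaves unrestricted mode for good, which matches the note that the mode cannot be used again.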

1.2 Data types

TuGraph supports a variety of data types that can be used as attributes. The supported data types are as follows:

Table 1. TuGraph supported data types

Type	Min	Max	Description
BOOL	false	true	Boolean
INT8	-128	127	8-bit int
INT16	-32768	32767	16-bit int
INT32	-2^31	2^31 - 1	32-bit int
INT64	-2^63	2^63 - 1	64-bit int
DATE	0000-00-00	9999-12-31	Date in "YYYY-MM-DD" format
DATETIME	0000-00-00 00:00:00.000000	9999-12-31 23:59:59.999999	Date and time in "YYYY-MM-DD hh:mm:ss[.ffffff]" format
FLOAT			32-bit float
DOUBLE			64-bit float
STRING			A string of variable length
BLOB			Binary data
POINT			EWKB format data of point
LINESTRING			EWKB format data of linestring
POLYGON			EWKB format data of polygon
FLOAT_VECTOR			Variable-length vector of 32-bit floats

BLOB data is BASE64-encoded on input and output.
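
As a rough client-side illustration of Table 1 and the BLOB note, the sketch below checks whether an integer fits a given integer type and round-trips binary data through BASE64 using Python's standard library. The helper names (`fits`, `encode_blob`, `decode_blob`) are hypothetical and are not part of any TuGraph client.

```python
import base64

# Integer ranges copied from Table 1.
INT_RANGES = {
    "INT8": (-2**7, 2**7 - 1),
    "INT16": (-2**15, 2**15 - 1),
    "INT32": (-2**31, 2**31 - 1),
    "INT64": (-2**63, 2**63 - 1),
}


def fits(type_name: str, value: int) -> bool:
    """Return True if value lies within the inclusive range of the integer type."""
    lo, hi = INT_RANGES[type_name]
    return lo <= value <= hi


def encode_blob(raw: bytes) -> str:
    """Encode raw bytes as BASE64 text, the form BLOB values take on input."""
    return base64.b64encode(raw).decode("ascii")


def decode_blob(text: str) -> bytes:
    """Decode BASE64 text, the form BLOB values take on output, back to raw bytes."""
    return base64.b64decode(text)


print(fits("INT8", 127), fits("INT8", 128))        # True False
encoded = encode_blob(b"\x00\x01binary")
print(decode_blob(encoded) == b"\x00\x01binary")   # True
```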

1.3 Index

TuGraph supports indexing vertex fields.

Indexes can be unique or non-unique. If a unique index is created for a vertex label, TuGraph performs a data integrity check to ensure the uniqueness of the index before modifying a vertex with that label.

A single index is built on one field of a label, and multiple fields of the same label can each be indexed.

BLOB fields cannot be indexed.

TuGraph supports creating indexes on node or edge attributes to improve query efficiency. Its characteristics are as follows:

- Indexes include single indexes and composite indexes. A single index is created on a single property of a node or an edge, while a composite index is created on multiple properties of a node or an edge (no more than 16). Indexes can be created on multiple sets of properties for the same node or edge.
- If a unique index is created for a node label, a data integrity check is performed before a node with that label is modified, to ensure the uniqueness of the index.
- Attributes of type BLOB cannot be indexed.

There are multiple index types for TuGraph’s vertices and edges. Different index types have different functions and restrictions, as follows:

1.3.1 Single index

1.3.1.1 Node index

1.3.1.1.1 unique index

A unique index on a node is globally unique: if an attribute has a unique index, no two nodes with the same label in the same graph will have the same value for that attribute. The maximum length of a unique index key is 480 bytes, so a unique index cannot be created for attributes exceeding 480 bytes. The primary key is a special unique index, so its maximum key length is also 480 bytes.

1.3.1.1.2 non_unique index

A non_unique index on a node is not globally unique: if an attribute has a non_unique index, nodes with the same label in the same graph can share the same value for that attribute. Because one key in a non_unique index may map to multiple values, the index appends the maximum vid of each group of vids sharing an index key to the user-specified key, which speeds up lookups and writes. Each vid is 5 bytes long, so the maximum length of a non_unique index key is 475 bytes. Unlike unique indexes, a non_unique index can still be created on attributes longer than 475 bytes. In that case only the first 475 bytes are used as the index key (the value stored in the attribute itself is not affected). When traversing through an iterator, the query value is likewise truncated to its first 475 bytes before the traversal, so the results may include unexpected matches and users need to filter them again.
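
To show why the extra filtering step is needed, here is a minimal sketch of the truncation effect. It is not TuGraph code: `index_key` and `lookup` are hypothetical helpers, and treating values as UTF-8 bytes is an assumption made for the illustration.

```python
NON_UNIQUE_VERTEX_KEY_MAX = 475  # 480-byte key limit minus one 5-byte vid


def index_key(value: str, max_len: int = NON_UNIQUE_VERTEX_KEY_MAX) -> bytes:
    """Keep only the first max_len bytes of the value (assumed UTF-8) as the key."""
    return value.encode("utf-8")[:max_len]


def lookup(stored_values, query: str):
    """Hypothetical index lookup, followed by the exact-match filter the user must add."""
    key = index_key(query)
    # Values whose truncated keys collide with the truncated query key.
    candidates = [v for v in stored_values if index_key(v) == key]
    # Second pass: compare full values to drop false positives.
    exact = [v for v in candidates if v == query]
    return candidates, exact


a = "x" * 475 + "A"
b = "x" * 475 + "B"
candidates, exact = lookup([a, b], a)
print(len(candidates), len(exact))  # 2 1
```

Both values share the same first 475 bytes, so the index alone cannot tell them apart. Only the comparison against the full stored value returns the single intended match.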

1.3.1.2 Edge index

1.3.1.2.1 unique index

As with nodes, a unique index on an edge is globally unique: if an attribute has a unique index, no two edges with the same label in the same graph will have the same value for that attribute. The maximum length of a unique index key is 480 bytes, so a unique index cannot be created for attributes exceeding 480 bytes.

1.3.1.2.2 pair_unique index

A pair_unique index is unique between a pair of nodes: if an attribute has a pair_unique index, edges with the same label between the same starting node and ending node in the same graph will not have the same value for that attribute. To enforce uniqueness only within each pair of starting and ending nodes, the index appends the starting and ending vids to the user-specified key. Each vid is 5 bytes long, so the maximum key length is 470 bytes, and a pair_unique index cannot be created for attributes exceeding 470 bytes.

1.3.1.2.3 non_unique index

As with nodes, a non_unique index on an edge is not globally unique: if an attribute has a non_unique index, edges with the same label in the same graph can share the same value for that attribute. Because one key in a non_unique index may map to multiple values, the index appends the maximum eid of each group of eids sharing an index key to the user-specified key, which speeds up lookups and writes. Each eid is 24 bytes long, so the maximum length of a non_unique index key is 456 bytes. Unlike unique indexes, a non_unique index can still be created on attributes longer than 456 bytes. In that case only the first 456 bytes are used as the index key (the value stored in the attribute itself is not affected). When traversing through an iterator, the query value is likewise truncated to its first 456 bytes before the traversal, so the results may include unexpected matches and users need to filter them again.
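
The key-length limits quoted for the single index types all follow from the same subtraction: the 480-byte key limit minus whatever IDs the index appends to the user-specified key. The sketch below restates that arithmetic. The function name and the string labels for index types are hypothetical.

```python
BASE_KEY_LIMIT = 480  # bytes, the limit for a unique index key
VID_SIZE = 5          # bytes per vertex ID
EID_SIZE = 24         # bytes per edge ID

# Bytes appended to the user-specified key by each single index type.
SUFFIX_BYTES = {
    ("vertex", "unique"): 0,
    ("vertex", "non_unique"): VID_SIZE,       # maximum vid of the group
    ("edge", "unique"): 0,
    ("edge", "pair_unique"): 2 * VID_SIZE,    # starting vid + ending vid
    ("edge", "non_unique"): EID_SIZE,         # maximum eid of the group
}


def single_index_key_limit(target: str, index_type: str) -> int:
    """Maximum user-specified key length in bytes for a single index."""
    return BASE_KEY_LIMIT - SUFFIX_BYTES[(target, index_type)]


for (target, index_type) in SUFFIX_BYTES:
    print(target, index_type, single_index_key_limit(target, index_type))
```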

1.3.2 Composite index

Currently, composite indexes are only supported for multiple properties of a vertex, and not supported for properties of an edge. There are two types of composite indexes: unique indexes and non-unique indexes. The requirements for creating a composite index are as follows:

- The number of properties in a composite index must be between 2 and 16 (inclusive).
- For a unique composite index, the sum of the lengths of the properties cannot exceed 480 - 2 * (number of properties - 1) bytes, while for a non-unique composite index, it cannot exceed 475 - 2 * (number of properties - 1) bytes.
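
The two requirements can be checked with a few lines of arithmetic. The following is a sketch only: `composite_key_limit` and `fits_composite` are hypothetical helpers that restate the formulas above, not TuGraph functions.

```python
def composite_key_limit(num_properties: int, unique: bool) -> int:
    """Maximum total property length in bytes for a composite index."""
    if not 2 <= num_properties <= 16:
        raise ValueError("a composite index needs between 2 and 16 properties")
    base = 480 if unique else 475
    return base - 2 * (num_properties - 1)


def fits_composite(property_lengths, unique: bool) -> bool:
    """Check whether the summed property lengths stay within the limit."""
    limit = composite_key_limit(len(property_lengths), unique)
    return sum(property_lengths) <= limit


print(composite_key_limit(2, unique=True))             # 478
print(composite_key_limit(16, unique=False))           # 445
print(fits_composite([200, 200, 80], unique=True))     # False: 480 > 476
```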

1.3.2.1 Unique index

Similar to a vertex’s single unique index, a vertex’s composite unique index is globally unique: for a set of properties with a unique index, no two vertices with the same label in the same graph will have the same values for that group of properties. Due to the underlying storage design, the composite index key needs to retain the length of the properties; therefore, the maximum length for a composite unique index key is 480 - 2 * (number of properties - 1) bytes. A unique composite index cannot be created for properties exceeding this length.

1.3.2.2 non_unique index

Similar to a vertex’s single non-unique index, a vertex’s composite non-unique index is not globally unique: for a set of properties with a non-unique index, different vertices with the same label in the same graph may have the same values for that group of properties. Because one non-unique index key may map to multiple values, the index appends the maximum vid of each group of vids sharing an index key to the user-specified key, which speeds up lookups and writes. Each vid is 5 bytes long, so the maximum length for a non-unique composite index key is 475 - 2 * (number of properties - 1) bytes. A non-unique composite index cannot be created for properties exceeding this length.

2. Graph Project, Vertex, Edge, and Attribute Naming Conventions and Suggestions

2.1 Naming Rules

Graph projects, vertices, edges, and attributes are identifiers. This section describes the allowed syntax for identifiers in TuGraph. The table below describes the maximum length and allowed characters for each type of identifier.

Identifier	Length	Allowed Characters
User, role, graph project	1-64 characters	Chinese characters, letters, digits, and underscores; the first character cannot be a digit
Vertex type, edge type, attribute	1-256 characters	Chinese characters, letters, digits, and underscores; the first character cannot be a digit
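
The rules in the table translate naturally into a validation check. The sketch below makes two assumptions that the table does not spell out: "Chinese" is approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF, and "letters" means ASCII letters. The function name and the `kind` labels are hypothetical.

```python
import re

# Assumption: Chinese characters are approximated by U+4E00..U+9FFF,
# and letters are ASCII letters. The first character cannot be a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_\u4e00-\u9fff][A-Za-z0-9_\u4e00-\u9fff]*")

# Maximum lengths in characters, from the table above.
MAX_LENGTH = {
    "user": 64,
    "role": 64,
    "graph": 64,
    "vertex_type": 256,
    "edge_type": 256,
    "attribute": 256,
}


def is_valid_identifier(kind: str, name: str) -> bool:
    """Check length and allowed characters for the given identifier kind."""
    if not 1 <= len(name) <= MAX_LENGTH[kind]:
        return False
    return _IDENTIFIER.fullmatch(name) is not None


print(is_valid_identifier("graph", "graph123"))      # True
print(is_valid_identifier("graph", "123graph"))      # False: starts with a digit
print(is_valid_identifier("edge_type", "act_in"))    # True
print(is_valid_identifier("attribute", "act-in"))    # False: hyphen is not allowed
```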

2.2 Usage Restrictions

Description	Maximum number
Number of users, number of roles	65536
Number of graphs	4096
Number of vertex and edge types per graph	4096
Number of attributes per type	1024

Note:

1. Special characters and keywords: Special characters and keywords must be enclosed in backquotes (``) when referenced.

Example: match (`match`:match) return `match`.id limit 1

2. Case sensitivity: TuGraph is case-sensitive.

3. Graph project, vertex/edge, and attribute names can be reused, but attribute names under the same vertex or edge cannot be duplicated.

4. Reserved keywords for attribute names: SRC_ID / DST_ID / SKIP.
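
A query builder could apply notes 1 and 4 along the following lines. This is a sketch under stated assumptions: `SAMPLE_KEYWORDS` is a hypothetical, deliberately incomplete keyword list, matching keywords case-insensitively is an assumption, and names that themselves contain a backquote are not handled.

```python
import re

# Names made only of letters, digits, underscores, and (assumed) U+4E00..U+9FFF
# characters, not starting with a digit, are treated as plain identifiers.
_PLAIN = re.compile(r"[A-Za-z_\u4e00-\u9fff][A-Za-z0-9_\u4e00-\u9fff]*")

# Reserved attribute names from note 4 (compared case-sensitively).
RESERVED_ATTRIBUTE_NAMES = {"SRC_ID", "DST_ID", "SKIP"}

# Hypothetical, incomplete sample of query keywords for illustration only.
SAMPLE_KEYWORDS = {"match", "return", "limit", "where"}


def quote_identifier(name: str) -> str:
    """Wrap the name in backquotes if it is a keyword or has special characters."""
    if _PLAIN.fullmatch(name) and name.lower() not in SAMPLE_KEYWORDS:
        return name
    return "`" + name + "`"


def is_reserved_attribute(name: str) -> bool:
    """Return True if the name is one of the reserved attribute names."""
    return name in RESERVED_ATTRIBUTE_NAMES


print(quote_identifier("match"))        # `match`
print(quote_identifier("name"))         # name
print(quote_identifier("first name"))   # `first name`
print(is_reserved_attribute("SRC_ID"), is_reserved_attribute("src_id"))  # True False
```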

2.3 Naming Suggestions

Identifier	Description	Suggestions
Graph project	Start with a letter or Chinese character	Examples: graph123, project123, etc.
Vertex/edge type	Start with a letter or Chinese character and use underscores to separate words	Examples: person, act_in, etc.
Attribute	Letters or Chinese characters	Examples: name, age, etc.