Graph databases don’t have tables and columns like relational ones. Instead, they have nodes, edges, facets, and predicates.
A node is an object or entity, also known as a vertex.
An edge is a relationship between two nodes.
A facet is an attribute of an edge.
A predicate is an attribute of a node.
Loosely translating this from a relational point of view:
Use a node with its type set instead of a table
Use an edge instead of a foreign key reference
Use a facet instead of an additional column on a join table
Use a predicate instead of a column
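As a rough illustration, a person row with a foreign key to a company might be expressed in Dgraph's RDF triple format like this (all names here are hypothetical):

```
_:alice <dgraph.type> "Person" .          # node with its type set (≈ table)
_:alice <name> "Alice" .                  # predicate (≈ column)
_:alice <works_at> _:acme (since=2020) .  # edge (≈ foreign key) with a facet (≈ join-table column)
_:acme <dgraph.type> "Company" .
_:acme <name> "Acme" .
```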
Relationships between data in a graph database are first-class citizens. Instead of doing complicated work to join data together, nodes are simply connected by pointers, which results in some major speed improvements.
Using Big O notation, with the two tables labeled M and N, and the result of walking a set of graph nodes labeled k, the time complexities are as follows1:
RDBMS nested loop join: O(M×N)
RDBMS hash join: O(M+N)
Graph traversal: O(k)
When working with relational databases, it’s common to have to save data to many tables in a transaction. The basic flow is:
Insert a record into table A
Get the resulting primary key ID for table A
Insert a record into table B with table A's primary key as a foreign key reference
Repeat for any other joined tables
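The steps above can be sketched with Python's built-in sqlite3 module (the author/book tables and all names are illustrative):

```python
import sqlite3

# In-memory database standing in for a relational store.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES author(id))"
)

with conn:  # wraps both inserts in a single transaction
    cur = conn.execute("INSERT INTO author (name) VALUES (?)", ("Ada Lovelace",))
    author_id = cur.lastrowid  # step 2: read back the generated primary key
    conn.execute(
        "INSERT INTO book (title, author_id) VALUES (?, ?)",
        ("Sketch of the Analytical Engine", author_id),
    )

row = conn.execute(
    "SELECT a.name, b.title FROM book b JOIN author a ON a.id = b.author_id"
).fetchone()
```

Note that if any step fails partway through a batch, it is the application's job to roll back or retry; the database only enforces that the foreign keys it is given are valid.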
The responsibility for referential integrity rests solely on the application using the database. This becomes considerably more complicated when batch-inserting many records along with their accompanying foreign table references.
For graph databases, such as Dgraph, the query language has a built-in mechanism to put the work where it belongs. A mutation can contain temporary identifiers called Blank Nodes2, written as _:identifier. Dgraph creates a UID for each of these, persists them to disk, and returns them in the result set.
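For example, a single set mutation can create and link two new nodes using blank-node identifiers (names hypothetical):

```
{
  set {
    _:alice <name> "Alice" .
    _:alice <works_at> _:acme .
    _:acme <name> "Acme" .
  }
}
```

Dgraph's response includes a map from each blank-node name to the UID it assigned, so the application never has to fetch keys between inserts.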
Relational schemas must be defined and created before any data can be saved. If a new column is needed, it must be explicitly added. If the column carries constraints such as NOT NULL, existing rows must be backfilled with a default value before the constraint can be applied. On a large table, this can take hours or days.
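For instance, adding a NOT NULL column in PostgreSQL-style SQL typically takes three steps (table and column names hypothetical); the backfill in the middle is the part that can run for hours on a large table:

```sql
ALTER TABLE users ADD COLUMN status TEXT;            -- add as nullable first
UPDATE users SET status = 'active';                  -- backfill every existing row
ALTER TABLE users ALTER COLUMN status SET NOT NULL;  -- then add the constraint
```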
By contrast, the schema and storage layers are separate in Dgraph.3 If a new predicate is used, the database will simply start storing it. Node types and predicates can be defined either before or after data is present.
The schema can be modified by adding, dropping, or changing predicates to reflect business requirement changes, while reducing the overall downtime needed to run migrations.
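As a sketch, a Dgraph schema alteration is just a list of predicate definitions, which can be applied before or after data exists (predicate names hypothetical):

```
name: string @index(term) .
age: int .
works_at: uid @reverse .
```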
An ORM, or Object-Relational Mapping tool, is essentially a way to transpile code into SQL and convert the results of a query back into data structures.
SQL is declarative, which makes it wonderful for analysis and ad hoc queries, but requires a fair amount of work to compose from an application.
With Dgraph, queries use a derivative of GraphQL called GraphQL+-4 and return their results as JSON. GraphQL is quickly gaining popularity over RESTful APIs because each API call can be tailored to a specific request. Dgraph takes this one step further by adding functions to GraphQL, which are used to filter results and traverse the graph.
Joins are done by simply asking for embedded data in the GraphQL request, and the results can be parsed by existing JSON decoders.
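A sketch of such a query in GraphQL+-, assuming a name predicate with an index and a works_at edge (both hypothetical):

```
{
  people(func: eq(name, "Alice")) {
    name
    works_at {    # nesting the block replaces an explicit join
      name
    }
  }
}
```

The response is plain JSON with works_at embedded under each person, so an ordinary JSON decoder is all the mapping layer an application needs.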
It should be noted that an ORM is still likely needed with Neo4j's Cypher language and GQL, a new standard derived from it. GQL aims to be a graph query language that complements SQL.5 However, since Cypher is vastly different from most frontend APIs, it would require work to transpile.
NoSQL databases, such as document and graph stores, were invented out of the need to scale to today's data volumes.
Dgraph was designed from the outset to be horizontally scalable, fast, concurrent, and transactional. It handles shard rebalancing and synchronous replication out of the box so losing a hard drive or server won’t bring down the application.6
Dgraph is open source and doesn’t require a commercial license for these features. Neo4j, on the other hand, has free community and startup editions for companies earning less than $3MM USD per year.7 The commercial version can get expensive. The last time someone gave me a quote, it was about $20k per server.
Relational databases were designed for spinning magnetic hard drives, which achieve around 100 IOPS per drive, whereas modern NVMe SSDs can achieve 500k-10MM IOPS.8 Storage is the area of computing that has seen the largest speed gains in recent times.
Dgraph's storage engine, Badger, is optimized for modern SSDs.9 As a result, Dgraph's overall memory requirements are quite a bit lower than those of alternatives such as RocksDB or Neo4j.