Migration guide for the course 'Exploring graph algorithms with Neo4j'
May 03, 2020
The course
In 2019, I was asked to record a video course about the Graph Algorithms library (GA) released by Neo4j. These videos are available on the publisher website.
Even if the main concepts about the algorithms remain unchanged (centrality algorithms are still centrality algorithms), the Graph Algorithms library is being deprecated by the Neo4j team in favor of a more optimized version called Graph Data Science plugin (GDS) also available on GitHub.
A very detailed migration guide has been created by the Neo4j folks and is available here.
In this post, I will focus on the changes to be made to follow the above mentioned course using the GDS. This update is mandatory is you are using Neo4j 4, since only the GDS (starting from its version 1.2) is compatible with this version of Neo4j.
While all GA procedures were within the
algo.*
namespace, in the GDS all procedures start withgds.*
Graph projection
Named VS anonymous projections
Even if creating named projected graphs was already possible in the Graph Algorithm plugin, this functionality was not really highlighted and is not used at all in the videos.
In the GDS on the contrary, a lot of examples are given using a named projected graph created using gds.graph.create
procedure. But it is still possible to use anonymous projected graphs using either native or cypher projection:
- Native projection:
nodeProjection
andrelationshipProjection
allow to configure the list of nodes, relaitonships and their properties that need to be copied into the in-memory projected graph. - Cypher projection: the nodes and relationships to use are selected through a Cypher statement. For this, include
nodeQuery
and/orrelationshipQuery
to the configuration map.
Relationship projection
Important difference: while the projected graph in the GA was by default undirected, the projected graph in the GDS by default uses the relationships in their native direction. In order to use undirected relationships, you need to use:
A typical relationship projection will have the following structure:
relationshipProjection: {
SUBWAY: { // type of the relationships in the projected graph
type: "SUBWAY", // type of the relationships in the Neo4j graph to be copied
orientation: "UNDIRECTED", // relationship orientation: NATIVE (default), REVERSE or UNDIRECTED
aggregation: "SINGLE", // what to do with parallel relationships? SINGLE means take only one of them
properties: ["weight"] // list of relationship properties to include in the projected graph
}
},
Algorithms
With the GDS, all algorithms share the same signature:
gds.<algo_path>(graphName, configurationMap)
if using a named projected graph, which is not our casegds.<algo_path>(configurationMap)
if using an anonymouse projected graph.
Some of them are fully optimized and production-ready. They are accessible directly through gds.<algoName>
procedures. There are two other cases:
- the algorithms back-ported from the GA but not yet optimized are in the
gds.alpha
namespace. They are not stable and might suffer from performance issues - the algorithms being improved but still in testing mode are in the
gds.beta
namespace
Stream or write
While in the GA, the write mode was triggered by adding a write: true
configuration parameter, this behavior is now handled by the procedure name:
gds.pageRank.stream
to stream the results back to the user.gds.pageRank.write
to write the results back to the Neo4j graph. The procedure signatures are similar, except that the write mode accepts another paramter:writeProperty
to configure the name of the property the results will be written into.
Shortest path
All the algorithms in the shortest path section are still in the
alpha
tier of the GDS.
While in the GA we were writting:
MATCH (A:City {name: "A"})
MATCH (E:City {name: "E"})
CALL algo.shortestPath.stream(A, E, null, {relationshipQuery: "ROAD"})
YIELD nodeId, cost
RETURN algo.getNodeById(nodeId).name, cost
In the GDS we now have to use the following syntax:
MATCH (A:City {name: "A"})
MATCH (E:City {name: "E"})
CALL gds.alpha.shortestPath.stream({
startNode: A,
endNode: E,
nodeProjection: "*",
relationshipProjection: {
ROAD: {
type: "ROAD",
orientation: "UNDIRECTED"
}
}
})
YIELD nodeId, cost
RETURN gds.util.asNode(nodeId).name, cost
nodeProjection: "*"
means ‘use all nodes in the graph’relationshipProjection:
takes only relationships with typeROAD
, but consider them undirected, contrarily to the native Neo4j graph where relationships are directed.
Also note that the util functions algo.getNodeById
has been moved to gds.util.asNode
.
Centrality
The PageRank algorithm is a production-ready algorithm from GDS 1.0
- GA:
CALL algo.pageRank.stream(null, null, {direction: "BOTH", dampingFactor: 0.5, iterations: 50}) YIELD nodeId, score RETURN algo.getNodeById(nodeId).pId, score ORDER BY score DESC
- GDS:
CALL gds.pageRank.stream({ nodeProjection: "*", relationshipProjection: { LINKED_TO: { type: "LINKED_TO", orientation: "UNDIRECTED", aggregation: "SINGLE" } }, dampingFactor: 0.5, maxIterations: 50 }) YIELD nodeId, score RETURN gds.util.asNode(nodeId).pId, score ORDER BY score DESC
Closeness and betweenness are both in the alpha tier:
gds.alpha.closeness.stream
orgds.alpha.closeness.write
gds.alpha.betweeness.stream
orgds.alpha.betweeness.write
Their usage is similar to the ones of PageRank.
Community detection
Weakly Connected Components (Union Find), Louvain and Label Propagation algorithms are all part of the production quality algorithms!
Example with Union find or Weakly connected components, native projection and stream mode:
- GA:
CALL algo.unionFind.stream("Person", null) YIELD nodeId, setId RETURN algo.getNodeById(nodeId).pId, setId ORDER BY setId
- GDS:
CALL gds.wcc.stream({ nodeProjection: "Person", relationshipProjection: "*" }) YIELD nodeId, componentId as setId RETURN gds.util.asNode(nodeId).pId, setId ORDER BY setId
Example with the Louvain algorithm, cypher projection and write mode:
- GA:
CALL algo.louvain("MATCH (p:Product) RETURN id(p) as id", "MATCH (p:Product)<-[:ORDERS]-(:Order)-[:ORDERS]->(q:Product) RETURN id(p) as source, id(q) as target", {direction: "BOTH", graph: "cypher", writeProperty: "louvain"}) YIELD nodes, communityCount RETURN nodes, communityCount
- GDS:
CALL gds.louvain.write({ nodeQuary: "MATCH (p:Product) RETURN id(p) as id", relationshipQuery: "MATCH (p:Product)<-[:ORDERS]-(:Order)-[:ORDERS]->(q:Product) RETURN id(p) as source, id(q) as target", writeProperty: "louvain" }) YIELD nodes, communityCount RETURN nodes, communityCount
I hope this small guide will help you follow the course using the GDS as well! The full update of the code file associated with the course (link) is ongoing and will be published here soon. In the meantime, do not hesitate to contact me in case of any problem with the course!