Next level Cypher aggregations using WITH, COLLECT and UNWIND

Let’s take a look at how we can take your Cypher aggregations to the next level

Here we have a query that gets the number of completed enrolments per quarter on graph Academy since the start of 2021.

MATCH (u:User)-[:HAS_ENROLMENT]->(e:CompletedEnrolment)
WHERE e.completedAt >= datetime('2021-01-01')
RETURN
  e.completedAt.year +'-'+ e.completedAt.quarter AS quarter,
  count(*) AS count
ORDER BY quarter DESC

The query produces the following table:

quartercount
2023-021500
2023-011400
2022-041300
2022-031200
2022-021100
2022-011000

Read more about the modelling decisions behind the GraphAcademy website

The Next Level

Let’s take this to the next level by adding a percentage change between the quarters. Now, for this we’ll need to emulate a window function using Cypher.

First, let’s change the RETURN clause to a WITH to allow us to manipulate the data further.

- RETURN
+ WITH
    e.completedAt.year +'-'+ e.completedAt.quarter AS quarter,
    count(*) AS count

Next, use the collect() function to turn the individual arrays into an aggregated list.

WITH collect({quarter: quarter, count: count}) AS all

The range() function generates a list of numbers that we can use to access the individual values within that list.

range(0, size(all)-1) // [ {quarter: '2023-02', count: '1500'} ... ]

UNWIND the index array to get this and the previous quarter’s count

UNWIND range(0, size(all)-1) AS idx
RETURN idx,
  all[idx].quarter AS quarter,
  all[idx].count AS count,           // This quarter count
  all[idx+1].count AS previousCount // Previous quarter count

Divide the current number by the previous value and multiply by 100 to get the percentage, and minus 100 to get the delta.

(100.0 * all[idx].count / all[idx+1].count) - 100

That number’s a little bit, long so round it to two decimal places with the round() function.

round((100.0 * all[idx].count / all[idx+1].count) - 100, 2)

The first row represents the current quarter, which isn’t finished yet. We can hide this by using the CASE statement and only RETURN the value if the index is greater than one.

CASE
  WHEN idx > 0
    THEN round((100.0 * all[idx].count / all[idx+1].count) - 100, 2)
  ELSE null
END AS change

Here is the full Cypher statement.

MATCH (u:User)-[:HAS_ENROLMENT]->(e:CompletedEnrolment)
WHERE e.completedAt >= datetime('2021-01-01')
WITH
  e.completedAt.year +'-'+ e.completedAt.quarter AS quarter,
  count(*) AS count

WITH collect({quarter: quarter, count: count}) AS all

UNWIND range(0, size(all)-1) AS idx

RETURN idx,
  all[idx].quarter AS quarter,
  all[idx].count AS count,
  CASE
    WHEN idx > 0
        THEN round((100.0 * all[idx].count / all[idx+1].count) - 100, 2)
    ELSE null
  END AS change

and the table of results…

quartercountchange
2023-021500null
2023-0114007.14
2022-0413007.69
2022-0312008.33
2022-0211009.09
2022-01100010

This video was originally published on the Neo4j YouTube Channel.


For more tips and tricks on Cypher and Neo4j in general,
head to Neo4j GraphAcademy and enrol now.