Graph DB solution Neo4j is popular with Data Scientists and Data Architects trying to make connections of the nodes and relationships. Neo4j is able to do memory management other in memory operations to allow for efficiency and performance. All of that data needs to eventual persist to a data management platform. This is why I was first asked about Neo4j.
There are Community and Enterprise Editions like most software solutions these days and the robust enterprise type functions land in the Enterprise Edition. Things like RBAC and much higher scale.
Neo4j provides a repo of Helm charts and some helpful documentation on running the Graph Database in Kubernetes. Unfortunately, the instructions stop short of many of the K8s flavors supported by the cloud and on premises solutions. Pre-provisioning cloud disks might be great for a point solution of a single app. Most of my interactions with the people running k8s in production are building Platform as a Service (PaaS) or Database as a Service (DBaaS). Nearly all at least want the option of building these solutions to be hybrid or multi cloud capable. Additionally, DR and Backup are requirements to run in any environment that values their data and staying in business.
Portworx in 20 seconds
Portworx is a data platform that allows stateful applications such as Neo4j to run on an Cloud, on Premises Hardware on any K8s Distribution. It was built from the very beginning to run as a container for containers. </end commercial>
In this blog post I want to enhance and clarify the documentation for the Neo4j helm chart so that you can easily run the community or Enterprise Editions in your K8s deployment.
As with any database Neo4j will benefit greatly from running the persistence on Flash. All of my testing was done with a Pure Storage FlashArray.
Step 1
Already have K8s and Portworx installed. I used Portworx 2.9.1.1 and Vanilla K8s 1.22. Also already have Helm installed.
Note: The helm chart was giving me trouble until I updated helm to version 3.8.x
Step 2
Add the Neo4j helm repo
helm repo add neo4j https://helm.neo4j.com/neo4j
helm repo update
Step 3
Create a neo4j storage class
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: neo4j
provisioner: pxd.portworx.com
parameters:
repl: "2"
io_profile: db_remote
kubectl apply -f neo4j-storageclass.yaml
Verify the Storage Class is available
kubectl get sc
Step 4
Create your values.yaml for your install.
values-standalone.yaml
neo4j:
resources:
cpu: "0.5"
memory: "2Gi"
# Uncomment to set the initial password
#password: "my-initial-password"
# Uncomment to use enterprise edition
#edition: "enterprise"
#acceptLicenseAgreement: "yes"
volumes:
data:
mode: "dynamic"
# Only used if mode is set to "dynamic"
# Dynamic provisioning using the provided storageClass
dynamic:
storageClassName: "neo4j"
accessModes:
- ReadWriteOnce
requests:
storage: 100Gi
Cluster values.yaml
node[0-x]values-cluster.yaml
Why do I say 0-x? Well neo4j requires a helm release for each core cluster node (read more detail in the neo4j helm docs). Each file for now is the same. Note: neo4j is an in memory database. 2Gi ram is great for the lab, but for really analytics use I would hope to use much more memory.
neo4j:
name: "my-cluster"
resources:
cpu: "0.5"
memory: "2Gi"
password: "my-password"
acceptLicenseAgreement: "yes"
volumes:
data:
mode: "dynamic"
# Only used if mode is set to "dynamic"
# Dynamic provisioning using the provided storageClass
dynamic:
storageClassName: "neo4j"
accessModes:
- ReadWriteOnce
requests:
storage: 100Gi
Read Replica values
Create another helm yaml file here is rr1-values-cluster.yaml
neo4j:
name: "my-cluster"
resources:
cpu: "0.5"
memory: "2Gi"
password: "my-password"
acceptLicenseAgreement: "yes"
volumes:
data:
mode: "dynamic"
# Only used if mode is set to "dynamic"
# Dynamic provisioning using the provided storageClass
dynamic:
storageClassName: "neo4j"
accessModes:
- ReadWriteOnce
requests:
storage: 100Gi
Download all the values.yaml from GitHub
Step 5
Install Neo4j (stand alone)
Standalone
The docs say to run this:
helm install my-neo4j-release neo4j/neo4j-standalone -f my-neo4j.values-standalone.yaml
I do the following and I’ll explain why.
helm install -n neo4j-1 neo4j-1 neo4j/neo4j-standalone -f ./values-standalone.yaml --create-namespace
Cluster Install
For each cluster node values.yaml
helm install -n neo4j-cluster neo4j-cluster-3 neo4j/neo4j-cluster-core -f ./node1-values-cluster.yaml --create-namespace
You need a minimum of 3 core nodes to create a cluster. So you must run the helm install command 3 times for the neo4j-cluster-core helm chart.
Successful Cluster Creation
kubectl -n neo4j-cluster exec neo4j-cluster-0 -- tail /logs/neo4j.log
2022-03-29 19:14:55.139+0000 INFO Bolt enabled on [0:0:0:0:0:0:0:0%0]:7687.
2022-03-29 19:14:55.141+0000 INFO Bolt (Routing) enabled on [0:0:0:0:0:0:0:0%0]:7688.
2022-03-29 19:15:09.324+0000 INFO Remote interface available at http://localhost:7474/
2022-03-29 19:15:09.337+0000 INFO id: E2E827273BD3E291C8DF4D4162323C77935396BB4FFB14A278EAA08A989EB0D2
2022-03-29 19:15:09.337+0000 INFO name: system
2022-03-29 19:15:09.337+0000 INFO creationDate: 2022-03-29T19:13:44.464Z
2022-03-29 19:15:09.337+0000 INFO Started.
2022-03-29 19:15:35.595+0000 INFO Connected to neo4j-cluster-3-internals.neo4j-cluster.svc.cluster.local/10.233.125.2:7000 [RAFT version:5.0]
2022-03-29 19:15:35.739+0000 INFO Connected to neo4j-cluster-2-internals.neo4j-cluster.svc.cluster.local/10.233.127.2:7000 [RAFT version:5.0]
2022-03-29 19:15:35.876+0000 INFO Connected to neo4j-cluster-3-internals.neo4j-cluster.svc.cluster.local/10.233.125.2:7000 [RAFT version:5.0]
Install Read Replica
The cluster must be up and functioning to install the read replica.
helm install -n neo4j-cluster neo4j-cluster-rr1 neo4j/neo4j-cluster-read-replica -f ./rr1-values-cluster.yaml
Install the Loadbalancer
To access neo4j from an external source you should install the loadbalancer service. Run the following command in our example.
helm install -n neo4j-cluster lb neo4j/neo4j-cluster-loadbalancer --set neo4j.name=my-cluster
Why the -n tag?
I provide the -n with a namespace and the –create-namespace tag because it allows me to install my helm release in this case neo4j-1 into its own namespace. Which helps with operations for DR, Backup and even lifecycle cleanup down the road. When installing a cluster all the helm releases must be in the same namesapce.
Start Graph Databasing!
As you can see there are plenty of tutorials to see how you may use Neo4j
Some other tips:
https://neo4j.com/docs/operations-manual/current/performance/disks-ram-and-other-tips/
See below for detials of the PX Cluster
Status: PX is operational
Telemetry: Disabled or Unhealthy
License: Trial (expires in 31 days)
Node ID: ade858a2-30d4-41ba-a2ce-7ee1f9b7c4c0
IP: 10.21.244.207
Local Storage Pool: 1 pool
POOL IO_PRIORITY RAID_LEVEL USABLE USED STATUS ZONE REGION
0 HIGH raid0 297 GiB 12 GiB Online default default
Local Storage Devices: 2 devices
Device Path Media Type Size Last-Scan
0:1 /dev/mapper/3624a937081f096d1c1642a6900d954aa-part2 STORAGE_MEDIUM_SSD 147 GiB 29 Mar 22 16:21 UTC
0:2 /dev/mapper/3624a937081f096d1c1642a6900d954ab STORAGE_MEDIUM_SSD 150 GiB 29 Mar 22 16:21 UTC
total - 297 GiB
Cache Devices:
* No cache devices
Journal Device:
1 /dev/mapper/3624a937081f096d1c1642a6900d954aa-part1 STORAGE_MEDIUM_SSD
Cluster Summary
Cluster ID: px-fa-demo1
Cluster UUID: 76eaa789-384d-4af2-b476-b9b3fd7fdcab
Scheduler: kubernetes
Nodes: 8 node(s) with storage (8 online)
IP ID SchedulerNodeName Auth StorageNode Used Capacity Status StorageStatus Version Kernel OS
10.21.244.203 f917d321-3857-4ee1-bfaf-df6419fdac53 pxfa1-3 Disabled Yes 12 GiB 297 GiB Online Up 2.9.1.3-7769924 5.4.0-105-generic Ubuntu 20.04.4 LTS
10.21.244.209 c3bea2dc-10cd-4485-8d02-e65a23bf10aa pxfa1-9 Disabled Yes 12 GiB 297 GiB Online Up 2.9.1.3-7769924 5.4.0-105-generic Ubuntu 20.04.4 LTS
10.21.244.204 b27e351a-fed3-40b9-a6d3-7de2e7e88ac3 pxfa1-4 Disabled Yes 12 GiB 297 GiB Online Up 2.9.1.3-7769924 5.4.0-105-generic Ubuntu 20.04.4 LTS
10.21.244.207 ade858a2-30d4-41ba-a2ce-7ee1f9b7c4c0 pxfa1-7 Disabled Yes 12 GiB 297 GiB Online Up (This node) 2.9.1.3-7769924 5.4.0-105-generic Ubuntu 20.04.4 LTS
10.21.244.205 a413f877-3199-4664-961a-0296faa3589d pxfa1-5 Disabled Yes 12 GiB 297 GiB Online Up 2.9.1.3-7769924 5.4.0-105-generic Ubuntu 20.04.4 LTS
10.21.244.202 37ae23c7-1205-4a03-bbe1-a16a7e58849a pxfa1-2 Disabled Yes 12 GiB 297 GiB Online Up 2.9.1.3-7769924 5.4.0-105-generic Ubuntu 20.04.4 LTS
10.21.244.206 369bfc54-9f75-45ef-9442-b6494c7f0572 pxfa1-6 Disabled Yes 12 GiB 297 GiB Online Up 2.9.1.3-7769924 5.4.0-105-generic Ubuntu 20.04.4 LTS
10.21.244.208 00683edb-96fc-4861-815d-a0430f8fc84b pxfa1-8 Disabled Yes 12 GiB 297 GiB Online Up 2.9.1.3-7769924 5.4.0-105-generic Ubuntu 20.04.4 LTS
Global Storage Pool
Total Used : 96 GiB
Total Capacity : 2.3 TiB