neo4j-spark-connector | Neo4j Connector for Apache Spark
kandi X-RAY | neo4j-spark-connector Summary
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
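As a rough illustration of that DataSource API, a read and a write might look like the sketch below. This is a minimal sketch only: the URL, credentials, labels and save mode are placeholder assumptions, not examples taken from this page.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object Neo4jRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("neo4j-spark-connector-sketch") // illustrative app name
      .getOrCreate()

    // Read all :Person nodes into a DataFrame through the connector.
    val people = spark.read
      .format("org.neo4j.spark.DataSource")
      .option("url", "bolt://localhost:7687")
      .option("authentication.basic.username", "neo4j")
      .option("authentication.basic.password", "secret")
      .option("labels", "Person")
      .load()

    people.show()

    // Write the same rows back as new :PersonCopy nodes (Append creates nodes).
    people.write
      .format("org.neo4j.spark.DataSource")
      .mode(SaveMode.Append)
      .option("url", "bolt://localhost:7687")
      .option("authentication.basic.username", "neo4j")
      .option("authentication.basic.password", "secret")
      .option("labels", ":PersonCopy")
      .save()
  }
}
```

For reads that a plain label scan cannot express, the connector also accepts a Cypher statement via a query option instead of labels.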
Community Discussions
Trending Discussions on neo4j-spark-connector
QUESTION
I'm working on a rather big project. I need to use azure-security-keyvault-secrets, so I added the following to my pom.xml file:
...ANSWER
Answered 2019-Dec-27 at 18:36
So I managed to fix the problem with the maven-shade-plugin. I added the following piece of code to my pom.xml file:
QUESTION
I've run into a technical challenge around Neo4j usage that has had me stumped for a while. My organization uses Neo4j to model customer interaction patterns. The graph has grown to a size of around 2 million nodes and 7 million edges. All nodes and edges have between 5 and 10 metadata properties. Every day, we export data on all of our customers from Neo4j to a series of python processes that perform business logic.
Our original method of data export was to use paginated cypher queries to pull the data we needed. For each customer node, the cypher queries had to collect many types of surrounding nodes and edges so that the business logic could be performed with the necessary context. Unfortunately, as the size and density of the data grew, these paginated queries began to take too long to be practical.
Our current approach uses a custom Neo4j procedure to iterate over nodes, collect the necessary surrounding nodes and edges, serialize the data, and place it on a Kafka queue for downstream consumption. This method worked for some time, but it is now taking long enough that it, too, is becoming impractical, especially considering that we expect the graph to grow by an order of magnitude in size.
I have tried the cypher-for-apache-spark and neo4j-spark-connector projects, but neither has been able to provide the query and data transfer speeds that we need.
We currently run on a single Neo4j instance with 32GB memory and 8 cores. Would a cluster help mitigate this issue?
Does anyone have any ideas or tips for how to perform this kind of data export? Any insight into the problem would be greatly appreciated!
...ANSWER
Answered 2018-May-02 at 17:34
As far as I remember, Neo4j doesn't support horizontal scaling and all data is stored on a single node. To use Spark you could try to store your graph across two or more nodes and load the parts of the dataset from these separate nodes to "simulate" the parallelization. I don't know whether either of the connectors you mention supports that.
But as mentioned in the comments on your question, maybe you could try an alternative approach. One idea:
- Find a data structure representing everything you need to train your model.
- Store such a "flattened" graph in some key-value store (Redis, Cassandra, DynamoDB...).
- Now, if something changes in the graph, push a message to your Kafka topic.
- Add consumers that update the graph and, directly after, the data in your key-value store (i.e. update only the graph branch impacted by the change; there is no need to export the whole graph. You could also change the key-value store at the same moment, but that would very probably mean duplicating the logic). A sketch of such a consumer follows below.
- Have your model query the key-value store directly.
It also depends on how often your data changes and on how deep and broad your graph is.
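A very rough sketch of the consumer idea from the list above, assuming the change events arrive on a Kafka topic as plain key/value strings and the flattened graph lives in Redis; the topic name, group id, keys and serialization format are all illustrative assumptions, not something stated in the question or answer.

```scala
import java.time.Duration
import java.util.{Collections, Properties}

import org.apache.kafka.clients.consumer.KafkaConsumer
import redis.clients.jedis.Jedis

import scala.collection.JavaConverters._

object GraphChangeConsumerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "graph-sync") // illustrative consumer group
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("graph-changes")) // assumed topic

    val redis = new Jedis("localhost", 6379)

    // For every change event, overwrite the flattened view of the affected
    // branch: the key identifies the customer, the value is the pre-serialized
    // subgraph produced upstream.
    while (true) {
      val records = consumer.poll(Duration.ofMillis(500))
      records.asScala.foreach { record =>
        redis.set(record.key(), record.value())
      }
    }
  }
}
```

The point of the design is to keep the full-graph export out of the hot path: Neo4j stays the source of truth, and the model only ever reads the pre-flattened view.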
QUESTION
I am using Neo4j/Cypher and my data is about 200 GB, so I thought of a scalable solution: Spark.
Two solutions are available for building Neo4j graphs with Spark:
1) Cypher for Apache Spark (CAPS)
2) Neo4j-Spark-Connector
I used the first one, CAPS. The pre-processed CSV has two "geohash" values per row: one for the pickup and another for the drop-off. What I want is to build a connected graph of geohash nodes.
CAPS only allows building a graph by mapping node ids: if the node with id 0 is to be connected to the node with id 1, you need a relationship with start id 0 and end id 1.
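To make that id mapping concrete, here is a hedged sketch modeled on the CAPS DataFrame examples in the cypher-for-apache-spark repository; the label, relationship type, column values and exact package names are illustrative and may differ between CAPS versions.

```scala
import org.apache.spark.sql.SparkSession
import org.opencypher.spark.api.CAPSSession
import org.opencypher.spark.api.io.{CAPSNodeTable, CAPSRelationshipTable}

object GeohashGraphSketch {
  def main(args: Array[String]): Unit = {
    implicit val caps: CAPSSession = CAPSSession.local()
    val spark: SparkSession = caps.sparkSession

    // Node table: one row per geohash node; "id" is the identifier column.
    val nodes = spark.createDataFrame(Seq(
      (0L, "u09tvw"), // pickup geohash (made-up value)
      (1L, "u09tvx")  // drop-off geohash (made-up value)
    )).toDF("id", "geohash")

    // Relationship table: "source" and "target" refer to the node ids above,
    // i.e. start id 0 and end id 1 as described in the question.
    val rels = spark.createDataFrame(Seq(
      (0L, 0L, 1L)
    )).toDF("id", "source", "target")

    val graph = caps.readFrom(
      CAPSNodeTable(Set("Geohash"), nodes),
      CAPSRelationshipTable("TRIP", rels)
    )

    graph.cypher("MATCH (a:Geohash)-[:TRIP]->(b:Geohash) RETURN a.geohash, b.geohash").show
  }
}
```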
A very simple layout would be:
...ANSWER
Answered 2018-Aug-27 at 07:32
You are right: CAPS, just like Spark, is an immutable system. However, with CAPS you can create new graphs from within a Cypher statement: https://github.com/opencypher/cypher-for-apache-spark/blob/master/spark-cypher-examples/src/main/scala/org/opencypher/spark/examples/MultipleGraphExample.scala
At the moment the CONSTRUCT clause has limited support for MERGE. It only allows adding already bound nodes to the newly created graph, and each bound node is added exactly once, independent of how many times it occurs in the binding table.
Consider the following query:
QUESTION
I am trying to test a Scala Maven project in IntelliJ IDEA. When I run
mvn test
I get this error:
...ANSWER
Answered 2018-Jul-21 at 22:31
I found the same issue as mine here, so I upgraded the maven-scala-plugin to 3.3.3 instead of 3.2.0, and the previous error disappeared.
QUESTION
I have defined the project dependencies and their merge strategies to generate the jar file with the sbt-assembly plugin. The main class of the future jar was defined as well. After I create the jar file and try to run it using a bash script, I get an error:
...ANSWER
Answered 2018-Jul-13 at 08:48
Rebuilding the jar and checking the main class location worked.
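For reference, declaring a main class and merge strategies with sbt-assembly usually looks roughly like the fragment below; this is a hedged sketch, and the class name and strategies shown are illustrative rather than the asker's.

```scala
// build.sbt (fragment) -- requires the sbt-assembly plugin in project/plugins.sbt
mainClass in assembly := Some("com.example.Main") // illustrative main class

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```

If the jar still fails to start, checking that the declared main class really exists at that package path inside the assembled jar (for example with jar tf) is usually the quickest first step.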
QUESTION
I have a multi-project build with a main module called root, plus consumer and producer modules (each with its own dependencies) that depend on the core module. The core module holds configuration-related classes.
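In build.sbt terms, that layout typically looks roughly like the following; this is a hedged sketch in which only the module names are taken from the question.

```scala
// build.sbt (fragment) -- illustrative multi-module layout
lazy val core = (project in file("core"))

lazy val consumer = (project in file("consumer")).dependsOn(core)
lazy val producer = (project in file("producer")).dependsOn(core)

lazy val root = (project in file("."))
  .aggregate(core, consumer, producer)
```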
I would like to build two separate jars for consumer and producer, each with its own main class, using sbt-assembly. However, when I try to build them individually like this: sbt consumer/assembly
or all together by running sbt assembly,
I get the following error and sbt cannot compile the whole project:
ANSWER
Answered 2018-Jun-08 at 11:53
The problem is in this line:
QUESTION
I am fetching Neo4j data into a Spark DataFrame using the neo4j-spark-connector. The fetch succeeds, since I can show the DataFrame. I then register the DataFrame with the createOrReplaceTempView() method. But when I try running Spark SQL on it, it throws an exception saying
...ANSWER
Answered 2018-Jun-01 at 15:02
Based on the symptoms we can infer that the two pieces of code use a different SparkSession / SQLContext. Assuming there is nothing unusual going on in the Neo4j connector, you should be able to fix this by changing:
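In general, keeping the connector read, the temp view and the SQL query on one and the same SparkSession looks like the sketch below. This is a minimal sketch with placeholder connection options, written against the connector's DataSource API described at the top of this page rather than the asker's exact code.

```scala
import org.apache.spark.sql.SparkSession

object TempViewSketch {
  def main(args: Array[String]): Unit = {
    // One SparkSession used everywhere; temp views are scoped to this session.
    val spark = SparkSession.builder()
      .appName("neo4j-temp-view") // illustrative app name
      .getOrCreate()

    val people = spark.read
      .format("org.neo4j.spark.DataSource")
      .option("url", "bolt://localhost:7687")
      .option("authentication.basic.username", "neo4j")
      .option("authentication.basic.password", "secret")
      .option("labels", "Person")
      .load()

    people.createOrReplaceTempView("person")

    // Because the view was registered on this same session, spark.sql can see it.
    spark.sql("SELECT COUNT(*) FROM person").show()
  }
}
```

If the DataFrame is instead registered on one session and the SQL is run on another (for example one created implicitly by a helper and one created manually), the temp view is not visible, which matches the symptom described above.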
QUESTION
I was running Scala code in spark-shell using this:
...ANSWER
Answered 2017-Dec-07 at 21:58
Try adding the following
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported