In this blog we will cover:
- Spark GraphFrames
- Using GraphFrames in machine learning to run algorithms such as PageRank
A graph is made up of vertices and edges that connect them.
- 1. Vertices are the objects.
- 2. Edges are the relationships between those objects.
A regular graph is a graph where each vertex has the same number of edges.
A directed graph is a graph where the edges have a direction associated with them.
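To make these definitions concrete, here is a minimal plain-Python sketch (not GraphFrames code). The vertex names, the friendship edges, and the "follows" edges are all hypothetical toy data, and the helper functions are made up for illustration. It checks whether a small undirected graph is regular and counts incoming edges in a small directed one.

```python
from collections import Counter

# Hypothetical toy data: vertices are objects, edges are relationships.
vertices = ["A", "B", "C", "D"]

# Undirected friendships: each pair is stored once and counts for both ends.
friend_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

# Directed "follows" edges: (src, dst) means src follows dst.
follow_edges = [("A", "D"), ("B", "D"), ("C", "D")]


def undirected_degrees(vertices, edges):
    """Count how many edges touch each vertex, ignoring direction."""
    degree = Counter({v: 0 for v in vertices})
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def is_regular(vertices, edges):
    """A graph is regular when every vertex has the same degree."""
    return len(set(undirected_degrees(vertices, edges).values())) <= 1


def in_degrees(vertices, edges):
    """For a directed graph, count only the edges pointing at each vertex."""
    degree = Counter({v: 0 for v in vertices})
    for _, dst in edges:
        degree[dst] += 1
    return dict(degree)


friend_degrees = undirected_degrees(vertices, friend_edges)
friends_regular = is_regular(vertices, friend_edges)
follower_counts = in_degrees(vertices, follow_edges)
```

In this toy data every vertex in the friendship ring touches exactly two edges, so that graph is regular. In the follower graph, direction matters: only D has incoming edges.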
Examples:
- 1. Facebook friends: A is a friend of B, and B is a friend of A (an undirected relationship).
- 2. Instagram followers: A, B, and C are followers of D, but D may not be a follower of A, B, or C (a directed relationship).
- 3. Websites: Every page is a node and every link between pages is an edge. The PageRank algorithm measures the importance of a page by the number of links pointing to it and, in turn, the number of links pointing to each of those linking pages.
- 4. Recommendation engines: Recommendation algorithms can use graphs where the nodes are the users and products (with their respective attributes) and the edges are the ratings or purchases of products by users. Graph algorithms can then calculate weights for how similarly users rated or purchased similar products.
- 5. Fraud: Graphs are useful for fraud detection algorithms in banking, healthcare, and network security. Example: the connections between a doctor, a patient, and a pharmacy.
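The PageRank idea from the websites example can be sketched in a few lines of plain Python. This is a simplified illustration, not the GraphFrames implementation. The page names, the `links` dictionary, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for the example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank: repeatedly pass each page's rank along its links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the rank spread evenly over all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical site: "orphan" links to "home", but nothing links to "orphan".
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = page_rank(links)
most_important = max(ranks, key=ranks.get)
```

Here "home" ends up with the highest rank because the most pages link to it, while "orphan" keeps only the baseline share because no page links to it.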
You can find an introductory example of creating GraphFrames here: https://github.com/dinesh028/SparkDS/blob/master/src/indore/dinesh/sachdev/flight/graph/IntroDriver.scala
We will explore the same flight dataset used in http://querydb.blogspot.com/2019/12/machine-learning-part-2.html, this time with GraphFrames.
Reference code can be found here: https://github.com/dinesh028/SparkDS/blob/master/src/indore/dinesh/sachdev/flight/graph/Driver.scala
Spark's GraphX API provides an implementation of Google's Pregel, a graph-processing model described in a paper published in 2010: http://www.dcs.bbk.ac.uk/~dell/teaching/cc/paper/sigmod10/p135-malewicz.pdf
Pregel is an iterative processing model. In GraphFrames, aggregateMessages sends messages between vertices and aggregates the message values from the neighboring edges and vertices of each vertex.
The reference code shows how to use aggregateMessages.
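The send-then-aggregate pattern can be illustrated with a minimal plain-Python sketch of a single round of message passing. It does not use the GraphFrames API. The `aggregate_messages` helper, the airport codes, and the delay values are hypothetical and chosen only to echo the flight dataset.

```python
from collections import defaultdict

# Hypothetical flight edges: (origin airport, destination airport, delay in minutes).
flights = [
    ("ATL", "LGA", 10.0),
    ("ORD", "LGA", 30.0),
    ("LGA", "ATL", 5.0),
    ("ORD", "ATL", 15.0),
]


def aggregate_messages(edges, send_to_dst, aggregate):
    """Run one round of message passing over the edges.

    For every edge, send_to_dst builds a message for the destination vertex;
    aggregate then reduces the list of messages collected at each vertex.
    """
    inbox = defaultdict(list)
    for src, dst, attr in edges:
        inbox[dst].append(send_to_dst(src, dst, attr))
    return {vertex: aggregate(msgs) for vertex, msgs in inbox.items()}


# Message = the delay carried by the edge; aggregation = mean of received delays.
avg_arrival_delay = aggregate_messages(
    flights,
    send_to_dst=lambda src, dst, delay: delay,
    aggregate=lambda msgs: sum(msgs) / len(msgs),
)
```

Each airport that receives at least one flight ends up with the average delay of its incoming flights. An airport with no incoming edges, such as ORD in this toy data, receives no messages and so does not appear in the result.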