Machine Learning Part 6

In this post we will look at:

Spark GraphFrames
Using GraphFrames in machine learning to run graph algorithms such as PageRank

A graph is made up of vertices and edges that connect them.

1. Vertices are Objects
2. Edges are relationships

A regular graph is a graph where each vertex has the same number of edges.

A directed graph is a graph where the edges have a direction associated with them.
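To make these definitions concrete, here is a minimal sketch in plain Python (not the GraphFrames API). The vertex names and edge lists are hypothetical and chosen only for illustration. It stores a graph as a list of edges, checks whether an undirected graph is regular, and computes in-degrees and out-degrees for a directed one.

```python
# Hypothetical toy graph: vertices are objects, edges are relationships.
vertices = ["A", "B", "C", "D"]

# Undirected edges forming a cycle A-B-C-D-A, so every vertex touches two edges.
undirected_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

# Directed edges: (source, destination). Here A, B and C all point to D.
directed_edges = [("A", "D"), ("B", "D"), ("C", "D")]


def degrees(vertices, edges):
    """Count how many edges touch each vertex of an undirected graph."""
    counts = {v: 0 for v in vertices}
    for left, right in edges:
        counts[left] += 1
        counts[right] += 1
    return counts


def is_regular(vertices, edges):
    """A graph is regular when every vertex has the same number of edges."""
    return len(set(degrees(vertices, edges).values())) <= 1


def in_out_degrees(vertices, edges):
    """For a directed graph, count incoming and outgoing edges separately."""
    in_deg = {v: 0 for v in vertices}
    out_deg = {v: 0 for v in vertices}
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg
```

In the directed case the edge ("A", "D") is not the same as ("D", "A"), which is why incoming and outgoing edges are counted separately.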

Examples:

1. Facebook friends: if A is a friend of B, then B is also a friend of A, so the relationship has no direction.
2. Instagram followers: A, B, and C are followers of D, but D may not be a follower of A, B, or C, so the relationship is directed.
3. Websites: every page is a node and every link between pages is an edge. The PageRank algorithm measures the importance of a page by the number of links pointing to it and, in turn, the number of links pointing to each of those linking pages.
4. Recommendation engines: recommendation algorithms can use graphs in which the nodes are users and products, along with their respective attributes, and the edges are the ratings or purchases of products by users. Graph algorithms can calculate weights for how similarly users rated or purchased similar products.
5. Fraud: graphs are useful for fraud detection algorithms in banking, healthcare, and network security. For example, the connections between a doctor, a patient, and a pharmacy.
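The PageRank idea from the websites example can be sketched in a few lines of plain Python. This is an illustrative power-iteration version, not the GraphFrames implementation. The `page_rank` function, the damping factor of 0.85, the iteration count, and the tiny `links` graph are all assumptions made for this sketch.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not tuned values.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of its links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link structure: A, B and C link to D, and D links back to A.
links = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
ranks = page_rank(links)
```

In this toy graph D collects links from three pages and ends up with the highest rank. A is linked only once, but by the highly ranked D, so it outranks B and C, which receive no links at all.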

You can find an introductory example of creating GraphFrames at https://github.com/dinesh028/SparkDS/blob/master/src/indore/dinesh/sachdev/flight/graph/IntroDriver.scala

We will explore the same flight dataset used in http://querydb.blogspot.com/2019/12/machine-learning-part-2.html, this time with GraphFrames.

Reference code can be found at https://github.com/dinesh028/SparkDS/blob/master/src/indore/dinesh/sachdev/flight/graph/Driver.scala

Spark's GraphX API includes an implementation of Google's Pregel, which was described in a paper published in 2010 - http://www.dcs.bbk.ac.uk/~dell/teaching/cc/paper/sigmod10/p135-malewicz.pdf

Pregel is an iterative processing model. GraphFrames' aggregateMessages sends messages between vertices and aggregates the message values from the neighboring edges and vertices of each vertex.

The reference code shows how to use aggregateMessages.
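To show the idea behind message aggregation without a Spark cluster, here is a minimal sketch in plain Python. It is not the GraphFrames API. The `aggregate_messages` helper, its parameters, and the flight records (airport codes and delays) are hypothetical. Each edge may send a message to its destination vertex, its source vertex, or both, and each vertex then combines the messages it received.

```python
from collections import defaultdict


def aggregate_messages(edges, send_to_dst=None, send_to_src=None, agg=sum):
    """Run one round of message passing over a list of edges.

    send_to_dst / send_to_src are functions that build a message from an edge;
    agg combines the list of messages that arrive at a vertex.
    """
    inbox = defaultdict(list)
    for edge in edges:
        if send_to_dst is not None:
            inbox[edge["dst"]].append(send_to_dst(edge))
        if send_to_src is not None:
            inbox[edge["src"]].append(send_to_src(edge))
    # Vertices that received no message do not appear in the result.
    return {vertex: agg(messages) for vertex, messages in inbox.items()}


# Hypothetical flights: airports are vertices, flights are edges.
flights = [
    {"src": "SFO", "dst": "ORD", "delay": 10},
    {"src": "ORD", "dst": "SFO", "delay": 5},
    {"src": "JFK", "dst": "ORD", "delay": 20},
]

# Total delay of flights arriving at each airport.
total_arrival_delay = aggregate_messages(flights, send_to_dst=lambda e: e["delay"])

# Number of flights touching each airport, counting arrivals and departures.
flight_count = aggregate_messages(
    flights, send_to_dst=lambda e: 1, send_to_src=lambda e: 1
)
```

An iterative algorithm in the Pregel style repeats a round like this, feeding the aggregated values back into the vertices before the next round.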
