
Posts

Error Integrating Impala with Kudu

Create Table failed with:

W0106 18:18:54.640544 368440 negotiation.cc:307] Unauthorized connection attempt: Server connection negotiation failed: server connection from 172.136.38.157:35678: unauthenticated connections from publicly routable IPs are prohibited. See --trusted_subnets flag for more information.: 172.136.38.157:35678

After setting up Kudu, we can enable it to work with Impala. We can check the cluster status with:

kudu cluster ksck <master>

The cluster doesn't have any matching tables
==================
Errors:
==================
error fetching info from tablet servers: Not found: No tablet servers found
FAILED

Also, the Tablet Server UI does not open.

Solution: this error might be because the Kudu service has to know about trusted networks, which we can set under Kudu Service Advanced Configuration Snippet (Safety Valve) for gflagfile, Kudu (Service-Wide):

--trusted_subnets=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,169.254.0
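For reference, a minimal sketch of what the gflagfile Safety Valve entry might look like, with one flag per line. The CIDR list here only repeats the private ranges mentioned above; the exact list is environment-specific and should cover whichever subnets your Impala and Kudu clients actually connect from.

```
# Kudu Service Advanced Configuration Snippet (Safety Valve) for gflagfile
--trusted_subnets=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
```

After saving the configuration and restarting the Kudu service, rerunning kudu cluster ksck <master> should report the tablet servers again.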

Machine Learning Part 6

In this blog we will see Spark GraphFrames and the use of GraphFrames in Machine Learning to execute algorithms like PageRank.

A graph is made up of vertices and the edges that connect them.
1. Vertices are objects.
2. Edges are relationships.

A regular graph is a graph where each vertex has the same number of edges. A directed graph is a graph where the edges have a direction associated with them. Examples:
1. Facebook friends: A is a friend of B, and B is a friend of A.
2. Instagram followers: A, B, and C are followers of D, but D may not be a follower of A, B, or C.
3. Websites: every page is a node and every linking page is an edge. The PageRank algorithm measures the importance of a page by the number of links to that page and the number of links to each linking page.
4. Recommendation engines: recommendation algorithms can use graphs where the nodes are the users and products, with their respective attributes, and the edges are
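As a rough illustration of the GraphFrame API, here is a minimal Scala sketch that builds a tiny, made-up link graph and runs PageRank on it. It assumes the graphframes package is available on the Spark classpath; the vertex and edge data are invented for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

object PageRankSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("GraphFramesPageRank").getOrCreate()
    import spark.implicits._

    // Vertices are objects (here, pages of a website)
    val vertices = Seq(("a", "Home"), ("b", "About"), ("c", "Blog")).toDF("id", "name")
    // Edges are relationships (here, links between pages), so this is a directed graph
    val edges = Seq(("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")).toDF("src", "dst")

    val graph = GraphFrame(vertices, edges)

    // PageRank scores a page by the number and importance of the pages linking to it
    val ranked = graph.pageRank.resetProbability(0.15).maxIter(10).run()
    ranked.vertices.select("id", "pagerank").show()

    spark.stop()
  }
}
```

The resulting vertices DataFrame carries a pagerank column, so the scores can be joined back to any other attributes of the nodes.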

Machine Learning Part 5

In this blog, we will describe another example that utilizes KMeans and Spark to determine locations. Before that, we suggest you go through the previous blogs:
https://querydb.blogspot.com/2019/12/machine-learning-part-4.html
https://querydb.blogspot.com/2019/12/machine-learning-part-3.html
https://querydb.blogspot.com/2019/12/machine-learning-part-2.html
https://querydb.blogspot.com/2019/12/machine-learning-part-1.html

In this blog, we will analyze and try to make predictions on fire detection GIS data: https://fsapps.nwcg.gov/gisdata.php

We will have historical data of wildfires, and we will try to analyze it. That is eventually helpful to reduce response time in case of fire, reduce cost, reduce damage due to fire, etc. Fire can grow exponentially based on various factors like wildlife, wind velocity, terrain surface, etc. Incident tackle time is limited by various factors, one of which is moving firefighting equipment. If we plan in advance where to place
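A minimal Scala sketch of the kind of clustering described here, assuming a hypothetical CSV extract of the GIS data with latitude and longitude columns. The file path, column names, and number of clusters are placeholders, not the actual dataset schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.clustering.KMeans

object FireLocationClustering {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("FireLocationKMeans").getOrCreate()

    // Hypothetical extract of historical fire detections
    val fires = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/fire_detections.csv")

    // Put the coordinates into a single feature vector
    val assembler = new VectorAssembler()
      .setInputCols(Array("latitude", "longitude"))
      .setOutputCol("features")
    val features = assembler.transform(fires)

    // Cluster the historical fire locations; dense clusters hint at where
    // staging firefighting equipment in advance would pay off
    val model = new KMeans().setK(10).setSeed(1L).setFeaturesCol("features").fit(features)

    // Cluster centers are candidate staging locations; counts show how busy each area is
    model.clusterCenters.foreach(println)
    model.transform(features).groupBy("prediction").count().orderBy("prediction").show()

    spark.stop()
  }
}
```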

Machine Learning Part 4

In the previous blog, we learned about creating a K-Means clustering model. In this blog we will use the created model in a streaming use case for analysis in real time. For the previous blog, refer to https://querydb.blogspot.com/2019/12/machine-learning-part-3.html

1) Load the model created in the previous blog.
2) Create a dataframe with cluster id and centroid location (centroid longitude, centroid latitude).
3) Create a Kafka streaming dataframe.
4) Parse the message into a typed object.
5) Use VectorAssembler to put all the features into a vector.
6) Transform the dataframe using the model to get predictions.
7) Join with the dataframe created in #2.
8) Print the results to the console or save them to HBase.

Note that this example also describes Spark Structured Streaming, wherein we created a streaming Kafka source and a custom foreach sink to write data to HBase. A condensed sketch of these steps follows below. Refer to the code at https://github.com/dinesh028/SparkDS/tree/master/src/indore/dinesh/sachdev/uber/streaming
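The following Scala sketch condenses the eight steps above. The Kafka broker, topic, model path, and message format are placeholders, and it assumes the spark-sql-kafka package is on the classpath; the actual implementation, including the custom HBase foreach sink, is in the repository linked above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.clustering.KMeansModel

object StreamingClusterPredictions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StreamingKMeansPredictions").getOrCreate()
    import spark.implicits._

    // 1) Load the model trained in the previous blog (placeholder path)
    val model = KMeansModel.load("/models/uber-kmeans")

    // 2) DataFrame of cluster id and centroid location (assumes features were [lat, lon])
    val centroids = model.clusterCenters.zipWithIndex
      .map { case (c, id) => (id, c(0), c(1)) }
      .toSeq.toDF("cid", "centroidLat", "centroidLon")

    // 3) Kafka streaming dataframe (placeholder broker and topic)
    val raw = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "uber-locations")
      .load()

    // 4) Parse the message value, assumed here to be a simple "lat,lon" string
    val parsed = raw.selectExpr("CAST(value AS STRING) AS line")
      .select(split($"line", ",").as("f"))
      .select($"f".getItem(0).cast("double").as("lat"),
              $"f".getItem(1).cast("double").as("lon"))

    // 5) Assemble features, 6) predict the cluster, 7) join with the centroid dataframe
    val assembler = new VectorAssembler().setInputCols(Array("lat", "lon")).setOutputCol("features")
    val predictions = model.transform(assembler.transform(parsed))
      .join(centroids, $"prediction" === $"cid")

    // 8) Print to the console (the real example replaces this with a custom HBase foreach sink)
    predictions.writeStream.format("console").outputMode("append").start().awaitTermination()
  }
}
```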

Machine Learning Part 3

Refer to the code at https://github.com/dinesh028/SparkDS/blob/master/src/indore/dinesh/sachdev/uber/UberClusteringDriver.scala

Nowadays, Machine Learning is helping to improve cities. The analysis of location and behavior patterns within cities allows optimization of traffic, better planning decisions, and smarter advertising. For example, GPS data can be analyzed to optimize traffic flow, and many companies are using it for field technician optimization. It can also be used for recommendations, anomaly detection, and fraud detection. Uber is using the same approach to optimize customer experience: https://www.datanami.com/2015/10/05/how-uber-uses-spark-and-hadoop-to-optimize-customer-experience/

In this blog, we will see clustering and the k-means algorithm, and its usage to analyze public Uber data. Clustering is a family of unsupervised machine learning algorithms that discover groupings that occur in collections of data by analyzing similarities between input examples. Some examples of clustering uses include
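A minimal Scala sketch of training and saving such a k-means model on Uber pickup data. The file path, column names, number of clusters, and save location are placeholder choices for illustration; the full driver is in UberClusteringDriver.scala linked above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.clustering.KMeans

object UberClusteringSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("UberKMeans").getOrCreate()

    // Hypothetical CSV of Uber pickups with date/time, latitude, longitude, and base columns
    val uber = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/uber.csv")

    // Assemble the pickup coordinates into a feature vector
    val assembler = new VectorAssembler()
      .setInputCols(Array("lat", "lon"))
      .setOutputCol("features")
    val trainingData = assembler.transform(uber)

    // Fit k-means; k = 8 is an arbitrary starting point and should be tuned
    val model = new KMeans().setK(8).setSeed(1L).setFeaturesCol("features").fit(trainingData)
    model.clusterCenters.foreach(println)

    // Persist the model so a later (streaming) job can load and reuse it
    model.write.overwrite().save("/models/uber-kmeans")

    spark.stop()
  }
}
```

Saving the fitted model is what allows the streaming job described in Part 4 to load it and assign incoming locations to clusters in real time.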