
Posts

Setting up Spark using Hadoop YARN as cluster manager

YARN as cluster manager

When using YARN we do not need to start Spark workers and a master. For the Spark standalone cluster manager, refer to http://querydb.blogspot.in/2016/01/installing-and-setting-up-spark-152.html

YARN is a cluster manager introduced in Hadoop 2.0 that allows diverse data processing frameworks to run on a shared resource pool, and is typically installed on the same nodes as the Hadoop filesystem (HDFS). Running Spark on YARN in these environments is useful because it lets Spark access HDFS data quickly, on the same nodes where the data is stored.

Using YARN in Spark is straightforward: you set an environment variable that points to your Hadoop configuration directory, then submit jobs to a special master URL with spark-submit.

·         Set the Hadoop configuration directory as the environment variable HADOOP_CONF_DIR
·         Then, submit your application as follows: spa...
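The submit command above is cut off; for illustration, a minimal submission might look like the following sketch (the configuration path, class, and jar names are placeholders; Spark 1.5.x uses the yarn-cluster master URL, while newer releases use --master yarn with --deploy-mode cluster):

    # Point Spark at the Hadoop configuration directory (example path)
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    # Submit the application to YARN in cluster mode
    spark-submit --master yarn-cluster --class com.example.MyApp myapp.jar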

Installing and setting up Spark with Standalone Cluster manager

Note:- This set-up uses Spark version 1.5.2. By default, jobs run in Spark's local mode, where the Spark driver runs along with an executor in the same Java process.

Spark can run over a variety of cluster managers to access the machines in a cluster. If you only want to run Spark by itself on a set of machines, the built-in Standalone mode is the easiest way to deploy it. Spark can also run over two popular cluster managers: Hadoop YARN and Apache Mesos.

Standalone cluster manager

Copy a compiled version of Spark to the same location on all your machines, for example /usr/local/spark.

Set up password-less SSH access from your master machine to the others. This requires having the same user account on all the machines, creating a private SSH key for it on the master via ssh-keygen, and adding this key to the .ssh/authorized_keys file of all the workers. If you have not set this up before, you can follow these commands: # On master: run ssh-keygen acceptin...
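The commands above are cut off; they follow the usual key-distribution pattern. A sketch, assuming the same user account exists on every machine and hostnames like worker1 are placeholders:

    # On master: generate a key pair, accepting the default location and an empty passphrase
    ssh-keygen -t rsa
    # Append the master's public key to each worker's ~/.ssh/authorized_keys
    ssh-copy-id user@worker1
    # Once password-less SSH works, start the master and the workers listed in conf/slaves
    /usr/local/spark/sbin/start-all.sh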

HTTP Response Codes

CODE   DESCRIPTION
200    OK. The request has successfully executed. The response depends upon the verb invoked.
201    Created. The request has successfully executed and a new resource has been created in the process. The response body is either empty or contains a representation containing URIs for the resource created. The Location header in the response should point to the URI as well.
202    Accepted. The request was valid and has been accepted but has not yet been processed. The response should include a URI to poll for status updates on the request. This allows asynchronous REST requests.
204    No Content. The request was successfully processed but the server did not have any response. The client should not update its display.
301    Moved Permanently. The requested resource is no longer located at the specified URL. The new Location should be returned in the re...
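A quick way to see these codes in practice is to inspect the status and Location header of an HTTP call; a minimal sketch in Java (the URL is a placeholder):

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class StatusCheck {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://example.com/api/resource");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            int code = conn.getResponseCode();
            if (code == 201 || code == 301) {
                // Both 201 and 301 carry a Location header pointing to a URI
                System.out.println(code + " -> " + conn.getHeaderField("Location"));
            } else {
                System.out.println("Status: " + code);
            }
        }
    }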

REST Web Services

REST stands for REpresentational State Transfer. REST is an architectural style; HTTP is a protocol that satisfies the set of REST architectural constraints.

Fundamentals:
·         Everything in REST is considered a resource.
·         Every resource is identified by a URI.
·         Uses uniform interfaces. Resources are handled using POST, GET, PUT, and DELETE operations, which correspond to the create, read, update, and delete (CRUD) operations (see the sketch after this section).
·         Be stateless. Every request is an independent request. Each request from client to server must contain all the information necessary to understand the request.
·         Communications are done via representations, e.g. XML, JSON.

Implementations: The Jersey framework is the reference implementation of the JAX-RS API. The Jer...
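To make the uniform-interface idea concrete, a minimal JAX-RS resource might look like the following sketch (the path and payload are illustrative, not from the original post):

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // Each URI path identifies a resource; HTTP verbs map to CRUD operations
    @Path("/users")
    public class UserResource {

        // GET /users returns a JSON representation of the resource
        @GET
        @Produces(MediaType.APPLICATION_JSON)
        public String getUsers() {
            return "[{\"id\": 1, \"name\": \"alice\"}]";
        }
    }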

Apache Kafka: Low-Level API Consumer

If you want greater control over partition consumption than the high-level API consumer offers, you can implement your consumer with the low-level API. This requires work that the consumer group handles for you, such as:

Keeping track of the offset where the consumer left off consumption.
Identifying the lead broker and adjusting to broker leader changes.

Steps for implementing (the first step is sketched after the listing below):

Find an active broker and find out which broker is the leader for your topic and partition
Determine who the replica brokers are for your topic and partition
Build the request defining what data you are interested in
Fetch the data
Identify and recover from leader changes

package learning.kafka;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import kafka.api.FetchRequestBuilder;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.cluster.Broker;
import kafka.commo...
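The listing above is cut off. For the leader-lookup step, a self-contained sketch using the 0.8-era SimpleConsumer API might look like this (seed broker hostnames, port, and client id are placeholders):

    import java.util.Collections;
    import java.util.List;

    import kafka.javaapi.PartitionMetadata;
    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class LeaderLookup {
        // Ask each seed broker for topic metadata until the partition's leader is found
        public static PartitionMetadata findLeader(List<String> seedBrokers, int port,
                                                   String topic, int partition) {
            for (String broker : seedBrokers) {
                SimpleConsumer consumer = null;
                try {
                    consumer = new SimpleConsumer(broker, port, 100000, 64 * 1024, "leaderLookup");
                    TopicMetadataRequest request =
                            new TopicMetadataRequest(Collections.singletonList(topic));
                    TopicMetadataResponse response = consumer.send(request);
                    for (TopicMetadata topicMeta : response.topicsMetadata()) {
                        for (PartitionMetadata partMeta : topicMeta.partitionsMetadata()) {
                            if (partMeta.partitionId() == partition) {
                                return partMeta; // partMeta.leader() is the lead Broker
                            }
                        }
                    }
                } finally {
                    if (consumer != null) consumer.close();
                }
            }
            return null; // no seed broker knew about this topic/partition
        }
    }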