YARN as cluster manager
When using YARN, we do not need to start Spark workers and a master ourselves. For the Spark standalone cluster manager, refer to http://querydb.blogspot.in/2016/01/installing-and-setting-up-spark-152.html
YARN is a cluster
manager introduced in Hadoop 2.0 that allows diverse data processing frameworks
to run on a shared resource pool, and is typically installed on the same nodes
as the Hadoop filesystem (HDFS). Running Spark on YARN in these environments is
useful because it lets Spark access HDFS data quickly, on the same nodes where
the data is stored.
Using YARN in
Spark is straightforward: you set an environment variable that points to your
Hadoop configuration directory, then submit jobs to a special master URL with
spark-submit.
- · Set the Hadoop configuration directory as the environment variable HADOOP_CONF_DIR
- · Then, submit your application as follows:
spark-submit --master yarn yourapp
- · Or launch an interactive shell on YARN:
spark-shell --master yarn
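The steps above can be sketched as a short shell session. The HADOOP_CONF_DIR path, application JAR, and class name below are placeholders for illustration, not values from this post:

```shell
# Point Spark at the Hadoop/YARN configuration files
# (the path is an example; use your cluster's actual config directory)
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Submit an application to YARN; the JAR and class name are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  myapp.jar
```

With --deploy-mode cluster, the driver runs inside a YARN container on the cluster; omitting it (or passing --deploy-mode client) keeps the driver on the submitting machine, which is what spark-shell requires.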