Yarn as cluster manager

When using YARN, we do not need to start Spark workers and a master ourselves. For the Spark standalone cluster manager, refer to http://querydb.blogspot.in/2016/01/installing-and-setting-up-spark-152.html

YARN is a cluster manager introduced in Hadoop 2.0 that allows diverse data processing frameworks to run on a shared resource pool; it is typically installed on the same nodes as the Hadoop filesystem (HDFS). Running Spark on YARN in these environments is useful because it lets Spark access HDFS data quickly, on the same nodes where the data is stored.

Using YARN from Spark is straightforward: you set an environment variable that points to your Hadoop configuration directory, then submit jobs to a special master URL with spark-submit.

· Set the Hadoop configuration directory in the environment variable HADOOP_CONF_DIR
· Then, submit your application as follows: spa...
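The two steps above can be sketched as follows. This is a minimal example, not the exact command from the original post: the configuration-directory path and the example jar are assumptions for illustration, so substitute the paths from your own installation.

```shell
# Step 1: point Spark at the Hadoop/YARN configuration.
# /etc/hadoop/conf is a common location, but use your cluster's actual conf dir.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Step 2: submit the application with the special "yarn" master URL.
# SparkPi and the bundled examples jar are used here purely for illustration;
# the jar path varies by Spark version and install layout.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/lib/spark-examples.jar 10
```

With --master yarn, spark-submit reads HADOOP_CONF_DIR to locate the ResourceManager, so no Spark master host or port needs to be specified; --deploy-mode cluster runs the driver inside YARN, while --deploy-mode client keeps it on the submitting machine.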