
Posts

SPARK running explained - 3

SPARK running explained - 2 SPARK running explained - 1

1. YARN Cluster Manager – The basic YARN architecture is described below; it is similar to the Spark Standalone Cluster Manager.

Main components –
a. Resource manager – (like the Spark Master process)
b. Node manager – (similar to Spark’s worker processes)

Unlike on Spark’s standalone cluster, applications on YARN run in containers (JVM processes to which CPU and memory resources are granted). Each application has an “Application Master” running in its own container; it is responsible for requesting application resources from the Resource Manager. Node managers track the resources used by containers and report to the Resource Manager.

The following depicts a Spark application (cluster-deploy mode) running on a YARN cluster with 2 nodes -
1. Client submit applic...

SPARK running explained - 2

SPARK running explained - 1

Enable speculative execution by setting –
1. “spark.speculation” to true (the default is false).
2. “spark.speculation.interval” – how often Spark checks whether any task needs to be speculatively re-launched.
3. “spark.speculation.quantile” – the fraction of tasks that must complete before speculation is enabled for a stage.
4. “spark.speculation.multiplier” – how many times slower than the median task duration a task must be before it is considered for speculation.

Data locality means Spark tries to run tasks as close to the data location as possible. Five levels of data locality –
1. PROCESS_LOCAL - Execute a task on the executor that cached the partition
2. NODE_LOCAL - Execute a task on the node where the partition is available
3. RACK...
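To illustrate how the “spark.speculation.quantile” and “spark.speculation.multiplier” settings from this excerpt interact, here is a minimal plain-Java sketch. It is a simplified model, not Spark's scheduler code: the class name SpeculationSketch, the method isSpeculatable, and the sample durations are all hypothetical, and the quantile and multiplier values are chosen only for illustration.

```java
import java.util.Arrays;

// Simplified, hypothetical model of the speculation check; NOT Spark's actual scheduler code.
public class SpeculationSketch {

    /** Median of the given task durations (milliseconds). */
    public static double median(long[] durations) {
        long[] sorted = durations.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /**
     * Returns true if a still-running task is a candidate for a speculative copy.
     *
     * @param finishedDurationsMs durations of the tasks that already finished in the stage
     * @param totalTasks          total number of tasks in the stage
     * @param runningTimeMs       how long the task under inspection has been running
     * @param quantile            stands in for spark.speculation.quantile
     * @param multiplier          stands in for spark.speculation.multiplier
     */
    public static boolean isSpeculatable(long[] finishedDurationsMs, int totalTasks,
                                         long runningTimeMs, double quantile, double multiplier) {
        if (finishedDurationsMs.length == 0) {
            return false; // nothing to compare against yet
        }
        // Quantile: enough of the stage must have finished before speculation starts.
        boolean enoughFinished = finishedDurationsMs.length >= quantile * totalTasks;
        // Multiplier: the task must be this many times slower than the median finished task.
        double threshold = multiplier * median(finishedDurationsMs);
        return enoughFinished && runningTimeMs > threshold;
    }

    public static void main(String[] args) {
        long[] finished = {1000, 1100, 900, 1050, 950, 1000, 1020, 980}; // 8 of 10 tasks done
        System.out.println(isSpeculatable(finished, 10, 4000, 0.75, 1.5)); // slow straggler
        System.out.println(isSpeculatable(finished, 10, 1200, 0.75, 1.5)); // only slightly slow
    }
}
```

With these sample numbers, the median finished duration is 1000 ms, so a task counts as a straggler only once it has run longer than 1500 ms and at least 75% of the stage has finished.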

SPARK running explained - 1

Spark runtime components

The main Spark components running in a cluster are the client, the driver, and the executors.

The client process starts the driver program. It can be spark-submit, spark-shell, spark-sql, or a custom application. Client process:-
1. Prepares the classpath and all configuration options for the Spark application
2. Passes application arguments to the application running in the driver.

There is always one driver per Spark application. The driver orchestrates and monitors the execution of an application. Subcomponents of driver-
1. Spark context
2. Scheduler

These subcomponents are responsible for-
1. Requesting memory and CPU resources from cluster managers
2. Breaking application logic into stages and tasks
3. Se...

Scala - Scalable Language

Scala, short for Scalable Language-
• Created by Martin Odersky
• Is an object-oriented and functional programming language
• Runs on the JVM

Installations
• Install Java
• Set your Java environment, e.g. JAVA_HOME, PATH, etc.
• Install Scala
• After installation, verify the versions by typing at the command prompt or shell
>scala -version
>java -version

If you have a good understanding of Java, then it will be very easy for you to learn Scala. But, we would again des...

Design Patterns (aka DP), Creational - Abstract Factory Pattern

A DP is a well-described solution to a common software problem. Its benefits:
• Already defined to solve a problem.
• Increases code reusability and robustness.
• Faster development, and new developers on the team can understand it easily.

DPs are divided into 3 categories:
• Creational - Used to construct objects such that they can be decoupled from their implementing system.
• Structural - Used to form large object structures between many disparate objects.
• Behavioral - Used to manage algorithms, relationships, and responsibilities between objects.

Creational : Abstract Factory :- In short, we call it a Factory of Factories. To understand this, please read factory pattern. When you go through the factory pattern, you see that a factory can produce only Computers, or aptly varied types of Computers. But only Computers. Now, say you have a 3rd variety of Computer, just like PC or Server: say, a Laptop class. You can easily inherit from Computer and create your Laptops. So...
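Because the excerpt is cut off before it shows any code, the following is only a minimal Java sketch of how the Computer, PC, Server, and Laptop example could look with one factory per variety. Every name in it (AbstractFactorySketch, ComputerFactory, the describe and build methods) is an assumption made for illustration and may differ from what the full post uses.

```java
// Hypothetical sketch of the Abstract Factory idea from the excerpt; names are illustrative.
public class AbstractFactorySketch {

    // Product hierarchy: every variety is still a Computer.
    interface Computer {
        String describe();
    }

    static class PC implements Computer {
        public String describe() { return "PC"; }
    }

    static class Server implements Computer {
        public String describe() { return "Server"; }
    }

    static class Laptop implements Computer {
        public String describe() { return "Laptop"; }
    }

    // Abstract factory: client code asks a factory for a Computer
    // without naming the concrete class.
    interface ComputerFactory {
        Computer create();
    }

    static class PCFactory implements ComputerFactory {
        public Computer create() { return new PC(); }
    }

    static class ServerFactory implements ComputerFactory {
        public Computer create() { return new Server(); }
    }

    // Adding the third variety needs only a new product and a new factory;
    // the build method below stays unchanged.
    static class LaptopFactory implements ComputerFactory {
        public Computer create() { return new Laptop(); }
    }

    // The "factory of factories" entry point: it depends only on the two interfaces.
    static Computer build(ComputerFactory factory) {
        return factory.create();
    }

    public static void main(String[] args) {
        System.out.println(build(new PCFactory()).describe());
        System.out.println(build(new ServerFactory()).describe());
        System.out.println(build(new LaptopFactory()).describe());
    }
}
```

The design choice worth noting is that build never uses an if-else or switch on a type name, so a new variety such as Laptop can be added without touching existing code.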