Spark Standalone: Differences between client and cluster deploy modes


Sources:

http://stackoverflow.com/questions/37027732/spark-standalone-differences-between-client-and-cluster-deploy-modes

http://spark.apache.org/docs/latest/submitting-applications.html



Client:


Driver runs on a dedicated server (Master node) inside a dedicated process. This means it has all of that machine's resources at its disposal to execute work.

Driver opens up a dedicated Netty HTTP server and distributes the specified JAR files to all Worker nodes (big advantage).

Because the Master node has dedicated resources of its own, you don't need to "spend" worker resources for the Driver program.

If the driver process dies, you need an external monitoring system to restart it.
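
For comparison, a minimal client-mode submission might look like the sketch below (the master URL and jar path are placeholders, mirroring the cluster-mode example further down; client is also the default deploy mode when the flag is omitted). Note that --supervise is left out here, since automatic driver restart only applies to cluster mode:

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode client \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000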


Cluster:


Driver runs on one of the cluster's Worker nodes. The worker is chosen by the Master leader.

Driver runs as a dedicated, standalone process inside the Worker.

The driver program takes up at least one core and a dedicated amount of memory from one of the workers (this can be configured).

The driver program can be supervised by the Master via the --supervise flag and is restarted automatically if it dies (see the example below).



# Run on a Spark standalone cluster in cluster deploy mode with supervise
# (use --master yarn for YARN, or --deploy-mode client for client mode)
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
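
If a supervised driver ends up in a crash loop, it can be killed explicitly. A sketch based on the Spark standalone documentation, where the driver ID can be looked up in the Master web UI (port 8080 by default):

./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>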


