Building Spark

Spark's DataFrame window functions require the SQLContext to actually be a HiveContext, and the default "Hadoop provided" distribution does not support HiveContext, so a custom build is needed (a minimal example of a query that needs HiveContext follows the build command below). The steps to build a custom Spark version with HiveContext support:

- Download the source code from the Spark downloads page.
- Choose the Spark release version and select "Source Code (can build several Hadoop versions)" as the package type.
- Build the distribution. Configure the following settings: YARN, Hadoop provided as an external distribution, Hadoop version 2.7.1, Hive support, and Hive over the Thrift server:

$ cd spark-1.6.1
$ ./build/mvn -Pyarn -Phadoop-provided -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -DskipTests clean install
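For reference, this is the kind of DataFrame window query that needs a HiveContext under Spark 1.6. It is a minimal sketch only: the object name, app name, and sample data are illustrative, and it assumes the application runs against the custom build.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

object WindowCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("window-check"))
    // A plain SQLContext rejects window functions in Spark 1.6; HiveContext is required.
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
    val byKey = Window.partitionBy("key").orderBy($"value".desc)
    df.withColumn("rank", row_number().over(byKey)).show()
    sc.stop()
  }
}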
- Prepare the configuration:
$ cd conf
$ cp spark-env.sh.template spark-env.sh
- Change the Spark environment settings in spark-env.sh:
# Explicit path to Hadoop's classpath
SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
SPARK_MASTER_IP=localhost
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=recovery"
SPARK_MASTER_WEBUI_PORT=8081
- Add log4j properties:
$ cd conf
$ cp log4j.properties.template log4j.properties
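Once the build and configuration are in place, a quick sanity check can be run from spark-shell. This is a sketch assuming a Spark 1.6 spark-shell started from the custom build (when Spark is built with -Phive, the shell's sqlContext is a HiveContext; the sample data is illustrative only):

// In spark-shell from the custom build, sqlContext should be a HiveContext:
sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]   // expected: true

// A window query only resolves when that is the case:
import sqlContext.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
Seq(("a", 1), ("a", 2)).toDF("k", "v")
  .withColumn("rn", row_number().over(Window.partitionBy("k").orderBy("v")))
  .show()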