Building Spark

Spark’s DataFrame window functions require the SQLContext to actually be a HiveContext. The default "Hadoop provided" binary distribution is built without Hive support, so HiveContext is not available there. The steps below build a custom Spark distribution with HiveContext support.
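
For context, here is a minimal sketch (not part of the original guide; the data and column names are made up) of the kind of query that needs HiveContext. It assumes a Spark 1.6 spark-shell where `sqlContext` is already defined and backed by Hive:

    // Ranking rows with a window function. In Spark 1.6 this only resolves
    // when sqlContext is a HiveContext; with a plain SQLContext the window
    // function fails to resolve.
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.row_number
    import sqlContext.implicits._

    val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
    val byKey = Window.partitionBy("key").orderBy($"value".desc)
    df.withColumn("rank", row_number().over(byKey)).show()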

  1. Download the source code from the Spark downloads page.

    1. Choose the Spark release version and select "Source Code (can build several Hadoop versions)" as the package type.

  2. Build the distribution with the following options enabled: YARN, Hadoop provided as an external distribution, Hadoop version 2.7.1, Hive support, and the Hive Thrift server.

    $ cd spark-1.6.1
    $ ./build/mvn -Pyarn -Phadoop-provided -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -DskipTests clean install
  3. Prepare configuration

    $ cd conf
    $ cp spark-env.sh.template spark-env.sh
  4. Set the Spark environment variables in spark-env.sh

    # Add Hadoop's classes to Spark's classpath (Hadoop is provided externally)
    SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
    SPARK_MASTER_IP=localhost
    SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=recovery"
    SPARK_MASTER_WEBUI_PORT=8081
  5. Add log4j properties (a typical adjustment is shown after this list)

    $ cd conf
    $ cp log4j.properties.template log4j.properties
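
The copied log4j.properties keeps the template's default INFO level, which makes the shell quite chatty. A common adjustment (a hedged example; the line below is the standard root logger setting from the template, lowered to WARN) is:

    # Quiet the shell output; per-class levels from the template can stay as they are
    log4j.rootCategory=WARN, console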
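
With the build and configuration in place, a quick sanity check (a sketch, assuming it is run from the custom build's root directory) is to confirm in spark-shell that the predefined sqlContext is a HiveContext:

    $ ./bin/spark-shell
    scala> sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]
    // should evaluate to true for a build with -Phive; a plain SQLContext
    // here means Hive support did not make it into the distribution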
