• Apache Big_Data Notes: Hadoop, Spark, Flink, etc.

REFERENCES

Apache Spark Key Concepts

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

Spark Internals

https://github.com/JerryLead/SparkInternals/blob/master/EnglishVersion/1-Overview.md