- Download `sbt-launch.jar` and put it into `$HOME/bin`.
- Create `$HOME/bin/sbt` and make it executable (`chmod 755`). The content is:
```sh
#!/bin/sh
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M -Dsbt.override.build.repos=true"
java $SBT_OPTS -jar "$(dirname "$0")/sbt-launch.jar" "$@"
```
- Create `$HOME/.sbt/repositories` with the following content:
```
[repositories]
local
my-ivy-proxy-releases: http://10.20.8.31:8081/nexus/content/groups/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
my-maven-proxy-releases: http://10.20.8.31:8081/nexus/content/groups/public/
```
- Download Spark, choosing the build that matches your HDFS version.
- Extract the tarball, say to `/path/to/spark`.
- Set `SPARK_HOME` to the extracted directory: `export SPARK_HOME=/path/to/spark`.
- Add `$SPARK_HOME/bin` to `$PATH`: `export PATH=$SPARK_HOME/bin:$PATH`.
- Clone the sample project and generate the Eclipse project files:

```sh
$ git clone git@github.com:jizhang/spark-sandbox
$ cd spark-sandbox
$ sbt eclipse
```

Then import the project into Eclipse (this requires the Scala IDE for Eclipse plugin).
- Run locally:

```sh
$ cd spark-sandbox
$ sbt "run-main Wordcount data/wordcount.txt"
```
- Submit to cluster. First build the jar, then submit the word count example (these commands use `--master local` for demonstration; point `--master` at your cluster to actually run there):

```sh
$ sbt package
$ spark-submit --class Wordcount --master local target/scala-2.10/spark-sandbox_2.10-0.1.0.jar data/wordcount.txt
```
Logistic regression:

```sh
$ spark-submit --class LogisticRegression --master local target/scala-2.10/spark-sandbox_2.10-0.1.0.jar data/lr_data.txt 10 10
```
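Judging from the command line, the two trailing arguments are likely an iteration count and an SGD step size. A minimal sketch of such a job with MLlib (the input format and argument meanings are assumptions; the repo's `LogisticRegression` class may differ):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object LogisticRegression {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogisticRegression"))
    // Assumed format: one record per line, a 0/1 label followed by the features.
    val points = sc.textFile(args(0)).map { line =>
      val parts = line.split("\\s+").map(_.toDouble)
      LabeledPoint(parts.head, Vectors.dense(parts.tail))
    }.cache()
    val iterations = args(1).toInt     // assumed meaning of the first "10"
    val stepSize = args(2).toDouble    // assumed meaning of the second "10"
    val model = LogisticRegressionWithSGD.train(points, iterations, stepSize)
    println("Weights: " + model.weights)
    sc.stop()
  }
}
```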
Log mining:

```sh
$ spark-submit --class LogMining --master local target/scala-2.10/spark-sandbox_2.10-0.1.0.jar data/logs.txt
```
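A log-mining job typically caches the log and runs several queries against it; a guess at what `LogMining` might look like (the `ERROR` keyword and the queries are assumptions):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogMining {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogMining"))
    // Cache the log so repeated queries don't re-read the file.
    val logs = sc.textFile(args(0)).cache()
    val errors = logs.filter(_.contains("ERROR"))
    println("Error count: " + errors.count())
    errors.take(10).foreach(println)
    sc.stop()
  }
}
```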
Streaming word count: start a netcat server in one terminal, then submit the job from another. Note `--master local[2]`: one thread runs the socket receiver, the other processes the received data.

```sh
# terminal 1: type lines of text here
$ nc -lk 9999

# terminal 2
$ spark-submit --class StreamingWordcount --master local[2] target/scala-2.10/spark-sandbox_2.10-0.1.0.jar
```
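A sketch of a streaming word count matching this setup: it reads lines typed into the `nc` session on port 9999 and prints per-batch counts (the one-second batch interval is an assumption):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordcount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordcount")
    val ssc = new StreamingContext(conf, Seconds(1))   // 1-second batches
    ssc.socketTextStream("localhost", 9999)            // the nc server
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print()                                         // dump each batch's counts
    ssc.start()
    ssc.awaitTermination()
  }
}
```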
K-means clustering:

```sh
$ spark-submit --class KMeans --master local target/scala-2.10/spark-sandbox_2.10-0.1.0.jar data/kmeans_data.txt 2 0.01
```
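The `2` and `0.01` arguments suggest a cluster count and a convergence threshold, as in Spark's classic hand-rolled `SparkKMeans` example. A sketch along those lines (the input format and the random choice of initial centers are assumptions):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KMeans {
  def squaredDistance(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def closest(p: Array[Double], centers: Array[Array[Double]]): Int =
    centers.indices.minBy(i => squaredDistance(p, centers(i)))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("KMeans"))
    // Assumed format: one point per line, coordinates separated by whitespace.
    val points = sc.textFile(args(0))
      .map(_.split("\\s+").map(_.toDouble))
      .cache()
    val k = args(1).toInt                // e.g. 2 clusters
    val convergeDist = args(2).toDouble  // e.g. stop once centers move < 0.01

    val centers = points.takeSample(false, k)  // k random initial centers
    var moved = Double.MaxValue
    while (moved > convergeDist) {
      // Assign each point to its nearest center, then average each cluster.
      val newCenters = points
        .map(p => (closest(p, centers), (p, 1)))
        .reduceByKey { case ((s1, n1), (s2, n2)) =>
          (s1.zip(s2).map { case (x, y) => x + y }, n1 + n2)
        }
        .mapValues { case (sum, count) => sum.map(_ / count) }
        .collectAsMap()
      moved = newCenters.map { case (i, c) => squaredDistance(centers(i), c) }.sum
      for ((i, c) <- newCenters) centers(i) = c
    }
    centers.foreach(c => println(c.mkString(" ")))
    sc.stop()
  }
}
```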
$ sbt "run-main recommendation.MainClass als"