Writing Spark Applications


Tutorial - Setting Up a Dev Machine and Fixing Code on Windows

The following steps are needed to set up a Windows dev machine:

  1. Install a simple text editor - Sublime Text - https://www.sublimetext.com/
  2. Install a Git client - https://desktop.github.com/
  3. Clone the code: https://github.com/cloudxlab/bigdata.git
  4. Install the JDK - http://www.oracle.com/technetwork/java/javase/downloads/index.html
  5. Install sbt - http://www.scala-sbt.org/index.html
  6. Open a command prompt and go to the project folder: cd c:\Users\MY_WINDOWS_USERNAME\Documents\GitHub\bigdata\spark\projects\apache-log-parsing_sbt
  7. Run sbt test from the command prompt
  8. Install Eclipse - https://www.eclipse.org/downloads/

  9. Create an Eclipse project using the sbteclipse plugin: https://github.com/typesafehub/sbteclipse

  10. Using Sublime Text, edit C:/Users/myuser/.sbt/1.0/plugins/plugins.sbt and add the following (or whatever is mentioned on the sbteclipse GitHub homepage):
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.2")
  11. Re-open the command prompt and go to the project folder: ```cd c:\Users\MY_WINDOWS_USERNAME\Documents\GitHub\bigdata\spark\projects\apache-log-parsing_sbt```
  12. Run ```sbt eclipse```
  13. Using Eclipse, import the project and modify the code:
  14. Use File -> Import -> General/Existing Project into Workspace. Select the directory containing your project as the root directory, select the project, and hit Finish.
  15. In Utils, add:
    def isClassA(ip: String): Boolean = {
        ip.split('.')(0).toInt < 127
    }
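As a quick, self-contained check of the method's boundary behavior (the sample IPs below are illustrative, not from the project):

```scala
// Standalone sketch: same logic as the Utils method above.
// A class-A address has a first octet strictly below 127.
object IsClassADemo extends App {
  def isClassA(ip: String): Boolean =
    ip.split('.')(0).toInt < 127

  println(isClassA("10.0.0.1"))    // true: first octet 10 < 127
  println(isClassA("126.255.0.1")) // true: 126 still passes the strict <
  println(isClassA("127.0.0.1"))   // false: 127 (loopback) is excluded
  println(isClassA("192.168.1.1")) // false
}
```

Note that the strict `<` deliberately excludes 127.x.x.x, which is reserved for loopback.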
  16. In log-parser-test.scala, add a unit test case (the IPs below are sample values for illustration):

    "CLASSA" should "Return true if class is A" in {
        val utils = new Utils
        assert(utils.isClassA("121.242.40.10"))
        assert(!utils.isClassA("127.0.0.1"))
        assert(!utils.isClassA("192.168.0.1"))
        assert(!utils.isClassA("223.10.10.10"))
    }
  17. In log-parser-test.scala, add a filter after extracting the IPs. Change:

    var cleanips = ipaccesslogs.map(extractIP(_))

    to:

    var cleanips = ipaccesslogs.map(extractIP(_)).filter(isClassA)
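The map-then-filter step above can be sketched with a plain Scala collection standing in for the RDD; `extractIP` here is a simplified stand-in for the project's helper, and the log lines are made up:

```scala
// Sketch of the pipeline: extract the IP from each log line,
// then keep only class-A addresses (first octet < 127).
object FilterSketch extends App {
  // Simplified stand-in: the IP is the first whitespace-separated field
  // of an Apache access-log line.
  def extractIP(line: String): String = line.split(" ")(0)

  def isClassA(ip: String): Boolean = ip.split('.')(0).toInt < 127

  val ipaccesslogs = List(
    "10.0.0.1 - - [01/Jan/2020] \"GET / HTTP/1.1\" 200",
    "200.1.2.3 - - [01/Jan/2020] \"GET /a HTTP/1.1\" 404"
  )

  // Same shape as the RDD pipeline in the step above.
  val cleanips = ipaccesslogs.map(extractIP(_)).filter(isClassA)
  println(cleanips) // List(10.0.0.1)
}
```

On the real RDD the calls are identical; Spark simply evaluates them lazily across the cluster instead of eagerly in memory.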

  18. Using Eclipse, run the test cases

  19. Commit and push the modified code
  20. Run the test cases: sbt test
  21. Build the package again: sbt package
  22. Copy apache-log-parsing_2.10-0.0.1.jar to the server using WinSCP
  23. Run it using the usual command:
spark-submit --class com.cloudxlab.logparsing.EntryPoint apache-log-parsing_2.10-0.0.1.jar 10 /data/spark/project/access/access.log.45.gz