Writing Spark Applications

Tutorial - Setting Up a Dev Machine and Fixing Code on Windows

The following steps set up a Windows dev machine:

  1. Install a simple text editor - Sublime Text - https://www.sublimetext.com/
  2. Install a Git client - https://desktop.github.com/
  3. Clone the code: https://github.com/cloudxlab/bigdata.git
  4. Install the JDK - http://www.oracle.com/technetwork/java/javase/downloads/index.html
  5. Install sbt - http://www.scala-sbt.org/index.html
  6. Open a command prompt and go to the project folder: cd c:\Users\MY_WINDOWS_USERNAME\Documents\GitHub\bigdata\spark\projects\apache-log-parsing_sbt
  7. Run sbt test from the command prompt
  8. Install Eclipse - https://www.eclipse.org/downloads/

  9. Create an Eclipse project using the sbteclipse plugin: https://github.com/typesafehub/sbteclipse

  10. Using Sublime Text, edit C:/Users/myuser/.sbt/1.0/plugins/plugins.sbt and add the following (or whatever is currently recommended on the sbteclipse GitHub page):

    addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.2")

  11. Reopen the command prompt and go to the project folder: cd c:\Users\MY_WINDOWS_USERNAME\Documents\GitHub\bigdata\spark\projects\apache-log-parsing_sbt

  12. Run sbt eclipse
  13. Import the project into Eclipse and modify the code:
  14. Use File -> Import -> General/Existing Project into Workspace. Select the directory containing your project as the root directory, select the project, and hit Finish.
  15. In Utils, add:

        // Class A addresses have a first octet below 127
        def isClassA(ip: String): Boolean = {
            ip.split('.')(0).toInt < 127
        }
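
A note on the check above: strictly, Class A covers first octets 1-126 (0 is reserved and 127 is loopback), while the simpler `< 127` test also accepts 0. A stricter variant, shown here as a standalone sketch rather than the project's actual code:

```scala
object ClassACheck {
  // Stricter Class A check: first octet must be 1-126
  // (0 is reserved, 127 is loopback).
  def isClassA(ip: String): Boolean = {
    val firstOctet = ip.trim.split('.')(0).toInt
    firstOctet >= 1 && firstOctet <= 126
  }

  def main(args: Array[String]): Unit = {
    assert(isClassA("121.242.40.10"))
    assert(!isClassA("212.242.40.10"))
    assert(!isClassA("127.0.0.1")) // loopback is not Class A
    println("ok")
  }
}
```

Either version passes the unit tests in the next step, since none of the test IPs start with 0 or 127.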
    

  16. In log-parser-test.scala, add a unit test case:

        "CLASSA" should "return true if the IP is Class A" in {
            val utils = new Utils
            assert(utils.isClassA("121.242.40.10"))
            assert(!utils.isClassA("212.242.40.10"))
            assert(!utils.isClassA("239.242.40.10"))
            assert(!utils.isClassA("191.242.40.10"))
        }
    

  17. In log-parser.scala (the main source file, not the test), add a filter after extracting the IPs:

    Change:

    var cleanips = ipaccesslogs.map(extractIP(_))

    to:

    var cleanips = ipaccesslogs.map(extractIP(_)).filter(isClassA)
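
The map/filter chain above can be exercised locally without Spark. Here is a minimal sketch using a plain Scala list in place of the RDD; extractIP here is a hypothetical stand-in that takes the first whitespace-delimited token of a log line, not the project's actual implementation:

```scala
object LogFilterSketch {
  // Hypothetical stand-in for the project's extractIP:
  // the client IP is the first whitespace-delimited token
  // of an Apache access-log line.
  def extractIP(line: String): String = line.split(" ")(0)

  // Same check as Utils.isClassA above
  def isClassA(ip: String): Boolean = ip.split('.')(0).toInt < 127

  def main(args: Array[String]): Unit = {
    val logs = List(
      "121.242.40.10 - - \"GET /index.html HTTP/1.1\" 200",
      "212.242.40.10 - - \"GET /about.html HTTP/1.1\" 200"
    )
    // On an RDD the same chain runs lazily and in parallel
    val cleanips = logs.map(extractIP(_)).filter(isClassA)
    assert(cleanips == List("121.242.40.10"))
    println(cleanips.mkString(","))
  }
}
```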

  18. Run the test cases from Eclipse

  19. Commit and push the modified code
  20. Run the test cases: sbt test
  21. Build again: sbt package
  22. Copy apache-log-parsing_2.10-0.0.1.jar to the server using WinSCP
  23. Run it with spark-submit, passing the main class via --class:
    spark-submit --class com.cloudxlab.logparsing.EntryPoint apache-log-parsing_2.10-0.0.1.jar 10 /data/spark/project/access/access.log.45.gz