Writing Spark Applications

Tutorial - Setting up a Dev Machine and Fixing Code on Windows

In the video, we followed these steps:

  1. Install a simple text editor - Sublime Text - https://www.sublimetext.com/
  2. Install a Git client - https://desktop.github.com/
  3. Clone the code: https://github.com/sandeepcxl/bigdata.git
  4. Install JRE
  5. Install sbt - http://www.scala-sbt.org/index.html
  6. Run sbt test
  7. Install Eclipse
  8. Install JDK
  9. Create an Eclipse project using the sbt eclipse plugin: https://github.com/typesafehub/sbteclipse
  10. With the Sublime text editor, edit C:/Users/myuser/.sbt/0.13/plugins/plugins.sbt and add the following line to it:

    addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.1.0")

  11. Then start sbt in interactive mode and run the command eclipse in it

  12. In Eclipse, import the project and modify the code:
  13. Use File -> Import -> General/Existing Project into Workspace. Select the directory containing your project as the root directory, select the project, and click Finish.
  14. In Utils, add:

        def isClassA(ip:String):Boolean = {
            ip.split('.')(0).toInt < 127
        }
    

  15. In log-parser-test.scala, add a unit test case:

        "CLASSA" should "Return true if class is A" in {
            val utils = new Utils
            assert(utils.isClassA("121.242.40.10 "))
            assert(!utils.isClassA("212.242.40.10 "))
            assert(!utils.isClassA("239.242.40.10 "))
            assert(!utils.isClassA("191.242.40.10 "))
        }
    

  16. In log-parser-test.scala, add a filter after extracting the IP. Change:

    var cleanips = ipaccesslogs.map(extractIP(_))

  to:

    var cleanips = ipaccesslogs.map(extractIP(_)).filter(isClassA)

  17. Run the test cases in Eclipse

  18. Commit and push the modified code
  19. Run the test cases: sbt test
  20. Build again using: sbt package
  21. Copy apache-log-parsing_2.10-0.0.1.jar to the server using WinSCP
  22. Run it using the usual command:
    spark-submit apache-log-parsing_2.10-0.0.1.jar com.cloudxlab.logparsing.EntryPoint 10 /data/spark/project/access/access.log.45.gz
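
The class A check and filter from steps 14 to 16 can be tried without a Spark cluster. The following is a minimal sketch, not the project's code: the object name ClassAFilterSketch and the sample addresses are assumptions for illustration, and a plain List stands in for the RDD ipaccesslogs because both offer a filter method.

```scala
// Minimal sketch of the class A filter from steps 14 to 16.
// ClassAFilterSketch and the sample data are hypothetical names and values.
object ClassAFilterSketch {

  // Same rule as in the tutorial: an address counts as class A
  // when its first octet is below 127.
  def isClassA(ip: String): Boolean = {
    ip.split('.')(0).toInt < 127
  }

  // Keeps only the class A addresses, mirroring .filter(isClassA) on the RDD.
  def keepClassA(ips: List[String]): List[String] = ips.filter(isClassA)

  def main(args: Array[String]): Unit = {
    // Stand-in for the IPs produced by extractIP in the real job.
    val ips = List("121.242.40.10", "212.242.40.10", "239.242.40.10", "191.242.40.10")
    println(keepClassA(ips)) // prints List(121.242.40.10)
  }
}
```

Note that isClassA throws a NumberFormatException when the first field is not a number, so in the real job it should only receive strings that extractIP has already validated.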
    
