Building Spark Application - End to End


Tutorial - Setting up Dev Machine and fixing code on windows





The following steps are needed to set up a Windows dev machine:

  1. Install a simple text editor such as Sublime Text - https://www.sublimetext.com/
  2. Install a git client - https://desktop.github.com/
  3. Clone the code: https://github.com/cloudxlab/bigdata.git
  4. Install JDK - http://www.oracle.com/technetwork/java/javase/downloads/index.html
  5. Install sbt - http://www.scala-sbt.org/index.html
  6. Open command prompt and go to the project folder: cd c:\Users\MY_WINDOWS_USERNAME\Documents\GitHub\bigdata\spark\projects\apache-log-parsing_sbt
  7. Run sbt test at the command prompt. This should show that some test cases have failed, so there is an error in the code that we need to fix. The following steps demonstrate the software development process of fixing the error using Eclipse.
  8. Install Eclipse.

  9. Create eclipse project using sbt eclipse plugin: https://github.com/typesafehub/sbteclipse

  10. With the Sublime text editor, edit C:/Users/myuser/.sbt/1.0/plugins/plugins.sbt and add the following line to it (or whatever is mentioned on the sbteclipse GitHub homepage):

    addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.2")

  11. Re-Open command prompt and go to the project folder: cd c:\Users\MY_WINDOWS_USERNAME\Documents\GitHub\bigdata\spark\projects\apache-log-parsing_sbt

  12. Run sbt eclipse to generate the Eclipse project files.
  13. Using Eclipse, import the project and modify the code.
  14. To import, use File -> Import -> General/Existing Project into Workspace. Select the directory containing your project as the root directory, select the project and hit Finish.
  15. In the file log-parser.scala (src/main/scala/com/cloudxlab/logparsing directory), add the following function inside the Utils class:

    def isClassA(ip:String):Boolean = {
        ip.split('.')(0).toInt < 127
    }
    

  16. In log-parser-test.scala (located in test/scala/com/cloudxlab/logparsing directory), add a unit test case:

         "CLASSA" should "Return true if class is A" in {
            val utils = new Utils
            assert(utils.isClassA("121.242.40.10 "))
            assert(!utils.isClassA("212.242.40.10 "))
            assert(!utils.isClassA("239.242.40.10 "))
            assert(!utils.isClassA("191.242.40.10 "))
          }

  17. In log-parser.scala (located in src/main/scala/com/cloudxlab/logparsing directory), add a filter after extracting the IP by changing the cleanips line as follows:

    // Before:
    // var cleanips = ipaccesslogs.map(extractIP(_))
    // After:
    var cleanips = ipaccesslogs.map(extractIP(_)).filter(isClassA)

  18. Using Eclipse, run the test cases. They should now pass because of the modifications made in the steps above. Please go through the code and try to understand the changes we have made.
  19. Commit and push the modified code. If you have cloned from our repository as mentioned above, you will not be able to push but if you cloned a repo on which you have permission to make modifications, you will be able to push.
  20. Run the test case: sbt test
  21. Now, build again using: sbt package
  22. Copy apache-log-parsing_2.10-0.0.1.jar to the CloudxLab web console server (e.cloudxlab.com or f.cloudxlab.com) using WinSCP (or rsync or scp if you are using Cygwin on Windows).
  23. Run the newly uploaded file using Spark on the CloudxLab web console with the usual command:

    spark-submit apache-log-parsing_2.10-0.0.1.jar com.cloudxlab.logparsing.EntryPoint 10 /data/spark/project/access/access.log.2.gz

You should see something like the following on the screen after lots of log messages:


===== TOP 10 IP Addresses =====
(107.170.18.142,142072)
(106.216.188.163,4584)
(69.65.19.184,1259)
(106.216.154.50,1187)
(78.46.22.138,1093)
(106.216.189.5,638)
(59.97.17.204,478)
(72.195.144.124,436)
(4.26.51.74,393)
(122.172.105.180,387)
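
The output above is a list of (IP, count) pairs sorted by count, largest first. To get a feel for how such a list is produced, here is a minimal sketch of the same counting done on plain Scala collections instead of Spark. The object name TopIps, the topN function and this simplified extractIP are assumptions made for illustration; the project's own extractIP and Spark job may work differently.

```scala
object TopIps {
  // Hypothetical stand-in for the project's extractIP:
  // takes the first whitespace-separated token of a log line as the IP.
  def extractIP(line: String): String = line.trim.split("\\s+")(0)

  // Count how many lines each IP appears in and return the n most frequent,
  // ordered by count (descending) and then by IP to keep ties deterministic.
  def topN(lines: Seq[String], n: Int): Seq[(String, Int)] =
    lines
      .map(extractIP)
      .groupBy(identity)
      .map { case (ip, hits) => (ip, hits.size) }
      .toSeq
      .sortBy { case (ip, count) => (-count, ip) }
      .take(n)
}
```

Calling TopIps.topN(lines, 10) on a sequence of log lines gives pairs in the same shape as the output shown above.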

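The isClassA function from step 15 assumes that the text before the first dot is always a number, so a malformed log line would throw an exception. A more defensive variant could look like the sketch below. The names IpClass and isClassASafe are hypothetical and not part of the repository; the threshold of 127 is taken from step 15, and the lower bound of 0 is an added guard.

```scala
import scala.util.Try

object IpClass {
  // Hypothetical defensive variant of isClassA from step 15.
  // Returns false instead of throwing when the first octet is missing
  // or not a number.
  def isClassASafe(ip: String): Boolean =
    Try(ip.trim.split('.')(0).toInt)
      .toOption
      .exists(first => first >= 0 && first < 127) // same threshold as step 15
}
```

It can be used in the filter from step 17 in the same way as isClassA, and the test strings from step 16 give the same results with it.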
