Jun 25, 2015

Getting Started with Spark on Windows

This article explains how to get started with the Spark shell on Windows. Based on the documentation on Spark's website, getting started seemed like it would be a breeze. However, I made several mistakes that made it take longer than I had expected. This is what worked for me on a Windows machine:
  1. First, download a Spark package prebuilt for Hadoop: http://spark.apache.org/downloads.html

  2. Next, go to Windows=>Run and type cmd to open the DOS command prompt.

    *Note: Cygwin may not work. Use the DOS prompt instead.

  3. Change directory into the Spark installation directory (the directory where you extracted Spark, also called the Spark home directory).

  4. Next, at the command prompt, type:
    bin\spark-shell.cmd
    
    
  5. You should see Spark's startup log messages scroll by in the console.


  6. Once Spark has started, you will see a prompt "scala>".

  7. To check that Spark initialized correctly, type the following at the scala> prompt:
    val k=5+5
    You should get back:
    k: Int = 10
    If you don't, Spark did not start correctly.



  8. Another check is to open your web browser and go to http://localhost:<port>, where <port> is the port the Spark web UI started on. You can find this value in the startup messages. It is usually 4040, but Spark moves on to another port (4041, 4042, and so on) if it cannot bind to 4040, for example because another Spark shell is already using it.
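The val k=5+5 check in step 7 only exercises the Scala interpreter that spark-shell starts. A check that also runs a small Spark job might look like the sketch below. It assumes you are at the scala> prompt of spark-shell, which creates a SparkContext named sc for you at startup; the names numbers, total, and evens are just illustrative.

```scala
// Paste at the scala> prompt of spark-shell; `sc` (the SparkContext)
// is created by the shell at startup.
val numbers = sc.parallelize(1 to 100)           // distribute 1..100 as an RDD
val total = numbers.reduce(_ + _)                // runs a Spark job; expected: 5050
val evens = numbers.filter(_ % 2 == 0).count()   // runs another job; expected: 50
```

If these lines print total: Int = 5050 and evens: Long = 50, Spark is able to schedule and run jobs, not just evaluate Scala expressions.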
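If you are unsure which port the web UI from step 8 ended up on, a small probe can scan the usual range. This is a sketch under assumptions: SparkUiProbe and findListeningPort are hypothetical names (not part of Spark), and the range 4040 to 4050 simply assumes the UI fell back by only a few ports. It uses only the JDK's java.net.Socket, so it can be pasted at the scala> prompt or run with a plain Scala interpreter. A port that accepts a connection is not necessarily Spark's, so confirm it in the browser.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical helper (not part of Spark): returns the first port in `ports`
// on `host` that accepts a TCP connection within `timeoutMs` milliseconds.
object SparkUiProbe {
  def findListeningPort(host: String, ports: Range, timeoutMs: Int = 200): Option[Int] =
    ports.find { port =>
      val socket = new Socket()
      try {
        // connect() throws if nothing is listening or the timeout expires
        Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
      } finally {
        socket.close()
      }
    }
}

// Example: scan the ports the Spark web UI usually falls back to
SparkUiProbe.findListeningPort("localhost", 4040 to 4050) match {
  case Some(port) => println(s"Something is listening on http://localhost:$port")
  case None       => println("No listener found on ports 4040-4050")
}
```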