Categories: Apache Bigdata Hadoop MapReduce Python

Why #Java? Why not? What about #Scala? Or #Python? I use all three for various parts of #BigData projects: use the best tool for the job. A lot of things can be orchestrated and managed without any coding through #Apache NiFi 1.0. Some things, like #TensorFlow, are best done in Python, while #Spark and #Flink jobs can be written in Scala, Python, or Java. #ApacheBeam is Java only (Spotify added a Scala interface, but it's not official yet). If you are a really strong Java 8 developer who writes clean code, you can write #Hadoop #MapReduce, #Kafka, Spark, Flink, and #Apex jobs.

Apache NiFi is written in Java, as is most of Hadoop, so Java has proven itself at Big Data scale. Spark and some others are written mostly in Scala.

On the ecosystem side, Scala and Java share a ton of libraries, since both run on the JVM. Python has its own huge ecosystem, but for many Hadoop tasks the JVM languages have a bit of an advantage. You can run Python on the JVM with Jython, but I really haven't seen that used for Big Data, Spark, or Machine Learning. Is anyone doing this? Please comment here.

Python has TensorFlow and some nice Deep Learning and Machine Learning libraries. More universities are also starting to teach Python instead of Java, while not many are teaching Scala.
