Justin du Coeur (jducoeur) wrote,

Big Data, Scala and Spark

[For the programmers, particularly for the architects.]

One of the trends that has happened so fast that I suspect most folks haven't even noticed it yet is the sea change that is occurring in Big Data processing right now. The short version is that, relatively recently, some folks from the Scala world pointed out that while Hadoop is a lot better than traditional RDBMS methods for dealing with data at scale, it still kind of sucks for many use cases. So a project got started to rethink the approach to Big Data around a streaming model. That became Apache Spark, and it is taking over the world with remarkable speed.

Typesafe has posted a blog entry summarizing the benefits of Spark: it's fairly brief, and worth reading if you have any scaled-data requirements, to understand the strengths of the system. It includes a very concise tl;dr summary at the end. (Note, though, that it is written by Dean Wampler, who isn't exactly objective: in his talk at NE Scala a few weeks ago, he kind of bragged that his self-described trolling of the Hadoop community got the ball rolling in the first place.)

Querki isn't using this stuff *yet*, and probably won't for another year -- I have to focus on more critical-path issues for now. But I suspect I'll be adopting Spark before long, for things like automatic abuse catching. (I already know some of the obvious ways that wikispammers are going to try to game Querki, and a combination of event stream and graph analysis is probably going to help tame that.) And one of Querki's most game-changing features, App Communities, is going to be all about what happens when you combine Querki with Big Data. I suspect that almost any large-scale JVM-based system is likely to find this stuff useful in some fashion...
Tags: programming, scala
