Datasalt – Datasalt

Cascading + Splout SQL for log analysis and serving: A Big Data love story

March 15, 2013, 6:14 am

(This is the first post in a three-part series presenting Splout SQL 0.2.2's native integration with the main Hadoop processing tools: Cascading, Hive and Pig.) In this post we’ll present an example Big...

Hive + Splout SQL for a social media reporting webapp: A Big Data love story

March 18, 2013, 11:34 am

(This is the second post in a three-part series presenting Splout SQL 0.2.2's native integration with the main Hadoop processing tools: Cascading, Hive and Pig.) In this post we’ll present an example Big...

Pig + Splout SQL for a retail coupon generator: A Big Data love story

April 4, 2013, 8:28 am

(This is the last post in a three-part series presenting Splout SQL 0.2.2's native integration with the main Hadoop processing tools: Cascading, Hive and Pig.) In this post we’ll present an example Big...

Presenting Splout Cloud: a managed web-latency SQL querying engine in the cloud

April 5, 2013, 7:29 am

We have created Splout Cloud, a web-latency managed service in the AWS cloud. Simply put, Splout Cloud converts any data files – regardless of their size – into a scalable, partitioned SQL querying...

A practical Storm’s Trident API Overview

April 11, 2013, 8:08 am

On the 10th of April Pere gave a Trident hackathon at Berlin’s Big Data Beers. There was also a parallel Disco hackathon by Dave from Continuum Analytics. It was a lot of fun! The people who came had the...

Parsing Qype reviews with Pangool and saving results into MongoDB

June 17, 2013, 4:39 am

In this post we will see how easy it is to integrate a Pangool MapReduce Job with MongoDB, the famous document-oriented NoSQL database. For that, we will perform a review scraping task on Qype HTML...

Ad Networks analytics using Hadoop and Splout SQL

January 17, 2014, 8:52 am

In this post we share the talk (and the slides) Datasalt gave about analytics for ad networks at the most recent Big Data Spain. In the talk, we sketch the architecture of the whole system. Specifically, we...

Lambda Architecture: A state-of-the-art

January 17, 2014, 10:59 am

It’s been some time now since Nathan Marz wrote the first Lambda Architecture post. What has happened since then? How has the community reacted to such a concept? What are the architectural trends in...

SQL on Hadoop: A state of the art

April 1, 2014, 2:00 am

Since the mainstream adoption of Hadoop, many open-source and enterprise tools have emerged that aim to solve the problem of “querying Hadoop”. Indeed, Hadoop started as a simple (yet...

A scalable groupByKey and secondary sort for (Java) Spark

December 9, 2015, 1:09 am

Spark is, in our opinion, the new reference Big Data processing framework. Its flexible API allows for unified batch and stream processing, and can be easily extended for many purposes. It also...
