Channel: Datasalt – Datasalt
Browsing all 10 articles

Cascading + Splout SQL for log analysis and serving: A Big Data love story

(This is the first post in a series of three presenting Splout SQL 0.2.2's native integration with the main Hadoop processing tools: Cascading, Hive and Pig). In this post we’ll present an example Big...

Hive + Splout SQL for a social media reporting webapp: A Big Data love story

(This is the second post in a series of three presenting Splout SQL 0.2.2's native integration with the main Hadoop processing tools: Cascading, Hive and Pig). In this post we’ll present an example Big...

Pig + Splout SQL for a retail coupon generator: A Big Data love story

(This is the last post in a series of three presenting Splout SQL 0.2.2's native integration with the main Hadoop processing tools: Cascading, Hive and Pig). In this post we’ll present an example Big...

Presenting Splout Cloud: a managed web-latency SQL querying engine in the cloud

We have created Splout Cloud, a web-latency managed service in the AWS cloud. Simply put, Splout Cloud converts any data files – regardless of their size – into a scalable, partitioned SQL querying...

A practical Storm’s Trident API Overview

On the 10th of April Pere ran a Trident hackathon at Berlin’s Big Data Beers. There was also a parallel Disco hackathon by Dave from Continuum Analytics. It was a lot of fun! The people who came had the...

Parsing Qype reviews with Pangool and saving results into MongoDB

In this post we will see how easy it is to integrate a Pangool MapReduce Job with MongoDB, the famous document-oriented NoSQL database. For that, we will perform a review scraping task on Qype HTML...

Ad Networks analytics using Hadoop and Splout SQL

In this post we share the talk (and the slides) Datasalt gave about analytics for ad networks at the last Big Data Spain. In the talk, we sketch the architecture of the whole system. Specifically, we...

Lambda Architecture: A state-of-the-art

It’s been some time now since Nathan Marz wrote the first Lambda Architecture post. What has happened since then? How has the community reacted to such a concept? What are the architectural trends in...

SQL on Hadoop: A state of the art

Since the mainstream adoption of Hadoop, many open-source and enterprise tools have emerged that aim to solve the problem of “querying Hadoop”. Indeed, Hadoop started as a simple (yet...

A scalable groupByKey and secondary sort for (Java) Spark

Spark is, in our opinion, the new reference Big Data processing framework. Its flexible API allows for unified batch and stream processing, and can be easily extended for many purposes. It also...
