Occupy the Cloud: Distributed Computing for the 99% (SoCC '17). PyWren is a small library for running CPU-intensive bulk synchronous parallel (BSP) workloads on AWS Lambda:
http://shivaram.org/publications/pywren-socc17.pdf

It is available as an open-source project: http://pywren.io/

Sundial: Harmonizing Concurrency Control and Caching in a Distributed OLTP Database Management System. VLDB 2018
http://www.vldb.org/pvldb/vol11/p1289-yu.pdf

The FuzzyLog is a partially ordered shared log. Unlike traditional SMR systems, such as Paxos or Tango, which store all events in a single total order, the FuzzyLog allows the storage and update of partially ordered histories. This relaxation of ordering constraints enables richer application semantics around consistency guarantees, data partitioning, and log playback, while retaining the ease of programming of the shared-log model.
http://fuzzylog.cs.yale.edu/
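
To make the idea of a partially ordered history concrete, here is a minimal sketch in Python. It is not the FuzzyLog API: the class `PartialOrderLog` and its methods are hypothetical names chosen for illustration. Each entry records the entries it depends on, playback yields one order that respects those dependencies, and entries with no dependency path between them are treated as concurrent.

```python
from collections import defaultdict, deque


class PartialOrderLog:
    """Toy partially ordered log (illustrative only, not the FuzzyLog API)."""

    def __init__(self):
        self.payloads = {}  # entry id -> payload
        self.deps = {}      # entry id -> tuple of predecessor ids

    def append(self, entry_id, payload, deps=()):
        # Entries may only depend on entries already in the log,
        # so the history stays acyclic by construction.
        if entry_id in self.payloads:
            raise ValueError(f"duplicate entry: {entry_id}")
        for d in deps:
            if d not in self.payloads:
                raise KeyError(f"unknown dependency: {d}")
        self.payloads[entry_id] = payload
        self.deps[entry_id] = tuple(dict.fromkeys(deps))  # drop duplicates

    def playback(self):
        """Yield (id, payload) in one order consistent with the partial order."""
        remaining = {e: len(ds) for e, ds in self.deps.items()}
        children = defaultdict(list)
        for e, ds in self.deps.items():
            for d in ds:
                children[d].append(e)
        ready = deque(e for e, n in remaining.items() if n == 0)
        while ready:
            e = ready.popleft()
            yield e, self.payloads[e]
            for c in children[e]:
                remaining[c] -= 1
                if remaining[c] == 0:
                    ready.append(c)

    def ancestors(self, entry_id):
        """Return every entry that must come before entry_id."""
        seen, stack = set(), list(self.deps[entry_id])
        while stack:
            e = stack.pop()
            if e not in seen:
                seen.add(e)
                stack.extend(self.deps[e])
        return seen

    def concurrent(self, a, b):
        """True if neither entry is ordered before the other."""
        return a != b and a not in self.ancestors(b) and b not in self.ancestors(a)


log = PartialOrderLog()
log.append("a", "set x=1")
log.append("b", "set y=2")                      # unordered with respect to "a"
log.append("c", "read x,y", deps=["a", "b"])    # must follow both writes
order = [entry_id for entry_id, _ in log.playback()]
```

In a totally ordered log, "a" and "b" would be forced into one sequence; here a reader only has to see both before "c".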

Recent work on deterministic database systems is coming from him:

http://www.jmfaleiro.com/

Providing Streaming Joins as a Service at Facebook. Gabriela Jacques-Silva et al. VLDB 2018.

Unique research on how to deploy (replace or update) stream processing queries without causing much delay, and on how to reduce the memory consumption of stream joins.

PQL (Puma Query Language) infers types from the expressions and functions used in queries, so users don't need to specify types when describing schemas.

https://research.fb.com/publications/providing-streaming-joins-as-a-service-at-facebook/
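
The type inference mentioned above can be illustrated with a small bottom-up sketch in Python. This is not PQL syntax or Puma's implementation: the expression encoding, the `FUNCTIONS` table, and the function names in it are assumptions made up for the example. The point is only that once function signatures and input column types are known, the output schema follows from the expressions.

```python
# Hypothetical function signatures: name -> (argument types, result type).
FUNCTIONS = {
    "length": (("string",), "int"),
    "concat": (("string", "string"), "string"),
    "add": (("int", "int"), "int"),
}


def infer_type(expr, column_types):
    """Infer the type of an expression bottom-up.

    An expression is a literal, ("col", name), or ("call", fn, arg, ...).
    """
    if isinstance(expr, bool):  # check bool before int: bool is a subclass of int
        return "bool"
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):
        return "string"
    kind = expr[0]
    if kind == "col":
        return column_types[expr[1]]
    if kind == "call":
        fn, args = expr[1], expr[2:]
        expected, result = FUNCTIONS[fn]
        actual = tuple(infer_type(a, column_types) for a in args)
        if actual != expected:
            raise TypeError(f"{fn} expects {expected}, got {actual}")
        return result
    raise ValueError(f"unknown expression kind: {kind}")


def infer_schema(select_list, column_types):
    """Derive the output schema (alias -> type) of a SELECT list."""
    return {alias: infer_type(e, column_types) for alias, e in select_list.items()}


schema = infer_schema(
    {
        "name_len": ("call", "length", ("col", "name")),
        "total": ("call", "add", ("col", "clicks"), 1),
    },
    {"name": "string", "clicks": "int"},
)
```

The user writes only the expressions; the types of `name_len` and `total` are derived rather than declared.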

F1 Query: Declarative Querying at Scale.
http://www.vldb.org/pvldb/vol11/p1835-samwel.pdf

F1 Query is a federated SQL-based DBMS engine for accessing the various types of storage engines inside Google. Like other modern query engines (e.g., Presto, Treasure Data, Snowflake), it decouples the storage system from the query engine and provides a SQL-based interface. OLTP is supported for write-oriented storage like Google Mesa, and OLAP is supported for BigQuery or Spanner backends.

- The query processing kernel is row-oriented and not fully optimized for vectorized processing. Caching is not yet available, so performance depends heavily on Google's storage subsystems.
- The SQL logical query plan optimizer is written in Python for maintainability, and C++ code is generated for query execution.
- Standard optimizations are available, including filter pushdown, constant folding, attribute pruning, constraint propagation, sort elimination, common sub-plan deduplication, and materialized view rewrite.
- Dynamic query planning (e.g., range partitioning, and smooth scan to make query performance more predictable by gradually applying disk spills).
- Allows switching between interactive (centralized or distributed) and batch (MapReduce-based) query execution modes.
- A Protocol Buffers-based UDF server is implemented, supporting bidirectional row-batch transfer.
- 450,000 interactive queries/sec; 55,000 batch queries (8~16 PB) per day.
- 90th percentile latency: 50 ms (interactive, centralized), 200 ms (interactive, distributed); 500 seconds mean for batch queries.
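
As an illustration of one of the standard optimizations listed above, here is a minimal constant-folding sketch in Python. It is a generic textbook version, not F1 Query's optimizer: the tuple-based expression encoding and the `fold` function are assumptions for the example. Sub-expressions whose operands are all constants are evaluated at plan time, so the executor does not recompute them per row.

```python
import operator

# Binary operators the sketch knows how to evaluate at plan time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Fold constant sub-expressions of an (op, left, right) tree.

    Leaves are numbers (constants) or strings (column names).
    """
    if not isinstance(expr, tuple):
        return expr  # constant or column reference: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # both sides known: evaluate now
    return (op, left, right)         # a column is involved: keep the node


# price * (60 * 60 * 24) keeps the column but folds the constant part.
folded = fold(("*", "price", ("*", ("*", 60, 60), 24)))
```

Here `folded` is `("*", "price", 86400)`: the multiplication by the column remains, while the constant subtree collapses to a single literal.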


An in-memory DBMS researcher from Microsoft is now working on the AWS Aurora team. Interesting.

Drizzle: Fast and Adaptable Stream Processing at Scale. Shivaram Venkataraman et al. SOSP '17.

Optimizes the scheduling of stream processing to avoid coordination overhead between tasks. It showed 2x-3x lower latency than Spark (2.0.0).

We need to revisit these results using the latest version of Spark (2.3.x), which has implemented Structured Streaming for low-latency queries.
http://shivaram.org/publications/drizzle-sosp17.pdf
