Skip to content

Instantly share code, notes, and snippets.

@pbailis
pbailis / list.md
Last active April 15, 2018 08:54
Quick and dirty (incomplete) list of interesting, mostly recent data warehousing/"big data" papers

A friend asked me for a few pointers to interesting, mostly recent papers on data warehousing and "big data" database systems, with an eye towards real-world deployments. I figured I'd share the list. It's biased and rather incomplete but maybe of interest to someone. While many are obvious choices (I've omitted several, like MapReduce), I think there are a few underappreciated gems.

###Dataflow Engines:

Dryad--general-purpose distributed parallel dataflow engine
http://research.microsoft.com/en-us/projects/dryad/eurosys07.pdf

Spark--in memory dataflow
http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf

@rednaxelafx
rednaxelafx / PrintThreadIds.java
Created February 25, 2011 10:31
find out the correspondence between the tid/nid of Java threads as shown from jstack/JMX, on HotSpot/Linux
package fx.jvm.hotspot.tools;
import java.util.List;
import sun.jvm.hotspot.tools.Tool;
public class PrintThreadIds extends Tool {
public static void main(String[] args) {
PrintThreadIds tool = new PrintThreadIds();
tool.start(args);
@rednaxelafx
rednaxelafx / JDK 6 update 23, client, Windows x86
Created February 15, 2011 05:43
The VM flags as set by HotSpot's ergonomics by default (-XX:+PrintFlagsFinal). See http://hg.openjdk.java.net/jdk6/jdk6/hotspot/file/tip/src/share/vm/runtime/globals.hpp
[Global flags]
uintx AdaptivePermSizeWeight = 20 {product}
uintx AdaptiveSizeDecrementScaleFactor = 4 {product}
uintx AdaptiveSizeMajorGCDecayTimeScale = 10 {product}
uintx AdaptiveSizePausePolicy = 0 {product}
uintx AdaptiveSizePolicyCollectionCostMargin = 50 {product}
uintx AdaptiveSizePolicyInitializingSteps = 20 {product}
uintx AdaptiveSizePolicyOutputInterval = 0 {product}
uintx AdaptiveSizePolicyWeight = 10 {product}
uintx AdaptiveSizeThroughPutPolicy = 0 {product}