Package com.tdunning.math.stats
Class ArrayDigest
- java.lang.Object
-
- com.tdunning.math.stats.TDigest
-
- com.tdunning.math.stats.AbstractTDigest
-
- com.tdunning.math.stats.ArrayDigest
-
public class ArrayDigest extends AbstractTDigest
Array based implementation of a TDigest.
This implementation is essentially a one-level b-tree in which nodes are collected into pages typically with 32 values per page. Commonly, an ArrayDigest contains 500-3000 centroids. With 32 values per page, we have about 32 values per page and about 30 pages which seems to give a nice balance for speed. Sizes from 4 to 100 are plausible, however.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description (package private) class
ArrayDigest.Index
private static class
ArrayDigest.Page
-
Field Summary
Fields Modifier and Type Field Description private int
centroidCount
private double
compression
private java.util.List<ArrayDigest.Page>
data
private int
pageSize
static int
SMALL_ARRAY_DIGEST
static int
SMALL_ENCODING
private long
totalWeight
static int
VERBOSE_ARRAY_DIGEST
static int
VERBOSE_ENCODING
-
Fields inherited from class com.tdunning.math.stats.AbstractTDigest
gen, recordAllData
-
-
Constructor Summary
Constructors Constructor Description ArrayDigest(int pageSize, double compression)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description void
add(double x, int w)
Adds a sample to a histogram.(package private) void
add(double x, int w, Centroid base)
(package private) void
addRaw(double x, int w)
(package private) void
addRaw(double x, int w, java.util.List<java.lang.Double> history)
java.util.Iterator<ArrayDigest.Index>
allAfter(double x)
java.util.Iterator<ArrayDigest.Index>
allBefore(double x)
Returns an iterator which will give each element <= to x in non-increasing order.void
asBytes(java.nio.ByteBuffer buf)
Outputs a histogram as bytes using a particularly cheesy encoding.void
asSmallBytes(java.nio.ByteBuffer buf)
Serialize this TDigest into a byte buffer.int
byteSize()
Returns an upper bound on the number bytes that will be required to represent this histogram.double
cdf(double x)
Returns the fraction of all points added which are <= x.ArrayDigest.Index
ceiling(double x)
int
centroidCount()
The number of centroids currently in the TDigest.java.lang.Iterable<? extends Centroid>
centroids()
An iterable that lets you go through the centroids in ascending order by mean.void
compress()
Re-examines a t-digest to determine whether some centroids are redundant.void
compress(GroupTree other)
double
compression()
Returns the current compression factor.int
count(ArrayDigest.Index index)
private void
delete(ArrayDigest.Index index)
ArrayDigest.Index
floor(double x)
Returns a cursor pointing to the first element <= x.static ArrayDigest
fromBytes(java.nio.ByteBuffer buf)
Reads a histogram from a byte bufferprivate int
headCount(ArrayDigest.Index limit)
Returns the number of centroids strictly before the limit.long
headSum(ArrayDigest.Index limit)
private java.util.List<java.lang.Double>
history(ArrayDigest.Index index)
private java.lang.Iterable<ArrayDigest.Index>
inclusiveTail(ArrayDigest.Index start)
ArrayDigest.Index
increment(ArrayDigest.Index x, int delta)
private java.util.Iterator<ArrayDigest.Index>
iterator(int startPage, int startSubPage)
double
mean(ArrayDigest.Index index)
double
quantile(double q)
Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.private java.util.Iterator<ArrayDigest.Index>
reverse(int startPage, int startSubPage)
long
size()
Returns the number of points that have been added to this TDigest.int
smallByteSize()
Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.-
Methods inherited from class com.tdunning.math.stats.AbstractTDigest
add, add, createCentroid, decode, encode, interpolate, isRecording, merge, quantile, recordAllData
-
Methods inherited from class com.tdunning.math.stats.TDigest
checkValue, createArrayDigest, createArrayDigest, createTreeDigest
-
-
-
-
Field Detail
-
pageSize
private final int pageSize
-
data
private java.util.List<ArrayDigest.Page> data
-
totalWeight
private long totalWeight
-
centroidCount
private int centroidCount
-
compression
private double compression
-
VERBOSE_ENCODING
public static final int VERBOSE_ENCODING
- See Also:
- Constant Field Values
-
SMALL_ENCODING
public static final int SMALL_ENCODING
- See Also:
- Constant Field Values
-
VERBOSE_ARRAY_DIGEST
public static final int VERBOSE_ARRAY_DIGEST
- See Also:
- Constant Field Values
-
SMALL_ARRAY_DIGEST
public static final int SMALL_ARRAY_DIGEST
- See Also:
- Constant Field Values
-
-
Method Detail
-
add
public void add(double x, int w)
Description copied from class:TDigest
Adds a sample to a histogram.
-
headSum
public long headSum(ArrayDigest.Index limit)
-
headCount
private int headCount(ArrayDigest.Index limit)
Returns the number of centroids strictly before the limit.
-
mean
public double mean(ArrayDigest.Index index)
-
count
public int count(ArrayDigest.Index index)
-
compress
public void compress()
Description copied from class:TDigest
Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space.
The cost is roughly the same as adding as many data points as there are centroids. This is typically < 10 * compression, but could be as high as 100 * compression.
This is a destructive operation that is not thread-safe.
-
compress
public void compress(GroupTree other)
- Specified by:
compress
in classAbstractTDigest
-
size
public long size()
Description copied from class:TDigest
Returns the number of points that have been added to this TDigest.
-
cdf
public double cdf(double x)
Description copied from class:TDigest
Returns the fraction of all points added which are <= x.
-
quantile
public double quantile(double q)
Description copied from class:TDigest
Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
-
centroidCount
public int centroidCount()
Description copied from class:TDigest
The number of centroids currently in the TDigest.- Specified by:
centroidCount
in classTDigest
- Returns:
- The number of centroids
-
centroids
public java.lang.Iterable<? extends Centroid> centroids()
Description copied from class:TDigest
An iterable that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
-
allAfter
public java.util.Iterator<ArrayDigest.Index> allAfter(double x)
-
floor
public ArrayDigest.Index floor(double x)
Returns a cursor pointing to the first element <= x. Exposed only for testing.- Parameters:
x
- The value used to find the cursor.- Returns:
- The cursor.
-
ceiling
public ArrayDigest.Index ceiling(double x)
-
allBefore
public java.util.Iterator<ArrayDigest.Index> allBefore(double x)
Returns an iterator which will give each element <= to x in non-increasing order.- Parameters:
x
- The upper bound of all returned elements- Returns:
- An iterator that returns elements in non-increasing order.
-
increment
public ArrayDigest.Index increment(ArrayDigest.Index x, int delta)
-
compression
public double compression()
Description copied from class:TDigest
Returns the current compression factor.- Specified by:
compression
in classTDigest
- Returns:
- The compression factor originally used to set up the TDigest.
-
byteSize
public int byteSize()
Returns an upper bound on the number bytes that will be required to represent this histogram.
-
smallByteSize
public int smallByteSize()
Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.- Specified by:
smallByteSize
in classTDigest
- Returns:
- The number of bytes required.
-
asBytes
public void asBytes(java.nio.ByteBuffer buf)
Outputs a histogram as bytes using a particularly cheesy encoding.
-
asSmallBytes
public void asSmallBytes(java.nio.ByteBuffer buf)
Description copied from class:TDigest
Serialize this TDigest into a byte buffer. Some simple compression is used such as using variable byte representation to store the centroid weights and using delta-encoding on the centroid means so that floats can be reasonably used to store the centroid means.- Specified by:
asSmallBytes
in classTDigest
- Parameters:
buf
- The byte buffer into which the TDigest should be serialized.
-
fromBytes
public static ArrayDigest fromBytes(java.nio.ByteBuffer buf)
Reads a histogram from a byte buffer- Returns:
- The new histogram structure
-
history
private java.util.List<java.lang.Double> history(ArrayDigest.Index index)
-
delete
private void delete(ArrayDigest.Index index)
-
inclusiveTail
private java.lang.Iterable<ArrayDigest.Index> inclusiveTail(ArrayDigest.Index start)
-
addRaw
void addRaw(double x, int w)
-
addRaw
void addRaw(double x, int w, java.util.List<java.lang.Double> history)
-
add
void add(double x, int w, Centroid base)
- Specified by:
add
in classAbstractTDigest
-
iterator
private java.util.Iterator<ArrayDigest.Index> iterator(int startPage, int startSubPage)
-
reverse
private java.util.Iterator<ArrayDigest.Index> reverse(int startPage, int startSubPage)
-
-