Class ArrayDigest


  • public class ArrayDigest
    extends AbstractTDigest
    Array based implementation of a TDigest.
    This implementation is essentially a one-level b-tree in which nodes are collected into pages typically with 32 values per page. Commonly, an ArrayDigest contains 500-3000 centroids. With 32 values per page, we have about 32 values per page and about 30 pages which seems to give a nice balance for speed. Sizes from 4 to 100 are plausible, however.
    • Field Detail

      • pageSize

        private final int pageSize
      • totalWeight

        private long totalWeight
      • centroidCount

        private int centroidCount
      • compression

        private double compression
    • Constructor Detail

      • ArrayDigest

        public ArrayDigest​(int pageSize,
                           double compression)
    • Method Detail

      • add

        public void add​(double x,
                        int w)
        Description copied from class: TDigest
        Adds a sample to a histogram.
        Specified by:
        add in class TDigest
        Parameters:
        x - The value to add.
        w - The weight of this point.
      • headCount

        private int headCount​(ArrayDigest.Index limit)
        Returns the number of centroids strictly before the limit.
      • compress

        public void compress()
        Description copied from class: TDigest
        Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space.
        The cost is roughly the same as adding as many data points as there are centroids. This is typically < 10 * compression, but could be as high as 100 * compression.
        This is a destructive operation that is not thread-safe.
        Specified by:
        compress in class TDigest
      • size

        public long size()
        Description copied from class: TDigest
        Returns the number of points that have been added to this TDigest.
        Specified by:
        size in class TDigest
        Returns:
        The sum of the weights on all centroids.
      • cdf

        public double cdf​(double x)
        Description copied from class: TDigest
        Returns the fraction of all points added which are <= x.
        Specified by:
        cdf in class TDigest
      • quantile

        public double quantile​(double q)
        Description copied from class: TDigest
        Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
        Specified by:
        quantile in class TDigest
        Parameters:
        q - The desired fraction
        Returns:
        The value x such that cdf(x) == q
      • centroidCount

        public int centroidCount()
        Description copied from class: TDigest
        The number of centroids currently in the TDigest.
        Specified by:
        centroidCount in class TDigest
        Returns:
        The number of centroids
      • centroids

        public java.lang.Iterable<? extends Centroid> centroids()
        Description copied from class: TDigest
        An iterable that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
        Specified by:
        centroids in class TDigest
        Returns:
        The centroids in the form of an Iterable.
      • floor

        public ArrayDigest.Index floor​(double x)
        Returns a cursor pointing to the first element <= x. Exposed only for testing.
        Parameters:
        x - The value used to find the cursor.
        Returns:
        The cursor.
      • allBefore

        public java.util.Iterator<ArrayDigest.Index> allBefore​(double x)
        Returns an iterator which will give each element <= to x in non-increasing order.
        Parameters:
        x - The upper bound of all returned elements
        Returns:
        An iterator that returns elements in non-increasing order.
      • compression

        public double compression()
        Description copied from class: TDigest
        Returns the current compression factor.
        Specified by:
        compression in class TDigest
        Returns:
        The compression factor originally used to set up the TDigest.
      • byteSize

        public int byteSize()
        Returns an upper bound on the number bytes that will be required to represent this histogram.
        Specified by:
        byteSize in class TDigest
        Returns:
        The number of bytes required.
      • smallByteSize

        public int smallByteSize()
        Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.
        Specified by:
        smallByteSize in class TDigest
        Returns:
        The number of bytes required.
      • asBytes

        public void asBytes​(java.nio.ByteBuffer buf)
        Outputs a histogram as bytes using a particularly cheesy encoding.
        Specified by:
        asBytes in class TDigest
        Parameters:
        buf - The byte buffer into which the TDigest should be serialized.
      • asSmallBytes

        public void asSmallBytes​(java.nio.ByteBuffer buf)
        Description copied from class: TDigest
        Serialize this TDigest into a byte buffer. Some simple compression is used such as using variable byte representation to store the centroid weights and using delta-encoding on the centroid means so that floats can be reasonably used to store the centroid means.
        Specified by:
        asSmallBytes in class TDigest
        Parameters:
        buf - The byte buffer into which the TDigest should be serialized.
      • fromBytes

        public static ArrayDigest fromBytes​(java.nio.ByteBuffer buf)
        Reads a histogram from a byte buffer
        Returns:
        The new histogram structure
      • history

        private java.util.List<java.lang.Double> history​(ArrayDigest.Index index)
      • addRaw

        void addRaw​(double x,
                    int w)
      • addRaw

        void addRaw​(double x,
                    int w,
                    java.util.List<java.lang.Double> history)
      • iterator

        private java.util.Iterator<ArrayDigest.Index> iterator​(int startPage,
                                                               int startSubPage)
      • reverse

        private java.util.Iterator<ArrayDigest.Index> reverse​(int startPage,
                                                              int startSubPage)