public final class ByteQuadsCanonicalizer
extends java.lang.Object
BytesToNameCanonicalizer
which aims at more localized
memory access due to flattening of name quad data.
Performance improvement modest for simple JSON document data binding (maybe 3%),
but should help more for larger symbol tables, or for binary formats like Smile.
Hash area is divided into 4 sections:
hash (LSB) >> 1
hash (LSB) >> 2
int
s, where 1 - 3 ints contain 1 - 12
UTF-8 encoded bytes of name (null-padded), and last int is offset in
_names
that contains actual name Strings.Modifier and Type | Class and Description |
---|---|
private static class |
ByteQuadsCanonicalizer.TableInfo
Immutable value class used for sharing information as efficiently
as possible, by only require synchronization of reference manipulation
but not access to contents.
|
Modifier and Type | Field and Description |
---|---|
private int |
_count
Total number of Strings in the symbol table; only used for child tables.
|
private boolean |
_failOnDoS
Flag that indicates whether we should throw an exception if enough
hash collisions are detected (true); or just worked around (false).
|
private int[] |
_hashArea
Primary hash information area: consists of
2 * _hashSize
entries of 16 bytes (4 ints), arranged in a cascading lookup
structure (details of which may be tweaked depending on expected rates
of collisions). |
private boolean |
_hashShared
Flag that indicates whether underlying data structures for
the main hash area are shared or not.
|
private int |
_hashSize
Number of slots for primary entries within
_hashArea ; which is
at most 1/8 of actual size of the underlying array (4-int slots,
primary covers only half of the area; plus, additional area for longer
symbols after hash area). |
private boolean |
_intern
Whether canonical symbol Strings are to be intern()ed before added
to the table or not.
|
private int |
_longNameOffset
Offset within
_hashArea that follows main slots and contains
quads for longer names (13 bytes or longer), and points to the
first available int that may be used for appending quads of the next
long name. |
private java.lang.String[] |
_names
Array that contains
String instances matching
entries in _hashArea . |
private ByteQuadsCanonicalizer |
_parent
Reference to the root symbol table, for child tables, so
that they can merge table information back as necessary.
|
private int |
_secondaryStart
Offset within
_hashArea where secondary entries start |
private int |
_seed
Seed value we use as the base to make hash codes non-static between
different runs, but still stable for lifetime of a single symbol table
instance.
|
private int |
_spilloverEnd
Pointer to the offset within spill-over area where there is room
for more spilled over entries (if any).
|
private java.util.concurrent.atomic.AtomicReference<ByteQuadsCanonicalizer.TableInfo> |
_tableInfo
Member that is only used by the root table instance: root
passes immutable state info child instances, and children
may return new state if they add entries to the table.
|
private int |
_tertiaryShift
Constant that determines size of buckets for tertiary entries:
1 << _tertiaryShift is the size, and shift value
is also used for translating from primary offset into
tertiary bucket (shift right by 4 + _tertiaryShift ). |
private int |
_tertiaryStart
Offset within
_hashArea where tertiary entries start |
private static int |
DEFAULT_T_SIZE
Initial size of the primary hash area.
|
(package private) static int |
MAX_ENTRIES_FOR_REUSE
Let's only share reasonably sized symbol tables.
|
private static int |
MAX_T_SIZE
Let's not expand symbol tables past some maximum size;
with unique (~= random) names.
|
private static int |
MIN_HASH_SIZE
No point in trying to construct tiny tables, just need to resize soon.
|
private static int |
MULT |
private static int |
MULT2 |
private static int |
MULT3 |
Modifier | Constructor and Description |
---|---|
private |
ByteQuadsCanonicalizer(ByteQuadsCanonicalizer parent,
boolean intern,
int seed,
boolean failOnDoS,
ByteQuadsCanonicalizer.TableInfo state)
Constructor used when creating a child instance
|
private |
ByteQuadsCanonicalizer(int sz,
boolean intern,
int seed,
boolean failOnDoS)
Constructor used for creating per-
JsonFactory "root"
symbol tables: ones used for merging and sharing common symbols |
Modifier and Type | Method and Description |
---|---|
private int |
_appendLongName(int[] quads,
int qlen) |
private int |
_calcOffset(int hash) |
(package private) static int |
_calcTertiaryShift(int primarySlots) |
private boolean |
_checkNeedForRehash() |
private int |
_findOffsetForAdd(int hash)
Method called to find the location within hash table to add a new symbol in.
|
private java.lang.String |
_findSecondary(int origOffset,
int q1) |
private java.lang.String |
_findSecondary(int origOffset,
int q1,
int q2) |
private java.lang.String |
_findSecondary(int origOffset,
int hash,
int[] q,
int qlen) |
private java.lang.String |
_findSecondary(int origOffset,
int q1,
int q2,
int q3) |
protected void |
_reportTooManyCollisions() |
private int |
_resizeAndFindOffsetForAdd(int hash) |
private int |
_spilloverStart()
Helper method that calculates start of the spillover area
|
private boolean |
_verifyLongName(int[] q,
int qlen,
int spillOffset) |
private boolean |
_verifyLongName2(int[] q,
int qlen,
int spillOffset) |
private void |
_verifySharing() |
java.lang.String |
addName(java.lang.String name,
int q1) |
java.lang.String |
addName(java.lang.String name,
int[] q,
int qlen) |
java.lang.String |
addName(java.lang.String name,
int q1,
int q2) |
java.lang.String |
addName(java.lang.String name,
int q1,
int q2,
int q3) |
int |
bucketCount()
Returns number of primary slots table has currently
|
int |
calcHash(int q1) |
int |
calcHash(int[] q,
int qlen) |
int |
calcHash(int q1,
int q2) |
int |
calcHash(int q1,
int q2,
int q3) |
static ByteQuadsCanonicalizer |
createRoot()
Factory method to call to create a symbol table instance with a
randomized seed value.
|
protected static ByteQuadsCanonicalizer |
createRoot(int seed) |
java.lang.String |
findName(int q1) |
java.lang.String |
findName(int[] q,
int qlen) |
java.lang.String |
findName(int q1,
int q2) |
java.lang.String |
findName(int q1,
int q2,
int q3) |
int |
hashSeed() |
ByteQuadsCanonicalizer |
makeChild(int flags)
Factory method used to create actual symbol table instance to
use for parsing.
|
boolean |
maybeDirty()
Method called to check to quickly see if a child symbol table
may have gotten additional entries.
|
private void |
mergeChild(ByteQuadsCanonicalizer.TableInfo childState) |
private void |
nukeSymbols(boolean fill)
Helper method called to empty all shared symbols, but to leave
arrays allocated
|
int |
primaryCount()
Method mostly needed by unit tests; calculates number of
entries that are in the primary slot set.
|
private void |
rehash() |
void |
release()
Method called by the using code to indicate it is done with this instance.
|
int |
secondaryCount()
Method mostly needed by unit tests; calculates number of entries
in secondary buckets
|
int |
size() |
int |
spilloverCount()
Method mostly needed by unit tests; calculates number of entries
in shared spillover area
|
int |
tertiaryCount()
Method mostly needed by unit tests; calculates number of entries
in tertiary buckets
|
java.lang.String |
toString() |
int |
totalCount() |
private static final int DEFAULT_T_SIZE
private static final int MAX_T_SIZE
private static final int MIN_HASH_SIZE
static final int MAX_ENTRIES_FOR_REUSE
private final ByteQuadsCanonicalizer _parent
private final java.util.concurrent.atomic.AtomicReference<ByteQuadsCanonicalizer.TableInfo> _tableInfo
private final int _seed
private boolean _intern
NOTE: non-final to allow disabling intern()ing in case of excessive collisions.
private final boolean _failOnDoS
private int[] _hashArea
2 * _hashSize
entries of 16 bytes (4 ints), arranged in a cascading lookup
structure (details of which may be tweaked depending on expected rates
of collisions).private int _hashSize
_hashArea
; which is
at most 1/8
of actual size of the underlying array (4-int slots,
primary covers only half of the area; plus, additional area for longer
symbols after hash area).private int _secondaryStart
_hashArea
where secondary entries startprivate int _tertiaryStart
_hashArea
where tertiary entries startprivate int _tertiaryShift
1 << _tertiaryShift
is the size, and shift value
is also used for translating from primary offset into
tertiary bucket (shift right by 4 + _tertiaryShift
).
Default value is 2, for buckets of 4 slots; grows bigger with bigger table sizes.
private int _count
private java.lang.String[] _names
private int _spilloverEnd
_hashArea
.private int _longNameOffset
private boolean _hashShared
This flag needs to be checked both when adding new main entries, and when adding new collision list queues (i.e. creating a new collision list head entry)
private static final int MULT
private static final int MULT2
private static final int MULT3
private ByteQuadsCanonicalizer(int sz, boolean intern, int seed, boolean failOnDoS)
JsonFactory
"root"
symbol tables: ones used for merging and sharing common symbolssz
- Initial primary hash area sizeintern
- Whether Strings contained should be String.intern()
edseed
- Random seed valued used to make it more difficult to cause
collisions (used for collision-based DoS attacks).private ByteQuadsCanonicalizer(ByteQuadsCanonicalizer parent, boolean intern, int seed, boolean failOnDoS, ByteQuadsCanonicalizer.TableInfo state)
public static ByteQuadsCanonicalizer createRoot()
protected static ByteQuadsCanonicalizer createRoot(int seed)
public ByteQuadsCanonicalizer makeChild(int flags)
public void release()
private void mergeChild(ByteQuadsCanonicalizer.TableInfo childState)
public int size()
public int bucketCount()
public boolean maybeDirty()
public int hashSeed()
public int primaryCount()
public int secondaryCount()
public int tertiaryCount()
public int spilloverCount()
public int totalCount()
public java.lang.String toString()
toString
in class java.lang.Object
public java.lang.String findName(int q1)
public java.lang.String findName(int q1, int q2)
public java.lang.String findName(int q1, int q2, int q3)
public java.lang.String findName(int[] q, int qlen)
private final int _calcOffset(int hash)
private java.lang.String _findSecondary(int origOffset, int q1)
private java.lang.String _findSecondary(int origOffset, int q1, int q2)
private java.lang.String _findSecondary(int origOffset, int q1, int q2, int q3)
private java.lang.String _findSecondary(int origOffset, int hash, int[] q, int qlen)
private boolean _verifyLongName(int[] q, int qlen, int spillOffset)
private boolean _verifyLongName2(int[] q, int qlen, int spillOffset)
public java.lang.String addName(java.lang.String name, int q1)
public java.lang.String addName(java.lang.String name, int q1, int q2)
public java.lang.String addName(java.lang.String name, int q1, int q2, int q3)
public java.lang.String addName(java.lang.String name, int[] q, int qlen)
private void _verifySharing()
private int _findOffsetForAdd(int hash)
private int _resizeAndFindOffsetForAdd(int hash)
private boolean _checkNeedForRehash()
private int _appendLongName(int[] quads, int qlen)
public int calcHash(int q1)
public int calcHash(int q1, int q2)
public int calcHash(int q1, int q2, int q3)
public int calcHash(int[] q, int qlen)
private void rehash()
private void nukeSymbols(boolean fill)
private final int _spilloverStart()
protected void _reportTooManyCollisions()
static int _calcTertiaryShift(int primarySlots)