Static Public Member Functions | |
static | cleanUp ($string) |
The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition. | |
static | toNFC ($string) |
Convert a UTF-8 string to normal form C, canonical composition. | |
static | toNFD ($string) |
Convert a UTF-8 string to normal form D, canonical decomposition. | |
static | toNFKC ($string) |
Convert a UTF-8 string to normal form KC, compatibility composition. | |
static | toNFKD ($string) |
Convert a UTF-8 string to normal form KD, compatibility decomposition. | |
static | quickIsNFC ($string) |
Returns true if the string is _definitely_ in NFC. | |
static | quickIsNFCVerify (&$string) |
Returns true if the string is _definitely_ in NFC. | |
static | NFC ($string) |
static | NFD ($string) |
static | NFKC ($string) |
static | NFKD ($string) |
static | fastDecompose ($string, $map) |
Perform decomposition of a UTF-8 string into either D or KD form (depending on which decomposition map is passed to us). | |
static | fastCombiningSort ($string) |
Sorts combining characters into canonical order. | |
static | fastCompose ($string) |
Produces canonically composed sequences, i.e. normal form C or KC. | |
static | placebo ($string) |
This is used only for the benchmark, to measure how long it takes to iterate through a string without doing anything of substance. | |
Static Private Member Functions | |
static | loadData () |
Load the basic composition data if necessary. |
Currently assumes that input strings are valid UTF-8!
Not as fast as I'd like, but should be usable for most purposes. UtfNormal::toNFC() will bail early if given ASCII text or text it can quickly determine is already normalized.
All functions can be called statically.
See description of forms at http://www.unicode.org/reports/tr15/
Definition at line 63 of file UtfNormal.php.
static UtfNormal::cleanUp ( $string ) [static]
The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters. Not as fast as toNFC().
$string | String: a UTF-8 string |
Definition at line 74 of file UtfNormal.php.
References NFC(), and quickIsNFCVerify().
Referenced by ApiResult::cleanUp_helper(), CleanUpTest::doTestBytes(), CleanUpTest::doTestDoubleBytes(), CleanUpTest::doTestTripleBytes(), Xml::elementClean(), FeedUtils::formatDiffRow(), WebRequest::getFileName(), TextPassDumper::getTextDb(), TextPassDumper::getTextSpawnedOnce(), PPFuzzTester::makeInputText(), WebRequest::normalizeUnicode(), Preprocessor_DOM::preprocessToObj(), WatchlistCleanup::processPage(), TitleCleanup::processPage(), ImageCleanup::processPage(), CleanUpTest::testAscii(), CleanUpTest::testBomRegression(), CleanUpTest::testChunkRegression(), CleanUpTest::testForbiddenRegression(), CleanUpTest::testHangulRegression(), CleanUpTest::testInterposeRegression(), CleanUpTest::testLatin(), CleanUpTest::testLatinNormal(), CleanUpTest::testNull(), CleanUpTest::testOverlongRegression(), CleanUpTest::testSurrogateRegression(), xmlsafe(), and CleanUpTest::XtestAllChars().
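The behaviour described above (fast return for ASCII, repair of invalid sequences, then composition) can be sketched roughly as follows. This is a minimal illustration, not the UtfNormal source: `isPureAscii()`, `replaceMalformedBytes()`, `cleanUpSketch()` and the `VALID_UTF8_SEQ` constant are hypothetical names, the choice of U+FFFD as the replacement character is an assumption, and the sketch does not attempt the XML-safety handling mentioned for quickIsNFCVerify().

```php
<?php
// Illustrative sketch only; every name here is hypothetical, not part of UtfNormal.

// One well-formed UTF-8 sequence: no overlong forms, no surrogates, nothing above U+10FFFF.
const VALID_UTF8_SEQ =
    '[\x00-\x7F]'
    . '|[\xC2-\xDF][\x80-\xBF]'
    . '|\xE0[\xA0-\xBF][\x80-\xBF]'
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
    . '|\xED[\x80-\x9F][\x80-\xBF]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
    . '|[\xF1-\xF3][\x80-\xBF]{3}'
    . '|\xF4[\x80-\x8F][\x80-\xBF]{2}';

/** True when the string contains no byte >= 0x80, i.e. it is pure ASCII. */
function isPureAscii(string $s): bool {
    return preg_match('/[\x80-\xff]/', $s) === 0;
}

/** Replace every byte that is not part of a well-formed sequence with U+FFFD (assumed policy). */
function replaceMalformedBytes(string $s): string {
    return preg_replace_callback(
        '/(' . VALID_UTF8_SEQ . ')|./s',
        function (array $m): string {
            // Group 1 is only set when a well-formed sequence matched.
            return ($m[1] ?? '') !== '' ? $m[0] : "\u{FFFD}";
        },
        $s
    );
}

/** ASCII fast path, then repair, then hand off to a caller-supplied NFC routine. */
function cleanUpSketch(string $s, callable $toNfc): string {
    if (isPureAscii($s)) {
        return $s; // ASCII is already valid UTF-8 and already in NFC
    }
    return $toNfc(replaceMalformedBytes($s));
}
```

The composition step is passed in as a callable so the sketch stays self-contained; with the class loaded, something like `cleanUpSketch($input, array('UtfNormal', 'NFC'))` would be one plausible way to wire it up.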
static UtfNormal::fastCombiningSort ( $string ) [static]
Sorts combining characters into canonical order.
This is the final step in creating decomposed normal forms D and KD.
$string | String: a valid, decomposed UTF-8 string. Input is not validated. |
Definition at line 547 of file UtfNormal.php.
References $i, $n, $out, $utfCombiningClass, and loadData().
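To make "canonical order" concrete: within each run of combining marks, marks are reordered by ascending combining class, marks of equal class keep their relative order, and nothing moves across a base character. The sketch below illustrates that rule with a four-entry sample table; `SAMPLE_COMBINING_CLASS` and `combiningSortSketch()` are hypothetical names, and the real function presumably consults the full `$utfCombiningClass` table instead.

```php
<?php
// Illustrative sketch only; the table is a tiny sample, not UtfNormal's data.
const SAMPLE_COMBINING_CLASS = [
    "\u{0327}" => 202, // COMBINING CEDILLA
    "\u{0323}" => 220, // COMBINING DOT BELOW
    "\u{0301}" => 230, // COMBINING ACUTE ACCENT
    "\u{0308}" => 230, // COMBINING DIAERESIS
];

/** Stable insertion sort of combining marks by class; starters (class 0) act as barriers. */
function combiningSortSketch(string $s): string {
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    $out = [];
    foreach ($chars as $c) {
        $class = SAMPLE_COMBINING_CLASS[$c] ?? 0;
        $i = count($out);
        if ($class > 0) {
            // Walk left past marks whose class is strictly higher than this one.
            while ($i > 0) {
                $prev = SAMPLE_COMBINING_CLASS[$out[$i - 1]] ?? 0;
                if ($prev <= $class) {
                    break; // a starter, or a mark that must stay in front
                }
                $i--;
            }
        }
        array_splice($out, $i, 0, [$c]);
    }
    return implode('', $out);
}
```

For example, `"a\u{0301}\u{0323}"` (acute, then dot below) comes back as `"a\u{0323}\u{0301}"`, because class 220 sorts before class 230.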
static UtfNormal::fastCompose ( $string ) [static]
Produces canonically composed sequences, i.e. normal form C or KC.
$string | String: a valid UTF-8 string in sorted normal form D or KD. Input is not validated. |
Definition at line 600 of file UtfNormal.php.
References $i, $n, $out, $utfCanonicalComp, $utfCombiningClass, and loadData().
static UtfNormal::fastDecompose ( $string, $map ) [static]
Perform decomposition of a UTF-8 string into either D or KD form (depending on which decomposition map is passed to us).
Input is assumed to be *valid* UTF-8. Invalid input will break it.
$string | String: valid UTF-8 string | |
$map | Array: hash of expanded decomposition map |
Definition at line 487 of file UtfNormal.php.
References $i, $n, $out, $t, and loadData().
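The idea of one routine serving both D and KD, with the map deciding which, can be illustrated with a pair of tiny sample maps. This is a sketch under assumptions: `SAMPLE_CANONICAL_MAP`, `SAMPLE_COMPATIBILITY_MAP` and `decomposeSketch()` are hypothetical, each map entry is taken to be already fully expanded (as the parameter description suggests), and the result would still need the combining sort to be a proper normal form.

```php
<?php
// Illustrative sketch only; the maps are small samples, not UtfNormal's tables.

// Canonical decompositions, fully expanded.
const SAMPLE_CANONICAL_MAP = [
    "\u{00E9}" => "e\u{0301}",          // e with acute -> e + combining acute
    "\u{1E69}" => "s\u{0323}\u{0307}",  // s with dot below and dot above
];

// Compatibility map: the canonical entries plus compatibility-only ones.
const SAMPLE_COMPATIBILITY_MAP = SAMPLE_CANONICAL_MAP + [
    "\u{FB01}" => "fi",                 // LATIN SMALL LIGATURE FI
    "\u{00B2}" => "2",                  // SUPERSCRIPT TWO
];

/** Replace each character that has a map entry; copy everything else through. */
function decomposeSketch(string $s, array $map): string {
    $out = '';
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $c) {
        $out .= $map[$c] ?? $c;
    }
    return $out;
}
```

Passing the canonical map leaves the ligature in `"\u{FB01}"` alone, while the compatibility map turns it into plain `"fi"`; both maps decompose `"\u{00E9}"` the same way.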
static UtfNormal::loadData ( ) [static, private]
Load the basic composition data if necessary.
Definition at line 166 of file UtfNormal.php.
References $utfCombiningClass.
Referenced by fastCombiningSort(), fastCompose(), fastDecompose(), NFD(), quickIsNFC(), and quickIsNFCVerify().
static UtfNormal::NFC ( $string ) [static]
$string | string |
Definition at line 438 of file UtfNormal.php.
References fastCompose(), and NFD().
Referenced by cleanUp(), CleanUpTest::doTestDoubleBytes(), CleanUpTest::doTestTripleBytes(), toNFC(), and CleanUpTest::XtestAllChars().
static UtfNormal::NFD ( $string ) [static]
$string | string |
Definition at line 447 of file UtfNormal.php.
References $utfCanonicalDecomp, fastCombiningSort(), fastDecompose(), and loadData().
static UtfNormal::NFKC ( $string ) [static]
$string | string |
Definition at line 459 of file UtfNormal.php.
References fastCompose(), and NFKD().
Referenced by toNFKC().
static UtfNormal::NFKD ( $string ) [static]
$string | string |
Definition at line 468 of file UtfNormal.php.
References $utfCompatibilityDecomp, fastCombiningSort(), and fastDecompose().
static UtfNormal::placebo ( $string ) [static]
This is used only for the benchmark, to measure how long it takes to iterate through a string without doing anything of substance.
$string | string |
Definition at line 732 of file UtfNormal.php.
static UtfNormal::quickIsNFC ( $string ) [static]
Returns true if the string is _definitely_ in NFC.
Returns false if not or uncertain.
$string | String: a valid UTF-8 string. Input is not validated. |
Definition at line 179 of file UtfNormal.php.
References $i, $n, $utfCheckNFC, $utfCombiningClass, and loadData().
Referenced by toNFC().
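The asymmetric contract (true means "definitely NFC", false means "not NFC, or not sure") can be illustrated with a small sketch. Everything here is an assumption for illustration: `SAMPLE_NFC_QUICK_CHECK` and `quickIsNfcSketch()` are hypothetical names, the table holds only three sample characters where the real function presumably consults the full `$utfCheckNFC` data, and the combining-order check is omitted. A true result from this sketch is therefore only meaningful for input made of ASCII plus characters the sample table knows about.

```php
<?php
// Illustrative sketch only; three sample entries, not UtfNormal's quick-check table.
const SAMPLE_NFC_QUICK_CHECK = [
    "\u{0301}" => 'maybe', // COMBINING ACUTE ACCENT: may compose with a preceding base
    "\u{0340}" => 'no',    // COMBINING GRAVE TONE MARK: never appears in NFC
    "\u{212B}" => 'no',    // ANGSTROM SIGN: never appears in NFC
];

/** True only when nothing in the string raises doubt; false for "no" and "maybe" alike. */
function quickIsNfcSketch(string $s): bool {
    if (preg_match('/[\x80-\xff]/', $s) === 0) {
        return true; // pure ASCII is always in NFC
    }
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $c) {
        if (isset(SAMPLE_NFC_QUICK_CHECK[$c])) {
            return false; // not NFC, or uncertain: the caller must normalize fully
        }
    }
    return true;
}
```

A caller in the style of toNFC() would return the input unchanged on true and fall back to full normalization on false, so a false answer costs time but never correctness.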
static UtfNormal::quickIsNFCVerify ( &$string ) [static]
Returns true if the string is _definitely_ in NFC.
Returns false if not or uncertain.
$string | String: a UTF-8 string, altered on output to be valid UTF-8 safe for XML. |
Definition at line 219 of file UtfNormal.php.
References $i, $n, $utfCheckNFC, $utfCombiningClass, is(), and loadData().
Referenced by cleanUp().
static UtfNormal::toNFC ( $string ) [static]
Convert a UTF-8 string to normal form C, canonical composition.
Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters.
$string | String: a valid UTF-8 string. Input is not validated. |
Definition at line 103 of file UtfNormal.php.
References NFC(), and quickIsNFC().
static UtfNormal::toNFD ( $string ) [static]
Convert a UTF-8 string to normal form D, canonical decomposition.
Fast return for pure ASCII strings.
$string | String: a valid UTF-8 string. Input is not validated. |
Definition at line 119 of file UtfNormal.php.
References NFD().
static UtfNormal::toNFKC ( $string ) [static]
Convert a UTF-8 string to normal form KC, compatibility composition.
This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.
$string | String: a valid UTF-8 string. Input is not validated. |
Definition at line 136 of file UtfNormal.php.
References NFKC().
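The information loss is easiest to see by comparing form C and form KC on the same input. The sketch below uses PHP's bundled intl `Normalizer` class as a stand-in, on the assumption that the intl extension is loaded; it is not UtfNormal itself, and `compareCompositionForms()` is a hypothetical helper.

```php
<?php
// Illustrative sketch only; relies on the intl extension's Normalizer, not on UtfNormal.

/** Return the same input in canonical (C) and compatibility (KC) composed form. */
function compareCompositionForms(string $s): array {
    return [
        'nfc'  => Normalizer::normalize($s, Normalizer::FORM_C),
        'nfkc' => Normalizer::normalize($s, Normalizer::FORM_KC),
    ];
}
```

For `"x\u{00B2}"` (x followed by SUPERSCRIPT TWO), the form C result keeps the superscript while the form KC result is plain `"x2"`. Nothing in `"x2"` records that the digit was once a superscript, which is why the conversion cannot be undone.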
static UtfNormal::toNFKD ( $string ) [static]
Convert a UTF-8 string to normal form KD, compatibility decomposition.
This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.
$string | String: a valid UTF-8 string. Input is not validated. |
Definition at line 153 of file UtfNormal.php.
References NFKD().