com.vgrs.xcode.common.unicodedata
Class UnicodeData

java.lang.Object
  extended by com.vgrs.xcode.common.unicodedata.UnicodeData

public class UnicodeData
extends java.lang.Object

This class reads all the relevant Unicode data files and stores them in memory. It also implements the algorithm defined in the IDNA2008 Tables document to assign a derived property category to each code point. Together, these provide all the data needed to implement the IDNA2008 protocol.

Version:
1.0 May 29, 2010, 1.1 June 10, 2010

Cleaned up the code to use GNU Trove collections, refactored the common code, and added comments to all the variables and methods.

Author:
hjarada, nchigurupati

Constructor Summary
UnicodeData()
           
 
Method Summary
static void assertNoDisallowedOrUnassignedCodePoints(int[] aCodePoints)
          Assert that none of the code points are either UNASSIGNED or DISALLOWED.
static void assertNormalized(int[] aCodePoints)
          Assert that the given code points are in NFC normalized form.
static java.lang.String getBidiClass(int aCodePoint)
           
static TIntObjectMap<java.lang.String> getBidiClassTable()
          Returns the Bidi Class Table
static int getCanonicalClass(int aCodePoint)
          Returns the canonical combining class value of the code point, or 0 if not found
static TIntIntMap getCanonicalClassTable()
          Returns the canonicalClassTable
static int getCanonicalCombiningClass(int aCodePoint)
          Returns the canonical combining class value for the given code point
static UnicodeCodePointCategory getCodePointDerivedProperty(int aCodePoint)
           
static TIntSet getCombiningMarkTable()
          Returns the combiningMark table
static TIntSet getCompatibilityTable()
          Returns the compatibilityTable
static TLongIntMap getComposeTable()
          Returns the composeTable
static TIntSet getContextualCodePointsTable()
          Returns the contextualCodePoints
static TIntObjectMap<int[]> getDecomposeTable()
          Returns the decomposeTable
static TIntCharMap getDerivedJoiningTypeTable()
          Returns the derivedJoiningTypeTable
static java.lang.String getGeneralCategory(int aCodePoint)
           
static TIntObjectMap<java.lang.String> getGeneralCategoryTable()
          Returns the General Category table
static char getJoiningType(int aCodePoint)
           
static java.lang.String getScript(int aCodePoint)
           
static TIntObjectMap<java.lang.String> getScriptsTable()
          Returns the scriptsTable
static boolean hasContextualCodePoints(int[] aCodePoints)
          Checks to see if there are any contextual (CONTEXTO/CONTEXTJ) code points present
static void init()
          Initialize all the data structures in this class with data read and parsed from Unicode data files.
static boolean isCombiningMark(int aCodePoint)
           
static boolean isDisallowedOrUnassignedCodePoint(int aCodePoint)
           
static int isNormalizationNeeded(int[] aCodePoints)
          This method performs a quick check to see whether the given code points are already in NFC or need to be normalized.
static boolean isNormalized(int[] aCodePoints)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UnicodeData

public UnicodeData()

Method Detail

init

public static void init()
                 throws XcodeException
Initialize all the data structures in this class with data read and parsed from Unicode data files. This method can be invoked explicitly on startup to ensure that there is no delay at runtime when any of the IDNSDK classes are invoked to perform IDNA processing.

Throws:
XcodeException
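
The eager-initialization idea described for init() can be sketched as follows. This is a minimal illustration, not the IDNSDK source: the class TableHolder, its placeholder table entry, and the isLoaded() helper are all hypothetical. The point is only that an explicit init() at startup moves the loading cost out of the first lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a class that loads large lookup tables. */
final class TableHolder {
    private static Map<Integer, String> table; // null until loaded

    /** Idempotent: loads the table once; later calls return immediately. */
    static synchronized void init() {
        if (table == null) {
            Map<Integer, String> loaded = new HashMap<>();
            loaded.put(0x0041, "Latin"); // placeholder for parsing real data files
            table = loaded;
        }
    }

    /** Lookups fall back to lazy loading if init() was never called. */
    static synchronized String lookup(int codePoint) {
        init();
        return table.getOrDefault(codePoint, "Unknown");
    }

    static synchronized boolean isLoaded() {
        return table != null;
    }

    public static void main(String[] args) {
        init(); // pay the loading cost once, at startup
        System.out.println(lookup(0x0041));
    }
}
```

Calling init() from application startup code means that later lookups only read from memory.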

isCombiningMark

public static boolean isCombiningMark(int aCodePoint)
Parameters:
aCodePoint - the Unicode code point
Returns:
true if the given code point is a combining mark, false otherwise
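
For comparison, a similar check can be sketched with the JDK alone. This assumes that "combining mark" means General_Category Mn, Mc, or Me; the class name CombiningMarkSketch is hypothetical, and the JDK's built-in Unicode version may differ from the data files this class loads.

```java
/** Sketch only: uses the JDK's own Unicode tables, not the IDNSDK data. */
final class CombiningMarkSketch {
    /** True when the JDK reports general category Mn, Mc, or Me. */
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    public static void main(String[] args) {
        System.out.println(isCombiningMark(0x0301)); // COMBINING ACUTE ACCENT
        System.out.println(isCombiningMark('a'));
    }
}
```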

getScript

public static java.lang.String getScript(int aCodePoint)
Parameters:
aCodePoint - the Unicode code point
Returns:
the script for the given code point, or "Unknown" if not found
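
A rough JDK-only analogue of a script lookup is sketched below. The class name ScriptSketch is hypothetical. The JDK returns enum constant names in upper case (for example LATIN), so the strings differ from the values this class returns, and the JDK's Unicode version may not match the data files loaded here.

```java
/** Sketch only: script lookup via java.lang.Character.UnicodeScript. */
final class ScriptSketch {
    /** Returns the JDK's script name for the code point, e.g. "LATIN". */
    static String scriptOf(int codePoint) {
        return Character.UnicodeScript.of(codePoint).name();
    }

    public static void main(String[] args) {
        System.out.println(scriptOf('A'));
        System.out.println(scriptOf(0x03B1)); // GREEK SMALL LETTER ALPHA
    }
}
```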

getBidiClass

public static java.lang.String getBidiClass(int aCodePoint)
Parameters:
aCodePoint - the Unicode code point
Returns:
the Bidi class for the given code point, or "L" if not found
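
To illustrate what a Bidi class lookup returns, the sketch below maps a handful of the JDK's directionality constants to their short Unicode names. The class name BidiSketch and the "OTHER" fallback are hypothetical and cover only a subset of the Bidi classes; they do not reproduce this class's default of "L".

```java
/** Sketch only: maps a few JDK directionality constants to Bidi class names. */
final class BidiSketch {
    static String bidiClassOf(int codePoint) {
        switch (Character.getDirectionality(codePoint)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT: return "L";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT: return "R";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC: return "AL";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER: return "EN";
            case Character.DIRECTIONALITY_ARABIC_NUMBER: return "AN";
            case Character.DIRECTIONALITY_NONSPACING_MARK: return "NSM";
            default: return "OTHER"; // remaining classes are not mapped in this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(bidiClassOf('A'));
        System.out.println(bidiClassOf(0x05D0)); // HEBREW LETTER ALEF
    }
}
```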

getJoiningType

public static char getJoiningType(int aCodePoint)
Parameters:
aCodePoint - the Unicode code point
Returns:
the joining type for the given code point, or 'U' if not found

assertNoDisallowedOrUnassignedCodePoints

public static void assertNoDisallowedOrUnassignedCodePoints(int[] aCodePoints)
                                                     throws XcodeException
Assert that none of the code points are either UNASSIGNED or DISALLOWED.

Parameters:
aCodePoints - an int[] array of Unicode code points.
Throws:
XcodeException - if any of the Unicode code points are either UNASSIGNED or DISALLOWED
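
The assert-style check above can be sketched as follows. The five category names are the IDNA2008 derived property values; everything else is hypothetical: the class DisallowedCheckSketch, the toy classifier (which knows only ASCII letters and one unassigned code point), and the use of IllegalArgumentException in place of XcodeException.

```java
/** Sketch only: fail-fast validation against a toy IDNA2008 classifier. */
final class DisallowedCheckSketch {
    enum Category { PVALID, CONTEXTJ, CONTEXTO, DISALLOWED, UNASSIGNED }

    /** Toy classifier; a real one would consult the full derived-property table. */
    static Category toyCategory(int cp) {
        if (cp >= 'a' && cp <= 'z') return Category.PVALID;
        if (cp >= 'A' && cp <= 'Z') return Category.DISALLOWED;
        if (cp == 0x0378) return Category.UNASSIGNED;
        return Category.DISALLOWED; // toy default, not accurate for real data
    }

    /** Throws on the first code point that is DISALLOWED or UNASSIGNED. */
    static void assertAllowed(int[] codePoints) {
        for (int cp : codePoints) {
            Category c = toyCategory(cp);
            if (c == Category.DISALLOWED || c == Category.UNASSIGNED) {
                throw new IllegalArgumentException(
                    String.format("U+%04X is %s", cp, c));
            }
        }
    }

    public static void main(String[] args) {
        assertAllowed(new int[] {'a', 'b', 'c'}); // passes silently
        try {
            assertAllowed(new int[] {'a', 'B'});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```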

isDisallowedOrUnassignedCodePoint

public static boolean isDisallowedOrUnassignedCodePoint(int aCodePoint)

isNormalized

public static boolean isNormalized(int[] aCodePoints)
Parameters:
aCodePoints - the Unicode code points
Returns:
boolean indicating whether the given code points are in Normalization Form C (NFC [Unicode-UAX15])
Throws:
XcodeException

assertNormalized

public static void assertNormalized(int[] aCodePoints)
                             throws XcodeException
Assert that the given code points are in NFC normalized form.

Parameters:
aCodePoints - the Unicode code points
Throws:
XcodeException - if the given code points are not in Normalization Form C (NFC [Unicode-UAX15])
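
The NFC checks described above can be approximated with the JDK's java.text.Normalizer. This is a sketch under assumptions, not this class's implementation: the class name NfcSketch is hypothetical, IllegalArgumentException stands in for XcodeException, and the JDK's normalization data may be based on a different Unicode version.

```java
import java.text.Normalizer;

/** Sketch only: NFC checks on int[] code points using java.text.Normalizer. */
final class NfcSketch {
    /** Converts the code points to a String and asks the JDK whether it is NFC. */
    static boolean isNfc(int[] codePoints) {
        String s = new String(codePoints, 0, codePoints.length);
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    /** Assert-style variant: throws instead of returning false. */
    static void assertNfc(int[] codePoints) {
        if (!isNfc(codePoints)) {
            throw new IllegalArgumentException("code points are not in NFC");
        }
    }

    public static void main(String[] args) {
        System.out.println(isNfc(new int[] {0x00E9}));         // precomposed e-acute
        System.out.println(isNfc(new int[] {0x0065, 0x0301})); // e + combining acute
    }
}
```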

isNormalizationNeeded

public static int isNormalizationNeeded(int[] aCodePoints)
This method performs a quick check to see whether the given code points are already in NFC or need to be normalized.

If any code point has the derived property value NFC_QC=N, that code point cannot occur in NFC, so the code points are not in normalized form.

If any code point has the derived property value NFC_QC=M, the code points may not be in normalized form and have to be normalized.

Otherwise, the code points are already in NFC normalized form.

Parameters:
aCodePoints - the Unicode code points
Returns:
int indicating whether the given code points are in Normalization Form C (NFC [Unicode-UAX15]), as follows:

0 - the given set contains a code point whose NFC_QC value is "N"

1 - every code point in the given set has the NFC_QC value "Y"

2 - the given set contains a code point whose NFC_QC value is "M"

Throws:
XcodeException
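
The three return codes can be illustrated with a small aggregation loop. This is a sketch under stated assumptions: the class QuickCheckSketch and the toy NFC_QC lookup (covering only a few code points) are hypothetical, and the full quick check in UAX #15 also inspects canonical ordering, which is omitted here. Whether this class performs that extra step is not stated above.

```java
import java.util.function.IntUnaryOperator;

/** Sketch only: combines per-code-point NFC_QC values into one result. */
final class QuickCheckSketch {
    static final int NO = 0;    // some code point has NFC_QC=N
    static final int YES = 1;   // every code point has NFC_QC=Y
    static final int MAYBE = 2; // some code point has NFC_QC=M, none has N

    /** Toy NFC_QC lookup covering a few code points only. */
    static int toyNfcQc(int cp) {
        if (cp == 0x0340 || cp == 0x0341) return NO;    // NFC_QC=N
        if (cp >= 0x0300 && cp <= 0x0304) return MAYBE; // NFC_QC=M
        return YES;                                     // toy default
    }

    /** N wins immediately; otherwise M overrides Y. */
    static int quickCheck(int[] codePoints, IntUnaryOperator nfcQc) {
        int result = YES;
        for (int cp : codePoints) {
            int qc = nfcQc.applyAsInt(cp);
            if (qc == NO) return NO;
            if (qc == MAYBE) result = MAYBE;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] sample = {0x0061, 0x0300}; // 'a' + COMBINING GRAVE ACCENT
        System.out.println(quickCheck(sample, QuickCheckSketch::toyNfcQc));
    }
}
```

A caller would typically run a full normalization check only when the result is 2.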

getCanonicalClass

public static int getCanonicalClass(int aCodePoint)
Returns the canonical combining class value of the code point, or 0 if not found

Parameters:
aCodePoint - the Unicode code point
Returns:
the canonical combining class value of the code point or 0 if not found

hasContextualCodePoints

public static boolean hasContextualCodePoints(int[] aCodePoints)
Checks to see if there are any contextual (CONTEXTO/CONTEXTJ) code points present

Parameters:
aCodePoints - the Unicode code points
Returns:
boolean indicating if there are any contextual (CONTEXTO/CONTEXTJ) code points present
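
A minimal sketch of such a scan is shown below. The code points listed are those with contextual rules in the IDNA2008 Tables document (RFC 5892, Appendix A). Whether this class's contextual table contains exactly this set is an assumption, and the class name ContextualSketch is hypothetical.

```java
/** Sketch only: detects CONTEXTJ / CONTEXTO code points with a hard-coded list. */
final class ContextualSketch {
    /** ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER. */
    static boolean isContextJ(int cp) {
        return cp == 0x200C || cp == 0x200D;
    }

    /** Middle dot, keraia, geresh, gershayim, katakana middle dot, Arabic-Indic digits. */
    static boolean isContextO(int cp) {
        return cp == 0x00B7 || cp == 0x0375 || cp == 0x05F3 || cp == 0x05F4
            || cp == 0x30FB
            || (cp >= 0x0660 && cp <= 0x0669)
            || (cp >= 0x06F0 && cp <= 0x06F9);
    }

    /** True as soon as any contextual code point is found. */
    static boolean hasContextual(int[] codePoints) {
        for (int cp : codePoints) {
            if (isContextJ(cp) || isContextO(cp)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasContextual(new int[] {'a', 0x200D, 'b'}));
        System.out.println(hasContextual(new int[] {'a', 'b'}));
    }
}
```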

getCodePointDerivedProperty

public static UnicodeCodePointCategory getCodePointDerivedProperty(int aCodePoint)

getCanonicalCombiningClass

public static final int getCanonicalCombiningClass(int aCodePoint)
Returns the canonical combining class value for the given code point

Parameters:
aCodePoint - the Unicode code point
Returns:
the canonical combining class value

getCompatibilityTable

public static final TIntSet getCompatibilityTable()
Returns the compatibilityTable

Returns:
the compatibilityTable

getCanonicalClassTable

public static final TIntIntMap getCanonicalClassTable()
Returns the canonicalClassTable

Returns:
the canonicalClassTable

getComposeTable

public static final TLongIntMap getComposeTable()
Returns the composeTable

Returns:
the composeTable

getDecomposeTable

public static final TIntObjectMap<int[]> getDecomposeTable()
Returns the decomposeTable

Returns:
the decomposeTable

getGeneralCategory

public static final java.lang.String getGeneralCategory(int aCodePoint)
Parameters:
aCodePoint - the Unicode code point
Returns:
the general category of the code point as specified in UnicodeData.txt

getContextualCodePointsTable

public static TIntSet getContextualCodePointsTable()
Returns the contextualCodePoints

Returns:
the contextualCodePoints

getScriptsTable

public static TIntObjectMap<java.lang.String> getScriptsTable()
Returns the scriptsTable

Returns:
the scriptsTable

getDerivedJoiningTypeTable

public static TIntCharMap getDerivedJoiningTypeTable()
Returns the derivedJoiningTypeTable

Returns:
the derivedJoiningTypeTable

getBidiClassTable

public static TIntObjectMap<java.lang.String> getBidiClassTable()
Returns the Bidi Class Table

Returns:
the Bidi Class Table

getGeneralCategoryTable

public static TIntObjectMap<java.lang.String> getGeneralCategoryTable()
Returns the General Category table

Returns:
the General Category table

getCombiningMarkTable

public static TIntSet getCombiningMarkTable()
Returns the combiningMark table

Returns:
the combiningMark table


Copyright © 2000-2010 VeriSign Inc. All Rights Reserved