JavaScript character utility CharFunk 1.1.0 released

CharFunk is a small library I wrote a few years ago to make it easier to classify and manipulate Unicode characters in JavaScript. I recently revisited it to clean up the code, add tests, and add a few features. The API is pretty simple:

  • CharFunk.getDirectionality(ch) – Used to find the directionality of the character
  • CharFunk.getMatches(string,callback) – Returns an array of contiguous matching strings for which the callback returns true, similar to String.match()
  • CharFunk.isAllLettersOrDigits(string) – Returns true if the string argument is composed of all letters and digits
  • CharFunk.isDigit(ch) – Returns true if provided a length 1 string that is a digit
  • CharFunk.isLetter(ch) – Returns true if provided a length 1 string that is a letter
  • CharFunk.isLetterNumber(ch) – Returns true if provided a length 1 string that is in the Unicode “Nl” category
  • CharFunk.isLetterOrDigit(ch) – Returns true if provided a length 1 string that is a letter or a digit
  • CharFunk.isLowerCase(ch) – Returns true if provided a length 1 string that is lowercase
  • CharFunk.isMirrored(ch) – Returns true if provided a length 1 string that is a mirrored character
  • CharFunk.isUpperCase(ch) – Returns true if provided a length 1 string that is uppercase
  • CharFunk.isValidFirstForName(ch) – Returns true if provided a length 1 string that is a valid leading character for an ECMAScript identifier
  • CharFunk.isValidMidForName(ch) – Returns true if provided a length 1 string that is a valid non-leading character for an ECMAScript identifier
  • CharFunk.isValidName(string,checkReserved) – Returns true if the string is a valid ECMAScript identifier
  • CharFunk.isWhitespace(ch) – Returns true if provided a length 1 string that is a whitespace character
  • CharFunk.indexOf(string,callback) – Returns the first index where the character causes a true return from the callback, or -1 if no match
  • CharFunk.lastIndexOf(string,callback) – Returns the last index where the character causes a true return from the callback, or -1 if no match
  • CharFunk.matchesAll(string,callback) – Returns true if all characters in the provided string result in a true return from the callback
  • CharFunk.replaceMatches(string,callback,ch) – Returns a new string with all matched characters replaced, similar to String.replace()
  • CharFunk.splitOnMatches(string,callback) – Splits the string on all matches, similar to String.split()

This allows you to do some things that are otherwise hard to do in JavaScript. JavaScript RegExps are notoriously weak at dealing with non-ASCII data, because shorthand classes such as \w only cover ASCII letters, digits, and the underscore. For example, imagine you wanted to do something simple like replace all non-word characters with an underscore. This is easy:

"The United States of America".replace(/[^\w]/g,"_");
    //returns "The_United_States_of_America"

Unless of course, you are dealing with non-ASCII letters:

"Российская Федерация".replace(/[^\w]/g,"_"); 
    //returns "____________________"
"جمهورية مصر العربية".replace(/[^\w]/g,"_"); 
    //returns "___________________"

That’s not what we want.
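The reason is that \w is shorthand for the ASCII class [A-Za-z0-9_], so every Cyrillic or Arabic letter falls into the negated class [^\w]. A quick check makes this visible; the `samples` and `wordLike` names are just for illustration:

```javascript
// \w only matches ASCII letters, ASCII digits, and the underscore.
var samples = [
    "A", "z", "7", "_",
    "\u0420", // Cyrillic capital letter ER
    "\u062C"  // Arabic letter JEEM
];

// Keep only the characters that \w treats as "word" characters.
var wordLike = samples.filter(function (ch) {
    return /\w/.test(ch);
});

console.log(wordLike.join("")); // prints "Az7_"
```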

Fortunately, CharFunk can handle this using replaceMatches:

function notLetterOrDigit(ch) {
    return !CharFunk.isLetterOrDigit(ch);
}

CharFunk.replaceMatches("جمهورية مصر العربية",notLetterOrDigit,"_"); 
    // returns "جمهورية_مصر_العربية"

CharFunk.replaceMatches("Российская Федерация",notLetterOrDigit,"_"); 
   //returns "Российская_Федерация"
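
To make the callback convention concrete without needing the library loaded, here is a minimal stand-in. The names `isLetterOrDigitSketch`, `notLetterOrDigitSketch`, and `replaceMatchesSketch` are hypothetical and not part of CharFunk. The predicate assumes an engine with ES2018 Unicode property escapes rather than CharFunk's own character tables, and the loop replaces one UTF-16 code unit at a time, so edge cases may differ from the real functions.

```javascript
// Hypothetical stand-in for CharFunk.isLetterOrDigit (assumes ES2018 \p{...} support).
function isLetterOrDigitSketch(ch) {
    return /^[\p{L}\p{Nd}]$/u.test(ch);
}

function notLetterOrDigitSketch(ch) {
    return !isLetterOrDigitSketch(ch);
}

// Hypothetical stand-in for a callback-driven replace: every character
// for which the callback returns true is swapped for the replacement.
function replaceMatchesSketch(string, callback, replacement) {
    var out = "";
    for (var i = 0; i < string.length; i++) {
        var ch = string.charAt(i);
        out += callback(ch) ? replacement : ch;
    }
    return out;
}

var result = replaceMatchesSketch("Российская Федерация", notLetterOrDigitSketch, "_");
console.log(result);
```

The predicate decides what counts as a match, so the same replace loop works for any character class you can express as a function.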

This is just one small example of what CharFunk can do. I hope that web developers working on international projects (which is pretty much any web app these days) will find this useful!
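
The search functions in the list above follow the same pattern: a string plus a callback that tests one character at a time. The sketch below shows that pattern with hypothetical stand-ins (`indexOfSketch`, `lastIndexOfSketch`, `matchesAllSketch`, and `isWhitespaceSketch` are illustrative names, not CharFunk's own code), using the built-in \s class as the predicate.

```javascript
// Hypothetical predicate standing in for CharFunk.isWhitespace.
function isWhitespaceSketch(ch) {
    return /^\s$/.test(ch);
}

// First index whose character satisfies the callback, or -1 if none does.
function indexOfSketch(string, callback) {
    for (var i = 0; i < string.length; i++) {
        if (callback(string.charAt(i))) return i;
    }
    return -1;
}

// Last index whose character satisfies the callback, or -1 if none does.
function lastIndexOfSketch(string, callback) {
    for (var i = string.length - 1; i >= 0; i--) {
        if (callback(string.charAt(i))) return i;
    }
    return -1;
}

// True only if every character satisfies the callback.
function matchesAllSketch(string, callback) {
    for (var i = 0; i < string.length; i++) {
        if (!callback(string.charAt(i))) return false;
    }
    return true;
}

var text = "The United States of America";
var first = indexOfSketch(text, isWhitespaceSketch);        // 3
var last = lastIndexOfSketch(text, isWhitespaceSketch);     // 20
var allSpaces = matchesAllSketch(text, isWhitespaceSketch); // false
console.log(first, last, allSpaces);
```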
