The Sailing Programmer

animini – A javascript micro-library for tween animations

Posted on January 29, 2012 by Roy Sharon

animini was built with a single purpose in mind: to be the most convenient tween animation library to use, while staying lightweight.

I have managed to keep it below 5K, with no external dependencies. As for being convenient, I believe it is easier to use than any other tween animation library I came across. But check it out for yourself.

Posted in Javascript, Uncategorized | 5 Comments

Batch Rename One-Liner

Posted on May 8, 2011 by Roy Sharon

Following is a simple way to perform batch renaming, using the command line:

ls *.jpg | awk '{print("mv "$1" "$1)}' | sed 's/jpg$/jpeg/' | /bin/sh

If you wish to see the renaming commands before they are performed (just to make sure it does what you want it to do), then simply omit the last pipe:

ls *.jpg | awk '{print("mv "$1" "$1)}' | sed 's/jpg$/jpeg/'

This will show you what it is going to do, without actually doing it.

By the way, the same technique can be used for other purposes. For example, the following command copies the files foo.jpg, bar.jpg and glu.jpg to the subdirectory temp:

echo foo,bar,glu | perl -pe 's/,/\n/g;s/.+/cp $&.jpg temp\//g' | /bin/sh

Posted in Unix Tidbits | Leave a comment

Fast and Secure Remote File Transfer Using The Command Line

Posted on March 1, 2011 by Roy Sharon

One of the fastest ways to transfer files between remote machines is using SSH:

tar -cz *.jpg | ssh <username@host> "tar -xzC <path-to-expand>"

This will create a single compressed archive of all jpg files in the current directory, transfer it securely to the specified host (logging in with the specified username), and then expand the archive and save the files to the specified path. Cool, isn’t it?

From my experiments this is faster than using scp, sftp and rsync. Note, though, that if you make further changes in your local directory after the initial copying to the remote machine, and you wish to sync those changes to the remote directory, then it is faster to use rsync. But the initial copying is faster using SSH.

Posted in Unix Tidbits | 2 Comments

Unicode Numbers In Javascript

Posted on December 1, 2010 by Roy Sharon

Handling Non-ASCII Numerals In Javascript — The Way It Should Have Been Handled

Source code available at uninums on github.

A couple of weeks ago, I ranted about the lack of proper Unicode support in Javascript. Granted, Javascript supports Unicode strings, but if you want to parse such strings to numbers (e.g., the user enters a phone number using Chinese numerals), you will have to handle this yourself. So here is a small utility script that implements five methods for handling non-ASCII numerals in Javascript:

Function	Description
normalDigits(s)	Normalizes string s by replacing all non-ASCII digits with ASCII digits: normalDigits('٠۴६') == '046'; normalDigits('123') == '123'.
normalSpaces(s)	Normalizes string s by replacing all whitespace characters with either a space ('\x20') or a newline ('\n') as appropriate: normalSpaces('Hello\t\rWorld') == 'Hello\x20\nWorld'; normalSpaces('\xA0\u2003') == '\x20\x20'; normalSpaces('\u2028') == '\n'. As a special case, normalSpaces() also replaces a CRLF sequence with a single newline character, so normalSpaces('\r\n') == '\n'.
parseUniInt(s,r)	Returns the integer value at the start of string s, ignoring leading spaces and using radix r. This is equivalent to the behavior of Javascript’s internal parseInt() function, but also handles non-ASCII digits: parseUniInt('٠۴६', 10) == parseInt('046', 10) == 46; parseUniInt('٠۴६') == parseInt('046') == 38 (radix 8 is assumed due to the leading zero in engines that treat it as octal; ECMAScript 5 and later engines return 46 here and in the next example); parseUniInt('٠۴६hello') == parseInt('046hello') == 38; parseUniInt('hello') == parseInt('hello') == NaN.
parseUniFloat(s)	Returns the float value at the start of string s, ignoring leading spaces. This is equivalent to the behavior of Javascript’s internal parseFloat() function, but also handles non-ASCII digits: parseUniFloat('٠۴.६') == parseFloat('04.6') == 4.6; parseUniFloat('٠۴.६hello') == parseFloat('04.6hello') == 4.6; parseUniFloat('hello') == parseFloat('hello') == NaN.
sortNumeric(a)	Sorts array a according to the numeric float values of its items: sortNumeric(['3 dogs','10 cats','2 mice']) returns ['2 mice','3 dogs','10 cats']; sortNumeric(['٣ dogs','١٠ cats','٢ mice']) returns ['٢ mice','٣ dogs','١٠ cats']. Note that using Javascript’s internal sort() function will order '10 cats' before '2 mice' because it is string based rather than numeric.

All of these functions are available in the uninums.js file. You are welcome to use/modify/redistribute it as you see fit.

Let’s Start With The Space Normalization Function

The Javascript Standard published by ECMA states that all of the following Unicode characters should be treated as whitespace:

Code Unit Value	Name
\u0009	Tab
\u000B	Vertical Tab (‘\v’)
\u000C	Form Feed (‘\f’)
\u0020	Space (‘ ‘)
\u00A0	No-break space
\uFEFF	Byte Order Mark
Other category “Zs”	Any other Unicode “space separator”

In version 5.2 of the Unicode Standard, the “Zs” category adds the following characters:

Code Unit Value	Name
\u1680	Ogham Space Mark
\u180E	Mongolian Vowel Separator
\u2000	En Quad
\u2001	Em Quad
\u2002	En Space
\u2003	Em Space
\u2004	Three-per-em space
\u2005	Four-per-em space
\u2006	Six-per-em space
\u2007	Figure space
\u2008	Punctuation space
\u2009	Thin space
\u200A	Hair space
\u202F	Narrow no-break space
\u205F	Medium mathematical space
\u3000	Ideographic space

So the normalSpaces() function should basically replace all occurrences of any of these characters with a simple space (‘\x20’):

function normalSpaces(s) {
   var Zs_and_friends = new RegExp('[\u0009\u000B\u000C\u00A0\u1680\u180E' +
      '\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A' +
      '\u202F\u205F\u3000\uFEFF]', 'g');

   return s ? s.toString().replace(Zs_and_friends, ' ') : s;
}

We would also like to replace line terminators with newline characters. The Javascript Standard says that all of the following should be treated as line terminators:

Code Unit Value	Name
\u000A	Line Feed (‘\n’)
\u000D	Carriage Return (‘\r’)
\u2028	Line separator
\u2029	Paragraph separator

It also says that a CRLF sequence should be treated as a single line terminator.

We want to normalize line terminators as well:

function normalSpaces(s) {
   var Zs_and_friends = new RegExp('[\u0009\u000B\u000C\u00A0\u1680\u180E' +
      '\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A' +
      '\u202F\u205F\u3000\uFEFF]', 'g');

   var line_terminators = new RegExp('\u000D\u000A|[\u000D\u2028\u2029]', 'g');

   return s ? s.toString().replace(Zs_and_friends,' ').replace(line_terminators,'\n') : s;
}
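As a quick sanity check, the sketch below repeats the function above and runs it against the examples from the summary table. Nothing here goes beyond the code already shown, apart from the `examples` array, which is a name made up for this check.

```javascript
function normalSpaces(s) {
   var Zs_and_friends = new RegExp('[\u0009\u000B\u000C\u00A0\u1680\u180E' +
      '\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A' +
      '\u202F\u205F\u3000\uFEFF]', 'g');
   var line_terminators = new RegExp('\u000D\u000A|[\u000D\u2028\u2029]', 'g');
   return s ? s.toString().replace(Zs_and_friends, ' ').replace(line_terminators, '\n') : s;
}

// Each pair is [input, expected output].
var examples = [
   ['Hello\t\rWorld', 'Hello \nWorld'],   // tab becomes a space, lone CR becomes a newline
   ['\xA0\u2003', '  '],                  // no-break space and em space
   ['\u2028', '\n'],                      // line separator
   ['\r\n', '\n']                         // CRLF collapses to a single newline
];

for (var i = 0; i < examples.length; ++i) {
   console.log(normalSpaces(examples[i][0]) === examples[i][1]);   // true for every pair
}
```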

Implementing The Digit Normalization Function

The normalDigits() function is implemented in a similar manner to the normalSpaces() function, with one difference — it uses 10 regular expressions, one for each digit:

function normalDigits(s) {
   if (!s) return s;
   s = s.toString();
   for (var i = 0; i <= 9; ++i) s = s.replace(Nd[i], i);
   return s;
}

The Nd variable is an array which contains 10 elements, each of which is a regular expression matching all Unicode characters that represent the same decimal digit. (I will not list them here as they are long, but you can find their definition in uninums.js.) All in all, they amount to 411 characters.
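To give a feel for what such an array looks like, here is a minimal sketch that builds a partial Nd table for just three scripts. This is not the table from uninums.js: `ZERO_CODE_POINTS` and `buildNd` are names invented for this example, and the sketch relies on the assumption that the ten digits of each listed script occupy consecutive code points starting at its zero digit.

```javascript
// Hypothetical, partial stand-in for the Nd table in uninums.js.
// Each entry is the code point of a script's ZERO digit:
// Arabic-Indic, Extended Arabic-Indic and Devanagari.
var ZERO_CODE_POINTS = [0x0660, 0x06F0, 0x0966];

// Builds ten regular expressions; the d-th one matches digit d in every listed script.
function buildNd(zeros) {
   var nd = [];
   for (var d = 0; d <= 9; ++d) {
      var chars = '';
      for (var k = 0; k < zeros.length; ++k) chars += String.fromCharCode(zeros[k] + d);
      nd.push(new RegExp('[' + chars + ']', 'g'));
   }
   return nd;
}

var Nd = buildNd(ZERO_CODE_POINTS);

function normalDigits(s) {
   if (!s) return s;
   s = s.toString();
   for (var i = 0; i <= 9; ++i) s = s.replace(Nd[i], i);
   return s;
}

console.log(normalDigits('\u0660\u06F4\u096C'));   // prints 046
console.log(normalDigits('123'));                  // prints 123
```

The real table has to be spelled out by hand because a full list of Nd characters cannot be derived this simply; the sketch only shows the shape of the data.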

Implementing The Parse Functions

Having implemented the normalization functions, most of the hard work has already been done. We can now easily implement the parseUniInt() and parseUniFloat() functions:

function parseUniInt(s, radix) {
   return parseInt(s && typeof(s) != 'number' ? normalDigits(normalSpaces(s.toString())) : s, radix);
}

function parseUniFloat(s) {
   return parseFloat(s && typeof(s) != 'number' ? normalDigits(normalSpaces(s.toString())) : s);
}

Note that if s is not already a number, we should first convert it to a string using the toString() function, then normalize it with our normalization functions, and finally pass the resulting string to the standard parseInt() or parseFloat().

It is clear why we normalize the digits, but why normalize the spaces as well? Well, according to the Javascript Standard, parseInt() and parseFloat() should strip leading spaces before parsing commences, so we’re normalizing them just in case the Javascript engine does not understand non-ASCII spaces.
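Here is a small sketch of the call chain at work on input padded with non-ASCII spaces. To keep it short it uses cut-down stand-ins for the two normalization functions: `normalSpacesLite` and `normalDigitsLite` are hypothetical names that cover only a handful of characters, so this illustrates the idea rather than the actual uninums.js code.

```javascript
// Cut-down stand-in (an assumption for this sketch): only the no-break space,
// em space and ideographic space are normalized.
function normalSpacesLite(s) {
   return s.replace(/[\u00A0\u2003\u3000]/g, ' ');
}

// Cut-down stand-in: only Arabic-Indic digits (U+0660..U+0669) are mapped to ASCII.
function normalDigitsLite(s) {
   return s.replace(/[\u0660-\u0669]/g, function (c) {
      return String(c.charCodeAt(0) - 0x0660);
   });
}

function parseUniInt(s, radix) {
   return parseInt(s && typeof(s) != 'number' ? normalDigitsLite(normalSpacesLite(s.toString())) : s, radix);
}

function parseUniFloat(s) {
   return parseFloat(s && typeof(s) != 'number' ? normalDigitsLite(normalSpacesLite(s.toString())) : s);
}

console.log(parseUniInt('\u2003\u00A0\u0664\u0662 years', 10));   // prints 42
console.log(parseUniFloat('\u3000\u0663.\u0665'));                // prints 3.5
console.log(parseUniInt(7, 10));                                  // prints 7 (numbers pass straight through)
```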

Implementing The Sort Function

Using the parseUniFloat() function, it is very easy to implement the sortNumeric() function:

function sortNumeric(array) {
   return array.sort(function(a,b) {
      var va = parseUniFloat(a), vb = parseUniFloat(b);
      return isNaN(va) ? -1 : isNaN(vb) ? 1 : va < vb ? -1 : va == vb ? 0 : 1;
   });
}

Javascript’s sort() function can receive an argument which should be a comparator function. This comparator function receives two arguments, a and b, and should return 1 if a is bigger than b, (-1) if a is smaller than b, and 0 if a equals b. So we simply implement such a function using our parseUniFloat() function to get the float value of the string arguments.
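The difference between the two orderings is easy to see side by side. The sketch below is self-contained, so it swaps in a simplified `parseUniFloat()` that only understands ASCII and Arabic-Indic digits; that stand-in is an assumption of this example, not the uninums.js version.

```javascript
// Simplified stand-in for parseUniFloat(): handles ASCII and Arabic-Indic digits only.
function parseUniFloat(s) {
   var ascii = String(s).replace(/[\u0660-\u0669]/g, function (c) {
      return String(c.charCodeAt(0) - 0x0660);
   });
   return parseFloat(ascii);
}

function sortNumeric(array) {
   return array.sort(function(a,b) {
      var va = parseUniFloat(a), vb = parseUniFloat(b);
      return isNaN(va) ? -1 : isNaN(vb) ? 1 : va < vb ? -1 : va == vb ? 0 : 1;
   });
}

// "3 dogs", "10 cats", "2 mice" written with Arabic-Indic digits.
var pets = ['\u0663 dogs', '\u0661\u0660 cats', '\u0662 mice'];

var byText = pets.slice().sort();           // string order: 10 cats, 2 mice, 3 dogs
var byValue = sortNumeric(pets.slice());    // numeric order: 2 mice, 3 dogs, 10 cats

console.log(byText.join(', '));
console.log(byValue.join(', '));
```

Both calls work on a copy made with slice(), because sort() reorders the array in place.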

Conclusion

The utility functions included in uninums.js are useful for developing internationalized web applications. However, they are not as fast as they would have been had they been implemented inside the Javascript engine. As I have written before, the Javascript Standard is gravely lacking in its required support for Unicode. I do hope that future versions of the Javascript Standard fix this. In the meanwhile, we have to resort to other means, such as uninums.js. I hope you find this useful for your applications.

Have you ever developed an international web application and dealt with these challenges?
I would love to hear about your experiences.

Posted in Javascript | Tagged Javascript, tip, Unicode | 8 Comments

Javascript & Unicode: The Unconsummated Marriage

Posted on November 15, 2010 by Roy Sharon

Why We Need Better Unicode Support for Javascript, And What Can Be Done About It

One of the smartest things ECMA did was requiring Javascript engines to use Unicode strings. This has important implications beyond the simple ability to represent any character in any language. It actually enables important new functionality, which I will explain shortly.

But first, let us meet the participants in this marriage.

The Groom: Unicode

The Unicode Standard comes with ample dowry: some 246,877 characters, covering 90 world scripts (as of version 5.2). This is an impressive database, and the Unicode Consortium has even supplied it with several indexing options. One of these indices is the General Category of each character, which tells us what type of character it is: Uppercase Letter (Lu), Lowercase Letter (Ll), Decimal Number (Nd), Punctuation, or any of the additional 26 categories.

Unicode also has a hidden treasure — the Numeric Value Property:

Codepoint	Character	Name	Numeric Value Property
U+0031	1	DIGIT ONE	1
U+0032	2	DIGIT TWO	2
U+0033	3	DIGIT THREE	3
…
U+0661	١	ARABIC-INDIC DIGIT ONE	1
U+0662	٢	ARABIC-INDIC DIGIT TWO	2
U+0663	٣	ARABIC-INDIC DIGIT THREE	3
…
U+2155	⅕	VULGAR FRACTION ONE FIFTH	0.2
U+2156	⅖	VULGAR FRACTION TWO FIFTHS	0.4
U+2157	⅗	VULGAR FRACTION THREE FIFTHS	0.6
U+2158	⅘	VULGAR FRACTION FOUR FIFTHS	0.8

Although not all characters have a numeric value property, all numerals, number letters (e.g., Roman numerals, such as VII or IX), ideographic numbers and others are associated with a numeric value.

The implication is that any program wishing to support numerals in non-Latin scripts can easily do so by simply using the numeric value property from the Unicode table. This can be applied to:

numerically sorting arrays
identifying digits that are to be dialed by a mobile application
validating and interpreting user input
a variety of additional practical uses

Kudos to the guys at the Unicode Consortium! This is really excellent work.

The Bride: Javascript

Having endured my fair share of suffering as a result of handling international languages in C++ and other golden oldie programming languages and OSs (does wchar_t ring a bell, anyone?), I truly appreciate the fact that Javascript uses Unicode for its internal string representation.

However, this is as far as this bride is willing to go. If you actually try to use Javascript’s Unicode capabilities in real applications, you will find yourself banging your head against the wall with every new step you take.

For example, have a look at the following input validation function for an “Age” field:

function validateAge(value) {
   return /^\d+$/.test(value);
}

The regular expression used by this function ensures that all characters of the supplied value are digits. This is achieved by ensuring that the entire string matches \d+, which in turn matches one or more digits. (The ^ sign and the $ sign are anchored to the start and end of the tested string respectively.) This is fine and dandy when dealing with ASCII digits. But what happens when the user inserts Indic or Arabic digits? Will it still work?

The answer is, if you test this validation function with Arabic/Indic digits on any major browser, the validation will fail. The validation function will reject the input, although it is of course perfectly valid. The ECMAScript Standard explains why this happens (section 15.10.2.12):
“The production CharacterClassEscape :: d evaluates by returning the ten-element set of characters containing the characters 0 through 9 inclusive.”

In plain English, \d only matches ASCII digits, which means that \d is equivalent to [0-9]. We could have just written:

function validateAge(value) {
   return /^[0-9]+$/.test(value);
}

To include the Arabic/Indic numerals, we also need to add the U+0660 – U+0669 range, as follows:

function validateAge(value) {
   return /^[0-9\u0660-\u0669]+$/.test(value);
}
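A short comparison shows the effect of the extra range. The two function names below are chosen just for this sketch, so that both variants can live side by side.

```javascript
// The original check: \d covers ASCII digits only.
function validateAgeAscii(value) {
   return /^\d+$/.test(value);
}

// The extended check: ASCII digits plus Arabic-Indic digits (U+0660..U+0669).
function validateAgeArabic(value) {
   return /^[0-9\u0660-\u0669]+$/.test(value);
}

var arabicIndic = '\u0664\u0662';   // "42" in Arabic-Indic digits
var devanagari = '\u096A\u0968';    // "42" in Devanagari digits

console.log(validateAgeAscii('42'), validateAgeArabic('42'));                 // true true
console.log(validateAgeAscii(arabicIndic), validateAgeArabic(arabicIndic));   // false true
console.log(validateAgeAscii(devanagari), validateAgeArabic(devanagari));     // false false
```

The last line is the catch: every additional script needs its own range added by hand.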

Javascript’s (Lack of) Support of Unicode Regex

Although this might be a good solution for Arabic, what if we want to support other languages and scripts that have their own numerals, such as Bengali, Thai, Lao, Tibetan and Myanmar? It would make sense to use the Unicode General Category “Number, Decimal Digit” (“Nd” for short) mentioned above. This category includes 411 characters, all of them decimal digits from the world’s scripts. And since Javascript is Unicode based, one would expect the regular expression \d to match all of the characters in the Nd category.

Unfortunately, this is not the case. It turns out that Javascript’s regular expressions are only halfway Level 1 compliant with the Unicode Regular Expression Standard. At its most basic, Level 1 means that the regular expression engine can deal with Unicode characters and match these characters based on their hexadecimal values (implemented in Javascript via the \u escape sequence).

So let me spell it out: Level 1 support does not actually provide a great deal of support for handling international scripts. In the Standard’s own words:

“Level 1 is the minimally useful level of support for Unicode. All regex implementations dealing with Unicode should be at least Level 1.”

But Javascript is not even Level 1 conformant, as the Standard also explicitly requires the handling of character classes based on the character’s General Category. Had it met this requirement, Javascript would probably allow something like [:Nd:] or \p{Nd} to match decimal numerals. Then we could write our validation function as follows:

function validateAge(value) {
   return /^[:Nd:]+$/.test(value);
}
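For readers on a newer engine: ECMAScript 2018 added Unicode property escapes to regular expressions, so where those are available the wished-for check can be written with \p{Nd} and the u flag. The sketch below assumes such an engine, and `validateAgeUnicode` is a name made up for this example.

```javascript
// Requires an engine with ES2018 Unicode property escapes; the "u" flag is mandatory for \p{...}.
function validateAgeUnicode(value) {
   return /^\p{Nd}+$/u.test(value);
}

console.log(validateAgeUnicode('42'));                   // true: ASCII digits
console.log(validateAgeUnicode('\u0660\u06F4\u096C'));   // true: Arabic-Indic, Extended Arabic-Indic, Devanagari
console.log(validateAgeUnicode('4x'));                   // false: "x" is not a decimal digit
```

On engines without this feature the regular expression literal is a syntax error, so the explicit ranges shown earlier remain the fallback.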

Unicode-Style parseInt() and parseFloat()

What about parseInt()? This Javascript function basically takes a string argument and converts it into an integer. For example, if we want to calculate the year of birth based on the Age field, we could implement something like this:

function getBirthYear(age) {
   // getFullYear() returns the four-digit year; always pass the radix to parseInt()
   return new Date().getFullYear() - parseInt(age, 10);
}

We can call this while supplying the Age field content:

getBirthYear(ageField.value)

Assuming ageField is an INPUT field, we would obtain its value as a string, and getBirthYear() would convert it into an integer using the parseInt() global function. However, this does not work when the user enters the age using non-Latin numerals.
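The failure is silent, which makes it worse. A minimal sketch, using getFullYear() for the four-digit year and a hard-coded string in place of the field value:

```javascript
function getBirthYear(age) {
   return new Date().getFullYear() - parseInt(age, 10);
}

var asciiAge = '30';
var arabicIndicAge = '\u0663\u0660';   // "30" typed with Arabic-Indic digits

console.log(getBirthYear(asciiAge));         // a four-digit year
console.log(getBirthYear(arabicIndicAge));   // NaN, because parseInt() finds no ASCII digit to start from
```

No exception is thrown; the NaN simply propagates into whatever is computed next.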

Theoretically, by using the numeric value property supplied by the Unicode Standard, it should be possible to create a parseUniInt() function – a sibling of the standard parseInt() that also handles non-Latin numerals. The same goes for parseFloat(). It would be extremely convenient to obtain the numeric value of a vulgar fraction that happens to be represented by a Unicode character:

Codepoint	Character	Name	Numeric Value Property
U+2155	⅕	VULGAR FRACTION ONE FIFTH	0.2
U+2156	⅖	VULGAR FRACTION TWO FIFTHS	0.4
U+2157	⅗	VULGAR FRACTION THREE FIFTHS	0.6
U+2158	⅘	VULGAR FRACTION FOUR FIFTHS	0.8

Unfortunately, Javascript does not actually support this.
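Until an engine exposes this property, the only option I can see is to carry the relevant slice of the Unicode table along with the application. Here is a minimal sketch covering only the four fractions from the table above; `NUMERIC_VALUES` and `numericValue` are hypothetical names, not part of any standard API.

```javascript
// Hand-copied slice of the Unicode Numeric Value property (the four rows listed above).
var NUMERIC_VALUES = {
   '\u2155': 0.2,   // VULGAR FRACTION ONE FIFTH
   '\u2156': 0.4,   // VULGAR FRACTION TWO FIFTHS
   '\u2157': 0.6,   // VULGAR FRACTION THREE FIFTHS
   '\u2158': 0.8    // VULGAR FRACTION FOUR FIFTHS
};

// Returns the numeric value of a single character, or NaN if the table has no entry for it.
function numericValue(ch) {
   return Object.prototype.hasOwnProperty.call(NUMERIC_VALUES, ch) ? NUMERIC_VALUES[ch] : NaN;
}

console.log(numericValue('\u2156'));   // prints 0.4
console.log(numericValue('x'));        // prints NaN
```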

Sorting The Unicode Way

Another useful feature of the Numeric Value Unicode property is the ability to sort arrays in numerical order, instead of textual order. To illustrate the problem, let us consider the following example:

var melting = ["2300ºF Manganese", "1946ºF Gold", "786ºF Zinc", "450ºF Tin"];
melting.sort();
alert(melting.join(',')); // displays: 1946ºF Gold,2300ºF Manganese,450ºF Tin,786ºF Zinc

Note that Javascript performs textual sorting by default. This means that “450ºF Tin” is placed after “2300ºF Manganese”, and “786ºF Zinc” is placed last. In order to sort by numeric value, we need to supply our own comparison function, which should return 1, 0, or -1, according to the relative order between its arguments a and b:

melting.sort(function(a, b) {
   var i = parseInt(a, 10), j = parseInt(b, 10);
   return i > j ? 1 : i == j ? 0 : -1;
});
alert(melting.join(',')); // displays: 450ºF Tin,786ºF Zinc,1946ºF Gold,2300ºF Manganese

It is easy to do this with numbers written with ASCII digits, but what about Thai numerals? Again, if the Javascript engine were to supply a parseUniInt() function, this would be a piece of cake:

var melting = ["????º? ??????????", "????º? ???", "???º? ???????", "???º? ?????"];
melting.sort(function(a, b) {
   var i = parseUniInt(a), j = parseUniInt(b);
   return i > j ? 1 : i == j ? 0 : -1;
});
alert(melting.join(',')); // displays: ???º? ?????,???º? ???????,????º? ???,????º? ??????????

An Unanswered Call From The Standard Committee

Note that Javascript’s regular expression syntax is governed by the ECMA Standard. It therefore seems that ECMA is the party responsible for not going all the way in Javascript’s marriage to Unicode. I sincerely hope this will be corrected in future versions of the ECMAScript Standard.

That being said, it seems that the Javascript engine implementers are also at fault. Section 2 of the Standard explicitly states the following:

“A conforming implementation of ECMAScript is permitted to support program and regular expression syntax not described in this specification.”

This is an open call from the Standard Committee to the Javascript engine implementers. I wonder why this call was never answered. It might be because the people who implement these engines are not the people who later use them to build international web applications.

As both a Javascript developer and a user, I would like to say that we need full Level 1 support for the Javascript regex engine. We need Unicode-aware parseInt() and parseFloat() in Javascript. These will enable application developers to make their apps useful for international users — users who are rapidly becoming the majority of the web audience. We need to make their experience as local and convenient as that of English-speakers.

What about you? Have you ever developed an international web application and dealt with these challenges? Please share your experiences.

Posted in Javascript | Tagged Javascript, rant | Leave a comment

Scrolling The Selection Into View

Posted on September 19, 2010 by Roy Sharon

Scrolling Text Nodes And Ranges Into View In A Cross-Browser Manner

Scrolling an HTML element into view can easily be achieved using the scrollIntoView() function. But what about text nodes, ranges and selections?

var o = document.getElementById('foo');
o.scrollIntoView();

The scrollIntoView() function is required by the W3C CSSOM View specification, and is widely supported on all modern browsers. However, the W3C specification says nothing about the scrolling of non-elements, such as text nodes.

var div = document.createElement('DIV');
div.innerHTML = 'foo';
var o = div.firstChild;		// o is now a TextNode
alert(o.nodeValue);		// displays "foo"
o.scrollIntoView();		// fails, because TextNodes do not support it

If we want to scroll the content of the div into view, we can call div.scrollIntoView(), which should work just fine. However, if we make this call and the div’s content is longer than the window’s height, the window scroll will align the top of the div with the top of the window, but will cause the bottom of the div to be hidden.

It seems that there is no easy way to scroll a TextNode into view. There is no way of finding out the actual position of a TextNode (no offsetTop/offsetLeft properties, and no getBoundingClientRect() method), so we have no way of knowing if it is visible to the user or not.

Solution

I have played with this a bit and come up with a relatively simple method for achieving reasonable results. It doesn’t always work, but in most cases it does do the job:

function scrollIntoView(t) {
   if (!t || typeof(t) != 'object') return;

   // if t is not an element node, we need to skip back until we find the
   // previous element with which we can call scrollIntoView()
   var o = t;
   while (o && o.nodeType != 1) o = o.previousSibling;
   t = o || t.parentNode;
   if (t) t.scrollIntoView();
}

The nodeType property tells us whether the supplied node is an element node (nodeType=1) or a different type of node (typically a text node, with nodeType=3). In the latter case, we try to move backwards through the document hierarchy in order to find the closest element node. When we find this element, we can use it to call scrollIntoView().
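The walk itself can be tried outside a browser. In the sketch below, `closestElement` is a hypothetical helper that performs the same search as the function above but returns the element instead of scrolling to it, and the "nodes" are plain objects that model only the three properties the walk reads.

```javascript
// Same search as in scrollIntoView() above, minus the final DOM call.
function closestElement(t) {
   if (!t || typeof(t) != 'object') return null;
   var o = t;
   while (o && o.nodeType != 1) o = o.previousSibling;   // skip back over non-element siblings
   return o || t.parentNode || null;                     // fall back to the parent
}

// Mock of <p><b>bold</b>plain text</p> (the "name" property is only there for printing).
var p = { nodeType: 1, name: 'P', previousSibling: null, parentNode: null };
var b = { nodeType: 1, name: 'B', previousSibling: null, parentNode: p };
var textAfterB = { nodeType: 3, previousSibling: b, parentNode: p };

// Mock of a text node that is the only child of the paragraph.
var onlyChildText = { nodeType: 3, previousSibling: null, parentNode: p };

console.log(closestElement(textAfterB).name);      // prints B: the previous sibling element
console.log(closestElement(onlyChildText).name);   // prints P: no element sibling, so the parent
console.log(closestElement(b).name);               // prints B: elements are returned as they are
```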

Scrolling The Selection Into View

The Selection object is composed of a collection of Range objects. A Range object may span several sibling nodes. To be able to scroll the selection into view, we need to get hold of one of its ranges (typically the first one) and collapse it to its starting point, so as to obtain a single node. We then use the same technique to scroll this node into view:

function scrollIntoView(t) {
   if (!t || typeof(t) != 'object') return;

   if (t.getRangeAt) {
      // we have a Selection object
      if (t.rangeCount == 0) return;
      t = t.getRangeAt(0);
   }

   if (t.cloneRange) {
      // we have a Range object
      var r = t.cloneRange();	// do not modify the source range
      r.collapse(true);		// collapse to start
      t = r.startContainer;
      // if start is an element, then startOffset is the index of the child
      // at which the range starts; when the range starts after the last
      // child there is no such node, so we keep the element itself
      if (t.nodeType == 1 && t.childNodes[r.startOffset]) t = t.childNodes[r.startOffset];
   }

   // if t is not an element node, then we need to skip back until we find the
   // previous element with which we can call scrollIntoView()
   var o = t;
   while (o && o.nodeType != 1) o = o.previousSibling;
   t = o || t.parentNode;
   if (t) t.scrollIntoView();
}

You can call it with a Selection object, a Range object, an Element or a TextNode, and it will usually do the job and scroll the given item into view:

scrollIntoView(window.getSelection());					// Selection
scrollIntoView(window.getSelection().getRangeAt(0));			// Range
scrollIntoView(document.getElementById('foo'));				// Element
scrollIntoView(document.getElementsByTagName('SPAN')[0].firstChild);	// TextNode
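The Range step is the least obvious part, so here is a browser-free sketch of it. `startNode` is a hypothetical helper that mirrors the Range branch of the function above, and the range and nodes are plain objects carrying only the properties that branch reads. The guard for a start offset past the last child is my own addition for this sketch.

```javascript
// Resolves a (mock) range to the single node at which it starts.
function startNode(range) {
   var t = range.startContainer;
   // For an element container, startOffset is a child index rather than a character offset.
   if (t.nodeType == 1 && t.childNodes[range.startOffset]) t = t.childNodes[range.startOffset];
   return t;
}

// Mock of <div>first<span></span></div>.
var firstText = { nodeType: 3, label: 'text' };
var span = { nodeType: 1, label: 'span', childNodes: [] };
var div = { nodeType: 1, label: 'div', childNodes: [firstText, span] };

console.log(startNode({ startContainer: div, startOffset: 1 }).label);         // prints span
console.log(startNode({ startContainer: firstText, startOffset: 3 }).label);   // prints text
console.log(startNode({ startContainer: div, startOffset: 2 }).label);         // prints div (offset is past the last child)
```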

As I mentioned earlier, this usually works – but not always. I have found that it doesn’t work correctly when the TextNode is very long and not split up by any elements (not even BRs). However, this is a relatively rare case. If you use even a little formatting in your content, you will probably do just fine.

You are welcome to download the scrollIntoView.js file with the complete implementation. Note that it includes one feature not mentioned here — the handling of BR elements. If the selection is a BR element, then in some browsers, the scrollIntoView() implementation listed above will scroll past the BR. I have added special handling for such cases in the final function in this file.

Posted in Javascript | Tagged Javascript, tip | 5 Comments

Unix Tidbit: Most Requested Web Resources

Posted on August 15, 2010 by Roy Sharon

This is a quick and dirty way to find the most requested static resources of your site (e.g., images or javascript files), using Unix shell commands:

grep -oE "GET /[^ \t]+\.(gif|png|js|css)" /var/log/httpd/access_log \
    | sort | uniq -c | sort -nr

Use the correct path to your system’s HTTP server access log.

You can of course also add or replace extensions, to your heart’s content.
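If the pipeline looks cryptic, the following sketch spells out what its stages do, in Javascript rather than shell. `topResources` is a name invented for this example and the log lines are made up; the regular expression is the one passed to grep above.

```javascript
// Counts matching requests and returns [count, request] rows, most requested first.
function topResources(logText) {
   var pattern = /GET \/[^ \t]+\.(gif|png|js|css)/g;   // same pattern as the grep command
   var counts = {};
   var lines = logText.split('\n');                    // grep works line by line
   for (var i = 0; i < lines.length; ++i) {
      var matches = lines[i].match(pattern) || [];     // like grep -o: keep only the matched part
      for (var k = 0; k < matches.length; ++k) {
         counts[matches[k]] = (counts[matches[k]] || 0) + 1;   // like sort | uniq -c
      }
   }
   var rows = [];
   for (var key in counts) {
      if (Object.prototype.hasOwnProperty.call(counts, key)) rows.push([counts[key], key]);
   }
   rows.sort(function (a, b) { return b[0] - a[0]; });   // like sort -nr
   return rows;
}

// Made-up access log excerpt, for illustration only.
var sampleLog = [
   '"GET /img/logo.png HTTP/1.1" 200',
   '"GET /js/app.js HTTP/1.1" 200',
   '"GET /index.html HTTP/1.1" 200',
   '"GET /img/logo.png HTTP/1.1" 304',
   '"GET /js/app.js HTTP/1.1" 200',
   '"GET /img/logo.png HTTP/1.1" 304'
].join('\n');

var top = topResources(sampleLog);
for (var n = 0; n < top.length; ++n) console.log(top[n][0] + ' ' + top[n][1]);
// prints:
// 3 GET /img/logo.png
// 2 GET /js/app.js
```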

Posted in Unix Tidbits | Tagged Unix | Leave a comment

Unix Tidbit: Color Coded Directory Listing

Posted on July 30, 2010 by Roy Sharon

Each time I move onto a new Unix machine (I recently bought a new iMac!), I change the look of the terminal directory listings from the plain default to a color-coded one.

It’s quite simple — I edit one of the bash config files (any of the files will do, but I usually choose /etc/bashrc), and add the following line:

alias ls="ls -aGF"

That’s it!

Posted in Unix Tidbits | Tagged Unix | Leave a comment
