The Default Namespace
Byron complains about what he calls a limitation of XPath. It’s not actually a limitation of XPath at all but rather a very common mistake people make when working with XML namespaces. Lets take a tour into the dark depths of XML namespaces to discover what’s really going on. Originally, XML didn’t have namespaces at all, every element was identified purely by it’s name. So the element <html> was known as html and the world was simple. Then people discovered that they wanted to combine XML documents and that quite often they’d wind up with two elements called html that had completely different meanings and uses – ie: they were actually two different elements. To solve this problem, the clever folk over at the W3C changed the way XML elements and attributes were named. Now instead of a simple string, elements would be named using a QName (short for Qualified Name). A QName is a compound data object consisting of an URI and the regular name we’re used to. Now the important thing to note in this distinction is that it is impossible to refer to an element only by it’s local-name, you have to use a QName to refer to it and that QName must have a namespace attribute (all QName’s do). It is however possible to assign nil to the namespace attribute of a QName, which is referred to as an element in the nil namespace. Now, one of the important things about adding namespaces to XML was backwards compatibility so there had to be some way to assign a namespace to all those elements which were previously referred to only by their local-name. This is where the default namespace comes in (it also happens to be quite convenient). By default in an XML document, any element that doesn’t have a prefix to declare which namespace it is in, is assigned the “default namespace”. The default namespace however isn’t an actual namespace, it’s just a default value for the namespace attribute. By default, the default namespace is the nil namespace. Now, in XML you can specify what the default namespace is by adding an xmlns attribute, eg: xmlns="http://www.w3.org/1999/xhtml". The new value for the default namespace is then in effect for that element and any element under it (unless it’s changed again). It is important to note here that an XML document doesn’t have a default namespace, but rather each element has a value which it inherits (attributes never ever use the default namespace). There can therefore be as many different values for the default namespace as there are elements. So if we were to add Byron’s idea of matching any element in the default namespace, it would never match anything because every element (and attribute) would have an explicit namespace. Worse still, the default namespace would change depending on which element we were in and what the specific representation of the XML was (maybe the default namespace was left as nil and every element used a prefix or maybe no elements were prefixed and the default namespace was changed all through the document). Depending on the representation of XML instead of the actual data it represents is very bad practice and will cause problems. This leads right into the other common mistake in Byron’s post: //*[name() == 'foo'] will do nothing particularly useful. What it will do is match any element with a local-name of foo, in any namespace as long as the element was represented without a prefix. It will not match <my:foo xmlns:my="http://www.intencha.com/my" /> because name() for my:foo will return my:foo. Elements are the same if they have the same QName regardless of whether or not a prefix used or if different prefixes were used. The correct way to select any element with a local-name of foo in any namespace regardless of what prefix was used is: //*[local-name() = 'foo'] If I were to take a shot at selecting any node which had a namespace-uri the same as the default namespace in effect in the original representation of the XML document’s root node it would be: //*[string(/*/namespace::*[name() = ""]) = namespace-uri()] Which is to say:
The Curse Of Testing Text
One of the major challenges in my job is testing our product. Now most people think that testing is reasonably easy but requires discipline, this is not true if the product you write happens to be a styled text editor and it's nearly impossible to do really well if you're working with something as flexibly defined as HTML.
The problems start at the unit testing level. Try taking the standard JTextPane class and writing unit tests for it. How do you test that it can render a HTML list correctly? You could write a test to make sure that the list numbering at least comes out in order and in the right format but that still won't guarantee that the list numbers actually paint correctly. For instance, we recently had a bug where the list numbers painted correctly if the list fit on screen, but if it required scrolling, whatever item was at the top of the screen was numbered 1 even if it was actually the third item in the list. Our list numbering unit tests had no way of picking up on that because it was the actual rendering code (which we didn't write) that was wrong.
Object.equals()
Andrae Muys provides some excellent advice on implementing Object.equals() however I do have to correct one thing. Andrae presents the code:
class A {
int n1;
public boolean equals(Object o) {
if (!(o instanceof A)) {
return false;
} else {
A rhs = (A)o;
return this.n1 == rhs.n1;
}
}
}
and suggests that it is incorrect. It is not. This is absolutely, 100% the correct way to implement equals for that class. The alternative he presents on the other hand is incorrect:
Riverfire
I was fortunate enough to be invited out to a friends place to see the fireworks last night. They happen to own the penthouse apartment on the river front with an awesome view of nearly all the fireworks. The fireworks are launched from numerous sites along the river so being able to see all of them is really quite unusual. Better yet though, one of the launching barges is positioned directly in front on the balcony as if it were a private show just for us. That particular barge this year had some issues launching it’s fireworks. There were about three sections where all the other barges launched fireworks in unison but “our barge” sat there doing nothing. Apparently it was unplanned because at the end our barge suddenly started firing off every firework it had missed all at once. A display that was intended to take about 10 minutes was sent up in about a minute flat. Extremely impressive! Also impressive was the dump and burn which was close enough to feel the heat hit you and felt like you could just reach out and catch the plane. I’ve never been overly impressed by fly-bys when standing at ground level at South Bank but from the penthouse it really is quite an experience. Oh, and Iain has photos.
When Marketing Goes Wrong
I’m currently wearing one of the shirts that James Gosling hurled into the crowd by various means which depicts Duke aiming a rocket launcher at a weird looking demon with four arms labeled complexity. Earlier today I was accosted by a very young man who asked what that was on my shirt. When I pointed to Duke and said “this guy’s called Duke” the response was: “he’s crap. I like this guy ‘cause he has four arms”. I’m not sure that’s the reaction the designer was after….
We Will Rock You
And they did. Went to see We Will Rock You – the musical by Queen and Ben Elton last night and it was sensational. The songs fit into the story line brilliantly and the story itself was interesting and not just an excuse for singing the songs. The constant use of song references as bad puns just added to the experience for me and it was particularly impressive to see the customization for the Australian audience. The lead bohemian was titled after the great rock singer from the past “John Farnam” and when captured he was told – “This really is the last time”. That’s the kind of joke that flowed through the night and all of them went down extremely well. The music itself was very loud and very energetic. Unless you are an absolute die-hard, noone can match Freddy Mercury type of person you’ll appreciate the vocal talent that performed the extremely difficult songs Queen put together. Definitely worth seeing.
Just When You Thought It Was Safe…
Just when you thought it was safe to turn the TV on again, Young Talent Time makes a come back. The worst part is that the Minogue sisters have promised to appear on the show – any hope that some actual talent may be found is lost…
Amazon Goodness
I have slightly obscure tastes in music – particularly, I like musicals, not the highlights CDs the full recording of the original cast. It’s certainly not the most obscure taste in music but it does lead to an awful lot of trouble tracking down what I want and worse still I know what I want ahead of time unlike most people with really obscure tastes who just stumble across things they like. Coming back to the point though, you can’t just walk into HMV or pretty much any music store that I’ve found and pick up a copy of the original 1986 cast recording of The Phantom of The Opera or Miss Saigon or Les Miserables. However with this new fangled technology intarweb thingy I can head on over to Amazon and order it from there. There’s a bunch of other online stores around that may or may not have what I want for prices roughly equal to Amazon but what I love about Amazon is watching it try to predict my buying habits. I know most people freak out about privacy violations when computer systems start gathering data about them but with Amazon it’s like a fun game. It managed to pick that I was looking to purchase Miss Saigon and The Phantom Of The Opera the last time I went there and offered a package deal on them. It’s recommendations are also really quite good – including detecting that I tend to buy the versions that Lea Salonga is in (she tends to be part of the original cast of a lot of big musicals). Sadly, it doesn’t seem to have worked out that I only buy CDs from them as it keeps offering books and sheet music. While I do tend to buy a fair bit of sheet music of musicals I won’t purchase it without first flicking through it to make sure I have a chance of being able to play it. The big downside of buying from Amazon though is I have to wait two and a half weeks for things to arrive (that or pay an extra arm for postage).
That Pesky Caps Lock
Tor Norbye politely requests that the caps lock key be removed and the control key put there instead. There’s one very good reason why that shouldn’t be done: Everyone (except old school UNIX geeks) is used to the control key being where it is. Moving the control key would seriously annoy people. If you’re one of the people who are used to control being next to ‘a’ then imagine the whole world being as annoyed as you every time they use a computer and find that control is in the “wrong” place. More importantly though, putting control beside ‘a’ isn’t a good place anyway. The little finger is the most difficult finger to control on the human hand and is used least commonly. In touch typing, currently the left little finger is positioned over ‘a’ and moves up for ‘q’ and ‘z’. If you’re British or Australian, ‘q’ and ‘z’ are incredibly uncommon letters (American’s customi_z_ed their language by putting a bunch of Zs in). Now think of the most common keyboard shortcuts used on computers these days (think Windows users, not emacs users):
Pointless Schemas
There seems to be a growing trend for projects to use XML configuration files – fine. There seems to be a growing trend for those projects to provide a schema for those files – good. There seems to be a growing trend for those projects never to validate their configuration files against the schema – bad. As I’ve previously mentioned, my job involves creating an XML forms editor and it turns out that this forms editor is really quite good at editing configuration files (see our very own configuration tool). We thought it might be nice to create a simple editor for Maven POM files. Sadly, it seems that the schema for a POM file doesn’t come anywhere near close to describing what should actually be in a POM. Maven has support for inheriting a POM and using the information in it, thus allowing that information to be omitted from the POM itself. Now admittedly, it’s not possible to specifically describe this in XML schema (the POM being inherited from can omit any information it likes as well assuming that it will be filled in by the extending POM), but the current schema insists that everything be specified in every POM file which makes validation completely useless. The POM files from the plugins don’t validate against the schema for a number of reasons as well. JDNC is even worse – Xerces finds errors in the schema itself (and I’m fairly sure Xerces is correct). This, combined with fairly poor documentation, makes it extremely difficult to implement tool support. It’s a shame, I’d probably use Maven if it wasn’t such a pain in the neck to create the POM file correctly (that and dependency management which is wonderfully simple for things in the public repository and annoyingly difficult for things that aren’t). Maybe I’ll come back and create a more useful schema for Maven at some point, but updating the schema doesn’t really help me identify the areas of our product that need improving.
String Interning and Threads
Anton Tagunov added an excellent comment to yesterday’s entry:
The one thing that has always stopped me from doing this was: interning must use some global lock. So, while interning would cause little to no harm for desktop application it is likely to introduce extra synchronization bottleneck for server applications. Thus potentially degrading perfromance on some multi-cpu beast. Would you agree with this, Adrian? Firstly, let me make something very clear: string interning will cause bugs in your code and they will be hard to track down – they will however be easy to fix once found. At some point, someone, somewhere will forget to intern a string and let it pass into your code that assumes strings are interned. Also, string interning is an extremely situational optimization. In most cases, it will worsen performance because the overhead of interning the strings will not be made up for by the reduced complexity of comparisons. Even of the cases where it does help, most of the time the difference will not be noticeable. As always, don’t bother optimizing until you know that you need to and that it will help – this is an optimization that causes the code to become less maintainable. Having said that, lets go back to the original question. Firstly, does string interning require synchronization? Probably, but not in terms of Java. The
String.intern()method is a native method and works via JNI. It would be difficult to imagine a way of achieving the behavior without at least some synchronization though. The synchronized block however would be very small, and very rarely encountered. There are two situations to consider, either the string is already in the interned list or the string is not. If it is, then no synchronization needs to occur because the list is only being read. So multiple strings can be interned at once so long as all of them are already in the interned list. Synchronization will be needed however whenever a string is interned for the first time (ie: it doesn’t match any String constant that has been loaded or any previously interned string). So on a multiple CPU system, it would be very bad to intern a lot of strings that are only ever used once or twice as they would require a lot of synchronization for no benefit. Of course on a single CPU system, doing this would be a bad thing anyway because it would incur the extra cost of comparing strings to check if they match an interned string without gaining any real benefit. My theory would then be (and only real world application profiling will confirm this in any particular situation) that the string interning technique is slightly less likely to pay off on multiple CPU systems, however because the situations in which string interning is useful require that the vast majority ofString.intern()calls match something already in the cache (most likely one of the string constants they’re to be compared against) the question of how many CPUs will be in use isn’t going to have any significant impact. I can’t stress enough though that if you don’t have specific profiling data that shows String comparisons as the biggest bottle neck in your application, you shouldn’t apply this optimization. Great question. UPDATE: Here’s an interesting discussion of interning relating specifically to this question. The automatic google search on the side (if you actually click through to this blog entry) is very handy at times.
String Interning
Elan Meng investigates the behaviours of string constants, interning and == compared to .equals. Very informative. The question remaining is, why would anyone ever use == instead of .equals on a String considering how likely it is to cause confusion and the potential for disaster if one of the strings for some reason isn’t interned. The answer is performance. In the average case, the performance difference between == and .equals is pretty much non-existent. The first thing .equals does (like any good equals method) is check if the objects are == and returns true immediately if they are. The second thing .equals does is check that the strings have the same length (in Java the length of the string is a constant field and so requires no computation). If an answer still hasn’t been found, the characters of each String are iterated over and as soon as they differ, false is returned. Now, consider the possible cases: