Ampersands (Does it ever end?)
Byron continues on the ampersand issue:
I’m not going to accept your argument that it’s not harmful to produce invalid HTML. What would your code produce for: http://example.com/entities.cgi?entity=&
The requirements are that it should produce exactly that, since that will work in all known browsers and would break in all known browsers if the ampersand wasn’t escaped. Since I didn’t personally write the code I can’t be certain that it does output that, but that’s what it should do. It should output whatever is most compatible, so that our users are as happy as possible.
Living In Academia
I spent about two years working as a research assistant at Griffith University and quite enjoyed my time there. I worked with both pure mathematics lecturers and software engineering oriented lecturers, so I’ve got a fairly good grasp of, and appreciation for, the academic point of view and the processes and logic academics tend to use. One of the things you notice if you spend time in both an academic and a commercial environment is that the abstract nature of academic thought and reasoning fits very poorly into a commercial context. This is why so many very good ideas that come out of universities struggle to be commercialized (and why there are dedicated departments to help inject commercial sense into an academic idea and bring it to market). It’s not that academics are a bunch of nut-jobs with no idea about reality; it’s simply that the values and expectations in an academic environment are very different to those in a commercial environment. For instance, if a mathematics professor is developing software, his prime concern is that it is “correct”, and he usually wants it to be provably so. A software engineering lecturer’s primary concern will be that the software meets the requirements precisely. In most cases, however, a commercial developer’s primary concern is that the software does what the user wants. With contract-style work, commercial interests match up much better with a software engineering lecturer’s viewpoint, but with commercial off-the-shelf software development the requirements are largely unimportant – what matters is making users happy so that they buy the product. The biggest difference, though, is not between “commercial developers” and software engineering academics, but between mathematicians (or computer scientists, who tend to be mathematicians who specialized in computers) and “commercial developers”.
A mathematician always wants everything to be correct (aka perfect) and provably so. A commercial developer just wants it to work, keep working and be maintainable. Sometimes, of course, commercial developers are far too slack and could use an injection of the mathematician’s viewpoint; however, I’ve generally found the reverse to be true more often. Academics tend to be pedantic to the point where it would actually damage a commercial project. So am I saying that academics are useless or that commercial developers are somehow better? Heck no. Both viewpoints are extremely useful and allow for different types of innovation. Both are required. But if you ever find yourself in an argument (as opposed to an informative discussion) with someone from “the other camp”, give up and walk away – that argument will never be resolved.
Funny
I try not to link to everything that comes through the Oddly Enough feed, but this was just too funny to resist: Have Sex Until The Cows Come Home (Source: Reuters).
Time Tracking Tools
We’ve acquired a new engineering manager at work, so at long last we’re starting to put in place some of the things we’ve been saying “we should do that” about for a long time but never actually gotten around to. One of those things is establishing how accurate our estimates are by actually tracking the time taken to complete each task. Other metrics may be useful later, but for now we just want to track time taken, since time is our most limited resource. The trouble is, I don’t know of any really good time tracking tools. Here are the rough requirements:
String Interning (Redux)
A long time ago I made some comments about String interning, and Anton Tagunov responded with some interesting points. It turns out he was very much right and that I was smoking something…. There are definitely still times when string interning will improve performance, even in multithreaded situations (XML parsing turns out to be one), but my comments on threading and synchronization should probably be ignored unless you’ve got the mythical hardware I had in mind when talking about it. Essentially, to achieve what I was talking about you need to be able to add a map entry (not create the entry, just add it) as an atomic operation. You would also have to be able to replace the existing map with a new one as an atomic operation (and not add the new entry until the expanded map is in place); however, on multi-CPU systems such assignments are likely to wind up in a single processor’s cache and not be visible to other processors. You’d need the ability to tell the processor to put the value straight into RAM (reads could still come from cache, but the main memory version would have to be checked in a synchronized block before creating a new entry). In Java it is definitely not possible to tell the CPU to put just one variable directly into main memory, and I haven’t found any reference to this algorithm even theoretically working on any common computer system. Shame, it seems like such a good way to do it…. Good-sounding design trumps working design, right?
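For what it’s worth, the revised memory model that shipped with Java 5 (JSR-133) still doesn’t let you pin a variable to main memory, but it does guarantee that a write to a volatile field, along with everything written before it, is visible to any thread that later reads that field. That is roughly the visibility the design above needs. Below is a minimal sketch under that assumption: reads go through a volatile reference to a map that is never modified once published, and misses fall through to a synchronized block that re-checks, copies, adds and then publishes the expanded map. Interner is a hypothetical class name, not part of any library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write string interner: lock-free reads, synchronized writes.
public class Interner {
    // Always points at a fully built map that is never modified after publication.
    // The volatile write in intern() makes the new map (and its contents) visible
    // to every thread that reads this field afterwards (Java 5+ memory model).
    private volatile Map<String, String> table = new HashMap<>();

    public String intern(String s) {
        // Fast path: one volatile read and a lookup in an immutable snapshot, no lock.
        String existing = table.get(s);
        if (existing != null) {
            return existing;
        }
        synchronized (this) {
            // Re-check: another thread may have added s since our unlocked read.
            existing = table.get(s);
            if (existing != null) {
                return existing;
            }
            // Build the expanded map privately, then publish it in a single assignment.
            Map<String, String> expanded = new HashMap<>(table);
            expanded.put(s, s);
            table = expanded;
            return s;
        }
    }

    public static void main(String[] args) {
        Interner interner = new Interner();
        String a = interner.intern(new String("xml"));
        String b = interner.intern(new String("xml"));
        System.out.println(a == b); // prints true: both calls return the first instance
    }
}
```

Every miss copies the whole map, so this only pays off when lookups vastly outnumber new strings, as with element and attribute names during XML parsing. java.util.concurrent.ConcurrentHashMap, also new in Java 5, is the off-the-shelf alternative when misses are common.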
On Ampersands And Standards
Byron commented on ampersand redux:
Yes, an ampersand is valid as part of an attribute value (as represented in an HTML document) where that ampersand is part of an entity reference. An ampersand that is not part of an entity reference is not valid in an attribute value, in an HTML document. Serialization has nothing to do with it, since an HTML document is not the serialization of a DOM tree, although it can be viewed as such. I did not mean to say anything about serializing attribute values, I meant to say that an attribute value in an HTML document cannot legally have an ampersand that is not part of an entity reference. If your document does have such an ampersand, it will not validate. It might work in current browsers, but down the road it might not. Don’t do it. If a browser gets it wrong, file a bug against the browser or avoid ampersands entirely, don’t force every other author of HTML parsers to work around your markup’s faults.
I still disagree with the first part – ampersands are perfectly valid in HTML attribute values, but when serialized they must be escaped as entities. It is critical to consider entities as equivalent to the character they represent; otherwise &eacute; wouldn’t be the same as é, which is clearly ludicrous. Regardless, the point is entirely academic so I’ll leave it at that. The last part, however, is crazy. If a browser has a bug and you need to support that browser, you should do whatever it takes to make your application work with that browser – standards be damned. It is in no way acceptable for a software developer to skip requirements just because meeting them would mean conflicting with a standard. If adhering to the standard was also a requirement, then the higher-priority requirement should be implemented and the other one revised so that it no longer conflicts. If you can get the browser vendor to fix the issue and you consider it acceptable to make all your clients upgrade to the fixed version, then by all means follow the standard – otherwise throw it out.
Standards are designed to enhance interoperability; if they reduce interoperability in areas that are important to your project, they are completely worthless and should be ignored. The comment about forcing every other HTML parser to work around the markup problems is a red herring as well – HTML parsers already have to deal with that kind of thing and that’s not going to change. XML parsers, on the other hand, do not have to handle invalid markup and most don’t, which is precisely why I pointed out that you should always escape ampersands correctly in XHTML, despite the fact that most if not all browsers will get it right either way. Software development is about achieving the project’s requirements. It’s not about politics, it’s not about standards and it’s not about making yourself feel good. If you can meet your requirements and do any of that, then great, but the requirements are the only things that have to be achieved and they override everything else. That said, any of those things could be made a requirement of the project, but it’s quite rare that they would actually be requirements, let alone high-priority ones.
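To make the XML half of this concrete, here is a minimal sketch using the JDK’s built-in DOM parser (javax.xml.parsers) rather than any particular browser. It shows an XML parser refusing a bare ampersand in an attribute value, accepting &amp;, and treating a character reference as exactly the character it stands for. XML only predefines five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so &eacute; would need a DTD; the sketch uses the numeric reference &#233; instead. AmpersandCheck and hrefOf are made-up names for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandCheck {
    // Parses a one-element XML fragment and returns its href attribute,
    // or null if the parser rejects the fragment as not well-formed.
    static String hrefOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors; unlike the JDK's default
            // handler, it doesn't also print them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getAttribute("href");
        } catch (SAXException e) {
            return null; // not well-formed, e.g. a bare '&' in an attribute value
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bare ampersand: the XML parser refuses the document outright (prints null).
        System.out.println(hrefOf("<a href=\"Me & You\"/>"));
        // Escaped ampersand: parses, and the attribute value holds a plain '&'.
        System.out.println(hrefOf("<a href=\"Me &amp; You\"/>"));
        // A character reference yields the same string as the character itself.
        System.out.println("caf\u00e9".equals(hrefOf("<a href=\"caf&#233;\"/>")));
    }
}
```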
Ampersand Redux
It seems I wasn’t clear enough with my ampersand-related comments. I’m not talking about standards here; the standards are very clear – & should always be escaped as &amp;, no ifs, no buts. However, we live in the real world and many things don’t follow standards correctly. So while David is correct that the validator will complain if you don’t escape ampersands in HTML documents, some browsers will get it wrong in some cases if you do escape them (it’s exotic, and the actual test cases are at work, not here, unfortunately). In XHTML, however, you really do have to escape them because a) browsers get it right when kicked into XHTML mode, and b) XML parsers barf if you don’t. Byron also chimes in with a comment:
Odd Bits Of HTML Behaviour
If you wanted to create a hyperlink to a file called “Me & You”, which of the following should you use?
<a href="Me & You"> or
<a href="Me &amp; You">
In other words, should you escape the ampersand or not? It depends. If you create a plain HTML page, you must not escape the ampersand or it won’t work (browser dependent, obviously), whereas if you leave it unescaped it will work in every browser. If, however, you create an XHTML document, you should escape the ampersand: browsers will get the link right as long as they are kicked into XHTML mode by the appropriate declaration at the top of the file, and XML parsers will break when parsing the document if you don’t. If you want to test this, try linking to a URL with &amp; in it (ie: the file name literally includes the HTML entity for ampersand). Better yet, don’t put stupid characters in your URLs.
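For the XHTML case, the escaping itself is mechanical. Here is a minimal sketch, assuming the value goes inside a double-quoted attribute; escapeAttribute is a hypothetical helper, not part of any library. It also shows why a file name that literally contains “&amp;” makes such a good test: its ampersand gets escaped a second time, so the markup reads &amp;amp;, and a parser that decodes entities once ends up with the real file name.

```java
public class AttributeEscaper {
    // Escapes a string for use inside a double-quoted HTML/XHTML attribute value.
    // Escaping '>' isn't strictly required, but it is harmless.
    static String escapeAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The "Me & You" file: one ampersand, escaped once.
        System.out.println("<a href=\"" + escapeAttribute("Me & You") + "\">");
        // A file name that literally contains the entity text: escaped again.
        System.out.println("<a href=\"" + escapeAttribute("Me &amp; You") + "\">");
    }
}
```

Running it prints <a href="Me &amp; You"> followed by <a href="Me &amp;amp; You">. Walking the string once, rather than chaining String.replace calls, sidesteps the classic bug of replacing & after the other characters and double-escaping the entities you just produced.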
Greg, Im Well Aware of When Its Appropriate To Use An Apostrophe
This blog is a very informal place for me – I write what I want, when I want, how I want. Thats why I have a blog. Recently, Greg Black took issue with my (admittedly fairly regular) misuse of the apostrophe. I am actually quite familiar with the rules of when to use an apostrophe and when not to, however since this is informal writing, I tend not to proof read me comments and also tend to think much faster than my fingers type. If this were formal writing not only would I proof read my comments to ensure that apostrophe’s were used in the correctly locations but also rewrite those long sentences with many sub-sentences in brackets (I have a bad habit of doing that – it matches my though patterns), and of course the over use of dash’s to tack on additional points. I would probably even go so far as using paragraph’s to delimit separate points instead of using them whenever I feel like some white space is required. Heck, I might even run a speeling checker over it. Its also worth responding to Gregs comment:
Excitement
(In case you haven’t noticed, it seems to be a very work-oriented evening this evening.) I’m often a little envious of people who get to work in cool places developing brand new technology and speaking at conferences (and, more importantly, having more than a handful of people actually care about that area of development). I’ve been in the same job for about 3 years now, and while we are and always have been on the very cutting edge of content-related technologies (see, not even a cool name for it…), it’s a little bit old hat to me. I’ve beaten my head against all kinds of standards: HTTP, HTML, CSS, XML, Namespaces, XPath, XSD, XSLT, Word – if there’s content written in it, I’ve probably had to deal with it at some point, and if it’s at all web related I’ve probably had a lot to do with it. Now, I don’t mean to say that I know everything, and I certainly don’t want to imply that I’m any more knowledgeable than anyone else – quite the opposite, I still have a lot to learn and there are a lot of people I’m constantly learning from. What I am trying to say, however, is that as a product becomes more mature, the coolness factor of its development tends to wear off. Five years ago, the ability to replace a standard text area on an HTML page with a WYSIWYG HTML editor was nothing short of astounding. These days most browsers have (very) primitive WYSIWYG editing modes built in. In the past week or two, however, I’ve sunk my teeth into some awesome new features that once again have me really excited about the technology space I’m in. With the features I’ve put in during the last week or two, and a couple that will go in this week and next, our boring little editor is rocketing forward in usability. When Ephox started making its editors, the general practice was to look at Word, FrontPage and DreamWeaver to see what they did and how they did it, then try to find a way to make that possible within the confines of a browser.
Now I’m looking at how those programs handle things and finding them lacking and quite buggy. I used to think their behavior was how it was supposed to work and that working out what the user meant was just difficult – I’m really excited to have discovered that’s not the case: most programs are just really buggy and make it look hard. Now, I’m not about to suggest that our product is completely bug free; it’s not – it has a lot of room for improvement (and I’m excited that we’re really focussing on making those improvements happen), but when you look back over the past few years, see the journey Ephox has taken to get here and follow the progress of the entire content management industry, it’s really quite exciting that we’ve gotten here. So keep an eye out for our 4.0 release (or whatever the heck marketing decide to call it) when it comes out (we’re waiting on management and marketing to decide the best time for a release and what features we want in it). It won’t be announced on Slashdot, and seeing as we don’t really sell to end users and most people don’t read the kind of news sites we do get mentioned on, you’ll probably never notice the release, but I’m telling you – it’ll be awesome. I’m excited.
Java Is Now Officially Fast
I don’t use our product outside of debug mode often enough, apparently. Having played around with the wiki system I mentioned previously (you do read your planets from bottom to top, right?), I suddenly noticed that the progress bar our editor applet shows while it’s starting up wasn’t displaying. Turns out the applet was loaded and ready instantly, so the screen wasn’t getting a chance to repaint. Awesome! This, by the way, is only on a 1.4GHz AMD machine with 512MB of RAM, so it’s about average for corporate desktops these days, maybe a little above, and it is running the Java 1.5 beta, which provided some massive improvements in start-up time. Still, our ActiveX-based editor doesn’t load this fast. Even better, I’m using an old version of the applet, since I didn’t have a recent copy on my laptop when I came home tonight and couldn’t be bothered downloading a new version. It should be faster still with the performance improvements we’ve put in.
I Love Regex, I Hate Regex
I’ve been playing around with writing a mini-wiki that uses the full complement of HTML as its syntax (instead of forcing me to learn yet another markup language) and uses EditLive! for Java as the editor – eating one’s own dog food and all that. Frankly, that’s the way a wiki should work: no messing around with markup at all, just simple, easy-to-use WYSIWYG editing. Anyway, I wrote the back end in PHP, since we don’t have any PHP examples in our SDK and I couldn’t be bothered working out why Perl refused to install the MySQL drivers. Loading and saving from the database is simple enough, so I settled in to make the CamelCase words hyperlinks. The obvious answer: regex. The obvious problem: working out which regular expression to use (I don’t use regex often, since I usually live in the land of custom automatons instead). I’ve wound up with: