Heading North
I’m heading up to Ingham (just north of Townsville in far north Queensland) tomorrow for a week’s holiday and my sister’s wedding. I’ve had to pack the absolute minimum of clothing as I’m taking up a huge collection of musical instruments and assorted junk: 2 saxophones, a ton of music, and I’m still not sure how I’ll fit my saxophone stand in. I’m borrowing my younger sister’s clarinet while I’m up there to play as the bride walks down the aisle, so I’ll have the week to learn to play that. It’s pretty similar to the sax and I’ve played clarinet previously without sounding too bad, so it should come off okay. I’m more concerned about the rendition of “The Rose” on keyboard I’m meant to do as backing for my little sister while the register’s being signed. Somehow I liked the original idea of having a few musicians involved instead of just me, but it’ll give me something to do anyway. Aside from that I need to find out a few things about the bridal party – most of whom I’ve never met – as I’m the MC for the evening and probably should have something half intelligent to say when introducing people. On the plus side, I’m far enough away that I haven’t had to worry about all this until now – everyone else has been madly organizing things for months.
Excuses and Reasons
Brad makes an excellent comment regarding the root logins via ssh issue:
“I think the biggest disagreement we’re having here is where should this be solved. Adrian, as a developer, thinks it should be coded around. Myself, as a sysadmin, think the user should take some responsibility for their actions and check their setup on the critical pieces of software – in this case, the Internet accessible ones.”
It all makes sense now. I think the movie Babe probably best covers this: the sheep dog goes to talk to the sheep and talks very slowly and clearly because every sheep dog knows that sheep are stupid. When the sheep reply they do so very slowly and clearly because every sheep knows that wolves are ignorant. In this case every developer knows that all users are stupid and so they try to make everything work as safely and easily as possible. Then of course every sysadmin knows that programmers are incompetent and so they double check everything and always assume that the developer has stuffed up. It’s then quite natural (and indeed beneficial) for Brad and me to have differing reactions to this issue. As a sysadmin Brad knows that he should read the README.Debian files for the security related packages he installs and that he should double check configuration files (plus he often knows how to do this off the top of his head). As a developer however, I know that when the software gets into the hands of end users (as operating systems for PCs generally do), the users aren’t going to do any of this (mostly because they won’t know how), so the default configuration should be secure and stable to avoid the user having problems.
Excuses
Brad comments on my condemnation of root login being enabled in the default SSH config for Debian systems (noting again that SSH is disabled by default).
“Debian’s SSH package explicitly asks if you want to run the ssh daemon, and by choosing to do so, you take a certain level of responsibility into your hands.”
Granted – Sandra acknowledged this and I acknowledged this.
“I don’t agree that it’s the software’s fault more than the user’s – as a maintainer you make some assumptions, some of which will not match the user’s requirements, and it’s up to the end user to ensure that it meets their needs.”
The assumptions should always err on the side of security. It is trivial for a user to turn something on if they discover it is missing – it is effectively impossible for them to (knowingly) turn something off if they don’t realize it’s turned on. Microsoft is very often criticized for leaving unneeded services on by default and for not being configured securely by default – Linux should receive the same criticism when it falls into the same trap.
Whose Fault Is It Anyway?
With the real planet humbug down, I’m only occasionally checking the temporary planet humbug since it’s not coming through my RSS feeds at the moment. While I wasn’t looking, there seems to have been a little bit of a stink kicked up about Linux’s security. The story as far as I can tell seems to be that Sandra Mansell had her Debian router compromised because the root password was a dictionary word, ssh was available to the world and root logins were allowed. She took responsibility for the compromise, pointing out that it was “lazy” (I would have said careless or possibly even crazy) to use a dictionary word as the root password, and then complained that Debian’s default settings were to allow root logins over SSH. Brad Mashall and Greg Black then provided some useful advice on setting up SSH and security in general. However, from both posts (particularly Brad’s) I got the impression that this was entirely the user’s fault. I’d very strongly disagree with that. Was the user at fault? Absolutely, and Sandra acknowledged that. Was the software more at fault? Absolutely. It is very common knowledge that allowing root logins via ssh is an unsafe practice and should be avoided. It’s completely inexcusable for the Debian developers to have root logins enabled by default and even worse that it (apparently) doesn’t even display a warning about this. It is simply not good enough for an OS to ship with insecure settings and expect ordinary users to perform a full security audit and know all about arcane config files to make their system secure. The OS installer should always make the system as secure as possible given the features the users have indicated they need. Specifically, the installer should have disabled SSH by default (I think it did), then when told that SSH was required, it should have left root logins disabled. 
Better yet, it could have asked which accounts were required, from which networks and hosts, and whether or not it could generate a key pair to be used for authentication and disable password authentication. All of that however does make the installation more complex so there’s a balance to be reached, but since there’s almost never a reason to allow remote root logins I’d be quite prepared to criticize the developer responsible for that configuration being the default. I’d then set about making sure that the user understands where they went wrong and how to avoid similar problems in the future (most likely by pointing them at Brad and Greg’s comments, which I definitely suggest you read). Why is it that in commerce the customer’s always right but in software the user’s always wrong?
Ampersands (Does it ever end?)
Byron continues on the ampersand issue:
“I’m not going to accept your argument that it’s not harmful to produce invalid HTML. What would your code produce for: http://example.com/entities.cgi?entity=&”
The requirements are that it should produce exactly that since that will work in all known browsers and would break in all known browsers if the ampersand wasn’t escaped. Since I didn’t personally write the code I can’t be certain that it does output that, but that’s what it should do. It should output whatever it is that makes things the most compatible so our users are the most happy.
Living In Academia
I spent about two years working as a research assistant at Griffith University and quite enjoyed my time there. I spent time working with both pure mathematics lecturers and software engineering oriented lecturers, so I’ve got a fairly good grasp of and appreciation for the academic point of view and the processes and logic they tend to use. One of the things you notice if you spend time in an academic environment as well as a commercial environment is that the abstract nature of academic thought and reasoning fits very poorly into a commercial context. This is why so many very good ideas that come out of universities struggle to be commercialized (and why there are dedicated departments to help inject commercial sense into an academic idea and bring it to market). It’s not that academics are a bunch of nut-jobs with no idea about reality, it’s simply that the values and expectations in an academic environment are very different to those in a commercial environment. For instance, if a mathematics professor is developing software, their prime concern is that it is “correct”, and they usually want it to be provably so. A software engineering lecturer’s primary concern will be that the software meets the requirements precisely. In most cases however, a commercial developer’s primary concern is that the software does what the user wants. With contract style work the commercial interests match up much better with a software engineering lecturer’s viewpoint, but with commercial off the shelf software development the requirements are largely unimportant – making users happy so that they buy the product is what’s important. The biggest difference though is not between “commercial developers” and software engineering academics, but rather between mathematicians (or computer scientists, who tend to be mathematicians who specialized in computers) and “commercial developers”. 
A mathematician always wants everything to be correct (aka perfect) and provably so. A commercial developer just wants it to work, remain working and be maintainable. Sometimes of course, commercial developers are far too slack and could use an injection of the mathematician’s viewpoint; however, I’ve generally found the reverse to be true more often. Academics tend to be overly pedantic to the point where it would in fact damage a commercial project. So am I saying that academics are useless or that commercial developers are somehow better? Heck no. Both viewpoints are extremely useful and allow for different types of innovation. Both are required. Just if you ever find you’re in an argument (as opposed to an informative discussion) with someone from “the other camp”, give up and walk away – that argument will never be resolved.
Funny
I try not to link to everything that comes through the Oddly Enough feed, but this was just too funny to resist. Have Sex Until The Cows Come Home Source: Reuters.
Time Tracking Tools
We’ve acquired a new engineering manager at work, so at long last we’re starting to put in place some of the things we’ve long said “we should do that” about but never actually gotten around to doing. One of those things is establishing how accurate our estimates are by actually tracking the time taken to complete each task. Other metrics may be useful later, but for now we just want to track time taken since time is our most limited resource. The trouble is, I don’t know of any really good time tracking tools. Here are the rough requirements:
String Interning (Redux)
A long time ago I made some comments about String interning and Anton Tagunov made some interesting comments. It turns out he was very much right and that I was smoking something…. There are definitely still times when string interning will improve performance, even in multithreaded situations (XML parsing turns out to be one), but my comments on threading and synchronization should probably be ignored unless you’ve got the mythical hardware I had in mind when talking about it. Essentially, to achieve what I was talking about you need to be able to add a map entry (not create the entry, just add it) as an atomic operation. You would have to be able to replace the existing map with a new one as an atomic operation as well (don’t add the new entry until the expanded map is in place); however, with multi-CPU systems, such assignments are likely to wind up in a single processor’s cache and not be available to other processors. You’d need the ability to tell the processor to put this straight into RAM (reads could still come from cache but the main memory version would have to be checked in a synchronized block before creating a new entry). In Java it is definitely not possible to tell the CPU to put just one variable directly into main memory and I haven’t found any reference to this algorithm even theoretically working on any common computer system. Shame, it seems like such a good way to do it…. Good sounding design trumps working design right?
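For what it’s worth, here is a rough sketch of the scheme described above, assuming a JVM that implements the revised Java memory model (Java 5 and later), where a write through an AtomicReference is guaranteed to be visible to other threads that subsequently read it. The class name CowInterner and its method are hypothetical, made up purely for illustration. Reads go straight to the currently published map without locking; a miss is re-checked inside a synchronized block, and the expanded map is only published once the new entry is already in it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical copy-on-write intern pool; a sketch, not a tuned implementation. */
final class CowInterner {
    // A published map is never mutated again; writers swap in an expanded copy.
    private final AtomicReference<Map<String, String>> pool =
            new AtomicReference<Map<String, String>>(new HashMap<String, String>());

    String intern(String s) {
        // Fast path: lock-free lookup in whichever map is currently published.
        String existing = pool.get().get(s);
        if (existing != null) {
            return existing;
        }
        // Slow path: re-check under a lock so two threads never add the same key.
        synchronized (this) {
            Map<String, String> current = pool.get();
            existing = current.get(s);
            if (existing != null) {
                return existing;
            }
            // Build the expanded map privately, then publish it in one step.
            Map<String, String> expanded = new HashMap<String, String>(current);
            expanded.put(s, s);
            pool.set(expanded);
            return s;
        }
    }
}
```

Copying the whole map on every miss is only sensible when misses are rare compared to hits. Where that assumption doesn’t hold, java.util.concurrent.ConcurrentHashMap with putIfAbsent is the more usual choice.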
On Ampersands And Standards
Byron commented on ampersand redux:
“Yes, an ampersand is valid as part of an attribute value (as represented in an HTML document) where that ampersand is part of an entity reference. An ampersand that is not part of an entity reference is not valid in an attribute value, in an HTML document. Serialization has nothing to do with it, since an HTML document is not the serialization of a DOM tree, although it can be viewed as such. I did not mean to say anything about serializing attribute values, I meant to say that an attribute value in an HTML document cannot legally have an ampersand that is not part of an entity reference. If your document does have such an ampersand, it will not validate. It might work in current browsers, but down the road it might not. Don’t do it. If a browser gets it wrong, file a bug against the browser or avoid ampersands entirely, don’t force every other author of HTML parsers to work around your markup’s faults.”
I still disagree with the first part – ampersands are perfectly valid in HTML comments but when serialized they must be escaped as entities. It is critical to consider entities as equivalent to the character they represent, otherwise &eacute; wouldn’t be the same as é, which is clearly ludicrous. Regardless, the point is entirely academic so I’ll leave it at that. The last part however is crazy. If a browser has a bug and you need to support that browser, you should do whatever it takes to make your application work with that browser – standards be damned. It is in no way acceptable for a software developer to skip requirements just because it would mean conflicting with a standard. If adhering to the standard was also a requirement then the higher priority requirement should wind up being implemented and the other one revised to not be in conflict. If you can get the browser vendor to fix the issue and you consider it acceptable to make all your clients upgrade to the fixed version then by all means follow the standard – otherwise throw it out. 
Standards are designed to enhance interoperability; if they reduce interoperability in areas that are important to your project they are completely worthless and should be ignored. The comment about forcing every other HTML parser to work around the markup problems is a red herring as well – HTML parsers already have to deal with that kind of thing and that’s not going to change. XML parsers on the other hand do not have to handle invalid markup and most don’t, which is precisely why I pointed out that you should always escape ampersands correctly in XHTML despite the fact that most if not all browsers will get it right either way. Software development is about achieving the project’s requirements. It’s not about politics, it’s not about standards and it’s not about making yourself feel good. If you can meet your requirements and do any of that, then great, but the requirements are the only thing that has to be achieved and they override anything else. That said, any of those things could be made a requirement of the project, but it’s quite rare that they would actually be requirements, let alone high priority ones.
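To make the XML parser point concrete, here is a small sketch using the XML parser that ships with Java (javax.xml.parsers). The class AmpersandXmlDemo and its parseHref helper are hypothetical names for illustration only. Given a single element with an href attribute, the parser rejects the unescaped ampersand outright as not well-formed, while the escaped form parses and comes back as a plain ampersand, which is also the sense in which an entity is equivalent to the character it represents.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper showing how a strict XML parser treats ampersands. */
final class AmpersandXmlDemo {
    /** Returns the decoded href of the root element, or null if the XML is rejected. */
    static String parseHref(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("href");
        } catch (SAXException e) {
            // The parser refuses markup that is not well-formed (it may also log to stderr).
            return null;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this helper, parsing the unescaped form yields null (the parser gives up), while the escaped form yields the attribute value with a literal ampersand in it.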
Ampersand Redux
It seems I wasn’t clear enough with my ampersand related comments. I’m not talking about standards here, the standards are very clear – & should always be escaped as &amp;, no ifs no buts. However, we live in the real world and many things don’t follow standards correctly. So while David is correct that the validator will complain if you don’t escape ampersands in HTML documents, some browsers will get it wrong if you do escape them in some cases (it’s exotic and the actual test cases are at work not here unfortunately). In XHTML however, you really seriously have to escape them because a) browsers get it right when kicked into XHTML mode, and b) XML parsers barf if you don’t. Byron also chimes in with a comment:
Odd Bits Of HTML Behaviour
If you wanted to create a hyperlink to a file called “Me & You”, which of the following should you use?
<a href="Me & You"> or
<a href="Me &amp; You">
In other words, should you escape the ampersand or not? It depends. If you create a plain HTML page, you must not escape the ampersand or it won’t work (browser dependent obviously), however if you leave it unescaped it will work in every browser. If however, you create an XHTML document you should escape the ampersand, otherwise XML parsers will break when parsing the document, and browsers will get the link right as long as they are kicked into XHTML mode by the appropriate declaration at the top of the file. If you want to test this try linking to a URL with &amp; in it (ie: the file name literally includes the HTML entity for ampersand). Better yet, don’t put stupid characters in your URLs.
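For the XHTML case, the escaping itself is simple enough to sketch. The class HtmlAttr and its escape method below are hypothetical, written only to illustrate the idea, and they only deal with markup-significant characters (they do not percent-encode spaces or anything else in the URL). Note what happens to the awkward test case: a file name that literally contains the entity for ampersand ends up with its own ampersand escaped again.

```java
/** Hypothetical helper: escapes a value for use inside a double-quoted attribute. */
final class HtmlAttr {
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;   // must be handled for every '&'
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;  // would otherwise end the attribute
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```

Whether to call something like this for plain HTML pages is exactly the judgment call discussed above: it depends on which browsers have to work.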