Blog
LetMeGoogleThatForYou Bookmarklet
I'm sure someone must have already done this and I'm just incapable of finding it (despite Googling it myself), but I figured that the only thing missing from the brilliant LetMeGoogleThatForYou was a bookmarklet, so I made one: LetMeGoogleThatForYou. Highlight some text, click the bookmarklet and voila - patronisation on demand.
Blogging in Business
When I started at Active Parity, one of the first things I wanted to do was help the company to get a blog started. They're a great way to show off your knowledge and communicate with clients. My first post over on the new blog is up, and it's on exactly that subject: Blogging in Business - specifically, why businesses should blog and what the potential downsides are. The first of many posts over there, I'm sure!
Personal Development: To Do
On Hacker News, ambition posted a to-do list inspired by / taken from this excellent bit of advice from Chris Wanstrath. Which got me thinking about what I want to work on and with in my spare time.
I've been meaning to organise my side-projects better. Like everyone else, I have lots of ideas and little time to make anything of them. I have a folder packed with projects at 95% completion, sitting there unloved because I got distracted, or found something better to use.
The problem with that is that taking projects to 95% is ultimately demotivating. It breeds guilt, and that's not helpful. And a project at 95% doesn't pay you back for the time you put into it. You eventually need to release something if you don't want to end up looking back and seeing missed opportunities and wasted time.
In addition to a collection of projects on the go and ideas, there are technical skills I want to develop. I'm learning Python, and Linux server administration. I'm interested in looking into Objective-C and Cocoa. jQuery is great but I need more time with it. My "Dave! Play with PostgreSQL!" post-it has faded, it's been on my wall for so long. And I need to stay sharp with the languages and technologies I use day-to-day.
Some fat needs to be trimmed.
I need to leave time for new things, too. Stuff I've not heard of yet. I'm always going to be distracted by shiny new technology. I think that's a good thing. But I want time to experiment and to tinker. If I earmark all my time for projects, I'm not going to suddenly lose interest in web technologies and tools. No, I'd start cutting into time I've promised to other things. Voila - the guilt's back and the schedule's shot. Back to square one.
So I've spent some time thinking about what I really want to get out of the time I spend on personal projects, and come up with a to-do list. I expect this to change over time, and while I don't expect for a second that you, the reader, will have the same goals or that this list itself will be useful to you, I hope if you're in a similar position it helps you to get a handle on things and get back to spending your time doing what you enjoy.
- Keep on blogging!
- Keep on making cheat sheets!
- Move AddedBytes (set up server).
- Thin out project folder and pick 2 to work on until finished.
- Write a web service.
- Write SVN Statistics app in Python (learn Python).
- Rewrite site management VB app in Python (learn Python).
- Learn Objective-C and Cocoa by writing a Useful Small Mac App (decide on what app!).
- Learn a new PHP framework.
- Get involved in an open source project.
- Update and release more code from AddedBytes.com under open source license.
That should keep me going for a while. Next, I need to flesh out some of those ideas and work out how much time I can put into them.
Readability Code Open Sourced
In July 2004 I wrote some code to calculate the readability of text using the most common algorithms available (Flesch-Kincaid and Gunning-Fog). The code hasn't aged well, and had many flaws, especially when it came to the subject of syllable counting.
Syllable counting is a tricky prospect. Consider the following sentence, for example: "I moped about, hopeful that my moped would be back on the highway soon". Sound innocuous? There's a pair of homographs in there (two words, spelled the same, that sound different) - and these have different syllable counts depending on which of the two words you mean. Words can be almost identical, with the same order and number of consonants and vowels (and it's that order you generally use to calculate syllable numbers) - "sired" has one syllable, while "sided" has two. Throw in prefixes, suffixes, plurals and compound words and you've got yourself a challenge.
Syllable counting is a minefield, with a small set of rules and a massive set of exceptions to handle.
That said, I've spent some time working through a set of test data and have come up with a small set of rules to take on the task. It helped tremendously having the work of Greg Fast (creator of Perl module Lingua::EN::Syllables) handy for reference, and setting up a decent set of unit tests allowed me to experiment with different rules until I found a set that works. So far. I expect to find more and more exceptions as time goes on, and hopefully the rules can be expanded to account for them.
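To give a feel for the "small set of rules, massive set of exceptions" shape, here is a minimal sketch in Python. It is not the rule set used in PHP Text Statistics or in Lingua::EN::Syllables; the function name, the two rules and the contents of the exceptions table are all assumptions made for illustration.

```python
import re

# Hypothetical exceptions table: words the simple rules below get wrong
# (or that we simply want to pin down explicitly).
EXCEPTIONS = {"shoreline": 2, "simile": 3}


def count_syllables(word):
    """Estimate the syllable count of a single English word."""
    word = word.lower().strip()
    if not word:
        return 0
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # Rule 1: each run of consecutive vowels (y included) counts once.
    count = len(re.findall(r"[aeiouy]+", word))
    # Rule 2: a trailing "e" is usually silent ("hope"), but not in "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)
```

With these two rules "hope" comes out as one syllable and "sided" as two, but "sired" also comes out as two, which is exactly the kind of near-identical word that ends up needing another rule or another exception.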
It wasn't just the syllable counting that was bad. The code was inefficient, disorganised, and incapable of handling anything unpredictable (every extra space counted as an extra word, for example). There were lines in there that didn't make any sense. And I hadn't documented anything, so couldn't tell you why I'd added them in the first place. Oh to be that young and inexperienced again ...
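The extra-space problem is easy to reproduce. The sketch below is Python rather than the original PHP, and both function names are made up for illustration; it shows how splitting on a single space miscounts, while splitting on runs of whitespace does not.

```python
def naive_word_count(text):
    # Splits on every single space, so "a  b" yields an empty "word" in the middle.
    return len(text.split(" "))


def word_count(text):
    # str.split() with no argument splits on runs of whitespace and
    # ignores leading and trailing whitespace.
    return len(text.split())
```

For the string "one  two" (two spaces), the naive version reports three words and the second version reports two.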
So, as with the other releases in the last few weeks, I went back and rewrote the code properly. The new and improved version has been released as a Google Code project by the name of PHP Text Statistics. It's released (as with the other projects I've set free recently) under a New BSD License.
TextStatistics.php
It consists (so far) of a single class that will tell you various things about the text you feed it:
- String length
- Letter count
- Syllable count
- Sentence count
- Average words per sentence
- Average syllables per word
It will also calculate the readability of the text you enter according to six well-known algorithms (links go to Wikipedia):
- Flesch-Kincaid Reading Ease
- Flesch-Kincaid Grade Level
- Gunning Fog Score
- Coleman-Liau Index
- SMOG Index
- Automated Readability Index
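For reference, the two Flesch-Kincaid scores are simple functions of the word, sentence and syllable counts. The sketch below uses the standard published formulas, written in Python; the function and parameter names are assumptions for illustration and are not the method names of the PHP class.

```python
def flesch_kincaid_reading_ease(words, sentences, syllables):
    """Higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade_level(words, sentences, syllables):
    """Approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Both depend on the syllable count, which is why mistakes in syllable counting feed straight through to the scores.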
TextStatistics.php4
There is also a PHP4 compatible version of the code. At the time of writing, it returns the correct scores for test data, though given PHP4's decline and the rise of PHP5, this version may not remain as current as the previous file.
tests/
Next thing to be aware of is the unit tests included in the project. There's no easy way to check your calculations are correct, unless you have a set of verified numbers to compare them against. So, I put together (so far) three files with a variety of different tests for the code. These tests should be run with PHPUnit and at the time of writing they all pass (which means there's not enough of them yet).
tests/TextStatisticsTest.php
The basic unit test class lists a large selection of words and compares their calculated syllable count with their actual syllable count (worked out the old fashioned way). It includes a variety of tests to ensure sentence counting and word counting both work as intended. It also includes a small selection of sentences, for which readability scores have been calculated by hand, and checks that the class returns the correct scores for these items.
tests/TextStatisticsKiplingIf.php
Rudyard Kipling's If is, aside from a brilliant piece of inspiring poetry, one long sentence made up of lots and lots of short words (take a look - impressive how few of the words are multi-syllabic). This file contains a selection of tests to run against If. It checks all of the words of the poem have their syllable count correctly calculated, and that all of the readability scores are correctly calculated, by matching the calculated scores against hand-calculated numbers.
tests/TextStatisticsMelvilleMobyDick.php
Herman Melville's Moby Dick is up next (well, the first paragraph is - I'm not prepared to count, by hand, the number of syllables in the entire book). Like If, it is (I believe) in the public domain, so can be used for this sort of purpose without complications. It's also a brilliant read. This file contains a selection of tests to run against the first paragraph of Moby Dick. It checks all of the words of the passage have their syllable count correctly calculated, and that all of the readability scores are correctly calculated, by matching the calculated scores against hand-calculated numbers.
Get Involved!
This project can benefit from the involvement of people in many ways. Initially, the most helpful thing anyone can do is find words whose syllable count is not correctly calculated by the script and add a new test for that word. There are going to be a lot out there (especially compound words, like "shoreline", and odd words that are not pronounced according to normal rules, like "simile").
The class could be expanded to give more information about text - like letter frequencies, word and phrase frequencies (useful for SEO) and unique word count, among other things. I've made a start on making the code multi-byte character set safe, but there's lots more to do there too.
The really brave could add more test text, too. Paragraphs of (public domain) text provide an excellent way to check the tool is working as it should. I'd suggest using either the Kipling or Melville file as a template to work from, and prepare for a boring few hours. You get a great feeling of satisfaction at the end, though, when the whole thing is done!
There's a discussion group for ... well, discussion. Suggestions, comments and feedback all welcome. If you would like to get involved in this project, start there (or email me), or grab a copy of the code from the SVN repository on Google Code.
Cheat Sheet Requests Updated
The Cheat Sheet Requests system has so far been more successful than I had hoped it could be, so I'm chuffed, and very grateful to everyone who has contributed so far.
Three problems have arisen, however. The first is the sheer volume of requests. There are simply too many to give people a reasonable selection to choose from, so the voting becomes skewed in favour of the items already at the top of the list.
The second problem is abuse, which has of course been rife. Lots of people requesting all sorts of dark and depraved cheat sheets. Amusing, but as requests go live instantly, not ideal.
Finally, there are only so many hours in the day, and I'm never going to be able to fulfil every cheat sheet request (though I am working on something to help people make their own).
Something that was unexpected but entirely positive was (is) the volume of requests and votes for non-techy cheat sheets. A "Girls" cheat sheet is in high demand, though I suspect it would take far more than an A4 sheet to list anything useful about women. And that's assuming any of the content could be agreed on by anyone. "French", "Guitar", "Chess" and "Leadership" are all great ideas, and I'm definitely up for doing more non-geek ones. That said, I suspect "Squid" and "Cockroaches" are not entirely serious requests.
So, I've changed the system. Now, you can still request anything you like, but only short-listed ideas will be displayed for voting. These are the ones I've picked out of the requests list that are very likely, depending on votes, to become cheat sheets. The top of the pile, so to speak. At the moment, the list is 15 requests, drawn from the most popular and the most interesting to date.
Those that aren't on the short list won't be forgotten or ignored. I'm going to start organising the requests into categories, and create a new section for requests and votes for those who want to see, and add to, the full list of requests.
For those of you not part of the Google Group, the new Python and Subversion cheat sheets are both in the preview stage there, so if you want to have a say about what goes on them, or just see what they look like before they're properly released, take a look at the Added Bytes Cheat Sheets Google Group.
Modem Emulator Open Sourced
In July 2004 I released a modem emulator (a.k.a. a throughput throttling proxy). It was created to help give designers a sense of how their sites function for people with slower connections.
I've had to take it offline a number of times due to the volume of traffic and the various ways it was being used (turns out it was a highly effective way to bypass workplace web filters).
Not only that, the code was badly out of date (code soup, not an object in sight, no real validation ... the shame) and badly needed an update.
It's been sitting there, half-working and half-not, and begging for an update for almost exactly 4 years. Ultimately, the choice was to update it or kill it permanently.
So, I spent some quality time rewriting the whole thing, pretty much from the ground up, and I'm now pleased to announce that it has been turned into an open source project (yes, another one). The code is available from Google Code under a New BSD License.
With any luck, this will allow more people to make this tool part of their workflow.
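The core idea of a throughput throttling proxy is to pass data along in small chunks and pause between them so the average rate stays near a cap. The sketch below is Python and is not the emulator's actual code; `throttled_copy` and its parameters are hypothetical names, and a real proxy would also need to handle the HTTP side of things.

```python
import io
import time


def throttled_copy(source, dest, bytes_per_second, chunk_size=1024, sleep=time.sleep):
    """Copy source to dest, pausing after each chunk to cap the average rate.

    The sleep function is injectable so the pacing can be checked without waiting.
    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)
        total += len(chunk)
        # Pause for as long as this chunk would take at the target rate.
        sleep(len(chunk) / bytes_per_second)
    return total
```

A nominal 56k modem works out at roughly 7,000 bytes per second, so something like `throttled_copy(io.BytesIO(page_bytes), out, 7000)` (with `page_bytes` and `out` supplied by the caller) would approximate that connection.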
Email Address Validation Updated
I've updated the Email Address Validation function posted in June 2004. I've converted it to a PHP5 compatible class, and released it under a New BSD License on Google Code.
What Happened to ILoveJackDaniels.com?
In April of 2008, the people responsible for the Jack Daniel's trademark contacted me and asked me to stop using the name "ILoveJackDaniels", and URL "ilovejackdaniels.com", for my site, and to change the logo.
Own goal?
I think so. Not many companies have widely read websites named in tribute to their product. Even fewer have websites named in tribute to their product that get more traffic than their own product's website. Deciding to threaten them with legal action seems counter-productive. It offends the marketer in me.
And?

Ultimately, the big guys got their way, and ilovejackdaniels.com has had to go. I don't believe I've misused their trademark but I'm not in a financial position to argue.
I agreed to change the logo immediately (as you may have noticed if you're a regular visitor) as it was designed in tribute to the Jack Daniel's one and I can appreciate their concern that it was a bit too close to the original.
So the site has been rebranded and moved to a new domain, and the vast majority of the content (some of the out of date stuff has been removed along the way) is still here.
The Jack Daniel's folk have agreed to allow me to redirect the old domain to the new for a period of time (until at least June 2009), after which they intend to remove the redirection from the domain and break everyone's bookmarks and links. Always helpful.
On that note, I have been contacting major bookmarking sites asking them to update saved bookmarks once the site has been moved. I will post more about this later, once all the replies are in, and will be writing about the problems involved with moving the site as well, as I came across some interesting issues I'd not encountered before when moving smaller, less established sites.
Ultimately, I decided to try to treat the Jack Daniel's issue as an opportunity, rather than the disaster I initially thought it was. The whole thing gave me a great reason to clean up and rebrand the site, and I've had fun working on it for the last few weeks - so while this has been an unpleasant experience, it's not been entirely negative.
Added Bytes?
Added Bytes is new! It's essentially a replacement for ILoveJackDaniels.com, but (hopefully) without any trademark issues. Over time, I hope it will become as useful to the wider design, development and marketing communities as I always hoped ILoveJackDaniels.com was.
Still love Jack Daniel's?
Umm. I know it should taste the same - they've not changed the recipe after all. Yet for some reason, I find myself drawn to alternatives. After some hard work sampling several alternatives, I can so far highly recommend Bulleit and Woodford Reserve. Any suggestions for other drinks to try always gratefully accepted.
What about the cheat sheets?
No change to the cheat sheets, except that they now say "Available free from AddedBytes.com" instead of "Available free from ILoveJackDaniels.com". They're still free, and still released under a creative commons license.
Actually, thinking about it, there are a couple of changes. The entire series is being re-designed and re-released. All the originals are still available (and always will be) if you prefer them, but I think the new ones are smarter and cleaner. And more accurate.
To get the ball rolling, I have re-released the CSS, PHP, mod_rewrite (I reckon this is probably the most improved) and Regular Expressions cheat sheets. The others are all coming along nicely and I'll be re-releasing most of the rest over the next few weeks (some will be "retired" as they are out of date - they'll still be available, but won't be updated to the new format).
You can also now request new cheat sheets and vote for requests you'd like to see turned into real cheat sheets - see the right hand side of the cheat sheets section or any of the cheat sheets pages.
Was this always planned?
The idea of changing the name and domain of the site had crossed my mind before. Although I don't sell my services through this site, it still reflects on me professionally, and the old name, while highly memorable, didn't exactly conjure up the sort of image I was after.
Ultimately, while the thought of changing had crossed my mind on several occasions, I had decided against it. I liked the old name - it was personal. It wasn't trying to be clever or stuff keywords where they don't belong. It was interesting and memorable, which are both rare qualities in domain names. I will especially miss giving my old email address over the phone.
In Conclusion then ...
The old domain is pointed here, and I will endeavour to keep that redirect up as long as I possibly can. The cheat sheets are all being re-released with a shiny new look, and you can request new ones. The site also has a shiny new look (which I'll write more about shortly). My new email address is dave@addedbytes.com. And I need a new nickname if I ever plan to use IRC again.
What Makes a Great Developer?
ReviewMe: Internet Marketing Ninjas
ReviewMe: Wordze
Comment Peer Review With OpenID?
OpenID allows us to verify that the person visiting and commenting on our sites relates to, or owns, a specific URL. This is wildly useful, and I'm looking forward to seeing it more widely adopted as soon as possible (OpenID on this site is in testing still but will be up and running soon!).
I was thinking about OpenID the other day, and one other problem that we are currently experiencing. People know that commenting on other sites will increase their exposure. Lots of people know that. So popular posts on popular sites receive a huge number of comments. Partly because they are good posts, and partly because people know a comment in the right place can be a major draw for traffic.
This creates a visibility problem. It's difficult to spot the good commenters (or good comments) in among the mess at the end of most articles. It's even harder to spot comments from people that you know personally, or whose comments you enjoy reading.
A Solution?
- A site uses OpenID for commenter identities.
- A piece of JavaScript loads a small frame from another site when you mouse-over the commenter's name.
- This frame includes a rating for that commenter, a link to a profile for that commenter, and rating buttons.
- The profile includes whatever the commenter wants to add - standard profile stuff.
- People can click the rating buttons in the frame, "Positive" or "Negative", to indicate how they feel about a specific comment.
- The combination of these clicks produces the overall commenter rating.
- People can also leave a note with their rating ("Comment is extra-smart") which is added to the commenter profile along with, ideally, a link to the original comment.
- People who leave ratings need to be validated with OpenID before they can rate another person's comments.
- People can opt out of the system.
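As a rough sketch of how "the combination of these clicks" might become an overall commenter rating, here is one possibility in Python. Everything in it is an assumption for illustration: the function name, the tuple layout, the one-vote-per-rater-per-comment rule and the choice of "percentage positive" as the rating.

```python
def commenter_rating(votes):
    """Combine Positive/Negative clicks into a single rating for one commenter.

    votes is an iterable of (rater_openid, comment_url, is_positive) tuples.
    Returns the percentage of positive votes, or None if nobody has voted.
    """
    latest = {}
    for rater_openid, comment_url, is_positive in votes:
        # Each OpenID gets one vote per comment; a later click replaces an earlier one.
        latest[(rater_openid, comment_url)] = is_positive
    if not latest:
        return None
    positive = sum(1 for is_positive in latest.values() if is_positive)
    return 100.0 * positive / len(latest)
```

Keying votes on the rater's OpenID is what stops someone clicking "Positive" on the same comment over and over; it does nothing about one person holding several OpenIDs.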
Pros
- Easy to set this up so that the JavaScript call to the centralised system includes a link back to the original comment, allowing OpenID users to track their previous comments and potentially quickly check for replies.
- A quick call to the system could grab the commenter (or comment) rating and change the display accordingly, allowing a skim-reader to quickly pick out the best comments from a thread.
- People would get feedback on their comments!
- Would be possible for individuals to set preferences within the system ("Always highlight comments from this person", "Always ignore comments from this person")
- People who leave worthless comments (quick one liners using keywords instead of names, just to boost their own search engine link-love) are easily spotted and ignored.
- Provides a path for non-A-list bloggers to become more widely read and A-list themselves.
Cons
- System is open to spamming - people can set up multiple OpenIDs to vote themselves up. Easily fixed though - IP and cookie tracking, plus a higher weighting given to commenters with certain characteristics (member more than a year, consistently highly rated comments, rate lots of other people, don't just give high ratings when they do rate, etc).
- Revenge rating (where someone leaves a negative rating and the person slighted then does the same back despite actual comment quality) could be a problem.
- "Cliques" could easily form.
- It may dissuade genuine people from leaving negative comments on popular blogs for fear of fanboy-revenge.
Thoughts?
I'm not entirely sure how much of a difference this could make. It would require a wider adoption of OpenID (definitely a good thing), and adoption on the larger blogs and blog networks. However, were such a system to exist and be used, I think its benefits would be enormous. I'd be very interested in hearing your thoughts.
AddedBytes.com is the online playground of 