Interesting Links, Nascent Thoughts

Here are some threads that I’ve been reading or interested in from the last year or so. I typically find links tweet-worthy, not blog-worthy, unless I have significant original thoughts to contribute. If you don’t think one link is valuable, the next one will likely be better. Many of these won’t be safe for work (although what does that really mean?). I’m not saying that I agree with all of them, just that they are interesting or resonated with me when I read them. This is more than you can probably consume in a day, so feel free to Control-D. I considered breaking it up, but whatever. Consider it your Google Reader for a week. Here goes.

A ton of resources on lean software engineering. (I have not specifically investigated these sublinks, but it seems like a recent, solid, quality page. I’m pretty sure I’ve fully read every other article linked in this post.)

Of course, my favorite article since college: How to Create Wealth

One of my favorite blogs, with an interesting topic and a high-signal post here. Perhaps a bit verbose.

From the same blog, this article parallels some thoughts that I have had recently on children and creating products. Perhaps more about this another time or in person.

And one more for good measure: how to train yourself to spot opportunities

Of course, here’s a passionate article on a similar subject. I seriously think this article is epic.

Simplification and technology

Browser adoption politics article

Detailed analysis of sales of an indie BlackBerry app, over time.

For the statistics / math buffs, apparently you can algorithmically get a totally fair result from a biased coin.
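
If it’s the trick I’m thinking of (usually attributed to von Neumann; I’m assuming that’s what the link describes), the idea is: flip twice, call HT heads and TH tails, and start over on HH or TT. HT and TH each occur with probability p(1-p), so the accepted result is fair regardless of the bias. A quick shell sketch:

# biased_flip: 70% heads, purely for demonstration
biased_flip() { [ $((RANDOM % 10)) -lt 7 ] && echo H || echo T; }

# fair_flip: flip twice; keep the first flip only when the two differ
fair_flip() {
  while true; do
    a=$(biased_flip); b=$(biased_flip)
    [ "$a" != "$b" ] && { echo "$a"; return; }
  done
}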

Chris Baggott chronicling his efforts at making a low-effort product blog. Seems replicable, although I’m not sure of the value produced.

I’ve been here, unfortunately.

How to make a difference in your field: Hamming’s You and Your Research

Some great reading by Seth Godin (probably second to Paul Graham as my favorite blogger). I am usually brimming with ideas after reading stuff like this:

  • http://sethgodin.typepad.com/seths_blog/2009/08/the-bandwidth-sync-correlation-thats-worth-thinking-about.html
  • http://sethgodin.typepad.com/seths_blog/2009/09/organizing-customers.html
  • http://sethgodin.typepad.com/seths_blog/2009/06/textbook-rant.html
  • http://sethgodin.typepad.com/seths_blog/2009/02/grave-new-world.html
  • http://sethgodin.typepad.com/seths_blog/2009/01/beauty-as-a-sig.html
  • http://sethgodin.typepad.com/seths_blog/2009/01/in-the-mood.html
  • http://sethgodin.typepad.com/seths_blog/2008/12/the-making-chas.html
  • http://sethgodin.typepad.com/seths_blog/2008/11/do-you-know-eno.html
  • http://sethgodin.typepad.com/seths_blog/2008/11/blah-blah-blah.html

I’m interested in this blog which combines biology and computer science to a degree. He is interested in open-source biology as well, which is a pretty interesting variation on a theme with many different things to consider. Here’s an interesting post.

More biology + computers

One of the more interesting papers I have read. Basically, it describes the “winner’s curse” (the entity that wins an auction likely overpays for the item) as it applies to software contracting bids (here, the winning entity most likely underbid, causing ruin in the near future).

I’d say the same things that make student software projects succeed or fail can probably be applied to software contracting.

Martin Fowler on more readable regular expressions

Exciting applications of Erlang

If you’re on the hunt for something to use your shiny new tool on, or you have some interesting algorithmic ideas, you could do worse than messing with these data sets.

And I can finally sleep at night because I don’t need to know everything about cars.

The paradigm shift that most people in software have seen coming for awhile. http://www.shirky.com/weblog/2009/03/newspapers-and-thinking-the-unthinkable/

If you need a kick in the ass or some passion:

  • http://www.seoblackhat.com/2007/01/29/do-it-fucking-now/
  • http://weblog.raganwald.com/2004/07/if-you-want-to-write-software.html
  • Quite NSFW, but pretty interesting: http://www.stevepavlina.com/articles/do-it-now.htm
  • http://www.37signals.com/svn/posts/1437-put-a-dent-in-the-universe
  • http://weblog.raganwald.com/2004/07/how-to-write-software-with-art.html
  • http://www.youtube.com/watch?v=Cbk980jV7Ao
  • http://www.slash7.com/articles/2009/1/25/quick-note-about-shipping
  • http://www.37signals.com/svn/posts/1626-the-most-powerful-word-is-no

My twitter statuses with links that are still useful:

  • http://twitter.com/panozzaj/status/4134727472
  • http://twitter.com/panozzaj/status/4080359840
  • http://twitter.com/panozzaj/status/3458258161
  • http://twitter.com/panozzaj/status/3430020800
  • http://twitter.com/panozzaj/status/2344419495

git svn

I recently had a chance to use the Git interface to Subversion (git-svn), and it’s the best way that I’ve found to work with Subversion (SVN): better than using plain SVN, primarily because it provides a convenient staging environment and superior local branching capabilities.

The staging environment provided by Git means that I have a local version control system that no one else can see. This allows me to continually check things in without worrying about mucking up the general repository. I like this because I can make changes quickly and check in whenever I make progress and things mostly run correctly. This is great because I hate it when things are improving and then I make some boneheaded change to a bunch of files and can’t figure out how to get back to a working state.
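
Concretely, my loop looked something like this (the file name is hypothetical):

# stage and commit whenever things mostly work
> git add lib/parser.rb
> git commit -m "WIP: handle nested tags"
# made a boneheaded change? throw away the uncommitted mess
> git checkout -- lib/parser.rb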

When I’m ready to check into the shared repository (in SVN), I can do this as a separate operation. What’s nice is that I can squash, modify, and reorder local commits so that my development history is clean of the rabbit trails that inevitably pop up. Theoretically, the staging environment also means I don’t need to run full test suites before each local commit to ensure that I don’t break the build, though this was not a gain that I actually utilized.
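
The cleanup itself is a single interactive rebase, which touches only commits past svn/trunk (the remote branch name assumes the svn/ prefix set up later in this post), so the commits SVN has already seen stay intact:

# squash, reword, and reorder the local commits on top of svn/trunk
> git rebase -i svn/trunk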

The second advantage was quick local branching. This allowed me to have several streams of development going that were mostly independent of each other. It was invaluable for quick bug fixes, and helped when work was blocked because a client team member was unreachable. Another great advantage was the ability to keep my master branch clean and always ready for a demo. Git branches differ significantly from SVN branches: they are much faster to work with, they require less space, and, perhaps most importantly, Git offers better merging capabilities.
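
As a sketch, spinning off a quick bug fix without disturbing the work in progress looks like this (branch name hypothetical):

# branch, fix, and fold the fix back into master
> git checkout -b quick-bugfix
# ...edit and commit...
> git checkout master
> git merge quick-bugfix
> git branch -d quick-bugfix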

Everyone else on your team can still use SVN normally, and you will see their changes if you use the correct workflow. There are numerous sites that describe this workflow and variations of it, so I won’t detail this. Just Google ‘git svn workflow’.
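
That said, the heart of most of those writeups is just a couple of commands:

# replay your local commits on top of everyone else's SVN changes
> git svn rebase
# publish your local commits to SVN, one SVN revision per commit
> git svn dcommit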

Issues

Attention conservation notice: heavy Git nomenclature to follow, probably not useful for the casual reader.

I ran into a couple of problems that I’d like to discuss, along with my final solution. One issue was that I developed for quite awhile in Git and then needed to put this work into the client’s SVN repository. There were numerous risks that I should have mitigated by committing to SVN as I went along with the project. Against my better judgment, I did not, because I thought that my code structure might change significantly. It never really did. In the future I won’t hesitate to commit early, as the pain of messing around with version control is not really all that high.

I was initially interested in keeping my development history around, but this turned out to be pretty hard based on some specific things that I did. I think that the main reason for this is that I wanted my commit messages to have the correct name and email. I’m pretty sure I changed this setting in .git/config, but something must have gone awry. Hence, I ran a one-liner at the command line similar to the following:

# rewrite the author and committer info on every commit reachable from HEAD
git filter-branch --env-filter "export GIT_AUTHOR_NAME='<my name>';
                                export GIT_AUTHOR_EMAIL='<my email>';
                                export GIT_COMMITTER_NAME='<my name>';
                                export GIT_COMMITTER_EMAIL='<my email>'" HEAD

However, when subsequently running git-svn commands, I received this error: Can’t call method “full_url” on an undefined value at /usr/lib/git-core/git-svn line 425.

My understanding of this error is that git-svn can’t find a join point between your local development and your remote branch. Running the git-svn init or clone commands sets this up correctly, but what I was doing messed it up. I think I lost quite a bit of time by not understanding that rewriting the history at the end (which I typically did) would cause git-svn to get out of sync. I also messed around with grafts for awhile, which probably didn’t help matters any. In retrospect, I probably should have rewritten commit history only for commits that I made locally, and not messed around with grafts at all. This would probably have allowed me to just git svn rebase to keep my history and have a flurry of checkins with the right information. But at this point, I was alright with losing my development history since I had already spent a fair amount of time working on it.
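
A sanity check I’d use now is to list exactly the commits that are purely local, and therefore safe to rewrite (the svn/trunk name assumes the svn/ prefix used below):

# show only the commits that git-svn hasn't pushed to SVN yet
> git log --oneline svn/trunk..master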

Once I was ready to import my changes, I found the following process to be helpful. Basically, I add the standard SVN directories, sync with these directories, copy over the files in master from my local directory, then commit the changes and dcommit to SVN. First, let’s add the standard SVN layout to the SVN repo. If you’re worried about messing something up, it’s helpful to run through this process against a local repository first, or at least against a sandbox directory. See this page for an example of the former.

> svn co <repo/folder>
> cd <folder>
> mkdir trunk tags branches
> svn add trunk tags branches
   A         trunk
   A         tags
   A         branches
> svn commit -m "Base directory structure."
   Adding         trunk
   Adding         tags
   Adding         branches
   Committed revision 1.

Then in a clean directory, you can execute:

# the --prefix=svn/ gives you remote branches prefixed with svn/
# which helps to namespace them
> git svn clone <repo/folder> -s --prefix=svn/
   Initialized empty Git repository in <folder>/.git/
   Using higher level of URL: <repo/folder> => <repo>
   W: Ignoring error from SVN, path probably does not exist: (160013): Filesystem has no item: File not found: revision 100, path '<folder>'
   W: Do not be alarmed at the above message git-svn is just searching aggressively for old history.
   This may take a while on large repositories
   r1 = <treeish> (svn/trunk)
   Checked out HEAD:
     <repo/folder> r1
> cd <folder>
> git branch -a
   * master
   svn/trunk

At this point, modify your .git/config file to have the correct username and email if necessary.
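
You can set these with git config rather than editing the file by hand:

> git config user.name "<my name>"
> git config user.email "<my email>"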

Then you can copy all of the files from your original working directory (the one where you did all of the development) except for the .git directory. If you copy the .git directory over, you’ll have to start all over again.
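
One way to do the copy, assuming rsync is available and the original working directory lives at ../original:

# copy everything except the old repository metadata
> rsync -a --exclude=.git ../original/ .

At this point: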

> git status
   # all of your files
> git add .
> git commit -m "Changes from my local repository."
   Created commit <treeish>: Changes from my local repository.
   ...

And finally:

# in case someone updated the SVN directory that you're working with (unlikely)
> git svn fetch
> git svn dcommit --dry-run
   Should see something about pushing to <repo/folder/trunk>
> git svn dcommit
   A file1
   # etc....

This should add your files in the trunk folder in the SVN repo. Hope this helps!

Writing and Revision Control

I have been doing a lot of writing lately and was interested in automatic versioning so I could see the results of writing over time and how things change. I think that it would be really interesting to see a visualization of a book being written from scratch. Normally you only see the end product; tracking changes over time would allow others to see the sausage being made. This could be useful for teachers to help their students improve their process, for writers to analyze their craft, or for aspiring writers to see how books really get written.

Here’s a demo of what I envisioned using a recent blog post that I wrote using the following method.

The system uses git for version history. I also used a Vim hook that checks in the current file on buffer writes:

" typing :autocommit turns on per-save commits for this session
cabbr autocommit call Autocommit()
fun! Autocommit()
  " after every buffer write, stage and commit the file that was saved
  au BufWritePost * silent !git add <afile>
  au BufWritePost * silent !git commit <afile> -m 'Generated commit'
endfu

This is about the finest grain of editing that I can imagine being useful and that was practical to do. Anything lower-level and you’re probably looking at the document as the cursor is moving around. Commits are nearly instantaneous, and you can amend commits to explain complicated changes.

Git branching seems to work well with this system, so you can have multiple streams of writing. If you’re working with other people, you might be writing a new chapter when you get some feedback on the last chapter that you’d like to incorporate. Simply create a branch from the revision that you sent out, and you should be able to see exactly what the reviewer saw. In addition, authors of collaborative works can use the push/pull functionality to manage copies, which is probably better than emailing documents around. See this page on collaborative writing for more ideas.
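
For instance, to branch from the draft that went out for review (the file name and <treeish> are placeholders):

# find the commit from around when the draft went out
> git log --oneline chapter3.txt
# create a branch from it to hold the reviewer's feedback
> git checkout -b feedback-ch3 <treeish>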

As far as the current visualization goes, I used a Ruby suite that I found called DocDiff. I think that this is based on or uses wdiff, a difference engine focused on words. Based on my understanding, wdiff writes each word to a line and uses the standard GNU diff algorithm to detect changes. Anyway, DocDiff seemed to fit for a rough visualization of the changes between each commit, so I used it and hacked together the navigation with some further scripting.
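
If you want to poke at the underlying idea, GNU wdiff can be run standalone on two drafts (file names are hypothetical):

# word-level diff between two versions of a document
> wdiff draft-v1.txt draft-v2.txt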

Improvements could include at least:

  • a more accurate diff function
  • showing diffs in the opposite direction when you go backwards in time
  • representing branching
  • advancing changes automatically instead of manually
  • showing sections moving smoothly in real-time
  • Doogie Howser typewriter noises :)
  • highlighting backgrounds a different color when the final words used are in their correct places
  • showing commit information in a corner for context
  • showing the product over time based on the source (think PDFs with images)

For blog writing, I’ve been pretty happy sticking with HTML, although Markdown would probably be better. For longer works, I have recently found Pandoc, which is a Haskell-based Markdown-and-more implementation with fewer bugs than the standard Markdown interpreter. You get support for other file formats, conversion between file formats, and the ability to write documents to PDF using LaTeX! LaTeX is nice for editing large works, but it can be cumbersome to read at times. Pandoc allows you to use Markdown for most things, and then switch to LaTeX mode for things like equations. Markdown seems to play well with versioning and seeing changes over time as well.
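
A couple of representative invocations, assuming a Markdown source file (PDF output depends on your Pandoc version and a working LaTeX install; older releases shipped a separate markdown2pdf wrapper for that step):

# convert Markdown to HTML
> pandoc book.md -o book.html
# convert Markdown to PDF by way of LaTeX
> pandoc book.md -o book.pdf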

Non-programmers

I pretty much agree with all of the points Derek Hammer brings up in Personal Source Control. On a Linux machine, I would contend that git is pretty painless to set up, although you still need to realize that you’re starting something worth tracking. It’s not automatic yet. Plus, there’s the learning curve of actually using the system. It’s not fully in the background. And to the overall points, I definitely think that putting your local files and documents under version control is a great idea. If it’s so easy that users don’t need to think about it, then it’s an advance.
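
For what it’s worth, the setup really is just a few commands once git is installed; as noted above, the hard part is remembering to start:

> cd ~/Documents/essays
> git init
> git add .
> git commit -m "Initial version"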

I read a book a couple of months ago called About Face 3: The Principles of Interaction Design, and the authors talked about moving to a more document-oriented user interface model. The book gives some interesting examples of why this is a good idea.

Pretend you have never used a computer before. Would it be intuitive to rename the word processing document you are currently editing by clicking “Save As…”, saving it as a different file, and then deleting the original? I think this makes little sense because thinking about files is more of an implementation detail than how people actually think about their projects. I don’t really care whether the computer is storing my document on a hard drive, in the cloud, or in a shoebox; I just want to be able to rename the document I currently have open.

Versioning of documents should undergo the same analysis. Manually choosing to track changes and then being inundated with the visualization is a chore. And when users forget to track changes…? Complex merging is painful regardless of how diligent users are.

While I was looking for a system that would check in text files automatically and push to a central repo when I was ready, I saw an article about flashbake, a git- and cron-based system of recording writing changes. While I’m not all that interested in what song was playing when I committed, having contextual information might be helpful in some cases.

Google Wave has a pretty nice way of replaying waves (conversations) that resembles the revision tracking system in a wiki. I could see using this for a personal knowledge management system when it comes out. It’s definitely nice because it should be accessible from many different platforms as long as you have an internet connection. One downside for me is that you are not in full control of the data (backups, privacy, security).

As far as collaborative text editors go, I’ve looked into Gobby for some work projects. It had the best quality that I saw: you can see everyone in the session typing in real-time with different colors and no locks, and it’s cross-platform FOSS.

Other thoughts

Here is an article about using relative dates. This seems like a helpful concept. Instead of saying that I read About Face a couple of months ago, I could put in an approximate date and then let software do the translation. This might be nice so that people know that I’m probably not interested in talking about the book when they stumble upon this post in a couple of years. Many things on the internet are time-specific, so anything that states this clearly seems to be moving in the right direction. I can’t count how many times I’ve read something and thought, “this doesn’t seem quite right,” and then looked for a date and realized it was horribly out of date. Using more relative dates is a semantic web thing.

I’d like to see even more histories of documents on the web, with linking to specific versions of documents. This would enable programs to cache the documents that you link to, and if the contents change or have newer versions, you can be alerted to this fact so that you can update your references. Instead of having fully broken links when a page is down, your web server would just fall back to the caches of the documents. This would all but ensure that useful pages are backed up in a distributed fashion. People seeking to restore a lost site or link could run some sort of script that is aware of sites that are backing up external sites, and link to those caches, adding further redundancy. This reminds me of Linus Torvalds saying (hopefully tongue-in-cheek): “Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it.” There would obviously be logistical hurdles to overcome if someone manually changes their cache, if versions get out of sync, etc.

While we’re at it, let’s add transparency to government operations. Imagine every change to every bill in the legislative branch, and every signing statement, having an associated author, timestamp, and commit message explaining the rationale. Then citizens could see who really makes positive changes to bills and whom to hold responsible for the pork. Are you sure you want to release a hundred changes one minute before voting is to begin on the House floor? True “blame” and “praise” (from an SVN perspective).

Annual Navel Gazing

Well, it’s been a year since I started this blog, and I have been pleasantly surprised by the personal changes that I have seen, as well as other people’s responses to my writing.

I appreciated feedback that people gave me over the course of the year, whether through comments or discussions. This helped me realize that I can provide value through writing and that people are actually interested in reading what I am thinking about.

I now think that being creative is something that one can improve with practice. I don’t think that I seriously considered this angle before trying to produce something creative over a long period of time. Being able to produce consistently is a challenge, but once I started doing it, my mind changed to accommodate this request. When I gain new insights, I start thinking about how I can formalize these thoughts into something that is digestible for other people.

I’ve noticed this recently in songwriting and poetry as well. I don’t think most artists just sit down one day and say, “I’m going to write ten songs today.” The way I imagine it goes is that they write songs continuously and then keep the ones that resonate most strongly afterward.

By far the two most popular posts have been the Pomodoro Technique post and the Vim Word Processing post. Based on Google Analytics, I’m thinking the former is popular because people are searching for things on the Pomodoro Technique, and the latter is popular because it is linked to from the popular Vim Autocorrect plugin page.

As noted in some of the notes sections, I have work to do with limiting WIP and getting things out while they still have energy. Again, as written elsewhere, I realize that having a word target is good, but writing every single day is probably more trouble than it is worth. Writing with energy is important, as is writing out what I am thinking about and just getting it on paper.

Looking forward to another year of thinking, writing, and interacting! Thanks for tuning in.

Regular Expression Anchor Mnemonic

In most languages, regular expressions have symbols that require a match to occur at the beginning or end of the string or line. These are called anchors. The anchors are usually the caret (^) for matching the beginning of a string, and the dollar sign ($) for matching the end of the string. Hence:

'abc' =~ /c$/     => true
'abc' =~ /a$/     => false
'abc' =~ /^a/     => true 
'abc' =~ /^c/     => false

I can remember what the anchors are, but when I have trouble remembering which is which, I use the following mnemonic:

Regular expressions are perfect, like the Garden of Eden.

Snakes end the Garden of Eden.

In this case, the dollar sign looks a lot more like a snake than the caret does, so it’s the ending anchor. This might be corny, or perhaps not completely accurate to the original story or the true nature of regular expressions (hairy beasts that they are), but it’s served me pretty well.