Archive for the ‘Software Development’ Category

A Sneak Peek at Marktplaats

Monday, April 28th, 2008

On April 12th I presented at PFCongrez, a yearly gathering of PHPFreakz. Three other presentations were given that day. The first one was by Peter-Paul Koch (ppk for short), who spoke about unobtrusive JavaScript. He presented with a lot of energy, which made it enjoyable!

After that, I presented a sneak peek at Marktplaats, during which I gave some insight into what it takes to run one of the biggest sites in the Netherlands. It covers the high-level production setup, highlights some of the challenges of operating hundreds of database deployments and goes into some of the issues Marktplaats runs into while using PHP.

The slides are embedded below, or up for download here. Or join the after party discussion.

The other presentations during the day that can be found online:

Next to that, I am trying to gauge interest in our tool for managing database schemas: DBC. It keeps database schemas in sync with your application, and allows developers to branch off the database schema much as a version control system allows you to do with your code. If you are interested in this tool, please contact me at “jilles &at& marktplaats . nl”. We are looking to see if there is enough interest to open source the tool.

Software development is hard

Saturday, October 6th, 2007

Kyle Wilson recently wrote a really nice piece on why software development is so hard. For me it didn’t include new insights (I’m already convinced), but it did a really nice job of quantifying the problem space, something I had not seen articulated so clearly before. If you’re in this line of business, it’s a must read.

The thing that makes this article so interesting is that for some reason Kyle has access to information about five large software development projects: Chandler (the OSS Exchange replacement), Myst Online, Fracture (a new game), the software that controls an F-22 fighter jet and the FBI’s Virtual Case File.

After describing some of the pitfalls the Chandler team fell into, he goes on to outline why Lines of Code (LOC) is a useless metric for determining the complexity of a software program. More importantly, he throws in some statistics from the aforementioned projects that really drive this home.

Short list of conclusions:

• LOC is useless as a means to describe either the complexity of the program or the amount of effort that went into producing it
• Project teams need an economic framework (in the broadest sense of the word) in order to be successful. Otherwise there is no forcing function for decisions (like design choices, feature sets and release dates).
• In theory the complexity of a well-structured program should be O(n), where n is the number of lines of code (each line tightly coupled only with the lines directly before and after it). A poorly structured program would be O(n²), with dependencies on any one particular line spread throughout the codebase.
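To put rough numbers on that last point, here is a minimal sketch in Python. It assumes one reading of the claim: in the well-structured case each line is coupled only to its neighbours, and in the worst case every line can be coupled to every other line. The function names are made up for illustration and do not come from Kyle's article.

```python
def chain_couplings(n: int) -> int:
    """Couplings when each line depends only on its direct neighbours: O(n)."""
    return max(n - 1, 0)


def pairwise_couplings(n: int) -> int:
    """Couplings when any line may depend on any other line: O(n^2)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Compare how quickly the two counts grow as the codebase grows.
    for n in (10, 1_000, 100_000):
        print(n, chain_couplings(n), pairwise_couplings(n))
```

The gap between the two columns is the point: adding lines to a tangled codebase adds far more potential interactions to reason about than the line count alone suggests.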

Favorite quote, from the 1968 NATO Software Engineering Conference: “We undoubtedly produce software by backward techniques. […] We build systems like the Wright brothers built airplanes — build the whole thing, push it off the cliff, let it crash, and start over again”.

And this one: “Most software today is very much like an Egyptian pyramid with millions of bricks piled on top of each other, with no structural integrity, but just done by brute force and thousands of slaves” — Alan Kay (the father of Smalltalk).

Vendors vs Application providers

Saturday, October 6th, 2007

Vogels (the CTO of Amazon) and his colleagues published a paper in which they describe Dynamo, their highly available, eventually consistent data store that scales incrementally. It was an excellent read, and if you’re in the business of providing a high-traffic, highly available application (web or otherwise) I suggest you take a look!

That post reiterated for me a point I had come across before: why is it that a company like Amazon is building these types of infrastructure components? There are other examples like it that provide excellent, world-class technology within eBay. Or, more publicly, why did LiveJournal.com develop memcached or Mogile? Why did Google write GFS? And the list goes on. This, by the way, does not just pertain to storage solutions. Within eBay I see some really cool technology that could be spun off into separate products in different areas, but I am not in a position to disclose those.

I do see that having such a technology could be a competitive advantage (for a while), but at this scale I’m not sure that really holds. For example, both Amazon and Google currently have a highly scalable data store (Dynamo vs GFS). (They are a bit different, with Dynamo storing data smaller than 1MB.)

Those technologies are really cool, and they scratch an itch that these companies absolutely have. But the bottom line is that eBay, Google and LiveJournal should be adding features and improving the user experience rather than writing infrastructure components. Now, in order to either a) write those features or b) bring operational cost down (or availability up) you might need these technologies, but that does not translate 1:1 into actually writing them yourself. Ideally, an application provider such as Amazon should be able to come up with a cool feature, buy the technology needed to back that feature up and develop the feature using the technology bought.

Now, why is it then that no 3rd party vendor has stepped into this space and provided similar technology? Why is it that no one from these companies has struck out on their own and started a company providing a technology like Dynamo? Why doesn’t a big database or storage vendor step into this space? Clearly there are some big companies out there that need this technology (Amazon, eBay, Google, Yahoo, and there are certainly more). So, really, why has nobody stepped into this space? Or, in reverse, which companies do provide these types of products?

Memcached usage across large web properties

Tuesday, May 29th, 2007

Lately a discussion has started on the memcached mailing list in which, for example, the guys behind facebook.com and bloglines.com are participating and sharing some of their experiences. I don’t think this is rocket science, but I’d like to quote some of the things that are being said and provide some links to the relevant discussions.

About the general “would you want to bet your uptime on memcached as an infrastructure component?”-question:

We consider memcached a critical part of our infrastructure. The benefit of memcached in a typical setup is to reduce the amount of database hardware you need to support an application; if you have enough database horsepower to run unimpaired with most of your memcached servers out of service, then there’s probably no point using memcached at all, since it without a doubt adds extra complexity to your application code. [link]

If you shard all your data, etc. etc., is memcached still worth it?

Question:
And you would split (federate) your database into 100 chunks (the remaining 100 would be hot spares of the first 100 and could even be used to serve reads), wouldn’t that take care of all your database load needs and pretty much eliminate the need for memcache? Wouldn’t 50 such boxes be enough in reality?
Answer:
Don’t forget about latency. At Hi5 we cache entire user profiles that are composed of data from up to a dozen databases. Each page might need access to many profiles. Getting these from cache is about the only way you can achieve sub 500ms response times, even with the best DBs. [link]
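To make the latency argument concrete, here is a minimal cache-aside sketch in Python. It is not Hi5's code: `DictCache` is an in-memory stand-in for a memcached client, and `get_profile` and `slow_loader` are hypothetical names invented for this illustration. The idea is that the expensive assembly of a profile from many databases happens only on a cache miss.

```python
from typing import Callable, Dict, List, Optional


class DictCache:
    """In-memory stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._store.get(key)

    def set(self, key: str, value: dict) -> None:
        self._store[key] = value


def get_profile(user_id: int, cache: DictCache,
                load_from_databases: Callable[[int], dict]) -> dict:
    """Cache-aside read: try the cache first, fall back to the databases."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        # Cache miss: pay the multi-database cost once, then remember the result.
        profile = load_from_databases(user_id)
        cache.set(key, profile)
    return profile


db_calls: List[int] = []


def slow_loader(user_id: int) -> dict:
    """Pretend to assemble a profile from several databases (hypothetical)."""
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}


cache = DictCache()
first = get_profile(42, cache, slow_loader)   # miss: goes to the "databases"
second = get_profile(42, cache, slow_loader)  # hit: served from the cache
```

A real setup would also need expiry and invalidation when a profile changes, which this sketch leaves out on purpose.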

Also, there is a lot of talk about a FUSE (File system in user space) filesystem based on top of memcached. Not only would that make caching available for those applications you do not control (blackbox) but it would have some really great advantages for your generic PHP app:

Over the last two weeks i spent a lot of time discussing a memcachefs (fuse-based) with two fellow geeks - applications that came to mind were (a) the smarty cache (b) php sessions; for both cases, losing files (as a whole, not random parts inside) is ok and readdir is irrelevant, which allows cutting a lot of corners. [link]

PHP vs Ruby on Rails

Tuesday, May 29th, 2007

Terry Chay over at “The Woodwork” has a lengthy but nicely written blog post about a PHP vs Ruby on Rails discussion. If you’re interested in that kind of stuff, read the article: it has some juicy humor sprinkled into it as well; it’s a bit flame bait too…

Favourite quote (quoting another quote):

“First they ignore you, then they laugh at you, then they fight you, then you win.”
—Mahatma Gandhi

OSCON 2005:

“Unless you’re Ruby.”
—Danny O’Brien, “On Evil”

And:

I can’t speak for Alex, but what I’m saying is look at the top 100 websites on the internet: about 40% of them are written in PHP and 0% of them are written in Rails. (Yes, I can (and am) using this statistic to grind you Ruby fuckers into the dust.)

Good posts on the ‘net

Saturday, March 24th, 2007

Normally, I just follow my own Bloglines account, checking the blogs that I’m subscribed to. I always try to force myself to skip as many posts as I can. But every once in a while I go on a rampage in search of good new blogs and blog posts. Here is a roundup.

  • Top ten things ten years of professional software development has taught me: lists the 10 things the author thinks software development has taught him. My favorite: “The business likes to say that all the features are as crucial. They are not. Push back and make them commit.”
  • Here is a post on Particletree (bookmark that site!) that talks through the 4 ways of delivering JSON objects from the server to the browser client. Also this other post that talks through paging through JSON/Ajax data plus preloading the data.
  • A nice aggregation of articles talking about business models on the web
  • Update: another really nice post on Amazon’s technology and the advantages that using their systems gives you. (I’m not completely buying into it, but that’s a post for another time. Microsoft shoving all their apps onto Amazon and Amazon not budging under the pressure??)

Some good articles about product development and product management

Saturday, April 29th, 2006

A long while ago, Joel Spolsky added a Reddit.com site just to experiment. I didn’t pay too much attention to it back then. Last evening I did visit the site, and there are some good links there. So, not feeling that creative myself, I’ll repost some of the more interesting links.

Headrush is a blog supposedly about “Creating Passionate Users”. I cannot attest to that myself since I didn’t read more than 15% of the content on the site. One of the best articles is “Death by risk aversion”. The article talks about always targeting the outer extremes of the scale instead of being mediocre. Do yourself a favor: follow that link and just look at the 3 graphs used in the article and you will understand what the article says.

Most of my colleagues know that I have a dog, and we (me and my girlfriend) are pretty serious about training her. During some of the training sessions we use a technique called the “clicker technique”. It works like this: first you give your dog a lot of little cookies, one by one. With each cookie you don’t say a thing but make a little “click” with the clicker. After a while the dog will associate the click sound with something positive (and will actually start drooling just by hearing the sound). Kathy, the blog’s author, connects this with e-mail addiction (BlackBerry anyone?).

Thirdly, she has a nice article (still fresh!) about information anxiety and trying to keep up with everyone else. One of the things I do personally is go really fast through all my Bloglines subscriptions and force myself to read only two full articles, done. Too bad if the blogosphere decided to write more than two interesting articles: if they were that good, someone else will repost them tomorrow and I can catch them on the rebound.

A second blog, called Rands In Repose, has a really nice article up about why web startups fail most of the time and what to do about it. Really, the article is too long and too good to just summarize here. So, go read it! No really, the article is good, go read it!

A second article on that blog is about the “Free Electron” in your development team:

A Free Electron can do anything when it comes to code. They can write a complete application from scratch, learn a language in a weekend, and, most importantly, they can dive into a tremendous pile of spaghetti code, make sense of it, and actually getting it working. You can build an entire businesses around a Free Electron. They’re that good.

There is some gold in that article as well. Example: don’t send your Free Electron off fixing those three memory leaks:

When he returned, the bugs were fixed and the entire database layer had been rewritten. A piece of code that’d taken two engineers roughly six months to design had been totally redone in seven days. Sound like a great idea until you realize we were working on a small update and did not have the resources or time to test a brand spankin’ new database layer. Oops.

That’s it for now… Thanks for all the fish.

What branching strategy do you use?

Wednesday, October 6th, 2004

Yesterday I found myself writing a branching strategy. We’ve been using a software configuration management tool (CVS) for as long as I’ve worked there, but the branching strategy was somewhat ad hoc. Whatever we felt like, we did. Now that our new project nears its first release onto production servers, I thought it was time to rethink our branching strategy.

When thinking about how to branch, it seems to me that you are choosing between overhead and stability. The more branches you use, the more independent the pieces of work are from each other, which gives each branch stability. However, the more branches you create, the more work goes into merging those branches, documenting what’s on which branch, etc.

Having such a branching strategy helps in at least one way: everyone should learn this strategy by heart. Once that is done, nobody will be surprised, as in: “Oh? I didn’t know we created a new branch for that functionality”.

But coming up with good versioning tactics that “don’t get in the way” too much is pretty hard. Of course, there has been some scientific research done in that field. That resulted in some pretty documents that cover everything from A to Z. (I particularly like the document that treats all of the branching tactics like another GoF pattern.) Of course, there is a lot of difference between the strategies involved for an internal product, a shrink-wrapped product or a website.

Obviously, Marktplaats.nl is in the website department here. But even then, there are a lot of differences. It has already been in some newspapers that we are opening websites in foreign countries (foreign to the Netherlands, that is). Aha, that opens up another can of worms entirely. Now we deploy our website for multiple countries, and each of these countries could run a different version of our product. That might be driven by the fact that a particular feature is badly needed in some countries (country A) but not in others (country B). But then, where do you do the bug fixing? Both country A and B need that same bug fix, but what has been deployed for those countries might reside on different branches! That brings in a lot more overhead if you ask me.

Even armed with a lot of scientific articles about this topic, I’m still in doubt about which strategy to choose. So I’d like to invite everyone who’s in the same situation as me to contact me. What branching strategy do you use? What do you like about it? What policies do you have in place? And most of all, I’d like to get in touch with Joel Spolsky about this topic. He’s advocating the usage of an SCM tool (just like me!), but every tool has its incorrect usages. Joel, how are you using CVS to maintain your product?

Specifications? How?

Saturday, July 10th, 2004

[This post is a re-post of a question I asked at Joelonsoftware.com’s discussion forums]

Before I start off writing this post, I’d like to refer to certain other sources to show that specifications are important for any software development. It works the same as house construction: no company is going to build a house without a blueprint (heck, they won’t even get permission to build it otherwise!).
First off, let’s link to the Joel test, specifically the part about writing specs. If you have any questions about why you would write specifications, read this article. More information can be found here on Wikipedia.

The question I have is: how would you go about documenting the specifications? This PDF talks about possible options on pages 109 and 115, but how would you do this?

Everything boils down to what you really want to do with a specification. Actually, that is not very much. A specification is first of all a means to communicate the exact soon-to-be implementation to those who have to sign off on the project. Secondly, it should be the basis for a technical design document and, later, the actual implementation. Furthermore, it should provide the definitive answer during testing: should this box be red or blue? A last needed feature is the ability to quickly see the differences between specific versions of the specification (“What was changed since version x.y.z?”).

So what would be a good way to document these specifications? Because I want ways to quickly show differences between versions, I thought about CVS and DocBook (together with a tool, Norman Walsh’s diffmk, to generate a proper HTML document containing the differences between two versions of the DocBook XML document). But this proves a little bit tiresome, especially when large volumes of pictures and diagrams are involved. A positive side of using CVS is that multiple people can work on the specifications. That, in my opinion, rules out solutions like Word and/or Excel.
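As a rough illustration of the “what was changed since version x.y.z?” requirement, here is a small Python sketch that compares two versions of a plain-text specification with the standard library’s `difflib`. It is not DocBook-aware and is no substitute for diffmk; the `spec_diff` helper and the sample texts are made up for this example.

```python
import difflib


def spec_diff(old_text: str, new_text: str,
              old_label: str = "spec v1", new_label: str = "spec v2") -> str:
    """Return a unified diff between two versions of a specification."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(difflib.unified_diff(old_lines, new_lines,
                                        fromfile=old_label, tofile=new_label))


# Two hypothetical versions of the same requirement.
v1 = "The login box is red.\nThe button says OK.\n"
v2 = "The login box is blue.\nThe button says OK.\n"

diff = spec_diff(v1, v2)
print(diff)
```

Removed lines show up prefixed with `-` and added lines with `+`, which works well for text but says nothing about changed pictures or diagrams, the part that made the CVS and DocBook route tiresome in the first place.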

It is good to know that other people are struggling with the same problems.

Basically, the core of this post is to ask: “What ways of specification would you use in similar circumstances?” Got any answers to this? Do you want to discuss this topic with me? Write me an email, post a comment on Joel on Software or post a comment on this post.

SpeeDEV

Tuesday, May 11th, 2004

Well, today I’ve had a long discussion with someone from SpeeDEV. I’ve been in touch with them for a little while now. They make something that is best described (in my own words) as

“An Issue and Requirements tracking application with advanced functionality for process and project management with a very high level of automation and flexibility.”

The employees are very nice to me and very helpful. Two things surprise me about this company and its product:

  • I have never seen a greater difference between a company’s own website and the product it has developed. The product SpeeDEV is something that struck me with amazement (such a good and thoroughly developed product!). Yet, if you look at their company website it looks like it was developed by a bunch of high school students.
  • The product itself is by far the most thoroughly developed product in the Issue/Requirements tracking market I’ve seen to date. (I’ve seen a lot of these products already. That’s not to say that I’ve seen everything, but it still says something about SpeeDEV.) For example, the entire application is web based. They advertise a process automation engine and an editor to go with it. You’d expect some half-baked solution in DHTML that will barely do what you want it to do. But not SpeeDEV, oh no! It comes with an entire editor that behaves just like a desktop application (à la Visio or the like) to define the process you want. Perfect!

In time to come I’ll report more often on SpeeDEV. One reason for that is that on the web there is almost nothing to be found about this product except on the company website (Google turns up only 679 results for example). And that’s pretty exceptional for a product that is in its 4.x version series.