I've just been perusing the GFS (Google File System) paper. If you're at all interested in computers, Google, GMail, or large clusters, take a quick look. The system provides scalable, high-capacity, high-bandwidth storage. As is evident from Google's
Why should this matter to me? Well, one word: GMail. I am, of course, assuming that Google's upcoming GMail runs on a GFS cluster. That seems very likely, given how cool GFS is.
What is relevant here is the way data is stored in 64 MB 'chunks', and the way in which many "clients" (a client could be a process writing files or indeed a user accessing data) can easily append to the same file. This allows Google's e-mail service to accept multiple messages simultaneously, as well as having your mailbox indexed, de-spamified, copied for the FBI, etc.
However, GFS was not designed for efficient random-access writes. Storage is reclaimed by a fairly primitive garbage collection system run by the master server. For Google's needs, data is appended to files rather than inserted or changed mid-way through. This is fine, of course, for caching.
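
To make the "many clients appending to the same file" idea concrete, here is a toy, in-memory sketch. It is not Google's code or the GFS API: `AppendOnlyFile`, `record_append` and `deliver` are names I made up, the chunk size is shrunk from 64 MB to 64 bytes so the example stays small, and a single lock stands in for whatever coordination the real system does. The one point it tries to capture is that the store, not the client, picks where each record lands.

```python
import threading

CHUNK_SIZE = 64  # bytes in this toy; the real chunks are 64 MB


class AppendOnlyFile:
    """Hypothetical stand-in for a file that many clients append to."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = [bytearray()]
        self._lock = threading.Lock()

    def record_append(self, record: bytes):
        """Append one record atomically and return (chunk number, offset)."""
        if len(record) > self.chunk_size:
            raise ValueError("record is larger than a chunk")
        with self._lock:
            # If the record does not fit, start a fresh chunk instead of
            # splitting the record across two chunks.
            if len(self.chunks[-1]) + len(record) > self.chunk_size:
                self.chunks.append(bytearray())
            chunk = self.chunks[-1]
            offset = len(chunk)
            chunk.extend(record)
            return len(self.chunks) - 1, offset


def deliver(mailbox, sender, count):
    """Simulate one client delivering `count` messages."""
    for i in range(count):
        mailbox.record_append(f"{sender}:{i};".encode())


mailbox = AppendOnlyFile()
threads = [
    threading.Thread(target=deliver, args=(mailbox, name, 20))
    for name in ("alice", "bob", "carol")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every record arrives whole, in whatever order the threads happened to run.
stored = b"".join(bytes(c) for c in mailbox.chunks)
```

Nothing here ever rewrites bytes that are already stored, which is exactly why appends are cheap and mid-file changes are not.
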
However, if a GMail user deletes a message, the message may no longer display in the main inbox, but it is very unlikely to be removed from GFS right away. This is because the message will be stored in a 64 MB chunk along with many other messages. GFS cannot guarantee that deleting a message actually erases it, and indeed it would be too intensive to delete messages immediately.
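
Here is a rough sketch of why a "deleted" message can linger. Again, this is purely my guess at the general shape, not how GMail or GFS actually work: the `Mailbox` class, its `index` and `deleted` fields, and the `rebuild` step are all hypothetical. Deleting only hides the message; the bytes stay in the chunk until something rewrites the whole chunk without them.

```python
class Mailbox:
    """Hypothetical mailbox stored as one append-only chunk."""

    def __init__(self):
        self.chunk = bytearray()   # stands in for a 64 MB chunk
        self.index = {}            # message id -> (offset, length)
        self.deleted = set()       # ids the user has deleted

    def append(self, msg_id, body: bytes):
        self.index[msg_id] = (len(self.chunk), len(body))
        self.chunk.extend(body)

    def delete(self, msg_id):
        # Cheap: just mark the message. The bytes are not touched.
        self.deleted.add(msg_id)

    def inbox(self):
        return [m for m in self.index if m not in self.deleted]

    def rebuild(self):
        # Expensive: copy every surviving message into a new chunk.
        new_chunk = bytearray()
        new_index = {}
        for msg_id, (offset, length) in self.index.items():
            if msg_id in self.deleted:
                continue
            new_index[msg_id] = (len(new_chunk), length)
            new_chunk.extend(self.chunk[offset:offset + length])
        self.chunk, self.index, self.deleted = new_chunk, new_index, set()


box = Mailbox()
box.append("m1", b"hello")
box.append("m2", b"secret")
box.append("m3", b"world")

box.delete("m2")
visible_after_delete = box.inbox()           # "m2" is hidden from the user
still_stored = b"secret" in box.chunk        # but its bytes are still there

box.rebuild()
gone_after_rebuild = b"secret" not in box.chunk
```

The cost of `rebuild` grows with the size of the whole chunk, not with the size of the deleted message, which is why doing it on every delete would be far too expensive.
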
Hence the issues for privacy groups, the EU and concerned users. It is of course possible for Google's GMail software to poll mailboxes periodically and rebuild the chunks containing deleted messages. Even then, however, messages would persist for some non-negligible period, and the trade-off between increased cluster load and wasted disk space means that, even under optimal conditions, such rebuilding would occur relatively infrequently. I can see GMail privacy being an issue for those who have something to hide, or indeed those targeted by an evil agency somewhere. As with all providers, I suspect we're just gonna have to trust Google to be nice about disclosing info about us. Remember folks, "Don't be evil". Any comments on this -- or indeed anyone want to offer me a GMail invitation? :)
