Archive for July, 2007

Who Else Wants Faster Standards!!!

Monday, July 23rd, 2007

Well, on the 19th of July the W3C announced that CSS 2.1 is a Candidate Recommendation. No offense intended, but it’s about time!

This change has been on the cards for years, and I for one am quite annoyed that it takes so long to get from the initial design to actual standards approval. I realise that the number of channels and layers that the standard gets passed through is immense, but seriously, the first working draft for CSS 2.1 was produced in 2002 and it is now 2007, and this wasn’t even a large change like going to CSS 3.

Unfortunately, the process is the best we have at the moment. I would be interested in people’s views on the subject if you have any. Should we try and set up a faster standards approval process? Or does the current system work by ensuring quality of standards?

Microformats: Metadata or Metacrap?

Sunday, July 22nd, 2007

I for one quite like the idea of being able to extend XHTML on the fly to mark up data like events and contacts. However, I must admit to not liking the unstructured nature of the markup that is generated. Divs and spans are all very well, but they don’t do much for the semantic nature of the page. A better idea might be to use XML and insert that into the main site via namespaces.

So, what to do about them? I for one implement a few microformats on this site: the contact cards on the right sidebar are in the hCard format, and all events published in the blog will be in the hCalendar format. Why do I do that if I don’t like the unstructured nature? Well, at the moment there isn’t a widely adopted markup language for such content, and so microformats are the next best thing. One advantage is that they require no extra effort to display and can be integrated straight away, without having to learn another markup language.

So please, go ahead, the more data on the net that is easily accessible and readily available to machines the better!
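To give a flavour of what "readily available to machines" can mean, here is a minimal Python sketch that pulls the formatted name out of an hCard. The sample markup and the `HCardNameParser` class are made up for illustration; only the `vcard` and `fn` class names come from the hCard format itself, and a real parser would handle far more properties and edge cases (this one ignores void elements such as `<br>`).

```python
from html.parser import HTMLParser

# Hypothetical page fragment containing one hCard.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Jane Example</span>
  <a class="url" href="http://example.com/">website</a>
</div>
"""


class HCardNameParser(HTMLParser):
    """Collect the text of every element whose class list includes "fn"."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._depth = 0    # greater than zero while inside an "fn" element
        self._buffer = []  # text fragments seen inside the current "fn" element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1      # nested element inside the name
        elif "fn" in classes:
            self._depth = 1       # entering a formatted-name element
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:  # leaving the formatted-name element
                self.names.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_names(html_text):
    """Return the formatted names found in the given HTML."""
    parser = HCardNameParser()
    parser.feed(html_text)
    return parser.names
```

Calling `extract_names(SAMPLE_HTML)` on the fragment above picks out the contact's name without the page author having to publish anything beyond ordinary XHTML.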

More information about microformats is available at the Microformats site.

The Bluffer’s Guide to…Metadata!

Saturday, July 21st, 2007

In my past few posts I have referred constantly to Metadata, but what is it? And how can you use it?

Metadata in its most trivial form is data about other data. If you know that one line you know everything there is to know about metadata; all we need to do now is elaborate…

Metadata can be classified in terms of content, mutability and logical function. I can’t describe this in any better terms than Wikipedia already has, so:

  • Content. Metadata can either describe the resource itself (for example, name and size of a file) or the content of the resource (for example, “This video shows a boy playing football”).
  • Mutability. With respect to the whole resource, metadata can be either immutable (for example, the “Title” of a video does not change as the video itself is being played) or mutable (the “Scene description” does change).
  • Logical function. There are three layers of logical function: at the bottom the subsymbolic layer that contains the raw data itself, then the symbolic layer with metadata describing the raw data, and on the top the logical layer containing metadata that allows logical reasoning using the symbolic layer.

Metadata Lifecycles and Storage

Metadata usually lasts as long as the resource it is describing. This is not always ideal, however. When two resources merge, or the resource changes in some other way, what happens to the metadata? Sometimes it is useful to keep old metadata, e.g. to keep an archive of the resource as it changes. When two resources merge, e.g. text documents, the metadata from each is usually discarded and fresh metadata is generated. This is something which should change if we really want an accurate version history of our documents, images and music.

Metadata is usually stored within a resource, typically in the header part of a file format, but there are occasions when metadata exists outside a resource. For example, an XML file describing a photo album might be located in one place while the photos that make up the album are stored somewhere completely different. Where to put metadata and how to store it is becoming a major issue. Do we increase the size of the file? Or do we store it externally and risk it being separated? Should it be plain text and human readable, or should we store it as binary to decrease the storage space needed to hold it?

Metacrap?

Those who oppose metadata, or so-called metacrap, are against it for a number of reasons. Mainly they say it clogs up file formats for no particular reason. Another argument is that metadata is subjective, and as such there is no real objective need to have it.

My response? They’re wrong. Metadata is continually being used every day. Ever tag your blog entry? What about your friends’ pictures on Facebook? What about the author of a Word document? The truth is that metadata is an essential part of digital life; it allows us to add meaning and context to the 1’s and 0’s. Only when metadata is used incorrectly does it start becoming useless.

Web Law: Less Equals More

Friday, July 20th, 2007

Jeff Atwood over at Coding Horror wrote a great post entitled The Principle of Least Power. The concept was first proposed by Tim Berners-Lee, the father of the World Wide Web. To summarise: powerful languages inhibit information reuse. For example, weather information encoded in XML can be manipulated and processed very easily, whereas if it were encoded in Java, the information is locked away and cannot be searched or sorted by any external application.
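As a rough illustration of the weather example, here is a minimal Python sketch. The XML layout, the element and attribute names and the readings themselves are all invented for this post, not taken from any real weather feed. The point is only that an application which has never seen the data before can still sort it.

```python
import xml.etree.ElementTree as ET

# Hypothetical weather data; the element and attribute names are made up.
WEATHER_XML = """
<weather>
  <reading city="Leeds" temp_c="17"/>
  <reading city="Cardiff" temp_c="21"/>
  <reading city="Glasgow" temp_c="14"/>
</weather>
"""


def warmest_first(xml_text):
    """Return (city, temperature) pairs sorted from warmest to coolest."""
    root = ET.fromstring(xml_text)
    readings = [
        (reading.get("city"), int(reading.get("temp_c")))
        for reading in root.iter("reading")
    ]
    return sorted(readings, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for city, temp in warmest_first(WEATHER_XML):
        print(city, temp)
```

Had the same readings been produced by a compiled program that simply draws a weather map, none of this reuse would be possible without running (and trusting) that program.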

For more information see the W3C document concerning The Rule of Least Power.

Directories, Tagging and Desktop Search

Friday, July 20th, 2007

Last time I touched upon Desktop Search in terms of how metadata should be used in the process. This time I would like to look further into the structure of the system itself and how that could be improved to facilitate search.

Desktop search is a subject that is pushing its way back into the limelight, all thanks to an online convention: tagging.

Microsoft have PHLAT and there are numerous other applications out there that allow you to tag files and then search through them. The idea is that a central database is created and then filled with tags and information about each file so that more data is available to the search application.

One important note about tagging that only really hit me while reading the new face of tagging was that tags can be separated into two broad categories, descriptive and contextual. Descriptive tags provide information about the content that has been tagged, e.g. in a photo a tag might be the photographer or who is in the photo. Contextual tags, however, provide information about the current state of the item or what you intend to do with it, e.g. in a photo a contextual tag might be “to-crop” or “to-edit”.

The major difference between the two types of tag is their mutability. Descriptive tags are immutable, whereas contextual tags last only as long as the context, e.g. once the photo has been edited the tag can be removed and a new tag “complete” can be used.

But tagging has its problems. The tags are invisible, which means that manual search isn’t improved; that can only be done through the appropriate use of directories.
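The distinction can be sketched as a small data structure. This is just one way of modelling it in Python; the `TaggedFile` class, its `finish` method and the example tags are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedFile:
    """A file with long-lived descriptive tags and short-lived contextual tags."""

    path: str
    descriptive: set = field(default_factory=set)  # facts about the content
    contextual: set = field(default_factory=set)   # current state / to-do markers

    def finish(self, old_tag, new_tag):
        """Swap a contextual tag once its context has passed."""
        self.contextual.discard(old_tag)
        self.contextual.add(new_tag)


# Example: the photographer never changes, but "to-edit" eventually does.
photo = TaggedFile(
    "beach.jpg",
    descriptive={"photographer:alice"},
    contextual={"to-edit"},
)
photo.finish("to-edit", "complete")
```

Keeping the two sets apart means a tool could happily expire or rewrite contextual tags without ever touching the descriptive ones.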

In my home directory, for example, I have a folder called “Music” and in this folder I keep all my music files. Each artist has their own directory, in each artist directory there are album directories, and in these are the actual music files. Because they are sorted this way, moving the library around is very easy and music players can easily find my entire collection. This is an example of a situation where tagging just isn’t appropriate.

Photo Files

Let’s look at another example: photo albums. Whenever I have a collection of photos, I create a new directory with the name and, if applicable, the date of the album, and place the photos in there. The photos themselves are named in a specific way, <name>_<location>_<date>.file type, which helps me when I am manually browsing the files.
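A naming scheme like this is also easy for a program to read back. Here is a minimal Python sketch, assuming the date part is written as YYYY-MM-DD (that is my assumption for the example; the scheme above doesn't fix a date format) and that the location and date themselves contain no underscores. The function name `parse_photo_name` is made up for illustration.

```python
from datetime import date
from pathlib import Path


def parse_photo_name(filename):
    """Split "<name>_<location>_<date>.<type>" into its parts.

    Assumes an ISO date (YYYY-MM-DD); raises ValueError if the
    file name does not follow the pattern.
    """
    path = Path(filename)
    # Split from the right so the name itself may contain underscores.
    name, location, day = path.stem.rsplit("_", 2)
    return {
        "name": name,
        "location": location,
        "date": date.fromisoformat(day),
        "type": path.suffix.lstrip("."),
    }
```

For instance, `parse_photo_name("sunset_Brighton_2007-07-20.jpg")` gives back the name, location, date and file type as separate fields, which is exactly the sort of metadata a search tool could index.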

Decentralised Data

The next step would be to use an application to tag the files so that I could easily find similar photos. What would be ideal is if the photo files allowed me to add information such as who is in the photograph, where it was taken and who took it. I am aware that many file types do allow for a degree of metadata, but I have yet to find a suitable format. That is for a future blog post.

The Secret Power of Metadata

Thursday, July 19th, 2007

Following on from my last post, I would like you to think about your computer space. If you needed a file now, would you be able to find it?

If it’s anything like my old space then the answer is yes, but it would take a while. Now, I am going to go over a theory that has already been implemented a few times but never to full satisfaction: tagging.

In traditional (if it can be called that) web tagging, tags have been stored in a database. This isn’t a problem because the image/link/document is always stored centrally somewhere on the web. On a normal desktop computer, files are shuffled about, backed up, copied, emailed etc. This means that any central database is almost impossible to manage without a program constantly monitoring what happens to your files, and even then, as soon as the file is moved to a different computer the metadata is lost forever.

This problem got me thinking: is there a way to store metadata so that it remains constant across folders, computers, operating systems etc.? The problem is that any attempt to add metadata to file formats that have no accommodation for it is likely to corrupt the file. So the metadata must be stored externally, at least for some files, which means that for every data file there needs to be a sister file containing the metadata, and these must somehow be managed so that they get transferred with the original data file.

Now I can hear you all crying, “What’s the point of that? You may as well put everything in a database and write a protocol to ensure that the values get transferred when the files do!” Well, in a way you’re right, but the problem lies in what happens when a file is transferred to a system without a central metadata database: the data has nowhere to go but still has to be preserved. This is where the sister files come in. They don’t need a particular context to be transferred into, since every major system under the sun supports plain text files. So as long as the sister file is kept with the main file, the metadata can be preserved without the system needing a centralised database.

Unfortunately, I don’t like this solution. It is a bit messy and doesn’t solve the underlying problem. I hope that in the future metadata will be built into file formats; this is already happening with .mp3, and of course all the common web markup languages have this ability.
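To make the sister file idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the ".meta" suffix, the simple "key: value" line format and the function names are mine, not an existing standard.

```python
import shutil
import tempfile
from pathlib import Path

SIDECAR_SUFFIX = ".meta"  # hypothetical naming convention for sister files


def sidecar_for(data_file):
    """Return the path of the sister file that sits next to data_file."""
    return data_file.with_name(data_file.name + SIDECAR_SUFFIX)


def write_metadata(data_file, metadata):
    """Store metadata as plain "key: value" lines beside the data file."""
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    sidecar_for(data_file).write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_metadata(data_file):
    """Read the sister file back into a dictionary of strings."""
    metadata = {}
    text = sidecar_for(data_file).read_text(encoding="utf-8")
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        metadata[key] = value
    return metadata


def move_with_metadata(data_file, target_dir):
    """Move a data file and, if present, its sister file together."""
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = Path(shutil.move(str(data_file), str(target_dir / data_file.name)))
    old_sidecar = sidecar_for(data_file)
    if old_sidecar.exists():
        shutil.move(str(old_sidecar), str(sidecar_for(moved)))
    return moved


def demo():
    """Write a file plus sister file, move both, and read the metadata back."""
    with tempfile.TemporaryDirectory() as tmp:
        photo = Path(tmp) / "holiday.jpg"
        photo.write_bytes(b"not really a photo")
        write_metadata(photo, {"photographer": "Sam", "location": "Brighton"})
        moved = move_with_metadata(photo, Path(tmp) / "backup")
        return read_metadata(moved)
```

The weak spot is obvious from the code: the metadata only survives if every tool that moves the file goes through something like `move_with_metadata`. An ordinary copy or email attachment will leave the sister file behind.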

A Metadata Future?

So this is all very well, but what use is it? Well, I propose that this system could be used in the future as a basis for using tags to sort and search data on computers, instead of the folders and file names we all use now. This should lead to far more accurate search, since the metadata helps describe the file you are looking for, so you no longer have to rely on the file name or type. This is something I am looking into developing as my ideas become more structured.
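As a very rough sketch of what tag-based search might look like, here is a tiny in-memory index in Python. The `TagIndex` class and the example paths and tags are hypothetical; a real system would need to persist the index and keep it in step with the files.

```python
from collections import defaultdict


class TagIndex:
    """Map each tag to the set of file paths that carry it."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more tags to a file (tags are case-insensitive)."""
        for name in tags:
            self._files_by_tag[name.lower()].add(path)

    def search(self, *tags):
        """Return the files that carry every one of the given tags."""
        if not tags:
            return set()
        matches = [self._files_by_tag.get(name.lower(), set()) for name in tags]
        return set.intersection(*matches)


index = TagIndex()
index.tag("/home/me/img_0412.jpg", "holiday", "Brighton", "2007")
index.tag("/home/me/notes.txt", "holiday", "packing-list")
index.tag("/home/me/img_0977.jpg", "Brighton", "2006")
```

With this in place, `index.search("holiday", "brighton")` finds the first photo regardless of what the file is called or which folder it lives in.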

Who Else Wants Killer Search?

Wednesday, July 18th, 2007

Can you imagine a world where computers could communicate flawlessly?

That is the dream of many, including myself. I dream of a future when applications will be able to understand information in context. This future is quite far off, but we are making some progress towards it.

XHTML is one language which developers and content publishers are slowly but surely starting to get to grips with. A far stricter and more content-orientated version of HTML, XHTML allows users to mark up documents so that whoever or whatever is reading them understands the context of the information, e.g. <strong> to denote a piece of text with strong emphasis.

XHTML, when used correctly, can be a powerful tool for building a knowledge base. However, it cannot be used exclusively, as other problems still remain. Consider this example:

I want to search for a blue car on the internet, and several companies have web sites dedicated to displaying their range of cars. If the pages are correctly marked up, it is likely that a search engine can find related pages, but that system isn’t foolproof. What we need is a way of marking up car details.

Now this is where XML comes in. XML is a way of creating your own markup for specifying anything from car details to TV listings. The main issue here is that the motor industry would have to come up with their own specification, and we all know how long it takes for something to get done in corporate partnerships.

OK, so let’s say that by some miracle there is an industry standard markup language and that all dealers subscribe to it and publish listings on their web sites. Now a search application could index these listings and then I would be able to find my blue car, right?

Well, not really. We now have another issue: semantics. Blue is a colour, so is turquoise, so is cyan. When I say blue, I sometimes mean cyan, I sometimes mean navy. There are so many different names to describe the subtle and not so subtle variations in colour, and an application would have to take this into account. But how do you explain to an application that blue means cyan means turquoise? Do we just stop here and come to the conclusion that humans have to become slightly better at describing?

There are a few solutions. One is building a relationship database, where colours and other similar properties can be put into related groups and can thus be referenced. However, this again requires a central body to create and maintain the database. Another solution would be to leave it up to the applications themselves and let market forces decide which ones are better for the job, a far more decentralised approach. A third would be to have a publicly maintained database, Wikipedia anyone?
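The first of those ideas, grouping related colours, can be sketched in a few lines of Python. The colour families and car listings below are invented for illustration, and who gets to decide what belongs in each family is exactly the open question.

```python
# Hypothetical colour families; deciding these groupings is the hard part.
COLOUR_FAMILIES = {
    "blue": {"blue", "navy", "cyan", "turquoise"},
    "red": {"red", "crimson", "scarlet"},
}

# Hypothetical listings, as a search application might have indexed them.
LISTINGS = [
    {"model": "Hatchback A", "colour": "Navy"},
    {"model": "Saloon B", "colour": "Crimson"},
    {"model": "Estate C", "colour": "cyan"},
]


def colours_matching(query):
    """Expand a colour query into every colour name in its family.

    A colour with no known family only matches itself.
    """
    query = query.lower()
    return COLOUR_FAMILIES.get(query, {query})


def find_cars(listings, colour):
    """Return the listings whose colour falls within the queried family."""
    wanted = colours_matching(colour)
    return [car for car in listings if car["colour"].lower() in wanted]
```

Searching for "blue" now returns both the navy and the cyan car, while searching for "cyan" stays narrow and returns only the exact match.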

And there you have it. If all the above elements were in place, searching would become a hell of a lot easier. There is only one problem…

How do you trust the people producing the page to be honest about their content/products?