The more data we have, the less we can trust it. This is certainly true in the case of IMDb, the so-called movie database that is increasingly used by students and educators alike. Superficially, it has a history that shouts ‘trust me’: founded by the British film fan Col Needham before the World Wide Web even existed, it lived at one time on Cardiff University’s servers. Though the site was later acquired and monetised by Amazon, Needham remains in charge. But look a bit further, and IMDb is increasingly a generator of data chaos, especially when it comes to anything beyond the latest movies. Historical information and information about TV are, well, more than a bit random.

At the heart of the problem is ‘disambiguation’: the activity of cleaning up data relating particularly to people’s names and film titles. On Wikipedia, this is a key activity, with pages listing the people who share what appears to be the same name. There I learned that I could be the ‘John Ellis’ who played a key role in discovering the Higgs boson, or the ‘John Ellis’ who wrote The Social History of the Machine Gun. Sadly, I am neither of them. A Wikipedia search for the name lists almost fifty possibilities, including John Ellis (executioner). Wikipedia allows any registered user (registration, of course, is free) to correct entries, adding information and, just as importantly, deleting anything that is wrong. As part of the process, the person entering the correction has to provide a short rationale. The ‘View History’ tab then allows any user to trace who changed what, when they did it, and why. So Wikipedia offers two crucial data cleaning functions: users can correct entries, and the history of each entry can be checked by any subsequent interested user.

There’s none of this on IMDb. ‘Disambiguation’ does not even appear on the help page. The only acknowledgement of the issue that I can find is a smug message to the effect that ‘we prefer to have loads of entries on the same thing, rather than try to tidy up around the place’. That takes a weight off their workload, of course, but does nothing to make the data trustworthy.

Take the case of ‘John Ellis’ who once ran an independent production company called ‘Large Door Ltd’ and then ‘Large Door Productions’. I’ve recently been trying to sort out the digital legacy of this largely analogue activity, including a basic website linking to programmes uploaded to YouTube or Vimeo. But I have reached an impasse with IMDb.

IMDb thinks that I am four people: John Ellis XIV, John Ellis XXV, John Ellis XXVIII and John Ellis XXXV. These John Ellises between them produced many (but not all) of the TV programmes that I did, and one that I did not: Godard’s Scénario du film Passion. IMDb thinks that the production company concerned, variously known as Large Door Ltd and Large Door Productions, produced many of the same programmes, but also two feature films, Dean Quixote (2000) and Milk and Cookies or the Ballad of Norman Saxon (1996), which it definitely did not. A large number of the programmes made by the company do not have an entry at all, mainly ones produced in the 1980s, which is the forgotten decade of British TV (in metadata terms at least).

So, what to do? It appears that I have to pay a subscription to IMDb Pro to do any data entry at all, so I do that. I spend several hours dealing with an unfriendly interface, similar to the many I have to deal with in my university work. I leave messages saying who I am and explaining exactly how each entry is wrong. They say that all the entries will be moderated. So I wait. That was back in early April. Here I am in mid-June, having exhausted my free trial and paid up $48, and I have had not one single acknowledgment of the various entries I have edited.

So why should anyone trust this heap of inaccurate data? My advice is: just don’t. The site is a bloated waste of internet space and actively misleading. It is fuelled by promotional activity, which is probably why, when trying to write this on a Virgin train using their internet connection, IMDb came up as a banned site. Apparently it uses too much bandwidth, which is unfair to other train travellers. Maybe that’s a rare example of Virgin Trains doing people a favour!

John Ellis is Professor of Media Arts at Royal Holloway, University of London. He leads the ADAPT project on the history of technologies in TV, funded by a €1.6 million grant from the European Research Council. He is the author of Documentary: Witness and Self-revelation (Routledge 2011), TV FAQ (IB Tauris 2007), Seeing Things (IB Tauris 2000) and Visible Fictions (1984). Between 1982 and 1999 he was an independent producer of TV documentaries through Large Door Productions, working for Channel 4 and the BBC. He is chair of the British Universities Film & Video Council and also oversees the Royal Holloway team working on EUscreen. His publications can be found HERE.