"In September we will begin the full-scale project of digitising over 80 years' worth of broadcast records. That's approximately 400,000 pages of Radio Times, 3 million programmes and 300 million words to recognise through OCR.The process for which is described. Not too long ago this would have required an infinite number of temps many decades to carry out. Also:
"Although the BBC only has about 20-25% of the programmes in its physical archive"Of course there is. For decades television and radio was broadcast live and there wasn't a cheap recording format. I just haven't thought of the implications of that before. In the long run, 20-25% is a lot, even if 10% is probably due to junking (goodbye Marco Polo). What will be really clever is if once this is completed, the database then links internally to relevant pages within the BBC website.
Meanwhile, the BBC's archive is moving to Perivale. They'd best watch out for the giant cat people, posh street kids etc.