Archives for June 2008

Under the iPlayer hood for radio

Post categories: Radio

James Cridland|11:45 UK time, Friday, 27 June 2008

People have been asking about the bitrates and codecs that we're using on national radio within the new iPlayer beta.

The quick answer is "they're different per station, they're different whether live or on-demand, and they'll change at least another two times this year". If that satisfies you, there's no need to read on. If you want more information, however, I'm happy to help. Note that I'm only talking about national radio, and only for listeners in the UK.

First, you'll notice that for "live" we're currently using Windows Media Player rather than RealPlayer (for most of you - we still give Real to some operating systems). We're doing this because we know online radio is particularly useful in the office: chances are that Windows Media Player is already installed on most computers, and most corporate IT departments won't let you install other software. It should, therefore, 'just work'. I should say, though, that if you need RealPlayer for your internet radio or your fridge, those streams continue; we've no plans to remove them.

The future for "live" is, first, to significantly improve the bitrate (which we'll do in July). In parallel, we're working on a way of delivering even higher quality, using a Flash-based player and an AAC-family stream. We're working with our distribution partners to enable this; the upshot is that it should sound even better but use less bandwidth.

For "on-demand", you'll have spotted that we're using Flash, within the lovely embedded media player that you're familiar with from TV in the iPlayer. Under the hood is a protected MP3 stream for now: again, we're shifting over to the AAC family later in the year. The real difference here is the quality - we've significantly improved the bitrates we can offer.

For on-demand content, we're launching iPlayer with four MP3 profiles, chosen according to the content of the programme:

Pop music (eg Radio 1 or Asian Network) is 128k stereo.

Classical music (Radio 3) is 192k stereo.

Stereo speech (Radio 4) is 128k stereo.

Mono speech (Radio 5live) is 80k mono.
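For a rough sense of what these launch profiles mean in bandwidth terms, the arithmetic is simple: at a constant b kbit/s, an hour of listening transfers b × 1000 × 3600 / 8 bytes. The sketch below assumes constant-bitrate encoding and ignores container and streaming overhead, so treat the figures as approximate. It also makes plain that the four profiles share three distinct bitrates, since pop music and stereo speech are both 128k.

```python
# Launch on-demand MP3 profiles from the list above: (kbit/s, channels).
# Assumes constant bitrate; container and streaming overhead are ignored.
PROFILES = {
    "pop music (e.g. Radio 1, Asian Network)": (128, "stereo"),
    "classical music (Radio 3)": (192, "stereo"),
    "stereo speech (Radio 4)": (128, "stereo"),
    "mono speech (Radio 5live)": (80, "mono"),
}


def megabytes_per_hour(kbps: int) -> float:
    """Decimal megabytes transferred by one hour of listening at kbps kbit/s."""
    bits = kbps * 1000 * 3600
    return bits / 8 / 1_000_000


for name, (kbps, channels) in PROFILES.items():
    print(f"{name}: {kbps}k {channels}, about {megabytes_per_hour(kbps):.1f} MB per hour")
```

So an hour of Radio 3 on demand comes to roughly 86 MB, and an hour of 5live roughly 36 MB.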

These are the launch bitrates; we'll tweak things, and moving to the AAC family will reduce the bitrates we use (to make your listening more reliable, whilst maintaining audio quality). Again, the Real listen-again streams that your internet radio uses will still work.

Finally, perhaps I might be able to let you into a bit of a dirty secret. For the last six years, the online streams from BBC national radio have been taken from satellite: the same feeds you get on Freesat or Sky. So we've been taking a lossy MP2 audio feed, and then encoding it further into even lower bitrates. As we move into higher quality audio online, clearly this has to stop. So, from July, it will - we'll be encoding everything within Broadcasting House, plugged in to the studio feeds. So better bitrate is only part of the story - it's also better sound.

If you've got feedback about radio within iPlayer beta, we're watching your blogs; or if you're blogless, please do comment here.

The Simple Joys of Web-Scale Identifiers

Post categories: Technology

Michael Smethurst|13:26 UK time, Wednesday, 25 June 2008

<aside>Second post of the day is quite a record for me but this one isn't about microformats so you can probably look away now...</aside>

Bob Dylan with his MusicBrainz identifier

This post is partly a response to Tom's post about URLs and partly the result of conversations with Matthew Wood, Chris Sizemore and John O'Donovan on our recent jaunt to Linked Data Planet. Now I think that most of our department would agree with Tom. After all we've been having these conversations for a few years now and when it comes to URL design we're standing on the shoulders of giants.

When you're building anything it's always good to admit that cleverer people than you or I (or even Tom) came before. In the case of the web those people gave us HTTP and HTTP is stateless. It's the whole beauty of the web: everyone, everywhere gets the same thing from the same place. The moment you pick a fight with this design you're probably gonna get beat.

Which is not to say that people haven't picked this fight. Many websites (including the BBC) use cookies to preserve state across requests. So stateful web apps do get built, but when you make that choice you need to be aware that all your user activity will remain uncaptured by the web - no browsability, no Google goodness, no benefit to your organisation (beyond the obvious) and no caching.
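To make the caching point concrete, here is a toy shared cache keyed only on the URL, as a naive HTTP cache would be. The handlers and cookie values are hypothetical. When the page depends only on the URL, every request can safely be served from the cache; when it depends on a cookie, the cache hands one user's personalised page to the next user.

```python
from typing import Callable, Dict, Optional

# Hypothetical handler signature: (url, cookie value) -> response body.
Handler = Callable[[str, Optional[str]], str]


class UrlKeyedCache:
    """Toy shared cache that, like a naive HTTP cache, keys responses on the URL alone."""

    def __init__(self, handler: Handler) -> None:
        self.handler = handler
        self.store: Dict[str, str] = {}

    def get(self, url: str, cookie: Optional[str] = None) -> str:
        # Only the first request for a URL reaches the handler; later ones are cache hits.
        if url not in self.store:
            self.store[url] = self.handler(url, cookie)
        return self.store[url]


def stateless(url: str, cookie: Optional[str]) -> str:
    # Same URL, same representation, whoever asks.
    return f"page for {url}"


def stateful(url: str, cookie: Optional[str]) -> str:
    # The representation depends on hidden state carried in a cookie.
    return f"page for {url}, personalised for {cookie}"


shared = UrlKeyedCache(stateless)
print(shared.get("/programmes/b00c4wxm", cookie="alice"))
print(shared.get("/programmes/b00c4wxm", cookie="bob"))  # fine: identical for everyone

personal = UrlKeyedCache(stateful)
print(personal.get("/home", cookie="alice"))
print(personal.get("/home", cookie="bob"))  # wrong: bob is served alice's page
```

Real HTTP caches can be told that a response depends on the cookie (via the `Vary: Cookie` response header), but that splits the cache into one copy per cookie value, which largely defeats shared caching.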

So, like I say, I agree with the four Linked Data rules, but I'd like to try to add a fifth: if possible, don't reinvent other people's web identifiers. By web identifiers I mean those fragments of URLs that uniquely identify a resource within a domain. So in the case of the MusicBrainz entry for The Fall (https://musicbrainz.org/artist/d5da1841-9bc8-4813-9f89-11098090148e.html) that'll be d5da1841-9bc8-4813-9f89-11098090148e.
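Pulling the web identifier out of a URL like the one above is a single pattern match. This sketch assumes MusicBrainz artist URLs take the form shown in the post (a lowercase UUID, optionally followed by `.html`); the pattern is inferred from that one example rather than from any MusicBrainz specification.

```python
import re
from typing import Optional

# Inferred from the example URL in the post: /artist/<uuid>, with an optional ".html".
MB_ARTIST_URL = re.compile(
    r"^https?://musicbrainz\.org/artist/"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"(?:\.html)?/?$"
)


def web_identifier(url: str) -> Optional[str]:
    """Return the artist identifier embedded in a MusicBrainz artist URL, or None."""
    match = MB_ARTIST_URL.match(url)
    return match.group(1) if match else None


print(web_identifier(
    "https://musicbrainz.org/artist/d5da1841-9bc8-4813-9f89-11098090148e.html"
))  # d5da1841-9bc8-4813-9f89-11098090148e
print(web_identifier("https://www.bbc.co.uk/music/artist/jb9x/"))  # None
```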

The last time we updated the /music site we made this mistake (kind of unavoidable at the time). Even though we linked our data to MusicBrainz we minted new identifiers for artists. So The Fall became https://www.bbc.co.uk/music/artist/jb9x/ where jb9x was the identifier. But jb9x doesn't exist anywhere outside of /music. We'll (hopefully) never make that mistake again.

When we first partnered with MusicBrainz the big attraction was twofold:

stable web-scale identifiers
liberal data licensing - no separate deals to reuse data in APIs etc

So when the next version of /music goes live you'll see: https://www.bbc.co.uk/music/artists/d5da1841-9bc8-4813-9f89-11098090148e and the world will hopefully be a slightly better place.
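The practical difference between minting and reusing identifiers shows up as soon as you link the two sites. With a locally minted key like jb9x, moving between /music and MusicBrainz needs a crosswalk table that someone has to maintain; with a shared identifier it is just string templating. A minimal sketch, assuming the two URL patterns quoted above; the crosswalk table and function names are hypothetical.

```python
# Old approach: locally minted identifiers need a maintained crosswalk (hypothetical table).
MINTED_TO_MBID = {"jb9x": "d5da1841-9bc8-4813-9f89-11098090148e"}

MUSIC_PREFIX = "https://www.bbc.co.uk/music/artists/"


def old_music_id_to_musicbrainz(minted_id: str) -> str:
    mbid = MINTED_TO_MBID[minted_id]  # KeyError for anything nobody added to the table
    return musicbrainz_url(mbid)


# New approach: the /music identifier *is* the MusicBrainz one, so no table is needed.
def music_url(mbid: str) -> str:
    return f"{MUSIC_PREFIX}{mbid}"


def musicbrainz_url(mbid: str) -> str:
    return f"https://musicbrainz.org/artist/{mbid}.html"


def mbid_from_music_url(url: str) -> str:
    if not url.startswith(MUSIC_PREFIX):
        raise ValueError(f"not a /music artist URL: {url}")
    return url[len(MUSIC_PREFIX):].rstrip("/")


the_fall = "d5da1841-9bc8-4813-9f89-11098090148e"
print(music_url(the_fall))
print(musicbrainz_url(mbid_from_music_url(music_url(the_fall))))
```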

Now I can already hear my old mentor saying:

Michael noooo! URIs are just identifiers for resources. They shouldn't reflect the taxonomy of the site. The resource should define its relationships to other resources, not the URI. Call them anything you like but just keep them stable.

With which I also mostly agree, but if bbc.co.uk/programmes tagged content with the same vocabulary as bbc.co.uk/news, we'd be able to cross-promote news stories from programmes and programmes from news stories by sharing APIs, not databases. Tie this into personalisation and the power goes exponential. Read six articles on reconstruction in Iraq? Then you might like this Panorama programme.
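As a toy illustration of sharing a vocabulary rather than a database: if news stories and programmes are tagged with the same identifiers, a reading history can be turned into programme suggestions by counting tag overlap. The tags, story names and programme names here are made up, and a real system would want a better scoring scheme than raw counts.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical shared vocabulary: both sites tag with the same topic identifiers.
NEWS_TAGS: Dict[str, Set[str]] = {f"story-{i}": {"iraq", "reconstruction"} for i in range(6)}
NEWS_TAGS["story-football"] = {"football"}

PROGRAMME_TAGS: Dict[str, Set[str]] = {
    "panorama-episode": {"iraq", "reconstruction", "current-affairs"},
    "match-of-the-day-episode": {"football"},
    "gardeners-world-episode": {"gardening"},
}


def recommend(read_stories: List[str], top_n: int = 1) -> List[str]:
    # How often has each topic come up in the reader's history?
    interest = Counter(tag for story in read_stories for tag in NEWS_TAGS[story])
    # Score each programme by the summed interest in the tags it shares.
    scores = {pid: sum(interest[tag] for tag in tags) for pid, tags in PROGRAMME_TAGS.items()}
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    return [pid for pid in ranked if scores[pid] > 0][:top_n]


history = [f"story-{i}" for i in range(6)]  # six articles on reconstruction in Iraq
print(recommend(history))  # ['panorama-episode']
```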

But if the vocabulary used to tag programmes and news was web-scale, then The Times, The New York Times, Fox News etc (or someone in between) could start to aggregate stories around a shared sense of topic. This is what Chris' recent post on using Wikipedia / DBpedia as a controlled vocabulary begins to hint at. It's like Yahoo! Term Extraction or Open Calais, except that the terms returned are web-native or, if you will, web-scale identifiers.
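A sketch of the aggregation idea: if different publishers tag stories with the same web-scale identifiers (DBpedia resource URIs are used here purely as an example), grouping by identifier is enough to line their coverage up, with no shared database. The publisher names come from the paragraph above; the stories are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (publisher, story, topic identifier). Stories are invented; DBpedia resource
# URIs stand in for a web-scale topic vocabulary.
IRAQ = "http://dbpedia.org/resource/Iraq"
DYLAN = "http://dbpedia.org/resource/Bob_Dylan"

STORIES: List[Tuple[str, str, str]] = [
    ("BBC News", "story 1", IRAQ),
    ("The Times", "story 2", IRAQ),
    ("The New York Times", "story 3", IRAQ),
    ("Fox News", "story 4", DYLAN),
]


def aggregate_by_topic(stories: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group stories from any number of publishers under their shared topic identifier."""
    by_topic: Dict[str, List[str]] = defaultdict(list)
    for publisher, story, topic in stories:
        by_topic[topic].append(f"{publisher}: {story}")
    return dict(by_topic)


for topic, items in aggregate_by_topic(STORIES).items():
    print(topic, items)
```

If each publisher had used its own keyword strings ("Iraq", "Iraq war", "IRAQ"), the grouping keys would no longer line up and someone would have to reconcile them by hand; a shared identifier removes that step.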

So what's the practical benefit? Because the new /music URLs will be based on MusicBrainz identifiers, because /music will be interlinked with /programmes, and because the Last.fm API speaks in MusicBrainz identifiers, Patrick can spend a weekend at Mashed making something that takes your Last.fm user name, extracts your favourite artists, ties them to /music and recommends BBC programmes. Which is a pretty good hack.
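Patrick's hack boils down to a join on MusicBrainz identifiers. The sketch below stubs out the network calls rather than guessing at them: `top_artists_stub` stands in for a Last.fm top-artists lookup (artist name plus MusicBrainz ID), and `PROGRAMMES_BY_ARTIST` stands in for the /music and /programmes interlinking. Both names and all the programme IDs are hypothetical.

```python
from typing import Dict, List, Optional


# Stand-in for a Last.fm top-artists lookup: artist name plus MusicBrainz ID.
# Not every artist has an MBID, so the join has to tolerate gaps.
def top_artists_stub(username: str) -> List[Dict[str, Optional[str]]]:
    return [
        {"name": "The Fall", "mbid": "d5da1841-9bc8-4813-9f89-11098090148e"},
        {"name": "An Unlinked Band", "mbid": None},
    ]


# Stand-in for /music <-> /programmes links: artist MBID -> programme IDs (placeholders).
PROGRAMMES_BY_ARTIST: Dict[str, List[str]] = {
    "d5da1841-9bc8-4813-9f89-11098090148e": ["programme-id-1", "programme-id-2"],
}


def recommend_programmes(username: str) -> Dict[str, List[str]]:
    """Map each of the user's top artists (by MBID) to a /music URL and its programmes."""
    recommendations: Dict[str, List[str]] = {}
    for artist in top_artists_stub(username):
        mbid = artist["mbid"]
        if mbid is not None and mbid in PROGRAMMES_BY_ARTIST:
            music_url = f"https://www.bbc.co.uk/music/artists/{mbid}"
            recommendations[music_url] = PROGRAMMES_BY_ARTIST[mbid]
    return recommendations


print(recommend_programmes("a_lastfm_user"))
```

The point is that no step needs a name-matching heuristic: because all three services use the same identifier, each hop is a dictionary lookup.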

Taking another example for those who wish to stalk Tom Scott. His blog is at derivadow.com which is also his OpenID, you'll find his delicious account at del.icio.us/derivadow, his tweets at twitter.com/derivadow and if you want to hire him he's at www.linkedin.com/in/derivadow on LinkedIn. So derivadow is a web-scale identifier for Tom. It's not as strong or as powerful as a set of RDF linked URIs but if you wanna aggregate Tom-ness it's a pretty good starting point. Sadly I can't find him anywhere on Last.fm but that's possibly a godsend.

The obvious question is: if web-scale identifiers are so good, why did the BBC mint its own for programmes? After all, the b00c4wxm used in /programmes and iPlayer is a BBC invention. And the answer is there were no suitable identifiers out there. I'd like to think that if Program(me)Brainz existed with stable identifiers we'd have put in the work to use those instead. But it didn't, so we couldn't... But now we do have stable identifiers out there on the web, free for anyone to use. It would be good, for example, to see these identifiers adopted by Speechification. Time will tell.

One argument against all this is that web-scale identifiers are often kinda ugly. After all, if Last.fm gets away with www.last.fm/music/The+Fall, why do we need d5da1841-9bc8-4813-9f89-11098090148e? The answer is ambiguity. MusicBrainz has 16 Auroras. Which one(s) does the BBC play? Probably none actually, but you get the point. If we want to be exact in what we point to we need to handle ambiguity. In general we follow three commandments:

URLs should be human readable
URLs should be hackable
URLs should persistently point to one concept

And the greatest of these is persistence. If you can't maintain stable URLs per concept, don't even bother with 1 and 2. Others argue that URLs are part of the interface. If resolving ambiguity is not important to your business then I'd agree, but if you need to differentiate stuff with the same label you need unique identifiers - better yet, web-scale identifiers.
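The Auroras problem in miniature: key artists by display name and distinct artists collapse into a single entry; key them by identifier and they stay apart. The two identifiers below are obvious placeholders, not real MusicBrainz IDs.

```python
from typing import Dict, List, Tuple

# Two different artists who share a display name. IDs are placeholders.
ARTISTS: List[Tuple[str, str, str]] = [
    ("00000000-0000-0000-0000-000000000001", "Aurora", "one artist called Aurora"),
    ("00000000-0000-0000-0000-000000000002", "Aurora", "a different artist called Aurora"),
]


def index_by(position: int) -> Dict[str, str]:
    # Later rows silently overwrite earlier rows that share the same key.
    return {row[position]: row[2] for row in ARTISTS}


by_name = index_by(1)  # a name-keyed URL scheme
by_id = index_by(0)    # an identifier-keyed URL scheme, like /music/artists/<mbid>

print(len(by_name), len(by_id))  # 1 2
```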

Now I guess the Linked Data people would say "do this properly in RDF with owl:sameAs etc", and we will. But for hackers without PhDs the possibility of instant interoperability and quick mash-ups is irresistible. Obviously you'll still need to establish equivalency between this and this, but luckily that's where the Linking Open Data people have done some of our work for us. And they're damn nice people to boot.
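owl:sameAs is symmetric and transitive, so equivalence statements gathered from different datasets can be merged into groups of URIs that all denote the same thing. A common way to compute those groups is union-find. The sketch works on plain (subject, object) pairs rather than a real RDF library; the DBpedia URI and the example.org pair are illustrative assumptions, and the page URLs from this post stand in for proper resource URIs.

```python
from typing import Dict, List, Set, Tuple


def same_as_groups(pairs: List[Tuple[str, str]]) -> List[Set[str]]:
    """Merge owl:sameAs statements into groups of co-referring URIs (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # sameAs is symmetric, so direction doesn't matter

    groups: Dict[str, Set[str]] = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


MBID = "d5da1841-9bc8-4813-9f89-11098090148e"
statements = [
    (f"https://www.bbc.co.uk/music/artists/{MBID}", f"https://musicbrainz.org/artist/{MBID}.html"),
    ("http://dbpedia.org/resource/The_Fall_(band)", f"https://musicbrainz.org/artist/{MBID}.html"),
    ("http://example.org/a", "http://example.org/b"),
]
for group in same_as_groups(statements):
    print(sorted(group))
```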

So I guess what I'm saying echoes Tom. Cleverer people than us have come up with ways to attach web-scale identifiers to content, so why waste time reinventing them? Whilst the BBC or *insert your organisation here* should own its data (and hopefully make it free - as in beer; as in speech), we don't have to own our identifiers. If we choose to use the power of web-scale identifiers, we free our content to fly and leave it to other people to add value / make money in the middle. It's not exactly profound, but it does feel like a small breakthrough to an aging BBC employee.

Microformats and RDFa and RDF

Post categories: Technology

Michael Smethurst|10:13 UK time, Wednesday, 25 June 2008

Improving the Acronym Karma

My original post on removing microformats from /programmes seems to have kicked off quite a debate. Unfortunately some of this seems to have resulted in RDFa people criticising microformats and vice versa. Which wasn't really the intention.

The post covered 3 things:

the decision by the BBC to ban the use of microformats which use non-human-readable data in the title attribute of the abbreviation element (most obviously the datetime abbreviation design pattern)
the impact of this on /programmes
the possibility of using RDFa on /programmes

So it's probably best to break these things apart.

Banning some uses of the abbreviation design pattern on bbc.co.uk

This is hopefully only a temporary ban until the microformats community come up with an alternative to the abbreviation design pattern that doesn't break BBC accessibility standards. It doesn't mean that hCalendar is banned, or even that the abbreviation design pattern is banned per se. Just that we can't use it where the title attribute contains non-human-readable data. Note that hCalendar can be used without the abbreviation design pattern, but none of the alternatives fit our needs.
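One way a site could check pages against a rule like this is to flag abbr elements whose title looks like machine data rather than an expansion. The sketch below uses Python's standard `html.parser` and treats an ISO 8601-style date or datetime as "non-human-readable"; that heuristic is an assumption, not the wording of the BBC standard, and a real check would need a broader definition. The sample markup is invented.

```python
import re
from html.parser import HTMLParser
from typing import List

# Heuristic: titles that look like ISO 8601 dates/datetimes, e.g. 2008-06-23T10:48:00+01:00
MACHINE_DATE = re.compile(
    r"^\d{4}-?\d{2}-?\d{2}(T\d{2}:?\d{2}(:?\d{2})?(Z|[+-]\d{2}:?\d{2})?)?$"
)


class AbbrTitleChecker(HTMLParser):
    """Collect abbr title values that look like machine data rather than expansions."""

    def __init__(self) -> None:
        super().__init__()
        self.flagged: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "abbr":
            return
        title = dict(attrs).get("title") or ""
        if MACHINE_DATE.match(title.strip()):
            self.flagged.append(title)


sample = """
<p>On <abbr class="dtstart" title="2008-06-23T10:48:00+01:00">Monday 23 June, 10:48</abbr></p>
<p><abbr title="British Broadcasting Corporation">BBC</abbr></p>
"""
checker = AbbrTitleChecker()
checker.feed(sample)
print(checker.flagged)  # ['2008-06-23T10:48:00+01:00']
```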

The impact on /programmes

I concentrated on /programmes because:

it's the project I work on
it's probably the bit of bbc.co.uk that makes most extensive use of microformats

Obviously there are other bits of bbc.co.uk that use microformats that would break the new accessibility standards but we were aware of people screen scraping the /programmes microformats in lieu of a full API so thought we'd best flag up what was happening.

RDFa

First, it's probably important to note that interest in RDFa is pretty much an Audio and Music thing. I've spoken to other people in various bits of the BBC who've expressed an interest, but so far the majority of discussions have been confined to Henry Wood House. So this next bit is with my A&Mi hat firmly on.

A number of A&Mi projects are being developed in accordance with the principles of Linked Data. For these sites we intend to provide full-fat RDF at separate URLs. In the case of /programmes this has resulted in the development of the Programmes Ontology - an RDF vocabulary to describe programmes. We're following the same principles with the redevelopment of /music (where we'll be using the existing Music Ontology). Where we're providing full RDF it makes sense (at least to us) to reuse these ontologies and also produce RDFa.
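For a flavour of what "full-fat RDF" looks like, here is a small set of triples describing an episode with the Programmes Ontology, serialised as N-Triples using only string formatting. The namespace http://purl.org/ontology/po/ and the class po:Episode come from the Programmes Ontology; the subject URI pattern (the /programmes URL plus a #programme fragment), the use of dc:title, treating b00c4wxm as an episode, and the title text are all assumptions for illustration, not a description of what /programmes actually serves.

```python
from typing import List, Tuple

PO = "http://purl.org/ontology/po/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

Triple = Tuple[str, str, str]


def literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal (backslash, quote, CR and LF escaped)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def episode_triples(pid: str, title: str) -> List[Triple]:
    # Assumed URI pattern: the /programmes URL plus a fragment naming the programme itself.
    subject = f"<http://www.bbc.co.uk/programmes/{pid}#programme>"
    return [
        (subject, f"<{RDF_TYPE}>", f"<{PO}Episode>"),
        (subject, f"<{DC_TITLE}>", literal(title)),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


print(to_ntriples(episode_triples("b00c4wxm", "An example title")), end="")
```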

Other projects might be data driven but might not want to go down the full RDF route. In this case they might opt for RDFa or they might choose accessible microformats.

For more lightweight, possibly hand-coded projects (still the majority of bbc.co.uk) accessible microformats would probably be most suitable.

So in short it's easy to imagine a BBC website with a mixed economy of microformats, RDFa and RDF. It certainly shouldn't be an either/or. So mostly I agree with Edd Dumbill, except that I'm not sure the accessibility problem with the abbreviation design pattern is a bug so much as an expected result of deliberate design decisions. Anyway, it's a problem that seems to have been around for a while now - hopefully it'll get sorted soon and we can all get back to using microformats (where appropriate) with a bit more peace of mind.

Removing Microformats from bbc.co.uk/programmes

Post categories: Technology

Michael Smethurst|10:48 UK time, Monday, 23 June 2008

Since /programmes first went live we've been working to ensure that programme data was accessible to people and machines alike. The API design was baked in at the application design stage. Similarly we've worked on adding microformats to HTML pages as a lightweight API. All broadcasts use the hCalendar microformat to add start times, end times, broadcast channels etc.

Unfortunately there have been a number of concerns over hCalendar's use of the abbreviation design pattern. This uses the HTML abbreviation element to add machine data to pages. Our concerns were:

the effect on blind users using screen readers with abbreviation expansion turned on where abbreviations designed for machines would be read out
the effect on partially sighted users using screen readers where tool tips of abbreviations designed for machines would be read out
the effect of incomprehensible tooltips on users with cognitive disabilities
the potential fencing off of abbreviations to domains that need them (travel - airport codes, finance - ticker symbols etc)

Until these issues are resolved the BBC semantic markup standards have been updated to prevent the use of non-human-readable text in abbreviations. ~~As I type the revised standard has not been published - I'll update this post with a link when that happens.~~ Updated standard is here. For this reason we've taken the decision to remove the hCalendar microformat from /programmes until:

either the BBC accessibility group does further testing and declares the abbreviation design pattern to be safe to use
or the microformats community settles on an accessible alternative to the abbreviation design pattern. The conversation about this has already been started by Frances Berriman.

hCalendar will be gone from /programmes by the next deploy (probably this Thursday).

In the meantime we'll be looking at the possible use of RDFa (a slightly bigger S semantic web technology similar to microformats but without some of the more unexpected side-effects).

Apologies to anyone who's been using hCalendar to help with screen-scraping of /programmes. We know we've been promising a full API for a while now and the /programmes development team will be campaigning to bring this up the product backlog. In the meantime schedules are already available as json and xml. Leave a comment if there are specific views / formats you'd like to see next.

Probably best to note that this only affects microformats using the abbreviation design pattern. Any rel-based and hCard microformats will remain (at least until, or unless, we fully embrace RDFa). And probably also best to note that this is not a decision that has come down from on high by the BBC equivalent of suits. The /programmes team has been concerned about this issue for a few months now and it's good to get some clarity here.

Stay tuned to radiolabs and we'll keep you updated if / as things change.

Radio Labs at Mashed 08

Post categories: conferences

Tristan Ferne|16:43 UK time, Friday, 20 June 2008

Mashed 08 is just starting at Alexandra Palace and Radio Labs is sending a crack team of developers who will be building stuff with BBC radio and music metadata. And we've also pulled together some new data for you to play with for this weekend...

XMPP Now Playing feed
Live now playing data for Radios 1, 2, 1Xtra and 6Music featuring track information, MusicBrainz artist IDs and programme identifiers for /programmes

Audio archive of BBC radio
Access to audio for the past month for the 10 national BBC radio stations with the ability to request any segment of this, accurate to the second.

Live BBC radio streams
MPEG over HTTP.

Artist playcount data
Data for how many times each radio station and DJ have played an artist. Maybe you can build a recommendations engine?

RDF for bbc.co.uk/programmes
For the semantic web fans amongst you - RDF data for brands, series, and episodes based on /programmes
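On the recommendations-engine suggestion for the artist playcount data: one simple starting point is to treat each DJ's playcounts as a vector over artists, compare DJs by cosine similarity, and suggest artists that the most similar DJ plays. The data shape and numbers below are invented; the real feed's format is whatever the Mashed data site documents.

```python
import math
from typing import Dict, List

# Hypothetical playcounts: DJ -> {artist: number of plays}.
PLAYS: Dict[str, Dict[str, int]] = {
    "dj_a": {"artist_1": 10, "artist_2": 5},
    "dj_b": {"artist_1": 8, "artist_2": 4, "artist_3": 1},
    "dj_c": {"artist_4": 12},
}


def cosine(u: Dict[str, int], v: Dict[str, int]) -> float:
    dot = sum(count * v.get(artist, 0) for artist, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(dj: str) -> str:
    others = [other for other in PLAYS if other != dj]
    return max(others, key=lambda other: cosine(PLAYS[dj], PLAYS[other]))


def suggest_artists(dj: str) -> List[str]:
    # Artists the nearest neighbour plays that this DJ hasn't played yet.
    neighbour = most_similar(dj)
    return sorted(set(PLAYS[neighbour]) - set(PLAYS[dj]))


print(most_similar("dj_a"))     # dj_b
print(suggest_artists("dj_a"))  # ['artist_3']
```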

Hopefully Nick Humfreys will be talking briefly about these feeds at the end of Jonathan Tweed's /programmes talk around 11am, so grab him if you want to know more.

All the data will be found here shortly:

https://mashed-audioandmusic.dyndns.org

Also featuring our data from last year's Hackday (well, what's still available) including Top of the Pops historical data, John Peel data, now playing RSS and LiveText data.

Enjoy.

Links for 19-06-2008

Post categories: Links

Tristan Ferne|12:36 UK time, Thursday, 19 June 2008

No radio links this week, just three interesting articles on open-source hardware, serendipitous recommendations and how to unleash your creativity.

Open-source hardware | Open sesame | Economist.com
The Economist gives an introduction to open-source hardware featuring devices from openmoko, Chumby, Bug Labs et al.

...My heart's in Accra » The architecture of serendipity
A good post about how typical recommendation systems on the web aren't about recommendation but about prediction. They tend to do a good job of making consistent, safe recommendations, but should be more open and serendipitous.

How to Unleash Your Creativity: Scientific American
Capturing, challenging, broadening, surrounding and "walk out the door for 20 minutes or so and see what happens to your thinking".

More at https://del.icio.us/tristanf/work

Wikipedia + Lucene's MoreLikeThis = useful bits about the bits?

Post categories: Design, R&D, Technology

Chris Sizemore|14:49 UK time, Friday, 13 June 2008

'bits about the bits' -- those bits that describe the narrative...

My colleague Michael recently posted about Nicholas Negroponte's prescient 1995 musings on the information-glut challenges that traditional TV and radio broadcasters are now feeling as a result of going digital.

Negroponte: "...we need those bits that describe the narrative with key words... these will be inserted by humans aided by machines... the[se] bits about the bits change broadcasting totally... they give [audiences] a handle by which to grab what interests [them], and [they] provide the [broadcaster] with a means to ship [its programmes] into any nook or cranny that wants them..."

I've been working for some years now on methods of providing audiences with access to BBC Radio and TV programmes based on genre, topic, and subject. In other words, I, and many of my colleagues, have been concentrating on the "bits about the bits" part of the chain.

Recently, I managed to hack a promising little "bits about the bits" prototype together, something that attempts to address in particular Negroponte's notion of "...bits that describe the narrative with key words..." My approach begins by treating Wikipedia and its articles as a Web-scale collaborative taxonomy or controlled vocabulary. Yes, for these purposes, suspend disbelief and assume Wikipedia is useful fodder for semi-automated categorisation -- whether or not it's a trustworthy or authoritative journalistic resource is an interesting debate, but isn't relevant for the job we want to do here.

My proof-of-concept is based on vacuuming every Wikipedia article into the Lucene open source search engine to build a text categorisation tool prototype. It's possible you may find this approach useful in your own "bits about the bits" endeavours.
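The core idea can be sketched without Lucene: index each Wikipedia article as a document, then treat a programme's text as a query and take the most similar articles as candidate category labels. The version below swaps in plain TF-IDF with cosine similarity over a three-"article" toy corpus. It is a stand-in for, not a reimplementation of, Lucene's MoreLikeThis, which picks the most "interesting" terms from the source text and runs them as a query against the index.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Toy stand-in for a Wikipedia dump: article title -> article text.
ARTICLES: Dict[str, str] = {
    "Jazz": "jazz music improvisation swing saxophone trumpet blues",
    "Cricket": "cricket bat ball wicket bowler batsman test match",
    "Volcano": "volcano lava eruption magma ash crater",
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


DOC_TOKENS = {title: Counter(tokenize(text)) for title, text in ARTICLES.items()}
N_DOCS = len(DOC_TOKENS)
DOC_FREQ = Counter(term for counts in DOC_TOKENS.values() for term in counts)


def tfidf(counts: Counter) -> Dict[str, float]:
    # Smoothed idf; terms unseen in the corpus can't match any article,
    # so they only affect the query vector's length.
    return {
        term: tf * (math.log((1 + N_DOCS) / (1 + DOC_FREQ[term])) + 1)
        for term, tf in counts.items()
    }


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


ARTICLE_VECTORS = {title: tfidf(counts) for title, counts in DOC_TOKENS.items()}


def categorise(text: str, top_n: int = 1) -> List[str]:
    """Return the titles of the articles most similar to text."""
    query = tfidf(Counter(tokenize(text)))
    ranked = sorted(ARTICLE_VECTORS, key=lambda t: cosine(query, ARTICLE_VECTORS[t]), reverse=True)
    return ranked[:top_n]


print(categorise("Tonight: a saxophone legend on improvisation and the blues"))  # ['Jazz']
```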

Links for 06-06-2008

Post categories: Links

Tristan Ferne|16:24 UK time, Friday, 6 June 2008

Radio and the digital native [RAB and RadioCentre]
A report from the Radio Advertising Bureau suggesting that young "Digital natives" still like the radio

Lydmur wall of radios installation by Maia Urstad
Some radio-based art.

TODAY, Mobile Application by CADA / Latest Images
A visualisation for your mobile phone that constantly evolves, showing recent call and text activity in a spiral of brightly coloured circles.

Rebel Alliance -- NBC's Heroes -- SciFi TV Shows | Fast Company
A good introduction to transmedia storytelling and the geek producers who are leading it.

Media Futures Conference 2008
"The Media Futures Conference is a one day exploration of the dynamics and trends shaping the future of media" - at Alexandra Palace, with BBC Mashed on the following day.

And finally, some RDF and ontologies for you...

A Chord Ontology
For all your harmonic descriptive needs.

An Event Ontology
For events and happenings.

Thinking Digital

Post categories: conferences

Tristan Ferne|09:26 UK time, Thursday, 5 June 2008

Guy Strelitz, a Technical Project Manager in our team, went to the ThinkingDigital conference in Newcastle. This is his report...

It's already a couple of weeks since I was at the ThinkingDigital conference in Newcastle, run by CodeWorks for the Regional Development Agency for the North East. The three-and-a-half-day event was crammed with content - even a précis of the whole thing takes too many pages of A4, and no-one wants to read that. So I thought I'd present some highlights from Day 1.

The Future of Media
The conference ran on the basis of several guests in a session each speaking very broadly to the same theme. The Future of Media brought us Matt Locke, formerly of the BBC, now Commissioning Editor at Channel 4 Education, Eric Lindstrom and Steve Jelley, partners in VideoJuicer, an agency specialising in "online video entertainment and community websites for brand owners and video content producers" and Jeremy Silver, General Manager of Avid Education.

Matt spoke about following audience trends for educational programming at Channel 4. Channel 4 Education produces informal educational content for 14-19-year-olds, traditionally daytime TV during the school term. Hardly an ideal slot for the demographic, so they've diverted their £6-million budget from broadcast to online video. He gave an analysis of 6 different types of online social space, each fostering different types of interaction (Secret, Group, Publishing, Performing, Participation and Passive crowd), and the need to create the right type of space if you're inviting contribution from neurotic teens.

Eric and Steve spoke on the difference between online video and previous channels, with two key lessons. 1. Defend your brand. If you want your site to appear at the top of Google's results, design a kick-arse hub for original content, not an aggregator. People only go to aggregators when they don't know what they want. 2. Tell new types of story with video in the new medium. Just as television allows long-form drama with dramatically greater intricacy than cinema (think Lost vs Memento), so web-based video enables new storytelling paradigms. Eric is keen on saying that now you can tell Dickens as it was written, in bite-sized serial form.

And Jeremy Silver spoke on how digital is failing to kill music, just changing the balance of power in the industry - a process it's undergone before in previous technological shifts. He foresees "an amazing flowering at hand" in the industry...but wisely declines to predict what form it will take!

United We Stand
The wide variety of speakers started to become apparent with Darren Thwaites, Editor of Teesside's Evening Gazette newspaper, Ian Kennedy, Cisco's Head of Technical Operations, EMEA, and Tara Hunt, online community maven.

Darren Thwaites spoke compellingly about hyper-local journalism. An old-media print journal, the Gazette have trained volunteer 'citizen journalists' on a per-postcode basis to produce 20 extremely local online editions, composed entirely of UGC without pre-moderation. It's been a success to the point that it's spawned new print editions and fed features back to the parent paper.

Ian Kennedy spoke in fairly broad terms about research on collaboration technologies - essentially telepresence tools - developed in part with a view to fostering ground-up innovation. It included a video demo of using Second Life as a meeting space, but the highlight was undoubtedly footage of an apparent full-body telepresence hologram on a Cisco conference stage.

Finally, Tara Hunt spoke passionately about the flowering of the BarCamp concept since its inception in 2005. The community's now expanding into co-working spaces in several cities around the world - geek-friendly venues where you can just show up, plug in your laptop and connect to the network. Key quote-oid *: "I don't make money from it, but I make money because of it at events such as this."

* words to this effect anyway, and in another session later in the day

The Singularity
Ray Kurzweil signally failed to talk about the singularity. Instead he blew us all out of the water with a talk on miniaturisation of IT and its implications for human longevity. The first thing about Kurzweil's talk - he appeared long-distance using a Teleportec lectern. Not as impressive as the Cisco technology seen earlier, it was nonetheless an appropriate meeting of medium and message. Secondly, backed up by copious historical data, he made a compelling case that the spatial density of processing power has increased over decades at an exponential rate, technological limitations be damned. It shows no sign of slowing down - he predicts for instance that we'll be using 3D chips several years before the current 2D paradigm is exhausted. He certainly had our rapt attention with the implication that within decades we will have computers small enough to run in our bloodstream, increasing our intelligence and reversing aging. Apparently there are several existing research groups working on this agenda...

There were also sessions on the history of Multimap, from startup to Microsoft purchase, and on the caustic humour of Fake Steve Jobs and his purchase by the company he was rebelling against in the first place.

All the video is available from the BBC Backstage blog.

About this blog

This is our new blog for BBC Radio Labs - a place where we show some of our prototypes for new sites and services. They are all at an early stage of development and some of them might not work quite right, some might look a bit sketchy and they may never be taken any further. They're what we call "betas". We'll write about every new beta we release on this blog so please play with them and come back here to let us know what you think. We'll also be writing about other things we're working on, how we do our work and anything else we think you might be interested in.
