Fry’s Planet Word

On Top Documentary Films I came across Stephen Fry’s excellent five-part series about language, Fry’s Planet Word (BBC Two series homepage). You can watch it all on YouTube:

  • Part 1: Babel. In this first episode, Stephen seeks to uncover the origins of human language and how and why we are the only species on the planet to have this gift. From attempts to teach chimps to speak to the so-called singing mice who have been given the human language gene, Stephen uncovers to what extent our brain is uniquely hard-wired for language.
  • Part 2: Identity. What is it that defines us? Stephen argues that above all, it is the way we speak. Be it a national language, a regional dialect or even class variation – we interpret and define ourselves through our language. From markets in Kenya to call centres in Newcastle, Stephen charts the shifting patterns of lingua franca and the inexorable spread of Globish (Global English).
  • Part 3: Uses and Abuses. This programme looks at the ways language is used and abused. While not everyone approves of bad language, Stephen learns that swearing plays an important part in human communication the world over. He undergoes an MRI scan and discovers the parts of the brain associated with swearing – and meets a sufferer of Tourette’s and a stroke patient who swear they can’t help using the f-word.
  • Part 4: Spreading the Word. In this programme, Stephen Fry explores the written word. Writing is a great invention – making it possible to communicate across space and time. Without writing we would have no history and very little technology. Stephen discovers the earliest writing – cuneiform – at the British Museum, and learns how our alphabet came from the Phoenicians.
  • Part 5: The Power and the Glory. In this programme, Stephen Fry celebrates storytelling. It has been with us as long as language itself, and as a species we love to tell our stories. This desire to both entertain and explain has resulted in the flowering of language to describe every aspect of the human condition. Stephen asks just what makes a good story and why some writers just do it better. He reveals which stories make him shiver with joy or, conversely, shudder with horror.


OpenDocument and OOXML – all is not well

Sometimes I get the impression that much of the current talk about sustainability of data resources is just a broad way of refocusing on a complex of problems which were somewhat overlooked, probably because many of us failed to grasp the full extent of the commitment to portable data. Along with more flimsy promises such as interoperable web services, portability in terms of platform-independent data was/is actually an attainable goal – provided, of course, “we can use the cleanly documented, well-understood, easy-to-parse, text-based formats that XML provides.” And to continue along the same lines: “XML lets documents and data be moved from one system to another with reasonable hope that the receiving system will be able to make sense out of it.” (from Elliotte Rusty Harold and W. Scott Means, XML in a Nutshell)

“Reasonable hope” – yes, indeed. It’s very much implied that there’s more to portability than data being re-usable across different software and hardware platforms. If data are to be re-usable across different communities and different purposes as well, there are some further questions that cannot be left unanswered. This is all well argued by Steven Bird and Gary Simons in their seminal Seven Dimensions of Portability for Language Documentation and Description (2003).

I’m bringing it up because with word processors like OpenOffice.org and MS Word now using XML as a storage format, people could get the impression that such reasonably well-documented formats ship with a sustainability guarantee. XML formats are a step in the right direction, but like HTML they are “only” presentational, although arguably much harder to understand than HTML, and consequently difficult to manipulate and repurpose. Consider an excerpt of the present document in ODT:

<office:body>
 ...
 <text:p text:style-name="P1">
 <text:bookmark-start text:name="h.pcbwh9-j8rixd"/>
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T1">OpenDocument</text:span>
 </text:span>
 <text:bookmark-end text:name="h.pcbwh9-j8rixd"/>
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T1"> and OOXML </text:span>
 </text:span>
 </text:p>
 <text:p text:style-name="P2">
 <text:bookmark-start text:name="h.slg1ig-i51sxm"/>
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T2">all</text:span>
 </text:span>
 <text:bookmark-end text:name="h.slg1ig-i51sxm"/>
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T2"> is not well</text:span>
 </text:span>
 </text:p>
 <text:p text:style-name="P3">
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T3">Sometimes I get the impression that much of the
 current talk about sustainability of data resources is just a broad way of
 refocusing ...
 </text:span>
 </text:span>
 </text:p> ...

Basically, it consists of paragraph (text:p) and span (text:span) child elements. Mind you, these are used consistently, but the markup doesn’t really provide any information beyond how an application should render the text. Notice how a heading is just another paragraph with different typography.
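This also shows up as soon as you try to work with the file programmatically. Below is a minimal sketch (assuming Python with lxml and a hypothetical document.odt) of text extraction from the ODT package: an .odt file is a ZIP archive whose body lives in content.xml, and in an export like the excerpt above every block comes back as a text:p with a style name, so whether it was a heading has to be guessed from the presentation.

import zipfile
from lxml import etree

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# An .odt file is a ZIP archive; the body of the document lives in content.xml.
with zipfile.ZipFile("document.odt") as odt:   # hypothetical file name
    root = etree.fromstring(odt.read("content.xml"))

# All we can report for each block is its style name and its text content;
# in an export like the excerpt above, nothing distinguishes a heading from
# a body paragraph except its style.
for p in root.iter("{%s}p" % TEXT_NS):
    style = p.get("{%s}style-name" % TEXT_NS)
    print(style, "".join(p.itertext()))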

In TEI we are able to distinguish between headings, captured by the head (heading) element, and p (paragraph) elements, which should only be used to reflect a real prose paragraph. Further, headings and paragraphs are contained by a div (division) element.

<text>
...
<body>
<div>
 <head>OpenDocument and OOXML</head>
 <head>all is not well</head>
 <p>Sometimes I get the impression that much of the current talk about
 sustainability of data resources is just a broad way of refocusing ...</p>
 ...
</div>
...
</text>

In terms of content, TEI markup adds another dimension. Because the terminology is defined in the Guidelines, readers can check whether we use it correctly and consistently. Also, by enriching the markup with more elements we could get broader coverage of the different aspects of the content (quotations, emphasized passages, etc.), thereby making the content relevant to more people.
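To see what that buys you in practice, here is a minimal sketch (again assuming Python with lxml, the usual TEI namespace, and a hypothetical document.xml like the excerpt above) that pulls out each division with its headings and counts its prose paragraphs, something the ODT markup gives you no reliable handle on.

from lxml import etree

TEI_NS = "http://www.tei-c.org/ns/1.0"

root = etree.parse("document.xml").getroot()   # hypothetical file name

# Because headings (head) and prose paragraphs (p) are distinct elements,
# the structure of the text can be recovered without guessing at styles.
for div in root.iter("{%s}div" % TEI_NS):
    heads = [" ".join(h.itertext()) for h in div.findall("{%s}head" % TEI_NS)]
    paras = div.findall("{%s}p" % TEI_NS)
    print(" / ".join(heads), "(%d paragraphs)" % len(paras))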

So, for long-term preservation purposes, OpenDocument and OOXML don’t quite cut it. Besides the lock-in with notoriously short-lived word processor applications, they aren’t rich enough to capture relevant aspects of your content.

Hyperlocal, anyone?

This is a minor appendix to my former post. It’s intended as a bulletin board for Danish hyperlocal blogs (please note that I decide what counts as hyperlocal and that I’ve already disqualified AOK).

I’ll update it when I find something to post. So far, I’ve found these:

Hyperlocal news

Used in connection with familiar words like ‘blog’, ‘news’, and ‘content’, the term hyperlocal has been a buzzword at least since the launch of the hyperlocal content network Outside.in in 2006. We’ll get back to Outside.in and why I think it’s so important, but I’ll have to set some terminology straight first:

Hyperlocal means ‘over-local’; it refers to information that is not just about a specific location (that would be plain old ‘local’ information) but also implies a closer affiliation with the place, typically in terms of residence or some degree of familiarity. The rationale behind it is this: when people blog about the place they live, it attracts people who see themselves as connected to the same place. Very often, good old community feeling lies at the heart of it all.

Buzz rarely originates directly from community feeling; it’s more of a down-to-earth business kinda thing, and in order to turn volatile notions like community feeling into something tangible, the idea has to translate into a business model of sorts. Around 2005, with the rise of blogging in general and neighborhood blogs like Gothamist in particular, the aggregate amount of high-quality local content had become so extensive that it was in fact starting to look like an alternative to the news coverage of mainstream local media.

In this situation, what you need to make it a real alternative is an aggregator that lets you gather the content you want and source it to users who will be able to search and browse it. While millions of readers is certainly more than your average blogger could hope for, it’s what newspapers like the New York Post crucially need, and for that they’re more than willing to pay.

In briefly sketching the hyperlocal business model, I’ll throw in a few more buzzwords (hint: do watch out for the italics!):

Premise 1: Let there be given a lot of hyperlocal content on the web

Premise 2: Let there be given a news network that will let you

  • find and collect the stuff you want to use (that’s called aggregation),
  • select what you see fit to publish (that’s curation, but if you’re bluffing, please avoid confusing curation with ‘editorial work’) and
  • publish it to your own site

Consequence: Receive lots of traffic and ad-revenue.

I’ll refrain from adding a quod erat demonstrandum to the argument, but there’s evidence that the model is working: the New York Post (here’s a page for the Flatiron District) and CNN have teamed up with Outside.in, AOL acquired Patch, and MSNBC bought EveryBlock.

Outside.in is important because it represents a genuine intersection of the blogosphere and traditional media; it’s not just another newspaper letting a few reporters do some trendy blogging. What strikes me is that this is in fact the most extensive local news coverage I have seen: not only is there more content, the news is also much more granular.

If you’re interested in the really big picture, you’ll be sure to get it in Outside.in co-founder Steven Berlin Johnson’s excellent talk at SXSW 2009. As an aside, I’ll be posting a little companion piece with a (hopefully growing) list of Danish hyperlocal blogs.

[Online 09] Wendy Hall and Nigel Shadbolt, The Semantic Web Revolution – Unleashing the World’s Most Valuable Information

Besides finding out who’s sponsoring the hugely important conference drinks, the opening keynote is also about remembering why we decided to attend in the first place. This year at London’s Online Information the main attraction, for me at least, was the track on the Semantic Web. In his introduction, conference chairman Adrian Dale phrased the question like this: how can we make the most out of the shift from the document-centric to the information-centric world?

The opening keynote was delivered by Dame Wendy Hall, Professor of Computer Science, and Nigel Shadbolt, Professor of Artificial Intelligence, both at the University of Southampton.

First, to get an idea of where we’re going with the Semantic Web, Wendy looked back on the evolution of the Web from read-only, via read/write, and on to today’s social web. But what were the theoretical foundations of the current Web?

Well, pioneers of the Web – people like Vannevar Bush, Ted Nelson, and Doug Engelbart – envisioned more intuitive computer systems, systems that would (sort of) mimic the way we think. That is to say, we don’t think in hierarchies, but in a more associative (read: “link-like”) manner. Engelbart thought such systems could augment, not replace, human intellect.

Now, on the threshold of what could be a new era in the history of the Web, we might well take a look at the lessons learned since the Web grew big in the 1990s. Those are:

  • Big is beautiful; there has to be a certain critical mass of material/web pages before things get going
  • The network is everything, and it doesn’t matter if parts of it are scruffy and have broken links
  • Democracy rules. If the Web wasn’t open and free, it (probably) wouldn’t have taken off the way it did

But what’s missing from the web as we know it? Wendy suggests that we’ve lost the idea of conceptual linking (where targets are referenced not by their location, but by the semantics of the document). Instead, where links are missing, we use search engines to fill the gap. Nevertheless, we’re hungry to share data, and in doing so we may also, by means of RDF, structure it and add meaning to it. When this is done, machines can begin making inferences.

With RDF we’re seeing a web of linked data starting to emerge. This new Web, which Nigel Shadbolt calls the Pragmatic Semantic Web, is yet another layer of abstraction on top of the Web, which was itself an abstraction on top of the physical network that existed prior to it.

The technical principles of this kind of semantic web are:

  1. the URI that enables you to refer unambiguously to resources
  2. the fact that resources can be dereferenced
  3. that it’s got RDF at the backend (this makes it flexible)
  4. linked data, which can be subjected to search (Sigma is a search engine for RDF-annotated material on the internet)
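As a rough illustration of the first three points, here is a minimal sketch (assuming Python with the rdflib library and a made-up example.org URI) of how a resource identified by a URI can be described with RDF triples and serialized for the linked-data web.

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
person = URIRef("http://example.org/people/ada-lovelace")   # hypothetical URI

# Two triples: the resource is a person, and her name is "Ada Lovelace".
g.add((person, RDF.type, FOAF.Person))
g.add((person, FOAF.name, Literal("Ada Lovelace")))

# Serialize as Turtle, one of the standard RDF syntaxes.
print(g.serialize(format="turtle"))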

To get information out of RDF triples, a special data access language has been developed: SPARQL (SPARQL Protocol and RDF Query Language). With SPARQL, which became a W3C recommendation in January 2008, it’s possible to answer complicated questions, such as “Give me all people born in London before 1827”. But are there any data to query?

As a matter of fact there are. Besides the BBC, the UK government is publishing (data.gov.uk) large volumes of public data which are now being described with RDF and thus being prepared to be repurposed/mashed up by whoever’s interested. This enables users to type in a postal code and get all the public data (crime statistics, local transportation, etc.) available for that area. This way, public data have social and economic value, but on a larger scale linked data matters because it supports interoperability.
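To give an idea of what the question mentioned above looks like as an actual query, here is a rough sketch (assuming Python with the SPARQLWrapper library, DBpedia’s public endpoint and DBpedia’s property names) of asking for people born in London before 1827; an illustration rather than a production query.

from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    SELECT ?person ?birthDate WHERE {
        ?person dbo:birthPlace dbr:London ;
                dbo:birthDate  ?birthDate .
        FILTER (?birthDate < "1827-01-01"^^xsd:date)
    }
    LIMIT 20
""")
endpoint.setReturnFormat(JSON)

# Print the URI and birth date of each match returned by the endpoint.
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["person"]["value"], row["birthDate"]["value"])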

Related: Richard Wallis interviews Wendy Hall on the Semantic Web Revolution.

Watching empires decline

Here’s a little visualization of the decline of four great colonial powers: France, Great Britain, Portugal and Spain. Although I miss the Netherlands and a delineation of the rise of colonialism, it’s great work and interesting to watch.

Visualizing empires decline from Pedro M Cruz on Vimeo.

Mutations of evolution

150 years after Charles Darwin published On the Origin of Species, the theory of evolution has proven such a fruitful concept that the terms ‘evolution’ and ‘Darwinian’ have become commonplace. Today it’s simply a model for explaining change, and judging from its many different mutations, it’s tempting to suggest some kind of evolution of the theory itself.

Darwin’s theory of evolution assumed that certain heritable traits, namely those that make the survival and successful reproduction of an organism easier, become more common in a population over the generations. It was this mechanism that Darwin referred to as natural selection, and which he described elaborately.

Back in school I was taught that Darwinian evolution was a matter of adaptation to changing needs, without conscious intervention, and that evolution certainly wasn’t the same as progress – which was what the followers of Herbert Spencer thought. But it seems to me that it’s the Spencerian, or ‘Social Darwinist’, form of evolution (“survival of the fittest”) most people today think of when they hear the word. A couple of examples from the last few months – you have the evolution of:
  • Architectural Ideas – According to Danish architect Bjarke Ingels, some architectural ideas prove more sustainable than others. Why? Because people select the best ideas. (Here, by the way, you’ll also find the interesting contrast: evolution vs. revolution, which is also a theme here and here)
  • Blogging – Om Malik says blogs need to evolve and be more social. Why? Otherwise they will not survive the competition with social networking services like Facebook and Twitter.
This was evolution as progress. But then there’s Niall Ferguson. His take on the evolution of financial theory is more refined, and this clip is definitely worth watching. Worth a read, on the other hand, is his book The Ascent of Money. Here he offers six features shared by the financial world and evolutionary systems. I quote:
  1. ‘Genes’, in the sense that certain business practices perform the same role as genes in biology, allowing information to be stored in the ‘organizational memory’ and passed on from individual to individual or from firm to firm when a new firm is created.
  2. The potential for spontaneous mutation, usually referred to in the economic world as innovation and primarily, though by no means always, technical.
  3. Competition between individuals within a species for resources, with the outcomes in terms of longevity and proliferation determining which business practices persist.
  4. A mechanism for natural selection through the market allocation of capital and human resources and possibility of death in cases of under-performance, i.e. ‘differential survival’.
  5. Scope for speciation, sustaining biodiversity through the creation of wholly new species of financial institutions.
  6. Scope for extinction, with species dying out altogether.
Language can also be seen as an evolutionary system. No matter how much people intervene and try to restrict it, it still evolves in mysterious ways. I wonder what evolutionary features one could find there? Eh bien! Vive le sport!

EPUB now available on Google Books

I’m happy to learn that Google Books has made its public domain books available for download in the EPUB format. This is a nice supplement to the existing image-based PDF version, because you’re no longer tied to large displays – which, obviously, is where PDF works best.


In a previous post I outlined the advantages of EPUB, but they’re well worth restating: EPUB is a free open standard designed to make text adapt (“reflow”) even to the smallest displays, and it’s supported by a growing ecosystem of digital reading devices.
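For the curious, an EPUB is at bottom just a ZIP container of XML and XHTML, which is part of why the text can reflow so easily. Here is a small sketch (assuming Python’s standard library and a hypothetical treasure-island.epub downloaded from Google Books) that peeks inside one.

import zipfile

with zipfile.ZipFile("treasure-island.epub") as book:   # hypothetical file name
    # META-INF/container.xml points to the package (.opf) file, which in turn
    # lists the XHTML content documents and their reading order.
    print(book.read("META-INF/container.xml").decode("utf-8"))
    # List everything in the container: package file, XHTML chapters, images, CSS.
    for name in book.namelist():
        print(name)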

All you need to get started on classics like Treasure Island is a reader. For instance, O’Reilly’s Bookworm is free online and available in a growing number of languages. If you’re an iPhone user, you can install Stanza. Perhaps I should add that both readers have been reviewed in Wired.

However, Google Books is not the only place you can download EPUBs; ManyBooks, Feedbooks and Project Gutenberg also offer them.

This is not transparency

A key factor in establishing authority on the internet is, as David Weinberger convincingly argued, transparency:

What we used to believe because we thought the author was objective we now believe because we can see through the author’s writings to the sources and values that brought her to that position. Transparency gives the reader information by which she can undo some of the unintended effects of the ever-present biases. Transparency brings us to reliability the way objectivity used to.

I agree with most of it, and perhaps the point can be further illustrated by a quick example. If you take a look at the Wikipedia article on the epistemological sense of, well, Transparency, the contrast between then and now will be clear:

Wikipedia: Transparency
As you can see, there’s an explanation and a reference to an article by Professor Paul Boghossian. The reference is the interesting part, because in academia this is perfectly sufficient to convince readers that the material can be trusted. At least, it leaves you with an idea of what to do when you get to the library.

But the internet isn’t like the research library at all. Here, anybody could have made the claim that a certain Paul Boghossian said so and so about transparency, but since links to resources supporting it (e.g. Wikipedia’s article on Paul Boghossian, for one) are extremely few, the article isn’t transparent and doesn’t meet Wikipedia’s requirements for verifiability, let alone follow the conventions of the internet medium.

Transparency is not the new objectivity, but comprehensiveness just might be

In a terrific post, Transparency is the new objectivity, David Weinberger argues that the hyperlinked nature of the internet is reshaping our notions of authority. With everybody suddenly a potential author, the old claim to objectivity seems more and more trite and outworn:

Objectivity used to be presented as a stopping point for belief: If the source is objective and well-informed, you have sufficient reason to believe. The objectivity of the reporter is a stopping point for reader’s inquiry. That was part of high-end newspapers’ claimed value: You can’t believe what you read in a slanted tabloid, but our news is objective, so your inquiry can come to rest here. Credentialing systems had the same basic rhythm: You can stop your quest once you come to a credentialed authority who says, “I got this. You can believe it.” End of story.

Instead we demand transparency: to be able to “see through the author’s writings to the sources and values that brought her to that position.”

Transparency gives the reader information by which she can undo some of the unintended effects of the ever-present biases. Transparency brings us to reliability the way objectivity used to.

I think that this kind of “hyper-transparency” – where citing a book isn’t enough, but where a link has to point to the actual resource – may be an essential feature of the internet medium; but whereas it certainly is a necessary condition for establishing reliability, it’s hardly sufficient. After all, what leads to reliability is not the number of hyperlinks to the author’s sources, but trust in the fact that the relevant aspects of the matter have been adequately dealt with.

So, instead of objectivity, I’d suggest ‘comprehensiveness’ as a condition for reliability. And it’s a sufficient one too, because on the internet comprehensiveness seems more than ever to subsume transparency.

