
The structure of internet revolutions



Posted on September 23, 2008
in Undressing the Internet

Revolving around the ephemeral “web 2.0” and the future of the internet, Web 2.0 Expo New York wrapped up over the weekend, and already the talks are up at the blip.tv Web2Expo page. (The San Francisco talks from April are also up for viewing.) The expo runs the gamut from technology to business, but most interesting were the talks on web 2.0 structure by Jay Adelson of Digg.com and author Clay Shirky. Shirky’s talk, “It’s not Information Overload. It’s Filter Failure.”, especially titillated my nerd bones, so here it is in all its bald-headed, fast-talking glory:

Basically, Shirky proposes that the consequences we normally attribute to an explosion of available information thanks to the internet are actually attributable to a failure of the filters in place to deal with an already abundant amount of information. Since the invention of the printing press, he says, we have lived amidst “information overload”, such that we can no longer look at the phenomenon as a problem, but as a fact. An appropriate response, then, is to build better filters.

Notice that the solution is to build, not to fix. An overload of information has been our oxygen for centuries, but the type and amount of data we deal with now is vastly different from whatever has come before. Shirky ends by separating the types of filters needed into two categories: programming and social. The latter category is pretty nebulous, but “programming” is much more concrete, and already in use today. Digg, Netflix, Google, and every other website that runs on extrapolating from its users’ actions uses so-called collaborative filters, and these are a pivotal part of the internet’s future.

Collaborative filtering works by ranking content according to prior users’ actions (e.g., Google looks at links, the paragraphs surrounding search terms, and other mysterious data), then analyzing your own actions to serve up relevant content. With sites like Digg and Reddit, this involves users upvoting or downvoting submitted sites, and you then seeing the best sites in descending order (with maybe some specializing depending on whether you’re on a subpage). With sites like Netflix and Amazon, your own consumption is compared to that of other consumers in order to serve up recommendations.
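For the curious, here is a rough sketch of the Netflix/Amazon flavor of the idea. It is not how any of those sites actually do it; the user names, titles, and the choice of Jaccard overlap as the similarity measure are all my own assumptions, picked only to show the shape of the technique: find people whose tastes overlap with yours, then surface what they liked that you haven't seen.

```python
from collections import defaultdict

# Hypothetical viewing histories: user -> set of titles they liked.
histories = {
    "ann": {"Alien", "Blade Runner", "Brazil"},
    "bob": {"Alien", "Blade Runner", "Solaris"},
    "cat": {"Amelie", "Brazil"},
}


def jaccard(a, b):
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, histories, top_n=3):
    """Score unseen items by the similarity of the users who liked them."""
    seen = histories[user]
    scores = defaultdict(float)
    for other, items in histories.items():
        if other == user:
            continue
        weight = jaccard(seen, items)
        # Only items the target user has not already consumed get a score.
        for item in items - seen:
            scores[item] += weight
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


print(recommend("ann", histories))
```

Bob shares two of Ann's three titles, so his unseen pick outranks Cat's. More users and more varied histories would give the weights more to work with, which is roughly why diversity of users matters so much to these filters.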

As we approach the singularity and the web becomes more ubiquitous, collaborative filtering will become increasingly sophisticated (and accurate). As Jay Adelson mentions, Google’s search rankings have become more powerful simply because the diversity of its users has increased. Collaborative filtering thrives on multifarious data, and this will come quickly with a larger number of netizens, and more slowly through the growing number of connections between web services. Adelson points to the economic advantages of shared data (read: better advertising targeting), but this is of course of huge value to users and developers as well.

Low-level web curation, from Metafilter to Undress Me Robot, will always have its place online, but the future is definitely in these automated processes which leverage what each user is already doing to provide a highly personalized and more effective experience. However, obviously, collaborative filtering is simply an update of the sort of filter that has been around since the 1500s, moving from the editorial eye of a single person to the gaze of millions. The next big jump in information overload might break down even these strong filters, taking a paradigm shift to get back on top. And who knows where that will take us?


Bad Web Design 101: Hiding Content From Your Users



Posted on June 5, 2008
in Undressing the Internet

What Newspapers Still Don’t Understand About the Web is a short article by Scott Karp of Publishing 2.0 on, you guessed it, how newspapers fail online. As a test case, Karp uses The Washington Post and a recent storm in Washington D.C.:

This is the WASHINGTON Post, right? So where’s the news about Washington? We just got pounded by a nasty storm — but it’s not homepage worthy.

Karp did eventually find the storm coverage in the Metro section, but not before heading to Google and getting it there. A few readers lambasted Karp in the comments for being a “stupid user”, but Karp makes a good point in a rejoinder post.

Here’s the problem — my failure to find the information I wanted is not MY problem, because I went to Google and found it. I succeeded. The failure is the site’s problem, because I abandoned it and went instead to a site that would help me succeed without having to be smarter.

Basically, the solution to Karp’s difficulty finding information on the Post website is logging in (or digging deeper through the website). He logs in, customizes his homepage, and bam! local news is the first thing he sees. But who cares about how smart users navigate? If any users are leaving your website to find information that is ON YOUR WEBSITE, something is wrong.

Karp addresses his criticisms to the newspaper industry, but it is hard not to generalize them to any type of website. The web is a new beast, and users are expecting more and more to find information nonlinearly (or, you know…hyperlinearly). Having navigation that forces users to traverse the website as if it were composed of sequential pages is wrong. This isn’t print.

So what is the solution? It differs on a case-by-case basis, of course, but the bottom line is that newspapers (specifically) need to bring more focus to their web-only content, while still allowing easy access to the traditional news that is their bread and butter. Basically, as Karp says, good content is no longer enough. Websites also need to make that content accessible to users at all levels.

I’ll sign off by saying Karp has a lot of good ideas in the above links, but they are drowned beneath nitpicking and cat-calling. Kidiot.


HistoryShots creates high-quality information graphics on a number of historical topics, from the Race to the Moon to the Genealogy of Pop & Rock Music. You could also say that HistoryShots creates the artwork for my next apartment. All of it. (Well, maybe some Edward Tufte prints just to spice things up.)


Encyclopedia of Life now alive!



Posted on March 2, 2008
in Undressing the Internet

…and it’s not very good.

Almost a year ago, biologist E. O. Wilson brought into existence the Encyclopedia of Life, an exhaustive source of information on all the world’s various species. The site “launched” last year, but just to show the eventual layout and UI. Now the site is up for real (disregarding all the times the server crashes from too many visitors), but not for real real. There are only 25 “complete” pages, and still tons of interface problems.

But still! Though the EOL is now live, it is in beta (so to speak), and basically is live so as to figure out what all those interface problems are. And since they asked for it, criticisms were happily rendered. The verdict? Not so good. Basically, the site suffers from four main problems at the moment.

(1) There is a glaring lack of information, and the information that is there thus far is mismanaged at best:

I think the first release of EOL should have, at a minimum, provided at least as much information that I can get from iSpecies and Wikipedia. Why didn’t EOL? If the argument is that they want authenticated content, then this doesn’t wash. Their authenticated content is minimal, and waiting for authentication will, in my view, cripple EOL.

(2) Where is the hypertext! If I can manage a “related articles” field, then surely so can the EOL.

(3) Poor search capabilities on a site like this practically render the project useless. How do you wade through a billion pages without a good search engine?

It gets worse if I search on “Tyrannosaurus rex”. EOL doesn’t do dinosaurs, and so doesn’t contain anything on T. rex, but the search results tell me that The following 116 search results contain ‘Tyrannosaurus rex’. Nope, none of them do.

The search engine is poorly done, it fails to rank results sensibly, incorrectly reports what it does find, and has no support for spelling mistakes.

(4) Lastly, but not leastly, where’s the openness? Where are the tools necessary for actually using the data on the site?

There is a ton of structure on the site, but no support for semantics (where is the RDF?), or microformats. There is no RSS feed for a specific species or for the latest species to be added. There is no place to have a discussion. There is no API. As we leave what we knew as “web 2.0” behind, it should be clear to anyone designing a web resource that in the absence of programmatic interactions, a site will languish. In the absence of community, the site will die. I hope EOL addresses these issues ASAP. In the absence of structured information, I’d love to be able to pull the data from EOL into Freebase, mirroring the structure and building relationships. GIVE ME AN API!!!

None of these problems are intractable, but they are disheartening considering how much time and money has already been put into the project. I hope the EOL takes to heart the constructive criticism its first offerings have brought, and implements the necessary changes. A year and $10 million should at least be enough to get the project on the right track.


Nerd Alert: Stanford Encyclopedia of Philosophy



Posted on December 20, 2007
in Undressing the Internet

Every now and then I come across something especially cool (read: nerdy) on the internet. Not necessarily something major, influential, or newsworthy, but cool nonetheless. For these precious moments of geekdom, I give you nerd alert!

I begin with this nerd alert’s namesake: the Stanford Encyclopedia of Philosophy. The SEP is a free, online collection of peer-reviewed articles on philosophy. Their publishing model is as stringent as any encyclopedia’s or journal’s (plus it’s Stanford), so there’s no question of quality. This is all well and good, but it’s not what I’m geeking out over. I have known about the SEP for a while, but just noticed that they have static, quarterly archives so you can cite an article without worrying about it changing. How cool is that!

Of course, the idea of an open online repository of philosophy is pretty cool on its own. Wikipedia is certainly more extensive, but the SEP and the more comprehensive Internet Encyclopedia of Philosophy are absolutely more informative and scholarly. Plus, the information from the SEP is a lot more trustworthy (for those looking to be certain of their sources).

Other similar collections exist for other topics (Internet Medieval Handbook anyone?), but how many? Someone needs to find them all and create a huge, searchable directory. Hmm….


Amount of Party and Sipping Bacardi



Posted on November 27, 2007
in Undressing the Internet

Normally something like this would get lost in a link dump, but it is deserving of its own post: Charts and graphs of rap song lyrics. Think of it as a hip hop indexed.

The best:

Damn it feels good to be a gangsta
Bitches ain’t shit but hoes and tricks
I got 99 problems and a bitch ain’t one

I am woefully unhip, but I lol’ed at most of them despite not knowing the songs.


Funding the Encyclopedia of Life



Posted on May 10, 2007
in Undressing the Internet

The Encyclopedia of Life has launched.

Now, for those who want some back story:

TED (Technology, Entertainment, and Design) is a yearly conference held in California, dedicated to bringing together some of the world’s greatest thinkers and doers. Over four days, 50 speakers discuss science, business, the arts, and all the big global issues facing our world. The conference has been selling out a year in advance lately, and is invite-only otherwise, but a collection of past talks is available at the TED site.

Along with the TED conference, the organization introduced the TED Prize two years ago. The prize, $100,000 and a wish to change the world, is given to three winners each year:

Three winners are chosen each year. They could be anyone with worldchanging potential: inventors or entrepreneurs, designers or artists, visionaries or mavericks, story-tellers or persuaders. But they must be people who the judges believe have the ability to inspire others to do something great for the world.

This year, one of the winners was biologist and naturalist E. O. Wilson. His wish? An Encyclopedia of Life.

From Wilson’s acceptance speech at TED2007:

Sadly, our knowledge of biodiversity is so incomplete that we are at risk of losing a great deal of it before it is even discovered. For example, about 200,000 species of all kinds of organisms are currently known from the United States, and the number could easily exceed 500,000 even without including microorganisms. Only about 15 percent of the known species have been studied well enough to evaluate their status. Of the 15 percent evaluated, 20 percent are classified as imperiled to some degree.

We are, in short, flying blind into our environmental future. We urgently need to change this. We need to have the biosphere properly explored so that we can understand and competently manage it. This should be a Big Science project, equivalent to the Human Genome Project. It should be thought of as a biological moonshot with a timetable. So this brings me to my wish for TEDsters and to anyone else around the world who hears this talk. I wish that we will work together to help create the key tool that we need to inspire preservation of Earth’s biodiversity: the Encyclopedia of Life. The Encyclopedia of Life. What is it? It is an encyclopedia that lives on the Internet and is contributed to by thousands of scientists around the world. It has an indefinitely expandable page for each species. It makes the key information about life on earth accessible to all on demand.

This is not the first attempt at such an endeavor (see: Wikispecies, Catalogue of Life), but it is absolutely the one with the best potential to succeed. It has the most presence, especially within the scientific and intellectual community, and has started to get funding. Furthermore, it has plans to be amazingly comprehensive (or encyclopedic, if you will), with the ability to show as much or as little information as necessary for the intended audience (from novice to expert). Looking for simple information? Visit the polar bear page. Want something more technical? Visit the Ursus maritimus page. (See the demonstration pages for what I mean.) The encyclopedia’s ability to be not only an expansive resource for scientists, but also accessible to the public, will be the source of its success and power.

Supplemental to Wilson’s TED speech is a page on the EOL site that gives a more in-depth discussion of the encyclopedia’s necessity and benefit:

At the end of the day and at a deeper level, the all-species encyclopedia will transform the very nature of biology. The reason is that biology is primarily a descriptive science. Although it depends upon a solid base of physics and chemistry for its functional explanations, and upon the theory of natural selection for its evolutionary explanations, it is defined uniquely by the particularity of its elements. Each species is a small universe in itself, from its genetic code to its anatomy, behavior, life cycle, and environmental role, a self-perpetuating system created during an almost unimaginably complicated evolutionary history. Each species merits careers of scientific study and celebration by historians and poets. Nothing of the kind can be said (at the risk of stating the obvious) for each proton or inorganic molecule.

Taxonomy, the scientific study and practice of classification, is foundational to the all-species encyclopedia. However, it is still one of the most underfunded and weakly developed biological disciplines. Worldwide as few as 6000 biologists work within it. Most people are surprised to learn that most of biodiversity is still entirely unknown. They assume that taxonomy all but wound down generations ago, so that today each new species discovered is a newsworthy event. The truth is that we do not know how many species of organisms exist on Earth even to the nearest order of magnitude. Those formally diagnosed and given Latinized scientific names are thought to number somewhere between 1.5 and 1.8 million, with no exact accounting having yet been made from the taxonomic literature. Estimates of the full number, known plus unknown, vacillate wildly according to method. As summarized in the Global Biodiversity Assessment (1995), they range from an improbable 3.6 million at the low end to an equally improbable 100 million or more at the high end. The commonest order-of-magnitude guess is ten million.

All of this is fascinating, and I could go on quoting forever, but if you’ve managed to stay with me this long, just visit the site.


Nobel Prize now a bit stretchier



Posted on March 25, 2007
in Undressing the Internet

Besides looking good, the elastic list of Nobel prize winners provides a presentation that is interesting as much for the actual content as for its ability to effortlessly display information about that content. The elastic list visualizes the data in two main ways (better explained at the Well-formed data blog entry). First, the size of the cells is dynamic, and the cells change in size relative to the chosen filter. Second, brightness of the cells comes to indicate relative proportion during a filtered view, or “characteristicness” of the data when unfiltered.
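To make those two ideas a bit less abstract, here is a small sketch of the arithmetic that could sit behind such a list. I don't know the formulas the actual elastic list uses, so treat everything here as an assumption: the sample records are made up, and `facet_cells` is a hypothetical helper that returns each value's share of the filtered view (a candidate for cell size) next to its share of the whole dataset (comparing the two is a candidate for brightness).

```python
from collections import Counter

# Hypothetical records; the real list uses Nobel Prize data.
laureates = [
    {"category": "Physics", "decade": "1900s"},
    {"category": "Physics", "decade": "1910s"},
    {"category": "Peace", "decade": "1900s"},
    {"category": "Literature", "decade": "1910s"},
]


def facet_cells(records, facet, filters=None):
    """Return {value: (share_in_view, share_overall)} for one facet.

    share_in_view could drive a cell's size; comparing it with
    share_overall could drive its brightness.
    """
    filters = filters or {}
    # Keep only the records that match every active filter.
    view = [r for r in records if all(r[k] == v for k, v in filters.items())]
    overall = Counter(r[facet] for r in records)
    in_view = Counter(r[facet] for r in view)
    return {
        value: (
            in_view[value] / len(view) if view else 0.0,
            overall[value] / len(records),
        )
        for value in overall
    }


cells = facet_cells(laureates, "category", {"decade": "1900s"})
for value, (in_view_share, overall_share) in cells.items():
    print(value, in_view_share, overall_share)
```

In this toy data, filtering to the 1900s makes Peace take up half the view even though it is only a quarter of the full list, which is the kind of over-representation a brighter cell would flag.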

I know I have made the whole thing sound unbearably boring, but even disregarding the technology the list is supposed to demo, it is worth a look. It is always fascinating to see the contributions to humanity that warranted this highest of honors.


Freebase and The Knowledge Web



Posted on March 14, 2007
in Undressing the Internet

An editor’s note in Edge 205:

In May, 2004, Edge published Danny Hillis’s essay in which he proposed Aristotle: The Knowledge Web. “With the knowledge web,” he wrote, “humanity’s accumulated store of information will become more accessible, more manageable, and more useful. Anyone who wants to learn will be able to find the best and the most meaningful explanations of what they want to know. Anyone with something to teach will have a way to reach those who want to learn.”

To create the knowledge web, Hillis and his company Metaweb have started Freebase, “an open, shared database of the world’s knowledge”. While search engines such as Google have created a database of sorts out of all the internet’s information (i.e., websites), this database is only readable by people, and no computer-readable databases of this size exist. Freebase hopes to be just that, a database of the world’s information that can be read by computers and output however and wherever the user wants. But the site is still in its alpha stages, and won’t be available to the public at large for a while.

Until then, a middle road has emerged in the form of the recently released (a month or two ago) Yahoo! Pipes, “an interactive feed aggregator and manipulator”. Basically, Pipes allows you to mash up information from various websites. For example, one of the currently popular pipes combines Yahoo! and Google web searches with del.icio.us (a social bookmarking site) to restrict the search to sites tagged in del.icio.us. It is a very simple idea, but it underlines the general process of taking information from different websites (or from different places on the same website) to create an output you could not have otherwise. There’s a good article on Pipes over at O’Reilly that goes more into why Pipes is such a big deal, and how easy it is to use.
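Stripped of the drag-and-drop interface, the logic of that example pipe is tiny. The sketch below is only my approximation of the idea, not Pipes itself: it uses made-up in-memory lists instead of live feeds, `merge_and_restrict` is a hypothetical helper, and the example.com URLs are placeholders.

```python
def merge_and_restrict(result_lists, bookmarked_urls):
    """Merge several ranked result lists, keeping only bookmarked URLs.

    Lists are walked in the order given, so earlier sources win ties;
    a URL that appears in more than one source is kept only once.
    """
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            if url in bookmarked_urls and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Placeholder data standing in for two search feeds and a bookmark feed.
yahoo_results = ["http://example.com/a", "http://example.com/b"]
google_results = ["http://example.com/b", "http://example.com/c"]
delicious_tagged = {"http://example.com/b", "http://example.com/c"}

print(merge_and_restrict([yahoo_results, google_results], delicious_tagged))
```

The interesting part is not the code but where the inputs come from: each list lives on a different website, and the output exists on none of them.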

Freebase and Pipes will certainly differ in some key ways, but in general I see Freebase as being a consolidated Pipes. On Pipes, the information still only exists spread over dozens of websites, but Freebase will have all the information right there. In a sense, it’s a question of limitations. Pipes is limited to the APIs and RSS feeds of the websites it includes, but Freebase will be limited only by its users. Like the Wikipedia for databases.

I’ve probably mauled and maimed and done a great disservice to the whole idea, so go read Esther Dyson’s “Emergent Structure vs. Intelligent Design”. It’s a piece on Freebase by someone who actually has behind-the-scenes (alpha tester, huh huh?) knowledge of how Freebase works.

