To Organize The World's Information

February 28, 2021
Last updated: February 28, 2021
#search #organizing information

We take it for granted now that we can go to Wikipedia and read about how the U.S. political system works with no hassle at all, but not too long ago, information was a lot harder to obtain. The Gutenberg press wasn’t invented until the 1400s, and before that, it was much more difficult to find copies of the important books and other content that you wanted to read.

A really long time ago, we didn’t have much writing, and the information was passed down orally. Then we had writing, but it was scattered and disorganized. You had no way to know what writing existed, what thoughts people had already, and generally what was out there.

Around 300 BC, one of the most ambitious projects of all time was undertaken in the Greek-Egyptian city of Alexandria. The Ptolemies wanted to create the intellectual capital of the world, and so they established the famous Library of Alexandria. This was not the first library, but what made it interesting was their extremely ambitious goal. They wanted to collect all of human knowledge, and bring it all together in this one place.

Though they didn’t actually collect all of human knowledge, they did a good enough job at it to establish Alexandria as the intellectual capital of the world. Alexandria became the hub of many inventions and discoveries that preceded later scientific and industrial revolutions. Some quick examples are Euclid systematizing geometry, Eratosthenes calculating the circumference of the Earth, and Hero describing an early steam engine.

Importance of organizing information

This example illustrates some of the value of organizing information. It seems obvious that having information organized would be valuable, but in Alexandria we can see what that practically looks like. Because people were able to get access to the knowledge and information related to things they were working on, they could make progress much faster, and benefit from all the thinking that others had already done.

When information is organized and accessible, people can “stand on the shoulders of giants,” as Newton put it. We can read up on all the thinking and research that others have done in our field.

But it’s not just that. It also enables and feeds into curiosity. With abundant information at your fingertips, you could dive into a field that you know absolutely nothing about but that seems interesting. If the information were not organized, you might not even know that this field existed.

In short, the organization and accessibility of information enables innovation, new discoveries, curiosity, exploration, and much more.

Libraries were the main institution focused on this problem, until the internet came along.

Metaphors for organizing information on the internet

The Filing Cabinets

First there were the filing cabinets. You want to find something with the name “apple”, so let’s look in the drawer that says “A”, then open the section that says “AO - AR”, then find the folder that has the word “apple” on it.

This describes the original search engines, like Archie, which allowed users to search FTP file names.
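To make the filing cabinet metaphor concrete, here is a minimal sketch of searching by file name alone. It is not how Archie was actually implemented; the index, host names, and the `search_file_names` function are all made up for illustration.

```python
# Hypothetical index of FTP file paths; the hosts and files are invented.
FTP_INDEX = [
    "ftp.example.edu/pub/fruit/apple.txt",
    "ftp.example.edu/pub/fruit/banana.txt",
    "ftp.example.org/images/apple-logo.gif",
    "ftp.example.org/docs/readme.txt",
]


def search_file_names(index, term):
    """Return the paths whose file name (last path segment) contains the term."""
    term = term.lower()
    return [path for path in index if term in path.rsplit("/", 1)[-1].lower()]


matches = search_file_names(FTP_INDEX, "apple")
print(matches)
```

Notice what this kind of search can’t do: it knows nothing about what is inside a file, only what the file is called.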

The Library

Then there were the directories and portals, like Jerry and David’s guide to the web, which later became Yahoo. These directories were lists of hierarchical categories. You could select a category like “Science”, then a subcategory like “Physics”, and get all the websites that talked about Physics.

This was much like a library. You could peruse the shelves, dive into and out of categories, and explore whatever seemed interesting to you.
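A directory like this is essentially a tree of categories with websites at the leaves. The sketch below assumes a tiny made-up directory (the category names beyond “Science” and “Physics”, the site names, and the `browse` and `list_entries` helpers are all hypothetical) just to show what perusing the shelves looks like as data.

```python
# Hypothetical web directory: categories nest, and leaf categories list sites.
DIRECTORY = {
    "Science": {
        "Physics": ["physics-site-one.example", "physics-site-two.example"],
        "Biology": ["biology-site.example"],
    },
    "Arts": {
        "Music": ["music-site.example"],
    },
}


def browse(directory, *path):
    """Walk down a category path and return whatever sits at the end of it."""
    node = directory
    for category in path:
        node = node[category]
    return node


def list_entries(node):
    """What a visitor sees here: subcategory names, or the list of sites."""
    return sorted(node) if isinstance(node, dict) else list(node)


print(list_entries(browse(DIRECTORY)))                        # top-level categories
print(list_entries(browse(DIRECTORY, "Science")))             # subcategories
print(list_entries(browse(DIRECTORY, "Science", "Physics")))  # sites
```

There is no query anywhere in this sketch. You only ever move up and down the tree, which is exactly what makes it feel like a library.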

The Free Market

GoTo was launched in 1998 with an interesting new concept. Websites would have to pay to be listed in search results. The higher the bid, the higher the website would show up in the rankings. Presumably, if someone has the money to afford a high spot in the search results, they must be making money, so their website must be good in some way. Low-quality spam websites wouldn’t be able to outspend a legitimate business.
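Stripped down to its core, ranking by bid is just a sort. This is a rough sketch of the idea and not GoTo’s actual system; the sites, the bid amounts, and the alphabetical tie-break are assumptions made for illustration.

```python
def rank_by_bid(bids):
    """Order sites from highest bid to lowest, breaking ties alphabetically."""
    return [site for site, _ in sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical bids, in dollars, for a single search term.
BIDS = {
    "legit-shop.example": 0.50,
    "spam-site.example": 0.01,
    "big-brand.example": 1.25,
}

ranking = rank_by_bid(BIDS)
print(ranking)
```

The ranking never looks at the pages themselves. Quality is inferred entirely from willingness to pay.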

The Research Paper

Project BackRub was based on a simple premise. Research papers have citations, and papers with the most citations are the best ones. Links to other pages on the internet are kind of like citations. So why don’t we just copy-paste this idea of ranking research papers onto the internet? And thus, a trillion-dollar company was formed. It turned out that using backlinks to determine the quality of a web page was a really good idea, and it led to high-quality search results.
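The citation idea can be sketched in a few lines. Note that this is the naive version: it simply counts how many other pages link to each page, whereas PageRank goes further and weights each link by the rank of the page it comes from. The link graph and function names here are hypothetical.

```python
# Hypothetical link graph: each page maps to the pages it links out to.
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["d.example"],
    "d.example": ["c.example"],
}


def backlink_counts(links):
    """Count, for every page, how many distinct other pages link to it."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):   # ignore repeated links from one page
            if target != source:      # ignore a page linking to itself
                counts[target] = counts.get(target, 0) + 1
    return counts


def rank_by_backlinks(links):
    """Order pages from most-linked to least-linked, ties alphabetically."""
    counts = backlink_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))


print(backlink_counts(LINKS))
print(rank_by_backlinks(LINKS))
```

Unlike ranking by bid, the signal here comes from other people on the web rather than from the site owner, which is a big part of why it is harder to buy your way to the top.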

The Encyclopedia

An encyclopedia is an attempt to summarize all knowledge. It is focused on giving factual explanations and descriptions of various things. It doesn’t go too in-depth, and it doesn’t allow for opinion. The popular web equivalent is Wikipedia, which allows anyone to contribute their knowledge on any subject.

Oral Tradition

One of the oldest metaphors for organizing information in human history is oral tradition. People would pass information down to one another directly. The internet version of this is Q&A websites like Quora, Stack Exchange and some parts of Reddit. Instead of searching, you just ask a question, and other people answer it. It’s direct person-to-person knowledge transfer.

The Town Square

Imagine that you’ve just walked into the town square. There are a lot of people saying a lot of things. There are some people that you like, so you go listen to them talk. If someone you don’t know has a large enough crowd, you’ll walk over to them to see what’s going on. This is Twitter, and more or less all social media.

Is this the end?

Is this the best we can do? Are we done organizing the world’s information?

No.

And we’re not even close.

One question may point us in the right direction: what can’t you google right now?

Sure, you can google anything you want, but what kind of query can’t you expect good results for?

Collected Wisdom

Query: “What should I do with my life”

The articles do answer the question in some way, although I’m a bit skeptical that a listicle from lifehack.org will have anything worth reading. Ignoring the fact that these are merely the most SEO-optimized articles and not the highest-quality ones, the real issue with most of them is that they’re only sharing one person’s answer to the question.

“What should I do with my life” is a very common question that everyone has probably thought about. There are a ton of answers to this question from believable people across various books, blogs and interviews. Indeed, the way you’d solve this is probably by reading some of those books, blogs and interviews by people you admire and trust.

But how do you know what content to dive into without any prior knowledge? And is there a faster way you could get an overview of all the different perspectives on this question?

The information is all out there; it just isn’t indexed in the right way. If it were, you would be able to get the answer to this question from a bunch of different trustworthy people, and for a variety of life contexts.

You can see an example of what this might look like at https://guzey.com/personal/what-should-you-do-with-your-life/.

[Screenshot: Guzey, “What Should You Do With Your Life”]

Here is a page filled with pointers to relevant knowledge on the topic of “what should I do with my life” from trusted people.

Imagine if this was the search results page anytime you searched for advice on anything.

Idea Search

Query: “Give me all the content that talks about the same idea as The Refragmentation by Paul Graham”

Unsurprisingly, Google has no idea what I’m talking about, and just gives me more Paul Graham essays.

But the answer to this question exists. This essay by David Perell, this book by Martin Gurri, this pmarca tweet, this Perell podcast and probably more things that I couldn’t find, all are talking about the same idea.

Wouldn’t it be useful if you could be reading any one of these pieces of content, and know about the others?

It would be like a search engine for ideas and concepts.

Perhaps something like public Roam graphs will be the solution to this, but there are countless ways to approach it.

The Library Revisited

Query: “What are all the blogs that talk about the american revolution”

Once again, Google really doesn’t know what I mean here. But let’s give Google the benefit of the doubt on the specific wording and just take it upon ourselves to answer this question. I want to know what all the blogs are that talk about the American Revolution. How do I find them?

Maybe I can try searching for “american revolution blog” instead.

148,000,000 results sounds like a lot, I can’t wait to see all these blogs.

By the time we get to page 19, Google thinks that Marginal Revolution has something to do with the American Revolution. But let’s just keep going; there are still some legitimate blogs about the American Revolution mixed in with the garbage.

What? There are 148,000,000 results but only 270 unique ones? I’m really only allowed to go to the 27th page? Well, let’s retry that with the omitted results and see what happens.

We’ve fallen straight off the deep end to things that have nothing to do with the American Revolution at all. And at the same time, we’ve only made it to page 41, and this time there’s no way to override it.

We wanted to find all the blogs that talked about the American Revolution. Google told us they had 148,000,000 results, but only showed us 410. Half of those had nothing to do with the American Revolution, some were from the same website, and not all of them were even blogs. What is going on here?

Google doesn’t actually give us access to all the webpages in its database; it only gives us a tiny segment of it. And we don’t really control what’s in the tiny segment. It’s whatever Google’s algorithm thinks we want.
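To put a number on “tiny”, here is the arithmetic from the experiment above. The ten-results-per-page figure is an assumption inferred from the page counts (27 pages for 270 results, 41 pages for 410), and the variable names are just for illustration.

```python
RESULTS_PER_PAGE = 10  # assumption: inferred from 27 pages -> 270 results

claimed_results = 148_000_000
first_pass_pages = 27    # last page reachable before including omitted results
second_pass_pages = 41   # last page reachable with omitted results included

first_pass_results = first_pass_pages * RESULTS_PER_PAGE
second_pass_results = second_pass_pages * RESULTS_PER_PAGE
visible_fraction = second_pass_results / claimed_results

print(f"First pass: {first_pass_results} results")
print(f"Second pass: {second_pass_results} results")
print(f"Visible share of claimed results: {visible_fraction:.6%}")
```

That works out to roughly one visible result for every 361,000 that were claimed.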

Maybe there are really only 200 blogs that talk about the American Revolution, or maybe there are actually 1000. Maybe the best ones are in the top Google results, or maybe they’re so hidden away and obscure that you’d never find them on Google. We don’t know what the answer is, because we don’t really know what’s on the internet anymore. Platforms expose a tiny subset of the internet through their algorithmic feeds, but nothing gives us true access to all the information on the internet.

[Image: the classic Yahoo directory]

When Google rose and all the web directories fell, we lost the metaphor of the library. We lost the ability to explore blogs on various subjects with no real goal.

[Image: the DMOZ directory]

Why would anyone want such a thing? For the American Revolution example, you might be interested in seeing all the perspectives on that topic. But more generally, you might want it out of pure curiosity. The same curiosity that might grip you if you’ve ever perused a library.

The Call

The world has benefited greatly in the past from having information organized and made accessible. When information is organized, people can easily find similar ideas to build on top of, explore their curiosities, and discover new things. Knowledge is power, and organized information makes the attainment of knowledge far more efficient.

The Research Paper and the Town Square are good metaphors that work for some cases but not for others. What other metaphors could we have? What other ways are there to organize the world’s information? I suspect there are many, many more.