The Art of Search
Note: This blog was originally published as the first edition of our newsletter DataDuet, co-authored by Akash Tandon.
Humans are a one-of-a-kind curious species. Seeking information is primal to our nature. Interestingly, we have been trying to make the act of search as easy and efficient as possible since at least 245 BC, and the effort continues to this day.
This blog explores how our search tools have evolved as the information we generate has grown and taken on a new face as data, enabling decision-making in various domains.
We hope this piece helps you tap into your curiosity and reflect on your search patterns, not just on the internet but in the physical world too!
Abstractions
“We never seek things for themselves, but for the search” - Blaise Pascal
One peculiar feature of search, and what Pascal is trying to convey above, is that it is a non-linear process. The journey of finding one thing rarely ends in a single result. It produces several interesting outputs, insights, and ideas along the way, and occasionally leads to remarkable discoveries.
Imagine the compound effect of a single quest, a speck of curiosity, or one search on Google; how much more information does it create?! Granted, it can also be a distraction at times. Chalk that up as a cost of the quest.
Ever since we started documenting and preserving the findings of our curiosity, searching through the existing information has been a challenge. The problem first came up around 245 BC. As the Great Library of Alexandria grew, it became increasingly difficult for patrons to locate relevant material. This led to the invention of the first library catalog, ‘The Pinakes’, by Callimachus, a famous poet of that time.
From the times of handwritten scrolls to the current age of Big Data, AI, and ML, slowly turning into an Age of Reckoning, the volume and forms of information have kept increasing. Thanks to cheap storage, we began treating everything we discovered as potentially useful, collecting it more frequently and calling it data.
Another challenge, and a peculiar characteristic of the act of search, is that, funnily, more often than not we do not know exactly what we are searching for. Sounds poetic, right!? We have some guiding pointers or keywords that help us reach our destination. A basic book index, a bibliography, and a list of references are a few examples that cater to this behavior.
Let’s look at how technology has contributed to solving these challenges.
Actuality
Nowadays, when we think of search, search engines come to mind. For better and sometimes worse, we have come a long way from being restricted to the information in our local newspaper and library.
As the Internet evolved, so did our search capabilities. The 90s and early 2000s saw many search engines competing for mindshare. This can be seen in the timeline here. Fast forward to today, and search is synonymous with the big G.*
*For those interested, here is the original research paper which introduces Google’s PageRank algorithm: The Anatomy of a Large-Scale Hypertextual Web Search Engine.
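To give a feel for the idea behind PageRank, here is a minimal sketch in Python. It is not Google’s implementation: it uses the commonly taught normalized variant in which ranks sum to 1, and the damping value of 0.85, the iteration count, the handling of pages with no outgoing links, and the four-page toy web are all our own assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing each page's score along its links.

    links: dict mapping a page name to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links shares its score
                # evenly with every page, so no score is lost.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", so it should rank highest.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

The takeaway is that a page matters when other pages that matter link to it: "a" receives only one link, but it comes from the top-ranked "c", so "a" ends up ahead of "b".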
Recently, however, Google’s search results have increasingly been taken over by hidden ads and keyword hijacking. Read more about it here.
The quest for information goes beyond our Googles, Bings, and DuckDuckGos. Different domains, teams, and businesses require different approaches to searching through their data.
We search by drawing out connections, and having our digital data available in a connected fashion makes our search faster and more effective. Google’s knowledge graph is a testament to this approach.
A couple of applications we’ve recently come across that use the connected-data approach to search are:
- Roam Research: a note-taking tool for creating a personal knowledge base, inspired by the Zettelkasten method.
- Connected Papers: a visual tool to help researchers and applied scientists find academic papers relevant to their field of work.
On a related note, here is a primer about the fascinating world of knowledge graphs.
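To make "search by drawing out connections" a little more concrete, here is a tiny sketch of the idea in Python. It is a toy under our own assumptions, not how Google’s knowledge graph or the tools above work: facts are stored as (subject, relation, object) triples, and a breadth-first search finds the shortest chain of facts linking two things. The relation names and both function names are made up for illustration.

```python
from collections import defaultdict, deque

# Facts as (subject, relation, object) triples. The relation names are
# illustrative labels of our own choosing.
TRIPLES = [
    ("Callimachus", "compiled", "Pinakes"),
    ("Pinakes", "cataloged", "Library of Alexandria"),
    ("Library of Alexandria", "located_in", "Alexandria"),
]


def build_graph(triples):
    """Index triples by node so that each fact can be followed both ways."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((f"inverse of {relation}", subject))
    return graph


def find_connection(graph, start, goal):
    """Breadth-first search for the shortest chain of facts from start to goal.

    Returns a list of (node, relation, neighbor) hops, an empty list when
    start equals goal, or None when the two are not connected.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


graph = build_graph(TRIPLES)
for hop in find_connection(graph, "Callimachus", "Alexandria"):
    print(hop)
```

Because the data is already connected, answering "how is Callimachus related to Alexandria?" becomes a matter of walking links rather than scanning every document for matching keywords.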
Accompaniments
If you have used a public library, chances are it has records of your visits and the books you checked out. If the librarian were so inclined, they could go through those records and infer your interests and personality. Similarly, by the nature of their function, search tools can capture our identity and inclinations.
Privacy in the internet and social media age is as pertinent a topic as any. You can often find debates raging about it. Below are some interesting links around the topic.
- Google’s Privacy Policy has mirrored the evolution of the internet over the years.
- While not restricted to search, here is an interesting account of Mozilla’s mission to fix Internet privacy.
- Decentralization is often brought up as an answer to privacy concerns. While researching, we came across YaCy, software that lets you set up a decentralized search engine.
Trivia
- October 1st, 2015, is regarded as the official day the last library catalog card was printed.
- Can you find an anomaly or detect fraud in a dataset that seems to follow no pattern and looks as random as the heights of the 60 tallest structures in the world? Yes, you can, with the help of the Newcomb-Benford law. To learn more, watch episode 4, ‘Digits’, of Connected, an amazing series on Netflix.
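For the curious, here is a minimal sketch of a Newcomb-Benford check in Python. The law says that in many naturally occurring datasets a number starts with the digit d with probability log10(1 + 1/d), so about 30% of values start with 1 and fewer than 5% start with 9. The function names and the two sample datasets are our own choices for illustration: powers of two are a well-known sequence that follows the law, while the integers 1 to 999 spread their leading digits evenly and so do not. A large deviation is only a hint that a dataset deserves a closer look, not proof of fraud.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under the Newcomb-Benford law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_shares(values):
    """Observed share of each leading digit among the non-zero integers given."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def max_deviation(values):
    """Largest gap between the observed and expected share for any digit."""
    expected = benford_expected()
    observed = first_digit_shares(values)
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))


# Illustrative datasets (our own assumption, not taken from the episode).
powers_of_two = [2 ** k for k in range(1, 101)]  # known to follow the law
evenly_spread = list(range(1, 1000))             # each leading digit equally common

print("powers of two:", round(max_deviation(powers_of_two), 3))
print("evenly spread:", round(max_deviation(evenly_spread), 3))
```

The first dataset stays close to the expected shares, while the second deviates heavily. Real-world checks usually add a proper statistical test on top of this simple comparison.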