Interview with Neeva

June 1, 2022
Last updated: June 1, 2022
#search #interview

This is an interview with Sridhar Ramaswamy, Vivek Raghunathan, and Darin Fisher of Neeva.

DKB: What differentiates Neeva from Google?

Sridhar:

Three things:

  • Control over the ranking of sources in your search results.
  • Transparency about these sources.
  • Deep integrations to bring rich previews and answers directly to you.

First, some background. I ran search ads at Google for over a decade, and I could see the inexorable pressure to take up more and more space with ads. It’s one of those things where there's just so much money to be made that people come up with cleverer and cleverer theories to justify why it’s okay.

While I didn't run Google's search, I ran sister teams like Travel and Shopping. I could see that the angle Google was taking on the organic side was to show more of the content itself. This includes things like the knowledge graph and featured snippets.

And at some point I realized that the logical conclusion to all of this was that any commercial query was going to have ads because that's what makes the most money, and any non-commercial query would have content directly from Google.

That’s really the founding story of Neeva. We thought this was wrong. Search is a daily utility that all of us use. There needs to be alternate mechanisms for how we go about it. And so things like the ads-free and private model that Neeva is built on were consequences of this basic observation.

With Neeva, we started with a simple principle – put the user first. We wanted to put you back in charge of the search experience. And the philosophies that we bring into play at one level are around changing the ranking algorithm.

This has to be done carefully because one of the most prized qualities for a search engine is that it is neutral. It is not editorialized by a small set of people. The aspiration is to be the objective truth. In practice, you realize that there are always loopholes and people that are trying to game the system. And so you have to evolve. It's a living system, but the aspiration is to always be the objective truth.

In a bunch of areas, for example, in shopping, we lean in, because shopping has become so much about SEO. On Google it's about showing ads and SEO sites. So we change how our ranking works to prefer reviews more.

We're going to prefer informational content more for certain classes of queries because we think those are more useful than simply saying "okay, buy this here", which is what the search engine page has become.

Similarly, for things like programming queries, where we are sure that there is an official answer, we change the ranking algorithm to say, "Answers from the official site are generally more important. If you have a specific question about a Go language feature, Golang docs are just better."

The second thing we do is when we are not sure that large scale changes in ranking are called for, we give you agency. So on a number of queries in the legal vertical, and in the health vertical, we give you filters that can distinguish between nonprofit sites, ad supported sites, government sites, and more.

Google creates what I call the flat earth view where every site is just like every other site. We go out of our way to tell you these additional things that you should know.

Sometimes we have partnerships, for example with NewsGuard, where we show these additional attributes. You then have the ability to set preferences for sources you want more or less of.

We also work much more closely with partners. So we have partnerships with Reddit, Medium, and others, where we want to create a holistic experience on the search page, where you get your answer, but in a way that creates space for them to thrive.

This is why we announced our revenue share program early on. And so for certain classes of things, we act in concert with others. Search quality and ranking are unaffected by these partnerships.

Vivek:

Alan Kay once said that people who are serious about software need to build their own hardware. We think that there is an equivalent in the search market: people who are serious about a search engine need to have their own search stack. That means crawling and indexing the web, serving search results completely, and having your own ranking systems, your own eval systems, and your own structured parsers.

If you don't have that end-to-end, you are not going to be able to solve the problems you identified in your first article. The problems of SEO, and the problems of authority, the problems of all the great content on Reddit that you can’t get to. For example, we've actually spent the better part of a year working directly with the Reddit ranking team to make their ranking better.

Those things are only doable if you have your own search engine. In many ways, this is the flywheel that lets Neeva get better on a daily basis.

DKB: One of the biggest issues people have with Google is that the results for many queries are heavily optimized for SEO. The top results feel more like spammy marketing than genuine content, so people have to append Reddit or Quora to their searches to find something decent.

How do you think about ranking content to avoid this kind of problem?

Vivek:

I’ll answer that along a couple of dimensions.

First, let’s start with the core human eval and ranking systems.

Every search engine uses humans to come up with guidelines on what constitutes a good page for each query. With these guidelines at hand, search engines get human raters to generate a large amount of labeled ground truth data. And then you tune your ranking systems to optimize for the ground truth.

Typically, eval guidelines have two components. One is how well this page matches the intent of the query at hand. This first component is called “query page match” or “topicality”. The second component is how good the page is by itself, independent of any query. This is called (query-independent) “page quality”. We spend a lot of time on our page quality guidelines to make sure they reflect great user experiences.

What we do roughly is the following: There's obvious stuff we want to get rid of, like foreign language pages, malware, and clones of other pages. Clones are a very common problem; there are all these clones of Stack Overflow.

If you have a paywall and you're very aggressive about pushing people into the paywall, we will consider those bad pages. If you have pages with lots of ad load relative to the information, we'll consider those bad pages. If the ads are in the center column and blocking the content, we'll consider that even worse. We have hundreds of pages of guidelines like this.

When a user query comes in, our machine learning scores results using a combination of factors. The query-page topicality is a big component of the model. The other huge factor in the model is the query-independent page quality. Now, generally speaking, there are hundreds of on-topic search results for any given query, so our scoring system ends up picking the highest quality pages for any given query.
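As an illustration only, the two-factor scoring Vivek describes could be sketched as follows. This is not Neeva's model: the `Candidate` type, the 0-to-1 score scales, the `min_topicality` threshold, and the `quality_weight` blend are all assumptions made up for the example, and a real system would learn the combination from rater-labeled data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    topicality: float    # query-page match, assumed to be in [0, 1]
    page_quality: float  # query-independent quality, assumed to be in [0, 1]


def rank(candidates, min_topicality=0.5, quality_weight=0.5):
    """Keep on-topic pages, then order them by a blend of both signals.

    The threshold and weight are hypothetical illustration values.
    """
    on_topic = [c for c in candidates if c.topicality >= min_topicality]
    return sorted(
        on_topic,
        key=lambda c: (1 - quality_weight) * c.topicality
        + quality_weight * c.page_quality,
        reverse=True,
    )


results = rank([
    Candidate("https://clone.example/go-slices", topicality=0.9, page_quality=0.2),
    Candidate("https://blog.example/unrelated", topicality=0.1, page_quality=0.95),
    Candidate("https://docs.example/go-slices", topicality=0.9, page_quality=0.9),
])
print([c.url for c in results])
```

With many on-topic candidates, the quality term is what separates them, which matches the point that the system ends up picking the highest quality pages among the on-topic ones.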

The beauty of the page quality guidelines is that, while they're page specific, all pages on a site tend to have the same flavor. For example, pages on geeksforgeeks.org or experts-exchange.com all share the same format, with blocking interstitial ads before you can actually get to the content. And so these signals often remove all pages that are from the same class of SEO sites. So that's the core ranking, which helps reduce low quality sites.

There's a second class of things we do. We’ve labeled every site on the web with a set of faceted labels (“official”, “forums”, “blogs”, …). We show these facets on every query, and you can deep dive into the facet to only get results from that class of sites.

We use the faceted labels to cluster results and call them out in the UI. For example we might give you a set of results from official documentation sites. There would be a few of them, and if you clicked more, you would get into the tab view where you see all the results from these sites.
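A minimal sketch of how faceted labels might drive both the clustering and the drill-down view, assuming a simple host-to-label lookup. The `SITE_FACETS` table, the "other" fallback, and the function names are hypothetical; the interview does not describe how the labels are stored or applied.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical site-to-facet labels, for illustration only.
SITE_FACETS = {
    "go.dev": "official",
    "stackoverflow.com": "forums",
    "reddit.com": "forums",
    "blog.example": "blogs",
}


def facet_of(url):
    """Look up the facet for a URL's host, ignoring a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return SITE_FACETS.get(host, "other")


def group_by_facet(ranked_urls):
    """Cluster ranked URLs by facet, preserving rank order inside each group."""
    groups = defaultdict(list)
    for url in ranked_urls:
        groups[facet_of(url)].append(url)
    return dict(groups)


def drill_down(ranked_urls, facet):
    """Return only the results whose site carries the chosen facet."""
    return [u for u in ranked_urls if facet_of(u) == facet]


ranked = [
    "https://go.dev/doc/effective_go",
    "https://stackoverflow.com/questions/1",
    "https://blog.example/go-tips",
    "https://www.reddit.com/r/golang/comments/abc",
]
print(group_by_facet(ranked))
print(drill_down(ranked, "official"))
```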

We'll also do this for high quality forums, like Stack Overflow or Reddit. If we see four of the top eight or nine results are from places like Reddit, we'll pull it up into a forum box. This is at the top of the page and is the first thing you see. And of course we only do this if it’s topically relevant to the query.
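The forum box rule can be sketched roughly like this. The `window` and `min_forum_hits` defaults mirror the "four of the top eight or nine" remark but the exact values are assumed, `is_forum` is a hypothetical classifier, and the sketch assumes the ranked list has already been filtered for topical relevance.

```python
def build_page(ranked, is_forum, window=8, min_forum_hits=4):
    """Pull forum results into a top box when they dominate the top results.

    ranked: result URLs, best first (assumed already topically relevant).
    is_forum: callable returning True for forum URLs (hypothetical).
    """
    top = ranked[:window]
    forum_hits = [u for u in top if is_forum(u)]
    if len(forum_hits) < min_forum_hits:
        # Not enough forum results near the top: leave the page as is.
        return {"forum_box": [], "results": list(ranked)}
    in_box = set(forum_hits)
    return {
        "forum_box": forum_hits,
        "results": [u for u in ranked if u not in in_box],
    }


def looks_like_forum(url):
    # Hypothetical stand-in for a real site classifier.
    return "reddit.com" in url or "stackoverflow.com" in url


ranked = [
    "https://reddit.com/r/a/1",
    "https://shop.example/x",
    "https://reddit.com/r/a/2",
    "https://stackoverflow.com/q/1",
    "https://news.example/y",
    "https://reddit.com/r/a/3",
    "https://blog.example/z",
    "https://docs.example/w",
]
page = build_page(ranked, looks_like_forum)
print(page["forum_box"])
```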

We are very aggressive about optimizing for authority. This helps with the other end of the spectrum, where it doesn’t necessarily get rid of bad pages, but it promotes good pages.

Lastly, we extract structured content from lots of these pages and show you rich inline previews. We’re exploring and experimenting with even deeper integrations of this.

As an example, wouldn't it be cool if alongside any page on the web, when you're hovering over your search results, you could see which Reddit posts or Stack Overflow threads pointed to the page? Often that's a good sign of authority.

DKB: I know that Neeva is also working on a browser, so I’d be curious to hear how that fits into everything else you’re doing.

Darin:

Right now we're focused on mobile. We're going to make it easier for people to get access to Neeva on mobile. If you have an iPhone, you would see that Neeva is not a default choice that Apple gives you for search engines. It'll only give you five choices. And so it's necessary that we build an app.

A search app and a browser are really synonyms. You want to be able to search for many different things and multitask, that's what tabs are for. And the main UI for a browser is the search box that starts your journey, and so making the Neeva Browser was really important so that we could make it easier for people to use Neeva on mobile.

But it goes beyond that, in that we have the opportunity to innovate on the browser in a lot of interesting ways, because we're not limited only to how many search results page views we get with ads on them. Our business model unlocks our ability to bypass the search results page.

Let me explain what I mean by that. When you start typing into the Neeva Browser, we are able to suggest not just query completions, but we can just go ahead and suggest search results based on those query completions directly inline, as you're typing. There's not a lot of space in the suggest experience for ads, but we just go ahead and put the URL there.
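To make the idea concrete, here is a toy sketch of a suggest function that returns query completions together with a direct URL where one is known. The `POPULAR_QUERIES` list, the `BEST_URL` table, and the prefix-matching logic are invented for illustration and say nothing about how the Neeva Browser actually builds its suggestions.

```python
# Hypothetical data: known queries and, where one exists, a best direct URL.
POPULAR_QUERIES = ["facebook", "facebook login", "fast fourier transform"]
BEST_URL = {"facebook": "https://www.facebook.com/"}


def suggest(prefix, limit=3):
    """Return suggestions for a typed prefix.

    Each suggestion carries the completed query and, when available,
    a direct URL the user can tap to skip the search results page.
    """
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [q for q in POPULAR_QUERIES if q.startswith(prefix)][:limit]
    return [{"query": q, "url": BEST_URL.get(q)} for q in matches]


print(suggest("fac"))
```

A suggestion whose `url` is `None` would fall back to the ordinary search results page for that query.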

And the awesome thing is that about 14% of the time, people actually go directly to those results in the Neeva Browser. They just bypass the search results page entirely. So we save them time, which is pretty cool. And you can think about the queries where this might be obviously the right thing, but at Google it’s difficult to not send people to the search results page.

Imagine somebody starts typing Facebook. What's the best result to put in the suggest box? Probably facebook.com. You might see that term in Chrome, but if you tap it and you haven't been there before, it's going to take you to the search results page and show you a search query for facebook.com. And guess who needs to buy ads in order to make sure that their result is on top? Try doing a query for Amazon and see what happens.

Sridhar:

The query “headphones” is a more dramatic example of what you're saying. On Neeva we will actually show you the review sites for headphones in the suggest box so you can go directly there. A company like Google has zero incentive to ever do that because they really want you to go to that stack full of ads.

We are taking a brand new outlet, which is the mobile app, and saying, “We’re going to cut out the search engine.” The search engine, as most people know it, disappears. And so more than half our clicks are direct from this feature, which is called FastTap.

You get to where you want to go in 500 milliseconds, the time it takes for you to parse the result, with no search engine in between.

Darin:

And so this is a very simple feature, but one that makes you wonder why you didn’t have it before. And there's a really easy answer to that, because of course Google wants to take you to the search results page to show you ads. It's really hard for Google to move away from that.

And so we've been exploring a lot of things that are in this sort of vein. In the context of the browser, things like helping you get back to what you were doing before, so you don't have to search for it again, by having better organized tabs, simple things like that.

It's difficult for Google to do this, because anything they change about Chrome that makes you not visit the search results page as much is something they don't know how to ship.

And so there's a whole swimlane of features for the browser and search experience that have to do with how we help people get to where they want to go faster. How do we help people get back to the things they were doing? How do we help people stay organized in those tasks? How do we help people find the next thing based on what they're doing and continue the journey forward?

That's the space that we're innovating in. Things like what Vivek talked about, how we can understand that there's a discussion happening about the topic you're researching on Reddit or Stack Overflow.

You can imagine how through the browser as you’re visiting webpages, we could bring that up and surface that information to you. So if you're on an API reference page, we can tell you there's a discussion about it happening on Stack Overflow.

And this is pointing towards the future a bit, but we already have a feature in the browser called NeevaScope. You can activate that on any page, and Neeva will find related content.

So these are ways in which we are innovating, and exploring how the search experience could be different. It isn't just “enter a query, then go to the search results page with ads”. It could be a lot different and better than that.

Vivek:

And I think the key insight is that browsers that want to do this in a performant way need to have their own search engine sitting next to them. There's only one other browser on the planet that has its own search engine sitting next to it. And it has 100 billion reasons not to do any of this stuff.

DKB: What would you say to people who are using Google? Why should they switch from Google to Neeva?

Sridhar:

People need to be reminded that search is their gateway to so much in this world.

And it's not just the products you are buying. It's your headache. It's your itch. It's all the deeply personal stuff that we all have questions about day in and day out. We have accepted that the ad-supported model is the only way to be.

People are genuinely shocked, and pleasantly surprised, by how much nicer the internet is once you take away tracking. And once you have a simple search engine that's about you.

The one other thing to add is that we also have a base tier that is completely free. And so compared to DuckDuckGo with ads, or Bing with ads, or Google that is full of ads, there's zero risk in trying us. Our quality is very good, it’s private, and the basic tier is completely free.