Distributed Search Network (DSN) | Algolia


Algolia’s Distributed Search Network (DSN) adds one or more satellite servers to a cluster. This extends the reach of an Algolia cluster into other regions, closer to end users.

Take the example of an Algolia user on the East Coast of the United States, whose cluster is close to their servers in New York. Not all of their end users are located on the East Coast, though. They might have a significant user base in California, for instance. With only a single cluster on the East Coast, Californian users may see slightly slower search performance than users in New York.

For that reason, putting a DSN server on the West Coast might be a good idea to bring data closer to the West Coast users. Adding DSN servers in strategic regions reduces network latency, improves performance, and enhances user experience.

Even though Algolia already addresses network latency by placing clusters in many regions around the world, you can go one step further by adding DSN servers into regions closer to your users.

In addition to bringing the engine closer to your users, DSN servers also extend the processing power of your clusters. They can share the load of extensive cluster activity: a user can offload requests to a DSN whenever their cluster or clusters reach peak usage. Therefore, you may sometimes choose to add DSN servers in the same region as your primary cluster.

DSN servers

A DSN server is a powerful, fully functioning, self-sufficient bare-metal machine. It’s a replication of your primary cluster. Each DSN runs independently and contains the full data and settings of its primary cluster.

The significant difference from an Algolia three-server cluster is that a DSN is a single machine, so it doesn't provide cluster-level redundancy.

However, a DSN setup is equally reliable when accessed with the official API clients. Unlike direct calls to the REST API, the official API clients implement a retry strategy that switches to the primary cluster whenever a DSN server goes down.

Getting data to the DSN

DSN servers aren’t a backup of their primary clusters: they build their own indices.

A primary cluster sends all indexing jobs to its DSNs, and each DSN processes them independently. In other words, a DSN gets its data not by copying a backup, but by repeating the same indexing process as its primary cluster.

The primary cluster doesn't send an indexing job to a DSN until the cluster itself has finished processing that job. More specifically, the DSN gets the job only once the machines on the cluster have achieved consensus. Therefore, DSNs aren't immediately in sync with their clusters: there is a slight delay (from a few seconds to a few minutes), depending on the size of the indexing job.

To get an estimate of that delay, you need to factor in the network latency between a cluster and its DSN and add the time it takes for the DSN to process the indexing job.
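As a rough illustration of that estimate, the following Python sketch adds the two components together. The function name and the sample numbers are hypothetical and are not part of any Algolia API. Real values depend on your regions and on the size of the indexing job.

```python
def estimate_dsn_sync_delay(network_latency_s: float, dsn_processing_s: float) -> float:
    """Estimate how long a DSN lags behind its primary cluster for one indexing job.

    Both arguments are assumptions you have to measure or guess yourself:
    - network_latency_s: time for the job to travel from the cluster to the DSN
    - dsn_processing_s: time the DSN needs to process the job on its own
    """
    if network_latency_s < 0 or dsn_processing_s < 0:
        raise ValueError("durations must be non-negative")
    return network_latency_s + dsn_processing_s


# Hypothetical figures: 80 ms between regions, 12 s to process the job on the DSN.
delay = estimate_dsn_sync_delay(network_latency_s=0.08, dsn_processing_s=12.0)
print(f"Estimated DSN delay: {delay:.2f} s")
```

The estimate only covers the time after the primary cluster has finished processing the job and reached consensus, since the job isn't forwarded before that point.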

How do you activate a DSN?

DSN servers are accessible on the current Standard and Premium plans with an annual commitment, as well as some paid legacy plans (before July 1, 2020). Since adding DSN servers to your application requires provisioning additional infrastructure, adding DSN servers costs extra. Please reach out to the support team for more information.

Once a DSN server is attached, Algolia takes care of the distribution and synchronization of your indices around the world. Algolia automatically routes queries to the closest data center among those you’ve selected, ensuring the best possible experience.

Note that on specific plans, you can choose to have more than one DSN in the same region. Users with a worldwide client base may need both DSN servers distributed over many regions and several DSNs within the same region to handle heavy usage.

You can monitor your DSNs via the Algolia dashboard.

Frontend implementation for reduced latency

A DSN can only improve network latency with frontend search implementations (web and mobile).

Why is this? If you're using a DSN to bring data closer to your end users, Algolia needs their IP address to determine the closest server. With a backend search implementation, your end users first contact your server, wherever it is in the world. Your server then performs the calls to Algolia with its own IP address, so Algolia picks the server closest to your backend rather than the one closest to the end user.

When you’re using a DSN server to reduce latency, it’s best to have your server and your primary Algolia cluster near each other to speed up backend indexing operations.

If, however, you're using a DSN for more processing power, you can use either a client-side or a server-side search implementation.

Retries and fallback (failover) logic

All official API clients implement a retry strategy that uses up to four different URLs for every search request: one for the DSN and three for the cluster.

The first request always goes to the closest server. It could be any one of the three servers in your cluster, or it could be your DSN if you have one. If this first try works, the search goes through. If it fails, the clients activate their retry logic:

  1. Try to connect to the primary cluster using one of its three URLs.
  2. If that fails, use a second URL of the same cluster.
  3. If that fails, use a third URL of the cluster.
  4. If that fails, return a timeout error.

With this fallback logic, Algolia ensures a high degree of availability over a widely distributed infrastructure.
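The retry steps above can be sketched in Python as follows. This is a minimal illustration of the idea, not the implementation used by the official API clients: `search_with_fallback`, `AllHostsFailed`, and the `send` callback are hypothetical names, and raising an exception stands in for the final timeout.

```python
from typing import Callable, Optional, Sequence


class AllHostsFailed(Exception):
    """Raised when the first host and every fallback host have failed."""


def search_with_fallback(
    first_host: str,
    cluster_hosts: Sequence[str],
    send: Callable[[str], dict],
) -> dict:
    """Try the closest host first, then up to three cluster URLs in order.

    `send` is a caller-supplied function (an assumption of this sketch) that
    performs the search against one host and raises ConnectionError or
    TimeoutError when that host can't be reached.
    """
    last_error: Optional[Exception] = None
    for host in [first_host, *cluster_hosts[:3]]:
        try:
            return send(host)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc  # remember the failure and move on to the next URL
    raise AllHostsFailed("no host answered the search request") from last_error


# Usage with a fake transport in which the DSN is down.
def fake_send(host: str) -> dict:
    if host == "appid-dsn.algolia.net":
        raise ConnectionError("DSN unreachable")
    return {"host": host, "hits": []}


result = search_with_fallback(
    "appid-dsn.algolia.net",
    ["appid-1.algolianet.com", "appid-2.algolianet.com", "appid-3.algolianet.com"],
    fake_send,
)
print(result["host"])  # appid-1.algolianet.com
```

Passing the transport in as a callback keeps the sketch independent of any HTTP library and makes the fallback order easy to test.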

Only the official API clients provide this kind of failover reliability. Because of this, it’s strongly recommended to use the API clients instead of the REST API directly.

Accessing DSN servers

The Algolia infrastructure (where your data lives) is addressable by five different URLs:

  • Smart records (NS1)
    • appid-dsn.algolia.net
      • This record tries to find the closest server to perform a query.
      • It contains all the cluster’s servers and, if a DSN is configured, it also considers the additional servers.
      • All servers in the configured pool are considered equal, and their location is taken into account. This means that if the configuration is one cluster + one DSN, they're treated as four identical servers that can process searches. As long as one of them is available, the record returns an address.
    • appid.algolia.net
      • This record is used for indexing.
  • Fallback records (Cloudflare), designed to address the availability zones of the clusters
    • appid-1.algolianet.com
    • appid-2.algolianet.com
    • appid-3.algolianet.com
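To make the naming pattern concrete, here is a small Python sketch that derives the five hostnames from an application ID. The helper name `algolia_hostnames` and the sample ID `myappid` are hypothetical. The official API clients build these hosts for you, so this is only meant to illustrate the list above.

```python
def algolia_hostnames(app_id: str) -> dict:
    """Build the five hostnames listed above from an application ID."""
    if not app_id:
        raise ValueError("app_id must not be empty")
    return {
        # Smart record that looks for the closest server (cluster or DSN).
        "search": f"{app_id}-dsn.algolia.net",
        # Smart record used for indexing.
        "indexing": f"{app_id}.algolia.net",
        # Fallback records, one per availability zone of the cluster.
        "fallbacks": [f"{app_id}-{n}.algolianet.com" for n in (1, 2, 3)],
    }


hosts = algolia_hostnames("myappid")  # "myappid" is a placeholder ID
print(hosts["search"])
for fallback in hosts["fallbacks"]:
    print(fallback)
```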