Over the last year, Cloudflare has begun formally verifying the correctness of our internal DNS addressing behavior — the logic that determines which IP address a DNS query receives when it hits our authoritative nameserver. This means that for every possible DNS query for a proxied domain we could receive, we try to mathematically prove properties about our DNS addressing behavior, even when different systems (owned by different teams) at Cloudflare have contradictory views on which IP addresses should be returned.
To achieve this, we formally verify the programs — written in a custom Lisp-like programming language — that our nameserver executes when it receives a DNS query. These programs determine which IP addresses to return. Whenever an engineer changes one of these programs, we run all the programs through our custom model checker (written in Racket + Rosette) to check for certain bugs (e.g., one program overshadowing another) before the programs are deployed.
Our formal verifier runs in production today, and is part of a larger addressing system called Topaz. In fact, it’s likely you’ve made a DNS query today that triggered a formally verified Topaz program.
This post is a technical description of how Topaz’s formal verification works. Besides being a valuable tool for Cloudflare engineers, Topaz is a real-world example of formal verification applied to networked systems. We hope it inspires other network operators to incorporate formal methods, where appropriate, to help make the Internet more reliable for all.
Topaz’s full technical details have been peer-reviewed and published in ACM SIGCOMM 2024, with both a paper and short video available online.
Addressing: how IP addresses are chosen
When a DNS query for a customer’s proxied domain hits Cloudflare’s nameserver, the nameserver returns an IP address — but how does it decide which address to return?
Let’s make this more concrete. When a customer, say example.com
, signs up for Cloudflare and proxies their traffic through Cloudflare, it makes Cloudflare’s nameserver authoritative for their domain, which means our nameserver has the authority to respond to DNS queries for example.com
. Later, when a client makes a DNS query for example.com
, the client’s recursive DNS resolver (for example, 1.1.1.1) queries our nameserver for the authoritative response. Our nameserver returns some Cloudflare IP address (of our choosing) to the resolver, which forwards that address to the client. The client then uses the IP address to connect to Cloudflare’s network, which is a global anycast network — every data center advertises all of our addresses.
Clients query Cloudflare’s nameserver (via their resolver) for customer domains. The nameserver returns Cloudflare IP addresses, advertised by our entire global network, which the client uses to connect to the customer domain. Cloudflare may then connect to the origin server to fulfill the user’s HTTPS request.
When the customer has configured a static IP address for their domain, our nameserver’s choice of IP address is simple: it simply returns that static address in response to queries made for that domain.
But for all other customer domains, our nameserver could respond with virtually any IP address that we own and operate. We may return the same address in response to queries for different domains, or different addresses in response to different queries for the same domain. We do this for resilience, but also because decoupling names and IP addresses improves flexibility.
With all that in mind, let’s return to our initial question: given a query for a proxied domain without a static IP, which IP address should be returned? The answer: Cloudflare chooses IP addresses to meet various business objectives. For instance, we may choose IPs to:
-
Change the IP address of a domain that is under attack.
-
Direct fractions of traffic to specific IP addresses to test new features or services.
-
Remap or “renumber” domain names to new IP address space.
Topaz executes DNS objectives
To change authoritative nameserver behavior — how we choose IPs — a Cloudflare engineer encodes their desired DNS business objective as a declarative Topaz program. Our nameserver stores the list of all such programs such that when it receives a DNS query for a proxied domain, it executes the list of programs in sequence until one returns an IP address. It then returns that IP to the resolver.
Topaz receives DNS queries (metadata included) for proxied domains from Cloudflare’s nameserver. It executes a list of policies in sequence until a match is found. It returns the resulting IP address to the nameserver, which forwards it to the resolver.
What do these programs look like?
Each Topaz program has three primary components:
-
Match function: A program’s match function specifies under which circumstances the program should execute. It takes as input DNS query metadata (e.g., datacenter information, account information) and outputs a boolean. If, given a DNS query, the match function returns true, the program’s response function is executed.
-
Response function: A program’s response function specifies which IP addresses should be chosen. It also takes as input all the DNS query metadata, but outputs a 3-tuple (IPv4 addresses, IPv6 addresses, and TTL). When a program’s match function returns true, its corresponding response function is executed. The resulting IP addresses and TTL are returned to the resolver that made the query.
-
Configuration: A program’s configuration is a set of variables that parameterize that program’s match and response function. The match and response functions reference variables in the corresponding configuration, thereby separating the macro-level behavior of a program (match/response functions) from its nitty-gritty details (specific IP addresses, names, etc.). This separation makes it easier to understand how a Topaz program behaves at a glance, without getting bogged down by specific function parameters.
Let’s walk through an example Topaz program. The goal of this program is to give all queried domains whose metadata field “tag1” is equal to “orange” a particular IP address. The program looks like this:
- name: orange
config: |
(config
([desired_tag1 "orange"]
[ipv4 (ipv4_address “192.0.2.3”)]
[ipv6 (ipv6_address “2001:DB8:1:3”)]
[t (ttl 300]))
match: |
(= query_domain_tag1 desired_tag1)
response: |
(response (list ipv4) (list ipv6) t)
Before we walk through the program, note that the program’s configuration, match, and response function are YAML strings, but more specifically they are topaz-lang expressions. Topaz-lang is the domain-specific language (DSL) we created specifically for expressing Topaz programs. It is based on Scheme, but is much simpler. It is dynamically typed, it is not Turing complete, and every expression evaluates to exactly one value (though functions can throw errors). Operators cannot define functions within topaz-lang, they can only add new DSL functions by writing functions in the host language (Go). The DSL provides basic types (numbers, lists, maps) but also Topaz-specific types, like IPv4/IPv6 addresses and TTLs.
Let’s now examine this program in detail.
-
The
config
is a set of four bindings from name to value. The first binds the string”orange”
to the namedesired_tag1
. The second binds the IPv4 address192.0.2.3
to the nameipv4
. The third binds the IPv6 address2001:DB8:1:3
to the nameipv6
. And the fourth binds the TTL (for which we added a topaz-lang type)300
(seconds) to the namet
. -
The
match
function is an expression that must evaluate to a boolean. It can reference configuration values (e.g.,desired_tag1
), and can also reference DNS query fields. All DNS query fields use the prefixquery_
and are brought into scope at evaluation time. This program’s match function checks whetherdesired_tag1
is equal to the tag attached to the queried domain,query_domain_tag1
. -
The
response
function is an expression that evaluates to the specialresponse
type, which is really just a 3-tuple consisting of: a list of IPv4 addresses, a list of IPv6 addresses, and a TTL. This program’s response function simply returns the configured IPv4 address, IPv6 address, and TTL (seconds).
Critically, all Topaz programs are encoded as YAML and live in the same version-controlled file. Imagine this program file contained only the orange
program above, but now, a new team wants to add a new program, which checks whether the queried domain’s “tag1” field is equal to “orange” AND that the domain’s “tag2” field is equal to true:
- name: orange_and_true
config: |
(config
([desired_tag1 "orange"]
[ipv4 (ipv4_address “192.0.2.2”)]
[ipv6 (ipv6_address “2001:DB8:1:2”)]
[t (ttl 300)]))
match: |
(and (= query_domain_tag1 desired_tag1)
query_domain_tag2)
response: |
(response (list ipv4) (list ipv6) t)
This new team must place their new orange_and_true
program either below or above the orange
program in the file containing the list of Topaz programs. For instance, they could place orange_and_true
after orange
, like so:
- name: orange
config: …
match: …
response: …
- name: orange_and_true
config: …
match: …
response: …
Now let’s add a third, more interesting Topaz program. Say a Cloudflare team wants to test a modified version of our CDN’s HTTP server on a small percentage of domains, and only in a subset of Cloudflare’s data centers. Furthermore, they want to distribute these queries across a specific IP prefix such that queries for the same domain get the same IP. They write the following:
- name: purple
config: |
(config
([purple_datacenters (fetch_datacenters “purple”)]
[percentage 10]
[ipv4_prefix (ipv4_prefix “203.0.113.0/24”)]
[ipv6_prefix (ipv6_prefix “2001:DB8:3::/48”)]))
match: |
(let ([rand (rand_gen (hash query_domain))])
(and (member? purple_datacenters query_datacenter)
(< (random_number (range 0 99) rand) percentage)))
response: |
(let ([hashed_domain (hash query_domain)]
[ipv4_address (select_from ipv4_prefix hashed_domain)]
[ipv6_address (select_from ipv6_prefix hashed_domain)])
(response (list ipv4_address) (list ipv6_address) (ttl 1)))
This Topaz program is significantly more complicated, so let’s walk through it.
Starting with configuration:
-
The first configuration value,
purple_datacenters
, is bound to the expression(fetch_datacenters “purple”)
, which is a function that retrieves all Cloudflare data centers tagged “purple” via an internal HTTP API. The result of this function call is a list of data centers. -
The second configuration value,
percentage
, is a number representing the fraction of traffic we would like our program to act upon. -
The third and fourth names are bound to IP prefixes, v4 and v6 respectively (note the
built-in ipv4_prefix
andipv6_prefix
types).
The match function is also more complicated. First, note the let
form — this lets operators define local variables. We define one local variable, a random number generator called rand
seeded with the hash of the queried domain name. The match expression itself is a conjunction that checks two things.
-
First, it checks whether the query landed in a data center tagged “purple”.
-
Second, it checks whether a random number between 0 and 99 (produced by a generator seeded by the domain name) is less than the configured percentage. By seeding the random number generator with the domain, the program ensures that 10% of domains trigger a match. If we had seeded the RNG with, say, the query ID, then queries for the same domain would behave differently.
Together, the conjuncts guarantee that the match expression evaluates to true for 10% of domains queried in “purple” data centers.
Now let’s look at the response function. We define three local variables. The first is a hash of the domain. The second is an IPv4 address selected from the configured IPv4 prefix. select_from
always chooses the same IP address given the same prefix and hash — this ensures that queries for a given domain always receive the same IP address (which makes it easier to correlate queries for a single domain), but that queries for different domains can receive different IP addresses within the configured prefix. The third local variable is an IPv6 address selected similarly. The response function returns these IP addresses and a TTL of value 1 (second).
Topaz programs are executed on the hot path
Topaz’s control plane validates the list of programs and distributes them to our global nameserver instances. As we’ve seen, the list of programs reside in a single, version-controlled YAML file. When an operator changes this file (i.e., adds a program, removes a program, or modifies an existing program), Topaz’s control plane does the following things in order:
-
First, it validates the programs, making sure there are no syntax errors.
-
Second, it “finalizes” each program’s configuration by evaluating every configuration binding and storing the result. (For instance, to finalize the
purple
program, it evaluatesfetch_datacenters
, storing the resulting list. This way our authoritative nameservers never need to retrieve external data.) -
Third, it verifies the finalized programs, which we will explain below.
-
Finally, it distributes the finalized programs across our network.
Topaz’s control plane distributes the programs to all servers globally by writing the list of programs to QuickSilver, our edge key-value store. The Topaz service on each server detects changes in Quicksilver and updates its program list.
When our nameserver service receives a DNS query, it augments the query with additional metadata (e.g., tags) and then forwards the query to the Topaz service (both services run on every Cloudflare server) via Inter-Process Communication (IPC). Topaz, upon receiving a DNS query from the nameserver, walks through its program list, executing each program’s match function (using the topaz-lang interpreter) with the DNS query in scope (with values prefixed with query_
). It walks the list until a match function returns true
. It then executes that program’s response function, and returns the resulting IP addresses and TTL to our nameserver. The nameserver packages these addresses and TTL in valid DNS format, and then returns them to the resolver.
Topaz programs are formally verified
Before programs are distributed to our global network, they are formally verified. Each program is passed through our formal verification tool which throws an error if a program has a bug, or if two programs (e.g., the orange_and_true
and orange
programs) conflict with one another.
The Topaz formal verifier (model-checker) checks three properties.
First, it checks that each program is satisfiable — that there exists some DNS query that causes each program’s match function to return true
. This property is useful for detecting internally-inconsistent programs that will simply never match. For instance, if a program’s match expression was (and true false)
, there exists no query that will cause this to evaluate to true, so the verifier throws an error.
Second, it checks that each program is reachable — that there exists some DNS query that causes each program’s match function to return true
given all preceding programs. This property is useful for detecting “dead” programs that are completely overshadowed by higher-priority programs. For instance, recall the ordering of the orange
and orange_and_true
programs:
- name: orange
config: …
match: (= query_domain_tag1 "orange")
response: …
- name: orange_and_true
config: …
match: (and (= query_domain_tag1 "orange") query_domain_tag2)
response: …
The verifier would throw an error because the orange_and_true
program is unreachable. For all DNS queries for which query_domain_tag1
is ”orange”, regardless of metadata2
, the orange
program will always match, which means the orange_and_true
program will never match. To resolve this error, we’d need to swap these two programs like we did above.
Finally, and most importantly, the verifier checks for program conflicts: queries that cause any two programs to both match. If such a query exists, it throws an error (and prints the relevant query), and the operators are forced to resolve the conflict by changing their programs. However, it only checks whether specific programs conflict — those that are explicitly marked exclusive. Operators mark their program as exclusive if they want to be sure that no other exclusive program could match on the same queries.
To see what conflict detection looks like, consider the corrected ordering of the orange_and_true
and orange
programs, but note that the two programs have now been marked exclusive:
- name: orange_and_true
exclusive: true
config: ...
match: (and (= query_domain_tag1 "orange") query_domain_tag2)
response: ...
- name: orange
exclusive: true
config: ...
match: (= query_domain_tag1 "orange")
response: ...
After marking these two programs exclusive, the verifier will throw an error. Not only will it say that these two programs can contradict one another, but it will provide a sample query as proof:
Checking: no exclusive programs match the same queries: check FAILED!
Intersecting programs found:
programs "orange_and_true" and "orange" both match any query...
to any domain...
with tag1: "orange"
with tag2: true
The teams behind the orange
and orange_and_true
programs respectively must resolve this conflict before these programs are deployed, and can use the above query to help them do so. To resolve the conflict, the teams have a few options. The simplest option is to remove the exclusive setting from one program, and acknowledge that it is simply not possible for these programs to be exclusive
. In that case, the order of the two programs matters (one must have higher priority). This is fine! Topaz allows developers to write certain programs that absolutely cannot overlap with other programs (using exclusive
), but sometimes that is just not possible. And when it’s not, at least program priority is explicit.
Note: in practice, we place all exclusive programs at the top of the program file. This makes it easier to reason about interactions between exclusive and non-exclusive programs.
In short, verification is powerful not only because it catches bugs (e.g., satisfiability and reachability), but it also highlights the consequences of program changes. It helps operators understand the impact of their changes by providing immediate feedback. If two programs conflict, operators are forced to resolve it before deployment, rather than after an incident.
Bonus: verification-powered diffs. One of the newest features we’ve added to the verifier is one we call semantic diffs. It’s in early stages, but the key insight is that operators often just want to understand the impact of changes, even if these changes are deemed safe. To help operators, the verifier compares the old and new versions of the program file. Specifically, it looks for any query that matched program X in the old version, but matches a different program Y in the new version (or vice versa). For instance, if we changed orange_and_true
thus:
- name: orange_and_true
config: …
match: (and (= query_domain_tag1 "orange") (not query_domain_tag2))
response: …
Our verifier would emit:
Generating a report to help you understand your changes...
NOTE: the queries below (if any) are just examples. Other such queries may exist.
* program "orange_and_true" now MATCHES any query...
to any domain...
with tag1: "orange"
with tag2: false
While not exhaustive, this information helps operators understand whether their changes are doing what they intend or not, before deployment. We look forward to expanding our verifier’s diff capabilities going forward.
How Topaz’s verifier works, and its tradeoffs
How does the verifier work? At a high-level, the verifier checks that, for all possible DNS queries, the three properties outlined above are satisfied. A Satisfiability Modulo Theories (SMT) solver — which we explain below — makes this seemingly impossible operation feasible. (It doesn't literally loop over all DNS queries, but it is equivalent to doing so — it provides exhaustive proof.)
We implemented our formal verifier in Rosette, a solver-enhanced domain-specific language written in the Racket programming language. Rosette makes writing a verifier more of an engineering exercise, rather than a formal logic test: if you can express the interpreter for your language in Racket/Rosette, you get verification “for free”, in some sense. We wrote a topaz-lang interpreter in Racket, then crafted our three properties using the Rosette DSL.
How does Rosette work? Rosette translates our desired properties into formulae in first-order logic. At a high level, these formulae are like equations from algebra class in school, with “unknowns” or variables. For instance, when checking whether the orange program is reachable (with the orange_and_true
program ordered before it), Rosette produces the formula ((NOT orange_and_true.match) AND orange.match)
. The “unknowns” here are the DNS query parameters that these match functions operate over, e.g., query_domain_tag1
. To solve this formula, Rosette interfaces with an SMT solver (like Z3), which is specifically designed to solve these types of formulae by efficiently finding values to assign to the DNS query parameters that make the formulae true. Once the SMT solver finds satisfying values, Rosette translates them into a Racket data structure: in our case, a sample DNS query. In this example, once it finds a satisfying DNS query, it would report that the orange
program is indeed reachable.
However, verification is not free. The primary cost is maintenance. The model checker’s interpreter (Racket) must be kept in lockstep with the main interpreter (Go). If they fall out-of-sync, the verifier loses the ability to accurately detect bugs. Furthermore, functions added to topaz-lang must be compatible with formal verification.
Also, not all functions are easily verifiable, which means we must restrict the kinds of functions that program authors can write. Rosette can only verify functions that operate over integers and bit-vectors. This means we only permit functions whose operations can be converted into operations over integers and bit-vectors. While this seems restrictive, it actually gets us pretty far. The main challenge is strings: Topaz does not support programs that, for example, manipulate or work with substrings of the queried domain name. However, it does support simple operations on closed-set strings. For instance, it supports checking if two domain names are equal, because we can convert all strings to a small set of values representable using integers (which are easily verifiable).
Fortunately, thanks to our design of Topaz programs, the verifier need not be compatible with all Topaz program code. The verifier only ever examines Topaz match functions, so only the functions specified in match functions need to be verification-compatible. We encountered other challenges when working to make our model accurate, like modeling randomness — if you are interested in the details, we encourage you to read the paper.
Another potential cost is verification speed. We find that the verifier can ensure our existing seven programs satisfy all three properties within about six seconds, which is acceptable because verification happens only at build time. We verify programs centrally, before programs are deployed, and only when programs change.
We also ran microbenchmarks to determine how fast the verifier can check more programs — we found that, for instance, it would take the verifier about 300 seconds to verify 50 programs. While 300 seconds is still acceptable, we are looking into verifier optimizations that will reduce the time further.
Bringing formal verification from research to production
Topaz’s verifier began as a research project, and has since been deployed to production. It formally verifies all changes made to the authoritative DNS behavior specified in Topaz.
For more in-depth information on Topaz, see both our research paper published at SIGCOMM 2024 and the recording of the talk.
We thank our former intern, Tim Alberdingk-Thijm, for his invaluable work on Topaz’s verifier.