Reliable Webservers with Go - Part 1

Introduction#

Welcome!

Building software is surprisingly similar to many other types of engineering. If you compare software development to architecture, some software projects are similar to being a child and building a blanket fort. It solves your needs, but does not last long. There is a big difference between a blanket fort and a skyscraper: the amount of thought, time, planning, research and resources are all drastically different. Similarly, building a small command-line program is very different from building a large software product. The great difference between software and building physical things is that you can build skyscraper-esque projects by yourself, or with a small group, without large amounts of resources.

In this book, we hope to get you thinking about the types of problems larger production systems have to deal with. We also want to teach you Go!

In the following chapters, you will find three things:

  • Interesting projects for you to build. Each chapter walks you through building or expanding a project, helping you understand our thought process around the code, and explaining why decisions were made.
  • Some thinking around building large-scale systems, and how modern software infrastructure is built.
  • Some teaching on Go of the sort that you won’t get from free online tutorials or just by reading the docs.

Projects#

Across our chapters, we will walk through how to build the following things:

  • key-value server
  • canary deployment proxy
  • real-time chat service

We will spend the majority of our time on the key-value server, walking through how to build, test, monitor and grow it.

For the other projects, we will focus on their most interesting aspects, showing how to take what we learned with the key-value server and apply it to various types of web projects.

Useful research tools#

Go is an incredibly well-documented language. godoc.org generates documentation for any publicly accessible Go package. You can click on any function in the docs, and it will take you to the function definition in the code. It has documentation for both third-party libraries and all of the Go standard library.

If you’re having trouble finding a package or library for Go, the suggested search that people use is “golang”, because “go” is a very common word, and the name of a famous board game!

If you need examples on how to do something specific, check out gobyexample.com. It’s built by Mark McGranaghan, and has lots of wonderful examples.

The history of Go#

Go often feels very similar to C-like languages. Its authors, Robert Griesemer, Rob Pike, and Ken Thompson, wanted to fix the issues Google was suffering with languages like C++, Java and Python. It was designed to be easy to debug and fast to write, compile, and run. It was started in 2007, and some of its major contributors’ backgrounds are evident in the language.

  • Brad Fitzpatrick - Creator of LiveJournal and lots of popular open-source software including memcached, PubSubHubbub, OpenID, and Perkeep.
  • Ian Lance Taylor - Contributor to GCC since 2010. Maintained GNU Binutils from 1996 to 1999.
  • Keith Randall - Researcher from MIT and Compaq’s System Research Center focusing on high-performance computing.
  • Ken Thompson - Worked at Bell Labs, basically invented Unix and Regular Expressions.
  • Marcel van Lohuizen - Worked on Google’s Search and Borg distributed systems.
  • Rob Pike - Invented UTF-8, wrote the first graphical interface for Unix and is one of the creators of Plan9.
  • Robert Griesemer - Worked on Java’s virtual machine, Google’s V8 and Sawzall languages.
  • Russ Cox - Worked at Bell Labs, contributed to Plan9.

Installing Go#

The best place for you to learn how to install Go is the official documentation. There, they cover how to install it across a variety of operating systems and environments.

For this book, you’ll need Go version 1.16 or later. So head over there, get Go installed, and then we’ll go to the next section.

Go basics#

The best way to get introduced to Go is to go through tour.golang.org. We will assume you have done that before starting this book. If you have not gone through it yet, or it has been a while, some syntactic reminders are below.

a := 10 declares a new variable and assigns it a value, inferring the type from the right-hand side. An untyped integer constant defaults to int, so it is equivalent to writing:

var a int = 10

defer is a keyword that says “run this function call when the surrounding function returns”. It’s often used to close a file once it is no longer being used.

go test() starts the function test in a new goroutine, a lightweight thread managed by the Go runtime. This allows a function to be run concurrently with very little ceremony.
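As a quick refresher on both keywords, here is a minimal sketch. The function names orderOfEvents and sumConcurrently are made up for illustration and are not used anywhere else in the book.

```go
package main

import (
	"fmt"
	"sync"
)

// orderOfEvents records the order in which statements run, to show that a
// deferred call executes when the surrounding function returns.
func orderOfEvents() []string {
	var events []string
	func() {
		defer func() { events = append(events, "deferred") }()
		events = append(events, "body")
	}()
	events = append(events, "after")
	return events
}

// sumConcurrently starts one goroutine per input and waits for all of them.
func sumConcurrently(nums []int) int {
	var (
		mu    sync.Mutex // guards total
		wg    sync.WaitGroup
		total int
	)
	for _, n := range nums {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			mu.Lock()
			total += n
			mu.Unlock()
		}(n)
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(orderOfEvents())
	fmt.Println(sumConcurrently([]int{1, 2, 3}))
}
```

The goroutines share the total variable, so the sketch guards it with a mutex and uses a WaitGroup to wait for every goroutine to finish before reading the result.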

Functions, types and variables that start with capital letters are public and exported from a package. They are visible in the generated documentation. Their documentation is autogenerated from the comment directly above the function or type, which by convention starts with its name. Comments are prefixed with //.

// Export is a function that exports things. This is a documentation comment on
// a function. It is good style to put documentation on all public things in Go.
func Export() error {
  return nil
}

go fmt formats Go files. Go has a published style guide that is followed by most folks in the community. You can read about it at Effective Go and Go Code Review Comments. Jaana Dogan has a popular article on the topic Style guideline for Go packages. You can use Go’s golint tool to tell you when you are not following the most common style suggestions. go vet also tries to catch common mistakes.

Types#

Go is strongly typed, and every variable has a type. You can create new types using the type statement. This is often used in conjunction with struct to create new complex data structures.

type Data struct {
  Date time.Time
  UserID int64
  Content string
}

The above type is named Data and has three public fields: Date (of type time.Time), UserID (of type int64) and Content (of type string).

An int is not the same as an int64. To convert a value from one type to another, use the type as a function. For example:

var a int
var b int64
a = 10
b = int64(a)

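To check the dynamic type of an interface value, you can use the .() type assertion. Given var i interface{} = "hello", the statement a := i.(string) will set a to "hello" because i holds a string. If it didn’t, the code would panic. Type assertions only work on interface values, so writing "hello".(string) directly does not compile. If you want to do a type assertion without panicking: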

var i interface{} = "hello"
if s, ok := i.(string); ok {
  log.Printf("%q is a string!", s)
}

Modules#

Go Modules are how packages are managed in Go. If you want to import code from other repositories, it is done by specifying a URL that maps to a repository. For example, all the code in this book uses the root prefix as github.com/fullstackio/reliable-go . Each project is a submodule of that, for example github.com/fullstackio/reliable-go/distributed-key-value-store . These don’t have to be working URLs, but are rather namespaces for packages.

To create a new package, run go mod init $PROJECT_PATH , where $PROJECT_PATH is something like github.com/username/projectname . This will create a go.mod that looks like:

module github.com/username/projectname

go 1.16

If you run go build or go get in this directory, or subdirectories (Go limits us to one package per directory), your dependencies will be added to the go.mod , and a snapshot of the hashes of the packages will be stored at go.sum .

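The $GOPROXY variable tells Go which module proxy to download dependencies through. proxy.golang.org is a popular module mirror provided by the Go team at Google. Downloaded modules are checked against the hashes in go.sum, and by default new hashes are also verified against the public checksum database at sum.golang.org.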

It is recommended to run go mod tidy regularly to keep go.sum and go.mod from growing too large (by default, most Go tools only append to the files).

A key-value server#

In programming, one of the most traditional data structures is the hash table. It exists in almost every programming language as a one-dimensional mapping of a key (usually a string) to a value. In Go, these are called maps, in Python they are dictionaries, and in JavaScript they are objects.

In distributed systems, distributed hash tables are used - shared, one-dimensional data structures that many programs can use. One program on one machine can set a key-value pair, and another program on another network can access it, and modify it.

There have been many famous key-value servers, including:

  • Chubby by Google
  • memcached by Brad Fitzpatrick at LiveJournal
  • Redis by Salvatore Sanfilippo
  • etcd by CoreOS
  • Zookeeper by Yahoo
  • Consul by Hashicorp

Each server has benefits and deficiencies, as each was built with a different goal in mind. Some, such as memcached and Redis, were built for fast data access. Others, such as Chubby, Zookeeper and Consul, were originally built as lock servers and focus on consistency, ensuring that every client sees the same changes.

In this chapter we will be building a very simple key-value server, to introduce the basics of building a Go web server and set up the basic layout for the modifications we will make throughout this section.

Our first server will only work with one instance, and not share data. In Chapter 4, we will expand it to contain syncing among distributed servers.

A “hello world” web server#

Create a directory called key-value-store , and in that directory run go mod init key-value-store . Then make a new file called server.go and add the following to it:

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello world\n")
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}

That may look like a lot of code, but this is about the most basic Go web server there is.

In Go, packages are scoped by folder, so files in the same folder should all have the same package name. In this case, we have gone with the package name main . main is the name of any package that outputs an executable, and has a main() function defined in it.

Below the package definition, we have a list of imports. We keep the standard library packages separate from any external dependencies. Usually if it looks like a URL, it is an external dependency. godoc.org has documentation for the latest versions of both the standard library packages and external packages. To get a package’s documentation, just go to godoc.org/packagename. For example, godoc.org/github.com/go-chi/chi has the documentation for the Chi package which we will be discussing soon.

In our first example, we are just using standard library packages: fmt , log and net/http .

If we go run . in this directory, we will get a server running on port 8080. If we make a GET request to / , we will get the string Hello world .

$ curl -svL localhost:8080
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 01 Mar 2021 16:04:16 GMT
< Content-Length: 12
< Content-Type: text/plain; charset=utf-8
<
Hello world
* Connection #0 to host localhost left intact
* Closing connection 0

The final line of our main function begins with log.Fatal . The log package has a bunch of functions to write timestamped messages to standard error. Fatal makes the program exit with a status code of 1. This is useful because any exit message from http.ListenAndServe will be logged, and because this http server runs forever by default, it has no case where it should exit with success.

A “hello world” web server with Chi#

The standard library is great for building simple HTTP servers. But if you want to add complex routing logic, we recommend the Chi library. It adds a bunch of great functions, and a router, that all use the standard library.

A router, or multiplexer, is included by default in Go. It’s called http.ServeMux, and if you don’t declare a router, http.DefaultServeMux is used. While Go’s http.ServeMux is great, it doesn’t allow two common types of request pattern matching: HTTP method and variables. Chi adds these and a bunch of other things. Let’s update our server.go file to contain the following code instead of what we originally wrote:

package main

import (
	"log"
	"net/http"
	"os"

	"github.com/go-chi/chi"
)

func main() {
	// Get port from env variables or set to 8080.
	port := "8080"
	if fromEnv := os.Getenv("PORT"); fromEnv != "" {
		port = fromEnv
	}
	log.Printf("Starting up on http://localhost:%s", port)

	r := chi.NewRouter()
	r.Get("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello world."))
	})

	log.Fatal(http.ListenAndServe(":"+port, r))
}

There are a few changes in this code block compared to the first. We have added a simple if statement to get the port to run on from an environment variable called PORT. Next we have r := chi.NewRouter(). The reason we are using Chi is to get a router. r is an instance of Chi’s router, which will receive every HTTP request to the server. The line log.Fatal(http.ListenAndServe(":"+port, r)) sets up our server to listen on the port we defined, and send everything to the router we created. It logs any errors and exits if the server returns anything.

We use the Chi Router for two reasons. The first is that it is compatible with the Go standard library package net/http . There are a bunch of HTTP frameworks out there, but all of the easiest-to-use ones integrate with net/http , because it provides so much for free. Another router that supports net/http is Gorilla’s Mux.

If you write code that works with net/http , like Chi does, you can switch between frameworks easily depending on what features you need. You also have a consistent API for people writing middleware or handlers for you. For example, all of Chi’s router functions take a function with definition func(w http.ResponseWriter, r *http.Request) , which is the exact same as net/http 's HandlerFunc definition:

type HandlerFunc func(ResponseWriter, *Request)
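To make that compatibility concrete, here is a small sketch that uses only the standard library. The handler hello and the helpers call, viaHandlerFunc and viaServeMux are hypothetical names invented for this example. The same function is used once directly as an http.HandlerFunc and once through http.ServeMux, and a Chi router would accept it in the same way.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// hello has the same signature as net/http's HandlerFunc, so it can be handed
// to the standard library mux or to any net/http-compatible router.
func hello(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "hello from "+r.URL.Path)
}

// call runs a handler against an in-memory request and returns the body.
func call(h http.Handler, path string) string {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Body.String()
}

// viaHandlerFunc converts the function to an http.HandlerFunc and calls it.
func viaHandlerFunc() string {
	return call(http.HandlerFunc(hello), "/direct")
}

// viaServeMux registers the same function on the standard library's router.
func viaServeMux() string {
	mux := http.NewServeMux()
	mux.HandleFunc("/mux", hello)
	return call(mux, "/mux")
}

func main() {
	fmt.Println(viaHandlerFunc())
	fmt.Println(viaServeMux())
}
```

Because the handler depends only on net/http types, nothing in it has to change if we swap one router for another.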

The second reason we’re using it is because it is popular and has an active ecosystem. People have contributed lots of middleware and documentation to make the framework easy to understand and use. For instance, if we wanted to add request logging to the above example, we could write something like:

r.Use(func(next http.Handler) http.Handler {
	fn := func(w http.ResponseWriter, r *http.Request) {
		log.Printf("got request: %+v", r)
		next.ServeHTTP(w, r)
	}
	return http.HandlerFunc(fn)
})

This middleware is fine, but its output is hard to read, and also does not allow for much customization. As such, there is a middleware package that comes with Chi, github.com/go-chi/chi/middleware . To use it, you can just plop in r.Use(middleware.Logger) . But if you do not like that, you could also check out github.com/rs/zerolog or github.com/sirupsen/logrus , which both have drop-in middleware that work with net/http -compatible frameworks. We’ll talk a bit more about logging and monitoring in the Monitoring chapter.

Extending our server to use JSON#

Going back to our simple server, the line a few examples above that starts with r.Get( is the definition of our first route. Here we are passing in a lambda as an argument because this handler function is so simple. If the function was over a few lines, we could define it somewhere else, and reference it here.

w in that function is essentially a fancy buffer of what we will return to our users. Right now we are just writing a simple string. But we want to change it to return JSON. To do that, we replace w.Write([]byte("hello world.")) with JSON(w, map[string]string{"hello": "world"}) and then define a new JSON function at the bottom of our server.go file:

// JSON encodes data to json and writes it to the http response.
func JSON(w http.ResponseWriter, data interface{}) {
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	b, err := json.Marshal(data)
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		JSON(w, map[string]string{"error": err.Error()})
		return
	}

	w.Write(b)
}

Our output is now closer to what we might expect from a JSON API. Note that we will have to import an extra package for this function to work, specifically encoding/json from the Go standard library.

We now have a basic hello world JSON API server! In the folder where we’ve saved this file, we should be able to run go run . and get our web server running on http://localhost:8080. If that doesn’t work the first time, you probably received an error message mentioning that “no required module provides package github.com/go-chi/chi”. That handy message also tells you how to resolve the situation - by running go get github.com/go-chi/chi in the same folder. Then try go run . again.

Creating an in-memory hash table API#

We are now starting to make our key-value server. Using our hello world server that returns JSON as a base, we can extend it with all the rest of the pieces a key-value server needs.

Basic handlers Get, Set & Delete#

We can start adding handlers for our basic business logic. We will need three essential functions:

  • Get a value by key
  • Set a key with a value
  • Delete a key

Put this below the existing r.Get("/" ...) section within the main func.

r.Get("/key/{key}", func(w http.ResponseWriter, r *http.Request) {
	key := chi.URLParam(r, "key")

	data, err := Get(r.Context(), key)
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		JSON(w, map[string]string{"error": err.Error()})
		return
	}

	w.Write([]byte(data))
})

Let’s begin with the Get handler. It has three things worth talking about.

First is the URL pattern, "/key/{key}", which is the first argument to the r.Get function. This is like saying “for every request URL that starts with /key/, send it to this handler, with the variable key containing everything after the second slash”. The handler in this case is the second argument to r.Get: an anonymous function which matches the type http.HandlerFunc.

This is a feature of Chi, as Get is a function of Chi’s router. Go is a very explicit language: if you’re used to Ruby or JavaScript, which might just make a magic variable available, this may come as a shock, but in Go you must ask for the variable. So in the first line of the function, we get the data from the request URL and save it to a local variable with key := chi.URLParam(r, "key").

A lot of developers might immediately take this variable and do data validation on this incoming key. This is important, but we will save that for later, because we are just going to call a yet-to-be defined function named Get() , which will do the actual work of getting the data. We do this so we can separate concerns. The web server code is distinct from the key/value storage code.

This Get() function call brings us to the second thing we want to talk about, r.Context(). The context package was added to the Go standard library in version 1.7, released in 2016 (before that it lived at golang.org/x/net/context). A context carries request-scoped data with it. We could add a timeout or extra tracing data to our program, and with context, it is plumbed all the way through. Each function that receives a context is in charge of dealing with it, and also of not clobbering the values other packages have stored in it. Mostly Go deals with this for us, and you’ll find most modern Go libraries deal with context correctly. To start with, we won’t do anything with the context, but leave it there so we can easily pass through data later.
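To give a feel for what passing a context through buys us, here is a minimal sketch of a timeout travelling down a call chain. slowLookup is a hypothetical stand-in for a storage call, and lookupWithTimeout and describe are likewise names made up for this example; our server does not do any of this yet.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowLookup is a hypothetical stand-in for a storage call. It waits for
// delay, but gives up early if the context is cancelled or times out.
func slowLookup(ctx context.Context, delay time.Duration) (string, error) {
	select {
	case <-time.After(delay):
		return "value", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// lookupWithTimeout adds a deadline to the parent context before calling down.
func lookupWithTimeout(parent context.Context, timeout, delay time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return slowLookup(ctx, delay)
}

// describe runs one lookup and reports the outcome as a short string.
func describe(timeout, delay time.Duration) string {
	v, err := lookupWithTimeout(context.Background(), timeout, delay)
	if errors.Is(err, context.DeadlineExceeded) {
		return "timed out"
	}
	if err != nil {
		return "error: " + err.Error()
	}
	return v
}

func main() {
	fmt.Println(describe(time.Second, time.Millisecond)) // lookup finishes in time
	fmt.Println(describe(time.Millisecond, time.Second)) // deadline hits first
}
```

In a handler, the parent would be r.Context() rather than context.Background(), so the lookup would also stop if the client went away.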

The third thing to talk about is the error handling. We skipped over this in the JSON function, but we will address it now. The Get function, like many things in Go, returns both data and an error. This tends to confuse many programmers used to languages with throw or raise exception functionality. Go doesn’t have that. Instead it passes around an instance of the error type. If it is nil, data is good, and if it is not nil, there was an error and you need to handle it. In this case, we are setting the status code to 500 (which is what the constant http.StatusInternalServerError equals), and then returning a JSON blob to the user with the string inside the error.
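The value-plus-error pattern can be sketched on its own, outside of our server. In the example below, ErrNotFound, lookup, statusFor and the store variable are all hypothetical names for illustration; the Get function in this chapter does not define a not-found error and always answers failures with a 500.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrNotFound is a hypothetical sentinel error used only in this sketch.
var ErrNotFound = errors.New("key not found")

var store = map[string]string{"greeting": "hello"}

// lookup returns the value and a nil error on success, or a zero value and a
// non-nil error on failure.
func lookup(key string) (string, error) {
	v, ok := store[key]
	if !ok {
		return "", fmt.Errorf("lookup %q: %w", key, ErrNotFound)
	}
	return v, nil
}

// statusFor picks an HTTP status code for an error returned by lookup.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	for _, key := range []string{"greeting", "missing"} {
		v, err := lookup(key)
		fmt.Printf("%q -> value=%q status=%d err=%v\n", key, v, statusFor(err), err)
	}
}
```

The caller always checks the error first and only trusts the value when the error is nil, which is the same shape our handlers follow.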

That’s it, now we have a basic Get handler!

If we look at the Delete handler, which responds to an HTTP DELETE verb instead of the GET verb, we can see it looks almost identical. This should be added after our r.Get handler from above.

r.Delete("/key/{key}", func(w http.ResponseWriter, r *http.Request) {
	key := chi.URLParam(r, "key")

	err := Delete(r.Context(), key)
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		JSON(w, map[string]string{"error": err.Error()})
		return
	}

	JSON(w, map[string]string{"status": "success"})
})

Finally, we need to be able to set values. We will do that with a Set handler that deals with POST requests. It looks similar to the other two handlers, but it also needs to parse the POST request body. Put this after the r.Delete handler we just added.

r.Post("/key/{key}", func(w http.ResponseWriter, r *http.Request) {
	key := chi.URLParam(r, "key")
	body, err := io.ReadAll(r.Body)
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		JSON(w, map[string]string{"error": err.Error()})
		return
	}

	err = Set(r.Context(), key, string(body))
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		JSON(w, map[string]string{"error": err.Error()})
		return
	}

	JSON(w, map[string]string{"status": "success"})
})

This uses a new package, called io which simplifies the act of reading from a buffer via io.ReadAll . There are downsides to this, such as not being able to check buffer sizes before reading into memory, but for our case right now, we can let it slide. In the future we might want to put a maximum value size, in which case we might want to modify this line.
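If we do decide to cap the value size later, one option from the standard library is http.MaxBytesReader. The sketch below is only an illustration under assumptions: the 16-byte limit maxValueSize and the helpers readLimited and tryBody are invented for this example and are not part of our server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxValueSize is an assumed limit, kept tiny so the example is easy to follow.
const maxValueSize = 16

// readLimited reads the request body but returns an error once the body
// exceeds maxValueSize bytes.
func readLimited(w http.ResponseWriter, r *http.Request) ([]byte, error) {
	r.Body = http.MaxBytesReader(w, r.Body, maxValueSize)
	return io.ReadAll(r.Body)
}

// tryBody builds an in-memory POST request and reports whether the body was
// accepted.
func tryBody(body string) bool {
	req := httptest.NewRequest(http.MethodPost, "/key/example", strings.NewReader(body))
	_, err := readLimited(httptest.NewRecorder(), req)
	return err == nil
}

func main() {
	fmt.Println(tryBody("small value"))
	fmt.Println(tryBody(strings.Repeat("x", 100)))
}
```

In the Set handler, this would mean wrapping r.Body before the existing io.ReadAll call and answering with an error status when the read fails.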

A quick note about io.ReadAll. If you are using a version of Go that is older than 1.16, you’ll quickly find out that this function doesn’t exist. That is because in older versions it lives in a package called io/ioutil, as ioutil.ReadAll. ioutil has a bunch of convenient functions for reading and writing files, but the package itself isn’t the obvious place for any of these functions. So in Go 1.16 they were split between the io and os packages.

The rest of the function is pretty close to Get and Delete. The only additional thing we are doing is casting the request body from []byte into a string .

This code won’t compile yet, because we haven’t written our three library functions.

The actual data storage work#

Our first pass at data storage will be to store the data in memory. The problem with this is if we run more than one of these servers, they will not be able to share state. But for a first pass it’s fine.

At the top of the server.go file, just above the main function, let’s create a global variable to store our map.

var data = map[string]string{}

With this, our three data functions will be quite simple, as we are just adding and removing data from a map. Put these three functions at the end of the server.go file, and add the context package to the list of imports at the top of the file.

func Set(ctx context.Context, key, value string) error {
	data[key] = value
	return nil
}

func Get(ctx context.Context, key string) (string, error) {
	return data[key], nil
}

func Delete(ctx context.Context, key string) error {
	delete(data, key)

	return nil
}
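One thing to keep in mind with a global map: net/http serves each request in its own goroutine, and Go maps are not safe for concurrent writes. A minimal sketch of one way to guard the map is shown below, using a sync.RWMutex. The mu variable and the concurrentWrites helper are additions made for this example and are not part of the chapter’s code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

var (
	mu   sync.RWMutex // guards data
	data = map[string]string{}
)

// Set stores a value while holding the write lock.
func Set(ctx context.Context, key, value string) error {
	mu.Lock()
	defer mu.Unlock()
	data[key] = value
	return nil
}

// Get reads a value while holding the read lock, so readers do not block
// each other.
func Get(ctx context.Context, key string) (string, error) {
	mu.RLock()
	defer mu.RUnlock()
	return data[key], nil
}

// Delete removes a key while holding the write lock.
func Delete(ctx context.Context, key string) error {
	mu.Lock()
	defer mu.Unlock()
	delete(data, key)
	return nil
}

// concurrentWrites stores n distinct keys from n goroutines and returns how
// many keys the map holds afterwards.
func concurrentWrites(n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			Set(context.Background(), fmt.Sprintf("key-%d", i), "value")
		}(i)
	}
	wg.Wait()

	mu.RLock()
	defer mu.RUnlock()
	return len(data)
}

func main() {
	fmt.Println(concurrentWrites(100))
}
```

Running a server with go run -race . is a handy way to check whether unsynchronized access like this is happening.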

And voila! We have our basic system.

There are all sorts of problems with this. Let’s make a list!

  • If the server reboots, we lose all of our data, because there is no persistence.
  • If we spin up two instances of the same server, they will not share data.
  • But if we keep a single server, we will have a SPOF (single point of failure) meaning any service that depends on this might fail if this server fails.

Given these three concerns, let us walk through some ideas on how we might fix them.

Adding Persistence#

We could solve persistence in a few ways:

  • We could write the data to the hard drive. The data would be machine-local, but available if the server reboots or the process dies.
  • We could introduce a new dependency, like another database. This could work, but for now, we want to keep our system as simple as possible, with as few dependencies as possible.

So - let’s try writing things out to the hard drive.

// Get gets the value at the specified key.
func Get(ctx context.Context, key string) (string, error) {
	data, err := loadData(ctx)
	if err != nil {
		return "", err
	}

	return data[key], nil
}

First we modify our three functions; Get is shown above, and Set and Delete follow the same pattern. They still do the same job, but instead of just dealing with the data in memory, they now call loadData and saveData whenever they need to read or modify the data.
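Only Get is listed above, so here is a sketch of what Set and Delete could look like under the same load-modify-save pattern. To keep the example runnable on its own, loadData and saveData are simplified stand-ins that read and write plain JSON, and roundTrip is a hypothetical helper that exercises everything in a throwaway directory. The versions of loadData and saveData built in the rest of this chapter differ in detail.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// StoragePath is replaced with a throwaway directory in roundTrip below.
var StoragePath = os.TempDir()

func dataPath() string {
	return filepath.Join(StoragePath, "data.json")
}

// loadData is a simplified stand-in: it reads plain JSON and treats a missing
// file as an empty map.
func loadData(ctx context.Context) (map[string]string, error) {
	data := map[string]string{}
	content, err := os.ReadFile(dataPath())
	if errors.Is(err, os.ErrNotExist) {
		return data, nil
	}
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(content, &data); err != nil {
		return nil, err
	}
	return data, nil
}

// saveData is a simplified stand-in that writes the whole map as plain JSON.
func saveData(ctx context.Context, data map[string]string) error {
	content, err := json.Marshal(data)
	if err != nil {
		return err
	}
	return os.WriteFile(dataPath(), content, 0644)
}

// Set loads the whole map, changes one key and writes everything back.
func Set(ctx context.Context, key, value string) error {
	data, err := loadData(ctx)
	if err != nil {
		return err
	}
	data[key] = value
	return saveData(ctx, data)
}

// Delete loads the whole map, removes one key and writes everything back.
func Delete(ctx context.Context, key string) error {
	data, err := loadData(ctx)
	if err != nil {
		return err
	}
	delete(data, key)
	return saveData(ctx, data)
}

// roundTrip sets and deletes a key in a fresh temporary directory. It returns
// the value read back after Set and whether the key survived Delete.
func roundTrip() (string, bool, error) {
	dir, err := os.MkdirTemp("", "kv-sketch")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)
	StoragePath = dir

	ctx := context.Background()
	if err := Set(ctx, "hello", "world"); err != nil {
		return "", false, err
	}
	afterSet, err := loadData(ctx)
	if err != nil {
		return "", false, err
	}
	if err := Delete(ctx, "hello"); err != nil {
		return "", false, err
	}
	afterDelete, err := loadData(ctx)
	if err != nil {
		return "", false, err
	}
	_, stillThere := afterDelete["hello"]
	return afterSet["hello"], stillThere, nil
}

func main() {
	value, stillThere, err := roundTrip()
	fmt.Println(value, stillThere, err)
}
```

Every write rereads and rewrites the whole file, which is simple but slow, and two overlapping writes can overwrite each other’s changes.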

loadData and saveData will both need a consistent place on disk to store the data, so let us define a function that returns that path. To work with file paths, we will bring in the package path/filepath . In the future we can make StoragePath an environment variable or a flag so that it is not hard-coded, but for now, we can just write to the temp directory. Replace the var data above main with:

var StoragePath = "/tmp"

And at the end of the file add a new function to get the data file location from the StoragePath.

func dataPath() string {
	return filepath.Join(StoragePath, "data.json")
}

Next, we need a way to load data.

func loadData(ctx context.Context) (map[string]string, error) {
	empty := map[string]string{}
	emptyData, err := encode(map[string]string{})
	if err != nil {
		return empty, err
	}

	// First check if the folder exists and create it if it is missing.
	if _, err := os.Stat(StoragePath); os.IsNotExist(err) {
		err = os.MkdirAll(StoragePath, 0755)
		if err != nil {
			return empty, err
		}
	}

	// Then check if the file exists and create it if it is missing.
	if _, err := os.Stat(dataPath()); os.IsNotExist(err) {
		err := os.WriteFile(dataPath(), emptyData, 0644)
		if err != nil {
			return empty, err
		}
	}

	content, err := os.ReadFile(dataPath())
	if err != nil {
		return empty, err
	}

	return decode(content)
}

Above we have written a simple loadData function. It does some basic error-checking, and then calls the function decode to translate whatever was written to disk into our data map. (In the error-checking, you will notice that loadData calls encode on an empty map, so that it has something valid to write if the file does not exist yet.)

func saveData(ctx context.Context, data map[string]string) error {
	// First check if the folder exists and create it if it is missing.
	if _, err := os.Stat(StoragePath); os.IsNotExist(err) {
		err = os.MkdirAll(StoragePath, 0755)
		if err != nil {
			return err
		}
	}

	encodedData, err := encode(data)
	if err != nil {
		return err
	}

	return os.WriteFile(dataPath(), encodedData, 0644)
}

The reason we encode the data is two-fold. Firstly, it gives us a format that can easily be written to and read from disk. A Go map in memory cannot be written to disk as-is, so below we use the encoding/json library to turn the map into JSON.

Secondly, to avoid storing the exact key and value that we got from the user in the file, we encode them both with the encoding/base64 library. The reason for this is mainly paranoia! It doesn’t give us too many benefits just now, but in the future provides some protection from any sort of attack such as a bug in Go’s JSON library. If we took out the base64 encoding, everything would still work fine. The encode function could be just one line: return json.Marshal(data) if we wanted.

func encode(data map[string]string) ([]byte, error) {
	encodedData := map[string]string{}
	for k, v := range data {
		ek := base64.URLEncoding.EncodeToString([]byte(k))
		ev := base64.URLEncoding.EncodeToString([]byte(v))
		encodedData[ek] = ev
	}

	return json.Marshal(encodedData)
}

Our decode function is just the inverse of the encoding function. We could also make it simpler by just doing the Unmarshal part below, if we hadn’t base64-encoded the data as well.

func decode(data []byte) (map[string]string, error) {
	var jsonData map[string]string

	if err := json.Unmarshal(data, &jsonData); err != nil {
		return nil, err
	}

	returnData := map[string]string{}
	for k, v := range jsonData {
		dk, err := base64.URLEncoding.DecodeString(k)
		if err != nil {
			return nil, err
		}

		dv, err := base64.URLEncoding.DecodeString(v)
		if err != nil {
			return nil, err
		}

		returnData[string(dk)] = string(dv)
	}

	return returnData, nil
}
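To see what actually ends up on disk, here is a small sketch that encodes a single pair the same way encode does. The helper name encodedFile is made up for this example.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// encodedFile returns the JSON that would be written to disk for a single
// key-value pair, with both sides base64-encoded first.
func encodedFile(key, value string) (string, error) {
	ek := base64.URLEncoding.EncodeToString([]byte(key))
	ev := base64.URLEncoding.EncodeToString([]byte(value))
	b, err := json.Marshal(map[string]string{ek: ev})
	return string(b), err
}

func main() {
	out, err := encodedFile("hello", "world")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"aGVsbG8=":"d29ybGQ="}
}
```

The file stays valid JSON, but neither the user’s key nor the user’s value appears in it verbatim.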

Conclusion#

Alright! We have built a very basic persistence layer. We mentioned there were three issues with our in-memory solution:

  • no persistence
  • no data sharing between multiple servers
  • it is a single server

We have solved the first issue. We have built a simple key-value server that stores all of its data in a single file on disk. There are lots of edge cases we haven’t dealt with, which are interesting exercises for you to think about:

  • How would you change this to store the keys and values across multiple files to avoid having one single gigantic file?
  • How would you make the file storage location configurable?
  • Try using a tool like hey or ab to figure out how many read requests the service can handle in a minute before it breaks.

In the next chapter, we will talk about solving the next two issues: having multiple servers and having them work together.

Testing a key-value server#

Now that we have a basic key-value server, we should test it.

“Why?” you might ask.

Well, testing will help us make sure that our key-value server doesn’t have any bugs (or at least no bugs that will negatively impact our usage of it).

If it does have any bugs, we can catch some of them here, and if we’re lucky and it doesn’t, we can make sure it stays that way when we make changes in the future.

If we didn’t write tests, our only way of finding bugs would be to run into them by using the key-value server. Catching bugs at that stage could have negative impacts on the system we were using the server for.

For potentially sensitive applications, it is best to catch issues before they arise, rather than finding them by running into them.

What kinds of testing exist?#

There are many kinds of testing available to us. The three main ones are unit testing, integration testing, and end-to-end testing. The level of each that you employ can depend on the language you are using or the project you are writing, but generally you should expect more unit tests than integration tests, and more integration tests than end-to-end tests. It can be very useful to have all three.

Unit tests

  • operate on a single unit of code
  • usually test a single function
  • should ensure that your function works for a series of different and reasonable inputs

Integration tests

  • are the next step up
  • test the connection between components
  • are needed when you have separate sections of code that talk to and rely on each other

End-to-end tests

  • test the entirety of a system, starting with user input and going down through whatever storage mechanism might exist, and all the way back to the user again
  • can be quite slow, so it is not recommended that you write a lot of them
  • you need at least a couple to have confidence that your system, when it is all put together, works as you expect

Related to all of these kinds of testing, it can also be useful to do property-based testing and/or fuzz testing . Though we will not be focusing on these, they could potentially be useful for a number of projects and so are good to know about.

Property-based testing is where you define a number of axioms about methods or the program as a whole, and then the property-based testing library generates a number of test cases that fit within those constraints.

For example, if you had an addition method which could accept two integers of any size and return their sum, you could write an axiom such as “inputs a and b can be any integer, and the output will also be an integer”. When the test runs, it could find that adding large integers causes an overflow, without you needing to write that specific test case yourself.
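To make this concrete, here is a minimal sketch using Go’s testing/quick package, which we come back to at the end of this chapter. The Add function, both properties and the checkProperties helper are illustrative assumptions, not code from the key-value server. It is written as a standalone program so it can be tried with go run; in a real project the quick.Check calls would normally sit inside a test function.

```go
package main

import (
	"fmt"
	"testing/quick"
)

// Add is a hypothetical function under test. Like Go's + operator,
// it silently wraps around when the result does not fit in an int64.
func Add(a, b int64) int64 { return a + b }

// commutative states that the order of the arguments does not matter.
func commutative(a, b int64) bool { return Add(a, b) == Add(b, a) }

// neverShrinks states that adding two non-negative numbers never gives a
// result smaller than the first one. Overflow breaks this property.
func neverShrinks(a, b int64) bool {
	if a < 0 || b < 0 {
		return true // the property says nothing about negative inputs
	}
	return Add(a, b) >= a
}

// checkProperties runs both properties against randomly generated inputs
// and returns the error (nil on success) reported for each of them.
func checkProperties() (commutativeErr, shrinkErr error) {
	return quick.Check(commutative, nil), quick.Check(neverShrinks, nil)
}

func main() {
	commutativeErr, shrinkErr := checkProperties()
	fmt.Println("commutative:", commutativeErr)
	fmt.Println("neverShrinks:", shrinkErr)
}
```

Because quick draws values from the whole int64 range, the second property is very likely to fail on a pair of large inputs, and the returned error lists the arguments that broke it.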

Fuzz testing is similar to property-based testing, but it tests at a different level. Typically, fuzz testing involves giving a series of potentially valid and invalid inputs to the whole program, and then watching to see if the program crashes. If it does, those inputs are recorded and the program author can determine why the program crashes, and prevent it from doing so again. Fuzz testing tends to be more related to security than other kinds of tests, but is still very valuable for finding edge cases that a program doesn’t properly handle.
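The idea can be sketched by hand in a few lines. In the example below, parsePair is a hypothetical and deliberately buggy parser, and crashes and fuzz are helper names invented for this illustration. Real fuzzers are far smarter about choosing inputs, and Go 1.18 and later ship native fuzzing support in the testing package, so treat this only as a picture of the loop “generate input, run, record crashes”.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// parsePair is a hypothetical, deliberately buggy parser for "key=value".
func parsePair(s string) (string, string) {
	parts := strings.Split(s, "=")
	return parts[0], parts[1] // panics when s contains no "="
}

// crashes reports whether parsePair panics on the given input.
func crashes(input string) (crashed bool) {
	defer func() {
		if recover() != nil {
			crashed = true
		}
	}()
	parsePair(input)
	return false
}

// fuzz feeds n random short strings to parsePair and returns the ones
// that made it crash, so they can be investigated later.
func fuzz(n int, seed int64) []string {
	rng := rand.New(rand.NewSource(seed))
	const alphabet = "ab="
	var failing []string
	for i := 0; i < n; i++ {
		buf := make([]byte, rng.Intn(6))
		for j := range buf {
			buf[j] = alphabet[rng.Intn(len(alphabet))]
		}
		if input := string(buf); crashes(input) {
			failing = append(failing, input)
		}
	}
	return failing
}

func main() {
	failing := fuzz(200, 1)
	fmt.Printf("%d of 200 random inputs crashed the parser\n", len(failing))
}
```

Each recorded input is a ready-made regression test: once the parser is fixed, feeding the same strings back in should no longer crash it.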

A unit test for the JSON function#

Let’s write some unit tests as a way of getting familiar with Go’s built-in testing library. Looking in server.go, we can find the first function that isn’t main and write a test for it. So let us write a test for the JSON function, starting with making a failing test and running it. We start by creating a server_test.go file in the same directory as server.go. Go will automatically know this is a test file because its name ends in _test.go, and starting the name with server tells us that it holds the tests for server.go.

package main

import "testing"

func TestJSON(t *testing.T) {
	t.Fatal("not implemented")
}

In the same directory as the test file we run our new test with go test and we will see our expected failure from the t.Fatal :

$ go test
--- FAIL: TestJSON (0.00s)
    server_test.go:6: not implemented
FAIL
exit status 1
FAIL    github.com/fullstackio/reliable-go/kv-store     0.009s

Great! Now in order to replace that with something which will actually test the function, let’s think about what the function is doing. It encodes arbitrary data and then writes it to an http.ResponseWriter. So if we want to test it, we need to

  • supply the data
  • know what its encoded form looks like
  • check that it was written to the http.ResponseWriter

Because we are currently only using JSON to encode data of the type map[string]string, that seems like a good type to test. So - let’s test the message “hello world” in a map form. Add these lines at the top of the TestJSON function:

in := map[string]string{"hello": "world"}
out := `{"hello":"world"}`

I’m using the variable names in for the input we will pass to JSON , and out for the output we expect to see from it.

This is great and should be a good test, but we have an issue: JSON takes an input but it doesn’t return an output. So we write to the response writer, but have no way of checking that data. Luckily, Go’s standard library has something that will help us: the httptest library - and it is lovely!

httptest.ResponseRecorder is a struct that implements the http.ResponseWriter interface but also allows you to check what changes were made to it afterwards. If we use one of these, we can pass it with our message into JSON and then check the results after.

Use it and call JSON with it like this:

recorder := httptest.NewRecorder()

JSON(recorder, in)

JSON will now have written to our ResponseRecorder, giving us the ability to check those values. For now let us only test that the body written to the recorder matches the output we expect. We can access it with the recorder’s Result() method, which gives us the same kind of http.Response we normally get when making a request.

From that we can read the entire body, convert it to a string, and compare it with our expected output.

response := recorder.Result()
defer response.Body.Close()

got, err := io.ReadAll(response.Body)
if err != nil {
	t.Fatalf("Error reading response body: %s", err)
}

if string(got) != out {
	t.Errorf("Got %s, expected %s", string(got), out)
}

Note that we use t.Fatalf here if we fail to read the body to indicate that our test has failed to run and we shouldn’t continue. We always expect to be able to read it here, so if we can not, something is seriously wrong.

We use t.Errorf for our actual test case to show that it has failed, but the other tests should continue.

Putting it all together looks like this:

package main

import (
	"io"
	"net/http/httptest"
	"testing"
)

func TestJSON(t *testing.T) {
	in := map[string]string{"hello": "world"}
	out := `{"hello":"world"}`
	recorder := httptest.NewRecorder()

	JSON(recorder, in)

	response := recorder.Result()
	defer response.Body.Close()

	got, err := io.ReadAll(response.Body)
	if err != nil {
		t.Fatalf("Error reading response body: %s", err)
	}

	if string(got) != out {
		t.Errorf("Got %s, expected %s", string(got), out)
	}
}

Enhancing the JSON unit test with tables#

Our test is currently fairly simple. It only checks the body of the response, not any headers, and it only tests one message and one data type. Luckily, Go has a common pattern we can use to scale this up: table-driven tests!

Instead of having only one message to test, we can make a slice of them, with both our input and expected output. We can then use a for loop to run all of our tests one after the other. Adding a new case will then be as simple as adding another element to the slice - we’ll do that shortly by adding the {"hello":"tables"} message.

Before we do that, we need to convert our first test case into a table of test cases, including both our input and our expected output. Our input is a map of string to string, and our output is a string. So we need to create a slice of these inputs and outputs. Fortunately, we can use an anonymous struct, which is a struct that is defined without a name. It will let us create our desired slice and immediately create values to fill it. Also, because it is defined only in this one test function, future tests can use their own anonymous structs if they have different inputs or outputs. The syntax looks slightly weird, because we are used to having our struct definition separate from instantiation. Combined with the fact that this is also a slice, that means there are a whole lot of curly braces. But it gives us exactly what we’re looking for, which is great.

testCases := []struct {
	in  map[string]string
	out string
}{
	{map[string]string{"hello": "world"}, `{"hello":"world"}`},
}

In this section of code we are not defining a type for our struct but defining the struct inline. It’s not necessarily the best practice, but so long as we are only using this structure here it should be fine. If we find ourselves wanting to use this in more than one test case we can convert it to a regular struct and reuse it.

With this struct in place, we loop over the testCases slice and replace our in and out variables with test.in and test.out, where test is the range variable. The loop wraps the main section of our test, everything from creating the recorder to the last if statement checking our output, and for now it runs once for the single case we have provided.

for _, test := range testCases {
	recorder := httptest.NewRecorder()

	JSON(recorder, test.in)

	response := recorder.Result()
	defer response.Body.Close()

	got, err := io.ReadAll(response.Body)
	if err != nil {
		t.Fatalf("Error reading response body: %s", err)
	}

	if string(got) != test.out {
		t.Errorf("Got %s, expected %s", string(got), test.out)
	}
}
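One detail worth knowing about the loop above: a defer runs when the surrounding function returns, not at the end of each iteration, so every response body stays open until TestJSON finishes. That is harmless for a handful of cases, but moving the per-case work into a small helper keeps the cleanup per case. The sketch below shows one way to do it. The encode function is a hypothetical stand-in for JSON, bodyFor and failures are names invented here, and the program form is only there so it can be run on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// encode is a hypothetical stand-in for the JSON function under test.
func encode(w http.ResponseWriter, data map[string]string) {
	b, _ := json.Marshal(data) // a map of strings always marshals cleanly
	w.Write(b)
}

// bodyFor runs a single case. Its deferred Close fires when bodyFor
// returns, so each response is cleaned up before the next case starts.
func bodyFor(in map[string]string) (string, error) {
	recorder := httptest.NewRecorder()
	encode(recorder, in)

	response := recorder.Result()
	defer response.Body.Close()

	got, err := io.ReadAll(response.Body)
	return string(got), err
}

// failures walks the table and describes every case that did not match.
func failures() []string {
	testCases := []struct {
		in  map[string]string
		out string
	}{
		{map[string]string{"hello": "world"}, `{"hello":"world"}`},
		{map[string]string{"hello": "tables"}, `{"hello":"tables"}`},
	}

	var failed []string
	for _, test := range testCases {
		got, err := bodyFor(test.in)
		if err != nil || got != test.out {
			failed = append(failed, fmt.Sprintf("got %s (err %v), expected %s", got, err, test.out))
		}
	}
	return failed
}

func main() {
	fmt.Println("failing cases:", len(failures()))
}
```

Inside a real test, the body of the failures loop would call t.Errorf instead of collecting strings.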

Now that we’ve converted our first test into a table-driven test, let us make sure it still works! go test will do the job.

We have one test passing, and now it will be very easy to add a second test case. Simply add the {"hello":"tables"} test right below our first one, so our testCases looks like this:

testCases := []struct {
	in  map[string]string
	out string
}{
	{map[string]string{"hello": "world"}, `{"hello":"world"}`},
	{map[string]string{"hello": "tables"}, `{"hello":"tables"}`},
}

Another go test will make sure that works too.

Finishing up testing the JSON function#

Now that we can easily support multiple test cases, let’s finish testing all parts of JSON . The function is doing two things that we are not yet testing.

  • it sets a header indicating that the response has a Content-Type of application/json with a utf-8 charset
  • if the key-value store cannot convert the data to JSON, it returns whatever error is created from that to the user, in a JSON format

In order to test that the header is properly being set, we can add a header section in our testCases anonymous struct. The type of header should be the same as normal response headers, http.Header . We can also add a check of the response header for Content-Type and make sure it matches our expected application/json; charset=utf-8 . http.Header.Add() allows us to create our expected header for comparison.

Our test cases now look like this:

header := http.Header{}
headerKey := "Content-Type"
headerValue := "application/json; charset=utf-8"
header.Add(headerKey, headerValue)

testCases := []struct {
	in     map[string]string
	header http.Header
	out    string
}{
	{map[string]string{"hello": "world"}, header, `{"hello":"world"}`},
	{map[string]string{"hello": "tables"}, header, `{"hello":"tables"}`},
}

and we have a new test at the end in our for loop that looks like

if contentType := response.Header.Get(headerKey); contentType != headerValue {
	t.Errorf("Got %s, expected %s", contentType, headerValue)
}

The final thing to do is to add a test case for the failure path. Right now both of our tests are for a successful call to JSON, but not all calls will be successful. If the data value being passed in can’t be marshalled into JSON, an error will be returned to the user.

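A Go channel is an example of a type that can’t be marshalled, so let’s add a boolean channel to our list of test cases. Marshalling it fails with the message "json: unsupported type: chan bool", which JSON will turn into a JSON error message.

A channel is not a map[string]string, so for this case to compile, the in field of our anonymous struct has to be widened to interface{}. With that change made, put this test case with the other two in our testCases slice: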

{make(chan bool), header, `{"error":"json: unsupported type: chan bool"}`},

However, this time when we run go test , we get an error!

$ go test
--- FAIL: TestJSON (0.00s)
    server_test.go:44: Got , expected application/json; charset=utf-8

If we take a look at the JSON function, we can see that the content type header is only set when the method is successful. However, the method always returns json regardless of success, so we should always be setting the Content-Type header. Moving the line that sets our header, specifically line 96, up to the top of the method, line 89, fixes our test case. And now we’ve finished testing the JSON function!
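As a rough sketch of the shape of that fix, consider the hypothetical writeJSON below. It is an assumed stand-in, not the actual JSON function from server.go, and the record helper exists only for this example. Because the header is set before anything can fail, the success path and the error path both report the same Content-Type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeJSON is a hypothetical stand-in for the chapter's JSON function.
func writeJSON(w http.ResponseWriter, data interface{}) {
	// Set the header first so success and error replies both carry it.
	w.Header().Set("Content-Type", "application/json; charset=utf-8")

	b, err := json.Marshal(data)
	if err != nil {
		// The error text is a plain string, so this marshal cannot fail.
		b, _ = json.Marshal(map[string]string{"error": err.Error()})
	}
	w.Write(b)
}

// record runs writeJSON against a recorder and returns what was written.
func record(data interface{}) (contentType, body string) {
	recorder := httptest.NewRecorder()
	writeJSON(recorder, data)
	return recorder.Header().Get("Content-Type"), recorder.Body.String()
}

func main() {
	fmt.Println(record(map[string]string{"hello": "world"}))
	fmt.Println(record(make(chan bool)))
}
```

Status codes are left out of this sketch on purpose; whether the real function sets one on failure is not something we rely on here.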

Testing the Get method#

Only having written a test for one function probably isn’t enough, so - using our handy table-driven tests - let’s make a test for the Get method.

We have an interesting new complication here - this method reads from a file. Ideally, we should put this in a consistent location that also won’t complicate our usage of go test . Luckily, Go has a defined place where we can put data files for testing - testdata . According to the output of the command go help packages , “Directory and file names that begin with “.” or “_” are ignored by the Go tool, as are directories named “testdata”.” So we can put our test data here and be confident that go test will never interact with it.

We will have to make sure we set the StoragePath variable before running our tests. We could leave that data around long term, but for now we will create and delete the directory for the test.

Let’s start once again by creating a failing test for our Get method, and put it at the end of the server_test.go file.

func TestGet(t *testing.T) {
	t.Fatal("not implemented")
}

And run go test to confirm it does what we expect.

$ go test
--- FAIL: TestGet (0.00s)
    server_test.go:50: not implemented
FAIL

Okay, now before we fill in TestGet , let’s write a helper method to create our temporary storage path, and another to clean it up. We’ll call these makeStorage and cleanupStorage .

makeStorage will need to use the os.Mkdir function and cleanupStorage will need os.RemoveAll .

func makeStorage(t *testing.T) {
	err := os.Mkdir("testdata", 0755)
	if err != nil && !os.IsExist(err) {
		t.Fatalf("Couldn't create directory testdata: %s", err)
	}

	StoragePath = "testdata"
}

func cleanupStorage(t *testing.T) {
	if err := os.RemoveAll(StoragePath); err != nil {
		t.Errorf("Failed to delete storage path %s: %s", StoragePath, err)
	}

	StoragePath = "/tmp/kv"
}

We start in makeStorage by creating our testdata directory. If it already exists, we’ll just continue, because we might already have the directory around from another test, or we’ve decided to keep it permanently. Then we set the StoragePath variable to the directory.

cleanupStorage is much simpler - we delete the StoragePath directory and all of its contents with os.RemoveAll , then reset the StoragePath to its default. (This isn’t strictly necessary, but it is best to put things back as they were, just in case.)

Now that we have these helper methods, let’s add them to our Get test. Adding these two lines at the top of TestGet will ensure we have the setup we need:

func TestGet(t *testing.T) {
	makeStorage(t)
	defer cleanupStorage(t)
	t.Fatal("not implemented")
}

Now we can start implementing the test! Just like last time, let us make one passing test and worry about testing multiple things later.

The steps of this test are going to look like:

  1. Set up our data storage location (and defer cleaning it up)
  2. Set up our key-value store’s data
  3. Call Get to retrieve data
  4. Test that the data matches what we set up

Because we’ve already set up our storage, the next issue we need to solve is setting up the key-value store’s data file so that we can Get a key from it. If we take a look at the loadData function, we can see it calls encode, which encodes the key and value before saving everything as JSON. We will want to do the same thing when setting up for our test. Since we are starting by only testing one key and value, we can call them “key” and “value”, encode them and write our file.

The setup below our storage methods looks like this:

key := "key"
value := "value"
encodedKey := base64.URLEncoding.EncodeToString([]byte(key))
encodedValue := base64.URLEncoding.EncodeToString([]byte(value))
fileContents, _ := json.Marshal(map[string]string{encodedKey: encodedValue})
os.WriteFile(StoragePath+"/data.json", fileContents, 0644)

Real quick, you may have noticed that we are ignoring the errors that could be created from our json.Marshal and os.WriteFile calls. In both of these cases we are confident that they will not fail, as we are specifying valid JSON keys and values, and we have already set up our testdata folder with the proper permissions to write to it. Properly checking the errors here would unnecessarily add more code and slightly muddle the actual testing we intend to do.
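If you would rather not ignore those errors, the encoding steps can live in a small helper that reports them. The sketch below is one possible shape. encodeStore and decodeStore are hypothetical names, and the real functions in server.go may differ in detail; the sketch only mirrors the base64-then-JSON steps used in the test setup.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// encodeStore base64-encodes every key and value and marshals the result
// to JSON, mirroring the setup steps used in the test.
func encodeStore(kv map[string]string) ([]byte, error) {
	encoded := make(map[string]string, len(kv))
	for key, value := range kv {
		encodedKey := base64.URLEncoding.EncodeToString([]byte(key))
		encodedValue := base64.URLEncoding.EncodeToString([]byte(value))
		encoded[encodedKey] = encodedValue
	}
	return json.Marshal(encoded)
}

// decodeStore reverses encodeStore and reports any malformed entry.
func decodeStore(contents []byte) (map[string]string, error) {
	var encoded map[string]string
	if err := json.Unmarshal(contents, &encoded); err != nil {
		return nil, err
	}
	kv := make(map[string]string, len(encoded))
	for encodedKey, encodedValue := range encoded {
		key, err := base64.URLEncoding.DecodeString(encodedKey)
		if err != nil {
			return nil, fmt.Errorf("decoding key %q: %w", encodedKey, err)
		}
		value, err := base64.URLEncoding.DecodeString(encodedValue)
		if err != nil {
			return nil, fmt.Errorf("decoding value for %q: %w", key, err)
		}
		kv[string(key)] = string(value)
	}
	return kv, nil
}

func main() {
	contents, err := encodeStore(map[string]string{"key": "value"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Printf("%s\n", contents) // {"a2V5":"dmFsdWU="}

	decoded, err := decodeStore(contents)
	fmt.Println(decoded, err)
}
```

In a test, the returned error would be passed to t.Fatalf, since a broken setup means the rest of the test cannot tell us anything useful.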

Then we call Get to retrieve them and ensure we get what we expect without an error:

got, err := Get(context.Background(), key)
if err != nil {
	t.Errorf("Received unexpected error: %s", err)
}
if got != value {
	t.Errorf("Got %s, expected %s", got, value)
}

All together it looks like this. Run a go test to make sure it works.

func TestGet(t *testing.T) {
	makeStorage(t)
	defer cleanupStorage(t)

	key := "key"
	value := "value"
	encodedKey := base64.URLEncoding.EncodeToString([]byte(key))
	encodedValue := base64.URLEncoding.EncodeToString([]byte(value))
	fileContents, _ := json.Marshal(map[string]string{encodedKey: encodedValue})
	os.WriteFile(StoragePath+"/data.json", fileContents, 0644)

	got, err := Get(context.Background(), key)
	if err != nil {
		t.Errorf("Received unexpected error: %s", err)
	}
	if got != value {
		t.Errorf("Got %s, expected %s", got, value)
	}
}

Tables again#

We can add tables to this test fairly easily. First we decide what extra data we want to have. Let’s have key1 through key4 matching value1 through value4 , but missing key3 and value3 so we can include a failed Get .

So we adjust our file setup to look like this:

kvStore := map[string]string{
	"key1": "value1",
	"key2": "value2",
	"key4": "value4",
}
encodedStore := map[string]string{}
for key, value := range kvStore {
	encodedKey := base64.URLEncoding.EncodeToString([]byte(key))
	encodedValue := base64.URLEncoding.EncodeToString([]byte(value))
	encodedStore[encodedKey] = encodedValue
}
fileContents, _ := json.Marshal(encodedStore)
os.WriteFile(StoragePath+"/data.json", fileContents, 0644)

Then we add test cases to fetch the first three keys: key1, key2 and key3. We expect to find keys 1 and 2, and so the values we find will be values 1 and 2, and we don’t expect errors, so the error value of the first two test cases will be nil. However key3 doesn’t exist, so we will end up returning an empty string for the value. Based on how Get is currently written, for key3 the error will also be nil. But we still want to include errors in this test in case we decide that Get should return an error in certain conditions in the future.

We loop over the cases just as before, ending up with:

testCases := []struct {
	in  string
	out string
	err error
}{
	{"key1", "value1", nil},
	{"key2", "value2", nil},
	{"key3", "", nil},
}

for _, test := range testCases {
	got, err := Get(context.Background(), test.in)
	if err != test.err {
		t.Errorf("Error did not match expected. Got %v, expected: %v", err, test.err)
	}
	if got != test.out {
		t.Errorf("Got %s, expected %s", got, test.out)
	}
}

We can run go test again to check this… and now we’ve successfully tested a stateful function!

Parallel tests#

At the moment, we will get very little from doing the next step, but it is a great tool to have for much larger codebases. We’re going to make these tests run in parallel!

This is actually incredibly simple to implement, but could definitely cause some problems if misused, so we will talk about that.

First, to see the differences between testing with Parallel and without, let us first run go test -v . ( -v is for verbose, to show the tests we are running.)

$ go test -v
=== RUN   TestJSON
--- PASS: TestJSON (0.00s)
=== RUN   TestGet
--- PASS: TestGet (0.00s)
PASS

See that it shows it ran TestJSON and then it ran TestGet . When we add the parallelism, it will show something different.

To make these two tests run in parallel, simply add t.Parallel() to the top of each test. That’s it! Now go test -v tells us a different story.

$ go test -v
=== RUN   TestJSON
=== PAUSE TestJSON
=== RUN   TestGet
=== PAUSE TestGet
=== CONT  TestJSON
=== CONT  TestGet
--- PASS: TestJSON (0.00s)
--- PASS: TestGet (0.00s)
PASS

This time, we see that it starts then pauses TestJSON , starts then pauses TestGet , and then continues both of them.

At this size of codebase, this is fairly useless, but for larger codebases with lots of independent tests, it could be useful for speeding up the time it takes to run all the tests.

We should note here that any other tests similar to our TestGet could not safely be run in parallel with it using this method. The test modifies a global variable in our server, so any other test that relied on that variable would be put in an inconsistent state depending on which of them ran at which time. In those cases, it is best to leave t.Parallel() out, since tests in the same package that do not call it run serially, one after another.

Benchmarks#

Benchmarking is another useful feature of Go’s testing package. It allows us to conduct performance tests for the parts of our code where being fast matters most. That will help us ensure that we don’t cause performance regressions when we are making changes.

Benchmarks are very similar to tests. If you replace Test with Benchmark and testing.T with testing.B , you’re 90% of the way there.

The other 10% is that you’ll need to write a for loop around whatever you’re trying to benchmark. testing.B provides a parameter, N , which determines the number of loops to do over the benchmarked function. A very short benchmarking function might look like this:

func BenchmarkEmptyFunc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		func() {}()
	}
}

Let’s set up a Get benchmark since the Get function is a very critical path in a key-value store. We want it to be fast and remain fast in the future.

In order to make our benchmark we’re going to copy the entirety of our TestGet method and make some small changes. BenchmarkGet will look exactly like TestGet , except we remove the t.Parallel() call, rename the function to BenchmarkGet , and change t *testing.T to b *testing.B .

The reason we are copying TestGet, rather than coming up with a common set of abstractions and using them in both functions, is that we actually expect these functions to differ greatly over time. TestGet will likely receive a large number of varied test cases, some added as bugs are discovered and some added to help ensure the function will work as intended with new keys and values. BenchmarkGet, however, is likely to remain very stable, and will only need to be adjusted for significant changes made to the Get function. So copying everything allows us to more easily write BenchmarkGet while preventing a tight coupling of these two functions that are currently similar but will likely be very different in the future.

The function signature for BenchmarkGet looks like:

func BenchmarkGet(b *testing.B)

Now we have another problem!

We’re trying to pass our *testing.B to our makeStorage and cleanupStorage methods, which only take a *testing.T parameter. Luckily for us, the testing library has an interface, testing.TB, which both types satisfy, so we can adjust those methods to accept that instead, making their signatures:

func makeStorage(tb testing.TB)

func cleanupStorage(tb testing.TB)

You’ll also have to adjust the Fatalf and Errorf calls to use the new tb argument instead of t .

So - back to the benchmark.

The way the benchmark works is by looping over the benchmarked call until it can determine how fast the function runs. At this point everything we have in the method is setup code, so timing it would be a fairly poor measure of how fast Get will run. Luckily, we can cut out the setup time by calling b.ResetTimer(), then loop over our Get call in the same way we did before: a for loop from 0 to b.N.

We can finish our benchmark by adding that to the bottom, making our benchmark test look like this:

func BenchmarkGet(b *testing.B) {
	makeStorage(b)
	defer cleanupStorage(b)
	// Setup omitted

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Get(context.Background(), "key1")
	}
}

Now that we have our benchmark test, we can run it with go test -bench . (note the trailing dot, a pattern that matches every benchmark). It takes a couple of seconds to run, but then it will let us know, for the platform we are on, how many nanoseconds each operation takes:

$ go test -bench .
goos: linux
goarch: arm
pkg: the package name
BenchmarkGet-4             16662             69113 ns/op
PASS

Integration testing#

The final type of testing we will demonstrate in this chapter is integration testing. This will use the higher-level methods of our library to ensure that related methods can successfully work together (though the level won’t be so high as to have us start our server or make http requests).

One kind of integration test we could do is to check the relation between the Set and Get methods. If we start with nothing, we would expect that we can’t Get a key until it has been Set, and that once Set, it can be retrieved. We can even throw in Delete, to then make sure that removes the expected key.

All in all, we will want to run these steps in order and make sure they do what we expect:

  1. Get “key”, expect no result
  2. Set “key”
  3. Get “key” again, expect a result
  4. Delete “key”
  5. Get “key” a final time, expect no result

Putting those all together gives us this relatively straightforward integration test:

func TestGetSetDelete(t *testing.T) {
	makeStorage(t)
	defer cleanupStorage(t)
	ctx := context.Background()

	key := "key"
	value := "value"

	if out, err := Get(ctx, key); err != nil || out != "" {
		t.Fatalf("First Get returned unexpected result, out: %q, error: %s", out, err)
	}

	if err := Set(ctx, key, value); err != nil {
		t.Fatalf("Set returned unexpected error: %s", err)
	}

	if out, err := Get(ctx, key); err != nil || out != value {
		t.Fatalf("Second Get returned unexpected result, out: %q, error: %s", out, err)
	}

	if err := Delete(ctx, key); err != nil {
		t.Fatalf("Delete returned unexpected error: %s", err)
	}

	if out, err := Get(ctx, key); err != nil || out != "" {
		t.Fatalf("Third Get returned unexpected result, out: %q, error: %s", out, err)
	}
}

Though we do not populate them with anything, we will still need to use the makeStorage and cleanupStorage methods to ensure we have a clean directory to work with.

We can run this test just like the rest of our tests and see that it passes.

As an interesting test of the potential pitfalls of parallelism, if you add t.Parallel() to the top of this new test and run go test -v a couple of times, you may see a number of different errors as these tests add or delete the testing directory at different times.

Further testing types#

There are a few types of testing we didn’t have time to go over in detail in this chapter, specifically end-to-end, property-based and fuzz testing. These types of tests are great for keeping the application stable overall and, in some cases, secure.

For this application, an end-to-end test would have looked very similar to our integration test. The main difference would be that we would start up the server first, and instead of using the Get , Set and Delete methods, we would call their endpoints via http.

It could be useful to have such a test to ensure that whatever server startup needs to happen does so properly, as well as any particular aspects of shutdown, in addition to testing the parts of the application surrounding the Get , Set and Delete methods.
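Here is a minimal sketch of what such a test could look like, using httptest.NewServer to start a real HTTP server on a local port. The in-memory handler, the /key/ route and the use of POST for writes are assumptions made for this example; our actual server and its endpoints may well differ.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// newHandler returns a hypothetical in-memory key-value handler.
func newHandler() http.Handler {
	var mu sync.Mutex
	data := map[string]string{}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := strings.TrimPrefix(r.URL.Path, "/key/")
		mu.Lock()
		defer mu.Unlock()
		switch r.Method {
		case http.MethodGet:
			io.WriteString(w, data[key])
		case http.MethodPost:
			body, err := io.ReadAll(r.Body)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			data[key] = string(body)
		case http.MethodDelete:
			delete(data, key)
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
	})
}

// do sends one request over HTTP and returns the response body.
func do(method, url, body string) (string, error) {
	req, err := http.NewRequest(method, url, strings.NewReader(body))
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// roundTrip performs Get, Set, Get, Delete, Get against a live server
// and returns the body seen at each step.
func roundTrip() ([]string, error) {
	srv := httptest.NewServer(newHandler())
	defer srv.Close()

	url := srv.URL + "/key/greeting"
	steps := []struct{ method, body string }{
		{http.MethodGet, ""},
		{http.MethodPost, "hello"},
		{http.MethodGet, ""},
		{http.MethodDelete, ""},
		{http.MethodGet, ""},
	}
	var seen []string
	for _, step := range steps {
		out, err := do(step.method, url, step.body)
		if err != nil {
			return nil, err
		}
		seen = append(seen, out)
	}
	return seen, nil
}

func main() {
	seen, err := roundTrip()
	fmt.Printf("%q %v\n", seen, err)
}
```

The steps are the same five as in our integration test; the difference is that every call now travels through the HTTP stack, so routing, request parsing and response writing are exercised as well.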

As for property-based testing, Go actually has a set of utility functions to help with it, under the package “testing/quick”. It could be good to look at that package and think about how the functions it provides could be useful in the testing strategy for our key-value server.

Distributed key-value server#

We have built a server and tested it! Now that we have a single binary that can do the thing we want, we need to think about replicating it. Why? Well, let us chat a bit about failure.

If we are talking about hardware failure , it happens pretty frequently. DRAM Errors in the Wild: A Large-Scale Field Study claims that 8% of DIMMs surface errors each year. Failure Trends in a Large Disk Drive Population claims that 2% of all drives fail in their first three months. That means if you had a hundred computers, two of them would suffer data loss from disk failure, and eight would have issues running because of bad RAM.

If we are talking about human failure , a large proportion of software failure comes from human error. Configuration bugs, process bugs, code bugs, and other human mistakes cause all sorts of issues. We talk a bit about how to avoid some of this in Chapter 6.

On top of that, larger pieces of infrastructure fail. Every day, cables are accidentally cut, natural disasters take out power generators and flood buildings, and dictators decide their citizens can no longer access certain IP address ranges.

In other words, stuff is going to fail. One way we can limit the impact of failure is by having more than one instance of our service. If we run a few copies of the key-value server, then when one fails, the others can keep running without any major impact. This is a common design decision in web services: you run three or more copies of a piece of software in three or more regions.

The problem, which you might have guessed, is getting all the copies of our code to agree. For example, if Steve writes a value to a replica in Atlanta, and Nat tries to read that value from a replica in Vancouver, how do we make sure the two replicas agree? How does Nat read a value consistent with the one Steve wrote?

A common term for this problem is “consensus”. It’s a hard problem in computer science, and is dealt with by many services in many different ways.

Consensus#

Consensus has been a problem for a long time. Leslie Lamport’s 1989 paper “The Part-Time Parliament” described an island named Paxos, and used it to describe a theoretical solution to distributed consensus. In 2001, Lamport got tired of people telling him his original paper and algorithm were too complicated, and rewrote it as “Paxos Made Simple”. Hilariously, according to Lamport, that paper has a bug, and GitHub implementations often implement the algorithm incorrectly. That being said, Paxos is one of the most commonly used algorithms in distributed systems. For example, Google’s Chubby lock service is built on Paxos, and ZooKeeper uses Zab, a closely related protocol.

Another solution to Consensus comes to us from the infamous Bitcoin. In his influential paper, Satoshi Nakamoto described and built the bitcoin blockchain. The blockchain is a log, or history, of every piece of data attached to it, along with a “proof of work” that lets every computer that reads or modifies the blockchain validate every piece of data in the blockchain. Imagine an array, where everyone agrees on each item in the array, and anyone who comes to look at the array can validate what everyone else is saying about the array. There are many similar but different implementations of this out there, including Ethereum and others.

Consensus is also often seen in BitTorrent, specifically in BitTorrent’s DHT (distributed hash table). BitTorrent uses this distributed hash table to share lists of peers for files on the network. It was first implemented in 2005, and it implements an algorithm called Kademlia. The DHT is similar to our goal for our key-value server, but its consistency is eventual. This means that changes take a while to propagate everywhere. So if a change is made to one node, and you read from another, the change may not be there, but it will get there, eventually.

Another common algorithm for consensus is called Raft. Raft is very similar to Paxos, but was written a little more recently, in 2014. Raft is used by both Etcd and Consul to resolve their consensus issues.
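One property Paxos and Raft share is that they only make progress while a majority of the cluster can talk to each other, which is part of why odd cluster sizes such as three or five are so common. The arithmetic is small enough to sketch; quorum and tolerated are names made up for this illustration.

```go
package main

import "fmt"

// quorum returns the smallest number of nodes that forms a majority.
func quorum(nodes int) int { return nodes/2 + 1 }

// tolerated returns how many nodes can fail while a majority remains.
func tolerated(nodes int) int { return nodes - quorum(nodes) }

func main() {
	for _, nodes := range []int{1, 3, 4, 5} {
		fmt.Printf("%d nodes: quorum %d, tolerates %d failure(s)\n",
			nodes, quorum(nodes), tolerated(nodes))
	}
}
```

Note that going from three nodes to four does not buy any extra fault tolerance: both survive only a single failure, while five nodes survive two.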

Using Raft#

We are choosing Raft to solve our consensus problems for a few reasons. The first is it uses leader election to solve the consensus issue. While we could solve consensus with the other solutions (a distributed log, a distributed hash table, etc), leader election is often the easiest to understand and also implement.

The second reason is that Etcd and Consul are both written in Go, and have open-sourced their Raft implementation libraries. Some folks have written about how to use Etcd’s implementation, so we will use the implementation from HashiCorp (the company behind Consul) to keep things new and fresh! But other than that, our implementation choice is arbitrary.

Why not implement it ourselves? Part of the answer is “why bother?” While Raft is one of the simpler consensus algorithms, the white paper describing the algorithm is still 18 pages long. It has lots of edge cases that we could miss. The other is to leave it as an exercise to the reader. A fun programming exercise you can undergo once you feel comfortable in Go is to implement a consensus algorithm. Personally, we find implementing algorithms boring, and would much rather see what interesting problem

Changes to the server#

To change our server to be distributed, we need to do a few things:

  • integrate the raft library into our code
  • store our data in a way that can be replicated
  • add a way for the replicas to tell each other about themselves

We will go over this last part in more detail later, but this is a nice reminder that libraries are not magic. Replicas do not just automatically know about each other. This surprised us when we learned it - we had really hoped ZooKeeper (the first distributed lock server we ever used) would just know about other ZooKeeper entities automatically, and instead you had to know the IP addresses of every node. It was a real pain.

The first step in our rewrite is to move our functions to a separate package. This makes documentation and testing easier, and means our code can be depended on by multiple libraries.

Set up the store folder#

To do this, we will create a new folder called store. All Go files in this folder will have package store as one of their first lines.

Note that the package line doesn’t declare the entire package path. That is determined by the parent go.mod file. We are not creating a separate go.mod inside store, because nested modules cause all sorts of issues (if we updated store, we would need to update the go.mod of the root package, which we can’t do until we’ve deployed and waited for all of the caches to clear). A nested module works fine, but it tends to cause confusion around what code you are actually calling.

Go modules are usually best tied to their version control. That is, the go.mod should be at the root of your version control. So if your Git repository contains multiple packages, that’s fine, but try to keep just one Go module (i.e. one go.mod file) to keep things sane. The downside is that all of your packages then share a dependency tree, because dependencies are defined in go.mod. This won’t affect your built binaries though, just how many dependencies you download when you build your package.

Inside of this folder, we can create Go files with any name. We will call our first file store.go but it could be literally anything and still work.

Set up the Config

type Config struct {
	raft *raft.Raft
	fsm  *fsm
}

First things first, we will create a Config struct for whoever uses our server to initialize. We will add all of the data access functions to this struct.

func (c *Config) Set(ctx context.Context, key, value string) error {
func (c *Config) Delete(ctx context.Context, key string) error {
func (c *Config) Get(ctx context.Context, key string) (string, error) {

We mention this in other chapters as well, but using a struct pointer allows us to do a few things easily:

  • share configuration without globals
  • do testing using mocks
  • make sure the server is initialized before calling

We are making the fields of the struct private because we do not want anyone outside of this package to be able to modify the values, but we still want to be able to access and mutate them in the store package. Note here that:

  • lowercase fields are the equivalent of private
  • capitalized functions and types are public

We will implement our methods shortly, but we need two more very important functions, one that creates the config and one that returns a middleware for the server to use. We will walk through the middleware in the “Connecting the servers” section, but first let us build our config factory (a factory is a common name for a function that generates an instance of a type).

// NewRaftSetup configures a raft server.
func NewRaftSetup(storagePath, host, raftPort, raftLeader string) (*Config, error) {
	cfg := &Config{}

	if err := os.MkdirAll(storagePath, os.ModePerm); err != nil {
		return nil, fmt.Errorf("setting up storage dir: %w", err)
	}

	cfg.fsm = &fsm{}
	cfg.fsm.dataFile = fmt.Sprintf("%s/data.json", storagePath)

	ss, err := raftbolt.NewBoltStore(storagePath + "/stable")
	if err != nil {
		return nil, fmt.Errorf("building stable store: %w", err)
	}

	ls, err := raftbolt.NewBoltStore(storagePath+"/log")
	if err != nil {
		return nil, fmt.Errorf("building log store: %w", err)
	}

	snaps, err := raft.NewFileSnapshotStoreWithLogger(storagePath+"/snaps", 5, log)
	if err != nil {
		return nil, fmt.Errorf("building snapshotstore: %w", err)
	}

In this first section, we set up where we’re going to store things on the disk. We will talk about the fsm struct in a bit. raftbolt is our import name for hashicorp/raft-boltdb, an on-disk storage system built for Raft. It is written in pure Go, and backed by the popular Go datastore BoltDB. So from this section, we will end up with three entries in our storagePath: log, stable, and snaps (log and stable are BoltDB files, and snaps is a directory). We will also have one file called data.json. If storagePath doesn’t exist, we create it.

	fullTarget := fmt.Sprintf("%s:%s", host, raftPort)
	addr, err := net.ResolveTCPAddr("tcp", fullTarget)
	if err != nil {
		return nil, fmt.Errorf("getting address: %w", err)
	}
	trans, err := raft.NewTCPTransportWithLogger(fullTarget, addr, 10, 10*time.Second, log)
	if err != nil {
		return nil, fmt.Errorf("building transport: %w", err)
	}

The above creates a TCP transport. This basically says, “Hey raft, open a TCP port. This is where and how people will talk to you.”
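If you want to experiment with the address handling on its own, here is a small sketch that uses only the standard library. The resolve helper and the port 8300 are our assumptions for illustration. It uses net.JoinHostPort instead of fmt.Sprintf, which has the side benefit of handling IPv6 literals correctly.

```go
package main

import (
	"fmt"
	"net"
)

// resolve joins host and port and resolves them to a TCP address.
func resolve(host, port string) (*net.TCPAddr, error) {
	// JoinHostPort brackets IPv6 literals, which a plain "%s:%s" would not.
	target := net.JoinHostPort(host, port)
	return net.ResolveTCPAddr("tcp", target)
}

func main() {
	addr, err := resolve("127.0.0.1", "8300")
	if err != nil {
		fmt.Println("getting address:", err)
		return
	}
	fmt.Println(addr.IP, addr.Port)
}
```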

	raftSettings := raft.DefaultConfig()
	raftSettings.LocalID = raft.ServerID(uuid.New().URN())

	if err := raft.ValidateConfig(raftSettings); err != nil {
		return nil, fmt.Errorf("could not validate config: %w", err)
	}

	node, err := raft.NewRaft(raftSettings, cfg.fsm, ls, ss, snaps, trans)
	if err != nil {
		return nil, fmt.Errorf("could not create raft node: %w", err)
	}
	cfg.raft = node

Next, we build our raft configuration using the things we created above. We also assign our server a unique ID. We do this every time because we are living in stateless containers.

	if cfg.raft.Leader() != "" {
		raftLeader = string(cfg.raft.Leader())
	}

	// Make ourselves the leader!
	if raftLeader == "" {
		raftConfig := raft.Configuration{
			Servers: []raft.Server{
				{
					ID:      raftSettings.LocalID,
					Address: raft.ServerAddress(fullTarget),
				},
			},
		}

		cfg.raft.BootstrapCluster(raftConfig)
	}

	// Watch the leader election forever.
	leaderCh := cfg.raft.LeaderCh()
	go func() {
		for {
			select {
			case isLeader := <-leaderCh:
				if isLeader {
					log.Info("cluster leadership acquired")
					// snapshot at random
					chance := rand.Int() % 10
					if chance == 0 {
						cfg.raft.Snapshot()
					}
				}
			}
		}
	}()

	// We're not the leader, tell them about us.
	if raftLeader != "" {
		// Lets just chill for a bit until leader might be ready.
		time.Sleep(10 * time.Second)

		postJSON := fmt.Sprintf(`{"ID": %q, "Address": %q}`, raftSettings.LocalID, fullTarget)
		resp, err := http.Post(
			raftLeader+"/raft/add",
			"application/json; charset=utf-8",
			strings.NewReader(postJSON))

		if err != nil {
			return nil, fmt.Errorf("failed adding self to leader %q: %w", raftLeader, err)
		}
		// Close the body so the underlying connection can be released.
		defer resp.Body.Close()

		log.Debug("added self to leader", "leader", raftLeader, "response", resp)
	}

	return cfg, nil
}

Handling leader election#

Now we do the “real work”. In this block, we tell Raft how to deal with a leader election. There are two things that happen here.

First we look at our config and see if there is a leader configured. The answer is probably not, but we should check anyway. If there is no leader configured, we should create a new cluster, with just ourselves in it. If there is a leader configured, we should send them an HTTP request (we will talk about that in a bit) telling them we exist.
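The request body we send to the leader is built with fmt.Sprintf and %q, which works for plain identifiers and addresses. Go's quoting rules and JSON's are not identical for every character, though, so here is a sketch of an alternative that builds the same payload with encoding/json. The joinRequest type and joinBody function are hypothetical names, and the ID and address are made-up sample values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// joinRequest is a hypothetical type mirroring the two fields we send
// to the leader's /raft/add endpoint.
type joinRequest struct {
	ID      string
	Address string
}

// joinBody encodes the join request as a JSON string.
func joinBody(id, address string) (string, error) {
	b, err := json.Marshal(joinRequest{ID: id, Address: address})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := joinBody("node-1", "127.0.0.1:8300")
	if err != nil {
		fmt.Println("encoding join request:", err)
		return
	}
	fmt.Println(body) // {"ID":"node-1","Address":"127.0.0.1:8300"}
}
```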

Then, no matter what, we start a loop that runs forever in a goroutine. It waits to hear whether we have become the leader, and if we have, randomly decides whether to snapshot our current state to disk. We use Go’s rand package to make that decision with the line chance := rand.Int() % 10, which gives us roughly a one-in-ten chance.

If you have not seen it before, to create a function that runs forever in the background, we use three tools: the for loop, a goroutine, and an anonymous function (also known as a lambda function).

go func() {
	for {
		// do a thing forever
	}
}()

  • go is a magic keyword that spawns a goroutine and runs a function in it. A goroutine is technically not a thread (it is much lighter than an operating-system thread), but a separate thread is the easiest way to think about it. If you had a function called hello and you wanted to run it asynchronously, you would type go hello() and it would run in the background.
  • for {} is just a loop that runs forever, similar to writing while (true) {} in other languages.
  • func() {}() creates a new function with no arguments, and no return value, and then calls it immediately.
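Here is a small runnable sketch that puts the three tools together. Unlike the loop in our Raft setup, this one has an exit condition so that the program can finish. The sumInBackground function is a hypothetical example, and it uses sync.WaitGroup (which we have not covered) only to wait for the goroutine to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// sumInBackground adds the numbers 1..n inside a goroutine and waits for it.
func sumInBackground(n int) int {
	var wg sync.WaitGroup
	total := 0

	wg.Add(1)
	// An anonymous function, started with go, containing a bare for loop.
	go func() {
		defer wg.Done()
		i := 1
		for {
			if i > n {
				return // leave the otherwise endless loop
			}
			total += i
			i++
		}
	}()

	// Wait blocks until the goroutine calls Done, so reading total is safe.
	wg.Wait()
	return total
}

func main() {
	fmt.Println(sumInBackground(4)) // 10
}
```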

The second thing happening here is a channel.

leaderCh := cfg.raft.LeaderCh()

Here, we’re assigning a channel to a variable. Channels play a role similar to IPC (inter-process communication), except that they connect goroutines inside a single program rather than separate processes. You use them to pass data between things that are running concurrently. I like to think of a channel as a small river that ends in a pond. You put a rubber duck in the river and it flows towards the pond. Any other goroutine can grab the rubber duck out of the river.

There are two ways to create a channel:

  • unbuffered: make(chan string)
  • buffered: make(chan bool, 1)

To send data into the channel, you use a left arrow: channelVariable <- "message".

To receive data from a channel, you also use a left arrow: fmt.Println(<-channelVariable).

If you are using an unbuffered channel, Go will block a send until another goroutine is ready to receive, and block a receive until another goroutine sends. However, HashiCorp’s Raft implementation uses a buffered boolean channel with room for one message (the buffer size can be whatever you want), so it can send one message even if we are not set up to receive it.
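The difference between the two kinds of channel is easy to see in a small experiment. In this sketch, trySend is a hypothetical helper that attempts a send and gives up immediately if the send would block. It relies on select with a default case to do that.

```go
package main

import "fmt"

// trySend attempts a send without blocking and reports whether it succeeded.
func trySend(ch chan bool, v bool) bool {
	select {
	case ch <- v:
		return true
	default:
		// The send would have blocked, so we give up instead.
		return false
	}
}

func main() {
	unbuffered := make(chan bool)
	buffered := make(chan bool, 1)

	fmt.Println(trySend(unbuffered, true)) // false: nobody is receiving
	fmt.Println(trySend(buffered, true))   // true: the buffer has room
	fmt.Println(trySend(buffered, true))   // false: the buffer is full
	fmt.Println(<-buffered)                // true: the value stored earlier
}
```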

select {
case isLeader := <-leaderCh:
	// Do something with isLeader
}

select is commonly used to receive messages on a channel. It lets you have multiple case statements if you have multiple channels to receive from. In the above, the select will still block until a message arrives on the channel. If you want to make it non-blocking, you can add a default case, which runs immediately whenever no other case is ready.

An example of a completely non-blocking receive is:

package main

import (
	"log"
	"time"
)

func main() {
	c1 := make(chan string, 1)
	c2 := make(chan string, 1)

	go func() {
		c1 <- "one"
		c2 <- "two"
	}()

	for {
		select {
		case msg1 := <-c1:
			log.Printf("received %q", msg1)
		case msg2 := <-c2:
			log.Printf("received %q", msg2)
		default:
			log.Printf(".")
			time.Sleep(time.Second)
		}
	}
}

This will run forever, so remember to press Ctrl+C to get out of it when you’re done. It prints each message as it receives it, and whenever there is nothing to receive it prints a period and sleeps for a second.

Connecting the servers#

So we have our Config struct, and our NewRaftSetup generator. We need two more things before we can implement our actual get, set and delete functions. The first is an Add handler, and the second is a redirector middleware.

Our Add handler is pretty simple:

// AddHandler is an http handler that responds to Add requests to join a raft
// cluster.
func (cfg *Config) AddHandler() func(w http.ResponseWriter, r *http.Request) {
  return func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json; charset=utf-8")
    jw := json.NewEncoder(w)
    body, err := ioutil.ReadAll(r.Body)
    if err != nil {
      w.WriteHeader(http.StatusInternalServerError)
      jw.Encode(map[string]string{"error": err.Error()})
      return
    }
    log.Debug("got request", "body", string(body))

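    // Decode into a value rather than a nil pointer, so a "null" body cannot
    // cause a nil dereference below.
    var s raft.Server
    if err := json.Unmarshal(body, &s); err != nil {
      log.Error("could not parse json", "error", err)
      w.WriteHeader(http.StatusBadRequest)
      jw.Encode(map[string]string{"error": err.Error()})
      return
    }
    // AddVoter returns a future; Error waits for the change to succeed or fail.
    if err := cfg.raft.AddVoter(s.ID, s.Address, 0, time.Minute).Error(); err != nil {
      log.Error("could not add voter", "error", err)
      w.WriteHeader(http.StatusInternalServerError)
      jw.Encode(map[string]string{"error": err.Error()})
      return
    }
    jw.Encode(map[string]string{"status": "success"})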
  }
}

This function returns an HTTP handler, which is a function: func(w http.ResponseWriter, r *http.Request). This creates an anonymous function on return. This might seem a bit weird, but the reason we are doing this is so that we can inject the configuration from our Config struct. You’ll notice that this handler is quite simple. All it does is parse some JSON, and if it parses successfully, it adds the server described in the request body as a voter to our Raft cluster.
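If the pattern of a method returning a handler is new to you, here is a stripped-down sketch of it with no Raft involved. The settings type, HelloHandler and call are all hypothetical names. The httptest package from the standard library lets us exercise the handler without opening a port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// settings is a hypothetical stand-in for our Config struct.
type settings struct {
	greeting string
}

// HelloHandler returns a handler that closes over s, so no globals are needed.
func (s *settings) HelloHandler() func(w http.ResponseWriter, r *http.Request) {
	return func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s, %s", s.greeting, r.URL.Query().Get("name"))
	}
}

// call runs a handler against an in-memory request and returns the body.
func call(h http.HandlerFunc, target string) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Body.String()
}

func main() {
	s := &settings{greeting: "hello"}
	fmt.Println(call(s.HelloHandler(), "/?name=raft")) // hello, raft
}
```

The returned function can still read s.greeting long after HelloHandler has returned, which is exactly how AddHandler keeps access to cfg.raft.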

This is both great, and incredibly dangerous! If we now use this handler on our server, any service which makes a POST request to the configured path (which we haven’t done yet) becomes a member of our cluster. On top of that, we are not validating that the server that sent the message is who it claims to be. We are okay with this significant vulnerability only because this server is not available on the public internet. We are relying on the fact that only our code is running on the network that this is deployed on. Trusting your network is a common thing to do, but not recommended in high-security situations. If you’re interested in learning more about Zero Trust networks (which is way beyond the scope of this book), check out: Google’s BeyondCorp whitepaper, Palo Alto Networks’ article on Zero Trust and CrowdStrike’s article on Zero Trust.

Next, let’s build that middleware.

// Middleware passes the incoming request to the leader of the cluster.
func (cfg *Config) Middleware(h http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    if cfg.raft.State() != raft.Leader {
      ldr := cfg.raft.Leader()
      if ldr == "" {
        log.Error("leader address is empty")
        h.ServeHTTP(w, r)
        return
      }

      prxy := httputil.NewSingleHostReverseProxy(RaftAddressToHTTP(ldr))
      prxy.ServeHTTP(w, r)
      return
    }

    h.ServeHTTP(w, r)
  })
}

This middleware takes a request, and if this server is not the leader of the raft cluster, it proxies the request over to the leader. We do this instead of a redirect so that in case we put the cluster behind a load balancer (you will learn a bunch more about load balancers and proxies in Chapter 6) we will not end up in any kind of redirect loop.
