Custom Formatting With bufio.Scanner in Golang (Kubernetes API)

To get the runtime logs of an active project on Pipeops, I needed to fetch the pod logs from an AWS EKS cluster using the Kubernetes API. By default, the API only returns the logs available at the time of the request, so keeping the logs up to date required repeated calls. I pushed the logs to Google’s Firebase Cloud Firestore, which the front end read from to give the “real-time” feel. This worked, but we knew it could work better, and we sought to improve on it.

While researching, we found that the Kubernetes API has a query parameter called follow which, when set to true, keeps the response open and streams new log lines as the app runs. This approach offered several benefits.

It eliminated the need for the extra Firestore call.
It was far more efficient.
We would no longer need to store a project’s logs on our end, since they were now fed directly from the user’s EKS cluster.
It was cheaper: our Cloud Firestore bills would be greatly reduced.
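To make the follow parameter concrete, here is a minimal sketch of how the request URL could be built. The path follows the pod log endpoint of the Kubernetes API; the helper name podLogURL and the host, namespace, and pod values are placeholders I made up for illustration, not what we ran in production.

```go
package main

import (
	"fmt"
	"net/url"
)

// podLogURL is a hypothetical helper that builds the pod log URL.
// apiServer, namespace and pod are illustrative inputs.
func podLogURL(apiServer, namespace, pod string, follow bool) string {
	u := url.URL{
		Scheme: "https",
		Host:   apiServer,
		Path:   fmt.Sprintf("/api/v1/namespaces/%s/pods/%s/log", namespace, pod),
	}
	q := url.Values{}
	if follow {
		// follow=true asks the API server to keep the response open.
		q.Set("follow", "true")
	}
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	// Placeholder values; a real call also needs authentication.
	fmt.Println(podLogURL("example.invalid", "default", "my-app", true))
}
```

A real request also needs credentials for the cluster, which this sketch leaves out.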

We decided to go through with this approach, and implementation began. The response from the Kubernetes API when follow is set to true is a stream that stays open until the connection is closed. To work with this streamed response, reading the body in one go with the net/http package alone was not enough, and I needed something extra.
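A rough sketch of what consuming such a stream looks like: the body is read incrementally, line by line, instead of being loaded all at once. Since this example has no cluster to talk to, an io.Pipe stands in for res.Body; consumeStream and fakeLogStream are names invented for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// consumeStream reads body line by line until the stream is closed
// and hands every line to handle as soon as it arrives.
func consumeStream(body io.Reader, handle func(line string)) error {
	scanner := bufio.NewScanner(body)
	for scanner.Scan() {
		handle(scanner.Text())
	}
	return scanner.Err()
}

// fakeLogStream is a stand-in for res.Body: a pipe whose writer
// emits one line at a time and then closes the stream.
func fakeLogStream(lines []string) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		defer pw.Close()
		for _, l := range lines {
			fmt.Fprintln(pw, l)
		}
	}()
	return pr
}

func main() {
	body := fakeLogStream([]string{"starting app", "listening on :8080"})
	defer body.Close()
	err := consumeStream(body, func(line string) {
		fmt.Println("got:", line)
	})
	if err != nil {
		fmt.Println("stream error:", err)
	}
}
```

With a real response, res.Body would take the place of the pipe and the loop would keep running until the connection is closed.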

Go's standard library has a package called bufio for buffered I/O, that is, buffered read (input) and write (output) operations.

The bufio package exposes several functions and types for this. One example is bufio.NewReader(), which returns a *bufio.Reader whose methods read the buffered input in different ways, such as Read(), ReadBytes(), and ReadLine().

`Read()`

func (*bufio.Reader).Read(p []byte) (n int, err error)

Reads up to len(p) bytes from the buffered input into the byte slice p that is supplied to it, and returns the number of bytes read.

`ReadBytes()`

func (*bufio.Reader).ReadBytes(delim byte) ([]byte, error)

Reads until the first occurrence of the delimiter and returns the bytes read, up to and including the delimiter.

`ReadLine()`

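A low-level line-reading primitive that returns a single line without its end-of-line bytes (\n or \r\n). Under the hood, it calls ReadSlice('\n'), and the bufio documentation recommends ReadBytes('\n'), ReadString('\n'), or a Scanner for most callers.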
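As a small illustration of the Reader approach, the sketch below calls ReadBytes('\n') in a loop over an in-memory string. The helper name readChunks and the sample input are made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readChunks calls ReadBytes(delim) until the input is exhausted
// and returns every chunk it produced.
func readChunks(input string, delim byte) []string {
	reader := bufio.NewReader(strings.NewReader(input))
	var chunks []string
	for {
		chunk, err := reader.ReadBytes(delim)
		if len(chunk) > 0 {
			// The chunk includes the delimiter when one was found.
			chunks = append(chunks, string(chunk))
		}
		if err != nil {
			// io.EOF is returned once there is nothing left to read.
			break
		}
	}
	return chunks
}

func main() {
	for _, c := range readChunks("first\nsecond\nlast", '\n') {
		fmt.Printf("%q\n", c)
	}
}
```

The last chunk has no delimiter because the input ended before one was found; ReadBytes returns it together with io.EOF.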

These were good, but they were limiting in terms of flexibility. I decided instead to go with bufio.NewScanner(), which returns a *bufio.Scanner. The advantage of this over the NewReader() approach is that the scanner has a method called Split() that accepts any function with the signature

func(data []byte, atEOF bool) (advance int, token []byte, err error)

and runs the buffered data through that function to decide where each token ends. This meant that we could split and format the stream however we liked.
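To show how open-ended that signature is, here is a minimal sketch of a split function that has nothing to do with lines: it cuts the input on commas. The names splitOnComma and commaTokens are invented for this example.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitOnComma is a custom split function: every token ends at a comma,
// and the comma itself is left out of the token.
func splitOnComma(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexByte(data, ','); i >= 0 {
		// Consume the comma (i + 1) but return only the bytes before it.
		return i + 1, data[:i], nil
	}
	if atEOF {
		// Whatever is left is the final token.
		return len(data), data, nil
	}
	// No comma yet: ask the scanner for more data.
	return 0, nil, nil
}

// commaTokens scans input with splitOnComma and collects the tokens.
func commaTokens(input string) []string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(splitOnComma)
	var out []string
	for scanner.Scan() {
		out = append(out, scanner.Text())
	}
	return out
}

func main() {
	fmt.Println(commaTokens("red,green,blue"))
}
```

The three return values do all the work: advance tells the scanner how many bytes to consume, token is what Scan() hands back, and returning 0, nil, nil asks for more input.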

The scanner's default split function is bufio.ScanLines(), which worked, but it was breaking the formatting of the logs and sending them jumbled up.

I dug deeper and saw that the default split function was correctly looking for the newline character, i.e. \n, to detect the end of a line; however, it was stripping that character from the returned token. The logs came in, but they lost their \n formatting.
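The effect is easy to reproduce without a cluster. The sketch below scans a small in-memory sample with the default split function; scanDefault and the sample log text are made up for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanDefault scans input with the scanner's default split function
// (bufio.ScanLines) and returns the tokens it produced.
func scanDefault(input string) []string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	var lines []string
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	return lines
}

func main() {
	lines := scanDefault("INFO started\r\nWARN slow request\n")
	for _, l := range lines {
		// %q makes it visible that no \n or \r is left in the token.
		fmt.Printf("%q\n", l)
	}
	// Forwarding the tokens as-is glues the lines together.
	fmt.Println(strings.Join(lines, ""))
}
```

Anything that forwards these tokens unchanged has to add the line breaks back itself, which is what was missing in our case.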

To solve this problem, I simply had to implement my own split function, based on the default ScanLines(), and pass it to the Split() method. Here is the default implementation:

func ScanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		// We have a full newline-terminated line.
		return i + 1, dropCR(data[0:i]), nil
	}
	// If we're at EOF, we have a final, non-terminated line. Return it.
	if atEOF {
		return len(data), dropCR(data), nil
	}
	// Request more data.
	return 0, nil, nil
}

The specific culprit was the line dropCR(data[0:i]).

For data, the slice of bytes being returned, we can see that two operations are performed on it:

data[0:i]

i here refers to the index of the \n character. So this says that ONLY the bytes from the beginning of the line up to, but excluding, the \n should be returned.

dropCR()

dropCR removes a trailing \r from the data, if there is one.
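Because dropCR is unexported, it cannot be called from outside the bufio package. The sketch below reproduces its logic, as I read it in the standard library source, under the made-up name trimTrailingCR, and applies both operations to a sample buffer.

```go
package main

import (
	"bytes"
	"fmt"
)

// trimTrailingCR mirrors the logic of bufio's unexported dropCR:
// it removes one trailing '\r' if the data ends with it.
func trimTrailingCR(data []byte) []byte {
	if len(data) > 0 && data[len(data)-1] == '\r' {
		return data[:len(data)-1]
	}
	return data
}

func main() {
	data := []byte("hello\r\nworld")
	i := bytes.IndexByte(data, '\n')

	// Step 1: data[0:i] keeps everything before the '\n'.
	fmt.Printf("%q\n", data[0:i])

	// Step 2: the trailing '\r' is removed as well.
	fmt.Printf("%q\n", trimTrailingCR(data[0:i]))
}
```

After both steps nothing is left of the original line ending, which is why the tokens arrive without any line breaks.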

All I needed to do was return each line with its terminator intact and voilà, we had our data formatted as expected.

scanner := bufio.NewScanner(res.Body)
defer res.Body.Close()
split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		// We have a full newline-terminated line. Keep the '\n' in the token.
		return i + 1, data[:i+1], nil
	}
	if k := bytes.IndexByte(data, '\r'); k >= 0 {
		// No '\n' is buffered yet, but there is a carriage return. Keep the '\r' in the token.
		return k + 1, data[:k+1], nil
	}
	// If we're at EOF, we have a final, non-terminated line. Return it.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}
scanner.Split(split)
for scanner.Scan() {
	// ...
	// the current line can be accessed with scanner.Text()
}
if err := scanner.Err(); err != nil {
	// handle the read error
}

I hope you saw how easy it is to implement your own custom split function to act on the data coming from a stream using bufio.Scanner. Most people may not need to, and the default function will be fine for them; however, if you ever need to, you now know what to do 😉.

I had fun going through the source code of the bufio package. Till next time 😁🙌🏾.
