Making ClickHouse ch-go client more user-friendly

Nikita Vaniasin | Apr 2, 2024 min read

ClickHouse is a fast open-source column-oriented database management system that allows generating analytical data reports using SQL queries.

We were using it for some time in our Golang application with the most popular clickhouse-go/v2 driver package which served nice and allowed writing queries in fast and clean way.

Recently I came across the benchmarks for different ClickHouse go drivers comparing the performance for simple select: https://github.com/ClickHouse/ch-bench#benchmarks. I was surprised at the difference between clickhouse-go/v2 and ch-go packages and decided to do a bit of research, especially since I wanted to optimize response times for some of our requests.

The better performance of ch-go comes from its low-level design: it works with columns (as the native ClickHouse protocol), not rows like clickhouse-go/v2 driver. Actually clickhouse-go/v2 is using ch-go under the hood. After inspecting more of our code, it struck me that we are using the column-oriented database as a traditional relational DB:

  • We are reading / unpacking data from ClickHouse using clickhouse-go/v2. Internally it uses ch-go, so it is done in column-by-column way. Then clickhouse-go/v2 parses set of columns into a slice of entries (rows).
  • Next, we marshal the results into JSON for HTTP response.
  • On the frontend, we are using code like let clicks = data.map(({ clicks }) => clicks) to unpack values again into a list of values!

So why not just unpack data from ClickHouse result-set into frontend-friendly format and send columns to frontend directly? This is definitely not a new idea.

I started to investigate how I can introduce ch-go into our project and it turned out that official documentation for ch-go is not detailed enough and there were no proper examples how to use it even for simple SELECTs. Most of the test code was checking the validity by selecting just one row, but this is probably not so important use-case. After some googling, I found these two posts which helped me to understand the package API:

Unfortunately, it seems quite stinky funky. Firstly, let’s see old code (row-based approach):

type PerformanceResult struct {
	Date        time.Time `json:"date" ch:"date"`
	Clicks      uint64    `json:"clicks" ch:"clicks"`
	Impressions uint64    `json:"impressions" ch:"impressions"`
	CTR         float64   `json:"ctr" ch:"ctr"`
	Position    float64   `json:"position" ch:"avgPosition"`
	Visibility  float64   `json:"visibility" ch:"visibility"`
}
// ...

results := make([]PerformanceResult, 0)
if err := s.ch.Select(ctx, &results, getPerformanceChartQuery(c)); err != nil {
	return nil, err
}

You probably have similar code for other databases like MySQL or Postgres. One thing to note here is that clickhouse-go/v2 binds values from query results into provided struct using ch struct tag notation.

New (column-oriented) approach using ch-go:

type colsPerformanceResult struct {
	Date        []time.Time `json:"date"`
	Clicks      []uint64    `json:"clicks"`
	Impressions []uint64    `json:"impressions"`
	CTR         []float64   `json:"ctr"`
	Position    []float64   `json:"position"`
	Visibility  []float64   `json:"visibility"`
}

// ...

var (
	dateCol        = &proto.ColDate{}
	clicksCol      = &proto.ColUInt64{}
	impressionsCol = &proto.ColUInt64{}
	ctrCol         = &proto.ColFloat64{}
	avgPositionCol = &proto.ColFloat64{}
	visibilityCol  = &proto.ColFloat64{}
)

results := proto.Results{
	{Name: "date", Data: dateCol},
	{Name: "clicks", Data: clicksCol},
	{Name: "impressions", Data: impressionsCol},
	{Name: "ctr", Data: ctrCol},
	{Name: "avgPosition", Data: avgPositionCol},
	{Name: "visibility", Data: visibilityCol},
}

ret := &colsPerformanceResult{}

if err := conn.Do(ctx, ch.Query{
	Body:       getPerformanceChartQueryBody(c),
	Parameters: []proto.Parameter{p1, p2, p3},
	Result:     results.Auto(),
	OnResult: func(ctx context.Context, block proto.Block) error {
		for _, col := range results {
			switch col.Name {
			case "date":
				Append(&ret.Date, dateCol)
			case "clicks":
				Append(&ret.Clicks, clicksCol)
			case "impressions":
				Append(&ret.Impressions, impressionsCol)
			case "ctr":
				Append(&ret.CTR, ctrCol)
			case "visibility":
				Append(&ret.Visibility, visibilityCol)
			case "avgPosition":
				Append(&ret.Position, avgPositionCol)
			}
			col.Data.Reset()
		}

		return nil
	},
}); err != nil {
	return nil, err
}

// ... where Append is
func Append[T any](slice *[]T, col proto.ColumnOf[T]) {
	for i := 0; i < col.Rows(); i++ {
		*slice = append(*slice, col.Row(i))
	}
}

Now imagine there are thirty more queries should be adjusted…

As you can see, writing column-oriented code is cumbersome. I’ve tried to figure out a shorter way to implement that. There are some types in proto package available which could help to minimize the code, but I wasn’t able to bring pieces together. That probably wasn’t a goal for ch-go developers and we can’t blame them.

But we still can make it a bit user-friendly while not losing performance gains. I’ve composed a small package to do just that: ClickHouse Helper (chh) https://github.com/nikita-vanyasin/chh

Using this package, we can shorten the code above into:

type colsPerformanceResult struct {
	Date        []time.Time `json:"date"`
	Clicks      []uint64    `json:"clicks"`
	Impressions []uint64    `json:"impressions"`
	CTR         []float64   `json:"ctr"`
	Position    []float64   `json:"position"`
	Visibility  []float64   `json:"visibility"`
}

// ...

ret := &colsPerformanceResult{}
results := &chh.ColResults{}
chh.BindDate(results, "date", &ret.Date)
chh.BindUInt64(results, "clicks", &ret.Clicks)
chh.BindUInt64(results, "impressions", &ret.Impressions)
chh.BindFloat64(results, "ctr", &ret.CTR)
chh.BindFloat64(results, "avgPosition", &ret.Position)
chh.BindFloat64(results, "visibility", &ret.Visibility)

if err := chh.Select(ctx, conn, results, getPerformanceChartQuery(c)); err != nil {
	return nil, err
}

And for selecting just one row from the result-set (e.g., calculating totals), just use .SelectRow instead:

totals := &Totals{}
results := &chh.ColResults{}
results.BindUInt64("count", &totals.Count)
results.BindUInt64("sumClicks", &totals.Clicks)
results.BindUInt64("sumImpressions", &totals.Impressions)
results.BindFloat64("avgCTR", &totals.CTR)
results.BindFloat64("avgPosition", &totals.Position)

if err := chh.SelectRow(ctx, conn, results, getQueriesTotalQuery(c)); err != nil {
	return nil, err
}

That seems to be a bit cleaner way to work with ClickHouse without employing heavy-weight clickhouse-go/v2 package. After moving our backend to column-based approach, we achieved the reduction of average response time up-to 50-60% for some queries. Frontend code also became simpler because we are feeding column data into 3d-party libraries anyway.

There is still a room for improvements and I’m planning to add more “sugar” into chh:

  • If we want to return results directly to the HTTP client, we probably don’t need to define a new type for result struct — we can just add some json Marshal interface support into chh.
  • It is probably hard to come up with a solution for INSERTs which can be universal for everyone. But I think it is possible to introduce some helper functions to simplify code for composing batches.

Please give a try to the github.com/nikita-vanyasin/chh package and let me know what you think 😉

Is there a cleaner way to get same results without adding wrappers for ch-go?