Planet NoName e.V.

2022-05-23

Mero’s Blog

Operator constraints in Go

Let’s say you want to implement a sorting function in Go. Or perhaps a data structure like a binary search tree, providing ordered access to its elements. Because you want your code to be re-usable and type safe, you want to use type parameters. So you need a way to order user-provided types.

There are multiple methods of doing that, with different trade-offs. Let’s talk about four in particular here:

  1. constraints.Ordered
  2. A method constraint
  3. Taking a comparison function
  4. Comparator types

constraints.Ordered

Go 1.18 has a mechanism to constrain a type parameter to all types which have the < operator defined on them. The types which have this operator are exactly all types whose underlying type is string or one of the predeclared integer and float types. So we can write a type set expressing that:

type Integer interface {
  ~int | ~int8 | ~int16 | ~int32 | ~int64 | ~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

type Float interface {
  ~float32 | ~float64
}

type Ordered interface {
  Integer | Float | ~string
}

Because that’s a fairly common thing to want to do, there is already a package which contains these kinds of type sets: golang.org/x/exp/constraints.

With this, you can write the signature of your sorting function or the definition of your search tree as:

func Sort[T constraints.Ordered](s []T) {
  // …
}

type SearchTree[T constraints.Ordered] struct {
  // …
}

The main advantage of this is that it works directly with predeclared types and simple types like time.Duration. It also is very clear.

The main disadvantage is that it does not allow composite types like structs. And what if a user wants a different sorting order than the one implied by <? For example if they want to reverse the order or want specialized string collation. A multimedia library might want to sort “The Expanse” under E. And some letters sort differently depending on the language setting.

constraints.Ordered is simple, but it also is inflexible.
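
As a minimal sketch of how such a constrained Sort could be filled in: the Ordered constraint is redeclared locally here (so the example needs nothing outside the standard library), the insertion sort is just a placeholder algorithm, and Millis is a hypothetical named type added for illustration.

```go
package main

import "fmt"

// Ordered mirrors the type set from the text. It is declared locally so this
// sketch does not depend on any package outside the standard library.
type Ordered interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
		~float32 | ~float64 | ~string
}

// Sort sorts s in place using insertion sort and the built-in < operator.
func Sort[T Ordered](s []T) {
	for i := 1; i < len(s); i++ {
		for j := i; j > 0 && s[j] < s[j-1]; j-- {
			s[j], s[j-1] = s[j-1], s[j]
		}
	}
}

// Millis is a hypothetical named type. It satisfies Ordered because its
// underlying type is int64.
type Millis int64

func main() {
	a := []int{42, 23, 1337}
	Sort(a)
	fmt.Println(a) // [23 42 1337]

	d := []Millis{300, 100, 200}
	Sort(d)
	fmt.Println(d) // [100 200 300]
}
```

A struct type, by contrast, could not be used as T here, which is the inflexibility described above.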

Method constraints

We can use method constraints to allow more flexibility. This allows a user to implement whatever sorting order they want as a method on their type.

We can write that constraint like this:

type Lesser[T any] interface {
  // Less reports whether the receiver is less than v.
  Less(v T) bool
}

The type parameter is necessary because we have to refer to the receiver type itself in the Less method. This is hopefully clearer when we look at how this is used:

func Sort[T Lesser[T]](s []T) {
  // …
}

type SearchTree[T Lesser[T]] struct {
  // …
}

This allows the user of our library to customize the sorting order by defining a new type with a Less method:

type ReverseInt int

func (i ReverseInt) Less(j ReverseInt) bool {
  return j < i // order is reversed
}

The disadvantage of this is that it requires some boilerplate on the part of your user. Using a custom sorting order always requires defining a type with a method.

They can’t use your code with predeclared types like int or string but always have to wrap it into a new type.

The same applies if a type already has a natural comparison method which is not called Less. For example time.Time is naturally sorted by time.Time.Before. For cases like that there needs to be a wrapper to rename the method.

Whenever one of these wrappings happens your user might have to convert back and forth when passing data to or from your code.

It also is a little bit more confusing than constraints.Ordered, as your user has to understand the purpose of the extra type parameter on Lesser.
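
To make the wrapping cost concrete, here is a small sketch under a few assumptions: Sort is implemented as a simple insertion sort, and SortDescending is a hypothetical helper (not part of any library) that shows the conversion a user has to do to sort a plain []int.

```go
package main

import "fmt"

// Lesser is the method constraint from the text.
type Lesser[T any] interface {
	// Less reports whether the receiver is less than v.
	Less(v T) bool
}

// Sort sorts s in place using insertion sort and the Less method of T.
func Sort[T Lesser[T]](s []T) {
	for i := 1; i < len(s); i++ {
		for j := i; j > 0 && s[j].Less(s[j-1]); j-- {
			s[j], s[j-1] = s[j-1], s[j]
		}
	}
}

// ReverseInt orders ints from largest to smallest.
type ReverseInt int

func (i ReverseInt) Less(j ReverseInt) bool {
	return j < i // order is reversed
}

// SortDescending is a hypothetical helper. It has to copy the data into the
// wrapper type and back, because []int cannot be passed to Sort directly.
func SortDescending(a []int) {
	wrapped := make([]ReverseInt, len(a))
	for i, v := range a {
		wrapped[i] = ReverseInt(v)
	}
	Sort(wrapped)
	for i, v := range wrapped {
		a[i] = int(v)
	}
}

func main() {
	a := []int{42, 23, 1337}
	SortDescending(a)
	fmt.Println(a) // [1337 42 23]
}
```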

Passing a comparison function

A simple way to get flexibility is to have the user pass us a function used for comparison directly:

func Sort[T any](s []T, less func(T, T) bool) {
  // …
}

type SearchTree[T any] struct {
  Less func(T, T) bool
  // …
}

func NewSearchTree[T any](less func(T, T) bool) *SearchTree[T] {
  // …
  return &SearchTree[T]{
    Less: less,
    // …
  }
}

This essentially abandons the idea of type constraints altogether. Our code works with any type and we directly pass around the custom behavior as funcs. Type parameters are only used to ensure that the arguments to those funcs are compatible.

The advantage of this is maximum flexibility. Any type which already has a Less method like above can simply be used with this directly by using method expressions. Regardless of how the method is actually named:

func main() {
  a := []time.Time{ /* … */ }
  Sort(a, time.Time.Before)
}

There is also no boilerplate needed to customize sorting behavior:

func main() {
  a := []int{42,23,1337}
  Sort(a, func(i, j int) bool {
    return j < i // reversed order
  })
}

And you can provide helpers for common customizations:

func Reversed[T any](less func(T, T) bool) (greater func(T, T) bool) {
  return func(a, b T) bool { return less(b, a) }
}
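
Put together, these pieces can be tried out with a sketch like the following. The insertion sort body is an assumption made only to get something runnable; Sort and Reversed have the signatures shown above.

```go
package main

import (
	"fmt"
	"time"
)

// Sort sorts s in place using insertion sort and the given less function.
func Sort[T any](s []T, less func(T, T) bool) {
	for i := 1; i < len(s); i++ {
		for j := i; j > 0 && less(s[j], s[j-1]); j-- {
			s[j], s[j-1] = s[j-1], s[j]
		}
	}
}

// Reversed returns a comparison function with the opposite order of less.
func Reversed[T any](less func(T, T) bool) (greater func(T, T) bool) {
	return func(a, b T) bool { return less(b, a) }
}

func main() {
	a := []int{42, 23, 1337}
	Sort(a, Reversed(func(i, j int) bool { return i < j }))
	fmt.Println(a) // [1337 42 23]

	// A method expression works regardless of the method's name.
	base := time.Date(2022, time.May, 23, 0, 0, 0, 0, time.UTC)
	ts := []time.Time{base.Add(2 * time.Hour), base, base.Add(time.Hour)}
	Sort(ts, time.Time.Before)
	fmt.Println(ts[0].Equal(base)) // true
}
```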

This approach is arguably also more correct than the one above because it decouples the type from the comparison used. If I use a SearchTree as a set datatype, there is no real reason why the elements in the set would be specific to the comparison used. It should be “a set of string” not “a set of MyCustomlyOrderedString”. This reflects the fact that with the method constraint, we have to convert back-and-forth when putting things into the container or taking it out again.

The main disadvantage of this approach is that it means you can not have useful zero values. Your SearchTree type needs the Less field to be populated to work. So its zero value can not be used to represent an empty set.

You cannot even lazily initialize it (which is a common trick to make types which need initialization have a useful zero value) because you don’t know what it should be.

Comparator types

There is a way to pass a function “statically”. That is, instead of passing around a func value, we can pass it as a type argument. The way to do that is to attach it as a method to a struct{} type:

import "golang.org/x/exp/slices"

type IntComparator struct{}

func (IntComparator) Less(a, b int) bool {
  return a < b
}

func main() {
  a := []int{42,23,1337}
  less := IntComparator{}.Less // has type func(int, int) bool
  slices.SortFunc(a, less)
}

Based on this, we can devise a mechanism to allow custom comparisons:

// Comparator is a helper type used to compare two T values.
type Comparator[T any] interface {
  ~struct{}
  Less(a, b T) bool
}

func Sort[C Comparator[T], T any](a []T) {
  var c C
  less := c.Less // has type func(T, T) bool
  // …
}

type SearchTree[C Comparator[T], T any] struct {
  // …
}

The ~struct{} constrains any implementation of Comparator[T] to have underlying type struct{}. It is not strictly necessary, but it serves two purposes here:

  1. It makes clear that Comparator[T] itself is not supposed to carry any state. It only exists to have its method called.
  2. It ensures (as much as possible) that the zero value of C is safe to use. In particular, without it, Comparator[T] would be a normal interface type. It would have a Less method of the right type, so it would implement itself. But a zero Comparator[T] is nil and would always panic if its method is called.

An implication of this is that it is not possible to have a Comparator[T] which uses an arbitrary func value. The Less method can not rely on having access to a func to call, for this approach to work.

But you can provide other helpers. This can also be used to combine this approach with the above ones:

type LessOperator[T constraints.Ordered] struct{}

func (LessOperator[T]) Less(a, b T) bool {
  return a < b
}

type LessMethod[T Lesser[T]] struct{}

func (LessMethod[T]) Less(a, b T) bool {
  return a.Less(b)
}

type Reversed[C Comparator[T], T any] struct{}

func (Reversed[C, T]) Less(a, b T) bool {
  var c C
  return c.Less(b, a)
}

The advantage of this approach is that it makes the zero value of SearchTree[C, T] useful. For example, a SearchTree[LessOperator[int], int] can be used directly, without extra initialization.

It also carries over the advantage of decoupling the comparison from the element type, which we got from accepting comparison functions.

One disadvantage is that the comparator can never be inferred. It always has to be specified in the instantiation explicitly[1]. That’s similar to how we always had to pass a less function explicitly above.

Another disadvantage is that this always requires defining a type for comparisons. Where with the comparison function we could define customizations (like reversing the order) inline with a func literal, this mechanism always requires a method.

Lastly, this is arguably too clever for its own good. Understanding the purpose and idea behind the Comparator type is likely to trip up your users when reading the documentation.
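
For comparison with the function-based version, here is a sketch of a tree using comparator types. The Ordered constraint is abbreviated and declared locally, and the node type, Insert and Elements are assumptions made for this example; Comparator, LessOperator and Reversed follow the definitions above.

```go
package main

import "fmt"

// Ordered is an abbreviated local stand-in for constraints.Ordered.
type Ordered interface {
	~int | ~int64 | ~float64 | ~string
}

type Comparator[T any] interface {
	~struct{}
	Less(a, b T) bool
}

type LessOperator[T Ordered] struct{}

func (LessOperator[T]) Less(a, b T) bool { return a < b }

type Reversed[C Comparator[T], T any] struct{}

func (Reversed[C, T]) Less(a, b T) bool {
	var c C
	return c.Less(b, a)
}

type node[T any] struct {
	val         T
	left, right *node[T]
}

// SearchTree has no fields that need initialization.
type SearchTree[C Comparator[T], T any] struct {
	root *node[T]
}

// Insert adds v, using the zero value of C for all comparisons.
func (t *SearchTree[C, T]) Insert(v T) {
	var c C
	p := &t.root
	for *p != nil {
		n := *p
		switch {
		case c.Less(v, n.val):
			p = &n.left
		case c.Less(n.val, v):
			p = &n.right
		default:
			return // already present
		}
	}
	*p = &node[T]{val: v}
}

// Elements returns the stored values in the comparator's order.
func (t *SearchTree[C, T]) Elements() []T {
	var out []T
	var walk func(n *node[T])
	walk = func(n *node[T]) {
		if n == nil {
			return
		}
		walk(n.left)
		out = append(out, n.val)
		walk(n.right)
	}
	walk(t.root)
	return out
}

func main() {
	// Both zero values are usable without a constructor.
	var asc SearchTree[LessOperator[int], int]
	var desc SearchTree[Reversed[LessOperator[int], int], int]
	for _, v := range []int{42, 23, 1337} {
		asc.Insert(v)
		desc.Insert(v)
	}
	fmt.Println(asc.Elements())  // [23 42 1337]
	fmt.Println(desc.Elements()) // [1337 42 23]
}
```

Note how verbose the instantiation of desc is. That is the price for having the comparison in the type.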

Summary

We are left with these trade-offs:

                    constraints.Ordered  Lesser[T]  func(T,T) bool  Comparator[T]
Predeclared types   👍                   👎         👎              👎
Composite types     👎                   👍         👍              👍
Custom order        👎                   👍         👍              👍
Reversal helpers    👍                   👎         👍              👍
Type boilerplate    👍                   👎         👍              👎
Useful zero value   👍                   👍         👎              👍
Type inference      👍                   👍         👍              👎
Coupled Type/Order  👎                   👎         👍              👍
Clarity             👍                   🤷[2]      👍              👎

One thing standing out in this table is that there is no way to both support predeclared types and support user defined types.

It would be great if there was a way to support multiple of these mechanisms using the same code. That is, it would be great if we could write something like

// Ordered is a constraint to allow a type to be sorted.
// If a Less method is present, it takes precedence.
type Ordered[T any] interface {
  constraints.Ordered | Lesser[T]
}

Unfortunately, allowing this is harder than one might think.

Until then, you might want to provide multiple APIs to allow your users more flexibility. The standard library currently seems to be converging on providing a constraints.Ordered version and a comparison function version. The latter gets a Func suffix to the name. See the experimental slices package for an example.


  1. Though as we put the Comparator[T] type parameter first, we can infer T from the Comparator. ↩︎

  2. It’s a little bit worse, but probably fine. ↩︎

at 2022-05-23 17:34

2022-05-16

Mero’s Blog

Calculating type sets is harder than you think

Go 1.18 added the biggest and probably one of the most requested features of all time to the language: Generics. If you want a comprehensive introduction to the topic, there are many out there and I would personally recommend this talk I gave at the Frankfurt Gopher Meetup.

This blog post is not an introduction to generics, though. It is about this sentence from the spec:

Implementation restriction: A compiler need not report an error if an operand’s type is a type parameter with an empty type set.

As an example, consider this interface:

type C interface {
  int
  M()
}

This constraint can never be satisfied. It says that a type has to be both the predeclared type int and have a method M(). But predeclared types in Go do not have any methods. So there is no type satisfying C and its type set is empty. The compiler accepts it just fine, though. That is what this clause from the spec is about.

This decision might seem strange to you. After all, if a type set is empty, it would be very helpful to report that to the user. They obviously made a mistake - an empty type set can never be used as a constraint. A function using it could never be instantiated.

I want to explain why that sentence is there and also go into a couple of related design decisions of the generics design. I’m trying to be expansive in my explanation, which means that you should not need any special knowledge to understand it. It also means, some of the information might be boring to you - feel free to skip the corresponding sections.

That sentence is in the Go spec because it turns out to be hard to determine if a type set is empty. Hard enough, that the Go team did not want to require an implementation to solve that. Let’s see why.

P vs. NP

When we talk about whether or not a problem is hard, we often group problems into two big classes:

  1. Problems which can be solved reasonably efficiently. This class is called P.
  2. Problems which can be verified reasonably efficiently. This class is called NP.

The first obvious follow up question is “what does ‘reasonably efficient’ mean?”. The answer to that is “there is an algorithm with a running time polynomial in its input size”[1].

The second obvious follow up question is “what’s the difference between ‘solving’ and ‘verifying’?”.

Solving a problem means what you think it means: Finding a solution. If I give you a number and ask you to solve the factorization problem, I’m asking you to find a (non-trivial) factor of that number.

Verifying a problem means that I give you a solution and I’m asking you if the solution is correct. For the factorization problem, I’d give you two numbers and ask you to verify that the second is a factor of the first.

These two things are often very different in difficulty. If I ask you to give me a factor of 297863737, you probably know no better way than to sit down and try to divide it by a lot of numbers and see if it comes out evenly. But if I ask you to verify that 9883 is a factor of that number, you just have to do a bit of long division and it either divides it, or it does not.
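
As a rough sketch of that difference in code: FindFactor and IsFactor are hypothetical names, and trial division stands in for “no better way than to try a lot of numbers”. Its loop runs up to the square root of n, which is exponential in the number of digits of n, while the verification is a single division.

```go
package main

import "fmt"

// FindFactor solves the problem by trial division. It returns a non-trivial
// factor of n, or 0 if there is none (n is prime or smaller than 4).
func FindFactor(n uint64) uint64 {
	for d := uint64(2); d*d <= n; d++ {
		if n%d == 0 {
			return d
		}
	}
	return 0
}

// IsFactor verifies a proposed solution: it reports whether f is a
// non-trivial factor of n.
func IsFactor(n, f uint64) bool {
	return f > 1 && f < n && n%f == 0
}

func main() {
	const n = 297863737

	// Verifying takes one division.
	fmt.Println(IsFactor(n, 9883)) // true

	// Solving has to search.
	f := FindFactor(n)
	fmt.Println(f, IsFactor(n, f))
}
```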

It turns out that every problem which is efficiently solvable is also efficiently verifiable. You can just calculate the solution and compare it to the given one. So every problem in P is also in NP[2]. But it is a famously open question whether the opposite is true - that is, we don’t really know if there are problems which are hard to solve but easy to verify.

This is hard to know in general, because us not having found an efficient algorithm to solve a problem does not mean there is none. But in practice we usually assume that there are some problems like that.

One fact that helps us talk about hard problems, is that there are some problems which are as hard as possible in NP. That means we were able to prove that if you can solve one of these problems you can use that to solve any other problem in NP. These problems are called “NP-complete”.

That is, to be frank, plain magic and explaining it is far beyond my capabilities. But it helps us to tell if a given problem is hard, by doing it the other way around. If solving problem X would enable us to solve one of these NP-complete problems, then problem X is at least as hard as that problem (and, if X is itself in NP, also NP-complete) and therefore probably very hard. This is called a “proof by reduction”.

One example of such a problem is boolean satisfiability. And it is used very often to prove a problem is hard.

SAT

Imagine I give you a boolean function. The function has a bunch of bool arguments and returns bool, by joining its arguments with logical operators into a single expression. For example:

func F(x, y, z bool) bool {
  return ((!x && y) || z) && (x || !y)
}

If I give you values for these arguments, you can efficiently tell me if the formula evaluates to true or false. You just substitute them in and evaluate every operator. For example

F(true, true, false)
 → ((!true && true) || false) && (true || !true)
 → ((false && true) || false) && (true || !true)
 → ((false && true) || false) && (true || false)
 → ((false && true) || false) && true
 →  (false && true) || false
 →   false && true
 →   false

This takes at most one step per operator in the expression. So it takes a linear number of steps in the length of the input, which is very efficient.

But if I only give you the function and ask you to find arguments which make it return true - or even to find out whether such arguments exist - you probably have to try out all possible input combinations to see if any of them does. That’s easy for three arguments. But for \(n\) arguments there are \(2^n\) possible assignments, so it takes exponential time in the number of arguments.

The problem of finding arguments that make such a function return true (or proving that no such arguments exist) is called “boolean satisfiability” and it is NP-complete.

It is extremely important in what form the expression is given, though. Some forms make it pretty easy to solve, while others make it hard.

For example, every expression can be rewritten into what is called a “Disjunctive Normal Form” (DNF). It is called that because it consists of a series of conjunction (&&) terms, joined together by disjunction (||) operators[3]:

func F_DNF(x, y, z bool) bool {
  return (x && z) || (!y && z)
}

(You can verify that this is the same function as above, by trying out all 8 input combinations)

Each term has a subset of the arguments, possibly negated, joined by &&. The terms are then joined together using ||.

Solving the satisfiability problem for an expression in DNF is easy:

  1. Go through the individual terms. || is true if and only if either of its operands is true. So for each term:
    • If it contains both an argument and its negation (x && !x) it can never be true. Continue to the next term.
    • Otherwise, you can infer valid arguments from the term:
      • If it contains x, then we must pass true for x
      • If it contains !x, then we must pass false for x
      • If it contains neither, then what we pass for x does not matter and either value works.
    • The term then evaluates to true with these arguments, so the entire expression does.
  2. If none of the terms can be made true, the function can never return true and there is no valid set of arguments.
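
The steps above can be sketched in Go as follows. Literal, Term and SolveDNF are names made up for this example. The function looks at every literal once, so its running time is linear in the size of the formula.

```go
package main

import "fmt"

// Literal is a variable, possibly negated.
type Literal struct {
	Var     string
	Negated bool
}

// Term is a conjunction (&&) of literals.
type Term []Literal

// SolveDNF takes a disjunction (||) of terms. It returns an assignment that
// makes the formula true and true, or nil and false if there is none.
// Variables missing from the returned map can take either value.
func SolveDNF(terms []Term) (map[string]bool, bool) {
	for _, term := range terms {
		assign := make(map[string]bool)
		ok := true
		for _, lit := range term {
			want := !lit.Negated
			if prev, seen := assign[lit.Var]; seen && prev != want {
				// The term contains both x and !x, so it can never be true.
				ok = false
				break
			}
			assign[lit.Var] = want
		}
		if ok {
			return assign, true
		}
	}
	return nil, false
}

func main() {
	// (x && z) || (!y && z)
	dnf := []Term{
		{{Var: "x"}, {Var: "z"}},
		{{Var: "y", Negated: true}, {Var: "z"}},
	}
	fmt.Println(SolveDNF(dnf)) // map[x:true z:true] true

	// (x && !x)
	fmt.Println(SolveDNF([]Term{{{Var: "x"}, {Var: "x", Negated: true}}})) // map[] false
}
```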

On the other hand, there is also a “Conjunctive Normal Form” (CNF). Here, the expression is a series of disjunction (||) terms, joined together with conjunction (&&) operators:

func F_CNF(x, y, z bool) bool {
  return (!x || z) && (y || z) && (x || !y)
}

(Again, you can verify that this is the same function)
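
If you don’t want to check the 8 combinations by hand, a brute-force sketch like this does it. Satisfying is a hypothetical helper that tries every assignment, which is exactly the exponential approach that stops working for large numbers of arguments.

```go
package main

import "fmt"

func F(x, y, z bool) bool {
	return ((!x && y) || z) && (x || !y)
}

func F_DNF(x, y, z bool) bool {
	return (x && z) || (!y && z)
}

func F_CNF(x, y, z bool) bool {
	return (!x || z) && (y || z) && (x || !y)
}

// Satisfying tries all 2^3 assignments and returns those for which f is true.
func Satisfying(f func(x, y, z bool) bool) [][3]bool {
	var out [][3]bool
	for i := 0; i < 8; i++ {
		x, y, z := i&4 != 0, i&2 != 0, i&1 != 0
		if f(x, y, z) {
			out = append(out, [3]bool{x, y, z})
		}
	}
	return out
}

func main() {
	a, b, c := Satisfying(F), Satisfying(F_DNF), Satisfying(F_CNF)
	fmt.Println(len(a), len(b), len(c)) // 3 3 3
	fmt.Println(a)                      // [[false false true] [true false true] [true true true]]
}
```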

For this, the idea of our algorithm does not work. To find a solution, you have to take all terms into account simultaneously. You can’t just tackle them one by one. In fact, solving satisfiability on CNF (often abbreviated as “CNFSAT”) is NP-complete[4].

It turns out that every boolean function can be written as a single expression using only ||, && and !. In particular, every boolean function has a DNF and a CNF.

Very often, when we want to prove a problem is hard, we do so by reducing CNFSAT to it. That’s what we will do for the problem of calculating type sets. But there is one more preamble we need.

Sets and Satisfiability

There is an important relationship between sets and boolean functions.

Say we have a type T and a Universe which contains all possible values of T. If we have a func(T) bool, we can create a set from that, by looking at all objects for which the function returns true:

var Universe Set[T]

func MakeSet(f func(T) bool) Set[T] {
  s := make(Set[T])
  for v := range Universe {
    if f(v) {
      s.Add(v)
    }
  }
  return s
}

This set contains exactly all elements for which f is true. So calculating f(v) is equivalent to checking s.Contains(v). And checking if s is empty is equivalent to checking if f can ever return true.

We can also go the other way around:

func MakeFunc(s Set[T]) func(T) bool {
  return func(v T) bool {
    return s.Contains(v)
  }
}

So in a sense func(T) bool and Set[T] are “the same thing”. We can transform a question about one into a question about the other and back.

As we observed above it is important how a boolean function is given. To take that into account we have to also convert boolean operators into set operations:

// Union(s, t) contains all elements which are in s *or* in t.
func Union(s, t Set[T]) Set[T] {
  return MakeSet(func(v T) bool {
    return s.Contains(v) || t.Contains(v)
  })
}

// Intersect(s, t) contains all elements which are in s *and* in t.
func Intersect(s, t Set[T]) Set[T] {
  return MakeSet(func(v T) bool {
    return s.Contains(v) && t.Contains(v)
  })
}

// Complement(s) contains all elements which are *not* in s.
func Complement(s Set[T]) Set[T] {
  return MakeSet(func(v T) bool {
    return !s.Contains(v)
  })
}

And back:

// Or creates a function which returns if f or g is true.
func Or(f, g func(T) bool) func(T) bool {
  return MakeFunc(Union(MakeSet(f), MakeSet(g)))
}

// And creates a function which returns if f and g are true.
func And(f, g func(T) bool) func(T) bool {
  return MakeFunc(Intersect(MakeSet(f), MakeSet(g)))
}

// Not creates a function which returns if f is false
func Not(f func(T) bool) func(T) bool {
  return MakeFunc(Complement(MakeSet(f)))
}

The takeaway from all of this is that constructing a set using Union, Intersect and Complement is really the same as writing a boolean function using ||, && and !.

And proving that a set constructed in this way is empty is the same as proving that a corresponding boolean function is never true.

And because checking that a boolean function is never true is NP-complete, so is checking if one of the sets constructed like this is empty.
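
The snippets in this section treat T and Set[T] loosely. As a runnable sketch of the same correspondence, the following fixes T to int and assumes a tiny universe of the numbers 0 through 9; the Set type and its Contains method are assumptions for this example.

```go
package main

import "fmt"

// Set is a set of ints, standing in for Set[T] from the text.
type Set map[int]struct{}

func (s Set) Contains(v int) bool {
	_, ok := s[v]
	return ok
}

// Universe contains all values we consider: 0 through 9.
var Universe = makeUniverse()

func makeUniverse() Set {
	u := Set{}
	for i := 0; i < 10; i++ {
		u[i] = struct{}{}
	}
	return u
}

// MakeSet returns the set of all values in Universe for which f is true.
func MakeSet(f func(int) bool) Set {
	s := Set{}
	for v := range Universe {
		if f(v) {
			s[v] = struct{}{}
		}
	}
	return s
}

// MakeFunc returns the membership function of s.
func MakeFunc(s Set) func(int) bool {
	return func(v int) bool { return s.Contains(v) }
}

func Union(s, t Set) Set {
	return MakeSet(func(v int) bool { return s.Contains(v) || t.Contains(v) })
}

func Intersect(s, t Set) Set {
	return MakeSet(func(v int) bool { return s.Contains(v) && t.Contains(v) })
}

func Complement(s Set) Set {
	return MakeSet(func(v int) bool { return !s.Contains(v) })
}

func main() {
	even := MakeSet(func(v int) bool { return v%2 == 0 })

	// "even && !even" is never true, so the corresponding set is empty.
	empty := Intersect(even, Complement(even))
	fmt.Println(len(even), len(empty)) // 5 0
}
```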

With this, let us look at the specific sets we are interested in.

Basic interfaces as type sets

Interfaces in Go are used to describe sets of types. For example, the interface

type S interface {
    X()
    Y()
    Z()
}

is “the set of all types which have a method X() and a method Y() and a method Z()”.

We can also express set intersection, using interface embedding:

type S interface { X() }
type T interface { Y() }
type U interface {
    S
    T
}

This expresses the intersection of S and T as an interface. Or we can view the property “has a method X()” as a boolean variable and think of this as the formula x && y.

Surprisingly, there is also a limited form of negation. It happens implicitly, because a type can not have two different methods with the same name. Implicitly, if a type has a method X() it does not have a method X() int for example:

type X interface { X() }
type NotX interface{ X() int }

There is a small snag: A type can have neither a method X() nor have a method X() int. That’s why our negation operator is limited. Real boolean variables are always either true or false, whereas our negation also allows them to be neither. In mathematics we say that this logic language lacks the law of the excluded middle (also called “Tertium Non Datur” - “there is no third”). For this section, that does not matter. But we have to worry about it later.

Because we have intersection and negation, we can express interfaces which could never be satisfied by any type (i.e. which describe an empty type set):

interface{ X; NotX }

The compiler rejects such interfaces. But how can it do that? Did we not say above that checking if a set is empty is NP-complete?

The reason this works is that we only have negation and conjunction (&&). So all the boolean expressions we can build with this language have the form

x && y && !z

These expressions are in DNF! We have a term, which contains a couple of variables - possibly negated - and joins them together using &&. We don’t have ||, so there is only a single term.

Solving satisfiability in DNF is easy, as we said. So with the language as we have described it so far, we can only express type sets which are easy to check for emptiness.

Adding unions

Go 1.18 extends the interface syntax. For our purposes, the important addition is the | operator:

type S interface{
    A | B
}

This represents the set of all types which are in the union of the type sets A and B - that is, it is the set of all types which are in A or in B (or both).

This means our language of expressible formulas now also includes a ||-operator - we have added set unions and set unions are equivalent to || in the language of formulas. What’s more, the form of our formula is now a conjunctive normal form - every line is a term of || and the lines are connected by &&:

type X interface { X() }
type NotX interface{ X() int }
type Y interface { Y() }
type NotY interface{ Y() int }
type Z interface { Z() }
type NotZ interface{ Z() int }

// (!x || z) && (y || z) && (x || !y)
type S interface {
    NotX | Z
    Y | Z
    X | NotY
}

This is not quite enough to prove NP-completeness though, because of the snag above. If we want to prove that it is easy, it does not matter that a type can have neither method. But if we want to prove that it is hard, we really need an exact equivalence between boolean functions and type sets. So we need to guarantee that a type has one of our two contradictory methods.

“Luckily”, the | operator gives us a way to fix that:

type TertiumNonDatur interface {
    X | NotX
    Y | NotY
    Z | NotZ
}

// (!x || z) && (y || z) && (x || !y)
type S interface {
    TertiumNonDatur

    NotX | Z
    Y | Z
    X | NotY
}

Now any type which could possibly implement S must have either an X() or an X() int method, because it must implement TertiumNonDatur as well. So this extra interface helps us to get the law of the excluded middle into our language of type sets.

With this, checking if a type set is empty is in general as hard as checking if an arbitrary boolean formula in CNF has no solution. As described above, that is NP-complete.

Even worse, we want to define which operations are allowed on a type parameter by saying that it is allowed if every type in a type set supports it. However, that check is also NP-complete.

The easy way to prove that is to observe that if a type set is empty, every operator should be allowed on a type parameter constrained by it, because any statement about “every element of the empty set” is true[5].

But this would mean that type-checking a generic function would be NP-complete. If an operator is used, we have to at least check if the type set of its constraint is empty. Which is NP-complete.

Why do we care?

A fair question is “why do we even care? Surely these cases are super exotic. In any real program, checking this is trivial”.

That’s true, but there are still reasons to care:

  • Go has the goal of having a fast compiler. And importantly, one which is guaranteed to be fast for any program. If I give you a Go program, you can be reasonably sure that it compiles quickly, in a time frame predictable by the size of the input.

    If I can craft a program which compiles slowly - and may take longer than the lifetime of the universe - this is no longer true.

    This is especially important for environments like the Go playground, which regularly compiles untrusted code.

  • NP-complete problems are notoriously hard to debug if they fail.

    If you use Linux, you might have occasionally run into a problem where you accidentally tried installing conflicting versions of some package. And if so, you might have noticed that your computer first chugged along for a while and then gave you an unhelpful error message about the conflict. And maybe you had trouble figuring out which packages declared the conflicting dependencies.

    This is typical for NP-complete problems. As an exact solution is often too hard to compute, they rely on heuristics and randomization and it’s hard to work backwards from a failure.

  • We generally don’t want the correctness of a Go program to depend on the compiler used. That is, a program should not suddenly stop compiling because you used a different compiler or the compiler was updated to a new Go version.

    But NP-complete problems don’t allow us to calculate an exact solution. They always need some heuristic (even if it is just “give up after a bit”). If we don’t want the correctness of a program to be implementation defined, that heuristic must become part of the Go language specification. But these heuristics are very complex to describe. So we would have to spend a lot of room in the spec for something which does not give us a very large benefit.

Note that Go also decided to restrict the version constraints a go.mod file can express, for exactly the same reasons. Go has a clear priority, not to require too complicated algorithms in its compilers and tooling. Not because they are hard to implement, but because the behavior of complicated algorithms also tends to be hard to understand for humans.

So requiring an implementation to solve an NP-complete problem is out of the question.

The fix

Given that there must not be an NP-complete problem in the language specification and given that Go 1.18 was released, this problem must have somehow been solved.

What changed is that the language for describing interfaces was limited from what I described above. Specifically

Implementation restriction: A union (with more than one term) cannot contain the predeclared identifier comparable or interfaces that specify methods, or embed comparable or interfaces that specify methods.

This disallows the main mechanism we used to map formulas to interfaces above. We can no longer express our TertiumNonDatur type, or the individual | terms of the formula, as the respective terms specify methods. Without specifying methods, we can’t get our “implicit negation” to work either.

The hope is that this change (among a couple of others) is sufficient to ensure that we can always calculate type sets accurately. Which means I pulled a bit of a bait-and-switch: I said that calculating type sets is hard. But as they were actually released, they might not be.

The reason I wrote this blog post anyways is to explain the kind of problems that exist in this area. It is easy to say we have solved this problem once and for all.

But to be certain, someone should prove this - either by writing a proof that the problem is still hard or by writing an algorithm which solves it efficiently.

There are also still discussions about changing the generics design. As one example, the limitations we introduced to fix all of this made one of the use cases from the design doc impossible to express. We might want to tweak the design to allow this use case. We have to look out in these discussions, so we don’t re-introduce NP-completeness. It took us some time to even detect it when the union operator was proposed.

And there are other kinds of “implicit negations” in the Go language. For example, a struct can not have both a field and a method with the same name. Or being one type implies not being another type (so interface{int} implicitly negates interface{string}).

All of which is to say that even if the problem might no longer be NP-complete - I hope that I convinced you it is still more complicated than you might have thought.

If you want to discuss this further, you can find links to my social media on the bottom of this site.


I want to thank my beta-readers for helping me improve this article. Namely arnehormann, @johanbrandhorst, @mvdan_, @_myitcv, @readcodesing, @rogpeppe and @zekjur.

They took a frankly unreasonable chunk of time out of their day. And their suggestions were invaluable.


  1. It should be pointed out, though, that “polynomial” can still be extremely inefficient. \(n^{1000}\) still grows extremely fast, but is polynomial. And for many practical problems, even \(n^3\) is intolerably slow. But for complicated reasons, there is a qualitatively important difference between “polynomial” and “exponential”[6] run time. So you just have to trust me that the distinction makes sense. ↩︎

  2. These names might seem strange, by the way. P is easy to explain: It stands for “polynomial”.

    NP doesn’t mean “not polynomial” though. It means “non-deterministic polynomial”. A non-deterministic computer, in this context, is a hypothetical machine which can run arbitrarily many computations simultaneously. A problem which can be verified efficiently by any computer can be solved efficiently by a non-deterministic one. It just tries out all possible solutions at the same time and returns a correct one.

    Thus, being able to verify a problem on a normal computer means being able to solve it on a non-deterministic one. That is why the two definitions of NP “verifiable by a classical computer” and “solvable by a non-deterministic computer” mean the same thing. ↩︎

  3. You might complain that it is hard to remember if the “disjunctive normal form” is a disjunction of conjunctions, or a conjunction of disjunctions - and that no one can remember which of these means && and which means || anyways.

    You would be correct. ↩︎

  4. You might wonder why we can’t just solve CNFSAT by transforming the formula into DNF and solving that.

    The answer is that the transformation can make the formula exponentially larger. So even though solving the problem on DNF is linear in the size of the DNF formula, that size is exponential in the size of the CNF formula. So we still use exponential time in the size of the CNF formula. ↩︎

  5. This is called the principle of explosion or “ex falso quodlibet” (“from falsehood follows anything”).

    Many people - including many first year math students - have anxieties and confusion around this principle and feel that it makes no sense. So I have little hope that I can make it palatable to you. But it is extremely important for mathematics to “work” and it really is the most reasonable way to set things up.

    Sorry. ↩︎

  6. Yes, I know that there are complexity classes between polynomial and exponential. Allow me the simplification. ↩︎

at 2022-05-16 09:33

2022-05-14

sECuREs website

25 Gbit/s HTTP and HTTPS download speeds

Now that I recently upgraded my internet connection to 25 Gbit/s, I was curious how hard or easy it is to download files via HTTP and HTTPS over a 25 Gbit/s link. I don’t have another 25 Gbit/s connected machine other than my router, so I decided to build a little lab for tests like these 🧑‍🔬

Hardware and Software setup

I found a Mellanox ConnectX-4 Lx for the comparatively low price of 204 CHF on digitec:

To connect it to my router, I ordered a MikroTik XS+DA0003 SFP28/SFP+ Direct Attach Cable (DAC) with it. I installed the network card into my old workstation (on the right) and connected it with the 25 Gbit/s DAC to router7 (on the left):

25 Gbit/s router (left)

Component Model
Mainboard ASRock B550 Taichi
CPU AMD Ryzen 5 5600X 6-Core Processor
Network card Intel XXV710
Linux Linux 5.17.4 (router7)
curl 7.83.0 from debian bookworm
Go net/http from Go 1.18

router7 comes with TCP BBR enabled by default.

Old workstation (right)

Component Model
Mainboard ASUS PRIME Z370-A
CPU Intel i9-9900K CPU @ 3.60GHz
Network card Mellanox ConnectX-4
Linux 5.17.5 (Arch Linux)
nginx 1.21.6
caddy 2.4.3

Test preparation

Before taking any measurements, I do one full download so that the file contents are entirely in the Linux page cache, and the measurements therefore no longer contain the speed of the disk.

big.img in the tests below refers to the 35 GB test file I’m downloading, which consists of distri-disk.img repeated 5 times.

T1: HTTP download speed (unencrypted)

T1.1: Single TCP connection

The simplest test is using just a single TCP connection, for example:

curl -v -o /dev/null http://oldmidna:8080/distri/tmp/big.img
./httpget25 http://oldmidna:8080/distri/tmp/big.img
Client  Server  Gbit/s
curl    nginx   23.4
curl    caddy   23.4
Go      nginx   20
Go      caddy   20.2

curl can saturate a 25 Gbit/s link without any trouble.

The Go net/http package is slower and comes in at 20 Gbit/s.

T1.2: Multiple TCP connections

Running 4 of these downloads concurrently is a reliable and easy way to saturate a 25 Gbit/s link:

for i in $(seq 1 4)
do
  curl -v -o /dev/null http://oldmidna:8080/distri/tmp/big.img &
done
Client  Server  Gbit/s
curl    nginx   23.4
curl    caddy   23.4
Go      nginx   23.4
Go      caddy   23.4
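The loop above only covers curl. For the Go client, a comparable setup could look like the following sketch, which starts several downloads as goroutines and waits for all of them. This is an assumption about how one might do it, not the program used for the measurements: downloadParallel, fetch and newLocalServer are hypothetical names, and the local httptest server only exists so that the example runs without the lab machines.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"sync"
)

// fetch downloads url, discards the body and returns the number of bytes read.
func fetch(url string) (int64, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected HTTP status: %v", resp.Status)
	}
	return io.Copy(io.Discard, resp.Body)
}

// downloadParallel fetches url n times concurrently and returns the total
// number of bytes received, plus the first error encountered (if any).
func downloadParallel(url string, n int) (int64, error) {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex // protects total and firstErr
		total    int64
		firstErr error
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			written, err := fetch(url)
			mu.Lock()
			defer mu.Unlock()
			total += written
			if err != nil && firstErr == nil {
				firstErr = err
			}
		}()
	}
	wg.Wait()
	return total, firstErr
}

// newLocalServer is a stand-in for the lab web server (hypothetical helper):
// it serves size zero bytes for every request.
func newLocalServer(size int) *httptest.Server {
	payload := make([]byte, size)
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write(payload)
	}))
}

func main() {
	srv := newLocalServer(1 << 20)
	defer srv.Close()

	total, err := downloadParallel(srv.URL, 4)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("bytes received:", total)
}
```

To try it against a real server, replace srv.URL with the download URL. Over plain HTTP/1.1 each in-flight request occupies its own TCP connection, which is what this test is about.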

T2: HTTPS download speed (encrypted)

At link speeds this high, enabling TLS slashes bandwidth in half or worse.

Using 4 TCP connections allows saturating a 25 Gbit/s link.

Caddy uses more CPU to serve files compared to nginx.

T2.1: Single TCP connection

This test works the same as T1.1, but with an HTTPS URL:

curl -v -o /dev/null --insecure https://oldmidna:8443/distri/tmp/big.img
./httpget25 https://oldmidna:8443/distri/tmp/big.img
Client  Server  Gbit/s
curl    nginx   8
curl    caddy   7.5
Go      nginx   12
Go      caddy   7.2

T2.2: Multiple TCP connections

This test works the same as T1.2, but with an HTTPS URL:

for i in $(seq 1 4)
do
  curl -v -o /dev/null --insecure https://oldmidna:8443/distri/tmp/big.img &
done

Curiously, the Go net/http client downloading from caddy cannot saturate a 25 Gbit/s link.

Client  Server  Gbit/s
curl    nginx   23.4
curl    caddy   23.4
Go      nginx   23.4
Go      caddy   21.6

T3: HTTPS with Kernel TLS (KTLS)

Linux 4.13 got support for Kernel TLS back in 2017.

nginx 1.21.4 introduced support for Kernel TLS, and they have a blog post on how to configure it.

In terms of download speeds, there is no difference with or without KTLS. But, enabling KTLS noticeably reduces CPU usage, from ≈10% to a steady 2%.

For even newer network cards such as the Mellanox ConnectX-6, the kernel can even offload TLS onto the network card!

T3.1: Single TCP connection

Client  Server  Gbit/s
curl    nginx   8
Go      nginx   12

T3.2: Multiple TCP connections

Client  Server  Gbit/s
curl    nginx   23.4
Go      nginx   23.4

Conclusions

When downloading from nginx with 1 TCP connection, with TLS encryption enabled (HTTPS), the Go net/http client is faster than curl!

Caddy is slightly slower than nginx, which manifests itself in slower speeds with curl and even slower speeds with Go’s net/http.

To max out 25 Gbit/s, even when using TLS encryption, just use 3 or more connections in parallel. This helps with HTTP and HTTPS, with any combination of client and server.

Appendix

Go net/http test program httpget25.go
package main

import (
	"crypto/tls"
	"flag"
	"fmt"
	"io"
	"log"
	"net/http"
)

// httpget25 downloads every URL given on the command line and discards the data.
func httpget25() error {
	// Skip certificate verification, matching curl's --insecure in the tests.
	http.DefaultTransport.(*http.Transport).TLSClientConfig = &tls.Config{InsecureSkipVerify: true}

	for _, arg := range flag.Args() {
		resp, err := http.Get(arg)
		if err != nil {
			return err
		}
		if resp.StatusCode != http.StatusOK {
			resp.Body.Close()
			return fmt.Errorf("unexpected HTTP status code: want %v, got %v", http.StatusOK, resp.Status)
		}
		// Only the transfer speed matters, so throw the body away.
		_, err = io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	flag.Parse()
	if err := httpget25(); err != nil {
		log.Fatal(err)
	}
}
Caddy config file Caddyfile
{
  local_certs
  http_port 8080
  https_port 8443
}

http://oldmidna:8080 {
  file_server browse
}

https://oldmidna:8443 {
  file_server browse
}
nginx installation instructions
mkdir -p ~/lab25
cd ~/lab25

wget https://nginx.org/download/nginx-1.21.6.tar.gz
tar xf nginx-1.21.6.tar.gz

wget https://www.openssl.org/source/openssl-3.0.3.tar.gz
tar xf openssl-3.0.3.tar.gz

cd nginx-1.21.6
./configure --with-http_ssl_module --with-http_v2_module --with-openssl=$HOME/lab25/openssl-3.0.3 --with-openssl-opt=enable-ktls
make -j8
cd objs
./nginx -c nginx.conf -p $HOME/lab25
nginx config file nginx.conf
worker_processes  auto;

pid        logs/nginx.pid;

daemon off;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    access_log /home/michael/lab25/logs/access.log  combined;

    sendfile        on;
    sendfile_max_chunk 2m;

    keepalive_timeout  65;

    server {
        listen       8080;
        listen [::]:8080;
        server_name  localhost;

        root /srv/repo.distr1.org/;

        location / {
            index index.html index.htm;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root /usr/share/nginx/html;
        }

        location /distri {
            autoindex on;
        }
    }

    server {
        listen 8443 ssl;
        listen [::]:8443 ssl;
        server_name localhost;

        ssl_certificate nginx-ecc-p256.pem;
        ssl_certificate_key nginx-ecc-p256.key;

        #ssl_conf_command Options KTLS;

        ssl_buffer_size 32768;
        ssl_protocols TLSv1.3;

        root /srv/repo.distr1.org/;

        location / {
            index index.html index.htm;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root /usr/share/nginx/html;
        }

        location /distri {
            autoindex on;
        }
    }
}

at 2022-05-14 14:18

2022-04-23

sECuREs website

My upgrade to 25 Gbit/s Fiber To The Home

My favorite internet service provider, init7, is rolling out faster speeds with their infrastructure upgrade. Last week, the point of presence (POP) that my apartment’s fiber connection terminates in was upgraded, so now I am enjoying a 25 Gbit/s fiber internet connection!

My first internet connections

(Feel free to skip right to the 25 Gbit/s announcement section, but I figured this would be a good point to reflect on the last 20 years of internet connections for me!)

The first internet connection that I consciously used was a symmetric DSL connection that my dad († 2020) shared between his home office and the rest of the house, which was around the year 2000. My dad was an early adopter and was connected to the internet well before then using dial up connections, but the SDSL connection in our second house was the first connection I remember using myself. It wasn’t particularly fast in terms of download speed — I think it delivered 256 kbit/s or something along those lines.

I encountered two surprises with this internet connection. The first surprise was that the upload speed (also 256 kbit/s — it was a symmetric connection) was faster than other people’s. At the time, even DSL connections with much higher download speeds were asymmetric (ADSL) and came with only 128 kbit/s upload. I learnt this while making first contact with file sharing: people kept asking me to stay online so that their transfers would complete more quickly.

The second surprise was the concept of a metered connection, specifically one where you pay more the more data you transfer. During the aforementioned file sharing experiments, it never crossed my mind that down- or uploading files could result in extra charges.

These two facts combined resulted in a 3000 € surprise bill for my dad!

Luckily, his approach to solving this problem wasn’t to restrict my internet usage, but rather to buy a cheap, separate ADSL flatrate line for the family (from Telekom, which he hated), while he kept the good SDSL metered line for his business.

I still vividly remember the first time that ADSL connection synchronized. It was a massive upgrade in download speed (768 kbit/s!), but a downgrade in upload speed (128 kbit/s). But, because it was a flatrate, it made possible new use cases for my dad, who would jump on this opportunity to download a number of CD images to upgrade the software of his SGI machines.

The different connection speeds and characteristics have always interested me, and I used several other connections over the years, all of which felt limiting. The ADSL connection at my parents’ place started at 1 Mbit/s, was upgraded first to 3 Mbit/s, then 6 Mbit/s, and eventually reached its limit at 16 Mbit/s. When I spent one semester in Ireland, I had a 9 Mbit/s ADSL connection, and then later in Zürich I started out with a 15 Mbit/s ADSL connection.

All of these connections have always felt limiting, like peeking through the keyhole to see a rich world behind, but not being able to open the door. We’ve had to set up (and tune) traffic shaping, and coordinate when large downloads were okay.

My first fiber connection

The dream was always to leave ADSL behind and get a fiber connection. The advantages are numerous: lower latency (ADSL came with 40 ms at the time), much higher bandwidth (possibly Gigabit/s?) and typically the connection was established via ethernet (instead of PPPoE). Most importantly, once the fiber is there, you can upgrade both ends to achieve higher speeds.

In Zürich, I managed to get a fiber connection set up in my apartment after fighting bureaucracy for many months. The issue was that there was no permission slip on file at Swisscom. Either the owner of my apartment never signed it to begin with, or it got lost. This is not a state that the online fiber availability checker can represent, but once you know it, the fix is easy: just have Swisscom send out the form again, have the owner sign it, and a few weeks later, you can order!

One wrinkle was that availability was only fixed in the Swisscom checker, and it was unclear when EWZ or other providers would get an updated data dump. Hence, I ordered Swisscom fiber to get things moving as quickly as possible, and figured I could switch to a different provider later.

Here’s a picture of when the electrician pulled the fiber from the building entry endpoint (BEP) in the basement into my flat, from March 2014:

Switching to fiber7

Only two months after I first got my fiber connection, init7 launched their fiber7 offering, and I switched from Swisscom to fiber7 as quickly as I could.

The switch was worth it in every single dimension:

  • Swisscom charged over 200 CHF per month for a 1 Gbit/s download, 100 Mbit/s upload fiber connection. fiber7 costs only 65 CHF per month and comes with a symmetric 1 Gbit/s connection. (Other providers had to follow, so now symmetric is standard.)
  • init7’s network performs much better than Swisscom’s: ping times dropped when I switched, and downloads are generally much faster. Note that this is with the same physical fiber line, so the difference is thanks to the numerous peerings that init7 maintains.
  • init7 gives you a static IPv6 prefix (if you want) for free, and even delegates reverse DNS to your servers of choice.
  • I enjoy init7’s unparalleled transparency. For example, check out the blog post about cost calculation if you’re ever curious if there could be a fiber7 POP in your area.

I have been very happy with my fiber7 connection ever since. What I wrote in 2014 regarding its performance remained true over the years — downloads were always fast for me, latencies were low, outages were rare (and came with good explanations).

I switched hardware multiple times over the years:

  • First, I started with the Ubiquiti EdgeRouter Lite which could handle the full Gigabit line rate (the MikroTik router I originally ordered maxed out at about 500 Mbit/s!).
  • In 2017, I switched to the Turris Omnia, an open hardware, open source software router that comes with automated updates.
  • In July 2018, after my connectivity was broken due to an incompatibility between the DHCPv6 client on the Turris Omnia and fiber7, I started developing my own router7 in Go, my favorite programming language, mostly for fun, but also as a proof of concept for some cool features I think routers should have. For example, you can retro-actively start up Wireshark and open up a live ring buffer of the last few hours of network configuration traffic.

Notably, init7 encourages people to use their preferred router (Router Freedom).

The 25 Gbit/s announcement

Over the years, other Swiss internet providers such as Swisscom and Salt introduced 10 Gbit/s offerings, so an obvious question was when init7 would follow suit.

People who were following init7 closely already knew that an infrastructure upgrade was coming. In 2020, init7 CEO Fredy Künzler disclosed that in 2021, init7 would start offering 10 Gbit/s.

What nobody expected before init7 announced it on their seventh birthday, however, was that init7 would offer not only 10 Gbit/s (Fiber7-X), but also 25 Gbit/s connections (Fiber7-X2)! 🤯

This was init7’s announcement on Twitter:

With this move, init7 has done it again: they introduced an offer that is better than anything else in the Swiss internet market, perhaps even world-wide!

One interesting aspect is init7’s so-called «MaxFix principle»: maximum speed for a fixed price. No matter if you’re using 1 Gbit/s or 25 Gbit/s, you pay the same monthly fee. init7’s approach is to make the maximum bandwidth available to you, limited only by your physical connection. This is such a breath of fresh air compared to other ISPs that think rate-limiting customers to ridiculously low speeds is somehow acceptable on an FTTH offering 🙄 (recent example).

If you’re curious about the infrastructure upgrade that enabled this change, check out init7’s blog post about their new POP infrastructure.

What for? The use-case

A common first reaction to fast network connections is the question: “For what do you need so much bandwidth?”

Interestingly enough, I heard this question as recently as last year, in the context of a Gigabit internet connection! Some people can’t imagine using more than 100 Mbit/s. And sure, from a certain perspective, I get it — that 100 Mbit/s connection will not be overloaded any time soon.

But, looking at when a line is overloaded is only one aspect to take into account when deciding how fast of a connection you want.

There is a lower limit where you notice your connection is slow. Back in 2014, a 2 Mbit/s connection was noticeably slow for regular web browsing. These days, even a 10 Mbit/s connection is noticeably slow when re-opening my browser and loading a few tabs in parallel.

So what should you get? A 100 Mbit/s line? 500 Mbit/s? 1000 Mbit/s? Personally, I like to not worry about it and just get the fastest line I can, to reduce any and all wait times as much as possible, whenever possible. It’s a freeing feeling! Here are a few specific examples:

  • If I have to wait only 17 minutes to download a PS5 game, that can make the difference between an evening waiting in frustration, or playing the title I’ve been waiting for.
  • If I can run a daily backup (over the internet) of all servers I care about without worrying that the transfers interfere with my work video calls, that gives me peace of mind.
  • If I can transfer a Debian Code Search index to my computer for debugging when needed, that might make the difference between being able to use the limited spare time I have to debug or improve Debian Code Search, or having to postpone that improvement until I find more time.

Aside from my distaste for waiting, a fast and reliable fiber connection enables self-hosting. In particular for my distri Linux project where I explore fast package installation, it’s very appealing to connect it to the internet on as fast a line as possible. I want to optimize all the parts: software architecture and implementation, hardware, and network connectivity. But, for my hobby project budget, getting even a 10 Gbit/s line at a server hoster is too expensive, let alone a 25 Gbit/s line!

Lastly, even if there isn’t really a need to have such a fast connection, I hope you can understand that after spending so many years of my life limited by slow connections, I’ll happily take the opportunity of a faster connection whenever I can. Especially at no additional monthly cost!

Getting ready

Right after the announcement dropped, I wanted to prepare my side of the connection and therefore ordered a MikroTik CCR2004, the only router that init7 lists as compatible. I returned the MikroTik CCR2004 shortly afterwards, mostly because of its annoying fan regulation (spins up to top speed for about 1 minute every hour or so), and also because MikroTik seems to have made no progress at all since I last used their products almost 10 years ago. Table-stakes features such as DNS resolution for hostnames within the local network are still not included!

I expect that more and more embedded devices with SFP28 slots (like the MikroTik CCR2004) will become available over the next few years (hopefully with better fan control!), but at the moment, the selection seems to be rather small.

For my router, I instead went with a custom PC build. Having more space available means I can run larger, slow-spinning fans that are not as loud. Plugging in two high-end Intel network cards (2 × 25 Gbit/s on one, 4 × 10 Gbit/s on the other) turns a PC into a 25 Gbit/s capable router.

With my equipment sorted out, I figured it was time to actually place the order. I wasn’t in a hurry to order, because it was clear that it would be months before my POP could be upgraded. But, it can’t hurt to register my interest (just in case it influences the POP upgrade plan). Shortly after, I got back this email from init7 where they promised to send me the SFP module via post:

And sure enough, a few days later, I received the SFP28 module in the mail:

With my router build, and the SFP28 module, I had everything I needed for my side of the connection.

The other side of the connection was originally planned to be upgraded in fall 2021, but the global supply shortage imposed various delays on the schedule.

Eventually, the fiber7 POP list showed an upgrade date of April 2022 for my POP, and that turned out to be correct.

The night of the upgrade

I had read Pim’s blog post on the upgrade of the 1790BRE POP in Brüttisellen, which contains a lot of super interesting details, so definitely check that one out, too!

Being able to plug the SFP module into the new POP infrastructure yourself (like Pim did) sounded super cool to me, so I decided to reach out, and init7 actually agreed to let me stop by to plug in “my” fiber and SFP module!

Giddy with excitement, I left my place just before 23:00 for a short walk to the POP building, which I had seen many times before, but never from the inside.

Patrick, the init7 engineer, met me in front of the building and exclaimed “Hey! You wrote my window manager!” What a coincidence :-). Luckily I had packed some i3 stickers that I could hand him as a small thank you.

Inside, I met the other init7 employee working on this upgrade. Pascal, init7’s CTO, was coordinating everything remotely.

Standing in front of init7’s rack, I spotted the old Cisco switch (at the bottom), and the new Cisco C9500-48Y4C switches that were already prepared (at the top). The SFP modules are for customers who decided to upgrade to 10 or 25 Gbit/s, whereas for the others, the old SFP modules would be re-used:

We then spent the next hour pulling fiber cables and SFP modules out of the old Cisco switch, and plugging them back into the new Cisco switch.

Just like the init7 engineer working with me (who is usually a software guy, too, he explained), I enjoy doing physical labor from time to time for variety. Especially with nice hardware like this, and when it’s for a good cause (faster internet)! It’s almost meditative, in a way, and I enjoyed the nice conversation we had while we were both moving the connections.

After completing about half of the upgrade (the top half of the old Cisco switch), I walked back to my place — still blissfully smiling all the way — to turn up my end of the connection while the others were still on site and could fix any mistakes.

After switching my uplink0 network interface to the faster network card, it also took a full reboot of my router for some reason, but then it recognized the SFP28 module without trouble and successfully established a 25 Gbit/s link! 🎉 🥳

I did a quick speed test to confirm and called it a night.

Speed tests / benchmarks

Just like in the early days of Gigabit connections, my internet connection is now faster than the connection of many servers. It’s a luxury problem to be sure, but in case you’re curious how far a 25 Gbit/s connection gets you on the internet, I collected some speed test results in this section.

Ookla speedtest.net

speedtest.net (run by Ookla) is the best way to measure fast connections that I’m aware of.

Here is my first 25 Gbit/s speedtest, which was run using the init7 speedtest server:

I also ran speedtests to all other servers that were listed for the broader Zürich area at the time, using the tamasboros/ookla-speedtest Docker image. As you can see, most speedtest servers are connected with a 10 Gbit/s port, and some (GGA Maur) even only with a 1 Gbit/s port:

Speedtest server latency (ms) download (Mbit/s) upload (Mbit/s)
Init7 AG - Winterthur 1.45 23530.27 23031.24
fdcservers.net 18.15 9386.29 1262.92
GIB-Solutions AG - Schlieren 6.64 9154.12 2207.68
Monzoon Networks AG 0.74 8874.85 6427.66
Glattwerk AG 0.92 8719.04 4008.28
AltusHost B.V. 0.80 8373.34 8518.90
iWay AG - Zurich 2.13 8337.56 8194.89
Sunrise Communication AG 9.04 8279.60 3109.34
31173 Services AB 18.69 8279.75 1503.92
Wingo 4.25 6179.57 5248.36
Netrics Zürich AG 0.74 7910.78 8770.19
Cloudflare - Zurich 1.14 7410.97 2218.88
Netprotect - Zurich 0.87 7034.62 8948.01
C41.ch - Zurich 9.90 6792.60 690.33
Goldenphone GmbH 18.91 3116.32 659.23
GGA Maur 0.99 940.24 941.24

Linux mirrors

For a few popular Linux distributions, I went through the mirror list and tried all servers in Switzerland and Germany. Only one or two would be able to deliver files at more than 1 Gigabit/s. Other mirror servers were either capped at 1 Gigabit/s, or wouldn’t even reach that (slow disks?).

Here are the fast ones:

  • Debian: mirror1.infomaniak.com and mirror2.infomaniak.com
  • Arch Linux: mirror.puzzle.ch
  • Fedora Linux: mirrors.xtom.de
  • Ubuntu Linux: mirror.netcologne.de and ubuntu.ch.altushost.com

iperf3

Using iperf3 -P 2 -c speedtest.init7.net, iperf3 shows 23 Gbit/s:

[SUM]   0.00-10.00  sec  26.9 GBytes  23.1 Gbits/sec  597             sender
[SUM]   0.00-10.00  sec  26.9 GBytes  23.1 Gbits/sec                  receiver

It’s hard to find public iperf3 servers that are connected with a fast-enough port. I could only find one that claims to be connected via a 40 Gbit/s port, but it was unavailable when I wanted to test.

Interested in a speed test?

Do you have a ≥ 10 Gbit/s line in Europe, too? Are you interested in a speed test? Reach out to me and we can set something up.

Conclusion

What an exciting time to be an init7 customer! I still can’t quite believe that I now have a 25 Gbit/s connection in 2022, and it feels like I’m living 10 years in the future.

Thank you to Fredy, Pascal, Patrick, and all the other former and current init7 employees for showing how to run an amazing Internet Service Provider. Thank you for letting me peek behind the curtains, and keep up the good work! 💪

If you want to learn more, check out Pascal’s talk at DENOG:

at 2022-04-23 14:00

2022-04-08

RaumZeitLabor

Obatzda Wars - The Emmentaler stinks back

Dear Cheddar Knights,

this year’s annual assembly of the Red Wine Rebel Alliance will take place on Saturday, April 30, 2022, on our home planet RZL under the motto “RoqueFort One”. From 18:30 onwards, let us eat together against the intergalactosic monotony! The Fontina Band (“Eat the same Tomme again… The same Tomme again”) will of course be part of it as well.

Please register by mail by April 23, 2022, and let us know whether you are bringing Luke SkyVachard or CacioBacca along. We ask for payment of a flat cheese fee of at least 12 Idiazabal Credits per person.

May the Scamorza be with you
F-Leia-Rattie and the Ha’Niolos

P.S.: This is a public event. Please observe the hygiene concept!

Brie-B-8

by flederrattie at 2022-04-08 00:00

2022-03-19

sECuREs website

Smart Home components 🏠

I have tried a bunch of different Smart Home products over the last few years and figured I would give an overview of which ones I liked, which ones I disliked, and how I would go about selecting good Smart Home products to buy.

Smart Lights

To me, the primary advantage of Smart Lights is the flexibility in where you place extra light switches, and the extra functions that become much easier with Smart Lights.

For example, I have added an extra light switch in the bed and next to the couch, without having to have an electrician tear up the walls to add more wiring. An “all-off” button is super handy at the end of the day or when watching a movie.

Other attractive use-cases include controlling lights based on time of the day, based on whether people are home, or based on a motion sensor.

I used the RGB color light bulb version of all of the below systems. In practice, we typically don’t change the color much, but it is nice to be able to adjust the color and brightness to something that fits the respective room. And, every once in a while, scenes that use color are fun!

Moved away from: IKEA TRÅDFRI 👎

The first smart light system I used was IKEA TRÅDFRI. I figured as a system with a large user base, they would be inclined to improve it over time, and compatibility should be more likely than with other, smaller vendors.

Unfortunately the system is pretty much unchanged from when I first bought it many years ago.

You can easily find documentation about the API for using the TRÅDFRI gateway programmatically, but when I looked for Go packages back in 2019, I found none that I liked, so I decided to speak CoAP and DTLS myself.

The light switches are good in terms of features, and easy to install: you can just remove the old switch and glue the TRÅDFRI switch over the existing switch.

The downside of the light switches is that they are flimsy: because the switch is magnetically held in place in its case, it can easily fall on the floor when you bump against it.

Pairing the devices was always tricky for me. It got easier when I turned off all other ZigBee devices in my apartment before doing anything with IKEA devices.

At multiple points, the devices lost their pairing. It might have been when they ran out of battery.

The battery lifetime of the light switches was very poor: only about a year on average. They use the CR2032 form factor, which my charger does not support, so I couldn’t use rechargeables.

Swapping out the batteries and re-pairing the system every year or so quickly becomes tedious!

Moved away from: Shelly Bulb 👎

Because I also bought some Shelly 1L smart relays, I figured I’d give the Shelly Bulb a try.

Instead of ZigBee, the Shelly Bulbs use WiFi. This makes them easy to get into your home network and does not require a separate gateway.

At 2 bulbs per room+hallway, and 2 buttons each, that sums up to having 16 extra devices in your WiFi network. This wasn’t a problem for me in practice, but depending on how stable your WiFi network is, it might be a concern.

Notably, this also means your lights can’t be controlled while your WiFi is unavailable.

In terms of physical light switches, you’ll need to use a separate product such as the Shelly Button. This is the weakest point of the system. The latency is noticeable, even when configuring a static IP address, which does make things better, but still not good. The Shelly Button is extremely simple, so dimming has to be emulated with double or triple-press actions.

Given that one typically interacts with this system multiple times a day via its switches, I think it makes sense to choose a system that has good switches.

On the plus side, the Shelly Button uses a rechargeable battery that can be charged from a USB power bank, which is a concept I really like.

Philips Hue 👍

After the Shelly Bulb, I figured I’d try Philips Hue. It’s by far the most expensive system of the ones I have tried, but also by far the most polished and user-friendly.

People recommended the Feller Smart Light Control switches, which use energy harvesting (from you clicking them!) and hence don’t require a battery.

This makes it easy to place them anywhere, like next to the couch in the picture on the left.


Feller recommends extending existing installations by buying the next-larger mounting plate. Extending the box in the wall is not required, as no wires or in-wall space are needed. Drilling new holes for extra screws is required for stability, but that’s a lot more doable than extending the whole box. Here are some pictures before, during and after the installation:

Shelly 1L 👍

The Shelly 1L is a very interesting device. It goes behind your existing device into the wall and makes it smart!

This allows you to make smart any existing lights that can’t easily be replaced by smart lights, for example a bathroom light built into the bathroom mirror cabinet.

You can also make existing light switches smart if you like the ones you already have and can’t exchange them.

Another use-case is to easily connect buttons or sensors into your network, for example door bells or door sensors.

The Shelly 1L is special in that this specific model can be installed when all you have is a live wire (i.e. wiring for a light switch).

One potential issue is that depending on the configuration and connected device’s power usage, the Shelly might emit a slight hum noise. So, don’t install one right next to your bed.

Another limitation is that while the Shelly works with both light switches (which change state) and light buttons (which generate an impulse), it can only distinguish between short and long press events when you use a light button. Newer light switches from Feller can be re-configured to function as a button, but if your model is too old you might need to replace a light switch with a button.

One weird issue I ran into was that after installing a new bathroom mirror cabinet, the relay of the connected Shelly 1L would no longer function correctly — the light just remained on, even when turning it off via the Shelly. I read on the Shelly forum that this could be caused by running the Shelly upside-down, and indeed, after turning it around, it started to work again!

Smart Heating

Smart Heating systems are often advertised to save cost. I wanted to try it out, and was also interested in the temperature logging because my apartment is on the more humid side and I wanted some data to optimize the situation.

HomeMatic 😐

I bought some HomeMatic temperature sensors and heating valve drives back in 2017. The hardware feels solid and was easy enough to install.

One massive downside of the system was the poor software quality of their Central Control Unit (CCU2). The web interface was super slow, looked very dated, and the whole thing kept running out of memory every 2 weeks or so. It was so bad that I re-implemented my own CCU in Go. I hear that by now, they have a new and better Control Unit version, though.

So far, one valve drive has failed with error code F1; I replaced it with a new one.

Turns out smart control of our heating does not seem to make any measurable difference. The rooms feel the same as before. No money is saved because the utility bill is divided equally among all tenants across the building (which seems to be standard in Switzerland), not billed for individual usage.

So, overall, I would not install smart heating valve drives again. The temperature sensors I still keep an eye on from time to time, but there are cheaper options if you only need temperature!

Smart Lock

Nuki 👍

During the pandemic, I was receiving packages at home and hence I was relying on my door bell much more than usual. Hence, I was looking for a way to make it smarter!

The first device I got was the Nuki Opener, a smart intercom system. It allows you to get notifications on your phone when the doorbell is rung, and to unlock the door from your phone.

I got this device because it was specifically marketed as compatible with the BTicino intercom system our house uses. Unfortunately, this turned out to be incorrect, so I ended up building a hardware-modified intercom unit that is connected to the Nuki Opener in analogue mode.

Once it actually works, it’s a convenient system, and having your doorbell generate desktop notifications with sound is just super useful when wearing headphones! Strongly recommended.

As you can see on the pictures, I’m powering the Nuki Opener via USB. It normally runs on batteries, but I want to minimize battery usage and swapping. A built-in rechargeable battery like in the Shelly devices would be a neat improvement to the Nuki Opener, so that the device could still work during power outages!

After I had the Nuki Opener, I also added a Nuki Smart Lock so that we can not only open the house front door, but also the apartment door itself in case one of us forgets their key.

The Nuki Smart Lock was easy to install and works great. It also shows with an elegant LED ring whether the door is currently locked or not, which I find handy.

Motion Sensors

Not having to turn on lights myself is something I find convenient, in particular in the kitchen, but also in the bathroom. When carrying plates or glasses into the kitchen, it’s nice to have the lights turn on while my hands are full.

Moved away from: Feller Motion Sensors 😐

First I tried Feller’s Motion Sensors, because they physically fit well into the existing Feller light switch installation:

But, their limitations made me move away from them quickly: while you can change one or two basic settings, you cannot, for example, disable the motion sensor after a certain time of day, or manually disable it for a certain time period.

Also, because the device is installed in a fixed position (determined by where your light switch is), it isn’t necessarily in the best place to spot all the motion you want to detect.

Shelly Motion 👍

The Shelly Motion Sensor seems like a good motion sensor to me! It has a number of useful settings and can easily trigger any REST API endpoint or can be used via MQTT.

Like with the Shelly Button, this device has a built-in rechargeable battery that can be charged via USB. Depending on the location of the sensor, you can either attach a USB powerbank once a year, or remove the sensor from its fixture and charge it elsewhere.

The positioning of the Shelly Motion can either be easy (as it was in my kitchen) or tricky to get right (in my bathroom). I don’t know if other motion sensors are better in terms of range.

One thing to note is that the Shelly Motion only reports state changes (motion start or motion end), and no continuous events while motion is detected.

For my kitchen, my regelwerk code directly translates motion on/off into light on/off commands (to Philips Hue and Shelly 1L), with the exception that a long-press turns off all motion control for the next 10 minutes. The granularity of the Shelly Motion is to report after no motion for 1 minute, which works well for me for the kitchen.

For my bathroom, I don’t want the lights to immediately turn off when no motion is detected anymore, to err on the side of not turning off the light while people are still using the bathroom and are just not seen by the motion sensor. To implement that, I found that using the Shelly 1L’s timer functionality works best. So, in my configuration, motion on means lights on, and motion off means lights on for 10 minutes, then off. Turning off the light manually disables that logic.

Note that the Shelly Motion should really be mounted in the orientation recommended by the manual. When the motion sensor lies on its side (or is upside down), detection is much poorer.

Smart Power Plug

A smart plug is an easy way to turn off a power-hungry device while you’re away, to make a lamp smart, or to power on a connected device like a kettle to boil water for making a tea.

My current use-cases are saving power for the stereo sound system connected to my PC, and saving power by powering up the devices in my gokrazy Continuous Integration test environment on-demand only.

While there are tons of vendors selling smart plugs, the selection narrows considerably when you look for one with a Swiss power plug.

HomeMatic 👎

The HomeMatic smart plug is expensive (55 CHF) and super bulky! As you can see, even if you connect it at the very end of a power strip, it still blocks the adjacent connector.

Worse: the way it’s built (bulky side pointing away from the earth pin), I can’t even insert it into 2 of the 3 power strips you see on the picture.

Somehow, even though it’s so bulky, the device feels flimsy at the same time. I’m never 100% sure if the plug is inserted fully and correctly, and it’s easy to accidentally turn off power when bumping against the smart plug with your foot.

Because it’s a HomeMatic device, you need a working Central Control Unit (CCU) to control it programmatically. Conceptually, I prefer smart plugs that can be used with a REST or MQTT API.

The only upside of this smart plug is that it can measure power. I occasionally use it for that.

Sonoff 😐

The Sonoff S26 are much cheaper (≈12 USD when I bought mine) and come in a Swiss plug variant. Contrary to the HomeMatic ones, the Sonoff smart plugs are built “the right way around”, meaning I can plug them into many Swiss power strips. Unfortunately, they also block adjacent connectors, but at least not as many as the HomeMatic.

The Open Source firmware Tasmota supports the Sonoff S26, but flashing them is a painful experience. You can’t do it over the air; you need to access rather small serial console pins inside the device.

Once you have them flashed with Tasmota, the devices work great.

One feature they lack is power measurement.

I would love to find a smart plug with a Swiss plug, that supports power measurement, and that is compatible with Tasmota (or builtin MQTT support), but until that product comes along, the Sonoff S26 are what I’m going to use.

Architecture as of March 2022

Here is an architecture diagram of the devices I’m currently using:

To tie these different systems together, I use a Raspberry Pi running gokrazy, which in turn runs my regelwerk program. regelwerk only talks to MQTT, so all the different devices are connected to MQTT using small adapter programs such as my hue2mqtt or shelly2mqtt.
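To illustrate the idea of a rule engine that only ever sees topic/payload messages, here is a minimal in-memory sketch. It is not how regelwerk is actually structured: the `Message`, `Rule` and `Engine` types and the topic names are invented for this example, and a real setup would receive and publish these messages through an MQTT client library.

```go
package main

import "fmt"

// Message is a minimal stand-in for an MQTT publish: a topic and a payload.
type Message struct {
	Topic   string
	Payload string
}

// Rule reacts to one incoming message and returns the messages to publish.
type Rule func(in Message) []Message

// Engine dispatches incoming messages to rules by exact topic match.
type Engine struct {
	rules map[string][]Rule
}

// On registers a rule for a topic.
func (e *Engine) On(topic string, r Rule) {
	if e.rules == nil {
		e.rules = make(map[string][]Rule)
	}
	e.rules[topic] = append(e.rules[topic], r)
}

// Handle runs all rules registered for the message's topic and collects
// their output messages.
func (e *Engine) Handle(in Message) []Message {
	var out []Message
	for _, r := range e.rules[in.Topic] {
		out = append(out, r(in)...)
	}
	return out
}

func main() {
	var e Engine
	// Hypothetical topics: an adapter would publish the sensor topic,
	// and another adapter would consume the command topic.
	e.On("sensor/kitchen/motion", func(in Message) []Message {
		return []Message{{Topic: "cmd/kitchen/light", Payload: in.Payload}}
	})
	fmt.Println(e.Handle(Message{Topic: "sensor/kitchen/motion", Payload: "on"}))
}
```

Because the rules never touch vendor APIs, swapping a device only means swapping its adapter program.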

A more off-the-shelf solution would be to use Node-RED, if you want to do a little programming, or Home Assistant if you want to do barely any programming.

My strategy for selecting components

I don’t look for one vendor or one system that has components for everything. Instead, I choose the leading vendor in each domain. Compatibility between systems is generally poor, so I try to keep my compatibility requirements to a minimum.

To interact with the devices programmatically, the best bet is devices that are designed to be developer-friendly (e.g. Shelly devices support MQTT) or that at least have an official API with modules in my favorite programming language (e.g. Philips Hue). In terms of API, I expect to talk to a gateway device in my local network. I tried talking e.g. Zigbee directly, but found it inconvenient due to poor software support, sparse documentation and strange compatibility issues.

Direct device-to-device communication is nice from a reliability perspective, but on some battery-powered systems you pay for it with reduced battery runtime. For example, when using multiple light switches for the same room with IKEA TRÅDFRI, you pair one to the other, which also makes all signals go through it.

If possible, I select devices that have an open firmware available. Ideally, I can keep using the vendor’s firmware, but if the vendor unexpectedly goes out of business, it’s handy to have an alternative firmware available. Also, if the devices require a cloud service to function, using open firmware typically allows using them in your local network.

I have come to avoid WiFi where latency is important, e.g. between light switches and lights.

I stopped looking at the price too much and instead look at the user experience. Smart home is about comfort and convenience, and if a product doesn’t delight in daily usage, why bother with it? Targeting the high end of mid-range devices seems like the sweet spot to me. Avoid anything more expensive than that, though — established players often re-brand third-party solutions and you only pay for the company name, not quality.

at 2022-03-19 13:51

2022-02-20

michael-herbst.com

RWTH Julia workshop 2022

Last Thursday and Friday (17/18 February) I taught an introductory course to the Julia programming language. The course took place in virtual format and to my great surprise around 90 people from all over the world ended up joining. Luckily I had a small support team consisting of Gaspard Kemlin and Lambert Theissen (thanks!) who took care of some of the organisational aspects in running the zoom session. Overall it was a lot of fun to spread the word about the Julia programming language with so many curious listeners with interested and supporting questions.

Thanks to everyone who tuned in and thanks to everyone who gave constructive feedback at the end. I'm very much encouraged by the fact that all of you, unanimously, would recommend the workshop to your peers. In that sense: Please go spread the word as I'm already looking forward to the next occasion I'll have to teach about Julia!

by Michael F. Herbst at 2022-02-20 11:00 under Teaching, Julia, workshop, programming and scripting

2022-01-25

michael-herbst.com

GdR nbody general meeting

About two weeks ago, from 10 till 13 Jan 2022, I was at the annual meeting of the French research group on many-body phenomena, the GDR nbody. Originally scheduled to take place in person in Toulouse, the meeting had to be switched to a virtual event on short notice due to the Corona-related developments. Although I would have loved to return to Toulouse and see everyone in person, it was still an opportunity to catch up. In my talk I presented the adaptive damping approach for self-consistent field iterations, which Antoine Levitt and I recently developed; see the submitted article on arXiv.

Link
A robust and efficient line search for self-consistent field iterations (Slides)

by Michael F. Herbst at 2022-01-25 11:00 under Research, talk, electronic structure theory, Kohn-Sham, high-throughput, DFT, DFTK, solid state

2022-01-15

sECuREs website

My 2022 high-end Linux PC 🐧

I finally managed to get my hands on some DDR5 RAM to complete my Intel i9-12900 high-end PC build! This article contains the exact component list if you’re interested in doing a similar build.

Usually, I try to stay on the latest Intel CPU generation when possible. But I decided to skip the i9-10900 (Comet Lake) and i9-11900 (Rocket Lake) series entirely, largely because they were still stuck on Intel’s 14nm manufacturing process and didn’t seem to offer much improvement.

The new i9-12900 (Alder Lake) delivered good benchmark results and is manufactured with the much newer Intel 7 process, so I was curious: would an upgrade be worth it?

Components

Price    Type          Article
196 CHF  Case          Fractal Define 7 Solid (Midi Tower)
89 CHF   Power Supply  Corsair RM750x 2018 (750 W)
293 CHF  Mainboard     ASUS PRIME Z690-A (LGA1700, ATX)
646 CHF  CPU           Intel Core i9-12900K
113 CHF  CPU fan       Noctua NH-U12A
30 CHF   Case fan      Noctua NF-A14 PWM (140 mm)
770 CHF  RAM           Corsair Vengeance CMK32GX5M2A4800C40 (64 GB)
408 CHF  Disk          WD Black SN850 (2 TB)
605 CHF  GPU           GeForce RTX 2070
65 EUR   Network       Mellanox ConnectX-3 (10 Gbit/s)

Fan compatibility

The Noctua NH-U12A CPU fan required an adapter (“Noctua NM-i17xx-MP78 SecuFirm2 mounting kit”) to be compatible with the Intel LGA1700 socket. I requested the adapter on Noctua’s Website on November 5th, and it arrived November 26th.

Fractal Define 7 case

Anytime you need to access a PC’s components, you’ll deal with its case. Especially for a self-built PC, the case you choose determines how easy it is to assemble and later modify your PC.

Over the years, I have come to value the following aspects of a PC case:

  1. No extra effort should be required for the case to be as quiet as possible.
  2. The case should not have any sharp corners (no danger of injury!).
  3. The case should provide just enough space for easy access to your components.
  4. The more support the case has to encourage clean cable routing, the better.
  5. USB3 front panel headers should be included.

I have been using Fractal cases for the past few years and came to generally prefer them over other brands because of their good build quality.

Hence I’m happy to report that the Fractal Define 7 (their latest generation at the time of writing) ticks all of the above boxes!

The case and power supply work well together in terms of cable management. It was a joy to route the cables.

It’s very easy to open the case doors (they clip in place), or remove the front panel. This is definitely the best PC case I have seen so far in terms of quick and easy access.

Here’s how clean the inside looks. Most cables take a very short path to the back, where the case offers plenty of convenient cable guides:

You might also find this YouTube video review of the Fractal Define 7 interesting:

Slow boot

When I first powered everything on, I waited for a while, but never saw any picture on my monitor. The PC eventually rebooted, multiple times in a row. I took that as a bad sign and turned it off to prevent further damage.

Turns out I should have just waited until it would eventually start up!

It took multiple minutes for the machine to eventually start. I’m not 100% sure what the cause is for that, but I heard in a Linus Tech Tips YouTube video that DDR5 requires time-consuming memory testing when powering up with a fresh memory configuration, so that seems plausible.

In any case, my advice is: be patient when waiting for this machine to start up.

DDR5 availability as of late 2021

I originally ordered all components on November 5th 2021. It took a while for the mainboard to become available, but almost everything shipped on November 15th — except for the DDR5 RAM.

Until late December, I was not able to find any available DDR5 RAM in Switzerland.

The shortage is so pronounced that some YouTubers recommend going with DDR4 mainboards for now, which manufacturers are scrambling to introduce in their lineups. I did really want to squeeze out the last few extra percent in memory-intensive workloads, so I decided to wait.

Copying the data

Where possible, I like only changing one thing at a time. In this case, I wanted to change the hardware, but keep using my Linux installation as-is.

To copy my Linux installation over, I plugged my old M.2 SSD into the new machine, and then started a live Linux environment, so that neither my old nor my new SSD were in use. My preferred live Linux is grml (current version: 2021.07), which I copied to a USB memory stick and then booted the machine from.

In the grml live Linux environment, I copied the full M.2 SSD contents from old to new:

grml# dd \
  if=/dev/disk/by-id/nvme-Force_MP600_<TAB> \
  of=/dev/disk/by-id/nvme-WD_BLACK_SN850_2TB_<TAB> \
  bs=5M \
  status=progress

For some reason, the transfer was super slow. Last time I transferred the contents of a Samsung 960 Pro to a Samsung 970 Pro, it took only 16 minutes. But this time, copying the Force MP600 to a WD Black SN850 took many hours!

Once the data was transferred, I unplugged the old M.2 SSD and booted the system.

The hostname remains the same, and the network addresses are tied to the MAC address of the network card that I moved to the new machine. So, I didn’t have to adjust anything in the new machine and could just boot into my usual environment.

UEFI settings: enable XMP for 4800 MHz RAM

By default, the memory uses 4000 MHz instead of the 4800 MHz advertised on the box.

I figured it should be safe to try out the XMP option because it is shown as part of ASUS’s “EZ Mode” welcome page in the UEFI setup.

So far, I have not noticed any issues when running the system with XMP enabled.

Update February 2022: I have experienced weird crashes that seem to have gone away after disabling XMP. I’ll leave it disabled for now.

UEFI settings: fan speed

The Fractal Define case comes with a built-in fan controller.

I recommend not using the Fractal fan controller, as you can’t control it from Linux!

Instead, I have plugged my fans into the mainboard directly.

In the UEFI setup, I have configured all fan speeds to use the “silent” profile.

ASUS PRIME Z690-A: sensors and fan control

With Linux 5.15.11, some fan speeds and temperatures are displayed, but oddly enough it only shows 2 out of the 3 fans I have connected:

% sudo sensors
nct6798-isa-0290
Adapter: ISA adapter
[…]
fan1:                        0 RPM  (min =    0 RPM)
fan2:                      944 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                      625 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
fan7:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +35.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +40.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  -128.0°C    sensor = thermistor
AUXTIN1:                   +24.0°C    sensor = thermistor
AUXTIN2:                   +28.0°C    sensor = thermistor
AUXTIN3:                   +31.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +40.0°C
[…]

Unfortunately, writing to the /sys/class/hwmon/hwmon2/pwm2 file does not seem to change its value, so I don’t think one can control the fans via PWM from Linux (yet?).

I have set all fans to silent in the UEFI setup, which is sufficient to not notice any noise.

Performance comparison: i9-9900K vs. i9-12900K

After cloning my old disk to the new disk, I took the opportunity to run a few time-intensive tasks from my day-to-day that I could remember.

On both machines, I configured the CPU governor to performance for stable results.

Keep in mind that I’m comparing two unique PC builds as they are (not under controlled and fair conditions), so the results might not necessarily be representative. For example, it seems like the SSD performance in the old machine was heavily degraded due to an incorrect TRIM configuration.

name                                old    new
build Go 1.18beta1 (src/make.bash)  ≈45s   ≈29s
gokrazy/rsync tests                 ≈8s    ≈5s
gokrazy UEFI test                   ≈9s    ≈8s
distri cryptimage (cold cache)      ≈143s  ≈18s
gokrazy Linux compilation           215s   109s

As we can see, in all of my tests, the new PC achieves measurably better times! 🎉
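To put the measurements into perspective, the speedup factor (old time divided by new time) can be computed for each row. The snippet below is just that arithmetic on the approximate numbers from the table; the `result` type and `speedup` function are names chosen for this example.

```go
package main

import "fmt"

// result holds one benchmark row: wall-clock seconds on the old and the
// new machine, as listed in the table (mostly approximate values).
type result struct {
	name           string
	oldSec, newSec float64
}

// speedup returns how many times faster the new machine was.
func speedup(oldSec, newSec float64) float64 {
	return oldSec / newSec
}

func main() {
	results := []result{
		{"build Go 1.18beta1 (src/make.bash)", 45, 29},
		{"gokrazy/rsync tests", 8, 5},
		{"gokrazy UEFI test", 9, 8},
		{"distri cryptimage (cold cache)", 143, 18},
		{"gokrazy Linux compilation", 215, 109},
	}
	for _, r := range results {
		fmt.Printf("%-36s %.2fx\n", r.name, speedup(r.oldSec, r.newSec))
	}
}
```

The factors range from roughly 1.1x (UEFI test) to almost 8x (cryptimage); the latter may partly reflect the degraded SSD in the old machine rather than the CPU alone.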

Conclusion

Not only in the benchmarks above, but also subjectively, the new machine feels fast!

Already in the first few days of usage, I notice how time-consuming tasks such as tracking down a Linux kernel issue (requires multiple Linux kernel builds), are a little less terrible thanks to the faster machine :)

The Fractal Define 7 case is great and will likely serve as a good base for upgrades over the next couple of years, just like its predecessor (but perhaps even longer).

As far as I can tell, the machine works well and is compatible with Linux.

at 2022-01-15 15:00

2022-01-04

RaumZeitLabor

Remote Season Kickoff of "RZL Käfertal, 68309", Season 2022

Dear fans of the hit show “RZL Käfertal, 68309”, happy New Year!

Hard to believe that the last season is already over and the next one is waiting in the wings.

Since the past season felt far too short and, for various reasons, was only moderately eventful, we can only hope that the series picks up full speed again in 2022 and returns to its old form.

For this reason, the Remote Season Kickoff takes place on Saturday, 8 January 2022: we will meet on Jitsi from 4 pm and talk together about the upcoming storylines.

What are you taking away from the last round of 2021: which plot twists were good, which were not? Do you have ideas for what could happen in the next (online) event episodes? With the new video equipment, nothing stands in the way of talk episodes any more… Will there be a reunion with the popular protagonists Agenda Aktion or the Türöffner-Maus (door-opener mouse), or will entirely new ones be introduced? How will last year’s cliffhanger around the workshop continue?

I look forward to exchanging ideas on how we will shape the story of the 2022 season together.

RZL Käfertal, 68309

by flederrattie at 2022-01-04 00:00

2021-12-23

michael-herbst.com

Outlook to 2022

A quick teaser to some workshops I will organise next year.

  • 17/18 Feb 2022: Introduction to the Julia programming language (virtual).
    In two half-day sessions I will provide a concise overview of the Julia programming language and offer to get some hands-on practice. The selection of exercises and small projects makes the course particularly well-suited for interdisciplinary researchers in the computational sciences, but is free and open to everyone. Course website. Registration link.

  • 20-24 Jun 2022: CECAM workshop: Error control in first-principles modelling (Lausanne, Switzerland).
    In this workshop, which I organise jointly with Gábor Csányi, Geneviève Dusson, Youssef Marzouk, we plan to bring together mathematicians and simulation scientists to discuss error control and error estimation in first-principles simulations, an aspect which to date has seen too little attention in our opinion. We want to bring together experts on numerical analysis and uncertainty quantification on the one hand and researchers working on electronic-structure and molecular-dynamics methods on the other to identify promising directions of research to make progress in this topic. Website and registration.

  • 29-31 Aug 2022: DFTK school: Numerical methods for density-functional-theory simulations (Paris, France).
    Antoine Levitt, Eric Cancès and I will organise an interdisciplinary summer school next year, centred around our joint work on DFTK and numerical developments in density-functional theory (DFT). With the school we want to bridge the divide between simulation practice and fundamental research in electronic-structure methods: It is intended both for researchers with a background in mathematics and computer science interested in learning the numerics of DFT and for physicists or chemists interested in modern software development methodologies and the mathematical background of DFT. Course website. Registration link.

by Michael F. Herbst at 2021-12-23 11:00 under Research, Julia, DFTK, workshop, programming and scripting, error estimates, uncertainty quantification, DFT, solid state

2021-12-13

michael-herbst.com

GdR REST Discussion meeting on Machine Learning

Last week, on 9th and 10th December 2021, I participated in the Discussion meeting on Machine Learning organised by the French research group REST, which is centred around theoretical spectroscopy in solids and molecules. While most participants joined remotely I was fortunately able to travel to École Polytechnique in Palaiseau (near Paris). This gave me the opportunity to interact with some of the speakers and local organisers. Since to date I have not yet taken a detailed look at applying machine learning in chemistry and materials science, I took the chance to talk with practitioners as well as the other on-site speakers during the breaks and the social dinner. Overall this meeting has been extremely helpful and I feel I managed to get a good impression of the challenges and current research in this exciting topic. I am very grateful to the organisers Francesco Sottile and Jack Wetherell for the invitation and I already look forward to my next interaction with the GdR REST.

In my talk I gave an introduction to algorithmic differentiation (AD) approaches and their application in DFTK as well as density-functional theory simulations in general. I motivated our work both from data-driven approaches for the design of novel DFT functionals as well as the computation of properties, sensitivities and uncertainties. Summarised in one sentence the key advantage of getting a code algorithmically differentiable (AD-able) is to be able to automatically compute derivatives of arbitrary output quantities (band gaps, forces, ...) with respect to arbitrary input quantities (pseudo parameters, XC parameters, positions, temperature, ...) within an acceptable computational cost and without the need to code analytical gradients.

AD approaches are not new in the electronic-structure context. However, the successful existing AD-able codes are either centred around simplified settings (e.g. 1D systems) or Gaussian basis sets (thus primarily molecular systems). In contrast our focus in DFTK is on solid-state systems. In particular for cases with vanishing band gaps (e.g. metals) this setting is more involved and one needs to be overall a bit more careful in the implementation. Another distinction from previous efforts is that our implementation in DFTK has not been written from scratch just for AD. Effectively the ability to make DFTK AD-able with relatively little effort is a side effect of our flexible design as well as our seamless integration with the composable Julia package ecosystem. To emphasise this let me mention that the largest part of the work I presented has been achieved in only 12 weeks by our excellent Google Summer of Code student Niklas Schmitz (Thanks very much Niklas!).

To give a practical demonstration I showed how to use forward-mode algorithmic differentiation to (a) compute polarisabilities, (b) the variation of the dipole moment with respect to changing parameters in the exchange functional and (c) a work-in-progress example using adjoint-mode differentiation. As usual my slides are attached below.

Link
DFTK: An algorithmically differentiable density-functional theory framework (Slides)
Recording on Youtube
Forward-mode algorithmic differentiation example

by Michael F. Herbst at 2021-12-13 11:00 under Research, talk, electronic structure theory, Kohn-Sham, high-throughput, DFT, DFTK, solid state, algorithmic differentiation

2021-12-05

sECuREs website

Fixing the Logitech MX Ergo Trackball mouse buttons

The mouse I use daily for many hours is Logitech’s MX Ergo trackball and I generally consider it the best trackball one can currently buy.

Unfortunately, after only a year or two of usage, the trackball’s mouse buttons no longer function correctly. When clicking and dragging, they won’t hold down the selection reliably.

The mouse buttons first broke in my private trackball, and later also the ones in my work one!

After just buying a new one when the mouse buttons broke the first time, I figured this time I wanted to try and fix the trackball myself.

Logitech MX Ergo and Kailh replacement switches

Video recording

In this 27 minute video, you can look over my shoulder as I swap out the worn-out Omron mouse buttons with Kailh replacement mouse buttons:

The basic steps are:

  1. Unscrew the outside Torx screws.
  2. Unscrew the inside Phillips screws.
  3. Remove the PCB from the case and fix it securely for desoldering.
  4. Desolder the switch: heat up all 3 pads as simultaneously as possible (add more solder → more flux!), then gently push down on the pins to make the switch fall out.
  5. Cleanly remove all remaining solder, then insert the replacement switch, double-check you aligned it well on the PCB, and solder it.
  6. Put everything back together.

Replacement switches: Kailh GM 8.0

The replacement mouse buttons I’m using are Kailh GM 8.0 from the Kailh Official Store on AliExpress, which are advertised as “ultra high life”. Even if their life span is also only a few years, I bought enough of them to probably replace them another 2 to 3 times per trackball.

The Kailh mouse buttons behave very similarly to the original Omron mouse buttons. The click is very satisfying now, and reminds me of a brand-new Logitech MX Ergo trackball. I wouldn’t call the Kailh ones better than the Omron ones, but maybe others notice a difference?

One interesting side note: I noticed that when wearing noise canceling headphones, it was very hard to tell the worn-out Omron mouse buttons from the Kailh mouse buttons. The difference really is mostly in the sound, not in the feel when pressing the button down!

Why is the MX Ergo so unreliable?

There is a 1-hour video by Alex Kenis saying that Logitech switched from 5V to 3.3V logic voltages, and this violates the minimum electrical condition for the Omron D2FC-F, which causes oxidation.

Indeed, when I merely opened the switches and cleaned them up with a screwdriver, this seemed to help. But, opening everything up is so fiddly that one might as well solder in new switches altogether :)

at 2021-12-05 12:23

2021-11-28

sECuREs website

MacBook Air M1: the best laptop?

You most likely have heard that Apple switched from Intel CPUs to their own, ARM-based CPUs.

Various early reviews touted the new MacBooks, among the first devices with the ARM-based M1 CPU, as the best computer ever. This got me curious: after years of not using any Macs, would an M1 Mac blow my mind?

In this article, I share my thoughts about the MacBook Air M1, after a year of occasional usage.

MacBook Air M1

Energy efficiency

The M1 CPU is remarkably energy-efficient. This has two notable effects:

  1. The device does not have a fan, and stays absolutely quiet. This is pretty magical, and I now notice my ThinkPad’s fan immediately.
  2. The battery lasts many hours, even with demanding use-cases like video conferencing.

When it comes to energy efficiency, Apple sets the bar. All other laptops should be fanless, too! And the battery life really is incredible: taking notes in Google Docs (via WiFi) while at a conference for many hours left me with well over 80% of battery at the end of the day!

I briefly lent the computer to someone and got it back with a VPN client installed. The battery life was considerably shortened by that VPN client and recovered once I uninstalled it. So if you’re not seeing great battery life, maybe a single program is ruining your experience.

The fast wakeup feature that was heavily stressed during the initial introduction (to some ridicule) is actually pretty nice! I now notice having to wait for my ThinkPad to wake up.

Battery life during standby is great, too. Anecdotally, when leaving my ThinkPad lying around, it never survives until I plug it in again. The MacBook survives every single time.

Chipset advantage?

Now, given that Apple controls the entire machine, does that mean they now offer features that other computers cannot offer yet?

My personal bar for this question is whether a computer can be used with my bandwidth-hungry 8K monitor, and the disappointing news is that the MacBook Air M1 cannot drive the 8K monitor with its 7680x4320 pixels resolution (at 60 Hz, using 2 DisplayPort links), not even with an external USB-C dock.
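A quick back-of-the-envelope calculation illustrates why this resolution is so bandwidth-hungry. The sketch below is only an estimate under assumptions that are not from the article: 24 bits per pixel, blanking intervals ignored, and DisplayPort HBR3 links (4 lanes at 8.1 Gbit/s with 8b/10b encoding).

```python
import math

# Assumptions (not from the article): 24 bits per pixel, blanking ignored,
# and DisplayPort HBR3 links (4 lanes x 8.1 Gbit/s with 8b/10b encoding).
WIDTH, HEIGHT, REFRESH_HZ = 7680, 4320, 60
BITS_PER_PIXEL = 24
HBR3_PAYLOAD_BITS_PER_S = 4 * 8_100_000_000 * 8 // 10  # 25.92 Gbit/s per link


def required_bits_per_second(width, height, refresh_hz, bits_per_pixel):
    """Uncompressed pixel data rate, without blanking intervals."""
    return width * height * refresh_hz * bits_per_pixel


def links_needed(rate, payload_per_link):
    """Smallest number of links whose combined payload covers the rate."""
    return math.ceil(rate / payload_per_link)


rate = required_bits_per_second(WIDTH, HEIGHT, REFRESH_HZ, BITS_PER_PIXEL)
print(f"pixel data rate: {rate / 1e9:.2f} Gbit/s")
print(f"HBR3 links needed: {links_needed(rate, HBR3_PAYLOAD_BITS_PER_S)}")
```

Under these assumptions the raw pixel data alone exceeds what a single link can carry, which is consistent with the monitor needing 2 DisplayPort links.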

Maybe future hardware generations add support for 8K displays, but for my day-to-day, Apple’s complete control doesn’t improve anything.

Built-in peripherals

The screen is great! Everything looks sharp, colors are vibrant and brightness is good.

As usual, the touchpad (which Apple calls “trackpad”) is great, much better than any touchpad I have ever used on a PC laptop. Apple trackpads have always had this advantage since I know them, and I don’t know why PC touchpads don’t seem to get any better? 🤔

Apple brought back their scissor mechanism keyboards, which is a very welcome change. I have witnessed so so many problems with the old butterfly mechanism keyboards.

This first MacBook Air M1 model has no MagSafe. Apple added MagSafe in the MacBook Pro M1 in late 2021. I hope they’ll eventually expand MagSafe to all notebooks.

Peripherals: not enough ports

Staying in peripheral-land, let me first state that this MacBook’s 2 USB-C ports are not enough!

When working on the go, after plugging in power, I can plug in a wired ethernet adapter (wireless can be spotty), but then won’t have any ports left for my ergonomic keyboard and mouse.

For video conferencing, I can plug in power (to ensure I won’t run out of battery), connect a table microphone, but won’t have any ports left for a decent webcam. This is particularly annoying because this MacBook’s built-in webcam is really bad, and the main reason why reviewers don’t give the MacBook a perfect score (example review on YouTube).

So, in practice, you need to carry a USB-C dock, or at least a USB hub, with your laptop when you anticipate possibly needing any peripherals. #donglelife

Not enough RAM for local software development

Hardware-wise, the biggest pain point for software developers is the small amount of RAM: both the MacBook Air M1 and the MacBook Pro M1 (13") can be configured with up to 16 GB of RAM. Only the newer MacBook Pro M1 14" or 16" (introduced late 2021) support more RAM.

To be clear, 16 GB RAM is enough to do software development in general, but it can quickly become limiting when you deal with larger programs or data sets.

In my ThinkPad, I have 64 GB of RAM, which allows for a lot more VMs, large index data structures, or just plenty of page cache. With the ThinkPad, I don’t have to worry about RAM.

Of course, there are strategies around this. Maybe your projects are large enough to warrant maintaining a remote build cluster, and you can run your test jobs in a staging environment. The MacBook makes for a fine thin client — provided your internet connection is fast and stable.

Operating System: macOS

I am talking about Operating Systems at a very high level in this section. Many use-cases will work fine, regardless of the Operating System one uses. I can typically get by with a browser and a terminal program.

So, this section isn’t a nuanced or fair review or critique of macOS or anything like that, just a collection of a few random things I found notable while playing with this device :)

My favorite way to install macOS is Internet Recovery. You can install a blank disk in your Mac and start the macOS installer via the internet! The Mac will even remember your WiFi password. The closest thing I know in the PC world is netboot.xyz, and that needs to be installed in your local network first.

Similarly, Apple’s integration when using multiple devices seems pretty good. For example, the Mac will offer to switch to your iPhone’s mobile connection when it loses network connectivity.

But, just like in all other operating systems, there is plenty in macOS to improve.

For example, software updates on the Mac still take 30 minutes (!) or so, which is entirely unacceptable for such a fast device! In particular, Apple seems to be (partially?) using immutable file system snapshots to distribute their software, so I don’t know why distri can install and update so much faster.

Speaking of Operating System shortcomings, I have observed how APFS (the Apple File System) can get into a state in which it cannot be repaired, which I found pretty concerning! Automated and frequent backups of all on-device data is definitely a must.

Slow software updates are annoying, and having little confidence in the file system makes me uneasy, but what’s really a dealbreaker is that my preferred keyboard layout does not work well on macOS: see Appendix A: NEO keyboard layout.

Linux? 🐧

So given my preference for Linux, could I just use Linux instead?

Unfortunately, while Asahi Linux is making great progress in bringing Linux to the M1 Macs, it seems like it’ll still be many months before I can install a Linux distribution and expect it to just work on the M1 Mac.

Until then, check out the Asahi Linux Progress Report blog posts!

Intel to M1 architecture transition

Apple developed the Rosetta 2 dynamic binary translator which transparently handles non-M1 programs, and so far it seems to work fine! All the things I tried just worked, and architecture never seemed to play a role during my usage.

Conclusion

The MacBook Air M1 is indeed impressive! It’s light, silent, fast and the battery life is amazing. If these points are the most important to you in a laptop, and you’re already in the Mac ecosystem, I imagine you’ll be very happy with this laptop.

But is the M1 really so mind-blowing that you should switch to it no matter what? No. As a long-time Linux user who is primarily developing software, I prefer my ThinkPad X1 Extreme with its plentiful peripheral connections and lots of RAM.

I know it’s not an entirely fair comparison: I should probably compare the ThinkPad to the newer MacBook Pro models (not MacBook Air). But I’m not a professional laptop reviewer, I can only speak about these 2 laptops that I found interesting enough to personally try.

Appendix A: NEO keyboard layout

The macOS implementation of the NEO keyboard layout has a number of significant incompatibilities/limitations: its layer 3 does not work correctly. Layer 3 contains many important common characters, such as / (Mod3 + i, i.e. Caps Lock + i) or ? (Mod3 + s).

I installed the current neo.keylayout file (2019-08-16) as described on the NEO download page.

In order to make / and ? work in Google Docs, I had to enable the additional Karabiner rule “Prevent all layer 3 keys from being treated as option key shortcut” (see also: this GitHub issue)


I encountered the following issues, ordered by severity:

Issue 1: I cannot use Emacs at all! I installed the emacsformacosx.com version (also tried homebrew), but cannot enter keys such as / or ?. Emacs interprets these as M-u instead.

The Karabiner rule “Prevent all layer 3 keys from being treated as option key shortcut” that fixed this issue in Google Docs does not help for Emacs. Removing it from Karabiner changes behavior, but Emacs still recognizes M-i instead of /, so it’s broken with or without the rule.

Issue 2: In the Terminal app, I cannot enable the “Use Option as Meta key” keyboard option, otherwise all layer 3 keys function as meta shortcuts (M-i) instead of key symbols (/).

I commonly use the Meta key to jump around word-wise: Alt+b / Alt+f on a PC. Since I can’t use Option + b / Option + f on a Mac, I need to use Option + arrow keys instead, which works.

Since the Option key does not work as Meta key, I need to press (and release!) the Escape key instead. This is pretty inconvenient in Emacs in a terminal.

Issue 3: In Gmail in Chrome, the search keyboard shortcut (/) is not recognized.

I reported this problem upstream, but there seems to be no solution.


I’m not sure why these programs don’t work well with NEO. I tried BBEdit for comparison, and it had no trouble with (macOS-level) shortcuts such as command + / and option + command + /.

On Linux, the NEO layout works so much better. I’m really not in the mood to continuously fight with my operating system over keyboard input and shortcuts.

at 2021-11-28 15:50

2021-11-20

RaumZeitLabor

Habemus hygiene concept

Almost a year after moving in, we have decided to open the new RZL to visitors.

Unfortunately not with a rousing housewarming party, but we will make up for that as soon as possible. Promise!

You can find our current hygiene concept, which we will adapt as needed, here.

In short: 2G (vaccinated or recovered), proof required, check-in via CWA, self-test before visits and wearing a mask recommended.

Please do not come if you feel ill or know of a possible contact with a COVID-positive person! Protect yourselves and us!

Further rules may apply at events; it is best to check the website under Events right before your visit, or contact the board if anything is unclear.

If you want to come to the RZL for the first time, Tuesday evening with the „Offene RaumZeitLaborierung“ is once again the best choice. To avoid finding the door locked, you should still briefly announce your visit in advance if possible.

by flederrattie at 2021-11-20 00:00

2021-11-03

michael-herbst.com

Surrogate models for quantum spin systems based on reduced order modeling

The simulation of quantum spin models is an actively researched field. Albeit rather basic, these many-body systems are inherently strongly correlated and as such feature a rich variety of phenomena including involved patterns of ordering / disordering, topological order or varieties of phase changes. Furthermore, these models often provide a good approximation to the low-temperature regime of real physical systems, justifying their detailed study. One approach is to consider parametrised quantum spin models as a low-complexity proxy for real systems and use them to understand which parameter values (e.g. which spin coupling strengths) lead to interesting behaviours. From this one can deduce inversely how novel materials ought to be designed in order to probe and study these behaviours experimentally.

In a recent work my mentor Benjamin Stamm and I teamed up with Stefan Wessel (RWTH physics department) and Matteo Rizzi (Universität Köln, Forschungszentrum Jülich) to work on cheap surrogate models for accelerating the study of such parametrised quantum spin models. Our key assumption is that the Hamiltonian of these models as well as the deduced quantities of interest (e.g. the structure factor) can be decomposed affinely in the parameters. For many standard models this is indeed the case. Exploiting the affine structure of the Hamiltonian, our approach constructs a reduced-basis surrogate, which effectively represents the full problem in a basis of the exact solutions at a carefully chosen set of parameter values. As we demonstrate for two examples (a chain of Rydberg atoms as well as a sheet of coupled triangles), relatively small reduced bases, which are orders of magnitude smaller than the dimensionality of the Hilbert space, accumulate sufficient information to reproduce key quantities of interest over the full parameter domain to an absolute error of 10⁻⁴ or less.
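The construction can be written compactly. The notation below (θ_q, H_q, B, N, φ) is chosen here for illustration and is not taken from the paper; it is a sketch of the general reduced-basis idea under the affine-decomposition assumption, not the exact formulation we used.

```latex
% Affine decomposition of the parametrised Hamiltonian (assumed form)
H(\mu) = \sum_{q=1}^{Q} \theta_q(\mu)\, H_q

% Reduced basis: orthonormalised ground-state snapshots at \mu_1, \dots, \mu_N
B = \operatorname{orth}\bigl[\Psi(\mu_1), \dots, \Psi(\mu_N)\bigr],
\qquad B^\dagger B = I_N

% Reduced N x N eigenproblem; each B^\dagger H_q B is computed only once
h(\mu) = B^\dagger H(\mu)\, B
       = \sum_{q=1}^{Q} \theta_q(\mu)\, \bigl(B^\dagger H_q B\bigr),
\qquad h(\mu)\, \varphi(\mu) = \lambda(\mu)\, \varphi(\mu)
```

For a new parameter value only the small N×N problem has to be solved, and B φ(μ) serves as the approximate ground state in the full Hilbert space.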

For me this was the first time working with quantum spin models. Even more so I enjoyed this interdisciplinary collaboration and the associated diving into a new subject in the discussions we had. While working on this paper we actually identified a number of possibilities for future work. In fact a number of the problems typically encountered when numerically modelling quantum spin models (e.g. due to highly degenerate ground states or issues with the iterative eigensolvers) are closely related to the challenges for modelling difficult quantum-chemical systems.

The full abstract of our paper reads

We present a methodology to investigate phase-diagrams of quantum models based on the principle of the reduced basis method (RBM). The RBM is built from a few ground-state snapshots, i.e., lowest eigenvectors of the full system Hamiltonian computed at well-chosen points in the parameter space of interest. We put forward a greedy-strategy to assemble such small-dimensional basis, i.e., to select where to spend the numerical effort needed for the snapshots. Once the RBM is assembled, physical observables required for mapping out the phase-diagram (e.g., structure factors) can be computed for any parameter value with a modest computational complexity, considerably lower than the one associated to the underlying Hilbert space dimension. We benchmark the method in two test cases, a chain of excited Rydberg atoms and a geometrically frustrated antiferromagnetic two-dimensional lattice model, and illustrate the accuracy of the approach. In particular, we find that the ground-state manifold can be approximated to sufficient accuracy with a moderate number of basis functions, which increases very mildly when the number of microscopic constituents grows --- in stark contrast to the exponential growth of the Hilbert space needed to describe each of the few snapshots. A combination of the presented RBM approach with other numerical techniques circumventing even the latter big cost, e.g., Tensor Network methods, is a tantalising outlook of this work.

by Michael F. Herbst at 2021-11-03 23:30 under Publications, reduced basis, quantum spin systems, strong correlation

2021-11-02

michael-herbst.com

Quantum Chemistry Common Driver and Databases (QCDB) and Quantum Chemistry Engine (QCEngine): Automation and Interoperability among Computational Chemistry Programs

As part of my previous work on the adcc code for computational spectroscopy based on the algebraic-diagrammatic construction (ADC), we also integrated the package with QCEngine. This package aims at integrating different quantum-chemistry codes under a common interface for end users, which is an effort I fully support. Recently the design and structure of QCEngine and the related QCDB packages have been summarised in a publication. Its full abstract reads:

Community efforts in the computational molecular sciences (CMS) are evolving toward modular, open, and interoperable interfaces that work with existing community codes to provide more functionality and composability than could be achieved with a single program. The Quantum Chemistry Common Driver and Databases (QCDB) project provides such capability through an application programming interface (API) that facilitates interoperability across multiple quantum chemistry software packages. In tandem with the Molecular Sciences Software Institute and their Quantum Chemistry Archive ecosystem, the unique functionalities of several CMS programs are integrated, including CFOUR, GAMESS, NWChem, OpenMM, Psi4, Qcore, TeraChem, and Turbomole, to provide common computational functions, i.e., energy, gradient, and Hessian computations as well as molecular properties such as atomic charges and vibrational frequency analysis. Both standard users and power users benefit from adopting these APIs as they lower the language barrier of input styles and enable a standard layout of variables and data. These designs allow end-to-end interoperable programming of complex computations and provide best practices options by default.

by Michael F. Herbst at 2021-11-02 23:30 under Publications, electronic structure theory, theoretical chemistry, adcc, algebraic-diagrammatic construction

2021-10-14

michael-herbst.com

A robust and efficient line search for self-consistent field iterations

In an ongoing effort with Antoine Levitt our aim is to develop reliable density-functional theory (DFT) methods for computational materials design. Recently we looked into a strategy to automatically select the damping parameter for the self-consistent field iterations (SCF). Our adaptive damping approach is based on a theoretically sound quadratic model for the DFT energy, which is used to fix the step size (damping) adaptively along the search directions suggested by an underlying algorithm (such as Pulay mixing, Kerker mixing, etc.). Our algorithm is fully automatic, i.e. an a priori damping selection is no longer required. In our work we test our method successfully on a range of challenging systems including supercells, transition-metal alloys or metallic surfaces. Overall our study shows adaptive damping to provide superior robustness over the traditional fixed-damping approach.

As I have reported in previous blog articles and as we also discussed in our previous publication on black-box mixing strategies for inhomogeneous systems, the main motivation of our work is to design numerical methods which are parameter-free and automatically self-adapt to the simulated material. In modern simulation scenarios where millions of DFT calculations are required in order to generate training data or screen over large design spaces, robustness and automation are the key requirements. Often it is in fact less the computational time of the individual calculations which limits overall throughput. Much rather it is the human factor, i.e. the human time required to set up, check and verify computations.

Clearly at the level of millions of calculations computational parameters can no longer be selected manually. Instead elaborate heuristics are employed to select basis set size, k-point sampling, SCF algorithm or the damping parameter. In case a calculation fails heuristics are also employed for automatic restart. However, this approach is far from perfect and even an optimistic 1% failure rate easily equals thousands of calculations which require human attention. With our work (both the previous paper as well as this one) we want to replace heuristic approaches to parameter selection by algorithms that employ a mixture of mathematical and physical insight to automatically adapt to the simulation at hand. As we demonstrate in this work, such algorithms might be associated with an increased effort compared to the best possible parameter setting, however they also make calculations overall more robust. Therefore one (a) saves on the repeated effort to find a suitable parameter set by trial and error and (b) reduces the fraction of calculations which need to be considered by a human. Overall the maximally attainable throughput can therefore be expected to increase from such a robust scheme despite the fact that an individual calculation might be more costly.

In this work in particular we considered the question of choosing the damping parameter. For this our adaptive damping approach is based on constructing an approximate quadratic model for the DFT energy and using this model within a line search procedure. Since this procedure is associated with an additional cost, we only employ it in case the proposed SCF step would increase either the DFT energy or the SCF residual. Notably our approach introduces no changes to the SCF in case each SCF step proposed by the mixing procedure is already perfect (i.e. energy or residual decreasing). Therefore adaptive damping can be considered a safeguard, which only comes into play if the proposed steps are noisy or erroneous. Adaptive damping is by construction orthogonal to any existing mixing and convergence acceleration technique for DFT methods and in our work we demonstrate it to integrate readily into an Anderson-accelerated SCF for various challenging systems. Overall we managed to increase performance and robustness at only a minor extra cost. The full abstract of our paper reads

We propose a novel adaptive damping algorithm for the self-consistent field (SCF) iterations of Kohn-Sham density-functional theory, using a backtracking line search to automatically adjust the damping in each SCF step. This line search is based on a theoretically sound, accurate and inexpensive model for the energy as a function of the damping parameter. In contrast to usual SCF schemes, the resulting algorithm is fully automatic and does not require the user to select a damping. We successfully apply it to a wide range of challenging systems, including elongated supercells, surfaces and transition-metal alloys.
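The safeguarding idea described before the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the algorithm from the paper: it uses a scalar unknown, checks only the energy (not the residual), and the helper names `quadratic_model_step` and `damped_step` are made up for this example.

```python
def quadratic_model_step(e0, slope, e_trial, alpha_trial):
    """Minimiser of q(a) = e0 + slope*a + 0.5*curv*a**2, where q is fitted
    through q(0) = e0, q'(0) = slope and q(alpha_trial) = e_trial."""
    curv = 2.0 * (e_trial - e0 - slope * alpha_trial) / alpha_trial**2
    if curv <= 0.0:
        return None  # model has no minimum; caller falls back to halving
    return -slope / curv


def damped_step(energy, x, direction, slope, alpha_trial=1.0, max_backtracks=10):
    """Return a damping alpha such that energy(x + alpha*direction) decreases.

    `slope` is the derivative of the energy along `direction` at alpha = 0.
    """
    e0 = energy(x)
    e_trial = energy(x + alpha_trial * direction)
    if e_trial < e0:
        return alpha_trial  # proposed step already decreases the energy
    # Safeguard: the proposed step is bad, so consult the quadratic model.
    alpha = quadratic_model_step(e0, slope, e_trial, alpha_trial)
    if alpha is None or alpha <= 0.0:
        alpha = 0.5 * alpha_trial
    # Backtrack in case the model was too optimistic.
    for _ in range(max_backtracks):
        if energy(x + alpha * direction) < e0:
            break
        alpha *= 0.5
    return alpha


def toy_energy(x):
    """Toy stand-in for the DFT energy with its minimum at x = 1."""
    return 2.0 * (x - 1.0) ** 2


# A step of length 3 from x = 0 overshoots the minimum, so it gets damped.
print(damped_step(toy_energy, 0.0, 3.0, slope=-12.0))
# A step of length 1 lands on the minimum and is left untouched.
print(damped_step(toy_energy, 0.0, 1.0, slope=-4.0))
```

Because the toy energy is exactly quadratic, the model recovers the optimal damping in one shot; for a real energy functional the backtracking loop is what guards against a poor model.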

by Michael F. Herbst at 2021-10-14 22:30 under Publications, electronic structure theory, theoretical chemistry, DFTK, Julia, DFT, numerical analysis, Kohn-Sham

2021-08-28

michael-herbst.com

Q-Chem 5 paper

About two years ago I integrated my open-source ctx library into the Q-Chem quantum-chemistry software suite. Quickly ctx became part of the core stack for managing computational results inside Q-Chem. In particular inside the ccman and adcman modules, which are responsible for most of the coupled-cluster and algebraic-diagrammatic construction methods available in Q-Chem, ctx is widely used.

In a recently published paper by all the Q-Chem authors the developments inside the Q-Chem package leading up to the major version 5 of the software are now summarised. The full abstract reads

This article summarizes technical advances contained in the fifth major release of the Q-Chem quantum chemistry program package, covering developments since 2015. A comprehensive library of exchange-correlation functionals, along with a suite of correlated many-body methods, continues to be a hallmark of the Q-Chem software. The many-body methods include novel variants of both coupled-cluster and configuration-interaction approaches along with methods based on the algebraic diagrammatic construction and variational reduced density-matrix methods. Methods highlighted in Q-Chem 5 include a suite of tools for modeling core-level spectroscopy, methods for describing metastable resonances, methods for computing vibronic spectra, the nuclear–electronic orbital method, and several different energy decomposition analysis techniques. High-performance capabilities including multithreaded parallelism and support for calculations on graphics processing units are described. Q-Chem boasts a community of well over 100 active academic developers, and the continuing evolution of the software is supported by an "open teamware" model and an increasingly modular design.

by Michael F. Herbst at 2021-08-28 22:30 under Publications, electronic structure theory, theoretical chemistry

2021-08-28

sECuREs website

Silent HP Z440 workstation: replacing noisy fans

Since March 2020, I have been using my work computer at home: an HP Z440 workstation.

When I originally took the machine home, I immediately noticed that it’s quite a bit louder than my other PCs, but only now did I finally decide to investigate what I could do about it.

Finding all the fans

I first identified all fans, both by opening the chassis and looking around, and by looking at the HP Z440 Maintenance and Service Guide, which contains this description:

chassis components

Specifically, I identified the following fans:

  • “1 Fan”, a 92mm rear fan, sucking air out of the back of the chassis.
  • “5 Memory fans”, two 60mm fans in a custom HP plastic enclosure that are positioned directly above the DIMM slots to the left and right of the CPU.
  • “6 CPU Heat sink”, a 92mm fan on top of a heat sink
  • “11 Rear System Fan”, a 92mm front (!) fan, pulling air into the front of the chassis.
  • My aftermarket nVidia GeForce GPU has 3 fans on a massive heat sink.
  • The power supply has a fan, too, which I will not touch.

Memory fans

The Z440 comes with a custom HP plastic enclosure that is put over the CPU cooler, fastened with two clips at opposite ends, and positions two small 60mm fans above the DIMM banks.

This memory fan plastic enclosure is a pain to find anywhere. It looks like HP is no longer producing it.

The enclosure plugs into the mainboard with a custom connector that is directly wired up to the fans, meaning it’s a pain to replace the fans.

memory fans

Luckily, while shopping around for an enclosure I could modify, I realized that memory fans are only required when installing more than 4 DIMM modules!

My machine “only” has 64 GB of RAM, in 4 DIMM modules, and I don’t intend to upgrade anytime soon, so I just unplugged the whole memory fan enclosure and removed it from the chassis.

The UEFI firmware does not complain about the memory fans missing (contrary to the rear fan!), and this simple change alone makes a noticeable difference in noise levels.

GPU fans

nVidia GPUs can be run at different “PowerMizer” performance levels:

nVidia PowerMizer

Many years ago, I ran into lag when using Chrome that went away as soon as I switched my nVidia GPU’s Preferred Mode to “Prefer Maximum Performance” instead of “Auto” or “Adaptive mode”.

It turns out that nowadays, that is no longer a problem, so running at Prefer Maximum Performance is no longer necessary.

Worse, pinning the GPU at the highest Performance Level means that it produces more heat, resulting in the fans having to spin up more often, and run for longer durations.

But, even after switching to Auto, resulting in Adaptive mode being chosen, I noticed that my GPU was stuck at a higher PowerMizer level than I thought it should be.

An easy fix is to limit the GPU to a certain PowerMizer level, and ideally not the lowest level (level 0). For me, one level after that (level 1) seems to result in no slow-down during my typical usage.

I followed this blog post to limit my GPU to PowerMizer level 1, i.e. I added /etc/modprobe.d/nvidia-power-save.conf with the following contents:

options nvidia NVreg_RegistryDwords="OverrideMaxPerf=0x2"

…followed by a rebuild of my initramfs (update-initramfs -u) and a reboot.
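Before rebooting, it can be worth double-checking that the file says what you intended. The following is a small sketch of such a check; `parse_registry_dwords` is a hypothetical helper written for this example, and it assumes that registry entries are written as name=value and separated by semicolons.

```python
import shlex


def parse_registry_dwords(options_line):
    """Return the NVreg_RegistryDwords entries of a modprobe options line.

    Assumes entries are separated by ';' and written as name=value,
    with values that int(..., 0) can parse (e.g. 0x2).
    """
    tokens = shlex.split(options_line)
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError("not a modprobe 'options' line")
    dwords = {}
    for token in tokens[2:]:  # tokens[1] is the module name
        key, _, value = token.partition("=")
        if key != "NVreg_RegistryDwords":
            continue
        for entry in value.split(";"):
            if not entry.strip():
                continue
            name, _, raw = entry.partition("=")
            dwords[name.strip()] = int(raw.strip(), 0)
    return dwords


line = 'options nvidia NVreg_RegistryDwords="OverrideMaxPerf=0x2"'
print(parse_registry_dwords(line))
```

In practice you would read the line from /etc/modprobe.d/nvidia-power-save.conf instead of hard-coding it.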

This way, the fans don’t typically need to spin up as the GPU stays below its temperature limit.

Rear and front fans

With the memory fans and GPU fans out of the way, two easy to check fans remain: the rear fan and front fan. These are 92mm in size, the model number is Foxconn PVA092G12S.

rear fan

I unplugged both of them to see what effect these fans have on the noise level, and the difference was significant!

Unfortunately, unplugging isn’t enough: the UEFI firmware complains on boot when the rear fan is not connected, requiring you to press Enter to boot. Also, the machine seems to get a few degrees Celsius hotter inside without the front and rear fans, so I don’t want to run the machine without these fans for an extended period of time.

I ordered two Noctua NF-A9x14 PWM fans (for about 25 CHF each) to replace the stock front and rear fans.

Unfortunately, HP uses a custom 4-pin fan connector on its Z440 mainboard! Luckily, modifying the connector of the Noctua Low-Noise Adapter cable to fit on the custom 4-pin connector is as simple as using a knife to remove the connector’s guard rails:

fan connector mod

CPU fan

For the CPU fan, HP again chose to use a custom (6-pin) connector.

On the web, I read that the Z440 CPU fan is quite efficient and not worth replacing. This matches my experience, so I kept the standard Z440 CPU cooler.

Conclusion

I was quite happy to discover that I could just unplug the memory fans, and configure my GPU to make less noise. Together with replacing the front/rear fans with Noctua ones, the machine is much quieter now than before!

One downside of workstation-class hardware is that manufacturers (at least HP) like to build custom parts and solutions. Using their own fan connectors instead of standard connectors is such a pain! I’ll be sure to stick to standard PC hardware :)

at 2021-08-28 13:16

2021-08-04

michael-herbst.com

JuliaCon BoF discussion session: Building a Chemistry and Materials Science Ecosystem

The second event I co-organised at this year's JuliaCon (see this article for the other) was a Birds of Feather (BoF) discussion session titled Building a Chemistry and Materials Science Ecosystem in Julia. In this session Rachel Kurchin and I wanted to gather the various stakeholders working on Julia codes for chemistry and materials simulations and discuss possible overlaps and plan future joint efforts.

This was the first time a meeting dedicated to this scientific field has been conducted within the Julia community and so we were quite curious about who would turn up. In the end we had a pretty mixed crowd consisting of Julia users tackling research problems in chemistry and materials as well as plenty of maintainers of various Julia packages related to the field, but also some veteran Julia users joined the discussion. This mix of people provoked a rather rich and lively debate about the perspectives of Julia in this field and the 90 minutes which were given to us passed almost in an instant.

A central discussion point within the session was the need for joint interfaces shared amongst the key packages of the ecosystem, both to leverage Julia's unique composability between the various packages and to further enhance interoperability and lead to a good user experience. As many have pointed out during the session, a good first step is the design of an interface for representing the structure of the chemical system or the material to be studied. In particular this would make it possible to develop unified approaches to share data between packages, set up calculations and plainly compare between different approaches. Additionally, annoying aspects such as file parsing, data export, plotting or other post-processing could then be easily implemented once using the general interface and used by everyone in the Julia community. Naturally a time slot of 90 minutes is just about sufficient to get the discussion started and scratch the surface, so the session has not yet yielded anything conclusive. However, following up from the conference the debate has definitely intensified amongst participants and I would not be surprised if some progress is made.

In case you are interested in participating in these developments or plainly want to get in touch with Julia users and developers from chemistry, molecular or materials science, here are a number of relevant resources:

by Michael F. Herbst at 2021-08-04 10:00 under Research, workshop, electronic structure theory, Julia

2021-08-03

michael-herbst.com

JuliaCon DFTK workshop: A mathematical look at electronic structure theory

From 13th July till 30th July this year's JuliaCon finally took place virtually. The first week (13th till 27th) hosted a number of three-hour live-streamed sessions of workshops, while the "regular" conference with a number of prerecorded talks started on 28th.

After my introductory talk on electronic structure theory and our DFTK code at last year's JuliaCon, this year I participated in the conference with two events. One was a BoF discussion session gathering the people working on materials-science and electronic-structure codes in Julia, about which I will write some more in a follow-up blog article.

My second event was a three-hour workshop titled A mathematical look at electronic structure theory, for which I prepared a broadly accessible introduction to density-functional theory (DFT), the numerical procedures to solve DFT as well as some tools from numerical analysis to understand the convergence properties of these methods. As the tool to conduct the relevant calculations, code up and study the respective self-consistent field (SCF) algorithms we used our density-functional toolkit (DFTK). The workshop therefore also provides a great showcase for the merits of this code and how it leverages the broader Julia ecosystem to gain its unique features (arbitrary floating-point types, flexible and composable algorithms, automatic differentiation, numerical analysis techniques to investigate convergence failures, etc.). For more details on the workshop see the dedicated teaching page.

What surprised me very positively while conducting the workshop was the large number of viewers who followed it live and actively engaged by asking questions or posting comments on YouTube. Since the workshop was hosted at JuliaCon I wouldn't have thought this topic would capture this many people, so in retrospect I am very happy I did it. In that sense also a big thanks to everyone who participated and provided me with feedback afterwards. (BTW: I'm still happy to take any in case you have some comments or suggestions).

In case you missed the workshop, the complete materials are available on GitHub and the full recording of the workshop is available on YouTube.

by Michael F. Herbst at 2021-08-03 10:00 under Research, workshop, electronic structure theory, Kohn-Sham, high-throughput, DFT, DFTK, solid state, Julia

2021-08-01

michael-herbst.com

Virtual materials design 2021: Black-box density-functional theory methods

On 20th and 21st July 2021 the Virtual Materials Design 2021 CECAM workshop took place virtually. I was excited about this workshop and the opportunity to get in touch with researchers working on high-throughput computational materials design. While I am not actively working in this field the special requirements of the multitude of calculations running in this field clearly have been one of the main motivations for my work on DFTK, error control and black-box SCF algorithms. In advance of the workshop I asked the organisers to participate with a contributed talk to present my work to this community for the first time, which thankfully got accepted.

Due to the virtual format the workshop was unfortunately rather packed, which allowed for little time to engage in discussion during the presentation slot. However, the organisers arranged multiple longer poster sessions in a GatherTown virtual world, which allowed for almost realistic face-to-face discussions. In these GatherTown sessions I talked with a number of scientists working on high-throughput studies as well as designing the large software infrastructures, which are commonly used to conduct these. At the level of performing millions of individual calculations in a screening study this naturally poses special demands on the workflow software as well and I was curious to learn about some of the details.

With my focus on advocating a more mathematical look at screening and DFT simulations I represented a minority viewpoint at the meeting and I was very curious about the general feedback and critique of the more applied scientists in response to our recently proposed ideas. In general people were indeed quite interested to learn about our work on reliable SCF methods for inhomogeneous systems, but being confronted with our recent error estimation perspectives, some had doubts about the required effort being really worth it for DFT simulations. I certainly understand that concern. However, I think one should keep in mind the successes and potential that have been unlocked by error estimation techniques in other fields, such as finite-element modelling or aerospace design. In these fields simulation methods have become more efficient due to the lessons learned from uncertainty quantification and error estimation, and the nowadays well-established error estimation techniques have furthermore contributed to preventing accidents caused by trusting faulty simulation data (such as the Sleipner A oil rig collapse). While clearly not all aspects of macroscopic modelling apply in the microscopic world, it is not hard to imagine that error bars establishing a guaranteed trustworthiness can make screening decisions more robust, thus potentially preventing costly manufacture of less useful compounds. Furthermore I expect a careful introduction of numerical errors (e.g. by lowering the floating-point type) to balance numerical error against the (usually much larger) DFT model error to allow for notable computational savings when performing on the order of millions of DFT calculations.

Overall I have enjoyed the two afternoons with many discussions in the high-throughput design community. As usual my slides are attached below.

Link
Towards error-controlled, black-box density-functional theory methods (Slides)

by Michael F. Herbst at 2021-08-01 10:00 under Research, talk, electronic structure theory, Kohn-Sham, high-throughput, DFT, DFTK, solid state

2021-07-15

michael-herbst.com

SSD Seminar: Accelerating the discovery of tomorrow's materials by robust and error-controlled simulations

A couple of days ago, on 12th July, I was invited to present my research in the SSD Seminar Series of RWTH Aachen. Being part of the research training group on modern inverse problems as well as the School for Simulation and Data Science (SSD) the SSD seminars are interdisciplinary and feature researchers as well as Master-level students from a couple of departments at RWTH (mathematics, computer science, simulation sciences, ...).

To make my recent work on error estimation and the design of robust algorithms for density-functional theory broadly accessible I started by motivating the need for density-functional theory (DFT) and high-throughput methods for the discovery and design of novel materials. Afterwards I briefly hinted at the mathematical structure of the equations, which need to be solved to obtain DFT properties. With this in mind I presented current research questions at the edge of mathematics and electronic-structure modelling and presented some of my recent results. As usual the slides are attached below.

Link
Accelerating the discovery of tomorrow's materials by robust and error-controlled electronic-structure simulations (Slides)

by Michael F. Herbst at 2021-07-15 10:01 under Research, talk, electronic structure theory, Kohn-Sham, high-throughput, DFT, solid state

2021-07-15

michael-herbst.com

Talk at many-body seminar at RWTH

On 29th June I was invited to present a short summary of my research at the seminar of the research training group Quantum Many-Body Methods at RWTH Aachen University. In the talk I gave an overview of my ongoing work on reliable black-box self-consistent field schemes for high-throughput DFT calculations. My slides are attached below.

Link
Reliable black-box self-consistent field schemes for high-throughput DFT calculations (Slides)

by Michael F. Herbst at 2021-07-15 10:00 under Research, talk, electronic structure theory, Kohn-Sham, high-throughput, DFT, solid state

2021-07-10

sECuREs website

25 Gigabit Linux internet router PC build

init7 recently announced that with their FTTH fiber offering Fiber7, they will now sell and connect you with 25 Gbit/s (Fiber7-X2) or 10 Gbit/s (Fiber7-X) fiber optics, if you want more than 1 Gbit/s.

While this offer will only become available at my location late this year (or possibly later due to the supply chain shortage), I already wanted to get the hardware on my end sorted out.

After my previous disappointment with the MikroTik CCR2004, I decided to try a custom PC build.

An alternative to many specialized devices, including routers, is to use a PC with an expansion card. An internet router’s job is to configure a network connection and forward network packets. So, in our case, we’ll build a PC and install some network expansion cards!

router PC build

Goals

For this PC internet router build, I had the following goals, highest priority to lowest priority:

  1. Enough performance to saturate 25 Gbit/s, e.g. with two 10 Gbit/s downloads.
  2. Silent: no loud fan noise.
  3. Power-efficient: low power usage, as little heat as possible.
  4. Low cost (well, for a high-end networking build…).

Network Port Plan

The simplest internet router has 2 network connections: one uplink to the internet, and the local network. You can build a router without extra cards by using a mainboard with 2 network ports.

Because there are no mainboards with SFP28 slots (for 25 Gbit/s SFP28 fiber modules), we need at least 1 network card for our build. You might be able to get by with a dual-port SFP28 network card if you have an SFP28-compatible network switch already, or need just one fast connection.

I want to connect a few fast devices (directly and via fiber) to my router, so I’m using 2 network cards: an SFP28 network card for the uplink, and a quad-port 10G SFP+ network card for the local network (LAN). This leaves us with the following network ports and connections:

Network Card   max speed    cable   effective   Connection
Intel XXV710   25 Gbit/s    fiber   25 Gbit/s   Fiber7-X2 uplink
Intel XXV710   25 Gbit/s    DAC     10 Gbit/s   workstation
Intel XL710    10 Gbit/s    RJ45    1 Gbit/s    rest (RJ45 Gigabit)
Intel XL710    10 Gbit/s    fiber   10 Gbit/s   MikroTik 1
Intel XL710    10 Gbit/s    fiber   10 Gbit/s   MikroTik 2
Intel XL710    10 Gbit/s    /       10 Gbit/s   (unused)
onboard        2.5 Gbit/s   RJ45    1 Gbit/s    (management)
network connectors

Hardware selection

Now that we have defined the goals and network needs, let’s select the actual hardware!

Network Cards

My favorite store for 10 Gbit/s+ network equipment is FS.COM. They offer Intel-based cards:

Network cards

Both cards work out of the box with the i40e Linux kernel driver, no firmware blobs required.

For a good overview of the different available Intel cards, check out the second page (“Product View”) in the card’s User Manual.

CPU and Chipset

I read on many different sites that AMD’s current CPUs beat Intel’s CPUs in terms of performance per watt. We can better achieve goals 2 and 3 (low noise and low power usage) by using fewer watts, so we’ll pick an AMD CPU and mainboard for this build.

AMD’s current CPU generation is Zen 3, and current Zen 3 based CPUs can be divided into 65W TDP (Thermal Design Power) and 105W TDP models. Only one 65W model is available to customers right now: the Ryzen 5 5600X.

Mainboards are built for/with a certain so-called chipset. Zen 3 CPUs use the AM4 socket, for which 8 different chipsets exist. Our network cards need PCIe 3.0, so that disqualifies 5 chipsets right away: only the A520, B550 and X570 chipsets remain.

Ryzen 5

Mainboard: PCIe bandwidth

I originally tried using the ASUS PRIME X570-P mainboard, but I ran into two problems:

Too loud: X570 mainboards need an annoyingly loud chipset fan for their 15W TDP. Other chipsets such as the B550 don’t need a fan for their 5W TDP. With a loud chipset fan, goal 2 (low noise) cannot be achieved. Only the recently-released X570S variant comes without fans.

Not enough PCIe bandwidth/slots! This is how the ASUS tech specs describe the slots:

This means the board has 2 slots (1 CPU, 1 chipset) that are physically wide enough to hold a full-length x16 card, but only the first port can electronically be used as an x16 slot. The other port only has PCIe lanes electronically connected for x4, hence “x16 (max at x4 mode)”.

Unfortunately, our network cards need electrical connection of all their PCIe x8 lanes to run at full speed. Perhaps Intel/FS.COM will one day offer a new generation of network cards that use PCIe 4.0, because PCIe 4.0 x4 achieves the same 7.877 GB/s throughput as PCIe 3.0 x8. Until then, I needed to find a new mainboard.
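
The 7.877 GB/s figure can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the commonly quoted per-lane rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b line encoding for both generations; the function and constant names are made up for illustration, and packet-level protocol overhead is ignored.

```python
# Raw per-direction PCIe throughput from transfer rate and line encoding.
# Assumption: PCIe 3.0 runs at 8 GT/s per lane, PCIe 4.0 at 16 GT/s,
# and both use 128b/130b encoding (128 payload bits per 130 bits on the wire).
TRANSFER_RATE_GT_S = {"3.0": 8.0, "4.0": 16.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_throughput_gb_s(generation: str, lanes: int) -> float:
    """Return the usable throughput in GB/s for a link of the given width."""
    gigabits_per_s = TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY * lanes
    return gigabits_per_s / 8  # 8 bits per byte


gen3_x8 = pcie_throughput_gb_s("3.0", 8)
gen4_x4 = pcie_throughput_gb_s("4.0", 4)
gen3_x4 = pcie_throughput_gb_s("3.0", 4)

print(f"PCIe 3.0 x8: {gen3_x8:.3f} GB/s")
print(f"PCIe 4.0 x4: {gen4_x4:.3f} GB/s")
print(f"PCIe 3.0 x4: {gen3_x4:.3f} GB/s")
```

Under these assumptions a PCIe 3.0 card in an x4 slot gets only half the x8 figure, which is less than the 6.25 GB/s that two 25 Gbit/s ports could carry in one direction.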

Searching mainboards by PCIe capabilities is rather tedious, as mainboard block diagrams or PCIe tree diagrams are not consistently available from all mainboard vendors.

Instead, we can look explicitly for a feature called PCIe Bifurcation. In a nutshell, PCIe bifurcation lets us divide the PCIe bandwidth from the Ryzen CPU from 1 PCIe 4.0 x16 into 1 PCIe 4.0 x8 + 1 PCIe 4.0 x8, definitely satisfying our requirement for two x8 slots at full bandwidth.

I found a list of (only!) three B550 mainboards supporting PCIe Bifurcation in an Anandtech review. Two are made by Gigabyte, one by ASRock. I read the Gigabyte UEFI setup is rather odd, so I went with the ASRock B550 Taichi mainboard.

Case

For the case, I needed a midi case (large enough for the B550 mainboard’s ATX form factor) with plenty of options for large, slow-spinning fans.

I stumbled upon the Corsair 4000D Airflow, which is available for 80 CHF and achieved positive reviews. I’m pleased with the 4000D: there are no sharp corners, installation is quick, easy and clean, and the front and top panels offer plenty of space for cooling behind large air intakes:

Airflow case (from the top)

Inside, the case offers plenty of space and options for routing cables on the back side:

Airflow case (back)

Which in turn makes for a clean front side:

Airflow case (front)

Fans

I have been happy with Noctua fans for many years. In this build, I’m using only Noctua fans so that I can reach goal 2 (silent, no loud fan noise):

Noctua fans

These fans are large (140mm), so they can spin at low speeds and still be effective.

The specific fan configuration I ended up with:

  • 1 Noctua NF-A14 PWM 140mm in the front, pulling air out of the case
  • 1 Noctua NF-A14 PWM 140mm in the top, pulling air into the case
  • 1 Noctua NF-A12x25 PWM 120mm in the back, pulling air into the case
  • 1 Noctua NH-L12S CPU fan

Note that this is most likely overkill: I can well imagine that I could turn off one of these fans entirely without a noticeable effect on temperatures. But I wanted to be on the safe side and have a lot of cooling capacity, as I don’t know how hot the Intel network cards run in practice.

Fan Controller

The ASRock B550 Taichi comes with a Nuvoton NCT6683D-T fan controller.

Unfortunately, ASRock seems to have set the Customer ID register to 0 instead of CUSTOMER_ID_ASROCK, so you need to load the nct6683 Linux driver with its force option.

Once the module is loaded, lm-sensors lists accurate PWM fan speeds, but the temperature values are mislabeled and don’t quite match the temperatures I see in the UEFI H/W Monitor:

nct6683-isa-0a20
Adapter: ISA adapter
fan1:              471 RPM  (min =    0 RPM)
fan2:                0 RPM  (min =    0 RPM)
fan3:                0 RPM  (min =    0 RPM)
fan4:                0 RPM  (min =    0 RPM)
fan5:                0 RPM  (min =    0 RPM)
fan6:                0 RPM  (min =    0 RPM)
fan7:                0 RPM  (min =    0 RPM)
Thermistor 14:     +45.5 C  (low  =  +0.0 C)
                            (high =  +0.0 C, hyst =  +0.0 C)
                            (crit =  +0.0 C)  sensor = thermistor
AMD TSI Addr 98h:  +40.0 C  (low  =  +0.0 C)
                            (high =  +0.0 C, hyst =  +0.0 C)
                            (crit =  +0.0 C)  sensor = AMD AMDSI
intrusion0:       OK
beep_enable:      disabled

At least with the nct6683 Linux driver, there is no way to change the PWM fan speed: the corresponding files in the hwmon interface are marked read-only.

At this point I accepted that I won’t be able to work with the fan controller from Linux, and tried just configuring static fan control settings in the UEFI setup.

But despite identical fan settings, one of my 140mm fans would end up turned off. I’m not sure why — is it an unclean PWM signal, or is there just a bug in the fan controller?

Controlling the fans to reliably spin at a low speed is vital to reach goal 2 (low noise), so I looked around for third-party fan controllers and found the Corsair Commander Pro, which a blog post explains is compatible with Linux.

Server Disk

This part of the build is not router-related, but I figured if I have a fast machine with a fast network connection, I could add a fast big disk to it and retire my other server PC.

Specifically, I chose the Samsung 970 EVO Plus M.2 SSD with 2 TB of capacity. This disk can deliver 3500 MB/s of sequential read throughput, which is more than the ≈3000 MB/s that a 25 Gbit/s link can handle.
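
As a rough check of that comparison, here is a minimal sketch converting the link rate into MB/s. It assumes decimal units (1 Gbit = 1000 Mbit, 8 bits per byte) and ignores Ethernet and TCP/IP overhead, which presumably accounts for the gap between the raw figure and ≈3000 MB/s; the helper name is made up for illustration.

```python
def link_rate_mb_s(gbit_per_s: float) -> float:
    """Convert a link rate in Gbit/s to MB/s (decimal units, no protocol overhead)."""
    return gbit_per_s * 1000 / 8


DISK_SEQ_READ_MB_S = 3500  # sequential read figure quoted for the 970 EVO Plus

uplink = link_rate_mb_s(25)
print(f"25 Gbit/s link: {uplink:.0f} MB/s raw")
print(f"disk keeps up with the link: {DISK_SEQ_READ_MB_S > uplink}")
```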

Graphics Card

An important part of computer builds for me is making troubleshooting and maintenance as easy as possible. In my current tech landscape, that translates to connecting an HDMI monitor and a USB keyboard, for example to boot from a different device, to enter the UEFI setup, or to look at Linux console messages.

Unfortunately, the Ryzen 5 5600X does not have integrated graphics, so to get any graphics output, we need to install a graphics card. I chose the Zotac GeForce GT 710 Zone Edition, because it was the cheapest available card (60 CHF) that’s passively cooled.

An alternative to using a graphics card might be to use a PCIe IPMI card like the ASRock PAUL, however these seem to be harder to find, and more expensive.

Longer-term, I think the best option would be to use the Ryzen 5 5600G with integrated graphics, but that model only becomes available later this year.

Component List

I’m listing 2 different options here. Option A is what I built (router+server), but Option B is a lot cheaper if you only want a router. Both options use the same base components:

Price     Type           Article
347 CHF   Network card   FS.COM Intel XXV710, 2 × 25 Gbit/s (#75603)
329 CHF   Network card   FS.COM Intel XL710, 4 × 10 Gbit/s (#75602)
314 CHF   CPU            Ryzen 5 5600X
290 CHF   Mainboard      ASRock B550 Taichi
92 CHF    Case           Corsair 4000D Airflow (Midi Tower)
67 CHF    Fan control    Corsair Commander Pro
65 CHF    Case fan       2 × Noctua NF-A14 PWM (140mm)
62 CHF    CPU fan        Noctua NH-L12S
35 CHF    Case fan       1 × Noctua NF-A12x25 PWM (120mm)
60 CHF    GPU            Zotac GeForce GT 710 Zone Edition (1GB)

Base total: 1590 CHF

Option A: Server extension. Because I had some parts lying around, and because I wanted to use my router for serving files (from large RAM cache/fast disk), I went with the following parts:

Price     Type           Article
309 CHF   Disk           Samsung 970 EVO Plus 2000GB, M.2 2280
439 CHF   RAM            64GB HyperX Predator RAM (4x, 16GB, DDR4-3600, DIMM 288)
127 CHF   Power supply   Corsair SF600 Platinum (600W)
14 CHF    Power ext      Silverstone ATX 24-24Pin Extension (30cm)
10 CHF    Power ext      Silverstone ATX Extension 8-8(4+4)Pin (30cm)

The Corsair SF600 power supply is not server-related, I just had it lying around. I’d recommend going for the Corsair RM650x *2018* (which has longer cables) instead.

Server total: 2770 CHF

Option B: Non-server (router only) alternative. If you’re only interested in routing, you can opt for cheaper low-end disk and RAM, for example:

Price     Type           Article
112 CHF   Power supply   Corsair RM650x *2018*
33 CHF    Disk           Kingston A400 120GB M.2 SSD
29 CHF    RAM            Crucial CT4G4DFS8266 4GB DDR4-2666 RAM

Non-server total: 1764 CHF

ASRock B550 Taichi Mainboard UEFI Setup

To enable PCIe Bifurcation for our two PCIe 3.0 x8 card setup:

  1. Set Advanced > AMD PBS > PCIe/GFX Lanes Configuration
    to x8x8.

To always turn on the PC after power is lost:

  1. Set Advanced > Onboard Devices Configuration > Restore On AC Power Loss
    to Power On.

To PXE boot (via UEFI) on the onboard ethernet port (management), but disable the slow option ROMs for PXE boot on the FS.COM network cards:

  1. Set Boot > Boot From Onboard LAN
    to Enabled.
  2. Set Boot > CSM (Compatibility Support Module) > Launch PXE OpROM Policy
    to UEFI only.

Fan Controller Setup

The Corsair Commander Pro fan controller is well-supported on Linux.

After enabling the Linux kernel option CONFIG_SENSORS_CORSAIR_CPRO, the device shows up in the hwmon subsystem.

You can completely spin up (100% PWM) or turn off (0% PWM) a fan like so:

# echo 255 > /sys/class/hwmon/hwmon3/pwm1
# echo 0 > /sys/class/hwmon/hwmon3/pwm1

I run my fans at 13% PWM, which translates to about 226 rpm:

# echo 33 > /sys/class/hwmon/hwmon3/pwm1
# cat /sys/class/hwmon/hwmon3/fan1_input
226

Conveniently, the Corsair Commander Pro stores your settings even when power is lost. So you don’t even need to run a permanent fan control process, a one-off adjustment might be sufficient.
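
The hwmon3 index in the commands above can differ between boots and machines, and the raw value 33 is just 13% of the 0 to 255 PWM range. The following minimal sketch looks up the device by name and does the percentage conversion. It assumes that the driver registers under the hwmon name corsaircpro (check the contents of the name file on your system); the helper names are made up for illustration, and writing the pwm file requires root.

```python
from pathlib import Path

PWM_MAX = 255  # hwmon pwm files accept values from 0 (off) to 255 (full speed)


def percent_to_pwm(percent: float) -> int:
    """Convert a duty cycle in percent to the 0-255 value hwmon expects."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be within 0..100, got {percent}")
    return round(percent * PWM_MAX / 100)


def find_hwmon(name: str, root: Path = Path("/sys/class/hwmon")) -> Path:
    """Return the hwmon directory whose 'name' file matches the given name."""
    for device in sorted(root.glob("hwmon*")):
        name_file = device / "name"
        if name_file.is_file() and name_file.read_text().strip() == name:
            return device
    raise FileNotFoundError(f"no hwmon device named {name!r} under {root}")


def set_fan_percent(device: Path, fan: int, percent: float) -> int:
    """Write the PWM value for fan number 'fan' and return the value written."""
    value = percent_to_pwm(percent)
    (device / f"pwm{fan}").write_text(f"{value}\n")
    return value


# Example (as root), assuming the hwmon name is "corsaircpro":
#   device = find_hwmon("corsaircpro")
#   set_fan_percent(device, 1, 13)   # writes 33 to pwm1
```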

Power Usage

The PC consumes about 48W of power when idle (only management network connected) by default without further tuning. Each extra network link increases power usage by ≈1W:

graph showing power consumption when enabling network links

Enabling all Ryzen-related options in my Linux kernel and switching to the powersave CPU frequency governor lowers power usage by ≈1W.

On some mainboards, you might need to force-enable Global C-States to save power. Not on the B550 Taichi, though.

I tried undervolting the CPU, but that didn’t even make ≈1W of difference in power usage. Potentially making my setup unreliable is not worth that little power saving to me.

I measured these values using a Homematic HM-ES-PMSw1-Pl-DN-R5 I had lying around.

Performance

Goal 1 is to saturate 25 Gbit/s, for example using two 10 Gbit/s downloads. I’m talking about large bulk transfers here, not many small transfers.

To get a feel for the performance/headroom of the router build, I ran 3 different tests.

Test A: 10 Gbit/s bridging throughput

For this test, I connected 2 PCs to the router’s XL710 network card and used iperf3(1) to generate a 10 Gbit/s TCP stream between the 2 PCs. The router doesn’t need to modify the packets in this scenario, only forward them, so this should be the lightest load scenario.

bridging throughput

Test B: 10 Gbit/s NAT throughput

In this test, the 2 PCs were connected such that the router performs Network Address Translation (NAT), which is required for downloads from the internet via IPv4.

This scenario is slightly more involved, as the router needs to modify packets. But, as we can see below, a 10 Gbit/s NAT stream consumes barely more resources than 10 Gbit/s bridging:

NAT throughput

Test C: 4 × 10 Gbit/s TCP streams

In this test, I wanted to max out the XL710 network card, so I connected 4 PCs and started an iperf3(1) benchmark between each PC and the router itself, simultaneously.

This scenario consumes about 16% CPU, meaning we’ll most likely have plenty of headroom even when all ports are maxed out!

four 10 Gbit/s streams

Tip: make sure to enable the CONFIG_IRQ_TIME_ACCOUNTING Linux kernel option to include IRQ handlers in CPU usage numbers for accurate measurements.

Alternatives considered

The passively-cooled SuperServer E302-9D comes with 2 SFP+ ports (10 Gbit/s). It even comes with 2 PCIe 3.0 x8 capable slots. Unfortunately it seems impossible to currently buy this machine, at least in Switzerland.

You can find a few more suggestions in the replies of this Twitter thread. Most are either unavailable, require a lot more DIY work (e.g. a custom case), or don’t support 25 Gbit/s.

Router software: router7 porting

I wrote router7, my own small home internet router software in Go, back in 2018, and have been using it ever since.

I don’t have time to support any users, so I don’t recommend anyone else use router7, unless the project really excites you, and the lack of support doesn’t bother you! Instead, you might be better served with a more established and supported router software option. Popular options include OPNsense or OpenWrt. See also Wikipedia’s List of router and firewall distributions.

To make router7 work for this 25 Gbit/s router PC build, I had to make a few adjustments.

Because we are using UEFI network boot instead of BIOS network boot, I first had to make the PXE boot implementation in router7’s installer work with UEFI PXE boot.

I then enabled a few additional kernel options for network and storage drivers in router7’s kernel.

To router7’s control plane code, I added bridge network device configuration, which in my previous 2-port router setup was not needed.

During development, I compiled a few Linux programs statically or copied them with their dependencies (→ gokrazy prototyping) to run them on router7, such as sensors(1) , ethtool(8) , as well as iproute2’s ip(8) and bridge(8) implementation.

Next Steps

Based on my tests, the hardware I selected seems to deliver enough performance to use it for distributing a 25 Gbit/s upstream link across multiple 10 Gbit/s devices.

I won’t know for sure until the fiber7 Point Of Presence (POP, German Anschlusszentrale) close to my home is upgraded to support 25 Gbit/s “Fiber7-X2” connections. As I mentioned, unfortunately the upgrade plan is delayed due to the component shortage. I’ll keep you posted!

Other Builds

In case my build doesn’t exactly match your requirements, perhaps these others help inspire you:

Appendix A: DPDK test

Pim ran a DPDK based loadtester called T-Rex on this machine. Here’s his summary of the test:

For DPDK, this hardware does 4x10G at 64b frames. It does not do 6x10G as it tops out at 62Mpps using 4 cores (of 15.5Mpps per core).

I couldn’t test 25G symmetric [because we lacked a 25G DAC cable], but extrapolating from the numbers, 3 CPUs source and sink ~24.6Gbit per core, so we’d probably make it, leaving 1 core for OS and 2 cores for controlplane.

If the machine had a 12 core Ryzen, it would saturate all NICs with room to spare. So that’s what I’ll end up buying :)

DPDK test
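
To put the packet rates from Pim’s summary into perspective, here is a minimal sketch of the usual Ethernet line-rate calculation. It assumes the standard 20 bytes of per-frame overhead on the wire (8 bytes preamble and start-of-frame delimiter plus 12 bytes inter-frame gap) and counts one direction per port; the helper name is made up for illustration, and the 62 Mpps limit is taken from the quote.

```python
PREAMBLE_AND_SFD_BYTES = 8   # sent before every Ethernet frame
INTER_FRAME_GAP_BYTES = 12   # mandatory idle time between frames


def line_rate_pps(link_bit_per_s: float, frame_bytes: int = 64) -> float:
    """Packets per second needed to fill a link with frames of the given size."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8
    return link_bit_per_s / bits_on_wire


per_port = line_rate_pps(10e9)   # one 10 Gbit/s port, 64-byte frames
measured_limit = 4 * 15.5e6      # 4 cores at 15.5 Mpps each, from the quote

print(f"one 10G port at 64-byte frames: {per_port / 1e6:.2f} Mpps")
for ports in (4, 6):
    needed = ports * per_port
    verdict = "fits" if needed <= measured_limit else "exceeds the limit"
    print(f"{ports} x 10G needs {needed / 1e6:.2f} Mpps: {verdict}")
```

With these assumptions, four ports stay just below 62 Mpps while six ports clearly exceed it, which matches the summary.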

at 2021-07-10 11:43

2021-06-21

michael-herbst.com

Errors and uncertainty quantification in density-functional theory

On 8th June I was invited to the seminar of the Uncertainty Quantification (UQ) group of Prof. Youssef Marzouk at MIT. Youssef and I had planned to have this seminar since my involvement with MIT's CESMIX project last February (see also this blog article), but it took us quite some time to get it arranged. Finally I managed to present my point of view on UQ in density-functional theory (DFT), sneakily re-using most of the slides I had already prepared for my recent UQ-in-DFT talk at RWTH Aachen's UQ group the week earlier.

Similar to the Aachen talk I put strong emphasis on audience participation and discussion. I first introduced the UQ group to electronic structure theory and DFT, allowing for enough time to discuss the key ideas of the physics. Then I pointed out current research in error estimation and UQ in DFT and provided a number of opportunities for interesting future UQ-related research. The discussion was very lively and I hardly made it beyond a slide without a question, which was just great. Since a lot could be gained from stronger uncertainty quantification tools in DFT in my opinion, I hope this talk made DFT more accessible to the UQ group and made some people curious to look into the details. On my end I would definitely enjoy learning more about UQ in the future and look forward to my future UQ-related involvement in the CESMIX project. As usual my slides are attached below.

Link
Errors and uncertainty quantification in electronic-structure theory (Slides)

by Michael F. Herbst at 2021-06-21 10:00 under Research, talk, electronic structure theory, Kohn-Sham, uncertainty quantification, DFT, solid state

2021-06-15

michael-herbst.com

Talk at MATH4UQ seminar series at RWTH

On 1st June I was invited to the MATH4UQ seminar series of the Mathematics of Uncertainty Quantification chair of Prof. Raul Tempone at RWTH Aachen University.

Over the past months I got more and more interested in mathematical methods for uncertainty quantification (UQ) as an opportunity to estimate and understand errors in density-functional theory (DFT) calculations. In particular I imagine UQ methods to be useful to estimate the model error of a DFT model itself. At this level statistical approaches are likely the only feasible option for a practical error estimation, since the mathematical complexity of modern DFT models beyond the local density approximation very likely makes a posteriori error analysis strategies infeasible.

In my talk I explained the basics of DFT and provided a rough overview of present UQ developments for this method. Since I know very little about UQ and my audience knew very little about DFT, I intended the talk to be more of a Q&A session, where the slides are around to stimulate discussion. This turned out to work very well and I am very grateful for the many interesting questions from the audience and the enjoyable discussion. As usual my slides are attached below. Additionally a recording of my talk can be found on YouTube.

Link
Errors in electronic-structure theory: Status and directions for future research (Slides)
Youtube recording of the talk

by Michael F. Herbst at 2021-06-15 10:00 under Research, talk, electronic structure theory, Kohn-Sham, uncertainty quantification, DFT, solid state

2021-06-05

sECuREs website

Laptop review: ThinkPad X1 Extreme (Gen 2)

ThinkPad X1 Extreme Gen 2, pear for scale

For many of my school and university years, I used and liked my ThinkPad X200 ultraportable laptop. But now that these years are long gone, I realized my use-case for laptops had changed: instead of carrying my laptop with me every day, I am now only bringing it on occasion, for example when I travel to conferences, visit friends, or do volunteer work.

After the ThinkPad X200, I used a few different laptops:

  • MacBook Pro 13" Retina, bought for its screen
  • ThinkPad X1 Carbon, which newly introduced a hi-dpi screen to ThinkPads
  • Dell XPS 9360, for a change, to try a device that ships with Linux

With each of these devices, I have felt limited by the lack of connectors and slim compute power that comes with the Ultrabook brand, even after years of technical progress.

More compute power is nice to be able to work on projects with larger data sets, for example debiman (scanning and converting all manpages in Debian), or distri (building Linux packages).

More peripheral options such as USB ports are nice when connecting a keyboard, trackball, USB-to-serial adapter, etc., to work on a micro controller or Raspberry Pi project, for example.

So, I was ready to switch from the heaviest Ultrabooks to the lightest of the “mobile workstation” category, when I stumbled upon Lenovo’s ThinkPad X1 Extreme (Gen 2), and it piqued my curiosity.

Peripherals

Let me start by going into the key peripherals of a laptop: keyboard, touchpad and screen. I will talk about these independently from the remaining hardware because they define the experience of using the computer.

Keyboard

After having used the Dell XPS 9360 for a few years, I can confidently say that the keyboard of the ThinkPads is definitely much better, and in a noticeable way.

It’s not that the Dell keyboards are bad. But comparing the Dell and ThinkPad side-by-side makes it really clear that the ThinkPad keyboards are the best notebook keyboards.

On the ThinkPad keyboard, every key press lands exactly as I imagine. Never do I need to hit a key twice because I didn’t press it just-right, and never do I notice additional ghost key presses.

Even though I connect my external Kinesis Advantage keyboard when doing longer stretches of work, the quality of the built-in keyboard matters: a good keyboard enables using the laptop on the couch.

Touchpad

Unfortunately, while the keyboard is great, I can’t say the same about the touchpad. I mean, it’s not terrible, but it’s also not good by any stretch.

This seems to be the status quo with PC touchpads for decades. It really blows my mind that Apple’s touchpads are consistently so much better!

My only hope is that Bill Harding (GitClear), who is working on improving the Linux touchpad experience, will eventually find a magic software tweak or something…

As mentioned on the ArchWiki, I also had to adjust the sensitivity like so:

% xinput set-prop 'SynPS/2 Synaptics TouchPad' 'libinput Accel Speed' 0.5

Display

I have high demands regarding displays: since 2013, every device of mine has had a hi-dpi display.

The industry hasn’t improved displays across the board as fast as I’d like, so non-hi-dpi displays are still quite common. The silver lining is that it makes laptop selection a little easier for me: anything without a decent display I can discard right away.

I’m glad to report that the 4K display in the ThinkPad X1 Extreme with its 3840x2160 pixels is sharp, bright, and generally has good viewing angles.

It’s also a touchscreen, which I don’t strictly need, but it’s nice to use it from time to time.

I use the display in 200% scaling mode, i.e. I set Xft.dpi: 192. See also HiDPI in ArchWiki.
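The value 192 is the conventional X11 baseline of 96 DPI multiplied by the scale factor. A minimal sketch of that arithmetic, assuming the setting is kept in ~/.Xresources and loaded with xrdb; the helper name xft_dpi_for_scale is made up for illustration:

```shell
# Conventional X11 baseline of 96 DPI times the scale factor in percent.
# xft_dpi_for_scale is a hypothetical helper, not part of any tool.
xft_dpi_for_scale() {
    echo $(( 96 * $1 / 100 ))
}

# Example (not run here): persist the setting and load it into the X server.
#   echo "Xft.dpi: $(xft_dpi_for_scale 200)" >> ~/.Xresources
#   xrdb -merge ~/.Xresources
```

Sticking to whole multiples of 96 (192 for 200%) tends to avoid fractional scaling artifacts.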

Hardware

Spec-wise, the ThinkPad X1 Extreme is a beast!

ThinkPad X1 Extreme Specs

The build quality seems very robust to me.

Another big plus of the ThinkPad series over other laptop series is the availability of the official Hardware Maintenance Manual: you can put “ThinkPad X1 Extreme Gen 2 Hardware Maintenance Manual” into Google and will find p1_gen2_x1extreme_hmm_v1.pdf as the first hit. This manual describes in detail how to repair or upgrade your device if you want to (or have to) do it yourself.

WiFi

The built-in Intel AX200 WiFi interface works fine, provided you have a new-enough linux-firmware package and kernel version installed.

I had trouble with Linux 5.6.0, and Linux 5.6.5 fixed it. Luckily, at the time of writing, Linux 5.11 is the most recent release, so most distributions should be recent enough for things to just work.

The WiFi card reaches almost the same download speed as the most modern WiFi device I can test: a MacBook Air M1. Both are connected to my UniFi UAP-AC-HD access point.

Laptop               Download    Upload
ThinkPad X1 Extreme  500 Mbit/s  150 Mbit/s
MacBook Air M1       600 Mbit/s  500 Mbit/s

I’m not sure why the upload speed is so low in comparison.

GPU

The GPU in this machine is by far the most troublesome bit of hardware.

I had hoped that after many years of laptops containing Intel/nVidia hybrid graphics, this setup would largely work, but I was disappointed.

Neither the proprietary nVidia driver nor the nouveau driver worked reliably for me. I ran into kernel error messages and hard freezes, with even SSH sessions to the machine breaking.

In the end, I blacklisted the nouveau driver to use Intel graphics only:

% echo blacklist nouveau | sudo tee /etc/modprobe.d/blacklist.conf 

Without the nVidia driver, the GPU will not go into powersave mode, so I remove it from the PCI bus entirely to save power:

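#!/bin/zsh

# Writing 1 to a device's "remove" file detaches it from the PCI bus.
# 0000:01:00.0 and 0000:01:00.1 are the functions of the nVidia card here.
sudo tee /sys/bus/pci/devices/0000\:01\:00.0/remove <<<1
sudo tee /sys/bus/pci/devices/0000\:01\:00.1/remove <<<1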

You can only re-awaken the GPU with a reboot.

Obviously this isn’t a great setup — I would prefer to be able to actually use the GPU. If you have any tips or a better experience, please let me know.

Also note that the HDMI port will be unusable if you go this route, as the HDMI port is connected to the nVidia GPU only.

Battery life

The 80 Wh battery lasts between 5 and 6 hours for me, without any extra power saving tuning beyond what the Linux distribution Fedora 33 comes with by default.

This is good enough for using the laptop away from a power socket from time to time, which matches my expectation for this kind of mobile workstation.

Software support

Linux support is generally good on this machine! Yes, I provide a few pointers in this article regarding problems, patches and old software versions. But, if you use a newer Linux distribution, all of these fixes are included and things just work out of the box. I tested with Fedora 33.

For a few months, I was using this laptop exclusively with my research Linux distribution distri, so even if you just track upstream software closely, the machine works well.

Firmware updates

Lenovo partnered with the Linux Vendor Firmware Service Project (LVFS), which means that through fwupd, ThinkPad laptops such as this X1 Extreme can easily receive firmware updates!

This is a huge improvement in comparison to earlier ThinkPad models, where you had to jump through hoops with Windows-only software, or CD images that you needed to boot just right.

If your laptop has a very old firmware version (before 1.30), you might be affected by the skipped keystrokes issue. You can check your version using the always-handy lshw(1) tool.
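A minimal sketch of how the update flow might look with fwupdmgr, the command-line client of fwupd. The helper names version_lt and update_firmware are made up for illustration, version_lt relies on sort -V from GNU coreutils, and the installed firmware version still has to be read off the tool output by hand:

```shell
# Return success if version $1 sorts strictly before version $2.
# Hypothetical helper; relies on "sort -V" from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Refresh LVFS metadata, list pending updates, then apply them.
# Not called automatically: flashing firmware should be a deliberate step.
update_firmware() {
    fwupdmgr refresh &&
        fwupdmgr get-updates &&
        fwupdmgr update
}

# Example: version_lt 1.29 1.30 && echo "firmware predates 1.30"
```

Calling update_firmware stops at the first step that fails, so nothing is flashed when no update is pending.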

Performance

The specific configuration of my ThinkPad is:

ThinkPad X1 Extreme Spec (2020)
CPU   Intel Core i7-9750H CPU @ 2.60GHz
RAM   2 × 32 GB Samsung M471A4G43MB1-CTD
Disk  2 × SAMSUNG MZVLB2T0HALB-000L7 NVMe disk

You can google for CPU benchmarks and comparisons yourself, and those likely are more scientific and carefully done than I have time for.

What I can provide however, is a comparison of working on one of my projects on the ThinkPad vs. on my workstation, an Intel Core i9-9900K that I bought in 2018:

Workstation Spec (2018)
CPU   Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
RAM   4 × Corsair CMK32GX4M2A2666C16
Disk  Corsair Force MP600 M.2 NVMe disk

Specifically, I am comparing how long my manpage static archive generator debiman takes to analyze and render all manpages in Debian unstable, using the following command:

ulimit -n 8192; time ~/go/bin/debiman \
  -keyring=/usr/share/keyrings/debian-archive-keyring.gpg \
  -sync_codenames=, \
  -sync_suites=unstable \
  -serving_dir=/srv/man/benchmark \
  -inject_assets=~/go/src/github.com/Debian/debiman/debian-assets \
  -concurrency_render=20 \
  -alternatives_dir=~/go/src/github.com/Debian/debiman/piuparts

On both machines, I ensured that:

  1. The CPU frequency scaling governor was set to performance.
  2. A warm apt-cacher-ng cache was present, i.e. network download was not part of the test.
  3. Linux kernel caches were dropped using echo 3 | sudo tee /proc/sys/vm/drop_caches
  4. I was using debiman git revision f78c160
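The governor and cache steps from this list could be scripted roughly as follows. This is a sketch, not the exact procedure used for the measurements: the function names are made up, the governor path is the usual cpufreq sysfs location, and SYSROOT is a hypothetical override that exists only so the functions can be tried against a scratch directory. On a real machine, leave SYSROOT empty and run as root.

```shell
# Hypothetical prefix for the sysfs/procfs paths; empty means the real system.
SYSROOT="${SYSROOT:-}"

# Set the cpufreq governor for every CPU that exposes one.
set_governor() {
    for f in "$SYSROOT"/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        printf '%s\n' "$1" > "$f"
    done
}

# Flush dirty pages, then drop page cache, dentries and inodes.
drop_caches() {
    sync
    printf '3\n' > "$SYSROOT/proc/sys/vm/drop_caches"
}

# Example (as root, before each benchmark run):
#   set_governor performance
#   drop_caches
```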

Here are the results:

Machine                      Time
i9-9900K Workstation         4:57,10 (100%)
ThinkPad X1 Extreme (Gen 2)  7:19,56 (147%)

This reaffirms my impression that even high-end laptop hardware just cannot beat a workstation setup (which has more space and better thermals), but it comes close enough to be useful.
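The percentages in the results table can be re-derived from the two wall-clock times. A small sketch, with helper names made up for illustration and assuming the M:SS,cc format used above: the ratio comes out just under 148%, which matches the table's 147% if the fractional part is dropped rather than rounded.

```shell
# Convert a duration written as "M:SS,cc" (decimal comma, as in the results
# table) into seconds. Assumes exactly two fractional digits.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'[:,]' '{ printf "%.2f\n", $1 * 60 + $2 + $3 / 100 }'
}

# Runtime of $2 relative to baseline $1, in percent with one decimal.
relative_percent() {
    LC_ALL=C awk -v base="$(to_seconds "$1")" -v other="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", 100 * other / base }'
}

relative_percent 4:57,10 7:19,56
```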

Conclusion

Positives:

  • The ergonomics of the device really are great. It is a pleasure to type on a first-class, full-size ThinkPad keyboard. The screen has good quality and a high resolution.

  • Performance-wise, this machine can almost replace a proper workstation.

Negatives:

  • the mediocre battery life
  • an annoyingly loud fan that spins up too frequently
  • poor software/driver support for hybrid nVidia GPUs.

Notably, all of these could be improved by better power saving, so perhaps it’s just a matter of time until Linux kernel developers land some improvements…? :)

at 2021-06-05 18:43

2021-05-30

michael-herbst.com

SIAM LA: Robust and efficient accelerated methods for density-functional theory

Just one day after my talk at the SIAM Materials Science conference (blog article) I gave another talk at a SIAM meeting, this time at SIAM Linear Algebra. I was very much looking forward to participating in SIAM LA, firstly because it was the first time I attended this conference, and secondly because it was a good opportunity to present our recent algorithmic work on robust DFT methods to an international crowd of mathematicians.

I presented as part of the minisymposium Theory and Practice of Extrapolation and Acceleration Methods, which consisted of three interesting sessions of historic and recent talks about extrapolation and convergence acceleration in the broadest sense of the word. Topics around iterative methods as well as summation theory and sequence summation were discussed, which turned out to be a very enjoyable mix. I am really grateful to the minisymposium organisers, Agnieszka Miedlar and Yousef Saad, for the invitation and for allowing me to be part of these great sessions.

Beyond the minisymposium I enjoyed a number of talks about emerging topics in numerical linear algebra such as mixed-precision computation, low-rank tensor approximations and randomised methods. Even though the time zone difference meant that the conference was mostly running during the afternoon and late evening for me, and even though the collision with SIAM Materials Science made it quite a busy week, I took a lot away from SIAM LA and I'm already looking forward to next time.

Link
Robust and Efficient Accelerated Methods for Kohn-Sham Density-Functional Theory

by Michael F. Herbst at 2021-05-30 16:01 under Research, talk, electronic structure theory, Julia, DFTK, numerical analysis, Kohn-Sham, high-throughput, DFT, solid state