Planet NoName e.V.

2020-09-28

sECuREs website

Nuki Opener with an SCS bus intercom (bTicino 344212)

I have long been looking for a way to make my intercom a little more pleasant.

Recently, a friend made me aware of the Nuki Opener, which promises to make existing intercom systems smart, and claims to be compatible with the specific intercom I have!

So I got one and tried setting it up, but could not get it to work.

This post documents how I have analyzed what goes over the intercom’s SCS bus. Perhaps the technique is interesting, or perhaps you want to learn more about SCS :)

Note that I have not yet used the Nuki Opener, so I can’t say anything about it yet. What I have seen so far makes a good impression, but it just does not seem to work at all with my intercom. I will update this article after working with Nuki’s support team to fix this.

Connecting the Nuki Opener to the bTicino 344212

First, I identified which wires are used for the bus: the internet tells me to expect ≈27V between BUS- and BUS+, and indeed a multimeter shows:

bTicino wiring

I then connected the Nuki Opener as described in “Connect the Nuki Opener to an unknown intercom”, Page 8, Bus intercoms → Basic setup without doorbell suppression:

Nuki wire   Intercom   Signal
black       BUS-       GND
red         BUS+       SCS (+27V)
orange      BUS+       SCS (+27V)

bTicino wiring

I had previously tried the enhanced setup with doorbell suppression, as the Nuki app recommends, but switched to the simplest setup possible when capturing the signal.

Configuring the Nuki Opener

With the Nuki app, I tried configuring the Opener as each of:

  • bTicino → 344212
  • Generic → Bus (SCS)
  • Unknown intercom

Unfortunately, the result was the same with all three configurations:

  1. The app says it learned the door open signal successfully.
  2. The device/app does react to door rings.
  3. The device never successfully opens the door.

Capturing the SCS bus with sigrok

The logic analyzer that I have at home only works with signals under 5V. As the SCS bus is running at 27V, I’m capturing the signal with my Hantek 6022BE USB oscilloscope.

sigrok is a portable, cross-platform, free and open source signal analysis software suite. It supports the Hantek 6022BE out of the box, provided you have at least version 0.1.4 of the sigrok fx2lafw package installed.

Check out sigrok’s “Getting started with a logic analyzer” if you’re new to sigrok!

The Nuki Opener has 3 different pin headers you can use, depending on where you want to attach it on your wall. These are connected straight through, so I used them to conveniently grab BUS+ and BUS- just like the Nuki sees it:

bTicino capture

I set the oscilloscope probe head to its 10X divider setting, so that I had the full value range available, then started sampling 5M samples at 500 kHz:

sigrok PulseView screenshot

You can see 10s worth of signal. The three bursts are transmissions on the SCS bus.

The labeling didn’t quite match for me: it shows e.g. 3.2V instead of 27V, but as long as the signal comes in clearly, it doesn’t matter if it is offset or scaled.

SCS bus decoding with sigrok: voltage levels

Let’s tell sigrok what voltage level corresponds to a low or high signal:

  1. left-click on channel CH1
  2. set “conversion” to “to logic via threshold”
  3. set “conversion threshold” to 3.0V

Now you’ll see not only the captured signal, but also the logical signal below in green:

sigrok PulseView screenshot

SCS bus decoding with sigrok: SCS decoder

Now that we have obtained a logical/digital signal (low/high), we can write a sigrok decoder for the SCS bus. See sigrok’s Protocol decoder HOWTO for an introduction.

In general, I strongly recommend investing in tooling, in particular when decoding protocols. Spending a few minutes to an hour at this stage will minimize mistakes and save lots of time later, and—when you contribute your tooling—enable others to do more interesting work!

I found it easy to write a sigrok decoder, having never used their API before. It was quick to get something onto the screen, mistakes were easy to correct, and the whole process was nicely iterative.

Until it is merged and released with a new version of libsigrokdecode, you can find my SCS decoder on GitHub.
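
To give a flavor of the libsigrokdecode API, here is a heavily trimmed sketch of such a decoder (not the actual code from my repository; identifiers and details are illustrative): it frames SCS bytes like a UART at 9600 baud (one start bit, eight data bits, one stop bit) and emits one annotation per byte:

import sigrokdecode as srd

class Decoder(srd.Decoder):
    api_version = 3
    id = 'scs'
    name = 'SCS'
    longname = 'BTicino SCS bus'
    desc = 'SCS intercom bus, UART-like framing at 9600 baud.'
    license = 'gplv2+'
    inputs = ['logic']
    outputs = []
    tags = ['Embedded/industrial']
    channels = ({'id': 'scs', 'name': 'SCS', 'desc': 'SCS bus line'},)
    annotations = (('byte', 'Byte'),)

    def __init__(self):
        self.reset()

    def reset(self):
        self.samplerate = None

    def metadata(self, key, value):
        if key == srd.SRD_CONF_SAMPLERATE:
            self.samplerate = value

    def start(self):
        self.out_ann = self.register(srd.OUTPUT_ANN)

    def decode(self):
        if not self.samplerate:
            raise Exception('Cannot decode without a samplerate.')
        bit = int(self.samplerate / 9600)  # samples per ~104 µs bit time
        while True:
            self.wait({0: 'f'})            # falling edge: start bit begins
            first = self.samplenum
            byte = 0
            for i in range(8):
                # Sample the middle of data bit i (LSB first), relative
                # to the start-bit edge.
                (scs,) = self.wait({'skip': bit + bit // 2 if i == 0 else bit})
                byte |= scs << i
            self.wait({'skip': bit})       # consume the stop bit
            self.put(first, self.samplenum, self.out_ann,
                     [0, ['0x%02x' % byte]])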

The decoder looks at every layer of an SCS telegram: the start/stop bits, the data bits, the value and the value’s logical position/function in the SCS telegram.

SCS full

Our SCS decoder displays the 3 bursts on the SCS bus when we ring the doorbell:

SCS bus door ring

SCS bus door ring

SCS bus door ring

Only the middle burst sets a destination address of 0x3, the configured number of my intercom system. I am not sure what the first and last bursts indicate!

The SCS bus activity when opening the door seems clearer:

SCS bus door ring

SCS bus door ring

These 2 bursts are sent one second apart, and only differ in the request parameter field: my guess is that 0xa4 means “start buzzing the door open” and 0xa0 means “stop buzzing the door open”.

I’m not sure why all these bursts repeat their SCS telegrams 3 times. My understanding was that SCS telegrams are repeated only when they are not acknowledged, and I indeed see no acknowledgement telegrams in my captures. Does that mean something is wrong with our intercom and it only works due to retransmissions?

SCS bus decoding with sigrok git: UART+SCS decoder

As Gerhard Sittig pointed out, in the git version of libsigrokdecode, one can use the existing UART decoder to decode SCS:

  1. Set Baud rate to 9600
  2. Set Sample point to 20%

This seems a little more robust than my cobbled-together SCS decoder from above :)

In addition to the UART decoder, we can still use a custom SCS decoder to label individual bytes within an SCS telegram according to their function, and do CRC checks.
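
On the command line, this UART-based decoding would look roughly as follows. This is a sketch: it assumes the capture contains the thresholded logic channel under the name CH1, and that the git version of the UART decoder exposes the sample point as a sample_point percentage option:

% sigrok-cli \
  -i 2020-09-27-anlern-01-open.srzip \
  -P uart:rx=CH1:baudrate=9600:sample_point=20 \
  -A uart=rx-data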

Captured SCS telegrams

Find a capture of the doorbell and door buzzer in 2020-09-27-rohdaten-klingel.zip.

To extract the interesting parts from the sigrok files, I had to use sigrok-cli to convert the data to CSV, cut the data and convert from CSV to sigrok again:

% sigrok-cli \
  -i 2020-09-27-anlern-01-open.srzip \
  --output-format csv \
  -o 2020-09-27-anlern-01-open-PUR.csv

% tail -n 4050629 2020-09-27-anlern-01-open-PUR.csv \
  > 2020-09-27-anlern-01-open-PUR-filtered.csv

% $EDITOR *.csv # copy over the header

% sigrok-cli \
  -I csv:single_column=true:header=true:column_formats=1a \
  -i 2020-09-27-anlern-01-open-PUR-filtered.csv \
  --output-format srzip \
  -o 2020-09-27-anlern-01-open-PUR-filtered.srzip

Further reading

I used the following sources; please let me know of any others!

at 2020-09-28 06:43

2020-09-15

michael-herbst.com

Faraday Discussions: New horizons in density functional theory

Following the submission of our paper on a posteriori error estimation in the Kohn-Sham equations a few months ago, I was recently invited to present our work at the Faraday Discussions on New horizons in density functional theory. Being amongst speakers such as Kieron Burke, Andreas Savin and Weitao Yang, this was truly a great honour.

Even though the conference had to be virtual, I enjoyed it very much, especially because of its very unusual format. Unlike most other conferences, where the presenting author typically does their thing for about 30 minutes followed by just a few questions, the situation is completely reversed at the Faraday Discussions. Since we had to submit our paper months in advance (and it was shared with the other participants), the content of my talk was already known to the audience. The main chunk of the time at the conference was therefore allocated to the discussion and not to the presentation. My few slides only briefly recap our work and hint at the general motivation and outlook. As I had hoped, our work did indeed stimulate an intense discussion with interested questions, which I really appreciated (thanks to everyone who asked or commented). As far as I understand, both the paper as well as a transcript of the discussion will be part of the official Faraday Discussions conference proceedings, which will be published by the Royal Society of Chemistry soon. As per usual, my slides are attached below.

Link Licence
A posteriori error estimation for the non-self-consistent Kohn-Sham equations (Slides) Creative Commons License

by Michael F. Herbst at 2020-09-15 13:00 under talk, electronic structure theory, Julia, DFTK, theoretical chemistry, error estimates, numerical analysis, Kohn-Sham, high-throughput

2020-09-06

RaumZeitLabor

Käfertal remains Käfertal remains Käfertal

Hard to believe, but true: the search for a new location is over! The RaumZeitLabor stays in Käfertal and moves 1.3 kilometers (as the crow flies) to Weinheimer Straße 58–60.

There is still plenty to do here. Walls need to be moved or put up, the basement has to be renovated and made fit for the workshops, and the new layout of all the rooms needs to be thought through. But amid all the excitement about the new location, the “old” RZL and the actual planning of the move should not be neglected either.

We continue to welcome participation of all kinds. You can find out how things proceed over the coming weeks on the mailing list and surely also via Twitter/Mastodon.

by flederrattie at 2020-09-06 00:00

2020-09-03

michael-herbst.com

Black-box inhomogeneous preconditioning for self-consistent field iterations in density-functional theory

For the past half a year or so, Antoine Levitt and I have been looking at a particularly tricky business for solid-state density-functional theory (DFT) calculations, namely how to design efficient self-consistent field (SCF) schemes for large inhomogeneous systems. I have already reported on this matter in a short talk at the seminar of our interdisciplinary working group, but now our results have reached a stage suitable for publication.

The underlying problem we are tackling in our work is that for large systems, meaning increased sizes of the unit cell, the SCF iterations become harder and harder to solve. Mathematically speaking, the (spectral) condition number of the fixed-point iterations underlying the SCF procedure increases rather drastically in such cases, leading to very slow convergence. For example, in aluminium the number of iterations required to converge an SCF with a damped iteration scheme (the simplest one) increases quadratically with the system size. This quickly makes calculations intractable, and multiple more sophisticated approaches have therefore been developed over the years. As is detailed in our work, there are mainly two orthogonal directions of attack. The first is to "accelerate" the convergence in a black-box fashion by using the so-called Anderson (or Pulay or DIIS) scheme. This reduces the growth of iterations with system size from quadratic to linear (in the aluminium example), which is a good start. The second approach is to use a carefully designed preconditioner for the SCF in order to tame the SCF iterations. Figuratively speaking, this approach makes use of known physics to prevent the SCF from looking in the wrong direction for the solution. If done right, meaning that the physics modelled by the preconditioner fits the system at hand, this allows the SCF iteration count to become independent of system size. This latter approach is clearly the more important route to cure the problem, but both approaches are orthogonal and are therefore typically combined in order to get the fastest convergence.
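
To make this concrete, the damped, preconditioned fixed-point iteration sketched above can be written as follows (standard notation, not quoted from our paper):

% rho_n: density at step n, F: SCF map, alpha: damping, P: preconditioner
\rho_{n+1} = \rho_n + \alpha P^{-1} \left( F(\rho_n) - \rho_n \right)

Choosing P as a good approximation of the dielectric operator damps the long-range charge-sloshing components of the residual, which is what keeps the iteration count bounded as the system grows.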

Now what does it mean that the preconditioner has to fit the system? As we detail in the paper, the convergence of an SCF is intimately linked to the dielectric behaviour of the material one models with the SCF. For homogeneous cases (i.e. bulk insulators, metals and semiconductors) people have devised very good models for the dielectric behaviour and have used them to construct preconditioners. As is well known (and confirmed by our study) these models show exactly the desirable property of a size-independent iteration count. The caveat is only that metals, insulators and semiconductors have differing dielectric properties, meaning that each of these calls for a different preconditioning strategy. In turn this means that heterogeneous cases, where multiple of these materials are combined, are difficult to treat in practice, because none of the bulk recipes fully fits.

The main aim of our work was therefore to design a preconditioner which automatically and locally adapts to the system at hand, meaning that for heterogeneous cases it treats metallic regions like metals, insulating regions like insulators and so on. As we demonstrate with a number of test cases our preconditioner is able to do this completely black-box and parameter-free and performs well also for large heterogeneous systems. This is in contrast to previous approaches to tackle this problem, which were not as general as our approach and sometimes required complex hand-tuning of the involved parameters.

While our preconditioner solves the problem of efficiently treating cases like metallic slabs, metal clusters and basically any combination of metallic parts, insulators and vacuum, it is not fully capable of distinguishing insulators and semiconductors. We show that this can be cured at the expense of introducing another parameter to our algorithm. This works, but is not completely satisfactory to us. Part of our ongoing work is therefore to extend our scheme to treat mixed systems involving semiconductors as well. Another aspect we have so far neglected is spin, which is a constant annoyance for converging SCFs. Having a solid dielectric model as we propose it also opens the way to adapting the preconditioning to each spin component differently. We hope to use this in the future to tackle convergence issues with spin in a hopefully more rigorous way than is done to date.

The full abstract of our paper reads

We propose a new preconditioner for computing the self-consistent problem in Kohn-Sham density functional theory, based on the local density of states. This preconditioner is inexpensive and able to cure the long-range charge sloshing known to hamper convergence in large, inhomogeneous systems such as clusters and surfaces. It is based on a parameter-free and physically motivated approximation to the independent-particle susceptibility operator, appropriate for both metals and insulators. It can be extended to semiconductors by using the macroscopic electronic dielectric constant as a parameter in the model. We test our preconditioner successfully on inhomogeneous systems containing metals, insulators, semiconductors and vacuum.

by Michael F. Herbst at 2020-09-03 22:30 under electronic structure theory, theoretical chemistry, DFTK, Julia, dft, numerical analysis, Kohn-Sham

2020-08-09

sECuREs website

Adding a fiber link to my home network

Motivation

Despite using an FTTH internet connection since 2014, aside from the one fiber uplink, I had always used network gear with 1 Gbit/s links over regular old rj45 cat5(e) cables.


I liked the simplicity and uniformity of that setup, but decided it’s time to add at least one fiber connection, to get rid of a temporary ethernet cable that connected my kitchen with the rest of my network that is largely in the living room and office.

The temporary ethernet cable was an experiment to verify that running a server or two in my kitchen actually works (it does!). I used a flat ethernet cable, which is great for test setups like that, as you can often tape it onto the walls and still close the doors.

So, we will replace one ethernet cable with one fiber cable and converters at each end:

0.9mm thin fiber cables

Why is it good to switch from copper ethernet cables to fiber in this case? Fiber cables are smaller and hence easier to fit into existing cable ducts. While regular ethernet cable is way too thick to fit into any of the existing ducts in my flat, I was hoping that fiber might fit!

When I actually received the cables, I was surprised how much thinner fiber cables actually can be: there are 0.9mm cables, which are so thin, they can be hidden in plain sight! I had only ever seen 2mm fiber cables before, and the 0.9mm cables are incredibly light, flexible and thin! Even pasta is typically thicker:

0.9mm thin fiber cables

Preparing a delicious pot of glass noodles ;)


The cable shown above comes from the fiber store FS.COM, which different people have praised on multiple occasions, so naturally I was curious to give them a shot myself.

Also, for the longest time, it was my understanding that fiber connectors can only be put onto fiber cables using expensive (≫2000 CHF) machines. A while ago I heard about field assembly connectors so I wanted to verify that those indeed work.


Aside from practical reasons, playing around with fiber networking also makes for a good hobby during a pandemic :)

Hardware Selection

I ordered all my fiber equipment at FS.COM: everything they have is very affordable, and products in stock at their German warehouse arrive in Switzerland (and presumably other European countries) within the same week.

If you are in the luxurious position to have enough physical space and agility to pull through an entire fiber cable, without having to remove any connectors, you can make a new network connection with just a few parts:

amt   price     total     article   note
2x    36 CHF    72 CHF    #17237    1 Gbit/s media converter RJ45/SFP
1x    8.5 CHF   8.5 CHF   #39135    1 Gbit/s BiDi SFP 1310nm-TX/1550nm-RX
1x    11 CHF    11 CHF    #39138    1 Gbit/s BiDi SFP 1550nm-TX/1310nm-RX
1x    2.3 CHF   2.3 CHF   #12285    fiber cable, 0.9mm LC UPC/LC UPC simplex

I recommend buying an extra fiber cable or two so that you can accidentally damage a cable and still have enough spares.

Total cost thus far: just under 100 CHF. If you have existing switches with a free SFP slot, you can use those instead of the media converters and save most of the cost.


If you need to temporarily remove one or both of the fiber cable connector(s), you also need field assembly connectors and a few tools in addition:

amt   price     total     article   note
2x    4 CHF     8 CHF     #35165    LC/UPC 0.9mm pre-polished field assembly connector
1x    110 CHF   110 CHF   #14341    High Precision Fibre Optic Cleaver FS-08C
1x    26 CHF    26 CHF    #14346    Fibre Optic Kevlar Cutter
1x    14 CHF    14 CHF    #72812    Fibre Optical Stripper

I recommend buying twice the number of field assembly connectors, for practicing.

Personally, I screwed up two connectors before figuring out how the process goes.

Total cost: about 160 CHF for the field assembly equipment, so 260 CHF in total.


To boost your confidence in the resulting fiber, the following items are nice to have, but you can get by without, if you’re on a budget.

price      article   note
18 CHF     #35388    FVFL-204 Visual Fault Locator
9.40 CHF   #82730    2.5mm to 1.25mm adapter for Visual Fault Locator
4.10 CHF   #14010    1.25mm fiber clean swabs (100pcs)

With the visual fault locator, you can shine a light through your fiber. You can verify correct connector assembly by looking at how the light comes out of the connector.

The fiber cleaning swabs are good to have in general, but for the field assembly connector, you need to use alcohol-soaked wipes anyway (which FS.COM does not stock).

The total cost for everything is just under 300 CHF.

Hardware Selection Process

The large selection at FS.COM can be overwhelming to navigate at first. My selection process went something like this:

My first constraint is using bi-directional (BiDi) fiber optics modules so that I only need to lay a single fiber cable, as opposed to two fiber cables.

The second constraint is to use field assembly connectors.

If possible, I wanted to use bend-insensitive fiber so that I wouldn’t need to pay so much attention to the bend radius and have more flexibility in where and how I can lay fiber.

With these constraints, there aren’t too many products left to combine. An obvious and good choice is a 0.9mm fiber cable with LC/UPC connectors.

FS.COM details

As of 2020-08-05, FS.COM states they have 5 warehouses in 4 locations:

  • Delaware (US)
  • Munich (Germany)
  • Melbourne (Australia)
  • Shenzhen (China)

They recently built another, bigger (7 km²) warehouse in Shenzhen, and now produce inventory for the whole year.

By 2019, FS.COM had over 300,000 registered corporate customers, reaching nearly 200 million USD yearly sales.

Delivery times

As mentioned before, delivery times are quick when the products are in stock at FS.COM’s German warehouse.

In my case, I put in my order on 2020-Jun-26.

The items that shipped from the German warehouse arrived on 2020-Jul-01.

Some items had to be manufactured and/or shipped from Asia. Those items arrived after 3 more weeks, on 2020-Jul-24.

Unfortunately, FS.COM doesn’t stock any 0.9mm fiber cables in their German warehouse right now, so be prepared for a few weeks of waiting time.

Laying The Fiber

Use a cable puller to pull the fiber through existing cable ducts where possible.

  • In general, buy the thinnest one you can find. I have this 4mm diameter cable puller, but a 3mm or even 2mm one would work in more situations.

  • I found it worthwhile to buy a brand-name one. It is distinctly better to handle (less stiff, i.e. more flexible) than the cheap one I got, and thinner, too, which is always good.

In my experience, it generally did not work well to push the fiber into an existing duct or alongside an existing cable. I really needed a cable puller.

If you’re lucky and have enough space in your duct(s), you can leave the existing connectors on the fiber. I have successfully just used a piece of tape to fix the fiber connector on the cable puller, pushing down the nose temporarily:

fiber cable taped to cable puller

Where there are no existing ducts, you may need to lay the fiber on top of the wall. Obviously, this is tricky as soon as you need to make a connection going through a wall: whereas copper ethernet cables can be bent and squeezed into door frames, you quickly risk breaking fiber cables.

Luckily, the fiber is very light, so it’s very easy to fix to the wall with a piece of tape:

fiber cables on the wall

You can see the upstream internet fiber in the top right corner, which is rather thick in comparison to my 0.9mm yellow fiber that’s barely visible in the middle of the picture.

Note how the fiber entirely disappears behind the existing duct atop the door!

Above, you can see the flat ethernet cable I have been using as a temporary experiment.


Where there is an existing cable that you can temporarily remove, it might be possible to remove it, put the fiber in, and put the old cable back in, too. This is possible because the 0.9mm fiber cable is so thin!

I’m using this technique to cross another wall where the existing cable duct is too full, but there is a cable that can be removed and put back after pulling the fiber through:

fiber cable next to existing cable

…and on the other side of the wall:

fiber cable next to existing socket

Note how the fiber is thin enough to fit between the socket and duct!


Note: despite measuring how long a fiber cable I would need, my cable turned out to be too short! While the cable was just as long as I had measured, with distances exceeding 10m it is a good idea to add a few meters of spare on each side of the connection.

Field assembly connectors

To give you an overview, these are the required steps at a high level:

  1. Cut the fiber with the Fibre Optic Kevlar Cutter
  2. Strip the fiber with the Fibre Optical Stripper
  3. Put the field assembly jacket onto the fiber
  4. Cut the stripped fiber cleanly with the High Precision Fibre Optic Cleaver FS-08C
  5. Put the field assembly connector onto the fiber

I thought the following resources were useful:

  1. Pictograms: PDF: FS.COM LC UPC field assembly connectors quick start guide
  2. Pictures: Installation Procedure on FS.COM
  3. Video: YouTube: Terminate Fiber in 5 Minutes: this video shows a different product, but I found it helpful to see any field assembly connector on video, and this is one of the better videos I could find.

Beware: the little paper booklet that comes with the field assembly connector contains measurements which are not to scale. I have suggested to FS.COM that they fix this, but until then, you’ll need to use e.g. a tape measure.


For establishing an intuition of their different sizes, here are the different connectors:

fiber cable next to existing socket

From left to right:

  • 2.0mm fiber cable
  • cat6 ethernet cable
  • 0.9mm fiber cable (LC/UPC factory)
  • 0.9mm fiber cable (LC/UPC field assembly connector)

The 0.9mm fiber cables come with smaller connectors than the 2.0mm fiber cables, and that alone might be a reason to prefer them in some situations.

The field assembly connectors are pretty bulky in comparison, but since you can attach them yourself after pulling only the cable through the walls and/or ducts, you usually don’t care too much about their size.

Conclusion

Modern fiber cables available at FS.COM are:

  • thinner than I expected
  • more robust than I expected
  • cheaper than I expected
  • survive tighter bend radii than I expected

Replacing this particular connection with a fiber connection was a smooth process overall, and I would recommend it in other situations as well.


I would claim that it is totally feasible for anyone with an hour of patience to learn how to put a field assembly connector onto a fiber cable.

If labor cost is expensive in your country or you just like doing things yourself, I can definitely recommend this approach. In case you mess the connector up and don’t want to fix it yourself, you can always call an electrician!


Stay tuned for the next part, where I upgrade the 1G link to a 10G link!

at 2020-08-09 12:53

2020-07-31

michael-herbst.com

DFTK: A Julian approach for simulating electrons in solids

Since last Friday I have been attending JuliaCon, the annual conference for the Julia language. Naturally, given the current situation, the event did not take place "on location", but was instead converted into a virtual event. Despite the different feel compared to a real-life conference, the organisers did a very good job of maintaining the social component of the event. Talks were pre-recorded, and speakers were available in a chat room to discuss in written form during and after the presentation. Birds-of-a-feather brainstorming sessions took place using audio discussions, and at the end of every day there was a Gather Town virtual social, where one could videochat with fellow attendees by meeting up in a beautifully animated world, where each attendee was represented by a tiny avatar.

Apart from attending and listening to the great talks about the Julia language and its many applications, I also had the chance to actively participate by giving a lecture about our package DFTK.jl. While I have presented on DFTK a few times before in front of expert audiences of the field, it was really the first time I presented DFTK as a released package to the broader Julia audience. That meant that I could, for once, give up on my usual storyline where I try to convince people to use Julia, and instead focus on providing insight into the fascinating challenges of electronic-structure theory and how DFTK and Julia are ideal tools to tackle these.

In my talk I start easy with a general introduction to electronic-structure theory, illustrating why an exact solution for electronic structures in molecules or solids is just not possible in realistic timeframes. Therefore one needs to live with approximate models, one example being density-functional theory (DFT), which we use in DFTK. As I detail in the talk, an almost immediate consequence of the complexity of the problem is that advances in electronic-structure theory can typically only be realised if multiple disciplines join forces. An interdisciplinary project, however, brings some practical problems, quite simply because different fields have different approaches when tackling a problem. Being able to support such multidisciplinary motions in a common software platform for DFT is one of the key aims of DFTK.

Related to this point, we wanted DFTK to have a low entrance barrier for new researchers. As time and money in research are tight, programs should be easy to use and code simple and self-explanatory, so that new PhD students or researchers from other fields do not have a tough time getting started. In my talk I mention a few recent projects (an undergrad internship and a master project) where a noteworthy result could be achieved even though the students had little prior experience with either Julia or electronic-structure theory. A similar success story emphasising our ability to rapidly realise novel ideas in DFTK is our recently published Faraday paper, where it only took 10 weeks from starting the project to submitting the paper.

Lastly, I discussed challenges arising from the so-called high-throughput screening methods, which have recently been gaining popularity in computational materials design. In this particular research direction, algorithms need to be particularly robust and tunable to find a sweet spot between accuracy and computational cost. This demands extremely stable and reliable algorithms, which poses interesting mathematical problems in numerical analysis, e.g. with respect to designing estimators for discretisation error. Especially in this area of application-oriented mathematical research we expect DFTK to be a handy tool in the future.

If you are interested in the full story, a recording of the talk is available on YouTube.

Link Licence
DFTK: A Julian approach for simulating electrons in solids (Slides) Creative Commons License
Youtube recording of the talk

by Michael F. Herbst at 2020-07-31 20:00 under talk, electronic structure theory, Julia, HPC, DFTK, theoretical chemistry, SCF, high-throughput

2020-07-20

Mero’s Blog

Parametric context

tl;dr: Go's Context.Value is controversial because of a lack of type-safety. I design a solution for that based on the new generics design draft.

If you are following what's happening with Go, you are aware that recently an updated design draft for generics has dropped. What makes this particularly notable is that it comes with an actual prototype implementation of the draft, including a playground. This means for the first time, people get to actually try out how a Go with generics might feel, once they get in. It is a good opportunity to look at common Go code lacking type-safety and evaluate if and how generics can help address them.

One area I'd like to look at here is Context.Value. It is often criticized for not being explicit enough about the dependencies a function has and some people even go so far as to discourage its use altogether. On the other hand, I'm on record saying that it is too useful to ignore. Generics might be a way to bring together these viewpoints.

We want to be able to declare dependency on a functionality in context.Context via a function's signature and make it impossible to call it without providing that functionality, while also preserving the ability to pass it through APIs that don't know anything about it. As an example of such functionality, I will use logging. Let's start by creating a fictional little library to do that (the names are not ideal, but let's not worry about that):

package logctx

import (
    "context"
    "log"
)

type LogContext interface {
    // We embed a context.Context, to say that we are augmenting it with
    // additional functionality.
    context.Context

    // Logf logs the given values in the given format.
    Logf(format string, values ...interface{})
}

func WithLog(ctx context.Context, l *log.Logger) LogContext {
    return logContext{ctx, l}
}

// logContext is unexported, to ensure it can't be modified.
type logContext struct {
    context.Context
    l *log.Logger
}

func (ctx logContext) Logf(format string, values ...interface{}) {
    ctx.l.Printf(format, values...)
}

You might notice that we are not actually using Value() here. This is fundamental to the idea of getting compiler-checks - we need some compiler-known way to "tag" functionality and that can't be Value. However, we provide the same functionality, by essentially adding an optional interface to context.Context.

If we want to use this, we could write

func Foo(ctx logctx.LogContext, v int) {
    ctx.Logf("Foo(%v)", v)
}

func main() {
    ctx := logctx.WithLog(context.Background(), log.New(os.Stderr, "", log.LstdFlags))
    Foo(ctx, 42)
}

However, this has a huge problem: What if we want more than one functionality (each not knowing about the other)? We might try the same trick, say

package tracectx

import (
    "context"

    "github.com/opentracing/opentracing-go"
)

type TraceContext interface {
    context.Context
    Tracer() opentracing.Tracer
}

func WithTracer(ctx context.Context, t opentracing.Tracer) TraceContext {
    return traceContext{ctx, t}
}

type traceContext struct {
    context.Context
    t opentracing.Tracer
}

func (ctx traceContext) Tracer() opentracing.Tracer {
    return ctx.t
}

But because a context.Context is embedded, only those methods explicitly mentioned in that interface are added to traceContext. The Logf method is erased. After all, that is the trouble with optional interfaces.

This is where generics come in. We can change our wrapper-types and -functions like this:

type LogContext(type parent context.Context) struct {
    // the type-parameter is lower case, so the field is not exported.
    parent
    l *log.Logger
}

func WithLog(type Parent context.Context) (ctx Parent, l *log.Logger) LogContext(Parent) {
    return LogContext(Parent){ctx, l}
}

By adding a type-parameter and embedding it, we actually get all methods of the parent context on LogContext. We are no longer erasing them. After giving the tracectx package the same treatment, we can use them like this:

// FooContext encapsulates all the dependencies of Foo in a context.Context.
type FooContext interface {
    context.Context
    Logf(format string, values ...interface{})
    Tracer() opentracing.Tracer
}

func Foo(ctx FooContext, v int) {
    span := ctx.Tracer().StartSpan("Foo")
    defer span.Finish()

    ctx.Logf("Foo(%v)", v)
}

func main() {
    l := log.New(os.Stderr, "", log.LstdFlags)
    t := opentracing.GlobalTracer()
    // ctx has type TraceContext(LogContext(context.Context)),
    //    which embeds a LogContext(context.Context),
    //    which embeds a context.Context
    // So it has all the required methods
    ctx := tracectx.WithTracer(logctx.WithLog(context.Background(), l), t)
    Foo(ctx, 42)
}

Foo has now fully declared its dependencies on a logger and a tracer, without requiring any type-assertions or runtime-checks. The logging- and tracing-libraries don't know about each other and yet are able to wrap each other without loss of type-information. Constructing the context is not particularly ergonomic though. We require a long chained function call, because the values returned by the functions no longer have a unified type context.Context (so the ctx variable can't be re-used).

Another thing to note is that we exported LogContext as a struct, instead of an interface. This is necessary, because we can't embed type-parameters into interfaces, but we can embed them as struct-fields. So this is the only way we can express that the returned type has all the methods the parameter type has. The downside is that we are making this a concrete type, which isn't always what we want¹.

We have now succeeded in annotating context.Context with dependencies, but this alone is not super useful of course. We also need to be able to pass it through agnostic APIs (the fundamental problem Context.Value solves). However, this is easy enough to do.

First, let's change the context API to use the same form of generic wrappers. This isn't backwards compatible, of course, but this entire blog post is a thought experiment, so we are ignoring that. I don't provide the full code here, for brevity's sake, but the basic API would change into this:

package context

// CancelContext is the generic version of the currently unexported cancelCtx.
type CancelContext(type parent context.Context) struct {
    parent
    // other fields
}

func WithCancel(type Parent context.Context) (parent Parent) (ctx CancelContext(Parent), cancel CancelFunc) {
    // ...
}

This change is necessary to enable WithCancel to also preserve the methods of the parent context. We can now use this in an API that passes through a parametric context. For example, say we want to have an errgroup package that passes the context through to the argument of (*Group).Go, instead of returning it from WithContext:

// Derived from the current errgroup code.

// A Group is a collection of goroutines working on subtasks that are part of the same overall task.
//
// A zero Group is invalid (as opposed to the original errgroup).
type Group(type Context context.Context) struct {
    ctx    Context
    cancel func()

    wg sync.WaitGroup

    errOnce sync.Once
    err     error
}

func WithContext(type C context.Context) (ctx C) *Group(C) {
    ctx, cancel := context.WithCancel(ctx)
    return &Group(C){ctx: ctx, cancel: cancel}
}

func (g *Group(Context)) Wait() error {
    g.wg.Wait()
    return g.err
}

func (g *Group(Context)) Go(f func(Context) error) {
    g.wg.Add(1)

    go func() {
        defer g.wg.Done()

        if err := f(g.ctx); err != nil {
            g.errOnce.Do(func() {
                g.err = err
            })
        }
    }()
}

Note that the code here has barely changed. It can be used as

func Foo(ctx FooContext) error {
    span := ctx.Tracer().StartSpan("Foo")
    defer span.Finish()
    ctx.Logf("Foo was called")
}

func main() {
    var ctx FooContext = newFooContext()
    eg := errgroup.WithContext(ctx)
    for i := 0; i < 20; i++ {
        eg.Go(Foo)
    }
    if err := eg.Wait(); err != nil {
        log.Fatal(err)
    }
}

After playing around with this for a couple of days, I feel pretty confident that these patterns make it possible to get a fully type-safe version of context.Context, while preserving the ability to have APIs that pass it through untouched or augmented.

A completely different question, of course, is whether all of this is a good idea. Personally, I am on the fence about it. It is definitely valuable, to have a type-safe version of context.Context. And I think it is impressive how small the impact of it is on the users of APIs written this way. The type-argument can almost always be inferred and writing code to make use of this is very natural - you just declare a suitable context-interface and take it as an argument. You can also freely pass it to functions taking a pure context.Context unimpeded.

On the other hand, I am not completely convinced the cost is worth it. As soon as you do non-trivial things with a context, it becomes a pretty "infectious" change. For example, I played around with a mock gRPC API to allow interceptors to take a parametric context and it requires almost all types and functions involved to take a type-parameter. And this doesn't even touch on the fact that gRPC itself might want to add annotations to the context, which adds even more types. I am not sure if the additional machinery is really worth the benefit of some type-safety - especially as it's not always super intuitive and easily understandable. And even more so, if it needs to be combined with other type-parameters, to achieve other goals.

I think this is an example of what I tend to dislike about generics and powerful type-systems in general. They tempt you to write a lot of extra machinery and types in a way that isn't necessarily semantically meaningful, but only used to encode some invariant in a way the compiler understands.


[1] One upside however, is that this could actually address the other criticism of context.Value: Its performance. If we consequently embed the parent-context as values in struct fields, the final context will be a flat struct. The interface-table of all the extra methods we add will point at the concrete implementations. There's no longer any need for a linear search to find a context value.

I don't actually think there is much of a performance problem with context.Value in practice, but if there is, this could solve that.

at 2020-07-20 00:00

2020-07-09

sECuREs website

Introducing the kinT kinesis keyboard controller

Kinesis Advantage ergonomic keyboard

Back in 2013, I published a replacement controller for the Kinesis Advantage ergonomic keyboard. In the community, it is often referred to simply as the “stapelberg”, and became quite popular.

Many people like to use the feature-rich QMK firmware, which supports my replacement controller out of the box.

kinesis pcb mounted

On eBay, you can frequently find complete stapelberg kits or even already-modified Kinesis keyboards including the stapelberg board for sale.

In 2017, Kinesis released the Kinesis Advantage 2, which uses a different connector (an FPC connector) for connecting the two thumb pad PCBs to the controller PCB, instead of the soldered cable the older Kinesis Advantage used. Aside from the change in connector and cable type, the newer keyboard uses the same pinout as the old one.

I wanted to at least update my project to support the Kinesis Advantage 2. While doing so, I decided to also make a bunch of improvements to make the project more approachable and usable for beginners. Among many other improvements, the project switched from Eagle to KiCad, which is FOSS and means no more costly license fees!

kinT (T for Teensy!)

I am hereby announcing the kinT kinesis keyboard controller: a replacement keyboard controller for your Kinesis Advantage or Advantage 2 ergonomic keyboards.

kinT keyboard controller

The Teensy footprint looks a bit odd, but it’s a combined footprint so that you can use the same board with many different Teensy microcontrollers, giving you full flexibility regarding cost and features. See “Compatibility: which Teensy to use?” for more details.


I originally replaced the controller of my Kinesis Advantage to work around a bug, but these days I do most of it just because I enjoy tinkering with keyboards.

You might consider replacing your keyboard controller, for example…

Building your own kinT keyboard controller

  1. Follow “Buying the board and components (Bill of materials)”. When ordering from OSH Park (board) and Digi-Key (components), you’ll get the minimum quantity of 3 boards for 72 USD (24 USD per board), and one set of components for 49 USD.

    • If you have any special requirements regarding which Teensy microcontroller to use, this is the step where you would replace the Teensy 3.6 with your choice.
  2. Wait for the components to arrive. When ordering from big shops like Digi-Key or Mouser, this typically takes 2 days to many places in the world.

  3. Wait for the boards to arrive. This takes 6 days in the best case when ordering from OSH Park with their Super Swift Service option. In general, the longer you are willing to wait, the cheaper it is going to get.

  4. Follow the soldering guide. This will take about an hour.

  5. Install the firmware

Improvements over the older replacement board

In case you’re familiar with the older replacement board and are wondering what changed, here is a complete list:

  • The kinT supports both the older Kinesis Advantage (KB500) and the newer Kinesis Advantage 2 (KB600) keyboards. They differ in how the thumb pads are connected. See the soldering instructions below.

  • The kinT is made for the newer Teensy 3.x and 4.x series, which will remain widely available for years to come, whereas the future of the Teensy++ 2.0 is not as certain.

  • The kinT is a smaller PCB (4.25 x 3.39 inches, or 108.0 x 86.1 mm), which makes it:

    • more compact: can be inserted/removed without having to unscrew a key well.

    • cheaper: 72 USD for 3 boards at oshpark, instead of 81 USD.

  • The kinT silkscreen (front, back) and schematic are much much clearer, making assembly a breeze.

  • The kinT is a good starting point for your own project:

    • kinT was designed in the open source KiCad program, meaning you do not need any license subscriptions.

    • The clear silkscreen and schematic make development and debugging easier.

  • On the kinT, the Teensy no longer has to be soldered onto the board upside down.

  • On the kinT, the FPC connectors have been moved for less strain on the cables.

  • The kinT makes possible lower-cost builds: if you don’t need the scroll lock, num lock and keypad LEDs, you can use a Teensy LC for merely 11 USD.

Conclusion

I’m very excited to release this new keyboard controller, and I can’t wait to see all the custom builds and modifications!

By the way, there is also a (4-hour!) stream recording in case you are interested in some more history and context, and want to see me solder a kinT controller live on stream!

at 2020-07-09 07:25

2020-07-01

RaumZeitLabor

Will hack for space

The year 2020 and its bad news just won’t stop. To make matters worse, we have now also been given notice on our premises, effective October 31, 2020.

So that we can celebrate a housewarming party just in time for Halloween, we need your help in finding a new home for the club and its machine park.

What we would wish for in a future RaumZeitLabor:

  • A combination of office and workshop/hall space (we again need a work area plus room for our workshops)
  • At least 200m², better a few more m², or an option to expand later
  • A kitchen, or the hookups to install one
  • Sanitary facilities and heating
  • Reasonably good public transport connections, parking nearby
  • Internet, fast

Bonus:

  • commission-free (no broker’s fee)
  • few neighbours, or neighbours who are only on site during the day
  • low-barrier access

If you have a lead on such a property, let us know and write to us at vorstand@raumzeitlabor.de.

For the coming four months, the motto is “On your boxes, get set, go!” Please come by, take your personal belongings home, and help us pack everything up safely for the move.

Thank you!

by flederrattie at 2020-07-01 00:00

2020-06-28

Insanity Industries

Socket activating arbitrary services

Socket activation is the idea of activating a daemon or service not by manually starting it, but by merely pre-exposing the socket that is used for communication with that service. This has several advantages:

  • system boot speeds up, as fewer things need to actually be started at boot time
  • system resource usage is reduced, as fewer services actually run1
  • you can restart services behind the socket invisibly to the client (if the service gracefully takes up the connection on the socket again) without losing messages2

The two requirements for this are:

  • a daemon that manages these sockets and activates the corresponding process once communication happens
  • the socket-activated daemon being able to work with a preestablished socket-connection

For the first part, systemd has us covered. The second part must be implemented in the daemon code, but with a little help we can nonetheless activate any socket-based service, even one that doesn’t implement socket activation itself. Let’s take a look at how things work, in this order.

Systemd socket activation: The principle

This will be a brief rundown on how systemd’s socket activation works (see here for all the details):

To activate a service via its socket, we need to define two units: the service and the socket. A socket can be a network socket, a unix socket, or one of several other connection types. In this post, however, we will focus on network sockets as an example. We place the service file of our socket-activatable service (we will come back to what this means in a moment) at /etc/systemd/system/myservice.service:

[Unit]
Description=Socket-activatable example service

[Service]
ExecStart=/usr/bin/myservice
NonBlocking=True

and we place the corresponding socket unit (in this case a TCP-network-socket) as /etc/systemd/system/myservice.socket:

[Unit]
Description=Socket for example service

[Socket]
# In this example we only listen on localhost
ListenStream=127.0.0.1:1234
NoDelay=true

[Install]
WantedBy=sockets.target

Note that the service does not require an install section, only the socket does3. We then enable our socket via systemctl enable --now myservice.socket.

Now we have an inactive service unit, but an active socket which is set up by systemd. The moment any activity occurs on this socket, systemd will start myservice.service and hand over the socket, buffering what is already written to it until the service takes over.
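
To illustrate what "being able to work with a preestablished socket" means in daemon code, here is a minimal sketch in Python (names illustrative, error handling omitted). systemd passes the listening sockets as file descriptors starting at 3 and describes them via the LISTEN_FDS/LISTEN_PID environment variables, see sd_listen_fds(3):

#!/usr/bin/env python3
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd systemd hands over, by convention

def systemd_sockets():
    # Only trust the environment if it was set for this very process.
    if os.environ.get('LISTEN_PID') != str(os.getpid()):
        return []
    n = int(os.environ.get('LISTEN_FDS', '0'))
    # Our myservice.socket uses ListenStream on an IPv4 address, i.e. TCP.
    return [socket.socket(socket.AF_INET, socket.SOCK_STREAM,
                          fileno=SD_LISTEN_FDS_START + i) for i in range(n)]

# The socket set up by myservice.socket, already listening.
sock = systemd_sockets()[0]
while True:
    conn, addr = sock.accept()
    conn.sendall(b'hello from a socket-activated service\n')
    conn.close()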

Leveraging socket activation in practice

In practice, we can distinguish three different cases:

1. Socket activation natively supported already

If the service in question already natively supports socket activation, simply activate the corresponding socket unit (or create one if it doesn’t exist yet). See systemctl list-units | grep socket for socket-units already available on the system.

2. Fault tolerance and boot time optimization desired only

If our service is intended to be started at boot and socket activation is intended only to provide transparent restarts and boot parallelization, we can simply use systemd’s systemd-socket-proxyd. See man systemd-socket-proxyd or here for details and usage examples.

Make sure you add an [Install] section to the service file and enable the service itself as well, as systemd-socket-proxyd merely decouples the service unit from its socket, but doesn’t start it automatically.
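
For illustration, here is a sketch along the lines of the example in man systemd-socket-proxyd, assuming an nginx that was reconfigured to listen on 127.0.0.1:8080 only (unit names, ports and the binary path may differ on your system):

/etc/systemd/system/proxy-to-nginx.socket:

[Socket]
ListenStream=80

[Install]
WantedBy=sockets.target

/etc/systemd/system/proxy-to-nginx.service:

[Unit]
Requires=nginx.service
After=nginx.service

[Service]
ExecStart=/usr/lib/systemd/systemd-socket-proxyd 127.0.0.1:8080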

3. On-demand starting and stopping of arbitrary services

If our service does not support socket activation natively and we want it to start not at boot time, but on-demand, we can use the tool socket-activate. For this, configure actual.service to listen on localhost on a different port such as 127.0.0.1:12345 and create two units: socket-activate-actual.service containing

[Unit]
Description=Socket-activate proxy for example service

[Service]
ExecStart=/usr/bin/socket-activate -u "actual.service" -a "127.0.0.1:12345"
NonBlocking=True

as well as a corresponding socket-activate-actual.socket as noted above, listening on the actual desired port. On the first connection to the socket, socket-activate-actual.service will be started and in turn start actual.service, proxying all traffic to it.

If socket-activate is invoked with an additional -t <timeout>, then both socket-activate-actual.service as well as actual.service are stopped again when no activity is detected for the specified timeout.


  1. Especially as socket-activated services can terminate once their job is done, as they simply get reactivated next time someone connects to their communication socket again. ↩︎

  2. This includes crashes or upgrades of the service, no further data written to the socket will be lost, only the data the service has already read from it before it crashed or was restarted. ↩︎

  3. The service may still have an [Install] section to be started conventionally or already being started even if no one has yet connected to its socket. In the latter case the socket activation would primarily serve as restart resilience and boot parallelization. ↩︎

by Jonas Große Sundrup at 2020-06-28 15:23

2020-06-26

michael-herbst.com

SCF preconditioning for mixed systems

Fortunately the general lockdown due to the Corona pandemic slowly starts to ease around Paris as well. While basically all seminars are only virtual it is good to see old procedures and habits to slowly return. From my end I gave the first talk after the forced break today in the EMC2 group meeting.

Since it was the first time discussing DFTK in the EMC2 synergy group I decided to talk about it taking the angle of tackling an actual research problem. After presenting briefly DFT methods and DFTK in the first half of the talk, I therefore focused on one of my ongoing projects with Antoine Levitt, namely constructing better preconditioners for the self-consistent field (SCF) iterations in mixed systems. What is meant by mixed systems are systems where locally differing dielectric properties are found, i.e. where some parts of the material are insulating, others may be metallic or semiconductors. Since the dielectric properties are closely related to the spectrum of the SCF fixed-point map, they therefore also control the convergence properties of SCF procedures. For metals and (to a minor extent) semiconductors simple SCF procedures, where one just applies the SCF cycle over and over require extremely small step sizes (i.e. small damping values). As a result the SCF converges only very slowly. The remedy is to precondition the spectrum of the SCF map itself by using so-called mixing techniques. In state-of-the-art approaches these are usually material-specific, i.e. different mixings are used for insulators, metals or semiconductors. This is fine for bulk materials, but fails in case of mixed systems, since one has to globally select a single approach. Our recent work has been to investigate the spectrum of the SCF map and to try and construct a preconditioner which locally adapts and as a result is able to properly treat mixed systems as well. The results I presented today are, however, not yet final and more investigations are to be underdone for our approach to work as reliable as we want.

Link Licence
SCF preconditioning for mixed systems: A DFTK case study (Slides) Creative Commons License
A few DFTK examples (Jupyter notebook) GNU GPL v3
SCF preconditioners in 1D (Jupyter notebook) GNU GPL v3

by Michael F. Herbst at 2020-06-26 16:00 under talk, electronic structure theory, Julia, DFTK, theoretical chemistry, SCF

2020-06-06

sECuREs website

Using the iPhone camera as a Linux webcam with v4l2loopback

iPhone camera setup

For my programming stream at twitch.tv/stapelberg, I wanted to add an additional camera to show test devices, electronics projects, etc. I couldn’t find my old webcam, and new ones are hard to come by currently, so I figured I would try to include a phone camera somehow.

The setup that I ended up with is:

iPhone camera
→ Instant Webcam
→ WiFi
→ gstreamer
→ v4l2loopback
→ OBS

Disclaimer: I was only interested in a video stream! I don’t think this setup would be helpful for video conferencing, due to lacking audio/video synchronization.

iPhone Software: Instant Webcam app

I’m using the PhobosLab Instant Webcam (install from the Apple App Store) app on an old iPhone 8 that I bought used.

There are three interesting related blog posts by app author Dominic Szablewski:

  1. MPEG1 Video Decoder in JavaScript (2013-May)
  2. HTML5 Live Video Streaming via WebSockets (2013-Sep)
  3. Decode it like it’s 1999 (2017-Feb)

As hinted at in the blog posts, the way the app works is by streaming MPEG1 video from the iPhone (presumably via ffmpeg?) to the jsmpeg JavaScript library via WebSockets.

After some git archeology, I figured out that jsmpeg was rewritten in commit 7bf420fd just after v0.2. You can browse the old version on GitHub.

Notably, the Instant Webcam app seems to still use the older v0.2 version, which starts WebSocket streams with a custom 8-byte header that we need to strip.

Linux Software

Install the v4l2loopback kernel module, e.g. community/v4l2loopback-dkms on Arch Linux or v4l2loopback-dkms on Debian. I used version 0.12.5-1 at the time of writing.

Then, install gstreamer and the required plugins. I used version 1.16.2 for all of these.

Lastly, install either websocat or wsta for accessing WebSockets. I successfully tested with websocat 1.5.0 and wsta 0.5.0.

Streaming

First, load the v4l2loopback kernel module:

% sudo modprobe v4l2loopback video_nr=10 card_label=v4l2-iphone
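
To check that the loopback device showed up, you can list the video devices (v4l2-ctl is part of the v4l-utils package; the exact output will vary):

% v4l2-ctl --list-devices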

Then, we’re going to use gstreamer to decode the WebSocket MPEG1 stream (after stripping the custom 8-byte header) and send it into the /dev/video10 V4L2 device, to the v4l2loopback kernel module:

% websocat --binary ws://iPhone.lan/ws | \
  dd bs=8 skip=1 | \
  gst-launch-1.0 \
    fdsrc \
    ! queue \
    ! mpegvideoparse \
    ! avdec_mpeg2video \
    ! videoconvert \
    ! videorate \
    ! 'video/x-raw, format=YUY2, framerate=30/1' \
    ! v4l2sink device=/dev/video10 sync=false

Here are a couple of notes about individual parts of this pipeline:

  • You must set websocat (or the alternative wsta) into binary mode, otherwise they will garble the output stream with newline characters, resulting in a seemingly kinda working stream that just displays garbage. Ask me how I know.

  • The queue element uncouples decoding from reading from the network socket, which should help in case the network has intermittent troubles.

  • Without enforcing framerate=30/1, you cannot cancel and restart the gstreamer pipeline: subsequent invocations will fail with streaming stopped, reason not-negotiated (-4)

  • Setting format YUY2 allows ffmpeg-based decoders to play the stream. Without this setting, e.g. ffplay will fail with [ffmpeg/demuxer] video4linux2,v4l2: Dequeued v4l2 buffer contains 462848 bytes, but 460800 were expected. Flags: 0x00000001.

  • The sync=false property on v4l2sink plays frames as quickly as possible without trying to do any synchronization.

Now, consumers such as OBS (Open Broadcaster Software), ffplay or mpv can capture from /dev/video10:

% ffplay /dev/video10
% mpv av://v4l2:/dev/video10 --profile=low-latency

Debugging

Hopefully the instructions above just work for you, but in case things go wrong, maybe the following notes are helpful.

To debug issues, I used the GST_DEBUG_DUMP_DOT_DIR environment variable as described on Debugging tools: Getting pipeline graphs. In these graphs, you can quickly see which pipeline elements negotiate which caps.

I also used the PL_MPEG example program to play the supplied MPEG test file. PL_MPEG is written by Dominic Szablewski as well, and you can read more about it in Dominic’s blog post MPEG1 Single file C library. I figured the codec and parameters might be similar between the different projects by the same author, and used this to gain more confidence in the stream parameters.

I also used Wireshark to look at the stream traffic to discover that websocat and wsta garble the stream output by default unless the --binary flag is used.

at 2020-06-06 09:18

2020-06-05

michael-herbst.com

First release of the density-functional toolkit (DFTK)

After we released a preliminary snapshot of DFTK last year our focus in the first half of this year was on using it for some new science. Recently, however, we got back into polishing our code base and our documentation in order to get DFTK ready for a wider audience. Today I am proud to announce that DFTK 0.1.0 has been accepted into the Julia Package repository, such that the package can now be readily installed within the Julia ecosystem.

Let me take this opportunity to recapitulate DFTK: I started the code with Antoine Levitt about a year ago when I moved to Paris. What we had in mind was to create a simple platform for methodological developments in density functional theory (DFT). Clearly our code should support the interdisciplinary requirements of the field, where advances are often the result of devising chemically and physically sound models, using mathematical insight to suggest stable algorithms and then scaling them up to the high-performance regime. This means that we would need both (a) the flexibility to mix and match models and numerical approaches by keeping the code high-level and similar to a scripting language and (b) access to the usual tricks (vectorisation, GPUs, threading, distributed computing) to tweak performance down to the metal.

In Julia we found a language which suits these aims perfectly. This is illustrated by the fact that after only a good year of development we already support a sizable number of features in about 5k lines of source code. Right now the focus of DFTK is on DFT ground-state simulations for solids (LDA/GGA in a plane-wave basis with GTH pseudopotentials), with more to come. Special care is taken to have a simple and clean codebase, well-commented and suitable for teaching or extensions (other models, bases, etc.). DFT is not hard-coded, and other similar models can be computed with DFTK (for instance, the 2D Gross-Pitaevskii equation with a magnetic field). Nevertheless, the performance is comparable with that of established plane-wave DFT codes, usually within a factor of 2. DFTK is fully multithreaded, although not distributed (yet). We also include interfaces with various codes (ASE, pymatgen, abipy...) for easy workflows and to integrate with the world beyond the Julia ecosystem. See for example the asedftk python package, which integrates DFTK into the atomistic simulation environment.

The code is of course fully open source and installation is easy. Since it is intended as a platform for multidisciplinary collaboration, we welcome any question, suggestion or addition. Feel free to get in touch by opening an issue at any time.

by Michael F. Herbst at 2020-06-05 08:00 under programming and scripting, DFTK, dft, electronic structure theory, Julia

2020-06-03

michael-herbst.com

Recent developments in adcc

Since the publication of the adcc paper a few months back, there are a few updates to report briefly:

  • adcc can now be interactively tried in the browser at try.adc-connect.org using the infrastructure from the binder project.
  • Binary installation of adcc is now available via conda for Linux and MacOS, see the adcc documentation.
  • Calculation of rotatory strengths at all ADC levels.
  • adcc is now fully integrated into the Psi4 quantum chemistry package. This means that ADC calculations in adcc can now be directly started from within the Psi4 ecosystem, including Psi4's python frontend and input files. This effectively equips Psi4 with all ADC capabilities adcc offers. See some details in the recent Psi4 paper.
  • Tensor evaluations in adcc are now lazy, which means that complex tensor expressions can be coded up in python without being evaluated immediately. Only once results are needed is the complete expression evaluated in a batch using the underlying linear algebra frameworks.

by Michael F. Herbst at 2020-06-03 17:00 under electronic structure theory, theoretical chemistry, adcc, algebraic-diagrammatic construction

2020-05-23

sECuREs website

stapelberg uses this: my 2020 desk setup

Desk setup

I generally enjoy reading the uses this blog, and recently people have been talking about desk setups in my bubble (and on my Twitch stream), so I figured I’d write a post about my current setup!

Desk setup

I’m using a desk I bought at IKEA well over 10 years ago. I’m not using a standing desk: while I have one at work, I never change its height. Just never could get into the habit.

I was using an IKEA chair as well for many years.

Currently, I’m using a Haworth Comforto 89 chair that I bought second-hand. Unfortunately, the arm rests are literally crumbling apart and the lumbar back support and back rest in general are not as comfortable as I would like.

Hence, I recently ordered a Vitra ID Mesh chair, which I have used for a couple of years at the office before moving office buildings. It will take a few weeks before the chair arrives.

Full Vitra ID Mesh chair configuration details
  • ID Mesh
  • Chair type: office swivel chair
  • Backrest: ID Mesh
  • Colours and materials:
    - Cover material: seat and backrest Silk Mesh
    - Colour of back cover: dim grey/like frame colour
    - Colour of seat cover: dim grey
    - Frame colour: soft grey
  • Armrests: 2D armrests
  • Base: five-star base, polished aluminium
  • Base on: castors hard, braked for carpet
  • Ergonomics:
    - Seat and seat depth adjustment: seat with seat depth adjustment
    - Forward tilt: with forward tilt

The most important aspect of the desk/chair setup for me is the arm rests. I align them with the desk height so that I can place my arms at a 90 degree angle, eliminating strain.

Peripherals

Note: all of my peripherals are Plug & Play under Linux and generally work with standard drivers across Windows, macOS and Linux.

Monitor: Dell 8K4K monitor (UP3218K)

The most important peripheral of a computer is the monitor: you stare at it all the time. Even when you’re not using your keyboard or mouse, you’re still looking at your monitor.

Ever since I first used a MacBook Pro with Retina display back in 2013, I’ve been madly in love with hi-DPI displays, and have gradually replaced all displays in my day-to-day with hi-DPI displays.

My current monitor is the Dell UP3218K, an 8K4K monitor (blog post).

Dell introduced the UP3218K in January 2017. It is the world’s first available 8K monitor, meaning it has a resolution of 7680x4320 pixels at a refresh rate of 60 Hz. The display’s dimensions are 698.1mm by 392.7mm (80cm diagonal, or 31.5 inches), meaning the display shows 280 dpi.

I run it in 300% scaling mode (Xft.dpi: 288), resulting in incredibly crisp text.

Years ago, I used multiple monitors (sometimes 3, usually 2). I stopped doing that in 2011/2012, when I lived in Dublin for half a year and decided to get only one external monitor for practical and cost reasons.

I found that using only one monitor allows me to focus more on what I’m doing, and I don’t miss anything about a multi-monitor setup.

Keyboard: Kinesis advantage keyboard

Kinesis advantage keyboard

The Kinesis is my preferred commercially available ergonomic keyboard. I like its matrix layout, ergonomic key bowls, thumb pads and split hands.

I find typing on it much more comfortable than regular keyboards, and I value the Kinesis enough to usually carry one with me when I travel. When I need to use a laptop keyboard for longer periods of time, my hands and arms get tired.

I bought my first one in 2008 for ≈250 EUR, but have since cleaned up and repaired two more Kinesis keyboards that were about to be trashed. Now I have one for home, one for work, and one for traveling (or keyboard development).

Over the years, I have modified my Kinesis keyboards in various ways:

The first modification I did was to put in Cherry MX blue key switches (tactile and audible), replacing the default Cherry MX browns. I like the quick feedback of the blues better, possibly because I was used to them from my previous keyboards. Without tons of patience and good equipment, it’s virtually impossible to unsolder the key switches, so I reached out to Kinesis, and they agreed to send me unpopulated PCBs into which I could solder my preferred key switches! Thank you, Kinesis.

I later replaced the keyboard controller to address a stuck modifier bug. The PCB I made for this remains popular in the Kinesis modification community to this day.

In 2018, I got interested in keyboard input latency and developed kinX, a new version of my replacement keyboard controller. With this controller, the keyboard has an input latency of merely 0.225ms in the worst case.

Aside from the keyboard hardware itself, I’m using the NEO Ergonomically Optimized keyboard layout. It’s optimized for German, English, Programming and Math, in that order. Especially its upper layers are really useful: hover over “Ebene 3” to see.

I used to remap keys in hardware, but that doesn’t cover the upper layers, so nowadays I prefer just enabling the NEO layout included in operating systems.

Pointing device: Logitech MX Ergo

During my student years (2008 to 2013), I carried a ThinkPad X200 and used its TrackPoint (“red dot”) in combination with trying to use lots of keyboard shortcuts.

The concept of relative inputs for mouse movement made sense to me, so I switched from a mouse to a trackball on the desktop, specifically the Logitech Trackball M570.

I was using the M570 for many years, but switched to the Logitech MX Ergo a few months ago. It is more comfortable to me, so I replaced all 3 trackballs (home, office, travel) with the MX Ergo.

In terms of precision, a trackball will not be as good as a mouse can be. To me, it more than makes up for that by reducing the strain on my hands and wrists.

For comparison: a few years ago, I was playing a shooter with a regular mouse for one evening (mostly due to nostalgia), and I could feel pain from that for weeks afterwards.

Microphone: RØDE Podcaster

To record screencasts for the i3 window manager with decent audio, I bought a RØDE Podcaster USB Broadcast Mic in 2012 and have been using it ever since.

The big plus is that the setup couldn’t be easier: you connect it via USB, and it is Plug & Play on Linux. This is much easier than getting a working setup with XLR audio gear.

The audio quality is good: much better than headsets or cheap mics, but probably not quite as good as a more expensive studio mic. For my usage, this is fine: I don’t record radio broadcasts regularly, so I don’t need the absolutely highest quality, and for video conferences or the occasional podcast, the RØDE Podcaster is superb.

Webcam: Logitech C920

In the past, I have upgraded my webcam every so often because higher resolutions at higher frame rates became available for a reasonably low price.

I’m currently using the Logitech HD Pro Webcam C920, and I’m pretty happy with it: the device is Plug & Play under Linux, and the picture quality is good out of the box. No fumbling with UVC parameters or drivers required :-)

Note: to capture at 30 fps at the highest resolution, you may need to specify the pixel format: https://wiki.archlinux.org/index.php/webcam_setup#mpv
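
For example, an ffplay invocation along these lines should select the MJPEG pixel format (resolution and device path are assumptions, adjust as needed):

% ffplay -f v4l2 -input_format mjpeg -video_size 1920x1080 /dev/video0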

Headphones: Sony WH-1000XM3

At work, I have been using the Bose QuietComfort 15 Noise Cancelling headphones for many years, as they were considered the gold standard for noise cancelling headphones.

I decided to do some research and give bluetooth headphones a try, in the hope that the technology has matured enough.

I went with the Sony WH-1000XM3 bluetooth headphones, and am overall quite happy with them. The lack of a cable is very convenient indeed, and the audio quality and noise cancellation are both superb. A single charge lasts me for multiple days.

Switching devices is a bit cumbersome: when I have the headphones connected to my phone and want to switch to my computer, I need to explicitly disconnect on my phone, then explicitly connect on my computer. I guess this is just how bluetooth works.

One issue I ran into is that when the headphones re-connected to my computer, they would not select the high-quality audio profile until I explicitly disconnected and re-connected them. This was fixed in BlueZ 5.51, so make sure you run at least that version.
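
To check the installed BlueZ version, you can ask bluetoothctl:

% bluetoothctl --version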

USB memory stick: Sandisk Extreme PRO SSD USB 3.1

USB memory sticks are useful for all sorts of tasks, but I mostly use them to boot Linux distributions on my laptop or computer, for development, recovery, updates, etc.

A year ago, I was annoyed by my USB memory sticks being slow, and I found the Sandisk Extreme PRO SSD USB 3.1 which is essentially a little SSD in USB memory stick form factor. It is spec'd at ≈400 MB/s read and write speed, and I do reach about ≈350 MB/s in practice, which is a welcome upgrade from the < 10 MB/s my previous sticks did.

A quick USB memory stick lowers the hurdle for testing distri images on real hardware.

Audio: teufel sound system

My computer is connected to a Teufel Motiv 2 stereo sound system I bought in 2009.

The audio quality is superb, and when I tried to replace them with the Q Acoustics 3020 Speakers (Pair) I ended up selling the Q Acoustics and going back to the Teufel. Maybe I’m just very used to its sound at this point :-)

Physical paper notebook for sketches

I also keep a paper notebook on my desk, but don’t use it a lot. It is good to have for ordering my thoughts when the subject at hand is more visual than textual. For example, my analysis of the TurboPFor integer compression scheme started out on a bunch of notebook pages.

I don’t get much out of hand writing into a notebook (e.g. for task lists), so I tend to do that in Emacs Org mode files instead (1 per project). I’m only a very light Org mode user.

Laptop: TBD

I’m writing a separate article about my current laptop and will reference the post here once published.

I will say that I mostly use laptops for traveling (to conferences or events) these days, and there is not much travel happening right now due to COVID-19.

Having a separate computer is handy for some debugging activities, e.g. single-stepping X11 applications in a debugger, which needs to be done via SSH.

Internet router and WiFi: router7 and UniFi AP HD

Mostly for fun, I decided to write router7, a highly reliable, automatically updating internet router written entirely in Go, primarily targeting the fiber7 internet service.

While the router could go underneath my desk, I currently keep it on top of my desk. Originally, I placed it within reach to lower the hurdle for debugging, but after the initial development phase, I never had to physically power cycle it.

These days, I only keep it on top of my desk because I like the physical reminder of what I accomplished :-)

For WiFi, I use a UniFi AP HD access point from Ubiquiti. My apartment is small enough that this single access point covers all corners with great WiFi. I’m configuring the access point with the mobile app so that I don’t need to run the controller app somewhere.

In general, I try to connect most devices via ethernet to remove WiFi-related issues from the picture entirely, and reduce load on the WiFi.

Switching peripherals between home and work computer

Like many, I am currently working from home due to COVID-19.

Because I only have space for one 32" monitor and peripherals on my desk, I decided to share them between my personal computer and my work computer.

To make this easy, I got an active Anker 10-port USB3 hub and two USB 3 cables for it: one connected to my personal computer, one to my work computer. Whenever I need to switch, I just re-plug the one cable.

Software setup

Linux

I have been using Linux as my primary operating system since 2005. The first Linux distribution that I installed in 2005 was Ubuntu-based. Later, I switched to Gentoo, then to Debian, which I used and contributed to until quitting the project in March 2019.

I had briefly tried Fedora before, and decided to give Arch Linux a shot now, so that’s what I’m running on my desktop computer right now. My servers remain on Flatcar Container Linux (the successor to CoreOS) or Debian, depending on their purpose.

For me, all Linux package managers are too slow, which is why I started distri: a Linux distribution to research fast package management. I’m testing distri on my laptop, and I’m using distri for a number of development tasks. I don’t want to run it on my desktop computer, though, because of its experimental nature.

Window Manager: i3

It won’t be a surprise that I am using the i3 tiling window manager, which I created in 2009 and still maintain.

My i3 configuration file is pretty close to the i3 default config, with only two major modifications: I use workspace_layout stacked and usually arrange two stacked containers next to each other on every workspace. Also, I configured a volume mode which allows for easily changing the default sink’s volume.
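
As an illustration, these modifications could look roughly like this in an i3 config file (key bindings and pactl commands are illustrative, not my exact config):

workspace_layout stacked

mode "volume" {
        # pactl talks to PulseAudio; @DEFAULT_SINK@ selects the default sink
        bindsym Up exec --no-startup-id pactl set-sink-volume @DEFAULT_SINK@ +5%
        bindsym Down exec --no-startup-id pactl set-sink-volume @DEFAULT_SINK@ -5%
        bindsym Return mode "default"
        bindsym Escape mode "default"
}
bindsym $mod+v mode "volume"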

One way in which my usage might be a little unusual is that I always have at least 10 workspaces open.

Go

Over time, I have moved all new development work to Go, which is by far my favorite programming language. See the article for details, but in summary, Go’s values align well with my own: the tooling is quick and high-quality, the language well thought-out and operating at roughly my preferred level of abstraction vs. clarity.

Here is a quick description of a few notable Go projects I started:

Debian Code Search is a regular expression source code search engine covering all software available in Debian.

RobustIRC is an IRC network without netsplits, based on the Raft consensus algorithm.

gokrazy is a pure-Go userland for your Raspberry Pi 3 appliances. It allows you to overwrite an SD card with a Linux kernel, Raspberry Pi firmware and Go programs of your choosing with just one command.

router7 is a pure-Go small home internet router.

debiman generates a static manpage HTML repository out of a Debian archive and powers manpages.debian.org.

The distri research linux distribution project was started in 2019 to research whether a few architectural changes could enable drastically faster package management. While the package managers in common Linux distributions (e.g. apt, dnf, …) top out at data rates of only a few MB/s, distri effortlessly saturates 1 Gbit, 10 Gbit and even 40 Gbit connections, resulting in superior installation and update speeds.

Editor: Emacs

In my social circle, everyone used Vim, so that’s what I learnt. I used it for many years, but eventually gave Emacs a shot so that I could try the best notmuch frontend.

Emacs didn’t immediately click, and I haven’t used notmuch in many years, but it got me curious enough that I tried getting into the habit of using Emacs a few years ago, and now I prefer it over Vim and other editors.

Here is a non-exhaustive list of things I like about Emacs:

  1. Emacs is not a modal editor. You don’t need to switch into insert mode before you can modify the text. This might sound like a small thing, but I feel more of a direct connection to the text this way.

  2. I like Emacs’s built-in buffer management. I could never get used to using multiple tabs or otherwise arranging my Vim editor window, but with Emacs, juggling multiple things at the same time feels very natural.
    I make heavy use of Emacs’s compile mode (similar to Vim’s quick fix window): I will compile not only programs, but also config files (e.g. M-x compile i3 reload) or grep commands, allowing me to go through matches via M-g M-n.

  3. The Magit package is by far my favorite Git user interface. Staging individual lines or words comes very naturally, and many operations are much quicker to accomplish compared to using Git in a terminal.

  4. The eglot package is a good LSP client, making available tons of powerful cross-referencing and refactoring features.

  5. The possible customization is impressive, including the development experience: Emacs’s built-in help system is really good, and allows jumping to the definition of variables or functions out of the box. Emacs is the only place in my day-to-day where I get a little glimpse into what it must have been like to use a Lisp machine.

Of course, not everything is great about Emacs. Here are a few annoyances:

  1. The Emacs default configuration is very old, and a number of settings need to be changed to make it more modern. I have been tweaking my Emacs config since 2012 and still feel like I’m barely scratching the surface. Many beginners find their way into Emacs by using a pre-configured version of it such as Doom Emacs or Spacemacs.

  2. Even after going to great lengths to keep startup fast, Emacs definitely starts much more slowly than e.g. Vim. This makes it not a great fit for trivial editing tasks, such as commenting out a line of configuration on a server via SSH.

For consistency, I eventually switched my shell and readline config from vi key bindings to the default Emacs key bindings. This turned out to be a great move: the Emacs key bindings are generally better tested and more closely resemble the behavior of the editor. With vi key bindings, sooner or later I always ran into frustrating feature gaps (e.g. zsh didn’t support the delete-until-next-x-character Vim command) or similar.
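
Concretely, that switch boils down to one line each for zsh and readline (illustrative):

# in ~/.zshrc: use Emacs key bindings
bindkey -e

# in ~/.inputrc: use Emacs editing mode in readline-based programs
set editing-mode emacs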

Hardware setup: desktop computer

I should probably publish a separate blog post with PC hardware recommendations, so let me focus on the most important points here only:

I’m using an Intel i9-9900K CPU. I briefly switched to an AMD Ryzen 3900X based on tech news sites declaring it faster. I eventually found out that the Intel i9-9900K actually benchmarks better in browser performance and incremental Go compilation, so I switched back.

To be able to drive the Dell 8K4K monitor, I’m using an nVidia GeForce RTX 2070. I don’t care for its 3D performance, but more video RAM and memory bandwidth make a noticeable difference in how many Chrome tabs I can work with.

To avoid running out of memory, I usually max out memory based on mainboard support and what is priced reasonably. Currently, I’m using 64 GB of Corsair RAM.

For storage, I currently use a Phison Force MP600 PCIe 4 NVMe disk, back from when I tried the Ryzen 3900X. When I’m not trying out PCIe 4, I usually go with the latest Samsung Consumer SSD PRO, e.g. the Samsung SSD 970 PRO. Having a lot of bandwidth and IOPS available is great in general, but especially valuable when e.g. re-generating all manpages or compiling a new distri version from scratch.

I’m a fan of Fractal Design’s Define case series (e.g. the Define R6) and have been using them for many years in many different builds. They are great to work with: no sharp edges, convenient screws and mechanisms, and they result in a quiet computer.

For fans, my choice is Noctua. Specifically, their NH-U14S makes for a great CPU fan, and their NF-A12x25 are great case fans. They cool well and are super quiet!

Network storage

For redundancy, I am backing up my computers to 2 separate network storage devices.

My devices are built from PC hardware and run Flatcar Linux (previously CoreOS) for automated updates. I put in one hard disk per device for maximum redundancy: any hardware component can fail and I can just use the other device.

The software setup is intentionally kept very simple: I use rsync (with hardlinks) over SSH for backups, and serve files using Samba. That way, backups are just files, immediately available, and accessible from another computer if everything else fails.
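
A minimal sketch of such an rsync invocation (host name and directory layout made up for illustration):

% rsync -a --link-dest=/srv/backup/host/latest \
    root@host:/home /srv/backup/host/2020-05-23/

With --link-dest, files that did not change since the previous backup are hard-linked to it, so every dated directory looks like a full copy while only changed files occupy new disk space.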

Conclusion

I hope this was interesting! If you have any detail questions, feel free to reach out via email or twitter.

If you’re looking for more product recommendations (tech or otherwise), one of my favorite places is the wirecutter.

at 2020-05-23 13:22

2020-05-16

sECuREs website

a new distri linux (fast package management) release

I just released a new version of distri.

The focus of this release lies on:

  • a better developer experience, allowing users to debug any installed package without extra setup steps

  • performance improvements in all areas (starting programs, building distri packages, generating distri images)

  • better tooling for keeping track of upstream versions

See the release notes for more details.

The distri research linux distribution project was started in 2019 to research whether a few architectural changes could enable drastically faster package management.

While the package managers in common Linux distributions (e.g. apt, dnf, …) top out at data rates of only a few MB/s, distri effortlessly saturates 1 Gbit, 10 Gbit and even 40 Gbit connections, resulting in fast installation and update speeds.

at 2020-05-16 07:13

2020-05-12

michael-herbst.com

Quantifying the error of the core-valence separation approximation

Only a few months after publishing our adcc paper, which introduces the novel algebraic-diagrammatic construction (ADC) code adcc, the package has already found mention in a number of related articles. One example is a manuscript on updates in the Psi4 quantum chemistry package, which features adcc as one of the highlighted community modules. As the paper discusses, the close integration of adcc into the Psi4 ecosystem now allows starting ADC calculations in adcc directly from Psi4's python frontend and its input files. This effectively extends Psi4 by all ADC capabilities adcc offers. Another example is a paper on computing complex polarisabilities for excited states, which especially emphasises the importance adcc has played in simplifying the implementation of the method.

On my end, I recently conducted a study with Thomas Fransson on the error of the core-valence separation (CVS) approximation. This approximation is very important for the simulation of X-ray absorption spectra using accurate wave-function methods like coupled-cluster or ADC. In the literature the error of the CVS approximation is widely accepted to be negligible compared to the error with respect to experiment. This statement is, however, based on previous investigations of the CVS error, which were limited by their methodologies to only small, non-representative basis sets.

In our work we present an iterative post-processing scheme in the ADC context, which is able to undo the CVS approximation and remove the CVS error. Our procedure is still basic (essentially just Rayleigh quotient iteration), but it allowed us to study the CVS error for a much larger range of systems and basis sets. This includes augmented and/or core-polarised triple-zeta basis sets as well as other bases prominent in the community for carrying out simulations of core-excitations and X-ray spectra (read 6-311++G** and variants). Based on representative compounds from elements of the second and third period we managed to confirm that the CVS error is not only small compared to experiment, but additionally that its spread across elements and compounds is even smaller, such that the impact on energy differences is small, too. This is an important finding, since most aspects of chemistry (spectroscopy, reaction barrier heights, thermochemistry) are dominated by energy differences. While relative errors in energies might be small, relative errors in energy differences can be much larger if the error spread across compounds is not uniform.

In particular our work identified the main contributions to the CVS error as originating from two classes of couplings, which are neglected by the CVS approximation and which moreover contribute with opposite sign. We demonstrate that basis sets providing a balanced description of core and valence regions of the electron density are also capable of describing these neglected couplings in a balanced fashion, thus providing not only a good description of the core-excitation process, but also a small CVS error in terms of absolute value and spread. Based on these findings we were able to conclude that especially tight polarising functions are key for describing core-excitations. Along the way we suggest appropriate modifications of the popular 6-311++G** basis to reduce its CVS error. The full abstract of our paper reads

For the calculation of core-excited states probed through X-ray absorption spectroscopy, the core-valence separation (CVS) scheme has become a vital tool. This approach allows to target such states with high specificity, albeit introducing an error. We report the implementation of a post-processing step for CVS excitations obtained within the algebraic-diagrammatic construction scheme for the polarisation propagator (ADC), which removes this error. Based on this we provide a detailed analysis of the CVS scheme, identifying its accuracy to be dominated by an error balance between two neglected couplings, one between core and valence single excitations and one between single and double core excitations. The selection of the basis set is shown to be vital for a proper description of both couplings, with tight polarising functions being necessary for a good balance of errors. The CVS error is confirmed to be stable across multiple systems, with an element-specific spread of only about ±0.02 eV. A systematic lowering of the CVS error by 0.02-0.03 eV is noted when considering excitations to extremely diffuse states, emulating ionisation.

by Michael F. Herbst at 2020-05-12 22:30 under electronic structure theory, theoretical chemistry, adcc, algebraic-diagrammatic construction, core-valence separation

2020-05-09

sECuREs website

Hermetic packages (in distri)

In distri, packages (e.g. emacs) are hermetic. By hermetic, I mean that the dependencies a package uses (e.g. libusb) don’t change, even when newer versions are installed.

For example, if package libusb-amd64-1.0.22-7 is available at build time, the package will always use that same version, even after the newer libusb-amd64-1.0.23-8 will be installed into the package store.

Another way of saying the same thing is: packages in distri are always co-installable.

This makes the package store more robust: additions to it will not break the system. On a technical level, the package store is implemented as a directory containing distri SquashFS images and metadata files, into which packages are installed in an atomic way.

Out of scope: plugins are not hermetic by design

Plugin mechanisms are one exception where hermeticity is not desired: optionally loading out-of-tree code at runtime obviously is not hermetic.

As an example, consider glibc’s Name Service Switch (NSS) mechanism. Section 29.4.1 Adding another Service to NSS describes how glibc searches $prefix/lib for shared libraries at runtime.

Debian ships about a dozen NSS libraries for a variety of purposes, and enterprise setups might add their own into the mix.

systemd (as of v245) accounts for 4 NSS libraries, e.g. nss-systemd for user/group name resolution for users allocated through systemd’s DynamicUser= option.

Having packages be as hermetic as possible remains a worthwhile goal despite any exceptions: I will gladly use a 99% hermetic system over a 0% hermetic system any day.

Side note: Xorg’s driver model (which can be characterized as a plugin mechanism) does not fall under this category because of its tight API/ABI coupling! For this case, where drivers are only guaranteed to work with precisely the Xorg version for which they were compiled, distri uses per-package exchange directories.

Implementation of hermetic packages in distri

On a technical level, the requirement is: all paths used by the program must always result in the same contents. This is implemented in distri via the read-only package store mounted at /ro, e.g. files underneath /ro/emacs-amd64-26.3-15 never change.

To change all paths used by a program in this way, three strategies cover most paths in practice:

ELF interpreter and dynamic libraries

Programs on Linux use the ELF file format, which contains two kinds of references:

First, the ELF interpreter (PT_INTERP segment), which is used to start the program. For dynamically linked programs on 64-bit systems, this is typically ld.so(8).

Many distributions use system-global paths such as /lib64/ld-linux-x86-64.so.2, but distri compiles programs with -Wl,--dynamic-linker=/ro/glibc-amd64-2.31-4/out/lib/ld-linux-x86-64.so.2 so that the full path ends up in the binary.

The ELF interpreter is shown by file(1), but you can also use readelf -a $BINARY | grep 'program interpreter' to display it.

And secondly, the rpath, a run-time search path for dynamic libraries. Instead of storing full references to all dynamic libraries, we set the rpath so that ld.so(8) will find the correct dynamic libraries.

Originally, we used to just set a long rpath, containing one entry for each dynamic library dependency. However, we have since switched to using a single lib subdirectory per package as its rpath, and placing symlinks with full path references into that lib directory, e.g. using -Wl,-rpath=/ro/grep-amd64-3.4-4/lib. This is better for performance, as ld.so uses a per-directory cache.

Note that program load times are significantly influenced by how quickly you can locate the dynamic libraries. distri uses a FUSE file system to load programs from, so getting proper -ENOENT caching into place drastically sped up program load times.

Instead of compiling software with the -Wl,--dynamic-linker and -Wl,-rpath flags, one can also modify these fields after the fact using patchelf(1). For closed-source programs, this is the only possibility.
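
For example, using the paths from above:

% patchelf --set-interpreter /ro/glibc-amd64-2.31-4/out/lib/ld-linux-x86-64.so.2 $BINARY
% patchelf --set-rpath /ro/grep-amd64-3.4-4/lib $BINARY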

The rpath can be inspected by using e.g. readelf -a $BINARY | grep RPATH.

Environment variable setup wrapper programs

Many programs are influenced by environment variables: to start another program, said program is often found by checking each directory in the PATH environment variable.

Such search paths are prevalent in scripting languages, too, to find modules. Python has PYTHONPATH, Perl has PERL5LIB, and so on.

To set up these search path environment variables at run time, distri employs an indirection. Instead of e.g. teensy-loader-cli, you run a small wrapper program that calls precisely one execve system call with the desired environment variables.

Initially, I used shell scripts as wrapper programs because they are easily inspectable. This turned out to be too slow, so I switched to compiled programs. I’m linking them statically for fast startup, and I’m linking them against musl libc for significantly smaller file sizes than glibc (per-executable overhead adds up quickly in a distribution!).
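
Such a shell script wrapper might have looked roughly like this (paths taken from the demo appendix below; the compiled wrappers additionally set up variables like LD_LIBRARY_PATH, PERL5LIB and PYTHONPATH, as visible in the strace output below):

#!/bin/sh
# prepend this package's bin directory, then replace this process
# with the real binary in a single execve
export PATH=/ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin:$PATH
exec /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli "$@"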

Note that the wrapper programs prepend to the PATH environment variable, they don’t replace it in its entirety. This is important so that users have a way to extend the PATH (and other variables) if they so choose. This doesn’t hurt hermeticity because it is only relevant for programs that were not present at build time, i.e. plugin mechanisms which, by design, cannot be hermetic.

Shebang interpreter patching

The shebang line of scripts contains a path, too, and hence needs to be changed.

We don’t do this in distri yet (the number of packaged scripts is small), but we should.
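
For illustration, such patching could be as simple as rewriting the first line of each script (the python package path here is invented):

% sed -i '1s|^#!/usr/bin/python3|#!/ro/python3-amd64-3.8.2-4/out/bin/python3|' $SCRIPT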

Performance requirements

The performance improvements in the previous sections are not just good to have, but practically required when many processes are involved: without them, you’ll encounter second-long delays in magit, which spawns many git processes under the covers, or in dracut, which spawns one cp(1) process per file.

Downside: rebuild of packages required to pick up changes

Linux distributions such as Debian consider it an advantage to roll out security fixes to the entire system by updating a single shared library package (e.g. openssl).

The flip side of that coin is that changes to a single critical package can break the entire system.

With hermetic packages, all reverse dependencies must be rebuilt when a library’s changes should be picked up by the whole system. E.g., when openssl changes, curl must be rebuilt to pick up the new version of openssl.

This approach trades off using more bandwidth and more disk space (temporarily) against reducing the blast radius of any individual package update.

Downside: long env variables are cumbersome to deal with

Search-path environment variables such as PATH collect one entry per package, so they grow long. This can be partially mitigated by removing empty directories at build time, which will result in shorter variables.

In general, there is no getting around this. One little trick is to use tr : '\n', e.g.:

distri0# echo $PATH
/usr/bin:/bin:/usr/sbin:/sbin:/ro/openssh-amd64-8.2p1-11/out/bin

distri0# echo $PATH | tr : '\n'
/usr/bin
/bin
/usr/sbin
/sbin
/ro/openssh-amd64-8.2p1-11/out/bin

Edge cases

The implementation outlined above works well in hundreds of packages, and only a small handful exhibited problems of any kind. Here are some issues I encountered:

Issue: accidental ABI breakage in plugin mechanisms

NSS libraries built against glibc 2.28 and newer cannot be loaded by glibc 2.27. In all likelihood, such changes do not happen too often, but it does illustrate that glibc’s published interface spec is not sufficient for forwards and backwards compatibility.

In distri, we could likely use a per-package exchange directory for glibc’s NSS mechanism to prevent the above problem from happening in the future.

Issue: wrapper bypass when a program re-executes itself

Some programs try to arrange for themselves to be re-executed outside of their current process tree. For example, consider building a program with the meson build system:

  1. When meson first configures the build, it generates ninja files (think Makefiles) which contain command lines that run the meson --internal helper.

  2. Once meson returns, ninja is called as a separate process, so it will not have the environment which the meson wrapper sets up. ninja then runs the previously persisted meson command line. Since the command line uses the full path to meson (not to its wrapper), it bypasses the wrapper.

Luckily, not many programs try to arrange for other process trees to run them. Here is a table summarizing how affected programs might try to arrange for re-execution, whether the technique results in a wrapper bypass, and what we do about it in distri:

technique to execute itself            uses wrapper   mitigation
run-time: find own basename in PATH    yes            wrapper program
compile-time: embed expected path      no; bypass!    configure or patch
run-time: argv[0] or /proc/self/exe    no; bypass!    patch

One might think that setting argv[0] to the wrapper location seems like a way to side-step this problem. We tried doing this in distri, but had to revert and go the other way.

Appendix: Could other distributions adopt hermetic packages?

At a very high level, adopting hermetic packages will require two steps:

  1. Using fully qualified paths whose contents don’t change (e.g. /ro/emacs-amd64-26.3-15) generally requires rebuilding programs, e.g. with --prefix set.

  2. Once you use fully qualified paths you need to make the packages able to exchange data. distri solves this with exchange directories, implemented in the /ro file system which is backed by a FUSE daemon.

The first step is pretty simple, whereas the second step is where I expect controversy around any suggested mechanism.

Appendix: demo (in distri)

This appendix contains commands and their outputs, run on upcoming distri version supersilverhaze, but verified to work on older versions, too.

Large outputs have been abbreviated.

The /bin directory contains symlinks for the union of all packages’ bin subdirectories:

distri0# readlink -f /bin/teensy_loader_cli
/ro/teensy-loader-cli-amd64-2.1+g20180927-7/bin/teensy_loader_cli

The wrapper program in the bin subdirectory is small:

distri0# ls -lh $(readlink -f /bin/teensy_loader_cli)
-rwxr-xr-x 1 root root 46K Apr 21 21:56 /ro/teensy-loader-cli-amd64-2.1+g20180927-7/bin/teensy_loader_cli

Wrapper programs execute quickly:

distri0# strace -fvy /bin/teensy_loader_cli |& head | cat -n
     1  execve("/bin/teensy_loader_cli", ["/bin/teensy_loader_cli"], ["USER=root", "LOGNAME=root", "HOME=/root", "PATH=/ro/bash-amd64-5.0-4/bin:/r"..., "SHELL=/bin/zsh", "TERM=screen.xterm-256color", "XDG_SESSION_ID=c1", "XDG_RUNTIME_DIR=/run/user/0", "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "XDG_SESSION_TYPE=tty", "XDG_SESSION_CLASS=user", "SSH_CLIENT=10.0.2.2 42556 22", "SSH_CONNECTION=10.0.2.2 42556 10"..., "SSHTTY=/dev/pts/0", "SHLVL=1", "PWD=/root", "OLDPWD=/root", "=/usr/bin/strace", "LD_LIBRARY_PATH=/ro/bash-amd64-5"..., "PERL5LIB=/ro/bash-amd64-5.0-4/ou"..., "PYTHONPATH=/ro/bash-amd64-5.0-4/"...]) = 0
     2  arch_prctl(ARCH_SET_FS, 0x40c878)       = 0
     3  set_tid_address(0x40ca9c)               = 715
     4  brk(NULL)                               = 0x15b9000
     5  brk(0x15ba000)                          = 0x15ba000
     6  brk(0x15bb000)                          = 0x15bb000
     7  brk(0x15bd000)                          = 0x15bd000
     8  brk(0x15bf000)                          = 0x15bf000
     9  brk(0x15c1000)                          = 0x15c1000
    10  execve("/ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli", ["/ro/teensy-loader-cli-amd64-2.1+"...], ["USER=root", "LOGNAME=root", "HOME=/root", "PATH=/ro/bash-amd64-5.0-4/bin:/r"..., "SHELL=/bin/zsh", "TERM=screen.xterm-256color", "XDG_SESSION_ID=c1", "XDG_RUNTIME_DIR=/run/user/0", "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "XDG_SESSION_TYPE=tty", "XDG_SESSION_CLASS=user", "SSH_CLIENT=10.0.2.2 42556 22", "SSH_CONNECTION=10.0.2.2 42556 10"..., "SSHTTY=/dev/pts/0", "SHLVL=1", "PWD=/root", "OLDPWD=/root", "=/usr/bin/strace", "LD_LIBRARY_PATH=/ro/bash-amd64-5"..., "PERL5LIB=/ro/bash-amd64-5.0-4/ou"..., "PYTHONPATH=/ro/bash-amd64-5.0-4/"...]) = 0

Confirm which ELF interpreter is set for a binary using readelf(1):

distri0# readelf -a /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli | grep 'program interpreter'
[Requesting program interpreter: /ro/glibc-amd64-2.31-4/out/lib/ld-linux-x86-64.so.2]

Confirm the rpath is set to the package’s lib subdirectory using readelf(1):

distri0# readelf -a /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli | grep RPATH
 0x000000000000000f (RPATH)              Library rpath: [/ro/teensy-loader-cli-amd64-2.1+g20180927-7/lib]

…and verify the lib subdirectory has the expected symlinks and target versions:

distri0# find /ro/teensy-loader-cli-amd64-*/lib -type f -printf '%P -> %l\n'
libc.so.6 -> /ro/glibc-amd64-2.31-4/out/lib/libc-2.31.so
libpthread.so.0 -> /ro/glibc-amd64-2.31-4/out/lib/libpthread-2.31.so
librt.so.1 -> /ro/glibc-amd64-2.31-4/out/lib/librt-2.31.so
libudev.so.1 -> /ro/libudev-amd64-245-11/out/lib/libudev.so.1.6.17
libusb-0.1.so.4 -> /ro/libusb-compat-amd64-0.1.5-7/out/lib/libusb-0.1.so.4.4.4
libusb-1.0.so.0 -> /ro/libusb-amd64-1.0.23-8/out/lib/libusb-1.0.so.0.2.0

To verify the correct libraries are actually loaded, you can set the LD_DEBUG environment variable for ld.so(8):

distri0# LD_DEBUG=libs teensy_loader_cli
[…]
       678:     find library=libc.so.6 [0]; searching
       678:      search path=/ro/teensy-loader-cli-amd64-2.1+g20180927-7/lib            (RPATH from file /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli)
       678:       trying file=/ro/teensy-loader-cli-amd64-2.1+g20180927-7/lib/libc.so.6
       678:
[…]

NSS libraries that distri ships:

distri0# find /lib/ -name "libnss_*.so.2" -type f -printf '%P -> %l\n'
libnss_myhostname.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_myhostname.so.2
libnss_mymachines.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_mymachines.so.2
libnss_resolve.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_resolve.so.2
libnss_systemd.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_systemd.so.2
libnss_compat.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_compat.so.2
libnss_db.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_db.so.2
libnss_dns.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_dns.so.2
libnss_files.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_files.so.2
libnss_hesiod.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_hesiod.so.2

at 2020-05-09 16:48

2020-04-28

michael-herbst.com

A posteriori error estimation for the non-self-consistent Kohn-Sham equations

After about a year of work on the density-functional toolkit (DFTK) we finally started using the code for some new science. Given the mathematics background of the CERMICS, the first two DFTK-related articles were not so much about large-scale applications, but rather deal with some fundamental questions of Kohn-Sham density-functional theory (DFT). The first submission provides a detailed analysis of self-consistent iterations and direct-minimisation approaches for solving equations such as DFT. My main focus in the past weeks, however, was rather the second project, an invited submission for the Faraday discussions New horizons in density functional theory, which are to take place in September this year.

In this article we deal with numerical error in DFT calculations. More precisely we want to address the question: What is the remaining error in the result obtained by a particular DFT simulation, which has already been performed?

This is naturally a rather broad question, which cannot realistically be addressed in the scope of a single article. As we detail, we only address the question of bounding the error of a particular quantity of interest, namely band energies near the Fermi level. Other quantities, such as the forces or the response to an external perturbation, might benefit from our ideas, but will need extensions on top. Also one needs to keep in mind that there are plenty of sources for error in a DFT calculation, including:

  1. The model error due to replacing the (almost) exact model of the many-body Schrödinger equation by a reduced, but more feasible model like DFT.
  2. The discretisation error due to employing only finitely many basis functions instead of solving analytically in an infinite-dimensional Hilbert space.
  3. The algorithmic error resulting from using non-zero convergence tolerances in the eigensolvers as well as the SCF procedure.
  4. The arithmetic error obtained by doing computation only in finite-precision floating-point arithmetic.

Of course in practice people often have a good ballpark idea of the error of DFT as a method or of the errors resulting from a particular basis cutoff. Rightfully one might ask why one should go through the effort of deriving provable bounds for the error in a DFT result. While there are surely many takes on this question, I only want to highlight two aspects in this summary:

  • Educated guesses for convergence parameters taken from experience can fail. Typically they fail exactly when interesting things happen in a system, and thus the usual heuristics break down. In other words, converging a simulation to investigate what's going on becomes difficult exactly when it's most needed. A detailed error analysis splitting up the error during an SCF iteration into contributions like the errors 1 to 4 (or even finer) can help to hint at the parameters worth tweaking, or can provide insight into which error term behaves unusually in an SCF.
  • Taking this one step further, a good bound on the individual terms even allows one to equilibrate sources of error during a running calculation. The outlook of this idea would be a fully black-box scheme for DFT calculations, where typical convergence parameters are adjusted automatically while the calculation progresses in order to yield the cheapest path to a desired target accuracy.

With respect to the errors 1 to 4 mentioned above one would of course like to be able to provide an upper bound to each of them. Unfortunately especially obtaining an estimate for the model error is a rather difficult task. In our work we have therefore concentrated on the errors 2 to 4 and moreover we only focused on Kohn-Sham models without self-consistency, i.e. where none of the terms in the Hamiltonian depend on the density / orbitals. It goes without saying that especially this last restriction needs to be lifted to make our results useful in practice. This angle we have left for future work.

Already at the stage of the current reduced model, however, there are a few aspects to consider when finding an upper bound:

  • We wanted our bound to be fully-guaranteed, which means that we wanted to design a bound where we are able to prove that the exact answer must lie inside the bounds we give. This means that when we provide error bars for band energies, the exact answer of the full-dimensional (complete-basis set limit) calculation at infinite accuracy and zero convergence tolerances is mathematically guaranteed to lie inside our bound.
  • To be useful our bound should be (cheaply) computable, because otherwise simply redoing the calculation at vastly increased precision might end up being the better option. Notice that our bound does require using a finer discretisation for some terms, but this is only a one-shot a posteriori step and not an iterative one.
  • Ideally the bound should be sharp, meaning that the upper bound to the error we report should not be too far off the true error. Even better would be an optimal bound, where we are as close to the true error as possible (given the proposed structure of the error estimate). Such considerations are very important to not end up with completely correct but useless statements like: "The error in the band energy is smaller than 10 million Hartree".

Finding a balance between these three aspects is not always easy, and in our work we often take the pragmatic route of obtaining a simpler, albeit less sharp error bound. Still, our bound is fully computable and allowed us, for the first time, to report band structure diagrams of silicon annotated with fully-guaranteed error bars combining discretisation, algorithmic and arithmetic errors. Details of our methodologies are given in the paper. Its full abstract reads

We address the problem of bounding rigorously the errors in the numerical solution of the Kohn-Sham equations due to (i) the finiteness of the basis set, (ii) the convergence thresholds in iterative procedures, (iii) the propagation of rounding errors in floating-point arithmetic. In this contribution, we compute fully-guaranteed bounds on the solution of the non-self-consistent equations in the pseudopotential approximation in a plane-wave basis set. We demonstrate our methodology by providing band structure diagrams of silicon annotated with error bars indicating the combined error.

by Michael F. Herbst at 2020-04-28 14:30 under electronic structure theory, theoretical chemistry, DFTK, Julia, dft, numerical analysis, Kohn-Sham, error estimates

2020-03-29

judge

Using Boundary Scan for PCB debugging

I did a seminar talk on JTAG and how to use it to check a PCB for errors. For this I designed a little board in order to simulate manufacturing errors. In this post I want to give you a short introduction to JTAG and how to use it.

What is Boundary-scan

Boundary-scan was developed to simplify testing of integrated circuits. To do this, the Joint Test Action Group introduced the boundary-scan architecture as a standard, and today it has replaced the old testing methods because of its cost efficiency and the speed-up of test development and execution.

So how does this work?

I will be just covering the basics here. If you want a more detailed introduction, you can read the articles referenced at the end of this post.

Boundary-scan is a technology which places cells at the circuit's boundary that can sample the circuit's inputs and also drive its outputs. This can be done while the circuit is operating, and it is controlled via the JTAG interface. With these cells in place we can test the internal logic of the circuit as well as the interconnects between the devices on a board.

In the picture above you can see an overview of the boundary-scan architecture. This picture only shows one JTAG device, but on more complex PCBs there are many such devices connected in a JTAG chain. This means that the data output of one chip is connected to the input of the next.

In order for us to use this architecture, we need to connect to the four JTAG pins:

  • TDI Test Data In
  • TDO Test Data Out
  • TCK Test Clock
  • TMS Test Mode Select

Once connected we can control all of the devices in the JTAG chain. In general we can perform two different types of actions: we can either write/read data, or we can write instructions. Testing a circuit involves both of these actions. By writing instructions we can set the connected circuits into different modes. And by writing data we can set the pins to the desired levels as well as read the results.

There are three different instructions we are interested in when testing interconnects:

  • SAMPLE/PRELOAD
  • EXTEST
  • BYPASS

With the SAMPLE/PRELOAD instruction the boundary-scan cells will sample the inputs of the circuit, and when shifting in data we can read values received at the pins. We can also use this command to load data into the boundary-scan cells. The EXTEST instruction will set the output of the pins to the values provided in the boundary-scan cells and the BYPASS instruction lets us skip devices in the chain that are not of interest to the test currently performed.

With these instructions we can set pins of a device to a logic level and test at the receiving circuit if the correct logic level was received.

The Hardware

We need to connect to the JTAG pins of a device. If they are exposed, there is probably a connector on the board. If not, it may be possible to solder wires directly to the pins. In any case we will need a JTAG adapter in order to connect them to a normal computer via USB. I am using a JLink adapter which features a 20 pin JTAG connector.

The PCB I am using for testing is a small circuit I designed myself, it also has a 20 pin connector which exposes the JTAG pins. Here is the layout:

It features two MAX V Altera CPLDs that implement the boundary-scan architecture and are connected in a JTAG chain. To test some connections between them, I connected eight of their pins at the top and added a DIP switch to simulate shorts between the wires as well as pull-ups and pull-downs.

Tooling

Okay, so we have a PCB and connected it to a computer with a JTAG adapter. Now what? How do we get the tests running?

Well, this is where it gets hard, because there is no nice open source software that does it all for us. We need to write the tests ourselves, and for that we need to know the details of the interconnects on our PCB and the details of how the boundary-scan architecture is implemented in the devices in our JTAG chain.

BSDL Files

The boundary-scan standard does not tell manufacturers what bit codes to use to encode the instructions, but it does require them to publish so-called BSDL (Boundary Scan Description Language) files free of charge for their devices. These files use a dialect of VHDL, a hardware description language, and describe all the information needed to use a device's boundary-scan architecture. For example:

  • Instruction register description
  • Boundary-scan cell description
  • Pin mappings
  • and more

Basically all information we might want to know about a specific device is in that file.

SVF Files

Most JTAG adapter tool chains support a file format called SVF (Serial Vector Format). These are plain text files that describe what data or instructions to write to the JTAG connection. The format has no knowledge of the JTAG chain of devices or any of their properties. We just specify what data/instructions to shift in and what results we expect to be shifted out.

Designing a simple test

Okay, so let's put it all together. Let's say we want to test two neighboring interconnects for a bridging fault.

On my example PCB two neighboring interconnects are connected as follows:

  1. CHIP 1 IO56 - CHIP 2 IO49
  2. CHIP 1 IO55 - CHIP 2 IO50

So if we set IO56 of the first chip to a logic 1 and IO55 to logic 0, we would expect to receive a logic 1 on IO49 and a logic 0 at IO50 on the second chip. If we receive a logic 1 at IO50 that means that there is a bridging fault between the interconnects.

To write a test for this we need to write an SVF file that we can run with the JTAG adapter tool chain. It needs to complete the following steps:

  1. Set first chip into SAMPLE/PRELOAD mode. (Shift in Instructions)
  2. Load Test pattern into first chip (Shift in Data)
  3. Set first chip to EXTEST and second chip to SAMPLE/PRELOAD mode (Shift in Instructions)
  4. Shift in dummy data to receive the data sampled from chip 2 and compare against the expected result (Shift in Data)

The following SVF file does exactly this:

TRST OFF;
ENDIR IDLE;
ENDDR IDLE;
STATE RESET;
STATE IDLE;
FREQUENCY 10000000 HZ;
SIR 20 TDI (01405);
SDR 480 TDI (000000000000000000000000000000000000000000800000000000000
	000000000000000000000000000000000000000000000000000000000000000);
SIR 20 TDI (03c05);
SDR 480 TDI (0) TDO (000000000000000000000000000000000000000000000000
	00000000000000000000000000000000000000000000000100000000000000000
	0000000) MASK (00000000000000000000000000000000000000000000000000
	00000000000000000000000000000000000000000000012000000000000000000
	00000);

Up until the first line starting with SIR all we are doing is making sure the boundary-scan circuit is in a known state and setting the speed at which we want to operate.

SIR means we want to write instruction data. We need to specify the length, and since we have two chips, each with a 10 bit instruction register, we are writing 20 bits of data. The data we want to write is specified behind the TDI statement as a hex string surrounded by parentheses.

SDR works the same as SIR but instead of instruction data we are just writing data. In this case we are writing 480 bits, since both chips have a boundary-scan register length of 240 bits. If we want to compare the shifted out data against the expected data we have to write the expected result as a hex string behind the TDO statement and we can even mask the result so that we only look at the bits we are interested in.

How do we know what data to write and what data to expect? For that we have to look at the BSDL files of the devices in our JTAG chain and figure out which pin is mapped to which bit in the data. Of course, writing an SVF file by hand is very tedious, so instead I created some Python scripts to help me with the task. Feel free to use them for your own projects.
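
For illustration, one way to run such a file is OpenOCD, which can play back SVF files. This is a minimal sketch assuming a J-Link adapter; the interface config name and options may differ for your installation:

openocd -f interface/jlink.cfg \
        -c "transport select jtag" \
        -c init \
        -c "svf test.svf" \
        -c shutdown

OpenOCD aborts with an error if the shifted-out data does not match a TDO/MASK expectation, which is exactly the test verdict we are interested in.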

Conclusion

Using boundary-scan is very nice when trying to automatically test PCBs. Sadly there is little to no open source software which makes it easy to design tests for your own layouts, which means that you still have to do a lot of information gathering if you want to use this technology. But once the tests are implemented, testing a PCB becomes very easy and fast.

I hope this blog post gave you an idea of what you need to do to get up and running with boundary-scan testing. If you have any feedback feel free to contact me.

Further Reading Material

by Felix Richter at 2020-03-29 00:00

2020-03-22

Insanity Industries

Simple networking

It is time to revisit the simple wifi setup, as much has changed since then. This time, however, we will not only cover wifi, but also wired connections as well as DNS.

Wired networking

This wired setup is based on systemd-networkd, which typically comes with systemd. If you use a systemd-based Linux, you should be ready to go right away. It further assumes that acquiring an IP address via DHCP once a cable is plugged in is the typical usecase.1

Create the file /etc/systemd/network/cable.network2:

[Match]
Name=<name of the wired network device> <second one if applicable> [<more ...>]

[Network]
DHCP=yes
IPv6PrivacyExtensions=true

[DHCP]
Anonymize=true

Then run systemctl enable --now systemd-networkd and plug in a cable; wired networking with DHCP is up and running.
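
Whether the interface was matched and acquired a lease can be checked with networkctl, which ships with systemd-networkd:

networkctl list
networkctl status <name of the wired network device>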

Wireless networking

The wireless setup is based on iwd, the currently vastly superior tool for wireless networking on Linux compared to existing alternatives such as wpa_supplicant (the effectively only other contender in this game) and software using it such as NetworkManager and netctl. It requires iwd as well as Linux >= 4.20 for full operation.

Create the file /etc/iwd/main.conf:

[General]
# uncomment for setting the wifi interface name yourself
# see https://iwd.wiki.kernel.org/interface_lifecycle
#UseDefaultInterface=true

# enable builtin DHCP-client within iwd for wifi
EnableNetworkConfiguration=true

# randomizes mac-address every time iwd starts or the hardware is initially detected
AddressRandomization=once

Then run systemctl enable --now iwd; wireless networking with DHCP is up and running, blazingly fast and stable.

You might want to take a look at the previous post to deal with race conditions that might be introduced due to iwd being significantly faster than other wifi-solutions on Linux.

Frontends

iwd stores its known networks in /var/lib/iwd. To add, modify or delete them, several options are available:

iwctl

You can now connect to simple PSK wifi networks in the iwctl shell:

[iwctl] station <devicename> get-networks
…
[iwctl] station <devicename> connect <network-name>

iwctl will ask you for the password, and iwd will memorize it for later connections and autoconnect the next time the network appears. Currently, iwctl only supports creating simple password-only network connections; for more complex setups, instead of iwctl asking you all the nifty details of an enterprise connection, use the file-based config instead. For everything else, just type help within iwctl.

file-based config

If you have more complex wifi setups, you can simply drop a network configuration file in /var/lib/iwd. The files must be named networkname.protocol. You can find the protocol listed in the output of get-networks in the iwctl shell. To fill the file, take a look at the network configuration settings in the iwd documentation or the iwd.network manpage.

For example, to use the University of Heidelberg's eduroam network, create the file /var/lib/iwd/eduroam.8021x containing

[Security]
EAP-Method=TTLS
EAP-Identity=<anonymous-identity>
EAP-TTLS-CACert=/etc/ssl/certs/T-TeleSec_GlobalRoot_Class_2.pem
EAP-TTLS-Phase2-Method=MSCHAPV2
EAP-TTLS-Phase2-Identity=<uni-id>@uni-heidelberg.de
EAP-TTLS-Phase2-Password=<your University account password>

For other institutions, the details in the method might vary, of course. See man iwd.network for details. It also includes several examples for almost all possible configurations.

Network Manager

You can also use NetworkManager for a more graphical experience; you just have to enable the iwd backend for NetworkManager to use it. In addition to that, double check that you have the right NetworkManager version for your iwd version, just as a precaution.

DNS

For modern domain name resolution, you can use systemd-resolved, which provides statistics, automated caching of DNS requests, DNSSEC validation and much more. To use it, ensure systemd-resolved is installed3 and systemctl enable --now systemd-resolved. Ensure that /etc/resolv.conf is symlinked to /run/systemd/resolve/stub-resolv.conf, if not

ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf

takes care of it. Afterwards, you can change the default fallback DNS servers and other things in /etc/systemd/resolved.conf, if desired, and otherwise enjoy resolvectl and resolvectl statistics.
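
To verify that name resolution actually goes through the stub resolver, resolvectl can also be used directly:

resolvectl query example.com   # resolve a name through systemd-resolved
resolvectl dns                 # show the DNS servers in use per link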

Miscellaneous optimizations

Binding iwd to the wifi-device

To disable iwd when the wifi hardware is not present (and iwd therefore not needed), create the file /etc/udev/rules.d/wifi.rules containing

# bind iwd to wifi
SUBSYSTEM=="rfkill", ENV{RFKILL_NAME}=="phy0", ENV{RFKILL_TYPE}=="wlan", ACTION=="change", ENV{RFKILL_STATE}=="1", RUN+="/usr/bin/systemctl --no-block start iwd.service"
SUBSYSTEM=="rfkill", ENV{RFKILL_NAME}=="phy0", ENV{RFKILL_TYPE}=="wlan", ACTION=="change", ENV{RFKILL_STATE}=="0", RUN+="/usr/bin/systemctl --no-block stop iwd.service"

If AddressRandomization=once is set in the configuration, this udev rule has the nice side effect that, as iwd starts anew when the wifi is un-rfkilled, the MAC address of the interface is randomized again.

Persistent device naming

If you want to have persistent device names, for example for devices showing up in a status bar or something similar, this is easiest achieved by creating a link file4 in /etc/systemd/network. To persistently name the wireless device wifi, create /etc/systemd/network/00-wifi.link containing

[Match]
MACAddress=ab:cd:ef:12:34:56  # wifi's original MAC address

[Link]
Name=wifi  # can be used for matching in the according *.network-file

If you do this for any device operated by iwd, ensure you have UseDefaultInterface=true set in section [General] in /etc/iwd/main.conf.

Setting specific MAC addresses for specific wifi networks and randomizing on every connect

In case you need to set a specific address for a specific wireless network, iwd>=1.6 allows using AddressOverride= inside a network configuration file. For this to take effect, however, AddressRandomization=once must be dropped from /etc/iwd/main.conf. This makes iwd generate persistent MAC addresses per network (derived from its SSID and real hardware address) by default and allows AddressOverride to take effect. Additionally, a network file can contain AlwaysRandomizeAddress=true, which randomizes the MAC address on each connect for this network (this also only takes effect when AddressRandomization is not set or set to its default value in main.conf). However, due to kernel limitations, each change of a MAC address requires power-cycling the card. This leads to (by iwd standards) significantly prolonged connection times (by ≈300-400ms). Given that, as long as you do not require AddressOverride, AddressRandomization=once gives you fast connection times while still randomizing your card’s MAC address regularly enough.
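
As a sketch, such an override inside a network configuration file could look as follows; double-check the exact key placement against man iwd.network for your iwd version:

# /var/lib/iwd/homenetwork.psk
[Settings]
AddressOverride=12:34:56:ab:cd:ef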

Thanks to rixx for proofreading and helpful suggestions.


  1. Although this is of course an assumption, my personal statistic says that this setup has worked with 100% of all network cables I ever plugged into my laptop so far. ↩︎

  2. see man systemd.network (preferably, as it fits your version) or https://www.freedesktop.org/software/systemd/man/systemd.network.html for the latest docs. ↩︎

  3. The simplest check is systemctl status systemd-resolved. ↩︎

  4. see man systemd.link or https://www.freedesktop.org/software/systemd/man/systemd.link.html ↩︎

by Jonas Große Sundrup at 2020-03-22 00:00

2020-03-15

Insanity Industries

Mastodon and Twitter Feeds

This blog has had an RSS feed from the very start. But from today on, this blog also officially has feeds on Mastodon and Twitter that you can subscribe to if you prefer these over RSS.

Let’s see where this goes, I’m curious if that’s something people will find useful. Retoots or -tweets encouraged!

by Jonas Große Sundrup at 2020-03-15 00:00

2020-03-14

RaumZeitLabor

COVID-19

ATTENTION: Due to the current situation and the directives of the city of Mannheim (see also), the RaumZeitLabor remains closed until further notice. All public events are cancelled.

Take care of yourselves.

by tabascoeye at 2020-03-14 00:00

2020-03-07

Insanity Industries

Preserving data integrity

When we are storing data, we typically assume that our storage system of choice returns that data later just as we put it in. But what guarantees do we have that this is actually the case? The case made here concerns bitrot, the silent degradation of the physical charges that make up the data on today’s storage devices.

To counter this type of problem, one can employ data checksumming, as it is done by both btrfs and ZFS. However, while in the long run btrfs might be the tool of choice for this, it is fairly complex and not yet too mature, whereas ZFS, the most prominent candidate for this type of feature, is not without hassle either, as it must be recompiled for every kernel update (although automation exists).

In this blogpost, we’ll therefore take a look at a storage design that actually checks whether the returned data is valid and not silently corrupted inside our storage system, and that is designed entirely with components available in Linux itself, without the need to recompile and test your storage layer on every kernel upgrade. We find that this storage design not only fulfills the same purpose as ZFS with comparable performance, but is in some cases even able to significantly outperform it, as the benchmarks at the end indicate.

The setup

The system we will test in the following will be constructed from four components (including the filesystem):

  • dm-integrity provides blocklevel checksumming for all incoming data and will check said checksum for all data read through it (and return a read error if it doesn’t match).
  • mdraid provides redundancy if a disk misbehaves, avoiding data loss. If the dm-integrity-layer detects invalid data from the disk, mdraid will see a read-error and can immediately correct this error by reading it from somewhere else in the redundancy pool.
  • dm-crypt on top of the RAID, right below the filesystem will encrypt the data1
  • ext4 as a filesystem to actually store files on it

The order matters: dm-integrity must be placed between the harddisks and the mdraid, to ensure that data corruption errors can be corrected by mdraid after they have been detected by dm-integrity. Similarly, dm-crypt has nothing to do with redundancy, therefore it should be placed on top of mdraid, so that it only has to be passed once, and not multiple times as would be the case when placing it alongside dm-integrity.2

This yields the following storage design:

Storage system architecture

Architecture of the storage system. Grey depicts physical hardware, green depicts device mapper technologies, yellow indicates mdraid and blue indicates filesystems. The encryption layer will later be omitted when comparing performance to ZFS.

This script will assemble the above depicted system and mount it to /mnt.
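
For reference, the core of such an assembly boils down to the following minimal sketch (device names are placeholders; integritysetup ships with cryptsetup):

# blocklevel checksumming per disk; format initializes all checksums
for part in /dev/sd{a,b,c,d}1; do
    integritysetup format $part
    integritysetup open $part int-${part##*/}
done

# double parity RAID over the four integrity devices
mdadm --create /dev/md0 --level=6 --raid-devices=4 \
    /dev/mapper/int-sd{a,b,c,d}1

# encryption on top of the RAID, filesystem on top of the encryption
cryptsetup luksFormat /dev/md0
cryptsetup open /dev/md0 cryptstorage
mkfs.ext4 /dev/mapper/cryptstorage
mount /dev/mapper/cryptstorage /mnt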

Resulting performance

Now that we have an assembled system, we’d like to quantify the performance impact of the different layers. The test setup for this is a Kobol Helios4 (primarily because it’s the intended target platform for this storage system) with four disks3 running Debian 10 “Buster”. The Helios4 is powered by an energy-efficient dual-core ARM SoC optimized for storage systems. The setup was not built on the entire disks, but on partitions of 10GiB each, allowing multiple parallel designs and therefore easing the benchmarking procedure, as well as speeding up the experiments4.

Layer analysis of performance impact

Throughput

To benchmark throughput, the following commands were used:

# write:
echo 3 > /proc/sys/vm/drop_caches  # drop all caches
dd if=/dev/zero of=/path/to/testfile bs=1M count=8000 conv=fdatasync

# read
echo 3 > /proc/sys/vm/drop_caches  # drop all caches
dd if=/path/to/testfile of=/dev/zero conv=fdatasync

This procedure was repeated ten times for proper statistics for the different cases for both read (r) and write (w):

  • ext4 filesystem only (f, one disk only)
  • encryption, topped by ext4 (cf, one disk only)
  • mdraid (RAID6/double parity), topped by the former (rcf, 4 disks)
  • the final setup, including integrity (ircf, 4 disks)

Throughput for different arrangements of layers on the Kobol Helios4 system for both read (top row) and write (bottom row). Each dot indicates one measurement. f: filesystem, cf: crypto+f, rcf: raid+cf, ircf: integrity+rcf. Testplatform

We see several interesting results in the data: First of all, the encryption engine on the Helios4 is not completely impact-free, although the resulting performance is still more than sufficient for the designated uses of a Helios4.

Secondly, we see that adding the integrity layer does have a noticeable impact on writing, but a negligible impact on reading, indicating that, especially for systems primarily intended to read data from, adding the integrity layer comes at negligible cost.

For write-heavy systems, the performance impact is more considerable, but for certain workloads, such as the home-NAS case the Helios is designed for, the performance can still be considered fully sufficient, especially as normally the system would cache those writes to a certain extent, which was explicitly disabled for benchmarking purposes (see conv=fdatasync in the benchmark procedure).

The degradation in write (but not in read) performance is likely due to the fact that the integrity layer and the mdraid layer are decoupled from one another: the raid layer is not aware that for a double-parity setup it effectively has to write the same information three times, and the integrity layer has to account for all three writes before the data is considered synced, as required by conv=fdatasync.

Latency

We have found that such a storage design yields useful throughput rates. The next question is about the latency of the system, which we will, for simplicity, only estimate for random 4K reads. Again, just as above, we will investigate the impact of the different layers of the system. To do so, we read 100,000 sectors randomly from an 18GB testfile filling the storage mountpoint for each configuration, after dropping all caches.
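
The idea can be sketched in shell as follows (an 18GB testfile holds roughly 4.7 million 4K blocks; this is a minimal sketch, not necessarily the exact procedure used for the measurements):

echo 3 > /proc/sys/vm/drop_caches

# read 100,000 randomly chosen 4K blocks, logging the elapsed time of each
for off in $(shuf -i 0-4718591 -n 100000); do
    /usr/bin/time -f %e -a -o latencies.txt \
        dd if=/path/to/testfile of=/dev/null bs=4k count=1 skip=$off status=none
done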

Latency comparison of system layers

Distribution of latencies in milliseconds for different arrangements of layers on the Kobol Helios4 system. Green denotes the median, black the average of the latencies of the respective setup; access times below 1ms were considered cache hits and therefore excluded from the computation of median and average. Testplatform

The figure above yields several interesting insights: First of all, we do see cache hits close to zero milliseconds. Furthermore, we see that the latencies are fairly evenly distributed over the available range. Finally, and most interestingly, we see that the impact of the several layers on the latency is measurable, but rather irrelevant for typical practical purposes.

Performance comparison with ZFS

So far we have tested the setup on its intended target platform, the Helios4. To compare the resulting system against the elephant in the room, ZFS, we will use a different test platform, based on an Intel i7-2600K CPU and 4x1TB disks, as zfs-dkms was not reliably buildable on Debian Buster on ARM, and when it actually built, it explicitly stated that 32bit processors (such as the Helios’ CPU) are not supported by upstream, although technically the system would run.

To allow for a cleaner comparison, the testbed was accordingly changed to accommodate ZFS’s preferences. As ZFS did not adhere to conv=fdatasync5, the main memory was restricted to 1GB, swap was turned off and the size of the testfile was chosen to be 18GB. This way, any caching would be at least significantly reduced, as there was little space left next to the OS inside the main memory for caching.

All tests were run on Debian 10 “Buster” with Linux 4.19; ZFS was used in form of the package zfs-dkms in version 0.8.2 from backports. The storage layer for both setups was laid out with double parity (RAID6/raidz2) and, as the zfs-dkms package in Debian was not able to do encryption, the mdraid-based setup was also set up without the encryption layer.

Throughput

The commands used for benchmarking throughput were conceptually the same as above:

# write:
echo 3 > /proc/sys/vm/drop_caches  # drop all caches
dd if=/dev/zero of=/path/to/testfile bs=1M count=18000 conv=fdatasync

# read
echo 3 > /proc/sys/vm/drop_caches  # drop all caches
dd if=/path/to/testfile of=/dev/zero conv=fdatasync

Although ZFS seemingly does not adhere to any cache dropping or syncing instructions, they were still performed and adhered to by the mdraid-based setup.

Throughput comparison, zfs & md

Throughput for both ZFS and the md-based setup for both read and write. Each dot indicates one measurement, the green line indicates the median of all measurements. Testplatform

The results are very interesting: While the md-based setup performs less consistently in and of itself, it still consistently outperforms ZFS in read performance. When it comes to write, though, ZFS performs noticeably better.6

To investigate the cause of this unexpected balance, we note that while ZFS combines raid, integrity and filesystem in one component, in the md-based setup these are separate components. Of these components, not only the filesystem but also dm-integrity implements journalling to avoid inconsistencies in case of a power outage. This leads to increased work until the transaction has been fully flushed to disk, which can be seen in the next figure, where the md-based system (without the encryption layer) is tested with journalling both enabled and disabled in the integrity layer:

Throughput effect of integrity journaling

Throughput for the md-based setup, with journalling enabled and disabled in the integrity layer, for both read and write. Each dot indicates one measurement, the green line indicates the median of all measurements. Testplatform

We find that write performance increases significantly (and taking into account the factor of increase, it actually surpasses the write performance of ZFS). We can therefore conclude that the Linux components are very capable of matching ZFS performance-wise, at least in low-memory environments, and especially outperform ZFS in read performance. In a purely throughput-optimized configuration (such as when an uninterruptible power supply is installed, so that the guarantees of journalling are not necessarily crucial) it even outperforms a standard ZFS configuration in both read and write7. Before you disable integrity journalling in production, however, take into account the implications8.

Disabling journalling on the filesystem layer of the md-based setup did not have any measurable impact on the throughput performance of the setup.

Latency

Analogously to the latency measurements on the Helios4, we can measure the latency profile for both the md-based storage as well as ZFS, reading 1 million random 4K samples from an 18GB testfile.

Latency comparison, zfs & md

Latency distribution for both ZFS and the md-based setup. The latency median is depicted in lime-green, the latency average in black (which is hard to see for md, because both lie almost exactly on top of each other). Latencies below 1ms were considered cache hits and therefore excluded from the computation of average and median. Testplatform

Besides the expected cache hits close to the zero-millisecond mark, we find some very interesting results with regard to latency: While the md-based system is very consistent in its latency profile, with both the distribution as well as the average and median being almost identical, ZFS exhibits a slightly lower median latency, but contrasts that with an up to five-fold larger latency tail, which lifts the average latency noticeably above that of the md-based setup, indicating that ZFS is far less predictable with regard to read latency.

Remark on RAID checks

One thing to note is that checks and resyncs of the proposed setup are significantly prolonged (by a factor of 3 to 4) compared to an mdraid without the integrity layer underneath. The investigation has so far not revealed the underlying cause. It is not CPU-bound, indicating that the read performance is not held back by checksumming; the latency figures above do not imply an increase in latency significant enough to cause a 3-4x prolonged check time; and disabling the journal did not change this either (as one would expect, as the journal is unrelated to reads, which should be the only relevant mode for a raid check).

ZFS was roughly on par with mdraid without the integrity-layer underneath with regard to raid-check time.

If anyone has an idea of the root cause of this behavior, feel encouraged to contact me, I’d be intrigued to know.

Conclusion

First of all, the question whether ZFS is required for guaranteeing data integrity in a storage layer can clearly be answered with a no. Not only does the combination of mdraid and dm-integrity yield more than sufficient data rates for a typical home storage setup, the data also indicates that at least in low-memory environments this kind of setup can actually outperform ZFS for read-heavy operations in throughput, while being at least comparable, if not more reliable, with regard to latency. Especially for long-term storage solutions primarily intended for just having tons of storage space, such as home NAS systems, this is typically the more relevant mode of operation, as long as the write performance is sufficient (which the data confirms).

Therefore, data integrity on the storage layer is very possible without going through the hassle of having to rebuild ZFS on each kernel upgrade, while preserving the option of an easy restore with any modern Linux, even if ZFS is not available, should the primary storage system fail.

The prolonged check and resync times still lack a reasonable explanation, which is unsatisfying. However, from a purely practical point of view, rebuilds and resyncs are typically a rather rare occurrence, so the prolongation of check and resync time is a comparatively small (and rarely paid) price for the guarantee of data integrity.

Additional thoughts and outlook

In addition to the conclusions we can draw from the data we gathered, the md-based setup has another advantage ZFS does not (yet) provide: for ZFS, all disks for the storage pool must be available upfront at first assembly9, or the redundancy will not be “any 2 disks may fail and the storage pool is still operational” for a system ultimately intended to have double parity. This is not the case with mdraid, which can extend or grow a pool10 and resync it so that, for all (changing) configurations, “any two disks may fail” holds at any point in time (except during disk failure and rebuild) for an intended double-parity/RAID6 setup with a disk count that increases over time.

Of course, ZFS is not only blocklevel checksumming. ZFS provides additional tooling for different purposes which might make its use still worthwhile. If one were to extend the md-raid-based system by some of these features, it might be of interest to use XFS as the filesystem on top, as with a sufficiently recent kernel XFS also supports features like filesystem snapshots and many more. Alternatively, one could introduce an LVM layer into the system (or, if desired, even replace md with LVM).

As usual, feedback is always appreciated.

Thanks to Hauro for proofreading and helpful comments.


  1. This not only has the advantage of data protection should the system be stolen or otherwise fall into the wrong hands, it also ensures that any disk down in the stack will never see plaintext data, so broken drives do not have to be wiped (especially if they are broken in a way one has to resort to this kind of wiping) ↩︎

  2. Actually the functionality of dm-crypt and dm-integrity can be used within the same layer. However, this would then be authenticated encryption performed by dm-crypt, which is intended to prevent tampering and therefore uses cryptographic hashes. While standalone dm-integrity can do that, it uses a simple crc32 integrity checksum by default, which is better suited to detect not-on-purpose degradation. Furthermore, this would mean not only one encryption layer, but instead multiple of them, reducing performance. Actually, this kind of setup was briefly tested and performed worse with regards to throughput. ↩︎

  3. HITACHI HDS721010CLA332, 1TB ↩︎

  4. As the dm-integrity layer does blocklevel checksumming, it overwrites the entire disk once by default, ensuring that all checksums are initialized properly. This, however, takes quite some time on a full 1TB drive, significantly slowing down the creation of different testbeds. ↩︎

  5. This was evident by the fact that ZFS claimed it was able to persist to spinning disk with 1GB/s, which is roughly five times what the disks underneath can do in raw-byte write performance. Therefore either ZFS is an historically unmatched engineering feat, circumventing hardware limitations, or, more likely, ZFS was still caching despite being politely asked not to do this. ↩︎

  6. A certain part of ZFS’s performance will be due to remaining caching performed by ZFS, but that should be fairly negligible. ↩︎

  7. This could, of course, change if similar components in ZFS could be disabled to tweak throughput rate, which was not further investigated. ↩︎

  8. With journalling enabled, the worst case in a power outage is the loss of data that hasn’t been written to disk yet; the device itself, however, will always contain blocks with consistent checksums. Disabling journalling on the integrity layer implies that in case of a power outage, the integrity device might end up holding blocks with incorrect checksums. Above this layer resides the RAID layer, which can in theory correct those inconsistent checksums as long as at least one of the three redundant copies of that block contains a valid checksum. Then again, in case of a power outage, there is a) no guarantee that this will be the case, as the file in question must be modified in all three redundant copies, and b) the standard assembly method of an md-raid will kick out a disk on the first read error; therefore multiple inconsistent blocks spread over the different devices might theoretically be recoverable, but in practice the raid will just kick out those disks at the first error encountered. It would have to be investigated whether that behavior of mdraid can be modified to better suit this use case (which was not investigated further and might actually be pretty straightforward). Until then I would not disable journalling on the integrity layer, given that the write performance is most likely more than sufficient for most small setups, such as the 4-disk setup presented here, especially taking into account that in real life Linux will also cache writes before flushing them to disk. ↩︎

  9. This can be somewhat circumvented by initially building a degraded pool, but until the missing disks are added, the degraded pool has also degraded redundancy, somewhat thwarting the entire point of such a system. ↩︎

  10. For example, mdraid allows growing from a 4-disk-RAID6 to a 5-disk-RAID6, or migrating from a 3-disk-RAID5 to a 4-disk-RAID6, from a 4-disk-RAID6 to a 4-disk-RAID5, etc. ↩︎

by Jonas Große Sundrup at 2020-03-07 00:00

2020-02-18

RaumZeitLabor

Aamiainen – Finnisches Spätstück

Dear RaumZeitSauna-goers,

On March 1, 2020, we want to belatedly celebrate “Kalevalan päivä”, the day of Finnish culture (February 28), over a shared late breakfast. Pull your Eläkeläiset fan shirt out of the closet and let your Nokia 3310 vibrate you into the RZL at 11.30. Don’t miss this Finntastic event! So that we know how much munavoi to whip up, please register by mail by February 25, 2020. We would appreciate a contribution of 8 euros towards our Finnkasso enterprise.

Hyvää ruokahalua!
Your Muumirattie

FinnischeFrühstücksflagge

by flederrattie at 2020-02-18 00:00

2020-02-02

sECuREs website

Readiness notifications in Go

When spawning a child program, for example in an integration test, it is often helpful to know when the child program is ready to receive requests.

Delaying

A brittle strategy is to just add a delay (say, time.Sleep(2 * time.Second)) and hope the child program finishes initialization in that time. This is brittle because it depends on timing, so when the computer running the test is slow for whichever reason, your test starts failing. Many CI/CD systems have less capacity (and/or are more heavily utilized) than developer machines, so timeouts frequently need to be adjusted.

Also, relying on timing is a race to the bottom: your delay needs to work on the slowest machine that runs your code. Ergo, tests waste valuable developer time on your high-end workstation, just so that they pass on some under-powered machine.

Polling

A slightly better strategy is polling, i.e. repeatedly checking whether the child program is ready. As an example, in the dnsmasq_exporter test, I need to poll to find out when dnsmasq(8) is ready.

This approach is better because it automatically works well on both high-end and under-powered machines, without wasting time on either.

Finding a good frequency with which to poll is a bit of an art, though: the more often you poll, the less time you waste, but also the more resources you spend on polling instead of letting your program initialize. The overhead may be barely noticeable, but when starting lots of programs (e.g. in a microservice architecture) or when individual polls are costly, the overhead can add up.

Readiness notifications

The most elegant approach is to use readiness notifications: you don’t waste any time or resources.

It only takes a few lines of code to integrate this approach into your application. The specifics might vary depending on your environment, e.g. whether an environment variable is preferable to a command-line flag; my goal with this article is to explain the approach in general, and you can take care of the details.

The key idea is: the child program inherits a pipe file descriptor from the parent and closes it once ready. The parent program knows the child program is ready because an otherwise blocking read from the pipe returns once the pipe is closed.

This is similar to using a chan struct{} in Go and closing it. It doesn’t have to remain this simple, though: you can also send arbitrary data over the pipe, ranging from a simple string being sent in one direction and culminating in speaking a framed protocol in a client/server fashion. In Debian Code Search, I’m writing the chosen network address before closing the pipe, so that the parent program knows where to connect to.

Parent Program

So, how do we go about readiness notifications in Go? We create a new pipe and specify the write end in the ExtraFiles field of (os/exec).Cmd:

r, w, err := os.Pipe()
if err != nil {
  return err
}

child := exec.Command("child")
child.Stderr = os.Stderr
child.ExtraFiles = []*os.File{w}

It is good practice to explicitly specify the file descriptor number that we passed via some sort of signaling, so that the child program does not need to be modified when we add new file descriptors in the parent, and also because this behavior is usually opt-in.

In this case, we’ll do that via an environment variable and start the child program:

// Go dup2()’s ExtraFiles to file descriptor 3 and counting.
// File descriptors 0, 1, 2 are stdin, stdout and stderr.
child.Env = append(os.Environ(), "CHILD_READY_FD=3")

// Note child.Start(), not child.Run():
if err := child.Start(); err != nil {
  return fmt.Errorf("%v: %v", child.Args, err)
}

At this point, both the parent and the child process have a file descriptor referencing the write end of the pipe. Since the pipe will only be closed once all processes have closed the write end, we need to close the write end in the parent program:

// Close the write end of the pipe in the parent:
w.Close()

Now, we can blockingly read from the pipe, and know that once the read call returns, the child program is ready to receive requests:

// Avoid hanging forever in case the child program never becomes ready;
// this is easier to diagnose than an unspecified CI/CD test timeout.
// This timeout should be much much longer than initialization takes.
r.SetReadDeadline(time.Now().Add(1 * time.Minute))
if _, err := ioutil.ReadAll(r); err != nil {
  return fmt.Errorf("awaiting readiness: %v", err)
}

// …send requests…

// …tear down child program…

Child Program

In the child program, we need to recognize that the parent program requests a readiness notification, and ensure our signaling doesn’t leak to child programs of the child program:

var readyFile *os.File

func init() {
  if fd, err := strconv.Atoi(os.Getenv("CHILD_READY_FD")); err == nil {
    readyFile = os.NewFile(uintptr(fd), "readyfd")
    os.Unsetenv("CHILD_READY_FD")
  }
}

func main() {
  // …initialize…

  if readyFile != nil {
    readyFile.Close() // signal readiness
    readyFile = nil   // just to be prudent
  }
}

Conclusion

Depending on what you’re communicating from the child to the parent, and how your system is architected, it might be a good idea to use systemd socket activation (socket activation in Go). It works similarly in concept, but passes a listening socket and readiness is determined by the child process answering requests. We introduced this technique in the i3 testsuite and reduced the total wallclock time from >100 seconds to a mere 16 seconds back then (even faster today).

The technique described in this blog post is a bit more generic than systemd’s socket activation. In general, passing file descriptors between processes is a powerful idea. For example, in debiman, we’re passing individual pipe file descriptors to a persistent mandocd(8) process to quickly convert lots of man pages without incurring process creation overhead.

at 2020-02-02 00:00

2020-01-28

michael-herbst.com

1st GDR NBODY meeting in Lille: Applications of DFTK

Earlier this month the first annual meeting of the French working group NBODY (GDR NBODY) took place in Lille. Since the GDR NBODY is an association of interdisciplinary scientists working on N-body problems in chemistry and physics, the roughly 80 participants came from a broad range of backgrounds, including quantum chemistry, materials science, mathematics and nuclear physics. The overall atmosphere of the conference was extremely relaxed, such that vivid discussions frequently arose during the talks. I particularly enjoyed the presentations about N-body effects in nuclear physics and the first-principle simulations of the structure of nuclei, since that topic was completely new to me. Fortunately there was one introductory talk for each of the four major topics of the working group, bringing everyone up to speed in each other’s subjects.

As part of the program I presented DFTK and our recent advances with the code. I first gave a brief rationale for why we started DFTK as a new code for working on electronic-structure problems from the mathematical perspective, then I briefly presented two applications where we hope our code could be useful towards developing new approaches for practical calculations. One is an investigation of increased precision and, more generally, estimates for the floating-point error in density-functional theory calculations. The other was a discussion of our ongoing work on SCF preconditioning techniques. In this I showed our first steps towards developing a mathematically justified SCF preconditioner suitable for tackling systems containing both a conducting and an insulating part. Our first results indicate that our approach could be more suitable than established methods, which usually rely on interpolating empirically between the established preconditioning strategies for metals and insulators. Our hope is that our preconditioner could make it possible to apply DFTK to simulating the electronic structure of catalytic metal surfaces in the future. In this application a challenge for SCF schemes is that the employed catalysts are usually coated with an insulating oxide layer or are interfacing with more insulating organic compounds or air.

Slides: Using the density-functional toolkit (DFTK) to investigate floating-point error and SCF convergence (licensed under a Creative Commons licence)

by Michael F. Herbst at 2020-01-28 18:00 under talk, electronic structure theory, Julia, HPC, DFTK, theoretical chemistry

2020-01-21

sECuREs website

distri: 20x faster initramfs (initrd) from scratch

In case you are not yet familiar with why an initramfs (or initrd, or initial ramdisk) is typically used when starting Linux, let me quote the wikipedia definition:

“[…] initrd is a scheme for loading a temporary root file system into memory, which may be used as part of the Linux startup process […] to make preparations before the real root file system can be mounted.”

Many Linux distributions do not compile all file system drivers into the kernel, but instead load them on-demand from an initramfs, which saves memory.

Another common scenario, in which an initramfs is required, is full-disk encryption: the disk must be unlocked from userspace, but since userspace is encrypted, an initramfs is used.

Motivation

Thus far, building a distri disk image was quite slow:

This is on an AMD Ryzen 3900X 12-core processor (2019):

distri % time make cryptimage serial=1
80.29s user 13.56s system 186% cpu 50.419 total # 19s image, 31s initrd

Of these 50 seconds, dracut’s initramfs generation accounts for 31 seconds (62%)!

Initramfs generation time drops to 8.7 seconds once dracut no longer needs to use the single-threaded gzip(1), but the multi-threaded replacement pigz(1).

This brings the total time to build a distri disk image down to:

distri % time make cryptimage serial=1
76.85s user 13.23s system 327% cpu 27.509 total # 19s image, 8.7s initrd

Clearly, when you use dracut on any modern computer, you should make pigz available. dracut should fail to compile unless one explicitly opts into the known-slower gzip. For more thoughts on optional dependencies, see “Optional dependencies don’t work”.

But why does it take 8.7 seconds still? Can we go faster?

The answer is Yes! I recently built a distri-specific initramfs I’m calling minitrd. I wrote both big parts from scratch:

  1. the initramfs generator program (distri initrd)
  2. a custom Go userland (cmd/minitrd), running as /init in the initramfs.

minitrd generates the initramfs image in ≈400ms, bringing the total time down to:

distri % time make cryptimage serial=1
50.09s user 8.80s system 314% cpu 18.739 total # 18s image, 400ms initrd

(The remaining time is spent in preparing the file system, then installing and configuring the distri system, i.e. preparing a disk image you can run on real hardware.)

How can minitrd be 20 times faster than dracut?

dracut is mainly written in shell, with a C helper program. It drives the generation process by spawning lots of external dependencies (e.g. ldd or the dracut-install helper program). I assume that the combination of using an interpreted language (shell) that spawns lots of processes and precludes a concurrent architecture is to blame for the poor performance.

minitrd is written in Go, with speed as a goal. It leverages concurrency and uses no external dependencies; everything happens within a single process (but with enough threads to saturate modern hardware).

Measuring early boot time using qemu, I measured the dracut-generated initramfs taking 588ms to display the full disk encryption passphrase prompt, whereas minitrd took only 195ms.

The rest of this article dives deeper into how minitrd works.

What does an initramfs do?

Ultimately, the job of an initramfs is to make the root file system available and continue booting the system from there. Depending on the system setup, this involves the following 5 steps:

1. Load kernel modules to access the block devices with the root file system

Depending on the system, the block devices with the root file system might already be present when the initramfs runs, or some kernel modules might need to be loaded first. On my Dell XPS 9360 laptop, the NVMe system disk is already present when the initramfs starts, whereas in qemu, we need to load the virtio_pci module, followed by the virtio_scsi module.

How will our userland program know which kernel modules to load? Linux kernel modules declare patterns for their supported hardware as an alias, e.g.:

initrd# grep virtio_pci lib/modules/5.4.6/modules.alias
alias pci:v00001AF4d*sv*sd*bc*sc*i* virtio_pci

Devices in sysfs have a modalias file whose content can be matched against these declarations to identify the module to load:

initrd# cat /sys/devices/pci0000:00/*/modalias
pci:v00001AF4d00001005sv00001AF4sd00000004bc00scFFi00
pci:v00001AF4d00001004sv00001AF4sd00000008bc01sc00i00
[…]

Hence, for the initial round of module loading, it is sufficient to locate all modalias files within sysfs and load the responsible modules.
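
In shell, this initial “coldplug” round is the classic one-liner shown below; minitrd implements the equivalent natively in Go:

initrd# find /sys/devices -name modalias -exec cat {} + | sort -u | xargs modprobe -abq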

Loading a kernel module can result in new devices appearing. When that happens, the kernel sends a uevent, which the uevent consumer in userspace receives via a netlink socket. Typically, this consumer is udev(7), but in our case, it’s minitrd.

For each uevent message that comes with a MODALIAS variable, minitrd will load the relevant kernel module(s).

When loading a kernel module, its dependencies need to be loaded first. Dependency information is stored in the modules.dep file in a Makefile-like syntax:

initrd# grep virtio_pci lib/modules/5.4.6/modules.dep
kernel/drivers/virtio/virtio_pci.ko: kernel/drivers/virtio/virtio_ring.ko kernel/drivers/virtio/virtio.ko

To load a module, we can open its file and then call the Linux-specific finit_module(2) system call. Some modules are expected to return an error code, e.g. ENODEV or ENOENT when some hardware device is not actually present.
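
insmod(8) is a thin wrapper around this system call; unlike modprobe(8), it does not resolve dependencies, which makes the required loading order (dependencies first, as listed in modules.dep) visible:

initrd# insmod lib/modules/5.4.6/kernel/drivers/virtio/virtio.ko
initrd# insmod lib/modules/5.4.6/kernel/drivers/virtio/virtio_ring.ko
initrd# insmod lib/modules/5.4.6/kernel/drivers/virtio/virtio_pci.ko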

Side note: next to the textual versions, there are also binary versions of the modules.alias and modules.dep files. Presumably, those can be queried more quickly, but for simplicity, I have not (yet?) implemented support in minitrd.

2. Console settings: font, keyboard layout

Setting a legible font is necessary for hi-dpi displays. On my Dell XPS 9360 (3200 x 1800 QHD+ display), the following works well:

initrd# setfont latarcyrheb-sun32

Setting the user’s keyboard layout is necessary for entering the LUKS full-disk encryption passphrase in their preferred keyboard layout. I use the NEO layout:

initrd# loadkeys neo

3. Block device identification

In the Linux kernel, block device enumeration order is not necessarily the same on each boot. Even if it was deterministic, device order could still be changed when users modify their computer’s device topology (e.g. connect a new disk to a formerly unused port).

Hence, it is good style to refer to disks and their partitions with stable identifiers. This also applies to boot loader configuration, and so most distributions will set a kernel parameter such as root=UUID=1fa04de7-30a9-4183-93e9-1b0061567121.

Identifying the block device or partition with the specified UUID is the initramfs’s job.

Depending on what the device contains, the UUID comes from a different place. For example, ext4 file systems have a UUID field in their file system superblock, whereas LUKS volumes have a UUID in their LUKS header.

Canonically, probing a device to extract the UUID is done by libblkid from the util-linux package, but the logic can easily be re-implemented in other languages and changes rarely. minitrd comes with its own implementation to avoid cgo or running the blkid(8) program.
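
For illustration, the equivalent lookup with the blkid(8) command line tool, using the UUID from the kernel parameter example above:

# prints the device containing the file system with this UUID, e.g. /dev/sda4
blkid -U 1fa04de7-30a9-4183-93e9-1b0061567121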

4. LUKS full-disk encryption unlocking (only on encrypted systems)

Unlocking a LUKS-encrypted volume is done in userspace. The kernel handles the crypto, but reading the metadata, obtaining the passphrase (or e.g. key material from a file) and setting up the device mapper table entries are done in user space.

initrd# modprobe algif_skcipher
initrd# cryptsetup luksOpen /dev/sda4 cryptroot1

After the user entered their passphrase, the root file system can be mounted:

initrd# mount /dev/dm-0 /mnt

5. Continuing the boot process (switch_root)

Now that everything is set up, we need to pass execution to the init program on the root file system with a careful sequence of chdir(2), mount(2), chroot(2), chdir(2) and execve(2) system calls that is explained in this busybox switch_root comment.

initrd# mount -t devtmpfs dev /mnt/dev
initrd# exec switch_root -c /dev/console /mnt /init

To conserve RAM, the files in the temporary file system to which the initramfs archive is extracted are typically deleted.

How is an initramfs generated?

An initramfs “image” (more accurately: archive) is a compressed cpio archive. Typically, gzip compression is used, but the kernel supports a bunch of different algorithms and distributions such as Ubuntu are switching to lz4.
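
Such an archive can be inspected with standard tools, for example:

# list the contents of a gzip-compressed initramfs archive
zcat /tmp/initrd | cpio -t | head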

Generators typically prepare a temporary directory and feed it to the cpio(1) program. In minitrd, we read the files into memory and generate the cpio archive using the go-cpio package. We use the pgzip package for parallel gzip compression.

The following files need to go into the cpio archive:

minitrd Go userland

The minitrd binary is copied into the cpio archive as /init and will be run by the kernel after extracting the archive.

Like the rest of distri, minitrd is built statically without cgo, which means it can be copied as-is into the cpio archive.

Linux kernel modules

Aside from the modules.alias and modules.dep metadata files, the kernel modules themselves reside in e.g. /lib/modules/5.4.6/kernel and need to be copied into the cpio archive.

Copying all modules results in a ≈80 MiB archive, so it is common to only copy modules that are relevant to the initramfs’s features. This reduces archive size to ≈24 MiB.

The filtering relies on hard-coded patterns and module names. For example, disk encryption related modules are all kernel modules underneath kernel/crypto, plus kernel/drivers/md/dm-crypt.ko.

When generating a host-only initramfs (works on precisely the computer that generated it), some initramfs generators look at the currently loaded modules and just copy those.

Console Fonts and Keymaps

The kbd package’s setfont(8) and loadkeys(1) programs load console fonts and keymaps from /usr/share/consolefonts and /usr/share/keymaps, respectively.

Hence, these directories need to be copied into the cpio archive. Depending on whether the initramfs should be generic (work on many computers) or host-only (works on precisely the computer/settings that generated it), the entire directories are copied, or only the required font/keymap.

cryptsetup, setfont, loadkeys

These programs are (currently) required because minitrd does not implement their functionality.

As they are dynamically linked, not only the programs themselves need to be copied, but also the ELF dynamic linking loader (path stored in the .interp ELF section) and any ELF library dependencies.

For example, cryptsetup in distri declares the ELF interpreter /ro/glibc-amd64-2.27-3/out/lib/ld-linux-x86-64.so.2 and declares dependencies on shared libraries libcryptsetup.so.12, libblkid.so.1 and others. Luckily, in distri, packages contain a lib subdirectory containing symbolic links to the resolved shared library paths (hermetic packaging), so it is sufficient to mirror the lib directory into the cpio archive, recursing into shared library dependencies of shared libraries.
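
Outside of distri, the interpreter and the library dependencies can be read directly from the ELF headers instead, e.g. (the path will differ per distribution):

readelf -l /usr/bin/cryptsetup | grep interpreter
readelf -d /usr/bin/cryptsetup | grep NEEDED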

cryptsetup also requires the GCC runtime library libgcc_s.so.1 to be present at runtime, and will abort with an error message about not being able to call pthread_cancel(3) if it is unavailable.

time zone data

To print log messages in the correct time zone, we copy /etc/localtime from the host into the cpio archive.

minitrd outside of distri?

I currently have no desire to make minitrd available outside of distri. While the technical challenges (such as extending the generator to not rely on distri’s hermetic packages) are surmountable, I don’t want to support people’s initramfs remotely.

Also, I think that people’s efforts should in general be spent on rallying behind dracut and making it work faster, thereby benefiting all Linux distributions that use dracut (increasingly more). With minitrd, I have demonstrated that significant speed-ups are achievable.

Conclusion

It was interesting to dive into how an initramfs really works. I had been working with the concept for many years, from small tasks such as “debug why the encrypted root file system is not unlocked” to more complicated tasks such as “set up a root file system on DRBD for a high-availability setup”. But even with that sort of experience, I didn’t know all the details, until I was forced to implement every little thing.

As I suspected going into this exercise, dracut is much slower than it needs to be. Re-implementing its generation stage in a modern language instead of shell helps a lot.

Of course, my minitrd does a bit less than dracut, but not drastically so. The overall architecture is the same.

I hope my effort helps with two things:

  1. As a teaching implementation: instead of wading through the various components that make up a modern initramfs (udev, systemd, various shell scripts, …), people can learn about how an initramfs works in a single place.

  2. I hope the significant time difference motivates people to improve dracut.

Appendix: qemu development environment

Before writing any Go code, I did some manual prototyping. Learning how other people prototype is often immensely useful to me, so I’m sharing my notes here.

First, I copied all kernel modules and a statically built busybox binary:

% mkdir -p lib/modules/5.4.6
% cp -Lr /ro/lib/modules/5.4.6/* lib/modules/5.4.6/
% cp ~/busybox-1.22.0-amd64/busybox sh

To generate an initramfs from the current directory, I used:

% find . | cpio -o -H newc | pigz > /tmp/initrd

In distri’s Makefile, I append these flags to the QEMU invocation:

-kernel /tmp/kernel \
-initrd /tmp/initrd \
-append "root=/dev/mapper/cryptroot1 rdinit=/sh ro console=ttyS0,115200 rd.luks=1 rd.luks.uuid=63051f8a-54b9-4996-b94f-3cf105af2900 rd.luks.name=63051f8a-54b9-4996-b94f-3cf105af2900=cryptroot1 rd.vconsole.keymap=neo rd.vconsole.font=latarcyrheb-sun32 init=/init systemd.setenv=PATH=/bin rw vga=836"

The vga= mode parameter is required for loading font latarcyrheb-sun32.

Once in the busybox shell, I manually prepared the required mount points and kernel modules:

ln -s sh mount
ln -s sh lsmod
mkdir /proc /sys /run /mnt
mount -t proc proc /proc
mount -t sysfs sys /sys
mount -t devtmpfs dev /dev
modprobe virtio_pci
modprobe virtio_scsi

As a next step, I copied cryptsetup and dependencies into the initramfs directory:

% for f in /ro/cryptsetup-amd64-2.0.4-6/lib/*; do full=$(readlink -f $f); rel=$(echo $full | sed 's,^/,,g'); mkdir -p $(dirname $rel); install $full $rel; done
% ln -s ld-2.27.so ro/glibc-amd64-2.27-3/out/lib/ld-linux-x86-64.so.2
% cp /ro/glibc-amd64-2.27-3/out/lib/ld-2.27.so ro/glibc-amd64-2.27-3/out/lib/ld-2.27.so
% cp -r /ro/cryptsetup-amd64-2.0.4-6/lib ro/cryptsetup-amd64-2.0.4-6/
% mkdir -p ro/gcc-libs-amd64-8.2.0-3/out/lib64/
% cp /ro/gcc-libs-amd64-8.2.0-3/out/lib64/libgcc_s.so.1 ro/gcc-libs-amd64-8.2.0-3/out/lib64/libgcc_s.so.1
% ln -s /ro/gcc-libs-amd64-8.2.0-3/out/lib64/libgcc_s.so.1 ro/cryptsetup-amd64-2.0.4-6/lib
% cp -r /ro/lvm2-amd64-2.03.00-6/lib ro/lvm2-amd64-2.03.00-6/

In busybox, I used the following commands to unlock the root file system:

modprobe algif_skcipher
./cryptsetup luksOpen /dev/sda4 cryptroot1
mount /dev/dm-0 /mnt

at 2020-01-21 16:50

2020-01-19

RaumZeitLabor

Atomschutzbunkertour

Dear daylight avoiders and fans of reinforced concrete,

the next tour is a highlight for us basement dwellers!

On Saturday, February 8, 2020, we will visit the nuclear fallout shelter underneath the Stadthaus in N1. If you would like to learn about the history of this deep bunker, or want to know what impact the Cold War had on Mannheim, you should not miss this trip into the underground. 10 places on the guided tour starting at 15.30 are reserved with MannheimTours for RaumZeitLabor members. Please register by mail by February 1, 2020 if you want to join.

Let’s have a blast
Falloutrattie

by flederrattie at 2020-01-19 00:00

2020-01-19

RaumZeitLabor

BrexEat Party, the British Dinner

“Another One Bites the Crust” – Queen

Dear fellow RaumZeitLaboroyals!

On the occasion of the United Kingdom’s possibly (or then again maybe not) imminent exit from the EU, we want to meet for a joint BrexEat at the RZL on Friday, January 31.

Pin your finest Tudor Brussels sprout brooch to your jacket and saddle the corgis: from 18.30 we will raise a glass to the well-being (and/or the choice) of our British neighbours.

Chicken and vegetable tikka masala with rice, as well as Indian treacle tart: for veggie lovers and beefeaters alike there will be British-Indian classics to dine on.

If you want to come to the Union Snack, please register by mail by January 29 and tell us whether you eat meat or not. So that we can keep saying “Keep calm and culinary on” and keep organizing fun themed dinners in the future, we ask for a contribution of at least 8 euros per person.

Drool, Britannia!
Your Chai Spice Girls

SirMadamCorgi

by flederrattie at 2020-01-19 00:00