Planet NoName e.V.

2021-06-21

michael-herbst.com

Errors and uncertainty quantification in density-functional theory

On 8th June I was invited to the seminar of the Uncertainty Quantification (UQ) group of Prof. Youssef Marzouk at MIT. Youssef and I had been planning this seminar since the start of my involvement with MIT's CESMIX project last February (see also this blog article), but it took us quite some time to get it arranged. Finally I managed to present my point of view on UQ in density-functional theory (DFT), sneakily re-using most of the slides I had already prepared for my UQ-in-DFT talk at RWTH Aachen's UQ group a week earlier.

Similar to the Aachen talk, I put a strong emphasis on audience participation and discussion. I first introduced the UQ group to electronic structure theory and DFT, allowing enough time to discuss the key ideas of the physics. Then I pointed out current research in error estimation and UQ in DFT and highlighted a number of opportunities for interesting future UQ-related research. The discussion was very lively and I hardly made it past a slide without a question, which was just great. Since, in my opinion, a lot could be gained from stronger uncertainty quantification tools in DFT, I hope this talk made DFT more accessible to the UQ group and made some people curious to look into the details. On my end I would definitely enjoy learning more about UQ and look forward to my future UQ-related involvement in the CESMIX project. As usual my slides are attached below.

Link Licence
Errors and uncertainty quantification in electronic-structure theory (Slides) Creative Commons License

by Michael F. Herbst at 2021-06-21 10:00 under talk, electronic structure theory, Kohn-Sham, uncertainty quantification, DFT, solid state

2021-06-15

michael-herbst.com

Talk at MATH4UQ seminar series at RWTH

On 1st June I was invited to the MATH4UQ seminar series of the Mathematics of Uncertainty Quantification chair of Prof. Raul Tempone at RWTH Aachen University.

Over the past months I have become more and more interested in mathematical methods for uncertainty quantification (UQ) as an opportunity to estimate and understand errors in density-functional theory (DFT) calculations. In particular I imagine UQ methods to be useful for estimating the model error of a DFT model itself. At this level statistical approaches are likely the only feasible option for practical error estimation, since the mathematical complexity of modern DFT models beyond the local density approximation very likely makes rigorous a posteriori error analysis infeasible.

In my talk I explained the basics of DFT and provided a rough overview of present UQ developments for this method. Since I knew very little about UQ and my audience knew very little about DFT, I intended the talk to be more of a Q&A session, where the slides are around to stimulate discussion. This turned out to work very well and I am very grateful for the many interesting questions from the audience and the enjoyable discussion. As usual my slides are attached below. Additionally a recording of my talk can be found on YouTube.

Link Licence
Errors in electronic-structure theory: Status and directions for future research (Slides) Creative Commons License
Youtube recording of the talk

by Michael F. Herbst at 2021-06-15 10:00 under talk, electronic structure theory, Kohn-Sham, uncertainty quantification, DFT, solid state

2021-06-05

sECuREs website

Laptop review: ThinkPad X1 Extreme (Gen 2)

ThinkPad X1 Extreme Gen 2, pear for scale

For many of my school and university years, I used and liked my ThinkPad X200 ultraportable laptop. But now that these years are long gone, I realized my use-case for laptops had changed: instead of carrying my laptop with me every day, I am now only bringing it on occasion, for example when I travel to conferences, visit friends, or do volunteer work.

After the ThinkPad X200, I used a few different laptops:

  • MacBook Pro 13" Retina, bought for its screen
  • ThinkPad X1 Carbon, which newly introduced a hi-dpi screen to ThinkPads
  • Dell XPS 9360, for a change, to try a device that ships with Linux

With each of these devices, I have felt limited by the lack of connectors and slim compute power that comes with the Ultrabook brand, even after years of technical progress.

More compute power is nice to be able to work on projects with larger data sets, for example debiman (scanning and converting all manpages in Debian), or distri (building Linux packages).

More peripheral options such as USB ports are nice when connecting a keyboard, trackball, USB-to-serial adapter, etc., to work on a micro controller or Raspberry Pi project, for example.

So, I was ready to switch from the heaviest Ultrabooks to the lightest of the “mobile workstation” category, when I stumbled upon Lenovo’s ThinkPad X1 Extreme (Gen 2), and it piqued my curiosity.

Peripherals

Let me start by going into the key peripherals of a laptop: keyboard, touchpad and screen. I will talk about these independently from the remaining hardware because they define the experience of using the computer.

Keyboard

After having used the Dell XPS 9360 for a few years, I can confidently say that the keyboard of the ThinkPads is definitely much better, and in a noticeable way.

It’s not that the Dell keyboards are bad. But comparing the Dell and ThinkPad side-by-side makes it really clear that the ThinkPad keyboards are the best notebook keyboards.

On the ThinkPad keyboard, every key press lands exactly as I imagine. Never do I need to hit a key twice because I didn’t press it just-right, and never do I notice additional ghost key presses.

Even though I connect my external Kinesis Advantage keyboard when doing longer stretches of work, the quality of the built-in keyboard matters: a good keyboard enables using the laptop on the couch.

Touchpad

Unfortunately, while the keyboard is great, I can’t say the same about the touchpad. I mean, it’s not terrible, but it’s also not good by any stretch.

This seems to be the status quo with PC touchpads for decades. It really blows my mind that Apple’s touchpads are consistently so much better!

My only hope is that Bill Harding (GitClear), who is working on improving the Linux touchpad experience, will eventually find a magic software tweak or something…

As mentioned on the ArchWiki, I also had to adjust the sensitivity like so:

% xinput set-prop 'SynPS/2 Synaptics TouchPad' 'libinput Accel Speed' 0.5

Display

I have high demands regarding displays: since 2013, every device of mine has a hi-dpi display.

The industry hasn’t improved displays across the board as fast as I’d like, so non-hi-dpi displays are still quite common. The silver lining is that it makes laptop selection a little easier for me: anything without a decent display I can discard right away.

I’m glad to report that the 4K display in the ThinkPad X1 Extreme with its 3840x2160 pixels is sharp, bright, and generally has good viewing angles.

It’s also a touchscreen, which I don’t strictly need, but it’s nice to use it from time to time.

I use the display in 200% scaling mode, i.e. I set Xft.dpi: 192. See also HiDPI in ArchWiki.
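
For reference, a minimal way to apply this setting is via X resources; a sketch, assuming you use ~/.Xresources (toolkit-specific scaling variables for GTK/Qt are a separate topic):

% echo 'Xft.dpi: 192' >> ~/.Xresources
% xrdb -merge ~/.Xresources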

Hardware

Spec-wise, the ThinkPad X1 Extreme is a beast!

ThinkPad X1 Extreme Specs

The build quality seems very robust to me.

Another big plus of the ThinkPad series over other laptop series is the availability of the official Hardware Maintenance Manual: you can put “ThinkPad X1 Extreme Gen 2 Hardware Maintenance Manual” into Google and will find p1_gen2_x1extreme_hmm_v1.pdf as the first hit. This manual describes in detail how to repair or upgrade your device if you want to (or have to) do it yourself.

WiFi

The built-in Intel AX200 WiFi interface works fine, provided you have a new-enough linux-firmware package and kernel version installed.

I had trouble with Linux 5.6.0, and Linux 5.6.5 fixed it. Luckily, at the time of writing, Linux 5.11 is the most recent release, so most distributions should be recent enough for things to just work.
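
To check what you are running, something like the following works (a sketch, assuming a Fedora-style package name; adjust for your distribution):

% uname -r
% rpm -q linux-firmware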

The WiFi card reaches almost the same download speed as the most modern WiFi device I can test: a MacBook Air M1. Both are connected to my UniFi UAP-AC-HD access point.

Laptop Download Upload
ThinkPad X1 Extreme 500 Mbit/s 150 Mbit/s
MacBook Air M1 600 Mbit/s 500 Mbit/s

I’m not sure why the upload speed is so low in comparison.

GPU

The GPU in this machine is by far the most troublesome bit of hardware.

I had hoped that after many years of laptops containing Intel/nVidia hybrid graphics, this setup would largely work, but was disappointed.

Both the proprietary nVidia driver and the nouveau driver would not work reliably for me. I ran into kernel error messages and hard-freezes, with even SSH sessions to the machine breaking.

In the end, I blacklisted the nouveau driver to use Intel graphics only:

% echo blacklist nouveau | sudo tee /etc/modprobe.d/blacklist.conf 

Without the nVidia driver, the GPU will not go into powersave mode, so I remove it from the PCI bus entirely to save power:

#!/bin/zsh

sudo tee /sys/bus/pci/devices/0000\:01\:00.0/remove <<<1
sudo tee /sys/bus/pci/devices/0000\:01\:00.1/remove <<<1

You can only re-awaken the GPU with a reboot.

Obviously this isn’t a great setup — I would prefer to be able to actually use the GPU. If you have any tips or a better experience, please let me know.

Also note that the HDMI port will be unusable if you go this route, as the HDMI port is connected to the nVidia GPU only.

Battery life

The 80 Wh battery lasts between 5 and 6 hours for me, without any extra power-saving tuning beyond what the Linux distribution Fedora 33 comes with by default.

This is good enough for using the laptop away from a power socket from time to time, which matches my expectation for this kind of mobile workstation.

Software support

Linux support is generally good on this machine! Yes, I provide a few pointers in this article regarding problems, patches and old software versions. But, if you use a newer Linux distribution, all of these fixes are included and things just work out of the box. I tested with Fedora 33.

For a few months, I was using this laptop exclusively with my research Linux distribution distri, so even if you just track upstream software closely, the machine works well.

Firmware updates

Lenovo partnered with the Linux Vendor Firmware Service Project (LVFS), which means that through fwupd, ThinkPad laptops such as this X1 Extreme can easily receive firmware updates!

This is a huge improvement in comparison to earlier ThinkPad models, where you had to jump through hoops with Windows-only software, or CD images that you needed to boot just right.
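
For reference, the usual fwupd workflow looks roughly like this (a sketch; available updates and output depend on your device and distribution):

% sudo fwupdmgr refresh
% sudo fwupdmgr get-updates
% sudo fwupdmgr update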

If your laptop has a very old firmware version (before 1.30), you might be affected by the skipping keystrokes issues. You can check using the always-handy lshw(1) tool.
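
A quick way to read the installed firmware version (a sketch; lshw files the BIOS under its memory class, and dmidecode works as an alternative):

% sudo lshw -class memory | grep -A 5 firmware
% sudo dmidecode -s bios-version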

Performance

The specific configuration of my ThinkPad is:

ThinkPad X1 Extreme Spec (2020)
CPU Intel Core i7-9750H CPU @ 2.60GHz
RAM 2 × 32 GB Samsung M471A4G43MB1-CTD
Disk 2 × SAMSUNG MZVLB2T0HALB-000L7 NVMe disk

You can google for CPU benchmarks and comparisons yourself, and those likely are more scientific and carefully done than I have time for.

What I can provide however, is a comparison of working on one of my projects on the ThinkPad vs. on my workstation, an Intel Core i9-9900K that I bought in 2018:

Workstation Spec (2018)
CPU Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
RAM 4 × Corsair CMK32GX4M2A2666C16
Disk Corsair Force MP600 M.2 NVMe disk

Specifically, I am comparing how long my manpage static archive generator debiman takes to analyze and render all manpages in Debian unstable, using the following command:

ulimit -n 8192; time ~/go/bin/debiman \
  -keyring=/usr/share/keyrings/debian-archive-keyring.gpg \
  -sync_codenames=, \
  -sync_suites=unstable \
  -serving_dir=/srv/man/benchmark \
  -inject_assets=~/go/src/github.com/Debian/debiman/debian-assets \
  -concurrency_render=20 \
  -alternatives_dir=~/go/src/github.com/Debian/debiman/piuparts

On both machines, I ensured that:

  1. The CPU performance governor was set to performance (one way to do this is sketched after this list)
  2. A warm apt-cacher-ng cache was present, i.e. network download was not part of the test.
  3. Linux kernel caches were dropped using echo 3 | sudo tee /proc/sys/vm/drop_caches
  4. I was using debiman git revision f78c160
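
For item 1, one way to pin the governor is cpupower (a sketch; the post does not specify which tool was used, and the package providing cpupower varies by distribution):

% sudo cpupower frequency-set -g performance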

Here are the results:

Machine Time
i9-9900K Workstation 4:57,10 (100%)
ThinkPad X1 Extreme (Gen 2) 7:19,56 (147%)

This reaffirms my impression that even high-end laptop hardware just cannot beat a workstation setup (which has more space and better thermals), but it comes close enough to be useful.

Conclusion

Positives:

  • The ergonomics of the device really are great. It is a pleasure to type on a first-class, full-size ThinkPad keyboard. The screen has good quality and a high resolution.

  • Performance-wise, this machine can almost replace a proper workstation.

Negatives are:

  • the mediocre battery life
  • an annoyingly loud fan that spins up too frequently
  • poor software/driver support for hybrid nVidia GPUs.

Notably, all of these could be improved by better power saving, so perhaps it’s just a matter of time until Linux kernel developers land some improvements…? :)

at 2021-06-05 18:43

2021-05-30

michael-herbst.com

SIAM LA: Robust and efficient accelerated methods for density-functional theory

Just one day after my talk at the SIAM Materials Science conference (blog article) I gave another talk at a SIAM meeting, this time at SIAM Linear Algebra. I was very much looking forward to participating in SIAM LA, firstly because it was my first time attending this conference, and secondly because it was a good opportunity to talk about our recent algorithmic work on robust DFT methods to an international crowd of mathematicians.

I presented as part of the minisymposium Theory and Practice of Extrapolation and Acceleration Methods, which consisted of three interesting sessions of historic and recent talks about extrapolation and convergence acceleration in the broadest sense of the word. The talks covered both iterative methods as well as summation theory and sequence summation, which turned out to be a very enjoyable mix. In that sense I am really grateful to the mini organisers, Agnieszka Miedlar and Yousef Saad, for the invitation and for allowing me to be part of these great sessions.

Beyond the mini I enjoyed a number of talks about emerging topics in numerical linear algebra such as mixed-precision computation, low-rank tensor approximations or randomised methods. Even though the time zone difference meant that the conference mostly ran during my afternoon and late evening, and the collision with SIAM Materials Science made it quite a busy week, I took a lot from SIAM LA and I'm already looking forward to next time.

Link Licence
Robust and Efficient Accelerated Methods for Kohn-Sham Density-Functional Theory Creative Commons License

by Michael F. Herbst at 2021-05-30 16:01 under talk, electronic structure theory, Julia, DFTK, numerical analysis, Kohn-Sham, high-throughput, DFT, solid state

2021-05-30

michael-herbst.com

SIAM MS: Using the density-functional toolkit to design black-box DFT methods

After being postponed by one year due to the pandemic, the SIAM Materials Science conference finally took place in virtual form over the last two weeks (from 17th to 28th May). Unfortunately this meant that the conference was scheduled in parallel to the SIAM Linear Algebra virtual conference, where I also presented (blog article), which made my past two weeks rather busy.

At SIAM Materials I was invited to talk in the Minisymposium Numerics of electronic structure calculations, which was organised by the steering committee of the GAMM activity group moansi (Modelling, Analysis and Simulation of Molecular Systems). Besides interesting sessions offering mathematical insights into electronic-structure methods, this gave the mini the additional flavour of a spring gathering for the usual crowd of the activity group, which I had already had the pleasure of meeting at previous moansi workshops.

In my talk I gave a broad overview of the recent projects we realised with the density-functional toolkit to make self-consistent field (SCF) calculations in density-functional theory more robust and reliable. Apart from our work on preconditioning inhomogeneous systems with the LDOS preconditioner, I also presented first work-in-progress results on an adaptive damping strategy we recently came up with. The idea of our method is to use a line search based on an approximate quadratic model whenever a proposed SCF step is not successful (i.e. increases energy and SCF residual). This firstly allows the damping parameter to be chosen automatically (instead of requiring the user to find one by trial and error). Secondly it makes the SCF procedure more robust, especially for tricky cases: in our experiments on Heusler alloys, for example, our adaptive damping approach was the only method that managed to converge in some cases.

Link Licence
Using the density-functional toolkit (DFTK) to design black-box methods in density-functional theory Creative Commons License

by Michael F. Herbst at 2021-05-30 16:00 under talk, electronic structure theory, Julia, DFTK, theoretical chemistry, numerical analysis, Kohn-Sham, high-throughput, DFT, solid state

2021-05-28

sECuREs website

How I configured and then promptly returned a MikroTik CCR2004 router for Fiber7

init7 recently announced that with their FTTH fiber offering Fiber7, they will now sell and connect you with 25 Gbit/s (Fiber7-X2) or 10 Gbit/s (Fiber7-X) fiber optics, if you want more than 1 Gbit/s.

This is possible thanks to the upgrade of their network infrastructure as part of their “lifecycle management”, meaning the old networking gear was declared as end-of-life. The new networking gear supports not only SFP+ modules (10 Gbit/s), but also SFP28 modules (25 Gbit/s).

Availability depends on the POP (Point Of Presence, German «Anschlusszentrale») you’re connected to. My POP is planned to be upgraded in September.

Nevertheless, I wanted to already prepare my end of the connection, and ordered the only router that init7 currently lists as compatible with Fiber7-X/X2: the MikroTik CCR2004-1G-12S+2XS.

MikroTik CCR2004-1G-12S+2XS

The rest of this article walks through what I needed to configure (a lot, compared to Ubiquiti or OpenWRT) in the hope that it helps other MikroTik users, and then ends in Why I returned it.

Configuration

Connect an Ethernet cable to the management port on the MikroTik and:

  1. log into the system using ssh admin@192.168.88.1
  2. point a web browser to “Webfig” at http://192.168.88.1/ (no login required)

Update firmware

Update the CCR2004 to the latest firmware version. At the time of writing, the Long-term RouterOS track is at version 6.47.9 for the CCR2004 (ARM64):

  1. Use /system package print to display the current version.
  2. Upload routeros-arm64-6.47.9.npk using Webfig.
  3. /system reboot and verify that /system package print shows 6.47.9 now.

Set up auth

Set a password to prevent others from logging into the router:

/user set admin password=secret

Additionally, you can enable passwordless SSH key login, if you want.

  1. Create an RSA key, because ed25519 keys are not supported:

    % ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key: /home/michael/.ssh/id_mikrotik
    
  2. Upload the id_mikrotik.pub file in Webfig

  3. Import the SSH public key for the admin user:

    /user ssh-keys import user=admin public-key-file=id_mikrotik.pub
    

Lock down the router

  1. Enable HTTPS in Webfig.

  2. Disable all remote access except for SSH and HTTPS:

    /ip service disable telnet,ftp,www,api,api-ssl,winbox
    
  3. Follow MikroTik Securing Your Router recommendations:

    /tool mac-server set allowed-interface-list=none
    /tool mac-server mac-winbox set allowed-interface-list=none
    /tool mac-server ping set enabled=no
    /tool bandwidth-server set enabled=no
    /ip ssh set strong-crypto=yes
    /ip neighbor discovery-settings set discover-interface-list=none
    

Enable DHCPv6 Client

For some reason, you need to explicitly enable IPv6 in 2021:

/system package enable ipv6
/system reboot

MikroTik says this is a precaution so that users don’t end up with default-open firewall settings for IPv6. But then why don’t they just add some default firewall rules?!

Anyway, to configure and immediately enable the DHCPv6 client, use:

/ipv6 dhcp-client add pool-name=fiber7 pool-prefix-length=64 interface=sfp28-1 add-default-route=yes use-peer-dns=no request=address,prefix

Modify the IPv6 DUID

Unfortunately, MikroTik does not offer any user interface to set the IPv6 DUID, which I need to configure to obtain my static IPv6 network prefix from my provider’s DHCPv6 server.

Luckily, the DUID is included in backup files, so we can edit it and restore from backup:

  1. Run /system backup save
  2. Download the backup file in Webfig by navigating to Files → Backup → Download.

  3. Convert the backup file to hex in textual form, edit the DUID and convert it back to binary:

    % xxd MikroTik-19700102-0111.backup MikroTik-19700102-0111.backup.hex
    
    % emacs MikroTik-19700102-0111.backup.hex
    # Search for “dhcp/duid” in the file and edit accordingly:
    # got:  00030001085531dfa69e
    
    % xxd -r MikroTik-19700102-0111.backup.hex MikroTik-19700102-0111-patched.backup
    
  4. Upload the file in Webfig, then restore the backup:

    /system backup load name=MikroTik-19700102-0111-patched.backup

Enable IPv6 Router Advertisements

To make the router assign an IPv6 address from the obtained pool for itself, and then send IPv6 Router Advertisements to the network, set:

/ipv6 address add address=::1 from-pool=fiber7 interface=bridge1
/ipv6 nd add interface=bridge1 managed-address-configuration=yes other-configuration=yes

Enable DHCPv4 Client

To configure and immediately enable the DHCPv4 client on the upstream port, use:

/ip dhcp-client add interface=sfp28-1 disabled=no

I also changed the MAC address to match my old router’s address, just to take maximum precaution to avoid any Port Security related issues with my provider’s DHCP server:

/interface ethernet set sfp28-1 mac-address=00:0d:fa:4c:0c:31

Enable DNS Server

By default, only the MikroTik itself can send DNS queries. Enable access for network clients:

/ip dns set allow-remote-requests=yes

Enable DHCPv4 Server

First, let’s bundle all SFP+ ports into a single bridge interface:

/interface bridge add name=bridge1
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus1 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus2 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus3 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus4 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus5 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus6 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus7 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus8 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus9 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus10 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus11 hw=yes
/interface bridge port add bridge=bridge1 interface=sfp-sfpplus12 hw=yes

This means we’ll use the device like a big switch with routing between the switch and the uplink port sfp28-1.

To configure the DHCPv4 Server, configure an IP address, then start the setup wizard:

/ip address add address=10.0.0.1/24 interface=bridge1
/ip dhcp-server setup
Select interface to run DHCP server on

dhcp server interface: bridge1
Select network for DHCP addresses

dhcp address space: 10.0.0.0/24
Select gateway for given network

gateway for dhcp network: 10.0.0.1
Select pool of ip addresses given out by DHCP server

addresses to give out: 10.0.0.2-10.0.0.240
Select DNS servers

dns servers: 10.0.0.1
Select lease time

lease time: 20m

Enable IPv4 NAT

We need NAT to route all IPv4 traffic over our single public IP address:

/ip firewall nat add action=masquerade chain=srcnat out-interface=sfp28-1 to-addresses=0.0.0.0

Disable NAT services for security, e.g. to mitigate against NAT slipstreaming attacks:

/ip firewall service-port disable ftp,tftp,irc,h323,sip,pptp,udplite,dccp,sctp

I can observe ≈10-20% CPU load when doing a Gigabit speed test over IPv4.
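
To watch the load yourself while running a speed test, RouterOS has a live resource monitor (a sketch; I am not claiming this is how the figure above was obtained):

/system resource monitor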

TODO list

There were a few more features on my list that I did not get around to configuring.

Why I returned it

Initially, I thought the device’s fan spins up only at boot, and then the large heatsink takes care of all cooling needs. Unfortunately, after an hour or so into my experiment, I noticed that the MikroTik would spin up the fan for a whole minute or so occasionally! Very annoying.

I also ran into weird DNS slow-downs, which I didn’t fully diagnose. In Wireshark, it looked like my machine sent 2 DNS queries but received only 1 DNS result, and then waited for a timeout.

I also noticed a few more unexpected dependencies, such as my home automation consuming DHCP lease state by subscribing to an MQTT topic. Addressing this and other similar little problems would have taken a bunch more time and would have resulted in a less reliable system than I have today.

Since I last used MikroTik in 2014, the software seems to have barely changed. I wish they would finally implement some table-stakes features like DNS resolution for DHCP hostnames.

Given all the above, I no longer felt like getting enough value for the money from the MikroTik, and found it easier to just switch back to my own router7 and return the MikroTik.

I will probably stick with the router7 software, but exchange the PC Engines APU with the smallest PC that has enough PCI-E bandwidth for a multi-port SFP28 network card.

Appendix A: Full configuration

# may/28/2021 11:40:15 by RouterOS 6.47.9
# software id = 6YZE-HKM8
#
# model = CCR2004-1G-12S+2XS
/interface bridge
add name=bridge1
/interface ethernet
set [ find default-name=sfp28-1 ] auto-negotiation=no mac-address=00:0d:fa:4c:0c:31
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip pool
add name=dhcp_pool0 ranges=10.0.0.2-10.0.0.240
/ip dhcp-server
add address-pool=dhcp_pool0 disabled=no interface=bridge1 lease-time=20m name=dhcp1
/interface bridge port
add bridge=bridge1 interface=sfp-sfpplus1
add bridge=bridge1 interface=sfp-sfpplus2
add bridge=bridge1 interface=sfp-sfpplus3
add bridge=bridge1 interface=sfp-sfpplus4
add bridge=bridge1 interface=sfp-sfpplus5
add bridge=bridge1 interface=sfp-sfpplus6
add bridge=bridge1 interface=sfp-sfpplus7
add bridge=bridge1 interface=sfp-sfpplus8
add bridge=bridge1 interface=sfp-sfpplus9
add bridge=bridge1 interface=sfp-sfpplus10
add bridge=bridge1 interface=sfp-sfpplus11
add bridge=bridge1 interface=sfp-sfpplus12
/ip neighbor discovery-settings
set discover-interface-list=none
/ip address
add address=192.168.88.1/24 comment=defconf interface=ether1 network=192.168.88.0
add address=10.0.0.1/24 interface=bridge1 network=10.0.0.0
/ip dhcp-client
add disabled=no interface=sfp28-1 use-peer-dns=no
/ip dhcp-server lease
add address=10.0.0.54 mac-address=DC:A6:32:02:AA:10
/ip dhcp-server network
add address=10.0.0.0/24 dns-server=10.0.0.1 domain=lan gateway=10.0.0.1
/ip dns
set allow-remote-requests=yes servers=8.8.8.8,8.8.4.4,2001:4860:4860::8888,2001:4860:4860::8844
/ip firewall nat
add action=masquerade chain=srcnat out-interface=sfp28-1 to-addresses=0.0.0.0
/ip firewall service-port
set ftp disabled=yes
set tftp disabled=yes
set irc disabled=yes
set h323 disabled=yes
set sip disabled=yes
set pptp disabled=yes
set udplite disabled=yes
set dccp disabled=yes
set sctp disabled=yes
/ip service
set telnet disabled=yes
set ftp disabled=yes
set www disabled=yes
set www-ssl certificate=webfig disabled=no
set api disabled=yes
set winbox disabled=yes
set api-ssl disabled=yes
/ip ssh
set strong-crypto=yes
/ipv6 address
add address=::1 from-pool=fiber7 interface=bridge1
/ipv6 dhcp-client
add add-default-route=yes interface=sfp28-1 pool-name=fiber7 request=address,prefix use-peer-dns=no
/ipv6 nd
add interface=bridge1 managed-address-configuration=yes other-configuration=yes
/system clock
set time-zone-name=Europe/Zurich
/system logging
add topics=dhcp
/tool bandwidth-server
set enabled=no
/tool mac-server
set allowed-interface-list=none
/tool mac-server mac-winbox
set allowed-interface-list=none
/tool mac-server ping
set enabled=no

at 2021-05-28 12:57

2021-05-20

RaumZeitLabor

GnOnlinePN – Workadventure Edition

Hey, you aioli lovers!

The most fragrant event of the year is coming up: on 5 June things will get garlicious at the GnOnlinePN21!

This year we want to meet in Workadventure from 8 pm onwards and jointly pay homage to the white bulb gold. The whole thing will be accompanied by the sounds of His Serene Highness, RZL resident DJ Asthma!

The digit-ail event also offers plenty of bad-breath-safe distance, so you really don't have to hold back on enjoying garlicky food and drinks.

Let's grill garlic bread allium-together in the World, swap recipes and raise a toast with garlic Tschunk.

We will share the link for joining via Twitter during the afternoon of the event. If you have any questions or ideas, you can reach us via mail or Twitter.

See you then!
Smell you later!

Gnoblauchgnollen

by flederrattie at 2021-05-20 00:00

2021-05-16

sECuREs website

Home network 10 Gbit/s upgrade

After adding a fiber link to my home network, I am upgrading that link from 1 Gbit/s to 10 Gbit/s.

As a reminder, conceptually the fiber link is built using two media converters from/to ethernet:

0.9mm thin fiber cables

Schematically, this is what’s connected to both ends:

1 Gbit/s bottleneck

All links are 1 Gbit/s, so it’s easy to see that, for example, transfers between chuchi↔router7 and storage2↔midna cannot both use 1 Gbit/s at the same time.

This upgrade serves 2 purposes:

  1. Raise the floor to 1 Gbit/s end-to-end: Ensure that serving large files (e.g. distri Linux images and packages) no longer impacts, and is no longer impacted by, other bandwidth flows that also use this link in my home network, e.g. daily backups.

  2. Raise the ceiling to 10 Gbit/s: Make it possible to selectively upgrade Linux PCs on either end of the link to 10 Gbit/s peak bandwidth.

Note that the internet uplink remains untouched at 1 Gbit/s — only transfers within the home network can happen at 10 Gbit/s.

Replacing the media converters with Mikrotik switches

We first replace both media converters and switches with a Mikrotik CRS305-1G-4S+IN.

Mikrotik CRS305-1G-4S+IN

This device costs 149 CHF on digitec and comes with 5 ports:

  • 1 × RJ45 Ethernet port for management, can be used as a regular 1 Gbit/s port.
  • 4 × SFP+ ports

Each SFP+ port can be used with either an RJ-45 Ethernet or a fiber SFP+ module, but beware! As Nexus2kSwiss points out on Twitter, the Mikrotik supports at most 2 RJ-45 SFPs at a time!

Fiber module upgrade

I’m using 10 Gbit/s fiber SFP+ modules for the fiber link between my kitchen and living room.

To make use of the 10 Gbit/s link between the switches, all devices that should get their guaranteed 1 Gbit/s end-to-end connection need to be connected directly to a Mikrotik switch.

I’m connecting the PCs to the switch using Direct Attach Cables (DAC) where possible. The advantage of DAC cables over RJ45 SFP+ modules is their lower power usage and heat.

The resulting list of SFP modules used in the two Mikrotik switches looks like so:

Mikrotik 1 SFP speed speed Mikrotik 2 SFP
chuchi 10 Gbit/s DAC 10 Gbit/s DAC midna
storage2 1 Gbit/s RJ45 1 Gbit/s RJ45 router7
10 Gbit/s BiDi ⬅ BiDi fiber link ➡ 10 Gbit/s BiDi

Hardware sourcing

The total cost of this upgrade is 676 CHF, with the biggest chunk spent on the Mellanox ConnectX-3 network cards and MikroTik switches.

FS (Fiber Store) order

FS.COM was my go-to source for anything fiber-related. Everything they have is very affordable, and products in stock at their German warehouse arrive in Switzerland (and presumably other European countries, too) within the same week.

num price name
1 × 34 CHF Generic Compatible 10GBASE-BX BiDi SFP+ 1270nm-TX/1330nm-RX 10km DOM Transceiver Module, FS P/N: SFP-10G-BX #74681
1 × 34 CHF Generic Compatible 10GBASE-BX BiDi SFP+ 1330nm-TX/1270nm-RX 10km DOM Transceiver Module, FS P/N: SFP-10G-BX #74682
2 × 14 CHF 3m Generic Compatible 10G SFP+ Passive Direct Attach Copper Twinax Cable
0 × 56 CHF SFP+ Transceiver Modul - Generisch kompatibel 10GBASE-T SFP+ Kupfer RJ-45 30m, FS P/N: SFP-10G-T #74680

digitec order

There are a few items that FS.COM doesn’t stock. These I bought at digitec, a big and popular electronics store in Switzerland. My thinking is that if products are available at digitec, they most likely are available at your preferred big electronics store, too.

num price name
2 × 149 CHF Mikrotik CRS305-1G-4S+IN switch

misc order

The Mellanox cards are not as widely available as I’d like.

I’m waiting for an FS.COM card to arrive, which might be a better choice.

num price name
2 × 129 EUR Mellanox ConnectX-3 MCX311A-XCAT

Mikrotik switch setup

I want to use my switches only as switches, not for any routing or other layer 3 features that might reduce bandwidth, so I first reboot the MikroTik CRS305-1G-4S+ into SwOS:

  1. In the web interface menu, navigate to System → Routerboard → Settings, open the Boot OS drop-down and select option SwOS.

  2. In the web interface menu, navigate to System → Reboot.

  3. After the device rebooted, change the hostname which was reset to MikroTik.

Next, upgrade the firmware to 2.12 to fix a weird issue with certain combinations of SFP modules (SFP-10G-BX in SFP1, SFP-10G-T in SFP2):

  1. In the SwOS web interface, select the Upgrade tab, then click Download & Upgrade.

Network card setup (Linux)

After booting with the Mellanox ConnectX3 in a PCIe slot, the card should show up in dmesg(8) :

mlx4_core: Mellanox ConnectX core driver v4.0-0
mlx4_core: Initializing 0000:03:00.0
mlx4_core 0000:03:00.0: DMFS high rate steer mode is: disabled performance optimized steering
mlx4_core 0000:03:00.0: 31.504 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x4 link)
mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
mlx4_en 0000:03:00.0: Activating port:1
mlx4_en: 0000:03:00.0: Port 1: Using 16 TX rings
mlx4_en: 0000:03:00.0: Port 1: Using 16 RX rings
mlx4_en: 0000:03:00.0: Port 1: Initializing port
mlx4_en 0000:03:00.0: registered PHC clock
mlx4_core 0000:03:00.0 enp3s0: renamed from eth0
<mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
<mlx4_ib> mlx4_ib_add: counter index 1 for port 1 allocated 1
mlx4_en: enp3s0: Steering Mode 1
mlx4_en: enp3s0: Link Up

Another way to verify the device is running at maximum speed on the computer’s PCIe bus, is to ensure LnkSta matches LnkCap in the lspci(8) output:

% sudo lspci -vv
03:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
	Subsystem: Mellanox Technologies Device 0055
[…]
	Capabilities: [60] Express (v2) Endpoint, MSI 00
[…]
		LnkCap:	Port #8, Speed 8GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[…]

You can verify your network link is running at 10 Gbit/s using ethtool(8) :

% sudo ethtool enp3s0
Settings for enp3s0:
	Supported ports: [ FIBRE ]
	Supported link modes:   1000baseKX/Full
	                        10000baseKR/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseKX/Full
	                        10000baseKR/Full
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Supports Wake-on: d
	Wake-on: d
        Current message level: 0x00000014 (20)
                               link ifdown
	Link detected: yes

Benchmarking batch transfers

As mentioned in the introduction, routing 10 Gbit/s is out of scope in this article. If you’re interested in routing performance, check out Andree Toonk’s post which confirms that Linux can route 10 Gbit/s at line rate.

The following sections cover individual batch transfers of large files, not many small flows.

iperf3 speed test

Out of the box, the speeds that iperf3(1) measures are decent:

chuchi % iperf3 --version
iperf 3.6 (cJSON 1.5.2)
Linux chuchi 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64
Optional features available: CPU affinity setting, IPv6 flow label, SCTP, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing, authentication

chuchi % iperf3 --server
[…]

midna % iperf3 --version          
iperf 3.9 (cJSON 1.7.13)
Linux midna 5.12.1-arch1-1 #1 SMP PREEMPT Sun, 02 May 2021 12:43:58 +0000 x86_64
Optional features available: CPU affinity setting, IPv6 flow label, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing, authentication

midna % iperf3 --client chuchi.lan
Connecting to host 10.0.0.173, port 5201
[  5] local 10.0.0.76 port 43168 connected to 10.0.0.173 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.10 GBytes  9.42 Gbits/sec    0   1.62 MBytes       
[  5]   1.00-2.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.70 MBytes       
[  5]   2.00-3.00   sec  1.10 GBytes  9.41 Gbits/sec    0   1.70 MBytes       
[  5]   3.00-4.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.78 MBytes       
[  5]   4.00-5.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.87 MBytes       
[  5]   5.00-6.00   sec  1.10 GBytes  9.42 Gbits/sec    0   1.87 MBytes       
[  5]   6.00-7.00   sec  1.10 GBytes  9.42 Gbits/sec    0   1.87 MBytes       
[  5]   7.00-8.00   sec  1.10 GBytes  9.41 Gbits/sec    0   1.87 MBytes       
[  5]   8.00-9.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.96 MBytes       
[  5]   9.00-10.00  sec  1.09 GBytes  9.38 Gbits/sec  402   1.52 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.0 GBytes  9.41 Gbits/sec  402             sender
[  5]   0.00-10.00  sec  11.0 GBytes  9.40 Gbits/sec                  receiver

iperf Done.

HTTP speed test

Downloading a file from an nginx(1) web server using curl(1) is fast, too:

% curl -o /dev/null http://chuchi.lan/distri/supersilverhaze/img/distri-disk.img.zst
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  934M  100  934M    0     0  1118M      0 --:--:-- --:--:-- --:--:-- 1117M

Note that this download was served from RAM (Linux page cache). The next upgrade I need to do in this machine is replace the SATA SSD with an NVMe SSD, because the disk is now the bottleneck.
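
If you want to benchmark the disk path rather than the page cache, drop the caches on the server before repeating the download (same command as in the debiman benchmark earlier; this evicts cached file data so the next read hits the SSD):

chuchi % echo 3 | sudo tee /proc/sys/vm/drop_caches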

Conclusion

This was a pleasantly simple upgrade: plug in a bunch of new hardware and batch transfers become faster.

The Mikrotik switch provides great value for money, and the Mellanox ConnectX-3 cards work well, provided you can find them.

Appendix A: Switching from RJ45 SFP+ modules to Direct Attach Cables

Originally, I connected all PCs to the MikroTik switches with RJ45 SFP+ modules for two reasons:

  1. I bought Intel X550-T2 PCIe 10 Gbit/s network cards that use RJ45 as my first choice.
  2. The SFP+ modules are backwards-compatible and can be used with 1 Gbit/s RJ45 devices, too, which makes for a nice incremental upgrade path.

However, I later was made aware that the RJ45 SFP+ modules use significantly more power and run significantly hotter than Direct Attach Cables (DAC).

I measured it: each RJ45 SFP+ module was causing my BiDi SFP+ module to run 5℃ hotter!
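
If your network card exposes the module’s digital diagnostics (DOM), you can read such temperatures on the Linux side with ethtool; a sketch, assuming the interface name enp3s0 from above:

% sudo ethtool -m enp3s0 | grep -i temperature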

Around 06/02 I replaced one RJ45 SFP+ module with a Direct Attach Cable.

Around 06/06 I replaced the remaining RJ45 SFP+ module with another Direct Attach Cable.

Together, these two changes caused a 10℃ drop in the temperature of the BiDi SFP+ module.

The MikroTik is still uncomfortably hot, making it hard to work with when it’s powered on.

Appendix B: Network card setup (Linux) with Intel X550-T2

For reference, here is the Network card setup (Linux) section, but with the Intel X550-T2 that I previously used.

After booting with the Intel X550-T2 in a PCIe slot, the card should show up in dmesg(8) :

ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver
ixgbe 0000:03:00.0: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16 XDP Queue count = 0
ixgbe 0000:03:00.0: 31.504 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x4 link)
ixgbe 0000:03:00.0: MAC: 4, PHY: 0, PBA No: H86377-006
ixgbe 0000:03:00.0: Intel(R) 10 Gigabit Network Connection
libphy: ixgbe-mdio: probed
ixgbe 0000:03:00.1: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16 XDP Queue count = 0
ixgbe 0000:03:00.1: 31.504 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x4 link)
ixgbe 0000:03:00.1: MAC: 4, PHY: 0, PBA No: H86377-006
tun: Universal TUN/TAP device driver, 1.6
ixgbe 0000:03:00.1: Intel(R) 10 Gigabit Network Connection
libphy: ixgbe-mdio: probed
ixgbe 0000:03:00.0 enp3s0f0: renamed from eth0
ixgbe 0000:03:00.1 enp3s0f1: renamed from eth1
pps pps0: new PPS source ptp1
ixgbe 0000:03:00.0: registered PHC device on enp3s0f0
pps pps1: new PPS source ptp2
ixgbe 0000:03:00.1: registered PHC device on enp3s0f1

I think if you only use 1 of the card’s 2 network ports, you might not hit any bottlenecks even when running the card only at PCIe 3.0 ×2 link speed, but I haven’t verified this!

Another way to verify the device is running at maximum speed on the computer’s PCIe bus, is to ensure LnkSta matches LnkCap in the lspci(8) output:

% sudo lspci -vv
[…]
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
        Subsystem: Intel Corporation Ethernet Converged Network Adapter X550-T2
[…]
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
[…]
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <16us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[…]

You can verify your network link is running at 10 Gbit/s using ethtool(8) :

% sudo ethtool enp3s0f1 
Settings for enp3s0f1:
	Supported ports: [ TP ]
	Supported link modes:   100baseT/Full
	                        1000baseT/Full
	                        10000baseT/Full
	                        2500baseT/Full
	                        5000baseT/Full
	Supported pause frame use: Symmetric
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  100baseT/Full
	                        1000baseT/Full
	                        10000baseT/Full
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: on
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	MDI-X: Unknown
	Supports Wake-on: d
	Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
	Link detected: yes

Appendix C: BIOS update for Mellanox ConnectX-3

On my Supermicro X11SSZ-QF mainboard, the Mellanox ConnectX-3 would not establish a link. The Mellanox Linux kernel driver logged a number of errors:

kernel: mlx4_en: enp1s0: CQE error - cqn 0x8e, ci 0x0, vendor syndrome: 0x57 syndrome: 0x4
kernel: mlx4_en: enp1s0: Related WQE - qpn 0x20d, wqe index 0x0, wqe size 0x40
kernel: mlx4_en: enp1s0: Scheduling port restart
kernel: mlx4_core 0000:01:00.0: Internal error detected:
kernel: mlx4_core 0000:01:00.0: device is going to be reset
kernel: mlx4_core 0000:01:00.0: crdump: devlink snapshot disabled, skipping
kernel: mlx4_core 0000:01:00.0: device was reset successfully
kernel: mlx4_en 0000:01:00.0: Internal error detected, restarting device
kernel: <mlx4_ib> mlx4_ib_handle_catas_error: mlx4_ib_handle_catas_error was started
kernel: <mlx4_ib> mlx4_ib_handle_catas_error: mlx4_ib_handle_catas_error ended
kernel: mlx4_core 0000:01:00.0: command 0x21 failed: fw status = 0x1
kernel: pcieport 0000:00:1c.0: AER: Uncorrected (Fatal) error received: 0000:00:1c.0
kernel: pcieport 0000:00:1c.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
kernel: mlx4_core 0000:01:00.0: command 0x43 failed: fw status = 0x1
kernel: infiniband mlx4_0: ib_query_port failed (-5)
kernel: pcieport 0000:00:1c.0:   device [8086:a110] error status/mask=00040000/00010000
kernel: pcieport 0000:00:1c.0:    [18] MalfTLP                (First)
kernel: pcieport 0000:00:1c.0: AER:   TLP Header: 4a000001 01000004 00000000 00000000
kernel: mlx4_core 0000:01:00.0: mlx4_pci_err_detected was called
kernel: mlx4_core 0000:01:00.0: Fail to set mac in port 1 during unregister
systemd-networkd[313]: enp1s0: Link DOWN
kernel: mlx4_en: enp1s0: Failed activating Rx CQ
kernel: mlx4_en: enp1s0: Failed restarting port 1
kernel: mlx4_en: enp1s0: Link Down
kernel: mlx4_en: enp1s0: Close port called
systemd-networkd[313]: enp1s0: Lost carrier
kernel: mlx4_en 0000:01:00.0: removed PHC
kernel: mlx4_core 0000:01:00.0: mlx4_restart_one_up: ERROR: mlx4_load_one failed, pci_name=0000:01:00.0, err=-5
kernel: mlx4_core 0000:01:00.0: mlx4_restart_one was ended, ret=-5
systemd-networkd[313]: enp1s0: DHCPv6 lease lost
kernel: pcieport 0000:00:1c.0: AER: Root Port link has been reset
kernel: mlx4_core 0000:01:00.0: mlx4_pci_resume was called
kernel: mlx4_core 0000:01:00.0: Multiple PFs not yet supported - Skipping PF
kernel: mlx4_core 0000:01:00.0: mlx4_pci_resume: mlx4_load_one failed, err=-22
kernel: pcieport 0000:00:1c.0: AER: device recovery successful

What helped was to update the X11SSZ-QF BIOS to the latest version.

at 2021-05-16 15:33

2021-05-11

michael-herbst.com

Infomath seminar: A one-hour introduction to Julia

Just one day after my talk at the Lüchow group in Aachen, on 6th May I was asked to give a short introduction to Julia at the Infomath seminar series at Sorbonne Université. While virtual seminars certainly don't share the same spirit as in-person ones do, the ability to quickly hop between seminar series organised all across the world has advantages, too.

In my one-hour talk I gave a short introduction to Julia, focusing on the perspective of applied mathematicians. I gave a short speed comparison of Julia, Python and C on a simple example and presented some of Julia's strengths in different application scenarios (numerical linear algebra, numerical methods for solving PDEs, data science and statistical learning). For future reference I have put the Jupyter notebooks I used during the lecture on GitHub.

Link Licence
An introductory hour to the Julia programming language (Github repository) Creative Commons License

by Michael F. Herbst at 2021-05-11 10:01 under talk, Julia, programming and scripting

2021-05-11

michael-herbst.com

Talk at Lüchow group seminar at RWTH

On 5th May I was invited to present a short summary of my research at the local theoretical chemistry research group of Prof. Dr. Arne Lüchow at RWTH Aachen. Because I wanted to give a broad overview of the topics I have worked on over the past few years, I did not go into many details. Nevertheless my talk led to interesting and lively discussions, which I enjoyed very much. Clearly the pandemic makes it difficult to get in touch with other researchers, so I was pretty happy to get in touch with some more chemists from RWTH during the seminar.

Link Licence
High-throughput electronic-structure simulations: Where reliability really matters (Slides) Creative Commons License

by Michael F. Herbst at 2021-05-11 10:00 under talk, electronic structure theory, Kohn-Sham, high-throughput, DFT, solid state

2021-05-08

sECuREs website

Measure and reduce keyboard input latency with QMK on the Kinesis Advantage

Over the last few years, I worked on a few projects around keyboard input latency:

In 2018, I introduced the kinX keyboard controller with 0.2ms of input latency.

In 2020, I introduced the kinT keyboard controller, which works with a wide range of Teensy micro controllers, and both the old KB500 and the newer KB600 Kinesis Advantage models.

While the 2018 kinX controller had built-in latency measurement, I was starting from scratch with the kinT design, where I wanted to use the QMK keyboard firmware instead of my own firmware.

That got me thinking: instead of adjusting the firmware to self-report latency numbers, is there a way we can do latency measurements externally, ideally without software changes?

This article walks you through how to set up a measurement environment for your keyboard controller’s input latency, be it original or self-built. I’ll use a Kinesis Advantage keyboard, but this approach should generalize to all keyboards.

I will explain a few common causes for extra keyboard input latency and show you how to fix them in the QMK keyboard firmware.

Measurement setup

The idea is to connect a Teensy 4.0 (or similar), which simulates pressing the Caps Lock key and measures the duration until the keypress results in a Caps Lock LED change.

We use the Caps Lock key because it is one of the few keys that results in an LED change.

Here you can see the Teensy 4.0 connected to the kinT controller, connected to a laptop:

measurement setup

Enable the debug console in QMK

Let’s get our QMK working copy ready for development! I like to work in a separate QMK working copy per project:

% docker run -it -v $PWD:/usr/src archlinux
# pacman -Sy && pacman -S qmk make which diffutils python-hidapi python-pyusb
# cd /usr/src
# qmk clone -b develop qmk/qmk_firmware $PWD/qmk-input-latency
# cd qmk-input-latency

I compile the firmware for my keyboard like so:

# make kinesis/kint36:stapelberg

To enable the debug console, I need to edit my QMK keymap stapelberg by updating keyboards/kinesis/keymaps/stapelberg/rules.mk to contain:

CONSOLE_ENABLE = yes

After compiling and flashing the firmware, the hid_listen tool will detect the device and listen for QMK debug messages:

% sudo hid_listen
Waiting for device:...
Listening:

Finding the pins

Let’s locate the Caps Lock key’s corresponding row and column in our keyboard matrix!

We can make QMK show which keys are recognized after each scan by adding to keyboards/kinesis/keymaps/stapelberg/keymap.c the following code:

void keyboard_post_init_user() {
  debug_config.enable = true;
  debug_config.matrix = true;
}

Now we’ll see in the hid_listen output which key is active when pressing Caps Lock:

r/c 01234567
00: 00100000
01: 00000000
[…]

For our kinT controller, Caps Lock is on QMK matrix row 0, column 2.

In the kinT schematic, the corresponding signals are ROW_EQL and COL_2.

To hook up the Teensy 4.0 latency measurement driver, I am making the following GPIO connections to the kint36, kint41 or kint2pp (with voltage converter!) keyboard controllers:

driver (Teensy 4.0) signal kint36, kint41 kint2pp (5V!)
GND GND GND GND
pin 10 ROW_EQL pin 8 D7
pin 11 COL_2 pin 15 F7
pin 12 LED_CAPS_LOCK pin 12 C1

Eager Caps Lock LED

When the host signals to the keyboard that Caps Lock is now turned on, the QMK firmware first updates a flag in the USB interrupt handler, but only updates the Caps Lock LED pin after the next matrix scan has completed.

This is fine in normal usage, but our measurement readings will get more precise if we immediately update the Caps Lock LED pin. We can do this in set_led_transfer_cb in tmk_core/protocol/chibios/usb_main.c, which is called from the USB interrupt handler:

#include "gpio.h"

static void set_led_transfer_cb(USBDriver *usbp) {
    if (usbp->setup[6] == 2) { /* LSB(wLength) */
        uint8_t report_id = set_report_buf[0];
        if ((report_id == REPORT_ID_KEYBOARD) || (report_id == REPORT_ID_NKRO)) {
            keyboard_led_state = set_report_buf[1];
        }
    } else {
        keyboard_led_state = set_report_buf[0];
    }
    if ((keyboard_led_state & 2) != 0) {
      writePinLow(C7); // turn on CAPS_LOCK LED
    } else {
      writePinHigh(C7); // turn off CAPS_LOCK LED
    }
}

Host side (Linux)

On the USB host, i.e. the Linux computer, I switch to a Virtual Terminal (VT) by stopping my login manager (killing my current graphical session!):

% sudo systemctl stop gdm

With the Virtual Terminal active, we know that the Caps Lock key press will be handled entirely in kernel driver code without having to round-trip to userspace.

We can verify this by collecting stack traces with bpftrace(8) when the kernel executes the kbd_event function in drivers/tty/vt:

% sudo bpftrace -e 'kprobe:kbd_event { @[kstack] = count(); }'

After pressing Caps Lock and cancelling the bpftrace process, you should see a stack trace.

I then measured the baseline end-to-end latency, using my measure-fw firmware running on the FRDM-K66F eval kit, a cheap and widely available USB 2.0 High Speed device. The firmware measures the latency between a button press and the USB HID report for the Caps Lock LED, but without any additional matrix scanning delay or similar:

% cat /dev/ttyACM0
sof=74 μs	report=393 μs
sof=42 μs	report=512 μs
sof=19 μs	report=512 μs
sof=39 μs	report=488 μs
sof=20 μs	report=518 μs
sof=90 μs	report=181 μs
sof=42 μs	report=389 μs
sof=7 μs	report=319 μs

This is the quickest reaction we can get out of this computer. Anything on top (e.g. X11, application) will be slower, so this measurement establishes a lower bound.

Code to simulate key presses and take measurements

I’m running the latencydriver Arduino sketch, with the Arduino IDE configured for:

Teensy 4.0 (USB Type: Serial, CPU Speed: 600 MHz, Optimize: Faster)

Here’s how we set up the pins in the measurement driver Teensy 4.0:

void setup() {
  Serial.begin(9600);

  // Connected to kinT pin 15, COL_2
  pinMode(11, OUTPUT);
  digitalWrite(11, HIGH);

  // Connected to kinT pin 8, ROW_EQL.
  // Pin 11 will be high/low in accordance with pin 10
  // to simulate a key-press, and always high (unpressed)
  // otherwise.
  pinMode(10, INPUT_PULLDOWN);
  attachInterrupt(digitalPinToInterrupt(10), onScan, CHANGE);

  // Connected to the kinT LED_CAPS_LOCK output:
  pinMode(12, INPUT_PULLDOWN);
  attachInterrupt(digitalPinToInterrupt(12), onCapsLockLED, CHANGE);
}

In order to make a key read as pressed, we need to connect the column with the row in the keyboard matrix, but only while that row is being scanned. We do that in the interrupt handler like so:

bool simulate_press = false;

void onScan() {
  if (simulate_press) {
    // connect row scan signal with column read
    digitalWrite(11, digitalRead(10));
  } else {
    // always read not pressed otherwise
    digitalWrite(11, HIGH);
  }
}

In our text interface, we can now start a measurement like so:

caps_lock_on_to_off = capsLockOn();
Serial.printf("# Caps Lock key pressed (transition: %s)\r\n",
  caps_lock_on_to_off ? "on to off" : "off to on");
simulate_press = true;
t0 = ARM_DWT_CYCCNT;
emt0 = 0;
eut0 = 0;

The next keyboard matrix scan will detect the key as pressed, send the HID report to the OS, and when the OS responds with its HID report containing the Caps Lock LED status, our Caps Lock LED interrupt handler is called to finish the measurement:

void onCapsLockLED() {
  const uint32_t t1 = ARM_DWT_CYCCNT;
  const uint32_t elapsed_millis = emt0;
  const uint32_t elapsed_micros = eut0;
  uint32_t elapsed_nanos = (t1 - t0) / cycles_per_ns;

  Serial.printf("# Caps Lock LED (pin 12) is now %s\r\n", capsLockOn() ? "on" : "off");
  Serial.printf("# %u ms == %u us\r\n", elapsed_millis, elapsed_micros);
  Serial.printf("BenchmarkKeypressToLEDReport 1 %u ns/op\r\n", elapsed_nanos);
  Serial.printf("\r\n");
}

Running measurements

Connect the Teensy 4.0 to your computer and open its USB serial console:

% screen /dev/ttyACM0 115200

You should be greeted by a welcome message:

# kinT latency measurement driver
#   t  - trigger measurement

To save your measurements to file, use C-a H in screen to make it write to file screenlog.0.

Press t a few times to trigger a few measurements and close screen using C-a k.

You can summarize the measurements using benchstat:

% benchstat screenlog.0
name                 time/op
KeypressToLEDReport  1.82ms ±20%

Scan-to-scan delay

The measurement output on the USB serial console also contains the matrix scan-to-scan delay:

# scan-to-scan delay: 422475 ns

Each keyboard matrix scan turns on each row one-by-one, then reads all the columns.

This means that in each matrix scan, ROW_EQL will be set high once, then low again.

The Teensy 4.0 measures scan-to-scan delay by timing the activations of ROW_EQL.
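Here is a sketch of how that timing can be done on the driver side (my own illustration of the approach, not necessarily the exact latencydriver code): on every rising edge of ROW_EQL, remember the cycle counter and keep the delta to the previous rising edge.

// Sketch: time successive rising edges of ROW_EQL (pin 10) using the cycle
// counter; onRowEql and scan_to_scan_cycles are illustrative names.
volatile uint32_t scan_to_scan_cycles = 0;
static uint32_t last_row_eql = 0;

void onRowEql() {
  if (digitalRead(10) == HIGH) {  // rising edge: this row is selected again
    const uint32_t now = ARM_DWT_CYCCNT;
    scan_to_scan_cycles = now - last_row_eql;
    last_row_eql = now;
  }
}

// In loop(), convert cycles to nanoseconds (0.6 cycles/ns at 600 MHz) and
// print, e.g.:
//   Serial.printf("# scan-to-scan delay: %u ns\r\n", scan_to_scan_cycles * 10 / 6);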

We can verify this approach by making QMK self-report its scan rate. Enable the matrix scan rate debug option in keyboards/kinesis/keymaps/stapelberg/config.h like so:

#pragma once

#define DEBUG_MATRIX_SCAN_RATE

Using hid_listen we can now see the following QMK debug messages:

% sudo hid_listen
Waiting for new device:..
Listening:
matrix scan frequency: 2300
matrix scan frequency: 2367
matrix scan frequency: 2367

A matrix scan rate/frequency of 2367 scans per second corresponds to 422μs per scan:

1000000 μs / 2367 scans/second = 422μs

Yet another way of verifying the approach is by short-circuiting an end-to-end measurement with a one-line change in our QMK keyboard code:

bool process_action_kb(keyrecord_t *record) {
#define LED_CAPS_LOCK LINE_PIN12
#define ledTurnOn writePinLow
  ledTurnOn(LED_CAPS_LOCK);
  return true;
}

Repeating the measurements, this gives us:

% benchstat screenlog.0     
name                 time/op
KeypressToLEDReport  693µs ±26%

This value is between [0, 2 * 422μs] because a key might be pressed right after it was already scanned by the in-progress matrix scan, meaning it will need to wait until the next scan completes (!) before it can be registered as pressed.

Measurement harness

Now that we have our general measurement environment all set up, it’s time to connect our Teensy 4.0 to a few different keyboard controllers!

kint36, kint41: GPIO

If you have an un-soldered micro controller you want to measure, setup is easy: just connect all GPIOs to the Teensy 4.0 latency test driver directly! I’m using this for the kint36 and kint41:

GPIO measurement

(build in /home/michael/kinx/kintpp/rebased, last results in screenlog-kint36-eager-caps.0)

kint2pp: 5V

Because the Teensy++ uses 5V logic levels, we need to convert the levels from/to 3.3V. This is easily done using e.g. the SparkFun Logic Level Converter (Bi-Directional) on a breadboard:

kint2pp with level shifter

kinX: FPC

But what if you have a design where the micro controller doesn’t come standalone, only soldered to a keyboard controller board, such as my earlier kinX controller?

You can use a spare FPC connector (Molex 39-53-2135) and solder jumper wires to the pins for COL_2 and ROW_EQL. For Caps Lock and Ground, I soldered jumper wires to the board:

kinX measurement

Original Kinesis controller

But what if you don’t want to solder jumper wires directly to the board?

The least invasive method is to connect the FPC connector break-out, and hold probe heads onto the contacts while doing your measurements:

kinesis original controller measurement

QMK input latency

Now that the measurement hardware is set up, we can go through the code.

The following sections each cover one possible contributor to input latency.

Eager debounce

Key switches don’t generate a clean signal when pressed, instead they show a ripple effect. Getting rid of this ripple is called debouncing, and every keyboard firmware does it.

See QMK’s documentation on the Debounce API for a good explanation of the differences between the different debounce approaches.

QMK’s default debounce algorithm sym_defer_g is chosen very cautiously. I don’t know the specific criteria for which types of key switches suffer from noise and therefore need the sym_defer_g algorithm, but I know that Cherry MX key switches with diodes, as used in the Kinesis Advantage, don’t have noise and hence can use the other debounce algorithms, too.
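To make the difference concrete, here is a simplified single-key sketch of the two strategies (my own illustration, not QMK’s actual implementation, which operates on whole matrices):

#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_MS 5  // QMK's default debounce time

// Deferred: report a change only after the raw signal has been stable for
// DEBOUNCE_MS. Robust against noise, but adds DEBOUNCE_MS of latency.
bool defer_update(bool raw, uint32_t now_ms) {
  static bool stable = false, last_raw = false;
  static uint32_t last_change = 0;
  if (raw != last_raw) { last_raw = raw; last_change = now_ms; }
  if (raw != stable && (now_ms - last_change) >= DEBOUNCE_MS) stable = raw;
  return stable;
}

// Eager: report the first edge immediately, then ignore further changes for
// DEBOUNCE_MS. No added latency, but requires switches without noise.
bool eager_update(bool raw, uint32_t now_ms) {
  static bool reported = false;
  static uint32_t locked_until = 0;
  if (raw != reported && now_ms >= locked_until) {
    reported = raw;
    locked_until = now_ms + DEBOUNCE_MS;
  }
  return reported;
}

The deferred variant inherently adds the debounce time to every key press, which matches the extra ≈5ms measured below.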

While the default sym_defer_g debounce algorithm is robust, it also adds 5ms of input latency:

% benchstat screenlog-kint36.0
name                 time/op
KeypressToLEDReport  7.61ms ± 8%

For lower input latency, we need an eager algorithm. Specifically, I am choosing the sym_eager_pk debounce algorithm by adding the following to my keyboards/kinesis/kint36/rules.mk:

DEBOUNCE_TYPE = sym_eager_pk

Now, the extra 5ms are gone:

% benchstat screenlog-kint36-eager.0
name                 time/op
KeypressToLEDReport  2.12ms ±16%

Example change: https://github.com/qmk/qmk_firmware/pull/12626

Quicker USB polling interval

The USB host (computer) divides time into fixed-length segments called frames:

  • USB Full Speed (USB 1.0) uses frames that are 1ms each.
  • USB High Speed (USB 2.0) introduces micro frames, which are 125μs.

Each USB device specifies in its device descriptor how frequently (in frames) the device should be polled. The quickest polling rate for USB 1.0 is 1 frame, meaning the device can send data after at most 1ms. Similarly, for USB 2.0, it’s 1 micro frame, i.e. the device can send data every 125μs.
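For reference, the bInterval value lives in each interrupt endpoint descriptor within the device’s descriptor set. A generic interrupt IN endpoint descriptor might look like this (an illustration only, not QMK’s actual descriptor code; the example values happen to match the kint41 output shown further below):

// Generic USB interrupt IN endpoint descriptor (7 bytes).
static const uint8_t hid_in_endpoint_descriptor[7] = {
    0x07,        // bLength
    0x05,        // bDescriptorType: ENDPOINT
    0x81,        // bEndpointAddress: IN endpoint 1
    0x03,        // bmAttributes: interrupt transfers
    0x08, 0x00,  // wMaxPacketSize: 8 bytes
    0x01,        // bInterval: 1 -> every frame (1ms) at Full Speed,
                 //            every micro frame (125μs) at High Speed
};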

Of course, a quicker polling rate also means occupying resources on the USB bus which are then no longer available to other devices. On larger USB hubs, this might mean fewer devices can be used concurrently. The specifics of this limitation depend on a lot of other factors, too. The polling rate plays a role, in combination with the max. packet size and the number of endpoints.

Note that we are only talking about concurrent device usage, not about hogging bandwidth: the bulk transfers that USB mass storage devices use are not any slower in my tests. I achieve about 37 MiB/s with or without the kint41 USB 2.0 High Speed controller with bInterval=1 present.

Even connecting two kint41 controllers at the same time still leaves enough resources to use a Logitech C920 webcam in its most bandwidth-intensive pixel format and resolution. The same cannot be said for e.g. NXP’s LPC-Link2 debug probe.

To display the configured interval, the Linux kernel provides a debug pseudo file:

% sudo cat /sys/kernel/debug/usb/devices

[…]
T:  Bus=01 Lev=02 Prnt=09 Port=02 Cnt=02 Dev#= 53 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1209 ProdID=345c Rev= 0.01
S:  Manufacturer="https://github.com/stapelberg"
S:  Product="kinT (kint41)"
C:* #Ifs= 3 Cfg#= 1 Atr=a0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=01 Prot=01 Driver=usbhid
E:  Ad=81(I) Atr=03(Int.) MxPS=   8 Ivl=125us
I:* If#= 1 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
E:  Ad=82(I) Atr=03(Int.) MxPS=  32 Ivl=125us
I:* If#= 2 Alt= 0 #EPs= 2 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
E:  Ad=83(I) Atr=03(Int.) MxPS=  32 Ivl=125us
E:  Ad=04(O) Atr=03(Int.) MxPS=  32 Ivl=125us
[…]

Alternatively, you can display the USB device descriptor using e.g. sudo lsusb -v -d 1209:345c and interpret the bInterval setting yourself.

The above shows the best case: a USB 2.0 High Speed device (Spd=480) with bInterval=1 in its device descriptor (Ivl=125us).

The original Kinesis Advantage 2 keyboard controller (KB600) uses USB 2.0, but in Full Speed mode (Spd=12), i.e. no faster than USB 1.1. In addition, they specify bInterval=10, which results in a 10ms polling interval (Ivl=10ms):

T:  Bus=01 Lev=02 Prnt=09 Port=02 Cnt=02 Dev#= 52 Spd=12   MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=29ea ProdID=0102 Rev= 1.00
S:  Manufacturer=Kinesis
S:  Product=Advantage2 Keyboard
C:* #Ifs= 3 Cfg#= 1 Atr=a0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=01 Prot=02 Driver=usbhid
E:  Ad=83(I) Atr=03(Int.) MxPS=   8 Ivl=10ms
I:* If#= 1 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=01 Prot=01 Driver=usbhid
E:  Ad=84(I) Atr=03(Int.) MxPS=   8 Ivl=2ms
I:* If#= 2 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
E:  Ad=85(I) Atr=03(Int.) MxPS=   8 Ivl=2ms

My recommendation:

  • With USB 1.1 Full Speed, definitely specify bInterval=1. I’m not aware of any downsides.
  • With USB 2.0 High Speed, I also think bInterval=1 is a good choice, but I am less certain. If you run into trouble, reduce to bInterval=3 and send me a message :)

For details on measuring, see Appendix B: USB polling interval (device side).

Example change: https://github.com/qmk/qmk_firmware/pull/12625

Faster matrix scan

The purpose of a keyboard controller is reporting pressed keys after scanning the key matrix. The more scans a keyboard controller can do per second, the faster it can react to your key press.

How many scans your controller does depends on multiple factors:

  • The clock speed of your micro controller. It’s worth checking whether your micro controller model supports running at faster clock speeds, or whether you can upgrade your keyboard controller to a faster micro controller to begin with. There is a point of diminishing returns, which I would guess is at ≈100 MHz. Comparing e.g. the kint36 at 120 MHz vs. 180 MHz, the difference in scan-to-scan delay is 5μs.

  • How much other code your firmware runs aside from matrix scanning. If you enable any non-standard QMK features, or even self-written code, it’s worth disabling and measuring.

  • Whether you run scans back-to-back or e.g. synchronized with USB start-of-frame interrupts. QMK runs scans back-to-back, so this point is only relevant for other firmwares.

  • How long you need to sleep to let the signal settle. Reducing your sleep times results in more scans per second, but if you don’t sleep long enough, you’ll see ghost key presses. See also the next section about Shorter sleeps.

For details on measuring, see the Scan-to-scan delay section above.

I also tried configuring the GPIOs to be faster to see if that would reduce the required unselect delay, but unfortunately there was no difference between the default setting and the fastest setting: drive strength 6 (DSE=6), fast slew rate (SRE=1), 200 MHz (SPEED=3).

Shorter sleeps

QMK calls ChibiOS’s chThdSleepMicroseconds function in its matrix scanning code. This function unfortunately has a rather long shortest sleep duration of 1 ChibiOS tick: if you tell it to sleep less than 100μs, it will still sleep at least 100μs!

This is a problem on controllers such as the kint41, where we want to sleep for only 10μs.

The length of a ChibiOS tick is determined by how the ARM SysTick timer is set up on the specific micro controller you’re using. While the SysTick timer itself could be configured to fire more frequently, it is not advisable to shorten ChibiOS ticks: chSysTimerHandlerI() must be executable in less than one tick.

Instead, I found it easier to implement short delays by busy-looping until the ARM Cycle Counter Register (CYCCNT) indicates enough time has passed. Here’s an example from keyboards/kinesis/kint41/kint41.c:

// delay_inline sleeps for |cycles| (e.g. sleeping for F_CPU will sleep 1s).
//
// delay_inline assumes the cycle counter has already been initialized and
// should not be modified, i.e. is safe to call during keyboard matrix scan.
//
// ChibiOS enables the cycle counter in chcore_v7m.c.
static void delay_inline(const uint32_t cycles) {
  const uint32_t start = DWT->CYCCNT;
  while ((DWT->CYCCNT - start) < cycles) {
    // busy-loop until time has passed
  }
}

void matrix_output_unselect_delay(void) {
  // 600 cycles at 0.6 cycles/ns == 1μs
  const uint32_t cycles_per_us = 600;
  delay_inline(10 * cycles_per_us);
}

Of course, the cycles/ns value is specific to the frequency at which your micro controller runs, so this code needs to be adjusted for each platform.
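To avoid hard-coding the 600, the value can also be derived from the core clock. A small sketch, assuming an F_CPU-style macro holding the CPU frequency in Hz is available (as on Teensy/Arduino builds; ChibiOS-based builds may expose a different macro):

// Sketch only: derive cycles per microsecond from the core clock.
static inline uint32_t cycles_per_us(void) {
  return (uint32_t)(F_CPU / 1000000UL);  // 600 MHz -> 600 cycles per μs
}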

Results

With the QMK keyboard firmware configured for lowest input latency, how do the different Kinesis keyboard controllers compare? Here are my measurements:

model CPU speed USB poll interval scan-to-scan scan rate caps-to-report
kint41 600 MHz 125μs 181μs 5456 scans/s 930µs ±17%
kinX 120 MHz 125μs 213μs 4694 scans/s 953µs ±15%
kint36 180 MHz 1000μs 444μs 2252 scans/s 1.97ms ±15%
kint2pp 16 MHz 1000μs 926μs 1078 scans/s 3.27ms ±32%
original 60 MHz 10000μs 1936μs 516 scans/s 13.6ms ±21%

The changes required to obtain these results are included since QMK 0.12.38 (2021-04-20).

Support for the kint41 is being added to QMK with all required changes included from the start, but is still in progress.

The following sections go into detail about the results.

kint41

I am glad that the most recent Teensy 4.1 micro controller takes the lead! The kinX controller achieved similar numbers, but was quite difficult to build, so few people ended up using it.

The key improvement compared to the Teensy 3.6 is the now-available USB 2.0 High Speed, and the powerful clock speed of 600 MHz allows for an even faster matrix scan rate.

kinX

In my previous article about the kinX controller, I measured the kinX scan delay as ≈100μs. During my work on this article, I learnt that the ≈100μs figure was misleading: the measurement code turned off interrupts to measure only the scan function. While that is technically correct, it is not a useful measure, as in practice, interrupts should not be disabled, and the scanning function is interrupted frequently enough that it comes in at ≈208μs.

I also fixed the USB polling interval in the kinX firmware, which wasn’t set to bInterval=1.

Original Kinesis

The original keyboard controller that the Kinesis Advantage 2 (KB600) keyboard comes with uses an AT32UC3B0256 micro controller which is clocked at 60 MHz, but the measured input latency is much higher than even the slowest kint controller (kint2pp at 16 MHz). What gives?

Here’s what we can deduce without access to their firmware:

  1. They seem to be using an eager debounce algorithm (good!), otherwise we would observe even higher latency.
  2. Their USB polling interval setting (bInterval=10) is excessively high, even more so because they are using USB Full Speed with longer USB frames. I would recommend they change it to bInterval=1 for up to 10ms less input latency!
  3. The matrix scan rate is twice as slow as with my kint2pp. I can’t say for sure why this is. Perhaps their firmware does a lot of other things between matrix scans.

Note that we could not apply the Eager Caps Lock LED firmware change to the original controller, which is why the measurement variance is ±21%. This variance includes ± 1.9ms for finishing a matrix scan before updating the LED state.

Conclusion

After analyzing the different controllers in my measurement environment, I think the following factors play the largest role in keyboard input latency, ordered by importance:

  1. Does the firmware use an eager debounce algorithm?
  2. Does the device specify a quick USB polling rate (bInterval setting)?
  3. Is the matrix scan frequency in the expected range, or are there unexpected slow-downs?

Hopefully, this article gives you all the tools you need to measure and reduce keyboard input latency of your own keyboard controller!

Appendix A: isitsnappy

The iPhone app Is It Snappy? records video using the iPhone’s 240 fps camera and allows you to mark the frames that start and end the measurement.

The app does a good job of making this otherwise tedious process of navigating a video frame by frame much more pleasant.

However, for measuring keyboard input latency, I think this approach is futile:

  • The resolution is too imprecise. At 240 fps, each frame represents ≈4.2ms of time, which is already higher than the input latency of our slowest micro controller.
  • Visually deciding whether a key switch is pressed or not pressed, at frame-perfect precision, seems impossible to me.

I believe the app can work, provided the latency you want to measure is really high. But with the devices covered in this article, the app couldn’t measure even 10ms of injected input latency.

Appendix B: USB polling interval (device side)

You can also verify the USB polling interval on the device side. In the SOF (Start Of Frame) interrupt in tmk_core/protocol/chibios/usb_main.c, we can print the cycle delta to the previous SOF callback, every second:

#include "timer.h"

static uint32_t last_sof = 0;
static uint32_t sof_timer = 0;
void kbd_sof_cb(USBDriver *usbp) {
  (void)usbp;

  uint32_t now = DWT->CYCCNT;
  uint32_t delta = now - last_sof;
  last_sof = now;

  uint32_t timer_now = timer_read32();
  if (TIMER_DIFF_32(timer_now, sof_timer) > 1000) {
    sof_timer = timer_now;
    dprintf("sof delta: %u cycles", delta);
  }
}

Using hid_listen, we expect to see ≈75000 cycles of delta, which corresponds to the 125μs microframe latency of USB 2.0 High Speed with bInterval=1 in the USB device descriptor:

125μs * 1000 * 0.6 cycles/ns = 75000 cycles

at 2021-05-08 13:57

2021-05-07

michael-herbst.com

DFTK: A Julian approach for simulating electrons in solids

Following my talk at Juliacon about our DFTK code last year (slides, recording, blog article), we have now published an extended abstract in the JuliaCon proceedings, which you can find below. The JuliaCon proceedings use the same Open Journals software stack to manage their publication infrastructure as the Journal of Open Source Software. This stack is actually pretty impressive, since it reduces the effort on both the reviewer and the author side to comments within GitHub issues. Since the complete exchange (including the review process) is thus public, this is not only convenient, but also leads to a truly transparent publication process. I wish publishing with all journals was like that ...

by Michael F. Herbst at 2021-05-07 22:30 under talk, electronic structure theory, Julia, HPC, DFTK, theoretical chemistry, SCF, high-throughput

2021-05-01

michael-herbst.com

Thoughts on initial guess methods for DFT

On Thursday I gave a brief talk in our weekly ACED differentiate group meeting about initial guess methods for starting self-consistent field calculations in methods such as density-functional theory. In preparation for the talk I did a little digging into the standard approaches used by many molecular and solid-state codes and reviewed the literature on some recent ideas motivated by reduced-order modelling and data science. The slides of my talk (which include most references I found) are attached below.

Link Licence
Thoughts on initial guess methods for DFT (Slides) Creative Commons License

by Michael F. Herbst at 2021-05-01 10:00 under talk, electronic structure theory, Kohn-Sham, high-throughput, DFT, solid state

2021-04-30

RaumZeitLabor

Six-month anniversary at the RaumZweitLabor – build-up love in the time of Corona

In the new RZL we are currently still celebrating every weekly and monthly anniversary, like a teenage couple in love. Unbelievable that our “six-month anniversary” is already coming up. <3

To let you share in our happiness, here is a small update on all the exciting things that have happened recently, in compliance with the Corona restrictions:

Unpacking boxes for the first time; converting, dividing and furnishing new rooms for the first time; disposing of 5 cubic metres of mineral wool (KMF) and other legacy waste for the first time; the first broken passenger lift, water in the freight lift shaft in the basement and communication problems with the landlord; submitting invoices to the insurance for the first time; address changes everywhere for the first time; the fire brigade’s first fire-prevention inspection of the new space; the first winter with working heating; putting nice, colourful decoration on the new walls for the first time; building laboratory tables ourselves for the first time; and much, much more…

We have also become part of the Rat für Kunst und Kultur Mannheim in the section for cultural education and socioculture, are working behind the scenes on exciting projects for the post-Corona era, and have finally had the new rooms certified with the “Inte approved” seal!

But so that this fresh relationship does not turn into a quickly faded romance, we continue to rely on your participation in (build-up) activities and are happier than ever about one-off and lasting tokens of affection.

Halbjahr21-Collage

by flederrattie at 2021-04-30 00:00

2021-04-27

sECuREs website

Linux and USB virtual serial devices (CDC ACM)

During my work on Teensy 4.1 support in ChibiOS for the QMK keyboard firmware, I noticed that ChibiOS’s virtual serial device USB demo would sometimes print garbled output, and that I would never see the ChibiOS shell prompt.

This article walks you through diagnosing and working around this issue, in the hope that it helps others who are working with micro controllers and USB virtual serial devices.

Background

When working with micro controllers, a serial interface is often the easiest way to print text: you only connect GND and the micro controller’s serial TX pin to a USB-to-serial converter. The RX pin is only needed when you want to send text to the micro controller as well.

While conceptually simple, the requirement for an extra piece of hardware (USB-to-serial adapter) is annoying. If your micro controller has a working USB interface and USB stack, a popular alternative is for the micro controller to provide a virtual serial device via USB.

This way, you just need one USB cable between your micro controller and computer, reusing the same connection you already use for programming the device.

A popular choice within this solution is to provide a device conforming to the USB Communications Device Class (CDC) standard, specifically its Abstract Control Model (ACM), which is typically used for modem hardware.

On Linux, these devices show up as e.g. /dev/ttyACM0. In case you’re wondering: /dev/ttyUSB0 device names are used by more specific drivers (vendor-specific). The blog post What is the difference between /dev/ttyUSB and /dev/ttyACM? goes into a lot more detail.

ModemManager

One unfortunate side-effect of using a modem standard to provide a generic serial device is that modem-related software might mistake our micro controller for a modem.

ModemManager might otherwise open and probe any new serial device. Use the following command to disable ModemManager until the next reboot:

% sudo systemctl mask --runtime --now ModemManager

Problem statement

With a regular, non-USB serial interface, you can send data at any time. If nobody is receiving the data on the other end, the micro controller doesn’t care and still writes serial data.

When using the ChibiOS shell with a regular serial interface, this means that if you open the serial interface too late, you will not see the ChibiOS shell prompt. But, if you have the serial interface already opened when powering on your device, you will be greeted by ChibiOS’s shell prompt:

ChibiOS/RT Shell
ch> 

With a USB serial, however, the host will not transfer data from the device until the serial interface is opened. This means that writes to the USB serial can block, whereas writes to the UART serial will not block but may go ignored if nobody is listening.

So when I open the USB serial interface, I would expect to see the ChibiOS shell prompt like above. Instead, I would often not see any prompt at all, and I would even sometimes see garbled output like this:

cch> biOS/RT She

USB analysis with Wireshark

Wireshark allows us to analyze USB traffic in combination with the usbmon Linux kernel module.

Looking through the captured packets, I noticed unexpected packets from the host (computer) to the device (micro controller), specifically containing the following bytes:

  1. hex 0xa = ASCII \n
  2. hex 0xd = ASCII \r

Seeing any packets in this direction is unexpected, because I am only opening the serial interface for reading, and I am not consciously sending anything. So where do the packets come from?

To verify I am not missing any nuance of the CDC protocol, I added debug statements to the ChibiOS shell to log any incoming data. The \n\r bytes indeed make it to the ChibiOS shell.

When the shell receives a line break, it prints a new prompt. This seems to be the reason why I’m seeing garbled data: while the output is transferred to the host, line breaks are received, causing more data transfers. It’s as if somebody was hammering the return key really quickly.

Linux tty echo vs. ChibiOS shell banner

The unexpected \n\r bytes turn out to come from the Linux USB CDC ACM driver, or its interplay with the Linux tty driver, to be specific. The CDC ACM driver is a kind of tty driver, so it is built atop the Linux tty infrastructure, whose standard settings include various ECHO flags.

When echoing is enabled, the ChibiOS shell banner triggers echo characters, which in turn are interpreted as input to the shell, causing garbled output.

So why is echoing enabled? Wouldn’t a terminal emulator turn off echoing first thing?

Yes. But, when the CDC ACM driver receives the first data transfer via USB (already queued), the standard tty settings are still in effect, because the application did not yet have a chance to set its tty configuration up!

This can be verified by running the following command on a Linux host:

% stty -F /dev/ttyACM0 115200 -echo -echoe -echok

Even though the command’s sole purpose is to configure the tty, its opening of the device still causes the banner to print, and echoing to happen, and garbled output is the result.

It turns out this is a somewhat common problem. Hence, the Linux USB CDC ACM driver has a quirks table, in which devices that print a banner select the DISABLE_ECHO quirk, which results in the CDC ACM driver turning off the echoing termios flag early:

static const struct usb_device_id acm_ids[] = {
	/* quirky and broken devices */
	{ USB_DEVICE(0x0424, 0x274e), /* Microchip Technology, Inc. */
	  .driver_info = DISABLE_ECHO, }, /* DISABLE ECHO in termios flag */
	/* … */

So, a quick solution to turn off echoing early is to change your USB vendor and product id (VID/PID) to an ID for which the Linux kernel applies the DISABLE_ECHO quirk, e.g.:

#define USB_DEVICE_VID 0x0424
#define USB_DEVICE_PID 0x274e

Flushing in Screen

With tty echo disabled, I don’t see garbled output anymore, but still wouldn’t always see the ChibiOS shell prompt!

This issue turned out to be specific to the terminal emulator program I’m using. For many years, I have been using Screen for serial devices of any sort.

I was surprised to learn during this investigation that Screen flushes any pending output when opening the device. This typically isn’t a problem because adapter-backed serial devices are opened once and then stay open. USB virtual serial devices however are only opened when used, and disappear when loading new program code onto your micro controller.

I verified this is the problem by using cat(1) instead, with which I can indeed see the prompt:

% cat /dev/ttyACM0

ChibiOS/RT Shell
                
                ch> 

After commenting out the flush call in Screen’s sources, I could see the prompt in Screen as well.

Line ending conversion

Now that we no longer flush the prompt away, why is the spacing still incorrect, and where does it go wrong?


ChibiOS/RT Shell
                
                ch> 

If we use strace(1) to see what screen(1) or cat(1) read from the driver, we see:

797270 read(7, "\n\nChibiOS/RT Shell\n\nch> ", 4096) = 24

We would have expected "\r\nChibiOS/RT Shell\r\nch> " instead, meaning all Carriage Returns (\r) have been translated to Newlines (\n).

This is again due to the Linux tty driver’s default termios settings: c_iflag enables option ICRNL by default, which translates CR (Carriage Return) to NL (Newline).

Unfortunately, contrary to the DISABLE_ECHO quirk, there is no corresponding quirk in the Linux ACM driver to turn off line ending conversion, so a fix would need a Linux kernel driver change!
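For completeness, here is a minimal host-side sketch (my own illustration, not from the original tooling) that opens the device and clears the termios flags discussed above before reading. Note that any data queued before tcsetattr() takes effect is still processed with the default flags, which is exactly the race described in this post and why the device-side workaround described next is preferable:

#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void) {
  int fd = open("/dev/ttyACM0", O_RDWR | O_NOCTTY);
  if (fd < 0) { perror("open"); return 1; }

  struct termios tio;
  if (tcgetattr(fd, &tio) < 0) { perror("tcgetattr"); return 1; }
  tio.c_lflag &= ~(ECHO | ECHOE | ECHOK);  // do not echo back to the device
  tio.c_iflag &= ~ICRNL;                   // keep \r as-is, no CR->NL
  cfsetispeed(&tio, B115200);
  cfsetospeed(&tio, B115200);
  if (tcsetattr(fd, TCSANOW, &tio) < 0) { perror("tcsetattr"); return 1; }

  char buf[256];
  ssize_t n;
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    fwrite(buf, 1, (size_t)n, stdout);
    fflush(stdout);
  }
  close(fd);
  return 0;
}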

Device-side workaround: wait until opened

At this point, we have covered a few problems that would need to be fixed:

  1. Change USB VID/PID to get the DISABLE_ECHO quirk in the driver.
  2. Recompile terminal emulator programs to remove flushing, if needed.
  3. Modify kernel driver to add quirk to disable Carriage Return (\r) conversion.

Time for a quick reality check: this seems like too much work, and it would take too long for all parts of the stack to get fixed. Is there an easier way, and why don’t others run into this problem? If only the device didn’t print its banner so early, that would circumvent all of the problems above, too!

Luckily, the host actually notifies the device when a terminal emulator program opens the USB serial device by sending a CDC_SET_CONTROL_LINE_STATE request. I verified this behavior on Linux, Windows and macOS.

So, let’s implement a workaround in our device code! We will delay starting the shell until:

  1. The USB serial device was opened (not just configured).
  2. An additional delay of 100ms has passed to give the terminal emulator application a chance to configure the serial device.

In our main.c loop, we wait until USB is active, and until we receive the first CDC_SET_CONTROL_LINE_STATE request because the serial port was opened:

  while (true) {
    if (SDU1.config->usbp->state == USB_ACTIVE) {
      chSemWait(&scls);
      chThdSleepMilliseconds(100);

      thread_t *shelltp = chThdCreateFromHeap(NULL, SHELL_WA_SIZE, "shell", NORMALPRIO + 1, shellThread, (void *)&shell_cfg1);
      chThdWait(shelltp);
    }
  }

And in our usbcfg.c, when receiving a CDC_SET_CONTROL_LINE_STATE request, we will reset the semaphore to non-blockingly wake up all waiters:

extern semaphore_t scls;

bool requests_hook(USBDriver *usbp) {
  const bool result = sduRequestsHook(usbp);

  if ((usbp->setup[0] & USB_RTYPE_TYPE_MASK) == USB_RTYPE_TYPE_CLASS &&
      usbp->setup[1] == CDC_SET_CONTROL_LINE_STATE) {
    osalSysLockFromISR();
    chSemResetI(&scls, 0);
    osalSysUnlockFromISR();
  }

  return result;
}
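The scls semaphore itself is declared and initialized elsewhere; the post does not show that part, but a minimal sketch of what it could look like (my assumption, including the made-up init_scls name) is:

/* Assumed declaration/initialization of the scls semaphore used above. */
#include "ch.h"

semaphore_t scls;

static void init_scls(void) {
  /* 0 permits: chSemWait(&scls) in the main loop blocks until
   * requests_hook() calls chSemResetI(&scls, 0). */
  chSemObjectInit(&scls, 0);
}

init_scls() would be called early in main(), before entering the loop shown above.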

Screenshots: Mac and Windows

Aside from Linux, I also verified the workaround works on a Mac (with Screen):

USB virtual serial device on macOS

…and that it works on Windows (with PuTTY):

USB virtual serial device on Windows 10

at 2021-04-27 06:18

2021-04-09

michael-herbst.com

A novel black-box preconditioning strategy for high-throughput density-functional theory

A couple of weeks ago, from 15th to 19th March, I participated in the virtual annual meeting of the German Association of Applied Mathematics and Mechanics (GAMM). For me this meeting was the first time I presented my work to an audience of applied mathematicians with a broad background and no inherent interest in quantum chemistry. With only 15 minutes for my talk in the "scientific computing" track preparing the material was quite a challenge. I hope I still managed to convey the main ideas of our recently published LDOS preconditioner in a broadly accessible way. My slides are attached below.

Link Licence
A novel black-box preconditioning strategy for high-throughput density-functional theory (Slides) Creative Commons License

by Michael F. Herbst at 2021-04-09 16:00 under talk, electronic structure theory, Julia, DFTK, theoretical chemistry, numerical analysis, Kohn-Sham, high-throughput, DFT, solid state

2021-04-02

sECuREs website

Emacs: overriding the project.el project directory

I recently learnt about the Emacs package project.el, which is used to figure out which files and directories belong to the same project. This is used under the covers by Eglot, for example.

In practice, a project is recognized by looking for Git repositories, which is a decent first approximation that often just works.

But what if the detection fails? For example, maybe you want to anchor your project-based commands in a parent directory that contains multiple Git repositories.

Luckily, we can provide our own entry to the project-find-functions hook, and look for a .project.el file in the parent directories:

;; Returns the parent directory containing a .project.el file, if any,
;; to override the standard project.el detection logic when needed.
(defun zkj-project-override (dir)
  (let ((override (locate-dominating-file dir ".project.el")))
    (if override
      (cons 'vc override)
      nil)))

(use-package project
  ;; Cannot use :hook because 'project-find-functions does not end in -hook
  ;; Cannot use :init (must use :config) because otherwise
  ;; project-find-functions is not yet initialized.
  :config
  (add-hook 'project-find-functions #'zkj-project-override))

Now, we can use touch .project.el in any directory to make project.el recognize the directory as project root!

By the way, in case you are unfamiliar, the configuration above uses use-package, which is a great way to (lazily, i.e. quickly!) load and configure Emacs packages.

at 2021-04-02 12:08

2021-04-01

sECuREs website

Eclipse: Enabling Compilation Database (CDB, compile_commands.json) in NXP MCUXpresso v11.3

NXP’s Eclipse-based MCUXpresso IDE is the easiest way to make full use of the hardware debugging features of modern NXP micro controllers such as the i.MX RT1060 found on the NXP i.MX RT1060 Evaluation Kit (MIMXRT1060-EVK), which I use for Teensy 4 development.

For projects that are fully under your control, such as imported SDK examples, or anything you created within Eclipse, you wouldn’t necessarily need Compilation Database support.

When working with projects of type Makefile Project with Existing Code, however, Eclipse doesn’t know about preprocessor definition flags and include directories, unless you would manually duplicate them. In large and fast-changing projects, this is not an option.

The lack of compiler configuration knowledge (defines and include directories) breaks various C/C++ tooling features, such as Macro Expansion or the Open Declaration feature, both of which are an essential tool in my toolbelt, and particularly useful in large code bases such as micro controller projects with various SDKs etc.

In some configurations, Eclipse might be able to parse GCC build output, but when I was working with the QMK keyboard firmware, I couldn’t get the QMK makefiles to print commands that Eclipse would understand, not even with VERBOSE=true.

Luckily, there is a solution! Eclipse CDT 9.10 introduced Compilation Database support in 2019. MCUXpresso v11.3.0 ships with CDT 9.11.1.202006011430, meaning it does contain Compilation Database support.

In case you want to check which version your installed IDE has, open HelpAbout MCUXpresso IDE, click Installation Details, open the Features tab, then locate the Eclipse CDT, C/C++ Development Platform line.

For comparison, Eclipse IDE 2021-03 contains 10.2.0.202103011047, if you want to verify that the issues I reference below are indeed fixed.

Bug: command vs. arguments

Before we can enable Compilation Database support, we need to ensure we have a compatible compile_commands.json database file. Eclipse CDT’s Compilation Database support before version CDT 10 suffered from Bug 563006: it only understood the command JSON property, not the arguments property.

Depending on your build system, this isn’t a problem. For example, Meson/ninja’s compile_commands.json uses command and will work fine.

But, when using Make with Bear, you will end up with arguments by default.
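For illustration, the same compilation step can be recorded in either form; both mean the same thing, but Eclipse CDT before version 10 only understands the first (paths and flags below are made up):

[
  {
    "directory": "/home/michael/qmk_firmware",
    "file": "keyboards/kinesis/kint41/kint41.c",
    "command": "arm-none-eabi-gcc -DDEBUG -Iquantum -c keyboards/kinesis/kint41/kint41.c"
  },
  {
    "directory": "/home/michael/qmk_firmware",
    "file": "keyboards/kinesis/kint41/kint41.c",
    "arguments": ["arm-none-eabi-gcc", "-DDEBUG", "-Iquantum", "-c", "keyboards/kinesis/kint41/kint41.c"]
  }
]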

Bear 3.0 allows generating a compile_commands.json Compilation Database with command, but requires multiple commands and config files, which is a bit inconvenient with Eclipse.

So, let’s put the extra commands into a commandbear.sh script:

#!/bin/sh

set -eux

intercept --output commands.json -- "$@"
citnames \
  --input commands.json \
  --output compile_commands.json \
  --config config.json

The "command_as_array": false option goes into config.json:

{
  "compilation": {
  },
  "output": {
    "content": {
      "include_only_existing_source": true
    },
    "format": {
      "command_as_array": false,
      "drop_output_field": false
    }
  }
}

Don’t forget to make the script executable:

chmod +x commandbear.sh

Then configure Eclipse to use the commandbear.sh script to build:

  1. Open Project Properties by right-clicking your project in the Project Explorer panel.
  2. Select C/C++ Build and open the Builder Settings tab
  3. In the Builder group, set the Build command text field to: ${workspace_loc:/qmk_firmware}/commandbear.sh make -j16

Verify your build is working by selecting ProjectClean… and triggering a build.

Enabling Compilation Database support

  1. Open Project Properties by right-clicking your project in the Project Explorer panel.
  2. Expand C/C++ General, select Preprocessor Include Paths, Macros etc. and open the Providers tab.
  3. Untick everything but:
    • MCU GCC Built-in Compiler Parser
    • MCU GCC Build Output Parser
    • Compilation Database Parser
  4. Select Compilation Database Parser, click Apply to make the Compilation Database text field editable.
  5. Put a full path to your compile_commands.json file into the text field, e.g. /home/michael/kinx/workspace/qmk_firmware/compile_commands.json. Note that variables will not be expanded! Support for using variables was added later in Bug 559186.
  6. Select MCU GCC Build Output Parser as Build parser.
  7. Tick the Exclude files not in the Compilation Database checkbox.
  8. Click Apply and Close.
Compilation Database Parser settings

You will know Compilation Database support works when its progress view shows up:

Compilation Database progress

If you have an incompatible or empty compile_commands.json, nothing visible will happen (no progress indicator or error messages).

After indexing completes, you should see:

  1. Files that were not used as greyed out in the Project Explorer
  2. Open Declaration in the context menu of a selected identifier (or F3) should jump to the correct file. For example, my test sequence for this feature in the QMK repository is:
    • in tmk_core/protocol/chibios/main.c, open init_usb_driver
    • open usbStart, should bring up lib/chibios git submodule
    • open usb_lld_start, should bring up MIMXRT1062 port
  3. Macros expanded correctly, e.g. MIMXRT1062_USB_USE_USB1 in the following example
Compilation Database in effect: files greyed out and macros expanded

Slow file exclusion in projects with many files

Bug 565457 explains an optimization in the algorithm used to generate the list of excluded paths, which I would summarize as “use whole directories instead of individual files”.

This optimization was introduced later, so in MCUXpresso v11.3, we still have to endure watching the slow algorithm for a few seconds:

Compilation Database exclusion slow

Conclusion

NXP, please release a new MCUXpresso IDE with a more recent CDT version!

The improvements in the newer version would make the setup so much simpler.

at 2021-04-01 09:59

2021-03-13

sECuREs website

Make your intercom smarter with an MQTT backpack

I bought the cheapest compatible BTicino intercom device (BT 344232 for 32 €) that I could find on eBay, then soldered in 4 wires and added microcontrollers to make it smart. It now connects to my Nuki Opener Smart Intercom IOT device, and to my local MQTT Pub/Sub bus (why not?).

modified BTicino

Background

In my last post about the BTicino intercom from November, I described how to use a Teensy microcontroller to reliably interpret SCS bus signals and drive a Nuki Opener (Smart Intercom).

Originally, I had hoped the Nuki developers would be able to fix their device based on my SCS bus research, but they don’t seem to be interested. Instead, their support actually suggested I run my microcontroller workaround indefinitely!

Hence, I decided to work on the next revision to clean up my setup in terms of cable clutter. I also figured: if I already need to run my own microcontroller, I also want to connect it to my local MQTT Pub/Sub bus for maximum flexibility.

Unfortunately, the Teensy microcontroller lacks built-in WiFi, or any kind of networking.

I switched to an ESP32-based microcontroller, but powering those from the SCS bus seems like a bad idea: they draw a lot of power, and building small high-quality power supplies is hard.

This made me scrap my previous plans to make my own SCS send/receive hardware.

Instead, I wondered what the easiest yet most reliable approach might be to make this intercom unit smart. Instead of building my own SCS hardware, could I use the intercom unit itself to send the door unlock signal, and could I obtain the unit’s already-decoded SCS bus signal?

Finding the signals

Based on my previous research, I roughly knew what to expect: closest to the bus terminals, there will be some components that filter the bus signal and convert the 27V into a lower voltage. Connected to that power supply is a microcontroller which handles all of the user interface.

To learn more about the components, I first identified all ICs (Integrated Circuits) based on their labeling. The following are relevant:

I connected my development intercom unit to my SCS bus lab setup and used my oscilloscope to confirm expected signal levels based on the pinout from the IC datasheets.

I settled on the following 4 relatively easily accessible signals and soldered jumper wires to them:

  • 5V and GND: 5V, 100mA. Our QT Py microcontroller uses 7mA.
  • OPEN5V: activates the button which unlocks the door
  • SCSRX5V: converted SCS signal
BTicino signals

Converting the signals

Because the BTicino intercom unit runs at 5V, but more modern microcontrollers run at 3.3V, we need to convert between the two voltages:

  1. We need to convert a 3.3V signal to OPEN5V to trigger opening the door.

  2. We need to convert SCSRX5V signal to 3.3V so that I can use an ESP32 microcontroller to read the signal and place it on MQTT.

Here’s the corresponding schematic:

schematic
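The exact component values are only visible in the schematic image, but the 5V → 3.3V direction is a plain voltage divider: V_out = V_in · R2 / (R1 + R2). As a generic illustration (resistor values chosen by me, not read off the schematic), 5V · 20kΩ / (10kΩ + 20kΩ) ≈ 3.3V.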

Microcontroller selection

I eventually decided to task a dedicated microcontroller with the signal conversion, instead of having the WiFi-enabled microcontroller do everything, for multiple reasons:

  • Reliability. It turns out that using a hardware analog comparator results in a much higher signal quality than continuously sampling an ADC yourself, even when using the ESP32’s ULP (Ultra Low Power) co-processor to do the sampling.

  • Easy implementation. Converting an SCS signal to a serial signal is literally a single delayMicroseconds(20); call in the right place. Having a whole microcontroller for only this task eliminates any concurrency concerns. I have not had to debug or change the software even once in the last few months.

  • Easy debugging/introspection. I can connect a standard USB-to-serial adapter and verify the signal is read correctly. This quickly narrows down issues on either side of the serial interface. Issues with the microcontroller side can be reproduced by sending serial data.

Here are the 2 microcontrollers I’m using in this project, plus the Teensy I used previously:

Microcontroller WiFi Analog Comparator Price
Teensy 4.0 no yes 19 USD
Adafruit QT Py no yes 6 USD
TinyPICO yes no 20 USD

If ESP32 boards such as the TinyPICO had a hardware Analog Comparator, I would likely use just one microcontroller, but keep the serial interface on a GPIO for easy debugging.

Why the Adafruit QT Py?

The minimal function we need for our signal conversion device is to convert an SCS signal (5V) to a serial signal (3.3V). For this conversion, we need a hardware analog comparator and an output GPIO that we can drive independently, so that we can modify the signal.

Additionally, the device should use as little power as possible so that it can comfortably fit in the left-over energy budget of the intercom unit’s power supply.

The smallest microcontroller I know of that comes with a hardware analog comparator is the Adafruit QT Py. It’s a 32-bit Cortex M0+ (SAMD21) that can be programmed using the Arduino IDE, or MicroPython (hence the name).

Adafruit QT Py

There are other SAMD21 boards with the same form factor, such as the Seeeduino XIAO.

Why the TinyPICO ESP32 board?

When looking for a WiFi-enabled microcontroller, definitely go with something ESP32-based!

The community around the Espressif ESP32 (and its predecessor ESP8266) is definitely one of its biggest pluses: there are tons of Arduino sketches, troubleshooting tips, YouTube videos, reference documentation, forum posts, and so on.

The ESPs have been around since ≈2014, so many (largely-compatible) boards are available. In fact, I started this project on an M5Stack ESP32 Basic Core IoT Development Kit, deployed it on an Adafruit HUZZAH32 Breakout Board and ultimately ported it to the TinyPICO. Porting between the different microcontrollers was really smooth: the only adjustments were pin numbers and dropping in a TinyPICO helper library for its RGB LED, which I chose to use as a power LED.

I chose the TinyPICO ESP32 board specifically for its small form factor and convenience:

TinyPICO comparison with Adafruit Huzzah32 and Teensy 4.0

The TinyPICO is only 18mm × 32mm, slightly smaller than the Teensy 4.0’s 18mm × 35mm.

In comparison, the Adafruit HUZZAH32 breakout board is gigantic with its 25mm × 44mm. And that’s without the extra USB-to-serial adapter (FT232H in the picture above) you need for programming, serial console and powering the board!

The TinyPICO does not need an extra adapter. You can plug it in and program it immediately, just like the Teensy!

I’d like it if the next revision of the TinyPICO switched from Micro USB to USB C.

If the TinyPICO is not for you (or unavailable), search for other boards that contain the ESP32-PICO-D4 chip. For example, DFRobot’s ESP32-PICO-KIT or Espressif’s own ESP32-PICO-KIT.

Prototype

After testing everything on a breadboard, I soldered a horizontal pin header onto the QT Py, connected it to my Sparkfun level shifter board and soldered the remaining voltage divider components “flying”. The result barely fit into the case, but worked flawlessly for weeks:

prototype

Backpack PCB for the QT Py

After verifying this prototype works well in practice, I miniaturized it into a “backpack” PCB.

The backpack contains all the same parts as the prototype, but with fewer bulky wires and connectors, and using only SMD parts. The build you see below uses 0603 SMD parts, but if I made another revision I would probably choose the larger 0805 parts for easier soldering.

QT Py with backpack QT Py with backpack PCB

Assembly

To save some space in the intercom unit case, I decided to solder the jumper wires directly onto the TinyPICO instead of using a pin header. I could have gone one step further by cutting the wires to length and soldering them directly on both ends, without any connectors, but I wanted to be able to easily unplug and re-combine the parts of this project.

wires soldered directly into the TinyPICO

From top to bottom, I made the following connections:

Pin Color Function
25 red SCSRX_3V3
27 green OPEN_3V3
15 blue Nuki Opener blue cable
14 yellow Nuki Opener yellow cable
4 purple floor ring button pushed
3V3 white 3.3V for the floor ring button
5V orange power for the TinyPICO
GND brown ground for the TinyPICO
GND brown ground to the QT Py
GND brown ground to the Nuki Opener

The TinyPICO USB port is still usable for updating the software and serial console debugging.

Here’s the TinyPICO connected to the QT Py inside the intercom unit:

modified BTicino

The QT Py is powered by the intercom unit’s supply, and the TinyPICO I’m powering with an external USB power supply and a cut-open USB cable. This allows me to route the jumper wires through the intercom unit’s hole in the back, through which a USB plug doesn’t fit:

final installation

Software / Artifacts

You can find the Arduino sketches and KiCad files for this project at https://github.com/stapelberg/intercom-backpack

For debugging, I found it useful to publish every single byte received from the SCS bus on the doorbell/debug/scsrx MQTT topic. Full SCS telegrams are published to doorbell/events/scs, so by observing both, you can verify that retransmission suppression and SCS decoding work correctly.

Similarly, signaling a doorbell ring to the Nuki Opener can be debugged by sending a message to MQTT topic doorbell/debug/cmd/ring.
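For example, to watch all doorbell topics at once with the standard Mosquitto client (assuming the same broker host dr as in the troubleshooting appendix below):

% mosquitto_sub -h dr -t 'doorbell/#' -v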

Initially, it wasn’t clear to me whether the WiFi library would maintain the connection indefinitely. After observing my microcontroller eventually disappearing from my network, I added the taskreconnect FreeRTOS task, and things have been stable since.

Nuki Opener: verdict

I now have a Nuki Opener running next to my own microcontroller, so I can see how well it works.

setup

Setting up the Nuki is the worst part: their colorful cable is super flimsy and loose, often losing contact. They should definitely switch to a cable with a mechanical lock.

The software part of the setup is okay, but the compatibility with the SCS bus is poor: I couldn’t get the device to work at all (see my initial post), and had to resort to using my own microcontroller to drive the Nuki in analogue mode.

I’m disappointed that the Nuki developers aren’t interested in improving their device’s compatibility and reliability with the SCS bus. They seem to capture/replay the entire signal (including re-transmissions) instead of actually decoding the signal.

in my day-to-day

The push notifications I get on my iPhone from the Nuki are often delayed. Usually the delay is a few seconds, but sometimes notifications arrive hours later or just don’t arrive at all!

While the push notifications are sent from a Nuki server and hence need the internet to function, the Nuki Bridge (translating Bluetooth Low Energy from the Nuki Opener to WiFi) allows configuring notifications in the local network via web hooks.

The Nuki Bridge’s notifications are much more reliable in my experience.

People sometimes ask why I use the Nuki Opener at all, given that I have some infrastructure of my own, too. While opening the door and receiving notifications is something I can do without the Nuki, too, I don’t want to spend my spare time re-implementing the Nuki app (on multiple platforms) with its geo fencing, friend invitations, ring to open, etc. In addition, the Nuki Opener physical device has a nice ring sound and large push button to open the door, both of which are convenient.

Conclusion

My intercom is now much smarter! Doorbell notifications make their way to my various devices via MQTT, and I can conveniently open the door from any device, as opposed to rushing to the intercom unit in the hallway.

Compared to the previous proof-of-concepts and development installations, I feel more confident in the current solution because it re-uses the intercom unit for the nitty-gritty SCS bus communication details.

The overall strategy should be widely applicable regardless of the specific intercom vendor/unit you have. Be sure to buy your own unit (don’t solder into your landlord’s intercom unit!) and test in a separate lab setup first, of course!

Appendix A: Troubleshooting

To debug the problem of ring detection no longer working, check:

  • Is the ESP32 still working?
    • ping doorbelltp
    • mosquitto_pub -h dr -t 'doorbell/debug/cmd/ring' -m '3' should signal a ring to the Nuki Opener and result in events on the MQTT bus
  • Is the QT Py still working?
    • Its power LED should be off. If the LED is on, the QT Py is in the bootloader.
    • Unplug and replug the +5V wire to the QT Py, see if that fixes it.
    • Connect a USB-to-serial adapter and see if triggering a door open results in SCS bytes on the serial interface.
    • See if ringing the bell results in SCS bytes on the serial interface. If no, re-solder cable to SCSRX5V.

To debug the problem of door opening no longer working, check:

  • Does it work when triggering it via the button on the BTicino? If yes, re-solder cable to OPEN5V.

at 2021-03-13 15:54

2021-03-06

sECuREs website

Debian Code Search: OpenAPI now available

Debian Code Search now offers an OpenAPI-based API!

Various developers have created ad-hoc client libraries based on how the web interface works.

The goal of offering an OpenAPI-based API is to provide developers with automatically generated client libraries for a large number of programming languages, that target a stable interface independent of the web interface’s implementation details.

Getting started

  1. Visit https://codesearch.debian.net/apikeys/ to download your personal API key. Login via Debian’s GitLab instance salsa.debian.org; register there if you have no account yet.

  2. Find the Debian Code Search client library for your programming language. If none exists yet, auto-generate a client library on editor.swagger.io: click “Generate Client”.

  3. Search all code in Debian from your own analysis tool, migration tracking dashboard, etc.

curl example

curl \
  -H "x-dcs-apikey: $(cat dcs-apikey-stapelberg.txt)" \
  -X GET \
  "https://codesearch.debian.net/api/v1/search?query=i3Font&match_mode=regexp" 

Web browser example

You can try out the API in your web browser in the OpenAPI documentation.

Code example (Go)

Here’s an example program that demonstrates how to set up an auto-generated Go client for the Debian Code Search OpenAPI, run a query, and aggregate the results:

func burndown() error {
	cfg := openapiclient.NewConfiguration()
	cfg.AddDefaultHeader("x-dcs-apikey", apiKey)
	client := openapiclient.NewAPIClient(cfg)
	ctx := context.Background()

	// Search through the full Debian Code Search corpus, blocking until all
	// results are available:
	results, _, err := client.SearchApi.Search(ctx, "fmt.Sprint(err)", &openapiclient.SearchApiSearchOpts{
		// Literal searches are faster and do not require escaping special
		// characters, regular expression searches are more powerful.
		MatchMode: optional.NewString("literal"),
	})
	if err != nil {
		return err
	}

	// Print to stdout a CSV file with the path and number of occurrences:
	wr := csv.NewWriter(os.Stdout)
	header := []string{"path", "number of occurrences"}
	if err := wr.Write(header); err != nil {
		return err
	}
	occurrences := make(map[string]int)
	for _, result := range results {
		occurrences[result.Path]++
	}
	for _, result := range results {
		o, ok := occurrences[result.Path]
		if !ok {
			continue
		}
		// Print one CSV record per path:
		delete(occurrences, result.Path)
		record := []string{result.Path, strconv.Itoa(o)}
		if err := wr.Write(record); err != nil {
			return err
		}
	}
	wr.Flush()
	return wr.Error()
}

The full example can be found under burndown.go.

Feedback?

File a GitHub issue on github.com/Debian/dcs please!

Migration status

I’m aware of the following third-party projects using Debian Code Search:

Tool Migration status
Debian Code Search CLI tool Updated to OpenAPI
identify-incomplete-xs-go-import-path Update pending
gnome-codesearch makes no API queries

If you find any others, please point them to this post in case they are not using Debian Code Search’s OpenAPI yet.

at 2021-03-06 10:15

2021-03-04

michael-herbst.com

PostDoc position at Appl. & Comput. Mathematics lab, RWTH Aachen University

This week I started my new position as a postdoctoral researcher at the Applied and Computational Mathematics (ACoM) research lab at RWTH Aachen University. The lab consists of two interdisciplinary research groups, namely the group of Prof. Dr. Manuel Torrilhon, who works on the mathematical modelling and simulation of technical processes (e.g. plasma or gas flow processes), as well as the group of Prof. Dr. Benjamin Stamm, which I am now joining. Ben's research focus is the numerical analysis of PDEs and linear algebra problems, which arise e.g. in electrostatics or quantum chemistry. This includes fundamental questions related to eigenvalue problems, but also concrete applications such as improving the performance of the polarisable continuum model, a standard solvation model in electronic structure theory. I see a good fit between our respective research backgrounds and I am happy for this opportunity to extend my research horizon and contribute to the research in Ben's group and the ACoM over the next years.

During my time in Aachen Ben and I want to continue to work on the numerical analysis and the development of mathematically-motivated methods for density-functional theory (DFT). One aspect we have in mind, for example, is to port Ben's recent work on constructing good initial guesses for the self-consistent iterations in molecular DFT (and Gaussian basis functions) to plane-wave DFT. As part of this research we will make use of and extend the density-functional toolkit (DFTK), the density-functional theory code I started in Paris. Beyond our work at the ACoM I expect DFTK and its suitability for multidisciplinary research also to be helpful for reaching out to other researchers in the mathematics, computer science and physics departments in Aachen. In particular I see a good fit of DFTK within the JARA-CSD, a joint research initiative between RWTH Aachen and the Jülich research centre.

Having already gotten to know Aachen a little from my previous visits to Ben's group, I am very much looking forward to working here. Not only is the city very pretty and welcoming, but the interdisciplinary orientation of RWTH Aachen also resonates well with me. I'm looking forward to the many interesting discussions to come and to becoming part of strengthening the interdisciplinary links in Aachen, while at the same time continuing to work at the boundary of chemistry, physics and mathematics.

by Michael F. Herbst at 2021-03-04 23:00 under electronic structure theory, DFT, solid state

2021-03-02

Insanity Industries

Pareto-optimal compression

Data compression is an incredibly useful tool in many situations, be it for backups, archiving, or even filesystems. But of the many compressors that there are, which of them is the one to be preferred? To properly answer this question, we first have to answer a different, but related one: What exactly makes a compression algorithm good?

Time for a closer investigation to find the best of the best of the best!

Pareto optimality or what makes a compression algorithm good

Pareto optimality is a situation where no […] preference criterion can be better off without making at least one […] preference criterion worse off […].

The concept of Pareto optimality, by acknowledging competing interests, is tremendously useful far beyond the scope of this blogpost.

In our particular case of optimal compression, one of the obvious criteria is the resulting file size. The smaller the resulting file is, the better the compressor, right?

But what if we are just compressing so we can speed up the upload to a remote location? At this point, it doesn’t matter if the compression is supremely superior to everything else if we have to wait twice as long for it to happen compared to simply uploading the uncompressed file.

So for our algorithms, we have at least two criteria that come into play: actual achievable compression and compression cost, which we will measure as “how long does it take to compress our content?"1. These two are the criteria this blogpost is focused on.

A practical example

Practically speaking, let’s assume two compression tools A and B, each having two compression levels 1 and 2, which we use on a sample file. The results might look like the following:

algorithm level file size time
A 1 60% 2s
A 2 50% 4s
B 1 40% 3s
B 2 30% 11s

We find that while both algorithms compress better at their higher level, B.1 is unconditionally better than A.2, so there is no reason to ever use A.2 if B.1 is available. Besides that, A.1, B.1 and B.2 continuously increase in compression effectiveness as well as in the time taken for it. For easier digestion, we can visualize these results:

Compression results for different hypothetical compression algorithms, including the Pareto frontier indicated in blue.

Here we clearly see what could already be taken from the table above, but in a significantly more intuitive way. Remembering our definition of Pareto optimality from before, we see that A.2 is not Pareto optimal, as B.1 is better both in time taken and in compression effect. This shows more intuitively that in this scenario there is no reason to use A.2 if B.1 is available. For A.1, B.1 and B.2 the choice is not so clear-cut, as the resulting file size can only be reduced further by investing more time into the compression. Hence, all three of them are Pareto optimal and constitute the Pareto frontier or the Pareto set.

One should always strive to choose a Pareto optimal solution whenever possible, as non-Pareto-optimal solutions are always wasteful to some degree.

With this insight into Pareto optimality, we can now put our knowledge into practice and begin our journey to the Pareto frontiers of compression algorithms.

Setup and test procedure for real-world measurements

Data was gathered on a Linux system, recording for a given sample file the resulting file size compared to the original as well as the time the compression process took. First, the sample file was read from disk entirely, ensuring it would be present in Linux’ filesystem cache, then it was compressed with each compression program at each level 15 times to get a decent statistic, with the compressed result routed directly into /dev/null. All compression times presented are the median of these 15 runs; the compressed size was only measured once after verifying that it was deterministic2.

Unless explicitly denoted otherwise, all compressors were run in their default configuration with a single compression thread. Furthermore, all applications on the machine used for benchmarking except the terminal holding the benchmarking process were closed to reduce interference by other processes as much as possible.

The tests for decompression were done analogously (with the obvious exception that the sample file is compressed according to the testcase in play beforehand), single-threaded decompression as well as routing the decompressed result to /dev/null.
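
To make the procedure a bit more concrete, a single compression measurement could, as a rough sketch, look like the following (sample.file, the compressor and the level are placeholders; the actual benchmarking suite is more elaborate):

cat sample.file > /dev/null                       # warm the filesystem cache
for i in $(seq 15); do
    /usr/bin/time -f '%e' zstd -3 -T1 -c sample.file > /dev/null
done                                              # take the median of the 15 times
zstd -3 -T1 -c sample.file | wc -c                # measure the compressed size once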

All tests were run on several different CPUs: Intel i7-8550U, Intel i5-3320M3 (both mobile CPUs), Intel i7-7700K and AMD Ryzen 7 3800X (both workstation CPUs). While the absolute compression times changed, the Pareto frontiers did not; hence this blogpost only shows the plots for the i7-8550U as an example.

Case study: initramfs-compression

Just recently, the current maintainer of mkinitcpio, Giancarlo Razzolini, announced the transition from gzip to zstd as the default for initramfs-compression. The initramfs is a little mini-system, whose only objective is to prepare the hardware and assemble all disks so that the main system is readily prepared to take over operation. The initramfs needs to be loaded (and hence decompressed) at every boot as well as recreated occasionally, when components contained in it are updated.

mkinitcpio supports a variety of compression algorithms: none, gzip, bzip2, lzma, lzop, lz4 and most recently zstd.

Quantifying these algorithms and plotting the Pareto frontier for compression yields:

Compression results for a standard mkinitcpio-generated initramfs and the corresponding Pareto frontier. The difficult-to-decipher black culmination at about 34% resulting file size are the bzip2 results.

We find that the change of defaults from gzip to zstd was well justified, as gzip can no longer be considered Pareto optimal for this type of file. Choosing lzma as the default would make the initramfs even smaller, but this would come at the cost of noticeably higher resource usage for compression (which has to be invested on every update affecting the initramfs), so judging from the data zstd is certainly the wiser choice4.

This can also be seen when we take a look at the Pareto optimality of decompression (after all, this decompression needs to happen on every single system boot):

Decompression results for a standard mkinitcpio-generated initramfs and the corresponding Pareto frontier.

We clearly see that zstd blasts all other algorithms out of the water when it comes to decompression speed, making it even more of a good choice for an initramfs: not only does it compress fast, it also decompresses impressively fast, six times faster than lzma, which was previously known for its quick decompression despite high compression factors.

Given the data for zstd, it cannot be ruled out completely that zstd simply hit a bottleneck other than the CPU during decompression, but even if it did, the conclusion for the choice of algorithm does not change.

dracut

If you happen to be on a Linux distribution that uses dracut for initramfs generation, the conclusions are almost identical: given the compression and decompression data for a dracut initrd, the Pareto frontier remains mostly unchanged, with just some more levels of zstd in it.

Real world usage recommendation

To use zstd in mkinitcpio, simply use version 30 or later. You can modify the compression level (the default is -3) by adding a specific level to COMPRESSION_OPTIONS, but given the data, this doesn’t seem to provide much of a benefit.
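
For illustration, the relevant lines in /etc/mkinitcpio.conf could then look like the following sketch (COMPRESSION_OPTIONS is only needed if you want to deviate from the default level):

COMPRESSION="zstd"
#COMPRESSION_OPTIONS=(-10)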

For dracut, add compress="zstd" to /etc/dracut.conf to get zstd compression at the default level.

Case study: borgbackup

In the next scenario we will investigate the impact of compression when backing up data with borgbackup. Borg comes with an integrated option to compress the backed-up data, the available algorithms being lz4, zstd, zlib/gzip and lzma. In addition to this, borg has an automated detection routine that checks whether the files being backed up actually compress well enough to be worth spending CPU cycles on (see borg help compression for details).

For this scenario we do not use just one sample, but consider three different ones: a simple text file (a dump of the kernel log via the dmesg command), representing textual data; a binary, in this particular case the dockerd ELF binary, (rather arbitrarily) representing binary data; and finally an mkinitcpio image, intended as somewhat of a mixed-data sample. We do not consider media files, as these are typically already stored in compressed formats, hence unlikely to compress further and thus dealt with by borg’s compressibility heuristics.
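
As a sketch, such a sample set could be assembled as follows (the exact paths are examples and depend on the system at hand):

dmesg > sample-text.log                         # textual data: kernel log dump
cp /usr/bin/dockerd sample-binary               # binary data: the dockerd ELF binary
cp /boot/initramfs-linux.img sample-initramfs   # mixed data: an mkinitcpio image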

The resulting Pareto frontiers for compression are, starting with least effective compression:

text binary initramfs
lz4.2 lz4.1 lz4.1
zstd.1 zstd.1 zstd.1
zstd.2 zstd.2 zstd.2
zstd.4 zstd.3 zstd.3
zstd.3 zstd.4 zstd.4
zstd.5 zstd.5 zstd.5
zstd.6 zstd.6 zstd.6
zstd.7 zstd.7 zstd.7
zstd.8 zstd.8 zstd.8
zstd.9 zstd.9 zstd.9
zstd.10 lzma.0 lzma.0
zstd.11 lzma.1 lzma.1
zstd.12 lzma.2 lzma.2
lzma.2 lzma.3 lzma.3
lzma.3 lzma.4 lzma.4
lzma.6 lzma.5 lzma.5
zstd.19 lzma.6 lzma.6
. lzma.7 lzma.7
. lzma.8 lzma.8
. lzma.9 lzma.9

We see that effectively, except for a brief occurrence of lz4 at the top, the relevant choices are lzma and zstd. More details can be seen in the plots linked in the column headers. Hence, as backups should be run often (with a tool like borg there is little reason for anything other than daily), zstd with a level slightly below 10 seems to be a good compromise between speed and resulting data size.

Real world usage recommendation

Adding --compression=auto,zstd,7 to the borg command used to create a backup will use zstd at level 7 if borg’s internal heuristics consider the file in question to compress well; otherwise no compression will be used.

This flag can be added on the fly, without affecting existing repositories or borg’s deduplication. Already backed-up data is not recompressed, meaning that adding this flag for use with an existing repository does not require a re-upload of everything. Consequently, it also means that to recompress the entire repository with zstd one effectively has to start from scratch.
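
As a sketch, a complete backup invocation using this setting could look as follows (repository path, archive name pattern and the paths to back up are placeholders):

borg create --stats --compression=auto,zstd,7 \
    /path/to/repo::'{hostname}-{now}' \
    /home /etc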

Case study: Archiving with tar

Compression can not only be used for backup-tools like borg, it can also be used to archive files with tar. Some compressors have explicit flags in tar, such as gzip (-z), lzma (-J) or bzip2 (-j), but any compression algorithm can be used via tar’s -I flag.

Working with tar poses two challenges with regard to the compressor:

  • streaming: tar concatenates data and streams the result through the compressor. Hence, to extract files at a certain position of a tarball, the entirety of the data before that file needs to be decompressed as well.
  • compress-only: as a further consequence of this, tar has no feature for testing whether something is well compressible, so incompressible data will also be sent through the compressor

If we want to pick a good compressor without consideration for the input data, we should aim for a Pareto optimal compressor that takes two additional properties into consideration:

  • fast decompression, to be able to easily and quickly extract data from the archive again if need be
  • good performance on non-compressible data (this particularly means that incompressible data should especially not increase in size).

Compression capabilities

To investigate the capabilities of the different compressors, we can reuse the dataset from the borg case study and add some incompressible data in the form of a flac music file to the mix. As tar can use a larger variety of algorithms, we include lz4, gzip, lzma, zstd, lzop and brotli as well.

As an example, we first look at the effectiveness of these on the dockerd ELF binary (other datasets can be found below):

Compression results and the corresponding Pareto frontier for the dockerd elf binary. The unreadable black cluster at 25% compressed size is again bzip2, the one at 34% is predominantly lz4.

Decompression results and the corresponding Pareto frontier for the dockerd elf binary. The clusters are lzma for 20% resulting size, zstd at 25%, brotli at 24% and lz4 at 33%.

In summary, the Pareto frontiers for the different types of data overall turn out to be (with c for compression and d for decompression):

text (c) text (d) binary (c) binary (d) initramfs (c) initramfs (d) flac (c) flac (d)
lz4.2 lz4.9 lz4.1 lz4.8 lz4.1 lz4.7 lz4.2 zstd.1
zstd.1 zstd.13 lzop.1 lz4.9 lzop.6 lz4.9 brotli.1 zstd.5
zstd.2 zstd.11 lzop.3 zstd.15 zstd.1 zstd.1 brotli.2 zstd.7
zstd.4 zstd.12 zstd.1 zstd.16 zstd.2 zstd.9 brotli.3 zstd.19
zstd.3 zstd.14 zstd.2 zstd.17 zstd.3 zstd.14 zstd.18 .
zstd.5 zstd.15 zstd.3 zstd.19 zstd.4 zstd.15 zstd.19 .
zstd.6 zstd.19 zstd.4 lzma.8 zstd.5 zstd.16 . .
zstd.7 . zstd.5 lzma.9 zstd.6 zstd.17 . .
zstd.8 . zstd.6 . zstd.7 zstd.18 . .
zstd.9 . zstd.7 . zstd.8 zstd.19 . .
zstd.10 . zstd.8 . zstd.9 lzma.8 . .
brotli.5 . zstd.9 . brotli.5 lzma.9 . .
brotli.6 . brotli.5 . lzma.0 . . .
brotli.7 . lzma.1 . lzma.1 . . .
lzma.3 . lzma.2 . lzma.2 . . .
lzma.6 . lzma.3 . lzma.3 . . .
zstd.19 . lzma.4 . lzma.4 . . .
. . lzma.5 . lzma.5 . . .
. . lzma.6 . lzma.6 . . .
. . lzma.7 . lzma.7 . . .
. . lzma.8 . lzma.8 . . .
. . lzma.9 . lzma.9 . . .

As can be seen in the linked plots, we again find that while lzma still achieves the highest absolute compression, zstd dominates the sweet spot right before the computational cost skyrockets. We also find brotli to be an interesting contender, making it into the Pareto frontier as well. However, since brotli only sometimes makes it into the Pareto frontier, whereas lzma and zstd robustly defend their place in it, it seems more advisable to resort to either lzma or zstd, as this is only a single sample binary and actual data might vary. Furthermore, when it comes to decompression, brotli is not Pareto optimal at all anymore, also indicating that lzma and zstd are the better choice.

Impact on incompressible files in detail

We will take another, closer look at the incompressible case, represented by a flac file, and strip away bzip2 and lzma, as the linked plots show that these two clearly increase the size of the result and are hence not Pareto optimal (they are already beaten by the Pareto optimal case of “no compression”).

The results give a clear indication:

Compression results on incompressible data.

Decompression results on incompressible data. The unreadable black cluster at 99.975% size is zstd, the one at 99.988% contains lzop and brotli.

The recommended choice of algorithm for compression is either brotli or zstd, but when it comes to decompression, zstd takes the lead again. This is of course a corner case, and the details might change with the particular choice of incompressible data. However, I do not expect the overall picture to change significantly.

Real world usage recommendation

Concluding this section, the real-world recommendation is to simply use zstd for any tarball compression if it is available. To do so, tar "-Izstd -10 -T0" can be a good choice, with -T0 telling zstd to parallelize the compression across all available CPU cores, speeding things up even further beyond our measurements. Depending on your particular use case it might be interesting to use an alias like

alias archive='tar "-Izstd -19 -T0" -cf'

which allows quickly tarring data into a compressed archive via archive myarchive.tar.zst file1 file2 file3 ….
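
To extract such an archive again, the same -I mechanism can be used; sufficiently recent versions of GNU tar should also detect the compression automatically when just using tar -xf:

tar -Izstd -xf myarchive.tar.zst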

Case study: filesystems

Another use case for compression is filesystem compression. Conceptually similar to what borg does, files are transparently compressed and decompressed when written to or read from disk.

Among the filesystems capable of such inline compression are ZFS and btrfs. btrfs supports ZLIB (gzip), LZO (lzop) and zstd, whereas ZFS supports lz4, LZJB (which was not included in these benchmarks as no appropriate standalone compressor binary was found), gzip and ZLE (zero-length encoding, which only compresses zeroes and was hence also not tested). zstd support for OpenZFS has been merged, but apparently hasn’t made it into any stable version yet at the time of writing, according to the OpenZFS documentation.

This case is situated similarly to the tar case study, and as all compressors available for ZFS and btrfs have already been covered in the section above, there is no reason to reiterate these results here.

It shall, however, be noted that, at least for btrfs, the standard flag for filesystem compression adopts a heuristic similar to borg’s, and hence the case of incompressible data might not be so relevant for a btrfs installation. That being said, the conclusion here is a recommendation of zstd, and as we have seen in the last section, the question of incompressible files doesn’t change the overall recommendation.

Real world usage recommendation

If you want to save disk space by using compression, the mount option compress in combination with zstd is generally a good choice for btrfs. This option also enables the compressibility heuristics (compress-force would be the option that compresses without this heuristic). For ZFS, the general recommendation is consequently also zstd, once it makes its way into a release.
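
As a sketch, this could look as follows (device, mount point and dataset names are placeholders):

# btrfs: transparent zstd compression including the compressibility heuristics
mount -o compress=zstd /dev/sdXn /mnt/data
# or permanently via the corresponding /etc/fstab entry:
# UUID=...  /mnt/data  btrfs  compress=zstd  0 0

# ZFS: once a release with zstd support is available
zfs set compression=zstd tank/data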

Conclusion

Concluding the experiments laid out in this blogpost, we can state an almost unconditional and surprisingly clear recommendation to simply use zstd for everything. The exact level might depend on the use case, but overall zstd has proven to be the most versatile yet effective compressor around when it comes to compression ratio and speed, both for compression and especially for decompression. Furthermore, in contrast to most other contenders, it has flags for built-in parallelization, which were not used in this blogpost at all, and yet zstd still stomped almost the entire competition.

Only if the resulting file size is to be pushed down as much as possible, without any regard for computational cost, does lzma retain an edge for most kinds of data. In practice, however, the conclusion is to simply use zstd.

Thanks to Joru and corvus for proofreading and helpful comments.


  1. Which is an easy, though slightly cheated, way of asking “how much CPU time do I have to burn on this?”. Obviously, there are plenty of other criteria that might be relevant depending on the particular use case, such as memory consumption. Furthermore, all these criteria apply to decompression as well as compression, as we will investigate later, technically doubling the number of criteria we can take into consideration. ↩︎

  2. For algorithms that allow parallelized compressions, this might no longer necessarily be the case, but all data in this blogpost was gathered with non-parallelized compression for all tested algorithms. ↩︎

  3. Fun fact on the side: the entire benchmarking suite (including some more data that is not included in this blogpost) ran for 61 days straight on the i5-3320M. Fortunately it’s a bit faster on newer CPUs. :D ↩︎

  4. Furthermore, mkinitcpio runs zstd with -T0 by default, which parallelizes compression across all available cores. This accelerates compression even further, but was not tested in this particular scenario and hence not included in the plot, as most compressors do not support parallelization. But even without parallelization, zstd still makes it to the Pareto frontier. There might be another blogpost upcoming to take a look at parallelization at some point, though… ↩︎

by Jonas Große Sundrup at 2021-03-02 22:39

2021-02-26

michael-herbst.com

Gator: a Python-driven program for spectroscopy simulations

A bit over a year ago we published our adcc code. In this work the aim was to develop a toolkit for computational spectroscopy methods focused on rapid development and interactive hands-on usage (see the blog article for details). Our target back then was to simplify method development involving the algebraic-diagrammatic construction approach (ADC) to compute excited-state energies and properties. ADC has been a research focus of both myself and the group of Andreas Dreuw, and the ADC family of methods has proven in the past to be well suited for describing photochemistry and spectroscopic results.

Employing mainly thread-based parallelism and (apart from our recent inclusion of libxm) offering basically no options for swapping stored tensors to disk, adcc is naturally restricted to problems that fit into the main memory of a single cluster node. This is fine for developing and testing new ADC methods, but can be limiting for employing ADC methods in practice: the code can currently only treat small to medium-sized molecules.

In parallel to adcc we therefore started working on the Gator project in collaboration with the groups of Patrick Norman and Zilvinas Rinkevicius (both KTH Stockholm), which we now release in a first version. Apart from an interface to adcc, Gator features a response library capable of the complex polarisation propagator (CPP) approach for simulating properties such as excited-state polarisabilities or for directly computing spectra including broadening. Additionally it contains a newly developed ADC(2) module with MPI-based distributed computing capabilities. For this the integral driver of the Veloxchem code from KTH is used, which allows the ADC(2) computation to be performed in a direct fashion (i.e. without storing the two-electron-integral tensor). This makes ADC(2) simulations in Gator more memory efficient and allows them to be distributed over a few cluster nodes. In this publication we provide an overview of Gator's current capabilities. The full abstract reads

The Gator program has been developed for computational spectroscopy and calculations of molecular properties using real and complex propagators at the correlated level of wave function theory. At present, the focus lies on methods based on the algebraic diagrammatic construction (ADC) scheme up to third-order of perturbation theory. A Fock matrix-driven implementation of the second-order ADC method for excitation energies has been realized with an underlying hybrid MPI/OpenMP parallelization scheme suitable for execution in high-performance computing cluster environments. With a modular and object-oriented program structure written in a Python/C++ layered fashion, Gator enables, in addition, time-efficient prototyping of novel scientific approaches as well as interactive notebook-driven training of students in quantum chemistry.

by Michael F. Herbst at 2021-02-26 23:30 under electronic structure theory, theoretical chemistry, adcc, algebraic-diagrammatic construction

2021-02-13

Insanity Industries

Tracking leftover packages with pacman

Automatically resolving and installing dependencies is one of the core features of package managers (and one of the most convenient). However, this can lead to packages being installed that were pulled in as a dependency for another package, but are no longer needed1. This can have slightly unfortunate side effects, such as crowding the upgrade dialog when a bunch of no-longer-needed packages receive updates at high frequency (looking at you, Haskell packages in Arch…).

To ease house cleaning, at least on distros using pacman, we can simply find these unneeded packages that used to be installed as a dependency via pacman -Qtd2. This will list all unrequired packages that were originally installed as a dependency instead of being explicitly installed by the user.

We could either periodically run this by hand, or we can simply tell pacman to just do it for us regularly. To do so, we create a new file /etc/pacman.d/hooks/unneeded-packages.hook3 containing

[Trigger]
Operation = Install
Operation = Upgrade
Operation = Remove
Type = Package
Target = *

[Action]
Description = "Checking for unneeded packages"
When = PostTransaction
Exec = /usr/bin/pacman -Qtd

This will run pacman -Qtd after all package transactions have been completed, for every pacman operation that can change our package state and for every possible target package.

While this works, the output is not the most neatly arranged, so we can modify our Exec a little to make it more beautiful, either by calling a script that does so or simply with a semi-beautiful but compact one-liner, changing our pacman hook to

[Trigger]
Operation = Install
Operation = Upgrade
Operation = Remove
Type = Package
Target = *

[Action]
Description = "Checking for unneeded packages"
When = PostTransaction
Exec = /usr/bin/bash -c "set -o pipefail && /usr/bin/pacman -Qtd | sed 's/^/  - /'  || /usr/bin/echo '  :: No unneeded packages found.'"

This now neatly displays superfluous packages right after every pacman -Syu, so you can clean up right away if you really don’t need those packages anymore.

Cleaning up superfluous packages

The quickest way of removing superfluous packages (and all their dependencies that are only needed by these packages) is running pacman -Rnsc $(pacman -Qdtq)2, which can easily be aliased to something like pacclean, making it readily available at your fingertips without much typing.
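
As a sketch, such an alias (e.g. for your ~/.bashrc) could look like this; note the single quotes, so that the inner pacman -Qdtq is only evaluated when the alias is actually used. If there are no unneeded packages, pacman will simply complain about missing targets and do nothing:

alias pacclean='sudo pacman -Rnsc $(pacman -Qdtq)'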

off-topic remark: dependency tracking

On the note of package dependencies: if you want to find out which package(s) pulled in a specific package, pactree -r specific_package_name will give you insight into that.


  1. This can happen for numerous reasons, for example because one removed the package needing it without telling the package manager to remove its dependencies as well, or because that package simply dropped the dependency. ↩︎

  2. Finding out what the individual flags mean is left to the reader; if they are not known, the author highly recommends getting to know pacman’s tremendously helpful subcommand help texts, such as pacman -Qh. ↩︎

  3. The name of the file doesn’t matter, however, it must be located in the hooks-directory and end in .hook for pacman to read it. ↩︎

by Jonas Große Sundrup at 2021-02-13 00:00

2021-02-04

michael-herbst.com

CESMIX TST meeting: DFTK.jl: A multidisciplinary Julia code for density-functional theory development

These past two days I have participated in the Tri-Lab Support Team (TST) meeting of CESMIX, the newly founded Center for the Exascale Simulation of Material Interfaces in Extreme Environments at the Massachusetts Institute of Technology. Within the next few years the idea of CESMIX is to develop a multi-level simulation stack all the way from DFT via MD up to flow simulations, in order to be able to discover novel materials suitable for extremely high temperatures under atmospheric conditions. The prototypical application for such materials would be heat shields, for example in spacecraft returning to Earth or in supersonic planes.

One novel aspect of the project is to include progress from modern compiler techniques and programming language design when building the software stack. In particular the challenge is that multiple codes will be involved in the project that feature a large variety of programming languages (FORTRAN, C++, Julia, python, ...). On top of that one goal is to keep track of simulation errors using uncertainty quantification (UQ) and use that insight to construct a multi-fidelity workflow. In such an approach the data generation for the simulation does not only employ a single accurate model, but in fact features multiple simulation layers based on cheaper and cruder models as well as more costly and accurate ones. Using the deduced knowledge of the error one can dynamically switch between these models and reach a compromise between accuracy and computational cost, but in a way that the result still has the quality to be comparable to experiments. As the employed fidelity layers the CESMIX project targets classical molecular dynamics or ab initio molecular dynamics with various kinds of density-functional theory (DFT) methods ... and this is the angle of my involvement in the project.

In particular, using our density-functional toolkit (DFTK) the idea is to be able to quickly prototype parts of the workflow. Then, building on DFTK's design as a multidisciplinary platform (see my related blog articles), we want to start incorporating new techniques (GPU platforms, UQ, multi-fidelity) to see how they could fit. I am excited about the opportunity to contribute to a project which shares a lot in philosophy with DFTK itself. In particular I am looking forward to seeing how DFTK will play out in a real-world research scenario, connecting the needs of the modelling side with the approaches of the computer science folks.

With respect to the TST meeting, I briefly gave an overview about DFTK to show where we are. Slides and a short demo are attached below.

Link Licence
DFTK.jl: A multidisciplinary Julia code for density-functional theory development (Slides) Creative Commons License
Benchmarks and DFTK demo (Tarball) Creative Commons License

by Michael F. Herbst at 2021-02-04 19:00 under talk, electronic structure theory, Julia, DFTK, theoretical chemistry, numerical analysis, Kohn-Sham, high-throughput, invited talk, DFT, solid state

2021-01-10

sECuREs website

A quick introduction to MQTT for IOT

While I had heard the abbreviation MQTT many times, I never had a closer look at what MQTT is.

Here are a few quick notes about using MQTT as Pub/Sub bus in a home IOT network.

Motivation

Once you have a few IOT devices, an obvious question is how to network them.

If all your devices are from the same vendor, the vendor takes care of it.

In my home, I have many different vendors/devices, such as (incomplete list):

Here is how I combine these devices:

  • When I’m close to my home (geo-fencing), the Nuki Opener enables Ring To Open (RTO): when I ring the door bell, it opens the door for me.
  • When I open the apartment door, the Smart Lights in the hallway turn on.
  • When I’m home, my stereo speakers should be powered on so I can play music.

A conceptually simple way to hook this up is to connect things directly: listen to the Aqara Door Sensor and instruct the Smart Lights to turn on, for example.

But, connecting everything to an MQTT bus has a couple of advantages:

  1. Unification: everything is visible in one place, the same tools work for all devices.
  2. Your custom logic is uncoupled from vendor details: you can receive and send MQTT.
  3. Compatibility with existing software, such as Home Assistant or openHAB

Step 1. Set up an MQTT broker (server)

A broker is what relays messages between publishers and subscribers. As an optimization, the most recent value of a topic can be retained, so that e.g. a subscriber does not need to wait for the next change to obtain the current state.

The most popular choice for broker software seems to be Mosquitto, but since I like to run Go software on https://gokrazy.org/, I kept looking and found https://github.com/fhmq/hmq.

One downside of hmq might be that it does not seem to support persisting retained messages to disk. I’ll treat this as a feature for the time being, enforcing a fresh start on every daily reboot.

To restrict hmq to only listen in my local network, I’m using gokrazy’s flag file feature:

mkdir -p flags/github.com/fhmq/hmq
echo --host=10.0.0.217 > flags/github.com/fhmq/hmq/flags.txt

Note that you’ll need https://github.com/fhmq/hmq/pull/105 in case your network does not come up quickly.

MQTT broker setup: displaying/sending test messages

To display all messages going through your MQTT broker, subscribe using the Mosquitto tools:

% sudo pacman -S mosquitto
% mosquitto_sub --id "${HOST}_all" --host dr.lan --topic '#' --verbose

The # sign denotes an MQTT wildcard, meaning subscribe to all topics in this case.

Be sure to set a unique id for each mosquitto_sub command you run, so that you can see which subscribers are connected to your MQTT bus. Avoid id clashes, otherwise the subscribers will disconnect each other!

Now, when you send a test message, you should see it:

% mosquitto_pub --host dr.lan --topic 'cmnd/tasmota_68462F/Power' -m 'ON'

Tip: If you have binary data on your MQTT bus, you can display it in hex with timestamps:

% mosquitto_sub \
  --id "${HOST}_bell" \
  --host dr.lan \
  --topic 'doorbell/#' \
  -F '@Y-@m-@dT@H:@M:@S@z : %t : %x'

Step 2. Integrate with MQTT

Now that communication via the bus works, what messages do we publish on which topics?

MQTT only defines that topics are hierarchical; messages are arbitrary byte sequences.

There are a few popular conventions for what to put onto MQTT:

If you design everything yourself, Homie seems like a good option. If you plan to use Home Assistant or similar, stick to the Home Assistant convention.

Best practices for your own structure

In case you want/need to define your own topics, keep these tips in mind (a small example using the Mosquitto tools follows after the list):

  • devices publish their state on a single, retained topic
    • the topic name could be e.g. stat/tasmota_68462F/POWER
    • retaining the topic allows consumers to catch up after (re-)connecting to the bus
  • publish commands on a single, possibly-retained topic
    • e.g. publish ON to topic cmnd/tasmota_68462F/Power
    • publish the desired state: publish ON or OFF instead of TOGGLE
    • if you retain the topic and publish TOGGLE commands, your lights will mysteriously go off/on when they unexpectedly re-establish their MQTT connection
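
For illustration, this is how these conventions map onto the Mosquitto command line tools, reusing the example topics from above (the state message is published retained via -r, the command is not):

% mosquitto_pub --host dr.lan --topic 'stat/tasmota_68462F/POWER' -r -m 'ON'
% mosquitto_pub --host dr.lan --topic 'cmnd/tasmota_68462F/Power' -m 'OFF'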

Integration: Shelly devices with MQTT built-in

Shelly has a number of smart devices that come with MQTT out of the box! This sounds like the easiest solution if you’re starting from scratch.

I haven’t used these devices personally, but I hear good things about them.

Integration: Zigbee2MQTT for Zigbee devices

Zigbee2MQTT supports well over 1000 Zigbee devices and exposes them on the MQTT bus.

For example, this is what you would use to connect your IKEA TRÅDFRI Smart Lights to MQTT.

Integration: ESPHome for micro controllers + sensors

The ESPHome system is a ready-made solution to connect a wide array of sensors and devices to your home network via MQTT.

If you want to use your own ESP-based micro controllers and sensors, this seems like the easiest way to get them programmed.

Integration: Mongoose OS for micro controllers

Mongoose OS is an IOT firmware development framework, taking care of device management, Over-The-Air updates, and more.

Mongoose comes with MQTT support, and with just a few lines you can build, flash and configure your device. Here’s an example for the NodeMCU (ESP8266-based):

% yay -S mos-bin
% mos clone https://github.com/mongoose-os-apps/demo-js app1
% cd app1
% mos --platform esp8266 build
% mos --platform esp8266 --port /dev/ttyUSB1 flash
% mos --port /dev/ttyUSB1 config-set mqtt.enable=true mqtt.server=dr.lan:1883

Pressing the button on the NodeMCU publishes a message to MQTT:

% mosquitto_sub --host dr.lan --topic devices/esp8266_F4B37C/events
{"ram_free":31260,"uptime":27.168680,"btnCount":2,"on":false}

Integration: Arduino for custom micro controller firmware

Arduino has an MQTT Client library. If your microcontroller is networked, e.g. an ESP32 with WiFi, you can publish MQTT messages from your Arduino sketch:

#include <WiFi.h>
#include <PubSubClient.h>

WiFiClient wificlient;
PubSubClient client(wificlient);

void callback(char* topic, byte* payload, unsigned int length) {
    Serial.print("Message arrived [");
    Serial.print(topic);
    Serial.print("] ");
    for (int i = 0; i < length; i++) {
      Serial.print((char)payload[i]);
    }
    Serial.println();
  
    if (strcmp(topic, "doorbell/cmd/unlock") == 0) {
  		// …
    }
}

void taskmqtt(void *pvParameters) {
	for (;;) {
		if (!client.connected()) {
			client.connect("doorbell" /* clientid */);
			client.subscribe("doorbell/cmd/unlock");
		}

		// Poll PubSubClient for new messages and invoke the callback.
		// Should be called as infrequent as one is willing to delay
		// reacting to MQTT messages.
		// Should not be called too frequently to avoid strain on
		// the network hardware:
		// https://github.com/knolleary/pubsubclient/issues/756#issuecomment-654335096
		client.loop();
		vTaskDelay(pdMS_TO_TICKS(100));
	}
}

void setup() {
	connectToWiFi(); // WiFi configuration omitted for brevity

	client.setServer("dr.lan", 1883);
	client.setCallback(callback);

	xTaskCreatePinnedToCore(taskmqtt, "MQTT", 2048, NULL, 1, NULL, PRO_CPU_NUM);
}

void processEvent(void *buf, int telegramLen) {
	client.publish("doorbell/events/scs", buf, telegramLen);
}

Integration: Webhook to MQTT

The Nuki Opener doesn’t support MQTT out of the box, but the Nuki Bridge can send Webhook requests. In a few lines of Go, you can forward what the Nuki Bridge sends to MQTT:

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func nukiBridge() error {
	opts := mqtt.NewClientOptions().AddBroker("tcp://dr.lan:1883")
	opts.SetClientID("nuki2mqtt")
	opts.SetConnectRetry(true)
	mqttClient := mqtt.NewClient(opts)
	if token := mqttClient.Connect(); token.Wait() && token.Error() != nil {
		return fmt.Errorf("MQTT connection failed: %v", token.Error())
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/nuki", func(w http.ResponseWriter, r *http.Request) {
		b, err := ioutil.ReadAll(r.Body)
		if err != nil {
			log.Print(err)
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		mqttClient.Publish(
			"zkj-nuki/webhook", // topic
			0, // qos
			true, // retained
			string(b)) // payload
	})

	return http.ListenAndServe(":8319", mux)
}

func main() {
	if err := nukiBridge(); err != nil {
		log.Fatal(err)
	}
}

See Nuki’s Bridge HTTP-API document for details on how to configure your bridge to send webhook callbacks.

Step 3. Express your logic

Home Assistant and Node-RED are both popular options, but also large software packages.

Personally, I find it more fun to express my logic directly in a full programming language (Go).

I call the resulting program regelwerk (“collection of rules”). The program consists of:

  1. various control loops that progress independently from each other
  2. an MQTT message dispatcher feeding these control loops
  3. a debugging web interface to visualize state

This architecture is by no means a new approach: as moquette describes it, this is to MQTT what inetd is to IP. I find moquette’s one-process-per-message model to be too heavyweight and clumsy to deploy to https://gokrazy.org, so regelwerk runs entirely in-process and is a single, easy-to-deploy binary, whether deployed to computers for notifications or to headless Raspberry Pis.

regelwerk: control loops definition

regelwerk defines a control loop as a stateful function that accepts an event (from MQTT) and returns messages to publish to MQTT, if any:

type controlLoop interface {
	sync.Locker

	StatusString() string // for human introspection

	ProcessEvent(MQTTEvent) []MQTTPublish
}

// Like mqtt.Message, but with timestamp
type MQTTEvent struct {
	Timestamp time.Time
	Topic     string
	Payload   interface{}
}

// Parameters for mqtt.Client.Publish()
type MQTTPublish struct {
	Topic    string
	Qos      byte
	Retained bool
	Payload  interface{}
}

regelwerk: MQTT dispatcher

Our MQTT message handler dispatches each incoming message to all control loops, in one goroutine per message and loop. With typical message volumes on a personal MQTT bus, this is a simple yet effective design that brings just enough isolation.

type mqttMessageHandler struct {
	dryRun bool
	loops  []controlLoop
}

func (h *mqttMessageHandler) handle(client mqtt.Client, m mqtt.Message) {
	log.Printf("received message %q on %q", m.Payload(), m.Topic())
	ev := MQTTEvent{
		Timestamp: time.Now(), // consistent for all loops
		Topic:     m.Topic(),
		Payload:   m.Payload(),
	}

	for _, l := range h.loops {
		l := l // copy
		go func() {
			// For reliability, we call each loop in its own goroutine
			// (yes, one per message), so that when one loop gets stuck,
			// the others still make progress.
			l.Lock()
			results := l.ProcessEvent(ev)
			l.Unlock()
			if len(results) == 0 {
				return
			}
			for _, r := range results {
				log.Printf("publishing: %+v", r)
				if !h.dryRun {
					client.Publish(r.Topic, r.Qos, r.Retained, r.Payload)
				}
			}
			// …input/output logging omitted for brevity…
		}()
	}
}

regelwerk: control loop example

Now that we have the definition and dispatching out of the way, let’s take a look at an actual example control loop.

This control loop looks at whether my PC is unlocked (in use) or whether my phone is home, and then turns my stereo speakers on or off accordingly.

The inputs come from runstatus and dhcp4d, the output goes to a Sonoff S26 Smart Power Plug running Tasmota.

type avrPowerLoop struct {
	statusLoop // for l.statusf() debugging

	midnaUnlocked          bool
	michaelPhoneExpiration time.Time
}

func (l *avrPowerLoop) ProcessEvent(ev MQTTEvent) []MQTTPublish {
	// Update loop state based on inputs:
	switch ev.Topic {
	case "runstatus/midna/i3lock":
		var status struct {
			Running bool `json:"running"`
		}
		if err := json.Unmarshal(ev.Payload.([]byte), &status); err != nil {
			l.statusf("unmarshaling runstatus: %v", err)
			return nil
		}
		l.midnaUnlocked = !status.Running

	case "router7/dhcp4d/lease/Michaels-iPhone":
		var lease struct {
			Expiration time.Time `json:"expiration"`
		}
		if err := json.Unmarshal(ev.Payload.([]byte), &lease); err != nil {
			l.statusf("unmarshaling router7 lease: %v", err)
			return nil
		}
		l.michaelPhoneExpiration = lease.Expiration

	default:
		return nil // event did not influence our state
	}

	// Publish desired state changes:
	now := ev.Timestamp
	phoneHome := l.michaelPhoneExpiration.After(now)
	anyoneHome := l.midnaUnlocked || (now.Hour() > 8 && phoneHome)
	l.statusf("midnaUnlocked=%v || (now.Hour=%v > 8 && phoneHome=%v)",
		l.midnaUnlocked, now.Hour(), phoneHome)

	payload := "OFF"
	if anyoneHome {
		payload = "ON"
	}
	return []MQTTPublish{
		{
			Topic:    "cmnd/tasmota_68462F/Power",
			Payload:  payload,
			Retained: true,
		},
	}
}

Conclusion

I like the Pub/Sub pattern for home automation, as it nicely uncouples all components.

It’s a shame that standards such as The Homie convention aren’t more widely supported, but it looks like software makes up for that via configuration options.

There are plenty of existing integrations that should cover most needs.

Ideally, more Smart Home and IOT vendors would add MQTT support out of the box, like Shelly.

at 2021-01-10 14:26

2020-12-26

atsutane

polkit and systemd System Units

What is referred to as a system unit in this post is a systemd unit whose processes run in the system.slice, which is the case for most service units. Under systemd, users by default do not have the permission to influence the state of such units via systemd itself. It is, however, possible to grant this to users or groups for specific commands by means of polkit.

For most commands, systemd checks via polkit whether the user initiating the command is allowed to do so; for commands like systemctl status or systemctl show this check does not happen. For the commands to start, stop or restart a unit, this permission can be granted very specifically; for commands that are handled through a different policy, such as enabling or disabling units, a specific restriction is not possible. For systemctl start|stop|restart the action org.freedesktop.systemd1.manage-units is used, for systemctl enable|disable it is org.freedesktop.systemd1.manage-unit-files, and in the systemd code the latter two commands are also not queried for as many specific details as the first three.

An exemplary use case for such rules would be a group of users on a development server who are not allowed to operate as root in any form. The developers should nevertheless be able to deploy the software they develop to the server and to run it exactly in the form in which it will later operate in a production landscape. This requires a global service unit and a polkit rule that allows the users to manage the service via systemctl start|stop|restart. Whether these are human users or the user of a continuous integration solution deploying the build results is beside the point; by some magic, the deployment of course also always happens absolutely correctly.

The service unit foobar.service

[Unit]
Description=foobar is the most creative name for anything in the whole universe.

[Service]
Type=simple
ExecStart=/opt/foobar/executable
User=foobar
Group=foobar
# Imagine lots of other useful settings here

[Install]
WantedBy=multi-user.target

The corresponding polkit rule /etc/polkit-1/rules.d/10-foobar.rules

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.systemd1.manage-units" &&
        (subject.isInGroup("developer") && action.lookup("unit") == "foobar.service" &&
        (action.lookup("verb") == "start" ||
         action.lookup("verb") == "stop"  ||
         action.lookup("verb") == "restart"))) {
        return polkit.Result.YES;
    }
});

For more granular restrictions, besides the group, the user name is also available as subject.user.

by Thorsten Töpper at 2020-12-26 15:07 under howto, systemd, polkit

2020-12-23

michael-herbst.com

High-throughput density-functional theory calculations: An interdisciplinary challenge

Last Thursday I was invited to give a virtual talk at the Scientific Computing Seminar of the working group of Prof. Nicolas Gauger at TU Kaiserslautern. Since the research in Prof. Gauger's group mostly concerns topics which are not directly related to electronic structure theory and density-functional theory (DFT), I chose to present my current research from a rather broad and introductory angle. The main focus of my talk was thus to hint at the interdisciplinary challenges arising in high-throughput DFT simulations, followed by a summary of a few of my recent projects in the field.

I was glad for the opportunity to spread the word about the difficulties with high-throughput methods in DFT. Firstly because I think it is an absolutely fascinating topic, but secondly because it is one where input from fields beyond the standard "culprits" of chemistry, physics and materials science is beneficial to solve the upcoming problems. In fact exactly this hope of getting other fields and people with non-standard backgrounds involved was one of the driving forces behind our DFT code, the density-functional toolkit (DFTK), which I also briefly presented.

I hope that my talk got some of the audience more interested in DFT and I look forward to continuing the discussions at a later point, hopefully meeting the Gauger group in person in Kaiserslautern.

Link Licence
High-throughput density-functional theory calculations: An interdisciplinary challenge (Slides) Creative Commons License

by Michael F. Herbst at 2020-12-23 17:00 under talk, electronic structure theory, Julia, DFTK, theoretical chemistry, numerical analysis, Kohn-Sham, high-throughput, invited talk, DFT, solid state

2020-11-30

sECuREs website

Fixing the Nuki Opener smart intercom IOT device (on the BTicino SCS bus intercom system)

I recently bought a Nuki Opener, which “turns your existing intercom into a smart door opener”.

Unfortunately, I have had a lot of trouble getting it to work.

I finally got the device working by interjecting my own micro controller between the intercom bus and the Nuki Opener, then driving the Nuki Opener in its Analogue mode:

The rest of this article outlines how this setup works at a high level.

Prerequisites

For reliable interpretation and transmission of SCS bus data, we’ll need:

  1. SCS receive/transmit circuits. These can be prototyped on a breadboard if you have the required diodes, transistors, resistors and capacitors.

  2. A microcontroller with an Analog Comparator. If your microcontroller has one, you’ll find a corresponding section in the datasheet. This function is sometimes abbreviated to CMP or AC, or might be part of a larger Analog/Digital Converter (ADC).

  3. A UART (serial) decoder. Most microcontrollers have at least one UART, but if you don’t have one available for whichever reason, you could use a software UART implementation, too.

SCS receive circuit

SCS receive circuit

An R-C network, directly connected to the SCS bus, is used for incoming signal conditioning.

The resistor values have been chosen to divide the voltage of the input signal from 28V down to approx. 2V, i.e. well within the 0-3.3V range for modern microcontroller GPIO pins.

A zener diode limits the 28V level to 3.3V, which should be safe for most microcontrollers.

Simulation: https://tinyurl.com/yxhrkejn


SCS transmit circuit

SCS transmit circuit

We directly connect the gate of a mosfet transistor to a GPIO pin of our microcontroller, so that when the microcontroller drives the pin high, we use the 100Ω resistor to attach a load to the SCS bus.

For comparison, the KNX bus, which is similar to the SCS bus, uses a 68Ω resistor here.

Simulation: https://tinyurl.com/y6nv4yg7


SCS lab setup

Use a lab power supply to generate 28V DC. I’m using the Velleman LABPS3005SM because it was in stock at Galaxus, but any power supply rated for at least 30V DC will do.

As the DIY home automation blog entry “A minimal KNX setup” describes, you’ll need to place a 47Ω resistor between the power line and your components.

Afterwards, just connect your components to the bus. The supply/ground line of a breadboard will work nicely.

SCS lab setup

Micro Controller choice

In this blog post, I’m using a Teensy 4 development board that is widely available for ≈20 USD:

Teensy 4

With its 600 MHz, the Teensy 4 has enough sheer clock frequency to allow for sloppier coding while still achieving high quality input/output.

The teensy tiny form factor (3.5 x 1.7 cm) works well for this project and will allow me to store the microcontroller in an existing intercom case.

The biggest downside is that NXP’s own MCUXpresso IDE cannot target the Teensy 4!

The only officially supported development environment for the Teensy 4 is Teensyduino, which is a board support package for the Arduino IDE. Having Arduino support is great, but let’s compare:

I also have NXP’s MIMXRT1060-EVK eval kit, which uses the same i.MX RT1060 micro controller family as the Teensy 4, but is much larger and comes with all the bells and whistles; notably:

  1. The MCUXpresso IDE works with the eval kit’s built-in debugger out of the box! Being able to inspect a stack trace, set breakpoints and look at register contents are invaluable tools when doing micro controller development.
  2. The MCUXpresso IDE comes with convenient graphical Pin and Clock config tools. Setting a pin’s alternate function becomes a few clicks instead of hours of fumbling around.
  3. The NXP SDK contains a number of drivers and examples that are tested on the eval kit. That makes it really easy to get started!

Each of these points is very attractive on their own, but together they make the whole experience so different!

Being able to deploy to the Teensy from MCUXpresso would be a killer feature! So many NXP SDK examples would suddenly become available, filling the Teensy community’s gaps.

Signal Setup

On a high level, this is how we are going to connect the various signals:

Step 1. We start with the SCS intercom bus signal (28V high, 22V low):

Step 2. Our SCS receive circuit takes the bus signal and divides it down to 2V:

voltage-divided SCS signal

Step 3. We convert the voltage-divided analog signal into a digital SCSRXOUT signal:

Analog Comparator output signal

Step 4. We modify our SCSRXOUT signal so that it can be sampled at 50%:

modified SCS signal

Step 5. We decode the signal using our micro controller’s UART:

Teensy 4 UART decodes SCS

Micro Controller firmware

Once I complete the next revision of the SCS interface PCB, I plan to release all design files, schematics, sources, etc. in full.

Until then, the following sections describe how the most important parts work, but skip over the implementation-specific glue code that wires everything together.

Analog Comparator

The Analog Comparator in our microcontroller lets us know whether a voltage is above or below a configured threshold voltage by raising an interrupt. A good threshold is 1.65V in my case.

In response to the voltage change, we set GPIO pin 15 to a digital high (3.3V) or low (0V) level:

volatile uint32_t cmpflags;

// ISR (Interrupt Service Routine), called by the Analog Comparator:
void acmp1_isr() {
  cmpflags = CMP1_SCR;

  { // clear interrupt status flags:
    uint8_t scr = (CMP1_SCR & ~(CMP_SCR_CFR_MASK | CMP_SCR_CFF_MASK));
    CMP1_SCR = scr | CMP_SCR_CFR_MASK | CMP_SCR_CFF_MASK;
  }

  if (cmpflags & CMP_SCR_CFR_MASK) {
    // See below! This line will be modified:
    digitalWrite(15, HIGH);
  }

  if (cmpflags & CMP_SCR_CFF_MASK) {
    digitalWrite(15, LOW);
  }
}

This signal can easily be verified by attaching one oscilloscope probe to the voltage-divided SCSRX bus signal input and another to the SCSRXOUT GPIO pin output:

Analog Comparator output signal

Analog Comparator Modification

There is one crucial difference between SCS and UART:

To transmit a 0 (or start bit):

  • SCS is low 34μs, then high 70μs
  • UART is low the entire 104μs

UART implementations typically sample at 50%, the middle of the bit period.

For SCS, we would need to sample at 20%, because the signal returns to high so quickly.

While setting a custom sample point is possible in e.g. sigrok’s UART decoder, neither software nor hardware serial implementations on micro controllers typically support it.

On a micro controller it is much easier to just modify the signal so that it can be sampled at 50%.

In practical terms, this means modifying the acmp1_isr function to return to high later than the Analog Comparator indicates:

volatile uint32_t cmpflags;

// ISR (Interrupt Service Routine), called by the Analog Comparator:
void acmp1_isr() {
  cmpflags = CMP1_SCR;

  { // clear interrupt status flags:
    uint8_t scr = (CMP1_SCR & ~(CMP_SCR_CFR_MASK | CMP_SCR_CFF_MASK));
    CMP1_SCR = scr | CMP_SCR_CFR_MASK | CMP_SCR_CFF_MASK;
  }

  if (cmpflags & CMP_SCR_CFR_MASK) {
    // Instead of setting our output pin high immediately,
    // we delay going up by approx. 40us,
    // turning the SCS signal into a UART signal:
    delayMicroseconds(40);
    digitalWrite(15, HIGH);
  }

  if (cmpflags & CMP_SCR_CFF_MASK) {
    digitalWrite(15, LOW);
  }
}

You can now read this signal using your laptop and a USB-to-serial adapter!
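
As a sketch, assuming the 104μs bit period above corresponds to 9600 baud 8N1 and the adapter shows up as /dev/ttyUSB0, the decoded bytes can be dumped like this:

% stty -F /dev/ttyUSB0 9600 raw -echo
% xxd < /dev/ttyUSB0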

On a micro controller, we now feed this signal back into a UART decoder. For prototyping, this can literally mean a jumper wire connecting the output GPIO pin with a serial RX pin. Some micro controllers also support internal wiring of peripherals, allowing you to get rid of that cable.

SCS RX (receive)

With the SCS intercom bus signal bytes now available through the UART decoder, we can design a streaming SCS decoder. The decoder self-synchronizes and skips invalid SCS telegrams by checking their checksum. We start with a ring buffer and a convenience working copy:

constexpr int telegramLen = 7;

typedef struct {
  // circular buffer for incoming bytes, indexed using cur
  uint8_t buf[telegramLen];
  int cur;

  uint8_t tbuf[telegramLen];
} scsfilter;

Each byte we receive from the UART, we store in our ring buffer:

void sf_WriteByte(scsfilter *sf, uint8_t b) {
  sf->buf[sf->cur] = b;
  sf->cur = (sf->cur + 1) % telegramLen;
}

After every byte, we check whether the ring buffer now holds a complete SCS bus telegram with a valid checksum, and whether that telegram is a ring command for our apartment:

bool sf_completeAndValid(scsfilter *sf) {
  const uint8_t prev = sf->buf[(sf->cur+(telegramLen-1))%telegramLen];
  if (prev != 0xa3) {
    return false; // incomplete: previous byte not a telegram termination
  }

  // Copy the whole telegram into tbuf; makes working with it easier:
  for (int i = 0; i < telegramLen; i++) {
    sf->tbuf[i] = sf->buf[(sf->cur+i)%telegramLen];
  }

  const uint8_t stored = sf->tbuf[5];
  const uint8_t computed = sf->tbuf[1] ^
    sf->tbuf[2] ^
    sf->tbuf[3] ^
    sf->tbuf[4];
  if (stored != computed) {
    return false; // corrupt? checksum mismatch
  }

  return true;
}

int sf_ringForApartment(scsfilter *sf) {
  if (!sf_completeAndValid(sf)) {
    return -1;
  }

  if (sf->tbuf[3] != 0x60) {
    return -1; // not a ring command
  }

  if (sf->tbuf[1] != 0x91) {
    return -1; // not sent by the intercom house station
  }

  return (int)(sf->tbuf[2]); // apartment id
}
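
Putting the UART and the filter together, a minimal (hypothetical) glue function could look like this. The apartment id 0x17 is a made-up placeholder; the real value is whatever sf_ringForApartment returns when your own doorbell rings.

scsfilter sf = {}; // zero-initialized ring buffer

// Called for every byte the UART delivers:
void handleSCSByte(uint8_t b) {
  sf_WriteByte(&sf, b);
  const int apartment = sf_ringForApartment(&sf);
  if (apartment == 0x17) {       // placeholder: our own apartment id
    // ring for our apartment: trigger the Nuki Opener, sound the bell, …
  } else if (apartment >= 0) {
    // valid ring telegram, but for one of the neighbors: ignore it
  }
}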

SCS TX (send)

Conceptually, writing serial data to a GPIO output purely in software is what e.g. the Arduino SoftwareSerial library does, and there are plenty of similar implementations for different micro controllers. This technique is also sometimes called “bit banging”.

I started with the Teensy SoftwareSerial::write implementation and modified it to:

  1. Invert the output to drive the SCS transmit circuit’s MOSFET gate, i.e. low on idle and high while transmitting a 0 bit.

  2. Return to idle 70μs earlier than a standard UART would, i.e. already after ≈34μs.

The modified write function looks like this:

// Pin 11 drives the transmit circuit’s MOSFET gate (inverted, see above):
// V27 = gate off (bus stays at its idle level), V22 = gate on (bus pulled down, transmitting).
#define V27 LOW
#define V22 HIGH

// Busy-wait until 70μs before the end of the current bit period, then return
// the bus to idle, shortening the low pulse to the ≈34μs that SCS expects:
#define scs0() do { \
  while (ARM_DWT_CYCCNT - begin_cycle < (target-43750/*70us*/)) ; \
  digitalWriteFast(11, V27); \
} while (0)

size_t SCSSerial::write(uint8_t b)
{
  elapsedMicros elapsed;
  uint32_t target;
  uint8_t mask;
  uint32_t begin_cycle;

  ARM_DEMCR |= ARM_DEMCR_TRCENA;
  ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
  ARM_DWT_CYCCNT = 0;

  // start bit
  target = cycles_per_bit;
  noInterrupts();
  begin_cycle = ARM_DWT_CYCCNT;
  digitalWriteFast(11, V22);
  scs0();
  wait_for_target(begin_cycle, target);

  // 8 data bits
  for (mask = 1; mask; mask <<= 1) {
    if (b&mask) {
      digitalWriteFast(11, V27);
    } else {
      digitalWriteFast(11, V22);
    }
    target += cycles_per_bit;
    scs0();
    wait_for_target(begin_cycle, target);
  }

  // stop bit
  digitalWriteFast(11, V27);
  interrupts();
  target += cycles_per_bit;
  scs0();
  while (ARM_DWT_CYCCNT - begin_cycle < target) ; // wait
  return 1;
}
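
For completeness, here is a hedged sketch of how a complete telegram could be sent with this write function. The five payload bytes are placeholders only (I am not reproducing an actual door-open telegram here); checksum and termination follow the structure the receive filter checks above, i.e. byte 5 is the XOR of bytes 1 through 4 and 0xa3 terminates the telegram.

// Sketch: send a 7-byte SCS telegram via SCSSerial::write().
// The payload values must be captured from your own bus first.
void sendTelegram(SCSSerial &scs, uint8_t b0, uint8_t b1, uint8_t b2,
                  uint8_t b3, uint8_t b4) {
  uint8_t telegram[7] = {
    b0, b1, b2, b3, b4,
    (uint8_t)(b1 ^ b2 ^ b3 ^ b4), // checksum, as verified in sf_completeAndValid()
    0xa3,                         // telegram termination byte
  };
  for (int i = 0; i < 7; i++) {
    scs.write(telegram[i]);
  }
}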

It works!

With the approach described above, I now have a micro controller that recognizes doorbell rings for my apartment and ignores doorbell rings for my neighbors. The micro controller can unlock the door, too, and both features are available through the Nuki Opener.

How is the Nuki Opener?

It took over 2 months before I saw the Nuki Opener working correctly for the first time.

I really hope the Nuki developers can work with what I described above and improve their product’s reliability for all customers with an SCS intercom system!

The device itself seems useful and usable, but time will tell how reliable it turns out to be in practice. I think I noticed push notifications arriving rather late (many seconds) after the doorbell rang.

I’ll keep an eye on this and explore the various Nuki APIs more.

Appendix: Project Journal

  • 2020-09-26: I buy a Nuki Opener (Nuki Opener #1), but despite connecting it correctly, it never successfully opens the door. I start learning about the SCS home automation bus system that our intercom uses.
  • 2020-09-28: I publish an SCS bus decoder for sigrok and contact the Nuki Support.
  • 2020-10-15: I buy another Nuki Opener (Nuki Opener #2) to test their old firmware version, because downgrading firmware versions is impossible. Opener #2 actually opens the door, so I assume we are dealing with a firmware problem [turns out incorrect later].
  • 2020-10-16: I publish a detailed analysis of the Nuki Opener not sending the correct signal for the Nuki developers to go through.
  • 2020-11-03: I update my new Nuki Opener #2 to the latest firmware and realize that my old Nuki Opener #1 most likely just has some sort of hardware defect. However, Opener #2 has trouble detecting the ring signal: either it doesn’t detect any rings at all, or it detects all rings, including those for my neighbors!
  • 2020-11-16: In their 13th (!) email reply, Nuki Support confirms that the Opener firmware is capturing and matching the incoming ring signal, if I understand their developers correctly.
  • 2020-11-18: I suggest to Nuki developers (via Nuki Support) to decode the SCS signal with a UART decoder instead of comparing waveforms. This should be a lot more reliable!
  • 2020-11-23: My self-designed SCS receiver/transmitter/power supply PCB arrives. The schematics are based on existing SCS DIY work, but I created my own KiCad files because I was only interested in the SCS bus interface, not the PIC microcontroller they used.
  • 2020-11-25: While working on the intercom, I assume some wire touched an unlucky spot, and my BTicino intercom went up in smoke. We enabled the Nuki Opener’s ring sound and started using it as our main doorbell, which meant we now also heard the ring sound for (some of) our neighbors.
  • 2020-11-26: My Teensy 4 microcontroller successfully decodes the SCS bus signal with its Analog Comparator and UART decoder.
  • 2020-11-28: My Teensy 4 microcontroller is deployed to filter the SCS bus ring signal and drive the Nuki Opener in analogue mode.

at 2020-11-30 07:12

2020-11-14

RaumZeitLabor

Hooray, we live on (and stick around)! A sticker campaign for the Remote Chaos Experience (rc3)

Even though, without the clinking of toppled Mate bottles, the scratchy edge of the entrance wristband, the smell of fried-noodle-stand grease steam in your clothes and the echo of the trade-fair glass walkway, quite a bit of the Congress feeling will surely be missing, the experience of unearthing (sticker) treasure should nevertheless not fall short.

For this we have launched the campaign „Schicke Sticker von den Sticker-Schickern“ (roughly: fancy stickers from the sticker senders).

If you fancy a sticky surprise, you can send us a stamped, self-addressed return envelope (ideally C5) to our new address by 4 December 2020, and we will send you Aufcccleber (stickers) and other goodies in return.

Apart from the envelopes, the campaign is free of charge. Nevertheless, we are of course happy about general donations to our bank account or a contribution via PayPal, so we can cover all costs and invest whatever is left in our new space.

If you have any questions, you can reach us on Twitter and Mastodon.

at 2020-11-14 00:00