The machines involved in this report are:
Back in 2018 I got a pair of Raspberry Pi 3B's, but had a great deal of trouble making the OpenSuSE ARM port work on them, so they ended up in the cold palace. See here for the history of the Raspberry Pi's in 2018, and here for a foray into a desktop replacement or thin client role (2020).
In 2021 I hit a bug in the MediaTek mt76x2u driver (x86_64) for mt7612u
in the Terow ROW02FD Wi-Fi NIC on Jacinth. (See
Wi-Fi Evolution on CouchNet in 2021
for info about this NIC and
reasons for picking it.)
The bug fortunately stays hidden on aarch64
(Raspberry Pi), so I moved Holly into Jacinth's cabinet and
transplanted the Terow NIC to it.
Now I have no desktop machine and I also have a maintenance problem: current
versions of Tumbleweed for ARM do not put out video on HDMI, so I involuntarily
have a headless RPi (no monitor, no
keyboard, no mouse). See
Holly Hosed, Tested Backup
about a bad update that made me reinstall
the image and lose the video.
Since Holly is mission critical (Wi-Fi access point), I can't just do random interventions and reboots to bring back video. Instead I'm going to resurrect the other RPi (Piki), which will start out as a clone of Holly, but at the end the lessons learned in making it go will be propagated back to Holly.
I mount discs by label, since a UUID is ugly and impossible to type by hand. Since it often happens, for repairs or investigations, that one machine has two or more entire discs mounted at the same time (e.g. ROOT, EFI and SWAP from both Holly and Piki), I've assigned a number to each disc which becomes a suffix on the label. Holly is on disc 03 and Piki is on 12.
These are the local configuration management (LCM) scripts mentioned in this document, and some miscellaneous scripts:
Hosts are members of various sets such as architecture, operating system versions, and roles. The hostgroup command and/or Perl module evaluates a set expression (intersection, union, etc) and produces a list of hosts in that set.
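To make the set algebra concrete, here is a minimal shell sketch. It is not the hostgroup module, which is Perl. It assumes a hypothetical layout where each hostgroup is a file listing one host name per line, and uses sort, uniq and grep for intersection, union and difference.

```shell
#!/bin/sh
# Hostgroup set algebra sketch.  Assumes (hypothetically) one file per
# group, one host name per line; the real hostgroup module works differently.

# Intersection: hosts that appear in both groups.
hg_and() { { sort -u "$1"; sort -u "$2"; } | sort | uniq -d; }

# Union: hosts that appear in either group.
hg_or() { sort -u "$1" "$2"; }

# Difference: hosts in the first group but not in the second.
hg_minus() { sort -u "$1" | grep -vxF -f "$2"; }
```

Nested expressions compose by writing an intermediate result to a temporary file and feeding it to the next operator.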
When a new machine is set up or after a major upgrade, post_jump installs locally managed configuration files, installs wanted but missing packages, removes unwanted packages, and numerous other nitpicky details that otherwise would get forgotten. It is named for the post_jump customization script used when installing Sun Microsystems' SunOS and Solaris operating systems.
It installs the appropriate repo definitions (/etc/zypp/repos.d/*.repo) for the target host's OS version and architecture, particularly non-SuSE repos like Packman. I also have a caching proxy (squid) for package files, and audit-repos appends its URL to baseurls that are supposed to be accessed through it. Thus an update package to be installed on multiple hosts is downloaded (slowly) across the net only once.
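One way to route a repo through squid relies on libzypp accepting proxy settings as URL query parameters (proxy= and proxyport=). Whether audit-repos does it exactly this way is an assumption; the host name squid.example is hypothetical and 3128 is only squid's default port. The sketch leaves baseurl lines that already name a proxy alone, so it can be run repeatedly.

```shell
#!/bin/sh
# Append squid proxy query parameters to every baseurl= line that lacks them.
# Usage: add_proxy REPOFILE PROXYHOST [PORT]   (GNU sed -i assumed)
add_proxy() {
    repo=$1
    # "\&" so that sed inserts a literal ampersand.
    q="proxy=$2\\&proxyport=${3:-3128}"
    sed -i \
        -e '/^baseurl=/!b' \
        -e '/proxy=/b' \
        -e "/?/s|\$|\\&$q|" \
        -e "/?/!s|\$|?$q|" \
        "$repo"
}
```

For example, add_proxy /etc/zypp/repos.d/SOMEREPO.repo squid.example, followed by zypper refresh. A baseurl that already has a query string gets the parameters appended with & instead of ?.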
/m1/custom/couchnet.sel and extra.sel are lists of specifically wanted packages, referred to as keystone packages, in Debian terminology, or @world for Gentoo. The files include the hostgroups where packages are wanted. audit-pkgs can install any missing keystone packages, or can remove unwanted ones (not removing dependencies of keystone packages), or can do an online update or dist-upgrade.
It uses /m1/custom/scripts.dat and scripts.extra to determine which service units (daemons, sockets, timers, etc.) are wanted, per hostgroups, and enables or disables them as appropriate.
SuSE's zypper is pretty good about keeping keystone packages installed if you originally install from the installation disc, but I had trouble with an image as the package source; zypper wanted to do the dist-upgrade for Tumbleweed by removing almost all the packages on the host including the kernel. Troubleshooting is hard and this may not be SuSE's fault, but I decided to get the keystone concept under my control by creating a metapackage, per host, that requires all keystone packages for that host. mkkeystone generates that package. It would have been more SuSE-like to create a keystone pattern, but my scripts are aligned to handle packages.
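As a minimal sketch of what such a per-host metapackage looks like (not mkkeystone itself), assume the hostgroup filtering of couchnet.sel and extra.sel has already reduced them to a plain list of package names on stdin. The RPM spec could then be generated like this; the License: value and the date-style Version are placeholders.

```shell
#!/bin/sh
# Emit an RPM spec for a metapackage keystone-HOST that Requires every
# package named on stdin (one per line; blank lines and # comments ignored).
# License and Version values are placeholders, not a recommendation.
mkkeystone_spec() {   # mkkeystone_spec HOST VERSION < package-list
    host=$1 version=$2
    cat <<EOF
Name:           keystone-$host
Version:        $version
Release:        0
Summary:        Keystone packages wanted on $host
License:        MIT
BuildArch:      noarch
EOF
    # One Requires: line per distinct package name.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' | sort -u |
        while read -r pkg _; do
            printf 'Requires:       %s\n' "$pkg"
        done
    cat <<EOF

%description
Metapackage pulling in every keystone package for $host.

%files
EOF
}
```

Usage might be: (filtered package list) | mkkeystone_spec piki "$(date +%Y%m%d)" > keystone-piki.spec, then rpmbuild -bb keystone-piki.spec. Removing a package that the metapackage requires then forces zypper to propose removing keystone-piki too, which makes the damage visible.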
Information about each host is stored in a database, including its name, IP address(es), hostgroups and SSHFP records. From the database, /etc/hosts, DNS zone files (with DNSSEC), etc. are derived.
This isn't exactly an LCM script, but it extracts the present time from the maxter site and steps the local clock to match, after sanity checks. It gets about 20msec accuracy. This is essential on a RPi, which has no realtime clock, so when it is booted or when post_jump starts, its clock will be off by days if not months. I normally run chrony, but on a new installation it will not have its LCM configuration file and will not be enabled, so post_jump has to run remote-time.cgi very early.
Normally chkstat sets permissions from the files in /etc/permissions* and /usr/share/permissions, but it makes judgments whether files have unexpected security issues, and if so it refuses to fix them, specifically their ownership. So I re-implemented chkstat without the paranoia.
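As a minimal sketch of the same idea (not chkstat.J itself): apply owner and mode from the permissions files with no second-guessing. It assumes the plain "path owner:group mode" line format, and it ignores the profile selection (easy, secure, paranoid) and the capability lines that the real chkstat handles.

```shell
#!/bin/sh
# Apply owner and mode from permissions-style files without second-guessing.
# Assumed line format: "path owner:group mode".  Blank lines, # comments and
# "+capabilities" lines are skipped; missing paths are ignored.
apply_perms() {   # apply_perms PERMFILE...
    cat "$@" | while read -r path owner mode _; do
        case $path in ''|'#'*|'+'*) continue ;; esac
        [ -n "$owner" ] && [ -n "$mode" ] || continue
        [ -e "$path" ] || continue
        if [ "$(stat -c '%U:%G' "$path")" != "$owner" ]; then
            chown -h "$owner" "$path"
        fi
        # chown can clear setuid/setgid bits, so check the mode afterwards.
        # Compare modes as octal numbers so "644" equals "0644".
        cur=$(stat -c '%a' "$path")
        if [ "$((0$cur))" -ne "$((0$mode))" ]; then
            chmod "$mode" "$path"
        fi
    done
}
```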
When you install OpenSuSE from an installation disc or an image, the
numeric UIDs and GIDs are chosen per the order in which the packages are
installed, which is most likely unique per installation and certainly will
not match what's in my LCM files. The reown
script stats every
file on the machine, translates the file's UID and GID to alphabetic using
the old passwd and group files, translates them back using the new files,
and changes the file's ownership if wrong.
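A minimal sketch of the same translation (not the reown script itself): one pass over the old and new passwd and group files builds an old-ID to new-ID map by name, then every file whose numeric owner or group changes gets a chown. File names containing tabs or newlines would confuse this line-oriented version.

```shell
#!/bin/sh
# reown_plan OLD_PASSWD NEW_PASSWD OLD_GROUP NEW_GROUP < "uid gid path" lines
# Prints "newuid:newgid<TAB>path" for each file whose owner or group changes.
reown_plan() {
    awk -v op="$1" -v np="$2" -v og="$3" -v ng="$4" '
        # Read a passwd- or group-format file: name is field 1, ID is field 3.
        function load(file, tag,    line, f) {
            while ((getline line < file) > 0) {
                split(line, f, ":")
                id[tag, f[1]] = f[3]
            }
            close(file)
        }
        BEGIN {
            load(op, "ou"); load(np, "nu"); load(og, "og"); load(ng, "ng")
            # Old numeric ID -> name -> new numeric ID, for names in both files.
            for (k in id) {
                split(k, p, SUBSEP)
                if (p[1] == "ou" && (("nu", p[2]) in id)) umap[id[k]] = id["nu", p[2]]
                if (p[1] == "og" && (("ng", p[2]) in id)) gmap[id[k]] = id["ng", p[2]]
            }
        }
        {
            uid = $1; gid = $2
            path = substr($0, length(uid) + length(gid) + 3)
            nuid = (uid in umap) ? umap[uid] : uid
            ngid = (gid in gmap) ? gmap[gid] : gid
            if (nuid != uid || ngid != gid) printf "%s:%s\t%s\n", nuid, ngid, path
        }'
}

# reown ROOT OLD_PASSWD NEW_PASSWD OLD_GROUP NEW_GROUP
reown() {
    root=$1; shift
    find "$root" -xdev -print0 | xargs -0 -r stat -c '%u %g %n' |
        reown_plan "$@" |
        while IFS="$(printf '\t')" read -r owner path; do
            chown -h -- "$owner" "$path"
        done
}
```

For the card mounted on /mnt this would be something like reown /mnt OLD_PASSWD NEW_PASSWD OLD_GROUP NEW_GROUP, where the OLD_* arguments are wherever the original files were saved. Two cautions. On btrfs, each subvolume counts as a separate filesystem for find -xdev, so subvolumes may need their own runs. On Linux, chown normally clears setuid and setgid bits, which is one more reason to run chkstat (or chkstat.J) afterwards.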
To mitigate the Logjam family of exploits, I rebuild the Diffie-Hellman groups monthly for SSH and OpenSSL. This takes a long time, up to 1/2 hour on the RPi's. Normally the housekeeping scripts run at midnight, when something long-running will not bother the humans.
ssync is my wrapper for rsync. Normally rsync just copies the files, with no feedback about what was copied. I add logging output by inserting this option: --log-format="%o %f"
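As a sketch, the wrapper can be a one-line shell function. Current rsync documents this option as --out-format and still accepts --log-format as the older name. In the format, %o is the operation (send, recv or del.) and %f is the file name.

```shell
#!/bin/sh
# ssync sketch: rsync plus one "operation filename" line per item transferred.
ssync() {
    rsync --log-format="%o %f" "$@"
}
```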
This is pretty much identical to the procedure in Holly Hosed, Tested Backup.
Locate and download the current XFCE image for aarch64 (or whichever desktop environment you prefer). https://en.opensuse.org/HCL:Raspberry_Pi3 (hardware compatibility list) has a link to http://download.opensuse.org/ports/aarch64/tumbleweed/appliances/openSUSE-Tumbleweed-ARM-XFCE-raspberrypi.aarch64.raw.xz; if the link is broken, which isn't too rare, dig around for the latest version in the containing directory. Actually, for verifying the signature, it's a lot easier if you download using the URL with the file's full name. Today's size: 1.12e9 bytes (1.12 Gb) compressed, took 283sec, 4.2Mb/s.
Also download the SHA256 checksum from $URL.sha256 and the signature (with SuSE's key) called $URL.sha256.asc . To check the signature, first obtain a trusted instance of SuSE's package signing key on your keyring. (Tracing it back to a trust anchor is beyond the scope of this writeup.) Then:
gpg --verify $file.sha256.asc $file.sha256
Then to check the image itself:
sha256sum -c < $file.sha256
It should reply OK (not too swiftly; the file is big).
A commonly seen error is to get the wrong filename. The download site has a symlink to the current version, but $file.sha256 contains the actual filename separated from the hash by two blanks, so it's better to use the real filename in the download URL. (Or rename what you mistakenly downloaded.)
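The rename can be automated by reading the real file name out of the .sha256 file. A sketch, assuming the file is plain "HASH  FILENAME" text (not clearsigned) and that it is run in the download directory; the function name is made up.

```shell
#!/bin/sh
# Verify a downloaded image against its .sha256 file.  If the download was
# saved under the symlink ("current") name, rename it to the real file name
# recorded in the .sha256 file first.  Run it in the download directory.
verify_image() {   # verify_image SUMFILE DOWNLOADED_FILE
    sumfile=$1 downloaded=$2
    # First line is "HASH  FILENAME"; strip a binary-mode '*' marker if present.
    real=$(awk 'NR == 1 { sub(/^\*/, "", $2); print $2; exit }' "$sumfile")
    [ -n "$real" ] || { echo "no file name in $sumfile" >&2; return 1; }
    if [ "$downloaded" != "$real" ] && [ ! -e "$real" ]; then
        mv -- "$downloaded" "$real"
    fi
    sha256sum -c < "$sumfile"
}
```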
If you don't have the signing key in your keyring, it tells you which
key was used; in this case RSA key B88B2FD43DBDC284. An
expert in cyber warfare would laugh at the following suggestion, but if you
trust that your distro hasn't been tampered with, look in
/usr/lib/rpm/gnupg/keys/ for a key file with that key ID: currently
gpg-pubkey-3dbdc284-53674dd4.asc (the first hex number is the last 8 hex
digits of the key ID). The file is ASCII and gives the key's owner, which
you may or may not believe. More robust ways to trace the key to a trust
anchor are way out of scope for this document.
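The file-name lookup is mechanical enough to script. A sketch, assuming only the naming pattern described above: gpg-pubkey-, the last 8 hex digits of the key ID in lower case, a second hex number, then .asc.

```shell
#!/bin/sh
# Last 8 hex digits of a full key ID, lower case, as used in rpm key file names.
short_keyid() {
    printf '%s' "$1" | tail -c 8 | tr 'A-F' 'a-f'
}

# List candidate key files for a full key ID (may print nothing, or several).
key_file_for() {
    ls /usr/lib/rpm/gnupg/keys/gpg-pubkey-"$(short_keyid "$1")"-*.asc 2>/dev/null
}
```

Then gpg --import "$(key_file_for B88B2FD43DBDC284)" and repeat the gpg --verify step. If more than one file matches, check which one you want before importing.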
xzcat $image.xz | dd bs=4M of=/dev/mmcblk0 iflag=fullblock oflag=direct status=progress
sync
Rate ~17Mb/sec. Total (uncompressed): 5.91Gb, 373sec, 15.8Mb/sec.
At first boot, partition 3 will be expanded to fill the card (not swift). The partitions are labeled per their roles. If you have duplicate partition labels, you will need to do something special.
These command lines are going to re-label EFI, SWAP and ROOT. If a host partition has the same label, it isn't clear which wins, but avoid risks: figure out the device path of the partition on the Raspberry Pi's card (not the host), such as /dev/mmcblk0p3, and use that when re-labeling.
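A sketch of the relabeling with stock tools (dosfstools, util-linux, btrfs-progs, e2fsprogs or xfsprogs, depending on the filesystem). It assumes the image's layout is p1 = EFI, p2 = swap, p3 = root and detects each filesystem type with blkid. Check the real layout with lsblk -f before trusting it. Setting DRYRUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Relabel the card's partitions with a disc-number suffix, e.g. ROOT-12.
# Assumed layout (verify with lsblk -f): p1=EFI, p2=SWAP, p3=ROOT.
run() { if [ -n "${DRYRUN:-}" ]; then echo "$*"; else "$@"; fi; }

relabel_fs() {   # relabel_fs DEVICE FSTYPE LABEL
    case $2 in
        vfat)  run fatlabel "$1" "$3" ;;
        swap)  run swaplabel -L "$3" "$1" ;;
        btrfs) run btrfs filesystem label "$1" "$3" ;;
        ext*)  run e2label "$1" "$3" ;;
        xfs)   run xfs_admin -L "$3" "$1" ;;
        *)     echo "don't know how to label $2 on $1" >&2; return 1 ;;
    esac
}

relabel_card() { # relabel_card DISK SUFFIX, e.g. relabel_card /dev/mmcblk0 12
    disk=$1 n=$2
    for spec in 1:EFI 2:SWAP 3:ROOT; do
        part=${disk}p${spec%%:*}
        role=${spec#*:}
        fstype=$(blkid -o value -s TYPE "$part") || return 1
        relabel_fs "$part" "$fstype" "$role-$n" || return 1
    done
}
```

DRYRUN=1 relabel_card /dev/mmcblk0 12 previews the three commands; running it again without DRYRUN applies them.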
I expect that during troubleshooting I'm going to have to re-image Piki several times, so I set up a directory (/s1/holly/piki-conf/) that I can just copy (rsync) onto the card. Here's what's in it:
Piki needs its own SSH host keys and SSHFP records. To generate the host keys:
To generate the SSHFP records:
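Both steps can be done with stock ssh-keygen. A sketch, assuming the keys should land under the staging directory's etc/ssh, so the later rsync step carries them to the card. Here piki.example is a placeholder for Piki's real DNS name.

```shell
#!/bin/sh
# Generate any missing host keys under ROOTDIR/etc/ssh (empty passphrase).
gen_host_keys() {   # gen_host_keys ROOTDIR
    mkdir -p "$1/etc/ssh" || return 1
    for t in rsa ecdsa ed25519; do
        k="$1/etc/ssh/ssh_host_${t}_key"
        [ -e "$k" ] || ssh-keygen -q -t "$t" -N '' -C '' -f "$k" || return 1
    done
}

# Print SSHFP resource records for every host public key under ROOTDIR.
sshfp_records() {   # sshfp_records ROOTDIR HOSTNAME
    for pub in "$1"/etc/ssh/ssh_host_*_key.pub; do
        ssh-keygen -r "$2" -f "$pub"
    done
}
```

For example gen_host_keys /s1/holly/piki-conf, then sshfp_records /s1/holly/piki-conf piki.example to get the records for the host database.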
As part of putting the new SSHFP records in hostdata.db, Piki's IP address assignment will be revived and added to /etc/hosts, which needs to be copied into Piki's conf file directory, as well as being installed on all the other hosts.
Now that Jacinth knows Piki's IP address(es), rebuild kea-dhcp{4,6}.conf so Piki gets its proper IP address by DHCP.
The various services that use TLS for transport, such as Apache2 and Postfix, need Piki's host key. Resurrect it from the CFT Certificate Authority and copy the private and public keys into the piki-conf directory in /etc/ssl/{private,hostcerts}. It should be possible to run /etc/ssl/certsetup.sh specifying an absolute path to the cert just deposited, to produce the chain files and symlinks which that script generates.
Now we're ready to copy the configuration files onto Piki's card:
mount /dev/disk/by-label/ROOT-12 /mnt
ssync -a -O /s1/holly/piki-conf/ /mnt/
umount /mnt
I wrote the script reown
that changes all files' UID and GID
to match the alphabetic-numeric mapping in /etc/passwd and /etc/group that
was just copied onto the card (the originals being saved). I had planned
to let post_jump run it, but I got cold feet and decided to reown on the
host, before booting Piki.
Move the SD card into Piki, and turn on power so it can boot.
An image's root partition is only slightly bigger than the minimum to
hold the software. But when it is first booted, code in the initrd expands
the root partition to fill the rest of the media. Booting Piki took 5
minutes; don't panic. Due to the keys in piki-conf, you should be able to
connect to the machine by ssh with publickey authentication. If you do log
in to the physical console, the userID is root and the password is linux.
You should immediately change it to your normal root
password, even though the /etc/shadow file about to be installed will also
have the (hash of the) normal root password.
The next step should be to run post_jump on the distro master site. It has these major steps. But improvements were needed in some places, tagged with ** and discussed more in the next section.
reown, to map the numeric UIDs and GIDs of installed files to names as known to the system as installed, and then to map these back to numbers per the /etc/passwd and /etc/group that are about to be installed.
professionally.
When Linux is installed from an image or installation disc, the
numeric UIDs and GIDs are determined by what was installed
and their order. They won't match the UIDs and GIDs in the conf
files soon to be installed. My script reown
will sync them.
Permissions are normally fixed by chkstat. But it has a nasty
feature
that it is suspicious about security implications of file
and directory ownership, and it refuses to re-own some directories.
I re-implemented chkstat in my script chkstat.J, minus the paranoia,
and both are run after packages are installed or updated.
The Raspberry Pi image uses NetworkManager, but I put Wicked on my RPi's, more practical on non-interactive machines, and NetworkManager was removed. One time there was a crash during the online update (probably because the lease expired for the IP address, and NetworkManager didn't renew it because it had been removed), and when I tried to reboot, the net would not come up, a total loss on this headless machine. So in post_jump I moved audit-scripts just before the removal step, so NetworkManager would be forgotten and Wicked would get enabled. (So why didn't I just plug in a monitor and fix it by hand? Read all about that issue in this very document.)
Housekeeping scripts mostly run quickly, but two are long: mandb (index
of man pages for quick finding) and generating custom Diffie-Hellman groups
to resist the Logjam family of exploits. Together they take about half an
hour on a RPi, which is very annoying when you're doing a development
installation. Without mandb being run, the man
command can fall
back to searching the manpath, so mandb can be deferred until the normal
daily housekeeping run at midnight. The Diffie-Hellman group files (for
OpenSSL and for SSH) are installed from the master site, which is not best
practice but is better than using the builtin default group, so again,
generating them can be deferred. (I rebuild them monthly, independently on
each host, and the master instance will be recognized as being out of
date.)
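A sketch of the "out of date" test and the two rebuilds, assuming modification time is what marks staleness and that the caller supplies the file paths (none are given here). The ssh-keygen -M generate/screen syntax is that of recent OpenSSH (8.2 and later, as far as I know); older releases used -G and -T.

```shell
#!/bin/sh
# True if FILE is missing or was last modified more than DAYS (default 30) days ago.
needs_regen() {     # needs_regen FILE [DAYS]
    [ ! -e "$1" ] || [ -n "$(find "$1" -mtime +"${2:-30}")" ]
}

# OpenSSL DH parameters (slow on a Raspberry Pi).
refresh_openssl_dh() {   # refresh_openssl_dh FILE
    needs_regen "$1" || return 0
    openssl dhparam -out "$1.new" 2048 && mv -f "$1.new" "$1"
}

# OpenSSH moduli: generate candidate primes, then screen them.
refresh_ssh_moduli() {   # refresh_ssh_moduli FILE
    needs_regen "$1" || return 0
    ssh-keygen -M generate -O bits=2048 "$1.cand" &&
        ssh-keygen -M screen -f "$1.cand" "$1.new" &&
        mv -f "$1.new" "$1"
    rc=$?
    rm -f "$1.cand"
    return $rc
}
```

This relies on the copy installed from the master site keeping the master's modification time (rsync -a preserves it), so an old master copy is recognized as stale on the next scheduled run.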
The image uses u-boot-rpiarm64 so it can work on the broadest range of RPi's. But mine are Raspberry Pi 3B's. I'm not sure exactly what the differences between u-boot-rpiarm64 and u-boot-rpi3 are, but it seems more prudent to use the latter, being the closest match to the actual hardware. This selection is made in /m1/custom/couchnet.sel (LCM).
sshd got into a mode where you do ssh to Piki and it does nothing, sans error messages. I pulled Piki's plug and brought the SD card back to Iris.
less.
ssh-keygen -A to generate missing host keys, which fails, so sshd never starts. (So why was there no error message, such as "connection refused", from ssh?)
permanently: ln -s /dev/null /s1/holly/piki-conf/etc/systemd/system/multi-user.target.wants/nscd.service
This problem occurred part way through post_jump, before audit-scripts was run, which would have disabled nscd. In the past I've had various mysterious problems with nscd, and while it's a great concept, I've made a policy to turn it off. But to investigate this issue I started it again by hand and ran strace.
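For comparison, the conventional way to keep a unit from ever starting is to mask it. Masking also blocks starting it by hand or as a dependency, so it would need unmasking before an strace experiment like the one above. A sketch for masking in the staging tree; where systemctl is available, systemctl --root=DIR mask UNIT should produce the same link.

```shell
#!/bin/sh
# Mask a unit in an offline root tree the same way "systemctl mask" does on a
# live system: a symlink to /dev/null named after the unit in etc/systemd/system.
mask_unit_offline() {   # mask_unit_offline ROOTDIR UNIT
    mkdir -p "$1/etc/systemd/system" &&
        ln -sfn /dev/null "$1/etc/systemd/system/$2"
}
```

For example mask_unit_offline /s1/holly/piki-conf nscd.service.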
Another similar issue where sshd accepted a connection but did nothing;
the message in the journal was:
sshd[5034]: error: PAM: pam_open_session(): Cannot make/remove an entry for the specified session
Shortening a long story: PAM login uses pam_ldap, which relies on
nslcd (local LDAP name service daemon), which was not running. Enabling and starting
nslcd fixed this problem.
Google search for: pam_open_session "Cannot make/remove an entry for the specified session"
Forum post on StackExchange:
OP MattBianco, 2014-08-26. The message means that some PAM module returned
some error. Very heterogeneous causes; many modules can return many kinds
of fatal errors. SELinux enforcing (and
misconfigured) is a common problem causing this message.
ConsoleKit used to handle giving permission for devices like the keyboard to a user when logging in. But ConsoleKit has been deprecated for a long time in favor of systemd-logind, and I added pam_systemd to my PAM scripts but never removed pam_ck_connector. It's available for x86_64, which is why I wasn't motivated to get rid of it, but it's not available for aarch64. I finally removed pam_ck_connector.so from my PAM scripts.
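Removing a retired module from several PAM stack files can be scripted. A sketch that drops every line mentioning the module (commented-out lines included) and reports what it removed on stderr. The file list is up to the caller; the function name is made up.

```shell
#!/bin/sh
# Remove every line containing MODULE (fixed string) from each FILE,
# rewriting the file in place so its owner and mode are kept.
drop_pam_module() {   # drop_pam_module MODULE FILE...
    mod=$1; shift
    for f in "$@"; do
        grep -qF -- "$mod" "$f" || continue
        grep -F -- "$mod" "$f" | while IFS= read -r l; do
            printf '%s: removing: %s\n' "$f" "$l"
        done >&2
        tmp=$(mktemp) || return 1
        grep -vF -- "$mod" "$f" > "$tmp" || true
        cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

For example drop_pam_module pam_ck_connector.so /etc/pam.d/*.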
pam_pwquality has replaced pam_cracklib on x86_64, but neither of these
is available for aarch64 as far as I can see in the package searcher,
nor per zypper info pam_pwquality.
PAM copes with missing modules rather than dying; it just means that I can't
change my password on a Raspberry Pi, and there's a nonfatal error message
on each login. Hiss, boo, but I don't think I can or should do anything
to fix this issue.
Update: it has reappeared in repo openSUSE-Tumbleweed-Oss! Installed on Piki and Holly.
texlive is soooo screwed up! Something's posttrans script complains
that there are no formatting packages, only binary code. Why do we even
have it on the RPi's? I did zypper --non-interactive remove --dry-run
texlive, and these packages were to be removed (excluding those named
texlive*): keystone-piki texinfo. Obviously texinfo is the culprit. It
is a keystone package; nothing else depends on it.
I edited couchnet.sel, wanting texinfo only on hostgroup jim,
specifically Jacinth and Xena (my laptop). The texlive family was wanted
on hostgroup user, changed to jim also.
I cleaned up the package signing keys, tossing those that did not sign any packages, and rearranging couchnet.sel to match. This was a big job, necessary for getting the installed packages right, but is off topic for this document.
/etc/systemd/system/bluetooth-fw.J.path waits for /dev/ttyAMA1 to appear, then starts bluetooth-fw.J.service which runs btattach on this tty. However, with the present kernel, device tree, etc. /dev/ttyAMA1 no longer appears. Even so, the HCI gets registered as /sys/class/bluetooth/hci0. From /var/log/boot.msg, we have:
Apple AirPods on RPi Bluetooth
-- I spotted this tidbit:
By default, it cannot be found by Bluetooth scanning. You need to
install bluez-auto-enable-devices…
I wonder if this is a BLE
issue and if it is related to the issue of discovering my new Bluetooth
folding keyboard.
The tests are sha512 sums of big files in the buffer cache, and using
dd
to copy a block device to /dev/null. Scores are in kbytes/sec.
Let N = the number of cores.
The total
score is 3/8*sha512 + N/8*sha512 + IOspeed/2.
Host | Sha512 | Sha512*N | IOspeed | Total | Rounded |
---|---|---|---|---|---|
holly | 41905 | 167620 | 1785 | 37559 | 40000 |
piki | 31735 | 126940 | 1889 | 28712 | 28000 |
diamond | 201352 | 805408 | 86335 | 219350 | 220000 |
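Recomputing the Total column from the other columns shows how the weights combine: all three rows come out exactly when the I/O term is given half weight. A small sketch using awk from the shell; the function name is made up.

```shell
#!/bin/sh
# Benchmark total: 3/8 of the sha512 rate, plus N/8 of it for N cores,
# plus half the I/O rate (all in kbytes/sec).  Truncated to an integer.
score() {   # score SHA512 NCORES IOSPEED
    awk -v s="$1" -v n="$2" -v io="$3" \
        'BEGIN { printf "%d\n", 3/8*s + n/8*s + io/2 }'
}
```

For example, score 41905 4 1785 reproduces Holly's 37559.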
It's interesting that the two RPi's are not identical in CPU speed, even though they are the same model and are running the same software. The sha512 test reads files already in the buffer cache, so I/O speed is not supposed to influence it; besides, Piki's SD card is faster while its CPU is slower.
Diamond is currently the fastest machine in the house: an Intel NUC 11PAHi5. The CPU is an Intel Core® i5-1135G7 @2.4GHz and the disc is an NVMe SSD (about 2.4e9 bytes/sec; I can't tell the actual Chinese vendor).
The above list condenses 7 repetitions of installing Linux on Piki from scratch. Finally all the steps get done error-free (fingers crossed), I have a backup in case graphics fixes trash the card, and Piki is now ready for interventions in the area of graphics.
In hindsight, I have two related issues: /dev/dri/card0 is missing, which
the modesetting
X-Server driver needs for communicating with the
graphics processor, and the X-Server can't fall back to the EFI framebuffer
because the fbdev driver is not installed. The first priority is to
activate anything capable of displaying video, and then to revive
/dev/dri/card0.
The first thing I did was to install xf86-video-fbdev. This produced normal graphical output on the monitor. It's not very fast, but it's a whole lot better than nothing. I installed package glmark2, a graphics benchmark for OpenGL; details below. As a positive control I ran it on Xena with Intel® UHD Graphics; the overall score was 2019, looks like the average of frames per second over 32 tasks which are tuned to all give similar speeds on particular reference hardware. On Piki, with software rendering (no acceleration), fbdev's overall score was 22. Which is usable, though you won't be playing video games on the framebuffer.
So why wasn't xf86-video-fbdev installed in the first place? Because it wasn't configured as a wanted package, and was tossed during post_jump. Not a brilliant maneuver. Fixed.
The Hardware
Compatibility List page for Raspberry Pi-3 in the
Troubleshooting - Graphics Acceleration section, recommends to install
xf86-video-fbturbo to get the framebuffer working. (And a configuration
option has to be added to load the relevant module.) I tried it and ran
glmark2: the overall score was still 22 and the individual tests' scores
were almost the same as with fbdev. Glmark2 may not have been using the
aspects (moving and scrolling windows) that fbturbo particularly targets.
However, fbturbo on Holly goes a lot faster, 137 FPS, but Holly has no
monitor and has related configuration differences which likely give a speed
improvement, so the worth of fbturbo is not assured. Nonetheless, I
kept fbturbo configured on both hosts.
Further in the Graphics Acceleration section, package Mesa-dri-vc4 was
mentioned. As with fbdev, I installed it (and added it to couchnet.sel)
and from /etc/X11/xorg.conf.d/20-kms.conf I removed
Option "AccelMethod" "none". However, /dev/dri/card0 did
not appear.
How to use the EFI framebuffer if VC4 is not delivering output: in /boot/efi/config.txt replace dtoverlay=vc4-kms-v3d,cma-default with dtoverlay=disable-vc4. Alternatively or in parallel, add to the kernel command line: modprobe.blacklist=vc4 (I haven't done either of these yet.)
To activate graphics acceleration: (I think they're talking about
text mode, either full screen or in non-graphic windows). Install package
xf86-video-fbturbo and in /etc/X11/xorg.conf.d/99-fbturbo.conf put:
Section "Module"
    Load "shadow"
EndSection
For accelerated 3D graphics, install Mesa-dri-vc4 and remove Option "AccelMethod" "none" from 20-kms.conf (which I don't have). Apparently you need both interventions.
Finding the packages: xf86-video-fbturbo and Mesa-dri-vc4 are both
available in the SuSE download repo, but the latter is marked
experimental. Neither has any dependencies. Trouble-free
installation. I need to add these to couchnet.sel (if they work).
20-kms.conf is gone; there are no AccelMethod overrides anywhere else either. I added /etc/X11/xorg.conf.d/99-fbturbo.conf per the above instructions, but /usr/share/X11/xorg.conf.d/99-fbturbo.conf also exists and defines a device with the fbturbo driver. I started the evaluation with both of these files active.
I restarted display-manager, which restarts the X-Server. /var/log/Xorg.0.log had these items:
modesetting driver if the screen had been DRI2 capable.
We have /usr/lib64/xorg/modules/drivers/ = *_drv.so where * = ati dummy fbturbo modesetting radeon. Most, except fbturbo, come from package xf86-video-*. None of these are vc4, and fbdev and vesa are also missing.
Update: The modesetting
driver (which we have) is the one that
should be used, but it requires DRM (direct rendering), cued by
/dev/dri/card0, and will refuse to load if it's missing.
Sidetrack: I see /boot/efi/overlays/disable-vc4.dtbo. Digging in /boot/efi/config.txt, I find it is not included. Whew! And the vc4 kernel module and its dependencies are loaded; see /proc/modules.
"zypper info 'xf86-video-*'" produces these values for *: amdgpu ark ati chips dummy fbdev fbturbo fbturbo-live i128 mach64 mga neomagic nouveau nv qxl r128 savage sis sisusb tdfx v4l vesa voodoo (fbturbo-live description == fbturbo, but from a git source.) None of these are vc4.
Next try: add dtoverlay=disable-vc4 to /boot/efi/extraconfig.txt. Yay! Items that are working:
Neatening things up for installation on Holly: in /etc/lightdm/lightdm-gtk-greeter.conf add or change to background=/m1/custom/background.jpeg
Non-RPi hosts show their custom backgrounds. When I set this up initially, Piki showed its background (the frog) but it faded into the SuSE wallpaper with Geeko. Once the user (the one who previously logged in) sets his desktop background, the lightdm greeter no longer uses the Geeko background, leaving the configured custom background visible. This may or may not have to be done separately for VNC and for the physical console.
Holly (VNC) shows a black screen at 1024x768px, probably because I haven't yet done any of these mitigations on Holly.
I'm not going to install from scratch; I'll do post_jump steps by hand. These snapshots were saved:
Diffs between Piki (left) and Holly (right) excluding obvious ones like host keys. These descriptions were updated after several interventions, not showing the state when I first turned to this step.
Key interventions to sync Holly with Piki:
Google found nothing for xf86-video-vc4
with the quotes.
The Gentoo wiki page about
Raspberry Pi VC4
produced some useful information, including a key quote:
The Raspberry Pi 3 VC4 driver is NOT available on 64bit ARM. The
RPi Foundation has stated 'we are not working on this, and are unlikely
to do so in the near future.' Using the open source vc4-fkms-v3d
driver is recommended.
(This is in extraconfig.txt.) Here's
their checklist:
Jimc finds: Red Hat has a package glx-utils that provides just glxgears and glxinfo, and another package mesa-demos (lower case) with more demos for the Mesa direct rendering libraries. Both of these are available on the SuSE Build Service as community packages, for x86_64 only. But OpenSuSE also has Mesa-demos (capitalized) that includes glxgears and glxinfo and many more, again as a community package (and I recognize the name of the SuSE developer who maintains it), for both x86_64 and aarch64 (ARM). Many but not all of the demos depend on packages glew and libGLEW2_2 or 2_1, depending. I'm installing Mesa-demos on the RPi's and using only the ones, like glxgears, that don't need libGLEW2_x. Someday I should recompile Mesa-demos from source on Tumbleweed, both architectures. But I'm not going to hold up this project to get the complete set of demos.
Normally glxgears is synced with vertical blanking, so the reported
frame rate will equal the monitor's vertical refresh rate. To make it run
as fast as possible, set this environment variable: vblank_mode=0 glxgears
Also useful is glxgears -info
but it
shows a gazillion supported extensions. Note, glxgears is not designed
as a benchmark and particularly should not be used for purchasing
decisions, only to see if accelerated graphics is doing anything.
Frame rates:
When the greeter is showing, the screensaver runs and puts the monitor in black screen mode (if so configured) but doesn't turn it off with DPMS, despite power management configuration to do that. (Workaround: monitor's power button.) A lot of people complain about this; so far I haven't found a fix.
Re-learn how to run glmark2, and get a baseline for non-accelerated X-Windows.
Then try to activate acceleration on Piki.
Installing glmark2 on Piki: It's for OpenGL 2.0, but 3.0 is coming soon. OpenGL ES is roughly, but not strictly, a subset of full OpenGL intended for embedded systems (think Android), and it has a matching glmark2-es2. Even though the RPi is basically a cellphone motherboard, you run a desktop OS on it and you have full OpenGL, and should test it with glmark2, not glmark2-es2. glmark2 is already installed on the x86_64 hosts (except virtual ones). Its info says that it uses only ES compatible API. (So what's the difference from glmark2-es2?)
Results from glmark2 on various hosts. FPS
is frames per
second, reporting the slowest and fastest values with the corresponding
test names. Score
seems to be the average FPS over 32 tests.
Output from time
is also given: elapsed, user and system times
in secs. The user and system times are for the client, not counting
the server, and can be greater than the elapsed time because the CPU
has multiple cores.
Jimc's report from 2018: https://forums.raspberrypi.com/viewtopic.php?t=223592
Steps from that howto:
Edit /etc/X11/xorg.conf.d/20-kms.conf and comment out Option "AccelMethod" "none".
Install package Mesa-dri-vc4 which provides the direct rendering module for the X-Server.
In /boot/efi/extraconfig.txt you need dtoverlay=vc4-kms-v3d or dtoverlay=vc4-fkms-v3d ("fake" KMS). There are varying reports which variant works better or doesn't work at all. 2018-era forum posts suggest that fkms works better for streaming video, so I'm trying that one first.
The kernel command line needs a nonzero cma allocation. Based on another forum post I'm using cma=300M (unit of megabytes is required). Different people recommend different values. I suspect without proof that this is an upper bound; the driver is known to expand and contract video RAM dynamically. For this driver gpu_mem is irrelevant and can be left at the default (32 on SuSE, 16 on Gentoo, in megabytes).
Reboot to get the correct dtoverlay and cma value.
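The cma step above can be scripted. A sketch, assuming the setting goes into GRUB_CMDLINE_LINUX_DEFAULT (it could equally live in GRUB_CMDLINE_LINUX) and that grub2-mkconfig regenerates /boot/grub2/grub.cfg as on openSUSE. The value is whatever you choose; 300M is simply what is used here.

```shell
#!/bin/sh
# Add or replace cma=VALUE inside GRUB_CMDLINE_LINUX_DEFAULT="..." (GNU sed -i).
set_cma() {   # set_cma FILE VALUE
    f=$1 v=$2
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*cma=' "$f"; then
        # Replace the existing cma= value.
        sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=.*\)cma=[^ \"]*/\1cma=$v/" "$f"
    else
        # Append before the closing double quote.
        sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 cma=$v\"/" "$f"
    fi
}
```

Usage: set_cma /etc/default/grub 300M && grub2-mkconfig -o /boot/grub2/grub.cfg. After the reboot, grep -o 'cma=[^ ]*' /proc/cmdline should show the value.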
Doing this on Piki:
Do I have the needed device tree overlay installed? Yes, /boot/efi/overlays/vc4-fkms-v3d.dtbo is a copy of /boot/vc/overlays/vc4-fkms-v3d.dtbo which is owned by raspberrypi-firmware-dt-2022.01.19-1.1.noarch
Mesa-dri-vc4 version 21.3.6-301.1.aarch64 is installed, providing /usr/lib64/dri/vc4_dri.so , the direct rendering module.
About /etc/X11/xorg.conf.d/20-kms.conf: It currently (2022) isn't
part of package Mesa-dri-vc4, identified as Eric Anholt's driver.
It creates a Device using the modesetting driver.
The 2018 version had 'Option "AccelMethod" "none"'
which you need to remove to turn on 3D acceleration, but in the absence
of this file, nothing needs to be removed.
xf86-video-fbdev and fbturbo are installed but fbturbo is not configured for use.
/boot/efi/extraconfig.txt has dtoverlay=vc4-fkms-v3d
/boot/grub2/grub.cfg and /etc/default/grub in the Linux command
line include cma=300M, as does /proc/cmdline.
Rebooting (140 sec) and checking out results:
failed to get clock: -2, with the result that
vc4-drm: probe of soc:gpu failed with error -2 (-2 is -ENOENT). Very likely this is the major culprit.
I'm re-enabling fbturbo on Piki.
What I should have done earlier is, put the RPi-OS image on a card and see whether it does 3D acceleration, and if so, what they do right that I'm not doing.
RPi-OS download page. There are about 6 versions; I want
Raspberry Pi OS with desktop (64bit),
Size: 1.14Gb compressed,
4.16Gb uncompressed. They include the SHA256 hash as text.
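Since the hash comes as text on the download page rather than as a file, sha256sum -c can be fed a one-line check list on stdin. A sketch; the function name is made up.

```shell
#!/bin/sh
# Check a file against a SHA-256 hash copied as text from a web page.
check_sha256() {   # check_sha256 FILE HEX
    hex=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    printf '%s  %s\n' "$hex" "$1" | sha256sum -c -
}
```

For example check_sha256 "$file.zip" followed by the hash pasted from the page; it prints the file name followed by OK on success.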
Copy to the SD card (19.7Mb/sec)
unzip -p $file.zip | dd bs=4M of=/dev/mmcblk0 iflag=fullblock oflag=direct status=progress
On the first boot it's slow because it's resizing the root partition.
It came up with (correct) IPv4+6 addresses and DNS, courtesy of DHCP and router advertisements.
I'm using it exactly as installed, just with my locale and user password per the provided setup app. Out of the box, 3d acceleration is not enabled.
Chromium is the official web browser. I wanted to install glmark2 and/or Mesa-demos with glxgears etc, but could not find them. There's a Snapcraft package called glmark2-example, some kind of IoT demo so it says, but the description is skimpy and I was hesitant to install it without knowing more about it.
To test 3D acceleration I played several of my video test files using VLC. Without 3D acceleration, all of them used 200% to 300% CPU (multiple cores). None of them could keep up with the 25FPS frame rate even though the VLC window was small, probably 640x480px. Originals were 1080p (1920x1080px).
Checking in /var/log/Xorg.0.log: /dev/dri/card0 was opened and
modesetting
was the driver used. Glamor was disabled.
AIGLX says: Screen 0 is not DRI2 capable, using swrast.
I tried raspi-config. The GUI version doesn't have the Advanced
tab that changes the device tree overlay; I used the ncurses one, just
raspi-config
with no command line arguments, in an Xterm.
Legacy
means vc4-disable, the installation default. Full
KMS
is what you want; they've deprecated fake KMS. You also have to
tell it to activate 3d acceleration (i.e. remove Option AccelMethod
None).
Repeating the video test files on VLC: it was definitely better. CPU loads were 100% to 150% and a lot fewer frames were delayed. But it's not what you would expect on a $700 modern laptop: on Xena doing the same tests, CPU never got over 25% and performance was totally smooth.
Conclusion: It's likely that these tests showed the best the Raspberry Pi 3B can do, and the result is not good enough to make it worthwhile to do a giant project to get DRI working on the RPi 3B. Probably a RPi 4 would be worth it. So I'm closing the 3D acceleration part of this project.
Reverting (on SuSE) to vc4-disable, with the fbturbo driver.