@gvolpe
Created December 17, 2020 09:23
[Dec16 23:16] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -2!
[ +0.000908] gmc_v9_0_process_interrupt: 24 callbacks suppressed
[ +0.000010] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process X pid 1073 thread X:cs0 pid 1095)
[ +0.000007] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00008001018f0000 from client 27
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00640C51
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: 0x6
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x1
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: RW: 0x1
[ +0.000007] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process X pid 1073 thread X:cs0 pid 1095)
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00008001018f1000 from client 27
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
[ +0.000007] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process X pid 1073 thread X:cs0 pid 1095)
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00008001018f2000 from client 27
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
[ +0.000006] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process X pid 1073 thread X:cs0 pid 1095)
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00008001018f3000 from client 27
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000002] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
[ +0.000004] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process X pid 1073 thread X:cs0 pid 1095)
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00008001018f4000 from client 27
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
[ +0.000007] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process X pid 1073 thread X:cs0 pid 1095)
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00008001018f5000 from client 27
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
[ +0.000004] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process X pid 1073 thread X:cs0 pid 1095)
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00008001018f6000 from client 27
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
[ +0.000007] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process X pid 1073 thread X:cs0 pid 1095)
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00008001018f7000 from client 27
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000002] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ +0.000000] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000001] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
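The decoded fields printed for the first fault can be reproduced from the raw status word 0x00640C51. The following is a minimal Python sketch, not the driver's code: the bit layout in `FIELDS` is an assumption (the positions commonly used for this register on GFX9-class hardware), cross-checked only against the values printed in this log, and the names `FIELDS` and `decode_fault_status` are made up for illustration.

```python
# Assumed (shift, width) of each field in VM_L2_PROTECTION_FAULT_STATUS.
# The layout is NOT stated in the log; it is only checked against the
# decoded values that the driver printed for 0x00640C51.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),    # printed in the log as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),  # should agree with "vmid:6" in the fault header
}


def decode_fault_status(status):
    """Split a raw status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


# Decode the only non-zero status word in the log above.
for name, value in decode_fault_status(0x00640C51).items():
    print(f"{name}: {value:#x}")
```

Under that assumed layout, the result agrees with the log: PERMISSION_FAULTS 0x5, client ID 0x6, RW 0x1, and VMID 6. The later faults all report a status of 0x00000000, so every field decodes to zero for them.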
@gvolpe
Author

gvolpe commented Jan 24, 2021

Another one from today on Linux 5.10.7...

Jan 24 09:59:15 tongfang-amd kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
Jan 24 12:16:03 tongfang-amd kernel: amdgpu_cs_ioctl: 79 callbacks suppressed
Jan 24 12:16:03 tongfang-amd kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -2!

@gvolpe
Author

gvolpe commented Jan 27, 2021

This is on Linux 5.10.9.

Jan 27 13:11:36 tongfang-amd kernel: xhci_hcd 0000:04:00.4: ERROR unknown event type 37
Jan 27 13:11:36 tongfang-amd kernel: retire_capture_urb: 79 callbacks suppressed
Jan 27 13:11:44 tongfang-amd kernel: xhci_hcd 0000:04:00.4: ERROR unknown event type 37
Jan 27 13:11:57 tongfang-amd kernel: xhci_hcd 0000:04:00.4: ERROR unknown event type 37
Jan 27 13:11:57 tongfang-amd kernel: xhci_hcd 0000:04:00.4: ERROR unknown event type 37
Jan 27 13:11:57 tongfang-amd kernel: xhci_hcd 0000:04:00.4: ERROR unknown event type 37
Jan 27 13:11:57 tongfang-amd kernel: xhci_hcd 0000:04:00.4: ERROR unknown event type 37
Jan 27 13:12:09 tongfang-amd kernel: xhci_hcd 0000:04:00.4: ERROR unknown event type 37
Jan 27 13:12:09 tongfang-amd kernel: retire_capture_urb: 17 callbacks suppressed
Jan 27 13:12:39 tongfang-amd kernel: amdgpu_cs_ioctl: 1 callbacks suppressed
Jan 27 13:12:39 tongfang-amd kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -2!
Jan 27 13:12:59 tongfang-amd kernel: GpuWatchdog[11742]: segfault at 0 ip 00007f48720a2fa6 sp 00007f4869b8a030 error 6 in libcef.so[7f486dd95000+75cd000]
Jan 27 13:12:59 tongfang-amd kernel: Code: 89 de e8 cd 2f 63 ff 80 7d cf 00 79 09 48 8b 7d b8 e8 0e 1f 5f fe 41 8b 84 24 e0 00 00 00 89 45 b8 48 8d 7d b8 e8 9a 2e cf fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e
Jan 27 13:13:05 tongfang-amd kernel: GpuWatchdog[1228]: segfault at 0 ip 000055c760a18107 sp 00007f07ffa1c430 error 6 in signal-desktop[55c75d837000+53d6000]
Jan 27 13:13:05 tongfang-amd kernel: Code: 7d b7 00 79 09 48 8b 7d a0 e8 35 52 d3 fe 8b 83 00 01 00 00 85 c0 0f 84 91 00 00 00 48 8b 03 48 89 df be 01 00 00 00 ff 50 68 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 17 bc 6f 02 01 80 7d 87 00
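The `error 6` in the two segfault lines is the x86 page-fault error code. As a small Python sketch (the names `PF_BITS` and `decode_segv_error` are hypothetical), its three low bits can be unpacked like this:

```python
# Meaning of the low three bits of the x86 page-fault error code that the
# kernel prints as "segfault at ADDR ... error N".
# Each entry maps a bit to (meaning when set, meaning when clear).
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write access", "read access"),
    2: ("user mode", "kernel mode"),
}


def decode_segv_error(code):
    """Describe each of the low three bits of a page-fault error code."""
    return [
        when_set if code & (1 << bit) else when_clear
        for bit, (when_set, when_clear) in PF_BITS.items()
    ]


print(decode_segv_error(6))
```

For 6 this gives a user-mode write to a page that was not present, which together with `segfault at 0` is consistent with a write through a null pointer. The marked bytes in both `Code:` lines (`<c7> 04 25 00 00 00 00 37 13 00 00`) appear to encode a store of 0x1337 to address 0, so the GpuWatchdog threads may be crashing on purpose after detecting a GPU hang; that is an inference from the bytes, not something the log states.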

@ostrosablin

I've also hit the same error twice so far on Gentoo with recent kernels (over roughly the last two months). The latest occurrence was today. Since you've reported this issue to some bug trackers, I guess my feedback might be useful too.

Looks like it's a generic driver issue, since my GPU seems to be different from yours (previous generation).

Kernel: 5.10.15-gentoo
GPU: AMD Radeon RX 480

In my case, I don't use suspend, and the bug triggers quite rarely (so perhaps suspend increases the chance of triggering it?). The mouse cursor still moves, but otherwise the desktop locks up completely. Kernel hotkeys such as Ctrl+Alt+F1 also stop working (the equivalent command, chvt 1, hangs indefinitely too), which makes it impossible to attempt a fix from the frozen machine directly. But since I have ssh on the machine, I'm still able to interact with it. The hanging chvt seems to indicate that a kernel ioctl is hanging, which suggests that it isn't only userspace that is broken: something in the kernel seems to lock up too.

Also, no page faults are logged on my installation when the desktop freezes.

Ultimately, there is a way out of the lock-up (at the cost of killing all running graphical processes). It's not great, but if you have something running on the machine in an ssh session or tmux that you want to keep running, it might be better than a hard reboot. Over ssh, you can run:

loginctl terminate-session <your GUI session ID>

This will try to gracefully terminate the userspace session. As soon as it has finished, you can restart the X Window System and it should work; the machine should unfreeze.
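To find the session ID to pass to that command, the sessions known to logind can be listed and filtered by type. The following is a rough Python sketch under stated assumptions: it assumes the GUI session is registered with logind as type `x11` or `wayland` (a session started another way may show up as `tty`), and all helper names (`parse_session_ids`, `session_type`, `graphical_sessions`, `terminate`) are hypothetical. Only `loginctl list-sessions`, `loginctl show-session` and `loginctl terminate-session` are real commands.

```python
import subprocess

# Assumption: a GUI session reports one of these types to logind.
GRAPHICAL_TYPES = {"x11", "wayland"}


def parse_session_ids(listing):
    """Take the first column (session ID) of `loginctl list-sessions --no-legend`."""
    return [line.split()[0] for line in listing.splitlines() if line.strip()]


def session_type(session_id):
    """Ask logind for the type of one session, e.g. 'x11', 'wayland' or 'tty'."""
    result = subprocess.run(
        ["loginctl", "show-session", session_id, "--property=Type", "--value"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def graphical_sessions():
    """Return the IDs of all sessions whose type looks graphical."""
    listing = subprocess.run(
        ["loginctl", "list-sessions", "--no-legend"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        sid for sid in parse_session_ids(listing)
        if session_type(sid) in GRAPHICAL_TYPES
    ]


def terminate(session_id, dry_run=True):
    """Build the terminate command; run it only when dry_run is False."""
    cmd = ["loginctl", "terminate-session", session_id]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Run over ssh on the frozen machine, `graphical_sessions()` would list the candidate IDs and `terminate(sid, dry_run=False)` would end one of them. The dry-run default is deliberate, since terminating a session kills every process in it.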

Actually, it's possible (but I have no proof yet) that this is caused by a single program running in the GUI session, and that killing it would unfreeze the desktop. So perhaps GUI processes could be killed one by one through SSH to see if the desktop unfreezes (I will try this on the next freeze).

In the past (a couple of years ago) there was a bug in mesa git master where GUI Java programs caused the desktop to freeze. Killing the java process was sufficient to unfreeze the desktop without terminating the entire GUI session. And now that I think about it, during both freezes I had a JetBrains IDE (which is Java-based) running, though it might be a coincidence and unrelated to this particular bug.

@gvolpe
Author

gvolpe commented Feb 26, 2021

@tmp6154 I reported the issue here and was pointed to a Mesa bug that might be related. There was a merge request (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9006) allegedly fixing the issue.

I'm now running Linux Kernel 5.11.0 with a Mesa build from master that includes this fix, but since I've been traveling I had to turn off the laptop a couple of times and couldn't test it properly. I will start testing it this Sunday.

@ostrosablin

@gvolpe Oh, that's great! I'll try the fixed version too (both Mesa and the newer kernel) and test them.

@gvolpe
Author

gvolpe commented Mar 2, 2021

I bumped into a different issue while testing Linux Kernel 5.11.0 and a Mesa version with that fix.

Mar 02 10:09:32 tongfang-amd kernel: amdgpu 0000:04:00.0: amdgpu: 000000002db3ee17 pin failed
Mar 02 10:09:32 tongfang-amd kernel: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

I reported it here: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4381
