
Mono
Status: Contributor
Joined: 21 Jun 2012
Posts: 115
sgfxi did run successfully, but after I reset the computer, I got the same green-tinted hang on the console. So I assume it's an nVidia driver problem.

Luckily, sgfxi doesn't install to all kernels so I was able to boot into a previous one. Hmmm.... DKMS is dangerous.

:: Code ::
$ inxi -bxx
System:    Host: ronin Kernel: 4.14.0-11.1-liquorix-amd64 x86_64
           bits: 64 gcc: 7.2.0
           Desktop: Xfce 4.12.4 (Gtk 2.24.31) dm: lightdm
           Distro: Debian GNU/Linux buster/sid
Machine:   Device: desktop Mobo: ASUSTeK model: ROG STRIX B350-F GAMING v: Rev X.0x serial: N/A
           UEFI [Legacy]: American Megatrends v: 3401 date: 12/04/2017
CPU:       Quad core AMD Ryzen 5 1500X (-HT-MCP-) arch: Zen rev.1
           speed/max: 1550/3500 MHz
Graphics:  Card: NVIDIA GK208 [GeForce GT 730]
           bus-ID: 08:00.0 chip-ID: 10de:1287
           Display Server: x11 (X.Org 1.19.5 ) driver: nouveau
           Resolution: 1280x1024@60.02hz
           OpenGL: renderer: NV106
           version: 4.3 Mesa 17.3.1 (compat-v: 3.0) Direct Render: Yes
Network:   Card: Intel I211 Gigabit Network Connection
           driver: igb v: 5.4.0-k port: e000
           bus-ID: 03:00.0 chip-ID: 8086:1539
Drives:    HDD Total Size: 370.1GB (36.8% used)
Info:      Processes: 232 Uptime: 1:31 Memory: 2840.3/7976.9MB
           Init: systemd v: 236 runlevel: 5 Gcc sys: 7.2.0
           Client: Shell (bash 4.4.121 running in xfce4-terminal) inxi: 2.3.45

Back to top
damentz
Status: Assistant
Joined: 09 Sep 2008
Posts: 1135
For future reference, I would recommend using nvidia drivers from the experimental branch if they exist.

:: Code ::
$ apt-cache policy nvidia-driver
nvidia-driver:
  Installed: 387.34-2
  Candidate: 387.34-2
  Version table:
 *** 387.34-2 400
        400 https://mirrors.kernel.org/debian experimental/non-free amd64 Packages
        100 /var/lib/dpkg/status
     384.111-1 500
        500 https://mirrors.kernel.org/debian unstable/non-free amd64 Packages
     384.98-3 450
        450 https://mirrors.kernel.org/debian testing/non-free amd64 Packages


In this case, the 387.34 driver has been in experimental for quite a while now, and is compatible with the 4.14 kernel series. As for 384.98, I have no idea, but it's supposed to work. Not entirely sure why it wasn't building for you.

:: Code ::
echo >&2;   \
echo >&2 " ERROR: Kernel configuration is invalid.";   \
echo >&2 " include/generated/autoconf.h or include/config/auto.conf are missing.";\
echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix it.";   \
echo >&2 ;   \


Looking online, this is the same message you get if you attempt to build a module without your kernel headers set up properly. On brand new installs, I've never seen this occur. Maybe something's busted on pure Debian at the moment?
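
A quick way to test the "headers not set up properly" theory is to look for the two files the error message names. This is only a sketch: check_kernel_build_tree is a made-up helper name, and it assumes the usual layout where /lib/modules/<release>/build points at the headers for that kernel.

```shell
#!/bin/sh
# check_kernel_build_tree: hypothetical helper, not part of any package.
# Reports which of the generated config files named in the error message
# are missing from the build tree given as $1.
check_kernel_build_tree() {
    build_dir=$1
    status=0
    for f in include/generated/autoconf.h include/config/auto.conf; do
        if [ ! -f "$build_dir/$f" ]; then
            echo "missing: $build_dir/$f"
            status=1
        fi
    done
    return $status
}

# Typical call for the running kernel (left commented so that sourcing
# this file has no side effects):
# check_kernel_build_tree "/lib/modules/$(uname -r)/build"
```

If either file is reported missing, the headers package for that exact kernel version is probably absent or incomplete, which would match the build failure.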

I'd recommend using Siduction if you want to use Debian Unstable / Testing - they seem to have figured out all these problems ahead of time.

Mirror on liquorix.net for ISO files: liquorix.net/siduction/iso/
Back to top
Mono
Status: Contributor
Joined: 21 Jun 2012
Posts: 115
The issue started with 384.98, but continues with 387.34.

The drivers now usually build okay, and I can even use them right after sgfxi. BUT, next time it boots up, the drivers crash. I managed to enable systemd logs.

They crash the Liquorix kernel, but on the Sid kernels they just fall back to VESA.

:: Code ::
Jan 08 22:47:03 ronin kernel: divide error: 0000 [#1] PREEMPT SMP
Jan 08 22:47:03 ronin kernel: Modules linked in: btrfs zstd_compress zstd_decompress xxhash xor raid6_pq edac_mce_amd kvm_amd kvm eeepc_wmi asus_wmi sparse_keymap irqbypass rfkill
Jan 08 22:47:03 ronin kernel:  crc32c_intel i2c_piix4 libata i2c_algo_bit dca ptp xhci_pci pps_core scsi_mod xhci_hcd rtc_cmos gpio_amdpt gpio_generic i2c_designware_platform i2c_d
Jan 08 22:47:03 ronin kernel: CPU: 2 PID: 347 Comm: systemd-udevd Not tainted 4.14.0-11.1-liquorix-amd64 #1 liquorix 4.14-14
Jan 08 22:47:03 ronin kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B350-F GAMING, BIOS 3401 12/04/2017
Jan 08 22:47:03 ronin kernel: task: ffff88020eb73500 task.stack: ffffc900014c8000
Jan 08 22:47:03 ronin kernel: RIP: 0010:nvGetClocks+0x176/0x260 [nvidiafb]
Jan 08 22:47:03 ronin kernel: RSP: 0018:ffffc900014cb7f8 EFLAGS: 00010246
Jan 08 22:47:03 ronin kernel: RAX: 0000000000000000 RBX: ffff8802152e2420 RCX: 0000000000000000
Jan 08 22:47:03 ronin kernel: RDX: 0000000000000000 RSI: ffffc900014cb834 RDI: ffff8802152e2420
Jan 08 22:47:03 ronin kernel: RBP: ffff8802152e2518 R08: ffffc900014cb838 R09: 0000000000000000
Jan 08 22:47:03 ronin kernel: R10: 0000000000000068 R11: 00000000002e18c8 R12: 0000000000062570
Jan 08 22:47:03 ronin kernel: R13: 000000000000000e R14: 0000000000000010 R15: 0000000000000008
Jan 08 22:47:03 ronin kernel: FS:  00007fe8ed56b400(0000) GS:ffff88021ec80000(0000) knlGS:0000000000000000
Jan 08 22:47:03 ronin kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 08 22:47:03 ronin kernel: CR2: 00007f546a15c870 CR3: 000000020df0a000 CR4: 00000000003406e0
Jan 08 22:47:03 ronin kernel: Call Trace:
Jan 08 22:47:03 ronin kernel:  NVCalcStateExt+0x189/0x8e0 [nvidiafb]
Jan 08 22:47:03 ronin kernel:  nvidiafb_set_par+0x47c/0x9f0 [nvidiafb]
Jan 08 22:47:03 ronin kernel:  fbcon_init+0x59e/0x780
Jan 08 22:47:03 ronin kernel:  visual_init+0xca/0x120
Jan 08 22:47:03 ronin kernel:  do_bind_con_driver+0x2ab/0x640
Jan 08 22:47:03 ronin kernel:  do_take_over_console+0x22d/0x470
Jan 08 22:47:03 ronin kernel:  fbcon_event_notify+0x90d/0xa20
Jan 08 22:47:03 ronin kernel:  blocking_notifier_call_chain+0x5d/0x80
Jan 08 22:47:03 ronin kernel:  register_framebuffer+0x1d5/0x2f0
Jan 08 22:47:03 ronin kernel:  nvidiafb_probe+0x6b2/0xa80 [nvidiafb]
Jan 08 22:47:03 ronin kernel:  pci_device_probe+0x1e4/0x340
Jan 08 22:47:03 ronin kernel:  driver_probe_device+0x3d4/0x4a0
Jan 08 22:47:03 ronin kernel:  __driver_attach+0xd1/0xe0
Jan 08 22:47:03 ronin kernel:  ? driver_probe_device+0x4a0/0x4a0
Jan 08 22:47:03 ronin kernel:  bus_for_each_dev+0x57/0x80
Jan 08 22:47:03 ronin kernel:  bus_add_driver+0x191/0x210
Jan 08 22:47:03 ronin kernel:  driver_register+0x78/0xf0
Jan 08 22:47:03 ronin kernel:  ? nvidiafb_setcolreg+0x2a0/0x2a0 [nvidiafb]
Jan 08 22:47:03 ronin kernel:  do_one_initcall+0x46/0x190
Jan 08 22:47:03 ronin kernel:  do_init_module+0x58/0x2f9
Jan 08 22:47:03 ronin kernel:  load_module+0x1dfd/0x2760
Jan 08 22:47:03 ronin kernel:  ? SyS_finit_module+0x91/0xb0
Jan 08 22:47:03 ronin kernel:  SyS_finit_module+0x91/0xb0
Jan 08 22:47:03 ronin kernel:  do_syscall_64+0x64/0x190
Jan 08 22:47:03 ronin kernel:  entry_SYSCALL64_slow_path+0x25/0x25
Jan 08 22:47:03 ronin kernel: RIP: 0033:0x7fe8ece94da9
Jan 08 22:47:03 ronin kernel: RSP: 002b:00007ffe2837a368 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Jan 08 22:47:03 ronin kernel: RAX: ffffffffffffffda RBX: 000055d1362632f0 RCX: 00007fe8ece94da9
Jan 08 22:47:03 ronin kernel: RDX: 0000000000000000 RSI: 00007fe8ecb9f2d5 RDI: 0000000000000010
Jan 08 22:47:03 ronin kernel: RBP: 00007fe8ecb9f2d5 R08: 0000000000000000 R09: 0000000000000000
Jan 08 22:47:03 ronin kernel: R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
Jan 08 22:47:03 ronin kernel: R13: 000055d1362591a0 R14: 0000000000020000 R15: 000055d136249140
Jan 08 22:47:03 ronin kernel: Code: f0 0f 00 00 3d 00 03 00 00 74 73 3d 30 03 00 00 74 6c 41 8b 89 04 05 00 00 0f b6 c5 44 0f b6 c9 c1 e9 10 0f af c2 31 d2 83 e1 0f <41> f7 f1 d3 e
Jan 08 22:47:03 ronin kernel: RIP: nvGetClocks+0x176/0x260 [nvidiafb] RSP: ffffc900014cb7f8
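
In a trace like this, the module name in square brackets on the RIP line shows which module the fault happened in. A rough sketch for pulling it out of a log follows; oops_module is a made-up helper name, and reading a previous boot with journalctl assumes persistent journal storage is enabled.

```shell
#!/bin/sh
# oops_module: hypothetical helper. Reads kernel log lines on stdin and
# prints the bracketed module name from the first "RIP:" line that has one.
oops_module() {
    sed -n 's/.*RIP: .*\[\([^]]*\)].*/\1/p' | head -n 1
}

# Example (kernel messages from the previous boot):
# journalctl -k -b -1 | oops_module
```

Run against the trace above, this picks out the name in brackets on the first RIP line, which is nvidiafb here rather than the proprietary nvidia module.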

Back to top
Mono
Status: Contributor
Joined: 21 Jun 2012
Posts: 115
I'm trying to report this but nvidia-bug-report.sh is nowhere to be found. Do I have to tear open the .run file in order to get it?

EDIT: So it's in /usr/bin... I think the Debian installer put it in /usr/src
Back to top
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4127
Location: East Coast, West Coast? I know it's one of them.
Always try: which <program> before believing it's not there.

nvidia puts it in the right place, debian in the wrong one. It's an application and should be in /usr/bin.
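
One caveat: which only searches the directories on PATH, so it won't find a script that landed somewhere like /usr/src. A minimal sketch that checks PATH first and then falls back to a directory search is below; locate_program is a made-up helper name, and the directories in the example call are just guesses at where to look.

```shell
#!/bin/sh
# locate_program: hypothetical helper. Prints the location of program $1,
# looking on PATH first and then under any extra directories given.
locate_program() {
    name=$1
    shift
    # command -v is the POSIX way to ask the shell where a command lives.
    if command -v "$name" >/dev/null 2>&1; then
        command -v "$name"
        return 0
    fi
    for dir in "$@"; do
        # Take the first regular file with a matching name under $dir.
        found=$(find "$dir" -type f -name "$name" 2>/dev/null | head -n 1)
        if [ -n "$found" ]; then
            echo "$found"
            return 0
        fi
    done
    return 1
}

# Example:
# locate_program nvidia-bug-report.sh /usr/src /usr/share
```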
Back to top
Mono
Status: Contributor
Joined: 21 Jun 2012
Posts: 115
Here is the bug report thread:

devtalk.nvidia.com/default/topic/1028621/linux/384-11-driver-crashes-kernel/post/5232420/#5232420

So, nvidiafb is the kernel's old framebuffer driver for nVidia cards (a separate module from nouveau), and should have been blacklisted by sgfxi, but that doesn't work for some reason on Liquorix.
Back to top
Mono
Status: Contributor
Joined: 21 Jun 2012
Posts: 115
So, I pressed e in the grub menu to look at the kernel line. It had nouveau.modeset=0. It had no modprobe lines. I added modprobe.blacklist=nouveau, but that didn't work. So I tried modprobe.blacklist=nvidiafb, and that worked. I was able to run sgfxi and it worked.

So for some reason the way sgfxi blacklists nouveau doesn't work anymore.

For context, the last time I used sgfxi it told me to reboot since it couldn't unload nouveau. It must have attempted to blacklist nouveau, but did it unsuccessfully or incompletely, resulting in nvidiafb being loaded and then crashing.
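
The kernel-line check described above can be sketched as a small script. It is only an illustration: is_blacklisted_on_cmdline is a made-up helper name, and it relies on modprobe.blacklist= taking a comma-separated list of module names.

```shell
#!/bin/sh
# is_blacklisted_on_cmdline: hypothetical helper. Succeeds if module $1
# appears in a modprobe.blacklist= entry of the kernel command line $2.
is_blacklisted_on_cmdline() {
    module=$1
    cmdline=$2
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        case $word in
            modprobe.blacklist=*)
                list=${word#modprobe.blacklist=}
                # The value is a comma-separated list of module names.
                case ",$list," in
                    *",$module,"*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# On a running system the command line can be read from /proc/cmdline:
# is_blacklisted_on_cmdline nvidiafb "$(cat /proc/cmdline)" && echo "blacklisted"
```

Note that a command line with only modprobe.blacklist=nouveau does not cover nvidiafb, which matches what happened here. To make the blacklist permanent instead of editing the grub entry each time, a line reading "blacklist nvidiafb" in a file under /etc/modprobe.d/ should have the same effect.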
Back to top
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4127
Location: East Coast, West Coast? I know it's one of them.
Oh, that's interesting. I'm not surprised that this stuff changed once again.

This is a big reason linux can't get any traction on the consumer desktop: these changes happen too often and are really pointless, yet frustrating.

I'll look at sgfxi.
Back to top
Mono
Status: Contributor
Joined: 21 Jun 2012
Posts: 115
So yesterday I upgraded to Liquorix 4.14.0-15.1 from 4.14.0-14.3.

I started it and graphics wouldn't load. This is normal for a new kernel; I have to run sgfxi again. But oddly enough, the nVidia drivers wouldn't load for any other kernel either.

I ran sgfxi and it failed, saying it couldn't load the module, with no other messages in the installer log. The build was successful. I checked the kernel boot parameters, which were okay.

Oddly, when the nVidia drivers fail, I find that once every few keystrokes, pressing a key will generate 3 or more characters, but this only happens in the terminal, nowhere else. So it was impossible to log in. So I switched to the Debian kernel and I was able to log in.

So I ran sgfxi -N nouveau

It worked. Right after installing, the terminal screen switched to rendered mode. So I knew it worked, even though sgfxi told me I would have to reboot. So I exited sgfxi and ran lightdm.

Chrome kept crashing so I wasn't able to do much. Liquorix kernels still crashed with a green-tinted screen, except the newest one, which fell back to VESA. I looked through journalctl and couldn't find any interesting messages for the last few failed boots.

So I ran sgfxi again. It didn't fail. I rebooted, and it worked.

Nothing makes sense anymore.
Back to top
damentz
Status: Assistant
Joined: 09 Sep 2008
Posts: 1135
The only downside to using sgfxi is that the default options don't enable kernel DKMS with the nvidia run package. This means that on every kernel update and installation, you _must_ re-run sgfxi before X starts on the next boot.

If this is a problem, I'd recommend sticking with Debian's packages since they handle all that for you, especially if you install nvidia-kernel-dkms as part of installing nvidia's packages. As of now, I'm actually using Debian's packages since that's the only sane way to get CUDA support in Debian Unstable - all other install scripts expect you to be running Ubuntu 16.04 or older (2 years old at this point).
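
To see whether DKMS has actually built the driver for a given kernel, the output of dkms status can be checked. The sketch below is a loose match only: dkms_has_module is a made-up helper name, and since the exact output format differs between dkms versions, it just looks for a line that starts with the module name and mentions both the kernel release and the word "installed".

```shell
#!/bin/sh
# dkms_has_module: hypothetical helper. Reads `dkms status` output on stdin
# and succeeds if a line starts with module prefix $1 and also mentions
# kernel release $2 followed by "installed".
dkms_has_module() {
    module=$1
    kernel=$2
    while IFS= read -r line; do
        case $line in
            "$module"*"$kernel"*installed*) return 0 ;;
        esac
    done
    return 1
}

# Example for the running kernel:
# dkms status | dkms_has_module nvidia "$(uname -r)" || echo "no nvidia module built for this kernel"
```

If nothing matches after a kernel upgrade, that is the situation where X will fail on the next boot until the module is rebuilt.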
Back to top