So I have another big rant to do today. If you looked at the preview image for this post, you already know who the target is: +NVIDIA. And no, it is not about Optimus technology, even if the situation is not much different from 3 years ago, when +Linus Torvalds majestically expressed what all Linux users with the bad luck of owning an NVIDIA Optimus laptop are thinking. It is about CUDA.
A few days ago I got an email from a person trying to get the latest CUDA version running on a Dell Precision M4800 (which incidentally has optional Optimus technology). The problem was that she was not able to get updated NVIDIA drivers on Ubuntu or CentOS: everything she tried ended in an unbootable system. That was not hard to believe at all, having been there myself so many times in the past. Since the M4800 is a laptop after all (well, two if you just look at the weight eheheh), I just called her to my office so I could see the problem directly instead of asking for details via email.
I reinstalled CentOS from scratch for her. She had already disabled Optimus in the UEFI settings on my advice, which would make her life a lot less complicated. The system was up and running with nouveau: time to install CUDA and the NVIDIA drivers. I looked at the official documentation and was greeted with an unexpected surprise: they now provide repositories for quite a lot of distributions, including Red Hat / CentOS. Well, better than installing manually, I suppose... less chance to break things, and you get updates quite easily alongside the system's. Usually I'm not happy with vendor repositories, they easily mess up or are slow in picking up distribution changes / updates, but this is NVIDIA, right? They know their shit, right? And anyway I don't have the time to create a package for her and maintain it. So be it: I followed the instructions in section 3. Package Manager Installation, until reaching 3.2.8, where it tells you to jump to section 6. Post-installation Actions. Did those, time to reboot. This was quite a long day, even if this short post doesn't give you an idea, and I was almost celebrating that we were done. And the kernel panics right after grub.
I look at the trace and I see a drm_kms_helper call. Again, WTF? KMS? We are going to use the proprietary NVIDIA driver, which does not support KM..... oh crap.... I want to check something. Reboot and stop at grub, edit the menu entry, look for the kernel command line.....
There is a little detail missing: "nouveau.modeset=0 modprobe.blacklist=nouveau" (I know this because the rpmfusion provided driver for Fedora does that. Too bad they don't package CUDA as well. Ditto for elrepo). I added it manually, rebooted, and it works.
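If you want to check quickly whether a machine already boots with those two parameters, here is a minimal Python sketch. It only inspects a kernel command line string (on a running Linux system that would be the content of /proc/cmdline) and changes nothing on the system. The function names are my own invention, and I am assuming that modprobe.blacklist may carry a comma separated list of modules.

```python
def nouveau_disabled_on_cmdline(cmdline):
    """Return True if the kernel command line both turns off nouveau
    modesetting and blacklists the nouveau module."""
    modeset_off = False
    blacklisted = False
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "nouveau.modeset" and value == "0":
            modeset_off = True
        # Assumption: modprobe.blacklist accepts "mod1,mod2,..."
        elif key == "modprobe.blacklist" and "nouveau" in value.split(","):
            blacklisted = True
    return modeset_off and blacklisted


def running_kernel_disables_nouveau(path="/proc/cmdline"):
    """Hypothetical helper: apply the check to the running kernel."""
    with open(path) as handle:
        return nouveau_disabled_on_cmdline(handle.read())
```

Calling running_kernel_disables_nouveau() on the freshly installed system right after the repository installation would have returned False, which is exactly the missing detail described above.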
Holy crap, NVIDIA, is it too much to ask you not to break the damn system by installing your crap into it? Did you even test one single Linux system? In section 4. of your manual, "Runfile Installation", you clearly mention "disable nouveau" and explain how to do so. In your repo installation I suppose that should happen automatically, but it clearly doesn't.
Now, I can't really know if she had the exact same problem with Ubuntu, since there are so many things that can go wrong, but she tried like 10 times before coming to me.
Now, NVIDIA, how can you solve this? Well, you already depend on the EPEL repo according to your guide. Why don't you use the driver from elrepo instead of installing your own broken one? This would even relieve you of the burden, and elrepo provides all the major series, not only the latest one. Also, you are welcome to contribute a missing version if you like, I'm sure! And package only the CUDA SDK in your own repo? Or even better, why don't you contribute a cuda package to elrepo for CentOS, rpmfusion for Fedora and so on? On other distros it might be more difficult, since they don't want package updates in general. Their choice, their problem, but they also provide a cuda package at least in the official repo. At least don't reinvent the wheel: Fedora and CentOS have good repos with up to date drivers, use them! Stop providing your own solution that breaks fresh installations. Thank you.
Note: I also think this is not entirely NVIDIA's fault. There are quite a few limitations in X11 that are possibly to blame. However, NVIDIA never really did anything to change this, while the brave people behind the bumblebee project actually achieved quite good results with what was available at the time. It's just not easy to get working out of the box, and every application must be started with optirun. I myself am a very happy user of bumblebee, playing high end Steam games on my Dell Alienware 15 on Fedora 22, but I had to package it all by myself! Granted, I enjoy doing such a thing, but the normal user is just left in the dust.
Now, it's true that NVIDIA contributed the DMA_BUF PRIME helpers to the Linux kernel (nice, thank you, that's appreciated and the right way to go). But this was in early 2013, two years ago! Well, the NVIDIA KMS driver might finally be near, we will see if something finally materializes. It will still be a proprietary blob in the end, but my hope is that the user experience will be at least less painful. Luckily most users don't need the power of NVIDIA and can simply use Intel, which delivers a super nice experience out of the box, straight from the upstream Linux kernel, no additional action required. Also the new HD 6xxxx start to actually have some power.... watch out, NVIDIA!