Building a CentOS 7 Machine Learning Station with NVIDIA/CUDA integration: Everything I Learned

Aaron Polhamus
15 min read · Mar 5, 2018


UPDATE: After using this build for a couple of months, one day I restarted the computer and X just stopped working. I couldn’t log in. I’ve been unable to get the ML station working on CentOS since then. I did, however, have more lasting success with Fedora + Docker, and wrote a post about that here. Around two months after I wrote this update, Fedora 30 was released. Since then I’ve been struggling to get NVIDIA Docker going. The open source community needs a bit of time to catch up when these kinds of major version upgrades happen. It’s been a bit of a never-ending struggle, but I stay in it because I’m still learning. Good luck, and let me know if I can be helpful!

I’m a data science practitioner who has recently found it challenging to cultivate my practice. As Credijusto tech grows along with the rest of the company, I find that I spend less time each day on data analytics and code, and more on the many non-technical tasks that are crucial for building a 10x team. I love turning raw data into insights and am fascinated by how emerging technologies help us understand complex systems through data. To stay engaged with the field I decided I needed a home research station, and set off to create the most awesome one I could afford.

I’m delighted with what I ended up with, but it took 5 months of frustration, 2 busted motherboards, and about $750 of extra expenses before I got the project off the ground. The goal here is to share lessons learned, both moral and technical, in the hope that your process will be about 10x smoother than mine was. To get the basics, read the tl;dr just below. If you read beyond that, here’s what I’ll talk about:

  1. Local hardware versus cloud — Why would you build your own machine when you can rent cloud capacity? I’ll list the components used to build ghost and do a bit of comparison of the economics of cloud versus local. There are good reasons to go either way.
  2. A few high-level takeaways — Pedagogical notes for people about to embark on a similar project. If you’re ADD, like me, and lean a bit too much toward jumping into things before researching them, then this section is worth reading.
  3. Technical lessons — What most data scientists and technicians will actually want to read. If you’ve gotten your machine to post and now just want to know how to set up your environment, then skip to here.

tl;dr

I built a workstation with a CUDA-integrated NVIDIA GeForce GTX 1080 Ti, an Intel i7-7700 CPU, and 64 GB of RAM. All-in cost was $2662. When you compare this to dedicated GPU instance plans on Paperspace, the economics of cloud versus local are kind of a toss-up. What sold me on building a local station was convenience, but industry economics will increasingly move all but the most hard-core practitioners toward the cloud.

A few high-level takeaways: Front-load planning and research before jumping in; Make a plan and stick to it — don’t react based on negative emotional energy; Handle your components with utter reverence; Don’t alternate graphics cables between your integrated and GPU ports; Experiments with software are cheap and fast. Experiments with hardware are expensive and slow. Exhaust potential software fixes before you mess around with the hardware; When stuck, invest in good help; Don’t let the perfect be the enemy of the great.

Some technical lessons: Use CentOS with GNOME — unlike Ubuntu with Unity, everything worked fine once I got the machine booting up; Configuring the NVIDIA drivers and installing CUDA is a detailed but straightforward process that’s pretty easy to navigate. See the Technical notes section to learn more.

Local hardware versus the cloud

Here are the components I used for the base system (I’ve only included the final motherboard). I bought all of this in Mexico, but converted the prices to USD at the time of purchase:

That’s the base hardware. Of course you also need to set up your desktop environment. Those components were:

Grand total for the base components: $2256. Grand total for the desktop environment: $406. All-in: $2662. Now, I did have two motherboard casualties that added around $500 of extra costs. I will likely be able to get $250 back on warranty. More below.

A fair comparison of local versus cloud economics would account for the fact that, even when you go with the cloud, you still need a client machine to work on. I’m going to adjust those client machine costs out of my all-in costs, assume that you prefer working either in Unix or Linux, and that you’re willing to invest a little bit in user experience and hardware quality. Using those assumptions, a basic MacBook Air, which runs $999, seems like a reasonable choice. Subtracting the client machine costs I’m left with about a $1663 up-front investment in computing power.

How much work do you have to do locally at that price point until you start getting a positive ROI? More than you might think. The costs of GPU cloud computing have really been coming down, making a locally-integrated cloud research environment a pretty good choice for many applications. The best cloud comparable that I could find was the P5000 dedicated GPU instance on Paperspace. The P5000 has 16 GB of GPU memory compared to 11 GB for the 1080 Ti. Both machines compute at a rate of 9 teraflops, meaning that for parallel computation they are going to have similar performance. Like the dedicated instance, my machine has 8 CPU cores (4 of which are virtual), but wins handily on RAM with 64 GB versus 30 GB. This instance costs either $0.65/hour or $290/month. If you go with hourly, you need to power down the machine whenever you’re not using it to avoid racking up heavy usage fees. If you use the hourly plan for around 60 hours a month, accounting for the $5 storage fee, your monthly cost comes to about $44.

We’ll call the $290/month price point the practitioner’s price point. This is someone who is training heavy models requiring a couple hours each, optimizing hyper-parameters across multiple training runs, and maybe mining some crypto. This person gets a positive ROI with my specs in less than 6 months. $44 a month is the hobbyist’s price point. This is someone who ducks into their environment here and there to train a couple models, maybe do the fast.ai course, and play around with the occasional Kaggle competition. This person gets a positive ROI in about 3 years, and should probably go cloud.
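If you want to sanity-check those break-even claims against your own numbers, here’s a rough sketch of the arithmetic in the shell (the $1663 figure is my all-in cost net of the client machine, from above; swap in your own costs and plan prices):

$ LOCAL_COST=1663           #..up-front hardware investment, net of the client machine
$ PRACTITIONER_MONTHLY=290  #..Paperspace dedicated P5000, monthly plan
$ HOBBYIST_MONTHLY=44       #..roughly 60 hours/month on the hourly plan plus storage
$ echo "Practitioner break-even: $((LOCAL_COST / PRACTITIONER_MONTHLY)) months"  #..prints 5, i.e. under 6 months
$ echo "Hobbyist break-even: $((LOCAL_COST / HOBBYIST_MONTHLY)) months"          #..prints 37, i.e. about 3 years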

I’m somewhere between the practitioner and the hobbyist. On a strictly dollars-and-cents basis it’s a toss-up whether this project is economically justified. For now, the convenience of being able to work locally was the final selling point. I don’t have to worry about turning the instance on and off. It’s easy to set up a multi-monitor working environment. I don’t need to port data back and forth between the client and the cloud server. And if I take my WiFi adapter card out, which I’m currently not using anyway, I can add another powerful GPU. As the technology advances and the cost of high-end GPU computing continues to come down, it’s likely that in a year or so I’ll be making the same arguments for the cloud to fellow practitioners that I currently make for companies. In the meantime, local still felt like the right choice in terms of both the cost of computation and the user experience.

High-level takeaways

  • Front-load planning and research — I bought the wrong-sized case at first and had to return it; forgot to purchase the CPU fan; needed extra case fans and connector cables; didn’t think about the network card. Life keeps reminding me that putting in the time to make a solid plan, or at least to inform yourself, is typically worth it. Don’t get analysis paralysis, but with a highly technical project like this, invest in informing yourself before jumping in.
  • Treat your components with utter reverence — At some point in my first build attempt I bent the pins in the motherboard’s CPU slot. This is the ultimate avoidable problem and is easily solved by just being careful.
  • Make a plan and stick to it/don’t be reactive — The first time I got ghost posting I installed Ubuntu and right away started having problems installing the NVIDIA drivers and getting the dual-monitor display to work. Instead of attacking the problem in a structured way, I started making a bunch of apt-get calls, switching back and forth on what problem I was trying to solve, and ultimately getting frustrated and yanking out the GPU to try and run both displays off of the CPU’s integrated graphics. This cost me the second of my two motherboards and leads me to the next point…
  • When troubleshooting, rule out the low-cost, low-risk factors first — Experiments with software are cheap and fast. Experiments with hardware are expensive and slow. Exhaust potential software fixes before you mess around with the hardware. After a full day of trouble installing NVIDIA drivers and getting dual monitors running in Ubuntu, I pulled out the GPU and connected my cables to the CPU’s integrated graphics ports. I got into the BIOS this way, but couldn’t log in to the operating system. I then replaced the GPU and hooked the graphics cables back to its ports. Apparently alternating cable connections between your integrated and GPU ports is a really bad idea: the machine didn’t even post after this. It would have been far cheaper to try out a different Linux distro, e.g. CentOS 7, before I started experimenting with hardware. That’s what I went with ultimately, and had I tried it first I would have saved a motherboard.
  • When stuck, invest in good help — My friend José Carlos Nieto, Co-Founder of Mazing Studio, got me back on track after I killed my second motherboard. He’s also the one who turned me on to CentOS 7 instead of Ubuntu, which ended up being a great choice. His time wasn’t free, but it was totally worth it.
  • Don’t let the perfect be the enemy of the great — I still haven’t gotten my WiFi adapter running. I froze the OS on José Carlos’ build when I dropped a bunch of .so files into the /lib/firmware directory. I had to rebuild the OS and configure all dependencies and drivers from scratch once more after this. After 5 months of hangups and procrastination I decided that I wasn’t going to make this a sticking point. I have a 15 meter Ethernet cable running through my living room and am leaving the WiFi for another day. Lemme know if you have any tips.

Technical notes: getting it all running

There are a lot of great guides out there for building a desktop PC. If you’re looking for a great machine learning rig, then buy everything I’ve listed here and put it all together. You could also consider a more powerful GPU, perhaps an Intel i9 CPU, and a faster SSD if you need to do a lot of I/O from the hard drive. In general, just go through the components list above and make sure you have all the key ingredients. Put it all together, get the BIOS posting, and then you’re ready to set up the environment:

Installing the OS

Once you’re in the motherboard’s BIOS you’ll need a boot image. I found a great guide for how to create this with macOS here. The steps were:

  1. Download the DVD ISO from the CentOS website.
  2. From the download directory, run hdiutil convert -format UDRW -o centosdvd.img CentOS-7.0-1406-x86_64-DVD.iso to convert the .iso file to an .img file (macOS appends .dmg to the output, so you’ll end up with centosdvd.img.dmg).
  3. Plug in an 8 GB USB drive and use diskutil list to find that disk. Unmount it with diskutil unmountDisk /dev/[disk_name]. The disk name will typically be something like disk1 or disk2.
  4. Copy the boot image to the disk with sudo dd if=centosdvd.img.dmg of=/dev/[disk_name]. This will take a long time; in my case it took a couple of hours. (The full sequence is consolidated just below.)
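For reference, here is the whole sequence in one place. The disk identifier and ISO filename are placeholders — confirm yours with diskutil list before running dd, since writing to the wrong disk is destructive:

$ hdiutil convert -format UDRW -o centosdvd.img CentOS-7.0-1406-x86_64-DVD.iso   #..produces centosdvd.img.dmg
$ diskutil list                                                                  #..find the USB drive, e.g. /dev/disk2
$ diskutil unmountDisk /dev/disk2
$ sudo dd if=centosdvd.img.dmg of=/dev/disk2 bs=1m                               #..bs=1m speeds up the copy on macOS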

From here plug the USB into your rig while it’s turned completely off and turn on the power. From the BIOS menu you should have easy access via the UI to the boot settings, where you will select the USB drive you just inserted. For both the MSI motherboards that I destroyed and the Aorus motherboard that I finally used this was very straightforward. Select the boot disk as your first boot option, restart the computer, and go through the CentOS 7 install steps. I did the full install, but you could probably get away with the basic install, dropping in your key dependencies later.

Configuring the NVIDIA drivers

A quick note before I walk through how I did this with CentOS: after I crashed my first successful CentOS 7 install by adding some bad .so files to the /lib/firmware directory, I tried to do this with Ubuntu. Surely this was going to be easier, right? After all, Ubuntu is known as the most user-friendly Linux distro out there. You don’t have to spend much time googling for answers to realize that it’s very well supported by the open source community. Without going into too much detail, I eventually wound up at this stack post as I tried to troubleshoot some issue related to the X server/lightdm. I was able to finish the driver install but was unable to restart lightdm, unable to get back into the Unity GUI, and unable to install CUDA. After a full day of frustration, which ended with me totally killing Unity and only being able to interact with the OS via the terminal in run level 3, I decided to give CentOS 7 another try. As a plus, I also liked the GNOME GUI a lot better, which comes out of the box with the CentOS 7 install. If you’re a Mac user, its multi-desktop environment will feel familiar. There was also zero fuss getting multiple monitors running.

Once back in CentOS 7 territory, these were the steps I followed to get the drivers running. I aggregated information and instructions from a few different sites. If I’ve missed any important steps here, or you took a different route that worked well, please let me know so that I can update. I started with this guide. Several of the steps here are copied straight from that page:

  1. Run the following commands:
$ yum -y update
$ yum -y groupinstall "GNOME Desktop" "Development Tools"
$ yum -y install kernel-devel
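Before going further, a couple of optional sanity checks: confirm that the card shows up on the PCI bus and that kernel-devel matches your running kernel (a mismatch is a common reason the driver module fails to build):

$ lspci | grep -i nvidia    #..the GTX 1080 Ti should appear here
$ uname -r                  #..your running kernel version...
$ rpm -q kernel-devel       #..should match the version reported above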

2. Download the appropriate NVIDIA driver. This will not necessarily be the “Latest Long Lived Branch version” that the install guide recommends. I recommend going to the NVIDIA Driver Downloads page and using the selectors to locate the appropriate driver file. For the 1080 Ti on a 64-bit Linux system that was this one. (If you have any doubts about whether you are running 32- or 64-bit, you can confirm with uname -a in the terminal.)

3. Reboot your computer and then append rd.driver.blacklist=nouveau nouveau.modeset=0 to the GRUB_CMDLINE_LINUX section of /etc/default/grub.
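For illustration, after the edit the line in /etc/default/grub should look something like this (the other parameters here are just an example of a stock CentOS 7 line; keep whatever your install already has and append the two new options):

GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet rd.driver.blacklist=nouveau nouveau.modeset=0"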

4. Generate a new grub configuration to include the above changes:
$ grub2-mkconfig -o /boot/grub2/grub.cfg

5. Edit (or create if it doesn’t exist) the file /etc/modprobe.d/blacklist.conf and append blacklist nouveau.
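A one-liner that does the same thing, if you’d rather not open an editor (drop sudo if you’re already root):

$ echo "blacklist nouveau" | sudo tee -a /etc/modprobe.d/blacklist.conf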

6. Backup your old initramfs and then build a new one:

$ mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
$ dracut /boot/initramfs-$(uname -r).img $(uname -r)
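Before rebooting, a quick optional check that both images are in place:

$ ls /boot/initramfs-$(uname -r)*   #..should list the freshly built image and the -nouveau backup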

7. Reboot your machine. At this point I stopped using the first guide; the systemctl call it mentions wasn’t necessary. After the reboot your interface will likely look grainy and clunky. Don’t worry, you didn’t fry GNOME, you’re just not done with the install. Hit Ctrl-Alt-F3 to enter a terminal prompt and become the root user with sudo su.

8. cd to the directory where you downloaded the NVIDIA driver and execute the following:

$ chmod +x [NVIDIA_driver_file].run
$ ./[NVIDIA_driver_file].run

9. Accept X override when asked.

10. Use init 5 to return to desktop mode. If GNOME isn’t back to normal you may need to reboot the system once more. To do this, enter the terminal with init 3 and then sudo reboot.
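Once you’re back in GNOME, a quick way to confirm the driver actually loaded is nvidia-smi, which ships with the driver:

$ nvidia-smi   #..should list the GeForce GTX 1080 Ti along with the driver version and current GPU utilization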

From there I was good to go with the NVIDIA drivers and GNOME was back to looking and functioning normally. The final river to cross on the way to enjoying machine learning paradise was the CUDA install. Again, the steps here were detailed but pretty easy to follow. I used the CUDA Toolkit Documentation from NVIDIA as my primary reference here.

  1. You can go through the preliminary steps to check the availability of a CUDA-enabled GPU, appropriate Linux distro, and your gcc installation if you like, but chances are that if you got this far you’ve already got these sorted.
  2. You should already have your kernel headers and development packages installed from the driver install, but run sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) just in case.
  3. Download the appropriate NVIDIA CUDA toolkit runfile from here. If you have the same specs as me this should be Linux -> x86_64 -> CentOS -> 7 -> runfile (local).
  4. Not all of the steps listed in the guide are necessary if you’ve just gone through the driver install process. Enter the shell with sudo init 3 and cd to the directory containing your runfile download.
  5. Run the installer in silent mode to automatically accept the EULA and the default parameters with sudo sh cuda_<version>_linux.run --silent. If you want finer control over the install, remove the --silent flag. I did this, but I can’t recall if I changed any installation defaults. I’m not sure if this step was necessary, but you can also create an xorg.conf file from the NVIDIA GPU display with sudo nvidia-xconfig.
  6. After the install completes successfully sudo reboot to reboot the system and enter the GNOME GUI.
  7. Update your path variables with the CUDA binaries (change your version if not using 9.1):
$ export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
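Those exports only last for the current shell session. To make them persistent (assuming you’re using bash, the CentOS default), append them to your ~/.bashrc:

$ echo 'export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
$ source ~/.bashrc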

8. A couple of quick checks to verify both your NVIDIA driver version and your CUDA install. I used this post-install guide as a reference for the following steps:

$ cat /proc/driver/nvidia/version #..should output something like this:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 387.26 Thu Nov 2 21:20:16 PDT 2017
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
$ nvcc -V #..should output something like this:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

9. Install a writable copy of the CUDA samples and build them (again, pay attention to the 9.1 version number — depending on when you’re reading you may need to update it):

$ cuda-install-samples-9.1.sh ~ 
$ cd ~/NVIDIA_CUDA-9.1_Samples/5_Simulations/nbody
$ make

10. The runfile installer guide tells you to run the nbody example with ./nbody but at least when I went through these steps I did not have that in the root samples directory. Instead I followed the post-install guide I referenced above to run deviceQuery and bandwidthTest. Note that the paths mentioned in the guide have changed, or at least they were different for my setup. I entered the appropriate directory and executed the tests as follows:

$ cd bin/x86_64/linux/release #..from root testing directory 
$ ./deviceQuery #..should see something like this:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1080 Ti"
CUDA Driver Version / Runtime Version 9.1 / 9.1
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 11169 MBytes (11711807488 bytes)
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1683 MHz (1.68 GHz)
...
$ ./bandwidthTest #..should see something like this:
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 1080 Ti
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12709.4
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12893.4
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 371522.7
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

11. That’s it. For good measure, do one more sudo reboot, get back into your terminal, and run nvcc --version and echo $PATH. The former should repeat the successful result above, and the latter should include /usr/local/cuda-9.1/bin in the output. If that’s the case, congrats, you’re ready for training! If it’s not, please let me know in the comments and let’s see if I can help you out with troubleshooting.
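If you’d like something a bit more targeted than eyeballing the full output, these two checks (just a convenience sketch) pull out the relevant lines:

$ nvcc --version | tail -1               #..e.g. Cuda compilation tools, release 9.1, V9.1.85
$ echo $PATH | tr ':' '\n' | grep cuda   #..should print /usr/local/cuda-9.1/bin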

Wrap-up

This was a challenging project, but it feels great having the rig up and running. I only recommend this if your level of commitment is high and you have significant prior experience with machine learning and know that it’s something you really want to invest in. You could also mine some crypto that isn’t bitcoin or ethereum, but the economics of this won’t really add up for most folks. Otherwise start with an on-demand Paperspace GPU and get your feet wet with a few tutorials. I’m getting back into the flow with the fast.ai course, Deep Learning for Coders, and the Kaggle 2018 Data Science Bowl.

Things I’d still like to solve on the hardware end include how to set up a reliable ssh/VPN tunnel (my router is an SAP, not a primary), and how to set up my WiFi adapter without crashing my build. Any tips would be welcome.

Have fun out there. At the end of all this here’s how the workstation came out:
