Migrating a Big Data Cluster from Linux to FreeBSD

Benedict Reuschling <bcr@freebsd.org>

This talk covers the ongoing process of migrating a big data cluster from a Linux-only system to a mixed-OS environment that uses more and more BSD. I’ll cover how the cluster started, what it is used for, and the current setup (hardware and software). Particular focus is on the file server, which has been successfully migrated from a hardware RAID configuration running Linux to an OpenZFS-based FreeBSD setup. I’ll detail how it was done, the planning and preparation involved, and the important lessons I took away from the experience. The talk closes with an outline of the steps that will follow. It is intended for people interested in such setups, migration strategies, steps to take, and pitfalls to avoid.

I manage the Big Data cluster of the University of Applied Sciences, Darmstadt, Germany as its system administrator. As such, I’m responsible for providing compute resources to researchers and teachers during the semester, as well as to students doing project and thesis work on the cluster nodes. When I took over the cluster a couple of years ago, it was a Linux-only system. Since then, I’ve converted more and more nodes to FreeBSD and OpenZFS. My talk will focus on the how and why, elaborating on the benefits of the approach and the rough edges that still need work. In the summer of 2018, I took on one of the biggest tasks yet: the migration of the central file server for the cluster. This file server provides the home directories to each node via NFS. I migrated it from a pure hardware RAID setup to a FreeBSD-based OpenZFS software RAID solution. This provides the usual benefits associated with OpenZFS, such as compression, quotas and reservations, as well as data protection on various levels. My talk will detail the preparation I did before the migration and provide some insight into how the setup works now. I will also discuss future work, and I hope to gather feedback from the audience on topics like monitoring, backup, and outstanding work.