I have been using ZFS for a while on more than one platform, and I thought it might be interesting to write about my setup at home. I was (admittedly to a small degree) involved in testing[1] when OpenZFS on OS X was in its early development, so I can also present a small part of ZFS history from my personal point of view.
But let’s get started with some basics for those who haven’t heard of ZFS. At the lowest level, ZFS is a filesystem, similar to HFS+ and the recently announced APFS on the Mac, and NTFS on Windows; it stores and retrieves data to and from disk. But ZFS has many advanced features beyond what HFS+ or NTFS offer. Automatic data integrity checking and repair of corrupt data, data deduplication, volume management, built-in RAID (without the dreaded write hole error), and snapshots are some of the improvements over traditional filesystems. What I personally liked was ZFS’ ability to detect and repair data corruption (bitrot), incidentally one of the major design goals of ZFS. That was important to me, as I moved from keeping paper copies of documents to storing them electronically. ZFS’ assurance that the stored data hasn’t been corrupted was what I needed to forgo paper copies altogether[2].
ZFS has a long history, going all the way back to its initial development at Sun Microsystems in 2001. That history culminated in the founding of OpenZFS in 2013, an umbrella project aimed at bringing together those who use and work on improving ZFS. On the main page of OpenZFS are links to ZFS implementations for illumos, FreeBSD, Linux and Mac OS X. While I have no personal experience with the FreeBSD or Linux versions[3], I can vouch for ZFS on OmniOS (an illumos-based operating system) and the implementation for Mac OS X[1].
It is fair to say that ZFS on illumos is regarded as the gold standard for all ZFS implementations, owing to the fact that illumos is a direct descendant of Sun Microsystems’ Solaris operating system, where ZFS was originally designed and implemented.
There are a few operating system distributions based on the illumos kernel, and OmniOS is one of them. It is a capable, enterprise-grade operating system tailored for server use, with ZFS as its native file system. You need Linux or Windows? So do I. Thankfully, OmniOS can run other operating systems as guests in a virtual environment[4]. OmniOS was a good fit for what I needed and wanted at home.
My home server runs OmniOS on bare metal on a Supermicro A1SRM-2758F-O motherboard. The operating system resides on two mirrored 1 TB disks, where all the virtual machines are stored as well. That setup can not only survive the complete loss of one of the disks, it also guards against data integrity issues thanks to ZFS’s ability to detect and repair disk read errors. A 128 GB SSD is used as a fast read cache (L2ARC), which kicks in should the RAM-based first-level cache (ARC) be exhausted. As the previous link touched upon, ZFS also supports a separate log device for its intent log (ZIL), often loosely described as a write cache. In my home server setup, where the main focus is on safe long-term storage as well as serving video files to my Roku streaming box, it is not needed[5].
The main user data (video files, documents, and backups of desktop and laptop computers in my home[6]) are stored on a three-way mirror of HGST drives. For the rationale as to why I’m using a three-way mirror instead of a RAID-Z configuration, head over to a blog post written by Constantin Gonzalez. I also found Richard Elling’s blog post interesting to get an idea of the differences in mean time to data loss for the various ZFS pool configurations.
With all that, this is the disk layout of my ZFS pools on the server:
# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 3h51m with 0 errors on Sun Jul 10 17:16:15 2016
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 6h47m with 0 errors on Sun Jul 3 20:12:14 2016
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0
            c2t1d0s0  ONLINE       0     0     0
        cache
          c3t0d0s0    ONLINE       0     0     0

errors: No known data errors
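For readers who want to build a similar layout, here is a minimal dry-run sketch of the commands that would produce the data mirror and the cache device shown in the output. The device names come from that output; the commands are a reconstruction for illustration, not a record of what was actually typed, and the hypothetical run helper only prints them. The root pool is not included because it is normally created by the installer.

```shell
#!/bin/sh
# Dry-run sketch: run() is a hypothetical helper that only prints each
# command. Device names are taken from the zpool status output; adjust them
# to your own hardware before running anything for real.
run() { echo "+ $*"; }

# Create the data pool as a three-way mirror.
run zpool create data mirror c2t2d0 c2t3d0 c3t1d0

# Attach the SSD slice to rpool as an L2ARC read cache.
run zpool add rpool cache c3t0d0s0
```

Dropping the run prefix would execute the commands for real; zpool create destroys whatever is on the named disks, so double-check the device names first.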
My interest in ZFS started on the Mac at a time when I did not know what illumos and OmniOS were. At that time, Don Brady, a former OS X filesystem architect, founded his company Ten’s Complement to commercialize a version of ZFS for the Mac. Before Ten’s Complement existed, Apple had been working on a port of ZFS but abandoned it, presumably due to licensing concerns. I bought a license for Ten’s Complement’s ZFS version, but the company was short-lived. Rumor has it they were blindsided by Apple’s move to hardwire new technology into HFS+ that wasn’t easily replicated with ZFS. The code base was acquired by GreenBytes with lots of fanfare, only to be neglected shortly after, and it was finally left for dead when Oracle acquired GreenBytes.
Fortunately, at the same time that it became apparent that Zevo (GreenBytes’ ZFS implementation for Mac OS X) was a dead end, a new version of ZFS was ported to the Mac[7]. Jorgen Lundman (the principal developer) and the mysterious, yet aptly named ilovezfs were the main people driving the implementation forward, and they are doing so to this day[8].
There is plenty of information about ZFS online that I don’t want to repeat here. However, my setup at home (a server running OmniOS, plus OpenZFS on OS X on the Mac) might be uncommon enough to be of interest to others. Specifically, I’ll show how running ZFS on both systems allows me to use ZFS’ built-in send/receive commands to stream a snapshot of the data on the OmniOS server to a ZFS pool connected to the Mac for backup purposes.
Similar to the OmniOS server, the Mac has a mirrored drive setup to take advantage of ZFS’ self-healing abilities; the drives are housed in an external enclosure from OWC. Wired Gigabit Ethernet connects the server and the Mac to minimize the time spent transferring what amounted to a terabyte of data from one ZFS pool to another.
The first thing that had to be set up was the ability to log into the Mac from the OmniOS server through ssh via public-key authentication. Interestingly enough, even trying to simply log in via password failed out of the box; I was greeted with this error message:
# ssh -l ottmarklaas 192.168.0.104
no common kex alg: client 'diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1', server 'email@example.com,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1'
It turned out that adding the missing key exchange algorithm (diffie-hellman-group1-sha1) to the sshd config file on the Mac (/etc/ssh/sshd_config)
# Ciphers and keying
Ciphers aes128-ctr,aes192-ctr,aes256-ctr,firstname.lastname@example.org,email@example.com,firstname.lastname@example.org,aes128-cbc
KexAlgorithms email@example.com,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
fixed that problem. With basic ssh working, you can follow any of the many instructions on how to set up ssh with public-key authentication; an example can be found here.
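To see why the login failed and why the extra KexAlgorithms entry helps, here is a small sketch that mimics the negotiation: ssh settles on the first algorithm in the client’s list that the server also offers. The common_kex function is a hypothetical helper written for this illustration, not part of OpenSSH, and the server list is copied from the error message with its first entry left out.

```shell
#!/bin/sh
# Illustration only: pick the first key exchange algorithm in the client's
# comma-separated list that also appears in the server's list.
# common_kex is a hypothetical helper, not an OpenSSH command.
common_kex() {
    client_list=$1
    server_list=$2
    old_ifs=$IFS
    IFS=,
    for alg in $client_list; do
        # Wrap both sides in commas so only whole names can match.
        case ",$server_list," in
            *",$alg,"*)
                IFS=$old_ifs
                echo "$alg"
                return 0
                ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

# Client list as reported in the error message.
client='diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1'
# Server list from the error message (first entry omitted here).
server='ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1'

# Before the config change: nothing in common.
common_kex "$client" "$server" || echo "no common kex alg"

# After appending diffie-hellman-group1-sha1 on the server side.
common_kex "$client" "$server,diffie-hellman-group1-sha1"
```

Keep in mind that diffie-hellman-group1-sha1 is an old algorithm that newer OpenSSH releases disable by default; enabling it is a compatibility workaround for the older ssh client on the server, so it makes sense to limit it to a trusted home network.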
Since I set this all up, I believe there has been some progress in executing ZFS commands on the Mac without the need to be root. You might want to skip the step that enables root on your Mac and see if you can execute the send/receive operations as a normal user. Otherwise, enable root so that the Mac can invoke all ZFS commands when logged in via ssh as root.
The work so far was needed to set things up, and it only has to be done once. For the actual backups there are two steps left. The first is the creation of a recursive ZFS snapshot on the server:
oklaas@Antec:/export/home/oklaas$ sudo zfs snapshot -r data@151206082745
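The snapshot label in that command looks like a timestamp of the form YYMMDDHHMMSS. Assuming that is the convention, a name like it can be generated rather than typed by hand. The snapshot_name function below is a hypothetical helper, and the sketch only prints the resulting command instead of running it.

```shell
#!/bin/sh
# snapshot_name is a hypothetical helper. It assumes snapshot labels are
# timestamps of the form YYMMDDHHMMSS, which is a guess based on the
# labels used in this post.
snapshot_name() {
    pool=$1
    printf '%s@%s\n' "$pool" "$(date +%y%m%d%H%M%S)"
}

snap=$(snapshot_name data)
# Print the command rather than running it, so the sketch is safe to try.
echo "sudo zfs snapshot -r $snap"
```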
To transfer the snapshot to the disks attached to the Mac, I ran the following on the server:
zfs send -R data@151206082745 | ssh firstname.lastname@example.org /usr/local/bin/zfs recv -F OWCPool/data
Running process: '/usr/sbin/diskutil' 'unmount' '/Volumes/OWCPool/data'
Unmount successful for /Volumes/OWCPool/data
The whole process took about 10 hours, which works out to a transfer rate of about 26 MB/s. The process can be sped up by using mbuffer, as this website explains.
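As a quick sanity check, the two figures can be multiplied back into a total size. This sketch uses only the numbers quoted above and assumes decimal units (1 GB = 1000 MB).

```shell
#!/bin/sh
# Back-of-the-envelope check using the figures quoted in the text.
hours=10
rate_mb_per_s=26

seconds=$((hours * 3600))
total_mb=$((rate_mb_per_s * seconds))

# Assumes decimal units: 1 GB = 1000 MB.
echo "transferred: about $((total_mb / 1000)) GB in $seconds seconds"
```

That comes to a bit under one terabyte, which matches the amount of data mentioned earlier.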
After the first full backup, subsequent runs only need to transfer the incremental changes between two snapshots. Creating a new snapshot:
# zfs snapshot -r data@160528120341
followed by instructions to ZFS to send the changes between the two snapshots
# sudo zfs send -R -i data@151206082745 data@160528120341 | ssh email@example.com /usr/local/bin/zfs recv -F OWCPool/data
makes that a relatively quick affair.
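Since every incremental run follows the same pattern, the pipeline can be assembled from the previous and the current snapshot name. The incremental_send_cmd function is a hypothetical helper, and root@mac is a placeholder login rather than the address used here; the sketch prints the command so nothing is sent by accident.

```shell
#!/bin/sh
# incremental_send_cmd is a hypothetical helper that only assembles the
# send/receive pipeline shown in this post; it prints the command instead
# of running it.
incremental_send_cmd() {
    prev=$1      # snapshot already present on the backup pool
    curr=$2      # newly created snapshot
    remote=$3    # ssh login on the Mac (placeholder in the call below)
    target=$4    # receiving dataset on the Mac
    printf 'zfs send -R -i %s %s | ssh %s /usr/local/bin/zfs recv -F %s\n' \
        "$prev" "$curr" "$remote" "$target"
}

incremental_send_cmd data@151206082745 data@160528120341 root@mac OWCPool/data
```

Note that the snapshot given to -i must still exist on both sides, so the previous snapshot should not be deleted until the next incremental transfer has completed.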
I would be remiss not to point out that the whole process can easily be automated. The tool I recommend is znapzend. It offers automatic snapshot creation as well as automatic backup via ZFS send/receive. I use it to automatically create snapshots, but perform the backup by hand. The reason is simple: I prefer to keep my backup disks physically disconnected so I can’t accidentally delete any files. Been there, done that!
[1] I have fond memories of my involvement in the testing of early releases of the Mac OS X version as well as my interactions with Jorgen Lundman and ilovezfs, the two people who deserve most of the credit for bringing a version of OpenZFS to the Mac.
[2] As an example, on an episode of the Cortex podcast CGP Grey laments data loss that he attributed to an error in the underlying file system (HFS+). His backups were of no use in recovering the uncorrupted files, as the error had propagated to all his backups before he noticed the problem. Worse, given the specific circumstances, he didn’t even know how much data he had lost. John Siracusa has lots to say about this issue as well on episode 175 of the Accidental Tech Podcast.
[5] To be honest, even the SSD as a second-level read cache is mostly overkill in my home setup, but (i) I had an unused SSD, (ii) ZFS makes it so easy that a cache can be added literally within minutes, and (iii) there was still an unused SATA connector available on the motherboard. So, why not?
[6] I’m using Arq to create Time Machine-like backups via sftp. Alternatively, you can convince Mac OS X’s Time Machine to store its backups directly on a drive that is shared via netatalk, the open source implementation of AFP. That is supported, but entails a bit more work.