Hunting a Python bug in a chroot
As a "full stack" developer you often end up doing sysadmin jobs. This week I've been dealing with an issue with chroot and SSL certificate verification, or at least I thought so.
In the app I'm working on, a cloud-based code editor, every user gets their own shell.
A shell is an interface for entering commands on a computer or server, often via a Unix-style command line.
To add a bit of convenience and a layer of security each user is chrooted in their home directory.
Chroot basically means the root folder / is translated to another path, in this case the user's home folder.
When a program tries to read /etc/passwd it's instead reading from /home/user/etc/passwd.
The only problem is that many programs require libraries located elsewhere, like in /lib/ and /usr/lib.
Thus I have to mount --bind those folders from the system path into the user's home dir. So it goes "full circle":
A request to /lib/foo gets translated to /home/user/lib/foo, which is the host's /lib/foo bind-mounted into the home dir.
This seems a bit tedious, but the advantage is that I can pick which folders the user can access, it doesn't take up any extra hard drive space, and the libraries and executables are the same as the host system, so they'll be updated when the host system updates.
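The path translation and the bind mounts described above can be sketched in a few lines of Python. This is only a model on paper, not my actual setup script: the function names and the /home/user jail path are made up for illustration, the kernel also resolves symlinks (which this sketch ignores), and the mount commands are only built as argument lists, not executed.

```python
import posixpath


def chroot_to_host(jail_root, path_in_jail):
    """Map an absolute path as seen inside the chroot to the host path."""
    # Normalise first so that "/lib/../../etc" cannot climb above the jail root.
    normalised = posixpath.normpath("/" + path_in_jail.lstrip("/"))
    return posixpath.join(jail_root, normalised.lstrip("/"))


def bind_mount_commands(jail_root, host_dirs):
    """Build (but do not run) one 'mount --bind' command per host directory."""
    return [
        ["mount", "--bind", host_dir, chroot_to_host(jail_root, host_dir)]
        for host_dir in host_dirs
    ]


if __name__ == "__main__":
    print(chroot_to_host("/home/user", "/etc/passwd"))
    for argv in bind_mount_commands("/home/user", ["/lib", "/usr/lib"]):
        print(" ".join(argv))
```

Running the commands for real requires root and that the target directories already exist inside the home dir.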
You could say a chroot is a lightweight container. The reason why I'm not using a container like LXC or Docker, besides the extra disk space and resource use, is the time it takes to boot it up, compared to a chroot which is instant.
There is however the chance that someone more clever than me can escape the chroot and gain full system access.
You need root privileges to create a chroot, and the most common mistake with chroot is not dropping the root privileges afterwards.
Besides dropping the root privileges with setuid and setgid to the actual user id/group, I also use AppArmor.
AppArmor adds an additional security layer where I explicitly have to define which directories and what resources a program is able to access.
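For the privilege dropping, the order of the calls matters. The following is a minimal sketch in Python, not the code the editor actually runs: the function name is hypothetical, it assumes a Linux host, and it has to be started as root. The group is changed before the user, because after setuid the process is no longer allowed to change its group.

```python
import os


def enter_jail_and_drop_root(jail_root, uid, gid):
    """Chroot into jail_root, then permanently give up root privileges."""
    os.chroot(jail_root)
    os.chdir("/")      # do not keep a working directory outside the jail
    os.setgroups([])   # drop supplementary groups while we are still root
    os.setgid(gid)     # group first: after setuid this would be refused
    os.setuid(uid)
    # Sanity check: trying to become root again must now fail.
    try:
        os.setuid(0)
    except PermissionError:
        return
    raise RuntimeError("root privileges were not dropped")
```

A process that skips the setgid/setuid step is still root inside the jail, which is the common mistake mentioned above.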
One program that users are allowed to run is Mercurial,
which is a source management tool developed in the programming language Python.
This week, while going down the "happy path" (meaning everything I do has been well tested and polished, useful when demoing the product), I cloned a repository from GitHub (possible with the hggit Mercurial addon) and got the following error message:
abort: error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)
Reading from the error message it seems to have something to do with SSL certificates.
I tried to clone from another server that also uses SSL and it worked fine ...
It wouldn't be the first time GitHub changed their certificate supplier/chain though, which was the root of an earlier bug. So I updated all root certificates and tried again. But still the same error.
Reading the Mercurial manual I found the --insecure flag, which skips the certificate verification ... Adding --insecure to hg clone gave the same error!
I also found the --traceback flag while reading the Mercurial manual, but that didn't give any useful information.
Tried searching to find out what was on _ssl.c line 590 and found: PyErr_Clear(); inside the function fill_and_set_sslerror.
Which is called from _setSSLError which in turn is called from ... I'm not very good with Python or C so I just gave up.
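Instead of reading the C source, one check that can be run both on the host and inside the chroot is to ask Python where it expects to find CA certificates. This is a sketch using the standard ssl module, not something I ran at the time, and the function name describe_ca_paths is made up. It only shows the interpreter's defaults; Mercurial can be configured to use other locations.

```python
import os
import ssl


def describe_ca_paths():
    """Report where this Python build looks for CA certificates by default."""
    paths = ssl.get_default_verify_paths()
    candidates = (
        ("cafile", paths.cafile),
        ("capath", paths.capath),
        ("openssl_cafile", paths.openssl_cafile),
        ("openssl_capath", paths.openssl_capath),
    )
    report = {}
    for label, path in candidates:
        report[label] = {
            "path": path,
            # exists() follows symlinks, so a dangling link counts as missing
            "usable": bool(path) and os.path.exists(path),
        }
    return report


if __name__ == "__main__":
    for label, info in describe_ca_paths().items():
        print(label, info["path"], "ok" if info["usable"] else "MISSING")
```

If a path is reported as usable on the host but missing inside the chroot, that folder is a candidate for another bind mount.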
Rant: If you are going to crash/exit, just exit on the spot with a stack trace! Don't have a bunch of error handlers that obfuscate the actual error!
Then I tried running hg clone from the host server, and it worked!
So it must have something to do with either AppArmor or the chroot ...
I disabled AppArmor. Still the same error. So it has to do with the chroot then ...
I used the tracefile Perl script, which outputs all files that are accessed by a program. Then I made sure the chroot had everything resembling SSL ... Still the same error ...
Damn it. So I added each folder one by one to the chroot until it worked. It turned out to be /usr/share/ that was needed.
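Comparing the traced file list against the chroot by hand is error prone, so that step could be automated. The sketch below is run from the host and takes the list of accessed paths (for example the output of the trace script, one path per entry) plus the jail root; the function name missing_in_jail is hypothetical.

```python
import os


def missing_in_jail(jail_root, accessed_paths):
    """Return the host paths that have no counterpart under jail_root."""
    missing = []
    for path in accessed_paths:
        inside = os.path.join(jail_root, path.lstrip("/"))
        # Only report files that really exist on the host but not in the jail.
        if os.path.exists(path) and not os.path.exists(inside):
            missing.append(path)
    return missing
```

One caveat: because this runs outside the chroot, absolute symlinks inside the jail are resolved against the host root, so a link that dangles only from inside the chroot will not show up here.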
The Mercurial clone command accesses the following files from /usr/share/:
I still don't know what the actual error was. I guess it has something to do with locale formatting. If you have any idea, post in the comments below:
Written by Johan Zetterberg May 4, 2018.