Consider the following use case:
import os
PREFIX = '/home/user/files/'
full_path = os.path.join(PREFIX, filepath)
open(full_path, 'rb')
...
Assuming that filepath
is user-controlled, a malicious user might attempt a directory traversal (like setting filepath
to ../../../etc/passwd
). How can we make sure that filepath cannot traverse "above" our prefix? There are of course numerous ways to sanitize input against directory traversal. The easiest way (that I came up with) to do so in Python is:
filepath = os.path.normpath('/' + filepath).lstrip('/')
It works because it turns the path into an absolute path, normalizes it and makes it relative again. As one cannot traverse above /
, it effectively ensures that the filepath
cannot go outside of PREFIX
.
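As a minimal sketch, the one-liner can be wrapped in a small helper. The name safe_join is hypothetical (it is not from this post or from the standard library), and the sketch assumes POSIX-style paths. The check is purely lexical: it only looks at the path string.

```python
import os

PREFIX = '/home/user/files/'


def safe_join(prefix, filepath):
    """Join filepath onto prefix without letting it climb above prefix.

    Hypothetical helper: the name is an assumption, the body is the
    one-liner from the post.
    """
    # Make the path absolute, normalize away '..' components, then make it
    # relative again by stripping every leading slash.
    relative = os.path.normpath('/' + filepath).lstrip('/')
    return os.path.join(prefix, relative)


print(safe_join(PREFIX, '../../../etc/passwd'))  # /home/user/files/etc/passwd
print(safe_join(PREFIX, '/etc/passwd'))          # /home/user/files/etc/passwd
print(safe_join(PREFIX, 'docs/report.txt'))      # /home/user/files/docs/report.txt
```

Both the traversal attempt and the absolute path end up inside PREFIX, while an ordinary relative path with subdirectories is left alone.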
Post updated: see the comments below for an explanation of the changes.
Hi, I tried your solution on Python 2.7.3.
It almost works, but if filepath begins with '/', the problem comes back.
Could you try another solution? Thanks.
For example:
===
import os
PREFIX = '/home/user/files/'
filepath = "/etc/passwd"
filepath = os.path.normpath('/' + filepath)[1:]
full_path = os.path.join(PREFIX, filepath)
print full_path
===
full_path will be “/etc/passwd”
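Tracing the intermediate values of the reproduction above shows where the prefix gets lost. This is only a sketch, written for Python 3; it uses posixpath (the module behind os.path on POSIX systems) so that it behaves the same on any platform, and the variable names are illustrative.

```python
import posixpath  # what os.path resolves to on POSIX systems

PREFIX = '/home/user/files/'
filepath = '/etc/passwd'

prefixed = '/' + filepath                    # '//etc/passwd'
normalized = posixpath.normpath(prefixed)    # '//etc/passwd' (both slashes kept)
sliced = normalized[1:]                      # '/etc/passwd' (still absolute)
full_path = posixpath.join(PREFIX, sliced)   # '/etc/passwd' (PREFIX is dropped)

for name, value in [('prefixed', prefixed),
                    ('normalized', normalized),
                    ('sliced', sliced),
                    ('full_path', full_path)]:
    print('%-10s %r' % (name, value))
```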
Thanks for pointing it out. In the use-case I had in mind for this snippet absolute paths weren’t a problem, but I should have tested it better anyway. Using
.lstrip('/')
instead of [1:]
fixes the issue (I've updated the post as well). The reason it fails is quite surprising (in my opinion). Explanation: the bug stemmed from two issues, one in
normpath
and the other in os.path.join
. It turns out that when normpath
(or abspath
) gets an absolute path starting with a single slash or 3+ slashes, the result has a single leading slash. However, if the input has exactly two leading slashes, the output retains them. This behavior conforms to an obscure passage in the POSIX standard (last paragraph). As a result, Python leaves the two slashes intact, which is kind of unexpected (as this bug report may attest).
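The leading-slash behavior is easy to check directly. The sketch below uses posixpath so the result does not depend on the platform it runs on; the sample paths and the results dictionary are illustrative only. It also shows why stripping every leading slash is more robust than slicing off one character.

```python
import posixpath

samples = ['/etc/passwd', '//etc/passwd', '///etc/passwd', '////etc/passwd']

# Map each raw path to its normalized form.
results = {raw: posixpath.normpath(raw) for raw in samples}

for raw in samples:
    normalized = results[raw]
    print('%-16r -> %-15r [1:] -> %-14r lstrip -> %r' % (
        raw, normalized, normalized[1:], normalized.lstrip('/')))
```

Only the input with exactly two leading slashes keeps them after normalization, so only in that case does [1:] leave an absolute path behind; lstrip('/') yields 'etc/passwd' in every case.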
The two-leading-slashes issue means that after the string slicing the result is still an absolute path. Here comes another possibly unexpected behavior (albeit well documented this time): if one of the arguments to
os.path.join
is an absolute path, the function discards all preceding arguments. Thus in our case it discards the prefix path, causing the bug.
Windows version: https://ideone.com/ErrOkF
At least it breaks directory traversal with a full name. Curious that the output in ipython is 'c:\c:\qwe'
Yes, replace '/' with r'\/' or os.sep
what’s wrong with os.path.basename? thx
Because that doesn’t let you have any directories at all in the user provided path.
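A small comparison makes the difference concrete. The input path is made up for illustration, and posixpath is used so the output is the same on any platform.

```python
import posixpath

user_path = 'docs/reports/summary.txt'  # hypothetical user input

# basename keeps only the last component, so subdirectories are lost.
only_name = posixpath.basename(user_path)

# The approach from the post keeps the subdirectories.
kept_dirs = posixpath.normpath('/' + user_path).lstrip('/')

print(only_name)  # summary.txt
print(kept_dirs)  # docs/reports/summary.txt
```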