Transferring files over a high-bandwidth, high-latency link - bbcp

As you might know, I live in Canada while my home server is located in Macau. Both sides have a decent internet connection: 150Mbps symmetrical in Macau, 350/100Mbps in Canada. While not crazy fast, this is fast enough for most cases, even for transferring large files. But there is a problem: I can never get this speed...

Speed test from my computer in Canada to my server in Macau via LibreSpeed. Look at that juicy ping and revolutionary upload speed...

The problem

The problem is simple: I can download from my home server at close to its limit (150Mbps), but I can only upload to it at less than 10% of the theoretical upload speed, seemingly for no reason.

The investigation

I tried running speed tests from Canada to different hosts in Macau: different ISPs, my server with LibreSpeed, my server with iperf. I found that I only get good speeds when I use multiple connections: the multi-connection feature on Speedtest.net, or iperf with LOTS of parallel streams.
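A likely explanation for this pattern (an assumption, not something measured in this post) is that a single TCP connection can only have one window of data in flight per round trip, so on a long path each stream has a throughput ceiling regardless of line speed. The sketch below does that arithmetic. The 256 KiB window and 250 ms round-trip time are purely illustrative values, and `ceiling_kbps` and `bdp_bytes` are hypothetical helper names.

```shell
# Rough per-connection ceiling: one TCP window per round trip.
# The 256 KiB window and 250 ms RTT used below are illustrative assumptions,
# not values measured on this link.

# ceiling_kbps WINDOW_BYTES RTT_MS -> max throughput of one stream in kbit/s
# (bits per millisecond is the same number as kbit per second)
ceiling_kbps() {
    echo $(( $1 * 8 / $2 ))
}

# bdp_bytes LINK_MBPS RTT_MS -> bytes that must be in flight to fill the link
# (1 Mbit/s for 1 ms is 125 bytes)
bdp_bytes() {
    echo $(( $1 * 125 * $2 ))
}

echo "one stream: $(ceiling_kbps 262144 250) kbit/s"
echo "to fill 100 Mbit/s: $(bdp_bytes 100 250) bytes in flight"
```

With these assumed numbers one stream tops out around 8.4 Mbit/s, while filling a 100 Mbit/s uplink needs roughly 3 MB in flight. That gap is what many parallel streams, or a larger window per stream, make up for.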

The solution

Find file transfer software that supports parallel connections and is designed with high latency in mind.

Unfortunately, this ruled out many common names like FTP, SFTP, SMB, NFS, etc. and left me with not too many choices. Most are extremely expensive (targeted towards the TV/media/game production industries), and most of the free ones are hard to set up (or need to be heavily adapted to work in my use case). I will leave the detailed explanation to this wonderful investigation done by Harry Mangalam, a Research Computing Specialist at UC Irvine: http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html. In the end I went with bbcp for its relative simplicity.

BBCP project page: https://www.slac.stanford.edu/~abh/bbcp. Today surely is a show of all the big universities, isn't it?

BBCP is not really a file server, but more like a supercharged scp, supporting many tweakable parameters like TCP window size, parallel streams and so much more. We won't need to know all the parameters, but it's good to know what is available and what the ones I suggest mean. No daemon/service is needed - you just need to install the program on both sides.

Installation

The binaries hosted on the website are old, so I needed to compile bbcp from source for my Ubuntu 20.04 and 22.04 machines. I also needed to install some dependencies first. If you are using another distribution, skip the apt step and install the equivalent packages as build errors come up:

sudo apt install curl libssl-dev zlib1g-dev
curl https://www.slac.stanford.edu/~abh/bbcp/bbcp.tgz |tar -xzf -
cd bbcp/src
make  # expect deprecation warnings; they seem fine to ignore
sudo cp ../bin/amd64_linux/bbcp /usr/local/bin/  # or somewhere else
sudo chmod +x /usr/local/bin/bbcp

BBCP is installed! Repeat this on any machines you want to use as the sender or receiver.
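Since both ends need the binary, a quick check that `bbcp` is actually on the PATH can save a confusing failure later. This is a minimal sketch; `have_cmd` is a hypothetical helper name, and the remote check in the comment reuses the `[user]@[dest-ip]` placeholders from the commands further down.

```shell
# have_cmd NAME -> succeeds if NAME resolves to a command on this machine's PATH
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd bbcp; then
    echo "bbcp found at $(command -v bbcp)"
else
    echo "bbcp is not on PATH" >&2
fi

# The same check for the other side, run over ssh (placeholders, not real values):
#   ssh [user]@[dest-ip] 'command -v bbcp'
```

The remote check matters because a non-interactive ssh session may use a different PATH than your login shell does.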

Preparation

I am connecting my sending machine to a VPN back to my home server's network, so there are no port forwarding/firewall shenanigans. I suggest you do this too: exposing your ssh port is not the brightest idea.

You also need to mount the source and destination folders. I am using VirtualBox for my source Ubuntu 22.04 installation so Shared Folders works. I then mounted my NAS on the receiving side Ubuntu 20.04 machine.

The command

I will preface this by saying that these commands worked really well for me, but they are not optimal, especially since I haven't even fully read the man page and my internet is not fast enough to require serious tweaking. I actually took them from a Chinese site and stuck with them because they work. They might not even work for you. Please read the man page and search around if you experience any issues. I will keep this page updated as I use it more.

Explanation of the flags:
-v: verbose
-s: number of parallel streams - lowering this reduced speed for me. Experiment, starting from maybe 10
-F: do not check free space on the destination - you probably do not need this...
-f: force, overwrite existing files
-w: TCP window size - the man page has a formula for calculating it. The '=' prefix turns off automatic window size tuning, which increased performance for me.
-P: frequency of updating transfer progress (seconds)
-r -g -A: recursive, create directory structure, create destination directory

# Single File
bbcp -v -s 60 -F -f -w =4m -P 10 /path/to/source.file [user]@[dest-ip]:/path/to/dest

# Folder (recursive)
bbcp -v -s 60 -F -f -w =4m -P 10 -r -g -A /source/folder/ [user]@[dest-ip]:/path/to/dest
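To experiment with the stream count, a small loop over a few values is handy. The sketch below only reuses the flags from the single-file command above. `sweep_streams` and the `RUN` variable are hypothetical names, and the source, destination and stream counts are placeholders. By default it just prints the commands so you can review them first.

```shell
# sweep_streams SRC DEST COUNT... -> one bbcp command per stream count.
# Prints the commands by default; set RUN=1 to actually execute them.
# SRC, DEST and the counts are placeholders, not recommended values.
sweep_streams() {
    src=$1
    dest=$2
    shift 2
    for n in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            # -P 10 makes bbcp report progress, so the runs can be compared
            bbcp -v -s "$n" -F -f -w =4m -P 10 "$src" "$dest"
        else
            echo "bbcp -v -s $n -F -f -w =4m -P 10 $src $dest"
        fi
    done
}

sweep_streams /path/to/source.file user@dest-ip:/path/to/dest 10 20 40 60
```

Because `-f` overwrites the destination, each run starts from scratch, which keeps the comparison fair but also means you should point it at a test file.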

Results and Conclusion

Saturated my upload link!

I started this investigation because I wanted to upload a 20GB file to my NAS and could not stand the 500KB/s upload speed with SMB, which would have taken 12 hours. Instead of living with it, I spent 3+ days figuring it out, and the transfer took 15 minutes instead. I still don't know if it was a good use of time...
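The SMB estimate is easy to sanity-check with a back-of-envelope calculation. This sketch assumes decimal units (20GB as 20,000,000,000 bytes and 500KB/s as 500,000 bytes per second), and `transfer_seconds` is a hypothetical helper name.

```shell
# transfer_seconds SIZE_BYTES RATE_BYTES_PER_SEC -> whole seconds needed
transfer_seconds() {
    echo $(( $1 / $2 ))
}

# 20 GB (decimal) at 500 KB/s, the SMB figure quoted above
secs=$(transfer_seconds 20000000000 500000)
echo "$secs s, about $(( secs / 3600 )) h $(( secs % 3600 / 60 )) min"
```

That works out to a little over 11 hours, in line with the rough 12-hour figure.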

Unfortunately, this does not fit my regular use of the NAS, which is via SMB on Windows. Some possible improvements would be using WSL instead of a virtual machine, and installing bbcp on the NAS itself. Either way, I am very happy I can upload files back to my server at reasonable speeds.
