Why Does Dropbox Add a Unique ID to Every Photo?

Following on from my last post about Dropbox changing my photos, I noticed that Dropbox also embeds a new EXIF field, “Image Unique ID”, in the image.

This ID would allow Dropbox to track unique files across their storage estate to avoid duplication. Equally, it could be used to trace a cropped version posted online back to the original file and whoever uploaded it, especially if law enforcement turned up with legal papers and demanded access.

Think about leaked documents or protest photos. Yes, it’s good practice to strip the metadata out, but not everyone does.

This again comes back to what Dropbox and its camera upload feature are doing, and whether it is documented anywhere.

Note that Google Photos did not embed any tracking data in the EXIF of the image I tested by uploading and downloading it.

The Hash

The hash for IMG_7082 is 8af323e74def610b0000000000000000, which looks like a 128-bit hash but with only the first 64 bits populated. I’ve tried a number of hash tools on various parts of the original but they don’t match the unique ID. I have also tested just the pure image data from the original and the Dropbox modified images.
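The shape of the ID, and the kind of comparison described above, can be sketched in a few lines of Python. This is only an illustration: the helper name truncated_matches and the choice of MD5, SHA-1 and SHA-256 are my assumptions, not a list of the tools actually tried, and nothing here says how Dropbox really derives the value.

```python
import hashlib

# The Image Unique ID quoted above, as a 32 digit hex string.
UNIQUE_ID = "8af323e74def610b0000000000000000"

populated = UNIQUE_ID[:16]   # first 64 bits
padding = UNIQUE_ID[16:]     # last 64 bits

def truncated_matches(data, target=populated):
    """Return the names of the hashes whose hex digest starts with target.

    The three algorithms below are illustrative guesses only.
    """
    hits = []
    for name in ("md5", "sha1", "sha256"):
        if hashlib.new(name, data).hexdigest().startswith(target):
            hits.append(name)
    return hits

print(len(UNIQUE_ID) * 4)          # width of the ID in bits
print(int(padding, 16) == 0)       # is the second half all zeros?
```

Feeding the raw bytes of a file (or just its image data) to truncated_matches would show whether any of these common digests begins with the populated half of the ID.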

Reverse engineering the hash function isn’t the real issue here; the real question is why this ID has been added.

Dropbox iPhone Camera Upload Changes Photos

When Google announced their new Photos tools I decided to give it a go and see what Google’s machine learning could extract from my 83,292 photos stretching back 15 years. I’m sure you know that Google are offering “unlimited” and “free” storage for photos so long as you allow them to optimize your photos. I’m happy with the trade-off in quality as I already manage an archive of full resolution (or so I thought) photos via f-spot and have backup arrangements for it.

Dropbox Camera Upload

I have used the Dropbox Camera Upload feature for about 18 months to get photos off my iPhone and on to my various other devices and offsite backup server. Dropbox state that “When you open the app, photos and videos from your iPhone or iPad are saved to Dropbox at their original size and quality in a private Camera Uploads folder.”

This statement hides the fact that the Dropbox app re-compresses your photos before it uploads them. I found this out when I used the desktop backup client to seed Google Photos from my Dropbox camera folder, before activating the apps on my iPhone and iPad.

Google checksum all photos before uploading to avoid duplication. When I enabled the Google Photos app on my iOS devices to upload directly from the iOS camera roll, the app started to upload all my photos again. This led to duplicated photos and a few gigs of wasted upload bandwidth. I wanted to understand why this happened and adapt my photo workflow to avoid it happening again.

Image checksums

First of all I extracted a single photo, IMG_7082, taken that day, directly from my iPhone over USB. I copied the file from the DCIM folder on the phone, giving me a 2.8MB file as my “master copy”. Checking my Dropbox “Camera Uploads” folder, I found the same photo. As expected it had been renamed by Dropbox, but unexpectedly it had a different checksum and was over 1 megabyte smaller. The plot thickens!

The obvious next question was what was changing the file, so I extracted the same image file via email (sent as full resolution), iCloud and Photos on my MacBook. Each time it was the same size with a matching SHA-1 checksum. Uploading the master file to the free tier of Google Photos and then extracting it via Google Drive or the web UI did change the file, but Google are upfront about that.
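For anyone wanting to repeat this kind of comparison, here is a minimal Python sketch of a SHA-1 check between two copies of a file. The function names are my own, and any paths you pass in are up to you; I actually compared checksums with ordinary command line tools.

```python
import hashlib

def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True when both files have identical SHA-1 digests."""
    return sha1_of_file(path_a) == sha1_of_file(path_b)
```

Calling same_content on the master copy and a copy that came back via some other route returns False as soon as a single byte differs, which is all the test above relies on.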

The Proof

I have created a GitHub repo with all the photos I used in testing. If you want to have a look at them yourself, it’s here: https://github.com/TimJDFletcher/IMG_7082

Quality Change

I had a quick go at reproducing the same change in size of the image using GIMP and changing the JPEG compression level. I found that at 85% the file size was very close to the file size that both Google Photos and Dropbox produced. This is a pretty crude test, and it is not to say that this is the only compression that Google and Dropbox do.

Lessons Learned

The main lesson for me is that I should confirm that the applications I rely on to move data work as advertised. I do understand why Dropbox re-compress photos, as it gives a large saving in storage and bandwidth; I just wish they were as upfront about this as Google are.

Google say they will optimize your photos; if you don’t like this then you can pay money to store the originals. Dropbox on the other hand say “Don’t worry about losing those once-in-a-lifetime shots, no matter what happens to your iPhone.”

Fixing the duplicates

Fixing the duplicates was fairly simple in the end: I got a list of files uploaded from my iPhone and then deleted them from Google Drive using google-drive-ocamlfuse and a bit of shell script.
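The clean-up boils down to "for each name in the list, delete the matching file on the mounted drive". The sketch below shows that idea in Python rather than the shell script I used. The function name, the flat directory layout and the dry_run switch are all assumptions made for illustration, and mount_dir stands for wherever google-drive-ocamlfuse has mounted the drive.

```python
import os

def remove_listed_files(names, mount_dir, dry_run=True):
    """Delete each listed file name found directly inside mount_dir.

    Returns the paths that were removed (or, in a dry run, would be).
    """
    removed = []
    for name in names:
        path = os.path.join(mount_dir, name)
        if os.path.isfile(path):
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Leaving dry_run at its default first gives a list to eyeball before anything is actually deleted.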

Finding my wife’s missing phone with OpenWRT

A few weeks ago my wife and I were visiting some friends in Sweden, who own a small cottage in the woodland near Stockholm. The location is lovely but slightly out of the way, with a small lake about 200m from the cottage.

We went for a walk round the lake to enjoy the glorious weather and take some photos of the mirror smooth lake. About halfway round the lake my wife realised that she didn’t have her phone. We did the obvious things of retracing our steps looking at the floor and trying to work out which of the little paths we took round the lake. We found a number of things including large numbers of frogs and my sunglasses, which I didn’t even realise I had lost, but no phone.

Normally the next step would be to get the phone to make some noise, either by ringing it or using the anti-theft product that was installed on it. Both of these ideas had problems. The phone was on silent, meaning ringing it was a waste of time. The problem with the anti-theft product was that it needed a data connection, and because the phone was connected to a foreign network it was in roaming mode with the data connection disabled to avoid massive bills. So what next?

The phone has WIFI and will automatically join an AP it knows about, but which SSIDs could I be certain it would join and how would I get an AP close enough to the phone in the middle of the Swedish woodland?

I always carry a little bag of tricks and I thought there might be something in there that could help. In the bag I had a mobile phone charging battery and a TP-Link WR703 with OpenWRT installed, and I had an ssh client (Prompt from Panic Inc) on my iDevices.

So I could build a mobile AP (WR703 + USB battery), and because it was running OpenWRT I could customize and monitor the AP. Next I needed some configuration for the AP, particularly an SSID that the missing phone was guaranteed to join. The obvious option was the SSID from home, so I logged into the AP at home via OpenVPN and copied /etc/config/wireless to the WR703 I had running. After a reboot I had an AP that appeared to be the same as the one at home and my iDevices joined straight away. Step one completed.

We loaded my rucksack with all of the kit and set off for another walk round the lake. I had logged into the AP from my iPhone and was running logread -f to monitor the logs on the AP. We repeated the route again, debating which path we took and poking about in bushes looking for a phone. We were getting towards the point where my wife had realised she had lost her phone and were starting to give up hope. Then another device joined the access point, authenticated and got an IP from the DHCP server, and it was an Android device. This was a big step forward: we knew that there was an Android phone within about 100m of us that had our WIFI password set on it.
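I watched the log by eye, but picking the interesting lines out of logread -f output could be scripted along these lines. This sketch assumes the AP's DHCP server is dnsmasq and that it logs a new lease as a DHCPACK line with interface, IP, MAC and an optional hostname; the function names are mine, and the sample hostname in the note below is made up.

```python
import re

# Assumed dnsmasq lease line: "... DHCPACK(iface) ip mac [hostname]"
DHCPACK = re.compile(
    r"DHCPACK\((?P<iface>[^)]+)\)\s+(?P<ip>\S+)\s+"
    r"(?P<mac>[0-9A-Fa-f:]{17})(?:\s+(?P<host>\S+))?"
)

def parse_dhcpack(line):
    """Return a dict describing the lease, or None for any other line."""
    match = DHCPACK.search(line)
    return match.groupdict() if match else None

def new_leases(lines):
    """Yield a lease dict for every DHCPACK line in an iterable of lines."""
    for line in lines:
        lease = parse_dhcpack(line)
        if lease is not None:
            yield lease
```

Piping the log into a small loop over new_leases(sys.stdin) would print each device as it picks up an address; a hostname starting with something like android- is the kind of hint that told us what had joined.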

Because of the way that radio works, with a bit of walking about with my iPhone while my bag hung on a tree, we could map the edge of the wifi coverage from the AP. Once we had a clear idea where we should be looking, we had another good hard look at the 100m or so of path that my mapping suggested the phone could be in. No luck, and it was starting to get dark and the phone’s battery would probably not last the night. Time for some more thinking. The real thing that would help would be getting an internet connection on the phone, so it was back to the cottage for another think.

One of our friends in Sweden had a Windows phone with a mobile AP feature which, unlike the iPhone’s, let you set both the SSID and the password. After typing in the long random WPA password I had set on the home AP, I set off again with a mobile AP in my bag, but this time with internet access.

Moonlight

After a little detour to take photos of the moonlight on the lake, I returned to the tree where the missing phone had got WIFI, which I had marked with a Swedish napkin. I triggered off another round of alerts to the phone over the internet but no luck: no noises from the undergrowth. It was finally getting dark, even in Sweden, and it was time to head back to the cottage.

Now was the time for any last good ideas, and after a bit of thought I had one! Wifi works in a circle, and while I could be pretty sure that the phone wasn’t 100m into the lake, I should really check if there was another path further into the woodland. I went a bit further along the path and there was a fork leading back into the woodland that looked promising. I set off down the path and after a minute I could hear something ahead, and sure enough there was the missing phone.

Fixing my iPad’s photo order with Samba

Since upgrading to iOS 5.0 the photos on my iPad and iPhone have been in the “wrong” order and after spending a few hours googling, reading this thread on the Apple forums and researching I now understand why and how to work round it.

Summary

iTunes when syncing to iOS 5.0/5.1 doesn’t use the exif date, file timestamps or filenames to determine the order of photos in an album, instead the order is based on the order of files returned by the QueryDirectory syscall. You can work round this by presorting the list of files returned by Samba using the VFS module dirsort.

Background

My photos are shared from an HP Microserver running Ubuntu. Normally I access and edit my photos under Linux via iSCSI, but I have an automatically updating LVM snapshot of my master photo archive exported to Windows by Samba for iTunes to sync my iDevices.

The exact details here don’t really matter; the key points are that I have my photos stored on Linux and exported via Samba to iTunes running on Windows 7.

The problem

In iOS 4.x synced photos in the photos app were displayed in exif date order; in iOS 5.0/5.1 they are displayed in what seems to be a random order. This order isn’t affected by any of the time stamps on the files (ctime/mtime/atime for those who know unix), the names of the files or the exif timestamp embedded in the files.

Debugging

After trying all the tricks I could think of, fiddling with timestamps and names of the files under Linux, I tried copying the files onto my Windows machine’s local NTFS drive and resyncing the iPad, and it worked!

So the problem wasn’t in the files or the file names; it had to be somewhere between Windows and Samba. Next I fired up Process Monitor from Sysinternals to watch exactly what iTunes was doing. After a bit of tinkering with the filters in Process Monitor I finally found something that matched the “random” order of photos on my iPad. I had found the problem!

The cause

When iTunes builds the list of files to sync onto an iDevice it uses the order of files returned by the QueryDirectory syscall. When this syscall is used to list a directory under NTFS it returns the list of files in the directory sorted in alphabetical order, but Samba returns it unsorted in disk order. As you can see from this screenshot the list of files is unsorted, and it matches perfectly the apparently random order of photos on my iPad in the album.
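The same distinction between "whatever order the filesystem hands back" and "sorted by name" is easy to see from Python on the Linux side. This is only an analogy for what iTunes sees through Samba, not the QueryDirectory call itself, and listing_orders is a name I made up for the example.

```python
import os

def listing_orders(directory):
    """Return (raw, by_name) listings of a directory.

    raw is whatever order os.listdir happens to return, which Python
    documents as arbitrary; by_name is the same entries sorted by name.
    """
    raw = os.listdir(directory)
    return raw, sorted(raw)
```

An application that quietly assumes raw is already in name order will work on a filesystem that happens to return sorted listings and misbehave on one that doesn’t, which is exactly the iTunes behaviour described above.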

The solution

Some further googling led me to discover that other people have had problems with Windows applications that assume directory listings are returned sorted, and someone has written a Samba VFS module that works round the problem. The module is called dirsort and is included in modern versions of Samba, so all you need to do to work round the problem is add the following line to the definition of your photo share in /etc/samba/smb.conf.

vfs objects = dirsort

After making this change and restarting Samba, this is the output of the QueryDirectory syscall for the same directory.

This does assume that your photos are named in the order you want to show them. I fix this by using the following exiftool command.

exiftool "-FileName<CreateDate" -d "%Y%m%d_%H%M%S.%%e" DIR
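To see why this naming scheme plays well with dirsort, here is a small Python sketch that builds names in the same pattern as the -d format above (date, underscore, time, then the original extension). The helper dated_name and the two dates are made up for the example; exiftool does the real renaming.

```python
from datetime import datetime

def dated_name(create_date, extension):
    """Build a name like the exiftool pattern: YYYYMMDD_HHMMSS.ext"""
    return create_date.strftime("%Y%m%d_%H%M%S") + "." + extension

# Two hypothetical photos, listed newest first.
names = [
    dated_name(datetime(2012, 3, 9, 8, 5, 1), "jpg"),
    dated_name(datetime(2011, 12, 25, 17, 30, 0), "jpg"),
]

print(sorted(names))  # ['20111225_173000.jpg', '20120309_080501.jpg']
```

Because every field is zero padded and runs from year down to second, sorting the names alphabetically is the same as sorting the photos by creation date, which is the order dirsort then hands to iTunes.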