PDF file 'fingerprints'


Diabolique
August 24th, 2006, 01:17 AM
I couldn't find anything on this using a search, and believe it may be a 'good-to-know' item.

Create a working copy of a .pdf file you have just created.
Rename it from .pdf to .doc, and open it in WordPad.
Do a find for "author".
Is that you?

Delete the name of the author and save.
Rename the file back to .pdf, and open it with Acrobat Reader.
Corrupted file.
If you put the author's name back, it will work again, provided Acrobat didn't already repair the 'damage' for you.

Electronic fingerprints!

nbk2000
August 24th, 2006, 04:10 AM
If you're smart, your real name never goes on your computer...EVER.

Not when installing the OS, nor installing programs, nor your ISP, nor stored in a file on your computer, etc.

Pay your bills with money orders through the mail. Order stuff online by printing out the form and mailing it in.

Much discussed here.

ShadowMyGeekSpace
August 24th, 2006, 09:11 PM
Also, using a hex editor and just replacing your name with nulls would be a much safer, more efficient way of doing this...
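A likely reason the same-length overwrite behaves better than deleting the name: a PDF's cross-reference table records the byte offset of every object, so removing bytes shifts everything after the edit and the offsets no longer match. A minimal sketch of the hex-editor idea follows. The function name `blank_author` and the regular expression are illustrative assumptions, and the sketch only handles an uncompressed literal-string `/Author (...)` entry without nested parentheses; hex strings, UTF-16 values, XMP metadata and compressed object streams are not covered.

```python
import re

# Matches a literal-string entry such as: /Author (Jane Doe)
# Assumption: the value is uncompressed and has no nested parentheses.
AUTHOR_RE = re.compile(rb"/Author\s*\(((?:\\.|[^\\()])*)\)")


def blank_author(pdf_bytes: bytes, filler: bytes = b"\x00") -> bytes:
    """Overwrite every /Author value with filler bytes of identical length.

    Keeping the length unchanged leaves all byte offsets in the file intact.
    """
    def _blank(match):
        whole = match.group(0)
        # Position of the value relative to the start of the whole match.
        start = match.start(1) - match.start()
        end = match.end(1) - match.start()
        return whole[:start] + filler * (end - start) + whole[end:]

    return AUTHOR_RE.sub(_blank, pdf_bytes)


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj\n<< /Author (Jane Doe) /Title (T) >>\nendobj\n"
    cleaned = blank_author(sample)
    print(len(sample) == len(cleaned), b"Jane Doe" in cleaned)
```

Work on a copy of the file, and read and write it in binary mode so that no bytes are translated along the way.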

megalomania
August 24th, 2006, 09:26 PM
I have attached the NSA guidelines on redaction procedures for Word and Acrobat in another thread. The upcoming version of Office 2007 is supposed to have integrated redaction tools to strip all metadata. The newest version of Acrobat 3D does not have any redaction tools at all, although the plugin Redax can add that functionality.

The current version of Redax seems rather hard to come by and I have not found a crack for 4.0 yet. Any one out there have a copy?

reamio
August 25th, 2006, 04:42 AM
Mega, I can confirm that Word 2007 includes a function to remove all metadata.

It's in the Finish > Inspect Document command (Microsoft calls it the Document Inspector in their literature).

The Inspector finds all instances of metadata and asks you for confirmation before removing them. You can then save a "cleaned" file under a different filename.

I have tried it, and later searched the file with a hex editor and was unable to find any of my metadata remaining in the document.

I have been using Office 2007 (beta) since it was released.

However, it's worthwhile to reiterate NBK's reply to my post on Windows Vista Bitlocker drive protection:

General consensus? Don't trust it.

I believe that Pb1's reply to the same post also expresses the forum's general feeling about MS security:

I wouldn't trust anything from Micro$oft as far as I could throw the CD.

Maybe it would be better to depend upon Redax or similar third-party software.

BTW - Word 2007 can create .pdf files internally (without installing Acrobat!).
However, it is not a complete replacement for Acrobat, since Excel 2007 does not include this feature.

Diabolique
August 25th, 2006, 02:21 PM
Mega: I've read those guidelines (and others) - too narrow in scope, too many formats not mentioned.

nbk: My problem was I stayed with their rules too often. A real ID, and playing by their rules, will give the sheepdogs (sheepledogs?) the feeling you are on the level, and give you access to things that would have been otherwise blocked. No longer true; they block everyone now.

Also, some of my software, in the past, required a real ID to obtain it as well as use it. One even sent a tech to install the access software to use their software on their website (no longer have that one, too expensive for the yearly license once I retired).

I have been doing some experimenting with my grand-niece, who showed me the problem a month ago (teach a 5 yo how to use a computer, and by the time they are 10 yo, they are teaching you). So far, .pdf's that are made in Acrobat from scan images do not have this, or similar, fields. It seems that the Word format is again at fault. I will be spending the rest of the afternoon testing other formats with various ways of converting to .pdf, and post my findings. I am also going to see if it is Acrobat or Acrobat distiller at fault here. If it's the distiller, an alternative distiller may solve the problem.

It seems that at least some of the metafields in pdf files are used to encrypt the file, even when its security is set to public mode (off). Deleting these entries means that Acrobat no longer has all the data to open the file.

Regardless of what is found, it may be worth the effort checking all pdf's for private info leakage. However, this does not mean some form of ID, even a serial number for software, isn't encrypted somewhere.
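For checking a batch of pdf's, a rough first pass can be scripted. The sketch below scans the raw bytes for the standard document-information keys (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate) when they appear as plain literal strings. The name `find_info_fields` is made up for illustration, and a clean result proves little: values stored as hex or UTF-16 strings, inside compressed object streams, or in XMP metadata would be missed.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords",
             b"Creator", b"Producer", b"CreationDate", b"ModDate")

# Assumption: values are uncompressed literal strings without nested parentheses.
FIELD_RE = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)"
)


def find_info_fields(pdf_bytes: bytes):
    """Return (key, value) pairs for every plain-text info field found."""
    return [
        (m.group(1).decode("ascii"), m.group(2).decode("latin-1"))
        for m in FIELD_RE.finditer(pdf_bytes)
    ]


if __name__ == "__main__":
    sample = b"<< /Author (Jane Doe) /Producer (Acrobat Distiller 7.0) >>"
    for key, value in find_info_fields(sample):
        print(key, "=", value)
```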

Diabolique
August 25th, 2006, 03:31 PM
I just finished some tests with interesting results.

Using Notepad, I took an ASCII text file and printed it - paper only, no pdf - I'll have to reset my primary printer to Acrobat distiller.

I renamed this text document to .doc and loaded it into Wordpad - the resulting pdf (using Acrobat distiller) had the ID fields.
Saved to the other formats (rtf, Word 6.0, unicode, etc) - they all created a pdf with ID fields. It is either Wordpad (and likely all Macroslop programs) or Acrobat distiller at fault.

I took one of the 'fingerprinted' pdfs and, using Acrobat, attached it to a clean pdf made from scanned images, and saved it. The 'fingerprints' disappeared! Doing a Save As, which reduced the size by eliminating unused objects, did not bring any 'fingerprints' back, but the new file is slightly larger than the original 'fingerprinted' file.

More work needed on this, such as what happens with html files? I felt this was too important not to post immediately.

Posting this here first, before I send these results to the security experts I know - quid pro quo can pay off.

Nihilist
August 29th, 2006, 08:53 AM
Something else you guys may want to be aware of is that Word documents and PDF files have a tendency to 'pick up' stray bytes left sitting on your hard drive. It has to do with the way files are stored and over-written, and some of the optimization tricks they use to make file access faster. Though most forensics experts would probably overlook these details, a clever person could probably glean some interesting info with a little luck and determination.

Diabolique
August 29th, 2006, 04:25 PM
Nihilist: think "steganography"

Nihilist
August 29th, 2006, 06:03 PM
It's not steganography. It happens without you knowing about it, or wanting it to happen.

Diabolique
August 30th, 2006, 02:37 AM
You missed my hint. What if that were done deliberately by you to hide information? It should be simple to write a program that would hook a small file onto the end of a pdf file in a way that would not show up in the pdf reader. Use a string of end-of-file markers to separate the two. A reverse program could split it off later to retrieve it.

I showed some friends how to 'tag' data onto image files in a camera to smuggle humanitarian investigation information past authorities. It can no longer be used; they have learned how to detect the simpler methods.
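The append-and-split idea can be sketched in a few lines. The separator below (three `%%EOF` lines in a row) is just one hypothetical reading of "a string of end-of-file markers", and `attach` and `detach` are made-up names. The sketch assumes the payload itself never contains the separator, and it makes no attempt to hide the extra bytes from anyone who inspects the file.

```python
# Hypothetical separator: a run of PDF end-of-file markers.
MARKER = b"\n%%EOF\n%%EOF\n%%EOF\n"


def attach(carrier: bytes, payload: bytes) -> bytes:
    """Append payload to the carrier file, separated by MARKER."""
    return carrier + MARKER + payload


def detach(combined: bytes):
    """Split a combined file back into (carrier, payload).

    Splits at the last MARKER, so the payload must not contain it.
    """
    carrier, sep, payload = combined.rpartition(MARKER)
    if not sep:
        raise ValueError("no appended payload found")
    return carrier, payload


if __name__ == "__main__":
    pdf = b"%PDF-1.4\n...objects...\n%%EOF\n"
    combined = attach(pdf, b"small note")
    print(detach(combined) == (pdf, b"small note"))
```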

Nihilist
August 30th, 2006, 05:33 AM
I don't mean to insult you, but that is an *extremely* poor steganography technique. To begin with, before embedding any data, you should encrypt it. The purpose there is two-fold:

1. Encryption, if it's good, will increase the entropy of the data-set, making it much harder to identify heuristically.

2. If they somehow do decide that there is some kind of stego'd info in your file, they still can't access it, or even figure out for sure if they've broken your stego algorithm (because it would appear as gibberish either way).
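To put a number on the entropy in point 1, the Shannon entropy of a byte string can be measured directly. This is a minimal sketch (the function name `shannon_entropy` is an assumption, not something from the thread): plain text usually scores well below 8 bits per byte, while well-encrypted or random data of reasonable length comes out close to 8.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"aaaaaaaa"))        # a single repeated symbol
    print(shannon_entropy(b"plain english text, fairly repetitive"))
    print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```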

Secondly, instead of just concatenating the data onto the end of the file, you should be interspersing it throughout, making it part of the file. For instance, you could alter each pixel in an image slightly, such that it is undetectable to the human eye, but compared with the original file...you can extract a single bit of stego'd info from each pixel. Or in a text file, sentences beginning in 2 spaces are a 1 bit, sentences beginning with 1 space are a 0 bit. If you used that format for an entire ebook, you could write a small message that way. The more imaginative your method is, in general, the more secure it will be.
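A common concrete form of the one-bit-per-pixel idea is least-significant-bit embedding. It differs slightly from the description above: the bit is read straight from each value's lowest bit, so the original file is not needed for comparison. The sketch treats the image as a flat sequence of 8-bit pixel values and assumes the message was already encrypted beforehand; `embed_lsb` and `extract_lsb` are illustrative names.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Hide message in the lowest bit of successive pixel values."""
    # Message bits, most significant bit of each byte first.
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("carrier too small for message")
    out = bytearray(pixels)
    for index, bit in enumerate(bits):
        out[index] = (out[index] & 0xFE) | bit  # change each value by at most 1
    return bytes(out)


def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Read n_bytes of hidden message back out of the lowest bits."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    carrier = bytes(range(100, 116))  # 16 fake grayscale pixel values
    stego = embed_lsb(carrier, b"hi")
    print(extract_lsb(stego, 2))
```

The receiver has to know the message length (or a length prefix has to be embedded as well); the sketch leaves that out.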

Diabolique
August 30th, 2006, 09:03 PM
Agreed, it is primitive. Just trying to get you to think beyond the box. When you noticed that pdf files were acquiring bits, and were not affected, you should have thought of how that could be made use of. That is how new ideas start.

Entropy is just the start in ciphers. There is also auto- and cross-correlation, logical distance between cipher elements, and patterns within these.

Crypto, in many parts of the world, is becoming illegal. What is needed is non-crypto crypto. Information theory gives hints of how this is done.

The bit is the smallest particle of COMPLETE data. There are particles that are incomplete, but can be used to regenerate the original bit. I used this for a totally asynchronous multiplexor - no common timing between signals.

For security, use a Hamming error correction code (7:4:1 - 7 transmitted bits; 4 data bits; one error correctable). Split the seven bits into seven files. Separate the seven files from each other. At least six files are required to recover the original data, so if you can keep at least two out of the hands of an opponent, they cannot read the original data. If you lose a file, the others will allow you to recover the original data. I'll let you figure out the interleave patterns to keep files from being all data bits and no syndrome bits. No keys to lose or disclose, nothing to betray under interrogation. Non-cryptographic crypto.
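The splitting step can be sketched as follows, assuming the standard systematic Hamming(7,4) layout (parity bits in positions 1, 2 and 4). All function names are illustrative, and each "file" is modelled as a list of bits. Two properties of this plain layout are worth knowing. Because the code has minimum distance 3, up to two files that are known to be missing can be filled in, so five files suffice for recovery in this sketch. And files 3, 5, 6 and 7 carry the raw data bits, which is exactly what the interleave patterns mentioned above would have to address; the sketch does no interleaving.

```python
from itertools import product


def encode_nibble(n: int) -> list:
    """Encode 4 data bits as a 7-bit Hamming codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = (n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def is_codeword(c: list) -> bool:
    """True when all three parity checks (the syndrome) are zero."""
    return (c[0] ^ c[2] ^ c[4] ^ c[6]) == 0 \
        and (c[1] ^ c[2] ^ c[5] ^ c[6]) == 0 \
        and (c[3] ^ c[4] ^ c[5] ^ c[6]) == 0


def split(data: bytes) -> list:
    """Spread data over seven bit lists, one codeword bit per list."""
    shares = [[] for _ in range(7)]
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            for share, bit in zip(shares, encode_nibble(nibble)):
                share.append(bit)
    return shares


def join(shares: list) -> bytes:
    """Rebuild data from seven shares; up to two may be None (missing)."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 2:
        raise ValueError("too many shares missing")
    length = len(next(s for s in shares if s is not None))
    nibbles = []
    for pos in range(length):
        word = [s[pos] if s is not None else 0 for s in shares]
        # Try every fill-in for the missing bits; exactly one is a codeword.
        for guess in product((0, 1), repeat=len(missing)):
            for i, g in zip(missing, guess):
                word[i] = g
            if is_codeword(word):
                break
        else:
            raise ValueError("shares are inconsistent")
        nibbles.append(word[2] << 3 | word[4] << 2 | word[5] << 1 | word[6])
    return bytes(nibbles[i] << 4 | nibbles[i + 1] for i in range(0, len(nibbles), 2))


if __name__ == "__main__":
    shares = split(b"Hello")
    shares[0] = None  # lose two of the seven files
    shares[5] = None
    print(join(shares))
```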