@slock - K-Money's Lemmy

slock@lemmy.world · 17 days ago

Most poisoning techniques don't actually work that well. When I end up with a PDF, I usually strip out the existing text layer, apply a denoiser and a few other preprocessing steps to correct common errors, run a layout / reading-order detector, and finally OCR the individual blocks. That pipeline exists to deal with the most common, and one of the most effective, poisoning techniques of all: someone printed a document, forgot about it for 3 years, then scanned it slightly tilted (and dirty, crumpled, …), and the scanner decided to apply its own crappy OCR.
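To make the reading-order step a bit more concrete, here is a minimal sketch in plain Python. It is not what any particular layout detector does. It is a naive stand-in that assumes you already have bounding boxes for the text blocks on a page (the `Block` type, `reading_order` function and `min_overlap` threshold are all hypothetical names made up for this example). It groups blocks into columns by horizontal overlap, then reads columns left to right and blocks top to bottom.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Bounding box in page coordinates, origin at top-left (an assumption).
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""


def reading_order(blocks, min_overlap=0.5):
    """Naive column-aware ordering of text blocks.

    A block joins an existing column when at least `min_overlap` of its
    width overlaps that column's horizontal extent; otherwise it starts
    a new column.
    """
    columns = []  # each column is a list of Block
    for b in sorted(blocks, key=lambda blk: blk.x0):
        for col in columns:
            col_x0 = min(c.x0 for c in col)
            col_x1 = max(c.x1 for c in col)
            overlap = min(b.x1, col_x1) - max(b.x0, col_x0)
            if overlap >= min_overlap * (b.x1 - b.x0):
                col.append(b)
                break
        else:
            columns.append([b])

    # Columns left to right, blocks inside a column top to bottom.
    columns.sort(key=lambda col: min(c.x0 for c in col))
    return [b for col in columns for b in sorted(col, key=lambda c: c.y0)]


if __name__ == "__main__":
    page = [
        Block(120, 60, 220, 110, "D"),
        Block(0, 0, 100, 50, "A"),
        Block(120, 0, 220, 50, "C"),
        Block(0, 60, 100, 110, "B"),
    ]
    print("".join(b.text for b in reading_order(page)))
```

A full-width title or a table spanning both columns will confuse this heuristic, which is one reason to use a proper layout model before handing the blocks to OCR.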

Working from screenshots of the PDF also avoids any kind of font-face poisoning, as well as anti-copy protection.

If you really, really need to protect your PDF, please consider accessibility first. After that, what would work imho is to use the scripting features of PDF to actually render your content on the fly. That would probably mess up most of the "automatic" processes.

slock@lemmy.world · 1 year ago

It is also a huge deal because, ever since the government (at least in France) forced ISPs to log DNS queries, a lot of browsers (and the latest Android and iOS versions) have migrated to DNS over HTTPS or DNS over TLS. That means the only clear-text DNS query they can still intercept is the one that fetches the address of your secure DNS service. Now, having a trusted CA installed in browsers means that they can also spoof the identity of this secure name service, and regain a bit of control.
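For anyone wondering what a DNS over HTTPS lookup looks like, here is a small stdlib-only sketch of how a query is packed into a GET URL as described in RFC 8484. It only builds the URL and sends nothing. The resolver address `https://dns.example/dns-query` and the function names are placeholders chosen for this example, not a real service or library API.

```python
import base64
import struct


def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query in wire format (default QTYPE 1 = A)."""
    # Header: ID 0 (RFC 8484 suggests 0 so responses cache well),
    # flags 0x0100 (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(lbl)]) + lbl for lbl in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    return header + question


def doh_get_url(resolver_url: str, name: str) -> str:
    """Return the GET URL carrying the query in the `dns` parameter."""
    wire = build_dns_query(name)
    # base64url without padding, as required for the GET form.
    param = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={param}"


if __name__ == "__main__":
    # Placeholder resolver URL, for illustration only.
    print(doh_get_url("https://dns.example/dns-query", "www.example.com"))
```

The name being looked up only ever appears inside the HTTPS request, so an observer on the wire sees a TLS connection to the resolver and nothing else, unless they can present a certificate the client trusts for that resolver.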

They invested a lot in surveillance technology (for both good and bad reasons), and HTTPS, secure DNS and encrypted messaging / phone calls mean this was all for nothing.

And yes, once you are accepted as a trusted CA, you can effectively spoof pretty much anything by setting up a proxy. Some tools even leverage this for app analysis. Look up mitmproxy, for example, or Squid. A lot of companies already do this to inspect inbound / outbound traffic.
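As a rough illustration of the app-analysis use case, here is a minimal mitmproxy addon sketch that records the host, path and status code of every response passing through the proxy. It relies on mitmproxy's addon convention (a `response` hook and a module-level `addons` list). The class name `HostLogger` and the file name below are made up for this example, and it only sees decrypted HTTPS if the client has been set up to trust the mitmproxy CA, which is exactly the point made above.

```python
import logging


class HostLogger:
    """Collects (host, path, status) for each completed HTTP exchange."""

    def __init__(self):
        self.seen = []

    def response(self, flow):
        # Called by mitmproxy once a full response has been received.
        entry = (
            flow.request.pretty_host,
            flow.request.path,
            flow.response.status_code,
        )
        self.seen.append(entry)
        logging.info("%s %s -> %s", *entry)


# mitmproxy picks up addons from this module-level list.
addons = [HostLogger()]
```

Assuming the file is saved as `host_logger.py`, it would be loaded with `mitmdump -s host_logger.py`.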