I was reading a post on another board about Google's SEO penalties for duplicate content. At first I thought this applied only to duplicate content within one's own site. However, someone suggested that it covers duplicate content across all websites and all time. In other words, if I write a paragraph on my website, Google compares it to every paragraph on every page of every website ever created. So they could tell whether something I wrote today duplicated something someone wrote back in 1987.
That seems a bit ambitious. Even if they purge websites that no longer exist from their system, that's a lot of data to process.
I know how they analyze sites is top secret, but does anyone have any insight?