Why is it 50 petabytes? If we are talking about the content of books in some kind of markup, then even assuming a heavy-duty 10 MB per book, it would be 1 petabyte. Realistically, I would expect it to be a few hundred GB.
What is in the data that's making it so heavy? The original scanned images?
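
For what it's worth, here is a rough sketch of the arithmetic behind those figures. It assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes), and the book count is only what the "10 MB per book gives 1 petabyte" estimate implies, not a number taken from any source. The variable names are made up for illustration.

```python
# Back-of-envelope check, assuming decimal (SI) units.
MB = 10**6
PB = 10**15

# The "heavy duty" per-book estimate from the comment above.
bytes_per_book = 10 * MB

# Book count implied by "10 MB per book -> 1 PB" (an inferred figure,
# not a stated one).
implied_books = PB // bytes_per_book

# Average per-book size that would be needed to reach 50 PB
# with that same implied book count.
per_book_for_50pb = (50 * PB) // implied_books

print(f"implied number of books: {implied_books:,}")
print(f"per-book size for 50 PB: {per_book_for_50pb // MB} MB")
```

Under those assumptions, 50 PB works out to an average in the hundreds of MB per book, which is far more than marked-up text would plausibly need.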