
Why is it 50 petabytes? If we are talking about the content of books in some kind of markup, then even assuming a heavy-duty 10 MB per book, it would be 1 petabyte. Reasonably, I would assume it would be a few hundred GB.

What is in the data that's making it so heavy? The original scanned images?



I think it's the scans, yeah: ~50 MB for the scans and ~1 MB for the OCR'd version of each book.
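
For anyone who wants to check the arithmetic, here is a rough sketch. The book count is not stated anywhere in the thread; it is only inferred from the "10 MB per book gives 1 petabyte" estimate above, and decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) are assumed. The function names are made up for illustration.

```python
# Back-of-the-envelope storage estimate using the figures quoted in the thread.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 PB = 10**15 bytes).
MB = 10**6
PB = 10**15


def implied_book_count(total_bytes: int, bytes_per_book: int) -> int:
    """Number of books implied by a total size and a per-book size."""
    return total_bytes // bytes_per_book


def total_petabytes(book_count: int, bytes_per_book: int) -> float:
    """Total storage in petabytes for a given count and per-book size."""
    return book_count * bytes_per_book / PB


# Inferred, not stated: "10 MB per book would be 1 petabyte" implies this count.
books = implied_book_count(1 * PB, 10 * MB)

# Per-book size from the reply: ~50 MB of scans plus ~1 MB of OCR text.
per_book = 50 * MB + 1 * MB

print(f"implied book count: {books:,}")
print(f"scans + OCR total: {total_petabytes(books, per_book):.1f} PB")
```

Under these assumptions the scans dominate the total by about 50 to 1 over the OCR text, which fits the explanation that the images are what make the data heavy. The estimate still comes out well below 50 PB, so either the real item count or the real per-item size must be larger than these rough guesses.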



