Python library for extracting text from almost any file format, including sound (!).

Textract is a Python library for pulling raw machine-readable text out of pretty much anything: Excel files, PDFs, images and, yes, even sound.
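
As a rough illustration of what "pulling text out of a file" looks like in practice, here is a minimal sketch built around `textract.process`, which takes a file path and returns the extracted text as bytes. The wrapper name `extract_text` and its `process` parameter are hypothetical helpers introduced only for this example (the parameter lets you swap in a stand-in extractor), and the sketch assumes textract and whatever system tools a given format needs are already installed.

```python
def extract_text(path, encoding="utf-8", process=None):
    """Return the text content of the file at `path` as a str.

    `process` is a hypothetical hook for this sketch: any callable that
    takes a path string and returns bytes. It defaults to textract.process.
    """
    if process is None:
        # Imported lazily so the helper can be exercised without textract.
        import textract  # third-party: pip install textract
        process = textract.process

    raw = process(str(path))  # textract.process returns bytes
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Stand-in extractor so the example runs without any real document.
    fake = lambda p: b"hello from " + p.encode("utf-8")
    print(extract_text("report.pdf", process=fake))
```

With textract installed, calling `extract_text("path/to/file.pdf")` with no `process` argument would hand the path straight to `textract.process`, which picks a parser based on the file extension.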

