PDFBox: Extract all text from a document


Published in: Java

Opens an existing PDF with Apache PDFBox, extracts all of its text, and prints the result to standard output.


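import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ExtractText {

    /**
     * @param args the command line arguments
     * @throws java.io.IOException if the PDF cannot be read or parsed
     */
    public static void main(String[] args) throws IOException {
        // Path left empty in the original; point it at the PDF to read.
        File file = new File("");

        // PDFBox 2.x API; in PDFBox 3.x use Loader.loadPDF(file) instead.
        // try-with-resources closes the document even if extraction fails.
        try (PDDocument document = PDDocument.load(file)) {
            PDFTextStripper pdfStripper = new PDFTextStripper();

            // Returns the text of every page as a single string.
            String text = pdfStripper.getText(document);
            System.out.println(text);
        }
    }
}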
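
The snippet leaves the file path empty and never reads `args`. One possible way to fill that gap is to take the path from the first command line argument and check it before handing it to PDFBox. The sketch below uses only the standard library; `PdfPathArgs` and `resolve` are hypothetical names introduced here for illustration, not part of PDFBox or the original snippet.

```java
import java.io.File;

// Hypothetical helper, not part of PDFBox or the original snippet.
final class PdfPathArgs {

    /** Returns the file named by the first argument, or throws if it is unusable. */
    static File resolve(String[] args) {
        if (args == null || args.length == 0 || args[0].isEmpty()) {
            throw new IllegalArgumentException("usage: <path-to-pdf>");
        }
        File file = new File(args[0]);
        // Reject missing paths, directories, and unreadable files early.
        if (!file.isFile() || !file.canRead()) {
            throw new IllegalArgumentException("not a readable file: " + file);
        }
        return file;
    }

    public static void main(String[] args) {
        try {
            File file = resolve(args);
            // Hand `file` to PDDocument.load(file) as in the snippet above.
            System.out.println("Would extract text from " + file.getAbsolutePath());
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Checking the path up front gives a clear message for a missing or mistyped file instead of an IOException raised from inside the PDF loader. It does not verify that the file is actually a valid PDF; that is still left to PDFBox.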
