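/**
 * Extracts the text of a range of pages.
 * @param file path to the PDF document
 * @param startPage first page to extract (1-based)
 * @param endPage last page to extract (inclusive)
 */
public static string ExtractTXT(string file, int startPage, int endPage)
{
    string content = string.Empty;
    PDDocument document = null;
    try
    {
        document = PDDocument.load(new java.io.File(file));
        // Create a PDFTextStripper to pull the text out of the document.
        PDFTextStripper stripper = new PDFTextStripper();
        // Emit text sorted by its position on the page.
        stripper.setSortByPosition(true);
        // Set the first page to extract.
        stripper.setStartPage(startPage);
        // Set the last page to extract.
        stripper.setEndPage(endPage);
        // Extract the text.
        content = stripper.getText(document);
    }
    catch (java.io.IOException)
    {
        // Also covers java.io.FileNotFoundException; an empty string is returned.
    }
    finally
    {
        // Release the document even if extraction failed.
        if (document != null) document.close();
    }
    return content;
}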
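public static string Pdf2txt(string pdfName)
{
    PDDocument doc = PDDocument.load(new java.io.File(pdfName));
    try
    {
        PDFTextStripper pdfStripper = new PDFTextStripper();
        // Emit text sorted by its position on the page.
        pdfStripper.setSortByPosition(true);
        return pdfStripper.getText(doc);
    }
    finally
    {
        // Close the document so the file handle is released.
        doc.close();
    }
}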
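protected void Button1_Click(object sender, EventArgs e)
{
    // Nothing to extract if no file was uploaded.
    if (!FileUpload1.HasFile)
    {
        return;
    }
    // Combine the folder and file name; plain concatenation would yield "C:\uploadname.pdf".
    string savePath = System.IO.Path.Combine(@"C:\upload", FileUpload1.FileName);
    FileUpload1.SaveAs(savePath);

    java.io.File mfile = new java.io.File(savePath);
    PDDocument doc = PDDocument.load(mfile, MemoryUsageSetting.setupMainMemoryOnly());
    try
    {
        PDFTextStripper stripper = new PDFTextStripper();
        // Emit text sorted by its position on the page.
        stripper.setSortByPosition(true);
        TextBox1.Text = stripper.getText(doc);
    }
    finally
    {
        // Close the document so the uploaded file is not left locked.
        doc.close();
    }
}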