pip install PyPDF2
import PyPDF2
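After that, open the PDF file in binary mode:
myfile = open('filename.pdf', mode='rb')
Create a reader object for the PDF:
# PyPDF2 3.x no longer accepts the old camelCase names (PdfFileReader, numPages, getPage, extractText);
# their current equivalents are used below
pdf_reader = PyPDF2.PdfReader(myfile)
To find the number of pages in the current PDF file:
len(pdf_reader.pages)
# get the first page (indexes start at 0) and print its text
page_one = pdf_reader.pages[0]
print(page_one.extract_text())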
# extract the text of the first page
page1 = page_one.extract_text()
page1
# strip away the page header (here, the first 25 characters; adjust for your document)
page1 = page1[25:]
# insert commas to separate the values, then remove the remaining newlines
page1 = page1.replace('\n \n', ', ').replace('\n', '')
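The two cleanup steps above can be wrapped in a small pure-Python helper so they are easy to apply to every page. This is a sketch. clean_page_text is a hypothetical name, and the 25-character header length comes from the example above and will differ between documents.

```python
def clean_page_text(raw, header_len=25):
    """Strip the page header, then turn the extracted text into one comma-separated line.

    header_len is the number of leading characters occupied by the page header;
    25 matches the example above but depends on the document.
    """
    body = raw[header_len:]  # strip the page header
    # a blank line ("\n \n") separates two values: replace it with ", "
    body = body.replace("\n \n", ", ")
    # remove any newlines that are left
    return body.replace("\n", "")


# "HEADER\n" is 7 characters long, so it is passed as the header length here
print(clean_page_text("HEADER\nName\n \nAge\n \nCity", header_len=7))  # prints: Name, Age, City
```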
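# copy the first page into a new PDF file
pdf_writer = PyPDF2.PdfWriter()
pdf_writer.add_page(page_one)
# keep the source file open until the new file is written: page content may be read from it lazily
with open('New updated file.pdf', mode='wb') as pdf_output:
    pdf_writer.write(pdf_output)
page = pdf_writer.pages[0].extract_text()
# check whether the PDF is encrypted (password-protected)
print(pdf_reader.is_encrypted)
myfile.close()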
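pip install tabula-py
import tabula
# tabula-py wraps the Java library tabula-java, so a Java runtime must be installed
url = 'file.pdf'  # placeholder: a local file path or a URL to a PDF
tables = tabula.io.read_pdf(url, pages='all')
This returns a list of tables (pandas DataFrames), one per table found. You can access each table by its index, just like an element of a list. Example:
# first table
tables[0]
More info: https://pypi.org/project/tabula-py/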
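Because read_pdf returns a plain list, you can find the index of the table you want by printing the size of each one. This is a minimal sketch. describe_tables is a hypothetical helper that works on any list of pandas DataFrames.

```python
def describe_tables(tables):
    """Return (index, n_rows, n_cols) for each DataFrame in a list such as read_pdf's result."""
    # DataFrame.shape is a (rows, columns) tuple
    return [(i, *table.shape) for i, table in enumerate(tables)]
```

Call it on the list returned by read_pdf and print each tuple, then index that list with the table number you need.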
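The Camelot library
!pip install "camelot-py[cv]"
!apt-get install -y python3-tk ghostscript
import camelot
# pages accepts page numbers and ranges: here pages 1, 2, 4 and 5
tables = camelot.read_pdf('file.pdf', pages='1,2,4-5')
# display the i-th table as a pandas DataFrame (here i = 0, the first table)
tables[0].df
Installing Camelot's dependencies: https://camelot-py.readthedocs.io/en/master/user/install-deps.html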