Chat GPT — Bill amount extraction using chat GPT web API and python
Problem Statement
In many organizations, they have a portal where we can reimburse the amount we spend on business travel, stay, etc., In most organizations, there will be a person who approves the amount by manually going through the bill and the submitter also have to type the amount by uploading each bill. This might go wrong many times and it will be hectic if there are bills for a month. we are trying to solve this problem using chat GPT web API and python
Generating Open API keys
Create an account in open API by navigating to this link. Once you created the account navigate to Account -> View API keys, and click on Create a new secret. Give a secret name and click on Create secret key. Now the secret key will be created. you can also navigate to API keys through this link.
Python Implementation
Prerequisites
- The OpenAI library. You can install it using pip:
pip install openai
- To read pdf and extract text. You have to install Fitz :
pip install fitz
- To read images and extract text. you have to install py-tesseract :
pip install pytesseract
- Additionally, to read the images we have to additionally install Tesseract-ocr from this link.
Setting up the code
First, let’s import the necessary libraries and set up the API key.
import fitz, openai, datetime, pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\\Program Files\\Tesseract-OCR\\tesseract'
from PIL import Image
Next step, extract the text from the pdf/image so that we can prompt to chat GPT API and get the result.
# Open the PDF file
pdf_file = 'C:\\Dinakaran\\Test\\Test.pdf'
pdf_doc = fitz.open(pdf_file)
# Extract text from each page
pdf_text = ''
for page in pdf_doc:
pdf_text += page.get_text()
# Close the PDF file
pdf_doc.close()
For images use this to extract text from images
# Open the image
image = Image.open("C:\\Dinakaran\\Test\\Test.jpg")
# Convert the image to grayscale
image = image.convert("L")
# Perform OCR using Tesseract
image_text = pytesseract.image_to_string(image)
Now we can start requesting open API with the prompt to get the completion
prompt = f'''Assuming yourself as a financial evaluator and extract the total bill amount from the bill {image_text}. Also tell the accuracy of the result retrieved with tax details'''
print("Contacting OpenAI")
completions = openai.Completion.create(
engine="text-davinci-003",
prompt=prompt,
max_tokens=2048,
n=1,
stop=None,
temperature=0.3,
)
message = completions.choices[0].text
print(message)
prompt
: The prompt or context for the conversation. This can be a single line of text or a multi-line prompt separated by newlines.model_engine
: The name of the model you want to use. We will be using thetext-davinci-003
model for this tutorial. To learn more about the model go to the link here.max_tokens
: The maximum number of tokens (words and punctuation) to generate in the response. The minimum is 1 and the maximum is 2048.temperature
: Accepted value between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.- n: How many completions should be used to generate for each prompt.
stop:
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Response
Conclusion
There are many tools in the market that can do the same. But one thing that stands out about chat GPT from my perspective it can be trained to do 100 different things that 100 tools can do it. so we can replace 1 tool with 100 tools licensing that will help us save money.