Disclosure of Invention
The present application provides an invoice information extraction and invoicing category judgment method based on a large language model, adopting the following technical scheme:
in a first aspect, the present application provides a method for extracting invoice information and judging an invoicing category based on a large language model, the method comprising:
acquiring an invoice file, and extracting an invoice text of the invoice file;
Designing a first-level prompt word, inputting the first-level prompt word and an invoice text into a preset large language model, and outputting key information of the invoice text by the large language model under the guidance of the first-level prompt word;
designing a second-level prompt word for each commodity in the key information of the invoice text, wherein the second-level prompt word comprises a commodity name, a corresponding commodity category and an invoicing category, the second-level prompt word is input into the large language model, and the large language model outputs an invoicing category coding result of each commodity under the guidance of the second-level prompt word;
And integrating and outputting the key information of the invoice text and the billing category coding result of each commodity.
In a specific embodiment, the obtaining the invoice file and extracting the invoice text of the invoice file include:
Acquiring an invoice file, wherein the invoice file comprises four forms, namely an invoice picture, a Word file, a table file and a PDF file;
for the PDF file, judging whether it is a scanned copy by detecting its text length: if the text length is smaller than a preset threshold value, the PDF file is considered a scanned copy; if the PDF file is a scanned copy, converting it into a picture for subsequent processing, and if it is not, directly extracting the invoice text from the PDF file by using a text extraction tool;
for the Word file, extracting the invoice text by using a text extraction tool;
and for the table file, carrying out semantic understanding and information extraction by using a preset large language model to complete the extraction of the invoice text.
In a specific embodiment, the obtaining the invoice file, and extracting the invoice text of the invoice file further includes:
for invoice pictures and pictures converted from scanned PDF files, detecting text areas of the pictures based on the local binary pattern and phase consistency:
The LBP value of each pixel is calculated as follows:

LBP(x, y) = \sum_{i=0}^{P-1} s\left(I(x_i, y_i) - I(x, y)\right) \cdot 2^i

Wherein, I(x, y) is the gray value of the pixel point (x, y), I(x_i, y_i) is the gray value of the i-th surrounding pixel point (x_i, y_i), P is the number of surrounding pixel points, and s(x) is a sign function, specifically as follows:

s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}
the definition formula of phase consistency is as follows:

PC(x, y) = \frac{\left| \sum_{n=1}^{N} A_n(x, y) \, e^{i\phi_n(x, y)} \right|}{\sum_{n=1}^{N} A_n(x, y) + \varepsilon}

Wherein A_n(x, y) is the amplitude of the n-th frequency component, \phi_n(x, y) is its phase, N is the total number of frequency components, and \varepsilon is a small constant preventing division by zero; the similarity of each pixel to other pixels is then calculated based on the local binary pattern and phase consistency;
After the similarity between each pixel and other pixels is calculated, detecting a text region by using a region growing technology;
the text of all text regions is extracted and integrated into invoice text.
In a specific embodiment, the detecting text regions using region growing techniques includes:
selecting, as a seed pixel and starting point, a pixel from the picture whose similarity is higher than a preset similarity threshold;
Determining the condition of merging pixels into a region, defining a similarity threshold value, and merging the adjacent pixels into the same region only when the similarity of the adjacent pixels is higher than the threshold value;
Starting from the seed pixel, checking each of its neighboring pixels for similarity to the pixels in the current region; if the similarity of a neighboring pixel is higher than the preset threshold and the pixel has not yet been assigned to any region, adding it to the current region and marking it as visited;
The process is repeated recursively or iteratively until no more eligible neighbor pixels can be found, and when all neighbor pixels have been examined, all pixels in the current region are merged into one connected region.
In a specific embodiment, the extracting text of all text regions and integrating into invoice text comprises:
Connecting pixels in the text region into character shapes, and determining boundaries of the characters according to the relative positions of the pixels and the pixel intensities;
extracting characters by detecting specific patterns or outlines of character shapes, and performing simple pattern matching or rule matching on the extracted characters;
The recognized and extracted characters are reconstructed into complete text lines or paragraphs according to the layout sequence of the characters in the picture, and the reconstructed text lines or paragraphs are combined into invoice texts of the whole picture.
In a specific implementation manner, for each commodity in the key information of the invoice text, designing a second-level prompt word, where the second-level prompt word includes a commodity name, a corresponding commodity category and an invoicing category, inputting the second-level prompt word into the large language model, and outputting an invoicing category encoding result of each commodity by the large language model under the guidance of the second-level prompt word includes:
Constructing a mapping relation correspondence table from commodity names to commodity categories, from billing categories to codes and from commodity names to codes;
designing a second-level prompt word for each commodity in the key information of the invoice text based on the mapping relation correspondence table;
inputting the second-level prompt words into a large language model, and outputting billing category coding results of each commodity by the large language model under the guidance of the second-level prompt words;
and carrying out post-processing on the billing category coding result of each commodity output by the large language model.
In a specific implementation manner, the step of inputting the second-level prompt word into the large language model, and the large language model outputs the billing category encoding result of each commodity under the guidance of the second-level prompt word further comprises:
For commodities which cannot output the billing category encoding results, repeatedly designing wider third-level prompt words and inputting the wider third-level prompt words into the large language model until the large language model can output the billing category encoding results of the rest commodities under the guidance of the third-level prompt words.
In a second aspect, the present application provides a system for extracting invoice information and judging invoice types based on a large language model, which adopts the following technical scheme:
an invoice information extraction and invoicing type judgment system based on a large language model, comprising:
The text extraction module is used for acquiring an invoice file and extracting an invoice text of the invoice file;
The information extraction module is used for designing a first-level prompt word, inputting the first-level prompt word and an invoice text into a preset large language model, and outputting key information of the invoice text under the guidance of the first-level prompt word by the large language model;
The classification judging module is used for designing a second-level prompt word for each commodity in the key information of the invoice text, wherein the second-level prompt word comprises a commodity name, a corresponding commodity category and an invoicing category, the second-level prompt word is input into the large language model, and the large language model outputs an invoicing category coding result of each commodity under the guidance of the second-level prompt word;
and the data output module is used for integrating and outputting the data of the key information of the invoice text and the billing category coding result of each commodity.
In a third aspect, the application provides an electronic device, which comprises a processor and a memory, wherein a program is stored in the memory, and the program is loaded and executed by the processor to realize the invoice information extraction and invoicing type judgment method based on a large language model according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having a program stored therein, which when executed by a processor is configured to implement a method for extracting invoice information and determining a class of invoices based on a large language model as described in the first aspect.
In summary, the beneficial effects of the present application at least include:
1) Through the carefully designed multi-level prompt words, the complex task can be decomposed into a plurality of subtasks, and each subtask is explicitly required through the prompt words, so that the model is guided to gradually complete the task. The first-level prompting words are used for extracting basic invoice information, the second-level prompting words are used for commodity classification judgment, and if necessary, the third-level prompting words can be used for further multi-round inquiry. The multi-level prompt words not only improve the completion degree of the task, but also enhance the controllability and the accuracy of the large model.
2) By combining the mapping relation corresponding table related to the enterprise collection field with the large language model, the accuracy of commodity classification judgment can be remarkably improved. The commodity category and the standard billing category codes are introduced into the large model through the prompt words, so that the large model can fully utilize knowledge in the fields when carrying out commodity classification judgment, and the accuracy of classification judgment is greatly improved.
3) A multi-round interrogation mechanism is designed. If the first classification judgment fails, a third-level prompt word is constructed, and the guide model selects the related class in a wider range. The multi-round inquiry mechanism can improve the success rate of classification judgment, ensure that each commodity can find out the proper billing category, and avoid omission and error classification.
The method acquires and processes invoice documents, including pictures and PDF documents, and obtains the invoice text through preprocessing and text extraction techniques. Then, semantic understanding and information extraction are carried out on the invoice text by using the designed prompt words and the large language model, so as to accurately acquire key information such as the buyer, the seller and commodity details. Specifically, for each commodity, the accurate billing category code is determined by combining the multi-level prompt words with the large language model. Through a high degree of automation and semantic understanding, the method improves the accuracy and adaptability of information extraction and classification, and effectively overcomes the limitations and error rate of traditional methods in processing diversified invoices. Finally, structured invoice data in JSON format is output and can be provided directly to subsequent systems for processing and management, thereby achieving a marked improvement in invoice processing efficiency and precision.
The foregoing description is only an overview of the technical scheme of the present application. In order that the present application may be understood more clearly, preferred embodiments are described in detail below with reference to the accompanying drawings.
Detailed Description
The following describes in further detail the embodiments of the present application with reference to the drawings and examples. The following examples are illustrative of the application and are not intended to limit the scope of the application.
Optionally, the method for extracting invoice information and judging the invoicing category based on a large language model provided by each embodiment is described as being executed by an electronic device, where the electronic device is a terminal or a server; the terminal may be a mobile phone, a computer, a tablet computer or the like, and the embodiments do not limit the type of the electronic device.
Referring to fig. 1, a flow chart of a method for extracting invoice information and determining an invoice category based on a large language model according to an embodiment of the present application is shown, and the method at least includes the following steps:
And step S101, acquiring an invoice file, and extracting an invoice text of the invoice file.
Step S102, designing a first-level prompt word, inputting the first-level prompt word and the invoice text into a preset large language model, and outputting key information of the invoice text by the large language model under the guidance of the first-level prompt word.
Step S103, designing a second-level prompt word for each commodity in the key information of the invoice text, wherein the second-level prompt word comprises a commodity name, a corresponding commodity category and an invoicing category, inputting the second-level prompt word into a large language model, and outputting an invoicing category coding result of each commodity under the guidance of the second-level prompt word by the large language model.
And step S104, integrating and outputting the key information of the invoice text and the billing category coding result of each commodity.
In step S101, an invoice file is first obtained. The invoice file generally takes one of four forms: an invoice picture, a Word file, a table file or a PDF file. For a PDF file, it is first determined whether the file is a scanned copy by detecting its text length: if the text length is smaller than a preset threshold (generally 50 characters), the PDF file is considered a scanned copy and is converted into a picture for subsequent processing; otherwise, the invoice text is extracted directly from the PDF file using a text extraction tool. For a Word file, the text extraction tool is likewise used directly to extract the invoice text.
Alternatively, the present application may use existing text extraction tools, such as the PyMuPDF library; other existing techniques may also be used to extract the text of a PDF file or Word file, and the present application is not limited to a particular text extraction technology.
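The scanned-copy check described above can be sketched as follows. This is a minimal illustration: the function name and threshold constant are ours, and the text is assumed to have been pulled from the PDF beforehand (for instance with PyMuPDF's `page.get_text()`).

```python
# Route a PDF according to the scanned-copy rule of step S101: if the
# extractable text is shorter than a threshold (the embodiment suggests
# about 50 characters), treat the file as a scan and send it to the
# picture pipeline; otherwise extract the invoice text directly.
SCAN_TEXT_THRESHOLD = 50  # characters, per the embodiment

def route_pdf(extracted_text: str, threshold: int = SCAN_TEXT_THRESHOLD) -> str:
    """Return 'picture_pipeline' for scanned PDFs, 'text_pipeline' otherwise."""
    if len(extracted_text.strip()) < threshold:
        return "picture_pipeline"  # convert pages to pictures first
    return "text_pipeline"         # use a text extraction tool directly
```

A scanned invoice with no text layer yields an empty string and is routed to the picture pipeline; a born-digital PDF with a full text layer goes straight to text extraction.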
For the table file, since it may include a large amount of commodity information and the generation logic of each table differs, a preset large language model is used to perform semantic understanding and information extraction, thereby completing the extraction of the invoice text containing the batched commodity information.
Alternatively, the large language model of the present application is an existing large language model, such as GPT-4, and may be any other existing large language model; the present application is not limited to a specific type of large language model.
Referring to fig. 2, a flow chart of recognizing the invoice text of invoice pictures and of pictures converted from scanned PDF files in an embodiment of the present application is shown, where the method at least includes the following steps:
S1011, preprocessing the picture.
Specifically, non-local means denoising is first used to reduce the noise in the picture while preserving the sharpness of the text, and then the contrast and brightness of the picture are enhanced using the following formula:

I'(x, y) = \alpha \cdot \left( \frac{I(x, y) - I_{min}}{I_{max} - I_{min}} \right)^{\beta}

Wherein I'(x, y) is the pixel value after contrast and brightness enhancement, I(x, y) is the pixel value of the original picture, I_min and I_max are respectively the minimum and maximum pixel values of the picture, and \alpha and \beta are adjustment parameters for contrast and brightness. In the above formula, the original pixel values are first normalized to the range 0–1 by computing (I(x, y) − I_min)/(I_max − I_min), so that the contrast and brightness enhancement can be applied to the picture more accurately. The normalized pixel value is then raised to the power \beta to strengthen or weaken the contrast; such a non-linear transformation makes dark and bright details in the image more prominent. Finally, Gaussian filtering and the Sobel operator are used for edge detection and enhancement, to assist the subsequent extraction of text outlines and edge features.
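A minimal sketch of the normalisation and power-law transform above, assuming grey values held in a `numpy` array; scaling the result back to the 0–255 range and the default parameter values are our assumptions.

```python
import numpy as np

def enhance(img, alpha: float = 1.0, beta: float = 0.8):
    """Contrast/brightness enhancement: normalise to [0, 1], then apply
    a power-law (gamma-style) transform scaled by alpha."""
    img = np.asarray(img, dtype=float)
    i_min, i_max = img.min(), img.max()
    norm = (img - i_min) / max(i_max - i_min, 1e-8)  # normalise to [0, 1]
    out = alpha * np.power(norm, beta)               # non-linear transform
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)
```

With beta below 1 the transform lifts dark mid-tones, which is the "make dark details more prominent" effect described above.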
S1012, detecting text areas of the preprocessed pictures based on the local binary pattern and the phase consistency.
In particular, the local binary pattern is a method for describing local texture features of an image. For each pixel, a binary number is generated by comparing the gray value of the pixel with the gray values of surrounding pixels, so as to obtain an LBP value. The LBP value of each pixel is calculated as follows:

LBP(x, y) = \sum_{i=0}^{P-1} s\left(I(x_i, y_i) - I(x, y)\right) \cdot 2^i

Wherein, I(x, y) is the gray value of the pixel point (x, y), I(x_i, y_i) is the gray value of the i-th surrounding pixel point (x_i, y_i), P is the number of surrounding pixel points, and s(x) is a sign function, specifically as follows:

s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}
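The LBP computation can be sketched as below for the common 8-neighbour case (P = 8); the clockwise sampling order is an illustrative choice.

```python
def lbp_value(gray, x, y):
    """LBP code of pixel (x, y) on a 2-D list of grey values, P = 8."""
    center = gray[y][x]
    # the 8 surrounding pixel points (x_i, y_i), visited clockwise
    offsets = [(-1, -1), (0, -1), (1, -1), (1, 0),
               (1, 1), (0, 1), (-1, 1), (-1, 0)]
    code = 0
    for i, (dx, dy) in enumerate(offsets):
        s = 1 if gray[y + dy][x + dx] - center >= 0 else 0  # sign function s(x)
        code += s << i                                      # s(...) * 2**i
    return code
```

On a flat patch every neighbour ties with the centre, so every s(...) is 1 and the code is 255; a bright isolated centre pixel yields 0.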
Phase consistency is a method of detecting image structure based on phase information: it detects edges and texture features in the image by computing the phase consistency of the local frequency components. The definition formula of phase consistency is as follows:

PC(x, y) = \frac{\left| \sum_{n=1}^{N} A_n(x, y) \, e^{i\phi_n(x, y)} \right|}{\sum_{n=1}^{N} A_n(x, y) + \varepsilon}

Wherein A_n(x, y) is the amplitude of the n-th frequency component, \phi_n(x, y) is its phase, N is the total number of frequency components, and \varepsilon is a small constant preventing division by zero. The similarity of each pixel to other pixels is then calculated based on the local binary pattern and phase consistency.
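Given the amplitudes A_n and phases φ_n of the local frequency components (assumed to come from a quadrature filter bank, which is not shown here), the phase-consistency value at one pixel can be sketched as:

```python
import cmath

def phase_consistency(amps, phis, eps: float = 1e-8) -> float:
    """PC = |sum_n A_n * exp(i*phi_n)| / (sum_n A_n + eps)."""
    numerator = abs(sum(a * cmath.exp(1j * p) for a, p in zip(amps, phis)))
    denominator = sum(amps) + eps  # eps guards against division by zero
    return numerator / denominator
```

When all components agree in phase the value approaches 1 (a strong edge or feature); components in opposite phase cancel and drive it toward 0.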
In an implementation, after the similarity between each pixel and other pixels has been calculated, a region growing technique is used to detect text regions. Specifically, a seed pixel is selected from the picture as a starting point; typically, pixels with high similarity or other predefined properties are chosen as seed pixels, and in the present application pixels above a preset similarity threshold are selected. Conditions for merging pixels into a region are then determined; these conditions are typically based on similarity or other characteristics between pixels. In text detection, a similarity threshold may be defined, and adjacent pixels are merged into the same region only if their similarity is above this threshold. Starting from the seed pixel, its neighboring pixels are examined: for each neighboring pixel, its similarity to the pixels in the current region is checked, and if that similarity is above the preset threshold and the pixel has not yet been assigned to any region, it is added to the current region and marked as visited. This process is repeated recursively or iteratively until no more eligible neighboring pixels can be found; when all neighboring pixels have been examined, all pixels in the current region are merged into one connected region. For the detected text regions, some post-processing steps, such as filling holes, removing small regions or smoothing boundaries, are applied to improve the accuracy and continuity of the text regions.
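The growing loop just described can be sketched as follows; the similarity map, the 4-connectivity and the threshold value are illustrative assumptions.

```python
from collections import deque

def grow_region(sim, seed, threshold: float = 0.8):
    """Return the set of pixels 4-connected to `seed` whose similarity
    value in the map `sim` exceeds `threshold`."""
    height, width = len(sim), len(sim[0])
    region, queue = {seed}, deque([seed])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in region      # not yet assigned
                    and sim[ny][nx] > threshold):   # merging condition
                region.add((nx, ny))                # merge and mark visited
                queue.append((nx, ny))
    return region
```

The queue-based traversal is the iterative form of the "repeat until no more eligible neighbours" rule; a recursive form is equivalent but risks deep recursion on large regions.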
S1013, extracting texts of all text areas and integrating the texts into invoice texts.
Specifically, for each text region, a method based on pixel intensity and shape characteristics may be employed to extract possible characters, either through connected region analysis or pixel-based operations. The specific method comprises the steps of connecting pixels in a text area into a character shape, and determining the boundary of the character according to the relative position of the pixels and the pixel intensity. The character is extracted by detecting a specific pattern or outline of the character shape. And performing simple pattern matching or rule matching on the extracted characters. For example, some basic rules may be defined to identify numbers, letters, or specific symbols. These rules may determine the identity of the character based on the shape, size, and pixel distribution of the character. The recognized and extracted characters are reconstructed into complete text lines or paragraphs in the order of their layout in the picture. This may be done by concatenating the characters and ordering them according to their position in the picture. Finally, the reconstructed text lines or paragraphs are assembled into invoice text for the entire picture. The invoice text may be organized in the format of text lines or paragraphs and output in the form of text strings.
In addition, preferably, the invoice text of invoice pictures and of pictures converted from scanned PDF files can also be extracted by combining a non-local means denoising algorithm with a convolutional neural network. The picture is first processed using the non-local means (NLM) denoising algorithm, which reduces the noise level of the picture by comparing the similarity of each pixel to its surrounding pixels. This is particularly important when detecting and recognizing text in a picture, since the text may be disturbed by a complex surrounding background or by noise. Then, on the NLM-denoised picture, an existing convolutional neural network, such as YOLOv, SSD and the like, is used for text region detection; such a model can accurately locate the text regions in the image and generate bounding boxes identifying the position and size of the text. After the bounding boxes are generated, the intersection-over-union (IoU) of each bounding box and the ground-truth text bounding box is calculated, where the IoU value is the ratio of the area of the overlapping part of the two bounding boxes to the area of their union. Whether the IoU value is high or low is judged by comparison with a threshold: a high IoU indicates that the model detects accurately, while a low IoU indicates that the model parameters need further optimization. Finally, once a text region is detected, the text content is recognized using an existing model based on a combined convolutional neural network and recurrent neural network structure; the CNN-RNN structure can handle a variety of complex image scenes and different font styles, the visual features extracted by the CNN adapt to text regions under different illumination, angles and background conditions, and the RNN can learn and infer the grammatical and semantic information of the text.
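The IoU computation mentioned above is standard; a self-contained sketch, with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)        # overlap / union
```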
In an implementation, after the invoice text of invoice pictures and of pictures converted from scanned PDF files has been extracted by both of the above methods, the two extracted invoice texts are merged by taking their union as the final invoice text. One method may miss some text regions or fail to recognize them completely, while the other method may be able to recognize exactly that missed content. Through the union processing, all recognizable text information is captured and integrated as far as possible, improving the completeness and coverage of text extraction. For tasks involving large amounts of text data, such as invoice text recognition, the robustness and reliability of the system are critical; by integrating the text extraction results of multiple methods, a more robust and reliable text recognition system can be constructed, reducing the failures or incompleteness caused by the shortcomings of a single method.
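The union step can be sketched as below, under the simplifying assumption that each extractor returns its result as a list of recognised text lines.

```python
def merge_extractions(lines_a, lines_b):
    """Union of two extraction results: keep every line either method
    found, dropping exact duplicates and preserving first-seen order."""
    merged, seen = [], set()
    for line in list(lines_a) + list(lines_b):
        if line not in seen:
            seen.add(line)
            merged.append(line)
    return merged
```

Lines missed by one method but caught by the other survive the merge, which is the coverage benefit described above.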
Alternatively, other existing technologies, such as OCR text recognition, may be used to directly perform text recognition on invoice pictures and pictures converted from scanned PDF files; the present application is not limited to a particular picture text extraction technology.
In another possible embodiment, after the invoice text is extracted, the integrity of the invoice text is verified to determine whether it completely contains the necessary information. Specifically, the integrity score of the text is calculated using the following formula:

Score = \frac{1}{N} \sum_{i=1}^{N} w(f_i) \cdot \frac{Presence(f_i)}{MaxPresence(f_i)}

Where N is the total number of necessary fields, f_i denotes the i-th necessary field, Presence(f_i) denotes the number of occurrences of field f_i in the text, MaxPresence(f_i) denotes the maximum number of occurrences of field f_i in the text, and w(f_i) is the weight of field f_i. Structured text such as an invoice typically has fixed fields and formatting requirements, and this formula judges the integrity of the text against those requirements, ensuring that all necessary fields are correctly identified and contained. The formula combines the presence of fields, their importance weights and their proportion of occurrences; this comprehensive consideration enables a more complete assessment of text integrity than simply determining whether certain fields are present.
In implementation, after calculating the integrity score of the text, comparing the integrity score with a preset threshold, if the integrity score is greater than or equal to the preset threshold, judging that the extracted invoice text is complete, and if the integrity score is less than the preset threshold, extracting the invoice text again until the integrity score is greater than or equal to the preset threshold.
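A minimal sketch of this integrity check, assuming the score is the average over necessary fields of the weighted Presence/MaxPresence ratio as defined above; the tuple-based input format is ours.

```python
def integrity_score(fields) -> float:
    """fields: iterable of (presence, max_presence, weight) per necessary
    field. Score = (1/N) * sum_i w_i * presence_i / max_presence_i."""
    fields = list(fields)
    total = sum(w * p / m for p, m, w in fields)
    return total / len(fields)

def is_complete(fields, threshold: float = 1.0) -> bool:
    """Invoice text is judged complete when the score reaches the threshold;
    otherwise extraction is repeated."""
    return integrity_score(fields) >= threshold
```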
For example, assuming an invoice text, the integrity of which needs to be assessed, the necessary fields include invoice number, date and total amount. The invoice text contains the following information:
the invoice number is INV-2024-001;
the date is 2024-07-15;
The total amount is $500.00;
assuming that the importance weight of each field is 1 and the preset threshold is also 1, and that each field appears exactly once (Presence(f_i) = MaxPresence(f_i) = 1), the integrity assessment formula mentioned earlier gives:

Score = \frac{1}{3}\left(1 \cdot \frac{1}{1} + 1 \cdot \frac{1}{1} + 1 \cdot \frac{1}{1}\right) = 1

The calculated integrity score equals the preset threshold, indicating that the invoice text is complete.
In step S102, first-level prompt words are first designed. The first-level prompt words are used to specify the fields the large language model should extract from the invoice text, such as buyer information (buyer name, address, contact), seller information (seller name, address, contact) and commodity information (commodity name, quantity, unit price, total price). The first-level prompt words and the invoice text are then input into the preset large language model, which performs semantic understanding and information extraction under the guidance of the first-level prompt words and outputs the key information of the invoice text in JSON format.
Alternatively, the large language model of the present application is an existing large language model, such as GPT-4, and may be any other existing large language model; the present application is not limited to a specific type of large language model.
In an implementation, after the large language model outputs the key information of the invoice text in JSON format, post-processing is carried out on the JSON data output by the model, including cleaning extraneous content, handling abnormal conditions and the like, so as to ensure that the data type and format of each field meet the expected requirements. By parsing the JSON string, it is converted into structured data objects that a program can understand and process; for example, JSON parsers or libraries in programming languages are used to convert text in JSON format into corresponding data structures, such as objects, arrays or hash tables.
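The parsing and field-normalisation step can be sketched as below; the expected field names are illustrative, not mandated by the method.

```python
import json

EXPECTED_FIELDS = ("buyer", "seller", "items")  # illustrative field names

def postprocess(raw: str) -> dict:
    """Parse the model's JSON output and default any missing field."""
    data = json.loads(raw.strip())        # raises ValueError on invalid JSON
    for field in EXPECTED_FIELDS:
        data.setdefault(field, None)      # handle missing data fields
    return data
```

Defaulting missing fields to `None` is one simple choice for the "abnormal condition" handling described above; business logic may instead warn or re-query the model.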
In step S103, referring to fig. 3, a flowchart of step S103 in an embodiment of the present application is shown, and the method at least includes the following steps:
S1031, constructing a mapping relation correspondence table from commodity names to commodity categories, from billing categories to codes and from commodity names to codes.
For example, the commodity category corresponding to the commodity name citrus is fruit, the corresponding code is 001, and the billing category corresponding to the code 001 is orange.
S1032, designing second-level prompt words for each commodity in the key information of the invoice text based on the mapping relation corresponding table.
Specifically, the second-level prompt word includes a commodity name, a corresponding commodity category and an invoicing category. For example, if the commodity name is citrus, the corresponding second-level prompt word contains the commodity name citrus, the commodity category fruit and the billing category orange.
S1033, inputting the second-level prompt words into a large language model, and outputting the billing category coding result of each commodity by the large language model under the guidance of the second-level prompt words.
Specifically, the second-level prompt words in JSON format are input into the large language model, and the large language model outputs the billing category coding result of each commodity in JSON format under the guidance of the second-level prompt words. For example, if the input commodity is citrus, the corresponding model output is the billing category orange and the code 001.
S1034, for commodities for which the large language model cannot output a billing category coding result, repeatedly designing broader third-level prompt words and inputting them into the large language model, until the large language model can output the billing category coding results of the remaining commodities under the guidance of the third-level prompt words.
In particular, if the billing category of certain merchandise cannot be matched to an exact category or code in the standard codes, a phase of multi-round interrogation is entered. The purpose of this phase is to guide the model, through third-level prompt words, to search the standard codes for related or broader categories, so as to determine the billing category and code of the commodity as accurately as possible. First, a new third-level prompt word is designed for each commodity whose billing category or code could not be determined; it guides the model to search for the related category or code over a wider range, and may describe the commodity characteristics or classification requirements more broadly. The designed third-level prompt word is then input into the large language model, and the model is called again for classification judgment. This step is similar to the previous invocation of the large language model, but the goal is to find a more suitable billing category or code. If some commodities remain undetermined, the large model may need to be invoked over multiple iterations until the billing category and code of every commodity can be determined.
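The multi-round interrogation can be sketched as the loop below. `call_llm` stands in for the real large-language-model invocation; here it is stubbed with a dictionary so the control flow is self-contained.

```python
def classify_with_retries(prompts, call_llm, max_rounds: int = 3):
    """Try each progressively broader prompt word in turn until the model
    returns a billing-category code; give up after max_rounds."""
    for round_no, prompt in enumerate(prompts[:max_rounds], start=1):
        result = call_llm(prompt)
        if result is not None:        # a billing-category code was found
            return result, round_no
    return None, max_rounds           # still undetermined after all rounds

# Stub model: the narrow second-level prompt fails, the broader
# third-level prompt succeeds with code 001.
answers = {"second_level": None, "third_level": "001"}
code, rounds = classify_with_retries(["second_level", "third_level"], answers.get)
```

Capping the number of rounds keeps an unclassifiable commodity from looping forever; how such leftovers are reported is left to business logic.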
For example, if the large model cannot output the billing type coding result of the commodity under the premise that the second-level prompting words are commodity names of oranges, commodity type fruits and billing type oranges, then the new third-level prompting words are redesigned to be commodity names of oranges, commodity type agricultural products and billing type fruits.
S1035, post-processing the billing category coding result of each commodity output by the large language model.
Specifically, validity verification is performed on JSON strings output by the model. This includes checking if the syntax structure of the JSON is correct, ensuring that all keys and values meet the format requirements of the JSON. And data cleaning is performed, possibly removing unnecessary spaces, special characters, or other unnecessary content. This ensures that the output JSON string is clean, containing only the necessary data. In addition, any possible anomalies are handled, such as processing missing data fields, undefined classes or codes, etc. These situations may require appropriate handling in accordance with business logic, such as filling in defaults, providing warnings or error messages, etc. Finally, the JSON character string is converted into a structured data object which can be understood and processed by a program by analyzing the JSON character string. For example, JSON parsers or libraries in programming languages are used to convert text in JSON format into corresponding data structures, such as objects, arrays, or hash tables.
In practice, special commodities are sometimes encountered whose names have no exact match in the standard billing categories. To solve this problem, the present application designs a multi-round inquiry mechanism: if the first classification judgment fails, a third-level prompt word is constructed to guide the model to select a related category over a wider range. The multi-round inquiry mechanism effectively improves the success rate of classification judgment, ensures to a certain extent that a suitable billing category can be found for each commodity, and avoids omission and misclassification.
In step S104, the key information of the invoice text and the billing category coding result of each commodity are integrated into structured invoice data, which is output in JSON format for use by subsequent invoice processing systems. JSON is a lightweight data exchange format suitable for cross-platform and cross-language data exchange; outputting in this format enables the invoice processing system to provide the identified and classified information to other business systems or users in a form that is easy to manage and use.
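A minimal sketch of this integration step is shown below. The field names (`buyer`, `seller`, `goods`, `billing_code`) and the code values are illustrative assumptions, not the application's actual output schema.

```python
import json

# Key information as extracted in the earlier steps (illustrative values).
key_info = {
    "buyer": "Example Co., Ltd.",
    "seller": "Fruit Wholesale Ltd.",
    "goods": [{"name": "orange", "category": "fruit"}],
}
# Per-commodity billing codes from the classification step (hypothetical).
billing_codes = {"orange": "1010101"}

# Integrate the coding result of each commodity into the key information.
for item in key_info["goods"]:
    item["billing_code"] = billing_codes.get(item["name"], "unknown")

# Output the structured invoice data as JSON for downstream systems.
structured = json.dumps(key_info, ensure_ascii=False, indent=2)
print(structured)
```

`ensure_ascii=False` keeps any non-ASCII text (e.g. Chinese buyer or seller names) readable in the output rather than escaped.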
In summary, referring to fig. 4, which is an overall flowchart of the method for extracting invoice information and determining an invoicing category based on a large language model according to an embodiment of the present application: first, invoice files including pictures and PDF documents are acquired and processed, and the invoice text is obtained through preprocessing and text extraction techniques. Then, semantic understanding and information extraction are performed on the invoice text using the designed prompt words and the large language model, so as to accurately acquire key information such as the buyer, the seller, and commodity details. Specifically, for each commodity, the accurate billing category code is determined by combining the multi-level prompt words with the large language model. Through a high degree of automation and semantic understanding, the method improves the accuracy and adaptability of information extraction and classification, and effectively overcomes the limitations and error rate of traditional methods when processing diversified invoices. Finally, structured invoice data in JSON format is output and provided directly to subsequent systems for processing and management, thereby achieving a significant improvement in invoice processing efficiency and precision.
FIG. 5 is a block diagram of a system for extracting invoice information and determining the invoicing category based on a large language model according to an embodiment of the present application. The system comprises at least the following modules:
a text extraction module, used for acquiring an invoice file and extracting the invoice text of the invoice file;
an information extraction module, used for designing a first-level prompt word and inputting the first-level prompt word and the invoice text into a preset large language model, which outputs the key information of the invoice text under the guidance of the first-level prompt word;
a classification judgment module, used for designing a second-level prompt word for each commodity in the key information of the invoice text, wherein the second-level prompt word comprises the commodity name, the corresponding commodity category and the invoicing category; the second-level prompt word is input into the large language model, which outputs the invoicing category coding result of each commodity under the guidance of the second-level prompt word; and
a data output module, used for integrating and outputting the key information of the invoice text and the billing category coding result of each commodity.
For relevant details, reference is made to the method embodiments described above.
Fig. 6 is a block diagram of an electronic device provided in one embodiment of the application. The device comprises at least a processor 401 and a memory 402.
Processor 401 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 401 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the wake-up state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 401 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 401 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 402 may include one or more computer-readable storage media, which may be non-transitory. Memory 402 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 402 stores at least one instruction for execution by processor 401 to implement the large-language-model-based invoice information extraction and invoicing category determination method provided by the method embodiments of the present application.
In some embodiments, the electronic device may also optionally include a peripheral interface and at least one peripheral. The processor 401, memory 402, and peripheral interfaces may be connected by buses or signal lines. The individual peripheral devices may be connected to the peripheral device interface via buses, signal lines or circuit boards. Illustratively, the peripheral devices include, but are not limited to, radio frequency circuitry, touch display screens, audio circuitry, and power supplies, among others.
Of course, the electronic device may also include fewer or more components, as the present embodiment is not limited in this regard.
Optionally, the application further provides a computer-readable storage medium, in which a program is stored; the program is loaded and executed by a processor to implement the invoice information extraction and billing category judgment method based on the large language model of the above method embodiments.
Optionally, the application further provides a computer program product, which comprises a computer-readable storage medium, in which a program is stored; the program is loaded and executed by a processor to implement the invoice information extraction and billing category judgment method based on the large language model of the above method embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features are described; however, as long as there is no contradiction among them, such combinations should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail but are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.