CN114816435A

CN114816435A - A software development method based on reverse technology

Info

Publication number: CN114816435A
Application number: CN202210253180.9A
Authority: CN
Inventors: 平洋; 陈�光; 张伟华; 梁东晨; 白小燕; 钟远; 杨华
Original assignee: Research Institute of War of PLA Academy of Military Science
Current assignee: Research Institute of War of PLA Academy of Military Science
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2022-07-29

Abstract

The present invention provides a software development method based on reverse technology, comprising: S1. decompiling target software to obtain source code; S2. performing functional analysis on the target software to determine the correspondence between each code block in the source code and each function; S3. deconstructing and analyzing the target software to obtain its software information, including the system deployment environment, architecture, scheduling mode, interface specification, communication protocol, reference method, and data storage method; S4. reconstructing the system based on this software information, and populating the reconstructed system with the code blocks that correspond to functions. The solution reverse-engineers only those parts of the software that matter for its actual functions, and the remaining parts can be extended as needed. This reduces the reverse-development workload while keeping reverse engineering as practicable as possible.

Description

A software development method based on reverse technology

Technical Field

The invention belongs to the technical field of software development, and in particular relates to a software development method based on reverse technology.

Background

Reverse engineering, also called software reverse engineering, starts from a runnable program. It applies computer techniques such as disassembly, system analysis, and program comprehension to take apart and analyze the software's structure, flow, algorithms, and code. From this analysis it derives the software's source code, design principles, structure, algorithms, processing procedures, operating methods, and related documentation. Put simply, it constructs a new outline of a system by identifying and analyzing the source code of existing software. It first performs a basic analysis of the original system and identifies its components. Then, by making the relationships between those components explicit, it builds a new, higher-level software system. The whole process of analyzing software in reverse is usually called software reverse engineering, and the techniques used in that process are collectively called software reverse engineering techniques.

Software reverse techniques have several uses:

- probing weaknesses in current software protection;
- reviewing the protection of existing software in a targeted way;
- debugging and repairing software in production environments where no source code is available;
- extending software functionality, or developing new software based on existing software.

Existing approaches reverse the entire existing software. Reversing every detail, however, is a huge and time-consuming project with low efficiency and poor practicability.

Summary of the Invention

The purpose of the present invention is to address the above problems by providing a software development method based on reverse technology.

To achieve this purpose, the present invention adopts the following technical solution:

A software development method based on reverse technology, comprising the following steps:

S1. Decompile the target software to obtain source code;

S2. Perform functional analysis on the target software, and determine the correspondence between each code block in the source code and each function;

S3. Deconstruct and analyze the target software to obtain its software information, including the system deployment environment, architecture, scheduling mode, interface specification, communication protocol, reference method, and data storage method;

S4. Reconstruct the system based on the software information, and fill the code blocks that correspond to functions into the reconstructed system.

In the above software development method based on reverse technology, step S1 specifically includes:

S11. Read the target code of the target software into memory;

S12. Analyze the target code to separate instruction code from data;

S13. Disassemble the target code with a disassembly tool to obtain an assembly file;

S14. Decompile the assembly file with a decompilation tool to obtain source code.

In the above software development method based on reverse technology, step S11 is specifically:

A1. Read several bytes from the target binary file and store them in a Content object;

A2. Store the Content object in the Vector container;

A3. Repeat steps A1 and A2 until the end of the file is reached.
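Steps A1 to A3 can be sketched in Python as shown below. This is a minimal illustration, not the patent's implementation. `Content` is modeled as a small dataclass with hypothetical fields (`offset`, `raw`, `kind`, `visited`), and a Python list stands in for the Vector container. The chunk size defaults to 4 bytes, which assumes a 32-bit, fixed-width instruction set.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, List


@dataclass
class Content:
    """One fixed-size chunk of the target binary (hypothetical field layout)."""
    offset: int              # byte offset of this chunk in the file
    raw: bytes               # bytes read; the last chunk may be shorter
    kind: str = "unknown"    # later set to "code" or "data" by step S12
    visited: bool = False    # later set during control-flow tracing


def read_contents(stream: BinaryIO, chunk_size: int = 4) -> List[Content]:
    """A1-A3: read chunk_size bytes at a time into Content objects and
    append each one to a list that plays the role of the Vector container."""
    vector: List[Content] = []
    offset = 0
    while True:
        raw = stream.read(chunk_size)          # A1: read several bytes
        if not raw:                            # A3: stop at end of file
            break
        vector.append(Content(offset, raw))    # A2: store in the container
        offset += len(raw)
    return vector


if __name__ == "__main__":
    # In practice the stream would come from open(<target file>, "rb").
    demo = read_contents(io.BytesIO(bytes(range(10))))
    print([(c.offset, c.raw.hex()) for c in demo])
```

Reading fixed-size chunks keeps a one-to-one relationship between list index and instruction slot. The tracing steps later rely on that relationship.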

In the above software development method based on reverse technology, step S12 is specifically:

B1. Trace the instruction control flow, traversing and identifying each instruction;

B2. Mark the parts of the code that the instruction flow can reach as instruction code, and mark the rest as data.
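The idea behind B1 and B2 is that whatever the control flow can reach is code and everything else is data. A generic worklist traversal illustrates it. This is a simplified sketch with two assumptions:

- The `successors` callback is a hypothetical stand-in for decoding an instruction and listing where control can go next.
- Indices are word positions in the container.

The PC-based procedure of steps B11 to B16 adds return and branch bookkeeping on top of this basic idea.

```python
from typing import Callable, Iterable, List, Set


def label_code_and_data(
    n_words: int,
    entry: int,
    successors: Callable[[int], Iterable[int]],
) -> List[str]:
    """B1: visit every word reachable from `entry`; B2: label it "code",
    and label every word that was never reached "data"."""
    visited: Set[int] = set()
    worklist = [entry]
    while worklist:
        i = worklist.pop()
        if i in visited or not 0 <= i < n_words:
            continue                        # already traced or out of range
        visited.add(i)                      # B1: mark as traversed
        worklist.extend(successors(i))      # follow every outgoing edge
    return ["code" if i in visited else "data" for i in range(n_words)]  # B2


if __name__ == "__main__":
    # Hypothetical control flow: word 1 is a two-way branch to words 2 and 4.
    flow = {0: [1], 1: [2, 4], 2: [], 3: [], 4: [], 5: []}
    print(label_code_and_data(6, 0, lambda i: flow[i]))
```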

In the above software development method based on reverse technology, step S13 is specifically:

C1. Take the objects out of the Vector container one by one. Use the result of separating instruction code from data to determine whether each object is instruction code or data;

C2. If the object is instruction code, disassemble it into assembly-instruction form with the disassembly tool. If it is data, translate it into its value, either directly or with the disassembly tool.
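Below is a minimal sketch of C1 and C2. It assumes 4-byte little-endian words and a label list produced by step S12. The `decode` callback is a placeholder for a real disassembler. The one-entry `TOY_TABLE` is illustrative only and is not a usable decoder (0xE1A0F00E is the ARM encoding of `MOV PC, LR`).

```python
from typing import Callable, List

# Illustrative lookup only; a real decoder handles the full instruction set.
TOY_TABLE = {0xE1A0F00E: "MOV PC, LR"}


def toy_decode(word: int) -> str:
    return TOY_TABLE.get(word, f"<undecoded 0x{word:08X}>")


def disassemble(
    words: List[bytes],
    labels: List[str],
    decode: Callable[[int], str] = toy_decode,
) -> List[str]:
    """C1: take the stored words in order and consult the S12 label;
    C2: render code as a mnemonic and data as its raw value."""
    lines = []
    for index, (raw, kind) in enumerate(zip(words, labels)):
        value = int.from_bytes(raw, "little")   # assumes a little-endian target
        address = index * 4                     # assumes 4-byte words
        if kind == "code":
            text = decode(value)                # instruction -> assembly form
        else:
            text = f".word 0x{value:08X}"       # data -> its value
        lines.append(f"{address:08X}: {text}")
    return lines


if __name__ == "__main__":
    words = [bytes.fromhex("0EF0A0E1"), bytes.fromhex("2A000000")]
    for line in disassemble(words, ["code", "data"]):
        print(line)
```

Data words never pass through the decoder. This is the efficiency argument the method makes for separating code from data first.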

In the above software development method based on reverse technology, the following steps are further included between steps S13 and S14:

D1. Normalize the assembly instruction code into intermediate code;

D2. Extract library functions, and distinguish system library functions from user-defined functions;

D3. Recover key information of the user-defined functions, including name, number of parameters, return value, and type.

In step S14, the decompilation tool decompiles the system library functions and the user-defined functions separately.

In the above software development method based on reverse technology, in step S2, the functions of the target software are determined from the operation manual and help documents and by running the target software dynamically.

In the above software development method based on reverse technology, in step S2, key code is located and extracted by dynamically debugging the source code. The correspondence between key code and functions is then marked according to the determined functions.

In the above software development method based on reverse technology, in step S3, the target software is deconstructed and analyzed by static analysis and/or dynamic analysis.

In the above software development method based on reverse technology, the source code/code blocks are converted into the target development language before step S4.

In step S4, the reconstructed system is an open system with open interfaces, so that users can develop, refine, and verify it.

The advantages of the present invention are:

1. Only the parts of the software relevant to its actual functions are reversed, in a targeted way, and the remaining parts can be extended as needed. This reduces the reverse-development workload while keeping reverse engineering as practicable as possible;

2. Instruction code and data in the target code are separated first, so the disassembly tool can handle each in a targeted way. This prevents data from interfering with the disassembly of instruction code and improves disassembly efficiency;

3. Storing Content objects in a Vector container makes content extraction and marking convenient and facilitates the subsequent tracing of the instruction control flow;

4. Tracing the instruction control flow by following the PC value ensures that every instruction reachable from the program entry is traversed. Instruction code and data are therefore separated completely;

5. User-defined functions are identified before the assembly file is processed and are separated from system library functions, and their key information is restored. This facilitates the subsequent restoration of code at the source level.

Description of Drawings

FIG. 1 is a flow chart of the software development method based on reverse technology of the present invention;

FIG. 2 is a flow chart of obtaining source code in the method;

FIG. 3 is a flow chart of reading the target code into memory in the method;

FIG. 4 is a flow chart of tracing the instruction control flow in the method;

FIG. 5 is a workflow of the disassembly process in the method;

FIG. 6 is a flow chart of the specific processing of data and instruction code during disassembly in the method;

FIG. 7 is a schematic diagram of the layout of functions in memory.

Detailed Description of the Embodiments

The present invention is described in further detail below with reference to the drawings and specific embodiments.

As shown in FIG. 1, this embodiment discloses a software development method based on reverse technology, comprising steps S1 to S4 described above.

Specifically, in step S2, the functions of the target software are determined from the operation manual and help documents and by running the target software dynamically. The key code corresponding to all functions, to the main functions, or to the functions selected by the user is then located and extracted by dynamically debugging the source code. Finally, the correspondence between key code and functions is marked according to the determined functions.

Step S2 locates each function within the code blocks and marks the correspondence between code blocks and functions. The user can select key functions as needed, and the system extracts the code blocks corresponding to the selected key functions. In this way the key code of the target software is distilled while the details are hidden, which simplifies the software system and improves its operability.
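The function-to-code-block correspondence and the user's selection can be pictured as a simple lookup. The sketch below assumes that the result of S2 is a dictionary from feature names to code-block identifiers. The feature and block names in the example are invented for illustration.

```python
from typing import Dict, Iterable, List


def extract_key_blocks(
    feature_map: Dict[str, List[str]],
    selected: Iterable[str],
) -> List[str]:
    """Return the code blocks mapped to the user-selected features, in
    first-seen order without duplicates; other blocks stay hidden.
    Unknown feature names simply contribute nothing."""
    seen = set()
    result: List[str] = []
    for feature in selected:
        for block in feature_map.get(feature, []):
            if block not in seen:        # a block may serve several features
                seen.add(block)
                result.append(block)
    return result


if __name__ == "__main__":
    # Hypothetical output of step S2: feature -> code blocks implementing it.
    s2_map = {
        "login": ["auth.check_password", "auth.hash"],
        "export": ["io.write_csv", "auth.hash"],
        "help": ["ui.show_help"],
    }
    print(extract_key_blocks(s2_map, ["login", "export"]))
```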

Further, in step S3, the target software is deconstructed and analyzed by static analysis and/or dynamic analysis. For example, a tool such as c32asm can be used for static analysis and a tool such as OllyDbg for dynamic analysis. The deconstruction analysis can be carried out while the target software is being decompiled.

Further, the source code obtained by decompiling the target software may be in a language such as C# or Java. Some languages are convenient for development and others are not, and different engineers are accustomed to different languages. Therefore, after the source code is obtained from the target code, this solution converts the source code/code blocks into the target development language according to the user's requirements. The system reconstructed in step S4 is an open system with open interfaces. Users can carry out development, refinement, verification, and similar work on it, such as interface development, extending logic, and adding functions on demand.

The source code/code blocks are converted into the target development language as follows:

S1. Divide the source code/code block into a plain-text replacement part and a function replacement part, according to the characteristics of its strings;

S2. Using the data base, replace the strings in the plain-text replacement part with the corresponding strings of the target development language. Replace the strings in the function replacement part with strings of equivalent meaning in the target development language, according to the data base and the meaning of each string.

The data base here is a set of correspondences between the language mechanisms of two languages. It is constructed in advance for the specific source language of the code blocks and the target language of the conversion.
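The sketch below illustrates the two replacement parts, using two small dictionaries as the "data base" for a hypothetical Java-to-C# conversion:

- Names immediately followed by `(` are replaced from the function table.
- Whole-word tokens are replaced from the plain-text table.

The table contents are examples only. A practical converter would need to parse the code rather than rely on regular expressions. Regular expressions, for instance, also rewrite text inside string literals.

```python
import re
from typing import Dict

# Hypothetical "data base" for one language pair (Java -> C#); example entries only.
PLAIN_TEXT: Dict[str, str] = {"boolean": "bool", "String": "string"}
FUNCTIONS: Dict[str, str] = {
    "System.out.println": "Console.WriteLine",
    "Math.max": "Math.Max",
}

CALL = re.compile(r"([A-Za-z_][\w.]*)(\s*\()")   # a (dotted) name followed by "("
WORD = re.compile(r"\b[A-Za-z_]\w*\b")            # a whole identifier or keyword


def convert(source: str,
            plain: Dict[str, str] = PLAIN_TEXT,
            functions: Dict[str, str] = FUNCTIONS) -> str:
    # Function-replacement part: swap known call names, keep "(" and spacing.
    text = CALL.sub(lambda m: functions.get(m.group(1), m.group(1)) + m.group(2),
                    source)
    # Plain-text part: swap whole-word tokens listed in the plain-text table.
    # Runs second, so plain-text keys must not collide with target call names.
    return WORD.sub(lambda m: plain.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    java = "boolean ok = Math.max(a, b) > 0; System.out.println(ok);"
    print(convert(java))
```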

Specifically, as shown in FIG. 2, step S1 comprises steps S11 to S14 described above.

As shown in FIG. 3, a Vector container of Content objects is defined first, and step S11 is specifically:

A1. Define a Content object, read several bytes from the target binary file, and store them in the Content object. The number of bytes read each time is determined by those skilled in the art according to the specific situation. For example, when disassembling 32-bit instructions, 4 bytes can be read at a time. Steps A2 and A3 then proceed as described above.

Step S12 comprises steps B1 and B2 described above.

Specifically, as shown in FIG. 4, the instruction control flow in step B1 is traced as follows:

B11. Set the PC value to 0;

B12. Fetch a specific Content object from the Vector container. For example, when decompiling instructions on a 32-bit system, each instruction is stored in 4 bytes. The specific Content object is then the one at index PC/4, that is, the PC address divided by 4 and truncated to an integer, which corresponds to the start address of a 4-byte instruction;

B13. Mark the fetched Content object as an instruction, and mark the instruction as visited;

B14. Determine whether the instruction marked in step B13 is a program-end instruction. If so, go to step B15; otherwise go to step B16;

B15. Check whether the explicit-address table is empty. If it is, tracing ends. Otherwise, take an Elem element from the table. A non-empty table means that Elem entries pushed along the call chain are still pending, and these entries still map to Content objects in the Vector container. The process is therefore repeated until the table contains no Elem elements, which achieves complete separation.

For each Elem element taken out, check whether the instruction at its addr address has been visited. If it has not, restore the saved context, including the PC value, and return to step B12 to continue tracing from addr. If it has, check the next Elem element in the same way, and end tracing once all Elem elements have been processed.

B16. Determine whether the instruction marked in B13 is a branch instruction. If it is, update the PC value, the explicit-address table, and the return table according to the specific branch instruction. Otherwise, advance the PC to the next instruction and return to step B12. This step forms a loop that pushes the branch instructions it recognizes onto the return table and the explicit-address table:

- The return table records the return address of each subroutine call.
- The explicit-address table receives, for each two-way branch instruction encountered, its explicit address and the context (the values of the program's registers).

Specifically, in step B16, the PC value, the explicit-address table, and the return table are updated as follows:

If it is an unconditional branch instruction (such as B, or MOV PC, 0x16), enter the address of the instruction and its explicit (target) address in the segment table, and use the explicit address as the current PC address;

If it is an unconditional subroutine-call instruction (BL), enter the address of the instruction in the segment table and the return address in the return-address table. Then enter the explicit address in the segment table, and use the explicit address as the current PC address;

If it is a return instruction (MOV PC, LR), which is also an unconditional branch, find the return address in the return-address table on a last-in, first-out basis. Enter the address of the instruction and the return address in the segment table, and use the return address as the current PC address;

If it is a two-way branch instruction (such as BEQ, or MOVEQ PC, 0x16), enter the explicit address in the explicit-address table, together with the current register values. Then use the implicit address (the address of the next sequential instruction) as the current PC address.

The segment table records, for every branch instruction except conditional branches, both the address of the instruction itself and its destination address. The segment table divides the code into segments, which makes the code clearer and facilitates subsequent disassembly and other work.
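Steps B11 to B16 can be sketched on a toy instruction list that has already been decoded. The op names follow the examples in the text: `B`, `BL`, `RET` for `MOV PC, LR`, `BEQ`, and `END`. The following are assumptions of this sketch:

- the tuple format of each instruction;
- the `Elem` fields;
- using a saved copy of the return table as the "context".

Two guards are additions not described above:

- A path also ends when it reaches an already-visited instruction or leaves the program, so that loops terminate.
- A call to an already-traced subroutine is skipped, on the assumption that the callee returns.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

WORD = 4  # assumes fixed 4-byte instructions, so the Content index is PC // 4


@dataclass
class Elem:
    """Explicit-address table entry: branch target plus the saved context."""
    addr: int
    saved_returns: List[int]   # stand-in for the saved register state


def next_path(explicit: List[Elem], visited: Set[int]) -> Tuple[Optional[int], List[int]]:
    """B15: pop Elem entries until one points at an unvisited instruction."""
    while explicit:
        elem = explicit.pop()
        if elem.addr // WORD not in visited:
            return elem.addr, list(elem.saved_returns)   # restore the context
    return None, []                                      # table empty: stop


def trace(program: List[Tuple[str, Optional[int]]]):
    """Trace a toy decoded program; return (visited indices, segment table)."""
    visited: Set[int] = set()
    segments: List[Tuple[int, int]] = []   # (branch address, destination)
    explicit: List[Elem] = []              # explicit-address table
    returns: List[int] = []                # return-address table (LIFO)
    pc: Optional[int] = 0                  # B11
    while pc is not None:
        idx = pc // WORD                   # B12: fetch the Content at PC / 4
        if not 0 <= idx < len(program) or idx in visited:
            pc, returns = next_path(explicit, visited)   # added guard
            continue
        visited.add(idx)                   # B13: mark as instruction, visited
        op, target = program[idx]
        if op == "END":                    # B14 -> B15
            pc, returns = next_path(explicit, visited)
        elif op == "B":                    # unconditional branch
            segments.append((pc, target))
            pc = target
        elif op == "BL" and target // WORD in visited:
            pc += WORD                     # added: callee already traced, skip call
        elif op == "BL":                   # subroutine call
            returns.append(pc + WORD)
            segments.append((pc, target))
            pc = target
        elif op == "RET" and returns:      # MOV PC, LR
            ret = returns.pop()            # last in, first out
            segments.append((pc, ret))
            pc = ret
        elif op == "RET":                  # no known caller: end this path
            pc, returns = next_path(explicit, visited)
        elif op == "BEQ":                  # two-way branch
            explicit.append(Elem(target, list(returns)))
            pc += WORD                     # continue at the implicit address
        else:                              # ordinary instruction
            pc += WORD
    return visited, segments


if __name__ == "__main__":
    prog = [("BL", 12), ("BEQ", 24), ("END", None), ("NOP", None), ("RET", None),
            ("DATA", None), ("B", 32), ("DATA", None), ("END", None)]
    print(trace(prog))
```

For this toy program, indices 5 and 7 are never reached and would be labeled as data in step B2. The segment table holds the pairs (0, 12), (16, 4), and (24, 32).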

Further, as shown in FIG. 5 and FIG. 6, step S13 is specifically as follows (step C1 is as described above):

C2. If the object is instruction code, disassemble it into assembly-instruction form; if it is data, translate it into its value. Because the code has been separated first, the disassembler can disassemble the instruction code in a targeted way and translate the data directly, which improves disassembly efficiency.

Further, the following steps are included between steps S13 and S14:

D1. Normalize the assembly instruction code into intermediate code (a low-level intermediate language, LIL), building various symbol tables during the conversion for use in later steps;

D2. Extract library functions by dynamically debugging the intermediate code, and distinguish system library functions from user-defined functions;

D3. Recover key information of the user-defined functions, including name, number of parameters, return value, and type.

Intermediate code is an internal representation of the source program that does not depend on the structure of the target machine. Normalizing assembly instructions into intermediate code before continuing with the remaining work helps in developing and porting the compiler program (robustness). It also makes it easier for users to optimize the code. Dynamic debugging means running the program, and a dynamic debugging tool such as OllyDbg can be used.
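Below is a toy version of D1. ARM-style assembly lines are turned into machine-independent `(operation, destination, sources)` tuples, and a symbol table records which statements use each operand. The mnemonic mapping and the tuple shape are assumptions for illustration and are not the LIL format used by the method. Only a few register and load forms are handled.

```python
from typing import Dict, List, Tuple

IR = Tuple[str, str, Tuple[str, ...]]   # (operation, destination, sources)

# Hypothetical mapping from a few ARM-style mnemonics to generic operations.
# "load" means: destination <- memory[source].
OPMAP = {"ADD": "add", "SUB": "sub", "MOV": "copy", "LDR": "load"}


def normalize(lines: List[str]) -> Tuple[List[IR], Dict[str, List[int]]]:
    """Turn assembly lines into machine-independent tuples and build a symbol
    table mapping each register/operand name to the statements that use it."""
    ir: List[IR] = []
    symbols: Dict[str, List[int]] = {}
    for n, line in enumerate(lines):
        mnemonic, _, rest = line.strip().partition(" ")
        operands = [tok.strip(" []") for tok in rest.split(",")]   # "[R2]" -> "R2"
        ir.append((OPMAP[mnemonic.upper()], operands[0], tuple(operands[1:])))
        for name in operands:
            if not name.startswith("#"):                # skip immediates like #4
                symbols.setdefault(name, []).append(n)
    return ir, symbols


if __name__ == "__main__":
    code, table = normalize(["ADD R0, R1, #4", "MOV R2, R0", "LDR R3, [R2]"])
    print(code)
    print(table)
```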

The disassembly tool and the decompilation tool can be a single tool capable of both disassembly and decompilation, or two separate tools. When a single tool is used, the function recognition program of steps D1 to D3 can be embedded in it or run alongside it. The workflow is then as follows:

1. The tool disassembles the target code into an assembly file and passes it to the function recognition program.
2. The function recognition program performs function recognition and returns the results to the tool.
3. The tool performs control-flow analysis, data-type analysis, and related work on the system library functions and the user-defined functions separately. This completes decompilation and produces a source-level result.

User-defined functions are separated from system library functions first, and their information is recovered before control-flow analysis, data-type analysis, and other work begin. This prevents user-defined data from interfering with that work, which improves efficiency and lowers the error rate.

Specifically, in step D2, user-defined functions are identified as follows:

E1. Prepare several calling programs. Each contains only one library-function call statement, and the programs differ only in the arguments passed to the called function;

E2. Run the calling programs prepared in step E1, and identify as user-defined any function whose effective operation instructions remain unchanged. Effective operation instructions here means the opcodes of the user-defined library-function instructions. These opcodes are fixed: they are not subject to address relocation during compilation and linking, and are not affected by different compiler versions or compiler optimizations. The effective operation instructions of the same library function are therefore the same across different calling programs, so the above steps can separate user-defined library functions from system library functions.
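The comparison in E2 can be sketched as follows. The sketch assumes that the instruction words of each candidate function have already been captured from every calling program in E1. It uses a hypothetical opcode mask (the top byte) to discard operand and address fields; a real instruction set needs a mask per instruction format.

```python
from typing import Dict, List, Sequence, Tuple

OPCODE_MASK = 0xFF000000   # hypothetical: treat the top byte as the opcode field


def opcode_signature(words: Sequence[int], mask: int = OPCODE_MASK) -> Tuple[int, ...]:
    """Keep only the opcode bits of each instruction word."""
    return tuple(w & mask for w in words)


def fixed_opcode_functions(observations: Dict[str, List[Sequence[int]]]) -> List[str]:
    """observations[name] holds the function's instruction words as captured in
    each calling program of step E1. Functions whose opcode signature is the
    same in every program are reported, as step E2 prescribes."""
    return sorted(
        name for name, runs in observations.items()
        if runs and len({opcode_signature(r) for r in runs}) == 1
    )


if __name__ == "__main__":
    seen = {
        "f": [[0xE1000001, 0xE2000005], [0xE1000002, 0xE2000009]],  # operands vary
        "g": [[0xE1000000], [0xEB000000]],                          # opcode varies
    }
    print(fixed_opcode_functions(seen))
```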

Further, the addresses of the user-defined functions determined in step D2 are checked and the function with the highest address is selected. That function and all functions at lower addresses are identified as user-defined functions. As shown in FIG. 7, the code of user-defined library functions is stored contiguously, in the same order as their definitions in the source program. User-defined library-function code sits at low addresses and system library-function code at high addresses, so this method effectively separates user-defined library functions from system library functions.
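The address check can be written as a single cut point. This assumes the FIG. 7 layout, with user-defined code at low addresses and system library code above it, and a table of function start addresses. The function names and addresses in the example are made up.

```python
from typing import Dict, List, Set


def user_defined_by_address(start: Dict[str, int], candidates: Set[str]) -> List[str]:
    """Take the highest start address among the D2 candidates and classify that
    function and every lower-addressed one as user-defined (FIG. 7 layout:
    user code low, system library code high). Result is sorted by address."""
    if not candidates:
        return []
    boundary = max(start[name] for name in candidates)
    return sorted((name for name, addr in start.items() if addr <= boundary),
                  key=start.__getitem__)


if __name__ == "__main__":
    # Made-up start addresses for illustration.
    addrs = {"main": 0x1000, "helper": 0x1040, "parse": 0x1080,
             "printf": 0x2000, "memcpy": 0x2100}
    print(user_defined_by_address(addrs, {"main", "parse"}))
```

Here `helper` is picked up even though D2 did not flag it, because it lies below the highest confirmed user-defined function.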

The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art can make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Although terms such as target software, code block, and target code are used frequently herein, the use of other terms is not excluded. These terms are used only to describe and explain the essence of the invention more conveniently. Interpreting them as any additional limitation would be contrary to the spirit of the invention.

Claims

1. A software development method based on reverse technology, characterized by comprising the following steps:

S1. decompiling target software to obtain source code;

S2. performing functional analysis on the target software, and determining the correspondence between each code block in the source code and each function;

S3. deconstructing and analyzing the target software to obtain software information of the target software, including a system deployment environment, a system architecture, a scheduling mode, an interface specification, a communication protocol, a reference mode, and a data storage mode;

S4. reconstructing the system based on the software information, and filling the code blocks that correspond to functions into the reconstructed system.

2. The software development method based on reverse technology according to claim 1, wherein step S1 specifically comprises:

S11. reading the target code of the target software into memory;

S12. analyzing the target code to separate instruction code and data;

S13. disassembling the target code with a disassembly tool to obtain an assembly file;

S14. decompiling the assembly file with a decompilation tool to obtain source code.

3. The software development method based on reverse technology according to claim 2, wherein step S11 specifically comprises:

A1. reading a plurality of bytes from the target binary file and storing them in a Content object;

A2. storing the Content object in a Vector container;

A3. repeating steps A1 and A2 until the end of the file.

4. The software development method based on reverse technology according to claim 3, wherein step S12 specifically comprises:

B1. tracing the instruction control flow, traversing and identifying each instruction;

B2. identifying the code portions reachable by the instruction flow as instruction code, and the remaining portions as data.

5. The software development method based on reverse technology according to claim 4, wherein step S13 specifically comprises:

C1. sequentially taking the objects out of the Vector container, and determining whether each object is instruction code or data according to the result of separating instruction code and data;

C2. if the object is instruction code, disassembling it into assembly-instruction form with the disassembly tool; if the object is data, translating it into the value of the data, either directly or with the disassembly tool.

6. The software development method based on reverse technology according to claim 5, further comprising, between steps S13 and S14:

D1. normalizing the assembly instruction code into intermediate code;

D2. extracting library functions, and identifying system library functions and user-defined functions;

D3. recovering key information of the user-defined functions, including name, number of parameters, return value, and type;

wherein, in step S14, the decompilation tool decompiles the system library functions and the user-defined functions separately.

7. The software development method based on reverse technology according to any one of claims 1-6, characterized in that, in step S2, the functions of the target software are determined according to the operation manual and help documents and by dynamically running the target software.

8. The software development method based on reverse technology according to claim 7, wherein, in step S2, key code is located and extracted by dynamically debugging the source code, and the correspondence between the key code and the functions is marked according to the determined functions.

9. The software development method based on reverse technology according to claim 8, wherein, in step S3, the target software is deconstructed and analyzed by static analysis and/or dynamic analysis.

10. The software development method based on reverse technology according to claim 9, wherein step S4 is preceded by converting the source code/code blocks into a target development language;

and, in step S4, the reconstructed system is an open system with open interfaces for user development, refinement, and verification.