CN110347448B

CN110347448B - Method for constructing runtime model of terminal application behavior

Info

Publication number: CN110347448B
Application number: CN201910498727.XA
Authority: CN
Inventors: 蔡华谦; 黄罡; 张颖; 刘譞哲
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2019-06-10
Filing date: 2019-06-10
Publication date: 2021-02-12
Anticipated expiration: 2039-06-10
Also published as: CN110347448A; WO2020248512A1

Abstract

The invention discloses a method for constructing a runtime model of terminal application behavior. A behavior interpreter generates a complete, accurate, and detailed self-description of application behavior, namely the runtime model of terminal application behavior. This overcomes the shortcomings of the prior art in monitoring terminal application behavior in a dynamic, changeable, and hard-to-control application runtime environment, achieves flexible and complete monitoring of terminal application behavior, and provides technical support for subsequently implementing instruction-level control of terminal application behavior.

Description

Method for constructing runtime model of terminal application behavior

Technical Field

The invention relates to computer technology, in particular to a method for constructing a runtime model of terminal application behaviors.

Background

Internetware (also called a terminal application) is an abstraction of the basic form of a software system in the open, dynamic, and changeable environment of the Internet. It is a natural extension of the traditional software structure and has basic characteristics that distinguish it from traditional software developed in a centralized, closed environment:

1) Autonomy: the software entities in an internetware system are relatively independent, proactive, and adaptive. Autonomy distinguishes them from the dependent and passive software entities of traditional software systems.

2) Cooperativity: the software entities in an internetware system interconnect, intercommunicate, cooperate, and form alliances in an open network environment through a variety of static connection and dynamic cooperation modes. Cooperativity distinguishes the system from the single static connection mode of a traditional software system in a closed, centralized environment.

3) Reactivity: internetware can perceive its external operating and usage environment and provide useful information for the evolution of the system. Reactivity gives the internetware system the sensing capability needed to adapt to open, dynamic, and changeable environments.

4) Evolvability: the internetware structure can evolve dynamically as application requirements and the network environment change, which shows mainly in the variability of the number of elements, the adjustability of structural relationships, and the dynamic configurability of the structural form. Evolvability gives the internetware system the adaptability needed in open, dynamic, and changeable environments.

5) Polymorphism: the effect of an internetware system exhibits compatible multiple objectives. Following some basic cooperation principles, it can satisfy multiple compatible goals in a dynamically changing network environment. Polymorphism gives the internetware system a degree of flexibility and the ability to meet personalized requirements in a network environment.

The implementation of the above described internetware features often requires modification of the software in the running state to ensure or improve quality, optimize or add new functions. Classical software engineering methods and techniques emphasize modifying software in the development state and do not support direct modification of software in the run state.

Correspondingly, system software such as programming languages, operating systems, and middleware provides a common core mechanism for monitoring and controlling applications in the running state, namely computational reflection. Various development and testing frameworks can be built on computational reflection, which improves developer efficiency in coding, testing, and even deployment. In the computing field, B. Smith gave a general definition of reflection: reflection is the ability of an entity to describe, manipulate, and process itself in the same way that it handles its primary problem domain. This definition was subsequently interpreted as follows: reflection is the ability of a program to manipulate, at runtime, a set of data that describes the running state of the program. The manipulation has two aspects: 1) monitoring (introspection), in which the program observes and reasons about its own state; 2) control (intercession), in which the program changes its own execution or semantics. Both aspects require the execution state of the program to be encoded as data, and providing such an encoding is referred to as reflection; that is, reflection essentially maps the running state of a program to a set of operable data. The running state constitutes the base-level entity, the operable data constitutes the meta-level entity, and a causal connection is maintained between the two. Computational reflection is mainly divided into structural reflection and behavioral reflection according to the kind of base-level entity. The base-level entity of structural reflection is the current program and its abstract data types (which can be regarded as the state of the application); the base-level entity of behavioral reflection is the execution behavior of the current program and the data required for execution (which can be regarded as the behavior of the application).

Structural reflection refers to the ability of a programming language to reflect the current program and its abstract data types. Because it corresponds naturally to capabilities that the programming language framework already has, it is built into most programming language frameworks (runtimes or frameworks).
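As a minimal illustration of structural reflection (not part of the claimed method), the following Python sketch uses only the built-ins `vars`, `type`, and `setattr`. The `Account` class and the helper names `describe` and `patch` are hypothetical and exist only for this example.

```python
class Account:
    """Hypothetical application class used only for illustration."""

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance


def describe(obj):
    """Introspection: encode the object's type and current fields as plain data."""
    return {"type": type(obj).__name__, "fields": dict(vars(obj))}


def patch(obj, field, value):
    """Intercession: change the running object through its meta-level description."""
    if field not in vars(obj):
        raise AttributeError(field)
    setattr(obj, field, value)


acct = Account("alice", 10)
snapshot = describe(acct)      # state observed before the change
patch(acct, "balance", 25)     # state changed at runtime
```

The snapshot is ordinary data, and the change made through `patch` is immediately visible in the running object, which is the causal connection between base level and meta level described above.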

Behavioral reflection refers to the ability of a programming language to reflect its own execution semantics and the data required for execution; that is, the programming language framework itself must be reflected. Behavioral reflection faces two challenges in monitoring and control. First, the behavior of the existing application must be described completely, that is, the execution of the application must be monitored. The execution of an application can be regarded as a set of runtime activities: the finer the granularity of the activities, the richer the monitored information, but also the more resources the monitoring function occupies and the more severe the resource competition between monitoring and business logic. The complexity and scalability of application behavior monitoring therefore become the primary challenge for behavioral reflection of terminal applications. Second, behavioral reflection in existing system software such as programming languages, operating systems, and middleware does not support instruction-level behavior control. The fundamental reason is the complex data and control dependencies contained in instruction sequences, so instruction-level control of application behavior becomes the main difficulty for behavioral reflection of terminal applications.

Disclosure of Invention

The main object of the present invention is to provide a method for constructing a runtime model of a terminal application behavior, which overcomes the first challenge and realizes complete monitoring of the runtime behavior of the terminal application.

The invention is realized by the following technical scheme:

In order to solve the above technical problems, the invention provides a method for constructing a runtime model of terminal application behavior, wherein the runtime model comprises a runtime stack model and a runtime heap model, and the method comprises the steps of constructing the runtime stack model of the terminal application behavior and constructing the runtime heap model of the terminal application behavior;

the step of constructing the runtime stack model of the terminal application behavior comprises:

when the terminal application runs, acquiring the code that is actually executed in the memory of the terminal application, and abstracting the actually executed code to generate a control flow graph;

for the control flow graph, inputting the control flow graph to be monitored into a preset behavior interpreter;

interpreting and executing the control flow graph to be monitored by using the behavior interpreter, to generate the stack activities of the terminal application at runtime;

generating the dependency relationships between the control flows of the stack activities of the terminal application at runtime, to obtain the runtime stack model of the terminal application behavior;

the step of constructing the runtime heap model of the terminal application behavior comprises:

generating the initial state of the heap area of the terminal application at runtime;

and generating heap operation activities, to obtain the runtime heap model of the terminal application behavior.

Further, the method uses a class filter and an activity type filter; the class filter performs coarse-grained filtering based on regular-expression matching of package names and class names, and removes program activities that developers are not interested in; the activity type filter performs fine-grained filtering based on activity types, and removes activity types that developers are not interested in.

Further, the activity types of the stack activity comprise method start and method end, field reading, array reading and synchronization instructions;

the step of interpreting and executing the control flow graph to be monitored by using the behavior interpreter, to generate the stack activities of the terminal application at runtime, comprises:

interpreting and executing the control flow graph to be monitored by using a behavior interpreter that can monitor the application behavior of the terminal application, to obtain the activities of the terminal application at runtime;

according to the classes of interest, performing coarse-grained filtering on the runtime activities of the terminal application by using the class filter, to generate the stack activities caused by those classes;

and, for the activity types of the stack activities, performing fine-grained filtering on the stack activities by using the activity type filter.
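The two-stage filtering described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the `Activity` record, the activity type strings, and the class names in the sample trace are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    class_name: str  # fully qualified class that caused the activity (assumed field)
    kind: str        # activity type, e.g. "method_start" (assumed encoding)


def class_filter(activities, patterns):
    """Coarse-grained: keep activities whose class name matches any regular expression."""
    compiled = [re.compile(p) for p in patterns]
    return [a for a in activities if any(c.fullmatch(a.class_name) for c in compiled)]


def activity_type_filter(activities, kinds):
    """Fine-grained: keep only the activity types the developer is interested in."""
    wanted = set(kinds)
    return [a for a in activities if a.kind in wanted]


# Hypothetical activities produced by interpreting a control flow graph.
trace = [
    Activity("com.example.news.PushTask", "method_start"),
    Activity("com.example.news.PushTask", "field_read"),
    Activity("android.os.Handler", "method_start"),
    Activity("com.example.news.PushTask", "method_end"),
]
stack_activities = class_filter(trace, [r"com\.example\..*"])
selected = activity_type_filter(stack_activities, {"method_start", "method_end"})
```

Running the class filter first keeps the cheaper, coarser test in front, so the per-type check only sees activities from classes of interest.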

Further, in the step of constructing the runtime heap model of the terminal application behavior:

the activities of the terminal application at runtime include instantiation activities, modification activities, and reclamation activities.

Further, the activity types of the heap operation activities comprise object instantiation, array instantiation, object field writing, array element writing, clearing activity, and compaction activity;

the step of generating heap operation activities comprises:

according to the classes of interest, performing coarse-grained filtering on the runtime activities of the terminal application by using the class filter, to generate the heap operation activities caused by those classes;

and, for the activity types of the heap operation activities, performing fine-grained filtering on the heap operation activities by using the activity type filter.

Further, the dependency relationship includes a synchronization dependency and a communication dependency.

Further, when generating a synchronization dependency between control flows: for the case where the end of one method depends on the end of another method, timestamps are used to search backward, from the latest activity to the earliest, for a matching activity in other threads; if one is found, the pair corresponds to one synchronization dependency. For the case where the start of one activity depends on the end of another activity, the current thread is checked first: if the activity is the first activity executed by the current thread, it depends on an activity ending in another thread; otherwise it is only an ordinary method call and does not depend on any activity of another thread.
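The backward search and the first-activity check can be sketched as follows. This is a minimal Python sketch that assumes the trace is already sorted by timestamp; the `Activity` record, the helper names, and the sample trace (a main thread starting and joining a worker) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    thread: str
    timestamp: int
    kind: str    # "method_start" or "method_end" (assumed encoding)
    method: str


def match_backwards(trace, idx, is_match):
    """Scan earlier activities from latest to earliest for a match in another thread."""
    current = trace[idx]
    for j in range(idx - 1, -1, -1):
        candidate = trace[j]
        if candidate.thread != current.thread and is_match(candidate):
            return j
    return None


def is_first_in_thread(trace, idx):
    """Only the first activity of a thread may depend on another thread's activity."""
    return all(a.thread != trace[idx].thread for a in trace[:idx])


# Assumption: the trace is ordered by timestamp.
trace = [
    Activity("main", 1, "method_end", "Thread.start"),
    Activity("worker", 2, "method_start", "run"),
    Activity("main", 3, "method_start", "Thread.join"),
    Activity("worker", 4, "method_end", "run"),
    Activity("main", 5, "method_end", "Thread.join"),
]
# End-depends-on-end: the end of join is matched with the end of run.
join_end_source = match_backwards(
    trace, 4, lambda a: a.kind == "method_end" and a.method == "run"
)
# Start-depends-on-end: run is the worker's first activity, so look for Thread.start.
run_start_source = (
    match_backwards(trace, 1, lambda a: a.method == "Thread.start")
    if is_first_in_thread(trace, 1)
    else None
)
```

In this sample, `join_end_source` and `run_start_source` are indexes into `trace`; each (source, target) pair would become one cross-thread edge in the runtime stack model.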

Further, when generating the dependency relationships between the control flows of the stack activities, all classes involved in communication dependencies between activities are collected, and the methods of these classes, together with the methods involved in thread dependencies, are used as the knowledge base for generating communication dependencies.

Further, when the runtime model is generated, the activity sequence in the runtime model is stored in a buffer area with a configurable size, and when the number of activities exceeds a preset number, the activities in the buffer area are serialized and persisted in a local storage.
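The buffering scheme above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the JSON Lines file format, the `ActivityBuffer` class, and the activity dictionaries are illustrative choices, since the source does not specify the serialization format.

```python
import json
import os
import tempfile


class ActivityBuffer:
    """Keeps activities in memory and persists them to local storage when full."""

    def __init__(self, path, capacity):
        self.path = path
        self.capacity = capacity  # configurable threshold (preset number of activities)
        self.items = []

    def append(self, activity):
        self.items.append(activity)
        if len(self.items) > self.capacity:
            self.flush()

    def flush(self):
        # Assumed format: one JSON document per line, appended to a local file.
        with open(self.path, "a", encoding="utf-8") as out:
            for item in self.items:
                out.write(json.dumps(item) + "\n")
        self.items.clear()


path = os.path.join(tempfile.mkdtemp(), "activities.jsonl")
buf = ActivityBuffer(path, capacity=2)
for seq in range(5):
    buf.append({"seq": seq, "kind": "method_start"})
with open(path, encoding="utf-8") as f:
    persisted = [json.loads(line) for line in f]
```

With a capacity of 2, the third append exceeds the threshold and writes the first three activities to disk; the last two stay in memory until the next flush.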

Further, the runtime heap model is represented in Backus-Naur Form (BNF).

Compared with the prior art, the invention uses the behavior interpreter to generate a complete, accurate, and detailed self-description of application behavior, namely the runtime model of terminal application behavior. It overcomes the shortcomings of the prior art in monitoring terminal application behavior in a dynamic, changeable, and hard-to-control application runtime environment, achieves flexible and complete monitoring of terminal application behavior, and provides technical support for subsequently implementing instruction-level control of terminal application behavior.

Drawings

Fig. 1 is a prior art 3G radio resource control state machine;

FIG. 2(a) is a schematic diagram of the flow of control of network requests before merging in an example of merging of network requests;

FIG. 2(b) is a schematic diagram of a flow of control of a merged network request in an example of merging network requests;

FIG. 3 is a schematic diagram of an example of communication dependency between threads: a producer-consumer model;

FIG. 4 is a flow chart of the steps of a method of constructing a runtime model of terminal application behavior in accordance with the present invention;

FIG. 5 is an Android multithreading example;

FIG. 6 is an example of dependencies between threads in multithreaded programming;

FIG. 7(a) shows the heap area objects before execution;

FIG. 7(b) shows the heap area objects after execution;

FIG. 8 is a schematic diagram of an exemplary refletall model generation subsystem architecture of the present invention;

FIG. 9 is a schematic structural diagram of an interface operation subsystem of an example refletall of the present invention;

FIG. 10(a) is the experimental results on the open source application set;

FIG. 10(b) is the experimental results on the closed source application set;

FIG. 11 is a comparison of application launch time results for refletall and Emma when generating code coverage reports.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the following embodiments and the accompanying drawings.

To better explain the technical problem addressed by this application, the invention analyzes application function evolution scenarios from two typical cases, so as to determine the root cause of why existing behavioral reflection is inapplicable.

Case one:

With the development of smartphones, mobile applications on terminals increasingly rely on software and hardware resources provided by the cloud to deliver better services. However, communication between the cloud and the terminal consumes a large amount of power. Networked applications (such as weather, mail, and news) exhibit the typical componentized character of internetware: the components on the terminal and in the cloud communicate over the network. In particular, in a 3G/4G environment, a networked application runs in the background for long periods and uses the network at intervals to fetch push messages. Such long-running, intermittent message pushing places a great strain on the battery life of smartphones with limited battery capacity. 3G and 4G are the mainstream mobile cellular networks in current use, and their power consumption characteristics are relatively complex. On the one hand, because cellular networks are highly mobile, a mobile device may quickly hand off to a different base station as its physical location changes, so a base station cannot keep a channel allocated to a mobile device at all times. On the other hand, because a mobile device has limited battery life, staying connected to the base station for long periods greatly increases its power consumption and shortens its battery life. Therefore, the cellular network standards further define the states of the Radio Resource Control (RRC) module.

Taking the 3G network module in a mobile device as an example, there are three states in total, as shown in FIG. 1.

(1) IDLE: the idle state, in which the 3G module consumes the least power and cannot transmit or receive any data. If data is to be transmitted or received in this state, the module transitions to the CELL_DCH state.

(2) CELL_DCH: in this state the bandwidth of the 3G module is at its maximum and data can be transmitted at the maximum rate, while power consumption is also at its maximum. If no data is transmitted for a period of time, the module transitions to the CELL_FACH state. The time the module keeps operating in the CELL_DCH state is typically 5 to 10 seconds, depending on the settings of different operators.

(3) CELL_FACH: in this state the power consumption of the 3G module is 50% lower than in CELL_DCH, and the network transmission rate is also lower. If the amount of data sent or received in this state exceeds a certain threshold, the module transitions back to the CELL_DCH state. If no data is transmitted or received for a period of time in the CELL_FACH state, the module transitions to the IDLE state. This period is typically 10 to 15 seconds.
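The three-state machine above can be sketched as a function of the time since the last data transfer. This is a simplified Python sketch, not a model of any real modem: the function name `rrc_state` and the timeout values (5 s and 10 s, the lower ends of the ranges quoted above) are assumptions, and the data-volume threshold for moving from CELL_FACH back to CELL_DCH is ignored, so every transfer is treated as promoting the module to CELL_DCH.

```python
def rrc_state(t, transfers, dch_timeout=5.0, fach_timeout=10.0):
    """Return the RRC state at time t, given the times at which data was transferred.

    Simplification: any transfer promotes the module to CELL_DCH, and transfers
    are treated as instantaneous.
    """
    past = [x for x in transfers if x <= t]
    if not past:
        return "IDLE"
    idle_for = t - max(past)  # seconds since the most recent transfer
    if idle_for < dch_timeout:
        return "CELL_DCH"
    if idle_for < dch_timeout + fach_timeout:
        return "CELL_FACH"
    return "IDLE"


# One transfer at t = 1 s, observed at several later times.
timeline = [rrc_state(t, [1.0]) for t in (0.0, 3.0, 8.0, 16.0)]
```

The sample timeline walks through IDLE before the transfer, CELL_DCH shortly after it, CELL_FACH once the first timer expires, and IDLE again once the second timer expires.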

FIGS. 2(a) and 2(b) show an example of network request merging. FIG. 2(a) shows the network requests and the power consumption of the wireless communication module before merging. The horizontal axis is time; the upper half shows the power consumption of the wireless communication module; in the lower half, the dotted lines are the threads that initiate the two network requests and the solid lines are their control flows. First, a background news push thread wakes up the thread responsible for sending the network request. After being woken up, that thread initiates the network request, and the wireless communication module goes from the low power consumption of the IDLE state to the high power consumption of the CELL_DCH state. After the whole request is completed, the thread responsible for sending the network request returns the result to the news push thread. At this point, although the wireless communication module is no longer receiving or sending data, it remains in a high-power state; the power consumed from this point on is called "tail time power consumption" and corresponds to the hatched portion of FIG. 2(a). After receiving the returned result, the news push thread processes it and shows a prompt in the notification bar. After a while, another thread, the version update thread, executes similar logic and also sends a network request. As shown in FIG. 2(a), because the two network requests are tens of seconds apart, the wireless communication module is woken up twice and there are two corresponding "tail times", which causes additional network energy consumption.

For Android applications, a significant portion of background requests, such as the news push and version update push described above, can be delayed by tens of seconds, or even two or three minutes, without affecting the user experience. For these network requests, the "tail time" network power consumption can be reduced by merging them in the time dimension, that is, by sending the two requests at the same time instead of tens of seconds apart. FIG. 2(b) shows the power consumption of the wireless communication module and the control flows after the two requests of FIG. 2(a) are merged. First, after the thread responsible for sending the network request is woken up by the news push thread, it does not send the network request directly but enters a waiting state. After a period of time, another network request thread is woken up by the background version update push thread and likewise enters a waiting state. When the waiting state ends, the two threads send their network requests at the same time, and the wireless communication module is woken up only once. As shown in FIG. 2(b), the power consumption after merging is much lower than before merging.
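The saving from merging can be made concrete with a small calculation. This is a Python sketch under loudly stated assumptions: power is in relative units (CELL_DCH = 1.0, CELL_FACH = 0.5, following the 50% figure above, IDLE counted as 0), the timeouts are 5 s and 10 s, transfers are instantaneous, and the function name `tail_energy` is hypothetical. None of these numbers are measurements from the source.

```python
def tail_energy(transfers, dch_timeout=5.0, fach_timeout=10.0,
                dch_power=1.0, fach_power=0.5):
    """Relative energy spent in the high-power states after each transfer."""
    times = sorted(transfers)
    total = 0.0
    for k, start in enumerate(times):
        # Time until the next transfer resets the timers (infinite for the last one).
        gap = times[k + 1] - start if k + 1 < len(times) else float("inf")
        in_dch = min(gap, dch_timeout)
        in_fach = min(max(gap - dch_timeout, 0.0), fach_timeout)
        total += dch_power * in_dch + fach_power * in_fach
    return total


separate = tail_energy([0.0, 40.0])   # two requests 40 s apart: two tails
merged = tail_energy([40.0, 40.0])    # both requests sent together: one tail
```

Under these assumptions the separated requests pay for two full tails and the merged requests for one, which mirrors the comparison between FIG. 2(a) and FIG. 2(b).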

Implementing network request merging requires: 1) a network request scheduling mechanism, so that a network request that would originally be sent directly can be sent later; and 2) a network request scheduling algorithm, which identifies the requests that can be delayed and uses the scheduling mechanism to delay them. Structural reflection can be used to automatically refactor the network request execution logic of a mobile application and build a scheduling mechanism into the application. However, this requires the developers of different applications to all use the same automatic refactoring framework, and every application must be recompiled, redeployed, and rerun. This is clearly impractical for the large number of closed-source applications that belong to different developers.
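The split between scheduling mechanism and scheduling algorithm can be illustrated with a small sketch. The source does not specify the algorithm, so the policy below is purely an assumption: deferrable requests wait up to `max_delay` seconds, and all pending requests are released together when the earliest deadline expires or when a non-deferrable request is sent. The `RequestScheduler` class and its method names are hypothetical.

```python
class RequestScheduler:
    """Delays deferrable requests so that several can be sent in one radio wake-up."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.pending = []  # list of (deadline, request)

    def submit(self, now, request, deferrable):
        """Return the requests to send right now (possibly none)."""
        if not deferrable:
            # Piggyback everything that is waiting on this unavoidable wake-up.
            return self.flush() + [request]
        self.pending.append((now + self.max_delay, request))
        return []

    def tick(self, now):
        """Release the whole batch once the earliest deadline has been reached."""
        if self.pending and min(d for d, _ in self.pending) <= now:
            return self.flush()
        return []

    def flush(self):
        batch = [r for _, r in self.pending]
        self.pending.clear()
        return batch


scheduler = RequestScheduler(max_delay=60.0)
sent = []
sent += scheduler.submit(0.0, "news", deferrable=True)
sent += scheduler.submit(30.0, "update", deferrable=True)
early = scheduler.tick(59.0)
batch = scheduler.tick(60.0)
```

Here `submit` and `tick` are the mechanism, while the choice of which requests are deferrable and for how long is the part a real scheduling algorithm would have to decide.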

Case two:

With the popularization of WeChat, it is no longer just a simple communication application: it has become an essential tool for work communication; WeChat Moments and Official Accounts are used for marketing; and it has become the largest self-media publishing platform. At its core WeChat is a communication tool whose functions mainly serve the needs of ordinary users. Even so, it struggles to meet some specific needs of ordinary users. For example, the longer WeChat is used, the larger the cached chat record files become, and it is difficult for an ordinary user to manage his or her own chat records. It is even harder for WeChat to meet the specific needs of particular groups such as micro-merchants and self-media practitioners. To open up and share the data and functions of the WeChat application, its user-oriented user interface needs to be converted into an interoperation-oriented programmable interface. In general, execution behind a user-oriented user interface starts with operations on user interface elements such as clicking, dragging, and typing. After some logic processing, external resources are accessed through network requests, database queries, and file reads and writes, to obtain the corresponding data or perform the corresponding functions. Most of the logic in this processing flow is similar to the execution logic of an interoperation-oriented programmable interface; only the starting point of execution differs. However, the granularity of monitoring and control in existing behavioral reflection is the method level. With existing behavioral reflection, which inserts some execution logic into the execution flow of the existing application, it is difficult to convert a user-oriented user interface into an interoperation-oriented programmable interface: an existing function may correspond to a set of program activities at runtime, and method-granularity behavioral reflection is limited in what it can monitor; it cannot monitor the execution of instructions inside a method and therefore cannot control them. As a result, existing solutions usually depend on the existing code and documentation, and developer turnover, missing documentation, or even non-standard source code comments can make iterative development of a mobile application difficult for the development team.

The two case analyses above show that the fundamental reason for the difficulty in implementing a mobile application interoperation interface is that existing work lacks a complete and detailed self-description of application behavior and offers no way to control that behavior at instruction granularity. Therefore, providing a runtime model that describes application behavior completely and is operable is both the difficulty and the key to solving the problem addressed by the present invention.

In view of the above technical problems, an embodiment of the present invention provides a method for constructing a runtime model of terminal application behavior, where the runtime model includes a runtime stack model and a runtime heap model.

When an application runs in the operating system, it appears as one or more processes: the operating system loads the executable files that the mobile application needs into memory and begins execution. In general, the memory occupied by a process can be divided into three areas:

Code segment: the memory area that stores the executable code; it is read-only.

Heap area: it can be divided into a memory area (data segment) that stores global variables and a memory area that is dynamically allocated while the process runs. For example, in the object-oriented programming language Java, when a thread creates a new object, this amounts to requesting a block of memory in the heap area.

Stack area: used for temporary storage of local variables and the like. For example, in the object-oriented programming language Java, when a thread calls a method, a new frame is allocated, and data such as the parameters required by the method are stored in the frame.

The inventors found through careful study that, when the terminal application runs, execution of the code segment causes the memory data in the stack area and the heap area to change. The runtime model of an application needs to reflect: 1) the execution of the code: at development time, the code of a mobile application can be abstracted into a control flow graph; correspondingly, at runtime, the execution of the code can be abstracted into one or more paths of the control flow graph; 2) changes to memory data (for example, in the heap area): at development time, a developer designs various data structures to represent the data model of the application; at runtime, execution of the code causes instances of these data structures to be created, modified, and deleted, which corresponds to a set of memory allocation and modification operations. From the perspective of memory areas, the areas most affected by program execution are the stack area and the heap area. The paths of the control flow graph in 1) can be regarded as a description of stack changes, while the changes in heap area data are mainly reflected in 2).

Thus, the application runtime model constructed by the present invention includes a runtime stack model that describes stack changes and a runtime heap model that describes heap changes. The runtime stack model also covers acquisition of the code, so that all three memory areas occupied by a process are accounted for. Through the runtime stack model of the embodiment of the invention, the code execution status of the mobile application at any moment can be known; through the runtime heap model, the state of the object data on which code execution depends can be known at any moment.

Runtime stack model

The control flow graph is a directed graph $G = \langle B, P \rangle$,

where $B = \{b_1, b_2, \ldots, b_n\}$ is the set of basic blocks and

$P \subseteq B \times B$ is the set of control flow paths:

for any $p_i = (b_{i1}, b_{i2})$, $p_i \in P$ if and only if $b_{i2}$ may be executed after $b_{i1}$. At runtime, the control flow graph is instantiated into one or more control flows, and the basic blocks are executed along paths in the control flow graph. The invention refers to a basic block executed at a certain moment as an activity, and the runtime stack model over a period of time consists of a control flow graph, one or more control flows, and a set of activity sequences. When the granularity of a basic block is instruction granularity, the activity sequence is an instruction execution sequence. A formal definition of the runtime stack model described in the present invention is given below.
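The graph defined above can be represented directly as a set of blocks and a set of edges. This is a minimal Python sketch; the `ControlFlowGraph` class, the `follows` method, and the block names of the sample loop are hypothetical.

```python
class ControlFlowGraph:
    """Directed graph G = <B, P>: basic blocks B and control flow paths P."""

    def __init__(self, blocks, paths):
        self.blocks = set(blocks)
        self.paths = set(paths)
        for src, dst in self.paths:
            if src not in self.blocks or dst not in self.blocks:
                raise ValueError(f"edge ({src}, {dst}) uses an unknown block")

    def follows(self, sequence):
        """True if every consecutive pair of executed blocks is an edge in P."""
        return all(b in self.blocks for b in sequence) and all(
            (a, b) in self.paths for a, b in zip(sequence, sequence[1:])
        )


# Hypothetical graph of a simple loop: entry -> check -> (body -> check)* -> exit.
cfg = ControlFlowGraph(
    blocks=["entry", "check", "body", "exit"],
    paths=[("entry", "check"), ("check", "body"), ("body", "check"), ("check", "exit")],
)
one_iteration = cfg.follows(["entry", "check", "body", "check", "exit"])
skips_check = cfg.follows(["entry", "body"])
```

An executed block sequence of a single control flow is valid exactly when `follows` accepts it, which is the sense in which a control flow is a path of the graph.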

The runtime stack model is defined as the set of activities of one or more control flows that occur over a period of time, $M = \langle G, T, A, I, E \rangle$,

where $G = \langle B, P \rangle$ is the control flow graph, $T = \{t_1, t_2, \ldots, t_n\}$ is a set of times, and $I = \{i_1, i_2, \ldots, i_n\}$ denotes the heap states of the program at times $t_1$ to $t_n$.

Let $F = \{f_1, f_2, \ldots, f_n\}$ be the set of control flows; then $A \subseteq F \times I \times T \times B$ is the set of activities that occur over the period of time, and

$E \subseteq A \times A$ represents the set of ordering relations between pairs of activities.

The runtime stack model may be viewed as a collection of multiple paths of the control flow graph; therefore, every edge in the runtime stack model must have a corresponding edge in the control flow graph. Namely:

$\forall (a_i, a_j) \in E:\ (b_{i3}, b_{j3}) \in P$,

where $a_i = (f_{i1}, t_{i2}, b_{i3})$ and $a_j = (f_{j1}, t_{j2}, b_{j3})$ (the heap-state component is omitted in this notation). In addition, the edges of the runtime stack model represent the order in which two activities occur: two activities in the same control flow occur in chronological order; if an edge exists between two activities in different control flows, there is also a dependency between the two activities.

If two activities are ordered one after the other in the same control flow, no other activity of that control flow can occur between them, i.e.

$\forall (a_i, a_j) \in E$, where $a_i = (f_{i1}, t_{i2}, b_{i3})$ and $a_j = (f_{j1}, t_{j2}, b_{j3})$: if $f_{i1} = f_{j1}$, then $\nexists\, a_k = (f_{k1}, t_{k2}, b_{k3}) \in A$ such that $f_{k1} = f_{i1}$ and $t_{i2} < t_{k2} < t_{j2}$.

If two activities in different control flows are ordered one after the other, then in the control flow of the later activity, other activities may occur first after the time at which the earlier activity occurred:

$\forall (a_i, a_j) \in E$, where $a_i = (f_{i1}, t_{i2}, b_{i3})$ and $a_j = (f_{j1}, t_{j2}, b_{j3})$: if $f_{i1} \neq f_{j1}$, then $t_{i2} < t_{j2}$.
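The constraints on the edges of the runtime stack model can be checked mechanically. This is a minimal Python sketch under stated assumptions: activities are written as (flow, time, block) triples as in the formulas, the heap state is left out, and the function name `edge_violations` and the violation labels are hypothetical.

```python
from collections import namedtuple

Activity = namedtuple("Activity", "flow time block")


def edge_violations(paths, activities, edges):
    """Return (label, a_i, a_j) for every edge that breaks a model constraint."""
    problems = []
    for a_i, a_j in edges:
        # Every model edge must correspond to an edge of the control flow graph.
        if (a_i.block, a_j.block) not in paths:
            problems.append(("not_in_cfg", a_i, a_j))
        # The earlier activity must occur strictly before the later one.
        if not a_i.time < a_j.time:
            problems.append(("order", a_i, a_j))
        # Within one control flow, no other activity may lie between the two.
        elif a_i.flow == a_j.flow and any(
            a.flow == a_i.flow and a_i.time < a.time < a_j.time for a in activities
        ):
            problems.append(("gap", a_i, a_j))
    return problems


paths = {("b1", "b2"), ("b2", "b3")}
a1 = Activity("f1", 1, "b1")
a2 = Activity("f1", 2, "b2")
a3 = Activity("f2", 3, "b3")   # activity in a second control flow
activities = [a1, a2, a3]
valid = edge_violations(paths, activities, [(a1, a2), (a2, a3)])
reversed_edge = edge_violations(paths, activities, [(a2, a1)])
```

The first edge list satisfies all three constraints; the reversed edge is reported twice, once because it has no counterpart in the control flow graph and once because it runs backward in time.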

Definition: program activity $a_j$ is synchronization-dependent on program activity $a_i$ if the start or end of $a_j$ is determined by $a_i$; in general, $a_i$ is some thread synchronization operation. $a_j$ is said to be communication-dependent on $a_i$ if $a_j$ is an activity produced by $a_i$. Taking the object-oriented programming language Java as an example, the granularity of a basic block is that of a basic block of the source code. Each control flow of the runtime stack model corresponds to the execution sequence of one Java thread. A thread has six states among which it transitions:

New: the thread object has just been created and has not yet been started;

Runnable: the thread is in the running state, in which it may be waiting for some system resource, such as the CPU;

Blocked: the thread is waiting for a monitor lock; for example, a thread enters the blocked state when it enters a method or code block marked with the synchronized keyword;

Waiting / Timed waiting: the thread is waiting; for example, it enters the waiting state when it calls the wait method of an object, and it can re-enter the running state when the notify method of that object is called;

Terminated: when execution of the thread's run method finishes, the thread enters the terminated state.

From the above state transitions it can be seen that, in some cases, a thread in the running state may wake up another thread that is not running so that it enters the running state. The invention refers to this relationship between threads as a synchronization dependency. When such an inter-thread wake-up occurs, the runtime stack model contains an edge across threads (across control flows) from the corresponding activity of the running thread to the activity that occurs when the non-running thread enters the running state. At the Java language level, these thread dependencies can be grouped into four classes, as shown in Table 1.

Table 1: Classification of synchronization dependencies at the Java language level

In Table 1, each activity of a running thread corresponds to an activity of a non-running thread; the inter-thread dependencies in Table 1 are therefore referred to as synchronization dependencies. Building on thread state transitions, Java provides various multithreaded programming libraries, for example read-write locks, reentrant locks, blocking locks and thread pools.

FIG. 3 illustrates an example of a communication dependency between threads: a producer-consumer model. In this example, the Task class represents a computational task; the static field tasks represents the queue of tasks to be processed; the postTask method represents generating and submitting a task; the handleTask method represents processing a task. As shown in FIG. 3, there are two threads: 1) thread 1 is the producer thread, which submits tasks to the pending task queue; 2) thread 2 is the consumer thread, which checks the pending task queue at intervals and processes the corresponding tasks. In this example there is no synchronization dependency between the producer thread and the consumer thread, because the consumer thread moves from the timed waiting state to the running state by itself at intervals; there is, however, a communication dependency: if the producer thread does not submit a task, the consumer thread's task run method is not called.
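The code of FIG. 3 is not reproduced in this text. The following is a rough Python sketch of the same producer-consumer arrangement, written only to make the distinction concrete; `post_task`, `handle_task` and `tasks` mirror the names in the description, while the `STOP` sentinel and the doubling "computation" are assumptions added so that the example terminates.

```python
import threading
import time
from collections import deque

tasks = deque()     # shared queue of pending tasks (the static field "tasks")
handled = []        # results recorded by the consumer
STOP = object()     # hypothetical sentinel that ends the consumer loop

def post_task(task):
    """Producer side: generate and submit a task."""
    tasks.append(task)

def handle_task(task):
    """Consumer side: process one task (here: double it)."""
    handled.append(task * 2)

def consumer(interval=0.01):
    # The consumer wakes up by itself after each timed wait; nobody notifies
    # it, so there is no synchronization dependency on the producer.
    while True:
        time.sleep(interval)
        while tasks:
            task = tasks.popleft()
            if task is STOP:
                return
            # This call happens only because the producer submitted a task:
            # that is the communication dependency.
            handle_task(task)

def producer():
    for n in (1, 2, 3):
        post_task(n)
    post_task(STOP)

consumer_thread = threading.Thread(target=consumer)
producer_thread = threading.Thread(target=producer)
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()
print(handled)
```

No lock, wait or notify call links the two threads, yet every `handle_task` activity exists only because of an earlier `post_task` activity.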

The above example shows that generating the activity relationships in the runtime stack model must rely on the corresponding runtime data. In classical data flow analysis, a data flow analysis algorithm evaluates the data flow equations according to the structure of the control flow graph and iterates to a fixed point. Therefore, in addition to the runtime stack model, the runtime model of the application needs a runtime heap model that describes the changes in the data state of the memory heap area.

Runtime heap model

Classical data flow graphs are often used in the requirements analysis phase: a data flow graph is used to decompose the software system to be developed layer by layer, from the abstract to the concrete. A data flow graph is a directed graph comprising two different types of edges and several different kinds of nodes; it describes how data, starting from an initial node, is computed layer by layer until the final result is obtained. At runtime, a node of the data flow graph essentially corresponds to a change in a set of memory data. The heap model of the application behavior runtime model of the present invention therefore focuses on the changes to memory data rather than on the operations that cause them. The runtime heap model of the invention models only the heap area of memory during application execution, from the perspective of memory data changes.

The runtime heap model consists of the initial values of a set of memory data and the set of memory modification activities occurring on the heap over a period of time: $M = \langle D, A, T, R \rangle$;

where $D = \{d_1, d_2, \dots, d_n\}$ is the set of initial values of a set of memory addresses, $A = \{i_1, i_2, \dots, i_n\}$ is the set of activities that cause memory data changes, and $T = \{t_1, t_2, \dots, t_n\}$ is the set of timestamps.

Different object-oriented programming languages provide different application programming interfaces for dynamically allocating and reclaiming memory. For example, C/C++ allocates and reclaims memory through the malloc and free functions of the standard library; in Java, memory is allocated by creating a new object with the new keyword, and reclaimed by an automatic garbage collection mechanism.

In view of the technical problem addressed by the present invention, the construction of the above runtime model in the embodiment of the present invention is described in detail below.

Referring to FIG. 4, a flow chart of the steps of the method of the present invention for constructing a runtime model of terminal application behavior is shown. The method comprises a step of constructing a runtime stack model of the terminal application behavior and a step of constructing a runtime heap model of the terminal application behavior.

The step of constructing the runtime stack model of the terminal application behavior may specifically include:

Step S401: while the terminal application is running, acquiring the code actually executed in the memory of the terminal application, and abstracting the actually executed code to generate a control flow graph;

Step S402: for the control flow graph, inputting the control flow graphs to be monitored into a preset behavior interpreter;

Step S403: interpreting and executing the control flow graphs to be monitored with the behavior interpreter, and generating the stack activities of the running terminal application;

Step S404: generating the dependencies between the control flows of the stack activities of the running terminal application to obtain the runtime stack model of the terminal application behavior.

In the embodiment of the present invention, constructing the runtime stack model involves the following three basic elements: 1) the control flow graph, which contains all possible activities and all possible activity relationships and is an abstract representation of the program's source code or intermediate code; 2) the set of activities occurring at runtime, i.e. a path in the control flow graph, which can be regarded as the set of nodes of the stack model; 3) the relationships between the activities occurring at runtime, i.e. the edges of the stack model.

Constructing the stack model requires addressing three challenges. First, because of techniques such as compile-time optimization and just-in-time compilation at runtime, the application's source code and the bytecode generated by compilation may differ from the code segment in runtime memory, so it must be ensured that the control flow graph and the runtime activities can be mapped to each other correctly. Second, activities of different granularity must be generated to describe the changes in the running state of a complex application. Third, since existing applications make heavy use of multithreaded programming to keep the interface responsive and improve the user experience, the dependencies between control flows at runtime must be generated. To meet these three challenges, the first step of the embodiment of the present invention obtains the code actually executed in memory at runtime and abstracts it, which ensures an accurate mapping between the control flow graph and the runtime activities. The second step provides a behavior interpreter, which takes the control flow graph generated in the previous step as input and interprets and executes it. In the third step, the activities of the running application are generated during interpretation and execution. The last step of model generation generates the dependencies between control flows at runtime.

The generation steps of the runtime stack model are further outlined below.

First, generation of the control flow graph. Because the released installation package contains the application's compiled intermediate code, developers use various obfuscation tools to obfuscate the generated intermediate code (for example, Dex bytecode on Android) for protection. As a result, directly supplied source code is difficult to map to the activities executed by the application at runtime. The obfuscated code is loaded and executed by the application runtime environment; for example, the Dex bytecode of an Android application is executed in the Android Runtime (ART). The invention obtains the application's bytecode by modifying the application runtime. This has two advantages: first, neither the intermediate code nor matching source code needs to be supplied, which improves the practicability of the method; second, intermediate code obtained while the application is running is guaranteed to be consistent with the activities actually executed, which ensures that the control flow graph matches the runtime control flow.

Specifically, the generation of the control flow graph:

The boundaries of basic blocks are derived from the kinds of instructions. An instruction is the start of a basic block if and only if: 1) it is the first instruction of a method; or 2) some instruction may jump to it. An instruction is the end of a basic block if and only if: 1) it returns from the method, such as a return or throw instruction; or 2) it is a jump instruction, such as if or goto, or an instruction that may throw an exception. With the start and end of a basic block defined, the control flow graph generation algorithm of the invention is divided into the following three steps:

1) Compute the target addresses of all jump instructions (including explicit jumps and exception jumps), and mark the instruction at each such address as one that can start a basic block.

2) Initialize the basic block queue to empty and traverse the instructions from low to high address. If an instruction is the start of a basic block, or the current basic block is empty, create a new basic block as the current basic block and place it at the end of the basic block queue; if an instruction is the end of a basic block, put it into the current basic block, then create a new basic block as the current basic block and place it at the end of the basic block queue.

3) Traverse the whole basic block queue and establish the predecessor and successor relations of the basic blocks: if the last instruction of a basic block is a jump instruction, add a directed edge from that basic block to the jump target's basic block; if the last instruction of a basic block is not a return or goto instruction, add a directed edge to the next basic block in the queue.
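The three steps above can be sketched on a toy instruction list. This is only an illustration under several assumptions that are not in the source: instructions are (opcode, jump target) pairs indexed by address, only `if`, `goto`, `return` and `throw` are modeled, exception edges are ignored, `throw` is treated like `return` when deciding on fall-through, and a block is opened lazily (when the next instruction arrives) so that no empty block is left in the queue.

```python
from dataclasses import dataclass, field

JUMPS = {"if", "goto"}                        # explicit jump opcodes (toy set)
BLOCK_END = JUMPS | {"return", "throw"}       # opcodes that close a basic block
NO_FALLTHROUGH = {"return", "throw", "goto"}  # including 'throw' is an assumption

@dataclass
class Block:
    start: int                                  # address of the first instruction
    instrs: list = field(default_factory=list)
    succs: list = field(default_factory=list)   # start addresses of successors

def build_cfg(code):
    """code: list of (opcode, jump_target_or_None); the index is the address."""
    # Step 1: the first instruction and every jump target start a basic block.
    leaders = {0} | {target for op, target in code if op in JUMPS}
    # Step 2: sweep from low to high address, splitting into basic blocks.
    blocks, current = [], None
    for addr, (op, target) in enumerate(code):
        if addr in leaders or current is None:
            current = Block(start=addr)
            blocks.append(current)
        current.instrs.append((op, target))
        if op in BLOCK_END:
            current = None          # the next instruction opens a new block
    # Step 3: connect predecessors and successors.
    for i, block in enumerate(blocks):
        op, target = block.instrs[-1]
        if op in JUMPS:
            block.succs.append(target)
        if op not in NO_FALLTHROUGH and i + 1 < len(blocks):
            block.succs.append(blocks[i + 1].start)
    return blocks

code = [
    ("const", None),   # 0
    ("if", 4),         # 1: conditional jump to address 4
    ("add", None),     # 2
    ("goto", 5),       # 3
    ("sub", None),     # 4
    ("return", None),  # 5
]
for block in build_cfg(code):
    print(block.start, block.succs)
```

Successors are stored as block start addresses, which is enough to recover the edge set of the graph.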

Second, on-demand redirection of control flow graph execution, with the behavior interpreter generating the activities of the running application. While the application runs, each thread corresponds to a control flow, and each control flow can be regarded as an ordered set of activities. This set of activities can be regarded as a path in the control flow graph generated in the previous step. The present invention therefore proposes a behavior interpreter suited to monitoring program execution. According to the configuration, the control flow graphs that need to be monitored are dispatched to the behavior interpreter for execution. If the execution of every instruction corresponded to an activity, the set of instruction sequences would become very large and difficult to handle: 1) numerical computation statements are difficult to relate to semantics; 2) the large number of activities generated by program loops would drown out the real processing logic. The present invention therefore classifies activities into numerical computation, branch control, method calls and so on, and implements an activity filter in the behavior interpreter that provides filtering at multiple granularities in order to generate an appropriate stack model.

In the construction method of the embodiment of the invention, the activity filter comprises a class filter and an activity type filter. The class filter performs coarse-grained screening based on regular-expression matching of package and class names, and removes program activities that developers are not interested in; the activity type filter performs fine-grained filtering based on activity types, and removes the activity types that developers are not interested in.
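A two-level filter of this kind might look as follows. This is a sketch under assumptions: each activity carries a fully qualified class name and an activity type string, and the names `Activity`, `make_filter` and the example package `com.example.app` are hypothetical.

```python
import re
from typing import NamedTuple

class Activity(NamedTuple):
    cls: str    # fully qualified class name, e.g. "com.example.app.Main"
    kind: str   # activity type, e.g. "method_start", "field_read"

def make_filter(class_patterns, kinds):
    """Build a predicate that keeps an activity only if it passes both filters."""
    regexes = [re.compile(p) for p in class_patterns]

    def keep(act):
        # Coarse-grained class filter: package/class name must match a pattern.
        if not any(r.fullmatch(act.cls) for r in regexes):
            return False
        # Fine-grained activity type filter.
        return act.kind in kinds

    return keep

keep = make_filter([r"com\.example\.app\..*"], {"method_start", "method_end"})
activities = [
    Activity("com.example.app.Main", "method_start"),
    Activity("java.lang.String", "method_start"),
    Activity("com.example.app.Main", "field_read"),
]
print([a for a in activities if keep(a)])
```

Applying the class filter first keeps the cheaper, coarser test in front of the per-type test.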

The activity types of stack activities include method start and method end, field reading, array reading and synchronization instructions; based on these activity types, the implementation of step S403 includes:

The embodiment of the invention can generate the required stack model by flexibly specifying particular packages, classes and instruction types, thereby improving usability.

To improve the accuracy of the constructed model, the invention treats both the start and the end of executing a method call instruction as activities and records them. Viewed from Java's method calls, calling forms a tree structure: during the execution of one method call, multiple further method calls may occur. Therefore, to ensure that the generated sequence can be restored to this tree structure, the invention uses the subscript s to denote the activity of a method call starting and the subscript e to denote the activity of a method call ending. The two program execution cases for the above example correspond to two different sequences:

1) if every call to calculate is made in doInBackground, the sequence is d_s→c_s→c_e→c_s→c_e→d_e；

2) if one call to calculate is a recursive call to itself, the sequence is d_s→c_s→c_s→c_e→c_e→d_e。

A call tree construction algorithm may be used to restructure the generated activity sequence into the tree structure of method calls. The algorithm in effect simulates the execution flow of the Java virtual machine. At the start of the algorithm, the activities of each thread correspond to an actions object. For each thread, two data structures are maintained: 1) a queue of the executed child control flows; 2) a function stack for the execution of the current control flow. Each activity in actions is traversed in order and the following decisions are made:

If there is no current control flow, one is instantiated and pushed onto the function stack.

If the current activity is of the method start type, a new child control flow is instantiated, pushed onto the function stack, and added to the activity queue of the current control flow. Finally, the current control flow is set to the newly instantiated child control flow.

If the current activity is of the method end type, a pop operation is performed. If the function stack is empty after the pop, execution of the child control flow of the current thread has finished and it can be added to the queue of executed child control flows; if the function stack is not empty, the current control flow is set to the child control flow at the top of the function stack.

Otherwise, the current activity is pushed into the activity queue of the current child control flow.
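The decisions above can be followed in a short sketch. It assumes that each activity is a (kind, name) pair with kind `start`, `end` or `other`, that the control flow instantiated when none exists is an anonymous root (labelled `<root>` here), and that a flow still open when the trace ends is flushed to the finished queue; the last two points are assumptions of this sketch, not statements of the source. The method names follow the doInBackground/calculate example.

```python
from dataclasses import dataclass, field

@dataclass
class Flow:
    name: str
    items: list = field(default_factory=list)  # activities and child flows, in order

def build_call_trees(actions):
    """actions: list of (kind, name) with kind in {'start', 'end', 'other'}."""
    finished = []    # queue of executed child control flows
    stack = []       # function stack of the current control flow
    current = None
    for kind, name in actions:
        if current is None:
            current = Flow("<root>")
            stack.append(current)
        if kind == "start":
            child = Flow(name)
            current.items.append(child)   # record the call in the caller's queue
            stack.append(child)
            current = child
        elif kind == "end":
            done = stack.pop()
            if not stack:
                finished.append(done)
                current = None
            else:
                current = stack[-1]
        else:
            current.items.append(name)
    # Assumption: a flow that is still open at the end of the trace is kept too.
    if stack:
        finished.append(stack[0])
    return finished

def shape(flow):
    """Render a flow as nested [name, children] lists, for easy comparison."""
    return [flow.name, [shape(i) for i in flow.items if isinstance(i, Flow)]]

d, c = "doInBackground", "calculate"
sequential = [("start", d), ("start", c), ("end", c),
              ("start", c), ("end", c), ("end", d)]
recursive = [("start", d), ("start", c), ("start", c),
             ("end", c), ("end", c), ("end", d)]
print(shape(build_call_trees(sequential)[0]))
print(shape(build_call_trees(recursive)[0]))
```

The two sequences contain the same multiset of activities, and only the start/end markers let the algorithm tell two sibling calls apart from one nested call.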

Similar to method call instructions, other types of instructions could also have both a start-of-execution and an end-of-execution activity. Because these instructions are atomic, i.e. no other activity occurs in the same thread between the start and the end of the instruction's execution, only the start-of-execution activity is kept for these instruction types.

In a specific implementation, the runtime activity representation may take several storage forms: it may be an object in memory, or a persistent binary file or ASCII file. In the present invention, the runtime stack model may be represented in Backus-Naur form (BNF).

The present invention achieves this scalability through a mechanism of serializing and deserializing activities. When the runtime model is generated, the activity sequence is stored in a buffer of configurable size; when the number of activities exceeds a preset value, the activities in the buffer are serialized and persisted to local storage.
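A bounded buffer that spills to disk can be sketched as below. The class name `ActivityBuffer`, the chunk file naming and the use of Python's `pickle` as the serialization format are all assumptions made for illustration; the source does not specify a format.

```python
import os
import pickle
import tempfile

class ActivityBuffer:
    """Keeps recent activities in memory and spills full batches to disk."""

    def __init__(self, directory, capacity):
        self.directory = directory
        self.capacity = capacity    # configurable buffer size
        self.buffer = []
        self.chunks = []            # paths of persisted chunks, in order

    def append(self, activity):
        self.buffer.append(activity)
        if len(self.buffer) > self.capacity:   # preset value exceeded
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        path = os.path.join(self.directory, f"chunk_{len(self.chunks)}.bin")
        with open(path, "wb") as fh:
            pickle.dump(self.buffer, fh)       # serialization
        self.chunks.append(path)
        self.buffer = []

    def load_all(self):
        """Deserialize every persisted chunk, then add what is still buffered."""
        out = []
        for path in self.chunks:
            with open(path, "rb") as fh:
                out.extend(pickle.load(fh))    # deserialization
        return out + self.buffer

with tempfile.TemporaryDirectory() as directory:
    buf = ActivityBuffer(directory, capacity=2)
    for i in range(5):
        buf.append(("method_start", i))
    print(len(buf.chunks), buf.load_all())
```

Writing chunks in arrival order means the original activity sequence is recovered by simple concatenation.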

Third, generation of the dependencies between control flows. Multithreaded programming has become an important part of Android application development. It enables an efficiently responding user interface and parallel acceleration of multiple computing tasks. Thread synchronization and mutual wake-up in multithreaded programming (referred to as thread dependencies) can be abstracted as edges between control flows in the stack model. A thread dependency is a time-related relationship: at one moment the main thread may send a computing task to a background thread, and the activities executed by the background thread then depend on the activity of the main thread; at a later moment, after the background thread finishes the computing task, it notifies the main thread to update the interface, and the activities executed by the main thread then depend on the activity of the background thread. The present invention therefore classifies these inter-thread dependencies and processes each type separately in order to generate the dependencies at runtime.

In the embodiment of the present invention, the dependencies include synchronization dependency and communication dependency. Cooperation among multiple threads that is realized with the thread-state-transition methods provided in the Java language specification, such as those of Thread, corresponds to the synchronization dependency defined above; cooperation among threads that is realized without these methods is what the invention calls communication dependency. In actual development, application developers reuse the various multithreaded programming classes provided by the framework layer to improve development efficiency. Although the framework layer offers application-layer classes an application programming interface with clear semantics and hides the implementation details, its implementation is often complex in order to guarantee the performance and robustness of the framework. For programs implemented with these programming frameworks, both synchronization and communication dependencies may exist between threads at runtime.

Take as an example the span from the start of the BackgroundTask.execute method call to the end of the onPostExecute method call in FIG. 5: there are two active threads in total, with both synchronization and communication dependencies between them. The method calls of this procedure are shown in FIG. 6: the upper and lower axes in the figure represent how the method stacks of the foreground thread and the background thread change over time; the boxes represent methods, where grey boxes are framework-layer methods and white boxes are application-layer methods, i.e. methods implemented by the application developer; the arrows represent inter-thread dependencies, where solid arrows are synchronization dependencies and dashed arrows are communication dependencies. In the calling process shown in FIG. 6, the BackgroundTask.execute method calls the ThreadPoolExecutor.execute method during its execution (activity:), then calls the start method of the background thread object (activity:), which in turn causes the run method of the background thread to be called (activity:). After the run method of the background thread starts executing, the BackgroundTask.doInBackground method is eventually reached through layer-by-layer calls. During the execution of that method, besides calling the calculate method to perform the computing task, the AsyncTask.publishProgress method is called (activity (c)), which causes the foreground thread to call the onProgressUpdate method (activity (c)) to update the interface. Subsequently, after the background thread finishes the computing task, the foreground thread is notified in a similar way that the current task has finished. Here, activity II and activity III form a synchronization dependency, and activity III and activity IV form a communication dependency.

Generation of synchronization dependencies:

To generate synchronization dependencies, the methods related to synchronization dependency in Java are treated as activities that must be collected. The runtime stack model thus collects the various synchronization-dependency-related activities listed in Table 1.

For two activities with a synchronization dependency, the later activity may be either the end of a method or the start of a method, so synchronization dependencies are divided into two cases that are processed separately:

For the case in which the end of one method depends on the end of another method, the timestamp is used to search backwards in other threads for an activity that can be matched; if one is found, it corresponds to a synchronization dependency. For example, the end of Thread.join depends on the end of Thread.run, and the end of Object.wait depends on the end of Object.notify.

For the case in which the start of one activity depends on the end of another activity, the current thread is checked first: if the activity is the first one executed in the current thread, it depends on an ending activity in another thread; otherwise it is only an ordinary method call and does not depend on the activity of another thread. For example, the start of Thread.run depends on the end of Thread.start. In Java, an activity (i.e. a call to a given method) can be started anywhere and any number of times, and this includes the run method. Therefore, to determine whether a call to Thread.run depends on a call to Thread.start, a match cannot simply be searched for backwards by timestamp; the current thread must be checked first, and only if the call is the first activity executed in the current thread does it depend on the ending activity of Thread.start in another thread.
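Both cases can be sketched over a flat activity trace. The sketch assumes that each activity records its thread, timestamp, method, whether it is a start or an end, and a hypothetical `target` field naming the thread or object the call concerns (used here to pair, say, a join with the run of the joined thread). The two rule tables contain only the examples named in the text, not the full Table 1.

```python
from typing import NamedTuple, Optional

class Activity(NamedTuple):
    thread: str
    time: int
    method: str             # e.g. "Thread.join"
    kind: str               # "start" or "end"
    target: Optional[str]   # hypothetical: thread/object the call concerns

# Dependent (method, kind) -> (method, kind) it depends on.
END_ON_END = {("Thread.join", "end"): ("Thread.run", "end"),
              ("Object.wait", "end"): ("Object.notify", "end")}
START_ON_END = {("Thread.run", "start"): ("Thread.start", "end")}

def latest_match(acts, act, wanted):
    """Search backwards by timestamp in other threads for a matching activity."""
    candidates = [a for a in acts
                  if a.thread != act.thread and a.time <= act.time
                  and (a.method, a.kind) == wanted and a.target == act.target]
    return max(candidates, key=lambda a: a.time, default=None)

def sync_dependencies(acts):
    edges = []
    seen_threads = set()
    for act in sorted(acts, key=lambda a: a.time):
        is_first = act.thread not in seen_threads
        seen_threads.add(act.thread)
        key = (act.method, act.kind)
        if key in END_ON_END:
            wanted = END_ON_END[key]
        elif key in START_ON_END and is_first:
            wanted = START_ON_END[key]   # only a thread's first activity qualifies
        else:
            continue                     # ordinary call, no cross-thread edge
        source = latest_match(acts, act, wanted)
        if source is not None:
            edges.append((source, act))
    return edges

trace = [
    Activity("main", 1, "Thread.start", "start", "bg"),
    Activity("main", 2, "Thread.start", "end", "bg"),
    Activity("bg", 3, "Thread.run", "start", "bg"),
    Activity("main", 4, "Thread.join", "start", "bg"),
    Activity("bg", 5, "Thread.run", "end", "bg"),
    Activity("main", 6, "Thread.join", "end", "bg"),
    Activity("main", 7, "Thread.run", "start", "bg"),  # plain call, not first in main
]
for source, dependent in sync_dependencies(trace):
    print(source.time, "->", dependent.time)
```

The last trace entry shows why the first-in-thread check matters: a direct call to run from the main thread produces no edge.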

Generation of communication dependencies:

Communication dependency refers to cooperation among multiple threads that is realized without the thread-state-transition methods provided by Java.

Taking the activities (c) and (c) in FIG. 6 as an example, the concrete implementation is based on the next method and the enqueueMessage method of MessageQueue. In this process, if elements are already waiting in the foreground thread's pending queue, the enqueueMessage call caused by activity five only adds the current task to the queue and does not explicitly wake up the foreground thread. Logically, however, for a given MessageQueue object, the Message object returned by the next method is passed by the Handler to the dispatchMessage method as a parameter, so the end of the next method of the MessageQueue can be considered to depend on MessageQueue.

When generating communication dependencies between control flows, all classes involved in communication dependencies between activities are summarized, and the methods of these classes, together with the methods related to thread dependencies, serve as a knowledge base for generating communication dependencies. The knowledge base also supports application-specific customization.
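One way to picture such a knowledge base is a table that pairs a consuming method with the method that supplied its data. In the sketch below, the pairing of `MessageQueue.next` with `MessageQueue.enqueueMessage`, the `obj` field identifying the Message object, and the choice of the end of the producing call as the source of the edge are all assumptions drawn from a reading of the example, not definitions given by the source.

```python
from typing import NamedTuple

class Activity(NamedTuple):
    thread: str
    time: int
    method: str   # e.g. "MessageQueue.enqueueMessage"
    kind: str     # "start" or "end"
    obj: int      # hypothetical id of the object passed in or returned

# Hypothetical knowledge base: consumer method -> producer method.
KNOWLEDGE_BASE = {"MessageQueue.next": "MessageQueue.enqueueMessage"}

def comm_dependencies(acts, kb=KNOWLEDGE_BASE):
    edges = []
    for act in acts:
        producer = kb.get(act.method)
        if producer is None or act.kind != "end":
            continue
        # The object returned by the consumer is the one the producer queued.
        sources = [a for a in acts
                   if a.method == producer and a.kind == "end"
                   and a.obj == act.obj and a.time <= act.time]
        if sources:
            edges.append((max(sources, key=lambda a: a.time), act))
    return edges

trace = [
    Activity("background", 1, "MessageQueue.enqueueMessage", "end", 42),
    Activity("foreground", 2, "MessageQueue.next", "end", 42),
    Activity("foreground", 3, "MessageQueue.next", "end", 7),  # no producer seen
]
print(comm_dependencies(trace))
```

Application-specific customization then amounts to passing a different `kb` dictionary.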

Referring to fig. 4, the step of constructing the runtime heap model of the terminal application behavior may specifically include:

Step S405: generating the initial state of the heap area of the running terminal application;

Step S406: generating heap operation activities to obtain the runtime heap model of the terminal application behavior.

In an embodiment of the invention, the runtime heap model comprises the following basic elements: 1) an initial state of a heap area; 2) a set of activities that occur at runtime that affect heap region data. The invention firstly gives a description method of the initial state of the heap area and generates the initial state of the heap data conforming to the representation during operation. Secondly, the invention provides a description method of heap operation activities, and constructs the activities in the runtime heap model at runtime. Finally, the BNF representation of the heap area initial state and heap operation activity is given.

The generation steps of the runtime heap model are further outlined below.

First, generation of the initial state of the heap area. The initial state of the heap area is the state of the heap area data at the start time. The Java virtual machine specification describes the heap area only briefly: the heap is the runtime area from which all class instances and arrays are allocated, and it is managed by an automatic storage management system (i.e. a garbage collector). Objects in the heap are never explicitly reclaimed; they are reclaimed automatically by the garbage collector. The initial state of the heap area can be regarded as a snapshot of the heap area data at a certain moment, so if other threads continue executing and perform heap operations (for example, creating objects or performing garbage collection) while the data state of the heap area is being generated, the atomicity of the initial state is destroyed. The invention therefore first provides a BNF representation for describing the initial state of the heap area, and "freezes" the heap area data while the initial state is being generated, thereby guaranteeing the atomicity of the generation process.

Second, generation of the activities in the heap model. While the application runs, the Java garbage collector generates memory reclamation activities. Apart from these, the other activities can be regarded as a subset of the activities in the runtime stack model. On the one hand, if every operation that affects heap area data corresponded to an activity, the set of activities would become extremely large and difficult to handle; for example, if the application performs I/O on a large file and every operation is recorded as an activity, the data volume of the activities will be no smaller than that of the file itself. On the other hand, as with the control flow model, a developer may care only about the execution of some classes and methods, and an excessively large heap model is in fact harder to analyze. The description of activities is therefore extended to support garbage collection activities and, as in the runtime stack model, activity filtering options at multiple granularities are provided to generate an appropriate heap model. The generated heap model describes the changes to the objects of interest in detail, so the state of a heap object at any time can be queried with a timestamp-based heap object state query algorithm.
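The query algorithm itself is not spelled out in this text. A straightforward reading is to replay the recorded heap activities on top of the initial state up to the requested timestamp, as in the sketch below; the tuple layout and the operation names `new`, `put` and `gc` (instantiation, field assignment, reclamation) are assumptions of this sketch.

```python
import copy

def heap_state_at(initial, activities, timestamp):
    """Replay heap activities up to `timestamp` on top of the initial state.

    initial:    {object_id: {field: value}}
    activities: list of (time, op, object_id, field, value), where op is
                'new' (instantiation), 'put' (field assignment) or
                'gc' (reclamation by the garbage collector)
    """
    heap = copy.deepcopy(initial)   # never mutate the recorded initial state
    for time, op, oid, fld, value in sorted(activities, key=lambda a: a[0]):
        if time > timestamp:
            break
        if op == "new":
            heap[oid] = {}
        elif op == "put":
            heap[oid][fld] = value
        elif op == "gc":
            heap.pop(oid, None)
    return heap

initial = {1: {"text": "idle"}}
activities = [
    (10, "new", 2, None, None),
    (20, "put", 2, "what", 5),
    (30, "put", 1, "text", "50%"),
    (40, "gc", 2, None, None),
]
print(heap_state_at(initial, activities, 25))
print(heap_state_at(initial, activities, 45))
```

Because temporary objects appear and disappear as activities, their state at an intermediate timestamp remains available even though no later snapshot would contain them.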

Next, a specific example is used to describe the runtime heap model modeling process:

The data of the Java heap area consists only of instantiated objects and arrays. For an application, an object may be created in application-layer code or in framework-layer code, so the objects in the application are divided into an application layer and a framework layer. Taking the code implemented in FIG. 5 as an example, before the click event is triggered the objects in the heap area are as shown in FIG. 7(a). Each circle in the figure represents an object, and the lines between circles represent reference relationships. The objects related to the application's business logic in FIG. 7(a) include the display interface flowactivity, a Button that can trigger the background computing task, the background task BackgroundTask to be processed, a TextView for displaying the task's computing result, and the click event listener OnClickListener. In addition to the objects related to business logic, there are many framework-layer objects, for example the MessageQueue object of the framework layer, through which the background processing task can notify the foreground to update. Solid arrows in FIG. 7(a) and FIG. 7(b) indicate reference relationships between objects, that is, if an object A has a field pointing to another object B, there is a directed connection from A to B; dashed arrows indicate reference relationships that exist during event processing but no longer exist when it ends.

During event triggering, the following objects are created. During the execution of BackgroundTask, the ThreadPoolExecutor.execute method is called; because this is the first time the object runs the execute method, it creates a Thread object (i), which is the thread that executes in the background. After the start method of the background thread object is called, the run method of the background thread is invoked in turn and the doInBackground method formally begins. In this method, besides calling the calculate method to perform the computing task, the AsyncTask.publishProgress method is also called; before that method executes, the parameters passed in are wrapped in a newly created Integer[] object, and during its execution a Message object is newly created and put into the global message queue. When the foreground thread receives the Message and executes onProgressUpdate, a StringBuilder object is created to build the parameter required by setText, and a new String object is instantiated through the StringBuilder. Before the doInBackground method finishes, a new StringBuilder object is created and the return value, a String object, is computed. This String object is in turn wrapped in a newly created Message object, and the foreground thread is notified to execute the onPostExecute method. The above process omits some steps; more objects are created in actual operation. For example, the Thread object does not depend directly on the BackgroundTask object but depends on it indirectly through layer-by-layer encapsulation in FutureTask, Callable and other objects. After the process ends, all the objects (r) to (b) may be reclaimed in some garbage collection.

In the heap model, the present invention treats the instantiation, field assignment, and reclamation of each object in Fig. 7(b) as an activity. As with the runtime stack model, the present invention gives below a representation of the runtime heap model, preferably in Backus normal form.

A DataAction is analogous to the ControlAction described in the key technology for constructing the runtime stack model: a ControlAction describes an executed instruction, whereas a DataAction describes a change to data in memory. Number denotes a numeric type, which can be a numerical value or a memory address; String denotes the string type. From the above representation it can be seen that the complexity of the model depends mainly on: 1) the number of objects in the initial state; 2) the number of heap area data activities.

In the Android implementation, the heap area can be divided into three sub-areas: 1) the application heap (App Heap), the memory area used when the current application instantiates objects and arrays; 2) the image heap (Image Heap), the memory area into which the image of the current application is loaded; 3) the zygote heap (Zygote Heap), the memory area that stores the system classes loaded at system start-up. The initial state described in the present invention focuses primarily on the application heap, which changes the most at runtime. For both the Dalvik virtual machine and the ART virtual machine on the Android platform, the state of the current application heap can be saved to a file at any time (a heap dump operation). The file is in a proprietary memory image format and can be converted with the Android developer tools into the hprof format that conforms to the specification of the J2EE platform.

However, a heap dump of the current application reflects the heap state only at a single moment and can hardly reflect the heap state at an arbitrary moment within a period of time. First, a heap dump operation is time-consuming because all threads must be suspended, and the generated files range from tens of megabytes to hundreds of megabytes, so performing a heap dump at regular intervals is not a practical way to capture the heap state at an arbitrary moment. Second, a heap dump does not include objects that the garbage collector has already reclaimed, including temporary objects generated during execution. For an execution process, however, these temporary objects are important for describing what happened; for example, the objects from (c) to (b) in Fig. 7 are temporary objects, and they cannot be persisted by using a heap dump directly.

In this embodiment of the present invention, in the step of constructing the runtime heap model of the terminal application behavior, the runtime activities of the terminal application include instantiation activities, modification activities, and reclamation activities.

An instantiation activity (NewAction), i.e., an activity that creates a new object or a new array, may correspond at runtime to the execution of bytecode instructions such as new-instance and new-array.

A modification activity (ModifyAction), i.e., an activity that modifies the value of a static field of a class, a field of an object, or an element of an array, may correspond at runtime to bytecode instructions such as sput, iput, and aput.

A reclamation activity (GCAction) is an activity that affects objects in the heap when garbage collection is performed. Garbage collection is an automatic memory management mechanism: when the data in a piece of memory is no longer used, the memory is reclaimed and released so that it can be allocated again. Specific garbage collection algorithms include reference counting, reachability analysis and the like.

A reclamation activity does not correspond to any dex bytecode instruction at runtime, because it is implemented at the virtual machine level. Reclamation activities may be further subdivided into clearing activities and compaction activities. A clearing activity removes objects that are no longer needed; a compaction activity moves live objects into a contiguous memory space, so as to avoid allocation failures caused by fragmentation when a large block of memory is requested.
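By way of a non-limiting illustration, the three kinds of heap activity described above, together with the idea that an initial heap state plus a sequence of activities determines the heap state at any later point, can be sketched as follows. All names in this sketch (the `address`, `class_name`, `field` and `value` attributes, the `replay` function, and the dictionary-based heap) are assumptions made for illustration and are not the data structures of the invention; compaction activities are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class NewAction:
    """Instantiation activity: a new object or array appears at `address`."""
    address: int
    class_name: str


@dataclass(frozen=True)
class ModifyAction:
    """Modification activity: a field (or array element) of an object is written."""
    address: int
    field: str
    value: Union[int, str, None]


@dataclass(frozen=True)
class GCAction:
    """Reclamation activity: the collector clears the object at `address`."""
    address: int


HeapAction = Union[NewAction, ModifyAction, GCAction]
Heap = Dict[int, Dict[str, object]]


def replay(initial: Heap, actions: List[HeapAction], upto: Optional[int] = None) -> Heap:
    """Rebuild the heap after the first `upto` activities (all of them if None)."""
    heap: Heap = {addr: dict(obj) for addr, obj in initial.items()}  # leave the input untouched
    for action in actions[:upto]:
        if isinstance(action, NewAction):
            heap[action.address] = {"__class__": action.class_name}
        elif isinstance(action, ModifyAction):
            heap[action.address][action.field] = action.value
        elif isinstance(action, GCAction):
            heap.pop(action.address, None)
    return heap


# Hypothetical trace: a temporary StringBuilder is created, a TextView field is
# written, and the temporary object is later reclaimed.
initial = {0x10: {"__class__": "TextView", "text": ""}}
trace = [
    NewAction(0x20, "java.lang.StringBuilder"),
    ModifyAction(0x10, "text", "50%"),
    GCAction(0x20),
]
mid_state = replay(initial, trace, upto=1)   # the temporary object is still visible here
final_state = replay(initial, trace)         # the temporary object has been reclaimed
```

Unlike a single heap dump, replaying a prefix of the activity sequence exposes temporary objects that exist only in the middle of the execution.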

In addition to the objects created by application-layer code, framework-layer code creates a large number of objects, in some cases several times as many as the application layer. A mechanism for managing the complexity of the heap model is therefore needed to ensure the accuracy and ease of use of the generated runtime heap model. Similar to the two-level screening mechanism described above, activity generation for the heap model also uses a two-level screening mechanism: coarse-grained screening based on regular-expression matching of package and class names, and fine-grained screening based on activity types.
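The two-level screening described above can be sketched as follows, purely as an illustration. The `Activity` record, the `screen` function, the activity type strings, and the example package names are hypothetical; the only elements taken from the description are the regular-expression match on package and class names and the subsequent filter on activity type.

```python
import re
from typing import Iterable, List, NamedTuple, Set


class Activity(NamedTuple):
    class_name: str  # fully qualified name of the class that caused the activity
    kind: str        # activity type, e.g. "object_instantiation"


def screen(activities: Iterable[Activity], class_pattern: str, kinds: Set[str]) -> List[Activity]:
    """Apply coarse-grained class screening, then fine-grained type screening."""
    class_re = re.compile(class_pattern)
    # Level 1: keep only activities caused by classes of interest.
    coarse = (a for a in activities if class_re.fullmatch(a.class_name))
    # Level 2: keep only the activity types of interest.
    return [a for a in coarse if a.kind in kinds]


trace = [
    Activity("com.example.app.BackgroundTask", "object_instantiation"),
    Activity("com.example.app.BackgroundTask", "object_field_write"),
    Activity("android.os.MessageQueue", "object_instantiation"),
]
kept = screen(trace, r"com\.example\.app\..*", {"object_instantiation"})
```

Applying the class filter first means that the large volume of framework-layer activities is discarded before the finer type check runs.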

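The invention preferably provides six types of heap operation activity: object instantiation, array instantiation, object field writing, array element writing, clearing activity and compaction activity.

The step of generating heap operation activities comprises:

according to the classes of interest, performing coarse-grained screening on the runtime activities of the terminal application by using the class filter, and generating the heap operation activities caused by those classes;

for the activity types of the heap operation activities, performing fine-grained screening on the heap operation activities by using the activity type filter.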

As for the control challenge of behavior reflection, i.e. the second challenge stated in the background of the present invention, behavior control at the instruction level is supported; this is not the focus of the present invention and is therefore not described in detail here.

Next, a specific example is used to verify the effectiveness of the embodiment of the present invention in monitoring the application behavior of the terminal.

For Android mobile applications, which are widely used on the mobile Internet, a prototype implementation of the behavior reflection framework is given: Reflectall. The name Reflectall stands for "Reflection at low level interpreter" and has a double meaning: first, it is reflection realized on the basis of a low-level behavior interpreter; second, it can monitor and control application behavior at the instruction level. Reflectall is based on the Android Open Source Project. To monitor and control mobile application behavior, the Reflectall platform is divided into a behavior runtime model construction subsystem, a model analysis and code generation subsystem, and a running subsystem, which together realize the monitoring and control in the behavior reflection framework.

Referring to Fig. 8, a schematic diagram of the architecture of the Reflectall model generation subsystem is shown. The behavior runtime model construction subsystem of Reflectall constructs the behavior runtime model of a mobile application. Its core is implemented in the system layer and consists of an optimizer-deoptimizer, a behavior interpreter, a model builder and an interface layer. Together, these four modules realize the monitoring and control of mobile application behavior.

Optimizer-deoptimizer: the Android runtime environment can load native instructions that the CPU executes directly. To monitor the runtime activities of a mobile application, the native code therefore has to be switched back to bytecode, i.e. deoptimized, and interpreted by the behavior interpreter. Because mobile applications are complex, it is difficult to monitor every activity during execution, so a two-level screening mechanism is introduced. The optimizer-deoptimizer implements the class screening part of the two-level screening mechanism: it deoptimizes the classes to be monitored into bytecode on demand so that they are interpreted and executed by the behavior interpreter, while classes that are not monitored continue to run in the native executor. The optimizer-deoptimizer is triggered in three cases: 1) when a command to start monitoring is received, the methods of the currently loaded classes are screened according to the configured parameters and deoptimized; 2) when a command to stop monitoring is received, the classes that are currently deoptimized are re-optimized so that they run in the native executor again; 3) when the class linker loads a new class, the class goes through a screening and deoptimization process similar to that in case 1). To ensure the correctness of program execution, deoptimization has to suspend all threads, as some garbage collection algorithms do, and resumes them after deoptimization has finished. Such partial deoptimization, which keeps interpreted execution and native execution side by side, greatly reduces the performance overhead of monitoring.
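The three trigger cases of the optimizer-deoptimizer can be illustrated with the following simplified sketch. It only tracks which classes would run in the interpreter and which in the native executor; the class name `OptimizerDeoptimizer`, its methods, and the example class names are hypothetical, and the suspension and resumption of threads is deliberately not modelled.

```python
import re
from typing import Set


class OptimizerDeoptimizer:
    """Tracks which loaded classes are deoptimized (interpreted) or native."""

    def __init__(self) -> None:
        self.loaded: Set[str] = set()
        self.interpreted: Set[str] = set()
        self._pattern = None  # compiled class filter; None while monitoring is off

    def _selected(self, name: str) -> bool:
        return self._pattern is not None and self._pattern.fullmatch(name) is not None

    def start_monitoring(self, class_pattern: str) -> None:
        # Case 1: screen the classes already loaded and deoptimize the matches.
        self._pattern = re.compile(class_pattern)
        self.interpreted = {c for c in self.loaded if self._selected(c)}

    def stop_monitoring(self) -> None:
        # Case 2: re-optimize everything so it runs in the native executor again.
        self._pattern = None
        self.interpreted.clear()

    def on_class_loaded(self, name: str) -> None:
        # Case 3: a newly linked class goes through the same screening.
        self.loaded.add(name)
        if self._selected(name):
            self.interpreted.add(name)

    def executor_for(self, name: str) -> str:
        return "interpreter" if name in self.interpreted else "native"


runtime = OptimizerDeoptimizer()
runtime.on_class_loaded("com.example.app.MainActivity")
runtime.on_class_loaded("android.widget.Button")
runtime.start_monitoring(r"com\.example\.app\..*")
runtime.on_class_loaded("com.example.app.BackgroundTask")
```

In this sketch only the classes matching the configured pattern leave the native executor, which mirrors the reason given above for the low monitoring overhead.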

Behavior interpreter: the behavior interpreter interprets and executes bytecode in dex format and, while doing so, monitors the activities that occur during program execution. Most of the activities in the mobile application behavior runtime model are generated by the behavior interpreter. In addition, the garbage collector generates some of the activities, namely the garbage collection activities. The behavior interpreter also implements the activity screening part of the two-level screening mechanism and can generate different types of activities according to the configured activity collection granularity.

Model builder: the activities generated by the behavior interpreter and the garbage collector are assembled into a model by the model builder. The more activities are generated at runtime, the more memory is occupied. The model builder therefore supports both online and offline model building. When there are few activities, the model builder runs in online mode; when the number of activities reaches a configured threshold, the model builder persists the activity sequence generated so far to storage as a file.
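The threshold-based persistence of the model builder can be sketched as below, as an illustration only. The `ModelBuilder` class, the JSON-lines file format, the file name, and the activity dictionaries are assumptions chosen for the sketch; the description only states that the activity sequence is persisted to a file once a configured threshold is reached.

```python
import json
import os
import tempfile
from typing import List


class ModelBuilder:
    """Keeps activities in memory and spills them to a file at a threshold."""

    def __init__(self, path: str, threshold: int) -> None:
        self.path = path
        self.threshold = threshold
        self.buffer: List[dict] = []

    def add(self, activity: dict) -> None:
        self.buffer.append(activity)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        # Append one serialized activity per line, then free the memory.
        with open(self.path, "a", encoding="utf-8") as out:
            for activity in self.buffer:
                out.write(json.dumps(activity) + "\n")
        self.buffer.clear()


path = os.path.join(tempfile.mkdtemp(), "activities.jsonl")
builder = ModelBuilder(path, threshold=3)
for seq in range(4):
    builder.add({"type": "method_start", "seq": seq})
```

After the four calls in this example, three activities have been written to the file and one remains in memory, so memory use stays bounded by the threshold.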

Interface layer: the interface layer encapsulates the functionality provided by the optimizer-deoptimizer, the behavior interpreter and the model builder. It also provides the interfaces required for deserializing activities, such as finding an object from an address and translating a given object into an address.

In the prototype implemented in the embodiment of the present invention, two kinds of mobile application behavior runtime model can be generated: 1) a refined model that contains runtime data dependencies; 2) a compact model that does not contain runtime data dependencies. On top of the system-layer implementation, at the framework layer Reflectall provides a group of behavior reflection interfaces, which can monitor application activities at different granularities and generate application behavior runtime models of different granularities, and remote debugging connection interfaces, which can control the start and end of application activity monitoring. At the application layer, the framework-layer interfaces are encapsulated in an Android application, which can expose the remote debugging interface externally as a Web service.

The analysis and code generation subsystem of Reflectall has a browser-server architecture. It provides the following functions:

Version management: git is used to manage the different versions of mobile applications and interoperation interfaces. Compilation on the server side is also supported, and the compiled dex bytecode is pushed to the client through the client's interface management application.

Visualization of the stack model: a tree view is provided, and keyword-based data-dependency taint analysis is supported.

The interface running subsystem of Reflectall adds a behavior reflection class loader to the framework layer of the Android Open Source Project; Fig. 9 is a schematic structural diagram of the interface running subsystem of Reflectall according to an example of the present invention. When an application process starts, it checks whether a loadable behavior reflection interface bytecode file exists. If a behavior reflection interface bytecode file suitable for the current application exists, it is loaded into the application process by the behavior reflection class loader, and the interoperation interfaces provided by the current application are registered with the interface management application through the Binder communication mechanism. The interface management application provides services such as interface forwarding and state detection. A caller process can interoperate with the designated application through the interface management application.

During verification, the performance of Reflectall model generation is evaluated with an open-source application set containing 69 open-source Android applications and a closed-source application set containing 39 closed-source applications.

The cost of constructing a mobile application behavior runtime model is positively correlated with the number of activities in the model: the more complex the application and the more activities it produces, the greater the cost of generating the behavior runtime model. The open-source applications are far less complex than the closed-source applications. In the open-source application set, the median number of classes is 58 and the median number of methods is 246; 75% of the applications have no more than 167 classes and no more than 859 methods. In the closed-source application set, the median number of classes is 14266 and the median number of methods is 87717, which are 245 and 102 times the corresponding values in the open-source application set. The hardware configuration used for the experiments is as follows: 1) an Android smartphone, the Redmi 2A, with a 1.5 GHz CPU, 1 GB of memory, and Android 5.1.1; 2) an ordinary PC used as a remote control terminal to drive the phone during the experiments, with an Intel Core i5-3427U CPU (1.8 GHz), 4 GB of memory, and OS X 10.11.

At present, besides the behavior interpreter approach of the present invention, there are two ways to monitor the application execution flow: runtime meta-message binding and reconstruction of the compiled bytecode. Table 2 gives the activity monitoring granularity supported by the three approaches. Compared with the runtime binding approach, Reflectall monitors activities at a finer granularity, down to the instruction level. Compared with the bytecode reconstruction approach, it has a wider scope of application: bytecode reconstruction requires modifying the compilation process of the original application and is difficult to apply directly to applications that have been obfuscated and hardened.

Table 2: Comparison of methods for monitoring program execution flow

Method	Granularity of execution flow monitoring	Whether bytecode modification is required
Reflectall	Method level and instruction level	Not required
Method based on runtime binding	Method level	Not required
Method based on bytecode reconstruction	Method level and instruction level	Required

In the experiments, the present invention compares Reflectall with the method based on runtime binding in terms of the performance of monitoring the program execution flow, and compares Reflectall with the method based on bytecode reconstruction in terms of the performance of monitoring activities at instruction granularity.

Experiment 1: comparison with the method based on runtime meta-message binding

The Xposed framework is a framework service (rovo89, 2012) that can monitor and modify program execution behavior without modifying the APK. Like Reflectall, the Xposed framework is implemented by modifying the system layer of the Android operating system. The Xposed framework implements behavior reflection based on the meta-message model: when the application runs, Xposed binds the corresponding meta-objects to the specified methods according to the configuration. In subsequent execution, a method bound to a meta-object calls the meta-object's before and after methods before and after its own execution. An Xposed module that monitors program execution in a way similar to Reflectall is implemented here using the Xposed framework. In this section, application start-up time is used as the metric. Reflectall and the Xposed-based monitoring module are deployed on two Redmi 2A phones with the same hardware configuration, both running Android 5.1.1. Reflectall and the Xposed-based method are compared on the open-source and closed-source application sets, monitoring the execution of all application classes, in the six experimental scenarios shown in Table 3. In each scenario, each application is started 10 times, and the results of the experiment are shown in Fig. 10(a) and 10(b).

Table 3: method comparison for monitoring program execution flow

Fig. 10(a) shows the experimental results on the open-source application set. All 69 applications in the open-source application set start normally in the six scenarios above. The solid lines in Fig. 10(a) are the three scenarios in which Reflectall is deployed; the dotted lines are the three scenarios in which the Xposed framework is deployed. Without monitoring the program execution flow, the average start-up time on the handset deploying Reflectall (scenario 1) is 392 milliseconds, and the average start-up time on the handset deploying the Xposed framework (scenario 3) is 449 milliseconds. This is because the Xposed framework incurs some overhead when an application is loaded even if no meta-object is bound, whereas the optimizer-deoptimizer of Reflectall allows all code to run in the native executor when monitoring is off. In scenario 2, the average start-up time of Reflectall is 486 milliseconds, an overhead of 23% compared with the no-monitoring case (392 milliseconds); the Xposed-based method has an average start-up time of 2078 milliseconds in scenario 2, an overhead of 368% compared with its no-monitoring case (449 milliseconds). When more complex behavior runtime models are generated (scenario 3 and scenario 6), the overhead of Reflectall is only 27%, while that of the Xposed-based method is as high as 477%.

Fig. 10(b) shows the experimental results on the closed-source application set. The closed-source applications are more complex than the open-source applications, and in the no-monitoring scenarios their average start-up time is 936 milliseconds (Reflectall) and 1010 milliseconds (Xposed). In scenario 2, owing to the complexity of the applications, 3 applications do not respond under Reflectall, and the average start-up time of the remaining 36 applications is 1601 milliseconds, an overhead of 71% compared with the no-monitoring scenario (936 milliseconds). Under scenario 4, 22 applications do not respond, and the average start-up time of the remaining 17 applications is 4593 milliseconds, an overhead of 354% compared with the no-monitoring scenario (1010 milliseconds). When more complex behavior runtime models are generated (scenario 3 and scenario 6), the overhead of Reflectall is 98%, while that of the Xposed-based method is as high as 470%.

The performance of Reflectall in generating behavior runtime models differs considerably between the open-source and the closed-source application sets. The applications in the closed-source set are more complex, so the generated models are larger and may trigger multiple garbage collections and activity persistence operations, which adds performance overhead. The overhead of the Xposed-based method is similar on the two application sets because 22 applications, which account for 57% of the closed-source application set, do not respond when monitored with Xposed. One reason why the performance overhead of Reflectall is lower than that of the Xposed-based method is that the Xposed-based method is programmed in Java, while the behavior interpreter of Reflectall is implemented in C++. When the execution flow of an application is more complex, the Xposed-based method allocates and reclaims memory more frequently than Reflectall. These experiments show that the Reflectall implementation of the present invention can handle more complex applications.

Experiment 2: comparison with the method based on bytecode reconstruction

Bytecode reconstruction frameworks are used in many common Java libraries. One important use of bytecode reconstruction is program analysis; for example, the popular bug-finding tool FindBugs uses ASM under the hood to analyze bytecode and locate bugs. Another common use is generating code coverage reports by means of bytecode reconstruction, as in Emma (Roubtsov, 2005) and JCover (JCover, 2017). The compact model generated by Reflectall can be converted into a code coverage report, so this experiment compares Reflectall and Emma in generating code coverage reports. Because the method based on bytecode reconstruction is not suitable for closed-source applications for which only the installation package is available, this experiment is carried out only on the open-source application set. Application start-up time is again used as the metric. Reflectall and an unmodified Android system are deployed on two Redmi 2A phones with the same hardware configuration, both running Android 5.1.1. The original applications are installed on the phone on which Reflectall is deployed, and the applications instrumented by Emma are installed on the unmodified Android phone. The performance of Reflectall and Emma in generating code coverage reports on the open-source application set is compared in the following three experimental scenarios. In each scenario, each application is started 10 times, and the results of the experiment are shown in Fig. 11.

The experimental results show that the average application start-up times of Reflectall and Emma are close: 442 milliseconds and 455 milliseconds, with overheads of 13% and 16%, respectively. Reflectall produces richer code coverage information than Emma. Table 4 gives the differences between Reflectall and Emma in the monitoring granularity of code coverage reports and in deployment and operation. Emma may report block coverage inaccurately and does not support branch execution counts, whereas Reflectall, being based on a behavior interpreter, guarantees the accuracy of the coverage report and also counts how many times each branch of a branch instruction (such as if-gt and packed-switch) is executed. Another difference is that Emma has to be configured to reconstruct the bytecode, and the application has to be repackaged after reconstruction; Reflectall requires neither and does not change the compilation flow of the mobile application. Reflectall is therefore easier and more practical to use than bytecode reconstruction tools such as Emma.

Table 4: Comparison of the monitoring granularity of Reflectall and Emma

Comparison category	Emma	Reflectall
Class coverage	Supported	Supported
Method coverage	Supported	Supported
Block coverage	Partially supported	Supported
Branch coverage	Partially supported	Supported
Line coverage	Supported	Supported
Number of branch executions	Not supported	Supported
Instruction coverage	Not supported	Supported
Whether bytecode modification is required	Required	Not required
Whether repackaging is required	Required	Not required
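As a non-limiting illustration of how a compact model could be turned into coverage information of the kind compared above, the following sketch derives instruction coverage and per-branch execution counts from a recorded instruction trace. The `InstructionEvent` record, the `coverage_report` function, the program-counter values, and the restriction to the two branch opcodes named in the text are all assumptions made for this example; the actual model format of Reflectall is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class InstructionEvent(NamedTuple):
    method: str   # method in which the instruction was executed
    pc: int       # position of the instruction within the method
    opcode: str   # e.g. "if-gt", "packed-switch", "goto"
    next_pc: int  # position of the instruction executed next


BRANCH_OPCODES = frozenset({"if-gt", "packed-switch"})


def coverage_report(trace: Iterable[InstructionEvent]) -> Tuple[Dict[str, Set[int]], Counter]:
    """Return covered instruction positions per method and branch execution counts."""
    covered: Dict[str, Set[int]] = {}
    branch_counts: Counter = Counter()
    for event in trace:
        covered.setdefault(event.method, set()).add(event.pc)
        if event.opcode in BRANCH_OPCODES:
            # One counter per (method, branch instruction, branch target).
            branch_counts[(event.method, event.pc, event.next_pc)] += 1
    return covered, branch_counts


# Hypothetical trace of a loop whose body runs twice before the method returns.
trace = [
    InstructionEvent("calc", 0, "const", 1),
    InstructionEvent("calc", 1, "if-gt", 3),
    InstructionEvent("calc", 3, "goto", 1),
    InstructionEvent("calc", 1, "if-gt", 3),
    InstructionEvent("calc", 3, "goto", 1),
    InstructionEvent("calc", 1, "if-gt", 2),
    InstructionEvent("calc", 2, "return", -1),
]
covered, branch_counts = coverage_report(trace)
```

Because every executed instruction appears in the trace, the branch counts in this sketch are exact rather than estimated from instrumented blocks.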

In summary, the method for constructing the runtime model of terminal application behavior provided by the present invention regards the execution of an application as a programming language framework (e.g. an interpreter or a virtual machine) performing read and write operations on memory according to the application's code segments. Which methods are executed corresponds to the operations of the programming language framework on the stack; which object data is modified corresponds to the operations of the programming language framework on the heap. The method realizes flexible and complete monitoring of the application behavior of terminal applications and provides technical support for the subsequent realization of instruction-level control of that behavior. A computational reflection engine designed according to the method can be used as a standalone runtime environment, or can be integrated into mainstream development platforms or commercial software, thereby giving developers a basic capability for monitoring applications at runtime.

The above-described embodiments are merely preferred embodiments, which are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method for constructing a runtime model of terminal application behavior, wherein the runtime model comprises a runtime stack model and a runtime heap model, and the method comprises a step of constructing a runtime stack model of the terminal application behavior and a step of constructing a runtime heap model of the terminal application behavior;

The step of constructing the runtime stack model of the terminal application behavior includes:

When the terminal application is running, the code that is actually executed in the memory of the terminal application is obtained, and the code that is actually executed is abstracted to generate a control flow graph;

For the control flow graph, input the control flow graph to be monitored into a preset behavior interpreter;

Use the behavior interpreter to interpret and execute the control flow graph that needs to be monitored, and generate stack activities when the terminal application is running;

When the terminal application is running, a dependency relationship between the control flows of the stack activity is generated, and a runtime stack model of the terminal application behavior is obtained;

The step of constructing a runtime heap model of the behavior of the terminal application includes:

generating an initial state of the heap area when the terminal application is running;

generating a heap operation activity to obtain a runtime heap model of the terminal application behavior;

Wherein, the dependencies include synchronization dependencies and communication dependencies;

When generating the synchronization dependency, for a situation where the end of one method depends on the end of another method, the time stamp is used to search other threads from back to front for an activity that can be matched, and if one is found, it corresponds to a synchronization dependency;

For a situation where the start of an activity depends on the end of another activity, the current thread is checked first; if the activity is the first activity executed in the current thread, the activity depends on an activity of another thread having ended; otherwise the activity is just a normal method call and does not depend on any activity of another thread.

2. The method of claim 1, wherein the method comprises a class filter and an activity type filter; the class filter performs coarse-grained filtering based on regular-expression matching of package and class names, and is used to remove program activities that developers do not care about; the activity type filter performs fine-grained filtering based on activity types, and is used to remove activity types that developers do not care about.

3. The method of claim 2, wherein the activity types of the stack activity comprise method start and method end, field reading, array reading and synchronization instruction;

Using the behavior interpreter to interpret and execute the control flow graph that needs to be monitored, the steps of generating the stack activity of the terminal application runtime include:

Interpret and execute the control flow graph that needs to be monitored by using a behavior interpreter that has a monitoring function for the application behavior of the terminal application, and obtain the activities of the terminal application when the terminal application is running;

According to the concerned class, use the class filter to perform coarse-grain screening on the activities of the terminal application runtime, and generate stack activities caused by the class;

For the activity type of the stack activity, fine-grained filtering is performed on the stack activity by using the activity type filter.

4. The method of claim 1, wherein the step of constructing a runtime heap model of the behavior of the terminal application comprises:

The activities of the terminal application runtime include instantiation activities, modification activities and recycling activities.

5. The method of claim 2, wherein the activity types of the heap operation activity include object instantiation, array instantiation, object field writing, array element writing, clearing activity and compacting activity;

The steps of generating a heap operation activity include:

According to the concerned class, use the class filter to perform coarse-grained screening on the activities of the terminal application runtime, and generate heap operation activities caused by the class;

For the activity type of the heap operation activity, use the activity type filter to perform fine-grained filtering on the heap operation activity.

6. The method according to claim 1, wherein, when generating the dependencies between the control flows of the stack activities, all classes related to communication dependencies between activities are summarized, and the methods of these classes, together with the thread-related methods, serve as a knowledge base for generating communication dependencies.

7. The method of claim 1, wherein when the runtime model is generated, the activity sequence in the runtime model is stored in a buffer of a configurable size, and when the number of activities exceeds a preset threshold, the activities in the buffer are serialized and persisted to local storage.

8. The method of claim 1, wherein the runtime heap model is represented in Backus normal form.