US20140090054A1 - System and Method for Detecting Anomalies in Electronic Documents - Google Patents

Info

Publication number
US20140090054A1
Authority
US
United States
Prior art keywords
function call
intercepted
detection model
good
application program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/824,211
Inventor
Damiano Bolzoni
Emmanuele Zambon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SecurityMatters BV
Original Assignee
SecurityMatters BV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SecurityMatters BV
Priority to US13/824,211
Assigned to SECURITYMATTERS B.V. (assignment of assignors' interest; see document for details). Assignors: ZAMBON, Emmanuele; BOLZONI, Damiano
Publication of US20140090054A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/18 Protocol analysers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0245 Filtering by information in the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Definitions

  • the present invention relates generally to detecting anomalous or malicious content in electronic documents.
  • malware particularly the distribution of electronic documents such as computer files, websites, and the like which have malicious content, usually hidden.
  • Such malware is commonly designed to surreptitiously install programs on a target computer system that allow the target computer system to be exploited remotely, for example capturing keystrokes, accessing files on the target computer system, accessing network connections, and the like.
  • electronic documents such as web site Hyper Text Markup Language (HTML) documents, documents using the Portable Document Format (PDF) championed by Adobe® Systems and now a standard (ISO 32000-1:2008), documents for Microsoft Office® from Microsoft®, Inc., image documents, and others.
  • PDF formats are container formats, as they allow different types of content to be included in one document, for example combining text and graphics with scripting for executing computer programs contained in the electronic document.
  • PDF and HTML documents support objects of many kinds, including text, graphics, and computer scripting using JavaScript™ or Flash®, all combined into one document.
  • the computer operating system activates the application program associated with the electronic document, such as Adobe® Reader® or Adobe Acrobat® from Adobe Systems.
  • the application, for example Adobe Reader, opens the electronic document and interprets the objects the electronic document contains to display the electronic document's contents on a computer screen.
  • a given document may contain not only text and graphics, but also malicious commands which cause a scripting engine such as JavaScript to breach security on the computer system by exploiting software flaws, and surreptitiously install malicious software.
  • malicious software can be difficult to detect, and expensive to remedy once present and detected on a computer system.
  • a widely used approach to malware detection is based on digital signatures of electronic documents.
  • the company responsible for the detection system takes a malware-containing electronic document and computes a digital signature for the electronic document.
  • digital signature algorithms are well known to the computer arts.
  • the digital signature of the malware containing electronic document is then distributed to the company's customers, where detection software running on target computers computes digital signatures on electronic documents on the computer system, including incoming documents, and compares those signatures to a library of malware signatures, alerting if a match is found, and possibly taking other actions such as quarantining the suspect electronic document.
  • Signature-based malware detection systems have a number of serious difficulties. One difficulty is that they only detect malware that has already been identified; they defend against yesterday's known attacks, but not the unknown attacks of tomorrow.
  • An electronic document must have been previously identified as malicious. Then the electronic document must be sent to the company responsible for the detection system. The company verifies the malicious nature of the electronic document, and computes its digital signature. That digital signature is then made available to customers. The updated digital signature must make its way to customer systems, a path fraught with its own difficulties.
  • This process of identification, creating a digital signature, and distributing the digital signature to customers may take hours, days, or longer from the time the electronic document is first identified as malicious and submitted to the company. An electronic document may never be submitted as malicious if it is not recognized as malicious; thus a carefully crafted malicious electronic document may continue to be successfully malicious for months or even years.
  • a digital signature algorithm, related to hashing algorithms in the computer arts, takes an electronic document or computer file and produces a digital signature representing that electronic document or file.
  • a detection system may create 256-byte digital signatures from electronic documents. Since most electronic documents are larger than this 256-byte signature, mathematically the process is a many-to-one mapping in which at least two different documents having the same 256-byte signature must exist. While digital signature and hashing algorithms are designed to minimize such collisions, mathematically such collisions must exist.
  • the detection system mistakenly identifies an innocuous file as malicious. This is known as a false positive. Instances of false positives are to be minimized as they impede or deny access to valid electronic documents and files.
  • An additional difficulty arising from the digital signature process comes from a goal of the digital signature algorithms themselves, that small changes in an electronic document result in large changes in its digital signature.
  • a malware generation or distribution system which introduces a slight variation in each of the malicious electronic documents it delivers thus produces malicious electronic documents each having different digital signatures, thus evading signature-based detection mechanisms.
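The evasion described above can be illustrated with a short sketch. It uses SHA-256 from Python's standard hashlib module as a stand-in for whatever signature algorithm a detection vendor might use (an assumption; the text does not name one). The document bytes and the signature library are invented for illustration.

```python
import hashlib


def signature(document: bytes) -> str:
    """Return a hex digest standing in for a vendor's malware signature."""
    return hashlib.sha256(document).hexdigest()


# Hypothetical malicious document and a trivially varied copy of it.
original = b"%PDF-1.4 hypothetical malicious payload"
variant = original + b" "  # one extra byte appended

# Hypothetical signature library distributed to customers.
known_bad = {signature(original)}


def is_flagged(document: bytes) -> bool:
    """Signature-based check: flag only exact matches against the library."""
    return signature(document) in known_bad


print(is_flagged(original))  # True: the exact document is in the library
print(is_flagged(variant))   # False: one changed byte yields a different digest
```

The variant carries the same payload, yet the exact-match lookup no longer recognizes it, which is the weakness the passage attributes to signature-based detection.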
  • anomaly-based malware detection systems attach themselves to the internals of the computer operating system, monitoring system functions for suspicious behavior. As an example, such a system would alert on an attempt to modify a file marked as belonging to the operating system, or on an attempt to create files in operating system portions of the computer file system. Such anomaly-based systems also have issues with false positive alerts, for example during application program installation or updating, when application program component files must be created or modified.
  • a method of detecting an anomaly in an electronic document comprises: a detection engine intercepting a function call and at least one argument value of the function call, the function call for a service provided through an application program interface (API), the function call generated by an application program processing the electronic document containing the function call, the detection engine determining that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to a detection model, and issuing an alert that an anomaly has been detected in the electronic document.
  • the step of determining that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model comprises: determining, by the detection engine, that an entry for the intercepted function call is not in the detection model, or determining, by the detection engine, that an entry for the intercepted function call is present in the detection model and that at least one argument value of the intercepted function call does not match a predefined value or range present in the entry for the function call in the detection model.
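The two conditions in the determining step can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Entry` layout, the per-position constraints, and the function names `f1`, `f2`, and `f42` are all assumptions made for the example. An argument with no recorded value or range is treated as a mismatch, which is one possible reading of the text.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical model entry: constraints keyed by argument position."""
    allowed_values: dict = field(default_factory=dict)  # position -> list of exact values
    allowed_ranges: dict = field(default_factory=dict)  # position -> (low, high), inclusive


def is_unsafe(model: dict, name: str, args: tuple) -> bool:
    """Return True if the intercepted call should be deemed unsafe."""
    entry = model.get(name)
    if entry is None:
        return True  # condition 1: no entry for the function call in the model
    for position, value in enumerate(args):
        if value in entry.allowed_values.get(position, []):
            continue  # matches a predefined value
        bounds = entry.allowed_ranges.get(position)
        if bounds is not None and isinstance(value, (int, float)):
            low, high = bounds
            if low <= value <= high:
                continue  # falls inside a predefined range
        return True  # condition 2: an argument matches neither a value nor a range
    return False


# Hypothetical detection model with two entries.
model = {
    "f1": Entry(allowed_values={0: ["abc"]}),
    "f2": Entry(allowed_ranges={0: (0, 100)}),
}

print(is_unsafe(model, "f1", ("abc",)))  # False
print(is_unsafe(model, "f2", (500,)))    # True
print(is_unsafe(model, "f42", ()))       # True
```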
  • building the detection model comprises: processing, by the application program, a plurality of known good electronic documents, each containing at least one good function call, intercepting, by the detection engine, a good function call of the at least one good function call, and at least one argument value of the good function call, the good function call for the service provided through the API, the good function call generated by the application program processing the known good electronic document containing the good function call, adding, by the detection engine, an entry for the intercepted good function call to the detection model, the entry including the at least one argument value, and repeating the intercepting and adding steps for each good function call in each known good electronic document.
  • a threshold may be applied to the detection model by removing function call entries from the detection model where the number of times the function call was intercepted by the detection engine in processing the plurality of known good electronic documents is less than a threshold, after the plurality of known good electronic documents are processed by the application program.
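Model building with the optional threshold might look like the sketch below. It assumes the intercepted calls from the known good documents are already available as `(function_name, args)` pairs; how they are captured, and the names `build_model` and `threshold`, are illustrative assumptions rather than details from the text.

```python
from collections import Counter, defaultdict


def build_model(intercepted_calls, threshold=1):
    """Build a model from calls observed while processing known good documents.

    intercepted_calls: iterable of (function_name, args_tuple) pairs.
    threshold: functions intercepted fewer than this many times are removed.
    """
    counts = Counter()
    observed_args = defaultdict(set)
    for name, args in intercepted_calls:
        counts[name] += 1
        observed_args[name].add(tuple(args))  # record the argument values seen
    # Threshold step, applied after all known good documents are processed.
    return {
        name: observed_args[name]
        for name in counts
        if counts[name] >= threshold
    }


# Hypothetical calls intercepted from a set of known good documents.
calls = [("f1", (1,)), ("f1", (2,)), ("f2", ("a",)), ("f1", (1,))]

print(build_model(calls, threshold=2))  # f2 was seen only once, so it is dropped
```

Dropping rarely seen functions trades some false positives for a tighter model: a call that appeared only once in the training set may be an outlier rather than normal behavior.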
  • an apparatus for detecting an anomaly in an electronic document comprises: a memory configured to contain a detection model, and a microprocessor coupled to the memory, the microprocessor configured to: intercept a function call and at least one argument value of the function call, the function call for a service provided through an API, the function call generated by an application program processing the electronic document containing the function call, determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model, and issue an alert that an anomaly has been detected in the electronic document.
  • the apparatus for detecting an anomaly in an electronic document is configured to determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model by being further configured to: determine that an entry for the intercepted function call is not in the detection model, or determine that an entry for the intercepted function call is present in the detection model and that the at least one argument value of the intercepted function call does not match a predefined value or range present in the entry for the function call in the detection model.
  • the apparatus for detecting an anomaly in an electronic document is configured to build the detection model contained in the memory by being further configured to: process a plurality of known good electronic documents, each containing at least one good function call, intercept a good function call of the at least one good function call, and at least one argument value of the good function call, the good function call for the service provided through the API, the good function call generated by the application program processing the known good electronic document containing the good function call, add an entry for the intercepted good function call to the detection model contained in the memory, the entry including the at least one argument value, and repeat the intercepting and adding steps for each good function call in each known good electronic document.
  • the apparatus for detecting an anomaly in an electronic document is further configured to remove function call entries from the detection model contained in the memory where the number of times the function call was intercepted by the detection engine in processing the plurality of known good electronic documents is less than a threshold, after the plurality of known good electronic documents are processed by the application program.
  • a non-transitory computer readable medium having stored thereupon computing instructions for detecting an anomaly in an electronic document comprises: a code segment to intercept a function call and at least one argument value of the function call, the function call for a service provided through an API, the function call generated by an application program processing the electronic document containing the function call, a code segment to determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to a detection model, and a code segment to issue an alert that an anomaly has been detected in the electronic document.
  • the non-transitory computer readable medium where the code segment to determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model comprises: a code segment to determine that an entry for the intercepted function call is not in the detection model, and a code segment to determine that an entry for the intercepted function call is present in the detection model and that the at least one argument value of the intercepted function call does not match a predefined value or range present in the entry for the function call in the detection model.
  • the non-transitory computer readable medium having stored thereupon computing instructions for detecting an anomaly in an electronic document further having stored thereupon computing instructions for building the detection model comprising: a code segment to process a plurality of known good electronic documents, each containing at least one good function call, a code segment to intercept a good function call of the at least one good function call, and at least one argument value of the good function call, the good function call for the service provided through the API, the good function call generated by the application program processing the known good electronic document containing the good function call, a code segment to add an entry for the intercepted good function call to the detection model, the entry including the at least one argument value, and a code segment to repeat the intercepting and adding steps for each good function call in each known good electronic document, thereby building the detection model.
  • the non-transitory computer readable medium further comprising: a code segment to remove function call entries from the detection model where the number of times the function call was intercepted by the detection engine in processing the plurality of known good electronic documents is less than a threshold, after the plurality of known good electronic documents are processed by the application program.
  • FIG. 1 is a block diagram of a computer system according to an embodiment.
  • FIG. 2 is a flowchart of detecting an anomaly in an electronic document according to an embodiment.
  • FIG. 3 is a diagram of detecting an anomaly in an electronic document according to an embodiment.
  • FIG. 4 is a diagram of detecting an anomaly in an electronic document according to an embodiment.
  • FIG. 5 is a flowchart of building a detection model for use in detecting an anomaly in an electronic document according to an embodiment.
  • FIG. 6 is a diagram of building a detection model for use in detecting an anomaly in an electronic document according to an embodiment.
  • a common vehicle for distributing malware is the use of malicious electronic documents which outwardly appear to be innocuous and of interest to a user, but contain embedded commands to exploit a vulnerability in the software running on the computer system and install malicious software or perform some malicious action.
  • a user may receive a document attached to an electronic mail message, the document having a title such as “QuarterlyBonusInfo.PDF” or “WeekendPartyPics.PDF” which may contain some legitimate content, but also contains function calls generated by an application program such as Adobe Reader for a service provided through an API such as JavaScript, Flash, or a dynamically linked library (DLL) to exploit a vulnerability in the system software, and install malicious software.
  • signature-based malware detection at best protects against yesterday's attacks, and is easily circumvented.
  • a detection engine monitors an application program processing an electronic document, for example Adobe Reader processing a PDF electronic document.
  • the detection engine is a separate software component which intercepts function calls from the separate application program to a service provided through an API as the application program processes the electronic document, and determines if those function calls represent anomalies which should result in an alert being issued.
  • the application program processes the electronic document.
  • the electronic document may contain function calls to a service provided through the API provided by the application program. These function calls may be present in the electronic document in the form of text, or encoded in binary or other suitable representation.
  • the application program takes the function call present in the electronic document and generates the function call to the service provided through the API to execute the function call.
  • the application program generates the function call from the electronic document for example by translating the text or encoded function call in the document to the form required for the API. This process will be determined by the requirements of the API, and may involve processes known to the computer arts such as tokenizing, table look-ups, compiling, interpretation, or the like to generate a function call to the service provider as required by the API.
  • an API provides a mechanism for an application program to make use of services provided by other computer programs such as scripting engines, dynamic linked libraries (DLLs), dynamic libraries (dylibs), ActiveX controls, shared object files (.so), and the like.
  • the API mechanism allows a computer program stored as one computer file, e.g. an application program, to make use of services provided by a computer program stored as another computer file, e.g., a scripting engine, DLL, dylib, ActiveX control, or the like.
  • Unix, Linux, and Apple® Macintosh® OS X computer systems use the API mechanism to provide services such as scripting engines (e.g., JavaScript, Flash or Visual Basic), system extensions such as device drivers and kernel extensions, and shared libraries.
  • Microsoft Windows systems use the API mechanism to provide services such as scripting engines (e.g. JavaScript, Flash and Visual Basic), dynamic linked libraries, and system extensions such as device drivers and ActiveX controls.
  • Adobe Reader provides APIs to make use of services provided by the JavaScript scripting engine.
  • the JavaScript scripting engine can be updated to provide additional functionality or to fix programming bugs without having to modify Adobe Reader.
  • Adobe Reader can be updated without having to modify JavaScript, as the API provides access to services in a manner independent of the versions of the application programs or services.
  • the detection engine intercepts function calls from the application program through the API to a service provided through the API, e.g., the JavaScript scripting engine.
  • the function calls are generated by the application program as it interprets an electronic document.
  • the detection engine determines if those function calls represent anomalies which should result in an alert being issued.
  • an application program such as a web browser (e.g., Firefox®, Chrome®, Safari®, Internet Explorer®, or the like) processes text from an electronic document such as a web page (i.e., an HTML document).
  • the web browser processes text marked as JavaScript, for example a fragment such as “parseFloat(kstr)”.
  • the web browser generates a function call to the JavaScript parseFloat( ) function, passing the string argument kstr.
  • the JavaScript scripting engine processes this function call to the parseFloat function and returns a floating point number.
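The translation from document text to a function call can be sketched in a few lines. This is a toy illustration only: the `SERVICES` table, the regular expression, and `generate_call` are hypothetical, and Python's built-in `float` stands in for JavaScript's `parseFloat`, whose parsing rules differ.

```python
import re

# Hypothetical table of services reachable through the API.
# Python's float is only a stand-in for JavaScript's parseFloat.
SERVICES = {"parseFloat": float}

# Matches a call with zero or one identifier argument, e.g. "parseFloat(kstr)".
CALL_PATTERN = re.compile(r"^\s*(\w+)\s*\(\s*(\w*)\s*\)\s*$")


def generate_call(fragment: str, variables: dict):
    """Tokenize a simple call and look up its target function and arguments."""
    match = CALL_PATTERN.match(fragment)
    if match is None:
        raise ValueError(f"not a simple function call: {fragment!r}")
    name, arg_name = match.groups()
    args = (variables[arg_name],) if arg_name else ()
    return SERVICES[name], args  # table look-up of the service function


# The document text names the function; the variable kstr holds a string.
func, args = generate_call("parseFloat(kstr)", {"kstr": "2.5"})
result = func(*args)
print(result)  # 2.5
```

The point at which `func` and `args` are known, but `func` has not yet been called, is where a detection engine could inspect the call.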
  • An application program such as Adobe Reader goes through the same process, taking text contained in an electronic document and generating, from text marked as JavaScript, function calls to the JavaScript scripting engine.
  • An application program such as Microsoft Word or Microsoft Excel, components of Microsoft Office, goes through the same process in taking text marked as Visual Basic, for example in a macro contained in an electronic document such as a
  • the detection engine uses a detection model which is built by causing the application program to process a set of known good electronic documents.
  • the set of electronic documents for example a set of PDF electronic documents
  • the commands, such as function calls and argument values to the function calls for the services provided through an API, e.g., function calls to JavaScript, contained in these known good electronic documents are also assumed to be good.
  • the detection engine populates the detection model with entries generated by these known good documents, building entries on observed function calls and observed argument values contained in these known good electronic documents.
  • the detection engine intercepts function calls and their arguments generated by the application program as it processes the electronic document prior to those function calls being passed from the application program to the scripting engine.
  • safe and unsafe are used herein to refer to the determinations by the detection engine with respect to a particular function call and the arguments to that function call. These determinations of safe or unsafe indicate a possible attempt to exercise a vulnerability leading to a compromise of the computer system. As such, they are distinct from the use of a function call and its arguments in a computer programming sense.
  • a particular function call may be defined as a legitimate function call in a particular service provided through an API such as JavaScript, Flash, or in a particular DLL, and thus available for use by programmers, but if an entry for that function call does not exist in the detection model, indicating that the function call was not observed in the set of known good electronic documents, the detection engine will consider that function call to be unsafe. When such a condition occurs, the detection engine issues an alert that an anomaly has been detected in the electronic document.
  • An alert can include one or more of: a visual display on a computer screen such as an alert dialog box, logging the anomaly on the computer system or through a logging service, or aborting further processing of the electronic document by the application program.
  • CPU 110 may be a microprocessor such as an x86 class processor from Intel Corporation or Advanced Micro Devices. Other microprocessors such as those offered by MIPS, Advanced RISC Machines (ARM), and others may also be used.
  • Memory hierarchy 120 includes any combination of a permanent memory device for use in initializing the computer system on power-up, fast read-write main memory such as Random Access Memory (RAM) for holding instructions and data for use by microprocessor 110, and file storage devices including but not limited to flash memory, disc drives including solid state disks, memory cards and the like, for storing electronic documents which include operating system files, programs including application programs, and data files for use by the computer system.
  • Network interface 130 may include wired and wireless interfaces such as those compatible with IEEE 802.3 wired Ethernet standards or IEEE 802.11 WiFi standards, and connects to local and/or wide area networks, not shown.
  • Input/Output interface 140 may include support for keyboards and graphic input devices such as mice and tablets, and output devices such as a display shown as DISP 150.
  • Computer system 100 operates under the control of an operating system, such as Microsoft Windows from Microsoft Corporation, OS X from Apple Computer, or one of the many open-source Linux operating systems.
  • the detection engine is attached to an application.
  • the application program processes electronic documents of a particular document type, for example, Adobe Reader processes PDF electronic documents, sending function calls and arguments for those function calls to a service provided through an API such as JavaScript.
  • the detection engine attaches itself to the application to intercept function calls from the application program to a scripting engine.
  • the Adobe Reader application program from Adobe Systems running on a Windows® operating system such as Windows 7 from Microsoft® Corporation uses an API to access services provided by the JavaScript scripting engine.
  • Other services provided through similar APIs which may be supported in other embodiments include Java® from Oracle Corporation, Adobe Flash®, the open-source Python, or Visual Basic from Microsoft Corporation.
  • Detection engine 340 attaches computer code in detection engine 340 to the Adobe Reader MethodDispatcher API to intercept function calls from application program 320, Adobe Reader, to scripting engine 330, JavaScript. Detection engine 340 attaches computer code in detection engine 340 to the Adobe Reader ArgumentParser API to retrieve argument values for the intercepted function call.
  • detection engine 340 intercepts a function call and arguments for the function call from application program 320 to scripting engine 330.
  • electronic document 310 may contain many different objects. In the case of a PDF electronic document, these objects include text, graphics, and scripting instructions to a scripting engine such as JavaScript, including function calls to JavaScript functions.
  • As application program 320 (in the embodiment, Adobe Reader) interprets the contents of electronic document 310, application program 320 turns these scripting instructions into function calls to be sent from application program 320 to scripting engine 330.
  • function calls from application program 320 to scripting engine 330 are intercepted by detection engine 340 using the Adobe Reader MethodDispatcher API.
  • Detection engine 340 retrieves the argument values for the intercepted function call using the Adobe Reader ArgumentParser API.
  • the detection engine determines if the intercepted function call is unknown to the detection model. In an embodiment, detection engine 340 determines if an entry for the intercepted function call is present in detection model 350 . If no entry for the intercepted function call is present in detection model 350 , the function call is deemed unsafe.
  • electronic document 310 contains a function call f42( . . . ).
  • In processing electronic document 310, application program 320 generates a function call to scripting engine 330.
  • This function call is intercepted by detection engine 340 .
  • detection model 350 contains entries for f1 through f12, but does not contain an entry for f42. Thus the intercepted function call f42 is deemed unsafe by detection engine 340 .
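  • The unknown-function check described above can be sketched as follows. This is a minimal illustration only: the dictionary layout of the detection model and the names `detection_model`, `is_known`, and `check_call` are assumptions made for this sketch and do not come from the described embodiment.

```python
# Hypothetical detection model: maps function names to their entries.
# Entry contents are left empty here; only presence matters for this check.
detection_model = {f"f{i}": {} for i in range(1, 13)}  # entries for f1 .. f12


def is_known(function_name, model):
    """Return True if the model holds an entry for the intercepted function call."""
    return function_name in model


def check_call(function_name, model):
    """Deem a call 'unsafe' when the model has no entry for it.

    A 'known' result only means an entry exists; the argument values
    would still have to be checked separately.
    """
    return "known" if is_known(function_name, model) else "unsafe"


print(check_call("f42", detection_model))  # f42 has no entry among f1 .. f12
print(check_call("f2", detection_model))
```

  • A dictionary keyed by function name keeps the presence test to a single lookup, which matters because the check runs on every intercepted call.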
  • In step 240, if no entry for the intercepted function call is present in the detection model, an alert is issued by detection engine 340 indicating that an anomaly has been detected in the electronic document.
  • Issuing an alert may include one or more of displaying an alert on a computer display, logging the alert, or aborting processing of the electronic document by the application program.
  • an alert dialog box may be displayed to a user indicating an anomaly has been detected in the electronic document.
  • the alert may be logged, such as to a log file on the computer system, or through a network-based logging mechanism. Further processing of the electronic document by the application may be aborted.
  • these alert options may be configurable, for example, by a user or a management service.
  • In step 250, a determination is made regarding whether the argument values of the intercepted function call match known values or are out of range. If an entry in the detection model is present for the intercepted function call, as determined in step 230, the argument values for the intercepted function call are matched against values and/or ranges present for the arguments in the detection model entry for the intercepted function call. If any argument value does not match the values and/or ranges present in the detection model, the function call is deemed unsafe by detection engine 340.
  • electronic document 410 contains a function call f2(2, -4000).
  • In processing electronic document 410, application program 320 generates a function call to scripting engine 330.
  • This function call is intercepted by detection engine 340 .
  • detection model 350 contains an entry for f2.
  • the entry for f2 in detection model 350 shows two arguments, first argument a1 with valid values from the set 1, 2, 4, 8, 16 and second argument a2 with valid values in the integer range 0 to 255.
  • the first argument to function f2 is 2, which matches the set and is valid.
  • the second argument is -4000, which is out of the integer range 0 to 255, and is unsafe.
  • the intercepted function call f2(2, -4000) is deemed unsafe by detection engine 340.
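  • The f2 example can be worked through in a short sketch, assuming each argument position in a detection model entry carries either a set of valid values or an inclusive integer range. The class names `AllowedSet` and `AllowedRange`, the function `arguments_valid`, and the choice to treat an unexpected argument count as unsafe are assumptions of this sketch, not details taken from the embodiment.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Union


@dataclass(frozen=True)
class AllowedSet:
    """Argument rule: the value must be one of a fixed set."""
    values: FrozenSet[int]

    def matches(self, value: int) -> bool:
        return value in self.values


@dataclass(frozen=True)
class AllowedRange:
    """Argument rule: the value must lie in an inclusive integer range."""
    low: int
    high: int

    def matches(self, value: int) -> bool:
        return self.low <= value <= self.high


Rule = Union[AllowedSet, AllowedRange]

# Entry for f2 as given in the example: a1 in {1, 2, 4, 8, 16}, a2 in 0..255.
F2_ENTRY = (AllowedSet(frozenset({1, 2, 4, 8, 16})), AllowedRange(0, 255))


def arguments_valid(entry: Sequence[Rule], args: Sequence[int]) -> bool:
    """Return True only if every argument matches the rule at its position."""
    if len(args) != len(entry):  # assumption: a different arity is treated as unsafe
        return False
    return all(rule.matches(value) for rule, value in zip(entry, args))


print(arguments_valid(F2_ENTRY, (2, -4000)))  # second argument is out of range
print(arguments_valid(F2_ENTRY, (2, 200)))
```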
  • In step 260, if an argument for the intercepted function call does not match or is out of range when compared to the entry in detection model 350 for the function call, an alert is issued by detection engine 340 indicating that an anomaly has been detected in the electronic document.
  • In step 270, because the intercepted function call and its argument values have been determined to be valid by steps 230 and 250, the intercepted function call is allowed to proceed to scripting engine 330.
  • this process is repeated each time a function call from the application program to the scripting engine is generated by the application program processing the electronic document.
  • FIG. 5 shows a flowchart for building a detection model for use in detecting an anomaly in an electronic document according to an embodiment.
  • the detection model is built by populating the detection model with known good function calls generated by processing a plurality of known good electronic documents by the application program.
  • A set of known good documents, 610a, 610b, 610c, and so on, is processed by application program 320.
  • the detection engine is operated in a detection model building mode for a period of time, such as a period of hours, e.g., twenty four hours, during which known good electronic documents are processed by the application program.
  • In step 520 of FIG. 5, the detection engine 340 intercepts a function call and argument values from application program 320 to scripting engine 330 as application program 320 processes a known good document.
  • In step 530, an entry for the intercepted function call and its argument values is added to detection model 350.
  • Here, "added to detection model 350" means that if an entry for the intercepted known good function call is not present in detection model 350, an entry for the intercepted function is added. This entry includes the observed argument values. Similarly, if an entry already exists for this intercepted known good function call, argument values for the intercepted known good function call are combined with the argument values previously added to the detection model entry. As an example, with integer arguments, values are accumulated as sets or ranges. For strings, information such as allowable characters and string lengths is accumulated.
  • a count of the number of times this intercepted function has been observed is also part of the entry in detection model 350 ; for the first time this intercepted function is observed, this count is set to 1. When this intercepted function is subsequently observed, the count in detection model 350 for this function is incremented.
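  • The add-or-combine behavior and the observation count can be sketched as below. The entry layout (a `count` field plus one set of observed values per argument position), the function name `record_good_call`, and the sample argument values are all assumptions for illustration; the sketch accumulates integer values as sets only and leaves out ranges and string information.

```python
def record_good_call(model, name, args):
    """Add an intercepted known good call to the model, or merge it into its entry."""
    # First observation creates the entry; the count then becomes 1.
    entry = model.setdefault(name, {"count": 0, "args": []})
    entry["count"] += 1
    for position, value in enumerate(args):
        if position == len(entry["args"]):
            entry["args"].append(set())  # first value seen at this position
        entry["args"][position].add(value)


# Hypothetical calls intercepted while processing known good documents.
known_good_calls = [
    ("f2", (2, 10)), ("f3", (7,)),   # e.g. from one known good document
    ("f20", (1,)), ("f1", (0,)),     # from a second
    ("f1", (5,)), ("f2", (4, 200)),  # from a third
]

model = {}
for name, args in known_good_calls:
    record_good_call(model, name, args)

print(model["f2"]["count"])  # number of times f2 was observed
print(sorted(model["f2"]["args"][0]))  # values seen for the first argument of f2
```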
  • Detection engine 340 populates detection model 350 with entries for intercepted good function calls from known good electronic documents 610 as processed by application program 320.
  • Known good electronic document 610a contains known good function calls to f2 and f3.
  • Known good electronic document 610b has known good function calls f20 and f1.
  • Known good electronic document 610c has known good function calls f1 and f2, and so on through the set of known good documents.
  • a threshold is applied to detection model 350 , removing entries for function calls if the number of times the function call was intercepted in processing the plurality of known good electronic documents 610 is below the threshold.
  • The threshold is applied in such an embodiment on the premise that it ensures that a valid sample size of intercepted function calls for the particular function call has been obtained.
  • this count data is only used during the detection model building phase, and is not needed for the operation of the detection engine in detecting anomalies in electronic documents. As such, the count data could be removed from detection model 350 after detection model 350 is populated and the threshold of step 560 has been applied.
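  • Applying the threshold and then discarding the count data can be sketched as follows, assuming each entry stores its observation count under a `count` key. The function name `apply_threshold`, the entry layout, and the sample counts are hypothetical and chosen only for illustration.

```python
def apply_threshold(model, threshold):
    """Return a pruned copy of the model.

    Entries observed fewer than `threshold` times are removed. The count is
    dropped from the surviving entries, since it is only needed while the
    model is being built.
    """
    pruned = {}
    for name, entry in model.items():
        if entry["count"] >= threshold:
            pruned[name] = {key: value for key, value in entry.items() if key != "count"}
    return pruned


# Hypothetical model after the building phase.
built_model = {
    "f1": {"count": 120, "args": [{0, 5}]},
    "f2": {"count": 45, "args": [{2, 4}, {10, 200}]},
    "f20": {"count": 1, "args": [{1}]},
}

final_model = apply_threshold(built_model, threshold=10)
print(sorted(final_model))  # f20 was seen too rarely and is dropped
```

  • Returning a new dictionary rather than editing in place leaves the counted model available, for example if a different threshold is tried later.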
  • a scripting engine such as JavaScript is an example of a service provided through an API.
  • Certain aspects of the described method and apparatus may be readily implemented, for example, with application programs using services provided through APIs such as dynamically linked libraries (DLLs) to extend the functionality of the application program. Examples include but are not limited to ActiveX controls on Microsoft Windows operating systems, Java DLLs such as JAR files, and shared object (so) and dynamic library (dylib) files on Unix, Linux and Apple® Macintosh® OSX operating systems.
  • Application programs making use of services provided by APIs include but are not limited to Adobe Acrobat, Adobe Reader, Microsoft Internet Explorer, and Microsoft Office.
  • the methods may be practiced on a wide range of computing equipment, including but not limited to servers, desktop computers, virtualized systems, embedded systems, and portable devices such as laptops, tablets, smart phones, appliances, and other devices containing embedded computer systems which may use, process, display, or transport electronic documents which may have anomalous or malicious content, operating under operating systems including Windows operating systems from Microsoft Corporation, OSX and iOS operating systems from Apple Inc, Unix, or Linux operating systems among others.
  • the described method and apparatus can be implemented in numerous ways, including as a process, an apparatus, or a system.
  • the methods described herein may be implemented by program instructions for instructing a processor to perform such methods, and such instructions recorded on a non-transitory computer readable storage medium such as a hard disk drive, floppy disk, optical disc such as a compact disc (CD) or digital versatile disc (DVD), flash memory, memory cards, etc., or a computer network wherein the program instructions are sent over optical or wired or wireless electronic communication links.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A system and method are described herein for detecting an anomaly in an electronic document. In a computer system, a detection engine is attached to an application program which processes the electronic document. Function calls to a service provided through an application program interface (API) are intercepted by the detection engine as the application program processes the electronic document. If an entry for the intercepted function call is not present in a detection model, or an entry is present but the argument value does not match the argument value in the detection model, an alert is raised. The detection model is populated by processing a plurality of known good documents, populating the detection model with entries for intercepted good function calls and their argument values. A threshold may be applied to the detection model, removing from the detection model function calls which were observed fewer times than the threshold.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is related to, and claims priority to, Patent Cooperation Treaty (PCT) International Application Number PCT/NL2012/050537, entitled “METHOD AND SYSTEM FOR CLASSIFYING A PROTOCOL MESSAGE IN A DATA COMMUNICATIONS NETWORK” and filed Jul. 26, 2012, which claims the benefit of U.S. Provisional Patent Application No. 61/511,685, entitled, “METHOD AND SYSTEM FOR CLASSIFYING A PROTOCOL MESSAGE IN A DATA COMMUNICATIONS NETWORK” and filed Jul. 26, 2011, and Netherlands Application No. NL 2007180, entitled “METHOD AND SYSTEM FOR CLASSIFYING A PROTOCOL MESSAGE IN A DATA COMMUNICATIONS NETWORK” and filed Jul. 26, 2011. Each of the aforementioned applications is incorporated herein by reference.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates generally to detecting anomalous or malicious content in electronic documents.
  • 2. Description of the Prior Art
  • Along with the rise of the use of computers in modern life, there has been a rise in the misuse of such computers. One area of this misuse can be referred to as malware, particularly the distribution of electronic documents such as computer files, websites, and the like which have malicious content, usually hidden. Such malware is commonly designed to surreptitiously install programs on a target computer system that allow the target computer system to be exploited remotely, for example capturing keystrokes, accessing files on the target computer system, accessing network connections, and the like.
  • Increasingly, malware appears in the guise of seemingly innocuous documents, such as web site Hyper Text Markup Language (HTML) documents, documents using the Portable Document Format (PDF) championed by Adobe® Systems and now a standard (ISO 32000-1:2008), documents for Microsoft Office® from Microsoft® Corporation, image documents, and others. These document formats are container formats, as they allow different types of content to be included in one document, for example combining text and graphics with scripting for executing computer programs contained in the electronic document. As examples, PDF and HTML documents support objects of many kinds, including text, graphics, and computer scripting using JavaScript™ or Flash®, all combined into one document.
  • When an electronic document such as a PDF document is to be opened on a computer system, the computer operating system activates the application program associated with the electronic document, such as Adobe® Reader® or Adobe Acrobat® from Adobe Systems. The application, for example Adobe Reader, opens the electronic document and interprets the objects the electronic document contains to display the electronic document's contents on a computer screen.
  • Unfortunately, a given document may contain not only text and graphics, but also malicious commands which cause a scripting engine such as JavaScript to breach security on the computer system by exploiting software flaws, and surreptitiously install malicious software. Such malicious software can be difficult to detect, and expensive to remedy once present and detected on a computer system.
  • Various approaches have been developed to deal with the issues surrounding malicious software, and in preventing malicious software from entering a computer system.
  • A widely used approach to malware detection is based on digital signatures of electronic documents. In such a signature-based detection system, the company responsible for the detection system takes a malware-containing electronic document and computes a digital signature for the electronic document. Such digital signature algorithms are well known to the computer arts. The digital signature of the malware containing electronic document is then distributed to the company's customers, where detection software running on target computers computes digital signatures on electronic documents on the computer system, including incoming documents, and compares those signatures to a library of malware signatures, alerting if a match is found, and possibly taking other actions such as quarantining the suspect electronic document.
  • Signature-based malware detection systems have a number of serious difficulties. One difficulty is that they only detect malware that has already been identified; they defend against yesterday's known attacks, but not the unknown attacks of tomorrow. An electronic document must have been previously identified as malicious. Then the electronic document must be sent to the company responsible for the detection system. The company verifies the malicious nature of the electronic document, and computes its digital signature. That digital signature is then made available to customers. The updated digital signature must make its way to customer systems, a path fraught with its own difficulties.
  • This process of identification, creating a digital signature, and distributing the digital signature to customers may take hours, days, or longer from the time the electronic document is first identified as malicious and submitted to the company. An electronic document may never be submitted as malicious if it is not recognized as malicious; thus a carefully crafted malicious electronic document may continue to be successfully malicious for months or even years.
  • Additional difficulties come from the nature of the digital signature process. A digital signature algorithm, related to hashing algorithms in the computer arts, takes an electronic document or computer file and produces a digital signature representing that electronic document or file. As an example, a detection system may create 256-byte digital signatures from electronic documents. Since most electronic documents are larger than this 256-byte signature, mathematically the process is a many-to-one mapping in which at least two different documents having the same 256-byte signature must exist. While digital signature and hashing algorithms are designed to minimize such collisions, mathematically such collisions must exist. In practice in a signature-based malware detection system, when such a collision occurs, the detection system mistakenly identifies an innocuous file as malicious. This is known as a false positive. Instances of false positives are to be minimized as they impede or deny access to valid electronic documents and files.
  • An additional difficulty arising from the digital signature process comes from a goal of the digital signature algorithms themselves, that small changes in an electronic document result in large changes in its digital signature. A malware generation or distribution system which introduces a slight variation in each of the malicious electronic documents it delivers thus produces malicious electronic documents each having different digital signatures, thus evading signature-based detection mechanisms.
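  • The evasion described here can be illustrated with a short sketch in which SHA-256 from Python's standard `hashlib` module stands in for the signature algorithm. That substitution is an assumption of this sketch; no particular detection product is claimed to use it, and the document bytes are made up.

```python
import hashlib

# Hypothetical bytes of a document already identified as malicious.
original = b"%PDF-1.7 example document body"
# The same document with one extra byte appended by the distributor.
variant = original + b" "

sig_original = hashlib.sha256(original).hexdigest()
sig_variant = hashlib.sha256(variant).hexdigest()

# The signature library only knows the document that was reported.
known_malware_signatures = {sig_original}

print(sig_original in known_malware_signatures)  # True
print(sig_variant in known_malware_signatures)   # False

# Count how many of the 64 hex digits differ between the two signatures.
differing = sum(a != b for a, b in zip(sig_original, sig_variant))
print(differing)
```

  • The one-byte variant has a signature that is not in the library, so an exact-match lookup does not flag it.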
  • Other approaches to dealing with malicious electronic documents and malicious software are anomaly-based, designed in different ways to prevent malware from taking root in a computer system by detecting and preventing malicious behavior as it occurs.
  • Some anomaly-based malware detection systems attach themselves to the internals of the computer operating system, monitoring system functions for suspicious behavior. As an example, such a system would alert on an attempt to modify a file marked as belonging to the operating system, or on an attempt to create files in operating system portions of the computer file system. Such anomaly-based systems also have issues with false positive alerts, for example during application program installation or updating, when application program component files must be created or modified.
  • What is needed is a better way to detect malicious content in electronic documents.
  • SUMMARY
  • In one embodiment a method of detecting an anomaly in an electronic document comprises: a detection engine intercepting a function call and at least one argument value of the function call, the function call for a service provided through an application program interface (API), the function call generated by an application program processing the electronic document containing the function call, the detection engine determining that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to a detection model, and issuing an alert that an anomaly has been detected in the electronic document.
  • In an embodiment, the step of determining that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model comprises: determining, by the detection engine, that an entry for the intercepted function call is not in the detection model, or determining, by the detection engine, that an entry for the intercepted function call is present in the detection model and that at least one argument value of the intercepted function call does not match a predefined value or range present in the entry for the function call in the detection model.
  • In an embodiment, building the detection model comprises: processing, by the application program, a plurality of known good electronic documents, each containing at least one good function call, intercepting, by the detection engine, a good function call of the at least one good function call, and at least one argument value of the good function call, the good function call for the service provided through the API, the good function call generated by the application program processing the known good electronic document containing the good function call, adding, by the detection engine, an entry for the intercepted good function call to the detection model, the entry including the at least one argument value, and repeating the intercepting and adding steps for each good function call in each known good electronic document.
  • In an embodiment where the detection model includes at least a number of times the function call is intercepted by the detection engine while building the detection model, a threshold may be applied to the detection model by removing function call entries from the detection model where the number of times the function call was intercepted by the detection engine in processing the plurality of known good electronic documents is less than a threshold, after the plurality of known good electronic documents are processed by the application program.
  • In an embodiment, an apparatus for detecting an anomaly in an electronic document comprises: a memory configured to contain a detection model, and a microprocessor coupled to the memory, the microprocessor configured to: intercept a function call and at least one argument value of the function call, the function call for a service provided through an API, the function call generated by an application program processing the electronic document containing the function call, determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model, and issue an alert that an anomaly has been detected in the electronic document.
  • In an embodiment, the apparatus for detecting an anomaly in an electronic document is configured to determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model by being further configured to: determine that an entry for the intercepted function call is not in the detection model, or determine that an entry for the intercepted function call is present in the detection model and that the at least one argument value of the intercepted function call does not match a predefined value or range present in the entry for the function call in the detection model.
  • In an embodiment, the apparatus for detecting an anomaly in an electronic document is configured to build the detection model contained in the memory by being further configured to: process a plurality of known good electronic documents, each containing at least one good function call, intercept a good function call of the at least one good function call, and at least one argument value of the good function call, the good function call for the service provided through the API, the good function call generated by the application program processing the known good electronic document containing the good function call, add an entry for the intercepted good function call to the detection model contained in the memory, the entry including the at least one argument value, and repeat the intercepting and adding steps for each good function call in each known good electronic document.
  • In an embodiment where the detection model includes a number of times the function call is intercepted by the detection engine while building the detection model, the apparatus for detecting an anomaly in an electronic document is further configured to remove function call entries from the detection model contained in the memory where the number of times the function call was intercepted by the detection engine in processing the plurality of known good electronic documents is less than a threshold, after the plurality of known good electronic documents are processed by the application program.
  • In an embodiment, a non-transitory computer readable medium having stored thereupon computing instructions for detecting an anomaly in an electronic document comprises: a code segment to intercept a function call and at least one argument value of the function call, the function call for a service provided through an API, the function call generated by an application program processing the electronic document containing the function call, a code segment to determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to a detection model, and a code segment to issue an alert that an anomaly has been detected in the electronic document.
  • In an embodiment, the non-transitory computer readable medium where the code segment to determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model comprises: a code segment to determine that an entry for the intercepted function call is not in the detection model, and a code segment to determine that an entry for the intercepted function call is present in the detection model and that the at least one argument value of the intercepted function call does not match a predefined value or range present in the entry for the function call in the detection model.
  • In an embodiment, the non-transitory computer readable medium having stored thereupon computing instructions for detecting an anomaly in an electronic document further having stored thereupon computing instructions for building the detection model comprising: a code segment to process a plurality of known good electronic documents, each containing at least one good function call, a code segment to intercept a good function call of the at least one good function call, and at least one argument value of the good function call, the good function call for the service provided through the API, the good function call generated by the application program processing the known good electronic document containing the good function call, a code segment to add an entry for the intercepted good function call to the detection model, the entry including the at least one argument value, and a code segment to repeat the intercepting and adding steps for each good function call in each known good electronic document, thereby building the detection model.
  • In an embodiment where the code segment to add an entry for the known good function call to the detection model includes a code segment to include in the detection model entry a number of times the function call is intercepted by the detection engine while building the detection model, the non-transitory computer readable medium further comprising: a code segment to remove function call entries from the detection model where the number of times the function call was intercepted by the detection engine in processing the plurality of known good electronic documents is less than a threshold, after the plurality of known good electronic documents are processed by the application program.
  • BRIEF DESCRIPTION OF DRAWINGS
  • FIG. 1 is a block diagram of a computer system according to an embodiment.
  • FIG. 2 is a flowchart of detecting an anomaly in an electronic document according to an embodiment.
  • FIG. 3 is a diagram of detecting an anomaly in an electronic document according to an embodiment.
  • FIG. 4 is a diagram of detecting an anomaly in an electronic document according to an embodiment.
  • FIG. 5 is a flowchart of building a detection model for use in detecting an anomaly in an electronic document according to an embodiment.
  • FIG. 6 is a diagram of building a detection model for use in detecting an anomaly in an electronic document according to an embodiment.
  • DETAILED DESCRIPTION OF THE INVENTION
  • A common vehicle for distributing malware is the use of malicious electronic documents which outwardly appear to be innocuous and of interest to a user, but contain embedded commands to exploit a vulnerability in the software running on the computer system and install malicious software or perform some malicious action. For example, a user may receive a document attached to an electronic mail message, the document having a title such as “QuarterlyBonusInfo.PDF” or “WeekendPartyPics.PDF” which may contain some legitimate content, but also contains function calls generated by an application program such as Adobe Reader for a service provided through an API such as JavaScript, Flash, or a dynamically linked library (DLL) to exploit a vulnerability in the system software, and install malicious software.
  • Presented herein are various embodiments of a method and apparatus for detecting such malicious content in electronic documents.
  • As described previously herein, signature-based malware detection at best protects against yesterday's attacks, and is easily circumvented.
  • Accordingly, a method and apparatus are described in which a detection engine monitors an application program processing an electronic document, for example Adobe Reader processing a PDF electronic document. The detection engine is a separate software component which intercepts function calls from the separate application program to a service provided through an API as the application program processes the electronic document, and determines if those function calls represent anomalies which should result in an alert being issued.
  • The application program processes the electronic document. The electronic document may contain function calls to a service provided through the API provided by the application program. These function calls may be present in the electronic document in the form of text, or encoded in binary or other suitable representation. The application program takes the function call present in the electronic document and generates the function call to the service provided through the API to execute the function call. The application program generates the function call from the electronic document for example by translating the text or encoded function call in the document to the form required for the API. This process will be determined by the requirements of the API, and may involve processes known to the computer arts such as tokenizing, table look-ups, compiling, interpretation, or the like to generate a function call to the service provider as required by the API.
  • As is known in the art, an API provides a mechanism for an application program to make use of services provided by other computer programs such as scripting engines, dynamic linked libraries (DLLs), dynamic libraries (dylibs), ActiveX controls, shared object files (so), and the like. The API mechanism allows a computer program stored as one computer file, e.g., an application program, to make use of services provided by a computer program stored as another computer file, e.g., a scripting engine, DLL, dylib, ActiveX control, or the like. Unix, Linux, and Apple® Macintosh® OSX computer systems use the API mechanism to provide services such as scripting engines (e.g., JavaScript, Flash or Visual Basic), system extensions such as device drivers and kernel extensions, and shared libraries. Microsoft Windows systems use the API mechanism to provide services such as scripting engines (e.g., JavaScript, Flash and Visual Basic), dynamic linked libraries, and system extensions such as device drivers and ActiveX controls.
  • As an example, Adobe Reader provides APIs to make use of services provided by the JavaScript scripting engine. By providing services through an API, both the application program making use of the services and the service provider can be maintained and upgraded separately and independently. As an example, the JavaScript scripting engine can be updated to provide additional functionality or to fix programming bugs without having to modify Adobe Reader. Similarly, Adobe Reader can be updated without having to modify JavaScript, as the API provides access to services in a manner independent of the versions of the application programs or services.
  • By attaching to APIs, for example the APIs provided by Adobe Reader, the detection engine intercepts function calls from the application program through the API to a service provided through the API, e.g., the JavaScript scripting engine. The function calls are generated by the application program as it interprets an electronic document. The detection engine determines if those function calls represent anomalies which should result in an alert being issued.
  • In operation, an application program such as a web browser (e.g., Firefox®, Chrome®, Safari®, Internet Explorer®, or the like) processes text from an electronic document such as a web page (i.e., an HTML document). When the web browser processes text marked as JavaScript, for example a fragment such as “parseFloat(kstr)”, the web browser generates a function call to the JavaScript parseFloat( ) function, passing the string argument kstr. The JavaScript scripting engine processes this function call to the parseFloat function and returns a floating point number. An application program such as Adobe Reader goes through the same process, taking text contained in an electronic document and generating, from text marked as JavaScript, function calls to the JavaScript scripting engine. An application program such as Microsoft Word or Microsoft Excel, both components of Microsoft Office, goes through the same process, taking text marked as Visual Basic, for example in a macro contained in an electronic document such as a spreadsheet or word processing document, and generating function calls to the Visual Basic scripting engine.
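  • As a purely illustrative sketch of the translation described above, the following Python fragment tokenizes a textual call such as “parseFloat(kstr)” and dispatches it through a table look-up. The names SERVICE_TABLE, translate_call, and dispatch are hypothetical and do not correspond to any API of a web browser or of Adobe Reader; Python's float( ) is used only as a rough stand-in for the JavaScript parseFloat function, and only one-argument calls with a variable argument are handled.

```python
import re

# Hypothetical dispatch table standing in for the services exposed through an API.
# float() is only a rough stand-in for JavaScript's parseFloat.
SERVICE_TABLE = {"parseFloat": float}

# Matches a call of the form name(argument_name), with optional whitespace.
CALL_PATTERN = re.compile(r"^\s*([A-Za-z_]\w*)\s*\(\s*([A-Za-z_]\w*)\s*\)\s*$")


def translate_call(text, variables):
    """Tokenize a one-argument call such as 'parseFloat(kstr)' into (name, args)."""
    match = CALL_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognized function call: {text!r}")
    name, arg_name = match.groups()
    return name, [variables[arg_name]]


def dispatch(name, args):
    """Look up the named service in the table and invoke it with the arguments."""
    return SERVICE_TABLE[name](*args)


name, args = translate_call("parseFloat(kstr)", {"kstr": "3.14"})
result = dispatch(name, args)
```

  • In this sketch the pair (name, args) is the point at which a detection engine would see the call, before dispatch hands it to the service.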
  • The detection engine uses a detection model which is built by causing the application program to process a set of known good electronic documents. As the set of electronic documents, for example a set of PDF electronic documents, are known to be good, the commands, such as function calls and argument values to the function calls for the services provided through an API, e.g., function calls to JavaScript, contained in these known good electronic documents are also assumed to be good. The detection engine populates the detection model with entries generated by these known good documents, building entries on observed function calls and observed argument values contained in these known good electronic documents.
  • In operation, that is, after the detection model has been built, as the application program processes an electronic document, the detection engine intercepts function calls and their arguments generated by the application program as it processes the electronic document prior to those function calls being passed from the application program to the scripting engine.
  • The terms “safe” and “unsafe” are used herein to refer to the determinations by the detection engine with respect to a particular function call and the arguments to that function call. These determinations of safe or unsafe indicate a possible attempt to exercise a vulnerability leading to a compromise of the computer system. As such, they are distinct from the use of a function call and its arguments in a computer programming sense. For example, a particular function call may be defined as a legitimate function call in a particular service provided through an API such as JavaScript, Flash, or in a particular DLL, and thus available for use by programmers, but if an entry for that function call does not exist in the detection model, indicating that the function call was not observed in the set of known good electronic documents, the detection engine will consider that function call to be unsafe. When such a condition occurs, the detection engine issues an alert that an anomaly has been detected in the electronic document.
  • Similarly, if an entry for the intercepted function call is present in the detection model, the argument values of the intercepted function call are tested, and if an argument value does not match or is outside the range for that argument contained in the detection model entry for the intercepted function, the intercepted function call is likewise considered unsafe. When such a condition occurs, the detection engine issues an alert that an anomaly has been detected in the electronic document.
  • An alert can include one or more of: a visual display on a computer screen such as an alert dialog box, logging the anomaly on the computer system or through a logging service, or aborting further processing of the electronic document by the application program.
  • Turning now to FIG. 1, a block diagram of a computer system 100 which may be used to practice the invention is shown in simplified form. As understood in the art, computer system 100 comprises a central processing unit (CPU) 110 which is coupled to memory hierarchy 120, network interface 130, and Input/Output (I/O) interface 140. CPU 110 may be a microprocessor such as an x86 class processor from Intel Corporation or Advanced Micro Devices. Other microprocessors such as those offered by MIPS, Advanced Risc Machines (ARM), and others may also be used.
  • Memory hierarchy 120, as understood by the art includes any combination of a permanent memory device for use in initializing the computer system on power-up, fast read-write main memory such as Random Access Memory (RAM) for holding instructions and data for use by microprocessor 110, and file storage devices including but not limited to flash memory, disc drives including solid state disks, memory cards and the like, for storing electronic documents which include operating system files, programs including applications programs, and data files for use by the computer system.
  • Network interface 130 may include wired and wireless interfaces such as those compatible with IEEE 802.3 wired Ethernet standards or IEEE 802.11 WiFi standards, and connects to local and/or wide area networks, not shown. Input/Output interface 140 may include support for keyboards and graphic input devices such as mice and tablets, and output devices such as a display shown as DISP 150.
  • Computer system 100 operates under the control of an operating system, such as Microsoft Windows from Microsoft Corporation, OS/X from Apple Computer, or one of the many open-source Linux operating systems.
  • Referring now to the flowchart of FIG. 2 and the diagrams of FIGS. 3 and 4, operation of the detection engine in detecting an anomaly in an electronic document according to an embodiment will now be described.
  • In step 210, the detection engine is attached to an application. As is known in the art, the application program processes electronic documents of a particular document type, for example, Adobe Reader processes PDF electronic documents, sending function calls and arguments for those function calls to a service provided through an API such as JavaScript. In an embodiment, the detection engine attaches itself to the application to intercept function calls from the application program to a scripting engine.
  • In an embodiment, the Adobe Reader application program from Adobe Systems, running on a Windows® operating system such as Windows 7 from Microsoft® Corporation, uses an API to access services provided by the JavaScript scripting engine. Other services provided through similar APIs which may be supported in other embodiments include Java® from Oracle Corporation, Adobe Flash®, the open-source Python, or Visual Basic from Microsoft Corporation.
  • As is understood in the art, the process of attaching a software component such as the detection engine to an application program such as Adobe Reader is dependent on the operating system on which the detection engine and application program run. In the embodiment described, Adobe Reader running on a Windows operating system, Adobe Reader provides application program interfaces (APIs) to a MethodDispatcher and an ArgumentParser. Detection engine 340 attaches computer code in detection engine 340 to the Adobe Reader MethodDispatcher API to intercept function calls from application program 320, Adobe Reader, to scripting engine 330, JavaScript. Detection engine 340 attaches computer code in detection engine 340 to the Adobe Reader ArgumentParser API to retrieve argument values for the intercepted function call.
  • In step 220 of FIG. 2, detection engine 340 intercepts a function call and arguments for the function call from application program 320 to scripting engine 330. In an embodiment, electronic document 310 may contain many different objects. In the case of a PDF electronic document, these objects include text, graphics, and scripting instructions to a scripting engine such as JavaScript, including function calls to JavaScript functions. As application program 320, in the embodiment, Adobe Reader, interprets the contents of electronic document 310, application program 320 turns these scripting instructions into function calls to be sent from application program 320 to scripting engine 330.
  • In an embodiment, function calls from application program 320 to scripting engine 330 are intercepted by detection engine 340 using the Adobe Reader MethodDispatcher API. Detection engine 340 retrieves the argument values for the intercepted function call using the Adobe Reader ArgumentParser API.
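  • The interception described above can be pictured with the following minimal Python sketch, which wraps the entry point of a stand-in service so that a callback sees each call and its argument values before the call proceeds. The class ScriptingEngine and the function attach are hypothetical; they do not model the Adobe Reader MethodDispatcher or ArgumentParser APIs, whose actual interfaces are not reproduced here.

```python
class ScriptingEngine:
    """Hypothetical stand-in for a service reached through an API."""

    def call(self, name, *args):
        return f"executed {name}{args}"


def attach(engine, on_call):
    """Wrap engine.call so that on_call sees every call before the engine does."""
    original = engine.call

    def intercepted(name, *args):
        on_call(name, args)            # the detection engine inspects the call first
        return original(name, *args)   # then the call proceeds unchanged

    engine.call = intercepted
    return original


seen = []
engine = ScriptingEngine()
attach(engine, lambda name, args: seen.append((name, args)))
output = engine.call("parseFloat", "3.14")
```

  • In this sketch the callback only records the call; a detection engine would instead compare the call against the detection model and could decline to forward it.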
  • In step 230, the detection engine determines if the intercepted function call is unknown to the detection model. In an embodiment, detection engine 340 determines if an entry for the intercepted function call is present in detection model 350. If no entry for the intercepted function call is present in detection model 350, the function call is deemed unsafe.
  • This process is shown in more detail in FIG. 3. As shown, electronic document 310 contains a function call f42( . . . ). In processing electronic document 310, application program 320 generates a function call to scripting engine 330. This function call is intercepted by detection engine 340. As shown, detection model 350 contains entries for f1 through f12, but does not contain an entry for f42. Thus the intercepted function call f42 is deemed unsafe by detection engine 340.
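  • The test of step 230 in the FIG. 3 example reduces to a membership check, sketched below in Python. The dictionary layout and the function name is_known are assumptions made for illustration, as is the reading that the model holds an entry for every name from f1 to f12; the contents of each entry are left empty here.

```python
# Entries f1 through f12, as in the FIG. 3 example; entry contents are omitted.
detection_model = {f"f{i}": {} for i in range(1, 13)}


def is_known(function_name, model):
    """Step 230: True when the detection model has an entry for the function."""
    return function_name in model


f42_known = is_known("f42", detection_model)
f2_known = is_known("f2", detection_model)
```

  • Under these assumptions f42 has no entry and would be deemed unsafe, while f2 would go on to the argument check of step 250.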
  • In step 240, if no entry for the intercepted function call is present in the detection model, an alert is issued by detection engine 340 indicating that an anomaly has been detected in the electronic document.
  • Issuing an alert may include one or more of displaying an alert on a computer display, logging the alert, or aborting processing of the electronic document by the application program. In an embodiment, an alert dialog box may be displayed to a user indicating an anomaly has been detected in the electronic document. The alert may be logged, such as to a log file on the computer system, or through a network-based logging mechanism. Further processing of the electronic document by the application may be aborted. In an embodiment, these alert options may be configurable, for example, by a user or a management service.
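  • One way the configurable alert options might be organized is sketched below in Python. The names AlertConfig, issue_alert, and ProcessingAborted are hypothetical, the logger name is arbitrary, and the print-based display merely stands in for an alert dialog box.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("detection_engine")  # hypothetical logger name


class ProcessingAborted(Exception):
    """Raised to stop further processing of the electronic document."""


@dataclass
class AlertConfig:
    display: bool = True
    log: bool = True
    abort: bool = False


def issue_alert(message, config, show=print):
    """Apply each enabled alert action and return the names of the actions taken."""
    actions = []
    if config.display:
        show(f"ALERT: {message}")    # stand-in for an alert dialog box
        actions.append("display")
    if config.log:
        logger.warning(message)      # could equally be a network logging service
        actions.append("log")
    if config.abort:
        raise ProcessingAborted(message)
    return actions


shown = []
taken = issue_alert("anomaly in document", AlertConfig(), show=shown.append)
```

  • With abort enabled in the configuration, issue_alert raises ProcessingAborted after any display and logging actions, which the caller can use to stop processing the document.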
  • In step 250, a determination is made regarding whether the argument values of the intercepted function call match known values or are out of range. If an entry in the detection model is present for the intercepted function call, as determined in Step 230, the argument values for the intercepted function call are matched against values and/or ranges present for the arguments in the detection model entry for the intercepted function call. If any argument values do not match the values and/or ranges present in the detection model, the function call is deemed unsafe by detection engine 340.
  • This is shown in more detail in FIG. 4. As shown, electronic document 410 contains a function call f2(2, −4000). In processing electronic document 410, application program 320 generates a function call to scripting engine 330. This function call is intercepted by detection engine 340. As shown, detection model 350 contains an entry for f2. The entry for f2 in detection model 350 shows two arguments, first argument a1 with valid values from the set 1, 2, 4, 8, 16 and second argument a2 with valid values in the integer range 0 to 255. In the intercepted function call, the first argument to function f2 is 2, which matches the set and is valid. The second argument is −4000, which is out of the integer range 0 to 255, and is unsafe. Thus the intercepted function call f2(2, −4000) is deemed unsafe by detection engine 340.
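  • The FIG. 4 example can be worked through with the short Python sketch below, which uses the set and range given above for function f2. The entry layout and the names argument_ok and check_call are assumptions for illustration. Treating a mismatch in the number of arguments as unsafe is likewise an assumption, since the description does not address that case.

```python
# Entry for f2 as described for FIG. 4: a1 drawn from a set, a2 within an integer range.
detection_model = {
    "f2": [
        {"kind": "set", "values": {1, 2, 4, 8, 16}},
        {"kind": "range", "low": 0, "high": 255},
    ],
}


def argument_ok(value, constraint):
    """Check one argument value against a set or an inclusive range."""
    if constraint["kind"] == "set":
        return value in constraint["values"]
    return constraint["low"] <= value <= constraint["high"]


def check_call(name, args, model):
    """Return (safe, reason) for an intercepted call (steps 230 and 250)."""
    entry = model.get(name)
    if entry is None:
        return False, f"{name}: no entry in detection model"
    if len(args) != len(entry):  # assumption: an unexpected argument count is unsafe
        return False, f"{name}: unexpected number of arguments"
    for position, (value, constraint) in enumerate(zip(args, entry), start=1):
        if not argument_ok(value, constraint):
            return False, f"{name}: argument a{position}={value} does not match"
    return True, "ok"


fig4_result = check_call("f2", [2, -4000], detection_model)
valid_result = check_call("f2", [2, 100], detection_model)
```

  • Here the first argument 2 is in the set, the second argument −4000 falls outside the range 0 to 255, and the call is reported unsafe on its second argument.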
  • In step 260 if an argument for the intercepted function call does not match or is out of range when compared to the entry in detection model 350 for the function call, an alert is issued by detection engine 340 indicating that an anomaly has been detected in the electronic document.
  • In step 270, because the intercepted function call and argument values to the intercepted function call have been determined to be valid by Steps 230 and 250, the intercepted function call is allowed to proceed to scripting engine 330.
  • In an embodiment, this process is repeated each time a function call from the application program to the scripting engine is generated by the application program processing the electronic document.
  • FIG. 5 shows a flowchart for building a detection model for use in detecting an anomaly in an electronic document according to an embodiment.
  • In step 510, the detection model is built by populating the detection model with known good function calls generated by processing a plurality of known good electronic documents by the application program. In an embodiment, referring to FIG. 6, a set of known good documents, 610 a, 610 b, 610 c and so on are processed by application program 320. In an embodiment, the detection engine is operated in a detection model building mode for a period of time, such as a period of hours, e.g., twenty four hours, during which known good electronic documents are processed by the application program.
  • In step 520 of FIG. 5 the detection engine 340 intercepts a function call and argument values from application program 320 to scripting engine 330 as application program 320 processes a known good document.
  • In step 530, an entry for the intercepted function call and its argument values is added to detection model 350.
  • In an embodiment, added to detection model 350 means that if an entry for the intercepted known good function call is not present in detection model 350, an entry for the intercepted function is added. This entry includes observed argument values. Similarly, if an entry already exists for this intercepted known good function call, argument values for the intercepted known good function call are combined with the argument values previously added to the detection model entry. As an example, with integer arguments, values are accumulated as sets or ranges. For strings, information such as allowable characters and string lengths are accumulated.
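  • A minimal Python sketch of this accumulation follows, assuming that integer arguments are summarized by both the set of observed values and their range, and that string arguments are summarized by their observed characters and lengths. The summary layout, the helper names, and the function name setTitle are hypothetical; only integer and string arguments are handled.

```python
def observe_int(summary, value):
    """Fold an integer into a summary of the observed set and range."""
    if summary is None:
        return {"values": {value}, "low": value, "high": value}
    return {
        "values": summary["values"] | {value},
        "low": min(summary["low"], value),
        "high": max(summary["high"], value),
    }


def observe_str(summary, value):
    """Fold a string into a summary of allowable characters and lengths."""
    if summary is None:
        return {"chars": set(value), "min_len": len(value), "max_len": len(value)}
    return {
        "chars": summary["chars"] | set(value),
        "min_len": min(summary["min_len"], len(value)),
        "max_len": max(summary["max_len"], len(value)),
    }


def add_observation(model, name, args):
    """Create the entry if absent, then combine the new argument values with it."""
    entry = model.setdefault(name, [])
    while len(entry) < len(args):
        entry.append(None)
    for position, value in enumerate(args):
        fold = observe_str if isinstance(value, str) else observe_int
        entry[position] = fold(entry[position], value)


model = {}
add_observation(model, "f2", [2, 10])
add_observation(model, "f2", [8, 200])
add_observation(model, "setTitle", ["ab"])   # hypothetical function name
add_observation(model, "setTitle", ["bcd"])
```

  • Whether a given argument is later checked against its set or its range is a design choice the sketch leaves open; keeping both lets that decision be made after the model is built.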
  • In an embodiment, a count of the number of times this intercepted function has been observed is also part of the entry in detection model 350; for the first time this intercepted function is observed, this count is set to 1. When this intercepted function is subsequently observed, the count in detection model 350 for this function is incremented.
  • Building of the detection model is complete, for example, at the end of the period of time or when all electronic documents 610 in the set of known good electronic documents have been processed by application program 320. At that point, detection engine 340 has populated detection model 350 with entries for the intercepted good function calls from known good electronic documents 610 as processed by application program 320.
  • As an example of populating detection model 350, referring to FIG. 6, known good electronic document 610 a contains known good function calls to f2 and f3. Known good electronic document 610 b has known good function calls f20 and f1. Known good electronic document 610 c has known good function calls f1 and f2, and so on through the set of known good documents.
  • In an embodiment where a count of the number of times a function has been observed is kept in detection model 350, in step 560, a threshold is applied to detection model 350, removing entries for function calls if the number of times the function call was intercepted in processing the plurality of known good electronic documents 610 is below the threshold. The threshold is applied in such an embodiment based on the premise that the threshold ensures that a valid sample size of intercepted function calls for the particular function call has been obtained.
  • Referring again to FIG. 6, in an embodiment where a count of the number of times a function has been observed is kept in detection model 350, assume that the threshold is 10. As shown in detection model 350, function f1 was intercepted 20 times, function f2 was intercepted 67 times, function f3 was intercepted 2 times, and function f12 was intercepted 19 times. Of these entries, function f3 is below the threshold of 10, and the entry for function f3 is therefore removed from detection model 350.
  • It should be noted that in an embodiment where a count of the number of times a function has been intercepted is kept in detection model 350, this count data is only used during the detection model building phase, and is not needed for the operation of the detection engine in detecting anomalies in electronic documents. As such, the count data could be removed from detection model 350 after detection model 350 is populated and the threshold of step 560 has been applied.
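  • The counting, thresholding, and optional removal of count data described above can be sketched in Python as follows, using the counts and the threshold of 10 from the FIG. 6 example. The entry layout and the names record_call, apply_threshold, and strip_counts are hypothetical, and argument data is omitted from the entries for brevity.

```python
def record_call(model, name):
    """Set the count to 1 on first observation, and increment it afterwards."""
    entry = model.setdefault(name, {"count": 0})
    entry["count"] += 1


def apply_threshold(model, threshold):
    """Step 560: drop entries intercepted fewer times than the threshold."""
    return {name: entry for name, entry in model.items() if entry["count"] >= threshold}


def strip_counts(model):
    """Remove count data, which is not needed once the model has been built."""
    return {
        name: {key: value for key, value in entry.items() if key != "count"}
        for name, entry in model.items()
    }


# Number of interceptions per function, as in the FIG. 6 example.
observed = {"f1": 20, "f2": 67, "f3": 2, "f12": 19}
model = {}
for name, times in observed.items():
    for _ in range(times):
        record_call(model, name)

pruned = apply_threshold(model, threshold=10)
final_model = strip_counts(pruned)
```

  • In this sketch the entry for f3, intercepted only 2 times, is removed, and the remaining entries for f1, f2, and f12 no longer carry a count.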
  • While the disclosed method and apparatus has been explained with respect to particular embodiments, such as using Adobe Reader and PDF electronic documents containing scripting in JavaScript, other embodiments will be apparent to those skilled in the art in light of this disclosure, including but not limited to processing of Flash embedded in PDF electronic documents, HTML documents by web browsers such as Internet Explorer, processing of Microsoft Office documents by Microsoft Office, and the like.
  • As described previously herein, a scripting engine such as JavaScript is an example of a service provided through an API. Certain aspects of the described method and apparatus may be readily implemented, for example, with application programs using services provided through APIs such as dynamically linked libraries (DLLs) to extend the functionality of the application program. Examples include but are not limited to ActiveX controls on Microsoft Windows operating systems, Java DLLs such as JAR files, and shared object (so) and dynamic library (dylib) files on Unix, Linux and Apple® Macintosh® OSX operating systems. Application programs making use of services provided by APIs include but are not limited to Adobe Acrobat, Adobe Reader, Microsoft Internet Explorer, and Microsoft Office.
  • Certain aspects of the described method and apparatus may readily be implemented using configurations other than those described in the embodiments above, or in conjunction with elements other than those described above. For example, the methods may be practiced on a wide range of computing equipment, including but not limited to servers, desktop computers, virtualized systems, embedded systems, and portable devices such as laptops, tablets, smart phones, appliances, and other devices containing embedded computer systems which may use, process, display, or transport electronic documents which may have anomalous or malicious content, operating under operating systems including Windows operating systems from Microsoft Corporation, OSX and iOS operating systems from Apple Inc, Unix, or Linux operating systems among others.
  • Further, it should also be appreciated that the described method and apparatus can be implemented in numerous ways, including as a process, an apparatus, or a system. The methods described herein may be implemented by program instructions for instructing a processor to perform such methods, and such instructions recorded on a non-transitory computer readable storage medium such as a hard disk drive, floppy disk, optical disc such as a compact disc (CD) or digital versatile disc (DVD), flash memory, memory cards, etc., or a computer network wherein the program instructions are sent over optical or wired or wireless electronic communication links. It should be noted that the order of the steps of the methods described herein may be altered and still be within the scope of the disclosure.
  • It is to be understood that the examples given are for illustrative purposes only and may be extended to other implementations and embodiments with different conventions and techniques. While a number of embodiments are described, there is no intent to limit the disclosure to the embodiment(s) disclosed herein. On the contrary, the intent is to cover all alternatives, modifications, and equivalents apparent to those familiar with the art.
  • In the foregoing specification, the invention is described with reference to specific embodiments thereof, but those skilled in the art will recognize that the invention is not limited thereto. Various features and aspects of the above-described invention may be used individually or jointly. Further, the invention can be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. It will be recognized that the terms “comprising,” “including,” and “having,” as used herein, are specifically intended to be read as open-ended terms of art.

Claims (22)

1-34. (canceled)
35. A method of detecting an anomaly in an electronic document comprising:
intercepting, by a detection engine, a function call and at least one argument value of the function call, the function call for a service provided through an application program interface, the function call generated by an application program processing the electronic document containing the function call,
determining, by the detection engine, that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to a detection model, and
issuing an alert, by the detection engine, that an anomaly has been detected in the electronic document.
36. The method of claim 35 where the service provided through the application program interface is a scripting engine.
37. The method of claim 35 where the step of issuing an alert by the detection engine comprises one or more of:
displaying the alert,
logging the alert, or
aborting any further processing of the electronic document by the application program.
38. The method of claim 35 where the step of determining, by the detection engine, that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model comprises:
determining, by the detection engine, that an entry for the intercepted function call is not in the detection model, or
determining, by the detection engine, that an entry for the intercepted function call is present in the detection model and that the at least one argument value of the intercepted function call does not match a predefined value or range present in the entry for the function call in the detection model.
39. The method of claim 35 further comprising building the detection model by:
processing, by the application program, a plurality of known good electronic documents, each containing at least one good function call,
intercepting, by the detection engine, a good function call of the at least one good function call, and at least one argument value of the good function call, the good function call for the service provided through the application program interface, the good function call generated by the application program processing the known good electronic document containing the good function call,
adding, by the detection engine, an entry for the intercepted good function call to the detection model, the entry including the at least one argument value, and
repeating the intercepting and adding steps for each good function call in each known good electronic document.
40. The method of claim 39 where the step of adding, by the detection engine, an entry for the intercepted good function call to the detection model further comprises: including a number of times the function call is intercepted by the detection engine while building the detection model.
41. The method of claim 40 further comprising:
removing, by the detection engine, function call entries from the detection model where the number of times the function call was intercepted by the detection engine in processing the plurality of known good electronic documents is less than a threshold, after the plurality of known good electronic documents are processed by the application program.
42. The method of claim 35 further comprising:
attaching, by the detection engine, the detection engine to the application program,
before the step of intercepting, by the detection engine, the function call and at least one argument value of the function call.
43. An apparatus for detecting an anomaly in an electronic document comprising:
a memory configured to contain a detection model, and
a microprocessor coupled to the memory, the microprocessor configured to:
intercept a function call and at least one argument value of the function call, the function call for a service provided through an application program interface, the function call generated by an application program processing the electronic document containing the function call,
determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model, and
issue an alert that an anomaly has been detected in the electronic document.
44. The apparatus of claim 43 where the microprocessor is further configured to issue an alert by one or more of:
displaying a visible alert on a display coupled to the microprocessor,
logging the alert to a file stored in the memory, or
aborting any further processing of the electronic document by the application program.
45. The apparatus of claim 43 where the microprocessor is configured to determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model by being further configured to:
determine that an entry for the intercepted function call is not in the detection model, or
determine that an entry for the intercepted function call is present in the detection model and that the at least one argument value of the intercepted function call does not match a predefined value or range present in the entry for the function call in the detection model.
46. The apparatus of claim 43 where the microprocessor is configured to build the detection model contained in the memory by being further configured to:
process a plurality of known good electronic documents, each containing at least one good function call,
intercept a good function call of the at least one good function call, and at least one argument value of the good function call, the good function call for the service provided through the application program interface, the good function call generated by the application program processing the known good electronic document containing the good function call,
add an entry for the intercepted good function call to the detection model contained in the memory, the entry including the at least one argument value, and
repeat the intercepting and adding steps for each good function call in each known good electronic document.
47. The apparatus of claim 46 where the microprocessor is configured to add an entry for the intercepted good function call to the detection model contained in the memory by being further configured to include in the entry at least a number of times the function call is intercepted by the detection engine while building the detection model.
48. The apparatus of claim 47 where the microprocessor is further configured to:
remove function call entries from the detection model contained in the memory where the number of times the function call was intercepted by the detection engine in processing the plurality of known good electronic documents is less than a threshold, after the plurality of known good electronic documents are processed by the application program.
49. The apparatus of claim 43 where the microprocessor is further configured to attach the detection engine to the application program, before intercepting the function call.
50. A non-transitory computer readable medium having stored thereupon computing instructions for detecting an anomaly in an electronic document comprising:
a code segment to intercept a function call and at least one argument value of the function call, the function call for a service provided through an application program interface, the function call generated by an application program processing the electronic document containing the function call,
a code segment to determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to a detection model, and
a code segment to issue an alert that an anomaly has been detected in the electronic document.
51. The non-transitory computer readable medium of claim 50 where the code segment to issue an alert by the detection engine further comprises a code segment to:
display a visible alert,
log the alert, or
abort any further processing of the electronic document by the application program.
52. The non-transitory computer readable medium of claim 50 where the code segment to determine that the intercepted function call is unsafe by comparing the intercepted function call and the at least one argument value to the detection model comprises:
a code segment to determine that an entry for the intercepted function call is not in the detection model, and
a code segment to determine that an entry for the intercepted function call is present in the detection model and that the at least one argument value of the intercepted function call does not match a predefined value or range present in the entry for the function call in the detection model.
53. The non-transitory computer readable medium having stored thereupon computing instructions for detecting an anomaly in an electronic document of claim 52 further having stored thereupon computing instructions for building the detection model comprising:
a code segment to process a plurality of known good electronic documents, each containing at least one good function call,
a code segment to intercept a good function call of the at least one good function call, and at least one argument value of the good function call, the good function call for the service provided through the application program interface, the function call generated by the application program processing the known good electronic document containing the good function call,
a code segment to add an entry for the intercepted good function call to the detection model, the entry including the at least one argument value, and
a code segment to repeat the intercepting and adding steps for each good function call in each known good electronic document, thereby building the detection model.
54. The non-transitory computer readable medium having stored thereupon computing instructions for detecting an anomaly in an electronic document of claim 53 where the code segment to add an entry for the intercepted good function call further includes a code segment to include in the entry a number of times the function call is intercepted by the detection engine while building the detection model.
55. The non-transitory computer readable medium of claim 54 further comprising:
a code segment to remove function call entries from the detection model where the number of times the function call was intercepted by the detection engine in processing the plurality of known good electronic documents is less than a threshold, after the plurality of known good electronic documents are processed by the application program.
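The model-building, pruning, and comparison steps recited in claims 51 to 55 can be illustrated with a minimal Python sketch. It is not the implementation of the application. Intercepted calls are simulated as a function name plus an argument tuple, and the class `DetectionModel`, the helper `check_call`, and the call names `open_file` and `run_command` are hypothetical names chosen for illustration. The sketch matches arguments against exact observed values only. The "range" alternative of claim 52 is omitted for brevity.

```python
import logging

logger = logging.getLogger("detection_engine")  # hypothetical logger name


class DetectionModel:
    """Maps a function name to its interception count and observed argument tuples."""

    def __init__(self):
        self.entries = {}

    def learn(self, name, args):
        """Training: record one intercepted good call and count it (claims 53, 54)."""
        entry = self.entries.setdefault(name, {"count": 0, "args": set()})
        entry["count"] += 1
        entry["args"].add(tuple(args))

    def prune(self, threshold):
        """Remove entries intercepted fewer than `threshold` times (claim 55)."""
        self.entries = {
            name: entry
            for name, entry in self.entries.items()
            if entry["count"] >= threshold
        }

    def is_unsafe(self, name, args):
        """Detection: no entry for the call, or arguments never observed (claim 52)."""
        entry = self.entries.get(name)
        if entry is None:
            return True
        return tuple(args) not in entry["args"]


def check_call(model, name, args):
    """Log an alert and return False (abort processing) for an unsafe call (claim 51)."""
    if model.is_unsafe(name, args):
        logger.warning("unsafe call %s%r", name, tuple(args))
        return False
    return True


# Training on simulated calls from known good documents.
model = DetectionModel()
for _ in range(3):
    model.learn("open_file", ("report.tmp", "r"))
model.learn("run_command", ("cmd.exe",))  # seen only once during training
model.prune(threshold=2)                  # the rare entry is removed
```

In this sketch, `check_call(model, "open_file", ("report.tmp", "r"))` is allowed, while a call to `run_command`, or a call to `open_file` with arguments not seen during training, is logged and rejected.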
US13/824,211 2011-07-26 2012-07-26 System and Method for Detecting Anomalies in Electronic Documents Abandoned US20140090054A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/824,211 US20140090054A1 (en) 2011-07-26 2012-07-26 System and Method for Detecting Anomalies in Electronic Documents

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201161511685P 2011-07-26 2011-07-26
NL2007180A NL2007180C2 (en) 2011-07-26 2011-07-26 Method and system for classifying a protocol message in a data communication network.
NL2007180 2011-07-26
US13/824,211 US20140090054A1 (en) 2011-07-26 2012-07-26 System and Method for Detecting Anomalies in Electronic Documents
PCT/NL2012/050537 WO2013015691A1 (en) 2011-07-26 2012-07-26 Method and system for classifying a protocol message in a data communication network

Publications (1)

Publication Number Publication Date
US20140090054A1 (en) 2014-03-27

Family

ID=47601337

Family Applications (4)

Application Number Title Priority Date Filing Date
US14/234,669 Active 2033-12-13 US9628497B2 (en) 2011-07-26 2012-07-26 Method and system for classifying a protocol message in a data communication network
US13/824,211 Abandoned US20140090054A1 (en) 2011-07-26 2012-07-26 System and Method for Detecting Anomalies in Electronic Documents
US15/461,816 Active 2034-12-09 US11012330B2 (en) 2011-07-26 2017-03-17 Method and system for classifying a protocol message in a data communication network
US17/236,305 Active US11902126B2 (en) 2011-07-26 2021-04-21 Method and system for classifying a protocol message in a data communication network

Country Status (11)

Country Link
US (4) US9628497B2 (en)
EP (1) EP2737683B1 (en)
JP (1) JP6117202B2 (en)
CN (1) CN103748853B (en)
BR (1) BR112014001691B1 (en)
CA (1) CA2842465C (en)
EA (1) EA037617B1 (en)
ES (1) ES2581053T3 (en)
IL (2) IL230440A (en)
NL (1) NL2007180C2 (en)
WO (1) WO2013015691A1 (en)

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11126720B2 (en) * 2012-09-26 2021-09-21 Bluvector, Inc. System and method for automated machine-learning, zero-day malware detection
US9292688B2 (en) * 2012-09-26 2016-03-22 Northrop Grumman Systems Corporation System and method for automated machine-learning, zero-day malware detection
JP6273834B2 (en) * 2013-12-26 2018-02-07 富士通株式会社 Information processing apparatus and logging method
CN105704103B (en) 2014-11-26 2017-05-10 中国科学院沈阳自动化研究所 Abnormal detection method of Modbus TCP communication behavior based on OCSVM double contour model
US10572811B2 (en) * 2015-01-29 2020-02-25 Splunk Inc. Methods and systems for determining probabilities of occurrence for events and determining anomalous events
US10015188B2 (en) * 2015-08-20 2018-07-03 Cyberx Israel Ltd. Method for mitigation of cyber attacks on industrial control systems
CN105306436B (en) * 2015-09-16 2016-08-24 广东睿江云计算股份有限公司 A kind of anomalous traffic detection method
US10955810B2 (en) * 2015-11-13 2021-03-23 International Business Machines Corporation Monitoring communications flow in an industrial system to detect and mitigate hazardous conditions
WO2017119888A1 (en) * 2016-01-07 2017-07-13 Trend Micro Incorporated Metadata extraction
US9998487B2 (en) * 2016-04-25 2018-06-12 General Electric Company Domain level threat detection for industrial asset control system
CN106022129B (en) * 2016-05-17 2019-02-15 北京江民新科技术有限公司 Data characteristics extracting method, device and the virus characteristic detection system of file
CN106209843A (en) * 2016-07-12 2016-12-07 工业和信息化部电子工业标准化研究院 A kind of data flow anomaly towards Modbus agreement analyzes method
CN106603531A (en) * 2016-12-15 2017-04-26 中国科学院沈阳自动化研究所 Automatic establishing method of intrusion detection model based on industrial control network and apparatus thereof
CN106790108B (en) * 2016-12-26 2019-12-06 东软集团股份有限公司 Protocol data analysis method, device and system
RU2659482C1 (en) * 2017-01-17 2018-07-02 Общество с ограниченной ответственностью "СолидСофт" Protection of web applications with intelligent network screen with automatic application modeling
US11165802B2 (en) * 2017-12-05 2021-11-02 Schweitzer Engineering Laboratories, Inc. Network security assessment using a network traffic parameter
NL2020552B1 (en) * 2018-03-08 2019-09-13 Forescout Tech B V Attribute-based policies for integrity monitoring and network intrusion detection
CA3092260A1 (en) * 2018-03-08 2019-09-12 Forescout Technologies, Inc. Attribute-based policies for integrity monitoring and network intrusion detection
US11475370B2 (en) * 2018-11-29 2022-10-18 Microsoft Technology Licensing, Llc Providing custom machine-learning models
FR3090153B1 (en) * 2018-12-17 2022-01-07 Commissariat Energie Atomique Method and system for detecting anomaly in a telecommunications network
HRP20220802T1 (en) * 2018-12-28 2022-10-14 Nozomi Networks Sagl Method and apparatus for detecting the anomalies of an infrastructure
US10802937B2 (en) * 2019-02-13 2020-10-13 United States Of America As Represented By The Secretary Of The Navy High order layer intrusion detection using neural networks
KR102204290B1 (en) * 2019-08-23 2021-01-18 고려대학교 세종산학협력단 Identification of delimiter and static field in protocol reverse engineering using statistic analysis
US11621970B2 (en) * 2019-09-13 2023-04-04 Is5 Communications, Inc. Machine learning based intrusion detection system for mission critical systems
CN110912908B (en) * 2019-11-28 2022-08-02 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) Network protocol anomaly detection method and device, computer equipment and storage medium
CN111126627B (en) * 2019-12-25 2023-07-04 四川新网银行股份有限公司 Model training system based on separation index
CN111585993B (en) * 2020-04-27 2022-08-09 深信服科技股份有限公司 Method, device and equipment for detecting communication of hidden channel
CN111885059B (en) * 2020-07-23 2021-08-31 清华大学 A method for detecting and locating anomaly in industrial network traffic
CN111865723A (en) * 2020-07-25 2020-10-30 深圳市维度统计咨询股份有限公司 Network data acquisition system based on big data
US20240028010A1 (en) * 2020-09-29 2024-01-25 Fanuc Corporation Network relay device
CN112887280B (en) * 2021-01-13 2022-05-31 中国人民解放军国防科技大学 Network protocol metadata extraction system and method based on automaton
US11363050B1 (en) 2021-03-25 2022-06-14 Bank Of America Corporation Information security system and method for incompliance detection in data transmission
US11799879B2 (en) 2021-05-18 2023-10-24 Bank Of America Corporation Real-time anomaly detection for network security
US11792213B2 (en) 2021-05-18 2023-10-17 Bank Of America Corporation Temporal-based anomaly detection for network security
US11588835B2 (en) 2021-05-18 2023-02-21 Bank Of America Corporation Dynamic network security monitoring system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100153785A1 (en) * 2006-10-30 2010-06-17 The Trustees Of Columbia University In The City Of New York Methods, media, and systems for detecting an anomalous sequence of function calls
US20110167493A1 (en) * 2008-05-27 2011-07-07 Yingbo Song Systems, methods, ane media for detecting network anomalies

Family Cites Families (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09153924A (en) * 1995-11-28 1997-06-10 Nec Corp Procedure error detection system for communication control system
US6078874A (en) * 1998-08-04 2000-06-20 Csi Technology, Inc. Apparatus and method for machine data collection
US6925454B2 (en) * 2000-12-12 2005-08-02 International Business Machines Corporation Methodology for creating and maintaining a scheme for categorizing electronic communications
US6898737B2 (en) * 2001-05-24 2005-05-24 Microsoft Corporation Automatic classification of event data
US7225343B1 (en) 2002-01-25 2007-05-29 The Trustees Of Columbia University In The City Of New York System and methods for adaptive model generation for detecting intrusions in computer systems
US8370936B2 (en) * 2002-02-08 2013-02-05 Juniper Networks, Inc. Multi-method gateway-based network security systems and methods
EP1490768B1 (en) * 2002-03-29 2007-09-26 Global Dataguard, Inc. Adaptive behavioural intrusion detection
US7039702B1 (en) * 2002-04-26 2006-05-02 Mcafee, Inc. Network analyzer engine system and method
US7349917B2 (en) * 2002-10-01 2008-03-25 Hewlett-Packard Development Company, L.P. Hierarchical categorization method and system with automatic local selection of classifiers
US7305708B2 (en) * 2003-04-14 2007-12-04 Sourcefire, Inc. Methods and systems for intrusion detection
WO2004107706A1 (en) * 2003-05-30 2004-12-09 International Business Machines Corporation Detecting network attacks
EP2618538B1 (en) * 2003-11-12 2018-09-05 The Trustees Of Columbia University In The City Of New York Apparatus, Method and Medium for Detecting Payload Anomaly using N-Gram Distribution of Normal Data
US8656488B2 (en) * 2005-03-11 2014-02-18 Trend Micro Incorporated Method and apparatus for securing a computer network by multi-layer protocol scanning
US7860006B1 (en) * 2005-04-27 2010-12-28 Extreme Networks, Inc. Integrated methods of performing network switch functions
US8631483B2 (en) * 2005-06-14 2014-01-14 Texas Instruments Incorporated Packet processors and packet filter processes, circuits, devices, and systems
US7757283B2 (en) * 2005-07-08 2010-07-13 Alcatel Lucent System and method for detecting abnormal traffic based on early notification
US9055093B2 (en) * 2005-10-21 2015-06-09 Kevin R. Borders Method, system and computer program product for detecting at least one of security threats and undesirable computer files
US20070150574A1 (en) * 2005-12-06 2007-06-28 Rizwan Mallal Method for detecting, monitoring, and controlling web services
IL189530A0 (en) * 2007-02-15 2009-02-11 Marvell Software Solutions Isr Method and apparatus for deep packet inspection for network intrusion detection
US20080295173A1 (en) * 2007-05-21 2008-11-27 Tsvetomir Iliev Tsvetanov Pattern-based network defense mechanism
US7966660B2 (en) * 2007-05-23 2011-06-21 Honeywell International Inc. Apparatus and method for deploying a wireless network intrusion detection system to resource-constrained devices
US9100319B2 (en) * 2007-08-10 2015-08-04 Fortinet, Inc. Context-aware pattern matching accelerator
CN100531073C (en) 2007-08-24 2009-08-19 北京启明星辰信息技术股份有限公司 Condition detection based protocol abnormity detecting method and system
CN101399710B (en) * 2007-09-29 2011-06-22 北京启明星辰信息技术股份有限公司 Detection method and system for protocol format exception
EP2200249A1 (en) * 2008-12-17 2010-06-23 Abb Research Ltd. Network analysis
FI20096394A0 (en) * 2009-12-23 2009-12-23 Valtion Teknillinen DETECTING DETECTION IN COMMUNICATIONS NETWORKS
US9100425B2 (en) * 2010-12-01 2015-08-04 Cisco Technology, Inc. Method and apparatus for detecting malicious software using generic signatures
NL2007180C2 (en) * 2011-07-26 2013-01-29 Security Matters B V Method and system for classifying a protocol message in a data communication network.
US9092802B1 (en) * 2011-08-15 2015-07-28 Ramakrishna Akella Statistical machine learning and business process models systems and methods
ES2755780T3 (en) * 2011-09-16 2020-04-23 Veracode Inc Automated behavior and static analysis using an instrumented sandbox and machine learning classification for mobile security
US9189746B2 (en) * 2012-01-12 2015-11-17 Microsoft Technology Licensing, Llc Machine-learning based classification of user accounts based on email addresses and other account information
US9292688B2 (en) * 2012-09-26 2016-03-22 Northrop Grumman Systems Corporation System and method for automated machine-learning, zero-day malware detection
US10514977B2 (en) * 2013-03-15 2019-12-24 Richard B. Jones System and method for the dynamic analysis of event data
US10476742B1 (en) * 2015-09-24 2019-11-12 Amazon Technologies, Inc. Classification of auto scaling events impacting computing resources

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11012330B2 (en) * 2011-07-26 2021-05-18 Forescout Technologies, Inc. Method and system for classifying a protocol message in a data communication network
US9628497B2 (en) * 2011-07-26 2017-04-18 Security Matters B.V. Method and system for classifying a protocol message in a data communication network
US20140297572A1 (en) * 2011-07-26 2014-10-02 Security Matters B.V. Method and system for classifying a protocol message in a data communication network
US20150264074A1 (en) * 2012-09-28 2015-09-17 Hewlett-Packard Development Company, L.P. Application security testing
US9438617B2 (en) * 2012-09-28 2016-09-06 Hewlett Packard Enterprise Development Lp Application security testing
CN104270275A (en) * 2014-10-14 2015-01-07 步步高教育电子有限公司 Auxiliary analysis method for causes of exceptions, server and intelligent equipment
US9953158B1 (en) * 2015-04-21 2018-04-24 Symantec Corporation Systems and methods for enforcing secure software execution
TWI562013B (en) * 2015-07-06 2016-12-11 Wistron Corp Method, system and apparatus for predicting abnormality
US10887328B1 (en) * 2015-09-29 2021-01-05 Fireeye, Inc. System and method for detecting interpreter-based exploit attacks
WO2016184194A1 (en) * 2015-10-29 2016-11-24 中兴通讯股份有限公司 Method and device for intercepting push information, and terminal
US11005823B2 (en) * 2016-01-08 2021-05-11 Capital One Services, Llc Field level security system for securing sensitive data
US11960844B2 (en) * 2017-05-10 2024-04-16 Oracle International Corporation Discourse parsing using semantic and syntactic relations
US11875118B2 (en) * 2017-05-10 2024-01-16 Oracle International Corporation Detection of deception within text using communicative discourse trees
US20210042473A1 (en) * 2017-05-10 2021-02-11 Oracle International Corporation Enabling chatbots by validating argumentation
US20200410166A1 (en) * 2017-05-10 2020-12-31 Oracle International Corporation Enabling chatbots by detecting and supporting affective argumentation
US12141535B2 (en) * 2017-05-10 2024-11-12 Oracle International Corporation Techniques for maintaining rhetorical flow
US20220253611A1 (en) * 2017-05-10 2022-08-11 Oracle International Corporation Techniques for maintaining rhetorical flow
US20220318513A9 (en) * 2017-05-10 2022-10-06 Oracle International Corporation Discourse parsing using semantic and syntactic relations
US12001804B2 (en) 2017-05-10 2024-06-04 Oracle International Corporation Using communicative discourse trees to detect distributed incompetence
US11748572B2 (en) * 2017-05-10 2023-09-05 Oracle International Corporation Enabling chatbots by validating argumentation
US20210165969A1 (en) * 2017-05-10 2021-06-03 Oracle International Corporation Detection of deception within text using communicative discourse trees
US11775771B2 (en) 2017-05-10 2023-10-03 Oracle International Corporation Enabling rhetorical analysis via the use of communicative discourse trees
US11783126B2 (en) * 2017-05-10 2023-10-10 Oracle International Corporation Enabling chatbots by detecting and supporting affective argumentation
US11363059B2 (en) * 2019-12-13 2022-06-14 Microsoft Technology Licensing, Llc Detection of brute force attacks
US11757931B2 (en) * 2019-12-13 2023-09-12 Microsoft Technology Licensing, Llc Detection of brute force attacks
US20220329620A1 (en) * 2019-12-13 2022-10-13 Microsoft Technology Licensing, Llc Detection of brute force attacks
CN113742475A (en) * 2021-09-10 2021-12-03 绿盟科技集团股份有限公司 Office document detection method, apparatus, device and medium

Also Published As

Publication number Publication date
US20170195197A1 (en) 2017-07-06
CN103748853B (en) 2017-03-08
IL230440A (en) 2017-10-31
CA2842465A1 (en) 2013-01-31
EP2737683B1 (en) 2016-03-02
BR112014001691A2 (en) 2020-10-27
EA037617B1 (en) 2021-04-22
EP2737683A1 (en) 2014-06-04
NL2007180C2 (en) 2013-01-29
US20210344578A1 (en) 2021-11-04
CN103748853A (en) 2014-04-23
US20140297572A1 (en) 2014-10-02
BR112014001691B1 (en) 2022-06-21
US9628497B2 (en) 2017-04-18
CA2842465C (en) 2021-05-04
US11012330B2 (en) 2021-05-18
IL254829B (en) 2018-06-28
WO2013015691A1 (en) 2013-01-31
US11902126B2 (en) 2024-02-13
IL254829A0 (en) 2017-12-31
ES2581053T3 (en) 2016-08-31
JP6117202B2 (en) 2017-04-19
EA201490333A1 (en) 2014-06-30
JP2014522167A (en) 2014-08-28
IL230440A0 (en) 2014-03-31

Similar Documents

Publication Publication Date Title
US20140090054A1 (en) System and Method for Detecting Anomalies in Electronic Documents
US11188650B2 (en) Detection of malware using feature hashing
Andronio et al. Heldroid: Dissecting and detecting mobile ransomware
US9846776B1 (en) System and method for detecting file altering behaviors pertaining to a malicious attack
US20100037317A1 (en) Mehtod and system for security monitoring of the interface between a browser and an external browser module
US9336390B2 (en) Selective assessment of maliciousness of software code executed in the address space of a trusted process
US8806641B1 (en) Systems and methods for detecting malware variants
EP2310974B1 (en) Intelligent hashes for centralized malware detection
US8191147B1 (en) Method for malware removal based on network signatures and file system artifacts
US10009370B1 (en) Detection and remediation of potentially malicious files
US8499350B1 (en) Detecting malware through package behavior
JP6909770B2 (en) Systems and methods for creating antivirus records
US20080010538A1 (en) Detecting suspicious embedded malicious content in benign file formats
US20140223566A1 (en) System and method for automatic generation of heuristic algorithms for malicious object identification
EP2984600A1 (en) A framework for coordination between endpoint security and network security services
US8869284B1 (en) Systems and methods for evaluating application trustworthiness
WO2014071867A1 (en) Program processing method and system, and client and server for program processing
US11349865B1 (en) Signatureless detection of malicious MS Office documents containing embedded OLE objects
US8099784B1 (en) Behavioral detection based on uninstaller modification or removal
US10339305B2 (en) Sub-execution environment controller
US11170103B2 (en) Method of detecting malicious files resisting analysis in an isolated environment
US10621345B1 (en) File security using file format validation
EP3019995B1 (en) Identifying misuse of legitimate objects
US11886584B2 (en) System and method for detecting potentially malicious changes in applications
EP4095727A1 (en) System and method for detecting potentially malicious changes in applications

Legal Events

Date Code Title Description
AS Assignment

Owner name: SECURITYMATTERS B.V., NETHERLANDS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BOLZONI, DAMIANO;ZAMBON, EMMANUELE;SIGNING DATES FROM 20130322 TO 20130325;REEL/FRAME:030079/0303

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION