US20110119100A1 - Method and System for Displaying Anomalies in Time Series Data - Google Patents
- Publication number
- US20110119100A1 (US12/907,916)
- Authority
- US
- United States
- Prior art keywords
- anomalies
- time
- attribute
- data
- time series
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
Definitions
- the disclosed embodiments relate generally to web analytics data mining, and in particular, to a system and method for detecting and displaying events of potential interest in time series data.
- Web analytics is the measurement, collection, analysis and reporting of the traffic data of a web site for purposes such as understanding and optimizing web site usage.
- the traffic data is typically organized in the form of one or more multidimensional datasets whose metadata may include multiple dimensions and metric attributes (also known as “measures”).
- Conventional approaches typically generate multiple (sometimes hundreds of) reports by focusing on the factual aspects of the web traffic, e.g., by visualizing different subsets of a multidimensional dataset defined by various configurations of dimensions and metric attributes. From examining the visualized traffic data, a web analyst may be able to discover useful information for improving the quality and volume of the traffic to the web site.
- a computer-implemented method for detecting anomalies in time series data at a server system is disclosed.
- the server system is connected to one or more client devices through a network.
- the server system stores time series data for a data source.
- the time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value.
- For a particular attribute, the server system generates a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data, each forecasting model including an estimated attribute value and an associated error-variance.
- For a respective time-value pair associated with the particular attribute, the server system determines whether the value of the time-value pair is within the error-variance of the corresponding estimated attribute value and tags the time-value pair as an anomaly if the value of the time-value pair is outside the error-variance for at least a first subset of the forecasting models.
- In response to a request from a client application for analytics information for the data source, the server system reports to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes.
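- By way of illustration only, the tagging step summarized above might be sketched as follows. The two forecasting models (`mean_model`, `last_value_model`), the band of two standard deviations standing in for the error-variance, and the `min_votes` parameter standing in for the "first subset" of models are all assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


@dataclass
class Forecast:
    estimate: float  # estimated attribute value
    error: float     # allowed deviation; stands in for the error-variance band


def mean_model(history: Sequence[float]) -> Forecast:
    # Hypothetical model 1: long-run mean with a band of two standard deviations.
    return Forecast(mean(history), 2 * pstdev(history))


def last_value_model(history: Sequence[float]) -> Forecast:
    # Hypothetical model 2: previous value with a band of twice the mean step size.
    steps = [abs(b - a) for a, b in zip(history, history[1:])]
    return Forecast(history[-1], 2 * mean(steps) if steps else 0.0)


def tag_anomalies(
    series: Sequence[Tuple[int, float]],
    models: Sequence[Callable[[Sequence[float]], Forecast]],
    min_history: int = 3,
    min_votes: int = 2,
) -> List[Tuple[int, float]]:
    """Tag a (time, value) pair when at least min_votes models place it outside their band."""
    tagged = []
    for i in range(min_history, len(series)):
        time, value = series[i]
        history = [v for _, v in series[:i]]
        votes = 0
        for model in models:
            forecast = model(history)
            if abs(value - forecast.estimate) > forecast.error:
                votes += 1
        if votes >= min_votes:
            tagged.append((time, value))
    return tagged


series = [(1, 100), (2, 102), (3, 98), (4, 101), (5, 250), (6, 100)]
anomalies = tag_anomalies(series, [mean_model, last_value_model])
```

- In this sketch each model is refit on all earlier pairs, so a pair is tagged only when enough models agree that it falls outside their respective bands.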
- a server system for identifying anomalies in time series data is disclosed.
- the server system is connected to one or more client devices through a network.
- the server system includes one or more processors for executing programs and memory to store data and to store one or more programs to be executed by the one or more processors.
- the one or more programs include instructions for: storing time series data for a data source, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value; for a particular attribute, generating a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data, each model including an estimated attribute value and an associated error-variance; for a respective time-value pair associated with the particular attribute: determining whether the value of the time-value pair is within the error-variance of the corresponding estimated attribute value; and tagging the time-value pair as an anomaly if the value of the time-value pair is outside the error-variance for at least a first subset of the forecasting models; and in response to a request from a client application for analytics information for the data source, reporting to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes.
- a computer readable-storage medium stores one or more programs for execution by one or more processors of a server system.
- the server system is connected to one or more client devices through a network.
- the one or more programs include instructions for: storing time series data for a data source, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value; for a particular attribute, generating a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data, each model including an estimated attribute value and an associated error-variance; for a respective time-value pair associated with the particular attribute: determining whether the value of the time-value pair is within the error-variance of the corresponding estimated attribute value; and tagging the time-value pair as an anomaly if the value of the time-value pair is outside the error variance for at least a first subset of the forecasting models; and in response to a
- a graphical user interface is disclosed for presenting time series data and anomalies for a data source on a display of a client computer having a user input device.
- the graphical user interface includes a first window and a second window below the first window on the display.
- the first window on the display includes: a graph of time series data values for a first attribute for the data source, the graph having a time axis corresponding to a time range and a dependent data value axis, and a histogram of anomalies for the data source, each of the anomalies corresponding to a value of an attribute that is substantially different from an expected value of the attribute, the histogram having the same time axis scale as the graph and a dependent total anomalies axis.
- the height of a respective bar along the total anomalies axis represents a total number of anomalies for the data source at a corresponding time on the time axis.
- the second window on the display includes a list of automatic alerts characterizing a set of anomalies for the data source at a particular time on the time axis. The particular time is designated by a user via interaction with the graph through the user input device and each item of the list of automatic alerts corresponds to an anomaly associated with a respective attribute for the data source.
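- The data behind the two windows might be prepared along the following lines. This is a minimal sketch that assumes the tagged anomalies are available as `(time, attribute)` pairs; the function names and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Anomaly = Tuple[str, str]  # (time on the shared time axis, attribute name)


def anomaly_histogram(anomalies: Iterable[Anomaly]) -> Dict[str, int]:
    """Bar height per time: total number of anomalies for the data source at that time."""
    return dict(Counter(time for time, _ in anomalies))


def alerts_for_time(anomalies: Iterable[Anomaly], selected_time: str) -> List[str]:
    """Attributes with an anomaly at the time the user selected on the graph."""
    return sorted(attr for time, attr in anomalies if time == selected_time)


tagged = [
    ("2010-10-01", "visits"),
    ("2010-10-01", "pageviews"),
    ("2010-10-03", "visits"),
]
bars = anomaly_histogram(tagged)
alerts = alerts_for_time(tagged, "2010-10-01")
```

- Here `bars` would drive the histogram in the first window and `alerts` the list of automatic alerts in the second window.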
- FIG. 1A is an overview block diagram of an analytics system for collecting web traffic data and performing web analytics on the data in accordance with some embodiments.
- FIG. 1B is an overview block diagram of the analytics system for preparing and providing user-requested web analytics results to the users at different clients in accordance with some embodiments.
- FIG. 2 is a block diagram of a data structure used in the hits database 155 to store sessionized web traffic data at different web sites in accordance with some embodiments.
- FIG. 3 is a block diagram of a data structure used in the aggregates database 165 to store aggregated web traffic data at different web sites in accordance with some embodiments.
- FIG. 4 is a block diagram of a data structure used in the time series database 175 to store time series data extracted from the aggregated web traffic data in accordance with some embodiments.
- FIG. 5 is a block diagram of a data structure used in the events database 185 to store events of potential interest detected in the time series data in accordance with some embodiments.
- FIG. 6A is a flow chart of a process for updating the time series data using the aggregated data updates in accordance with some embodiments.
- FIG. 6B is a block diagram of an exemplary process for updating a time series on a weekly basis in accordance with some embodiments.
- FIGS. 7A and 7B are flow charts of a model-based process for detecting events of potential interest in a time series in accordance with some embodiments.
- FIG. 7C is a flow chart of a rule-based process for detecting events of potential interest in a time series in accordance with some embodiments.
- FIGS. 8A and 8B are flow charts illustrating how the analytics system prepares and serves a report of events of interest in response to a user request in accordance with some embodiments.
- FIG. 9 is a block diagram of a client device for requesting and rendering web analytics reports in accordance with some embodiments.
- FIG. 10 is a block diagram of an analytics system for processing web traffic data, identifying events of potential interest therein, and serving web analytics reports in response to user requests in accordance with some embodiments.
- FIGS. 11A to 11C are screenshots of graphical user interfaces that display daily, weekly, and monthly events of potential interest, respectively, in accordance with some embodiments.
- FIGS. 12A to 12E are screenshots of graphical user interfaces that display information relating to events of potential interest in accordance with some embodiments.
- FIGS. 13A to 13C are screenshots of graphical user interfaces that display different numbers of events of potential interest based on a respective user-specified sensitivity threshold in accordance with some embodiments.
- FIGS. 14A and 14B are screenshots of graphical user interfaces that display events of potential interest based on a respective user-specified organization manner in accordance with some embodiments.
- FIGS. 15A and 15B depict a flow chart of a method for identifying anomalies in time series data in accordance with some embodiments.
- FIGS. 16A and 16B depict another flow chart of a method for identifying anomalies in time series data implemented by different components of a server system with a processor and memory in accordance with some embodiments.
- FIGS. 17A to 17C depict another flow chart of a method for detecting anomalies in web analytics data implemented at a server system in accordance with some embodiments.
- FIG. 1A illustrates a distributed computer system 100 in accordance with some embodiments.
- the distributed system 100 includes one or more web servers 120 that host web sites and serve web pages upon receiving requests from clients 110 .
- the web servers 120 collect web traffic data in logfiles 130 .
- the web pages hosted by the web servers 120 include one or more embedded computer programs such as Javascript codes for capturing the web traffic data.
- When a user requests and downloads the web pages to a client 110 , the embedded computer programs also reside in the client 110 and monitor the user's activities on the web pages. This approach can avoid some web caching-related issues and is sometimes referred to as “page tagging.”
- a web server 120 may employ both mechanisms for gathering web traffic data.
- the distributed system 100 includes an analytics system 140 that includes a log processor 150 for extracting web page hit data from the logfiles 130 or receiving web page hit data captured by the embedded computer programs from the clients 110 and storing the hit data in a hits database 155 .
- One or more aggregation servers 160 process the hit data and generate aggregated web analytics data that is stored in aggregates database 165 .
- the time series gathering servers 170 extract or receive newly aggregated data from the aggregates database 165 and create or update a plurality of time series for each web site, which are stored in the time series database 175 . In some embodiments, the time series gathering servers 170 also extract web analytics data from the hits database 155 .
- One or more event detection servers 180 process the time series in the database 175 at a regular time interval (e.g., nightly, weekly or monthly) to detect events of potential interest therein and store the events in the events database 185 .
- the event detection process is a rule-based one in which the event detection servers 180 extract user-specified alert rules from the alert rules database 195 .
- the analytics system 140 includes a query processor 190 for accessing the aggregates database 165 , the time series database 175 , and the events database 185 , and returning the query results as web analytics reports to users of the analytics system 140 (who use the analytics system to track the visitors' activities at one or more of their web sites). If the user-requested data has not been aggregated, the query processor 190 reads the raw hits data in real time and computes the desired aggregates from it.
- the analytics system 140 processes and returns a set of the web analytics reports that correspond to a desired data view specified by a user. In some embodiments, the analytics system 140 identifies those hits in the hits database 155 that are context-insensitive and processes these hits to incrementally update a first plurality of aggregate tables in the aggregates database 165 . The analytics system 140 identifies those hits in the hits database 155 that are context-sensitive and processes these hits to incrementally update a second plurality of aggregate tables using the context-sensitive entries, but only at the end of a specified period of time, such as at the end of the day. Doing so speeds up the incremental updates to more than 90% of the data, as discussed below.
- the distributed system 100 also includes a plurality of data servers 106 that store one or more data structures, such as tables, that may be used by the analytics system 140 for storage.
- the data servers 106 store the logfiles 130 , the hit data 155 , the aggregate data 165 , the time series data 175 , and/or the events data 185 .
- data servers 106 are clustered in a data center or in two or more interconnected data centers.
- the distributed system 100 includes as many as 1000 data servers or more.
- the various components of the distributed system 100 are interconnected by a network 102 .
- the network 102 may be any suitable network, including but not limited to a local area network (LAN), a wide-area network (WAN), the Internet, an Ethernet network, a virtual private network (VPN), or any combination of such networks.
- the network 102 can be wired or wireless.
- the network 102 uses the Hypertext Transfer Protocol (HTTP) and the Transmission Control Protocol/Internet Protocol (TCP/IP) to transport information between different networks.
- HTTP permits client devices to access various information items available on the Internet via the network 102 .
- the various embodiments of the invention are not limited to the use of any particular protocol.
- the log data entry (stored in one or more databases represented by logfiles 130 or captured by the computer program embedded in the web page) records multiple variables about the visits, typically including the IP address, the user agent, the web page viewed, the time and date that the web page was accessed and a status field.
- Each data entry in a log file represents a single “hit” on a file hosted by a web server 120 , and consists of a number of fields (explained below in connection with FIG. 2 ). Any server request is considered a hit. For example, when a visitor calls up a web page with six images, that is seven hits—one for the page, and six for the images.
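- The hit-counting convention in the example above can be sketched as follows, assuming each server request is available as a record naming the requested file and the page that triggered it. The record layout and the paths are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_hits(requests: Iterable[Dict[str, str]]) -> Counter:
    """Count every server request as one hit, grouped by the page that triggered it."""
    return Counter(request["page"] for request in requests)


# One request for the page itself plus one request for each of its six images.
requests = [{"page": "/home", "file": "/home.html"}] + [
    {"page": "/home", "file": f"/img/{i}.png"} for i in range(6)
]
hits = count_hits(requests)
```

- With these records, `hits["/home"]` is seven: one hit for the page and six for the images.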
- the visitor may have employed a query in a search engine and the web-site under scrutiny was turned up in the search results.
- the corresponding entry in the log data may reveal a “reference” and the “search term” entered by the visitor.
- the visitor is not an individual, but rather a software process such as an Internet robot, web crawler or spider, link checker, mirror agent, hacker, or other such entity used to systematically peruse vast amounts of data available via the network 102 .
- the log data entry corresponding to such accesses may display an IP address, host name and/or user agent that may be associated with such entities.
- a session identifier or session ID is a unique identifier (such as a fixed-length alphanumeric string) that a web server assigns to a specific user for the duration of that user's visit and that identifies the user's session (which may be a series of related message exchanges).
- Session identifiers become necessary in cases where the communications infrastructure uses a stateless protocol such as HTTP. For example, a buyer who visits a seller's web site wants to collect a number of articles in a virtual shopping cart and then finalize the shopping transaction by going to the site's checkout page. This typically involves an ongoing communication including several web pages requested by the client 110 and sent back by the server 120 . In such a situation, it is vital to keep track of the current state of the shopper's cart, and a session ID is one way to achieve that goal.
- a session ID is typically granted to a visitor on his first visit to a web site. It is different from a user ID because sessions are typically short-lived (they expire after a preset time of inactivity, which may be minutes or hours) and may become invalid after a certain goal has been met (for example, once the buyer has finalized his order, he cannot use the same session ID to add more items).
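- The session lifecycle described above (an ID granted on the first visit, expiry after a preset period of inactivity, and invalidation once a goal is met) might be sketched as follows. The `SessionStore` class, its method names, and the 30-minute default timeout are assumptions made for illustration.

```python
import secrets
from typing import Dict, Optional


class SessionStore:
    """Hypothetical in-memory session tracker keyed by session ID."""

    def __init__(self, timeout_seconds: float = 1800.0) -> None:
        self.timeout = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def touch(self, session_id: Optional[str], now: float) -> str:
        """Return a valid session ID for this request, issuing a new one if needed."""
        last = self._last_seen.get(session_id) if session_id else None
        if last is None or now - last > self.timeout:
            # Unknown or expired session: drop the old entry and grant a fresh ID.
            if session_id:
                self._last_seen.pop(session_id, None)
            session_id = secrets.token_hex(16)  # fixed-length 32-character string
        self._last_seen[session_id] = now
        return session_id

    def invalidate(self, session_id: str) -> None:
        """End a session early, e.g. once an order has been finalized."""
        self._last_seen.pop(session_id, None)


store = SessionStore(timeout_seconds=60)
first = store.touch(None, now=0)          # first visit: new ID granted
same = store.touch(first, now=30)         # still active: same ID
expired = store.touch(first, now=200)     # inactive too long: new ID
store.invalidate(expired)
after_checkout = store.touch(expired, now=210)
```

- Times are passed in explicitly as seconds so that the expiry behavior can be exercised without waiting on a real clock.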
- FIG. 1B illustrates the distributed system 100 with an emphasis on the client-server interactions in accordance with some embodiments.
- a client 110 (also known as a “client device”) may be any computer or similar device through which a user of the client 110 can submit data access requests to and receive results or other services from the analytics system 140 . Examples include, without limitation, desktop computers, laptop computers, tablet computers, mobile devices such as mobile phones, personal digital assistants, set-top boxes, or any combination of the above.
- a respective client 110 may contain at least one client application 112 for submitting requests to the analytics system 140 .
- the client application 112 can be a web browser or other type of application that permits a user to access the services provided by the analytics system 140 .
- the client application 112 includes one or more client assistants 114 .
- a client assistant 114 can be a software application that performs tasks related to assisting a user's activities with respect to the client application 112 and/or other applications.
- a client assistant 114 includes a local copy of the executable version of the embedded computer programs for collecting web analytics data relating to web pages from a particular web site.
- the client assistant 114 may assist a user at the client 110 with browsing information (e.g., web pages), processing information (e.g., query results) received from the analytics system 140 , and monitoring the user's activities on the query results.
- the client assistant 114 is embedded in a web page (e.g., a query results web page) or other documents downloaded from the analytics system 140 .
- the client assistant 114 is a part of the client application 112 (e.g., a plug-in application of a web browser).
- the client 110 further includes a communication interface 118 to support the communication between the client 110 and other devices (e.g., the analytics system 140 or another client 110 ).
- the query processor 190 includes a web interface 192 (sometimes referred to as a “front-end server”) and a server application 194 (sometimes referred to as a “mid-tier server” or “mid-tier API”).
- the web interface 192 receives data access requests from client devices 110 and forwards the requests to the server application 194 .
- the server application 194 processes the requests including generating database queries associated with a request, applying the queries to different databases for data requested by the client, and returning the query results to the requesting clients 110 .
- the client application 112 at a particular client 110 displays the result to the user who submits the original request.
- each of the databases shown in FIGS. 1A and 1B is effectively a database management system including a database server that is configured to manage a large number of data records stored in the corresponding database.
- In response to a query submitted by the server application 194 , the database server identifies zero or more data records that satisfy the query and returns the data records to the server application 194 for further processing.
- the analytics system 140 is an application service provider (ASP) that provides web analytics services to its customers (e.g., a web site owner) by visualizing the web traffic data generated at a web site in accordance with various user requests.
- FIG. 2 is a block diagram of a data structure used in the hits database 155 to store sessionized web traffic data at different web sites in accordance with some embodiments.
- the web traffic data stored in the data structure 200 have a hierarchical structure. The top level of the hierarchy corresponds to different web sites 200 A, 200 B (i.e., different web servers). For a respective web site, the traffic data is grouped into multiple sessions 210 A, 210 B, each session having a unique session ID 220 .
- a session ID uniquely identifies a user's session with the web site 200 A for the duration of that user's visit.
- session-level attributes include the operating system 220 B (i.e., the operating system the computer runs on from which the user accesses the web site), the browser name 220 C (i.e., the web browser application used by the user for accessing the web site) and the browser version 220 D, geographical information of the computer such as the country 220 E and the city 220 F, etc.
- the web traffic data within a user session is further divided into one or more hits 230 A to 230 N.
- a hit typically corresponds to a request to a web server for a document such as a web page, an image, a JavaScript file, a Cascading Style Sheet (CSS) file, etc.
- Each hit 230 A may be characterized by attributes such as the type of hit 240 A (e.g., transaction hit, etc.), the referral URL 240 B (i.e., the web page the visitor was on when the hit was generated), the timestamp 240 C that indicates when the hit occurs and so on.
- session-level and hit-level attributes as shown in FIG. 2 are listed for illustrative purposes only.
- a session or a hit of web traffic data may include many other attributes that either exist in the raw traffic data (e.g., the timestamp) or can be derived from the raw traffic data by the analytics system 140 (e.g., the average pageviews per session).
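- The site, session, and hit hierarchy described in connection with FIG. 2 might be modeled roughly as follows. The class and field names are chosen for this sketch only, and the `"pageview"` hit type is an assumption; the disclosure names only a transaction hit as an example type.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hit:
    hit_type: str       # e.g. "transaction"; "pageview" is assumed here
    referral_url: str   # page the visitor was on when the hit was generated
    timestamp: float    # when the hit occurred


@dataclass
class Session:
    session_id: str
    operating_system: str
    browser_name: str
    browser_version: str
    country: str
    city: str
    hits: List[Hit] = field(default_factory=list)

    def pageviews(self) -> int:
        return sum(1 for hit in self.hits if hit.hit_type == "pageview")


@dataclass
class WebSite:
    site_id: str
    sessions: Dict[str, Session] = field(default_factory=dict)

    def average_pageviews_per_session(self) -> float:
        """A derived attribute computed from the raw hits."""
        if not self.sessions:
            return 0.0
        total = sum(s.pageviews() for s in self.sessions.values())
        return total / len(self.sessions)


site = WebSite("site-A")
session = Session("abc123", "Linux", "Firefox", "3.6", "China", "Beijing")
session.hits = [
    Hit("pageview", "/home", 1.0),
    Hit("pageview", "/cart", 2.0),
    Hit("transaction", "/checkout", 3.0),
]
site.sessions[session.session_id] = session
```

- Session-level attributes live on `Session`, hit-level attributes live on `Hit`, and derived values such as the average pageviews per session are computed on demand.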
- the aggregation servers 160 are responsible for aggregating the data records in the hits database 155 at a regular time interval (e.g., per day or per hour) based on their respective session IDs and other dimension or metric attributes. For example, the aggregation servers 160 may determine the total number of visits to a web site during one day by counting the number of sessions associated with the web site for the same day. The aggregation servers 160 may also determine the total number of visits to a web site using a particular type or even version of web browser during one day by counting the number of sessions associated with the web site for the same day that have the specified type or even version of web browser. In some embodiments, the aggregation servers 160 determine values for hundreds or even thousands of predefined attributes based on the hits data records and store the determined values and their associated attributes in a data structure like the one shown in FIG. 3 in accordance with some embodiments.
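- The two counting examples above can be sketched as follows, assuming one record per session carrying its day and browser attributes. The record keys and sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

SessionRecord = Dict[str, str]  # one record per session ID


def visits_per_day(sessions: Iterable[SessionRecord]) -> Counter:
    """Total visits per day: one visit for each session on that day."""
    return Counter(s["day"] for s in sessions)


def visits_per_day_by_browser(
    sessions: Iterable[SessionRecord], browser: str, version: Optional[str] = None
) -> Counter:
    """Visits per day restricted to a browser type and, optionally, a version."""
    return Counter(
        s["day"]
        for s in sessions
        if s["browser"] == browser and (version is None or s["version"] == version)
    )


sessions = [
    {"day": "2010-10-01", "browser": "Firefox", "version": "3.6"},
    {"day": "2010-10-01", "browser": "Firefox", "version": "3.5"},
    {"day": "2010-10-01", "browser": "Internet Explorer", "version": "8"},
    {"day": "2010-10-02", "browser": "Firefox", "version": "3.6"},
]
daily = visits_per_day(sessions)
firefox_daily = visits_per_day_by_browser(sessions, "Firefox")
firefox_36_daily = visits_per_day_by_browser(sessions, "Firefox", "3.6")
```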
- the aggregated data stored in the data structure 300 also has a hierarchical structure.
- the top level of the hierarchy corresponds to different sources 300 A, 300 B (e.g., different web sites), each source having a unique source ID 310 A.
- the aggregated metrics 310 B include those attributes and associated values that are determined from the hits data for a predefined period of time without applying any restrictions. For example, if the predefined period of time is one day, the visits attribute 320 A may be associated with one or more pairs of (time, value) 330 A in which the time represents a specific day such as Oct.
- the pageview attribute 320 B is also associated with one or more pairs of (time, value) 330 B in which the time represents a specific day and the value represents the total number of pageviews during the same day regardless of, e.g., what web browser is used for each pageview.
- a breakdown of a lump sum metric value (e.g., the visits 320 A) into multiple values defined by different conditions is desired because it can provide more information to a web analyst about the web traffic.
- the conditions 310 C limit the aggregation of web traffic data for a particular web site to sessions whose country is China.
- the aggregation servers 160 generate another set of aggregated metrics 320 C by skipping any session whose country is not China.
- the conditions 310 D focus only on the sessions that use Firefox as the web browser. Accordingly, the aggregated metrics 320 D should not take into account any session that uses Internet Explorer.
- condition-free aggregated metrics 310 B may be derived from the conditioned aggregated metrics 320 C, 320 D.
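- One way such a derivation could work is sketched below. It assumes the conditions partition the sessions, i.e. every session satisfies exactly one condition (here, exactly one country value), so that summing the conditioned counts recovers the condition-free total. The function names and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def conditioned_visits(sessions: Iterable[Dict[str, str]], attribute: str) -> Counter:
    """Visits broken down by the value of one dimension attribute, e.g. country."""
    return Counter(s[attribute] for s in sessions)


def condition_free_visits(conditioned: Mapping[str, int]) -> int:
    """Recover the total by summing over mutually exclusive, exhaustive conditions."""
    return sum(conditioned.values())


day_sessions = [
    {"country": "China", "browser": "Firefox"},
    {"country": "China", "browser": "Internet Explorer"},
    {"country": "United States", "browser": "Firefox"},
]
by_country = conditioned_visits(day_sessions, "country")
total_visits = condition_free_visits(by_country)
```

- The same total would not be recovered from overlapping conditions (for example, one condition on country and another on browser), because a session could then be counted more than once.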
- the aggregation servers 160 typically pre-compute values for many hundreds of aggregated metrics with or without conditions and store those values in the aggregates database 165 for future use.
- One use of the aggregates database 165 is to detect events of potential interest in the web analytics data and present them to a web analyst in an intuitive manner.
- An event of potential interest (also referred to as an alert or an anomaly in this application) is something that might be valuable to the web analyst but is hidden in the vast amount of web traffic data and difficult to identify.
- a market analyst is very interested in learning the advertisement's effectiveness in terms of whether there is any traffic increase at the web site during a predefined time period, from what source it sees the largest traffic increase or decrease, and how much of the increased web traffic is related to the advertisement (e.g., as measured by the click-through rate).
- a webmaster concerned with the security of a web site is interested in learning about abnormal web traffic patterns as early as possible to prevent serious attacks.
- One aspect of the present application is to develop a system that can automatically detect those events of potential interest from the web analytics data with no or minimal user effort and present the detection result to the web analyst in an efficient and user-friendly manner to help the web analyst's decision making process.
- the process of identifying any events of potential interest in the web analytics data begins with deriving a number of time series or time sequences from the aggregated web analytics data stored in the data structure shown in FIG. 3 and storing the time series in another data structure for further processing.
- at least two ways of detecting events of potential interest are disclosed in the present application: (i) model-based event detection; and (ii) rule-based event detection.
- the model-based event detection method described herein applies one or more statistical models to a time series to forecast or predict or estimate one or more values for a future time period and then compares the predicted values with the actual value when available. If the differences between the predicted values and the actual value meet a predefined condition, an event of potential interest or an anomaly is identified for the corresponding time period.
- the rule-based approach combines the prediction models and the predefined condition of the model-based approach into a user-specified alert rule.
- one alert rule may specify that an event of potential interest is detected if the revenue metric attribute of a website on a particular date drops by at least 15% relative to the revenue metric attribute of the same website on the same date of the previous year.
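- The year-over-year alert rule described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the function name `revenue_drop_alert`, the dictionary layout, and the handling of missing dates are assumptions made for the example.

```python
from datetime import date


def revenue_drop_alert(revenue_by_day, day, min_drop=0.15):
    """Return True if revenue on `day` fell by at least `min_drop`
    relative to the same calendar date one year earlier."""
    try:
        prior_day = day.replace(year=day.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year (assumed: no alert).
        return False
    current = revenue_by_day.get(day)
    prior = revenue_by_day.get(prior_day)
    if current is None or prior is None or prior <= 0:
        return False  # assumed: missing or non-positive baseline never triggers
    return (prior - current) / prior >= min_drop


# Hypothetical data: revenue fell from 1000 to 840, a 16% drop.
revenue = {date(2008, 10, 15): 1000.0, date(2009, 10, 15): 840.0}
print(revenue_drop_alert(revenue, date(2009, 10, 15)))
```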
- the model-based or rule-based event detection method can also be performed on a collection of time series data, e.g., in a batch mode, to not only predict anomalies in the future (which is typically the current day, week, or month) but also identify anomalies in the past.
- the anomaly prediction for the current time period e.g., today, this week or month
- the prediction for the current time period may start right after the time series update with the data samples of the immediately previous time period.
- the anomaly prediction for the current time period uses the data samples from the current time period as well.
- FIG. 4 is a block diagram of a data structure that stores time series data extracted from the aggregated web traffic data in accordance with some embodiments.
- the time series data stored in the data structure 400 has a hierarchical structure.
- the top level of the hierarchy corresponds to different sources 400 A, 400 B (e.g., different web sites), each source having a unique source ID 410 A.
- the source ID 410 A may be the same as the source ID 310 A for the same source.
- each source in the data structure 400 may be associated with a plurality of time series, each time series having a unique combination of metric and condition.
- the metric 410 B is the number of new visits to a website during a day and the condition 410 C is that only new visits that come from Paris should be considered.
- the time series 410 D includes a time series ID 420 A and one or more time series updates 420 B, 420 C and each time series update includes one or more pairs of (time, value) 430 A wherein the “time” parameter corresponds to a particular day and the “value” parameter corresponds to a particular number of new visits from Paris during that day.
- an example of a time series including multiple updates is provided below in connection with FIG. 6B .
- each source may be characterized by hundreds of metric and dimension attributes in the hits database 155 .
- Different combination schemes of the metric and dimension attributes may produce thousands of possible time series. From a web analyst's perspective, not every possible time series is important enough to justify a spot in the time series database 175 .
- each (condition-free or conditioned) time series stored in the time series database 175 is generated because it may carry information of interest to many web analysts.
- a web master of a website is allowed to define his or her own new metric or dimension attributes or customize the existing metric or dimension attributes to have a better characterization of the traffic to the website.
- the new or customized attributes are additional sources for generating time series data for event detection using the invention disclosed in this application.
- a more detailed description of how to define new or customize existing attributes can be found in a pending application entitled “Extensible custom variables for tracking user traffic” (attorney docket number 060963-5420-US) filed Oct. 20, 2009, which is hereby incorporated by reference in its entirety.
- the time series in the data structure 400 are derived from the aggregated data in the data structure 300 of FIG. 3 . If a time series corresponds to the aggregated metrics of an entire source free of any precondition, the condition for this time series in the data structure 400 does not exist or is none. In this case, the time series is also referred to as a “condition-free” time series. If a time series corresponds to the aggregated metrics of the source with one or more conditions, the same conditions used for aggregating the web traffic data are also the conditions in the data structure 400 for the corresponding time series. In this case, the time series is also referred to as a “conditioned” time series.
- a source has a number (e.g., 10) of condition-free time series including the metrics like visits, pageviews, bounce rate, pages/visit, new visits, and average time on site, etc.
- the source may have more (e.g., 100) conditioned time series, each having a unique set of conditions for filtering out data that does not meet any of the predefined conditions.
- the time series gathering servers 170 may need to access the hits database 155 to build the time series directly on top of the hits data or even the raw web traffic data from the logfiles 130 or the Javascript code of a client assistance 114 that monitors the user activities at a web page.
- the time series gathering servers 170 can send a request to the aggregation servers 160 for aggregating the hits data according to the time series definition and return the aggregated data to the time series gathering servers 170 .
- although the time series database 175 does not include every possible time series that can be derived from a website's hits data, it is still a challenge for the time series database 175 to host so many time series related to different sources.
- some data quantization and compression techniques may be employed to keep the time series storage relatively small. For example, a value in the time series database 175 is rounded and stored in the form of an expression like a*2^b, where the parameter “a” is encoded with a small number (e.g., 5) of bits and the parameter “b” can have more bits such that the difference between the value and the expression is as small as possible.
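- One possible reading of the a*2^b quantization is sketched below for non-negative integer values. The function names, the round-to-nearest choice, and the 5-bit default for “a” are assumptions for illustration; the text does not specify how rounding is performed.

```python
def quantize(value, mantissa_bits=5):
    """Approximate a non-negative integer as a * 2**b with a < 2**mantissa_bits."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # Number of low-order bits that do not fit into the mantissa.
    b = max(0, value.bit_length() - mantissa_bits)
    # Round to nearest by adding half of 2**b before shifting.
    a = (value + ((1 << b) >> 1)) >> b
    if a >> mantissa_bits:
        # Rounding overflowed the mantissa (e.g. 31.5 -> 32); renormalize.
        a >>= 1
        b += 1
    return a, b


def dequantize(a, b):
    """Reconstruct the approximate value from the stored pair."""
    return a << b


a, b = quantize(618)
print(a, b, dequantize(a, b))
```

- With a 5-bit “a”, the reconstructed value differs from the original by a small relative error, which is the loss of precision the surrounding text refers to.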
- This data quantization scheme is acceptable as long as the loss of precision does not defeat the purpose of detecting those events of potential interest.
- each value at a particular date may be a very large number (e.g., three or four digits) but the difference between two consecutive dates may be much smaller (e.g., only two digits).
- one way of saving the storage space in this situation is to calculate the difference between two consecutive values and store the differences like v_2 - v_1, v_3 - v_2, etc. in the time series database 175 as long as the base value v_1 is available for reconstructing the actual values when needed.
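- The difference-based storage scheme can be sketched as a pair of encode and decode helpers. The function names and the sample values are hypothetical.

```python
from itertools import accumulate


def delta_encode(values):
    """Return (base value, list of consecutive differences)."""
    if not values:
        raise ValueError("need at least one value")
    deltas = [later - earlier for earlier, later in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(base, deltas):
    """Rebuild the original values from the base value and the differences."""
    return list(accumulate(deltas, initial=base))


daily_visits = [1200, 1215, 1190, 1240]  # hypothetical daily values
base, deltas = delta_encode(daily_visits)
print(base, deltas)
print(delta_decode(base, deltas) == daily_visits)
```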
- FIG. 5 is a block diagram of a data structure that stores events of potential interest detected in the time series data in accordance with some embodiments.
- the events data stored in the data structure 500 also has a hierarchical structure.
- the top level of the hierarchy corresponds to different sources 500 A, 500 B (e.g., different web sites), each source having a unique source ID 510 A.
- the source ID 510 A may be the same as the source ID 310 A in the aggregates database 165 and the source ID 410 A in the time series database 175 for the same source.
- Each event 510 B is associated with an event ID 510 C, a metric 520 A, one or more conditions 520 B, a pair of (time, value) 520 C wherein the value is the actual value for that time period, a pair of (minimum, maximum) 520 D wherein the minimum and maximum values are usually determined through one or more statistical models, a significance factor 520 E that indicates the interest level of this event to a web analyst, etc.
- time series gathering servers 170 for updating the time series database 175
- event detection servers 180 for updating the events database 185 .
- the initial setup of the analytics system 140 is completed and different components within the system 140 are in a normal operation mode.
- FIG. 6A is a flow chart of a process for updating the time series data using the aggregated data updates in accordance with some embodiments.
- the time series gathering servers 170 receive one or more aggregated data updates ( 610 ).
- an aggregated data update provides information about the user activities at one or more websites during the recent predefined time interval.
- the update may include a number of visits to a particular website or any other aggregated metrics that have been collected in the time series database 175 .
- the invention of this application is not limited to web traffic data. In fact, it can be used to identify or predict anomalies in almost any type of time series data.
- the updates are pulled out of the aggregates database 165 by the time series gathering server 170 .
- the aggregation servers 160 push the updates to the time series gathering servers 170 for further processing.
- the time series gathering servers 170 identify the time series in the database 175 for updating ( 620 ).
- the time series data in the time series database 175 are organized under different sources as different sets of metrics and conditions.
- the time series gathering servers 170 collect the aggregated data updates corresponding to different time series and then apply each of them to a corresponding time series in the database 175 .
- the metric and dimension attributes associated with different updates are part of the key for identifying the corresponding time series in the database 175 .
- the data structure of the aggregated data updates is similar to the data structure 300 in FIG. 3 .
- For each source ID in the update, the time series gathering servers 170 find the corresponding entry in the data structure 400 in FIG. 4 that has the same source ID. Next, the time series gathering servers 170 update the identified time series using the data entries in the update ( 630 ) and consolidate the time series updates if predefined conditions are met ( 640 ).
- FIG. 6B is a block diagram of an exemplary process for updating a time series on a weekly basis in accordance with some embodiments.
- the updates to the time series database 175 happen on a daily basis and a time series consolidation process occurs every week.
- the time series 650 includes only one time series update 650 - 0 .
- the time series update 650 - 0 includes a plurality of (time, value) pairs, one pair per day and each value corresponding to an actual value for that day.
- the oldest entry of these (time, value) pairs may be dated a long time (e.g., two years) back and the newest entry (T_N, V_N) is generated this Sunday.
- each time series is used for predicting one or more values at a future time under different prediction models.
- the daily time series are summed on a weekly basis to form a weekly time series, which may be further summed on a monthly basis to a monthly time series.
- both the weekly time series and the monthly time series are typically smoother than the corresponding daily time series during the same time period. As shown in FIGS. 11A to 11C , this may mean that an anomaly identified in the daily time series has no counterpart in the corresponding week of the weekly time series or the corresponding month of the monthly time series.
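- Summing a daily time series into a weekly one can be sketched as below. The helper name `sum_in_blocks` and the choice to drop a trailing incomplete week are assumptions; calendar-aware monthly summation would additionally need the dates of the samples.

```python
def sum_in_blocks(values, block=7):
    """Sum consecutive blocks of `block` samples, dropping a trailing partial block."""
    return [
        sum(values[start:start + block])
        for start in range(0, len(values) - block + 1, block)
    ]


# Hypothetical daily visits: a one-day spike in the second week.
daily = [100] * 7 + [100, 100, 100, 400, 100, 100, 100] + [100, 100]
weekly = sum_in_blocks(daily)
print(weekly)
```

- In this hypothetical data the spike quadruples one daily value but raises the weekly sum far less in relative terms, which illustrates why a daily anomaly may not appear in the weekly series.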
- the time series gathering servers 170 receive a time series update 650 - 1 .
- this update is stored as a separate time series update entry 420 C in the data structure 400 without being combined with the time series update 650 - 0 . By doing so, it is convenient for the servers 170 to add new entries to the data structure 400 and to access them. This process repeats every day, and new time series updates 650 - 2 to 650 - 6 are added to the time series 650 until the next Sunday.
- Upon receiving a new update entry (T_{N+7}, V_{N+7}) on the next Sunday, the time series gathering servers 170 determine that it is time to consolidate the time series updates accumulated during the past week. In some embodiments, the time series gathering servers 170 follow the first-in-first-out (FIFO) rule by eliminating the oldest seven (time, value) pairs ranging from (T_0, V_0) to (T_6, V_6) from the time series 650 and combining the newest seven (time, value) pairs ranging from (T_{N+1}, V_{N+1}) to (T_{N+7}, V_{N+7}) with the time series 650 to form a new time series 655 that includes only one time series update 655 - 0 .
- the time series gathering servers 170 maintain a sliding time window on a fixed length of time series data when determining the existence of any events of potential interest. It should be noted that the method of updating time series as described above in connection with FIG. 6B is for illustrative purposes. There are many other ways of managing the time series that are known in the art.
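- The weekly FIFO consolidation and the resulting fixed-length sliding window can be sketched as follows. The function name, the list-of-pairs layout, and the rule "consolidate once seven daily pairs have accumulated" are assumptions for illustration.

```python
def consolidate(base, updates, period=7):
    """Merge accumulated daily updates into the base series, FIFO style.

    base    -- list of (time, value) pairs, oldest first
    updates -- list of daily updates, each a list of (time, value) pairs
    Returns (new_base, remaining_updates).
    """
    new_pairs = [pair for update in updates for pair in update]
    if len(new_pairs) < period:
        return base, updates  # not enough updates accumulated yet
    # Drop as many of the oldest pairs as are being appended,
    # so the window length stays fixed.
    merged = base[len(new_pairs):] + new_pairs
    return merged, []


base = [(t, 10 * t) for t in range(14)]            # hypothetical 14-day window
updates = [[(t, 10 * t)] for t in range(14, 21)]   # seven daily updates
window, pending = consolidate(base, updates)
print(window[0], window[-1], len(window), pending)
```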
- an event of potential interest has a practical, meaningful value only if the corresponding web site has received a sufficient number of visits from a broad scope of visitors for a certain time period. For example, if a website only receives a handful (e.g., less than 10) of visits per day, a small, insignificant variation of user activities (e.g., an increase of daily visits from 10 to 30) could result in a false-alarm-like event of potential interest being detected by the event detection servers 180 . Too many false-alarm-like events of potential interest would likely make the actual events of interest less visible to the web analyst.
- the time series gathering servers 170 may set a threshold such that no time series is generated for a website until the website's associated web analytics data reaches the threshold.
- the threshold can be that a website receives at least 100 visits per day or 50 visits from distinct IP addresses. This lower-bound on the generation of time series reduces not only the statistical noise level of the detected events of potential interest but also the storage needed for storing the time series.
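- The lower bound on time series generation can be expressed as a simple predicate. The function and parameter names are hypothetical; the default values are the example thresholds given above.

```python
def qualifies_for_time_series(daily_visits, distinct_ips,
                              min_visits=100, min_distinct_ips=50):
    """Return True if a website has enough traffic to justify a time series."""
    return daily_visits >= min_visits or distinct_ips >= min_distinct_ips


print(qualifies_for_time_series(daily_visits=8, distinct_ips=5))
print(qualifies_for_time_series(daily_visits=120, distinct_ips=30))
```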
- the event detection servers 180 are responsible for identifying events of potential interest therein and populating the identified events in the events database 185 .
- there are at least two different ways of detecting events, (i) model-based and (ii) rule-based, which will be described in more detail below.
- FIGS. 7A and 7B are flow charts of a model-based process for detecting events of potential interest in a time series in accordance with some embodiments.
- this process occurs periodically (e.g., every night).
- this process is performed in response to a user request from a client 110 .
- the event detection servers 180 work on the time series at a predefined time. After identifying and extracting a time series and its recent update from the time series database 175 ( 710 ), the event detection servers 180 make predictions for the time series using a plurality of prediction models.
- the event detection servers 180 have a time series of the last N days of numbers of visits to a website and the number of visits for the current day. To decide whether the number of visits for the current day is high or low enough to qualify as an event of potential interest, the event detection servers 180 need to determine the trend of the number of visits at the website and use the trend to estimate a predicted number of visits for the current day using the time series of the last N days of numbers of visits (note that the value of N may vary for different forecasting models). Although many statistical models can be used for making the prediction, two types of modeling techniques are described herein for illustration: (i) linear regression; and (ii) Holt-Winters exponential smoothing.
- linear regression is an approach of modeling a linear relationship between a dependent variable y and one or more independent variables x 1 , x 2 , . . . , x n , such that the linear model's unknown parameters can be estimated from the observed data. Assuming that the relationship between the number of visits (v i ) and the corresponding date (t i ) is linear, this relationship can be mathematically expressed as follows:
- $v_j \approx \hat{\alpha}\, t_j + \hat{\beta}$.
- s j represents the variance of the prediction using linear regression.
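- A minimal sketch of the linear-regression prediction, using an ordinary least-squares fit of visits against day index, is given below. The function names and sample data are hypothetical, and the sketch does not compute the prediction variance mentioned above.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = s_tv / s_tt
    intercept = mean_v - slope * mean_t
    return slope, intercept


def predict(times, values, future_time):
    """Predict the value at `future_time` from the fitted line."""
    slope, intercept = fit_line(times, values)
    return slope * future_time + intercept


days = [0, 1, 2, 3]
visits = [100, 110, 120, 130]  # hypothetical, perfectly linear history
print(predict(days, visits, 4))
```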
- in exponential smoothing, the smoothing parameter helps to define the amount of weight given to a past observation.
- the weight given to the observation at the k th day in the past from the current date is expressed as:
- Double exponential smoothing is used for making the forecasting to capture a trend in the time series, if there is any.
- Double exponential smoothing is given by the following formulas:
- the parameter ⁇ is set to be no greater than the parameter ⁇ .
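- Because the formulas did not survive in this text, the sketch below uses the standard textbook form of Holt's double exponential smoothing, which may differ in detail from the formulas originally given. The names `level_weight` and `trend_weight`, the initialization, and the sample data are assumptions.

```python
def holt_forecast(values, level_weight, trend_weight):
    """One-step-ahead forecast using Holt's double exponential smoothing."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: first value as level, first difference as trend.
    level = values[0]
    trend = values[1] - values[0]
    for observed in values[1:]:
        previous_level = level
        # Blend the new observation with the previous forecast.
        level = level_weight * observed + (1 - level_weight) * (level + trend)
        # Blend the newest level change with the previous trend estimate.
        trend = trend_weight * (level - previous_level) + (1 - trend_weight) * trend
    return level + trend


history = [100, 110, 120, 130]  # hypothetical visits with a steady upward trend
print(holt_forecast(history, level_weight=0.5, trend_weight=0.3))
```

- On a perfectly linear history the forecast continues the line regardless of the two weights; on noisy data, larger weights make the forecast follow recent samples more closely.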
- other non-linear statistical modeling schemes such as the triple exponential smoothing may be used to take care of the seasonality (also known as periodicity) in the time series data, which feature is typically prominent when a long time series is used for forecasting and the time series itself demonstrates some cyclic patterns.
- some websites such as a weather forecasting website usually receive more traffic every Friday of each week because many visitors are interested in learning the weather condition during the weekend. In this case, the number of visits to the website may show a fluctuating pattern on a weekly basis and the triple exponential smoothing may be more appropriate for capturing the trend accurately.
- the number of past observations or actual data samples used for predicting the future value affects the predicted value's sensitivity to the recent changes of the actual data samples.
- three time-window lengths, i.e., 4 days, 21 days, and 56 days, are chosen as the numbers of past observations used for making separate predictions so as to capture both the recent changes of the actual samples and the long-term trends using different predictions if the predicted values are daily-based or weekly-based. If the predicted values are monthly-based, the three time-window lengths are, respectively, 0.5 month, 3 months, and 8 months according to some embodiments.
- the length of a time window used for predicting a value at a future time determines whether the predicted value is more or less likely to be affected by a recent fluctuation in the time series.
- a prediction model that uses a longer time window considers more data samples into the past for forecasting a value in the future. This effect is similar to a low-pass filter such that the predicted outcome is less sensitive to the recent fluctuation in the time series and it is more likely to capture the trend in the time series.
- a prediction model based on a short time window uses fewer data samples to make the prediction and the predicted result is usually more sensitive to the recent fluctuation in the time series.
- a combination of the predicted values based on the different lengths of time series may result in a more reliable prediction that takes into account both the long-term and short-term features in the time series.
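- Producing one prediction per time-window length can be sketched as below. The helper `forecast_by_window` is hypothetical, and the plain mean is only a stand-in forecaster chosen to keep the example short; any of the models discussed above could be passed in instead.

```python
from statistics import mean


def forecast_by_window(values, forecaster, windows=(4, 21, 56)):
    """Apply `forecaster` to the most recent `w` samples for each window length."""
    return {w: forecaster(values[-w:]) for w in windows if len(values) >= w}


history = list(range(60))  # hypothetical steadily rising daily series
predictions = forecast_by_window(history, mean)
print(predictions)
```

- The short window tracks the most recent samples, while the long window averages over more of the past and therefore reacts more slowly to recent changes.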
- the event detection servers 180 make nine predictions using the two modeling techniques and the three different lengths of time windows. For convenience, the nine predictions are expressed as:
- three out of the nine forecasted models are derived from linear regression and the other six models are from double exponential smoothing because three possible values {x_1, x_2, x_3}, which are ranked in a monotonically increasing order, are candidates for each of the two smoothing parameters.
- one of the two smoothing parameters is set to be no greater than the other. Therefore, the three possible values {x_1, x_2, x_3} produce six different combinations that correspond to the six models as follows:
- the event detection servers 180 compare the actual value of the current date with each of the six predictions ( 720 ). Based on the comparison result, the event detection servers 180 determine whether an event of potential interest is detected or not ( 740 ). For each determined event, the event detection servers 180 also give it a significance factor that indicates how unlikely the event is ( 750 ) and store the event in the events database 185 ( 760 ). In general, the more unlikely the event is, the more interested the web analyst may be.
- the web analyst would probably like to investigate the cause behind this jump and find out, e.g., whether it relates to a potential hacker's attack or a successful commercial promotion that immediately preceded the event.
- the analytics system 140 presents to a user such as a web analyst a highly-reliable “roadmap,” with which the web analyst can quickly “plow” through a large amount of web traffic data and derive information valuable for improving the quality of service offered by the website.
- the event detection servers 180 then return to select the second model, [500, 154]. This time, the comparison indicates that the actual number 618 is within the scope of the second model ( 730 - 1 , yes) and the event detection servers 180 then proceed to the next model until the last model is processed ( 730 - 3 , yes).
- three out of the six models i.e., [500, 154], [588, 112], and [693, 87] are satisfied by the actual number 618 and three other models, i.e., [344, 15], [402, 23], and [389, 73] are not satisfied by the actual number 618.
- the event detection servers 180 determine that the actual number of visits 618 is an event of potential interest ( 740 - 2 ) and choose a significance factor for the event ( 740 - 3 ).
- the significance factor of an event is the significance factor of one of the unsatisfied prediction models such that (i) the actual number is more likely to satisfy this prediction model than any other unsatisfied prediction models and (ii) the actual number would satisfy more than half of all the prediction models by satisfying this prediction model and therefore no longer qualify as an event.
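- The six-model example can be checked with a short sketch. It assumes that each pair such as [500, 154] means (predicted value, allowed deviation) and that an actual value is an event unless it satisfies more than half of the models; both readings are inferred from the example numbers rather than stated explicitly.

```python
def is_satisfied(actual, model):
    """A model is satisfied if the actual value lies within its allowed deviation."""
    predicted, deviation = model
    return abs(actual - predicted) <= deviation


def is_event(actual, models):
    """Flag an event unless more than half of the models are satisfied."""
    satisfied = sum(is_satisfied(actual, m) for m in models)
    return 2 * satisfied <= len(models)


models = [(344, 15), (402, 23), (389, 73), (500, 154), (588, 112), (693, 87)]
actual_visits = 618
satisfied = [m for m in models if is_satisfied(actual_visits, m)]
print(satisfied)
print(is_event(actual_visits, models))
```

- Under these assumptions exactly three of the six models are satisfied by 618, which is not more than half, so the value is flagged as an event.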
- the event detection servers 180 also use the models to predict the minimum and maximum of the expected value for that particular time period ( 740 - 4 ). This value gives a user a range of a normal value for that time period had there been no anomalous user activities.
- the predicted metric values according to different models are ordered by their magnitudes. For example, 10 models result in a sequence of 10 predicted values. Among the 10 predicted values, the second to the lowest value is chosen to be the minimum of the expected value and the second to the highest value is chosen to be the maximum of the expected value if the actual value is outside the range defined by the pair of (minimum, maximum). Otherwise, no minimum or maximum values are available for the corresponding event.
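- The selection of the expected (minimum, maximum) pair from the ordered predictions can be sketched as follows. The function name and the ten sample predictions are hypothetical.

```python
def expected_range(predictions, actual):
    """Return (minimum, maximum) from the second-lowest and second-highest
    predictions, or None if the actual value already lies inside that range."""
    if len(predictions) < 3:
        raise ValueError("need at least three predictions")
    ordered = sorted(predictions)
    minimum, maximum = ordered[1], ordered[-2]
    if minimum <= actual <= maximum:
        return None  # no minimum or maximum is reported for this value
    return minimum, maximum


predicted = [344, 402, 389, 500, 588, 693, 450, 470, 510, 530]  # hypothetical
print(expected_range(predicted, actual=618))
print(expected_range(predicted, actual=500))
```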
- Compared with the model-based event detection that requires little user interaction, the rule-based event detection described below provides an end user with more control over what kind of user activities may be potentially “interesting” or valuable. Since these two approaches are often complementary to each other, they may provide better outcomes if used in combination.
- FIG. 7C is a flow chart of a rule-based process for detecting events of potential interest in a time series in accordance with some embodiments.
- the event detection servers 180 identify one or more alert rules ( 770 ) in the alert rules database 195 .
- the event detection servers 180 query the alert rules database 195 for any alert rules that may be applicable to the time series associated with the data source.
- the alert rules database 195 stores a plurality of user-specified event triggering conditions that different users enter through a graphical user interface at a client 110 , an example of which is described below in connection with FIG. 12E .
- the alert rules may be stored in the same database as the dataset segment schemes supported by the analytics system 140 .
- the event detection servers 180 select one of the identified alert rules ( 772 ) and apply the alert rule to the time series database 175 to identify those time series, if any, that satisfy the alert rule ( 774 ) and store them in the events database 185 as triggering events ( 778 ). For example, if the time series is a sequence of numbers of visits from visitors in China, the application of an alert rule that triggers an event if the visits from China increase by 10% would be appropriate (although the time series may fail to trigger such an event if the recent time series update does not show at least a 10% increase of visits). In contrast, another alert rule that triggers an event if the visits from Brazil drop 5% would not be applicable.
- the event detection servers 180 repeat the aforementioned process until the last alert rule associated with the data source has been processed ( 780 , yes). In some embodiments, these triggering events will be shown to a user through a graphical user interface per the user's request. In some other embodiments, the analytics system 140 also notifies the user of the triggering event through other communication channels such as email, text messaging, voicemail, etc.
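- Matching alert rules to time series and testing them can be sketched as below. The `AlertRule` fields, the (metric, condition) key, and the use of the immediately preceding value as the comparison baseline are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str
    condition: str
    change: float  # +0.10 = rise of at least 10%, -0.05 = drop of at least 5%


def is_triggered(rule, series_key, values):
    """Return True if the rule applies to this series and its latest change meets it."""
    if (rule.metric, rule.condition) != series_key or len(values) < 2:
        return False  # rule not applicable to this time series
    previous, current = values[-2], values[-1]
    if previous == 0:
        return False
    relative = (current - previous) / previous
    return relative >= rule.change if rule.change > 0 else relative <= rule.change


china_rise = AlertRule("visits", "country=China", +0.10)
brazil_drop = AlertRule("visits", "country=Brazil", -0.05)
key = ("visits", "country=China")
print(is_triggered(china_rise, key, [1000, 1150]))
print(is_triggered(brazil_drop, key, [1000, 900]))
```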
- the aforementioned description focuses primarily on how the analytics system 140 detects events of potential interest in the collected web analytics data through data aggregation and time series data analysis.
- the following description shifts its focus on how the events of potential interest are served to the users of the analytics system 140 in a client-server environment like the one shown in FIG. 1B .
- FIGS. 8A and 8B are flow charts illustrating how the analytics system prepares and serves a report of events of interest in response to a user request in accordance with some embodiments.
- a user submits a request for viewing an event report for a particular web site.
- Upon receipt of the user request ( 802 ), the client 110 generates a request for the event report to the analytics system 140 ( 804 ).
- the client request is an HTTP request.
- the query processor 190 in the analytics system 140 transforms the client request into one or more queries to the events database 185 and submits them to the database ( 810 ).
- the events database 185 identifies the corresponding events data records (if any) ( 814 ) and returns them to the query processor 190 for preparing a response to the client request ( 816 ).
- the request from the client 110 includes a range of dates and a sensitivity level for querying the events database ( 814 - 1 ).
- the events database 185 chooses one of the dates for further processing ( 814 - 2 ).
- the further processing includes retrieving events associated with the chosen date ( 814 - 3 ); identifying and counting the events whose respective significance factors are equal to or higher than the user-specified sensitivity threshold ( 814 - 4 ); and generating a dataset segment scheme for each identified event ( 814 - 5 ).
- After looping through all the dates ( 814 - 6 , yes), the events database 185 returns the information about the identified events to the query processor 190 .
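- The per-date filtering and counting of events by sensitivity level can be sketched as follows. The event record layout (a dictionary with `date` and `significance` keys) and the function name are hypothetical; the generation of dataset segment schemes is not covered.

```python
from datetime import date


def count_events_by_date(events, start, end, sensitivity):
    """Count, per date in [start, end], the events whose significance factor
    is equal to or higher than the sensitivity threshold."""
    counts = {}
    for event in events:
        day = event["date"]
        if start <= day <= end and event["significance"] >= sensitivity:
            counts[day] = counts.get(day, 0) + 1
    return counts


events = [  # hypothetical event records
    {"date": date(2009, 10, 14), "significance": 0.9},
    {"date": date(2009, 10, 15), "significance": 0.4},
    {"date": date(2009, 10, 15), "significance": 0.8},
    {"date": date(2009, 11, 1), "significance": 0.95},
]
print(count_events_by_date(events, date(2009, 9, 15), date(2009, 10, 15), 0.5))
```

- Raising the sensitivity threshold in this sketch reduces the number of events counted for each date.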
- the query processor 190 compiles an event report using the events information returned from the events database 185 ( 818 ) and then returns the report to the client 110 ( 820 ). Upon receiving the event report ( 822 ), the client 110 displays the report to the user ( 824 ). Exemplary screenshots of the graphical user interface for displaying the event reports are described below in connection with FIGS. 11A to 11C .
- FIG. 9 is a block diagram of a client device used by, e.g., a web analyst, for requesting and rendering web analytics reports in accordance with some embodiments.
- the client 110 generally includes one or more processing units (CPU's) 902 , one or more network or other communications interfaces 904 , memory 912 , and one or more communication buses 914 for interconnecting these components.
- the communication buses 914 may include circuitry (sometimes called a chipset) that interconnects and controls communications between components.
- the client 110 may optionally include a user interface 905 , for instance, a display 906 , a keyboard and/or mouse 908 , and a touch-sensitive surface 909 .
- Memory 912 may include high speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may also include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
- Memory 912 may include mass storage that is remotely located from the central processing unit(s) 902 .
- Memory 912 or alternately the non-volatile memory device(s) within memory 912 , comprises a computer readable storage medium.
- Memory 912 or the computer readable storage medium of memory 912 stores the following elements, or a subset of these elements, and may also include additional elements:
- FIG. 10 is a block diagram of an analytics system for processing web traffic data, identifying events of potential interest therein, and serving web analytics reports in response to user requests in accordance with some embodiments.
- the analytics system 140 generally includes one or more processing units (CPU's) 1002 , one or more network or other communications interfaces 1004 , memory 1012 , and one or more communication buses 1014 for interconnecting these components.
- the analytics system 140 may optionally include a user interface 1005 comprising a display device 1006 and a keyboard 1008 .
- Memory 1012 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 1012 may optionally include one or more storage devices remotely located from the CPU(s) 1002 . Memory 1012 , or alternately the non-volatile memory device(s) within memory 1012 , comprises a computer readable storage medium. Memory 1012 or the computer readable storage medium of memory 1012 stores the following elements, or a subset of these elements, and may also include additional elements:
- Each of the above-identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above.
- the above identified modules or programs i.e., sets of instructions
- memory 912 and 1012 may store a subset of the modules and data structures identified above.
- memory 912 and 1012 may store additional modules and data structures not described above.
- FIGS. 9 and 10 are intended more as functional descriptions of the various features of a client device and analytics system rather than a structural schematic of the embodiments described herein.
- items shown separately could be combined and some items could be separated.
- some items shown separately in FIG. 10 like the query processor 190 and the server application 194 as well as items like the databases 155 to 195 could be implemented by one or more servers.
- the actual number of server computers used to implement the analytics system 140 , and how features are allocated among them will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
- FIGS. 11A to 11C are screenshots of graphical user interfaces that display daily, weekly, and monthly events of potential interest, respectively, in accordance with some embodiments.
- FIG. 11A depicts a daily alerts graphical user interface 1102 during a 30-day period from Sep. 15, 2009 to Oct. 15, 2009.
- the user interface by default displays the daily alerts when the user clicks the entry 1100 .
- Below the daily visits curve 1101 is a bar chart 1104 illustrating the respective total number of events of potential interest during the 30-day period, each day occupying one clickable spot in the bar chart 1104 .
- the user interface automatically focuses on the entry on the far right of the bar chart, which corresponds to the current date, Oct. 15, 2009. But a user can click on other parts of the bar chart 1104 to investigate the alert information for any other day within the last 30 days.
- the total number of events 1106 for the date of Oct. 15, 2009 (referred to as “alerts” in the figure) is zero.
- the analytics system 140 does not identify any anomalous user activity patterns for that day under the current sensitivity level 1112 .
- the custom alerts region 1108 which is associated with the “Custom Alerts” checkbox 1103 and used for displaying those alert rule-based events
- the automatic alerts region 1110 which is associated with the “Automatic Alerts” checkbox 1105 and used for displaying those model-based events, are both empty. Note that de-selecting either checkbox 1103 or 1105 removes the corresponding alert region 1108 or 1110 from the graphical user interface.
- FIG. 11B depicts a weekly alerts graphical user interface 1120 for the past five weeks from Sep. 13, 2009 to Oct. 15, 2009, e.g., after a user selection of the “Weekly Alerts” link 1124 on the left of the user interface.
- the current week of Oct. 11-15, 2009 is highlighted in the user interface.
- a user can click on the bar chart below the curve 1122 to select another week of data. Note that there is no alert for the current week including Oct. 15, 2009 because it is not over yet and the forecasting of the present application is for the most recently completed week.
- the curve 1122 which corresponds to roughly the same period of time, is smoother because, as explained above, the weekly summation of the daily data samples acts as a low-pass filter. As a result, the number of weekly alerts during each week is typically smaller than the sum of daily alerts during the same week. This also applies to the monthly alert described below in connection with FIG. 11C .
- This user interface is similar to the one shown in FIG. 11A except that the total number of data samples shown in the curve 1122 drops from 30 (which corresponds to the last 30 days from Sep. 15, 2009 to Oct. 15, 2009) to 5 (which corresponds to the last five weeks from Sep. 13, 2009 to Oct. 15, 2009). In this example, the number of alerts for the week of Oct. 11-15, 2009 remains zero under the current sensitivity level.
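- By way of illustration only, the smoothing effect of weekly summation can be sketched as follows. The function names and the daily visit counts are hypothetical and are not taken from the figures; the sketch simply sums each run of seven daily samples and compares the relative spread of the daily and weekly series.

```python
from statistics import pstdev


def weekly_totals(daily_values):
    """Sum consecutive groups of 7 daily samples; an incomplete trailing week is dropped."""
    full_weeks = len(daily_values) // 7
    return [sum(daily_values[i * 7:(i + 1) * 7]) for i in range(full_weeks)]


def relative_spread(values):
    """Population standard deviation divided by the mean (coefficient of variation)."""
    mean = sum(values) / len(values)
    return pstdev(values) / mean


# Hypothetical daily visits: large weekday/weekend swings, identical from week to week.
daily = [100, 120, 130, 125, 110, 40, 35] * 4
weekly = weekly_totals(daily)

print(weekly)
print(relative_spread(daily) > relative_spread(weekly))
```

In this made-up series the within-week swings cancel out in the weekly totals, which is one reason a weekly series would tend to trigger fewer alerts than the daily series covering the same period.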
- FIG. 11C depicts a monthly alerts graphical user interface 1140 for the past 12 months from Oct. 1, 2008 to Oct. 15, 2009, after a user selection of the “Monthly Alerts” 1144 on the left.
- This user interface is similar to the one shown in FIG. 11A except that the total number of data samples shown in the curve 1142 drops from 30 to 12. In this example, the number of alerts for the month of Oct. 1-15, 2009 remains zero under the current sensitivity level.
- FIGS. 12A to 12E are screenshots of graphical user interfaces that display information relating to events of potential interest in accordance with some embodiments.
- FIG. 12A depicts the same daily alerts 1102 shown in FIG. 11A but at a different date, Sep. 30, 2009.
- the number of alerts 1204 on Sep. 30, 2009 at the current sensitivity level 1212 is three.
- the custom alerts region 1206 is empty and all the three alerts are model-based automatic alerts.
- one of the alerts 1208 suggests a significant (83%) drop of bounce rate for visits that exit from a particular web page 1209 from the expected range of 34.26%-39.96% to 6.29%.
- a visual indication 1211 of the alert's significance factor is also shown in the same row, indicating how unlikely this alert is under a normal situation.
- Two alerts 1210 , 1214 are grouped together under the label “Visits.” Note that although these two alerts are both related to the number of visits to the website (in this case, www.googlestore.com), they have different conditions and therefore have different meanings.
- the alert 1210 indicates that the number of visits to the website that exit the website from the web page “www.googlestore.com/default.asp” during Sep. 30, 2009 increased more than 500% when compared with the median value derived from the multiple prediction models.
- the expected range from 0 to 458 is determined using the method described above in connection with FIGS. 7A and 7B .
- the alert 1214 indicates that the number of visits to the website that were referred to the website from the web page “www.google.com/intl/en/about.html” during Sep. 30, 2009 increased more than 281% when compared with the median value derived from the multiple prediction models. This may be because the referral web page has a link to the website www.googlestore.com and many users who visited Google's website found that link and clicked through it.
- the reference value used for measuring the percentage may be the actual value of the immediately preceding time period, the averaged actual value derived from multiple time periods in the past, the mean of the expected range or other reference values that are well-known in the art.
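- By way of illustration only, the percentage shown with an alert can be computed against a chosen reference value. The sketch below assumes the reference is the midpoint of the expected range, which is only one of the options listed above; the function names are hypothetical. Applied to the bounce-rate alert 1208 (expected range 34.26%-39.96%, actual 6.29%), this choice yields a drop of about 83%, although the source does not state which reference value was used in the figure.

```python
def range_midpoint(low, high):
    """Midpoint of the expected range, used here as the reference value (an assumption)."""
    return (low + high) / 2.0


def percent_change(actual, reference):
    """Signed percentage change of `actual` relative to `reference`."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return (actual - reference) / reference * 100.0


reference = range_midpoint(34.26, 39.96)
change = percent_change(6.29, reference)
print(f"{change:.1f}%")
```

A negative result denotes a drop and a positive result an increase; other reference values (for example the value of the preceding period) can be substituted without changing `percent_change`.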
- FIG. 12B depicts a graphical user interface 1220 when the user-selected date moves from Sep. 30, 2009 to Oct. 14, 2009. Note that the number of alerts for the new date increases to 20. Moreover, one of the 20 alerts is a custom alert 1226 called “revenue decrease.” A user selection of the edit link 1228 brings up the definition of the custom alert as shown in FIG. 12E . According to the definition, this alert is triggered when the revenue from all traffic to the website drops more than 10% from the same day of the previous week. In other words, the revenue on Oct. 14, 2009 is less than 90% of the revenue on Oct. 7, 2009.
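- By way of illustration only, a rule such as the “revenue decrease” custom alert can be expressed as a comparison against the same day of the previous week. The function name, the dictionary layout, and the revenue figures below are hypothetical; only the 10% week-over-week condition comes from the description above.

```python
from datetime import date, timedelta


def revenue_decrease_alert(revenue_by_day, day, max_drop=0.10):
    """Return True when revenue on `day` is more than `max_drop` below the same weekday one week earlier."""
    current = revenue_by_day.get(day)
    previous = revenue_by_day.get(day - timedelta(days=7))
    if current is None or previous is None or previous <= 0:
        return False  # not enough data to evaluate the rule
    return current < (1.0 - max_drop) * previous


# Hypothetical revenue figures for the two dates discussed above.
revenue = {date(2009, 10, 7): 1000.0, date(2009, 10, 14): 850.0}
print(revenue_decrease_alert(revenue, date(2009, 10, 14)))
```

With these made-up figures the rule fires because 850 is below 90% of 1000; a value of 950 on Oct. 14 would not trigger it.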
- FIG. 12C depicts the same user interface after a user selection of the curve link 1248 next to the first automatic alert 1244 , which indicates a dramatic increase of goal conversion rate of the total traffic.
- the rate was almost zero for the entire month until a sudden jump on Oct. 14, 2009.
- This curve also explains how the jump is detected as an alert. Using this alert as a lead, the web analyst can investigate the type of traffic on the same date and research what triggers the sudden jump of goal conversion rate.
- FIG. 12D depicts a graphical user interface for defining a dataset segment scheme in response to a user selection of the “Create segment” link 1242 in FIG. 12C .
- a more detailed description of the dataset segment scheme can be found in the pending U.S. patent application Ser. Nos. 12/575,435 and 12/575,437, both of which are incorporated into this application by reference in their entirety. Note that this feature allows a user to revisit the dataset through the same visualization angle in the future without relying on the events report, which is very useful for helping a user to understand the dataset.
- FIGS. 13A to 13C are screenshots of graphical user interfaces that display different numbers of events of potential interest based on a respective user-specified sensitivity threshold in accordance with some embodiments.
- FIG. 13A depicts the alerts bar chart when the sensitivity level is at about the middle level 1310 .
- FIG. 13B depicts the alerts bar chart when the sensitivity level reaches the highest level 1320 .
- the analytics system 140 reports not only more (12 of FIG. 13B vs. 3 of FIG. 13A ) alerts or events of potential interest for the same date, Sep. 30, 2009, but also one or more alerts for many other dates that have no alerts reported in FIG. 13A .
- FIG. 13C depicts the alerts bar chart when the sensitivity level reaches the lowest level 1330 . In this case, the analytics system 140 reports zero alerts for the same date, Sep. 30, 2009.
- FIGS. 14A and 14B are screenshots of graphical user interfaces that display events of potential interest based on a respective user-specified organization manner in accordance with some embodiments.
- FIG. 14A depicts a graphical user interface in which the alerts are displayed in an order defined by dimension 1410 such as the All Traffic 1412 and the Visitor 1414 and then by different metrics within the same dimension.
- FIG. 14B depicts a graphical user interface in which the alerts are displayed in an order defined by metric 1420 such as the Goal Conversion Rate 1422 and then by different dimensions within the same metric.
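- By way of illustration only, the two organization manners of FIGS. 14A and 14B can be sketched as two sort orders over the same list of alerts. The record layout and the alphabetical ordering within each group are assumptions; the figures may order the groups differently.

```python
# Hypothetical alert records; only the dimension and metric names echo the figures.
alerts = [
    {"dimension": "Visitor", "metric": "Visits"},
    {"dimension": "All Traffic", "metric": "Goal Conversion Rate"},
    {"dimension": "All Traffic", "metric": "Bounce Rate"},
    {"dimension": "Visitor", "metric": "Goal Conversion Rate"},
]


def order_by_dimension(items):
    """Group by dimension first, then by metric within each dimension."""
    return sorted(items, key=lambda a: (a["dimension"], a["metric"]))


def order_by_metric(items):
    """Group by metric first, then by dimension within each metric."""
    return sorted(items, key=lambda a: (a["metric"], a["dimension"]))


for alert in order_by_dimension(alerts):
    print(alert["dimension"], "/", alert["metric"])
```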
- FIG. 15A depicts a flow chart of a method for identifying anomalies in time series data in accordance with some embodiments.
- the server system stores time series data for a data source ( 1501 ).
- the time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value.
- For a particular attribute, the server system generates a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data ( 1503 ).
- each forecasting model includes an estimated attribute value and an associated error-variance.
- the server system determines whether the value of the time-value pair is within the error-variance of the corresponding estimated attribute value and tags the time-value pair as an anomaly if the value of the time-value pair is outside the error variance for at least a first subset of the forecasting models ( 1505 ).
- the server system reports to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes ( 1507 ).
- the respective time-value pair for the particular attribute is the latest time-value pair from the data source.
- the first subset of the forecasting models comprises one of: a predetermined number of the forecasting models or a predetermined fraction of the forecasting models.
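- By way of illustration only, the tagging step ( 1505 ) can be sketched as a vote among the forecasting models. The sketch assumes that the "error-variance" of a model is usable as a symmetric half-width around its estimate, which is an interpretation and not a statement of the disclosed method; the `Forecast` class and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Forecast:
    estimate: float  # estimated attribute value from one forecasting model
    error: float     # assumed half-width of the accepted band around the estimate


def is_outside(value, forecast):
    """True when the observed value falls outside the model's accepted band."""
    return abs(value - forecast.estimate) > forecast.error


def is_anomaly(value, forecasts, min_count=None, min_fraction=None):
    """Tag the value when it is outside the band for at least a given number or fraction of models."""
    outside = sum(is_outside(value, f) for f in forecasts)
    if min_count is not None:
        return outside >= min_count
    if min_fraction is not None:
        return outside >= math.ceil(min_fraction * len(forecasts))
    raise ValueError("specify min_count or min_fraction")


forecasts = [Forecast(100.0, 10.0), Forecast(105.0, 15.0), Forecast(90.0, 30.0)]
print(is_anomaly(125.0, forecasts, min_fraction=0.5))
print(is_anomaly(112.0, forecasts, min_count=2))
```

Here 125 lies outside all three bands and is tagged, while 112 lies outside only the first band and is not tagged when two dissenting models are required.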
- the server system determines a significance factor ( 1511 ).
- the significance factor is chosen such that, when the error-variance for each of the forecasting models is multiplied by the significance factor, the value of the time-value pair is inside the factored error-variance of a corresponding estimated metric value for at least a second subset of the forecasting models and the first subset is within the second subset.
- In response to the request from the client application for analytics information that includes a significance threshold for one or more of the attributes, the server system reports to the client application those time-value pairs tagged as anomalies when the respective significance factor for each of the time-value pairs exceeds the significance threshold ( 1513 ).
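- By way of illustration only, one possible reading of the significance factor is the smallest multiplier that, when applied to every model's error band, brings the observed value inside the band for a required number of models (the "second subset"). The sketch below follows that reading; treating the error-variance as a positive half-width, the tuple layout, and the function names are all assumptions.

```python
def significance_factor(value, forecasts, second_count):
    """Smallest multiplier of the error bands that puts `value` inside at least `second_count` bands.

    `forecasts` is a list of (estimate, error) tuples with error > 0.
    """
    if not 1 <= second_count <= len(forecasts):
        raise ValueError("second_count must be between 1 and the number of models")
    ratios = sorted(abs(value - estimate) / error for estimate, error in forecasts)
    return ratios[second_count - 1]


def report_anomalies(anomalies, significance_threshold):
    """Keep only anomalies whose significance factor exceeds the requested threshold."""
    return [a for a in anomalies if a["significance"] > significance_threshold]


models = [(100.0, 10.0), (105.0, 15.0), (90.0, 30.0)]
factor = significance_factor(125.0, models, second_count=2)
print(round(factor, 3))

tagged = [
    {"attribute": "visits", "significance": 1.2},
    {"attribute": "bounce rate", "significance": 3.0},
]
print(report_anomalies(tagged, significance_threshold=2.0))
```

Under this reading a larger factor means the value sits further outside what the models expect, so raising the threshold reports fewer, more pronounced anomalies.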
- the forecasting models include at least one of a linear regression model and a Holt-Winters exponential smoothing model.
- the forecast models include models computed from 4, 21, and 56 days of time-series data.
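- By way of illustration only, forecasts over trailing windows of 4, 21, and 56 days can be sketched with an ordinary least-squares line fitted to each window. Only the linear regression case is shown (Holt-Winters smoothing is omitted), the residual standard deviation stands in for the error-variance as an assumption, and the function names and history values are hypothetical.

```python
import math


def linear_forecast(values):
    """Fit y = a + b*t on t = 0..n-1 and return (estimate at t = n, residual standard deviation)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 samples")
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in enumerate(values)]
    resid_std = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept + slope * n, resid_std


def forecasts_by_window(history, windows=(4, 21, 56)):
    """One forecast per trailing window; windows longer than the history are skipped."""
    return {w: linear_forecast(history[-w:]) for w in windows if len(history) >= w}


# Hypothetical history: 60 days of a perfectly linear metric.
history = [100.0 + 2.0 * day for day in range(60)]
for window, (estimate, spread) in forecasts_by_window(history).items():
    print(window, round(estimate, 3), round(spread, 3))
```

Using several window lengths lets short windows react to recent shifts while long windows capture the slower trend; a value can then be compared against each window's forecast separately.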
- the time series data includes aggregated web analytics data, the method further comprising: aggregating raw or sessionized web traffic data to generate the aggregated web analytics data for attributes of interest and storing the aggregated web analytics data in addition to the raw or sessionized web traffic data.
- the time series data includes sessionized web analytics data, the method further comprising: summarizing per session raw web traffic data to generate the sessionized time series data for one or more of the attributes and storing the sessionized time series data in addition to the raw web traffic data.
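- By way of illustration only, the two summarization steps (raw hits to per-session summaries, then per-session summaries to daily aggregates) can be sketched as follows. The hit tuple layout, the choice of visits and bounce rate as the aggregated attributes, and the definition of a bounce as a single-page session are assumptions made for the example.

```python
from collections import defaultdict


def sessionize(hits):
    """Summarize raw hits (session_id, day, page) into one record per session."""
    sessions = {}
    for session_id, day, _page in hits:
        summary = sessions.setdefault(session_id, {"day": day, "pageviews": 0})
        summary["pageviews"] += 1
    return sessions


def aggregate(sessions):
    """Aggregate session records into per-day visits and bounce rate."""
    per_day = defaultdict(lambda: {"visits": 0, "bounces": 0})
    for session in sessions.values():
        bucket = per_day[session["day"]]
        bucket["visits"] += 1
        if session["pageviews"] == 1:  # assumed definition of a bounce
            bucket["bounces"] += 1
    return {
        day: {"visits": b["visits"], "bounce_rate": b["bounces"] / b["visits"]}
        for day, b in per_day.items()
    }


# Hypothetical raw hits.
hits = [
    ("s1", "2009-09-30", "/"),
    ("s1", "2009-09-30", "/cart"),
    ("s2", "2009-09-30", "/"),
    ("s3", "2009-10-01", "/"),
]
daily = aggregate(sessionize(hits))
print(daily)
```

The raw hits and the session records are left untouched, so both can be stored alongside the aggregated values as the embodiments describe.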
- FIG. 16A depicts another flow chart of a method for identifying anomalies in time series data implemented by different components of a server system with a processor and memory in accordance with some embodiments.
- a time series data collector of the server system is configured to collect time series data at one or more predefined time intervals from a plurality of data sources ( 1601 ).
- the time series data comprises a plurality of time-value pairs, each pair including a value of one attribute associated with the data sources and a time when the value was collected.
- a time series storage module of the server system is configured to store the collected time series data in a computer memory such that, when a new time-value pair is collected by the time series data collector, the new time-value pair is added to the stored time series data for a respective collection of time series data without disturbing the previously stored time series data for the respective collection ( 1603 ).
- an anomaly detection module of the server system is configured to determine whether the particular new time-value pair is an anomaly with reference to its associated collection of time series data ( 1605 ).
- this operation further includes: generating a plurality of forecasting models characterizing different subsets of the associated collection of time series data ( 1605 - 1 ), each forecasting model including an estimated attribute value and an associated error-variance; determining whether the particular new time-value pair is within the associated error-variance for each of the plurality of forecasting models ( 1605 - 3 ); and tagging the particular time-value pair as an anomaly when the value of the particular time-value pair is outside the error-variance for at least a first subset of the forecasting models ( 1605 - 5 ).
- an anomaly storage module of the server system is configured to store the time-value pairs tagged as anomalies such that the stored time-value pairs are ready to be served to a user at a client application in response to a user request for the anomalies.
- the server system also includes an aggregation module configured to generate aggregated time series data from the collected time series data ( 1611 ).
- the aggregate time series summarizes raw time series data or sessionized time series data for particular attributes of interest associated with the data sources, the aggregate data being stored by the time series storage module in addition to stored raw time series data or sessionized time series data.
- the anomaly detection mechanism operates solely on the aggregated time series data generated by the aggregation module.
- the data sources are web pages stored on web servers and the collected time series data comprises values of metrics and dimensions for the web pages and associated time values when the values of the metrics and dimensions were collected.
- the predefined time intervals are no longer than a day.
- the time series storage module is further configured to quantize and compress the time series data before storing it so as to save more space.
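- By way of illustration only, quantizing and compressing a series before storage can be sketched with fixed-step rounding followed by general-purpose compression. The source does not specify the quantization scheme or the compressor; the step size, the 32-bit integer encoding, and the use of zlib are assumptions.

```python
import struct
import zlib


def pack_series(values, step=0.01):
    """Quantize each value to a multiple of `step`, pack as 32-bit integers, and compress."""
    quantized = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(quantized)}i", *quantized)
    return zlib.compress(raw)


def unpack_series(blob, count, step=0.01):
    """Reverse pack_series; values are recovered to within half a quantization step."""
    raw = zlib.decompress(blob)
    return [q * step for q in struct.unpack(f"<{count}i", raw)]


values = [34.26, 39.96, 6.29]
blob = pack_series(values)
restored = unpack_series(blob, len(values))
print(restored)
```

Quantization is lossy, so the step size bounds the precision of anything later computed from the stored series.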
- the collection of time series data includes a number of time-value pairs that is used for generating the plurality of forecasting models and the forecasting models include at least one of a linear regression model and a Holt-Winters exponential smoothing model.
- FIG. 17A depicts another flow chart of a method for detecting anomalies in web analytics data implemented at a server system in accordance with some embodiments.
- the server system stores web analytics data for a web page in a device ( 1701 ).
- the web analytics data comprises a plurality of prior time-value pairs, each time-value pair including a value of one of a plurality of attributes associated with the web page and a time associated with the value.
- the server system collects a new time-value pair for the particular attribute ( 1703 ).
- the new time-value pair includes a new value associated with the web page and a new time when the value was determined.
- the server system estimates a set of predicted values for the attribute and associated error-variances at the new time by applying a plurality of forecasting models to the plurality of prior time-value pairs in respective subsets of the web analytics data ( 1705 ).
- the server system tags the collected new time-value pair as an anomaly when the value of the new time-value pair is outside the error variance of each of a first subset of the forecasting models for the particular attribute ( 1707 ).
- FIG. 17B depicts that the server system adds to the collected web analytics data for the web page the new time-value pair ( 1711 ).
- the time-value pair includes a tag indicating whether the new value is an anomaly and a significance factor if the new value is an anomaly.
- FIG. 17C depicts that the server system stores the web analytics data for a fixed time window into the past ( 1721 ). After estimating the set of predicted values and associated error-variances for the attribute at the new time, the server system deletes one or more older time-value pairs from previously collected time series data ( 1723 ) and appends the new time-value pair to the end of the collected web analytics data ( 1725 ).
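- By way of illustration only, a fixed time window in which the oldest time-value pair is discarded as each new pair is appended can be sketched with a bounded double-ended queue. The class name, the tuple layout (including the anomaly tag and significance factor), and the sample values are hypothetical.

```python
from collections import deque


class WindowedSeries:
    """Keeps only the most recent `window_size` time-value pairs."""

    def __init__(self, window_size):
        # A deque with maxlen drops the oldest entry automatically on append.
        self._pairs = deque(maxlen=window_size)

    def append(self, time, value, is_anomaly=False, significance=None):
        """Append a new pair together with its anomaly tag and optional significance factor."""
        self._pairs.append((time, value, is_anomaly, significance))

    def values(self):
        return [value for _time, value, _tag, _sig in self._pairs]

    def __len__(self):
        return len(self._pairs)


series = WindowedSeries(window_size=3)
for day, visits in enumerate([10, 12, 11, 50, 13]):
    series.append(day, visits, is_anomaly=(visits == 50), significance=4.0 if visits == 50 else None)

print(series.values())
```

In this sketch the forecasts would be computed from `series.values()` before calling `append`, so that the new value is judged only against earlier data.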
- the attributes comprise a plurality of metrics and dimensions associated with the web site.
- the graphical user interface for presenting time series data and anomalies for a data source includes a first window and a second window below the first window.
- the first window includes a graph of time series data values for a first attribute for the data source, the graph having a time axis corresponding to a time range and a dependent data value axis, and a histogram of anomalies for the data source, with the same time axis scale as the graph and a dependent total anomalies axis. Note that the height of a respective bar along the total anomalies axis in the histogram represents the total number of anomalies for the web site on a particular day.
- the second window includes a list of items characterizing a set of anomalies at a particular time on the time axis, each item corresponding to an anomaly associated with a respective attribute for the data source, a value of the respective attribute at the particular time, and a significance factor of the anomaly, and a user-interactive object for adjusting a sensitivity threshold associated with the first window and the second window.
- a new histogram of anomalies for the data source is rendered to replace the existing histogram of anomalies for the data source in the first window.
- a new list of items characterizing a new set of anomalies at the particular time is rendered to replace the existing list of items.
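- By way of illustration only, re-rendering the histogram and the list of items after a change of the sensitivity threshold can be sketched as recomputing both from the stored anomalies. The record layout and values are hypothetical, and the sketch assumes the user control maps directly to a significance threshold, with a higher sensitivity corresponding to a lower threshold.

```python
from collections import Counter


def anomaly_histogram(anomalies, threshold):
    """Number of anomalies per day whose significance factor exceeds the threshold."""
    return Counter(a["day"] for a in anomalies if a["significance"] > threshold)


def items_for_day(anomalies, day, threshold):
    """Anomalies listed in the second window for the selected day, most significant first."""
    selected = [a for a in anomalies if a["day"] == day and a["significance"] > threshold]
    return sorted(selected, key=lambda a: a["significance"], reverse=True)


# Hypothetical stored anomalies.
anomalies = [
    {"day": "2009-09-30", "attribute": "bounce rate", "significance": 3.5},
    {"day": "2009-09-30", "attribute": "visits", "significance": 1.4},
    {"day": "2009-10-02", "attribute": "visits", "significance": 1.1},
]

print(anomaly_histogram(anomalies, threshold=1.0))
print(anomaly_histogram(anomalies, threshold=2.0))
print([a["attribute"] for a in items_for_day(anomalies, "2009-09-30", 1.0)])
```

Because both views are derived from the same stored anomalies, changing the threshold requires no re-detection, only a new filter pass.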
- stages which are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art and so do not present an exhaustive list of alternatives. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Development Economics (AREA)
- Quality & Reliability (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- General Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Operations Research (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A graphical user interface is used for presenting time series data and anomalies associated with a data source on a computer display. The graphical user interface has first and second windows. The first window includes a graph of time series data values for an attribute of the data source and a histogram of anomalies of the data source, each corresponding to a value of a respective attribute that is substantially different from an expected value of the attribute. The second window includes a list of automatic alerts characterizing a set of anomalies of the data source at a particular time. In response to a user adjustment of a sensitivity threshold, a new histogram of anomalies is rendered to replace the existing histogram of anomalies in the first window and a new list of automatic alerts is rendered to replace the existing list of automatic alerts in the second window.
Description
- This application claims priority under 35 U.S.C. 119(e) to U.S. Provisional Patent Application 61/253,472 filed Oct. 20, 2009, which is hereby incorporated by reference in its entirety.
- The disclosed embodiments relate generally to web analytics data mining, and in particular, to a system and method for detecting and displaying events of potential interest in time series data.
- Web analytics is the measurement, collection, analysis and reporting of the traffic data of a web site for purposes such as understanding and optimizing web site usage. The traffic data is typically organized in the form of one or more multidimensional datasets whose metadata may include multiple dimensions and metric attributes (also known as “measures”). Conventional approaches typically generate multiple (sometimes hundreds of) reports by focusing on the factual aspects of the web traffic, e.g., by visualizing different subsets of a multidimensional dataset defined by various configurations of dimensions and metric attributes. From examining the visualized traffic data, a web analyst may be able to discover useful information for improving the quality and volume of the traffic to the web site. But this exercise of searching for useful information within the multidimensional dataset is non-trivial especially if the volume of the traffic data is significant or the metadata includes a large number of dimensions and metric attributes that may correspond to hundreds or even thousands of configurations. Because different configurations correspond to different factual aspects of the dataset, it is difficult to rank the configurations by their respective importance to the web analyst based on a well-accepted standard.
- In accordance with some embodiments described below, a computer-implemented method for detecting anomalies in time series data at a server system is disclosed. The server system is connected to one or more client devices through a network. The server system stores time series data for a data source. The time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value. For a particular attribute, the server system generates a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data, each forecasting model including an estimated attribute value and an associated error-variance. For a respective time-value pair associated with the particular attribute, the server system determines whether the value of the time-value pair is within the error-variance of the corresponding estimated attribute value and tags the time-value pair as an anomaly if the value of the time-value pair is outside the error variance for at least a first subset of the forecasting models. In response to a request from a client application for analytics information for the data source, the server system reports to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes.
- In accordance with some embodiments described below, a server system for identifying anomalies in time series data is disclosed. The server system is connected to one or more client devices through a network. The server system includes one or more processors for executing programs and memory to store data and to store one or more programs to be executed by the one or more processors. The one or more programs including instructions for: storing time series data for a data source, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value; for a particular attribute, generating a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data, each model including an estimated attribute value and an associated error-variance; for a respective time-value pair associated with the particular attribute: determining whether the value of the time-value pair is within the error-variance of the corresponding estimated attribute value; and tagging the time-value pair as an anomaly if the value of the time-value pair is outside the error variance for at least a first subset of the forecasting models; and in response to a request from a client application for analytics information for the data source, reporting to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes.
- In accordance with some embodiments described below, a computer-readable storage medium stores one or more programs for execution by one or more processors of a server system. The server system is connected to one or more client devices through a network. The one or more programs include instructions for: storing time series data for a data source, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value; for a particular attribute, generating a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data, each model including an estimated attribute value and an associated error-variance; for a respective time-value pair associated with the particular attribute: determining whether the value of the time-value pair is within the error-variance of the corresponding estimated attribute value; and tagging the time-value pair as an anomaly if the value of the time-value pair is outside the error variance for at least a first subset of the forecasting models; and in response to a request from a client application for analytics information for the data source, reporting to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes.
- In accordance with some embodiments described below, a graphical user interface is disclosed for presenting time series data and anomalies for a data source on a display of a client computer having a user input device. The graphical user interface includes a first window and a second window below the first window on the display. The first window on the display includes: a graph of time series data values for a first attribute for the data source, the graph having a time axis corresponding to a time range and a dependent data value axis, and a histogram of anomalies for the data source, each of the anomalies corresponding to a value of an attribute that is substantially different from an expected value of the attribute, the histogram having the same time axis scale as the graph and a dependent total anomalies axis. The height of a respective bar along the total anomalies axis represents a total number of anomalies for the data source at a corresponding time on the time axis. The second window on the display includes a list of automatic alerts characterizing a set of anomalies for the data source at a particular time on the time axis. The particular time is designated by a user via interaction with the graph through the user input device and each item of the list of automatic alerts corresponds to an anomaly associated with a respective attribute for the data source.
- The aforementioned embodiment of the invention as well as additional embodiments will be more clearly understood as a result of the following detailed description of the various aspects of the invention when taken in conjunction with the drawings. Like reference numerals refer to corresponding parts throughout the several views of the drawings.
-
FIG. 1A is an overview block diagram of an analytics system for collecting web traffic data and performing web analytics on the data in accordance with some embodiments. -
FIG. 1B is an overview block diagram of the analytics system for preparing and providing user-requested web analytics results to the users at different clients accordance with some embodiments. -
FIG. 2 is a block diagram of a data structure used in thehits database 155 to store sessionized web traffic data at different web sites in accordance with some embodiments -
FIG. 3 is a block diagram of a data structure used in theaggregates database 165 to store aggregated web traffic data at different web sites in accordance with some embodiments. -
FIG. 4 is a block diagram of a data structure used in thetime series database 175 to store time series data extracted from the aggregated web traffic data in accordance with some embodiments. -
FIG. 5 is a block diagram of a data structure used in theevents database 185 to store events of potential interest detected in the time series data in accordance with some embodiments. -
FIG. 6A is a flow chart of a process for updating the time series data using the aggregated data updates in accordance with some embodiments. -
FIG. 6B is a block diagram of an exemplary process for updating a time series on a weekly basis in accordance with some embodiments. -
FIGS. 7A and 7B are flow charts of a model-based process for detecting events of potential interest in a time series in accordance with some embodiments. -
FIG. 7C is a flow chart of a rule-based process for detecting events of potential interest in a time series in accordance with some embodiments. -
FIGS. 8A and 8B are flow charts illustrating how the analytics system prepares and serves a report of events of interest in response to a user request in accordance with some embodiments. -
FIG. 9 is a block diagram of a client device for requesting and rendering web analytics reports in accordance with some embodiments. -
FIG. 10 is a block diagram of an analytics system for processing web traffic data, identifying events of potential interest therein, and serving web analytics reports in response to user requests in accordance with some embodiments. -
FIGS. 11A to 11C are screenshots of graphical user interfaces that display daily, weekly, and monthly events of potential interest, respectively, in accordance with some embodiments. -
FIGS. 12A to 12E are screenshots of graphical user interfaces that display information relating to events of potential interest in accordance with some embodiments. -
FIGS. 13A to 13C are screenshots of graphical user interfaces that display different numbers of events of potential interest based on a respective user-specified sensitivity threshold in accordance with some embodiments. -
FIGS. 14A and 14B are screenshots of graphical user interfaces that display events of potential interest based on a respective user-specified organization manner in accordance with some embodiments. -
FIGS. 15A and 15B depict a flow chart of a method for identifying anomalies in time series data in accordance with some embodiments. -
FIGS. 16A and 16B depict another flow chart of a method for identifying anomalies in time series data implemented by different components of a server system with a processor and memory in accordance with some embodiments. -
FIGS. 17A to 17C depict another flow chart of a method for detecting anomalies in web analytics data implemented at a server system in accordance with some embodiments. - Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the embodiments, it will be understood that the invention is not limited to these particular embodiments. On the contrary, the invention includes alternatives, modifications and equivalents that are within the spirit and scope of the appended claims. For example, although the embodiments below use web analytics for illustrative purposes, it will be apparent to those skilled in the art that the inventions disclosed in this application can be used to analyze almost any type of time series data, regardless of whether the time series data is web-related. Numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. However, it will be apparent to one of ordinary skill in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
-
FIG. 1A illustrates a distributed computer system 100 in accordance with some embodiments. The distributed system 100 includes one or more web servers 120 that host web sites and serve web pages upon receiving requests from clients 110. In some embodiments, the web servers 120 collect web traffic data in logfiles 130. In some other embodiments, the web pages hosted by the web servers 120 include one or more embedded computer programs, such as JavaScript code, for capturing the web traffic data. When a user requests and downloads the web pages to a client 110, the embedded computer programs also reside in the client 110 and monitor the user's activities on the web pages. This approach can avoid some web caching-related issues and is sometimes referred to as “page tagging.” In some embodiments, a web server 120 may employ both mechanisms for gathering web traffic data. - The distributed
system 100 includes an analytics system 140 that includes a log processor 150 for extracting web page hit data from the logfiles 130 or receiving web page hit data captured by the embedded computer programs from the clients 110 and storing the hit data in a hits database 155. One or more aggregation servers 160 process the hit data and generate aggregated web analytics data that is stored in the aggregates database 165. The time series gathering servers 170 extract or receive newly aggregated data from the aggregates database 165 and create or update a plurality of time series for each web site, which are stored in the time series database 175. In some embodiments, the time series gathering servers 170 also extract web analytics data from the hits database 155. One or more event detection servers 180 process the time series in the database 175 at a regular time interval (e.g., nightly, weekly or monthly) to detect events of potential interest therein and store the events in the events database 185. In some embodiments, the event detection process is a rule-based one in which the event detection servers 180 extract user-specified alert rules from the alert rules database 195. The analytics system 140 includes a query processor 190 for accessing the aggregates database 165, the time series database 175, and the events database 185, and returning the query results as web analytics reports to users of the analytics system 140 (who use the analytics system to track the visitors' activities at one or more of their web sites). If the user-requested data has not been aggregated, the query processor 190 reads the raw hits data in real time and computes the desired aggregates from it. - In some embodiments, the
analytics system 140 processes and returns a set of the web analytics reports that correspond to a desired data view specified by a user. In some embodiments, the analytics system 140 identifies those hits in the hits database 155 that are context-insensitive and processes these hits to incrementally update a first plurality of aggregate tables in the aggregates database 165. The analytics system 140 also identifies those hits in the hits database 155 that are context-sensitive and processes these hits to incrementally update a second plurality of aggregate tables using the context-sensitive entries, but only at the end of a specified period of time, such as at the end of the day. Doing so speeds up the incremental updates to more than 90% of the data, as discussed below. - The distributed
system 100 also includes a plurality of data servers 106 that store one or more data structures, such as tables, that may be used by the analytics system 140 for storage. In some embodiments, the data servers 106 store the logfiles 130, the hit data 155, the aggregate data 165, the time series data 175, and/or the events data 185. In some embodiments, the data servers 106 are clustered in a data center or in two or more interconnected data centers. In some embodiments, the distributed system 100 includes as many as 1000 data servers or more. The various components of the distributed system 100 are interconnected by a network 102. The network 102 may be any suitable network, including but not limited to a local area network (LAN), a wide-area network (WAN), the Internet, an Ethernet network, a virtual private network (VPN), or any combination of such networks. The network 102 can be wired or wireless. In some embodiments, the network 102 uses the Hypertext Transfer Protocol (HTTP) and the Transmission Control Protocol/Internet Protocol (TCP/IP) to transport information between different networks. HTTP permits client devices to access various information items available on the Internet via the network 102. The various embodiments of the invention, however, are not limited to the use of any particular protocol. - Typically, where an individual visitor directly accesses a web page served by a
web server 120, the log data entry (stored in one or more databases represented by logfiles 130 or captured by the computer program embedded in the web page) records multiple variables about the visit, typically including the IP address, the user agent, the web page viewed, the time and date that the web page was accessed, and a status field. Each data entry in a log file represents a single “hit” on a file hosted by a web server 120, and consists of a number of fields (explained below in connection with FIG. 2). Any server request is considered a hit. For example, when a visitor calls up a web page with six images, that is seven hits: one for the page, and six for the images. - In other circumstances, the visitor may have employed a query in a search engine and the web site under scrutiny was turned up in the search results. In such a case, the corresponding entry in the log data may reveal a “reference” and the “search term” entered by the visitor. In some circumstances, the visitor is not an individual, but rather a software process such as an Internet robot, web crawler or spider, link checker, mirror agent, hacker, or other such entity used to systematically peruse vast amounts of data available via the
network 102. The log data entry corresponding to such accesses may display an IP address, host name and/or user agent that may be associated with such entities. - Another type of data that may be recorded in a
log file 130 is a session identifier or session ID, which is a unique identifier (such as a fixed-length alphanumeric string) that a web server assigns to a specific user for the duration of that user's visit and that identifies the user's session (which may be a series of related message exchanges). Session identifiers become necessary in cases where the communications infrastructure uses a stateless protocol such as HTTP. For example, a buyer who visits a seller's web site wants to collect a number of articles in a virtual shopping cart and then finalize the shopping transaction by going to the site's checkout page. This typically involves an ongoing communication including several web pages requested by the client 110 and sent back by the server 120. In such a situation, it is vital to keep track of the current state of the shopper's cart, and a session ID is one way to achieve that goal. - A session ID is typically granted to a visitor on his first visit to a web site. It is different from a user ID because sessions are typically short-lived (they expire after a preset time of inactivity, which may be minutes or hours) and may become invalid after a certain goal has been met (for example, once the buyer has finalized his order, he cannot use the same session ID to add more items).
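The inactivity-based expiry described above can be illustrated with a minimal Python sketch. It is not taken from the application: the 30-minute timeout, the `assign_sessions` name, and the use of a visitor key (for example an IP address and user agent pair) are all assumptions made for illustration, and the sketch reconstructs sessions from logged hits rather than showing how a web server issues session IDs.

```python
from datetime import datetime, timedelta
from itertools import count

# Hypothetical timeout; the text only says "minutes or hours".
SESSION_TIMEOUT = timedelta(minutes=30)


def assign_sessions(hits, timeout=SESSION_TIMEOUT):
    """Label (visitor_key, timestamp) hits with session IDs.

    A new session ID is issued when a visitor is first seen, or when the
    gap since that visitor's previous hit exceeds `timeout`.
    """
    next_id = count(1)
    last_seen = {}  # visitor_key -> (session_id, timestamp of last hit)
    labelled = []
    for visitor, ts in sorted(hits, key=lambda h: h[1]):
        prev = last_seen.get(visitor)
        if prev is None or ts - prev[1] > timeout:
            session_id = next(next_id)   # first visit or expired session
        else:
            session_id = prev[0]         # still within the same session
        last_seen[visitor] = (session_id, ts)
        labelled.append((visitor, ts, session_id))
    return labelled
```

With this sketch, two hits from the same visitor ten minutes apart share a session ID, while a hit two hours later starts a new one.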
-
FIG. 1B illustrates the distributed system 100 with an emphasis on the client-server interactions in accordance with some embodiments. A client 110 (also known as a “client device”) may be any computer or similar device through which a user of the client 110 can submit data access requests to and receive results or other services from the analytics system 140. Examples include, without limitation, desktop computers, laptop computers, tablet computers, mobile devices such as mobile phones, personal digital assistants, set-top boxes, or any combination of the above. A respective client 110 may contain at least one client application 112 for submitting requests to the analytics system 140. For example, the client application 112 can be a web browser or other type of application that permits a user to access the services provided by the analytics system 140. - In some embodiments, the
client application 112 includes one or more client assistants 114. A client assistant 114 can be a software application that performs tasks related to assisting a user's activities with respect to the client application 112 and/or other applications. In some embodiments, a client assistant 114 includes a local copy of the executable version of the embedded computer programs for collecting web analytics data relating to web pages from a particular web site. For example, the client assistant 114 may assist a user at the client 110 with browsing information (e.g., web pages), processing information (e.g., query results) received from the analytics system 140, and monitoring the user's activities on the query results. In some embodiments, the client assistant 114 is embedded in a web page (e.g., a query results web page) or other documents downloaded from the analytics system 140. In some embodiments, the client assistant 114 is a part of the client application 112 (e.g., a plug-in application of a web browser). The client 110 further includes a communication interface 118 to support the communication between the client 110 and other devices (e.g., the analytics system 140 or another client 110). - In some embodiments, the
query processor 190 includes a web interface 192 (sometimes referred to as a “front-end server”) and a server application 194 (sometimes referred to as a “mid-tier server” or “mid-tier API”). The web interface 192 receives data access requests from client devices 110 and forwards the requests to the server application 194. In response to receiving the requests, the server application 194 processes the requests, which includes generating database queries associated with a request, applying the queries to different databases for data requested by the client, and returning the query results to the requesting clients 110. After receiving a result, the client application 112 at a particular client 110 displays the result to the user who submitted the original request. - In some embodiments, each of the databases shown in
FIGS. 1A and 1B is effectively a database management system including a database server that is configured to manage a large number of data records stored in the corresponding database. In response to a query submitted by the server application 194, the database server identifies zero or more data records that satisfy the query and returns the data records to the server application 194 for further processing. In some embodiments, the analytics system 140 is an application service provider (ASP) that provides web analytics services to its customers (e.g., a web site owner) by visualizing the web traffic data generated at a web site in accordance with various user requests. -
FIG. 2 is a block diagram of a data structure used in the hits database 155 to store sessionized web traffic data at different web sites in accordance with some embodiments. The web traffic data stored in the data structure 200 have a hierarchical structure. The top level of the hierarchy corresponds to different web sites, each of which is associated with multiple sessions, and each session is identified by a session ID that is valid for a user's interaction with the web site 200A for the duration of that user's visit. Within a session 210A, other session-level attributes include the operating system 220B (i.e., the operating system of the computer from which the user accesses the web site), the browser name 220C (i.e., the web browser application used by the user for accessing the web site) and the browser version 220D, geographical information of the computer such as the country 220E and the city 220F, etc. - For convenience and custom, the web traffic data within a user session (or a visit) is further divided into one or
more hits 230A to 230N. Note that the terms “session” and “visit” are used interchangeably throughout this application. In the context of web traffic, a hit typically corresponds to a request to a web server for a document such as a web page, an image, a JavaScript file, a Cascading Style Sheet (CSS) file, etc. Each hit 230A may be characterized by attributes such as the type of hit 240A (e.g., transaction hit, etc.), the referral URL 240B (i.e., the web page the visitor was on when the hit was generated), the timestamp 240C that indicates when the hit occurs, and so on. Note that the session-level and hit-level attributes as shown in FIG. 2 are listed for illustrative purposes only. As will be shown in the examples below, a session or a hit of web traffic data may include many other attributes that either exist in the raw traffic data (e.g., the timestamp) or can be derived from the raw traffic data by the analytics system 140 (e.g., the average pageviews per session). - As noted above in connection with
FIG. 1A, the aggregation servers 160 are responsible for aggregating the data records in the hits database 155 at a regular time interval (e.g., per day or per hour) based on their respective session IDs and other dimension or metric attributes. For example, the aggregation servers 160 may determine the total number of visits to a web site during one day by counting the number of sessions associated with the web site for the same day. The aggregation servers 160 may also determine the total number of visits to a web site using a particular type or even version of web browser during one day by counting the number of sessions associated with the web site for the same day that have the specified type or even version of web browser. In some embodiments, the aggregation servers 160 determine values for hundreds or even thousands of predefined attributes based on the hits data records and store the determined values and their associated attributes in a data structure like the one shown in FIG. 3. - In some embodiments, the aggregated data stored in the
data structure 300 also has a hierarchical structure. The top level of the hierarchy corresponds to different sources, each of which has a unique source ID 310A. For each source, there are at least two types of aggregated data. The aggregated metrics 310B include those attributes and associated values that are determined from the hits data for a predefined period of time without applying any restrictions. For example, if the predefined period of time is one day, the visits attribute 320A may be associated with one or more pairs of (time, value) 330A in which the time represents a specific day such as Oct. 16, 2009 and the value represents the total number of visits (or sessions) during the same day regardless of, e.g., which country or city each visit is from. Similarly, the pageview attribute 320B is also associated with one or more pairs of (time, value) 330B in which the time represents a specific day and the value represents the total number of pageviews during the same day regardless of, e.g., what web browser is used for each pageview. - In some embodiments, a breakdown of a lump sum metric value (e.g., the
visits 320A) into multiple values defined by different conditions is desired because it can provide more information to a web analyst about the web traffic. For example, the conditions 310C limit the aggregation of web traffic data for a particular web site to sessions whose country is China. In this case, the aggregation servers 160 generate another set of aggregated metrics 320C by skipping any session whose country is not China. Similarly, the conditions 310D focus only on the sessions that use Firefox as the web browser. Accordingly, the aggregated metrics 320D should not take into account any session that uses Internet Explorer. Note that some of the condition-free aggregated metrics 310B may be derived from the conditioned aggregated metrics. The aggregation servers 160 typically pre-compute values for many hundreds of aggregated metrics with or without conditions and store those values in the aggregates database 165 for future use. - One use of the
aggregates database 165 is to detect events of potential interest in the web analytics data and present them to a web analyst in an intuitive manner. An event of potential interest (also referred to as an alert or an anomaly in this application) is something that might be valuable to the web analyst but is hidden in the vast amount of web traffic data and difficult to identify. For example, after posting an advertisement on a web site, a market analyst is very interested in learning the advertisement's effectiveness in terms of whether there is any traffic increase at the web site during a predefined time period, from what source it sees the largest traffic increase or decrease, and how much of the increased web traffic is related to the advertisement (e.g., as measured by the click-through rate). As another example, a webmaster concerned with the security of a web site is interested in learning about abnormal web traffic patterns as early as possible to prevent serious attacks. - Without the features described in this application, it may take many hours or even days of effort for a web analyst to “plow” through the massive amount of web analytics data and track down some useful information. This approach not only wastes human resources but also reduces the value of the information due to the time lapse. One aspect of the present application is to develop a system that can automatically detect those events of potential interest from the web analytics data with no or minimal user effort and present the detection result to the web analyst in an efficient and user-friendly manner to help the web analyst's decision making process.
- According to some embodiments, the process of identifying any events of potential interest in the web analytics data begins with deriving a number of time series or time sequences from the aggregated web analytics data stored in the data structure shown in
FIG. 3 and storing the time series in another data structure for further processing. As will be described below, at least two ways of detecting events of potential interest are disclosed in the present application: (i) model-based event detection; and (ii) rule-based event detection. - Generally, the model-based event detection method described herein applies one or more statistical models to a time series to forecast (or predict or estimate) one or more values for a future time period and then compares the predicted values with the actual value when available. If the differences between the predicted values and the actual value meet a predefined condition, an event of potential interest or an anomaly is identified for the corresponding time period. To some extent, the rule-based approach combines the prediction models and the predefined condition of the model-based approach into a user-specified alert rule. For example, one alert rule may specify that an event of potential interest is detected if the revenue metric attribute of a website on a particular date is at least 15% lower than the revenue metric attribute of the same website on the same date of the previous year.
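The example alert rule above (revenue at least 15% lower than on the same date of the previous year) can be sketched in Python as follows. The function names, the dictionary-based series, and the handling of missing data and of February 29 are assumptions made for this illustration, not details from the application.

```python
from datetime import date


def same_date_previous_year(day):
    """Return the same calendar date one year earlier (Feb 29 maps to Feb 28)."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:
        return day.replace(year=day.year - 1, day=28)


def revenue_drop_alert(series, day, threshold=0.15):
    """Evaluate the hypothetical rule for `day`.

    `series` maps datetime.date -> metric value. The rule fires when the
    value on `day` is at least `threshold` (as a fraction) lower than the
    value on the same date of the previous year.
    """
    baseline = series.get(same_date_previous_year(day))
    actual = series.get(day)
    if baseline is None or actual is None or baseline <= 0:
        return False  # not enough data to evaluate the rule
    return (baseline - actual) / baseline >= threshold
```

Here the "prediction" is simply last year's value and the "predefined condition" is the 15% relative drop, which is the sense in which a rule folds both parts of the model-based approach into one user-specified statement.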
- In some embodiments, the model-based or rule-based event detection method can also be performed on a collection of time series data, e.g., in a batch mode, to not only predict anomalies in the future (which is typically the current day, week, or month) but also identify anomalies in the past. In some embodiments, the anomaly prediction for the current time period (e.g., today, this week or month) may only involve the data samples collected in the past and not include any data samples collected during the current time period. In this case, the prediction for the current time period may start right after the time series update with the data samples of the immediately previous time period. In some other embodiments, the anomaly prediction for the current time period uses the data samples from the current time period as well.
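As a rough illustration of model-based detection in batch mode, the Python sketch below predicts an expected range for each time period from earlier samples only and flags the actual value when it falls outside that range. The mean plus or minus k standard deviations model is a deliberately naive stand-in chosen for brevity, not one of the statistical models of the application, and the names `predict_interval` and `detect_anomalies` and the default parameters are hypothetical.

```python
from statistics import mean, stdev


def predict_interval(history, k=3.0):
    """Return a (minimum, maximum) range for the next value.

    Uses a naive mean +/- k standard deviations model fitted on past
    values only; `history` must contain at least two samples.
    """
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def detect_anomalies(values, min_history=7, k=3.0):
    """Batch mode: flag every index whose actual value falls outside the
    range predicted from the samples that precede it."""
    anomalies = []
    for i in range(min_history, len(values)):
        low, high = predict_interval(values[:i], k)
        if not (low <= values[i] <= high):
            anomalies.append((i, values[i], low, high))
    return anomalies
```

Because each prediction uses only `values[:i]`, the sketch corresponds to the variant in which the current period's own samples are excluded from the forecast.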
-
FIG. 4 is a block diagram of a data structure that stores time series data extracted from the aggregated web traffic data in accordance with some embodiments. In some embodiments, the time series data stored in the data structure 400 has a hierarchical structure. The top level of the hierarchy corresponds to different sources, each of which has a unique source ID 410A. Note that the source ID 410A may be the same as the source ID 310A for the same source. Like the multiple aggregated metrics in the data structure 300, each source in the data structure 400 may be associated with a plurality of time series, each time series having a unique combination of metric and condition. - For example, the metric 410B is the number of new visits to a website during a day and the
condition 410C is that only new visits that come from Paris should be considered. In this case, the time series 410D includes a time series ID 420A and one or more time series updates 420B, 420C, and each time series update includes one or more pairs of (time, value) 430A wherein the “time” parameter corresponds to a particular day and the “value” parameter corresponds to a particular number of new visits from Paris during that day. A more detailed example of a time series including multiple updates is provided below in connection with FIG. 6B. - Generally, each source may be characterized by hundreds of metric and dimension attributes in the
hits database 155. Different combination schemes of the metric and dimension attributes may produce thousands of possible time series. From a web analyst's perspective, not every possible time series is important enough to justify a spot in the time series database 175. Although the selection is somewhat arbitrary, each (condition-free or conditioned) time series stored in the time series database 175 is generated because it may carry information of interest to many web analysts. In some embodiments, a webmaster of a website is allowed to define his or her own new metric or dimension attributes or customize the existing metric or dimension attributes to better characterize the traffic to the website. In this case, the new or customized attributes are additional sources for generating time series data for event detection using the invention disclosed in this application. A more detailed description of how to define new or customize existing attributes can be found in a pending application entitled “Extensible custom variables for tracking user traffic” (attorney docket number 060963-5420-US) filed Oct. 20, 2009, which is hereby incorporated by reference in its entirety. - In some embodiments, the time series in the
data structure 400 are derived from the aggregated data in the data structure 300 of FIG. 3. If a time series corresponds to the aggregated metrics of an entire source free of any precondition, the condition for this time series in the data structure 400 does not exist or is none. In this case, the time series is also referred to as a “condition-free” time series. If a time series corresponds to the aggregated metrics of the source with one or more conditions, the same conditions used for aggregating the web traffic data are also the conditions in the data structure 400 for the corresponding time series. In this case, the time series is also referred to as a “conditioned” time series. In some embodiments, a source has a number (e.g., 10) of condition-free time series including metrics such as visits, pageviews, bounce rate, pages/visit, new visits, and average time on site. In addition, the source may have more (e.g., 100) conditioned time series, each having a unique set of conditions for filtering out data that does not meet any of the predefined conditions. - In some embodiments, if the definition of a time series does not have any corresponding entry in the
aggregates database 165, the time series gathering servers 170 may need to access the hits database 155 to build the time series directly on top of the hits data or even the raw web traffic data from the logfiles 130 or the JavaScript code of a client assistant 114 that monitors the user activities at a web page. In some other embodiments, the time series gathering servers 170 can send a request to the aggregation servers 160 for aggregating the hits data according to the time series definition and returning the aggregated data to the time series gathering servers 170. - Although the
time series database 175 does not include every possible time series that can be derived from a website's hits data, it is a challenge for the time series database 175 to host so many time series related to different sources. In some embodiments, some data quantization and compression techniques may be employed to keep the time series storage relatively small. For example, a value in the time series database 175 is rounded and stored in the form of an expression like a*2^b, where the parameter “a” is encoded with a small number (e.g., 5) of bits and the parameter “b” can have more bits such that the difference between the value and the expression is as small as possible. This data quantization scheme is acceptable as long as the loss of precision does not defeat the purpose of detecting those events of potential interest. - For a given time series (e.g., the number of daily visits during a month), each value at a particular date may be a very large number (e.g., three or four digits) but the difference between two consecutive dates may be much smaller (e.g., only two digits). Instead of storing the actual values like v1, v2, v3, etc., one way of saving the storage space in this situation is to calculate the difference between two consecutive values and store the differences like v2-v1, v3-v2, etc. in the
time series database 175 as long as the base value v1 is available for reconstructing the actual values when needed. -
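The two storage techniques described above, rounding a value to the form a*2^b with a small mantissa and storing differences between consecutive values, can be sketched in Python as follows. The function names, the round-to-nearest policy, and the restriction to non-negative integers are assumptions made for this illustration; the application does not specify them.

```python
from itertools import accumulate


def quantize(value, mantissa_bits=5):
    """Approximate a non-negative integer as a * 2**b with a < 2**mantissa_bits."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    b = max(value.bit_length() - mantissa_bits, 0)
    half = (1 << b) >> 1              # used to round to the nearest multiple
    a = (value + half) >> b
    if a >> mantissa_bits:            # rounding overflowed the mantissa
        a >>= 1
        b += 1
    return a, b


def dequantize(a, b):
    """Reconstruct the approximate value a * 2**b."""
    return a << b


def delta_encode(values):
    """Store the base value v1 followed by v2-v1, v3-v2, and so on."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the actual values from the base value and the differences."""
    return list(accumulate(encoded))
```

For instance, 1000 quantizes to a=31, b=5, which reconstructs to 992, while delta encoding is lossless and only reduces the magnitude of the stored numbers.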
FIG. 5 is a block diagram of a data structure that stores events of potential interest detected in the time series data in accordance with some embodiments. The events data stored in the data structure 500 also has a hierarchical structure. The top level of the hierarchy corresponds to different sources, each of which has a unique source ID 510A. Note that the source ID 510A may be the same as the source ID 310A in the aggregates database 165 and the source ID 410A in the time series database 175 for the same source. Each event 510B is associated with an event ID 510C, a metric 520A, one or more conditions 520B, a pair of (time, value) 520C wherein the value is the actual value for that time period, a pair of (minimum, maximum) 520D wherein the minimum and maximum values are usually determined through one or more statistical models, a significance factor 520E that indicates the interest level of this event to a web analyst, etc. A more detailed description of the (minimum, maximum) pair and the significance factor is provided below in connection with FIGS. 7A and 7B. - Having described the data structures of the
time series database 175 and the events database 185, we now discuss the process performed by the time series gathering servers 170 for updating the time series database 175 and the process performed by the event detection servers 180 for updating the events database 185. For convenience, it is assumed that the initial setup of the analytics system 140 is completed and different components within the system 140 are in a normal operation mode. -
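Before turning to those processes, the records described in connection with FIGS. 4 and 5 can be summarized as two Python dataclasses. This is only a sketch: the field names, the string-typed dates and conditions, and the helper methods are assumptions made for illustration rather than the layout used by the databases 175 and 185.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A (time, value) pair; the time is kept as an ISO date string in this sketch.
Pair = Tuple[str, float]


@dataclass
class TimeSeries:
    source_id: str
    series_id: str
    metric: str
    conditions: Tuple[str, ...] = ()   # empty tuple means "condition-free"
    updates: List[List[Pair]] = field(default_factory=list)

    @property
    def is_condition_free(self) -> bool:
        return not self.conditions


@dataclass
class Event:
    source_id: str
    event_id: str
    metric: str
    conditions: Tuple[str, ...]
    time: str
    value: float          # actual value for the time period
    minimum: float        # lower bound of the model-derived range
    maximum: float        # upper bound of the model-derived range
    significance: float   # interest level of the event to a web analyst

    def outside_expected_range(self) -> bool:
        return not (self.minimum <= self.value <= self.maximum)
```

A conditioned series such as "new visits from Paris" would carry `conditions=("city=Paris",)`, while a site-wide series leaves the tuple empty.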
FIG. 6A is a flow chart of a process for updating the time series data using the aggregated data updates in accordance with some embodiments. - At a regular time interval (e.g., every few hours or every night), the time
series gathering servers 170 receive one or more aggregated data updates (610). In some embodiments, an aggregated data update provides information about the user activities at one or more websites during the recent predefined time interval. For example, the update may include a number of visits to a particular website or any other aggregated metrics that have been collected in the time series database 175. It should be noted that, as explained earlier, the invention of this application is not limited to web traffic data. In fact, it can be used to identify or predict anomalies in almost any type of time series data. In some embodiments, the updates are pulled out of the aggregates database 165 by the time series gathering servers 170. In some other embodiments, the aggregation servers 160 push the updates to the time series gathering servers 170 for further processing. - For each update, the time
series gathering servers 170 identify the time series in the database 175 for updating (620). As noted above, the time series data in the time series database 175 are organized under different sources as different sets of metrics and conditions. At a predefined time (e.g., every night), the time series gathering servers 170 collect the aggregated data updates corresponding to different time series and then apply each of them to a corresponding time series in the database 175. In some embodiments, the metric and dimension attributes associated with different updates are part of the key for identifying the corresponding time series in the database 175. In some embodiments, the data structure of the aggregated data updates is similar to the data structure 300 in FIG. 3. For each source ID in the update, the time series gathering servers 170 find the corresponding entry in the data structure 400 in FIG. 4 that has the same source ID. Next, the time series gathering servers 170 update the identified time series using the data entries in the update (630) and consolidate the time series updates if predefined conditions are met (640). -
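Steps 620 to 640 can be sketched in Python as a small in-memory store. The class name, the key layout of (source ID, metric, conditions), and in particular the consolidation trigger (merging once the number of separately stored updates exceeds a limit) are assumptions made for this illustration; the application only says that consolidation happens when predefined conditions are met.

```python
from collections import defaultdict


class TimeSeriesStore:
    """Minimal in-memory stand-in for a time series database."""

    def __init__(self, max_pending_updates=7):
        # Hypothetical consolidation condition: merge once more than
        # this many separate updates are stored for one time series.
        self.max_pending_updates = max_pending_updates
        # (source_id, metric, conditions) -> list of updates,
        # where each update is a list of (time, value) pairs.
        self.series = defaultdict(list)

    def apply_update(self, source_id, metric, conditions, pairs):
        # Step 620: identify the time series the update belongs to.
        key = (source_id, metric, frozenset(conditions))
        # Step 630: store the update as a separate entry.
        self.series[key].append(list(pairs))
        # Step 640: consolidate if the (assumed) condition is met.
        if len(self.series[key]) > self.max_pending_updates:
            merged = sorted(p for update in self.series[key] for p in update)
            self.series[key] = [merged]
        return key
```

Keeping each update as its own entry until consolidation keeps the common append path cheap, at the cost of reading several entries when the full series is needed.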
FIG. 6B is a block diagram of an exemplary process for updating a time series on a weekly basis in accordance with some embodiments. In this example, it is assumed that the updates to the time series database 175 happen on a daily basis and a time series consolidation process occurs every week. - On Sunday, the
time series 650 includes only one time series update 650-0. The time series update 650-0 includes a plurality of (time, value) pairs, one pair per day and each value corresponding to an actual value for that day. In some embodiments, the oldest entry of these (time, value) pairs may be dated a long time (e.g., two years) back and the newest entry (TN, VN) is generated this Sunday. As will be explained below in detail, each time series is used for predicting one or more values at a future time under different prediction models. In some embodiments, the daily time series are summed on a weekly basis to form a weekly time series, which may be further summed on a monthly basis to form a monthly time series. Note that this summation operation acts like a low-pass filter on the data samples. As a result, both the weekly time series and the monthly time series are typically smoother than the corresponding daily time series during the same time period. As shown in FIGS. 11A to 11C, an anomaly identified in the daily time series may therefore have no counterpart in the corresponding week of the weekly time series or the corresponding month of the monthly time series. - On Monday, the time
series gathering servers 170 receive a time series update 650-1. In some embodiments, this update is stored as a separate time series update entry 420C in the data structure 400 without being combined with the time series update 650-0. By doing so, it is convenient for the servers 170 to add new entries to the data structure 400 and to access them. This process repeats every day, and new time series updates 650-2 to 650-6 are added to the time series 650 until the next Sunday. - Upon receiving a new update entry (TN+7, VN+7) on the next Sunday, the time
series gathering servers 170 determine that it is time to consolidate the time series updates accumulated during the past week. In some embodiments, the time series gathering servers 170 follow the first-in-first-out (FIFO) rule by eliminating the oldest seven (time, value) pairs ranging from (T0, V0) to (T6, V6) from the time series 650 and combining the newest seven (time, value) pairs ranging from (TN+1, VN+1) to (TN+7, VN+7) with the time series 650 to form a new time series 655 that includes only one time series update 655-0. By repeating this process on a regular basis, the time series gathering servers 170 maintain a sliding time window on a fixed length of time series data when determining the existence of any events of potential interest. It should be noted that the method of updating time series as described above in connection with FIG. 6B is for illustrative purposes. There are many other ways of managing the time series that are known in the art. - In some embodiments, an event of potential interest has a practical, meaningful value only if the corresponding web site has received a sufficient number of visits from a broad scope of visitors for a certain time period. For example, if a website only receives a handful (e.g., fewer than 10) of visits per day, a small, insignificant variation of user activities (e.g., an increase of daily visits from 10 to 30) could result in a false-alarm-like event of potential interest being detected by the
event detection servers 180. Too many false-alarm-like events of potential interest would likely make the actual events of interest less visible to the web analyst. To solve this problem, the time series gathering servers 170 may set a threshold such that no time series is generated for a website until the website's associated web analytics data reaches the threshold. For example, the threshold can be that a website receives at least 100 visits per day or 50 visits from distinct IP addresses. This lower bound on the generation of time series reduces not only the statistical noise level of the detected events of potential interest but also the storage needed for storing the time series. - For a given set of time series associated with a particular source, the
event detection servers 180 are responsible for identifying events of potential interest therein and populating the identified events in the events database 185. As noted above, there are at least two different ways of detecting events, (i) model-based and (ii) rule-based, which will be described in more detail below. -
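The weekly FIFO consolidation of FIG. 6B and the minimum-traffic threshold described above can be illustrated with a minimal Python sketch. The function names, the 14-day window, and the representation of a time series as a list of (time, value) pairs are assumptions made for illustration only; the threshold constants are the example figures quoted in the text.

```python
from collections import deque

# Example thresholds quoted in the text; treated here as configurable assumptions.
MIN_DAILY_VISITS = 100
MIN_DISTINCT_IPS = 50


def qualifies_for_time_series(daily_visits, distinct_ips):
    """Return True once a site has enough traffic to justify keeping a time series."""
    return daily_visits >= MIN_DAILY_VISITS or distinct_ips >= MIN_DISTINCT_IPS


def consolidate(series, pending_updates, window_length):
    """Merge pending (time, value) pairs into the series, dropping the oldest
    pairs first (FIFO) so that at most window_length pairs remain."""
    window = deque(series, maxlen=window_length)
    window.extend(pending_updates)  # a full deque discards items from the left
    return list(window)


# Usage: a 14-day window receives one week of daily updates.
series = [(day, 100 + day) for day in range(14)]
week_of_updates = [(day, 100 + day) for day in range(14, 21)]
consolidated = consolidate(series, week_of_updates, window_length=14)
```

After consolidation the window still holds 14 pairs, now covering days 7 through 20, which mirrors the sliding time window of fixed length described above.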
FIGS. 7A and 7B are flow charts of a model-based process for detecting events of potential interest in a time series in accordance with some embodiments. In some embodiments, this process occurs periodically (e.g., every night). In some other embodiments, this process is performed in response to a user request from a client 110. For simplicity, it is assumed in the example below that the event detection servers 180 work on the time series at a predefined time. After identifying and extracting a time series and its recent update from the time series database 175 (710), the event detection servers 180 make predictions for the time series using a plurality of prediction models. - For example, assume that the
event detection servers 180 have a time series of the numbers of visits to a website over the last N days, together with the number of visits for the current day. To decide whether the number of visits for the current day is high or low enough to qualify as an event of potential interest, the event detection servers 180 need to determine the trend of the number of visits at the website and use the trend to estimate a predicted number of visits for the current day from the time series of the last N days (note that the value of N may vary for different forecasting models). Although many statistical models can be used to make the prediction, two types of modeling techniques are described herein for illustration: (i) linear regression; and (ii) Holt-Winters exponential smoothing. - Generally, linear regression is an approach to modeling a linear relationship between a dependent variable $y$ and one or more independent variables $x_1, x_2, \ldots, x_n$, such that the linear model's unknown parameters can be estimated from the observed data. Assuming that the relationship between the number of visits ($v_i$) and the corresponding date ($t_i$) is linear, this relationship can be mathematically expressed as follows:
$$v_i = \alpha t_i + \beta,$$
where $t_i = 1, 2, \ldots, N$, or (in matrix form)
$$\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ \vdots & \vdots \\ N & 1 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}.$$
 - A numerical solution to the matrix of linear equations (e.g., using the well-known least-squares algorithm) can determine the two parameters $\alpha$ and $\beta$. Using the estimated $\hat{\alpha}$ and $\hat{\beta}$, it is possible to predict the number of visits ($v_j$) at any given date in the future ($t_j$) as follows:
$$v_j = \hat{\alpha} t_j + \hat{\beta}.$$
 - From the time series of the actual numbers of visits at different dates, it is also possible to determine a variance for the predicted number of visits at the given date using well-known statistics theory. As a result, an estimated range of the number of visits at a given date using linear regression can be expressed as follows:
$$[v_j - s_j,\; v_j + s_j]$$
where $s_j$ represents the variance of the prediction using linear regression.
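The least-squares fit and the resulting prediction range can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and because the text does not say how the spread of the range is computed, the residual standard error of the fit is used here as an assumed stand-in for the quantity the text calls the variance.

```python
import math


def fit_line(values):
    """Least-squares estimates (alpha, beta) for v_i = alpha * t_i + beta, t_i = 1..N."""
    n = len(values)
    t_mean = (n + 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values, start=1))
    alpha = sxy / sxx
    beta = v_mean - alpha * t_mean
    return alpha, beta


def predict_range(values, t_future):
    """Return the predicted value at t_future and the range [v - s, v + s].

    Assumption: s is the residual standard error of the fit (needs N >= 3).
    """
    alpha, beta = fit_line(values)
    residuals = [v - (alpha * t + beta) for t, v in enumerate(values, start=1)]
    spread = math.sqrt(sum(r * r for r in residuals) / (len(values) - 2))
    v_hat = alpha * t_future + beta
    return v_hat, (v_hat - spread, v_hat + spread)


# Usage: five days of perfectly linear traffic, predicting day 6.
visits = [3 * t + 10 for t in range(1, 6)]
alpha_hat, beta_hat = fit_line(visits)
v6, (low6, high6) = predict_range(visits, 6)
```

For perfectly linear input the residuals vanish, so the range collapses onto the predicted value; noisy input widens it.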
 - Unlike linear regression, which gives the past observations equal weight, exponential smoothing is an approach that assigns exponentially decreasing weights to the past observations as they get older. Assuming that the sequence of observations begins at time $t = 0$, one form of exponential smoothing (i.e., single exponential smoothing) is given by the following formulas:
$$w_0 = v_0,$$
$$w_i = \lambda v_i + (1 - \lambda) w_{i-1}.$$
 - The parameter $\lambda$ helps to define the amount of weight given to a past observation. Generally, the weight given to the observation at the $k$th day in the past from the current date is expressed as:
$$\lambda (1 - \lambda)^{k-1}$$
 - In some embodiments, another form of exponential smoothing (i.e., double exponential smoothing) is used for making the forecast to capture a trend in the time series, if there is any. Double exponential smoothing is given by the following formulas:
$$w_0 = v_0,$$
$$b_0 = v_1 - v_0,$$
$$w_i = \alpha v_i + (1 - \alpha)(w_{i-1} + b_{i-1}),$$
$$b_i = \gamma (w_i - w_{i-1}) + (1 - \gamma) b_{i-1},$$
where $0 \le \gamma \le \alpha \le 1$.
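The double exponential smoothing recursion above can be written as a short Python sketch. The function names are hypothetical, and the one-step-ahead forecast `level + steps * trend` is the standard Holt forecast, which is an assumption here because the text gives only the update formulas.

```python
def double_exponential_smoothing(values, alpha, gamma):
    """Run the recursion w_i, b_i over the observations and return the final (w, b)."""
    if len(values) < 2:
        raise ValueError("at least two observations are needed to initialise the trend")
    if not 0 <= gamma <= alpha <= 1:
        raise ValueError("parameters must satisfy 0 <= gamma <= alpha <= 1")
    level = values[0]               # w_0 = v_0
    trend = values[1] - values[0]   # b_0 = v_1 - v_0
    for v in values[1:]:
        previous_level = level
        level = alpha * v + (1 - alpha) * (previous_level + trend)
        trend = gamma * (level - previous_level) + (1 - gamma) * trend
    return level, trend


def forecast(values, alpha, gamma, steps=1):
    """Assumed Holt-style forecast: extend the last level by `steps` trend increments."""
    level, trend = double_exponential_smoothing(values, alpha, gamma)
    return level + steps * trend


# Usage: a series that grows by 2 visits per day.
history = [10, 12, 14, 16, 18]
next_day = forecast(history, alpha=0.5, gamma=0.25)
```

On an exactly linear series the level tracks the data and the trend stays at the per-day increment, so the forecast simply continues the line.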
 - In some embodiments, the parameter γ is set to be no greater than the parameter α. In some embodiments, other non-linear statistical modeling schemes such as triple exponential smoothing may be used to account for the seasonality (also known as periodicity) in the time series data, a feature that is typically prominent when a long time series is used for forecasting and the time series itself demonstrates some cyclic patterns. For example, some websites such as a weather forecasting website usually receive more traffic every Friday because many visitors are interested in learning the weather conditions for the weekend. In this case, the number of visits to the website may show a fluctuating pattern on a weekly basis and triple exponential smoothing may be more appropriate for capturing the trend accurately.
- In either modeling technique, the number of past observations or actual data samples used for predicting the future value affects the predicted value's sensitivity to the recent changes of the actual data samples. In some embodiments, three time-window lengths, i.e., 4 days, 21 days, and 56 days, are chosen as the numbers of past observations used for making separate predictions so as to capture both the recent changes of the actual samples and the long-term trends using different predictions if the predicted values are daily-based or weekly-based. If the predicted values are monthly-based, the three time-window lengths are respectively, 0.5 month, 3 months, and 8 months according to some embodiments. Note that the length of a time window used for predicting a value at a future time, to some extent, determines whether the predicted value is more or less likely to be affected by a recent fluctuation in the time series. A prediction model that uses a longer time window considers more data samples into the past for forecasting a value in the future. This effect is similar to a low-pass filter such that the predicted outcome is less sensitive to the recent fluctuation in the time series and it is more likely to capture the trend in the time series. By contrast, a prediction model based on a short time window uses fewer data samples to make the prediction and the predicted result is usually more sensitive to the recent fluctuation in the time series. A combination of the predicted values based on the different lengths of time series may result in a more reliable prediction that takes into account both the long-term and short-term features in the time series.
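The effect of the window length on sensitivity can be illustrated with a small Python sketch that fits a least-squares line over the last 4, 21, and 56 daily samples and predicts the next day from each. The function names and the synthetic history are assumptions made for illustration; only the three window lengths come from the text.

```python
WINDOWS = (4, 21, 56)  # daily window lengths named in the text


def linear_forecast(values):
    """Fit v = alpha * t + beta over t = 1..n and return the prediction for t = n + 1."""
    n = len(values)
    t_mean = (n + 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values, start=1))
    alpha = sxy / sxx
    beta = v_mean - alpha * t_mean
    return alpha * (n + 1) + beta


def forecasts_by_window(history, windows=WINDOWS):
    """One prediction per window length, each using only the most recent samples."""
    return {w: linear_forecast(history[-w:]) for w in windows if len(history) >= w}


# Synthetic history: 52 quiet days followed by a 4-day jump in traffic.
history = [100] * 52 + [200] * 4
predictions = forecasts_by_window(history)
```

In this synthetic case the 4-day model fully adopts the new level, the 21-day model moves part of the way, and the 56-day model stays closest to the long-term baseline, which is the low-pass behaviour the paragraph describes.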
- In some embodiments, the
event detection servers 180 make nine predictions using the two modeling techniques and the three different lengths of time windows. For convenience, the nine predictions are expressed as:
$$[M_i, e_i]$$
where $i = 1, 2, \ldots, 9$;
 - $M_i$ represents the $i$th predicted metric value at the current date; and
 - $e_i$ represents the variance of the $i$th prediction at the current date.
 - In particular, three out of the nine forecasting models are derived from linear regression and the other six models are from double exponential smoothing because three possible values $\{x_1, x_2, x_3\}$, which are ranked in a monotonically increasing order, are candidates for each of the two parameters $\alpha$ and $\gamma$. As noted above, $\gamma$ is set to be no greater than $\alpha$. Therefore, the three possible values $\{x_1, x_2, x_3\}$ produce six different combinations that correspond to the six models as follows:
$$[\alpha = x_1, \gamma = x_1],$$
$$[\alpha = x_2, \gamma = x_1],$$
$$[\alpha = x_3, \gamma = x_1],$$
$$[\alpha = x_2, \gamma = x_2],$$
$$[\alpha = x_3, \gamma = x_2],$$
$$[\alpha = x_3, \gamma = x_3].$$
 - With the multiple predictions in hand, the
event detection servers 180 compare the actual value of the current date with each of the predictions (720). Based on the comparison result, the event detection servers 180 determine whether an event of potential interest is detected or not (740). For each determined event, the event detection servers 180 also give it a significance factor that indicates how unlikely the event is (750) and store the event in the events database 185 (760). In general, the more unlikely the event is, the more interested the web analyst may be. For example, if there is an event indicating a significant jump in the number of visits on a particular day when compared with the trend in the past, the web analyst would probably like to investigate the cause behind this jump and find out, e.g., whether it relates to a potential hacker's attack or a successful commercial promotion that immediately preceded the event. Note that not every event identified by the analytics system 140 may deserve an increased level of user attention. But by displaying a number of events or anomalies for each day or week or month, the analytics system 140 presents to a user such as a web analyst a highly reliable “roadmap,” with which the web analyst can quickly “plow” through a large amount of web traffic data and derive information valuable for improving the quality of service offered by the website. - Assume that:
-
 - the time series being analyzed is the total number of daily visits to a website on a particular date;
- the six predictions are [344, 15], [500, 154], [402, 23], [389, 73], [588, 112], and [693, 87]; and
- the actual number of visits is 618.
- As shown in
FIG. 7B, the event detection servers 180 select the first prediction model (720-1) and determine that the estimate and variance are [344, 15] (720-2). A comparison of the actual number 618 with the prediction model indicates that the actual number is not within the scope defined by the model (730-1, no). In this case, the event detection servers 180 further determine a significance factor for the first model. In some embodiments, the significance factor is determined by calculating the extent to which the variance of the model must be stretched to include the actual number within the stretched scope of the first model. For example, the significance factor for the first model can be (618−344)/15=18.3. - Since there are still five models left for comparison (730-3, no), the
event detection servers 180 then return to select the second model, [500, 154]. This time, the comparison indicates that the actual number 618 is within the scope of the second model (730-1, yes) and the event detection servers 180 then proceed to the next model until the last model is processed (730-3, yes). In this example, three out of the six models, i.e., [500, 154], [588, 112], and [693, 87], are satisfied by the actual number 618 and three other models, i.e., [344, 15], [402, 23], and [389, 73], are not satisfied by the actual number 618. Assuming that the threshold for detecting an event is that at least half of the models are not satisfied (740-1), the event detection servers 180 then determine that the actual number of visits 618 is an event of potential interest (740-2) and choose a significance factor for the event (740-3). - In some embodiments, the significance factor of an event is the significance factor of one of the unsatisfied prediction models such that (i) the actual number is more likely to satisfy this prediction model than any other unsatisfied prediction model and (ii) the actual number would satisfy more than half of all the prediction models by satisfying this prediction model and therefore no longer qualify as an event. In the example above, the significance factor of the prediction model [389, 73], i.e., (618−389)/73=3.1, is chosen to be the event's significance factor. As will be explained below in connection with
FIG. 8B, this significance factor is used for determining whether the event should be displayed to a user or not. - In some embodiments, the
event detection servers 180 also use the models to predict the minimum and maximum of the expected value for that particular time period (740-4). This pair gives a user the range of normal values for that time period had there been no anomalous user activities. In some embodiments, the predicted metric values according to different models are ordered by their magnitudes. For example, 10 models result in a sequence of 10 predicted values. Among the 10 predicted values, the second-lowest value is chosen to be the minimum of the expected value and the second-highest value is chosen to be the maximum of the expected value if the actual value is outside the range defined by the pair of (minimum, maximum). Otherwise, no minimum or maximum values are available for the corresponding event. - Compared with the model-based event detection that requires little user interaction, the rule-based event detection described below provides an end user with more control over what kind of user activities may be potentially “interesting” or valuable. Since these two approaches are often complementary to each other, they may provide better outcomes if used in combination.
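The model-based decision of FIG. 7B can be replayed on the six example predictions with a short Python sketch. The function names are hypothetical. Two points are assumptions rather than statements of the text: each [estimate, variance] pair is treated as the interval estimate ± variance, and the event's significance factor is generalised as the smallest stretch factor that would leave fewer than half of the models unsatisfied, which reduces to the smallest unsatisfied factor in the example.

```python
import math


def detect_event(actual, models):
    """models: list of (estimate, spread) pairs with spread > 0.

    Returns (is_event, significance_factor or None).
    """
    # Stretch factor needed for each unsatisfied model, smallest first.
    factors = sorted(
        abs(actual - estimate) / spread
        for estimate, spread in models
        if abs(actual - estimate) > spread
    )
    # Event threshold from the text: at least half of the models are not satisfied.
    if 2 * len(factors) < len(models):
        return False, None
    # Assumed generalisation: stretch just far enough that fewer than half remain unsatisfied.
    allowed_unsatisfied = math.ceil(len(models) / 2) - 1
    k = len(factors) - allowed_unsatisfied
    return True, factors[k - 1]


def expected_range(actual, models):
    """Second-lowest and second-highest estimates, or None if actual lies inside them."""
    estimates = sorted(estimate for estimate, _ in models)
    if len(estimates) < 3:
        return None
    low, high = estimates[1], estimates[-2]
    return None if low <= actual <= high else (low, high)


# The worked example from the text.
models = [(344, 15), (500, 154), (402, 23), (389, 73), (588, 112), (693, 87)]
is_event, significance = detect_event(618, models)
normal_range = expected_range(618, models)
```

With these inputs three of the six models are unsatisfied, the event is flagged, and the selected factor is (618 − 389)/73, matching the value of about 3.1 in the text; the sorted estimates give 389 and 588 as the expected minimum and maximum.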
-
FIG. 7C is a flow chart of a rule-based process for detecting events of potential interest in a time series in accordance with some embodiments. - For a data source (e.g., a web site), the
event detection servers 180 identify one or more alert rules (770) in the alert rules database 195. In some embodiments, the event detection servers 180 query the alert rules database 195 for any alert rules that may be applicable to the time series associated with the data source. The alert rules database 195 stores a plurality of user-specified event triggering conditions that different users enter through a graphical user interface at a client 110, an example of which is described below in connection with FIG. 12E. In some embodiments, the alert rules may be stored in the same database as the dataset segment schemes supported by the analytics system 140. - The
event detection servers 180 select one of the identified alert rules (772) and apply the alert rule to the time series database 175 to identify those time series, if any, that satisfy the alert rule (774) and store them in the events database 185 as triggering events (778). For example, if the time series is a sequence of numbers of visits from visitors in China, the application of an alert rule that triggers an event if the visits from China increase by 10% would be appropriate (although the time series may fail to trigger such an event if the recent time series update does not show at least a 10% increase of visits). In contrast, another alert rule that triggers an event if the visits from Brazil drop 5% would not be applicable. - The
event detection servers 180 repeat the aforementioned process until the last alert rule associated with the data source has been processed (780, yes). In some embodiments, these triggering events will be shown to a user through a graphical user interface per the user's request. In some other embodiments, the analytics system 140 also notifies the user of the triggering event through other communication channels such as email, text messaging, voicemail, etc. - The aforementioned description focuses primarily on how the
analytics system 140 detects events of potential interest in the collected web analytics data through data aggregation and time series data analysis. The following description shifts its focus to how the events of potential interest are served to the users of the analytics system 140 in a client-server environment like the one shown in FIG. 1B. -
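Before turning to report serving, the rule-based process of FIG. 7C described above can be sketched in Python. The `AlertRule` fields, the keying of a time series by (metric, dimension), and the percentage-change test are all hypothetical choices made for illustration; the text specifies only that a rule carries associated metrics and conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str        # e.g. "visits"
    dimension: str     # e.g. "country=China"
    change_pct: float  # positive: minimum increase; negative: minimum drop


def percent_change(previous, current):
    return (current - previous) / previous * 100.0


def rule_triggers(rule, series_key, previous, current):
    """Apply one rule to one time series; inapplicable rules never trigger."""
    if series_key != (rule.metric, rule.dimension):
        return False
    change = percent_change(previous, current)
    if rule.change_pct >= 0:
        return change >= rule.change_pct
    return change <= rule.change_pct


def triggering_rules(rules, series_key, previous, current):
    """Loop over every rule for the data source and collect those that fire."""
    return [r for r in rules if rule_triggers(r, series_key, previous, current)]


# Usage mirroring the example in the text.
china_up_10 = AlertRule("visits", "country=China", 10.0)
brazil_down_5 = AlertRule("visits", "country=Brazil", -5.0)
rules = [china_up_10, brazil_down_5]
fired = triggering_rules(rules, ("visits", "country=China"), previous=1000, current=1120)
```

Here the China series rose by 12%, so only the China rule fires; the Brazil rule is skipped because it does not apply to this series.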
FIGS. 8A and 8B are flow charts illustrating how the analytics system prepares and serves a report of events of interest in response to a user request in accordance with some embodiments. - At a
client 110, a user submits a request for viewing an event report for a particular web site. Upon receipt of the user request (802), the client 110 generates a request for the event report to the analytics system 140 (804). In some embodiments, the client request is an HTTP request. Upon receiving the client request (806), the query processor 190 in the analytics system 140 transforms the client request into one or more queries to the events database 185 and submits them to the database (810). For each of the database queries received from the query processor 190 (812), the events database 185 identifies the corresponding events data records (if any) (814) and returns them to the query processor 190 for preparing a response to the client request (816). - As shown in
FIG. 8B, the request from the client 110 includes a range of dates and a sensitivity level for querying the events database (814-1). After determining the dates and the sensitivity threshold (814-1), the events database 185 chooses one of the dates for further processing (814-2). The further processing includes retrieving events associated with the chosen date (814-3); identifying and counting the events whose respective significance factors are equal to or higher than the user-specified sensitivity threshold (814-4); and generating a dataset segment scheme for each identified event (814-5). After looping through all the dates (814-6, yes), the events database 185 returns the information about the identified events to the query processor 190. - Back to the side of the
query processor 190, the query processor compiles an event report using the events information returned from the events database 185 (818) and then returns the report to the client 110 (820). Upon receiving the event report (822), the client 110 displays the report to the user (824). Exemplary screenshots of the graphical user interface for displaying the event reports are described below in connection with FIGS. 11A to 11C. -
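The per-date loop of FIG. 8B can be sketched in Python as a filter over stored events. The event record layout (dictionaries with `date`, `name`, and `significance` keys) and the function name are assumptions for illustration, and the generation of a dataset segment scheme (814-5) is omitted.

```python
from datetime import date, timedelta


def build_event_report(events, start, end, sensitivity):
    """For each date in [start, end], keep and count the events whose
    significance factor is at least the requested sensitivity threshold."""
    report = {}
    day = start
    while day <= end:
        selected = [
            e for e in events
            if e["date"] == day and e["significance"] >= sensitivity
        ]
        report[day] = {"count": len(selected), "events": selected}
        day += timedelta(days=1)
    return report


# Usage with three hypothetical stored events.
events = [
    {"date": date(2009, 10, 13), "name": "visits spike", "significance": 3.1},
    {"date": date(2009, 10, 13), "name": "bounce rate dip", "significance": 1.4},
    {"date": date(2009, 10, 14), "name": "visits drop", "significance": 2.0},
]
report = build_event_report(events, date(2009, 10, 13), date(2009, 10, 15), sensitivity=2.0)
```

Raising the sensitivity threshold shrinks the per-day counts, which is how a single stored significance factor can serve different users' display preferences.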
FIG. 9 is a block diagram of a client device used by, e.g., a web analyst, for requesting and rendering web analytics reports in accordance with some embodiments. The client 110 generally includes one or more processing units (CPU's) 902, one or more network or other communications interfaces 904, memory 912, and one or more communication buses 914 for interconnecting these components. The communication buses 914 may include circuitry (sometimes called a chipset) that interconnects and controls communications between components. The client 110 may optionally include a user interface 905, for instance, a display 906, a keyboard and/or mouse 908, and a touch-sensitive surface 909. Memory 912 may include high speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may also include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 912 may include mass storage that is remotely located from the central processing unit(s) 902. Memory 912, or alternately the non-volatile memory device(s) within memory 912, comprises a computer readable storage medium. Memory 912 or the computer readable storage medium of memory 912 stores the following elements, or a subset of these elements, and may also include additional elements: -
- an
operating system 916 that includes procedures for handling various basic system services and for performing hardware dependent tasks; - a
network communication module 918 that is used for connecting the client 110 to other servers or computers including the analytics system 140 via one or more communication network interfaces 904 (wired or wireless), such as the Internet, other wide area networks, local area networks, and metropolitan area networks and so on; - a client application 112 (e.g., a web browser), including one or more client assistants 114 (e.g., toolbar, browser plug-in) for monitoring the activities of a user; in some embodiments, the
client assistant 114, or a portion thereof, may include a web application manager 520 for managing the user interactions with the web browser, a data renderer 922 for supporting the visualization of an analytics report, and a request dispatcher 924 for submitting user requests for new analytics reports; and - a
user interface module 926, including a view module 928 and a controller module 930, for detecting user instructions to control the visualization of the analytics reports. In some embodiments, the user interface module 926 further includes a segmentation module 932 for displaying a segmentation/filter definition template and receiving user instructions for building a dataset segment scheme using the template and an alert module 934 for displaying an alert definition template and receiving user instructions for building an alert rule using the template (see, e.g., descriptions below in connection with FIGS. 12D and 12E).
-
FIG. 10 is a block diagram of an analytics system for processing web traffic data, identifying events of potential interest therein, and serving web analytics reports in response to user requests in accordance with some embodiments. The analytics system 140 generally includes one or more processing units (CPU's) 1002, one or more network or other communications interfaces 1004, memory 1012, and one or more communication buses 1014 for interconnecting these components. The analytics system 140 may optionally include a user interface 1005 comprising a display device 1006 and a keyboard 1008. Memory 1012 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 1012 may optionally include one or more storage devices remotely located from the CPU(s) 1002. Memory 1012, or alternately the non-volatile memory device(s) within memory 1012, comprises a computer readable storage medium. Memory 1012 or the computer readable storage medium of memory 1012 stores the following elements, or a subset of these elements, and may also include additional elements: -
- an
operating system 1016 that includes procedures for handling various basic system services and for performing hardware dependent tasks; - a
network communication module 1018 that is used for connecting the analytics system 140 to other computers such as the clients 110 (used by the web analyst or a regular website user) and the web servers 120 via the communication network interfaces 1004 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on; - one or
more log processors 150 for processing the web traffic data received from the web servers 120 and the clients 110 into sessionized data records stored in the hits database 155; - one or
more aggregation servers 160 for aggregating the different metrics of the sessionized data into the aggregated data in the aggregates database 165; - one or more time
series gathering servers 170 for organizing the different aggregated metrics data in the aggregates database 165 into time series in the time series database 175; in some embodiments, the time series gathering servers 170 include a time series update module 1020 for updating the time series with the aggregated data updates received from the aggregates database 165; - one or more
event detection servers 180 for detecting events of potential interest in the time series stored in the time series database 175; in some embodiments, the event detection servers 180 include an event detection module 1022, a model prediction module 1024 for making predictions based on the time series, and an alert detection module 1026 for identifying events in the time series that trigger one or more alert rules in the alert rules database 195; in some embodiments, the model prediction module 1024 further includes one or more parameters 1024-1 such as α, γ in the double exponential smoothing, a linear regression sub-module 1024-2, a Holt-Winters exponential smoothing sub-module 1024-3, as well as other models 1024-4; - a
query processor 190 for querying the databases associated with the analytics system 140 in response to user requests from clients 110 and providing analytics reports to the clients 110 based on the query results; in some embodiments, the query processor 190 further includes a server application 194 that includes a query module 1030 for converting client requests into one or more queries or data filters and a response module 1032 for preparing analytics reports based on the response from the different databases; - a
hits database 155 for storing sessionized web analytics data; - an
aggregates database 165 for storing the aggregated metric data and their associated conditions; - a
time series database 175 for storing the time series extracted from the aggregates database 165; - an
events database 185 for storing the events of potential interest identified in the time series; and - an
alert rules database 195 for storing user-specified alert definitions; in some embodiments, the alert rules database 195 includes one or more alert rule definitions such as the alert rule A 1034-1 including the associated metric(s) 1034-2 and the condition(s) 1034-3, the alert rule B 1034-4, etc.
- Each of the above-identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments,
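memory 912 and memory 1012 may each store a subset of the modules and data structures identified above, and may also store additional modules and data structures not described above. -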
FIGS. 9 and 10 are intended more as functional descriptions of the various features of a client device and analytics system rather than a structural schematic of the embodiments described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some items shown separately in FIG. 10 like the query processor 190 and the server application 194 as well as items like the databases 155 to 195 could be implemented by one or more servers. The actual number of server computers used to implement the analytics system 140, and how features are allocated among them will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods. -
FIGS. 11A to 11C are screenshots of graphical user interfaces that display daily, weekly, and monthly events of potential interest, respectively, in accordance with some embodiments. - In particular,
FIG. 11A depicts a daily alerts graphical user interface 1102 during a 30-day period from Sep. 15, 2009 to Oct. 15, 2009. To access this user interface, a user clicks the “Intelligence” entry 1100 on the left side of the interface. There are three levels of alerts in the “Intelligence” entry 1100, “Daily Alerts,” “Weekly Alerts,” and “Monthly Alerts.” In some embodiments, the user interface by default displays the daily alerts when the user clicks the entry 1100. Below the daily visits curve 1101 is a bar chart 1104 illustrating the respective total number of events of potential interest during the 30-day period, each day occupying one clickable spot in the bar chart 1104. In some embodiments, the user interface automatically focuses on the entry on the far right of the bar chart, which corresponds to the current date, Oct. 15, 2009. But a user can click on other parts of the bar chart 1104 to investigate the alert information for any other day within the last 30 days. Note that, at the current sensitivity level 1112, the total number of events 1106 for the date of Oct. 15, 2009 (referred to as “alerts” in the figure) is zero. In other words, the analytics system 140 does not identify any anomalous user activity patterns for that day under the current sensitivity level 1112. As a result, the custom alerts region 1108, which is associated with the “Custom Alerts” checkbox 1103 and used for displaying those alert rule-based events, and the automatic alerts region 1110, which is associated with the “Automatic Alerts” checkbox 1105 and used for displaying those model-based events, are both empty. Note that a de-selection of either checkboxes alert regions -
FIG. 11B depicts a weekly alerts graphical user interface 1120 for the past five weeks from Sep. 13, 2009 to Oct. 15, 2009, e.g., after a user selection of the “Weekly Alerts” link 1124 on the left of the user interface. By default, the current week of Oct. 11-15, 2009 is highlighted in the user interface. A user can click on the bar chart below the curve 1122 to select another week of data. Note that there is no alert for the current week including Oct. 15, 2009 because it is not over yet and the forecasting of the present application is for the most recently completed week. Compared with the curve 1101 in FIG. 11A, the curve 1122, which corresponds to roughly the same period of time, is smoother because, as explained above, the weekly summation of the daily data samples acts as a low-pass filter. As a result, the number of weekly alerts during each week is typically smaller than the sum of daily alerts during the same week. This also applies to the monthly alert described below in connection with FIG. 11C. This user interface is similar to the one shown in FIG. 11A except that the total number of data samples shown in the curve 1122 drops from 30 (which corresponds to the last 30 days from Sep. 15, 2009 to Oct. 15, 2009) to 5 (which corresponds to the last five weeks from Sep. 13, 2009 to Oct. 15, 2009). In this example, the number of alerts for the week of Oct. 11-15, 2009 remains zero under the current sensitivity level. -
FIG. 11C depicts a monthly alerts graphical user interface 1140 for the past 12 months from Oct. 1, 2008 to Oct. 15, 2009, after a user selection of the “Monthly Alerts” 1144 on the left. This user interface is similar to the one shown in FIG. 11A except that the total number of data samples shown in the curve 1142 drops from 30 to 12. In this example, the number of alerts for the month of Oct. 1-15, 2009 remains zero under the current sensitivity level. -
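The smoothing effect of weekly summation noted in connection with FIG. 11B can be illustrated with a minimal sketch. The daily visit counts and the helper names `weekly_totals` and `relative_spread` are hypothetical and are not part of the described system; the sketch only sums consecutive blocks of seven daily samples and compares the relative variability of the daily and weekly series.

```python
from statistics import mean, pstdev

def weekly_totals(daily, days_per_week=7):
    """Sum consecutive, complete blocks of daily samples into weekly samples."""
    return [sum(daily[i:i + days_per_week])
            for i in range(0, len(daily) - days_per_week + 1, days_per_week)]

def relative_spread(samples):
    """Coefficient of variation: population standard deviation over the mean."""
    return pstdev(samples) / mean(samples)

# Hypothetical daily visits: a flat baseline with one spike and one dip.
daily_visits = [100] * 28
daily_visits[3] = 200    # one-day spike in the first week
daily_visits[12] = 40    # one-day dip in the second week

weekly_visits = weekly_totals(daily_visits)
daily_spread = relative_spread(daily_visits)
weekly_spread = relative_spread(weekly_visits)
```

In this example the single-day spike and dip are diluted by the other six days of their weeks, so the weekly series varies less relative to its mean than the daily series does.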
FIGS. 12A to 12E are screenshots of graphical user interfaces that display information relating to events of potential interest in accordance with some embodiments. -
FIG. 12A depicts the same daily alerts 1102 shown in FIG. 11A but at a different date, Sep. 30, 2009. According to this daily alerts 1202, the number of alerts 1204 on Sep. 30, 2009 at the current sensitivity level 1212 is three. Note that the custom alerts region 1206 is empty and all three alerts are model-based automatic alerts. In particular, one of the alerts 1208 suggests a significant (83%) drop in bounce rate for visits that exit from a particular web page 1209, from the expected range of 34.26%-39.96% to 6.29%. A visual indication 1211 of the alert's significance factor is also shown in the same row, indicating how unlikely this alert is under a normal situation. Two alerts FIGS. 7A and 7B. - In contrast, the
alert 1214 indicates that the number of visits to the website that were referred to the website from the web page “www.google.com/intl/en/about.html” on Sep. 30, 2009 increased by more than 281% when compared with the median value derived from the multiple prediction models. This may be because the referral web page has a link to the website www.goolestore.com and many users who visited Google's website found that link and clicked through. - In some other embodiments, the reference value used for measuring the percentage may be the actual value of the immediately preceding time period, the averaged actual value derived from multiple time periods in the past, the mean of the expected range or other reference values that are well-known in the art.
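The percentage figure reported with an alert can be sketched as follows, assuming the reference value is the median of the predictions from several forecasting models. The prediction values, the actual value, and the function name `percent_change` are hypothetical and chosen only for illustration.

```python
from statistics import median

def percent_change(actual, reference):
    """Signed percentage difference of `actual` relative to `reference`."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return (actual - reference) / reference * 100.0

# Hypothetical predictions from several forecasting models for one day.
model_predictions = [95.0, 100.0, 110.0]
actual_visits = 381.0

reference = median(model_predictions)   # median of the model predictions
change = percent_change(actual_visits, reference)
```

Swapping in another reference, such as the actual value of the preceding period or the mean of the expected range, only changes how `reference` is computed.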
-
FIG. 12B depicts a graphical user interface 1220 when the user-selected date moves from Sep. 30, 2009 to Oct. 14, 2009. Note that the number of alerts for the new date increases to 20. Moreover, one of the 20 alerts is a custom alert 1226 called “revenue decrease.” A user selection of the edit link 1228 brings up the definition of the custom alert as shown in FIG. 12E. According to the definition, this alert is triggered when the revenue from all traffic to the website drops more than 10% from the same day of the previous week. In other words, the revenue on Oct. 14, 2009 is less than 90% of the revenue on Oct. 7, 2009. -
FIG. 12C depicts the same user interface after a user selection of the curve link 1248 next to the first automatic alert 1244, which indicates a dramatic increase of goal conversion rate of the total traffic. As shown by the curve 1246, the rate was almost zero for the entire month until a sudden jump on Oct. 14, 2009. This curve also explains how the jump is detected as an alert. Using this alert as a lead, the web analyst can investigate the type of traffic on the same date and research what triggers the sudden jump of goal conversion rate. -
FIG. 12D depicts a graphical user interface for defining a dataset segment scheme in response to a user selection of the “Create segment” link 1242 in FIG. 12C. A more detailed description of the dataset segment scheme can be found in the pending U.S. patent application Ser. Nos. 12/575,435 and 12/575,437, both of which are incorporated into this application by reference in their entirety. Note that this feature allows a user to revisit the dataset through the same visualization angle in the future without relying on the events report, which is very useful for helping a user to understand the dataset. -
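A rule of the kind behind the “revenue decrease” custom alert can be sketched as below. The revenue figures, the dictionary layout, and the function name `revenue_decrease_alert` are hypothetical; only the 10% week-over-week condition comes from the description above.

```python
from datetime import date, timedelta

def revenue_decrease_alert(revenue_by_day, day, max_drop=0.10):
    """Return True when revenue on `day` is more than `max_drop` below
    the revenue on the same weekday one week earlier."""
    previous = revenue_by_day.get(day - timedelta(days=7))
    current = revenue_by_day.get(day)
    if previous is None or current is None or previous <= 0:
        return False  # not enough data to evaluate the rule
    return current < (1.0 - max_drop) * previous

# Hypothetical daily revenue for two consecutive Wednesdays.
revenue = {date(2009, 10, 7): 1000.0, date(2009, 10, 14): 850.0}
triggered = revenue_decrease_alert(revenue, date(2009, 10, 14))
```

Here 850.0 is below 90% of 1000.0, so the rule fires; a value of 950.0 on the later date would not trigger it.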
FIGS. 13A to 13C are screenshots of graphical user interfaces that display different numbers of events of potential interest based on a respective user-specified sensitivity threshold in accordance with some embodiments. -
FIG. 13A depicts the alerts bar chart when the sensitivity level is at about the middle level 1310. FIG. 13B depicts the alerts bar chart when the sensitivity level reaches the highest level 1320. In this case, the analytics system 140 reports not only more alerts or events of potential interest (12 in FIG. 13B vs. 3 in FIG. 13A) for the same date, Sep. 30, 2009, but also one or more alerts for many other dates that have no alerts reported in FIG. 13A. By contrast, FIG. 13C depicts the alerts bar chart when the sensitivity level reaches the lowest level 1330. In this case, the analytics system 140 reports zero alerts for the same date, Sep. 30, 2009. -
FIGS. 14A and 14B are screenshots of graphical user interfaces that display events of potential interest based on a respective user-specified organization manner in accordance with some embodiments. In particular, FIG. 14A depicts a graphical user interface in which the alerts are displayed in an order defined by dimension 1410, such as the All Traffic 1412 and the Visitor 1414, and then by different metrics within the same dimension. FIG. 14B depicts a graphical user interface in which the alerts are displayed in an order defined by metric 1420, such as the Goal Conversion Rate 1422, and then by different dimensions within the same metric. -
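How the bar chart changes with the sensitivity level can be sketched as a filter over stored anomalies. This is an illustration under stated assumptions: the (day, significance factor) pairs are invented, and the mapping from the three sensitivity levels to numeric thresholds is hypothetical, chosen so that a higher sensitivity corresponds to a lower significance threshold.

```python
from collections import Counter

def alerts_per_day(anomalies, significance_threshold):
    """Count anomalies per day whose significance exceeds the threshold."""
    return Counter(day for day, significance in anomalies
                   if significance > significance_threshold)

# Hypothetical (day, significance factor) pairs for tagged anomalies.
anomalies = [
    ("2009-09-29", 1.2),
    ("2009-09-30", 1.5), ("2009-09-30", 2.4),
    ("2009-09-30", 3.1), ("2009-09-30", 6.0),
]

# Hypothetical mapping: higher sensitivity means a lower threshold.
thresholds = {"low": 10.0, "middle": 2.0, "high": 1.0}
counts = {level: alerts_per_day(anomalies, t)
          for level, t in thresholds.items()}
```

With these invented numbers the middle level shows three alerts on Sep. 30, the high level shows more alerts on that date plus one on a previously empty date, and the low level shows none.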
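The two orderings can be sketched as sorts with swapped keys. The alert records below are hypothetical, and alphabetical order within each level is an assumption; the figures may order groups differently.

```python
# Hypothetical alert records, each with one dimension and one metric.
alerts = [
    {"dimension": "Visitor", "metric": "Visits"},
    {"dimension": "All Traffic", "metric": "Goal Conversion Rate"},
    {"dimension": "Visitor", "metric": "Goal Conversion Rate"},
    {"dimension": "All Traffic", "metric": "Bounce Rate"},
]

# Group by dimension first, then by metric within each dimension.
by_dimension = sorted(alerts, key=lambda a: (a["dimension"], a["metric"]))

# Group by metric first, then by dimension within each metric.
by_metric = sorted(alerts, key=lambda a: (a["metric"], a["dimension"]))
```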
FIG. 15A depicts a flow chart of a method for identifying anomalies in time series data in accordance with some embodiments. At a server system with a processor and memory, the server system stores time series data for a data source (1501). The time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value. - For a particular attribute, the server system generates a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data (1503). In some embodiments, each forecasting model includes an estimated attribute value and an associated error-variance.
- For a respective time-value pair associated with the particular attribute, the server system determines whether the value of the time-value pair is within the error-variance of the corresponding estimated attribute value and tags the time-value pair as an anomaly if the value of the time-value pair is outside the error variance for at least a first subset of the forecasting models (1505).
- Finally, in response to a request from a client application for analytics information for the data source, the server system reports to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes (1507).
- In some embodiments, the respective time-value pair for the particular attribute is the latest time-value pair from the data source. The first subset of the forecasting models comprises one of: a predetermined number of the forecasting models or a predetermined fraction of the forecasting models.
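The tagging step (1505) can be sketched as a vote across forecasting models. This is a minimal illustration, assuming each model's error-variance is represented as a symmetric tolerance around its estimate; the class `Forecast`, the function names, and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Forecast:
    estimate: float  # estimated attribute value from one forecasting model
    error: float     # assumed half-width of the tolerated band around it

def is_outside(value, forecast):
    """True when `value` lies strictly outside the model's band."""
    return abs(value - forecast.estimate) > forecast.error

def is_anomaly(value, forecasts, min_models=None, min_fraction=None):
    """Tag `value` when it is outside the band of at least `min_models`
    models, or of at least `min_fraction` of all models."""
    outside = sum(is_outside(value, f) for f in forecasts)
    if min_models is not None:
        return outside >= min_models
    if min_fraction is not None:
        return outside >= min_fraction * len(forecasts)
    raise ValueError("specify min_models or min_fraction")

# Hypothetical forecasts from three models for the latest time-value pair.
forecasts = [Forecast(100.0, 10.0), Forecast(105.0, 15.0), Forecast(98.0, 30.0)]
tagged_by_count = is_anomaly(125.0, forecasts, min_models=2)
tagged_by_fraction = is_anomaly(125.0, forecasts, min_fraction=1.0)
```

The value 125.0 falls outside two of the three bands, so it is tagged when two models suffice but not when every model must agree.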
- As shown in
FIG. 15B, for the respective time-value pair and the particular attribute, the server system determines a significance factor (1511). In some embodiments, the significance factor is chosen such that, when the error-variance for each of the forecasting models is multiplied by the significance factor, the value of the time-value pair is inside the factored error-variance of a corresponding estimated metric value for at least a second subset of the forecasting models and the first subset is within the second subset. - In response to the request from the client application for analytics information that includes a significance threshold for one or more of the attributes, the server system reports to the client application those time-value pairs tagged as anomalies when the respective significance factor for each of the time-value pairs exceeds the significance threshold (1513).
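One way to read the significance factor is as the smallest multiplier that brings the observed value back inside enough of the scaled bands. The sketch below follows that reading and is an assumption, not the only possible interpretation: each model is a hypothetical (estimate, error) pair with a positive error, and `min_inside` stands for the size of the second subset.

```python
def significance_factor(value, forecasts, min_inside):
    """Smallest multiplier of every model's error band such that `value`
    falls inside the scaled band of at least `min_inside` models."""
    # For one model, the band scaled by r contains the value exactly when
    # r >= |value - estimate| / error, so the k-th smallest ratio is the
    # smallest factor that satisfies k models at once.
    ratios = sorted(abs(value - est) / err for est, err in forecasts)
    return ratios[min_inside - 1]

def report(tagged, significance_threshold):
    """Keep tagged anomalies whose significance exceeds the threshold."""
    return [a for a in tagged if a["significance"] > significance_threshold]

# Hypothetical (estimate, error) pairs from three forecasting models.
forecasts = [(100.0, 10.0), (105.0, 15.0), (98.0, 30.0)]
factor_all = significance_factor(125.0, forecasts, min_inside=3)
factor_two = significance_factor(125.0, forecasts, min_inside=2)
```

A larger factor means the value sits further outside what the models expected, which is why filtering on the factor gives a natural sensitivity control.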
- In some embodiments, the forecasting models include at least one of a linear regression model and a Holt-Winters exponential smoothing model. The forecast models include models computed from 4, 21, and 56 days of time-series data.
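Forecasts from several history windows can be sketched with ordinary least squares alone. This illustration covers only the linear regression case (Holt-Winters smoothing is not shown), and using the residual standard deviation as a stand-in for the error-variance is an assumption; the function names and the synthetic history are hypothetical.

```python
def linear_forecast(values):
    """Fit y = a + b*t by least squares to `values` (t = 0..n-1) and return
    (prediction for t = n, residual standard deviation). Needs n >= 3."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in enumerate(values)]
    # n - 2 degrees of freedom because two parameters were fitted.
    sigma = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return intercept + slope * n, sigma

def forecasts_by_window(history, windows=(4, 21, 56)):
    """One forecast per window, each fitted to the most recent `w` samples."""
    return {w: linear_forecast(history[-w:])
            for w in windows if len(history) >= w}

# Synthetic, perfectly linear daily history of 56 samples.
history = [10.0 + 2.0 * day for day in range(56)]
results = forecasts_by_window(history)
```

On real data the short window reacts quickly to recent changes while the long window is steadier, which is one motivation for comparing a new value against several models.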
- In some embodiments, the time series data includes aggregated web analytics data, the method further comprising: aggregating raw or sessionized web traffic data to generate the aggregated web analytics data for attributes of interest and storing the aggregated web analytics data in addition to the raw or sessionized web traffic data. The time series data includes sessionized web analytics data, the method further comprising: summarizing per-session raw web traffic data to generate the sessionized time series data for one or more of the attributes and storing the sessionized time series data in addition to the raw web traffic data.
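The two summarization steps can be sketched as a small pipeline from raw hits to per-session records to daily aggregates. The hit tuple layout, the function names, and the definition of a bounce as a single-page visit are assumptions made for this illustration.

```python
from collections import defaultdict

def sessionize(hits):
    """Summarize raw hits (session_id, day, page) into one record per session."""
    sessions = {}
    for session_id, day, _page in hits:
        record = sessions.setdefault(session_id, {"day": day, "pageviews": 0})
        record["pageviews"] += 1
    return sessions

def aggregate_daily(sessions):
    """Aggregate per-session records into daily visits and bounce rate."""
    totals = defaultdict(lambda: {"visits": 0, "bounces": 0})
    for record in sessions.values():
        day_total = totals[record["day"]]
        day_total["visits"] += 1
        if record["pageviews"] == 1:   # assumed definition of a bounce
            day_total["bounces"] += 1
    return {day: {"visits": t["visits"],
                  "bounce_rate": t["bounces"] / t["visits"]}
            for day, t in totals.items()}

# Hypothetical raw hits.
hits = [
    ("s1", "2009-10-14", "/home"), ("s1", "2009-10-14", "/cart"),
    ("s2", "2009-10-14", "/home"),
    ("s3", "2009-10-15", "/home"),
]
sessions = sessionize(hits)
daily = aggregate_daily(sessions)
```

Both intermediate results are kept in this sketch, mirroring the idea that sessionized and aggregated data are stored in addition to the raw data rather than replacing it.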
-
FIG. 16A depicts another flow chart of a method for identifying anomalies in time series data implemented by different components of a server system with a processor and memory in accordance with some embodiments. - A time series data collector of the server system is configured to collect time series data at one or more predefined time intervals from a plurality of data sources (1601). In some embodiments, the time series data comprises a plurality of time-value pairs, each pair including a value of one attribute associated with the data sources and a time when the value was collected.
- A time series storage module of the server system is configured to store the collected time series data in a computer memory such that, when a new time-value pair is collected by the time series data collector, the new time-value pair is added to the stored time series data for a respective collection of time series data without disturbing the previously stored time series data for the respective collection (1603).
- For a particular new time-value pair, an anomaly detection module of the server system is configured to determine whether the particular new time-value pair is an anomaly with reference to its associated collection of time series data (1605). In some embodiments, this operation further includes: generating a plurality of forecasting models characterizing different subsets of the associated collection of time series data (1605-1), each forecasting model including an estimated attribute value and an associated error-variance; determining whether the particular new time-value pair is within the associated error-variance for each of the plurality of forecasting models (1605-3); and tagging the particular time-value pair as an anomaly when the value of the particular time-value pair is outside the error-variance for at least a first subset of the forecasting models (1605-5).
- Next, an anomaly storage module of the server system is configured to store the time-value pairs tagged as anomalies such that the stored time-value pairs are ready to be served to a user at a client application in response to a user request for the anomalies.
- In some embodiments shown in
FIG. 16B, the server system also includes an aggregation module configured to generate aggregated time series data from the collected time series data (1611). The aggregate time series summarizes raw time series data or sessionized time series data for particular attributes of interest associated with the data sources, the aggregate data being stored by the time series storage module in addition to stored raw time series data or sessionized time series data. - In some embodiments, the anomaly detection mechanism operates solely on the aggregated time series data generated by the aggregation module. The data sources are web pages stored on web servers and the collected time series data comprises values of metrics and dimensions for the web pages and associated time values when the values of the metrics and dimensions were collected. The predefined time intervals are no longer than a day.
- In some embodiments, the time series storage module is further configured to quantize and compress the time series data before storing it so as to save more space.
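Quantizing and compressing a series before storage can be sketched with the Python standard library. The scheme here (rounding to a fixed step, delta encoding, then zlib) is one possible choice and not necessarily the one used by the described storage module; the function names and the step size are hypothetical.

```python
import struct
import zlib

def pack_series(values, step=0.01):
    """Quantize to multiples of `step`, delta-encode, and zlib-compress."""
    if not values:
        return zlib.compress(b"")
    quantized = [round(v / step) for v in values]
    # Store the first value, then differences between neighbours.
    deltas = [quantized[0]] + [b - a for a, b in zip(quantized, quantized[1:])]
    raw = struct.pack(f"<{len(deltas)}q", *deltas)
    return zlib.compress(raw)

def unpack_series(blob, step=0.01):
    """Invert pack_series; values are recovered to within step / 2."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    values, total = [], 0
    for delta in deltas:
        total += delta
        values.append(total * step)
    return values

# Hypothetical bounce-rate samples in percent.
samples = [34.26, 34.31, 35.0, 39.96]
restored = unpack_series(pack_series(samples))
```

Quantization is lossy by design: the step size trades precision for space, so it should be chosen no coarser than the precision the forecasting models need.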
- In some embodiments, the collection of time series data includes a number of time-value pairs that is used for generating the plurality of forecasting models and the forecasting models include at least one of a linear regression model and a Holt-Winters exponential smoothing model.
-
FIG. 17A depicts another flow chart of a method for detecting anomalies in web analytics data implemented at a server system in accordance with some embodiments. - The server system stores web analytics data for a web page in a device (1701). In some embodiments, the web analytics data comprises a plurality of prior time-value pairs, each time-value pair including a value of one of a plurality of attributes associated with the web page and a time associated with the value. The server system collects a new time-value pair for the particular attribute (1703). The new time-value pair includes a new value associated with the web page and a new time when the value was determined.
- For the particular attribute, the server system estimates a set of predicted values and associated error-variances at the new time by applying a plurality of forecasting models to the plurality of prior time-value pairs in respective subsets of the web analytics data (1705).
- Finally, the server system tags the collected new time-value pair as an anomaly when the value of the new time-value pair is outside the error variance of each of a first subset of the forecasting models for the particular attribute (1707).
-
FIG. 17B depicts that the server system adds to the collected web analytics data for the web page the new time-value pair (1711). The time-value pair includes a tag indicating whether the new value is an anomaly and a significance factor if the new value is an anomaly. -
FIG. 17C depicts the server system storing the web analytics data for a fixed time window into the past (1721). After estimating the set of predicted values and associated error-variances for the attribute at the new time, the server system deletes one or more older time-value pairs from previously collected time series data (1723) and appends the new time-value pair to the end of the collected web analytics data (1725). - In some embodiments, the attributes comprise a plurality of metrics and dimensions associated with the web site.
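The fixed-window bookkeeping of FIGS. 17B and 17C can be sketched with a bounded double-ended queue. The class name `FixedWindowSeries`, its fields, and the sample values are hypothetical; the sketch keeps the anomaly tag and significance factor alongside each stored pair.

```python
from collections import deque

class FixedWindowSeries:
    """Keeps only the most recent `max_points` time-value pairs."""

    def __init__(self, max_points):
        self._pairs = deque(maxlen=max_points)

    def append(self, time, value, is_anomaly=False, significance=None):
        # A full deque drops its oldest entry automatically on append.
        self._pairs.append({
            "time": time,
            "value": value,
            "anomaly": is_anomaly,
            "significance": significance if is_anomaly else None,
        })

    def values(self):
        return [pair["value"] for pair in self._pairs]

    def anomalies(self):
        return [pair for pair in self._pairs if pair["anomaly"]]

# Hypothetical series with a three-sample window and one tagged value.
series = FixedWindowSeries(max_points=3)
for t, v in enumerate([10, 11, 12, 50]):
    series.append(t, v, is_anomaly=(v == 50),
                  significance=4.2 if v == 50 else None)
```

After the fourth append the oldest pair has been dropped, so the window holds the three most recent values.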
- As shown in
FIGS. 11A to 11C, the graphical user interface for presenting time series data and anomalies for a data source includes a first window and a second window below the first window. - In some embodiments, the first window includes a graph of time series data values for a first attribute for the data source, the graph having a time axis corresponding to a time range and a dependent data value axis, and a histogram of anomalies for the data source, with the same time axis scale as the graph and a dependent total anomalies axis. Note that the height of a respective bar along the total anomalies axis in the histogram represents the total number of anomalies for the web site at a particular day.
- The second window includes a list of items characterizing a set of anomalies at a particular time on the time axis, each item corresponding to an anomaly associated with a respective attribute for the data source, a value of the respective attribute at the particular time, and a significance factor of the anomaly, and a user-interactive object for adjusting a sensitivity threshold associated with the first window and the second window.
- As further depicted in
FIGS. 13A to 13C, in response to a user adjustment of the sensitivity threshold through the user-interactive object, a new histogram of anomalies for the data source is rendered to replace the existing histogram of anomalies for the data source in the first window. In addition, a new list of items characterizing a new set of anomalies at the particular time is rendered to replace the existing list of items. - Although some of the various drawings illustrate a number of logical stages in a particular order, stages which are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art and so do not present an exhaustive list of alternatives. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.
- The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Claims (24)
1. A graphical user interface for presenting time series data and anomalies associated with a data source on a display of a client computer having a user input device, comprising:
a first window on the display that includes:
a graph of time series data values for a respective attribute of the data source, the graph having a time axis corresponding to a time range and a dependent data value axis, and
a histogram of anomalies of the data source, each of the anomalies corresponding to a value of a respective attribute that is substantially different from an expected value of the attribute, the histogram having the same time axis scale as the graph and a dependent total anomalies axis, wherein the height of a respective bar along the total anomalies axis represents a total number of anomalies of the data source for a corresponding time on the time axis; and
a second window on the display that includes:
a list of automatic alerts characterizing a set of anomalies of the data source at a particular time on the time axis, the particular time being designated by a user via interaction with the graph through the user input device, each item in the list of automatic alerts corresponding to an anomaly associated with a respective attribute for the data source.
2. The graphical user interface of claim 1, wherein each item in the list of automatic alerts includes a representation of a significance factor of a respective anomaly, the significance factor indicating an extent to which the value of an attribute is different from the expected value of the attribute.
3. The graphical user interface of claim 2, wherein each of the significance factors is determined in accordance with error-variances of a plurality of forecasting models as applied to the associated time series data and the value of the attribute.
4. The graphical user interface of claim 1, wherein each item in the list of automatic alerts further includes a representation of an expected range of an attribute at a particular time and a percentage difference between the value of the attribute and the expected value of the attribute at the particular time.
5. The graphical user interface of claim 1, wherein each item in the list of automatic alerts includes a user selectable link that enables a user to initiate creation of an analytics segment for future analysis of time series data of the data source based on a corresponding attribute.
6. The graphical user interface of claim 1, wherein the second window further includes a list of custom alerts, each of the custom alerts indicating when a user-provided condition relating to a respective attribute of the data source has occurred at a particular time, each of the custom alerts including an identifier of the custom alert and a value of the respective attribute at the particular time.
7. The graphical user interface of claim 6, further comprising display control elements that enable a user to control display of the custom alerts and/or the automatic alerts.
8. The graphical user interface of claim 1, wherein each of the anomalies is associated with a significance factor indicating a degree to which a value of an associated attribute is substantially different from an expected value of the associated attribute, further comprising:
a user-interactive object for adjusting a sensitivity threshold associated with the first window and the second window;
wherein:
in response to a user adjustment of the sensitivity threshold through the user-interactive object:
a new histogram of anomalies for the data source is rendered to replace the existing histogram of anomalies for the data source in the first window; and
a new list of items characterizing a new set of anomalies at the particular time is rendered to replace the existing list of items;
the new histogram and the new list including information for only those anomalies whose associated significance factors are consistent with the user-adjusted sensitivity threshold.
9. The graphical user interface of claim 8, wherein the new set of anomalies includes fewer items in response to user selection of lower sensitivity thresholds and more items in response to user selection of higher sensitivity thresholds.
10. The graphical user interface of claim 1, wherein the time axis scale is one of: day, week or month.
11. The graphical user interface of claim 1, wherein the automatic alerts are grouped in relation to a respective measure of the data source that is related to the values of the associated attributes.
12. The graphical user interface of claim 1, wherein the respective measure is one of: conversion rate, visits/sessions, visitors, bounce rate, page views, revenue or time.
13. A computer-implemented method for presenting on a display time series data and anomalies associated with a data source, comprising:
displaying a graphical user interface, the graphical user interface including:
a first window on the display that includes:
a graph of time series data values for a respective attribute of the data source, the graph having a time axis corresponding to a time range and a dependent data value axis, and
a histogram of anomalies of the data source, each of the anomalies corresponding to a value of a respective attribute that is substantially different from an expected value of the attribute, the histogram having the same time axis scale as the graph and a dependent total anomalies axis, wherein the height of a respective bar along the total anomalies axis represents a total number of anomalies of the data source for a corresponding time on the time axis; and
a second window on the display that includes:
a list of automatic alerts characterizing a set of anomalies of the data source at a particular time on the time axis, the particular time being designated by a user via interaction with the graph through the user input device, each item in the list of automatic alerts corresponding to an anomaly associated with a respective attribute for the data source; and
a user-interactive object for adjusting a sensitivity threshold associated with the first window and the second window;
in response to a user adjustment of the sensitivity threshold through the user-interactive object,
replacing the existing histogram of anomalies for the data source in the first window with a new histogram of anomalies for the data source in the first window; and
replacing the existing list of items with a new list of items characterizing a new set of anomalies at the particular time in the second window.
14. The computer-implemented method of claim 13, wherein the new histogram and the new list include information for only those anomalies whose associated significance factors are consistent with the adjusted sensitivity threshold.
15. The computer-implemented method of claim 13, wherein each item in the list of automatic alerts includes a representation of a significance factor of the respective anomaly, the significance factor indicating an extent to which the value of the associated attribute is different from the expected value of the attribute.
16. The computer-implemented method of claim 15, wherein each of the significance factors is determined in accordance with variances of a plurality of forecasting models as applied to the associated time series data and the value of the associated attribute.
17. The computer-implemented method of claim 13, wherein each item in the list of automatic alerts further includes a representation of an expected range of an attribute at the particular time and a percentage difference between the value of the attribute and the expected value of the attribute at the particular time.
18. The computer-implemented method of claim 13, wherein each item in the list of automatic alerts includes a user selectable link that enables a user to initiate creation of an analytics segment for future analysis of time series data for the data source based on a corresponding attribute.
19. The computer-implemented method of claim 13, wherein the second window further comprises a list of custom alerts, each of the custom alerts indicating when a user-provided condition relating to a respective attribute of the data source has occurred at a particular time, each of the custom alerts including an identifier of the custom alert and a value of the respective attribute at the particular time.
20. The computer-implemented method of claim 19, wherein the new set of anomalies includes fewer items in response to user selection of lower sensitivity thresholds and more items in response to user selection of higher sensitivity thresholds.
21. The computer-implemented method of claim 13, wherein the time axis scale is one of: day, week or month.
22. The computer-implemented method of claim 13, wherein the automatic alerts are grouped in relation to a respective measure of the data source that is related to the values associated with the associated attributes.
23. The computer-implemented method of claim 13, wherein the respective measure is one of: conversion rate, visits/sessions, visitors, bounce rate, page views, revenue or time.
24. A computer readable-storage medium storing one or more programs for execution by one or more processors of a computer for displaying anomalies in time series data, the one or more programs comprising instructions for:
displaying a graphical user interface, the graphical user interface including:
a first window on the display that includes:
a graph of time series data values for a respective attribute of the data source, the graph having a time axis corresponding to a time range and a dependent data value axis, and
a histogram of anomalies of the data source, each of the anomalies corresponding to a value of a respective attribute that is substantially different from an expected value of the attribute, the histogram having the same time axis scale as the graph and a dependent total anomalies axis, wherein the height of a respective bar along the total anomalies axis represents a total number of anomalies of the data source for a corresponding time on the time axis; and
a second window on the display that includes:
a list of automatic alerts characterizing a set of anomalies of the data source at a particular time on the time axis, the particular time being designated by a user via interaction with the graph through the user input device, each item in the list of automatic alerts corresponding to an anomaly associated with a respective attribute for the data source; and
a user-interactive object for adjusting a sensitivity threshold associated with the first window and the second window;
in response to a user adjustment of the sensitivity threshold through the user-interactive object,
replacing the existing histogram of anomalies for the data source in the first window with a new histogram of anomalies for the data source in the first window; and
replacing the existing list of items with a new list of items characterizing a new set of anomalies at the particular time in the second window.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/907,916 US20110119100A1 (en) | 2009-10-20 | 2010-10-19 | Method and System for Displaying Anomalies in Time Series Data |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US25347209P | 2009-10-20 | 2009-10-20 | |
US12/907,916 US20110119100A1 (en) | 2009-10-20 | 2010-10-19 | Method and System for Displaying Anomalies in Time Series Data |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110119100A1 (en) | 2011-05-19 |
Family
ID=44011998
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/907,957 Active 2031-11-12 US8554699B2 (en) | 2009-10-20 | 2010-10-19 | Method and system for detecting anomalies in time series data |
US12/907,916 Abandoned US20110119100A1 (en) | 2009-10-20 | 2010-10-19 | Method and System for Displaying Anomalies in Time Series Data |
US14/023,061 Active US8682816B2 (en) | 2009-10-20 | 2013-09-10 | Method and system for detecting anomalies in time series data |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/907,957 Active 2031-11-12 US8554699B2 (en) | 2009-10-20 | 2010-10-19 | Method and system for detecting anomalies in time series data |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/023,061 Active US8682816B2 (en) | 2009-10-20 | 2013-09-10 | Method and system for detecting anomalies in time series data |
Country Status (1)
Country | Link |
---|---|
US (3) | US8554699B2 (en) |
Cited By (189)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8217945B1 (en) | 2011-09-02 | 2012-07-10 | Metric Insights, Inc. | Social annotation of a single evolving visual representation of a changing dataset |
US20140068411A1 (en) * | 2012-08-31 | 2014-03-06 | Scott Ross | Methods and apparatus to monitor usage of internet advertising networks |
US8682816B2 (en) * | 2009-10-20 | 2014-03-25 | Google Inc. | Method and system for detecting anomalies in time series data |
US20140222476A1 (en) * | 2013-02-06 | 2014-08-07 | Verint Systems Ltd. | Anomaly Detection in Interaction Data |
US20140267295A1 (en) * | 2013-03-15 | 2014-09-18 | Palantir Technologies, Inc. | Object time series |
US20140280867A1 (en) * | 2013-03-14 | 2014-09-18 | Novell, Inc. | Analytic injection |
US20140289003A1 (en) * | 2013-03-25 | 2014-09-25 | Amadeus S.A.S. | Methods and systems for detecting anomaly in passenger flow |
US20140303953A1 (en) * | 2011-12-22 | 2014-10-09 | John Bates | Predictive Analytics with Forecasting Model Selection |
US8917274B2 (en) | 2013-03-15 | 2014-12-23 | Palantir Technologies Inc. | Event matrix based on integrated data |
US8924797B2 (en) | 2012-04-16 | 2014-12-30 | Hewlett-Packard Development Company, L.P. | Identifying a dimension associated with an abnormal condition |
US8924872B1 (en) | 2013-10-18 | 2014-12-30 | Palantir Technologies Inc. | Overview user interface of emergency call data of a law enforcement agency |
US20150073894A1 (en) * | 2013-09-06 | 2015-03-12 | Metamarkets Group Inc. | Suspect Anomaly Detection and Presentation within Context |
US9009171B1 (en) | 2014-05-02 | 2015-04-14 | Palantir Technologies Inc. | Systems and methods for active column filtering |
US9021384B1 (en) | 2013-11-04 | 2015-04-28 | Palantir Technologies Inc. | Interactive vehicle information map |
US9021260B1 (en) | 2014-07-03 | 2015-04-28 | Palantir Technologies Inc. | Malware data item analysis |
US9043696B1 (en) | 2014-01-03 | 2015-05-26 | Palantir Technologies Inc. | Systems and methods for visual definition of data associations |
US9043894B1 (en) | 2014-11-06 | 2015-05-26 | Palantir Technologies Inc. | Malicious software detection in a computing system |
EP2882139A1 (en) * | 2013-12-05 | 2015-06-10 | Deutsche Telekom AG | System and method for IT servers anomaly detection using incident consolidation |
US9100428B1 (en) | 2014-01-03 | 2015-08-04 | Palantir Technologies Inc. | System and method for evaluating network threats |
US9116975B2 (en) | 2013-10-18 | 2015-08-25 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores |
US9123086B1 (en) | 2013-01-31 | 2015-09-01 | Palantir Technologies, Inc. | Automatically generating event objects from images |
US9202249B1 (en) | 2014-07-03 | 2015-12-01 | Palantir Technologies Inc. | Data item clustering and analysis |
US20150347568A1 (en) * | 2014-05-30 | 2015-12-03 | International Business Machines Corporation | Processing time series |
US9223773B2 (en) | 2013-08-08 | 2015-12-29 | Palantir Technologies Inc. | Template system for custom document generation |
US9250759B1 (en) * | 2010-07-23 | 2016-02-02 | Amazon Technologies, Inc. | Visual representation of user-node interactions |
US9256664B2 (en) | 2014-07-03 | 2016-02-09 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US20160042287A1 (en) * | 2014-08-10 | 2016-02-11 | Palo Alto Research Center Incorporated | Computer-Implemented System And Method For Detecting Anomalies Using Sample-Based Rule Identification |
US9335911B1 (en) | 2014-12-29 | 2016-05-10 | Palantir Technologies Inc. | Interactive user interface for dynamic data analysis exploration and query processing |
US9335897B2 (en) | 2013-08-08 | 2016-05-10 | Palantir Technologies Inc. | Long click display of a context menu |
US9363149B1 (en) * | 2015-08-01 | 2016-06-07 | Splunk Inc. | Management console for network security investigations |
US9367872B1 (en) | 2014-12-22 | 2016-06-14 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US9383911B2 (en) | 2008-09-15 | 2016-07-05 | Palantir Technologies, Inc. | Modal-less interface enhancements |
US20160210556A1 (en) * | 2015-01-21 | 2016-07-21 | Anodot Ltd. | Heuristic Inference of Topological Representation of Metric Relationships |
US20160261482A1 (en) * | 2015-03-04 | 2016-09-08 | Fisher-Rosemount Systems, Inc. | Anomaly detection in industrial communications networks |
WO2016145238A1 (en) * | 2015-03-10 | 2016-09-15 | Elemental Machines, Inc. | Method and apparatus for environmental sensing |
US9454785B1 (en) | 2015-07-30 | 2016-09-27 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US9454281B2 (en) | 2014-09-03 | 2016-09-27 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US9483162B2 (en) | 2014-02-20 | 2016-11-01 | Palantir Technologies Inc. | Relationship visualizations |
US20160330086A1 (en) * | 2014-03-18 | 2016-11-10 | Hitachi, Ltd. | Data transfer monitor system, data transfer monitor method and base system |
US9501851B2 (en) | 2014-10-03 | 2016-11-22 | Palantir Technologies Inc. | Time-series analysis system |
US9516052B1 (en) * | 2015-08-01 | 2016-12-06 | Splunk Inc. | Timeline displays of network security investigation events |
US20160371363A1 (en) * | 2014-03-26 | 2016-12-22 | Hitachi, Ltd. | Time series data management method and time series data management system |
US20170017990A1 (en) * | 2013-03-13 | 2017-01-19 | Jacob Solotaroff | Promotion offer language and methods thereof |
US9552615B2 (en) | 2013-12-20 | 2017-01-24 | Palantir Technologies Inc. | Automated database analysis to detect malfeasance |
US9557882B2 (en) | 2013-08-09 | 2017-01-31 | Palantir Technologies Inc. | Context-sensitive views |
US20170031565A1 (en) * | 2015-08-01 | 2017-02-02 | Splunk Inc. | Network security investigation workflow logging |
US9619557B2 (en) | 2014-06-30 | 2017-04-11 | Palantir Technologies, Inc. | Systems and methods for key phrase characterization of documents |
US9727560B2 (en) | 2015-02-25 | 2017-08-08 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US9727622B2 (en) | 2013-12-16 | 2017-08-08 | Palantir Technologies, Inc. | Methods and systems for analyzing entity performance |
US9767172B2 (en) | 2014-10-03 | 2017-09-19 | Palantir Technologies Inc. | Data aggregation and analysis system |
US9785773B2 (en) | 2014-07-03 | 2017-10-10 | Palantir Technologies Inc. | Malware data item analysis |
US9785328B2 (en) | 2014-10-06 | 2017-10-10 | Palantir Technologies Inc. | Presentation of multivariate data on a graphical user interface of a computing system |
US9785317B2 (en) | 2013-09-24 | 2017-10-10 | Palantir Technologies Inc. | Presentation and analysis of user interaction data |
US9817563B1 (en) | 2014-12-29 | 2017-11-14 | Palantir Technologies Inc. | System and method of generating data points from one or more data stores of data items for chart creation and manipulation |
US9823818B1 (en) | 2015-12-29 | 2017-11-21 | Palantir Technologies Inc. | Systems and interactive user interfaces for automatic generation of temporal representation of data objects |
US9852205B2 (en) | 2013-03-15 | 2017-12-26 | Palantir Technologies Inc. | Time-sensitive cube |
US9857958B2 (en) | 2014-04-28 | 2018-01-02 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive access of, investigation of, and analysis of data objects stored in one or more databases |
US9864493B2 (en) | 2013-10-07 | 2018-01-09 | Palantir Technologies Inc. | Cohort-based presentation of user interaction data |
US9870205B1 (en) | 2014-12-29 | 2018-01-16 | Palantir Technologies Inc. | Storing logical units of program code generated using a dynamic programming notebook user interface |
US9880987B2 (en) | 2011-08-25 | 2018-01-30 | Palantir Technologies, Inc. | System and method for parameterizing documents for automatic workflow generation |
US9886467B2 (en) | 2015-03-19 | 2018-02-06 | Palantir Technologies Inc. | System and method for comparing and visualizing data entities and data entity series |
US9891808B2 (en) | 2015-03-16 | 2018-02-13 | Palantir Technologies Inc. | Interactive user interfaces for location-based data analysis |
US9898509B2 (en) | 2015-08-28 | 2018-02-20 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US9898528B2 (en) | 2014-12-22 | 2018-02-20 | Palantir Technologies Inc. | Concept indexing among database of documents using machine learning techniques |
US9898335B1 (en) | 2012-10-22 | 2018-02-20 | Palantir Technologies Inc. | System and method for batch evaluation programs |
US9923925B2 (en) | 2014-02-20 | 2018-03-20 | Palantir Technologies Inc. | Cyber security sharing and identification system |
US20180089334A1 (en) * | 2016-09-26 | 2018-03-29 | Splunk Inc. | Managing process analytics across process components |
US9946738B2 (en) | 2014-11-05 | 2018-04-17 | Palantir Technologies, Inc. | Universal data pipeline |
US20180107959A1 (en) * | 2016-10-18 | 2018-04-19 | Dell Products L.P. | Managing project status using business intelligence and predictive analytics |
US9953445B2 (en) | 2013-05-07 | 2018-04-24 | Palantir Technologies Inc. | Interactive data object map |
US9965534B2 (en) | 2015-09-09 | 2018-05-08 | Palantir Technologies, Inc. | Domain-specific language for dataset transformations |
US9965937B2 (en) | 2013-03-15 | 2018-05-08 | Palantir Technologies Inc. | External malware data item clustering and analysis |
US9984133B2 (en) | 2014-10-16 | 2018-05-29 | Palantir Technologies Inc. | Schematic and database linking system |
US9984387B2 (en) | 2013-03-13 | 2018-05-29 | Eversight, Inc. | Architecture and methods for promotion optimization |
US9996229B2 (en) | 2013-10-03 | 2018-06-12 | Palantir Technologies Inc. | Systems and methods for analyzing performance of an entity |
US9996595B2 (en) | 2015-08-03 | 2018-06-12 | Palantir Technologies, Inc. | Providing full data provenance visualization for versioned datasets |
US10037314B2 (en) | 2013-03-14 | 2018-07-31 | Palantir Technologies, Inc. | Mobile reports |
US10037383B2 (en) | 2013-11-11 | 2018-07-31 | Palantir Technologies, Inc. | Simple web search |
US10057718B2 (en) | 2015-05-01 | 2018-08-21 | The Nielsen Company (Us), Llc | Methods and apparatus to associate geographic locations with user devices |
WO2018160177A1 (en) | 2017-03-01 | 2018-09-07 | Visa International Service Association | Predictive anomaly detection framework |
US10102369B2 (en) | 2015-08-19 | 2018-10-16 | Palantir Technologies Inc. | Checkout system executable code monitoring, and user account compromise determination system |
US10133741B2 (en) | 2014-02-13 | 2018-11-20 | Amazon Technologies, Inc. | Log data service in a virtual environment |
US10169081B2 (en) * | 2016-10-31 | 2019-01-01 | Oracle International Corporation | Use of concurrent time bucket generations for scalable scheduling of operations in a computer system |
US10180977B2 (en) | 2014-03-18 | 2019-01-15 | Palantir Technologies Inc. | Determining and extracting changed data from a data source |
US10180929B1 (en) | 2014-06-30 | 2019-01-15 | Palantir Technologies, Inc. | Systems and methods for identifying key phrase clusters within documents |
US10180863B2 (en) | 2016-10-31 | 2019-01-15 | Oracle International Corporation | Determining system information based on object mutation events |
US10191936B2 (en) | 2016-10-31 | 2019-01-29 | Oracle International Corporation | Two-tier storage protocol for committing changes in a storage system |
JP2019016173A (en) * | 2017-07-07 | 2019-01-31 | 株式会社日立製作所 | Data processing method, data processing device, and data processing program |
US10198515B1 (en) | 2013-12-10 | 2019-02-05 | Palantir Technologies Inc. | System and method for aggregating data from a plurality of data sources |
US10216801B2 (en) | 2013-03-15 | 2019-02-26 | Palantir Technologies Inc. | Generating data clusters |
US10229284B2 (en) | 2007-02-21 | 2019-03-12 | Palantir Technologies Inc. | Providing unique views of data based on changes or rules |
US10275778B1 (en) | 2013-03-15 | 2019-04-30 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation based on automatic malfeasance clustering of related data in various data structures |
US10275177B2 (en) | 2016-10-31 | 2019-04-30 | Oracle International Corporation | Data layout schemas for seamless data migration |
US10296617B1 (en) | 2015-10-05 | 2019-05-21 | Palantir Technologies Inc. | Searches of highly structured data |
US10318630B1 (en) | 2016-11-21 | 2019-06-11 | Palantir Technologies Inc. | Analysis of large bodies of textual data |
US10324609B2 (en) | 2016-07-21 | 2019-06-18 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US10356032B2 (en) | 2013-12-26 | 2019-07-16 | Palantir Technologies Inc. | System and method for detecting confidential information emails |
US10362133B1 (en) | 2014-12-22 | 2019-07-23 | Palantir Technologies Inc. | Communication data processing architecture |
US10366346B2 (en) | 2014-05-23 | 2019-07-30 | DataRobot, Inc. | Systems and techniques for determining the predictive value of a feature |
US10366335B2 (en) | 2012-08-31 | 2019-07-30 | DataRobot, Inc. | Systems and methods for symbolic analysis |
US10372879B2 (en) | 2014-12-31 | 2019-08-06 | Palantir Technologies Inc. | Medical claims lead summary report generation |
US10387900B2 (en) * | 2017-04-17 | 2019-08-20 | DataRobot, Inc. | Methods and apparatus for self-adaptive time series forecasting engine |
US10387834B2 (en) | 2015-01-21 | 2019-08-20 | Palantir Technologies Inc. | Systems and methods for accessing and storing snapshots of a remote application in a document |
US10403011B1 (en) | 2017-07-18 | 2019-09-03 | Palantir Technologies Inc. | Passing system with an interactive user interface |
US10417258B2 (en) | 2013-12-19 | 2019-09-17 | Exposit Labs, Inc. | Interactive multi-dimensional nested table supporting scalable real-time querying of large data volumes |
US10423582B2 (en) | 2011-06-23 | 2019-09-24 | Palantir Technologies, Inc. | System and method for investigating large amounts of data |
US10438230B2 (en) | 2013-03-13 | 2019-10-08 | Eversight, Inc. | Adaptive experimentation and optimization in automated promotional testing |
US10437612B1 (en) * | 2015-12-30 | 2019-10-08 | Palantir Technologies Inc. | Composite graphical interface with shareable data-objects |
US10437840B1 (en) | 2016-08-19 | 2019-10-08 | Palantir Technologies Inc. | Focused probabilistic entity resolution from multiple data sources |
US10444941B2 (en) | 2015-08-17 | 2019-10-15 | Palantir Technologies Inc. | Interactive geospatial map |
US10452678B2 (en) | 2013-03-15 | 2019-10-22 | Palantir Technologies Inc. | Filter chains for exploring large data sets |
US10454889B2 (en) * | 2015-10-26 | 2019-10-22 | Oath Inc. | Automatic anomaly detection framework for grid resources |
US10460602B1 (en) | 2016-12-28 | 2019-10-29 | Palantir Technologies Inc. | Interactive vehicle information mapping system |
US10474820B2 (en) | 2014-06-17 | 2019-11-12 | Hewlett Packard Enterprise Development Lp | DNS based infection scores |
US10484407B2 (en) | 2015-08-06 | 2019-11-19 | Palantir Technologies Inc. | Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications |
US10489391B1 (en) | 2015-08-17 | 2019-11-26 | Palantir Technologies Inc. | Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface |
US10496927B2 (en) | 2014-05-23 | 2019-12-03 | DataRobot, Inc. | Systems for time-series predictive data analytics, and related methods and apparatus |
US20200035001A1 (en) * | 2018-07-27 | 2020-01-30 | Vmware, Inc. | Visualization of anomalies in time series data |
US10552994B2 (en) | 2014-12-22 | 2020-02-04 | Palantir Technologies Inc. | Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items |
US10558924B2 (en) | 2014-05-23 | 2020-02-11 | DataRobot, Inc. | Systems for second-order predictive data analytics, and related methods and apparatus |
US10572487B1 (en) | 2015-10-30 | 2020-02-25 | Palantir Technologies Inc. | Periodic database search manager for multiple data sources |
US10572496B1 (en) | 2014-07-03 | 2020-02-25 | Palantir Technologies Inc. | Distributed workflow system and database with access controls for city resiliency |
US10664535B1 (en) * | 2015-02-02 | 2020-05-26 | Amazon Technologies, Inc. | Retrieving log data from metric data |
CN111224823A (en) * | 2020-01-06 | 2020-06-02 | 杭州数群科技有限公司 | Method based on different network log analysis |
US10678860B1 (en) | 2015-12-17 | 2020-06-09 | Palantir Technologies, Inc. | Automatic generation of composite datasets based on hierarchical fields |
CN111291096A (en) * | 2020-03-03 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Data set construction method and device, storage medium and abnormal index detection method |
US10698938B2 (en) | 2016-03-18 | 2020-06-30 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US10706434B1 (en) | 2015-09-01 | 2020-07-07 | Palantir Technologies Inc. | Methods and systems for determining location information |
CN111414395A (en) * | 2020-03-27 | 2020-07-14 | 中国平安财产保险股份有限公司 | Data processing method, system and computer equipment |
US10719188B2 (en) | 2016-07-21 | 2020-07-21 | Palantir Technologies Inc. | Cached database and synchronization system for providing dynamic linked panels in user interface |
US10733159B2 (en) | 2016-09-14 | 2020-08-04 | Oracle International Corporation | Maintaining immutable data and mutable metadata in a storage system |
US10754822B1 (en) | 2018-04-18 | 2020-08-25 | Palantir Technologies Inc. | Systems and methods for ontology migration |
US10764312B2 (en) | 2017-12-28 | 2020-09-01 | Microsoft Technology Licensing, Llc | Enhanced data aggregation techniques for anomaly detection and analysis |
CN111611519A (en) * | 2020-05-28 | 2020-09-01 | 上海观安信息技术股份有限公司 | Method and device for detecting personal abnormal behaviors |
US10795723B2 (en) | 2014-03-04 | 2020-10-06 | Palantir Technologies Inc. | Mobile tasks |
US10817513B2 (en) | 2013-03-14 | 2020-10-27 | Palantir Technologies Inc. | Fair scheduling for mixed-query loads |
US10839144B2 (en) | 2015-12-29 | 2020-11-17 | Palantir Technologies Inc. | Real-time document annotation |
US10846736B2 (en) | 2013-03-13 | 2020-11-24 | Eversight, Inc. | Linkage to reduce errors in online promotion testing |
US10853378B1 (en) | 2015-08-25 | 2020-12-01 | Palantir Technologies Inc. | Electronic note management via a connected entity graph |
US10860534B2 (en) | 2016-10-27 | 2020-12-08 | Oracle International Corporation | Executing a conditional command on an object stored in a storage system |
US10885021B1 (en) | 2018-05-02 | 2021-01-05 | Palantir Technologies Inc. | Interactive interpreter and graphical user interface |
US10909561B2 (en) | 2013-03-13 | 2021-02-02 | Eversight, Inc. | Systems and methods for democratized coupon redemption |
US10915912B2 (en) | 2013-03-13 | 2021-02-09 | Eversight, Inc. | Systems and methods for price testing and optimization in brick and mortar retailers |
US10956406B2 (en) | 2017-06-12 | 2021-03-23 | Palantir Technologies Inc. | Propagated deletion of database records and derived data |
US10956051B2 (en) | 2016-10-31 | 2021-03-23 | Oracle International Corporation | Data-packed storage containers for streamlined access and migration |
US10972332B2 (en) * | 2015-08-31 | 2021-04-06 | Adobe Inc. | Identifying factors that contribute to a metric anomaly |
CN112667723A (en) * | 2020-12-30 | 2021-04-16 | 平安证券股份有限公司 | Data acquisition method and terminal equipment |
US10984441B2 (en) | 2013-03-13 | 2021-04-20 | Eversight, Inc. | Systems and methods for intelligent promotion design with promotion selection |
US10984367B2 (en) | 2014-05-23 | 2021-04-20 | DataRobot, Inc. | Systems and techniques for predictive data analytics |
US11005727B2 (en) * | 2019-07-25 | 2021-05-11 | Vmware, Inc. | Visual overlays for network insights |
US11010214B2 (en) | 2005-07-25 | 2021-05-18 | Splunk Inc. | Identifying pattern relationships in machine data |
US11042591B2 (en) | 2015-06-23 | 2021-06-22 | Splunk Inc. | Analytical search engine |
US11086640B2 (en) * | 2015-12-30 | 2021-08-10 | Palantir Technologies Inc. | Composite graphical interface with shareable data-objects |
US11113342B2 (en) * | 2015-06-23 | 2021-09-07 | Splunk Inc. | Techniques for compiling and presenting query results |
US11119630B1 (en) | 2018-06-19 | 2021-09-14 | Palantir Technologies Inc. | Artificial intelligence assisted evaluations and user interface for same |
US11138180B2 (en) | 2011-09-02 | 2021-10-05 | Palantir Technologies Inc. | Transaction protocol for reading database values |
US11150917B2 (en) | 2015-08-26 | 2021-10-19 | Palantir Technologies Inc. | System for data aggregation and analysis of data from a plurality of data sources |
US20210345019A1 (en) * | 2018-10-01 | 2021-11-04 | Elemental Machines, Inc. | Method and Apparatus for Local Sensing |
US11188941B2 (en) | 2016-06-21 | 2021-11-30 | The Nielsen Company (Us), Llc | Methods and apparatus to collect and process browsing history |
US11270325B2 (en) | 2013-03-13 | 2022-03-08 | Eversight, Inc. | Systems and methods for collaborative offer generation |
US11271824B2 (en) | 2019-07-25 | 2022-03-08 | Vmware, Inc. | Visual overlays for network insights |
US11288698B2 (en) | 2013-03-13 | 2022-03-29 | Eversight, Inc. | Architecture and methods for generating intelligent offers with dynamic base prices |
US11288696B2 (en) | 2013-03-13 | 2022-03-29 | Eversight, Inc. | Systems and methods for efficient promotion experimentation for load to card |
US11340774B1 (en) * | 2014-10-09 | 2022-05-24 | Splunk Inc. | Anomaly detection based on a predicted value |
US20220171736A1 (en) * | 2014-07-09 | 2022-06-02 | Splunk Inc. | Managing datasets generated by search queries |
US11403119B2 (en) * | 2020-06-21 | 2022-08-02 | Apple Inc. | Declaratively defined user interface timeline views |
US11402979B1 (en) * | 2021-01-29 | 2022-08-02 | Splunk Inc. | Interactive expandable histogram timeline module for security flagged events |
US11570188B2 (en) * | 2015-12-28 | 2023-01-31 | Sixgill Ltd. | Dark web monitoring, analysis and alert system and method |
US11599369B1 (en) | 2018-03-08 | 2023-03-07 | Palantir Technologies Inc. | Graphical user interface configuration system |
US11726979B2 (en) | 2016-09-13 | 2023-08-15 | Oracle International Corporation | Determining a chronological order of transactions executed in relation to an object stored in a storage system |
US11734711B2 (en) | 2013-03-13 | 2023-08-22 | Eversight, Inc. | Systems and methods for intelligent promotion design with promotion scoring |
US11860971B2 (en) | 2018-05-24 | 2024-01-02 | International Business Machines Corporation | Anomaly detection |
US11941659B2 (en) | 2017-05-16 | 2024-03-26 | Maplebear Inc. | Systems and methods for intelligent promotion design with promotion scoring |
US20240105050A1 (en) * | 2022-09-28 | 2024-03-28 | Sumo Logic, Inc. | Alert response tool |
US11973784B1 (en) | 2017-11-27 | 2024-04-30 | Lacework, Inc. | Natural language interface for an anomaly detection framework |
US11991198B1 (en) | 2017-11-27 | 2024-05-21 | Lacework, Inc. | User-specific data-driven network security |
US12003799B2 (en) | 2013-04-24 | 2024-06-04 | The Nielsen Company (Us), Llc | Methods and apparatus to correlate census measurement data with panel data |
US12021888B1 (en) | 2017-11-27 | 2024-06-25 | Lacework, Inc. | Cloud infrastructure entitlement management by a data platform |
US12032634B1 (en) | 2019-12-23 | 2024-07-09 | Lacework Inc. | Graph reclustering based on different clustering criteria |
US12034750B1 (en) | 2017-11-27 | 2024-07-09 | Lacework Inc. | Tracking of user login sessions |
US12034754B2 (en) | 2017-11-27 | 2024-07-09 | Lacework, Inc. | Using static analysis for vulnerability detection |
US12058160B1 (en) | 2017-11-22 | 2024-08-06 | Lacework, Inc. | Generating computer code for remediating detected events |
US12095794B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Universal cloud data ingestion for stream processing |
US12095879B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Identifying encountered and unencountered conditions in software applications |
US12095796B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Instruction-level threat assessment |
US12126695B1 (en) | 2017-11-27 | 2024-10-22 | Fortinet, Inc. | Enhancing security of a cloud deployment based on learnings from other cloud deployments |
US12126643B1 (en) | 2017-11-27 | 2024-10-22 | Fortinet, Inc. | Leveraging generative artificial intelligence (‘AI’) for securing a monitored deployment |
US12130878B1 (en) | 2017-11-27 | 2024-10-29 | Fortinet, Inc. | Deduplication of monitored communications data in a cloud environment |
US12204845B2 (en) | 2016-07-21 | 2025-01-21 | Palantir Technologies Inc. | Cached database and synchronization system for providing dynamic linked panels in user interface |
Families Citing this family (119)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9384112B2 (en) | 2010-07-01 | 2016-07-05 | Logrhythm, Inc. | Log collection, structuring and processing |
AU2011332881B2 (en) | 2010-11-24 | 2016-06-30 | LogRhythm Inc. | Advanced intelligence engine |
US9780995B2 (en) | 2010-11-24 | 2017-10-03 | Logrhythm, Inc. | Advanced intelligence engine |
US20120203592A1 (en) * | 2011-02-08 | 2012-08-09 | Balaji Ravindran | Methods, apparatus, and articles of manufacture to determine search engine market share |
US9191296B2 (en) * | 2011-02-24 | 2015-11-17 | International Business Machines Corporation | Network event management |
US9413559B2 (en) * | 2011-06-03 | 2016-08-09 | Adobe Systems Incorporated | Predictive analysis of network analytics |
US9100205B1 (en) * | 2011-07-20 | 2015-08-04 | Google Inc. | System for validating site configuration based on real-time analytics data |
US9313100B1 (en) | 2011-11-14 | 2016-04-12 | Amazon Technologies, Inc. | Remote browsing session management |
US8583486B2 (en) * | 2011-12-21 | 2013-11-12 | 22squared | Advertising and web site feedback systems and methods |
US9330188B1 (en) | 2011-12-22 | 2016-05-03 | Amazon Technologies, Inc. | Shared browsing sessions |
US9336321B1 (en) | 2012-01-26 | 2016-05-10 | Amazon Technologies, Inc. | Remote browsing and searching |
US9092405B1 (en) * | 2012-01-26 | 2015-07-28 | Amazon Technologies, Inc. | Remote browsing and searching |
US9720996B1 (en) * | 2012-04-20 | 2017-08-01 | Open Invention Network Llc | System dependencies tracking application |
US9471544B1 (en) | 2012-05-24 | 2016-10-18 | Google Inc. | Anomaly detection in a signal |
US9087306B2 (en) | 2012-07-13 | 2015-07-21 | Sas Institute Inc. | Computer-implemented systems and methods for time series exploration |
US9244887B2 (en) * | 2012-07-13 | 2016-01-26 | Sas Institute Inc. | Computer-implemented systems and methods for efficient structuring of time series data |
US9628499B1 (en) | 2012-08-08 | 2017-04-18 | Google Inc. | Statistics-based anomaly detection |
US11257101B2 (en) | 2012-08-15 | 2022-02-22 | Alg, Inc. | System, method and computer program for improved forecasting residual values of a durable good over time |
WO2014028645A2 (en) * | 2012-08-15 | 2014-02-20 | Alg, Inc. | System, method and computer program for forecasting residual values of a durable good over time |
US10430814B2 (en) | 2012-08-15 | 2019-10-01 | Alg, Inc. | System, method and computer program for improved forecasting residual values of a durable good over time |
US9197511B2 (en) * | 2012-10-12 | 2015-11-24 | Adobe Systems Incorporated | Anomaly detection in network-site metrics using predictive modeling |
WO2014116542A1 (en) | 2013-01-22 | 2014-07-31 | Tealium Inc. | Activation of dormant features in native applications |
US9147218B2 (en) | 2013-03-06 | 2015-09-29 | Sas Institute Inc. | Devices for forecasting ratios in hierarchies |
US9030316B2 (en) * | 2013-03-12 | 2015-05-12 | Honeywell International Inc. | System and method of anomaly detection with categorical attributes |
US9614742B1 (en) * | 2013-03-14 | 2017-04-04 | Google Inc. | Anomaly detection in time series data |
US9218570B2 (en) | 2013-05-29 | 2015-12-22 | International Business Machines Corporation | Determining an anomalous state of a system at a future point in time |
US9934259B2 (en) | 2013-08-15 | 2018-04-03 | Sas Institute Inc. | In-memory time series database and processing in a distributed environment |
US11695845B2 (en) | 2013-08-30 | 2023-07-04 | Tealium Inc. | System and method for separating content site visitor profiles |
US20150066587A1 (en) | 2013-08-30 | 2015-03-05 | Tealium Inc. | Content site visitor processing system |
US8805946B1 (en) | 2013-08-30 | 2014-08-12 | Tealium Inc. | System and method for combining content site visitor profiles |
US9537964B2 (en) | 2015-03-11 | 2017-01-03 | Tealium Inc. | System and method for separating content site visitor profiles |
US8806361B1 (en) * | 2013-09-16 | 2014-08-12 | Splunk Inc. | Multi-lane time-synched visualizations of machine data events |
US9081789B2 (en) | 2013-10-28 | 2015-07-14 | Tealium Inc. | System for prefetching digital tags |
US8990298B1 (en) | 2013-11-05 | 2015-03-24 | Tealium Inc. | Universal visitor identification system |
US9584395B1 (en) * | 2013-11-13 | 2017-02-28 | Netflix, Inc. | Adaptive metric collection, storage, and alert thresholds |
US20150169392A1 (en) * | 2013-11-20 | 2015-06-18 | Superna Incorporated | System and method for providing an application programming interface intermediary for hypertext transfer protocol web services |
US9407676B2 (en) * | 2013-11-25 | 2016-08-02 | At&T Intellectual Property I, Lp | Method and apparatus for distributing media content |
US10489266B2 (en) | 2013-12-20 | 2019-11-26 | Micro Focus Llc | Generating a visualization of a metric at one or multiple levels of execution of a database workload |
US20160292233A1 (en) * | 2013-12-20 | 2016-10-06 | Hewlett Packard Enterprise Development Lp | Discarding data points in a time series |
WO2015094312A1 (en) | 2013-12-20 | 2015-06-25 | Hewlett-Packard Development Company, L.P. | Identifying a path in a workload that may be associated with a deviation |
US9692674B1 (en) | 2013-12-30 | 2017-06-27 | Google Inc. | Non-parametric change point detection |
CN104809051B (en) * | 2014-01-28 | 2017-11-14 | 国际商业机器公司 | Method and apparatus for predicting exception and failure in computer application |
US9569332B2 (en) * | 2014-02-03 | 2017-02-14 | Apigee Corporation | System and method for investigating anomalies in API processing systems |
US20150261830A1 (en) * | 2014-03-11 | 2015-09-17 | Adrian Capdefier | Automated data mining |
US9288256B2 (en) | 2014-04-11 | 2016-03-15 | Ensighten, Inc. | URL prefetching |
US10169720B2 (en) | 2014-04-17 | 2019-01-01 | Sas Institute Inc. | Systems and methods for machine learning using classifying, clustering, and grouping time series data |
CN105095614A (en) | 2014-04-18 | 2015-11-25 | 国际商业机器公司 | Method and device for updating prediction model |
US9892370B2 (en) | 2014-06-12 | 2018-02-13 | Sas Institute Inc. | Systems and methods for resolving over multiple hierarchies |
USD863328S1 (en) | 2014-09-18 | 2019-10-15 | Aetna Inc. | Display screen with graphical user interface |
USD839289S1 (en) | 2014-09-18 | 2019-01-29 | Aetna Inc. | Display screen with graphical user interface |
USD810768S1 (en) * | 2014-09-18 | 2018-02-20 | Aetna Inc. | Display screen with graphical user interface |
USD840422S1 (en) | 2014-09-18 | 2019-02-12 | Aetna Inc. | Display screen with graphical user interface |
US9208209B1 (en) | 2014-10-02 | 2015-12-08 | Sas Institute Inc. | Techniques for monitoring transformation techniques using control charts |
US10242101B2 (en) * | 2014-10-28 | 2019-03-26 | Adobe Inc. | Automatic identification of sources of web metric changes |
US9418339B1 (en) | 2015-01-26 | 2016-08-16 | Sas Institute, Inc. | Systems and methods for time series analysis techniques utilizing count data sets |
US10311045B2 (en) | 2015-01-26 | 2019-06-04 | Microsoft Technology Licensing, Llc | Aggregation/evaluation of heterogenic time series data |
US10380867B2 (en) * | 2015-01-30 | 2019-08-13 | Cisco Technology, Inc. | Alert management within a network based virtual collaborative space |
US10505818B1 (en) | 2015-05-05 | 2019-12-10 | F5 Networks, Inc. | Methods for analyzing and load balancing based on server health and devices thereof |
US10536539B2 (en) * | 2015-05-20 | 2020-01-14 | Oath Inc. | Data sessionization |
US10380339B1 (en) * | 2015-06-01 | 2019-08-13 | Amazon Technologies, Inc. | Reactively identifying software products exhibiting anomalous behavior |
JP2018525728A (en) * | 2015-07-14 | 2018-09-06 | サイオス テクノロジー コーポレーション (Sios Technology Corporation) | A distributed machine learning analysis framework for analyzing streaming datasets from computer environments |
US10324943B2 (en) | 2015-08-10 | 2019-06-18 | Business Objects Software, Ltd. | Auto-monitoring and adjustment of dynamic data visualizations |
US10983682B2 (en) | 2015-08-27 | 2021-04-20 | Sas Institute Inc. | Interactive graphical user-interface for analyzing and manipulating time-series projections |
US20170061297A1 (en) * | 2015-08-31 | 2017-03-02 | Sas Institute Inc. | Automatically determining data sets for a three-stage predictor |
US10985993B2 (en) | 2015-09-16 | 2021-04-20 | Adobe Inc. | Identifying audiences that contribute to metric anomalies |
US10404777B2 (en) * | 2015-10-19 | 2019-09-03 | Adobe Inc. | Identifying sources of anomalies in multi-variable metrics using linearization |
US9509710B1 (en) | 2015-11-24 | 2016-11-29 | International Business Machines Corporation | Analyzing real-time streams of time-series data |
US10504026B2 (en) | 2015-12-01 | 2019-12-10 | Microsoft Technology Licensing, Llc | Statistical detection of site speed performance anomalies |
US10263833B2 (en) | 2015-12-01 | 2019-04-16 | Microsoft Technology Licensing, Llc | Root cause investigation of site speed performance anomalies |
US10171335B2 (en) * | 2015-12-01 | 2019-01-01 | Microsoft Technology Licensing, Llc | Analysis of site speed performance anomalies caused by server-side issues |
US10685306B2 (en) * | 2015-12-07 | 2020-06-16 | Sap Se | Advisor generating multi-representations of time series data |
US10740309B2 (en) | 2015-12-18 | 2020-08-11 | Cisco Technology, Inc. | Fast circular database |
US10949426B2 (en) | 2015-12-28 | 2021-03-16 | Salesforce.Com, Inc. | Annotating time series data points with alert information |
US10776506B2 (en) * | 2015-12-28 | 2020-09-15 | Salesforce.Com, Inc. | Self-monitoring time series database system that enforces usage policies |
US10659333B2 (en) | 2016-03-24 | 2020-05-19 | Cisco Technology, Inc. | Detection and analysis of seasonal network patterns for anomaly detection |
US9661384B1 (en) | 2016-04-05 | 2017-05-23 | Arris Enterprises Llc | Trick play user activity reconstruction |
US10372524B2 (en) * | 2016-07-28 | 2019-08-06 | Western Digital Technologies, Inc. | Storage anomaly detection |
CN107819793B (en) * | 2016-09-12 | 2019-03-12 | 北京百度网讯科技有限公司 | Collecting method and device for robot operating system |
US10223175B2 (en) | 2016-10-10 | 2019-03-05 | International Business Machines Corporation | Modifying a device based on an annotated time series of sensor readings |
US10681012B2 (en) | 2016-10-26 | 2020-06-09 | Ping Identity Corporation | Methods and systems for deep learning based API traffic security |
US10484415B1 (en) * | 2016-12-16 | 2019-11-19 | Worldpay, Llc | Systems and methods for detecting security risks in network pages |
US11010260B1 (en) * | 2016-12-30 | 2021-05-18 | EMC IP Holding Company LLC | Generating a data protection risk assessment score for a backup and recovery storage system |
JP6661559B2 (en) * | 2017-02-03 | 2020-03-11 | 株式会社東芝 | Error detection device, error detection method and program |
US10445117B2 (en) * | 2017-02-24 | 2019-10-15 | Genband Us Llc | Predictive analytics for virtual network functions |
US10541866B2 (en) * | 2017-07-25 | 2020-01-21 | Cisco Technology, Inc. | Detecting and resolving multicast traffic performance issues |
US10630546B2 (en) * | 2017-09-22 | 2020-04-21 | Servicenow, Inc. | Distributed tool for detecting states and state transitions in remote network management platforms |
US10699010B2 (en) | 2017-10-13 | 2020-06-30 | Ping Identity Corporation | Methods and apparatus for analyzing sequences of application programming interface traffic to identify potential malicious actions |
US10789240B2 (en) * | 2017-11-06 | 2020-09-29 | Google Llc | Duplicative data detection |
US10331490B2 (en) | 2017-11-16 | 2019-06-25 | Sas Institute Inc. | Scalable cloud-based time series analysis |
US20190195742A1 (en) * | 2017-12-22 | 2019-06-27 | Schneider Electric Software, Llc | Automated detection of anomalous industrial process operation |
US10547518B2 (en) * | 2018-01-26 | 2020-01-28 | Cisco Technology, Inc. | Detecting transient vs. perpetual network behavioral patterns using machine learning |
US10338994B1 (en) | 2018-02-22 | 2019-07-02 | Sas Institute Inc. | Predicting and adjusting computer functionality to avoid failures |
CA3036445A1 (en) | 2018-03-12 | 2019-09-12 | Royal Bank Of Canada | Method for anomaly detection in clustered data structures |
US10255085B1 (en) | 2018-03-13 | 2019-04-09 | Sas Institute Inc. | Interactive graphical user interface with override guidance |
US10819560B2 (en) * | 2018-03-29 | 2020-10-27 | Servicenow, Inc. | Alert management system and method of using alert context-based alert rules |
US10685283B2 (en) | 2018-06-26 | 2020-06-16 | Sas Institute Inc. | Demand classification based pipeline system for time-series data forecasting |
US10560313B2 (en) | 2018-06-26 | 2020-02-11 | Sas Institute Inc. | Pipeline system for time-series data forecasting |
CN109033404B (en) * | 2018-08-03 | 2022-03-11 | 北京百度网讯科技有限公司 | Log data processing method, device and system |
US11537109B2 (en) | 2018-08-07 | 2022-12-27 | Aveva Software, Llc | Server and system for automatic selection of tags for modeling and anomaly detection |
WO2020122287A1 (en) * | 2018-12-13 | 2020-06-18 | 주식회사 알고리고 | Apparatus and method for identifying abnormal data by using minute distribution change |
US11496475B2 (en) | 2019-01-04 | 2022-11-08 | Ping Identity Corporation | Methods and systems for data traffic based adaptive security |
US11200607B2 (en) * | 2019-01-28 | 2021-12-14 | Walmart Apollo, Llc | Methods and apparatus for anomaly detections |
US10764386B1 (en) | 2019-02-15 | 2020-09-01 | Citrix Systems, Inc. | Activity detection in web applications |
US11803773B2 (en) * | 2019-07-30 | 2023-10-31 | EMC IP Holding Company LLC | Machine learning-based anomaly detection using time series decomposition |
US11146656B2 (en) | 2019-12-20 | 2021-10-12 | Tealium Inc. | Feature activation control and data prefetching with network-connected mobile devices |
GB2592421B (en) * | 2020-02-27 | 2022-03-02 | Crfs Ltd | Real-time data processing |
CN111400284B (en) * | 2020-03-20 | 2023-09-12 | 广州咨元信息科技有限公司 | Method for establishing dynamic anomaly detection model based on performance data |
US11574461B2 (en) | 2020-04-07 | 2023-02-07 | Nec Corporation | Time-series based analytics using video streams |
US11675799B2 (en) * | 2020-05-05 | 2023-06-13 | International Business Machines Corporation | Anomaly detection system |
CN114116659B (en) * | 2020-08-26 | 2024-11-22 | 白腊梅 | A method and device for detecting the running status of a database server |
US12032538B2 (en) | 2021-04-23 | 2024-07-09 | Capital One Services, Llc | Anomaly detection in a split timeseries dataset |
WO2022226216A1 (en) * | 2021-04-23 | 2022-10-27 | Capital One Services, Llc | Automatic model selection for a time series |
US11640387B2 (en) | 2021-04-23 | 2023-05-02 | Capital One Services, Llc | Anomaly detection data workflow for time series data |
US11789915B2 (en) * | 2021-04-23 | 2023-10-17 | Capital One Services, Llc | Automatic model selection for a time series |
US20230071667A1 (en) * | 2021-09-08 | 2023-03-09 | SparkCognition, Inc. | Neural network input embedding including a positional embedding and a temporal embedding for time-series data prediction |
US11599442B1 (en) | 2021-11-29 | 2023-03-07 | International Business Machines Corporation | Detecting abnormal database activity |
US12155631B2 (en) * | 2022-02-24 | 2024-11-26 | Uab 360 It | Adjusting data communication in a virtual private network environment |
US12118084B2 (en) * | 2022-08-25 | 2024-10-15 | Capital One Services, Llc | Automatic selection of data for target monitoring |
US12111848B2 (en) | 2023-01-30 | 2024-10-08 | Cerner Innovation, Inc. | Active management of files being processed in enterprise data warehouses utilizing time series predictions |
Citations (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5870746A (en) * | 1995-10-12 | 1999-02-09 | Ncr Corporation | System and method for segmenting a database based upon data attributes |
US6317787B1 (en) * | 1998-08-11 | 2001-11-13 | Webtrends Corporation | System and method for analyzing web-server log files |
US6581054B1 (en) * | 1999-07-30 | 2003-06-17 | Computer Associates Think, Inc. | Dynamic query model and method |
US6604095B1 (en) * | 1999-09-21 | 2003-08-05 | International Business Machines Corporation | Method, system, program, and data structure for pivoting columns in a database table |
US20030158795A1 (en) * | 2001-12-28 | 2003-08-21 | Kimberly-Clark Worldwide, Inc. | Quality management and intelligent manufacturing with labels and smart tags in event-based product manufacturing |
US20030187719A1 (en) * | 2002-03-29 | 2003-10-02 | Brocklebank John C. | Computer-implemented system and method for web activity assessment |
US20040054784A1 (en) * | 2002-09-16 | 2004-03-18 | International Business Machines Corporation | Method, system and program product for tracking web user sessions |
US6850933B2 (en) * | 2001-11-15 | 2005-02-01 | Microsoft Corporation | System and method for optimizing queries using materialized views and fast view matching |
US20050114206A1 (en) * | 2003-11-25 | 2005-05-26 | Dominic Bennett | Database structure and front end |
US6925442B1 (en) * | 1999-01-29 | 2005-08-02 | Elijahu Shapira | Method and apparatus for evaluating vistors to a web server |
US6975963B2 (en) * | 2002-09-30 | 2005-12-13 | Mcdata Corporation | Method and system for storing and reporting network performance metrics using histograms |
US20060074905A1 (en) * | 2004-09-17 | 2006-04-06 | Become, Inc. | Systems and methods of retrieving topic specific information |
US20060184508A1 (en) * | 2001-05-01 | 2006-08-17 | Fuselier Christopher S | Methods and system for providing context sensitive information |
US20060189330A1 (en) * | 2005-01-28 | 2006-08-24 | Nelson Ellen M | Method for presentation of multiple graphical displays in operations support systems |
US20070112607A1 (en) * | 2005-11-16 | 2007-05-17 | Microsoft Corporation | Score-based alerting in business logic |
US7310590B1 (en) * | 2006-11-15 | 2007-12-18 | Computer Associates Think, Inc. | Time series anomaly detection using multiple statistical models |
US20080140524A1 (en) * | 2006-12-12 | 2008-06-12 | Shubhasheesh Anand | System for generating a smart advertisement based on a dynamic file and a configuration file |
US20080184116A1 (en) * | 2007-01-31 | 2008-07-31 | Error Christopher R | User Simulation for Viewing Web Analytics Data |
US20080235075A1 (en) * | 2007-03-23 | 2008-09-25 | Fmr Corp. | Enterprise application performance monitors |
US20080275980A1 (en) * | 2007-05-04 | 2008-11-06 | Hansen Eric J | Method and system for testing variations of website content |
US7464122B1 (en) * | 2000-07-11 | 2008-12-09 | Revenue Science, Inc. | Parsing navigation information to identify occurrences of events of interest |
US20090063549A1 (en) * | 2007-08-20 | 2009-03-05 | Oracle International Corporation | Enterprise structure configurator |
US7523191B1 (en) * | 2000-06-02 | 2009-04-21 | Yahoo! Inc. | System and method for monitoring user interaction with web pages |
US20090198724A1 (en) * | 2008-02-05 | 2009-08-06 | Mikko Valimaki | System and method for conducting network analytics |
US7627572B2 (en) * | 2007-05-02 | 2009-12-01 | Mypoints.Com Inc. | Rule-based dry run methodology in an information management system |
US20100030544A1 (en) * | 2008-07-31 | 2010-02-04 | Mazu Networks, Inc. | Detecting Outliers in Network Traffic Time Series |
US7669212B2 (en) * | 2001-02-02 | 2010-02-23 | Opentv, Inc. | Service platform suite management system |
US7716011B2 (en) * | 2007-02-28 | 2010-05-11 | Microsoft Corporation | Strategies for identifying anomalies in time-series data |
US20100205029A1 (en) * | 2009-02-11 | 2010-08-12 | Content Galaxy Inc. | System for digital commerce and method of secure, automated crediting of publishers, editors, content providers, and affiliates |
US20100287146A1 (en) * | 2009-05-11 | 2010-11-11 | Dean Skelton | System and method for change analytics based forecast and query optimization and impact identification in a variance-based forecasting system with visualization |
US20110035257A1 (en) * | 2009-08-06 | 2011-02-10 | Rajendra Singh Solanki | Systems And Methods For Generating Planograms In The Presence Of Multiple Objectives |
US7895012B2 (en) * | 2005-05-03 | 2011-02-22 | Hewlett-Packard Development Company, L.P. | Systems and methods for organizing and storing data |
Family Cites Families (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR0141767B1 (en) * | 1994-04-25 | 1998-07-01 | 이헌조 | A digital signal processor's form / depot device |
US6286011B1 (en) * | 1997-04-30 | 2001-09-04 | Bellsouth Corporation | System and method for recording transactions using a chronological list superimposed on an indexed list |
US7966078B2 (en) * | 1999-02-01 | 2011-06-21 | Steven Hoffberg | Network media appliance system and method |
US7610289B2 (en) * | 2000-10-04 | 2009-10-27 | Google Inc. | System and method for monitoring and analyzing internet traffic |
US6792458B1 (en) * | 1999-10-04 | 2004-09-14 | Urchin Software Corporation | System and method for monitoring and analyzing internet traffic |
US6392668B1 (en) * | 1999-11-12 | 2002-05-21 | Kendara, Inc. | Client-side system and method for network link differentiation |
US7024383B1 (en) * | 2000-01-31 | 2006-04-04 | Goldman, Sachs & Co. | Online sales risk management system |
WO2002027528A1 (en) | 2000-09-25 | 2002-04-04 | Metaedge Corporation | Method and system for managing event attributes |
EP1435058A4 (en) * | 2001-10-11 | 2005-12-07 | Visualsciences Llc | System, method, and computer program product for processing and visualization of information |
US7076543B1 (en) * | 2002-02-13 | 2006-07-11 | Cisco Technology, Inc. | Method and apparatus for collecting, aggregating and monitoring network management information |
US7085682B1 (en) * | 2002-09-18 | 2006-08-01 | Doubleclick Inc. | System and method for analyzing website activity |
US7567964B2 (en) * | 2003-05-08 | 2009-07-28 | Oracle International Corporation | Configurable search graphical user interface and engine |
US6873184B1 (en) * | 2003-09-03 | 2005-03-29 | Advanced Micro Devices, Inc. | Circular buffer using grouping for find first function |
US7457872B2 (en) * | 2003-10-15 | 2008-11-25 | Microsoft Corporation | On-line service/application monitoring and reporting system |
US7765247B2 (en) * | 2003-11-24 | 2010-07-27 | Computer Associates Think, Inc. | System and method for removing rows from directory tables |
US7673340B1 (en) * | 2004-06-02 | 2010-03-02 | Clickfox Llc | System and method for analyzing system user behavior |
KR100695467B1 (en) | 2004-08-10 | 2007-03-15 | 재성 안 | Web Usability Evaluation Method and Its Evaluation System |
US20060085741A1 (en) * | 2004-10-20 | 2006-04-20 | Viewfour, Inc. A Delaware Corporation | Method and apparatus to view multiple web pages simultaneously from network based search |
US8538969B2 (en) * | 2005-06-03 | 2013-09-17 | Adobe Systems Incorporated | Data format for website traffic statistics |
US8341259B2 (en) | 2005-06-06 | 2012-12-25 | Adobe Systems Incorporated | ASP for web analytics including a real-time segmentation workbench |
US7761457B2 (en) | 2005-06-06 | 2010-07-20 | Adobe Systems Incorporated | Creation of segmentation definitions |
US7523149B1 (en) * | 2006-05-23 | 2009-04-21 | Symantec Operating Corporation | System and method for continuous protection of working set data using a local independent staging device |
US8271310B2 (en) * | 2006-12-20 | 2012-09-18 | Microsoft Corporation | Virtualizing consumer behavior as a financial instrument |
US8655918B2 (en) * | 2007-10-26 | 2014-02-18 | International Business Machines Corporation | System and method of transforming data for use in data analysis tools |
US8307101B1 (en) * | 2007-12-13 | 2012-11-06 | Google Inc. | Generic format for storage and query of web analytics data |
US8341540B1 (en) * | 2009-02-03 | 2012-12-25 | Amazon Technologies, Inc. | Visualizing object behavior |
US8255523B1 (en) * | 2009-04-24 | 2012-08-28 | Google Inc. | Server side disambiguation of ambiguous statistics |
US20110035272A1 (en) * | 2009-08-05 | 2011-02-10 | Yahoo! Inc. | Feature-value recommendations for advertisement campaign performance improvement |
US8412719B1 (en) * | 2009-09-02 | 2013-04-02 | Google Inc. | Method and system for segmenting a multidimensional dataset |
US8359313B2 (en) * | 2009-10-20 | 2013-01-22 | Google Inc. | Extensible custom variables for tracking user traffic |
US8583584B2 (en) * | 2009-10-20 | 2013-11-12 | Google Inc. | Method and system for using web analytics data for detecting anomalies |
US8554699B2 (en) * | 2009-10-20 | 2013-10-08 | Google Inc. | Method and system for detecting anomalies in time series data |
US8458424B2 (en) * | 2010-02-08 | 2013-06-04 | Hitachi, Ltd. | Storage system for reallocating data in virtual volumes and methods of the same |
US9401965B2 (en) * | 2010-12-09 | 2016-07-26 | Google Inc. | Correlating user interactions with interfaces |
2010
- 2010-10-19: US 12/907,957, patent US8554699B2, status: Active
- 2010-10-19: US 12/907,916, publication US20110119100A1, status: Abandoned
2013
- 2013-09-10: US 14/023,061, patent US8682816B2, status: Active
Patent Citations (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5870746A (en) * | 1995-10-12 | 1999-02-09 | Ncr Corporation | System and method for segmenting a database based upon data attributes |
US6317787B1 (en) * | 1998-08-11 | 2001-11-13 | Webtrends Corporation | System and method for analyzing web-server log files |
US6925442B1 (en) * | 1999-01-29 | 2005-08-02 | Elijahu Shapira | Method and apparatus for evaluating vistors to a web server |
US6581054B1 (en) * | 1999-07-30 | 2003-06-17 | Computer Associates Think, Inc. | Dynamic query model and method |
US6604095B1 (en) * | 1999-09-21 | 2003-08-05 | International Business Machines Corporation | Method, system, program, and data structure for pivoting columns in a database table |
US7523191B1 (en) * | 2000-06-02 | 2009-04-21 | Yahoo! Inc. | System and method for monitoring user interaction with web pages |
US7464122B1 (en) * | 2000-07-11 | 2008-12-09 | Revenue Science, Inc. | Parsing navigation information to identify occurrences of events of interest |
US7669212B2 (en) * | 2001-02-02 | 2010-02-23 | Opentv, Inc. | Service platform suite management system |
US20060184508A1 (en) * | 2001-05-01 | 2006-08-17 | Fuselier Christopher S | Methods and system for providing context sensitive information |
US6850933B2 (en) * | 2001-11-15 | 2005-02-01 | Microsoft Corporation | System and method for optimizing queries using materialized views and fast view matching |
US20030158795A1 (en) * | 2001-12-28 | 2003-08-21 | Kimberly-Clark Worldwide, Inc. | Quality management and intelligent manufacturing with labels and smart tags in event-based product manufacturing |
US20030187719A1 (en) * | 2002-03-29 | 2003-10-02 | Brocklebank John C. | Computer-implemented system and method for web activity assessment |
US20040054784A1 (en) * | 2002-09-16 | 2004-03-18 | International Business Machines Corporation | Method, system and program product for tracking web user sessions |
US6975963B2 (en) * | 2002-09-30 | 2005-12-13 | Mcdata Corporation | Method and system for storing and reporting network performance metrics using histograms |
US20050114206A1 (en) * | 2003-11-25 | 2005-05-26 | Dominic Bennett | Database structure and front end |
US20060074910A1 (en) * | 2004-09-17 | 2006-04-06 | Become, Inc. | Systems and methods of retrieving topic specific information |
US20060074905A1 (en) * | 2004-09-17 | 2006-04-06 | Become, Inc. | Systems and methods of retrieving topic specific information |
US20060189330A1 (en) * | 2005-01-28 | 2006-08-24 | Nelson Ellen M | Method for presentation of multiple graphical displays in operations support systems |
US7895012B2 (en) * | 2005-05-03 | 2011-02-22 | Hewlett-Packard Development Company, L.P. | Systems and methods for organizing and storing data |
US20070112607A1 (en) * | 2005-11-16 | 2007-05-17 | Microsoft Corporation | Score-based alerting in business logic |
US7310590B1 (en) * | 2006-11-15 | 2007-12-18 | Computer Associates Think, Inc. | Time series anomaly detection using multiple statistical models |
US20080140524A1 (en) * | 2006-12-12 | 2008-06-12 | Shubhasheesh Anand | System for generating a smart advertisement based on a dynamic file and a configuration file |
US20080184116A1 (en) * | 2007-01-31 | 2008-07-31 | Error Christopher R | User Simulation for Viewing Web Analytics Data |
US7716011B2 (en) * | 2007-02-28 | 2010-05-11 | Microsoft Corporation | Strategies for identifying anomalies in time-series data |
US20080235075A1 (en) * | 2007-03-23 | 2008-09-25 | Fmr Corp. | Enterprise application performance monitors |
US7627572B2 (en) * | 2007-05-02 | 2009-12-01 | Mypoints.Com Inc. | Rule-based dry run methodology in an information management system |
US20080275980A1 (en) * | 2007-05-04 | 2008-11-06 | Hansen Eric J | Method and system for testing variations of website content |
US20090063549A1 (en) * | 2007-08-20 | 2009-03-05 | Oracle International Corporation | Enterprise structure configurator |
US20090198724A1 (en) * | 2008-02-05 | 2009-08-06 | Mikko Valimaki | System and method for conducting network analytics |
US20100030544A1 (en) * | 2008-07-31 | 2010-02-04 | Mazu Networks, Inc. | Detecting Outliers in Network Traffic Time Series |
US20100205029A1 (en) * | 2009-02-11 | 2010-08-12 | Content Galaxy Inc. | System for digital commerce and method of secure, automated crediting of publishers, editors, content providers, and affiliates |
US20100287146A1 (en) * | 2009-05-11 | 2010-11-11 | Dean Skelton | System and method for change analytics based forecast and query optimization and impact identification in a variance-based forecasting system with visualization |
US20110035257A1 (en) * | 2009-08-06 | 2011-02-10 | Rajendra Singh Solanki | Systems And Methods For Generating Planograms In The Presence Of Multiple Objectives |
Cited By (352)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12130842B2 (en) | 2005-07-25 | 2024-10-29 | Cisco Technology, Inc. | Segmenting machine data into events |
US11599400B2 (en) | 2005-07-25 | 2023-03-07 | Splunk Inc. | Segmenting machine data into events based on source signatures |
US11663244B2 (en) | 2005-07-25 | 2023-05-30 | Splunk Inc. | Segmenting machine data into events to identify matching events |
US11119833B2 (en) | 2005-07-25 | 2021-09-14 | Splunk Inc. | Identifying behavioral patterns of events derived from machine data that reveal historical behavior of an information technology environment |
US11010214B2 (en) | 2005-07-25 | 2021-05-18 | Splunk Inc. | Identifying pattern relationships in machine data |
US11126477B2 (en) | 2005-07-25 | 2021-09-21 | Splunk Inc. | Identifying matching event data from disparate data sources |
US11036567B2 (en) | 2005-07-25 | 2021-06-15 | Splunk Inc. | Determining system behavior using event patterns in machine data |
US11204817B2 (en) | 2005-07-25 | 2021-12-21 | Splunk Inc. | Deriving signature-based rules for creating events from machine data |
US10229284B2 (en) | 2007-02-21 | 2019-03-12 | Palantir Technologies Inc. | Providing unique views of data based on changes or rules |
US10719621B2 (en) | 2007-02-21 | 2020-07-21 | Palantir Technologies Inc. | Providing unique views of data based on changes or rules |
US10248294B2 (en) | 2008-09-15 | 2019-04-02 | Palantir Technologies, Inc. | Modal-less interface enhancements |
US10747952B2 (en) | 2008-09-15 | 2020-08-18 | Palantir Technologies, Inc. | Automatic creation and server push of multiple distinct drafts |
US9383911B2 (en) | 2008-09-15 | 2016-07-05 | Palantir Technologies, Inc. | Modal-less interface enhancements |
US8682816B2 (en) * | 2009-10-20 | 2014-03-25 | Google Inc. | Method and system for detecting anomalies in time series data |
US9250759B1 (en) * | 2010-07-23 | 2016-02-02 | Amazon Technologies, Inc. | Visual representation of user-node interactions |
US10423582B2 (en) | 2011-06-23 | 2019-09-24 | Palantir Technologies, Inc. | System and method for investigating large amounts of data |
US11392550B2 (en) | 2011-06-23 | 2022-07-19 | Palantir Technologies Inc. | System and method for investigating large amounts of data |
US9880987B2 (en) | 2011-08-25 | 2018-01-30 | Palantir Technologies, Inc. | System and method for parameterizing documents for automatic workflow generation |
US10706220B2 (en) | 2011-08-25 | 2020-07-07 | Palantir Technologies, Inc. | System and method for parameterizing documents for automatic workflow generation |
US11138180B2 (en) | 2011-09-02 | 2021-10-05 | Palantir Technologies Inc. | Transaction protocol for reading database values |
US8217945B1 (en) | 2011-09-02 | 2012-07-10 | Metric Insights, Inc. | Social annotation of a single evolving visual representation of a changing dataset |
US9396444B2 (en) * | 2011-12-22 | 2016-07-19 | Adobe Systems Incorporated | Predictive analytics with forecasting model selection |
US20140303953A1 (en) * | 2011-12-22 | 2014-10-09 | John Bates | Predictive Analytics with Forecasting Model Selection |
US8924797B2 (en) | 2012-04-16 | 2014-12-30 | Hewlett-Packard Development Company, L.P. | Identifying a dimension associated with an abnormal condition |
US20140068411A1 (en) * | 2012-08-31 | 2014-03-06 | Scott Ross | Methods and apparatus to monitor usage of internet advertising networks |
US10366335B2 (en) | 2012-08-31 | 2019-07-30 | DataRobot, Inc. | Systems and methods for symbolic analysis |
US9898335B1 (en) | 2012-10-22 | 2018-02-20 | Palantir Technologies Inc. | System and method for batch evaluation programs |
US11182204B2 (en) | 2012-10-22 | 2021-11-23 | Palantir Technologies Inc. | System and method for batch evaluation programs |
US9380431B1 (en) | 2013-01-31 | 2016-06-28 | Palantir Technologies, Inc. | Use of teams in a mobile application |
US10743133B2 (en) | 2013-01-31 | 2020-08-11 | Palantir Technologies Inc. | Populating property values of event objects of an object-centric data model using image metadata |
US9123086B1 (en) | 2013-01-31 | 2015-09-01 | Palantir Technologies, Inc. | Automatically generating event objects from images |
US10313833B2 (en) | 2013-01-31 | 2019-06-04 | Palantir Technologies Inc. | Populating property values of event objects of an object-centric data model using image metadata |
US20140222476A1 (en) * | 2013-02-06 | 2014-08-07 | Verint Systems Ltd. | Anomaly Detection in Interaction Data |
US10438230B2 (en) | 2013-03-13 | 2019-10-08 | Eversight, Inc. | Adaptive experimentation and optimization in automated promotional testing |
US10846736B2 (en) | 2013-03-13 | 2020-11-24 | Eversight, Inc. | Linkage to reduce errors in online promotion testing |
US11288698B2 (en) | 2013-03-13 | 2022-03-29 | Eversight, Inc. | Architecture and methods for generating intelligent offers with dynamic base prices |
US9984387B2 (en) | 2013-03-13 | 2018-05-29 | Eversight, Inc. | Architecture and methods for promotion optimization |
US20170017990A1 (en) * | 2013-03-13 | 2017-01-19 | Jacob Solotaroff | Promotion offer language and methods thereof |
US10984441B2 (en) | 2013-03-13 | 2021-04-20 | Eversight, Inc. | Systems and methods for intelligent promotion design with promotion selection |
US11270325B2 (en) | 2013-03-13 | 2022-03-08 | Eversight, Inc. | Systems and methods for collaborative offer generation |
US11636504B2 (en) | 2013-03-13 | 2023-04-25 | Eversight, Inc. | Systems and methods for collaborative offer generation |
US12014389B2 (en) | 2013-03-13 | 2024-06-18 | Maplebear Inc. | Systems and methods for collaborative offer generation |
US11288696B2 (en) | 2013-03-13 | 2022-03-29 | Eversight, Inc. | Systems and methods for efficient promotion experimentation for load to card |
US11138628B2 (en) * | 2013-03-13 | 2021-10-05 | Eversight, Inc. | Promotion offer language and methods thereof |
US10915912B2 (en) | 2013-03-13 | 2021-02-09 | Eversight, Inc. | Systems and methods for price testing and optimization in brick and mortar retailers |
US11699167B2 (en) | 2013-03-13 | 2023-07-11 | Maplebear Inc. | Systems and methods for intelligent promotion design with promotion selection |
US11734711B2 (en) | 2013-03-13 | 2023-08-22 | Eversight, Inc. | Systems and methods for intelligent promotion design with promotion scoring |
US10909561B2 (en) | 2013-03-13 | 2021-02-02 | Eversight, Inc. | Systems and methods for democratized coupon redemption |
US20140280867A1 (en) * | 2013-03-14 | 2014-09-18 | Novell, Inc. | Analytic injection |
US10037314B2 (en) | 2013-03-14 | 2018-07-31 | Palantir Technologies, Inc. | Mobile reports |
US9843490B2 (en) * | 2013-03-14 | 2017-12-12 | Netiq Corporation | Methods and systems for analytic code injection |
US10997363B2 (en) | 2013-03-14 | 2021-05-04 | Palantir Technologies Inc. | Method of generating objects and links from mobile reports |
US10817513B2 (en) | 2013-03-14 | 2020-10-27 | Palantir Technologies Inc. | Fair scheduling for mixed-query loads |
US10453229B2 (en) | 2013-03-15 | 2019-10-22 | Palantir Technologies Inc. | Generating object time series from data objects |
US9852205B2 (en) | 2013-03-15 | 2017-12-26 | Palantir Technologies Inc. | Time-sensitive cube |
US10482097B2 (en) | 2013-03-15 | 2019-11-19 | Palantir Technologies Inc. | System and method for generating event visualizations |
US10977279B2 (en) | 2013-03-15 | 2021-04-13 | Palantir Technologies Inc. | Time-sensitive cube |
US10452678B2 (en) | 2013-03-15 | 2019-10-22 | Palantir Technologies Inc. | Filter chains for exploring large data sets |
US10216801B2 (en) | 2013-03-15 | 2019-02-26 | Palantir Technologies Inc. | Generating data clusters |
US20140267295A1 (en) * | 2013-03-15 | 2014-09-18 | Palantir Technologies, Inc. | Object time series |
US9852195B2 (en) | 2013-03-15 | 2017-12-26 | Palantir Technologies Inc. | System and method for generating event visualizations |
US9646396B2 (en) * | 2013-03-15 | 2017-05-09 | Palantir Technologies Inc. | Generating object time series and data objects |
US10264014B2 (en) | 2013-03-15 | 2019-04-16 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation based on automatic clustering of related data in various data structures |
US10275778B1 (en) | 2013-03-15 | 2019-04-30 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation based on automatic malfeasance clustering of related data in various data structures |
US8917274B2 (en) | 2013-03-15 | 2014-12-23 | Palantir Technologies Inc. | Event matrix based on integrated data |
US8937619B2 (en) * | 2013-03-15 | 2015-01-20 | Palantir Technologies Inc. | Generating an object time series from data objects |
US9779525B2 (en) | 2013-03-15 | 2017-10-03 | Palantir Technologies Inc. | Generating object time series from data objects |
US20150254878A1 (en) * | 2013-03-15 | 2015-09-10 | Palantir Technologies Inc. | Generating object time series and data objects |
US9965937B2 (en) | 2013-03-15 | 2018-05-08 | Palantir Technologies Inc. | External malware data item clustering and analysis |
US20140289003A1 (en) * | 2013-03-25 | 2014-09-25 | Amadeus S.A.S. | Methods and systems for detecting anomaly in passenger flow |
US12003799B2 (en) | 2013-04-24 | 2024-06-04 | The Nielsen Company (Us), Llc | Methods and apparatus to correlate census measurement data with panel data |
US12184917B2 (en) | 2013-04-24 | 2024-12-31 | The Nielsen Company (Us), Llc | Methods and apparatus to correlate census measurement data with panel data |
US12063402B2 (en) | 2013-04-24 | 2024-08-13 | The Nielsen Company (Us), Llc | Methods and apparatus to correlate census measurement data with panel data |
US10360705B2 (en) | 2013-05-07 | 2019-07-23 | Palantir Technologies Inc. | Interactive data object map |
US9953445B2 (en) | 2013-05-07 | 2018-04-24 | Palantir Technologies Inc. | Interactive data object map |
US10976892B2 (en) | 2013-08-08 | 2021-04-13 | Palantir Technologies Inc. | Long click display of a context menu |
US9223773B2 (en) | 2013-08-08 | 2015-12-29 | Palantir Technologies Inc. | Template system for custom document generation |
US9335897B2 (en) | 2013-08-08 | 2016-05-10 | Palantir Technologies Inc. | Long click display of a context menu |
US10699071B2 (en) | 2013-08-08 | 2020-06-30 | Palantir Technologies Inc. | Systems and methods for template based custom document generation |
US10545655B2 (en) | 2013-08-09 | 2020-01-28 | Palantir Technologies Inc. | Context-sensitive views |
US9921734B2 (en) | 2013-08-09 | 2018-03-20 | Palantir Technologies Inc. | Context-sensitive views |
US9557882B2 (en) | 2013-08-09 | 2017-01-31 | Palantir Technologies Inc. | Context-sensitive views |
US20150073894A1 (en) * | 2013-09-06 | 2015-03-12 | Metamarkets Group Inc. | Suspect Anomaly Detection and Presentation within Context |
US9785317B2 (en) | 2013-09-24 | 2017-10-10 | Palantir Technologies Inc. | Presentation and analysis of user interaction data |
US10732803B2 (en) | 2013-09-24 | 2020-08-04 | Palantir Technologies Inc. | Presentation and analysis of user interaction data |
US9996229B2 (en) | 2013-10-03 | 2018-06-12 | Palantir Technologies Inc. | Systems and methods for analyzing performance of an entity |
US9864493B2 (en) | 2013-10-07 | 2018-01-09 | Palantir Technologies Inc. | Cohort-based presentation of user interaction data |
US10635276B2 (en) | 2013-10-07 | 2020-04-28 | Palantir Technologies Inc. | Cohort-based presentation of user interaction data |
US10877638B2 (en) | 2013-10-18 | 2020-12-29 | Palantir Technologies Inc. | Overview user interface of emergency call data of a law enforcement agency |
US9116975B2 (en) | 2013-10-18 | 2015-08-25 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores |
US10042524B2 (en) | 2013-10-18 | 2018-08-07 | Palantir Technologies Inc. | Overview user interface of emergency call data of a law enforcement agency |
US8924872B1 (en) | 2013-10-18 | 2014-12-30 | Palantir Technologies Inc. | Overview user interface of emergency call data of a law enforcement agency |
US9514200B2 (en) | 2013-10-18 | 2016-12-06 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores |
US10719527B2 (en) | 2013-10-18 | 2020-07-21 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores |
US9021384B1 (en) | 2013-11-04 | 2015-04-28 | Palantir Technologies Inc. | Interactive vehicle information map |
US10262047B1 (en) | 2013-11-04 | 2019-04-16 | Palantir Technologies Inc. | Interactive vehicle information map |
US11100174B2 (en) | 2013-11-11 | 2021-08-24 | Palantir Technologies Inc. | Simple web search |
US10037383B2 (en) | 2013-11-11 | 2018-07-31 | Palantir Technologies, Inc. | Simple web search |
EP2882139A1 (en) * | 2013-12-05 | 2015-06-10 | Deutsche Telekom AG | System and method for IT servers anomaly detection using incident consolidation |
US11138279B1 (en) | 2013-12-10 | 2021-10-05 | Palantir Technologies Inc. | System and method for aggregating data from a plurality of data sources |
US10198515B1 (en) | 2013-12-10 | 2019-02-05 | Palantir Technologies Inc. | System and method for aggregating data from a plurality of data sources |
US9727622B2 (en) | 2013-12-16 | 2017-08-08 | Palantir Technologies, Inc. | Methods and systems for analyzing entity performance |
US9734217B2 (en) | 2013-12-16 | 2017-08-15 | Palantir Technologies Inc. | Methods and systems for analyzing entity performance |
US10025834B2 (en) | 2013-12-16 | 2018-07-17 | Palantir Technologies Inc. | Methods and systems for analyzing entity performance |
US10417258B2 (en) | 2013-12-19 | 2019-09-17 | Exposit Labs, Inc. | Interactive multi-dimensional nested table supporting scalable real-time querying of large data volumes |
US9552615B2 (en) | 2013-12-20 | 2017-01-24 | Palantir Technologies Inc. | Automated database analysis to detect malfeasance |
US10356032B2 (en) | 2013-12-26 | 2019-07-16 | Palantir Technologies Inc. | System and method for detecting confidential information emails |
US10230746B2 (en) | 2014-01-03 | 2019-03-12 | Palantir Technologies Inc. | System and method for evaluating network threats and usage |
US9100428B1 (en) | 2014-01-03 | 2015-08-04 | Palantir Technologies Inc. | System and method for evaluating network threats |
US10805321B2 (en) | 2014-01-03 | 2020-10-13 | Palantir Technologies Inc. | System and method for evaluating network threats and usage |
US10120545B2 (en) | 2014-01-03 | 2018-11-06 | Palantir Technologies Inc. | Systems and methods for visual definition of data associations |
US10901583B2 (en) | 2014-01-03 | 2021-01-26 | Palantir Technologies Inc. | Systems and methods for visual definition of data associations |
US9043696B1 (en) | 2014-01-03 | 2015-05-26 | Palantir Technologies Inc. | Systems and methods for visual definition of data associations |
US10133741B2 (en) | 2014-02-13 | 2018-11-20 | Amazon Technologies, Inc. | Log data service in a virtual environment |
US9923925B2 (en) | 2014-02-20 | 2018-03-20 | Palantir Technologies Inc. | Cyber security sharing and identification system |
US9483162B2 (en) | 2014-02-20 | 2016-11-01 | Palantir Technologies Inc. | Relationship visualizations |
US10873603B2 (en) | 2014-02-20 | 2020-12-22 | Palantir Technologies Inc. | Cyber security sharing and identification system |
US10402054B2 (en) | 2014-02-20 | 2019-09-03 | Palantir Technologies Inc. | Relationship visualizations |
US10795723B2 (en) | 2014-03-04 | 2020-10-06 | Palantir Technologies Inc. | Mobile tasks |
US10180977B2 (en) | 2014-03-18 | 2019-01-15 | Palantir Technologies Inc. | Determining and extracting changed data from a data source |
US20160330086A1 (en) * | 2014-03-18 | 2016-11-10 | Hitachi, Ltd. | Data transfer monitor system, data transfer monitor method and base system |
US10164847B2 (en) * | 2014-03-18 | 2018-12-25 | Hitachi, Ltd. | Data transfer monitor system, data transfer monitor method and base system |
US20160371363A1 (en) * | 2014-03-26 | 2016-12-22 | Hitachi, Ltd. | Time series data management method and time series data management system |
US9857958B2 (en) | 2014-04-28 | 2018-01-02 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive access of, investigation of, and analysis of data objects stored in one or more databases |
US10871887B2 (en) | 2014-04-28 | 2020-12-22 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive access of, investigation of, and analysis of data objects stored in one or more databases |
US9449035B2 (en) | 2014-05-02 | 2016-09-20 | Palantir Technologies Inc. | Systems and methods for active column filtering |
US9009171B1 (en) | 2014-05-02 | 2015-04-14 | Palantir Technologies Inc. | Systems and methods for active column filtering |
US10496927B2 (en) | 2014-05-23 | 2019-12-03 | DataRobot, Inc. | Systems for time-series predictive data analytics, and related methods and apparatus |
US10984367B2 (en) | 2014-05-23 | 2021-04-20 | DataRobot, Inc. | Systems and techniques for predictive data analytics |
US11922329B2 (en) | 2014-05-23 | 2024-03-05 | DataRobot, Inc. | Systems for second-order predictive data analytics, and related methods and apparatus |
US10366346B2 (en) | 2014-05-23 | 2019-07-30 | DataRobot, Inc. | Systems and techniques for determining the predictive value of a feature |
US10558924B2 (en) | 2014-05-23 | 2020-02-11 | DataRobot, Inc. | Systems for second-order predictive data analytics, and related methods and apparatus |
US10423635B2 (en) * | 2014-05-30 | 2019-09-24 | International Business Machines Corporation | Processing time series |
US10366095B2 (en) * | 2014-05-30 | 2019-07-30 | International Business Machines Corporation | Processing time series |
US20150347537A1 (en) * | 2014-05-30 | 2015-12-03 | International Business Machines Corporation | Processing time series |
US20150347568A1 (en) * | 2014-05-30 | 2015-12-03 | International Business Machines Corporation | Processing time series |
US10474820B2 (en) | 2014-06-17 | 2019-11-12 | Hewlett Packard Enterprise Development Lp | DNS based infection scores |
US9619557B2 (en) | 2014-06-30 | 2017-04-11 | Palantir Technologies, Inc. | Systems and methods for key phrase characterization of documents |
US10162887B2 (en) | 2014-06-30 | 2018-12-25 | Palantir Technologies Inc. | Systems and methods for key phrase characterization of documents |
US10180929B1 (en) | 2014-06-30 | 2019-01-15 | Palantir Technologies, Inc. | Systems and methods for identifying key phrase clusters within documents |
US11341178B2 (en) | 2014-06-30 | 2022-05-24 | Palantir Technologies Inc. | Systems and methods for key phrase characterization of documents |
US9998485B2 (en) | 2014-07-03 | 2018-06-12 | Palantir Technologies, Inc. | Network intrusion data item clustering and analysis |
US9202249B1 (en) | 2014-07-03 | 2015-12-01 | Palantir Technologies Inc. | Data item clustering and analysis |
US10798116B2 (en) | 2014-07-03 | 2020-10-06 | Palantir Technologies Inc. | External malware data item clustering and analysis |
US10572496B1 (en) | 2014-07-03 | 2020-02-25 | Palantir Technologies Inc. | Distributed workflow system and database with access controls for city resiliency |
US9344447B2 (en) | 2014-07-03 | 2016-05-17 | Palantir Technologies Inc. | Internal malware data item clustering and analysis |
US9298678B2 (en) | 2014-07-03 | 2016-03-29 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US9021260B1 (en) | 2014-07-03 | 2015-04-28 | Palantir Technologies Inc. | Malware data item analysis |
US9785773B2 (en) | 2014-07-03 | 2017-10-10 | Palantir Technologies Inc. | Malware data item analysis |
US9256664B2 (en) | 2014-07-03 | 2016-02-09 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US10929436B2 (en) | 2014-07-03 | 2021-02-23 | Palantir Technologies Inc. | System and method for news events detection and visualization |
US20220171736A1 (en) * | 2014-07-09 | 2022-06-02 | Splunk Inc. | Managing datasets generated by search queries |
US12169471B2 (en) * | 2014-07-09 | 2024-12-17 | Splunk Inc. | Managing datasets generated by search queries |
US20160042287A1 (en) * | 2014-08-10 | 2016-02-11 | Palo Alto Research Center Incorporated | Computer-Implemented System And Method For Detecting Anomalies Using Sample-Based Rule Identification |
US10140576B2 (en) * | 2014-08-10 | 2018-11-27 | Palo Alto Research Center Incorporated | Computer-implemented system and method for detecting anomalies using sample-based rule identification |
US9880696B2 (en) | 2014-09-03 | 2018-01-30 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US10866685B2 (en) | 2014-09-03 | 2020-12-15 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US12204527B2 (en) | 2014-09-03 | 2025-01-21 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US9454281B2 (en) | 2014-09-03 | 2016-09-27 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US9501851B2 (en) | 2014-10-03 | 2016-11-22 | Palantir Technologies Inc. | Time-series analysis system |
US9767172B2 (en) | 2014-10-03 | 2017-09-19 | Palantir Technologies Inc. | Data aggregation and analysis system |
US10360702B2 (en) | 2014-10-03 | 2019-07-23 | Palantir Technologies Inc. | Time-series analysis system |
US11004244B2 (en) | 2014-10-03 | 2021-05-11 | Palantir Technologies Inc. | Time-series analysis system |
US10664490B2 (en) | 2014-10-03 | 2020-05-26 | Palantir Technologies Inc. | Data aggregation and analysis system |
US10437450B2 (en) | 2014-10-06 | 2019-10-08 | Palantir Technologies Inc. | Presentation of multivariate data on a graphical user interface of a computing system |
US9785328B2 (en) | 2014-10-06 | 2017-10-10 | Palantir Technologies Inc. | Presentation of multivariate data on a graphical user interface of a computing system |
US11875032B1 (en) | 2014-10-09 | 2024-01-16 | Splunk Inc. | Detecting anomalies in key performance indicator values |
US11340774B1 (en) * | 2014-10-09 | 2022-05-24 | Splunk Inc. | Anomaly detection based on a predicted value |
US11275753B2 (en) | 2014-10-16 | 2022-03-15 | Palantir Technologies Inc. | Schematic and database linking system |
US9984133B2 (en) | 2014-10-16 | 2018-05-29 | Palantir Technologies Inc. | Schematic and database linking system |
US10853338B2 (en) | 2014-11-05 | 2020-12-01 | Palantir Technologies Inc. | Universal data pipeline |
US10191926B2 (en) | 2014-11-05 | 2019-01-29 | Palantir Technologies, Inc. | Universal data pipeline |
US9946738B2 (en) | 2014-11-05 | 2018-04-17 | Palantir Technologies, Inc. | Universal data pipeline |
US9558352B1 (en) | 2014-11-06 | 2017-01-31 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US10728277B2 (en) | 2014-11-06 | 2020-07-28 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US9043894B1 (en) | 2014-11-06 | 2015-05-26 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US10135863B2 (en) | 2014-11-06 | 2018-11-20 | Palantir Technologies Inc. | Malicious software detection in a computing system |
US11252248B2 (en) | 2014-12-22 | 2022-02-15 | Palantir Technologies Inc. | Communication data processing architecture |
US9898528B2 (en) | 2014-12-22 | 2018-02-20 | Palantir Technologies Inc. | Concept indexing among database of documents using machine learning techniques |
US10362133B1 (en) | 2014-12-22 | 2019-07-23 | Palantir Technologies Inc. | Communication data processing architecture |
US10447712B2 (en) | 2014-12-22 | 2019-10-15 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US9367872B1 (en) | 2014-12-22 | 2016-06-14 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US10552994B2 (en) | 2014-12-22 | 2020-02-04 | Palantir Technologies Inc. | Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items |
US9589299B2 (en) | 2014-12-22 | 2017-03-07 | Palantir Technologies Inc. | Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures |
US9817563B1 (en) | 2014-12-29 | 2017-11-14 | Palantir Technologies Inc. | System and method of generating data points from one or more data stores of data items for chart creation and manipulation |
US10552998B2 (en) | 2014-12-29 | 2020-02-04 | Palantir Technologies Inc. | System and method of generating data points from one or more data stores of data items for chart creation and manipulation |
US10157200B2 (en) | 2014-12-29 | 2018-12-18 | Palantir Technologies Inc. | Interactive user interface for dynamic data analysis exploration and query processing |
US10127021B1 (en) | 2014-12-29 | 2018-11-13 | Palantir Technologies Inc. | Storing logical units of program code generated using a dynamic programming notebook user interface |
US9870205B1 (en) | 2014-12-29 | 2018-01-16 | Palantir Technologies Inc. | Storing logical units of program code generated using a dynamic programming notebook user interface |
US9870389B2 (en) | 2014-12-29 | 2018-01-16 | Palantir Technologies Inc. | Interactive user interface for dynamic data analysis exploration and query processing |
US9335911B1 (en) | 2014-12-29 | 2016-05-10 | Palantir Technologies Inc. | Interactive user interface for dynamic data analysis exploration and query processing |
US10838697B2 (en) | 2014-12-29 | 2020-11-17 | Palantir Technologies Inc. | Storing logical units of program code generated using a dynamic programming notebook user interface |
US11030581B2 (en) | 2014-12-31 | 2021-06-08 | Palantir Technologies Inc. | Medical claims lead summary report generation |
US10372879B2 (en) | 2014-12-31 | 2019-08-06 | Palantir Technologies Inc. | Medical claims lead summary report generation |
US10891558B2 (en) * | 2015-01-21 | 2021-01-12 | Anodot Ltd. | Creation of metric relationship graph based on windowed time series data for anomaly detection |
US20160210556A1 (en) * | 2015-01-21 | 2016-07-21 | Anodot Ltd. | Heuristic Inference of Topological Representation of Metric Relationships |
US10387834B2 (en) | 2015-01-21 | 2019-08-20 | Palantir Technologies Inc. | Systems and methods for accessing and storing snapshots of a remote application in a document |
US10664535B1 (en) * | 2015-02-02 | 2020-05-26 | Amazon Technologies, Inc. | Retrieving log data from metric data |
US10474326B2 (en) | 2015-02-25 | 2019-11-12 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US9727560B2 (en) | 2015-02-25 | 2017-08-08 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US10291506B2 (en) * | 2015-03-04 | 2019-05-14 | Fisher-Rosemount Systems, Inc. | Anomaly detection in industrial communications networks |
US20160261482A1 (en) * | 2015-03-04 | 2016-09-08 | Fisher-Rosemount Systems, Inc. | Anomaly detection in industrial communications networks |
CN105939334A (en) * | 2015-03-04 | 2016-09-14 | 费希尔-罗斯蒙特系统公司 | Anomaly detection in industrial communications networks |
WO2016145238A1 (en) * | 2015-03-10 | 2016-09-15 | Elemental Machines, Inc. | Method and apparatus for environmental sensing |
US20200213157A1 (en) * | 2015-03-10 | 2020-07-02 | Elemental Machines, Inc. | Method and Apparatus for Environmental Sensing |
US11018900B2 (en) | 2015-03-10 | 2021-05-25 | Elemental Machines, Inc. | Method and apparatus for environmental sensing |
US20180062877A1 (en) * | 2015-03-10 | 2018-03-01 | Elemental Machines, Inc. | Method and Apparatus for Environmental Sensing |
US20230300004A1 (en) * | 2015-03-10 | 2023-09-21 | Elemental Machines, Inc. | Method and Apparatus for Environmental Sensing |
US20210281446A1 (en) * | 2015-03-10 | 2021-09-09 | Elemental Machines, Inc. | Method and Apparatus for Environmental Sensing |
US10530608B2 (en) * | 2015-03-10 | 2020-01-07 | Elemental Machines, Inc. | Method and apparatus for environmental sensing |
US11665024B2 (en) * | 2015-03-10 | 2023-05-30 | Elemental Machines, Inc. | Method and apparatus for environmental sensing |
US10459619B2 (en) | 2015-03-16 | 2019-10-29 | Palantir Technologies Inc. | Interactive user interfaces for location-based data analysis |
US9891808B2 (en) | 2015-03-16 | 2018-02-13 | Palantir Technologies Inc. | Interactive user interfaces for location-based data analysis |
US12147657B2 (en) | 2015-03-16 | 2024-11-19 | Palantir Technologies Inc. | Interactive user interfaces for location-based data analysis |
US9886467B2 (en) | 2015-03-19 | 2018-02-06 | Palantir Technologies Inc. | System and method for comparing and visualizing data entities and data entity series |
US12069534B2 (en) | 2015-05-01 | 2024-08-20 | The Nielsen Company (Us), Llc | Methods and apparatus to associate geographic locations with user devices |
US10057718B2 (en) | 2015-05-01 | 2018-08-21 | The Nielsen Company (Us), Llc | Methods and apparatus to associate geographic locations with user devices |
US10412547B2 (en) | 2015-05-01 | 2019-09-10 | The Nielsen Company (Us), Llc | Methods and apparatus to associate geographic locations with user devices |
US10681497B2 (en) | 2015-05-01 | 2020-06-09 | The Nielsen Company (Us), Llc | Methods and apparatus to associate geographic locations with user devices |
US11197125B2 (en) | 2015-05-01 | 2021-12-07 | The Nielsen Company (Us), Llc | Methods and apparatus to associate geographic locations with user devices |
US11113342B2 (en) * | 2015-06-23 | 2021-09-07 | Splunk Inc. | Techniques for compiling and presenting query results |
US11868411B1 (en) | 2015-06-23 | 2024-01-09 | Splunk Inc. | Techniques for compiling and presenting query results |
US11042591B2 (en) | 2015-06-23 | 2021-06-22 | Splunk Inc. | Analytical search engine |
US9454785B1 (en) | 2015-07-30 | 2016-09-27 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US11501369B2 (en) | 2015-07-30 | 2022-11-15 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US10223748B2 (en) | 2015-07-30 | 2019-03-05 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
US11363047B2 (en) * | 2015-08-01 | 2022-06-14 | Splunk Inc. | Generating investigation timeline displays including activity events and investigation workflow events |
US20190166146A1 (en) * | 2015-08-01 | 2019-05-30 | Splunk Inc. | Displaying Network Security Events and Investigation Activities Across Investigation Timelines |
US10778712B2 (en) * | 2015-08-01 | 2020-09-15 | Splunk Inc. | Displaying network security events and investigation activities across investigation timelines |
US11132111B2 (en) | 2015-08-01 | 2021-09-28 | Splunk Inc. | Assigning workflow network security investigation actions to investigation timelines |
US20170034196A1 (en) * | 2015-08-01 | 2017-02-02 | Splunk Inc. | Selecting network security investigation timelines based on identifiers |
US20170031565A1 (en) * | 2015-08-01 | 2017-02-02 | Splunk Inc. | Network security investigation workflow logging |
US9516052B1 (en) * | 2015-08-01 | 2016-12-06 | Splunk Inc. | Timeline displays of network security investigation events |
US11641372B1 (en) * | 2015-08-01 | 2023-05-02 | Splunk Inc. | Generating investigation timeline displays including user-selected screenshots |
US10254934B2 (en) * | 2015-08-01 | 2019-04-09 | Splunk Inc. | Network security investigation workflow logging |
US9848008B2 (en) * | 2015-08-01 | 2017-12-19 | Splunk Inc. | Creating timeline views of information technology event investigations |
US20170048264A1 (en) * | 2015-08-01 | 2017-02-16 | Splunk Inc. | Creating Timeline Views of Information Technology Event Investigations |
US10250628B2 (en) * | 2015-08-01 | 2019-04-02 | Splunk Inc. | Storyboard displays of information technology investigative events along a timeline |
US10237292B2 (en) * | 2015-08-01 | 2019-03-19 | Splunk Inc. | Selecting network security investigation timelines based on identifiers |
US9363149B1 (en) * | 2015-08-01 | 2016-06-07 | Splunk Inc. | Management console for network security investigations |
US20190166145A1 (en) * | 2015-08-01 | 2019-05-30 | Splunk Inc. | Selecting Network Security Event Investigation Timelines in a Workflow Environment |
US10848510B2 (en) * | 2015-08-01 | 2020-11-24 | Splunk Inc. | Selecting network security event investigation timelines in a workflow environment |
US9996595B2 (en) | 2015-08-03 | 2018-06-12 | Palantir Technologies, Inc. | Providing full data provenance visualization for versioned datasets |
US10484407B2 (en) | 2015-08-06 | 2019-11-19 | Palantir Technologies Inc. | Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications |
US10444940B2 (en) | 2015-08-17 | 2019-10-15 | Palantir Technologies Inc. | Interactive geospatial map |
US10444941B2 (en) | 2015-08-17 | 2019-10-15 | Palantir Technologies Inc. | Interactive geospatial map |
US10489391B1 (en) | 2015-08-17 | 2019-11-26 | Palantir Technologies Inc. | Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface |
US10102369B2 (en) | 2015-08-19 | 2018-10-16 | Palantir Technologies Inc. | Checkout system executable code monitoring, and user account compromise determination system |
US10922404B2 (en) | 2015-08-19 | 2021-02-16 | Palantir Technologies Inc. | Checkout system executable code monitoring, and user account compromise determination system |
US10853378B1 (en) | 2015-08-25 | 2020-12-01 | Palantir Technologies Inc. | Electronic note management via a connected entity graph |
US11934847B2 (en) | 2015-08-26 | 2024-03-19 | Palantir Technologies Inc. | System for data aggregation and analysis of data from a plurality of data sources |
US11150917B2 (en) | 2015-08-26 | 2021-10-19 | Palantir Technologies Inc. | System for data aggregation and analysis of data from a plurality of data sources |
US12105719B2 (en) | 2015-08-28 | 2024-10-01 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US10346410B2 (en) | 2015-08-28 | 2019-07-09 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US11048706B2 (en) | 2015-08-28 | 2021-06-29 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US9898509B2 (en) | 2015-08-28 | 2018-02-20 | Palantir Technologies Inc. | Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces |
US10972332B2 (en) * | 2015-08-31 | 2021-04-06 | Adobe Inc. | Identifying factors that contribute to a metric anomaly |
US20210194751A1 (en) * | 2015-08-31 | 2021-06-24 | Adobe Inc. | Identifying contributing factors to a metric anomaly |
US12184475B2 (en) * | 2015-08-31 | 2024-12-31 | Adobe Inc. | Identifying contributing factors to a metric anomaly |
US10706434B1 (en) | 2015-09-01 | 2020-07-07 | Palantir Technologies Inc. | Methods and systems for determining location information |
US11080296B2 (en) | 2015-09-09 | 2021-08-03 | Palantir Technologies Inc. | Domain-specific language for dataset transformations |
US9965534B2 (en) | 2015-09-09 | 2018-05-08 | Palantir Technologies, Inc. | Domain-specific language for dataset transformations |
US10296617B1 (en) | 2015-10-05 | 2019-05-21 | Palantir Technologies Inc. | Searches of highly structured data |
US10454889B2 (en) * | 2015-10-26 | 2019-10-22 | Oath Inc. | Automatic anomaly detection framework for grid resources |
US10572487B1 (en) | 2015-10-30 | 2020-02-25 | Palantir Technologies Inc. | Periodic database search manager for multiple data sources |
US10678860B1 (en) | 2015-12-17 | 2020-06-09 | Palantir Technologies, Inc. | Automatic generation of composite datasets based on hierarchical fields |
US11570188B2 (en) * | 2015-12-28 | 2023-01-31 | Sixgill Ltd. | Dark web monitoring, analysis and alert system and method |
US9823818B1 (en) | 2015-12-29 | 2017-11-21 | Palantir Technologies Inc. | Systems and interactive user interfaces for automatic generation of temporal representation of data objects |
US10839144B2 (en) | 2015-12-29 | 2020-11-17 | Palantir Technologies Inc. | Real-time document annotation |
US10540061B2 (en) | 2015-12-29 | 2020-01-21 | Palantir Technologies Inc. | Systems and interactive user interfaces for automatic generation of temporal representation of data objects |
US10437612B1 (en) * | 2015-12-30 | 2019-10-08 | Palantir Technologies Inc. | Composite graphical interface with shareable data-objects |
US11086640B2 (en) * | 2015-12-30 | 2021-08-10 | Palantir Technologies Inc. | Composite graphical interface with shareable data-objects |
US10698938B2 (en) | 2016-03-18 | 2020-06-30 | Palantir Technologies Inc. | Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags |
US12093978B2 (en) | 2016-06-21 | 2024-09-17 | The Nielsen Company (Us), Llc | Methods and apparatus to collect and process browsing history |
US11188941B2 (en) | 2016-06-21 | 2021-11-30 | The Nielsen Company (Us), Llc | Methods and apparatus to collect and process browsing history |
US10698594B2 (en) | 2016-07-21 | 2020-06-30 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US10719188B2 (en) | 2016-07-21 | 2020-07-21 | Palantir Technologies Inc. | Cached database and synchronization system for providing dynamic linked panels in user interface |
US10324609B2 (en) | 2016-07-21 | 2019-06-18 | Palantir Technologies Inc. | System for providing dynamic linked panels in user interface |
US12204845B2 (en) | 2016-07-21 | 2025-01-21 | Palantir Technologies Inc. | Cached database and synchronization system for providing dynamic linked panels in user interface |
US10437840B1 (en) | 2016-08-19 | 2019-10-08 | Palantir Technologies Inc. | Focused probabilistic entity resolution from multiple data sources |
US11726979B2 (en) | 2016-09-13 | 2023-08-15 | Oracle International Corporation | Determining a chronological order of transactions executed in relation to an object stored in a storage system |
US10733159B2 (en) | 2016-09-14 | 2020-08-04 | Oracle International Corporation | Maintaining immutable data and mutable metadata in a storage system |
US11210622B2 (en) | 2016-09-26 | 2021-12-28 | Splunk Inc. | Generating augmented process models for process analytics |
US11250371B2 (en) * | 2016-09-26 | 2022-02-15 | Splunk Inc. | Managing process analytics across process components |
US11782987B1 (en) | 2016-09-26 | 2023-10-10 | Splunk Inc. | Using an augmented process model to track process instances |
US20180089334A1 (en) * | 2016-09-26 | 2018-03-29 | Splunk Inc. | Managing process analytics across process components |
US10839326B2 (en) * | 2016-10-18 | 2020-11-17 | Dell Products L.P. | Managing project status using business intelligence and predictive analytics |
US20180107959A1 (en) * | 2016-10-18 | 2018-04-19 | Dell Products L.P. | Managing project status using business intelligence and predictive analytics |
US11599504B2 (en) | 2016-10-27 | 2023-03-07 | Oracle International Corporation | Executing a conditional command on an object stored in a storage system |
US10860534B2 (en) | 2016-10-27 | 2020-12-08 | Oracle International Corporation | Executing a conditional command on an object stored in a storage system |
US11379415B2 (en) | 2016-10-27 | 2022-07-05 | Oracle International Corporation | Executing a conditional command on an object stored in a storage system |
US11386045B2 (en) | 2016-10-27 | 2022-07-12 | Oracle International Corporation | Executing a conditional command on an object stored in a storage system |
US10956051B2 (en) | 2016-10-31 | 2021-03-23 | Oracle International Corporation | Data-packed storage containers for streamlined access and migration |
US10275177B2 (en) | 2016-10-31 | 2019-04-30 | Oracle International Corporation | Data layout schemas for seamless data migration |
US10169081B2 (en) * | 2016-10-31 | 2019-01-01 | Oracle International Corporation | Use of concurrent time bucket generations for scalable scheduling of operations in a computer system |
US10664309B2 (en) | 2016-10-31 | 2020-05-26 | Oracle International Corporation | Use of concurrent time bucket generations for scalable scheduling of operations in a computer system |
US10191936B2 (en) | 2016-10-31 | 2019-01-29 | Oracle International Corporation | Two-tier storage protocol for committing changes in a storage system |
US10664329B2 (en) | 2016-10-31 | 2020-05-26 | Oracle International Corporation | Determining system information based on object mutation events |
US10180863B2 (en) | 2016-10-31 | 2019-01-15 | Oracle International Corporation | Determining system information based on object mutation events |
US10318630B1 (en) | 2016-11-21 | 2019-06-11 | Palantir Technologies Inc. | Analysis of large bodies of textual data |
US10460602B1 (en) | 2016-12-28 | 2019-10-29 | Palantir Technologies Inc. | Interactive vehicle information mapping system |
US11841786B2 (en) | 2017-03-01 | 2023-12-12 | Visa International Service Association | Predictive anomaly detection framework |
US11237939B2 (en) | 2017-03-01 | 2022-02-01 | Visa International Service Association | Predictive anomaly detection framework |
EP3590042A4 (en) * | 2017-03-01 | 2020-04-08 | Visa International Service Association | Predictive anomaly detection framework |
WO2018160177A1 (en) | 2017-03-01 | 2018-09-07 | Visa International Service Association | Predictive anomaly detection framework |
US10387900B2 (en) * | 2017-04-17 | 2019-08-20 | DataRobot, Inc. | Methods and apparatus for self-adaptive time series forecasting engine |
US11250449B1 (en) | 2017-04-17 | 2022-02-15 | DataRobot, Inc. | Methods for self-adaptive time series forecasting, and related systems and apparatus |
US11941659B2 (en) | 2017-05-16 | 2024-03-26 | Maplebear Inc. | Systems and methods for intelligent promotion design with promotion scoring |
US10956406B2 (en) | 2017-06-12 | 2021-03-23 | Palantir Technologies Inc. | Propagated deletion of database records and derived data |
JP2019016173A (en) * | 2017-07-07 | 2019-01-31 | 株式会社日立製作所 | Data processing method, data processing device, and data processing program |
US10403011B1 (en) | 2017-07-18 | 2019-09-03 | Palantir Technologies Inc. | Passing system with an interactive user interface |
US12058160B1 (en) | 2017-11-22 | 2024-08-06 | Lacework, Inc. | Generating computer code for remediating detected events |
US12095794B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Universal cloud data ingestion for stream processing |
US11991198B1 (en) | 2017-11-27 | 2024-05-21 | Lacework, Inc. | User-specific data-driven network security |
US12095879B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Identifying encountered and unencountered conditions in software applications |
US12206696B1 (en) | 2017-11-27 | 2025-01-21 | Fortinet, Inc. | Detecting anomalies in a network environment |
US12130878B1 (en) | 2017-11-27 | 2024-10-29 | Fortinet, Inc. | Deduplication of monitored communications data in a cloud environment |
US12120140B2 (en) | 2017-11-27 | 2024-10-15 | Fortinet, Inc. | Detecting threats against computing resources based on user behavior changes |
US12126695B1 (en) | 2017-11-27 | 2024-10-22 | Fortinet, Inc. | Enhancing security of a cloud deployment based on learnings from other cloud deployments |
US12034754B2 (en) | 2017-11-27 | 2024-07-09 | Lacework, Inc. | Using static analysis for vulnerability detection |
US11973784B1 (en) | 2017-11-27 | 2024-04-30 | Lacework, Inc. | Natural language interface for an anomaly detection framework |
US12034750B1 (en) | 2017-11-27 | 2024-07-09 | Lacework Inc. | Tracking of user login sessions |
US12095796B1 (en) | 2017-11-27 | 2024-09-17 | Lacework, Inc. | Instruction-level threat assessment |
US12126643B1 (en) | 2017-11-27 | 2024-10-22 | Fortinet, Inc. | Leveraging generative artificial intelligence (‘AI’) for securing a monitored deployment |
US12021888B1 (en) | 2017-11-27 | 2024-06-25 | Lacework, Inc. | Cloud infrastructure entitlement management by a data platform |
EP3732572B1 (en) * | 2017-12-28 | 2024-06-05 | Microsoft Technology Licensing, LLC | Enhanced data aggregation techniques for anomaly detection and analysis |
US10764312B2 (en) | 2017-12-28 | 2020-09-01 | Microsoft Technology Licensing, Llc | Enhanced data aggregation techniques for anomaly detection and analysis |
US11599369B1 (en) | 2018-03-08 | 2023-03-07 | Palantir Technologies Inc. | Graphical user interface configuration system |
US10754822B1 (en) | 2018-04-18 | 2020-08-25 | Palantir Technologies Inc. | Systems and methods for ontology migration |
US10885021B1 (en) | 2018-05-02 | 2021-01-05 | Palantir Technologies Inc. | Interactive interpreter and graphical user interface |
US11860971B2 (en) | 2018-05-24 | 2024-01-02 | International Business Machines Corporation | Anomaly detection |
US11119630B1 (en) | 2018-06-19 | 2021-09-14 | Palantir Technologies Inc. | Artificial intelligence assisted evaluations and user interface for same |
US12147647B2 (en) | 2018-06-19 | 2024-11-19 | Palantir Technologies Inc. | Artificial intelligence assisted evaluations and user interface for same |
US10977569B2 (en) * | 2018-07-27 | 2021-04-13 | Vmware, Inc. | Visualization of anomalies in time series data |
US20200035001A1 (en) * | 2018-07-27 | 2020-01-30 | Vmware, Inc. | Visualization of anomalies in time series data |
US12160694B2 (en) * | 2018-10-01 | 2024-12-03 | Elemental Machines, Inc. | Method and apparatus for local sensing |
US20210345019A1 (en) * | 2018-10-01 | 2021-11-04 | Elemental Machines, Inc. | Method and Apparatus for Local Sensing |
US11005727B2 (en) * | 2019-07-25 | 2021-05-11 | Vmware, Inc. | Visual overlays for network insights |
US11271824B2 (en) | 2019-07-25 | 2022-03-08 | Vmware, Inc. | Visual overlays for network insights |
US11522770B2 (en) | 2019-07-25 | 2022-12-06 | Vmware, Inc. | Visual overlays for network insights |
US12032634B1 (en) | 2019-12-23 | 2024-07-09 | Lacework Inc. | Graph reclustering based on different clustering criteria |
CN111224823A (en) * | 2020-01-06 | 2020-06-02 | 杭州数群科技有限公司 | Method based on different network log analysis |
CN111291096A (en) * | 2020-03-03 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Data set construction method and device, storage medium and abnormal index detection method |
CN111414395A (en) * | 2020-03-27 | 2020-07-14 | 中国平安财产保险股份有限公司 | Data processing method, system and computer equipment |
CN111611519A (en) * | 2020-05-28 | 2020-09-01 | 上海观安信息技术股份有限公司 | Method and device for detecting personal abnormal behaviors |
US11403119B2 (en) * | 2020-06-21 | 2022-08-02 | Apple Inc. | Declaratively defined user interface timeline views |
US20220374251A1 (en) * | 2020-06-21 | 2022-11-24 | Apple Inc. | Declaratively defined user interface timeline views |
US11789755B2 (en) * | 2020-06-21 | 2023-10-17 | Apple Inc. | Declaratively defined user interface timeline views |
CN112667723A (en) * | 2020-12-30 | 2021-04-16 | 平安证券股份有限公司 | Data acquisition method and terminal equipment |
US11402979B1 (en) * | 2021-01-29 | 2022-08-02 | Splunk Inc. | Interactive expandable histogram timeline module for security flagged events |
US12106658B2 (en) * | 2022-09-28 | 2024-10-01 | Sumo Logic, Inc. | Alert response tool |
US20240105050A1 (en) * | 2022-09-28 | 2024-03-28 | Sumo Logic, Inc. | Alert response tool |
Also Published As
Publication number | Publication date |
---|---|
US20140012901A1 (en) | 2014-01-09 |
US8554699B2 (en) | 2013-10-08 |
US8682816B2 (en) | 2014-03-25 |
US20110119374A1 (en) | 2011-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8682816B2 (en) | | Method and system for detecting anomalies in time series data |
US8972332B2 (en) | | Method and system for detecting anomalies in web analytics data |
US8332775B2 (en) | | Adaptive user feedback window |
US10614077B2 (en) | | Computer system for automated assessment at scale of topic-specific social media impact |
US8972379B1 (en) | | Centralized web-based software solution for search engine optimization |
US9305105B2 (en) | | System and method for aggregating analytics data |
US8549019B2 (en) | | Dynamically generating aggregate tables |
US20230069403A1 (en) | | Method and system for generating ensemble demand forecasts |
EP2874064B1 (en) | | Adaptive metric collection, storage, and alert thresholds |
US8838560B2 (en) | | System and method for measuring the effectiveness of an on-line advertisement campaign |
US9400824B2 (en) | | Systems and methods for sorting data |
US11295324B2 (en) | | Method and system for generating disaggregated demand forecasts from ensemble demand forecasts |
US20210042338A1 (en) | | Systems and methods for analyzing computer input to provide next action |
US20110055250A1 (en) | | Method and system for generating and sharing dataset segmentation schemes |
US20200134642A1 (en) | | Method and system for validating ensemble demand forecasts |
JP2009518701A (en) | | Website visit data set comparison |
US20110055214A1 (en) | | Method and System for Pivoting a Multidimensional Dataset |
JP2013519941A (en) | | Method and system for e-commerce transaction data accounting |
US8489439B2 (en) | | Forecasting discovery costs based on complex and incomplete facts |
US10552996B2 (en) | | Systems and techniques for determining associations between multiple types of data in large data sets |
US20200151281A1 (en) | | Performing query-time attribution channel modeling |
US20200058037A1 (en) | | Reporting of media consumption metrics |
US8782162B1 (en) | | System for merging and comparing real-time analytics data with conventional analytics data |
US8060484B2 (en) | | Graphical user interface for data management |
US20060294220A1 (en) | | Diagnostics and resolution mining architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: GOOGLE INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: RUHL, JAN MATTHIAS; VAN DER MOLEN, DOUGLAS; MOON, HUI SOK; AND OTHERS; REEL/FRAME: 025716/0476. Effective date: 20110120 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
| | AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA. Free format text: CHANGE OF NAME; ASSIGNOR: GOOGLE INC.; REEL/FRAME: 044142/0357. Effective date: 20170929 |