[go: up one dir, main page]

Academia.eduAcademia.edu
International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 Extort Factual Information using Open Source Intelligence of Web Mining Balaganesh Lincoln University College,Malaysia baga_indian@yahoo.co.in Joshua Samual FTMS College, Malaysia Joshua@ftms.edu.my Abstract Open source intelligence of web mining is the extraction and analysis of directly or indirectly, publicly available information from various source using Information extraction techniques. OSINTWM operations support other intelligence, surveillance, and reconnaissance (ISR) efforts by providing foundational information that enhances collection and production. As part of a multidiscipline intelligence effort, the use and integration of OSINTWM ensures decision makers have the benefit of all available information .Two important terms in these complementary definitions are––Open Source, which is any person or group that provides information without the expectation Of privacy––the information, the relationship, or both is not protected against public disclosure. Publicly Available Information, which is data, facts, instructions, or other material published or broadcast for general public consumption; available on request to a member of the general[3]public: lawfully seen or heard by any casual observer; or made available at a meeting open to the general public[1]. Collecting webpage content and links can provide useful information about relationships between individuals and organizations. Properly focused, collecting and processing publicly available information from Internet sites can support understanding of the operational environment. 1. Internet Search Techniques The ability to search the Internet is an essential skill for open source research and collection personnel. The Internet provides access to webpages and databases that hold a wide range of information on current, planned, and potential operational environments[1] techniques and procedures for searching the Internet. STEP Plan Search Conduct Search Refine Search TECHNIQUES AND PROCEDURES • Determine operations and computer security risks and protective measures. • Use mission and specific information requirements to determine objective and search terms Plan • Write all search terms down. • Collaborate with librarians and other analysts to determine potential information sources. • Select the search tools and sources that will best satisfy the objective. (These may be on classified systems vice the Internet.) • Use approved hardware and software applications. • Use authorized government or commercial Internet service provider. • Search only for information for which the organization has an authorized and assigned mission in accordance with AR 381-10. • Based on requirements, software, and tools of the chosen search engine or resource. • Search conduct search using methods such as keyword searching, field searching, or Natural Language techniques. • Browse or scan results for relevancy, pertinence, associated terms, discovery of new concepts and terms to follow up on. and irrelevant terms to exclude in more refine searches. • Compare the relevancy of the results to objective and indicators. • Compare the accuracy of the results to search parameters (keywords, phrase, date or ISSN: 2289-2265 18 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 • Record Results • • • • • • • • • date range, language, format, etc). Compare the results from different search engines to identify missing or incomplete information (for example, one engine’s results include news articles but another engine does not). Modify the keywords. Search within results. Search by field. Search cached and archived pages. Truncate uniform resource locator. Record relevant source information—as a minimum, URL (location), date accessed, name and date of file of document title, and author or organizations. Save content. Download files. Identify Intellectual Property. Table 1: Searching Techniques and Procedure 2. Cyber Security The Internet is not a benign environment. There are operations and computer security risks to searching the Internet and interacting with Internet sites. Searching the Internet can compromise OPSEC by leaving footprints on visited sites. Visiting Internet sites can compromise computer security by exposing vulnerability or providing information that exposing the computer and the network to malicious software or unauthorized access. Users must be vigilant to potential threats; use only authorized hardware and software; and comply with OPSEC measures[1]. Awareness of what information the user's computer provides to each server and site on the Internet is the beginning of effective cyber security. Just by visiting a site, the computer transmits machine specifications such as operating system and type and version of each enabled software program, security levels, a history of sites visited during that session, cookie information, user preferences[1], communication protocol information such as an IP address (for the user and hosting or proxy server), enabled languages, and other computer profile information such as date and time (and time zone), referring URLs (the previous site visited), and more. Available on unprotected computers could be the email address of the user, login information, their certifications. In addition to computer vulnerabilities, just knowing where the research comes from may affect the page accessed. Sites frequently redirect visitors to alternate web pages (or totally block access) based on what user is searching for, where the user is located, what language the user is searching in, and what time of day the user accessed the site. Uniform resource locator information from the previous sited visited (referring URL) is frequently an OPSEC issue. It identifies some characteristics and interests of the user to the visited site, server, and country. While necessary for an effective search, the use of specific, focused, search terms such as locations, names, and equipment have obvious OPSEC implications[1]. 3. Plan Search Intelligence personnel use their understanding of the supported unit’s mission, the SIRs, indicators, and the Internet to plan, prepare, and execute their search. The SIR helps to determine what information to search for and where to look. The SIR provides the focus and initial keywords that intelligence personnel[1] ISSN: 2289-2265 19 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 4. Conduct Search Intelligence personnel should avoid the temptation of using one favorite search engine to the exclusion of others. Each search engine has its strengths and weaknesses. Organizational standards, research experience, and peer recommendations guide the selection of which search engine to use[1] in any particular Situation. Generally, a thorough search often requires the use more than one search engine and even then, the information may not be complete. As a rule, if a trained analyst or collector cannot find the information using multiple search engines and common search techniques within 30 minutes, it is possible that the information is not on the Internet, not indexed, or not in a retrievable format[1]. At that point, the analyst or collector should seek assistance from other personnel, digital but non-Internet resources such as commercial and in-house databases, and non-digital resources available at government or university libraries. 4.1 Search Engines Intelligence personnel use search engines and search terms to locate Internet sites and find information within the Internet site. Search engines allow the user to search for text and images in millions of web pages. The different commercial and government search engines vary in what they search, how they search, and how they display results. Most search engines use programs called web crawlers to build indexed databases. A web crawler searches Internet sites and files and saves the results in a database. The search engine, therefore, is actually searching an indexed database not the content of the site or an online database. The search results also vary between search engines because each engine uses different web crawlers and searches different sites. Most engines display search results in order of relevancy with a brief description and a hyperlink to the referenced Internet file or site[1]. 4.2 Web crawler Search engines have an index database built by a web crawler. The web crawler or spider is a different application than the search engine. The crawler is like some voracious monster with an insatiable appetite, it roams the Internet 24 hours a day, 7 days a week, searching for information. Once it finds a Website, it then indexes and saves it in a database relevant to the search engine. Some search engines have their own spiders while others use. Commercial contracted spider programs to develop their databases. In addition, each spider may use a different approach to acquiring data. One spider may be programmed to research only the titles of web pages and the first few lines of text. Other spiders research virtually the entire website with the exception of graphics or video files. Because search engines may use different web crawler software with different ways to index and save data, each separate search engine may yield different results. Also, the search engine provider can supplement or alter the spider software’s index to ensure the website of specific customers appears in the index. 4.3 Relevancy Formulas The relevancy formula evaluates how well the query results match the request. For web pages that are commercially oriented, designing the page to achieve the highest ranking has become an art form. For some search engines, the process is simple, the higher the bid, the higher the site’s ranking. Search engines are continually changing their relevancy. Formulas in order to try to stay ahead of web developers. Some web designers, however, load their sites with words like free, money or sex in an attempt to influence the search engine’s relevancy formula. Other web designers engage in practices called spamdexing ISSN: 2289-2265 20 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 or spoofing in an attempt to trick the search engine. The significance of the relevancy formulas to the user is the importance of understanding that the keyword in the search does not necessarily yield the same results with every search engine. This becomes obvious when the user considers that relevancy formulas vary from search engine to search engine and are in a constant state of evolution. In some formulas, the placement of the keywords yields different results if rearranged because the search engine’s relevancy formula places more emphasis on the first words in the search string. Relevancy formulas may also assume importance depending on the type of search being done. For instance, a field-search, which is limited to the webpage itself (for example, title, URL, and date) may be more critical than a full-text search. As search engines evolve, some engines have become adept at finding specific types of information such as statistical, financial, and news more effectively than other engines. To overcome this specialization, software engineers developed the meta search engine. The meta search engine allows the user to query more than one search engine at a time. On the surface this would seem to be the final answer to the search question; just query all search engines at one time. Unfortunately, it is not quite that easy. Since it must be designed to work with all search engines that it queries, the meta search engine must strip out each search parameter to the lowest common denominator of each search engine. For example, if a particular search engine cannot accommodate phrases in quotation marks or a type of Boolean function then the Meta search engine will eliminate that function from the search. The resulting search, in many instances, then becomes too broad and less useful than a well-formatted search using a search engine that the user is familiar with and that is known to be good at locating the type of information required. With an understanding how search engines work, intelligence personnel conduct an initial search using unique key words or key word combinations and, if possible, multiple search engines. Avoid using one search engine to the exclusion of others. Evaluate the relevance and accuracy of the search results to research objectives, indicators, and search parameters. Do not rely on the relevancy formula of the search engine, particularly commercial search engines, to list the most relevant information source at the top of the list. Conduct follow-on searches using refined terms and methods. Refining terms includes inverting the word order, changing the case, searching common misspellings, correcting spelling, and adjusting search terms. Refining search methods includes searching within results that are similar to the desired information. 4.4 Search by Keyword In keyword-based searches, the intelligence personnel should consider what keywords are unique to the information being sought. The analyst or collector needs to determine enough keywords to yield relevant results but not so many as to overwhelm them with a mixture of relevant and irrelevant information. They should also avoid common words such as a, an, and, and the unless these words are part of the title of a book or article. Most search engines ignore common words. For example, if looking for information about Russian and Chinese tank sales to Iraq, the analyst or collector should not use tank as the only keyword in the search. Instead, they should use additional defining words such as Russian Chinese tank sales Iraq . In some search engines, Boolean and Math logic operators help the analyst or collector establish relationships between keywords that improve the search. (for example, Russian tank). If they want to exclude Chinese tank sales from the search result then he uses (Russian tank) NOT (Chinese tank) sale Iraq in the search. The analyst or collector can also use a NEAR ISSN: 2289-2265 21 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 search when the relationship and the distance between the terms are well established. For example, if the analyst or collector is looking for incidents of earthquakes in Pakistan and news articles normally place the place name of the location of an attack within five words of earthquake in the title of body of the article then they use earthquake NEAR/5 Pakistan in the search. FUNCTIONS Must be present BOOLEAN ANDNOT Must not be present NOT May be present Complete Phrase Nested Near Wildcards Stop words OR EXAMPLES Chemical AND Weapon Chemical + Weapon Africa NOT Sudan Africa* Sudan Chemical OR Biological Sales to INDIA (Shining path) White house near airspace Gun*(Gunpowder , Gun sight) OR(Do not ignore OR) () Near Word or Word Table 2: Boolean and math logic operators 4.5 Search in Natural-Language An alternative to using a keyword search is the natural language question format. Most of the major search engines allow this capability. The analyst or collector obtains the best results when the question contains good keywords. One of the major downsides to this technique is the large number of results. If the needed information is not found in the first few pages then they should initiate a new search using different parameters. 4.6 Refine Search Normally, the first few pages of search results are the most relevant. Based on these pages, the analyst or collector evaluates the initial and follow-on search to determine if the results satisfied the objective or requires additional searches. During evaluation, they compare Relevancy of the results to the objective and indicators.Accuracy of the results to search parameters (keywords, phrase, date or date range, language, format).Results from different search engines to identify missing or incomplete information (for example, one engine’s results include news articles but another engine does not). 4.6.1 Modify the Keyword If initial search attempts are unsatisfactory, the analyst or collector can refine the search by changing the following: 4.6.2 Order Search engines may place a higher value or more weight on the first word or words in a multiple word or phrase search string. Changing the word order from insurgents Iraq to Iraq insurgents may yield different search results. 4.6.3 Spelling or Grammar Search engines attempt to match the exact spelling of the words in the search string. There are search engines that do recognize alternate spellings or prompt the user to correct common misspellings. Changing the spelling of a word from the American-English center to ISSN: 2289-2265 22 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 the British-English centre may yield different results. Changing the spelling of a transliterated name from Al-Qaeda to al-Qaida, al-Qa'ida, el-Qaida, or al Qaeda generates different results that may be useful depending upon the objective of the search. Some search engines provide this capability for a sounds like-type search that eliminates or reduces the manual entry of each variation. Looking for common misspellings or common, grammatically incorrect, short phrases may be useful in yielding results from a source for which English is a second language or the language of the webpage is in a second language for the web designer or web contributor. 4.6.4 Case Search engines may or may not support case sensitive searches. Like spelling, some engines attempt to match the word exactly as entered in the search. The intelligence personnel should use all lowercase letters for most searches. When looking for a person’s name, a Geographical location, a title, or other normally capitalized words then the intelligence personnel should use a case sensitive search engine. changing the case of a word from java to JAVA changes the search result from sites about coffee to sites about a software program. 4.6.5 Variants Intelligence personnel use terms that are common to their language, culture, or geographic area. Using variants of the keyword such as changing policeman to cop, bobby, gendarme, carabiniere, policía, politzei, or other form may improve search results. 4.7 Search within Results If the initial or follow-on search produces good but still unsatisfactory results, the analyst or collector can search within these results to drill down to the web pages that have a higher probability of matching the search string and containing the desired information. Most of the popular search engines make this easy by displaying an option such as search within these results or similar pages that the user can select. Selecting the option takes the analyst or collector to web pages with additional, related information. 4.8 Search by Field In a field search, the analyst or collector looks for the keywords within the URL as opposed to searching the entire Internet. The best time to use a field search is when the search engine returned a large number of web pages. While capabilities vary by search engine, some of the common field search operators are: i. Anchor: Searches for webpages with a specified hyperlink. ii. Domain: Searches for specific domains Like: Searches for webpages similar or related in some way to specified URL. iii. Link: Searches for specific hyperlink embedded in a webpage. iv. Text: Searches for specific text in the body of the webpage. v. URL: Searches for specific text in complete Web addresses. With the millions of URLs on the web, the analyst or collector is faced with a myriad of sites that may or may not actually be produced and maintained by the type of organization represented by the majority of web pages in that domain (see Table.3). Certain domains, such as .mil, .edu, and .gov are consistently reliable as being administered and authored by those organizations. Several domains have, over the years of ever-increasing numbers of Internet participants, become highly suspect as to the validity of the organization using such a domain extension. In particular, the open source information gatherer must not take .org, .info, or ISSN: 2289-2265 23 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 .net extensions as necessarily produced by a bona fide organization for that domain. Each country has a two-digit digraph and registers domains with Internet Assigned Number Authority (http://www.iana.org). The country digraph is important because it indicates the site in another country. For example, .uk (United Kingdom) has more than a billion sites indexed; .cn (China) has 700 million sites; .fr (France) has more than 600 million sites, while the domains you listed have fewer than a million sites each (for instance, .aero, .jobs, .museum, .pro). 4.9 Search in Cache and Archive Sometimes a search or an attempt to search with results returns a URL that matches exactly the search objective but when the analyst or collector tries to link to the site, the link or the site is no longer active. If the search engine captures data as well as the URL locator, they can select cached link to access the original data. Another technique is search in an Internet archive site such as www.archive.org for the content. The analyst or collector needs to be aware that this information is historical and not subject to update by the original creators. 4.10 Truncate the Uniform Resource Locator In addition to using the search engine to search within results, the analyst or collector can also manually search within the results by truncating the URL to a webpage. The analyst or collector works backward from the original search result to the webpage or homepage containing the desired information or database by deleting the end segments of the URL at the / forward slash. This technique requires a basic understanding of how webpage designers structure their webpage. Domain .Aero Description Reserved for members of the air-transport industry Restricted to business Unrestricted top level domain intended for commercial content Reserved for cooperative associations Postsecondary institution accredited by an agency on the US Department of educations Operator/Sponsor Social international telecommunication aeronautics New level incorporated Global Registry service US general service administration Afilias Limited Internet assigned number authority .job Reserved exclusively for the us government Un restricted top level domain Used only for registering organization established by international treaties between governments Reserved for human reserved managers .Mil Reserved exclusively for the US military Museum Reserved for museum Biz .Com .coop .edu .gov .info .int .name .net .org Limited liability company Educues Dot co-operational limited liabilities companies US department of defense network information center Museum domain management association Global name Registry Public interested registry Public interested registry Reserved for individual network Indented noncommercial use but open to all communities .pro Professional and related entities Registry pro Source: Internet assigned number authority at http://www.iana.org Table 3: Domain Detail ISSN: 2289-2265 24 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 5. Record Results Intelligence personnel must save the search results that satisfy the research objective. Saving the results enables the analyst or collector to locate the information later as well as to properly cite the source of the information in intelligence reports and databases. While printing a hardcopy is an option, a softcopy (electronic) record of the search results provides a more portable and versatile record. Also, some intelligence organizations have software tools specifically designed for creating a complete record of the webpage content and metadata. The following are some basic techniques for saving an electronic record of the search results. 5.1 Bookmark Bookmark the link to the webpage using the bookmarks or favorites option on the Internet browser. 5.2 Save Content Save all or a portion of the webpage content by copying and pasting the information in text document or other electronic format such as a field within a database form. The naming convention for the softcopy record should be consistent with unit electronic file management standards. As a minimum, the record should include the URL and retrieval date within the file. 5.3 Download Files Download audio, image, text, video, and other files to the workstation. The naming convention for the softcopy record should be consistent with unit electronic file management standards. 5.4 Save Webpage Save the webpage as .mht, .pdf, .doc, .html––or other specified format––that creates a complete, stable record of the webpage content. It may be necessary to include the date and time in the file name in order to ensure a complete citation for the information. 5.5 Record Source As a minimum, record the author or organization, title, publication or posting date, retrieval date, and URL locator of the information in a citation format that is consistent with the American Psychological Association and Modern Language Association style manuals. The following is an example of a American Psychological Association citation for an Internet document: BBC News (2005). Sudan: A Nation Divided. Retrieved 16 May 2005 from http://news.bbc.co.uk/1/hi/in_depth/africa/2004/sudan/default.stm 5.6 Identify Intellectual Property Identify intellectual property that an author or an organization has copyrighted, licensed, patented, trademarked, or otherwise taken to preserve their rights to the material. Some web pages list the points of contact and terms of use information at the bottom of the site’s homepage. When uncertain, intelligence personnel should contact their supporting Judge Advocate General office before publishing information containing copyrighted or similarly protected intellectual property. ISSN: 2289-2265 25 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 Open source research, coupled with an understanding of the COE, is the basis for an operational environment assessment. An operational environment assessment is a technique designed to apply the COE variables to a specific region, nation states, or non-state actors. It encompasses all the conditions, circumstances, and influences that affect the employment of military forces and the decisions of the unit commander. The operational environment assessment consists of a detailed examination and analysis of the eleven critical variables of the COE, their interaction and reciprocal relationships. Based on this analysis, the operational environment assessment identifies trends and issues with which units may have to grapple during their planning, preparation for, and execution of operations. As an unclassified document in whole the operational environment assessment also serves as a useful tool for individual and collective training during preparation for operations in a specific area. Every operational environment is complex, dynamic, and multi-dimensional. An operational environment assessment provides a detailed look at a specific operational environment in terms of the eleven critical variables and their impacts. It identifies the critical relationships between the variables in the operational environment, how they affect one another, and how this affects military operations. Some variables are dependent variables, whose value is determined by that of one or more other variables. For each dependent variable, the assessment identifies the most significant independent variables that are linked to it and shows their impact on the dependent variable under investigation. To understand any operational environment, one needs to study and understand the synergy and interaction of variables and their reciprocal influence on one another. Within the analysis by variables, the operational environment assessment identifies key actors (nation-state and non-state) and assesses their impact on the operational environment. This analysis of variables and actors helps to identify relevant trends and issues in the operational environment over time. Given the dynamic and fluid nature of the operational environment under investigation, an operational environment assessment requires continuous updates and additions in order to remain current and relevant. 6. Critical Variables Open source research must address the eleven critical variables that describe the conditions in the potential operational environment. Collectively, these variables provide a complete framework for thoroughly assessing and understanding the complex and everchanging combination of conditions, circumstances, and influences that affect military operations in any given real-world operational environment. While these variables can be useful in describing the overall (strategic) environment, they are most useful in defining the nature of specific operational environments. The variables do not exist in isolation from one another. The linkages of the variables cause the complex and often simultaneous dilemmas that a military force might face. Only by studying and understanding these variables—and their dynamic and complex combinations and interactions—will the US Army operational and tactical forces be able to keep adversaries from using them against them or to find ways to use them to its own advantage. The eleven critical variables shown in Figure 1 are discussed below. ISSN: 2289-2265 26 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 6.1 Physical Environment The physical environment defines the physical circumstances and conditions surrounding and influencing military forces and the execution of operations. The defining factors include urban settings and other complex terrain, all relevant infrastructures, weather, topography, hydrology, and environmental concerns. Figure 1: Critical variables of the operational environment 6.2 Nature and Stability of the State (or Other Critical Actors) It is important to understand the nature and stability of the state or states with which or in which military operations take place. This variable, however, refers to the internal cohesiveness of the various political actors (nation states as well as non-state actors) with respect to the population, economic infrastructures, political processes, military and/or paramilitary forces, authority, goals, and agendas. 6.3 Sociological Demographics Sociological demographics refer to the traits and trends that have an impact on the human population of a particular group, area, country, or region. This includes its cultural, religious, and ethnic makeup. 6.4 Regional and Global Relationships Regional and global relationships include political, economic, military, or cultural mergers and partnerships. An actor’s membership in or allegiance to such a relationship can determine its actions in terms of support, motivation, and alliance construct. ISSN: 2289-2265 27 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 6.5 Military (or Paramilitary) Capabilities The military or paramilitary capabilities of various actors in the operational environment are a key concern. This variable includes such factors as equipment, manpower, training levels, resource constraints, and leadership issues. The military variable interacts with the other variables, and all the other variables can affect military and paramilitary capabilities. 6.7 Technology Technology represents the level or sophistication of technologies an actor could bring to the operational environment. Their level of integration and exploitation, and any niche technologies are important. 6.8 Information Information involves the access, use, manipulation, distribution, and reliance on information technology systems, both civilian and military, by a nation-state or non-state entity. Information technology is the systems or mechanisms for preserving or transmitting information. 6.9 External Organizations External organizations refer to those entities found in an operational environment, which come from outside the confines of that specific operational environment but could impact the battlefield and related battle space. Such impact could be both positive and negative in nature—at the strategic, operational, and tactical levels and across the entire spectrum of conflict. An understanding of each group’s varying and dynamic agendas, media philosophies, and international connections can be critical to the success of any military endeavor. 6.10 National Will or Actors’ Will Will encompasses a unification of values, morals, and effort between the population, the leadership or government, and the military or paramilitary forces. Through this unity, all parties are willing to sacrifice individually for the achievement of the unified goal. The interaction of military actions and political judgments, conditioned. by national will, further defines and limits the achievable objectives of a conflict, thereby determining its duration and conditions of termination. It is imperative to study not just the national will of the state actors but also the will of the non-state actors (such as ethnic groups, political groups, insurgents, terrorist groups, and criminal organizations) involved in the operational environment. The will of non-state actors often affects the environment more significantly. 6.11 Time The time available for commanders to accomplish missions is determined by the goals and associated milestones established by the national political leadership. It is within this timeframe that all the elements of power—diplomatic, informational, military, and economic—must operate to achieve national objectives. How much time is available and how long events might take will affect every aspect of military planning, to include force package development, force flow rate, quality of intelligence preparation of the AO, and the need for forward-deployed forces and logistics. Time is often in favor of actors other than the US and itsfriends and multinational or coalition partners. Such actors often can afford to prolong the conflict and try to outlast the US will to continue operations in a particular operational environment. ISSN: 2289-2265 28 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 6.12 Economics There may be significant differences among nation-states, organizations, or groups, regarding how they produce, distribute, and consume goods and services. Being able to affect another actor, positively or negatively, through economic rather than military means may become the key to regional hegemonic status or dominance. Economic deprivation is also a major cause of conflict. One actor may have economic superiority over another for many reasons, including access to natural resources or energy. Control of and access to natural or strategic resources can cause conflict. Military personnel operating in this complex environment may need to look beyond political rhetoric to discover a fundamental economic disparity among groups. Variables are fluctuating factors or elements that make up an operational environment. When operational , they define the conditions, circumstances, and influences that affect the employment of military forces and influence the options and decisions of the commander. The starting point for understanding the operational environment are those critical variables that reside in all operational environments and have the greatest impact on the military. See FM 7100 for detailed information on the critical variables and operational environments. An operational environment assessment provides a methodology for examining and understanding any potential operational environment. In effect, this assessment is an application of the COE concept to the specific operational environment under investigation. The methodology involves the following steps: 7. Define Variables Defining a variable simply means describing the nature and composition of each of the eleven variables in a specific operational environment. To help focus and facilitate the research effort, it is necessary for analysts to break down each variable into its subcategories of information (main topics and subtopics). This topic outline defines the scope and focus of the variable and serves as a guide for research on the variable in question. All eleven variables are present in all operational environments, but different operational environments typically will require different outlines within the variables. For example, a landlocked operational environment will not require discussion of coastlines or ports under the Physical Environment variable or of naval forces under Military Capabilities. As analysts begin to populate the outline with information gleaned from their research, further refinements and additions to the headings and subheadings may be necessary.Because of the interrelated and sometimes overlapping nature of the individual variables, some subsets of available information may have a place under more than one variable. For instance, information technology might be addressed under both Information and technology and could have an impact on several other variables. Analysts may determine links between the subcategories of one variable and the same, similar, or related subcategories that may exist under other variables. Thus, it may be necessary for one variable description to repeat some information contained under another variable or to cross-reference or provide an electronic link to it. Such linkages may be obvious when constructing the original topic outlines for the variables, or may become evident later—when analysts are populating the outlines with information. ISSN: 2289-2265 29 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 7.1 Populate the Variable Outlines Analysts then conduct extensive research to populate the outline for each variable.This can be done with relevant information gleaned from all available open sources. The various sources can include official government documents, think-tank products, academic journals, open source periodicals, foreign press, websites, and interviews or discussions with various subject matter experts. Each variable is described as it applies to the specific operational environment in question. This step is an ongoing process, involving continuous updates as new or better information becomes available or when conditions change. Much of the most useful information about the potential operational environment may be available from open sources. An operational environment assessment can reveal key areas where information gaps exist. These gaps may become PIRs during the planning and the execution of operations––to be targeted by further open source research or perhaps by scarcer, more sensitive intelligence means. The operational environment assessment and underlying data form the basis of the GMI database and continuation of the operational environment assessment process when the unit deploys to the AO—possibly layered with additional information from classified sources. 7.2 Analyze Relationships, Linkages, and Trends The next step highlights the key cause-and effect relationships and linkages among the variables.In any operational environment assessment, the key to understanding the significance of thevariables is to understand the relationships among the variables and how these affectmilitary operations in the selected operational environment. Therefore, analysts can develop a matrix for each dependent variable that shows its critical relationships to other variables. This makes it possible to analyze each dependent variable from the perspective of its relationship or connectivity to other, independent variables. From this relational analysis, critical trends and issues become more evident. Clearly it is impossible to show every potential linkage and trend. Analysts should, however, identify and examine the most significant independent variables (linked to the dependent variable) to show their specific relationships to and impacts on the dependent variable under investigation. For instance, the variable of military capabilities is dependent on or influenced by virtually all the other variables. 7.3 Dependent and Independent variables Identify Key Facts and Impacts. Finally, analysts identify and highlight key facts and potential operational impacts for each variable. From the definition of variables and relationship analysis, analysts can attempt to identify trends over time. This trends analysis can provide an understanding of the dynamics of the variables and their impacts in a selected operational environment. Analysis can also identify possible trigger events in the operational environment based on relationships of variables across time. ISSN: 2289-2265 30 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 Sourse:Https://desint-threats Leavenworth army mil/default aspx Figure 2: Example of possible links between Analysis Dependent and Independent variables 8. ANALYZE INFORMATION During analysis, intelligence personnel use a variety of analysis techniques to discern facts, indicators, patterns, and trends in information and relationships between variables. The techniques apply inductive or deductive reasoning to understand the meaning of past events and predict future actions. Each technique is based on facts, observations, or assumptions about the operational environment. Intelligence personnel are mindful of injecting US or US military cultural bias into their analysis, particularly their assumptions Structure of informations systems Docments Local Text analysia 1. Lexical analysis 2. Name recognition 3. Partial syntatic analysis 4. Scenario pattern matching Discover analysis 1. Conference Analysis 2. Inference Template genaration Extracted templates Table 4: Information extraction 9. ASSOCIATION MATRIX Intelligence analysts use the association matrix to establish known or suspected associations between individuals. Direct connections include, for example, face-to-face meetings or confirmed telephonic conversations. Figure 4 provides a one-dimensional view of the relationships and tends to focus on the immediate AO. Analysts can use association matrixes to identify those personalities and associations needing a more in-depth analysis in order to determine the degree of relationship, contacts, or knowledge between the individuals. ISSN: 2289-2265 31 International Journal of Information System and Engineering (IJISE) Volume 1, Issue 1, September 2013 The structure of the threat organization is formed as connections between personalities are made. Figure 3: Example of an association matrix 10. Conclusion Investigators can now find answers to most of their elementary information needs using OSINT from professional software products designed and built for that purpose. These support the complete intelligence lifecycle and user workflows based on established methodologies. They utilize core technology engines in the areas of information harvesting, data fusion, text analysis, link visualization, rules-based alerts and reporting in order to provide today’s intelligence analyst a rich investigation environment and intuitive user experience. At the same time, these provide an excellent return on investment as users can generate critical intelligence with minimal effort. References [1] [2] [3] [4] [5] [6] Department of the Army Information Security Program. 9 September 2000. Available online from Army Publishing Directorate at http://www.army.mil/usapa/epubs/index.html. Productions Requirements and Threat Intelligence Support to the US Army. 28 June 2000. Available online from Army Publishing Directorate at http://www.army.mil/usapa/epubs/index.html. Public Affairs Tactics, Techniques, and Procedures. 1 October 2000. Available online from Army Publishing Directorate at http://www.army.mil/usapa/doctrine/index.html Army Planning and Orders Production. 20 January 2005. Available online from Army Publishing Directorate at http://www.army.mil/usapa/doctrine/index.html The Operations Process. 31 March 2006. Available online from Army Publishing Directorate at http://www.army.mil/usapa/doctrine/index.html Opposing Force Doctrinal Framework and Strategy. 1 May 2003. FM 34-3. Intelligence Analysis. 15 March 1990. FM 34-130. Intelligence Preparation of the Battlefield. 8 July 1994. FM 34-3-61.1. Public Affairs Tactics, Techniques, and Procedures. 1 October 2000. Available online from Army Publishing Directorate ISSN: 2289-2265 32