Interdisciplinary Journal of Information, Knowledge, and Management, Volume 5, 2010

Egocentric Database Operations for Social and Economic Network Analysis

Adio Akinwale, Adebayo Adekoya, and Adebukola Onashoga
Department of Computer Science, University of Agriculture, Abeokuta, Nigeria
atakinwale@yahoo.com; adekoyaaf@unaab.edu.ng; bookyy2k@yahoo.com

Abstract

Recent research development on egocentric social network analysis focuses on the causes and consequences of relationships among sets of people rather than on the features, attribute data of individuals, and their basic database operations. This work describes a record type of network data model that would be suitable to perform database operations in social network analysis (SNA). It describes two schemas that would be needed for web space interface and database operations. A force directed algorithm was used to display membership vertices that show link membership number, name, and pictures. A breadth first search algorithm was then embedded in the algorithm to avoid endless searching in cycled nodes. Measurement of degree of centrality, betweenness, and closeness was incorporated to determine the relationship among the nodes and their linkages. Implementation of database operations into SNA reveals not only social information but also critical economic reports of the individuals. Observation of the work showed that users can impose any relevant query related to the defined schemas and the results would be available to monitor both social and economic activities of the memberships.

Keywords: Egocentric, database operations, record types, schemas, measurement

Introduction

A social network is basically a set of actors and relations that hold these actors together. Actors can be individuals (egos), organizations, departments, or families. Actors exchange information and knowledge among themselves.
These kinds of information exchanges are considered a social network relation where individuals who maintain the relation are said to maintain a tie. The simplest form of a social network consists of actors or events and their connections to each other. The use of a social network depends on the availability of relational schemas rather than the attribute data. It further allows the mapping of relationships among people that can be used to identify knowledge and information flow, such as who people seek assistance from during economic hardship or share information with. By using social network analysis, users can be aware of how information flows through social ties and how people acquire information.

Egocentric social network analysis focuses on the network of individuals where each person has his or her own network of relationships that may traverse many different groups. These relationships contribute to an individual’s behaviour and attitudes. The strength of egocentric analysis is its ability to capture the diversity of a social environment. The strength may range from weak to strong, depending on the number and types of social activities and resources.

Material published as part of this publication, either on-line or in print, is copyrighted by the Informing Science Institute. Permission to make digital or paper copy of part or all of these works for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage AND that copies 1) bear this notice in full and 2) give the full citation on the first page. It is permissible to abstract these works so long as credit is given. To copy in all other cases or to republish or to post on a server or to redistribute to lists requires specific permission and payment of a fee. Contact Publisher@InformingScience.org to request redistribution permission.

Editor: Eli Cohen
In egocentric studies, participants are asked for a list of their network members (alters). Researchers are then interested in constructing a network that is composed of each individual and his or her links to the alters. Each alter links to other alters. It is difficult to survey each of the alters when there are many links among alters and egos, so researchers rely on participants to report their relationships to their alters (Coleman, Kartz, & Menzel, 1987). Researchers are usually interested in the existence of a tie and possibly its strength, and they avoid questions about asymmetric ties because these are unlikely to be well understood by respondents. For example, respondents are not likely to know if two of their co-workers know each other unequally. In addition, consider a situation where there are cooperative societies trying to understand the way each member borrows and pays back loans granted.

Literature Review on Egocentric

Graphs and matrices are normally used to present social networks and to quantify important structural properties. Graphs are instruments of reasoning about quantitative information. Often the most effective way to describe and summarise a set of numbers, even a large set, is to look at pictures of those numbers presented in a graph. Graphic presentation such as set partition can not only describe data in different ways but can also facilitate the comparison among sets of data, stimulate scientific innovation, and even encourage theoretical insights. All work done so far on egocentric social networks was based on visualization of every member. Information visualization of an egocentric social network consists of an appropriate transformation of input data to output graphics (Chalmers & Chitson, 1992).
Accordingly, it can be argued that a visualization method is acceptable only if it clearly identifies the relevant information of ego in relation to alter and defines an appropriate mapping to generate their connections. Effective visual attributes for information coding are position, size, shape, colour, motion, etc. Light (2001) believed visualization should be a means of involving the user in a search for results rather than a way of presenting results to the users. Nardi et al. (2002) used spatial grouping and colour to redundantly code various community groups within a visualization of a user’s e-mail contacts. TouchGraph (http://www.touchgraph.com/navigator) used a force directed layout to present a network visualization of users by allowing personal networks to be expanded or contracted by the users’ interactions. Boyd (2004) analyzed users’ instant messaging buddy lists to present a static network visualization of their contacts. Viégas, Boyd, Nguyen, Potter, and Donath (2004) visualized personal e-mail archives in a zoomable network view and included temporal filtering to visualize contacts at various time periods. They found their work to be particularly well suited as a memory aid for past contexts and contacts. The work was designed to be an on-line dating site with profiles, demographic and interest driven search together with a private messaging system. In this system, people regularly update their profiles, add new photos, change tests dependent on their friends, and write testimonials for their friends in order to encourage reciprocity.

In 2005, Louis Tomsic stated that Stanley Milgram in the 1960s developed small world studies that represented an interesting and less conventional use of egocentric network data.
In the studies, respondents in various parts of the United States of America were told the name, occupation, and city of residence of some target people and were asked to mail a packet of papers to a particular person if and only if they knew the target person. Otherwise, they were asked to send the packet to someone whom they did know and who they believed had a chance of knowing the person. Tracking the path of the packets provides information on how people know each other and on the average number of links between pairs of randomly chosen people in a large society like the United States of America.

Currently, there are many comprehensive tools developed for egocentric analysis. Among them are UCINET (http://www.analytictech.com), Pajek, Agna, NetDraw, NetMiner, MultiNet, and Stocnet. Developers of these tools believe that human eyes are powerful network analytic tools. These tools help analysts to understand the structure of information in egocentric networks and statistical models of social networks. In practice, many egocentric network developers have designed and implemented egocentric systems but failed to include database operations.

Problem Statements

Researchers in egocentric social network analysis only investigate the existence of a tie between ego and alter and possibly its strength. Apart from adding new members or searching for existing ones in a database, they avoid basic database operations such as selection, projection, join, or deletion of a member, although deletion is needed because a respondent may leave the network. Other database operations may include arithmetic operations, union or intersection of member profiles for pattern matching, and measurement of the value of the members in the network. Work done so far on this subject has devoted much effort to social network analysis rather than economic network analysis.
This work takes a more instructive view on egocentric economic network analysis by examining economic activities of ego in relation to alters and associated interconnections with other alters in another economic environment. It would capture economic information of egos and alters in a single database while social information would be on web space. Through social network analysis, the work would measure the value of economic and social relationships among members. If the value falls within a certain range, the social link would be coloured for effective monitoring. Query operations would be performed on the attribute data of the members.

Measurement of Social Network Analysis

Social Network Analysis (SNA) normally uses data mining techniques to mine a large volume of association data and discover mutual relationships among respondents. Centrality is one of the most studied concepts in SNA and is normally used to identify key members who play important roles in a network. According to Ab Raub et al. (2007), centrality measures can be categorized into three: degree, closeness, and betweenness.

Degree Centrality

The out-degree centrality is defined as follows:

\[ C_D(n_i) = d(n_i) = \sum_{j=1}^{g} x_{ij} \]

where $x_{ij} = 1$ if there is a link between nodes $i$ and $j$, and $x_{ij} = 0$ if there is no such link.

To measure centralisation in a network, there is a need to observe the difference between the centrality value of the most central node and the values of all the other nodes. Then, to arrive at the centralisation value, we calculate the ratio of the sum of actual differences to the sum of the maximum possible differences. Centralisation is thus defined as:

\[ r = \frac{\sum_{i=1}^{g} \left[ \max(D_i) - D_i \right]}{(g - 1)(g - 2)} \]

where $D_i$ is the number of people in the network that are directly linked to person $i$. The number of respondents (actors) is represented by $g$ in this equation.

Closeness Centrality

A node that lies at a short distance from many other nodes is considered to have high closeness.
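Before the closeness formula, the degree-based measures defined above can be checked on a small example. The following is a minimal Python sketch, not the authors' implementation; the function names and the two toy networks (a star and a ring of five members) are assumptions made purely for illustration.

```python
def degree_centrality(adj):
    """Return D_i = sum_j x_ij for each node of a 0/1 adjacency matrix."""
    return [sum(row) for row in adj]


def degree_centralisation(adj):
    """Return r = sum_i [max(D) - D_i] / ((g - 1)(g - 2))."""
    g = len(adj)
    if g < 3:
        raise ValueError("centralisation needs at least three nodes")
    degrees = degree_centrality(adj)
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((g - 1) * (g - 2))


# Hypothetical star network: node 0 (the ego) is tied to alters 1..4.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
# Hypothetical ring network: every member is tied to two neighbours.
ring = [
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
]

print(degree_centrality(star))      # [4, 1, 1, 1, 1]
print(degree_centralisation(star))  # 1.0
print(degree_centralisation(ring))  # 0.0
```

The star gives the maximum centralisation of 1 because one member holds every tie, while the ring gives 0 because all members have the same degree.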
Closeness centrality is calculated as follows:

\[ C_c(n_i) = \frac{J_i / (g - 1)}{\left[ \sum_{j=1}^{g} d(n_i, n_j) \right] / J_i} \]

where $n_i$ stands for node $i$, $d(n_i, n_j)$ is the length of the shortest path from node $i$ to node $j$, $g$ is the total number of nodes in the graph, and $J_i$ is the total number of actors who are reachable from $n_i$.

Betweenness Centrality

Betweenness centrality measures the extent to which a particular node lies in between the other nodes of the network. It is based on the number of shortest paths that pass through a given node. The betweenness centrality of node $i$, denoted as $b(i)$, is obtained as follows:

\[ b(i) = \sum_{j,k} \frac{g_{jik}}{g_{jk}} \]

where $g_{jk}$ is the number of shortest paths from node $j$ to node $k$ and $g_{jik}$ is the number of those paths that pass through node $i$ ($j, k \neq i$).

Another measurement of SNA is network density. It represents the actual number of ties in a network as a ratio of the maximum number of ties that are possible among all the nodes of the network. A fully dense network has a network density value of 1, which indicates that all nodes are connected to each other. A density value near 0 indicates a sparsely knit network. For an undirected graph, as is the case in SNA, network density is calculated as follows:

\[ D = \frac{2 N_i}{N (N - 1)} \]

where $N_i$ is the number of ties and $N$ is the number of nodes.

Society Network Data Model

A society network data model can contain 1:1, 1:N, or M:N relationships. Figure 1 illustrates the link relationship between the record types ego and alter. In general, each ego may relate to many alters and each alter may be related to many egos. The fact that alters are related to both ego1 and ego2, as shown in Figure 1, makes this an N:M relationship. An N:M relationship can be represented without using N:M links between record types. Instead, an intermediate record type and two 1:N links can be used. In this way a functional mapping is maintained. The introduction of the intermediate record type is not always artificial.
For instance, the intermediate record type can be used to contain data common to both egos and alters. For database operations, the network database is a set of record types and links defined as follows:
- there is a set of named links connecting the record types in the data structure of both ego and alter
- every link is functional in at least one direction (partial functionality is allowed): ego1 → alter1, that is, ego1 functionally determines alter1, or alter1 is functionally dependent on ego1
- there is no link of the form ego_i → ego_i, that is, from a record type to itself (reflexive)

[Figure 1 shows record types ego1, ego2, …, egon linked to record types alter1, alter2, alter3, alter4, …, alterm.]

Figure 1: Relationship between record types of ego and alter

In addition, all logical units of data are declared in a schema and subschema. This type of set membership of records is optional and automatic; an optional member can join and leave the network society at any time while an automatic member is automatically assigned to other existing members (Lightstone et al., 2007). In the system, a relational identification number or text is used as the database key. In addition, a record can have an independent existence apart from the links among record types. That is, a record does not have to be connected to another record even though a link is defined between the two record types.

Deletion of a record in an egocentric network model does not necessarily trigger the deletion of all records connected to it. Just as a record is explicitly connected to another record, an explicit step is needed to disconnect a record from another record. Therefore, it is possible to remove all of the connections and then delete the records. It is also possible to have the removal of certain connections implied by the deletion operations or to trigger the deletion of certain records connected to the deleted record.
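The record types, the intermediate record type, and the deletion rules described above can be illustrated with a small in-memory model. This is a hedged Python sketch rather than the authors' system: the class `NetworkDB`, its method names, and the `cascade` flag are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkDB:
    """Toy store: ego and alter records plus intermediate tie records."""
    egos: dict = field(default_factory=dict)    # ego key -> attribute data
    alters: dict = field(default_factory=dict)  # alter key -> attribute data
    ties: dict = field(default_factory=dict)    # (ego, alter) -> shared data

    def connect(self, ego, alter, **shared):
        # Each tie record sits at the "many" end of two 1:N links
        # (ego -> tie and alter -> tie), which together model N:M.
        if ego not in self.egos or alter not in self.alters:
            raise KeyError("both records must exist before connecting them")
        self.ties[(ego, alter)] = shared

    def disconnect(self, ego, alter):
        # Removing a connection leaves both records in place.
        del self.ties[(ego, alter)]

    def alters_of(self, ego):
        return sorted(a for (e, a) in self.ties if e == ego)

    def delete_ego(self, ego, cascade=False):
        # Connections of the deleted ego are always removed. With
        # cascade=True, alters left without any tie are deleted too.
        candidates = self.alters_of(ego)
        self.ties = {k: v for k, v in self.ties.items() if k[0] != ego}
        del self.egos[ego]
        if cascade:
            still_linked = {a for (_, a) in self.ties}
            for alter in candidates:
                if alter not in still_linked:
                    del self.alters[alter]


db = NetworkDB()
db.egos.update({"ego1": {}, "ego2": {}})
db.alters.update({"alter1": {}, "alter2": {}, "alter3": {}})
db.connect("ego1", "alter1", society="A")
db.connect("ego1", "alter2", society="A")
db.connect("ego2", "alter2", society="B")  # alter2 is shared: the N:M case

db.delete_ego("ego1", cascade=True)
print(sorted(db.alters))     # ['alter2', 'alter3']
print(db.alters_of("ego2"))  # ['alter2']
```

Here alter3 survives because a record may exist without any connection, alter2 survives because ego2 still links to it, and alter1 is removed only because the cascading variant of deletion was requested.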
In this system, it is possible to combine deletion and removal of connection between records so that certain connections are automatically removed while other connections cause the associated records to be deleted. In this type of network database, there can be several paths from one record to another record. This makes it necessary to have several current records in the database in the form of pointers. These pointers keep track of the current position in the database. It is also possible to have cycles or endless loops. This work limits the number of times a cycle is traversed to one.

Force Directed Algorithm of the Network

The work used the force directed algorithm of Koren and Harel (2002), which can display a large number of vertices. Figure 2 illustrates the force directed algorithm of the network. The algorithm was fine tuned to reflect the semantic relationship among the alters, egos, and their degree of centrality measurements. The vertices represent pictures of the members, names, and link membership numbers. A Breadth First Search algorithm was integrated into the search to avoid repetition in a cycle.

// The aim is to find a layout L of graph G
Begin
    Set_Time();
    Layout(G(V,E));
    k = minSize = 10, radius = 7;
    While k ≤ |V| do
    begin
        Centres := k_Centres(G(V,E), k);
        Search link_identification number using BFS(k);
        If (photo_member_identification_existing() == false and k_existing() == true)
            Add photo identification number into database
            L := add(if(photo_identification_number != existing()))
        If L_existing() then
            L := (linkid, name, member_Picture)
        localLayout(L(centres), radius, 4)
        for every v ∈ V do
        begin
            L(v) := L(centre(v)) + rand
        end
        k := k * Ratio
    end
    Compute Xij for 1 ≤ i ≠ j ≤ n;
    Compute Gijk for 1 ≤ i ≠ j ≠ k ≤ n;
    Compute Dij for 1 ≤ i ≠ j ≤ n;
    return L
end.
Figure 2: Force Directed Algorithm of the Network

K_Centres(G(V,E), k)
    S := {v} for some arbitrary v ∈ V
    For i := 2 to k do
    begin
        Find the vertex u farthest away from S
        S := S ∪ {u}
    end
    Return S

LocalLayout(L, k, 4)
    For i := 1 to 4 * |V| do
    begin
        Choose the vertex v with the maximal Δ_v^k
        Compute BFS(∂_v^k)
        L(v) := L(v) + (∂_v^k(x), ∂_v^k(y))
    end

Figure 2a and 2b: Called Functions of Force Directed algorithm of the Network

Data Collection

The process was applied to cooperative society network analysis in Abeokuta, Nigeria, as a case study of social and economic network analysis. There are sixty five cooperative societies in Abeokuta, Nigeria, with a minimum number of 251 memberships and a maximum number of 500. An assumption needs to be made in calculating the measurement of degree in cooperative societies in Abeokuta, Nigeria. First, the alters in a cooperative society are members of the population. It is assumed that every member has an equal chance to know each member from a given cooperative society. This means there is a probability that every member has a perfect knowledge of other members. Moreover, every member has the right to join more than one cooperative society. Bernard, Johnson, & Robinson (1989) expressed the equation as follows:

\[ \frac{m}{c} = \frac{e}{t} \]

where $m$ is the average number of alters that respondents know in the subpopulation, $c$ is the average size of the respondent’s network, $e$ is the total size of the subpopulation, and $t$ is the total population.

Relational data are collected from cooperative societies along with attribute data. The relational and attribute data are information that is stored in schema and subschema.
The schema contains social network analysis data of each member as follows:

SNA_Data1, SNA_Data2, SNA_Data3, …, SNA_Datan (member identification number, link identification number, member name, age, gender, e-mail, phone number, passport photograph)

The information on this schema serves as the egocentric online social network input interface and appears on web space. The subschema contains cooperative network analysis (CNA) data of each member as follows:

CNA_Data1, CNA_Data2, CNA_Data3, …, CNA_Datan (member identification number, link identification number, passport photograph, member name, occupation, registration number cooperative society, name cooperative society, loans amount, past loan amount, outstanding loan amount, date loan given, first sureties name, second sureties name, borrower comment)

The information in both schema and subschema is stored in Microsoft SQL Server. For security purposes and to be able to track members’ information, member identification number, link identification number, and passport photograph are used as candidate keys while passport photograph is used as primary key. This assists in tracking a situation whereby a member joins different cooperative societies with different particulars such as name, occupation, age, photograph, etc. The attribute borrower comment is created as text, and the member needs to answer two questions:
• to whom does the member turn for assistance during financial needs?
• who are the friends among his or her cooperative members who always stand for him/her as guarantor?

These statements serve as security for checking the economic network analysis of the alters.

Implementation and Results

The force directed algorithm was implemented in the Java programming language and connected to Microsoft SQL Server, which serves as the back end. On execution, the program loads the network environment, which consists of web space and database section.
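The schema and subschema listed in the Data Collection section might be declared roughly as follows. This is a minimal sketch under several assumptions that are not in the paper: it uses Python's built-in `sqlite3` instead of Java and Microsoft SQL Server, collapses CNA_Data1 … CNA_Datan into one table with a `society_name` column, keeps only a subset of the subschema attributes, and stores the passport photograph key as a text identifier. All table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sna_data (            -- schema: social data for the web space
    member_id INTEGER NOT NULL,
    link_id   INTEGER NOT NULL,
    name      TEXT,
    age       INTEGER,
    gender    TEXT,
    email     TEXT,
    phone     TEXT,
    photo     TEXT PRIMARY KEY     -- stand-in for the passport photograph
);
CREATE TABLE cna_data (            -- subschema: economic data per society
    member_id        INTEGER NOT NULL,
    link_id          INTEGER NOT NULL,
    photo            TEXT NOT NULL REFERENCES sna_data(photo),
    society_name     TEXT NOT NULL,
    outstanding_loan REAL NOT NULL DEFAULT 0,
    PRIMARY KEY (photo, society_name)
);
""")

# Hypothetical members; photo-a joined two societies under different ids.
conn.executemany(
    "INSERT INTO sna_data VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 101, "Member A", 40, "F", "a@example.org", "000", "photo-a"),
        (2, 102, "Member B", 35, "M", "b@example.org", "000", "photo-b"),
    ],
)
conn.executemany(
    "INSERT INTO cna_data VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, "photo-a", "Society X", 300000.0),
        (7, 205, "photo-a", "Society Y", 250000.0),
        (2, 102, "photo-b", "Society X", 100000.0),
    ],
)

# Societies joined and total outstanding loan for each photograph key.
rows = conn.execute(
    """
    SELECT photo, COUNT(*) AS societies, SUM(outstanding_loan) AS total
    FROM cna_data
    GROUP BY photo
    ORDER BY photo
    """
).fetchall()
print(rows)  # [('photo-a', 2, 550000.0), ('photo-b', 1, 100000.0)]
```

Grouping on the photograph key rather than on the member identification number is what lets the same person be recognised across societies even when the other particulars differ.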
Web space displays the attributes of the membership while the database section keeps values of both schema and subschema. Figure 3 shows the user input interface of the social network analysis concept space that contains the details of add, search, edit, and delete. As shown in Figure 3, nodes are labelled with link membership number, name, and member passport photograph. Moving the cursor over a node pops up a tool tip showing the attributes of the membership. The user can select as many operations as needed from the input interface and conduct database operations such as projection, selection, join, union, sort, add, delete, and search. It is also possible to view the number of alters in the network that are directly linked to an ego or the number of egos reachable from any node. For example, to add a new member into the network, the add detail panel must be filled in. Deleting a member from the web space or database requires both the search and delete detail panels. Figure 3 shows a sample of seven members being added using the add panel details while Figure 4 illustrates the delete operation by reducing the members to six using the search and delete panel details. On clicking the photograph, a box appears which requires a password to be entered by an executive member of the cooperative society. After the password is supplied, another query operation box appears for entering any database query operation.
For example, it is possible to check if a member has joined more than two cooperative societies and his/her financial outstanding amount as follows:

    Select passport photograph, sum(outstanding loan amount)
    From CNA_data1, CNA_data2, …, CNA_datan
    Where passport photograph = 'image'
    Group by passport photograph

Figure 3: Using add panel detail to add member into web space

Figure 4: Using search and delete panel details to remove a member

Assuming the result of the above query indicates that a member joined more than one cooperative society and the outstanding loan balance was more than 0.5 million naira, a trigger function would be called to show a blinking red edge link between the member ego and alter. This permits the executive members of the cooperative societies to pay close attention to such a member. After executing the query, another browser is invoked to present database operation results for report analysis. The operation panel permits the users to manipulate different database attributes of members.

Conclusion and Future Works

The paper included database operations into social network analysis for performing relevant queries related to the defined schema and subschema. Effort was made to incorporate a force directed algorithm to avoid edge crossings and to distribute nodes and edges uniformly. The embedded Structured Query Language trigger constraint served to discover defaulters. The process permits viewing all egos in relation to the alters and monitoring their social and economic activities. For future works, in case the number of memberships reaches 200 or higher, the input interface may appear crowded and complex. To solve this problem, an alternative two layer hierarchical interface is proposed as future work that first displays society clusters and later enlarges a cluster to show each member's details.

References

Ab Raub, R., Nor Hamzah, A.
H., Mohamad, A., Mohamed Bandi, A. F., Hasan, S., Abdullah, S., & Jaafar, M. D. (2007). Network analysis in analyzing potential fraudster. Proceedings of the International Conference on Electrical Engineering and Informatics, June 17-19, Indonesia. Available http://repository.gunadarma.ac.id:8000/679/2/eRepository-eprint-679.txt

Bernard, H. R., Johnson, E. C., & Robinson, J. (1989). Estimating the size of an average personal network and of an event subpopulation. In M. Kochen (Ed.), The small world (pp. 159-175). Norwood, NJ: Ablex.

Boyd, D. (2004). Publicly articulated social networks. Proceedings of the Conference on Human Factors and Computing Systems (CHI 2004), Vienna, Austria, 1279-1282.

Chalmers, M., & Chitson, P. (1992). Bead: Explorations in information visualization. Proceedings of the 15th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, pp. 330-337.

Coleman, J. S., Kartz, E., & Menzel, H. (1987). The diffusion of an innovation among physicians. Sociometry, 20(4), 253-270.

Koren, Y., & Harel, D. (2002). A fast multi-scale method for drawing large graphs. Journal of Graph Algorithms and Applications, 6(3).

Light, J. (2001). Portable document indexes. In U. M. Fayyad, G. G. Grinstein, & A. Wierse (Eds.), Information visualization in data mining and knowledge discovery (pp. 99-102). San Francisco, CA: Morgan Kaufmann.

Nardi, B., Whittaker, S., Isaacs, E., Creech, M., Johnson, J., & Hainsworth, J. (2002). Integrating communication and information through ContactMap. Communications of the ACM, 45(4), 89-95. http://portal.acm.org/citation.cfm?id=505251&dl=ACM&coll=portal

Tomsic, L. (2005). Establishing a community of practice for collaborative learning and social network analysis.

Viégas, F., Boyd, D., Nguyen, D., Potter, J., & Donath, J. (2004). Digital artefacts for remembering and storytelling: Posthistory and social network fragments.
Proceedings of the 37th Annual Hawaii International Conference on System Sciences, HICSS 04.

Biographies

Akinwale Adio Taofiki received his Magister in Informatics from Warsaw University, Warsaw, Poland, and his M.Sc./PhD in Economic Cybernetics and Computer Science from Oskar Lange University, Wroclaw, Poland. His area of research interest is knowledge database system and query database optimization.

Adekoya, Adebayo is a Lecturer in the Department of Computer Science, University of Agriculture, Abeokuta, Nigeria. He has B.Sc (Hons) and M.Sc in Computer Science from Ogun State University (now Olabisi Onabanjo University), Ago-Iwoye, Nigeria in 1993 and University of Agriculture, Abeokuta, Nigeria in 2002 respectively. He is currently at an advanced stage of his Ph.D research work in Computer Science. His areas of research interest include Computer Networks & Data Communications, Ontology and Semantic Web, and Artificial Intelligence.

Onashoga Adebukola is a lecturer in the Department of Computer Science, University of Agriculture, Abeokuta, Nigeria. She just completed her research work for the award of doctorate degree in Computer Science from the Department of Computer Science, University of Agriculture, Abeokuta, Nigeria. She has published in both international and local journals. Her current research interests include Computer Security, Data mining and Artificial Intelligence.