Big Data Analytics Unit-2

Notes

Uploaded by

prasathdhanam66

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF or read online on Scribd

0% found this document useful (0 votes)

146 views30 pages

Big Data Analytics Unit-2

Notes

Uploaded by

prasathdhanam66

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF or read online on Scribd

You are on page 1/ 30

r UNIT II | 2 NoSQL Data Management Syllabus Introduction 10 NoSQL - aggregate data models - key-value and document data models - relationships - graph databases - schemaless databases - materialized views - distribution models - master-slave replication - consistency - Cassandra - Cassandra data model ~ Cassandra examples = Cassandra clients Contents 2.1. Introduction to NoSQL 2.2 Aggregate Data Models 2.3. Schemaless Databases 2.4 Materialized Views 2.5 Distribution Models 26 Consistency 27 Cassandra 2.8 Two Marks Questions with Answers (2-1)Big Date Analytics El Introduction t “re ing huge vol lem of handling huge volume g s o 7 ‘SQL databases are schema fre NoSOL databases are open source, , which means that information jg \d_ database, te or local. This ensures 1 NoSQL NoSQL means Not Only data that rel and are non-relationa ibute NoSQL is also type of distribute ach can ena copied and stored 0” various sever he data re Seine, the resto th availability and reliability of data. If 80 database can continue tO run. NoSQL encompasses structured data, semi-structured data, unstructured data ang polymorphic data. ; No SQL database provides @ mechanism for storage and crs i - that employs less constrained consistency models than traditional relational databases. NoSQL is a response to nowadays business data related factors ? referring to the ability to handle large datasets that arrive 1. Volume and velocity, quickly; 2, Variability, referring to how diverse data types don't fit into structured tables; 3. Agility, referring to how fast an organization NoSQL databases are very often referred to as data stores rather thai NoSQL systems work on multiple processors and can run on low-cost separate computer systems - No need for expensive nodes to get high-speed performance. It supports linear scalability. Every time we add more processors, we get a consistent increase in performance. responds to business changes. mn data-bases. Histoty ‘of NoSQL : = The acronym NoSQL was first use his lightweight, open-source “relational” name came up again in 2009 when Eric Evans and Johan Oskarsson use to describe non-relational databases. Relations databases are often referred to as SQL systems. The term NoSQL eat a aaa * - systems" or the more commonly accepted y SQL," t : : support SQL-like query ae SS es NoSQL developed at least i ‘ t in the beginni: Beste : ginning as a response te , the processing unstructured data and the oy for Cae peocesing The NoSQL model us - : es a distrib ia aie ae uted database system, meaning a syst¢™ d in 1998 by Carlo Strozzi while naming database that did not use SQL. The rd it TECHNI CAL PUBLICATIONS® - an up-thrust f for knowhednaig Data Analytics » Not only can NosoL but they SQL systems hy organizations such as Faceboon eet NoSQL systems, aioe These organi ‘ganizations uunstructured data, coordinating it to find eens, Temendous amounts of Big data became ai ind p: atterns and gain business insights. why NoSQL ? in official term in 2005. mn handle tre | e It cat large volumes of Structured, semi-structured and unstructured dat: 3 .. . % ae ™ Agile sprints, quick iteration and frequent code pushes Ee occu that is easy to use and flexible. types of NoSQL Stores : 1. Column Oriented (Accumulo, Cassandra, Hbase) 2, Document Oriented (MongoDB, Couchbase, Clusterpoint) 3. Key-value (Dynamo, MemcacheDB, Riak) 4, Graph (Allegro, Neo4j, OrientDB) ¢ NoSQL databases are guaranteed to adhere to two of the CAP properties. Such databases are of several types. 1. Key-value store : Stores in the form of a hash table (Example - Riak, Amazon $3 (Dynamo), Redis) 2. Document-based store : Stores objects, mostly JSON, which is web friendly or supports ODM (Object Document Mappings). (Example - CouchDB, MongoDB) 3, Column-based store : Each storage block contains data from only one column {Example - HBase, Cassandra} 4. Graph-based : Graph representation of relationships, mostly used by social networks. {Example - Neo4]) Br The Definition of Four Types of NoSQL Databases les a mechanism for storage and retrieval, of data that is than the tabular relations used in relational databases. ted as Not-only-SQL to emphasize that they may also Most NoSQL databases are designed to store * NoSQL database provid modeled in means other NoSQL is often interpre support SQL-like query languages. iti in a fault-tolerant way: co uae oo at is used to describe a family of databases that are fae i a SBNSSOL ie siroply peter ies, data types and use cases vary Wi lely Jonal, While the technologies = ee me cay agreed that there are four tyPes ‘of NoSQL. databases amount them, 1 ® TECHNICAL PUBLICATIONS” - &” up-thrust for knowledgezon 5 any of four primary days sig Data Anaiytics Lased and graph based. information usi can manage panes .4, column datal oN store, document-base models : Key-value Be Example and Advantages GOL databases ; 3 a cena fan open source, JSON document-based database that uses a) b JavaScript as its query Tangues® b) Elasticsearch, ©) Couchbase, a key-val build responsive and flexible PP computing. Advantages : a) NoSQL databases have a simple and flexible b) NoSQL databases are pased on key-value pairs. include column store, document store, c) Some store types of NoSQL databases key. value store, graph store, object, store, XML store and other data store modes. database that inelucdes a full-text search engine, se that empowers developers to mobile and edge a document-based te and document databa: ications for cloud, structure. They aré schema-free, abase stores also allow base has a key. Some NoSQL dat not just simple string d) Each value in the. datal developers to store serialized objects into the database, values. e) Open-source NoSQL databa run on inexpensive hardware, «Disadvantages : a) Most NoSQL databases do not suppo! supported by relational database systems. b) In order to support reliability. and consistency features, developers must implement their own proprietary code, which adds more complexity to the system. EZE cap Theorem * Fig. 2.1.1 shows the three properties of the CAP theorem. ° Th istril ea aaa ote data systems will offer a trade-off between Z lity and partition t alas areca hones Pi sia olerance. And, that any database can only ses do not require expensive licensing fees and can rendering their deployment cost-effective. rt reliability features that are natively © Consistency : E i Cent oe nee pei in. cluster responds with the most recent data, eve? the request until all replicas update. If you query * TECHNIC ICAL PUBLICATIONS® - an up-thrust for knowledgegel Analytics 2-6 NoSQL Data Management Fig. 2.1.1 CAP theorem you will wait for that *consistent system” for an item that is currently updating, will receive the most response until all replicas successfully update, However, you current data. ity : Every node returns an immediate response, "available system” for an item th ervice can provide at that Availabilit even if that response is not the most recent data. If you query an at is updating, you will get the best possible answer the s moment. stem continues to operate ever: if a : Guarantees the sy’ with other replicated data nodes. Partition tolerance + replicated data node fails or loses connectivity ‘on of SQL and Nost QL. Databases srctused query Nos databases Tave dynamic schemas ed schema. for ‘unstructured data. é oS § SQL databases use Sue e a predefin Janguage and bev NosQL databases are document SQL databases are table-based. Key-value, graph, oF wide-column stores _ ee spultirow while NoGOl better [oh unstructured sh daabacs ae beter or MITT | lik dosumgns 8 SON _fransadtons Ba Aggregate Data Models collection gate is tg that are treated as a unit In NoSQL of data that interact as a unit Moreover, of object rm the boundaries for the ACID © Aggregate means @ Databases, an aggre! these units of data or 96 operations. ‘an up-thrust for knowledge @ TECHNICAL PUBLICATIONS« Aggregate data ™ a er the clusters a5 dl ‘ js retrieve’ “Is in NoSQL- D transaction: ut s and Sactitg aa the databases tO manage g a ow reside on a Il the data a 5 IL ate data aggreea vom the database C) SQL do not support A new e help of aggregate data models in No atabase. We can achieve hj.” NoSQL database if the Pi a e aggregate. Aggregate data a iD properties. can ‘ ee LAP operations on the d in the one easly perform OLAT, A mmodels ici legate data Miiciency of the aB8r°8" i a transactions and interactions tring of characters and the ea Key-value Store the key is usually a simple © to the database. Key-value structure, that are opaque Jn the key-value value is a series of uninterrupted bytes store is like @ relational database with only two columns : The key or attribute name and the value. Fig. 22.1 shows key-value store: value "Name Key F Mobile Number [Aadhat ja=p[ rie] Fig, 2.2.1 Key-value store oe data as a group of key value pairs, which are made up of two data items = are linked. The link between the items is a "key” which acts as an identifier for an item within the data and the ‘value that is the data that hy b identified. oc The data itself i : imiti 1 itself oe some — data type (string, integer, array) oF 4 {tan application needs to persist and access directly: flexible data mode! structures as their it more complex obj Thi ae ie ig sei of relational schemas with a more ; levelopers to easil} ify fi oe ly modify fields and object st Key value systems tre: : : at the dat i ee tbe a a as a single opaque collection which may ha In each key value pair, ‘ a) The key is represented by an arbitrary strin, b) The value can be any kind of data i 7 ai an image, file, text or document. TECHNICAL PUBLICATIONS® "ONS® - an up-thrust for kn r knowledgeNoSQL Data Management 5 9g 8 é - a = z 2 g a i simply provid Fi le a wa ple GET, PUT and DELETE The simplicity of this model : 1 tak; portable and flexible, akes a key-value store fast, easy to use, scalabl , , scalable, Advantages of key value stores ; a) The secret to its speed lies in : ts simplici request to the object in mer plicity. The path to retrieve data is a direct ry or on disk b) The relationship betw : ee b between data does not have to b inguage, there is no optimization performed. e calculated by a query ¢) They can exist on distributed 5 store indexes. Disadvantages of key value stores : a) No complex query filters ystems and do not need to worry about where to b) Alll joins must be done in code ©) No foreign key constraints d) No trigger. Document-based — A document is an object and keys (strings) that have values of recognizable types, Booleans and strings, as well as nested arrays and dictionaries. so there is no need for cross-referencing and tion in a table, it is stored ina document. including numbers, All data is stored in one table, instead of storing informal nt based data model. Fig. 2.2.2 shows docume! Documents ument based data model flexibility. They are modif} Fig. 2.2.2 Doe! are designed for jatabases are therefore easy to not typically forced to * Document d have a schema and ‘an up-thrust for knowleds TECHNICAL PUBLICATIONS2-8 NoSQk. Data Manageman, Big Data Analytics ires the ability to store varying attributes along with larg, mt databases are @ good option. te formats including XML and JSON. This 1a without an impedance match. « If an application requ amounts of data, docume! Document stores work with multip! allows for storage and retrieval of dat t data store are as follows : Terminologies in document a) A table is called a collection b) A row is called a document ‘A column in called a field. syeiod ‘use eases for document stores include the storage and retrieval of catalogs, biog posts, news articles and data analysis. MongoDB and Apache CouchDB are examples of popular document-based databases. Do not use document databases for transactions across multiple documents (records) and Ad hoc cross-document queries. Advantages of document based model : a) Faster retrieval of data. b) Dynamic architecture for unstructured data and storage options ¢) Sharing for horizontal scalability nally, so chances of accidental loss of data is ) Replication is managed inte negligible. Disadvantages of document data model : 2) No views, triggers, scripts or stored procedure. b) Relationship not well defined. ©) No support for transactions, which could lead to data corruption. 2% Column-based * Column-based is also called ‘wi ; ide column’ models enabli : using a row key, column name and cell timestamp. ling very quick data access * The flexible ! aaa deci - these types of databases means that the columns do not actoss records and you can add a column to specific rows As data is organized 4 into colum: key-value stores, wumns, we have better indexin - Also, when it comes to updates, multiple 5 aaa to = ‘ umn upda can be aggregated, ee CHNICAL PUBLICATION: M1 Up-thrust for knowlacinaNoSQL Data Management Google oj ogle open sourced its i -ynown Google e-mail : cd Bij its implementatic ell-no gle e-mail service, Gmail ig Table. Apparently, the data f Te NoSQL Database. 1 is stored in the Google Bi : e le Big Table e wide, columnar stores data mod ae el, li derived from Google's BigTable paper. ike that found in Apache Cassandra, are nizations mostly u: Cee ae ay use Column data stores for data 7 proces: Br evident in services such as Amazon R warehousing and data zon i advantages of column data stores : ae a) Column stores are very efficient at data compression and/or partitioning. b) Columnar stores can be loaded extremely fast.» ¢) Columnar databases are very scalable. d) Due to ‘thelr structure, columnar databases perform particularly well ‘with aggregation queries. Disadvantages of column data store : : a) Updates can be inefficient. The fact that columnar families group attributes, as opposed to rows of tuples, works against it b) If multiple attributes are touched by 2 join or query, this may also lead to column storage experiencing slower performance. «) It is also slower when deleting rows from columnar systems, to be deleted from each of the record files. as a record needs EEX] Graph-based * The modem graph database js a data storage and processing engine that makes the persistence and exploration of data and relationships more efficient. data in nodes that are connected by edges. These Guaphbased dat stor OL are widely used for storing the huge volumes Ay in NoSQ] : 7 ‘ ggregate Data Models i mn dimensional data having many interconnections of complex aggregates between’ them. In graph theory, structures are composed of later be called "data relationshiP® es ae oes je think, alee Graphs behave similarly 1° how people thinks Or uarly useful for visualizing discrete units of data. i e type This database OF gwen different pieces of data. analyzing, of helping to : vertices and edges, oF What would find connection ——~ puBLiCATION:Big Data Analytics 2-10 NoSQL Data Managemen, l nologies for recommendation engine, ° As ult, businesses leverage graph tec s a resi Examples of graph-based NoSQL database, fraud analytics and network analysis. include Neodj and JanusGraph. * Graph databases can be used to analyze customer interactions, social media ang scientific applications where it is crucial to traverse long relationship graphs ¢g better understand data. © Advantages of graph data + a) More descriptive queries b) Greater flexibility in adapting your model ©) Greater performance when traversing data yh data stores : relationships. * Disadvantages of grap a) Difficult to scale }) No standard language. NoSQL Key/Value Database : MongoDB + MongoDB is an open-source document database that provides high performance, high availability and automatic scaling. MongoDB is one of the most popular open-source NoSQL databases written in C++. As of February 2015, MongoDB is the fourth most popular database management system. It was developed by a ry 10gen which is now known as MongoDB Inc. * Why use MongoDB ? a) Simple queries b) Functionality provided applicable to most web applications ©) Easy and fast integration of data d) No ERD diagram : = = for heavy and complex transactions systems not provides any comm: " : eed ee fe and to create a "database". Actually, you do ease ae ae cause, MangoDB will create it on the fly, during database, lue into the defined collection (or table in SQL) and * MongoDB is a docu: iment-oriented di rt phen ea teary fatabase which stores data i lata -li oan ae schema, It means you can store = ities ie e data ‘structure @ your records without is 2 vies, Monge on as the number of fields. or types of fields a mi imil *. MongoDB stores data record: en oe ea Is as BSON docum ; i JSONdocuments, though it contains a fat pa lata types than JSON. TECHNICAL PUBLICATIONS® . an up-thrust for knowledaepig D2t8 Analytics NoSQL Data Management ability of MongoDB ting existing ones. This crrays and other mer YOU to represent hierarchical selationshe, : ys er More complex structures easily. lips, to store MongoDB ° Bi uses Mongo server and Mongo shell commands to fetch records or the informatic i fo. ion from the database (ie. collections). Few areas where MongoDB is ideal are big data, user data mana; i gement, mobile and social i management and delivery, data hub. a + A MongoDB instance may have zero or more databases, A database may have zero or more ‘collections’. A collection may have zero or more ‘documents’. A document may have one or more ‘fields’. MongoDB ‘Indexes' function much like their RDBMS counterparts. : * Database is a physical container for collections. Collection is a group of documents and is similar to an RDBMS table. A document is a set of key-value pairs. Documents have dynamic schema. MongoDB documents are composed of field-and-value pairs and have the following structure ieee Tere Tae | oun . DB supports many data types d arrays of documents. Mongol ; done a nd a Nem Se such as : 7 date, code and binary data. + Fig, 2.2.3 shows relation between ‘SQL terms and MangoBD terms. figctions T Gol a] |_| | peaTeTESON | : ‘an up-thrust for knowledge pa TECHNICAL PUBLICATIONS2-12 NoSQL Data Managemen, Big Date Analytics guage and supports Ad hoc queries lany uses MongoDB query 1anBuB Toe MongoDB that helps it 4, and sharding. Sharding i ted data system. jongoDB replication operate as a distribul Shatding is used by MongoDB to izontal scaling to add more ma ae to the growth of load and demand. Sharding arrangement in MongoDB has mainly three components : . store data across multiple machines. It uses chines to distribute data and operation with Fig. 2.2.4 Sharding by MongoDB a) Shards or replica sets : Each shard serves as a separate replica set. They store all the data. They target to increase the consistency and availability of the data. b) Configuration servers : They are like the managers of the clusters. These servers contain the cluster's metadata. They actually have the mapping of the cluster’s data to the shards. When a query comes, query routers use thes? mappings from the config servers to target the required shard. ©) Query router : The query router is mongo instances which serve as interface for user applications, The : i ions and . They take in the user queries from the applications @™ serve the applications with the required results. a Advantages of MangoDB : * MongoDB is a schema - less document type database. * MongoDB supports field, 1: ing data from the stored cag, nS® based query, regular expression for searching . SIC a aData Analytics Boost Date Management MongoDB is very easy to scale up or down. « It uses internal memory for storin ‘ much faster. ig the working temporary datasets for which it is + MongoDB support primary ard secondary indexes on any field. * MongoDB supports replication of databases, MongoDB can be used as a file storage system which is known as a GridFS. | [ERI schemaless Databases «Since Nest does not require a schema, there is no blueprint on how data should be stored and therefore varies between databases. Generally, there are two ways that NoSQL data storage functions : 1, On-the-disk using B-Trees, with the top of it being permanently in RAM. 2, In-memory where it is all on RAM using RB-Trees and anything stored on the disc is just an append. Schemaless databases are a type of NoSQL databases that do not have a predefined schema or structure for data. This means that data can be inserted and retrieved without adhering to a specific structure and the database can adapt to changes in data over time without requiring schema migrations or changes. * Schemaless database manages information without the need for a blueprint. The onset of building a schemaless database does not rely on conforming to certain fields, tables, or data model structures: * There is no Relational Database Management System (RDBMS) to enforce any specific kind of structure. In other words, it is a non-relational database that can handle any database type, whether ‘that be a key-value store, document store, in-memory, column-oriented, or graph data model. © In actuality, there is no such thing as schema-less dataset : 1 In a relational database, the schema is explicit and created separately in advance. 2. In column-based databases, we ¢ we often reuse schema fragment same is true for document databases. . in document databases, users directly query data eate a fresh schema for each row and in fact, ts from rows that are’ grouped together. The 3. In column-based and also based on the schema. 4, In graph-based databases, the data. we are in essence building the schema as we build TECHNICAL PUBLICATIONS® - an up-thnust for knowledgeott NoSQL Date Menegemon, Big Data Analytics a jg stored in JSON-style documents which ca, « In schemaless databases, informatio! Ippes for each field. So, a collecten have varying sets of fields with different data could look like this : name : "Joe", age : 30, interests + 'gootball’ }- { "name: "Kate', age! 25 the data itself normally has a fairly consistent structure, there is some additional structure, the ollections and indexes. Collections © In the above condition, With the schemaless MongoDB database, i icit list of ¢ system namespace contains an explicit nd may be implicitly or explicitly created, indexes must be explicitly declared. Benefits of using schemaless databases : 1. Flexibility : Schemaless databases allow fo x greater flexibility in data modeling. 2. Scalability : | Schemaless databases are designed for scalability, as they can handle large amounts of unstructured data with ease. 3, Reduced complexity : Schemaless databases can reduce the complexity of data modeling and development. 4. Good support for non-uniform data. © Disadvantages : 1. Potentially inconsistent names and data types for a single value. 2. Management of the implicit schema migrates into the application layer. Materialized Views © Materialized views solve the problem of views. The views provide a mechanism to hide from the client whether data is detived data or base data. Views are used when data is to be accessed infrequently and the data in a table gets updated on a frequent basis. * A materialized view is a replica of a target master from a single point in time. The master can be either a master table at a master site or a master materialized view at a materialized view site. A materialized view is like a cache, a copy of the data that can be accessed quickly. : If a regular view is a saved ialized vi : query, a materialized i is results stored as a table, o Meee ate ” Nese ae do not have views, they may have precomputed and cached queries and they reuse the term "materialized view" to describe them NoSql. TECHNICAL PUBLICATIONS® - an up-thrust for knowledgean 2-15 We can use materialized y; iews t . Bas 1. Ease network loads achieve ong °F more of th, e NoSQL Data Management followii : 2, Create a mass deployment pee ‘owing goals : ito 3, Enable data subsetting a 4, Enable disconnected computing Two methods are used for build: ; 1. Eager approach : user eee materialized view ; : © materialized yj the base data for it. In. uy wed view at the same time updat Ne case, adding an order would also ou the : aay purchase history aggregates for ea i Pee wade re en ch product, This method is used when more ; lized view than writes, 2, The application database approach i is valuabk i it ji ensure that any updates to SES dela also le here as it makes it easier to dest : ipdate materialized views, Materialized views can be built outside o} y ie data, ; f the database by i : latabase by reading the data, ving it back to the database, EZ Distribution Models * Ability of NoSql is to run a database on a large cluster. As data volumes increase, it becomes more difficult and expensive to scale up, so it is necessary to buy a bigger server to run the database on. EERI Single Server * Database is run on a single machine which handles all the reads and writes to the data store. Organizations prefer a single server because it eliminates all the complexities that the. other options introduce. Single server is easy to manage for application developers. Lot of NoSQL databases are designed around the idea of running on a cluster, it can make sense to use NoSQL with a single-server distribution model if the data model of the NoSQL store is more suited to the application. is sui h-database. i iguration is suitable for grap! Feel eait rat processing aggregates, then a single-server document If data usage is mostly abo or key-value store may be useful. Baa Sharding single dataset across multiple databases, eae * Sharding is a method for oe machines, This allows for larger ae Which can then be ue om id stored in multiple data nodes, increasing the to be split into smaller storage capacity of the system. © - an up-thrust for knowledge IBLICATIONS. ~ 8” TECHNICAL PUEpaniarentics 2-16 NoSQL Data Menagemon, Big Data ing i ing known as horizontal scaling or scale-out, ,, : nae ae reales to share the load, Horizontal scaling allows jo. near-limitless scalability to handle big data and intense workloads. Sharding is also known as data partitioning. Many NoSQL databases offe auto-sharding, Fig, 2.5.1 shows Sharding. aay Fig. 2.5.1 Sharding Sharding is the process of splitting a large dataset into many small partitions which are placed on different machines. Each partition is known as a "shard", Each shard has the same database schema as the original database. Most data is distributed such that each row appears in exactly one shard. The combined data from all shards is the same as the data from the original database. The load is balanced out nicely between servers, for example, if we have five servers, each one only has to handle 20 % of the load. The NoSQL framework is natively designed to support automatic distribution of the data across multiple servers including the query load. Both data and query replacements are automatically distributed across multiple servers located in the different geographic regions and this facilitates, rapid, automatic and transparent replacement of the data or query instances without any disruption. ° Sharding is particularly valuable for performance because it can improve both read and write performance. Using replication, Particularly with caching, can greatly improve read performance but does little for applications that have a lot of writes. Advantages of Sharding 4) Faster performance : There are more servers available to handle input/output. ») Horizontal scaling : We can quickly add additional servers to a cluster. © Costs : Horizontal scaling can often be less expensive than vertical scaling. 4) Distribution/uptime : A horizontally scaled distributed database can achieve better uptime than a traditional single server, . TECHNICAL PUBLICATIONS® « an upthrust for knowiedge” sig Data Analytics 7 al 2-1 NoSQL Data Management « Disadvantages of Sharding Complexity : i a) iplexity : Depending on the database system, sharding complexity can vary. | .-p) Rebalancing : When addin; R g additional machi i likely need to be rebalanced to distribute eee me c) Increased infrastructure costs. [EGEI master-slave Replication « We replicate data across multi 5 tipl is i i Se ultiple nodes. One node is designed as primary ce dary (slaves). Master is responsible for processing any pdates to that data. A replication process synchronizes the slaves with the master. © Master is the authoritative source for the data. It is responsible for processing any updates to that data, Masters can be appointed manually or automatically. + Slaves is a replication process that synchronizes the slaves with the master. After a tail of the master, a slave can be appointed as new master very quickly. Fig, 2.5.2 shows master-slave replication. All updates are, made to master Fig. 2.5.2 Master-slave replication «. Masterslave replication is most helpful for scaling when we have a read-intensive dataset. It will scale horizontally to handle more reads. This design offers read resilience. Even if one or more of the servers fails, the remaining servers can keep offering read access. This can help a lot with read-heavy applications, but will offer little benefit to write-intensive applications. splicas of the master server, one of them can assume the role of the master in case the master fails. In fact most of the time you can simply create a set of nodes and have them “atomatically decide who would be the Tceues that occur due to the delay in updating « As the slaves are exact re master. There are some consistency between master and slaves. TECHNICAL PUBLICATIONS® - an up-thrust for knowledgeBig Data Analytics 2-18 NoSQL Data Management + Masters can be appointed manually or automatically. In manual appointing performed when we configure our cluster and we configure one node as the snaster, With automatic appointment, we create a cluster of nodes and they elect cone of themselves to be the master. © Problems of master-slave replication : 1. Does not help with scalability of writes resilience against failure of a slave, but not of a master 2. Provides 3. The master is still a bottleneck. EEE] Peer-to-Peer Replication + Ina peer-to-peer replication setup the various nodes are all "equals". Any node J] as writes and they communicate these writes to each can accept reads as wel i other. In peer-to-peer replication updates on any one server are replicated to all other associated servers. «Fig, 2.5.3 shows peer-to-peer replications. Requests: Fig. 2.5.3 Peer-to-peer replications * The advantage of this setup is its read and write resilience. One node's failure does not cause problems, as the remaining nodes can continue their work without losing a beat. * The problem that arises is that of consistency. For example we may have conflicting write requests that come to different nodes and then those nodes attempt to communicate those requests to the rest of the nodes. This could lead to considerable inconsistencies. * There are various ways to resolve this problem. The most standard approach would be to have the replicas communicate their writes first before they “accept” them. Once a majority of the replicas has confirmed a write, it can now be considered as having been successfully performed and a response sent to the client. This requires a certain amount of network traffic in coordinating these writes. TECHNICAL PUBLICATIONS® - an up-thrust for knowledgetli ll Big Date Analytic nalytics 2-19 NoSQL Data Management . There is a problem of write-write conflict. Two users can update different copies Of the same record stored on different nodes at the same time is called a write-write conflict. Combining Sharding and Replication ¢ Sharding and replication can be combined to get a better response. If we use both master-slave replication With Sharding and Peer-to-peer replication with Sharding. 1, Master-slave replication and Sharding : * We have multiple masters, but each data item only has a single master. * Anode can be a master for some data and a slave for others. 2, Peer-to-peer replication and Sharding : + A common strategy for column-family databases. * A good starting point for peer-to-peer replication is to have a replication factor of 3, so each shard is present on three nodes. EEG Difference between Replication and Sharding . Replication The primary server node copies data onto secondary server nodes. This can help increase data availability and act as a backup, in case the primary server fails. ; : Replication copies data across multiple servers. = oe Each bit of data can be found in multiple es si: ae Replicated servers contain identical | Sharded database servers each contain a copies of the entire database. part of the overall data, ie. they store | : | __. different data on separate nod _ It can improve bot More read requests, i i [sequests 12.6 | Consistency ortant when considering a distributed database, since we * The CAP theorem is imp must make a decision about what we are willing to give up. The database we choose will lose either availability or consistency. Reading about NoSQL databases uum is the minimal number of nodes e the concept of quorum. A quort i 1 write operation to be considered complete. 1s® - an up-thrust for knowledge we can fac that must respond to a read 0 ‘TECHNICAL’ PUBLICATION:2-20 NOSQL Data Managemen Big Date Analytics: Of course having a maximum quorum and querying all servers is the way we cs 0} i It. determine the correct resul Consistency can be simply defined by how the copies from the same data may vary wigyi, si y epi se system. the same replicated database sys aa _ vadays systems need to scale. The "traditional! seer database aera er w power server, does fot gunrantee the high ava a a etwork pattition required by today’s web-scale systems, as demonstrateq th CaP theorem. To achieve such requirements, systems cannot impose strong e i consistency. * In the past, almost all architectures used in database aa strongly consistent, In these cases, most architectures would have a single datal ase instance only responding to a few hundred clients. Nowadays, many systems are ‘accessed, by hundreds of thousands of clients, so there was a mandatory requirement to system's architectures that scale. However, considering the CAP theorem, high-availability and consistency do conflict on distributed systems when subject to a network partition event. : Update Consistency 5 * Two users updating the same data item at the same time is called write-write conflict. * When the writes reach the server of the two users, the server will serialize them and decide to apply one, then the other. First user's update would be applied and immediately overwritten by the second user, * In this case first user's is a lost update. Here the lost update is not a big problem. We see this as a failure of consistency because second user's update was based on the state before first user's update, yet was applied after it, Approaches for maintaining consiste: described as pessimistic or optimistic, * A pessimistic approach works by preventing conflicts from occurring; an optimistic approach lets conflicts occur, but detects them and takes action to sort them out. * For update conflicts, the most so that in order to change a ensures that only one client cai ney in the case of concurrency are often value we need to acquire a lock and the system n get a lock at a time, : bore ieers would attempt to acquire the write lock, but only the first use Succeed. Second user would then see the i 's. write oa result of the first user's: before deciding whether to make his own update.mecca 4-21 , Acommon optimistic ap, an update tests the value just bef efore ona NoSQL Data Management ditional Updat on opt Updating rime Where any client that does Proach is a NE it to see if it j « Both, the pessimistic and 6 Wit is changed since his P timisti the updates and it is possible fox “PPI le for a Two general solutions for wie Single 5 ar ite-wrj 4. Pessimistic approach ; Prevent: conf! locks before update, ing conf 2, Optimistic approach : Lets « resolve them. « Jf there are more than one sity : er Le might apply the updates j , * Peer-to-peer replicati Soe fate) ie es in a different order, resulting peer ere ig n each peer. Sequential consi: Peictatd value for the systems. istency is used in distributed hes 1 ely on a i izat a co ializati aber nsistent serialization of fa ate as follows : icts ‘ from Occurring. Also acquire write ‘onflicts 0, cur, but detects them and takes actions to timistic way t i i oe they se = ae a poe conflict is to save both updates and record ~ Replication makes it much more lik i ; y ‘ : ely to run int aa HE If different nodes have different copies of nee data ‘which lependently updated, then we will get conflicts unless we take specific measures SS aoe them. Using a single node as the target for all writes for some data makes it much easier to maintain update consistency. ra Read Consistency « Problem : One user reads in the middle of another user's writing. © It is called read-write conflict, inconsistent reading. This leads to logical inconsistency. * In NoSQL databases, read consistency refers to the level of consistency between multiple read operations on the same data. In a distributed database, where data can be replicated across multiple nodes, ensuring read consistency can be oo updates, but only within a single support atomic aly consistency within an aggregate * Aggregate-oriented databases do a ll have logical aggregate. This means that we W! but not between aggregates. * The length of time an inconsistency is ae A NoSQL system may have a quite short i There are different levels of 74 Tanging from eventual consistency to called the inconsistency window. tency window. ailable in NoSQL databases, present is d consistency Vv strong consistency: thrust for knowledge TECHNICAL PUBLICATIONSth Write (key, A) Write (key, B) i nN 2-22 NoSQL Data Manageneny Big Data Analytics i inconsistency to occur betwe, ee ae ieee oe es gee en ee bite Secale to all nodes, but it makes no guarantees about how long a will take a about the order in which updates will be applied. * Read-your-writes consistency means that once we have iat a record, all of our subsequent reads of that record will return the update value. * Session consistency means read-your-writes consistency but at een level, Session can be identified with a conversation between a client an : Server. As Jong as the conversation continues, we will read everything we have Written during this conversation. If the session ends and we start another session with the same server, there is no guarantee that we can read values we have written during previous conversation. * Session consistency is of two types : Sticky session and version stamps. EZ Quorums * Quorum consistency is used in systems where consistency is more important than availability (CAP theorem) for write and read. * In systems with multiple replicas there is a possibility that the user read: inconsistent data. This happens say when there are 2 replicas, N1 and N2 in ¢ cluster and a user writes value v1 to node N1 and then another user reads fron node N2 which is still behind N1 and thus will not have the value v1, so th second user will not get the consistent state of data. * In order to achieve a state where at least one node has consistent data we us quorum consistency. * Fig. 2.6.1 shows write and read quorums. Write (key, A) Write (key, B) (a) Write quorums ee ce (b) Read quorums2-23 : Quorum is achieved when nodes fotlory NOSOL Date Menegement where 11= Nodes in the the below protocol : wt r>n Minimum write nog 1 = Minimum read nodes ‘ Here w is our write quorum and +; . 4s our read quorum. [ED Relaxing Durability . en write is committed, the ch is « In some cases, strict durability = (write performance). 35 not essential and it can be traded for scalability « A simple way to relax durability is to store i : E regularly, If the system shuts oe L we lose nantes eee GA Cassandra + Cassandra fs a column NoSQL database, It was initially developed by Facebook to fulfill the needs of the company's Inbox Search services. In 2009, it became an Apache Project, Apache Cassandra is an open source, distributed, NoSQL database, Apache Cassandra in a distributed database system using a shared nothing architecture. * Apache Cassandra war initially designed at Facebook using a Staged Event-Driven Architecture (SEDA) to implement a combination of Amazon's Dynamo distributed tion techniques and Google's Bigtable data and storage engine storage and replicat model. A columnar database, also called @ store, is a database that stores the storing the values of each row together. Columnar databases are well suited for users can determine the consistency (B}) and analytics. : . ides tunable consistency © 1°" paeanca provides epread apd wae operations. Cassandra enables users to vel by tuning i of replicas in 2 cluster that must acknowledge a read or configure the num! = .tion suc \eiite operation before considering © OP eS all nodes in a cluster. pers 3 tocol to discover node state for ¢ ies in : * Cassandra uses a EOSSIP PIOHO sara workloads bY distributing data, reads Cound is see handil tiple nodes with no single point of failure., writes (eventually os, column-oriented database or a wide-column values of each column together, rather than big data processing, Business Intelligence rans? - on uptinst for nowledee2-24 2M Big Data Analytics © Features of Cassandra + : able; it allows to js highly scala?’ add 1) Elastic scalability : Cassandra 38 HEY Ts more data as per rein i, custome! te more hardware to accommoda oe no single point of failure, dra + Cassandra is linearly scalable, ie, it ince wr of nodes in the cluster. U 3) Fast linear-scale pe! throughput as we increase the numb : 4) Flexible data storage Cassandra accommodates all possible data foray including : Structured, semi-structured and unstructured. port : Cassandra supports properties like ACID. vides the flexibility to distribute gg, tiple datacenters. 2) Always on architecture CAS formance 5) Transaction sup} 6) Easy data distribution : Cassandra Pr? where you need by replicating data across mul Cassandra Architecture «Fig. 2.7.1 shows Cassandra architecture. Fig. 2.7.1 Cassandra architecture « Components of Cassandra architecture are node, it memtable, SSTable, Bloom Filters and Cassandra ee 2 7 re. « Node : A Cassandra node is a place where data is stored. * Data center ; Data center is a collection of related nod \odes, « Cluster : A cluster it i t is a component which contains one or more data centets- © Commit log : In Cassandr; e : ‘a, the commit is ver Mem-table : A mem-table is : a memory-resi, it the data will be writ ryeresident dat es written to the mem-table, § a structure, Aes CT Aol there will be multiple mem-tables, jometimes, for a single-colum” TECHNICA M PUBLICATIONS® . an upshmust for kn mr knowledgeee | sotable : It is a disk gy contents reach © to Which NoSQL Data M o a threshold val fanagement 4 ue, « Bloom filter : Bloom §; : whether an elem filters are ve : ent is a memb, TY fast, nondetermi filters are accessed after ee er Of a g leterminis| 2-25 the data j ‘a is flus| shed from the mem-table when its tic algori et. Tt j ee igorithms for testing TY query, 18 a special kind of cache. Bloom an in-memo! integrity. : ry structure called a Secacl ese writes are indexed and written to « A memtable can be thoy ught of as , to cache with its completion i @ write-back cache whi ‘i advantage of low ee immediately confirmed His oa oe —— va heaj high throughput. Thi . This has the Ja ip Memory by default. e memtable structure is kept in « SSTables : When the commit it log gets foes the memtable are written to aa full, a flush is triggered and the contents of his Genaees ema into an SSTables data file. At the completion of lea nemtal le is cleared and the commit log is recycled. Cassandra ally partitions these writes and replicates them throughout the cluster. Cassandra Data Model * Some of the features of Cassandra data model are as follows : 1) Data in Cassandra is stored as a set of rows that are organized into tables. 2) Tables are also called column families. 3) Each row is identified by a primary key value. 4) Data is partitioned by the primary key- ’ ae driven approach, in which specific * Dat ling in Cassandra uses @ query: ? agen ti key to organizing the data. The main goal of Cassandra data maaan ig to develop and design @ high-performance and well-organized Cluster. Apache Ca: dra data model components include keyspaces, tables and . ssandr pache or column families 2 ; ta as a set of rows organized into tables a) Cassandra stores dat b) A primary key ¥! ©) The primary key data alue identifies each FO" pased on the primary Key: partitions 4a¥@ its entirely ‘The components of in part OF in ism for data storage.1. Keyspaces : ) 2-26 NoSQL Data Management Big Data Analytics Fig. 2.7.2 shows Cassandra data model Cluster [ KeySpacet KeySpace2 Column family Column family Column family2 Row | Row Row Row {em Column2}] Column} Column2} | Column ‘Column3||Column1}|Column2} Value [see Value | Value [Eve | Value Value Value Value [ee ess Fig. 2.7.2 Cassandra data model SQL data model consists of data containers * At a high level, the Cassandra No! Jar to the schema in a relational database, called keyspaces. Keyspaces are simil Typically, there are many tables in a keyspace. * Features of keyspaces are : a) A keyspace needs to be defined before creating tables, as there is no default keyspace. : b) A keyspace can contain any number of tables an keyspace. This represents a one-to-many relationship. ©) Replication is specified at the keyspace level. For example, replication of three implies that each data row in the keyspace will have three copies. da table belongs only to one 2. Tables : are defined © Tables, alsé called column families in earlier iterations of Cassandra, within the keyspaces. Tables store data in a set of rows and contain a primary key and a set of columns. Cassandra tables are used to hold the actual data in the form of rows and columns. A table in Cassandra must be created with the primary key during table creation time, post that it can not be altered. To alter the table new tables should be created with existing data. The primar! key would be used to locate and order the data.2-27 NoSQL Dat me of the features of tables are ; Sa Nenavenent * |) Tables have multiple rows and coh called column family in the SMUMNS. AS mentio earlier vers is still referred to p) it is 8 as column fami dolienents of Cassandra, ily in some of the error messages and ned earlier, a table is also , si Ons of Cassandra, ) It is important to define a Primary key for a table. 4, columns : Columns define data structure within a tal such as Boolean, double, integer and text, : ble. There are various types of columns, + Cassandra column’ is used to store a sin, L gle piece of data. Th i's of various types of data such as big int ie column a conta eget, double, text, float and Boolean. Fach column value has a timestamp associated with it that shows the time of update. Cassandra provides’ the collection type of columns such as list, set and map. Some of its features are : a) Columns consist of various types, such as integer, big integer, text, float, double and Boolean. ; b) Cassandra also provides collection types such as set, list and map. ¢) Further, column values have an associated time stamp representing the time of update, 4) This timestamp can be retrieved using the function write time. Ba Cassandra Clients * Thrift is the driver-level interface; a wide variety of languages. Thrift was . i 7 Juster, allowing it to be queried. Each A Client holds connections to a ieee ree to the cluster nodes, provides Client i maintains multip] ies for failed poe which node to use for each query and handles retries for query etc... hy single instance is lived and usually a sing! * Client § designed to be long: “logged” into one keyspace coe) instances are peti given Client can only be logs Snopes Sugh per application. Am Mee to create one client pet ee : s time, it can oop multiple Keyspaces since it is always p ever not necessary ified table name in queries. oe {se a single session with fully qualifie et ete das a ring. it provides the API for client implementations in developed at Facebook. * The Cassandra cluster is denote ‘0 show token .n up-thrust for knowledge 1. PUBLICATIONS TECHNICA!a th NoSQL Data Man Big Data Analytics Se 1, Write in action : the Cassandra nodes and seng To write, clients need to connect t0 any of ites Wtedsiante ° request. This node is galled the coordinator : a8 and cluster receives a write request, it delegates it to a service called StorageP roxy hg the right place to write the data to. The tu 11 the replicas) that are responsible to hoy id tilizes a replication strategy to do that, This node may or may not be StorageProxy is to get the nodes (a data that is going to be written. It ul of the Once the replica nodes are identified, it sends the RowMutation message to the node waits for replies from these nodes, but it does not wait for all the repli to come. It only waits for as many responses as are enough to satisfy the client's minimuy, number of successful writes defined by: consistency level. Write operations at a node level : « Each node processes the request individually. Every node first writes the mutation to the commit log and then writes the mutation to the memtable. Writing to the commit log ensures durability of the write as the memtzble is an in-memory structure and is only written to disk when the memtable is flushed to disk. , « A memtable is flushed to disk when : 1. It reaches its maximum allocated size in memory 2. The number of minutes a memtable can stay in memory elapses. 3. Manually flushed by a user. « A memtable is flushed to an immutable structure called as SSTable (Sorted String Table). The commit log is used for playback purposes in case dat? from the memtable is lost due to node failure. Two Marks Questions with Answers Qi What Is consistency In a distributed system ? Ans, : In a distributed system, consistency will be defined as one that responds Wi" the same output for the same request at the same time across all the replicas. az What is database Sharding 7 Ans. : Sharding is a method for distributing a single dataset across multiple datab | which can then be stored on multiple machines. This al split into smaller chunks and stored in multiple dat: increasi Bay of he ryeum. ple data nodes, increasing the t lows for larger datasets © otal storag' | TECHNICAL PUBLICATIONS® . . ...— | as How Is Sharding different from partitioning ? Ans.: All partitions of a table reside on the same server whereas Sharding involves multiple servers. Therefore, Sharding implies a distributed architecture whereas - pattitioning does not. Partitions can be horizontal (split by tows) or vertical (by Columns), Shards are usually only horizontal. In other words, all shards share the same schema but contain different records of the original table. Q.6 What aro write-writo and read-write conflicts ? big Date Anaiytics 2-29 NoSQL Data Management Q3 Why are N. ae aah losis databases known as schemaless databases 7 iy eee pais levee designed to store and query unstructured data, Recia can be apelica ae = rigid schemas used by relational databases. Although a ee mi ed at the application level, NoSQL databases retain all of your Sheebseapiaatin is original raw format. This means that complete granularity is , ‘if you later change your application schema - Something that is simpl not possible with a traditional SQL database. a a4 ‘| What is the difference between Sharding and replication 7 Ans. : Sharded database servers each contain a part of the overall data, ie. they store different data on separate nodes. Replicated servers contain identical copies of the entire database. Ans. : Write-write conflicts occur when two clients try to write the same data at the same time, Read-write conflicts occur when one client reads inconsistent data in the middle of another client's write. Q.7 Define Cassandra. ‘Ans. ; Cassandra is a distributed, fault tolerant, scalable, column oriented data store. Cassandra is a peer-to-peer distributed system made up of a cluster of nodes in which any node can accept read or write request. Q.8 What Is the use of Bloom filters In Cassandra 7 Ans. : Bloom filters are used as a performance booster. Bloom filters are very fast, nondeterministic algorithms for testing whether an element is a member of a set. They are nondeterministic because it is possible to get a false-positive read from a Bloom filter, but not a false-negative. Bloom filters work by mapping the values in a data set into a bit array and condensing 2 larger data set into a digest string. Bloom filter is a special kind of cache. Q9 Explain sorted strings table. ; ich i the statics : trings table which is a file format used by Cassandra to store aM sere the memtables. The Cassandra SSTables are immutable hence any ia a ble creates a new SSTables file. The data structure format used by eeTubled is Log Structured Merge which is qualified for writing intense heavy data sets compared to the traditional B tree structure. TECHNICAL ‘PUBLICATIONS® - an up-thrust for knowledge2-30 NoS@L. Data Managemony Q.10 Explain Cassandra data center. —] ‘Ans. : Cassandra data center is the collection of related nodes which are configured in the cluster to perform the replication, The data centers can be physical data centers or depending upon the workload a separate data center can be By Data Anadtics logical data center and used, Qt ‘Ans. : ¢ Advantages of graph data + a) More descriptive queries b) Greater flexibility in adapting your model ©) Greater performance when traversing data relationships. ¢ Disadvantages of graph data stores : a) Difficult to scale, b) No standard language. Q.12 Describe session consistency. ‘Ans. Session consistency means read-your-writes consistency but. at session level. Session can be identified with a conversation between a client and a server. As long as the conversation continues, we will read everything. we have written during this conversation. If the session ends and we start another session with the same server, there is no guarantee that we can read values we have written during previous Explain advantages and disadvantages of graph data. conversation. Q.13 What are schemaless databases 7 ‘Ans. : Schemaless databases are a type of NoSQL databases that do not have a predefined schema or structure for data. This means that data can be inserted and retrieved without adhering to'a specific structure and the database can adapt to changes in data over time without requiring schema migrations or changes. Qaa

Unit 2 BDA
No ratings yet
Unit 2 BDA
32 pages
Introduction To: Nosql
No ratings yet
Introduction To: Nosql
27 pages
Big Data Unit 1
No ratings yet
Big Data Unit 1
21 pages
Module 4 Nosql
No ratings yet
Module 4 Nosql
8 pages
Unit-Iii Advanced Database Systems
No ratings yet
Unit-Iii Advanced Database Systems
29 pages
CS3492 Database Management Systems Lecture Notes 2
100% (1)
CS3492 Database Management Systems Lecture Notes 2
170 pages
Object-Relational & NoSQL Databases
No ratings yet
Object-Relational & NoSQL Databases
46 pages
R Language
No ratings yet
R Language
59 pages
Unit I - Introduction To DBMS
No ratings yet
Unit I - Introduction To DBMS
9 pages
Chapter - 1 Introduction
No ratings yet
Chapter - 1 Introduction
22 pages
DW DM Notes
No ratings yet
DW DM Notes
107 pages
CC Module 5
No ratings yet
CC Module 5
26 pages
Module 6
No ratings yet
Module 6
13 pages
Module 6 Lecture 1 (Advance Topics)
No ratings yet
Module 6 Lecture 1 (Advance Topics)
18 pages
BDA Notes
No ratings yet
BDA Notes
96 pages
BD Problem Solving - I
No ratings yet
BD Problem Solving - I
2 pages
Hadoop Basics for Data Science Students
No ratings yet
Hadoop Basics for Data Science Students
22 pages
Unit 4
No ratings yet
Unit 4
17 pages
DATA ANALYTICS Syllabus 3 Units
No ratings yet
DATA ANALYTICS Syllabus 3 Units
37 pages
Cassandra: Types of Nosql Databases
No ratings yet
Cassandra: Types of Nosql Databases
6 pages
Module-1-Introduction To BigData Platform
No ratings yet
Module-1-Introduction To BigData Platform
21 pages
DBMS Unit 1 Notes
100% (1)
DBMS Unit 1 Notes
22 pages
Query Processing & Optimization
No ratings yet
Query Processing & Optimization
77 pages
Module 3
No ratings yet
Module 3
43 pages
DBMS Unit-2 by Engineering Express
No ratings yet
DBMS Unit-2 by Engineering Express
22 pages
Big Data
No ratings yet
Big Data
22 pages
Sample Paper Q0503
No ratings yet
Sample Paper Q0503
20 pages
Mc5502 Bda Unit I Notes
No ratings yet
Mc5502 Bda Unit I Notes
106 pages
Unit V Big Data Analytics
No ratings yet
Unit V Big Data Analytics
47 pages
Distributed Databases Overview
No ratings yet
Distributed Databases Overview
19 pages
Reporting and Query Tools and Applications: Tool Categories
No ratings yet
Reporting and Query Tools and Applications: Tool Categories
13 pages
D B M S: ATA ASE Anage Me NT Ystem
No ratings yet
D B M S: ATA ASE Anage Me NT Ystem
114 pages
Dbms PDF
100% (1)
Dbms PDF
4 pages
5.1 Mining Data Streams
No ratings yet
5.1 Mining Data Streams
16 pages
Unit 6
No ratings yet
Unit 6
143 pages
LAB Set Questions Rdbms
No ratings yet
LAB Set Questions Rdbms
18 pages
O.R - Unit - I, II, III
No ratings yet
O.R - Unit - I, II, III
44 pages
AI Basics for Tech Enthusiasts
No ratings yet
AI Basics for Tech Enthusiasts
125 pages
Chapter 5 - Database Management System
No ratings yet
Chapter 5 - Database Management System
30 pages
Module 3 Nosql
No ratings yet
Module 3 Nosql
12 pages
DBMS - Unit 4
No ratings yet
DBMS - Unit 4
22 pages
Unit 1 Bda Complete Notes
No ratings yet
Unit 1 Bda Complete Notes
15 pages
Unit V
No ratings yet
Unit V
67 pages
ASSIGNMENT 1 Questions BI
No ratings yet
ASSIGNMENT 1 Questions BI
1 page
BDA Lab ManuaL
No ratings yet
BDA Lab ManuaL
83 pages
VPR - CC - Unit3.2 - Aneka - CometCloud - TSystems - Workflow - MapReduce - PPSX
No ratings yet
VPR - CC - Unit3.2 - Aneka - CometCloud - TSystems - Workflow - MapReduce - PPSX
67 pages
Big Data Adoption for Businesses
No ratings yet
Big Data Adoption for Businesses
45 pages
BDA Unit2 Complete
No ratings yet
BDA Unit2 Complete
56 pages
UE20CS332 Unit2 Slides PDF
No ratings yet
UE20CS332 Unit2 Slides PDF
264 pages
Bda Unit 1
No ratings yet
Bda Unit 1
32 pages
Case Study On Dbms & Rdbms
No ratings yet
Case Study On Dbms & Rdbms
36 pages
4.2 NoSQL Databases UNIT-1
No ratings yet
4.2 NoSQL Databases UNIT-1
35 pages
Methodologies For Stream Data Processing and Stream Data Systems
No ratings yet
Methodologies For Stream Data Processing and Stream Data Systems
20 pages
23ubcas36 - Erp
No ratings yet
23ubcas36 - Erp
137 pages
Types of DBMS Architecture
No ratings yet
Types of DBMS Architecture
13 pages
DSV Module-3
No ratings yet
DSV Module-3
24 pages
NoSQL Databases UNIT-3
No ratings yet
NoSQL Databases UNIT-3
20 pages
AI&ML-Unit1 DBMS
No ratings yet
AI&ML-Unit1 DBMS
18 pages
Lecture 01 05.08.2024 AI-ML Introduction
No ratings yet
Lecture 01 05.08.2024 AI-ML Introduction
46 pages
Bda Unit-2
No ratings yet
Bda Unit-2
29 pages
Big Data Analytics Unit-1
No ratings yet
Big Data Analytics Unit-1
39 pages
Big Data Analytics Unit-3
No ratings yet
Big Data Analytics Unit-3
29 pages
Big Data Analytics Unit-5
No ratings yet
Big Data Analytics Unit-5
28 pages
Big Data Analytics-4
No ratings yet
Big Data Analytics-4
26 pages

Big Data Analytics Unit-2

Uploaded by

Big Data Analytics Unit-2

Uploaded by

You might also like