Data Warehouse Interview Questions:: Why Oracle No Netezza?
Data Warehouse Interview Questions:: Why Oracle No Netezza?
Data Warehouse Interview Questions:: Why Oracle No Netezza?
2. Foreign keys
3. Immediate Consistency
4. Triggers
5. Supported vast range programming languages
6. XML support
7. Server operating systems: all where netezza Linux only
Data Mining is set to be a process of analyzing the data in different dimensions or perspectives and summarizing into useful
information. Can be queried and retrieved the data from database in their own format.
OLTP: (On-Line Transaction Processing), and it is an application that modifies the data whenever it received and has large
number of simultaneous users. i.e ATM
OLAP: Online Analytical Processing and it is set to be a system which collects, manages, processes multi-dimensional data for
analysis and management purposes. i.e. financial reporting, forecasting
OLTP OLAP
Data is from original data source Data is from various data sources
Simple queries by users Complex queries by system
Normalized small database (3NF) De-normalized Large Database
Fundamental business tasks Multi-dimensional business tasks
Must maintain data integrity constraint data integrity is not affected as not frequently modified
ROLAP MOLAP
Relational Online Analytical Processing Multidimensional Online Analytical Processing
Data is stored and fetched from main data warehouse. Data is Stored and fetched from Proprietary database MDDBs.
Data is stored in the form of relational tables. Data is Stored in the large multidimensional array made of data cubes.
Large data volumes Limited summaries data is kept in MDDBs.
Uses Complex SQL queries MOLAP engine created a pre-calculated and prefabricated data cubes
for multidimensional data views, Sparse matrix technology is used to
manage data sparsely
ROLAP creates a multidimensional view of data dynamically. MOLAP already stores the static multidimensional view of data
in MDDBs.
Slow access faster access
Aggregate tables contain the existing warehouse data which has been grouped to certain level of dimensions. It is easy to
retrieve data from the aggregated tables than the original table which has more number of records. This table reduces the load
in the database server and increases the performance of the query.
A factless fact tables are the fact table which doesn’t contain numeric fact column in the fact table.
Non-Addictive facts are said to be facts that cannot be summed up for any of the dimensions present in the fact table. If
there are changes in the dimensions, same facts can be useful.
Conformed fact is a table which can be used across multiple data marts in combined with the multiple fact tables.
ODS: Operational Data Store it is a repository of real time operational data rather than long term trend data
Datamart is a specialized version of Data-warehousing and it contains a snapshot of operational data that helps the business
people to decide with the analysis of past trends and experiences. A data mart helps to emphasizes on easy access to relevant
information.
Datawarehouse is a place where the whole data is stored for analyzing, but OLAP is used for analyzing the data, managing
aggregations, information partitioning into minor level information.
SCD is defined as slowly changing dimensions and it applies to the cases where record changes over time.
SCD 1 – The new record replaces the original record
SCD 2 – A new record is added to the existing customer dimension table
SCD 3 – A original data is modified to include new data
BUS schema consists of suite of confirmed dimension and standardized definition if there is a fact tables.
Star schema is nothing but a type of organizing the tables in such a way that result can be retrieved from the database quickly
in the data warehouse environment.
Snowflake schema: which has primary dimension table to which one or more dimensions can be joined, primary dimension
table is the only table that can be joined with the fact table.
Star schema does not use normalization whereas snowflake schema uses normalization to eliminate redundancy of data.
Core dimension is nothing but a Dimension table which is used as dedicated for single fact table or datamart.
Name itself implies that it is a self explanatory term. Cleaning of Orphan records, Data breaching business rules, Inconsistent
data and missing information in a database.
Metadata is defined as data about the data. The metadata contains information like number of columns used, fix width and
limited width, ordering of fields and data types of the fields.
In data warehousing, loops are existing between the tables. If there is a loop between the tables, then the query generation will
take more time and it creates ambiguity. It is advised to avoid loop between the tables.
Yes, dimension table can have numeric value as they are the descriptive elements of our business.
Cubes are logical representation of multidimensional data. The edge of the cube has the dimension members,and the body of
the cube contains the data values.
Dimensional Modeling is a concept which can be used by dataware house designers to build their own datawarehouse. This
model can be stored in two types of tables – Facts and Dimension table.
Fact table has facts and measurements of the business and dimension table contains the context of measurements.
There are three types of Dimensional Modeling and they are as follows:
Conceptual Modeling
Logical Modeling
Physical Modeling
38. What is surrogate key?
Surrogate key is nothing but a substitute for the natural primary key. It is set to be a unique identifier for each row that can be
used for the primary key to a table.
ER modeling will have logical and physical model but Dimensional modeling will have only Physical model.
ER Modeling is used for normalizing the OLTP database design whereas Dimensional Modeling is used for de-normalizing the
ROLAP and MOLAP design.
Cursor: Cursor is a named private area in SQL from which information can be accessed. They are required to process each row
individually for queries which return multiple rows.
Raise_application_error: It is a procedure of package DBMS_STANDARD that allows issuing of user_defined error messages
from database trigger or stored sub-program.
Explain two virtual tables available at the time of database trigger execution.
Table columns are referred as THEN.column_name and NOW.column_name.
For INSERT related triggers, NOW.column_name values are available only.
For DELETE related triggers, THEN.column_name values are available only.
For UPDATE related triggers, both Table columns are available.
Sequences are used to generate sequence numbers without an overhead of locking. Its drawback is that the sequence number
is lost if the transaction is rolled back.
How would you reference column values BEFORE and AFTER you have inserted and deleted triggers?
Using the keyword "new.column name", the triggers can reference column values by new collection. By using the keyword
"old.column name", they can reference column vaues by old collection.
DECODE vs CASE?
DECODE does not allow Decision making statements in its place.
An autonomous transaction is an independent transaction of the main or parent transaction. It is not nested if it is started by
another transaction. There are several situations to use autonomous transactions like event logging and auditing.
What is out parameter used for eventhough return statement can also be used in pl/sql?
Out parameters allows more than one value in the calling program
Tracing Method:
DBMS_APPLICATION_INFO
DBMS_TRACE
DBMS_SESSION and DBMS_MONITOR
trcsess and tkproof utilitiese:
Data Warehouse:
It is electronic storage of a large amount of information by a business which is designed for query and analysis instead of
transaction processing. It is a process of transforming data into information and making it available to users in a timely manner
to make a difference.
An information system which stores historical and commutative data from single or multiple sources, it is designed to analyze,
report, integrate transaction data from different sources.
Decision Support System (DSS)
Executive Information System
Management Information System
Business Intelligence Solution
Analytic Application
Types:
1. Enterprise Data Warehouse:
A centralized warehouse, it provides decision support service across the enterprise. It offers a unified approach for organizing
and representing data. It also provides the ability to classify data according to the subject and give access according to those
divisions.
2. Operational Data Store:
Data store required when neither Data warehouse nor OLTP systems support organizations reporting needs. In ODS, Data
warehouse is refreshed in real time. Hence, it is widely preferred for routine activities like storing records of the Employees.
3. Data mart
A subset of the data warehouse, it specially designed for a particular line of business, such as sales, finance, sales or finance. In
an independent data mart, data can collect directly from sources.
3-tier architecture
Bottom Tier: It is usually a relational database system. Data is cleansed, transformed, and loaded into this layer using back-
end tools.
Middle Tier: OLAP server which is implemented using either ROLAP or MOLAP model. For a user, this application tier presents
an abstracted view of the database. This layer also acts as a mediator between the end-user and the database.
Top-Tier: The top tier is a front-end client layer. Top tier is the tools and API that you connect and get data out from the data
warehouse. It could be Query tools, reporting tools, managed query tools, Analysis tools and Data mining tools.