Quick Reference Interview SAP Datasphere 2025
Impact of source system changes: Alterations to data types, table structures, or the
underlying configuration of applications like SuccessFactors can disrupt data
replication and synchronization, requiring manual rework and careful change
management.
Licensing and API restrictions: Recent changes to SAP's licensing terms have
restricted third-party tools from using ODP APIs for real-time extraction from SAP
sources. This forces companies to explore alternative integration methods like OData
APIs, which can result in slower performance.
Avoid replicating tables where possible.
However, if replicating tables is unavoidable due to specific business or compliance
requirements, then the recommendations below should be followed.
1. Standalone SAP LT Replication Server: the recommended approach for replicating tables.
2. Restrictions:
   1. Replicating tables with redirect views (e.g., MBEW, MARD, COEP) from your SAP S/4HANA source system is not supported.
   2. Declustering of INDX-like tables (e.g., STXL, PCL1, ...) during the transfer is not supported.
SAP S/4HANA Connection: Supported SAP Datasphere Integration Features:
Audit Logs:
SAP Datasphere audit logs are used for security analysis, compliance, and root-cause analysis,
tracking who did what and when to monitor database read and change actions, and can be enabled
and viewed from the Space Management area by selecting your space and going to the Auditing
tab. To get a complete overview, you'll need to enable logs for your spaces, set retention times, and
then access the Audit Log Viewer within your SAP BTP cockpit by subscribing to the service.
Security Monitoring: Identify unusual or suspicious network and system behavior to detect potential
attacks or unauthorized access.
Compliance: Provide auditors with the necessary information to validate security policy enforcement
and ensure proper segregation of duties.
Troubleshooting: Assist IT staff with root-cause analysis following a security incident or other system-
related issues.
Data Integrity: Track data modifications and user actions to maintain the integrity of your data and
systems.
Check the boxes to Enable Audit Log for Read or Change Operations and set the retention time in
days.
Subscribe to the Audit Log Viewer service in your SAP BTP cockpit.
From the cockpit, use Go to Application to launch the viewer.
Within the viewer, you can see a list of audit log entries.
Important Considerations
Disk Space: Be aware that audit logs can consume a significant amount of disk space, especially with
long retention periods.
Log Deletion: Administrators can delete audit logs to free up disk space when necessary.
Data Lake Objects: Statements executed via procedures for Data Lake objects are currently not
audited in SAP Datasphere.
Graphical View (Data Builder) vs. Analytic Model:
Primary purpose. View: data preparation and transformation for a variety of semantic usages. Analytic Model: targeted, multi-dimensional modeling for analytical consumption in SAC.
Target user. View: data architects and engineers, as well as business users for simpler tasks. Analytic Model: data modelers and power users defining complex analytical models.
User interface. View: visual, drag-and-drop environment. Analytic Model: a dedicated editor that curates dimensions and measures for analytical queries.
Level of abstraction. View: defines the base data logic at the individual row level. Analytic Model: defines complex aggregation and business logic on top of base data.
Hierarchy support. View: creates dimension views with parent-child or level-based hierarchies as a base artifact. Analytic Model: incorporates pre-defined hierarchies from dimension views for user drill-down in analytics.
End-user interaction. View: primarily handles static filters and parameters at the view level. Analytic Model: uses variables and prompts for dynamic end-user input during analysis.
Data preview. View: basic preview showing data at each node, primarily at a relational level. Analytic Model: rich analytical preview replicating SAC's data analyzer for aggregated, multi-dimensional data.
Performance tuning. View: supports run-time hints and persistency options for the base view. Analytic Model: the analytical preview enables immediate testing and optimization of aggregation logic.
Delta capture can be enabled for a local table only during its initial creation, not manually after it has
been deployed. The setting is permanent once the table is deployed.
For remote tables or other data sources, the delta functionality is managed differently and depends
on the source system's capabilities.
1. Navigate to the Data Builder: Access the Data Builder in the SAP Datasphere interface and
choose your space.
2. Create a new local table.
3. Define columns: Add at least one key column, which is required for delta capture.
4. Enable Delta Capture: Find the "Delta Capture" toggle in the table creation panel and switch
it to "On".
5. Observe Auto-Generated Columns: Activating delta capture automatically adds two system-
managed columns to the table schema:
CHANGE_TYPE: This column tracks change types (e.g., I for Insert, D for Delete, U for
Update).
CHANGE_DATE: Records the timestamp of each change.
6. Deploy the Table: Deploy the table after defining it. The delta capture setting is now
permanent.
7. Populate the Table: Load data into the table. The CHANGE_TYPE and CHANGE_DATE columns
are automatically populated based on the data loading method (e.g., data flow, file upload,
or replication flow).
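A quick SQL sketch of how the two system-managed columns from step 5 can be inspected after a load, assuming a hypothetical delta-enabled local table named SALES_ORDERS (technical column names may differ slightly in your tenant):

   -- Count yesterday's changes per change type (I = Insert, U = Update, D = Delete).
   SELECT CHANGE_TYPE,
          COUNT(*) AS CHANGE_COUNT
     FROM SALES_ORDERS
    WHERE CHANGE_DATE >= ADD_DAYS(CURRENT_DATE, -1)
    GROUP BY CHANGE_TYPE;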
For delta capture from sources other than local tables, use these methods:
If replicating data from a source system like SAP S/4HANA, the source system's capabilities typically
manage the delta mechanism. This includes Change Data Capture (CDC) based on CDS views.
1. Define a CDS View: The source CDS view in the ABAP system must be annotated
with @Analytics.dataExtraction.enabled: true and define a delta mechanism.
2. Create a Replication Flow: Create a new "Replication Flow" in the Datasphere Data Builder.
3. Select the Source: Select the connection to your SAP source system and the CDC-enabled
CDS view as the source object.
4. Configure Delta Load: Choose the "Initial and Delta" load type in the replication flow settings.
This automatically detects and replicates changes.
Transformation flows can process data from delta-enabled local tables. This allows loading only
changed records into a new target table.
1. Create a Transformation Flow: Start a new transformation flow in the Data Builder.
2. Use a Delta-Enabled Local Table as Source: Drag and drop your local table with delta capture
enabled into the flow.
3. Set the Load Type: Select the "Initial and Delta" load type in the target table's properties. The
flow will use the CHANGE_DATE column to process new and changed records on subsequent
runs.
Joins and associations differ in their execution and purpose: a join is a physical, immediate data combination, while
an association defines a semantic relationship that triggers a "join on-demand" only when a
consumer requires the associated data. This difference significantly impacts performance, reusability,
and data modeling strategy.
Association vs. Join:
Execution. Association: lazy/on-demand; the actual join is only executed at query runtime if a field from the associated entity is explicitly requested by a consuming application. Join: immediate; data from multiple sources is immediately and permanently combined into a single new object based on the defined join conditions.
Purpose. Association: defines reusable, logical relationships between entities without immediately combining data; ideal for modeling master-data relationships (e.g., product dimensions and their attributes). Join: explicitly combines data from two or more sources for a specific data transformation or reporting scenario.
Reusability. Association: high; an association can be defined once on a table or view and then used in any number of views, data flows, or analytical models. Join: low; joins are specific to the view or data flow where they are defined, and must be re-created if the same join is needed elsewhere.
Performance. Association: optimized; the "join on-demand" functionality (or "join pruning") means that associated tables are only queried when needed, improving performance for queries that don't require the joined data. Join: can impact performance; every time the view is accessed, the database executes the predefined join regardless of which fields are requested, which can lead to slower performance if many tables are joined but not all are needed.
Use cases. Association: semantic modeling (recommended for defining semantic relationships in the Analytic Model and other consumption-ready views) and master data (connecting fact data to dimension and text tables). Join: data integration (combining data from disparate sources within a Data Flow for a one-time, specific transformation) and creating a flat view (joining multiple tables to create a flat, denormalized view for a specific report).
Configuration. Association: defined by setting key columns and cardinality (e.g., 1:1, 1:n) in the associations tab of a table or view. Join: configured within a Graphical View or Data Flow by dragging and dropping objects and setting the join type and conditions.
When to use each in SAP Datasphere
Modeling for Consumption: When building objects like an Analytic Model for reporting, use
associations to link dimension and text data to the central fact data.
Performance Optimization: Use associations for tables with many potential lookups,
especially if not all of them are needed in every query. This ensures only necessary data is
retrieved.
Master Data Relationships: For defining the relationships between transactional data and its
descriptive master data (e.g., sales data to product and customer information).
Enabling Navigation: To allow end-users in reporting tools like SAP Analytics Cloud (SAC) to
navigate between related entities via path expressions.
Data Transformation: In data flows, use joins to physically combine data from different
sources into a new target object before it is used for analysis.
Flattening Data: When a single, flattened table is required for a specific purpose and the data
should be combined regardless of which columns are later queried.
Non-Equi Joins: For complex join conditions that are not based on equality (e.g., joins using
greater than > or less than < operators); see the sketch after this list.
Static Views: When building data layer views for a downstream consumption model that
requires the sources to be fully combined ahead of time.
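For illustration, a minimal SQL sketch of a non-equi join with hypothetical SALES and DISCOUNT_TIERS tables (in a graphical view or data flow the same condition would be set in the join properties):

   -- Assign each sale to the discount tier whose amount range contains it.
   -- The join condition uses >= and < instead of equality.
   SELECT s.SALES_ID,
          s.AMOUNT,
          t.TIER_NAME
     FROM SALES s
     JOIN DISCOUNT_TIERS t
       ON s.AMOUNT >= t.MIN_AMOUNT
      AND s.AMOUNT <  t.MAX_AMOUNT;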
sap datasphere how can we show the replication related status in the sap analytics cloud:
You can show SAP Datasphere replication status in SAP Analytics Cloud (SAC)
by using a live data connection to a custom monitoring view in Datasphere. This method allows you
to build SAC stories that visualize the metadata related to your replication flows, providing a custom
dashboard for status monitoring.
The standard monitoring tools in Datasphere are not natively exposed for SAC reporting. You must
build your own views on top of the system's metadata tables.
1. Create a database user: A technical user with the necessary permissions is needed to access
the underlying SAP HANA database where Datasphere stores its monitoring tables.
2. Access the monitoring metadata: You will need to import the system tables that track the
replication process into your Datasphere space. The key tables are:
3. Build a model for consumption: Use the imported tables to create a graphical or SQL view. In
this view, you can:
Create calculated columns to transform the raw data into more user-friendly status
metrics (e.g., categorizing run statuses as "Success," "Failure," or "Running"); see the sketch after this list.
Define the output as an Analytic Model to make it ready for consumption by SAC.
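A minimal SQL sketch of such a status mapping, assuming a hypothetical imported monitoring view REPLICATION_RUNS with FLOW_NAME, START_TIME, and STATUS columns (actual object and column names depend on the monitoring tables imported into your space):

   -- Map raw run statuses to user-friendly categories for the SAC dashboard.
   SELECT FLOW_NAME,
          START_TIME,
          CASE
            WHEN STATUS = 'COMPLETED'          THEN 'Success'
            WHEN STATUS IN ('FAILED', 'ERROR') THEN 'Failure'
            ELSE 'Running'
          END AS RUN_STATUS
     FROM REPLICATION_RUNS;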
After deploying the monitoring view in Datasphere, you can establish a live connection from SAC.
1. Set up a live data connection: In SAC, create a new live data connection to your SAP
Datasphere tenant.
2. Create a story: Build a new SAC story using the live connection to the Analytic Model you
created in Datasphere.
3. Visualize the status: Add charts, tables, and numeric point widgets to your SAC story to
visualize the replication status metrics:
Numeric Point: Show the total number of running or failed replication flows.
Table: Display a detailed list of replication jobs, including status, start/end times, and
any error messages.
Bar Chart: Show the number of replication flows by their current status (e.g., Active,
Failed, Completed).
For broader, enterprise-wide monitoring, you can integrate Datasphere with SAP Cloud ALM. This
provides a pre-configured, centralized dashboard for operations monitoring.
Setup: Activate monitoring for Datasphere within your SAP Cloud ALM tenant.
Benefits: While not as customizable as a self-built SAC story, Cloud ALM provides an
integrated overview of system health and includes metrics for Datasphere, reducing the need
for manual configuration of monitoring artifacts.
sap datasphere how can you analyze the data model performance:
You can analyze SAP Datasphere data model performance by using in-platform tools, standard
monitoring views, and advanced SQL techniques. SAP Datasphere runs on an SAP HANA Cloud
database, so HANA performance analysis tools are also applicable.
The Runtime Metrics tool in the Data Builder is the starting point for analyzing a view's performance.
What it does: Runs two benchmark queries on a view and provides metrics on their runtime
and memory consumption.
1. Go to the Data Builder and open the view you want to analyze.
2. Open the Runtime Metrics tool and run it to capture the benchmark queries.
3. Run the tool again after making changes to see a comparison between the last two
runs.
What you see: Key indicators include the number of sources, peak memory usage, execution
times for SELECT TOP 1000 and SELECT COUNT(*), and view persistence status.
For more detailed analysis, the View Analyzer provides persistence recommendations and the ability
to generate a query plan.
What it does: Creates an overview of entities in your data model and suggests which views
are candidates for persistence to improve performance.
Output: The analyzer provides a persistency score and a visual lineage showing where
persistence is recommended.
System Monitor
The System Monitor tracks overall tenant health, including expensive statements and queuing
events.
You can query the underlying SAP HANA database monitoring views directly using a Database
Analysis User. This provides low-level insights that can be aggregated into a custom performance
dashboard.
Expensive statements
The M_EXPENSIVE_STATEMENTS table captures queries that exceed a specified time or memory
threshold. It records metrics such as CPU usage, runtime, and peak memory.
Key insight: Identify the most resource-intensive queries and the data models they are
running on.
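A hedged query sketch against this monitoring view using a Database Analysis User (column names follow the SAP HANA documentation and may vary by release; expensive statement tracing must be enabled for rows to appear):

   -- Top 10 most expensive statements of the last 24 hours, by runtime.
   SELECT TOP 10
          APP_USER,
          START_TIME,
          DURATION_MICROSEC / 1000000 AS DURATION_SEC,
          MEMORY_SIZE,
          STATEMENT_STRING
     FROM M_EXPENSIVE_STATEMENTS
    WHERE START_TIME >= ADD_DAYS(CURRENT_TIMESTAMP, -1)
    ORDER BY DURATION_MICROSEC DESC;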
MDS statistics
Key insight: If per-call capturing is enabled, you can see detailed performance data for
individual SAC requests, grouped by story and widget.
Explain Plan
For in-depth analysis of a single query, you can generate an Explain Plan. This shows the query
execution flow and helps identify specific bottlenecks.
How to generate:
1. Use the Runtime Metrics tool to generate the Explain Plan for a specific view.
2. Alternatively, you can get the SQL statement from the monitoring views and run a
PlanViz analysis in a tool like HANA Studio.
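As a variation on option 2, an Explain Plan can also be generated directly in SQL; a hedged sketch, where the SELECT and the object names are placeholders for the statement captured from the monitoring views:

   -- Generate an explain plan for a captured statement, then read it back.
   EXPLAIN PLAN SET STATEMENT_NAME = 'SLOW_VIEW_CHECK' FOR
     SELECT * FROM MY_SPACE.SLOW_SALES_VIEW;  -- placeholder statement

   SELECT OPERATOR_NAME, OPERATOR_DETAILS, EXECUTION_ENGINE, TABLE_NAME
     FROM EXPLAIN_PLAN_TABLE
    WHERE STATEMENT_NAME = 'SLOW_VIEW_CHECK';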
For any data model, you can preview the generated SQL code. This helps you understand the
underlying database operations and manually debug or optimize the query.
How to access:
After analyzing your data model's performance, consider these strategies for improvement:
Persist complex views: Replicate data for complex or frequently accessed views instead of
relying on virtual modeling. Persisting data materializes the view, improving performance but
increasing storage.
Filter early: Push filters down to the source systems to reduce the amount of data
transferred and processed.
Optimize joins: Join on integer or key columns instead of string columns. Use associations for
dimension and text data to enable join pruning.
Use layering: Use smaller, reusable views rather than monolithic ones. Structure your data
model into layers (e.g., inbound, harmonization, reporting) to simplify maintenance and
improve performance.
Manage spaces: Use space management tools to prioritize resources for the most critical
workspaces.
sap datasphere in what scenario we use data flow and transformation flow
In SAP Datasphere, you use Data Flows for complex ETL (Extract, Transform, Load) scenarios and
Transformation Flows for simpler, view-based transformations, often as part of a larger ELT (Extract,
Load, Transform) process. The key difference lies in their transformation capabilities and how they
handle data persistence.
A Data Flow is a graphical ETL tool for building end-to-end data pipelines with a wide range of robust
operators. It is the right choice when you need more advanced, flexible, or custom processing
capabilities.
Complex data transformations: When your data requires extensive manipulation beyond
standard joins and filters. This can include multi-step operations like splitting a column,
replacing values, and then aggregating the data.
Incorporating custom Python logic: For advanced analytics, data cleansing, or complex
calculations, you can use the Python Script operator. This allows you to leverage libraries like
Pandas and NumPy for powerful transformations within your data pipeline.
Combining multiple source types: When you need to integrate data from various sources
(e.g., SAP tables, non-SAP databases, and flat files) into a single Datasphere table.
Unioning data with similar structures: You can use a Data Flow's Union operator to combine
data from different sources with compatible schemas.
Aggregating data: Use the Aggregation operator to summarize data, such as calculating the
total sales per product line.
A Transformation Flow is designed for more focused, SQL-based transformations that are either
based on a single source or simpler view logic. It is particularly useful for delta-based data loads.
Use a Transformation Flow for scenarios such as:
Delta loads (Initial and Delta): This is a key advantage of Transformation Flows. You can
configure them to load the full data set initially and then automatically load only the
changed records on subsequent runs.
Simpler, SQL-based transformations: When your logic can be expressed clearly in a graphical
view or an SQL query, without needing a full-featured ETL pipeline. This is useful for building
data layers that perform simple consolidations or filtering.
Targeting an SAP BW Bridge space: A Transformation Flow can be used to load transformed
data into local tables or remote tables within an SAP BW Bridge space.
Persisting a complex view: If a graphical or SQL view is too complex or slow to be used
virtually, you can use a Transformation Flow to persist its results into a physical table for
better performance.
Data Flow vs. Transformation Flow:
Complexity. Data Flow: designed for complex, multi-step ETL pipelines. Transformation Flow: best for simpler, view-based transformations.
Transformation logic. Data Flow: uses graphical operators (Join, Union, Aggregation) and custom Python scripting. Transformation Flow: uses graphical or SQL view transforms.
Delta loading. Data Flow: does not natively support delta capture on the target table via the flow itself. Transformation Flow: supports initial and delta loads for local tables with delta capture enabled.
Source type. Data Flow: can integrate data from a wider variety of SAP and non-SAP sources. Transformation Flow: primarily works with tables and views within Datasphere, including remote tables from BW Bridge.
Code customization. Data Flow: allows for advanced transformations using the Python Script operator. Transformation Flow: uses SQL for custom logic within a transform, but no Python support.
Best for. Data Flow: heavy ETL jobs, complex data cleansing, and scripting. Transformation Flow: simple data loads, persisting view results, and delta-based integrations.
sap datasphere what is dp agent and when do you use:
The Data Provisioning (DP) Agent is a lightweight, on-premise component that enables secure and direct connectivity to your local data
sources. It acts as a bridge, allowing your cloud-based SAP Datasphere instance to access and
integrate with data that resides behind your corporate firewall.
The DP Agent is part of the SAP HANA Smart Data Integration (SDI) framework and hosts adapters
that facilitate connections to various on-premise sources like SAP S/4HANA, Microsoft SQL Server,
Oracle, and more.
The DP Agent is essential for hybrid data landscapes, where you need to combine data from both
cloud and on-premise systems for analysis and reporting in SAP Datasphere. You would use the DP
Agent for the following scenarios:
Remote tables: The agent allows you to create remote tables in Datasphere that mirror
tables in your on-premise systems. Data from the source is accessed on-demand, enabling
live, federated queries without permanent replication.
Data replication and flows: For scenarios that require data transformation, the DP Agent
facilitates replication flows. This allows you to extract, transform, and load (ETL) data from
on-premise sources into Datasphere, either as a one-time process or on a scheduled basis.
Real-time replication: The DP Agent enables real-time data streaming and change data
capture (CDC) from on-premise sources like Microsoft SQL Server and Oracle to SAP
Datasphere. This ensures that your cloud-based analytics are always working with the most
current data.
Connecting to specific sources: You use the DP Agent to connect to systems like on-premise
SAP S/4HANA and SAP ECC. For other sources like SAP BW/4HANA, the DP Agent can be used
in combination with the SAP Cloud Connector for specific use cases like remote tables.
1. Installation: The DP Agent is installed on a host within your corporate network that can
securely connect to both the on-premise data sources and the internet.
2. Registration: You register the DP Agent with your SAP Datasphere tenant through a
configuration process. The agent's external IP address must be added to the allowlist in
Datasphere for a secure connection.
3. Adapter hosting: The DP Agent hosts different data provisioning adapters for specific source
systems (e.g., ABAPAdapter, HanaAdapter, MssqlLogReaderAdapter). These adapters handle
the technical communication with the source databases.
4. Secure gateway: When Datasphere needs to access on-premise data, it sends requests to the
DP Agent, which then uses the appropriate adapter to communicate with the source system.
This creates a secure, controlled pathway for data transfer without exposing your internal
network.
The Data Layer is the foundation where IT and data engineers acquire, transform, and prepare data
from various sources for the entire organization.
Ingestion (Source) layer: Raw data is brought into Datasphere using remote tables for virtual
access or data flows for replication and transformation.
Inbound layer: This layer, sometimes overlapping with the ingestion layer, stores raw data,
essentially creating a mirror of the source system data.
Harmonization layer: Here, the data is standardized, cleansed, and enhanced for consistency.
For example, currency conversions may be applied to transactional data in this layer.
Propagation layer (optional): This optional layer can be used to further consolidate and
enrich datasets, feeding into the reporting layer.
Corporate Memory Layer (optional): For long-term storage of historical transactional data, a
separate Corporate Memory Layer can be created to support historical analysis and machine
learning.
Also known as the Business Layer, the Semantic Layer adds business context to the prepared data,
abstracting technical complexity for business users. Using the Business Builder tool (now largely
replaced by the Analytic Model in the Data Builder), this layer redefines and enriches technical
models with business-friendly terms.
Business terms: Measures, dimensions, and hierarchies are given business names and
descriptions that are easily understood by non-technical users.
Data abstractions: It hides the complex joins and technical details of the underlying Data
Layer models, providing a single, consistent view of the data.
Consumption models: For business users, this layer provides a self-service environment for
creating highly focused, multi-dimensional models for analysis.
Consumption (Reporting) layer: This final layer sits on top of the Data Layer to create
optimized views for reporting and analysis. It is designed for direct consumption by BI tools
like SAP Analytics Cloud.
The Catalog in SAP Datasphere is a central repository that catalogs data and metadata from across
the data landscape.
It provides a single, trusted source for discovering and understanding data assets.
The Catalog promotes data governance by centralizing metadata, data lineage, and semantic
definitions.
In practice, organizations often implement these layers across different "spaces" in Datasphere for
better organization and security. For example:
A Central IT Space might contain the Inbound and Harmonization layers for corporate data.
Different Departmental Spaces (e.g., Sales or Finance) would then consume and build upon
the harmonized data from the central space, developing their own Reporting layers.
This approach, known as cross-space sharing, prevents data duplication and ensures
consistent data models.
Data Access Controls (DACs) are the mechanisms used to implement row-level security on data. This ensures that users can only view
records that they are explicitly authorized to see, based on defined criteria. DACs are a crucial part of
an organization's data governance strategy, protecting sensitive information and maintaining
regulatory compliance.
1. Permissions Entity: A table or view that defines which users have access to which data. It
typically contains a user ID column and one or more criteria columns, such
as SalesRegion or Department (see the sketch after this list).
2. Data Access Control Object: An object that consumes the permissions entity and specifies
how its filtering criteria should be applied.
3. Data Views: The target views in your data model where the Data Access Control object is
applied. When a user queries a view with an attached DAC, the system filters the data
according to the rules defined in the permissions entity.
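A minimal sketch of a permissions entity, using hypothetical names and values (in practice this is typically a local table or view maintained in the Data Builder):

   -- Hypothetical permissions table: one row per user and criterion value.
   CREATE TABLE SALES_PERMISSIONS (
     USER_ID      NVARCHAR(256),  -- Datasphere user ID
     SALES_REGION NVARCHAR(10)    -- criterion column mapped in the DAC
   );

   -- Example rows: ALICE sees EMEA only, BOB sees two regions.
   INSERT INTO SALES_PERMISSIONS VALUES ('ALICE@EXAMPLE.COM', 'EMEA');
   INSERT INTO SALES_PERMISSIONS VALUES ('BOB@EXAMPLE.COM',   'EMEA');
   INSERT INTO SALES_PERMISSIONS VALUES ('BOB@EXAMPLE.COM',   'APAC');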
SAP Datasphere supports several types of DACs to suit different security scenarios:
Single Values: Filters data based on exact, individual matches in the permissions entity. For
example, a user may be given access to data where the SalesRegion column equals "EMEA".
Operator and Values: Provides more complex filtering logic using boolean operators
like EQ (Equal to), GT (Greater than), BT (Between), or LIKE. This reduces the number of
permissions records needed, as an administrator can use a single row to define a range of
access.
A DAC is not a passive security layer—it actively joins your data with the permissions list to filter
results at query time.
1. Definition: A Data Access Control object is created on top of the permissions entity, specifying
which criteria columns it exposes for filtering.
2. Application: The DAC is then applied to a data layer view. The administrator maps the criteria
columns in the DAC to the corresponding columns in the view.
3. Consumption: When a user with the necessary permissions queries the view (e.g., in an SAP
Analytics Cloud story), the Datasphere query engine filters the result set to show only the
rows matching the user's authorizations.
Consider performance: While applying a DAC as early as possible in the data flow can
improve performance by filtering records early, applying multiple DACs to a complex view
can have the opposite effect. It is generally best to apply a single, targeted DAC to a
combined view.
Note persistence limitations: A view that consumes another view with an applied DAC
cannot be persisted (stored as a snapshot). This can impact performance for subsequent
transformations and queries.
DAC vs. Space roles: Datasphere also uses space-level roles to control high-level access to
artifacts (tables, views, etc.). DACs, by contrast, are used for granular, row-level security
within those artifacts.
Data and content sharing in SAP Datasphere is the process of distributing and exchanging data and analytical assets with different users, teams, and
systems. It enables a business data fabric architecture that unifies data access and promotes self-
service data consumption, while ensuring governance and security.
The Data Marketplace: A central hub where users can discover and acquire pre-packaged
data products offered by internal teams, SAP, and third-party data providers.
o Data providers can list, manage, and monetize their data products.
o Consumers can browse and install data products into their SAP Datasphere spaces.
The Content Network: A feature that allows you to import, export, and share content
packages across different SAP Datasphere tenants.
o Transfer content: Facilitates the movement of content, such as tables, views, and
data models, between development, quality assurance, and production systems
without manual file transfers.
Sharing within a single tenant: A secure method for sharing data assets between different
spaces within the same SAP Datasphere tenant.
o Space-to-space sharing: Enables teams to share data entities and other objects from
their designated space with other teams for collaborative projects.
Centralized access to trusted data: The Data Marketplace provides a single point for
discovering data products and ensuring that all data consumers are using governed,
authoritative data.
Accelerated time to value: Accessing pre-built business content from the Content Network
allows organizations to jump-start projects and leverage proven, best-practice data models.
Enhanced governance: Content sharing mechanisms are built with security and control in
mind, ensuring that data is accessed and used according to defined policies.
Simplified data landscape: A business data fabric architecture, enabled by content sharing,
provides seamless access to data across hybrid and multi-cloud environments without
creating unnecessary data duplicates.
End-to-end Integration: Consumption of data from any source in SAP Analytics Cloud
In SAP BW/4HANA, a table function is typically used to leverage the WORKDAYS_BETWEEN SQL
function within a graphical calculation view, since the graphical editor does not directly support it.
A company wants to analyze the efficiency of its logistics department by calculating the number of
working days between the order date and the shipping date for a list of sales orders. The calculation
needs to exclude weekends and holidays defined in the company's factory calendar (TFACS table).
Solution architecture
A Graphical Calculation View that uses the table function to enrich the sales order data with
the calculated delivery lead time.
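A minimal SQLScript sketch of such a table function, assuming a hypothetical SALES_ORDERS table, factory calendar ID '01', and TFACS data available in a schema named ECC_DATA (verify the exact WORKDAYS_BETWEEN arguments against your system):

   CREATE FUNCTION TF_DELIVERY_LEAD_TIME()
   RETURNS TABLE (SALES_ORDER NVARCHAR(10), ORDER_DATE DATE, SHIPPING_DATE DATE, LEAD_TIME_WORKDAYS INTEGER)
   LANGUAGE SQLSCRIPT SQL SECURITY INVOKER AS
   BEGIN
     RETURN
       SELECT SALES_ORDER,
              ORDER_DATE,
              SHIPPING_DATE,
              -- Working days between order and shipping date, excluding weekends
              -- and holidays maintained in the factory calendar (TFACS).
              WORKDAYS_BETWEEN('01', ORDER_DATE, SHIPPING_DATE, 'ECC_DATA') AS LEAD_TIME_WORKDAYS
         FROM SALES_ORDERS;
   END;

The graphical calculation view then consumes this table function as a data source, exposing the calculated lead time alongside the sales order attributes.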
Stored Procedures:
SAP BW/4 hana calculation view when to use a stored procedure with a
scenario
A stored procedure is used within an SAP BW/4HANA calculation view
when you need to execute complex business logic that goes beyond the
capabilities of graphical modeling. You embed the stored procedure in a
"scripted" calculation view or, more commonly, within a Table Function
that is then consumed by a graphical calculation view.
Table Function vs. Stored Procedure:
Return value. Table Function: always returns a tabular result set. Stored Procedure: can return scalar values, tables, or no value.
Usage in BW/4HANA. Table Function: used to define the logic for SQL Scripted Calculation Views. Stored Procedure: used for data preparation, transformation, or specific business processes.
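For contrast, a minimal SQLScript sketch of a stored procedure (all object names hypothetical) that prepares data into a staging table and returns a row count via an OUT parameter, which a table function cannot do because it must return a tabular result:

   CREATE PROCEDURE P_PREPARE_ORDER_FACTS(OUT OV_ROWS_WRITTEN INTEGER)
   LANGUAGE SQLSCRIPT AS
   BEGIN
     -- Data preparation step: write cleansed orders into a staging table.
     INSERT INTO ORDER_FACTS_STAGING
       SELECT SALES_ORDER, ORDER_DATE, SHIPPING_DATE
         FROM SALES_ORDERS
        WHERE STATUS <> 'CANCELLED';

     -- Return how many rows were written (a scalar OUT value).
     SELECT COUNT(*) INTO OV_ROWS_WRITTEN FROM ORDER_FACTS_STAGING;
   END;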
Data Builder - Currency Conversion Views: select the source (an S/4HANA system or manual entry) and
create. This automatically generates the required tables, views, and data flows. Run the 8 generated
data flows manually, or add all 8 steps to a task chain and run it.
https://www.youtube.com/watch?v=Y2udOzBh4mE
Task Chains:
Automation and Scheduling: Task chains group multiple tasks, allowing you to run them manually or
set them on a periodic schedule for automated execution.
Data Flow and Replication Execution: You can include tasks to execute Data Flows, Transformation
Flows, and Remote Table Replication as steps within a task chain.
View Persistency Management: Task chains can be used to create or refresh the persistency of
views.
Open SQL Procedure Integration: They can run Open SQL schema procedures, integrating custom
database logic into your data processes.
Parallel Task Execution: You can add tasks to a task chain to run them in parallel, allowing for
simultaneous processing and reducing overall execution time.
REST API Interaction: Newer capabilities allow task chains to include a REST API task, enabling direct
interaction with external systems and services through REST APIs.
Error Handling: Task chains can be configured to stop processing if a task fails, protecting data
integrity by preventing subsequent tasks from running on incomplete data.
Nesting Task Chains: You can nest task chains within other task chains, creating hierarchical "meta
chains" that combine complex processes into a single, manageable unit.
Intelligent Lookup:
SAP Datasphere's Intelligent Lookup uses a combination of fuzzy matching and rules-based
automation to harmonize and merge data from disparate sources, even when they lack common
identifiers or contain inconsistencies. It excels at combining internal datasets with external data,
handling issues like case sensitivity, typos, and extra characters, to simplify data integration and
improve data quality for analytics and business intelligence.
Harmonizing Data: It blends internal and external datasets that may not have a common unique key.
Handling Data Inconsistencies: It effectively combines data from heterogeneous systems despite
inconsistencies such as case sensitivity, typos, and extra characters.
Automated Data Matching: It uses intelligent rules to create mappings and automates the process,
reducing manual effort.
Fuzzy Matching: The core technology allows for matches based on a similarity percentage rather than
exact values, enabling the connection of similar but not identical records.
Interactive Environment: Provides a visual interface for users to review and confirm proposed
matches, with mapping rules needing to be created only once.
Integration with SAP Datasphere: The harmonized results are standard entities that can be leveraged
by other models and consuming tools within SAP Datasphere.
Business Data Fabric Foundation: It forms a cornerstone for a Business Data Fabric, connecting data
from different business contexts to simplify the overall data landscape.
In essence, Intelligent Lookup simplifies the complex task of joining and integrating data, allowing
users to connect and analyze datasets that were previously difficult or impossible to combine using
traditional join methods.
E.g., imagine you have a view containing 20k electric vehicle charging points across the country, but it
lacks accurate location names. Use a fuzzy rule to match the charging points with location information.
Data Builder - Intelligent Lookup: choose the input entity (the table or view you want to enrich),
identify the pairing column (usually a unique identifier) that links individual records between the input
entity and the lookup table, and define the rules for matching records based on your business requirements.
An SAP Datasphere Entity Relationship (ER) Model scenario involves creating a visual representation
of your data structure, where tables and views act as entities, and you define the relationships (joins)
between them to form a coherent data model.
Scenario: Retail sales analysis using an Entity-Relationship (ER) model in SAP Datasphere
In this scenario, a retail company needs to build a data model to analyze its online sales
performance. The company wants to answer questions like:
To achieve this, a data engineer in SAP Datasphere will create an ER model to structure the data,
define relationships between entities, and enable downstream analysis.
1. Identify entities
The first step is to identify the core business objects or entities involved in the online retail process:
Customer, Product, Order, and OrderProduct (the order line items). These will typically be represented
as tables or views in the data model.
o Order attributes: Order ID (primary key), Customer ID (foreign key), Order Date, Status.
o OrderProduct attributes: Order ID (foreign key), Product ID (foreign key), Quantity, Line Item Price.
2. Define relationships
Next, the data engineer establishes the associations between these entities in the ER model to
reflect the business rules.
Customer to Order (One-to-Many): A single customer can place multiple orders, but each
order is associated with only one customer.
Order to OrderProduct (One-to-Many): A single order can contain many line items
(products), but each line item belongs to only one order.
Product to OrderProduct (One-to-Many): A single product can be included in many line items
across different orders.
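A minimal SQL sketch of the tables and keys these relationships imply, with hypothetical column types (in Datasphere these would normally be created as local tables in the Data Builder and linked via associations in the ER model rather than with DDL):

   CREATE TABLE CUSTOMER (
     CUSTOMER_ID NVARCHAR(10) PRIMARY KEY   -- further customer attributes omitted
   );

   CREATE TABLE PRODUCT (
     PRODUCT_ID NVARCHAR(10) PRIMARY KEY    -- further product attributes omitted
   );

   CREATE TABLE "ORDER" (
     ORDER_ID    NVARCHAR(10) PRIMARY KEY,
     CUSTOMER_ID NVARCHAR(10),              -- many orders per customer (1:n)
     ORDER_DATE  DATE,
     STATUS      NVARCHAR(20)
   );

   CREATE TABLE ORDER_PRODUCT (
     ORDER_ID        NVARCHAR(10),          -- many line items per order (1:n)
     PRODUCT_ID      NVARCHAR(10),          -- many line items per product (1:n)
     QUANTITY        INTEGER,
     LINE_ITEM_PRICE DECIMAL(15,2),
     PRIMARY KEY (ORDER_ID, PRODUCT_ID)
   );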
1. Ingest data: The data engineer imports the necessary tables from source systems (e.g., SAP
ERP) or external files (like a CSV for product data) into SAP Datasphere using the Data Builder.
2. Create the ER model: A new ER model is created in the Data Builder. The data engineer then
drags and drops the imported tables (Customer, Product, Order) and potentially a new table
for OrderProduct onto the canvas.
3. Create associations: The data engineer graphically links the tables together based on the
defined relationships and cardinality (e.g., 1-to-many).
4. Enrich the model (Optional): Additional information can be added, such as business names,
semantic usage types (like dimensions and facts), and hierarchies.
5. Deploy: The entire model is saved and deployed with a single action, making the
relationships available for subsequent data modeling and consumption.
With the ER model deployed, data consumers can easily build reports.
1. Create a graphical view: In the Data Builder, a business user creates a new graphical view.
2. Add a source: The user drags the Order table (the fact table) onto the canvas.
3. Leverage associations: The user can then click the + icon on the Order table to automatically
add the associated Customer and Product tables from the ER model. This eliminates the
need to manually define the joins.
4. Build calculations: Measures like Total Revenue (calculated from Quantity * Line Item Price)
can be created in the view (see the sketch after these steps).
5. Enable analysis: The resulting view can then be used in SAP Analytics Cloud for creating
dashboards and stories, allowing business users to analyze sales data based on any
combination of attributes from the customer, product, and order entities.
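As an illustration only, a plain SQL sketch of the Total Revenue logic from step 4, reusing the hypothetical tables from the ER model sketch above (in practice the measure is defined as a calculated column in the graphical view rather than hand-written SQL):

   -- Total revenue per customer, derived from Quantity * Line Item Price.
   SELECT o.CUSTOMER_ID,
          SUM(op.QUANTITY * op.LINE_ITEM_PRICE) AS TOTAL_REVENUE
     FROM "ORDER" o
     JOIN ORDER_PRODUCT op
       ON op.ORDER_ID = o.ORDER_ID
    GROUP BY o.CUSTOMER_ID;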
Variables are primarily for filtering, while input parameters are for dynamic calculations and more
advanced logic. Variables are bound to attributes; input parameters are more flexible and can be used
in various expressions. Variables filter at the semantic node; input parameters can be applied at any
projection level.
Input parameters, by applying conditions earlier, can sometimes lead to better performance by
reducing the data set earlier in the processing chain. Dynamic date calculation (e.g., filtering for the
last month's data):
IP_DATE_FROM: Type Direct, semantic type Date, with a default value expression
like ADD_MONTHS(date(now()), -1).
IP_DATE_TO: Type Direct, semantic type Date, with a default value expression like date(now()).
Use them in a filter expression: Filter a date column using the BETWEEN operator: Date_Column
BETWEEN '$$IP_DATE_FROM$$' AND '$$IP_DATE_TO$$'
For a dynamic calculation, create another input parameter and name it IP_DISCOUNT_RATE, then reference it in a calculated-column expression (see the sketch below).
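A minimal sketch, assuming hypothetical Price and Discounted_Price columns and following the $$...$$ placeholder convention used in the date filter example above:

   Discounted_Price = Price * (1 - $$IP_DISCOUNT_RATE$$)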
SAP BW bridge: This optional component creates a data staging layer specifically for supporting SAP
S/4HANA ODP extractors. It is mainly intended for existing SAP BW and BW/4HANA customers,
enabling a conversion path to SAP Datasphere, whether via the BW bridge alone or in a hybrid use case.
SAP Landscape Transformation Replication Server (SLT): This server replicates tables from SAP
S/4HANA to the SAP Datasphere and can be integrated into your Replication Flow.
SAP Datasphere is a unified service for data integration, cataloging, semantic modeling, data
warehousing, and virtualizing workloads across all your data.
Spaces, as part of the SAP Datasphere solution, are virtual (team/project/department) environments
where your administrator can assign users, add source system connections, and allocate memory and
storage. All data acquisition, preparation, and modeling in SAP Datasphere happens inside spaces. A
space is a secure area. Space data cannot be accessed outside the space unless it is shared to
another space or exposed for consumption. Therefore all data related workflows in SAP Datasphere
start with the selection of a space.