Ab Initio One Architecture
Contents
Storing and Managing Big Data
Enterprise Computing
Everything is Graphical
    Ab Initio applications are graphical
    Business Rules are graphical
    Application orchestration is graphical
The Ab Initio Engine
    The Co>Operating System provides unlimited scalability
    The Co>Operating System is a distributed processing system
    The same Co>Operating System engine does real-time, service-oriented architectures (SOA), and batch
Metadata
Data Quality
Operations Management
Storing and Managing Big Data
What if you have hundreds of terabytes – or even multiple petabytes – of raw data that you need
to keep online and process on a regular basis? Storing that data in a traditional relational database
can be prohibitively expensive. If the pattern of access is write-once/read-many, then Ab Initio’s
Indexed Compressed Flat File (ICFF) facility may be for you.
The Ab Initio ICFF facility makes it easy to take hundreds of gigabytes to multiple petabytes of
raw data, compress it, index it, and store it in a set of traditional flat files that may be dispersed
across multiple disks, possibly on different servers. ICFFs require very little temporary storage,
and because the standard (non-proprietary) compression utilities used by ICFFs often achieve
compression ratios of 1:10, the physical storage difference between an ICFF and a database can
easily be a factor of 10, if not 20. One-tenth or one-twentieth as much disk can mean real money
savings.
The performance of ICFFs is also extraordinary. Data can be added to an ICFF at the rate of
hundreds of thousands of records per second – in real-time! These records can be looked up
immediately after they are received – big bulk loads that might take minutes or hours don’t get in
the way. Indexed lookup performance is limited only by the total number of disk arms that can
move at any one moment. Full table scans enable parallel streams of data from each data
partition to flow into Ab Initio applications, and these applications can run in parallel across as
many CPUs as desired. ICFFs can even be queried via SQL, with support for federated queries
across an ICFF and datasets stored in databases and/or flat files. For the right applications, an
ICFF can’t be beat.
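For readers who want a feel for the mechanics, below is a minimal sketch of the write-once/read-many idea behind an ICFF: compressed blocks of records stored in ordinary flat files, plus a small index of block keys and offsets so that a lookup decompresses only one block. It is written in generic Python; the block size, file layout, and function names are illustrative assumptions, not Ab Initio’s actual ICFF format.

    import gzip, bisect

    BLOCK_SIZE = 1000          # records per compressed block (illustrative)

    def write_icff_like(records, data_path, index):
        """Append records (sorted by key) as gzip-compressed blocks; remember
        the first key and byte offset of each block in a small index."""
        with open(data_path, "ab") as f:
            for i in range(0, len(records), BLOCK_SIZE):
                block = records[i:i + BLOCK_SIZE]
                offset = f.tell()
                payload = gzip.compress("\n".join(block).encode())
                f.write(len(payload).to_bytes(8, "big") + payload)
                index.append((block[0].split(",")[0], offset))   # (first key, offset)

    def lookup(key, data_path, index):
        """Binary-search the index, then decompress only the one block that can hold the key."""
        pos = bisect.bisect_right([k for k, _ in index], key) - 1
        if pos < 0:
            return None
        with open(data_path, "rb") as f:
            f.seek(index[pos][1])
            size = int.from_bytes(f.read(8), "big")
            block = gzip.decompress(f.read(size)).decode().splitlines()
        return next((r for r in block if r.split(",")[0] == key), None)

    # Toy usage: load five thousand records, then look one up without scanning the file.
    index = []
    write_icff_like(sorted(f"{i:08d},customer-{i}" for i in range(5000)), "icff_data.dat", index)
    print(lookup("00001234", "icff_data.dat", index))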
Enterprise Computing
There are many technologies for implementing and understanding business data processing
applications.
Ab Initio has a single architecture for processing files, database tables, message queues, web
services, and metadata. This same architecture enables virtually any technical or business rule to
be graphically defined, shared, and executed. It processes data in parallel across multiple
processors, even processors on different servers. It can run the same rules in batch, real-time, and
within a service-oriented architecture (SOA). It supports distributed checkpoint restart with
application monitoring and alerting. And, this same architecture enables end-to-end metadata to
be collected, versioned, and analyzed by non-technical users.
This single architecture is what makes Ab Initio a general-purpose data processing platform.
Users don’t need to stitch together a collection of technologies to get the job done. Everything
from Ab Initio is designed from the beginning to form a unified processing platform – fully
integrated by definition, as opposed to by marketing.
The Ab Initio architecture manifests itself through a wide range of technologies and capabilities,
all of which are built on the same architectural foundation. These capabilities fall into the
following general categories:
Everything is Graphical
Core to Ab Initio is the simple idea that everything should be graphical. Applications should be
graphical. Rules should be graphical. Orchestration, metadata, data management, and so on – no
matter how big or complex, all should be graphical. Let’s start with applications:
Ab Initio applications are graphical
Ab Initio applications are represented as “dataflow graphs,” or “graphs” for short. These
applications consist of components that process data and flows that move data from one
component to the next.
Data flows from the input components, through all the processing components, and finally to the
output components. This is so basic that most software professionals design applications in this
manner on paper, on whiteboards, or in drawing programs (such as Visio). The difference with
Ab Initio is that these are not just pretty pictures – they are full-fledged applications that can
implement the most complex business processes.
These dataflow applications can connect to practically any type and source of data, legacy or
modern, structured or semi-structured, batch, real-time, or web services. The components can
come from Ab Initio, from end-user programs (usually with no changes), or from third-party
products.
The Ab Initio technology for designing, implementing, and testing applications is the Graphical
Development Environment (GDE). The GDE is tightly coupled with the processing engine, the
Co>Operating System.
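The dataflow idea itself is easy to express in ordinary code. The sketch below models a tiny graph (an input component, two processing components, and an output component) as Python functions chained together. The component names, the accounts.csv file, and the record fields are hypothetical, and a real Ab Initio graph is built and run graphically rather than hand-coded this way.

    import csv, sys

    def read_input(path):                      # input component: stream records from a file
        with open(path, newline="") as f:
            yield from csv.DictReader(f)

    def filter_active(records):                # processing component: keep only active accounts
        return (r for r in records if r["status"] == "ACTIVE")

    def compute_balance_class(records):        # processing component: derive a new field
        for r in records:
            r["balance_class"] = "HIGH" if float(r["balance"]) > 10000 else "NORMAL"
            yield r

    def write_output(records, out=sys.stdout): # output component: write results downstream
        writer = None
        for r in records:
            if writer is None:
                writer = csv.DictWriter(out, fieldnames=r.keys())
                writer.writeheader()
            writer.writerow(r)

    # Flows connect the components end to end, just like arrows in a graph.
    write_output(compute_balance_class(filter_active(read_input("accounts.csv"))))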
Business Rules are graphical
Rules are the heart of applications. Because they embody the details of the business, it is critical
that business users be able to at least understand their implementation. Even better is when the
business users are able to specify and test the rules. This is accomplished with Ab Initio’s
Business Rules Environment (BRE).
The substrate for rules is simple: they are specified in the form of spreadsheets, something that
all business users are intimately familiar with. A typical example is a rule in a loyalty card
application that decides what reward level to allocate to each customer.
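As a rough illustration of that kind of tiered rule (the thresholds, reward names, and field names below are hypothetical and are not taken from the BRE), the equivalent logic can be read as an ordered decision table: the first row whose condition matches determines the reward.

    # Each row: (condition on the customer, reward level), evaluated top to bottom,
    # exactly as a business user would read the rows of a rules spreadsheet.
    REWARD_RULES = [
        (lambda c: c["annual_spend"] >= 10000 and c["years_member"] >= 5, "PLATINUM"),
        (lambda c: c["annual_spend"] >= 10000,                            "GOLD"),
        (lambda c: c["annual_spend"] >= 2500,                             "SILVER"),
        (lambda c: True,                                                  "STANDARD"),  # default row
    ]

    def reward_level(customer):
        for condition, level in REWARD_RULES:
            if condition(customer):
                return level

    print(reward_level({"annual_spend": 12000, "years_member": 7}))   # -> PLATINUM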
The Ab Initio Business Rules Environment goes far beyond graphical specification of business
rules. Go here to learn more.
Application orchestration is graphical
Large applications consist of many graphs, and managing the orchestration of these graphs is
also done graphically. Such an orchestration is called a “plan,” and the elements of a plan are
“tasks.” Tasks correspond either to Ab Initio graphs or to other non-Ab Initio programs or shell
scripts. Depending on how they are connected, tasks can make decisions as to which other tasks
will be executed in what sequence. Thus, plans are very much like flowcharts, though they also
contain information about execution constraints and resource management.
Plans are controlled by Ab Initio’s Conduct>It technology. Conduct>It also provides operational
monitoring and scheduling capabilities. For details, see here.
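Conceptually, a plan is a dependency graph of tasks. The sketch below shows that idea in plain Python: tasks run once their prerequisites have succeeded, and a failed task blocks everything downstream of it. The task names and the stand-in callables are invented for illustration and have nothing to do with Conduct>It’s actual representation of plans.

    # Hypothetical plan: each task lists the tasks it must wait for and the work it performs
    # (here just a stand-in callable that returns True on success).
    PLAN = {
        "extract_accounts": {"after": [],                                     "run": lambda: True},
        "extract_orders":   {"after": [],                                     "run": lambda: True},
        "join_and_score":   {"after": ["extract_accounts", "extract_orders"], "run": lambda: True},
        "publish_report":   {"after": ["join_and_score"],                     "run": lambda: True},
    }

    def run_plan(plan):
        """Execute tasks in dependency order; a failed task blocks everything downstream of it."""
        done, failed = set(), set()
        while True:
            ready = [name for name, spec in plan.items()
                     if name not in done and name not in failed
                     and all(dep in done for dep in spec["after"])]
            if not ready:                      # nothing left that is runnable
                return done, failed
            for name in ready:
                print("running", name)
                (done if plan[name]["run"]() else failed).add(name)

    print(run_plan(PLAN))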
The Ab Initio Engine
Rules, dataflow applications, and orchestration plans are by themselves just graphics. It is the
Co>Operating System that brings them to life.
The Co>Operating System is the single engine for all processing done by Ab Initio’s
technologies. Its capabilities are unrivaled and, because all other Ab Initio technologies are built
on top of the Co>Operating System, they inherit all of those capabilities in a consistent manner.
That’s what happens when a system is architected from first principles!
The Co>Operating System provides unlimited scalability
There are, of course, many details to getting scalability right. If a single detail isn’t right, the
system doesn’t scale. Ab Initio has sweated all those details so that you don’t have to – details
necessary for applications to process tens of billions of records per day, store and access multiple
petabytes of data (that’s thousands of terabytes), and process hundreds of thousands of messages
per second. That said, to process thousands of messages per second, or gigabytes to terabytes a
day, there is no substitute for experienced and sophisticated technical developers. Ab Initio
enables these people both to be remarkably productive and to produce systems that truly work.
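The underlying scalability model is data parallelism: partition the data, run the same logic on every partition at once, and add partitions (and CPUs) as volumes grow. The following is a minimal sketch of that pattern in generic Python; the partition count and the toy transform are illustrative, and in practice the Co>Operating System manages partitioning and process startup itself rather than leaving it to application code.

    from multiprocessing import Pool

    def transform_partition(records):
        # The same business logic runs unchanged on every partition.
        return [r.upper() for r in records]

    def run_in_parallel(records, partitions=8):
        # Hash-partition the data, then process all partitions concurrently.
        buckets = [[] for _ in range(partitions)]
        for r in records:
            buckets[hash(r) % partitions].append(r)
        with Pool(partitions) as pool:
            results = pool.map(transform_partition, buckets)
        return [r for part in results for r in part]

    if __name__ == "__main__":
        print(run_in_parallel(["alpha", "beta", "gamma", "delta"]))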
The Co>Operating System is a distributed processing system
Large enterprises inevitably have a mixture of servers distributed across a network. Getting these
servers to cooperate on behalf of applications is a challenge. That’s where the Co>Operating
System comes in. The Co>Operating System can run a single application across a network of
servers – each server running a different operating system. The Co>Operating System makes all
these systems “play nice” with each other.
For instance, an application can start with components on the mainframe because that’s where
the data is, run other components on a farm of Unix boxes because that’s where the compute
power is, and end with target components on a Windows server because that is where the report
is supposed to end up. The fact that this application spans multiple servers is irrelevant to the
developers and the business users – all they care about is the business flow and the business
rules. The Co>Operating System knows where the work is supposed to be performed, and does it
there.
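One way to picture this is that every component in the flow carries a label saying where it should run, and the engine dispatches the work accordingly. The sketch below only illustrates that mental model; the host names and the layout structure are hypothetical and are not Co>Operating System configuration.

    from collections import defaultdict

    # Hypothetical layout: one application, with each component pinned to the
    # machine where its data or compute naturally lives.
    LAYOUT = [
        ("read_vsam_accounts", "mainframe01"),
        ("reformat_and_join",  "unix-farm"),
        ("score_customers",    "unix-farm"),
        ("write_report",       "winsrv-reports"),
    ]

    def work_by_host(layout):
        plan = defaultdict(list)
        for component, host in layout:
            plan[host].append(component)
        return dict(plan)

    # Developers see one flow; the engine decides where each piece actually runs.
    print(work_by_host(LAYOUT))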
The same Co>Operating System engine does real-time, service-oriented
architectures (SOA), and batch
It’s a real hassle when an application was built to run in batch mode and now the business wants
to go to real-time transactions. Or when the business decides that a real-time application needs to
process very large nightly workloads, but it can’t, because a real-time architecture can’t make it
through millions of transactions in just a few hours. In both cases, the same business logic has to
be reimplemented with the other methodology, and then there are two different and incompatible
underlying technologies. Twice the development work, twice the maintenance.
Not so with the Co>Operating System. With the Co>Operating System, the business logic is
implemented just once. And then, depending on what the logic is connected to, the application is
batch, real-time, or web service-enabled. The same logic can be reused in all those modes,
generally with no changes. All that is required to build applications that can span these different
architectures is Ab Initio’s Continuous>Flows, the real-time processing facility of the
Co>Operating System.
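The essential point is that the record-level logic is written once and only the surrounding plumbing changes. The sketch below makes that concrete in generic Python: one transform function, one batch driver, and one message-at-a-time driver. The function names and the toy message source are assumptions for illustration and do not correspond to Continuous>Flows APIs.

    def enrich(record):
        # The business logic, written exactly once.
        record["total"] = record["price"] * record["quantity"]
        return record

    def run_batch(records):
        # Batch mode: sweep through a large dataset in one run.
        return [enrich(r) for r in records]

    def run_realtime(queue_get, publish):
        # Real-time / service mode: the same logic, applied one message at a time.
        while True:
            msg = queue_get()
            if msg is None:          # sentinel: no more messages
                break
            publish(enrich(msg))

    # Toy usage: the same enrich() serves both styles.
    print(run_batch([{"price": 2.0, "quantity": 3}]))
    msgs = iter([{"price": 5.0, "quantity": 1}, None])
    run_realtime(lambda: next(msgs), print)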
Metadata
OK. You’ve got lots of data. You’ve got lots of processes. You’ve got lots of datastores. But do
you have any idea how they tie together? If the CFO asks, “How was that field in the XYZ report
calculated?” could you answer her? What would it take to find the answer? Even answering that
question can be a project!
With Ab Initio, the answer comes with the push of a button. This is because the Ab Initio
Enterprise Meta>Environment (EME) captures all metadata from Ab Initio applications as well
as from many other technologies, and then presents this information using an intuitive,
interactive, drill-down user interface.
The Ab Initio EME goes far beyond just lineage and impact analysis. It is a complete enterprise-
class and enterprise-scale metadata solution architected to manage the metadata needs of
business users, technical developers, and operational staff. It handles many types of metadata
from different technologies in three categories: business, technical, and operational.
The EME integrates these different types of metadata and as a result multiplies their value. For
example, this integration enables end-to-end data lineage across technologies, consolidated
operational statistics for comprehensive capacity planning, and fully linked data profile statistics
and data quality metrics. And all this information is presented in a graphical manner that is
appropriate to both business and technical people.
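Answering the CFO’s question amounts to walking a lineage graph upstream from the field in question. The sketch below shows that walk over a tiny hand-built graph; the dataset and field names are invented, and the EME builds and maintains this kind of graph automatically from the applications themselves rather than by hand.

    # Hypothetical lineage: each field maps to the upstream fields it was derived from.
    DERIVED_FROM = {
        "xyz_report.net_revenue": ["billing.gross_amount", "billing.discounts"],
        "billing.gross_amount":   ["orders.unit_price", "orders.quantity"],
        "billing.discounts":      ["promotions.discount_pct"],
    }

    def upstream(field, depth=0):
        """Print everything that feeds into a field, recursively."""
        print("  " * depth + field)
        for source in DERIVED_FROM.get(field, []):
            upstream(source, depth + 1)

    upstream("xyz_report.net_revenue")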
In addition, the EME supports a robust Software Development Life Cycle (SDLC) for
application development. Versioning, check-in/check-out, locking, branching, promotion,
governance – all are built into the EME.
Go here to learn about the EME and all the types of metadata it integrates.
Data Quality
Everybody wants it. Some have it. Most don’t. Data quality is very hard for organizations to
define, measure, monitor, and improve. But it matters because inaccurate information can lead to
bad business decisions and ultimately affect the bottom line.
Addressing data quality requires an enterprise approach. A company’s data and its systems are
distributed. The data comes in every conceivable format. Some systems process huge numbers of
records or transactions each day. Some systems are batch, some are real-time. Many are built
with Ab Initio. Most are legacy. Only a general-purpose approach can address data quality in
such a wide variety of situations.
Ab Initio addresses data quality on several fronts:
• Problem detection and correction are integrated into the processing pipeline of applications across the enterprise. The Co>Operating System makes this easy – it can be deployed on practically any platform, can scale to any data volumes or transaction rates, and can process any kind of data. It comes with a library of standard validation rules and can graphically express complex data validation and cleansing rules (a rough sketch of this style of validation appears after this list). The Business Rules Environment allows business analysts to specify and test their rules in a spreadsheet-like environment. All of this logic can be integrated directly into existing systems, whether they were built with Ab Initio or not.
• Issue reporting is handled by Ab Initio’s Enterprise Meta>Environment. The EME collects statistics from data profiling and data validation, and computes data quality metrics. It then provides a single point for data quality reporting by combining data-level statistics and metrics with various data quality dashboards. System lineage diagrams include data quality metrics so that the source of data quality problems can be identified graphically.
• Quality monitoring is performed by the Data Profiler. It can reveal issues with the contents of datasets, including data values, distributions, and relationships. Using the Data Profiler operationally allows subtle changes in data distributions to be detected and studied.
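As a rough sketch of the in-pipeline validation mentioned in the first point above: a handful of rules applied to every record, with failing records routed to a reject stream and per-rule failure counts kept for downstream quality reporting. The rules and field names are hypothetical and are not Ab Initio’s built-in validation library.

    import re
    from collections import Counter

    # Hypothetical validation rules: each returns True when the record passes.
    RULES = {
        "customer_id_present": lambda r: bool(r.get("customer_id")),
        "amount_is_numeric":   lambda r: re.fullmatch(r"-?\d+(\.\d+)?", r.get("amount", "")) is not None,
        "currency_is_known":   lambda r: r.get("currency") in {"USD", "EUR", "GBP"},
    }

    def validate(records):
        clean, rejects, metrics = [], [], Counter()
        for r in records:
            failures = [name for name, rule in RULES.items() if not rule(r)]
            if failures:
                rejects.append({"record": r, "failed_rules": failures})
                metrics.update(failures)            # per-rule counts feed quality reporting
            else:
                clean.append(r)
        return clean, rejects, metrics

    clean, rejects, metrics = validate([
        {"customer_id": "C1", "amount": "10.50", "currency": "USD"},
        {"customer_id": "",   "amount": "oops",  "currency": "XXX"},
    ])
    print(metrics)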
Ab Initio’s data quality design pattern is based on a set of powerful, reusable building blocks.
These building blocks can be customized and integrated into all aspects of a production
environment. Because these building blocks all run on the Co>Operating System, they can be
deployed across the enterprise.
Finally, Ab Initio technology supports collecting, formatting, and storing information about
individual data quality problems, for root cause analysis and to inform the design of solutions to
both rare, “one of a kind” issues and systemic data quality problems.
Operations Management
It’s 10 PM – do you know where your jobs are? Your hundreds of jobs? Your thousands of jobs?
Your tens of thousands of jobs? Whatever! Real-world management of complex operations is a
tough and dirty job, and everybody knows that.
Ab Initio makes a real dent in the problems confronting operations staff: Conduct>It provides
graphical process orchestration, as well as complete job monitoring and scheduling. These
capabilities can be run independently – customers with traditional enterprise-class schedulers
may continue to use them and leverage just the Conduct>It monitoring facilities.
The Conduct>It Operational Console can track tens of thousands of jobs and their dependencies
and, in a monitoring dashboard, show status and performance information for each.
Red-light/green-light signaling, alerts and alarms, and SLA warnings all come from this
information. The Operational Console can slice and dice the stats of all those jobs, as well as log
them in the Enterprise Meta>Environment (EME) for long-term historical and capacity-planning
analysis.
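As a toy version of that red-light/green-light view, the sketch below flags failed jobs and SLA breaches from a snapshot of job records. The job names, timestamps, and the two-hour SLA are invented for illustration and are not Operational Console data structures.

    from datetime import datetime, timedelta

    SLA = timedelta(hours=2)   # hypothetical runtime SLA

    JOBS = [   # hypothetical snapshot of the nightly jobs
        {"name": "load_customers", "status": "SUCCEEDED", "started": datetime(2024, 1, 1, 1, 0), "ended": datetime(2024, 1, 1, 1, 40)},
        {"name": "score_risk",     "status": "RUNNING",   "started": datetime(2024, 1, 1, 1, 0), "ended": None},
        {"name": "build_report",   "status": "FAILED",    "started": datetime(2024, 1, 1, 2, 0), "ended": datetime(2024, 1, 1, 2, 5)},
    ]

    def signal(job, now):
        if job["status"] == "FAILED":
            return "RED"
        runtime = (job["ended"] or now) - job["started"]
        if runtime > SLA:
            return "AMBER (SLA warning)"
        return "GREEN"

    now = datetime(2024, 1, 1, 3, 30)
    for job in JOBS:
        print(f'{job["name"]:15} {signal(job, now)}')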
Learn more about Conduct>It and the Operational Console.