Dzone Researchguide Databases
Dzone Researchguide Databases
DZONE.COM/GUIDES
Databases: Evolving Solutions and Toolsets. We’ve come a long way in 4 Key Research Findings
BY JORDAN BAKER
the world of databases. Hierarchical databases entered the playing
field first in the late 1960s, paving the way for network and then 7 The Multiple Facets of Time Series: From
Operations Optimization to Business Strategy
relational databases in the 1970s, which quickly gained traction. BY DANIELLA PONTES
structured XML databases still take the stage. 14 AgensGraph: A Graph DB Helping to Solve
World Hunger
Now, we can’t function without databases. 82% of people from BY JO STICHBURY
our DZone Guide to Databases survey have 10 or more years of 18 Performance Tuning and Monitoring Traditionally
Falls to the Database Administrator
experience with databases, showing just how much they have
BY MONICA RATHBUN
permeated developers’ everyday lives. However, developers are
24 Cross-Platform Deployments in Azure With BASH
still not containerizing their databases as much as you might think. BY KELLYN POT’VIN-GORMAN
expect containers will transform database technology over the next 36 Databases Solutions Directory
five years, so it will be interesting to see how this number changes in
43 Diving Deeper Into Databases
coming years.
Andre Lee-Moye
developers prefer and which one suits their needs more. Suha Shim
Content Coordinator
Acquisition Marketing Mgr.
SALES
Kendra Williams Lauren Ferrell
Cathy Traugot
DZone’s 2019 Guide to Databases: Evolving Solutions and Toolsets Sr. Director of Media Sales Content Marketing Mgr.
Content Coordinator
databases, and handling dynamic data in both SQL and graphs. We’ll Jim Dyer Sarah Sinning
Sr. Account Executive Staff Writer
also look into time series data and Azure on Linux with databases. Tevano Green
Sr. Account Executive
PRODUCTION
Brett Sayre Chris Smith
Thanks for reading, and we hope you enjoy! Account Executive Director of Production
Alex Crafts Billy Davis
Key Account Manager Production Coordinator
Craig London Naomi Kromer
Key Account Manager Sr. Campaign Specialist
Executive data when asked the same question last year. Respondents who
persist their time-series data with a relational database fell from
64% to 56% this year. Respondents using a non-time-series-
Summary
specific NoSQL database for this data also declined — from 19%
to 16% this year.
IMPLICATIONS
Adoption of specialized time-series databases appears to be
BY KARA PHELPS EDITORIAL PROJECT MANAGER, PUBLICATIONS, DEVADA
gaining momentum among organizations that rely on time-series
At the foundation of software development, databases are data. In a corresponding way, the use of SQL and NoSQL databases
essential building blocks. New developers often learn database for time-series data also seems to have fallen over the past year.
technology first. Applications rely on databases to deliver data
from an ever-increasing array of sources — securely, at scale, RECOMMENDATIONS
If you work with time-series data, it may be worth it to look into
with little to no latency. To prepare for the DZone’s 2019 Guide
how specialized time-series databases might fit your particular
to Databases, we surveyed 639 tech professionals to learn how
use case, if you haven’t already done so. For an introduction or
they’re using database technology every day, and how they
a refresher, check out “What the Heck Is Time-Series Data (And
expect databases to change in the near future. Let’s dig into some
Why Do I Need a Time-Series Database)?” We’re also featuring an
key results.
article on implementing time-series databases later in this guide.
Key Research
• Respondents reported using four main programming lan-
guage ecosystems
Findings
–– 74% reported client-side JavaScript
–– 41% work with Node.js
–– 41% told us the use the Python ecosystem
• 61% of survey takers work for enterprise-level organizations best suit the current knowledge of developers at their organization
(27%). This last point could prove crucial in understanding why SQL
–– 23% for organizations sized 100-999
databases continue to hold an edge over NoSQL databases, at least
–– 20% for organizations sized 1,000-9,999
in terms of adoption rates. The older and more established of the
–– 18% for organizations sized 10,000+
two technologies, SQL databases could very well bring with them a
• Respondents work on three main types of software. larger user base and thus simplify the recruiting and onboarding of
new developer talent within an organization.
–– 69% develop web applications
–– 60% are developing enterprise business applications
Looking at the numbers we gathered around specific SQL and
–– 24% develop native mobile apps
NoSQL databases, we continue to see that SQL databases have
SURVEY RESPONSES
Within the next five years, what technology/trend do you think What security protocols do you implement to protect the data
will most transform database technology as we know it? in the databases you are responsable for?
a slight advantage in terms of adoption ratings. When we asked ment systems, let's turn to what respondents told us about their
respondents what database management system they use in actual data storage and management practices. In keeping with
production, the six most popular databases reported included four the popularity of SQL databases in professional settings delineated
SQL and two NoSQL databases. The adoption rates of these six above, we found that an overwhelming majority of respondents
databases are as follows: use relational data storage models to persist their data. In fact, this
model of data persistence has proved far and away the most pop-
• PostgreSQL: 49% ular means of data persistence over the past several years. In our
• MySQL: 46% 2018 database survey, we found that 88% of respondents reported
using relational data persistence models, with key-value coming in
• Oracle: 43%
as the second most popular persistence model with a 57% percent
• Microsoft SQL Server: 40%
adoption rating. In this year's survey, relational persistence held
• MongoDB: 29% steady, with 89% of respondents using this model, whereas key-val-
• Redis: 25%
ue persistence models fell to a 53% adoption rating.
As you can see, the top four are all SQL databases. Interestingly, Looking at the number of persistent storage models that respondents
PostgreSQL and MySQL swapped spots from last year's survey, with reported using in their projects, we find some interesting year-over-
PostgreSQL increasing from its 37% adoption rate in our 2018 sur- year fluctuations as well. Over the past two years, the use of per-
vey, and MySQL falling from its 49% adoption rate last year. When sistent storage models proved the most popular with survey takers,
we examine databases used in a non-production environment, we with 37% of respondents reporting to use two persistent storage
see much of the same. MySQL (46%), PostgreSQL (44%), MS SQL models in each year's survey. But, the number of survey takers using
Server (33%) and Oracle (31%) were the four most used, followed by either one or three persistent storage models varied considerable
MongoDB (30%) and Redis (22%). between 2018 and 2019. In our 2018 survey, 20% of respondents
reported using one persistent storage model; this year, that number
While SQL databases seem to have a decided edge in current rose to 30%. Conversely, we found that the percentage of survey
adoption rates among respondents, NoSQL databases, specifically takers using three persistent storage models in their projects fell from
MongoDB, make up ground in other categories. When we asked sur- 24% in 2018 to 13% in 2019. Thus, while persistent storage models are
vey takers to tell us which database management system they most
largely remaining relational in their set up, in number they seem to be
enjoy working with, 39% reported PostgreSQL, 30% said MySQL,
drifting more to one or two per project.
and 30% told us MongoDB. And, of the 17% of respondents looking
to adopt a new DBMS in the next year, MongoDB and PostgreSQL Unlike the use of relational data models, the use of time-series data
tied for their leading candidates (19% each). saw some fluctuation. In 2018, 81% of respondents told us their
company collected some kind of time-series data. In this year's
Database Management and Data Storage database survey, this number fell to 67%. Among those survey
Now that we've discussed the place of SQL and NoSQL databases takers who do use time-series data, the percentage of them who use
in the field, and the popularity of particular database manage- either SQL or NoSQL databases for their time-series data also fell. In
SURVEY RESPONSES
Which database management systems do you, personally, Which persisent storage models do your application use?
most enjoy working with?
2018, 64% of respondents who worked with time-series data used Despite this low adoption rate, however, when we asked survey tak-
a relational database, which dropped to 56% in this year's survey; ers to tell us what technology or trend they thought will transform
similarly, those who used NoSQL databases for their time-series database technology over the next five years, 23% said containers.
data fell from 19% to 16%. While the adoption rates for these two Interestingly, containers proved the second most popular response
widely popular models of data storage dropped among time-series to this question on potentially transformative technologies, beat
users, the instance of respondents using a specialized time-series out only by stream processing/real-time analytics (30%).
database grew from 12% in 2018 to 22% in 2019.
SECURIT Y
Popular Tools and Database Security While the world of database technology is seeing a greater interest
DATABA SE TOOL S in security arise in recent years, possibly due to several high-profile
Having addressed the fluctuations and trends in the types of data breaches, 26% of respondents reported that their organization is
storage models above, let's now take a moment to examine the planning on adding new security protocols in the near future. Ad-
tools used to host these databases and DBMS, namely servers (and ditionally, the year-over-year trends for the usage rates of security
where they're located) and containers. protocols either stayed stagnant or fell. In 2018, 91% of respondents
told us they use authentication to protect the data in their database;
Despite the constant advances being made in cloud computing,
in 2019, 88% reported to use authentication. Similarly, in last year's
the adoption rates of cloud, hybrid, and on-premise solutions
survey, 37% of respondents told us that they sanitized their user in-
seemed to remain rather static year-over-year. When we asked
puts, and in 2019 this number stayed fairly steady, with 38% claim-
survey takers where the data they work with typically resides, 46%
ing to sanitize user inputs. What was potentially alarming, however,
told us on-premise (compared to 47% in 2018), 30% host their
was the drop in respondents using encryption to protect their data.
data in the cloud (compared to 32% in 2018), and 20% use a hybrid
In 2018, 62% of survey takers reported that their organization used
hosting environment (compared to 19% in 2018). Additionally, we
encryption to secure the data they stored in their database; but, this
saw little to no yearly fluctuation in the percentage of databases
year, that number fell to 49%.
reported to be containerized. Given that discussing the percentage
of respondents who containerize a certain percentage of their
databases is difficult to convey in prose, here's a quick breakdown
of this year-over-year trend:
Popular Container
2018 2019
Strategies
0% of databases
53% of respondents 52% of respondents
are containerized
100% of databases
13% of respondents 10% of respondents
are containerized
11-25% of databas-
es are container- 10% of respondents 9% of respondents
ized
SURVEY RESPONSES
Where does the data that you're working with typically reside? How is your time split between working with SQL and NoSQL
databases?
BY DANIELLA PONTES
SENIOR PRODUCT MARKETING MANAGER AT INFLUXDATA
Time-series data seems to be everywhere lately. Today’s data-driven directions and points back to them. Preemptiveness is the name of
society demands optimization — we all want to be on top of what we the game for IT Ops.
consume and spend, and we want to make the most out of what we
Business visionaries have also contracted the “nothing-to-waste bug”
do and buy.
— but with a sense of opportunity turned on. New entrants who bet on
Lower performance and downtime (of any nature) is not merely seen performance improvements are winning big in commoditized markets.
as an inconvenience, but directly translates to waste: whether of Entire new markets are also being created around collecting measure-
resources, revenue opportunities, brain power, working hours, or a ments. The sensor-ification of our world (and of ourselves) is driven by
perfect sunny day. It all boils down to the only thing that levels us: the our increasing dependence on data to make sound decisions.
unrecoverable loss of time.
The Rise of Purpose-Built Time-Series Databases
Consumers and professionals share the same realization that things Time series is a direct byproduct of the current demand for optimiza-
must be constantly improved because there is no time to waste. tion. As the saying goes, “one can only improve what has been mea-
Application developers know that visitors will not spare another sured.” IT Ops, DevOps, and data-driven professionals are taking this
second on a sluggish website or on an outdated and clumsy ap- wisdom to the limit. Real-time monitoring allows them to act on fresh
plication. Ensuring flawless continuous integration and delivery is data to avoid missing profit opportunities or loss of revenue. A subtle
becoming a priority for DevOps teams and application managers. event not caught by large sampling intervals could mean a disaster
in the making — the root cause of a sequence of degradations and
As for IT operations teams, they know that problematic infrastruc-
eventual downtime.
ture and networks will consume their days with complaints, alerts,
and diagnosing and remediating problems. As the clock ticks, the Real-time monitoring means very fine-grained sampling and obser-
compounding impact of arising issues creates dissatisfaction in all vation of events as they occur. Some latency-sensitive industries,
such as finance and gaming, run on nanosecond technologies. Think order to handle the volume of data being generated, collected, trans-
billions of measurements a day. formed, and observed.
“Time series” are sets of time-stamped measurement data of the — as professionals as well as consumers, no matter the context. The
same monitored object, collected over time. The samples don’t need for flawless performance and increased dependence on data
replace previous ones. They accumulate. This already differen- for visibility permeates all layers and relations.
tiates time series from other types of records, which are usually
Quick interactions and precise actions are the expectation. We use
updated. Time-stamped measurements are stored in series,
observation and prediction modeling to increase our odds to get it
queried in large scans, and plotted in dashboards for visualization
right. But collecting measurements randomly without a clear goal
of behavior over time.
may produce the opposite results: an inability to act objectively.
Finding patterns in an apparently uncorrelated, complex, and
confusing data pool is like a treasure hunt — we know the answer
is in the data, but we cannot see it without the right tools. We want
to extract meaning from time-stamped data using the lens of time,
“Time series” are sets of time- and time-series databases have become the channel to tap into
such data.
stamped measurement data Businesses are already embracing time series beyond the usual
use cases — seeing it as a tool for differentiation and taking a more
of the same monitored object, strategic approach. Purpose-built time-series platforms have been
adopted as part of the following business models:
collected over time. • As a central full-stack monitoring platform for IT Ops aimed
at creating a common practice and format for collecting and
monitoring time-series data across the organization. This
becomes especially important for companies that have mul-
tiple sites, that go through M&As, and who have to deal with
Depending on the type of measurement, the data is kept in raw multiple development and operation centers.
form or aggregated for long-term storage and modeling purposes.
• As a performant time-series storage engine supporting
Data processing can be done upfront by applying real-time stream
applications and solutions that extract their value from time-
processing for alerting. Cross-analytics with other measurements
stamped data. Examples are: resource utilization monitoring
for more insightful indicators is important to consider — it provides
and planning solutions, performance monitoring solutions
additional observation dimensions. Data volume should not be un-
and services, and the whole IoT industry.
derestimated, either. It will continue to grow, and some sets of data
will require long-term retention. Therefore, when seeking time-series
• As part of a competitive strategy to differentiate from estab-
solutions, you should be looking for a platform that can ingest, pro-
lished players. In this case, time series must be integrated into
cess in real time, query, and store time-stamped data at scale.
various layers. It must provide visibility into critical trans-
How time series are collected — and what you do with them — de- actions, including third-party services. From infrastructure
pends on the use case. Keep in mind that they comprise buckets of to business indicators and internal processes, all must be
measurements with which you can devise your unique monitoring optimized to achieve SLAs and SLOs.
recipe for success. Purpose-built design yet again tips the scale
IT OPERATIONS CENTRALIZED MONITORING
toward multiple data types, and therefore, providing more options to
Full-stack monitoring is one of the most common use cases for
compose your monitoring strategy.
time series. IT Ops and DevOps have long been collecting time
In short, time-series databases must be performant databases in series to monitor their infrastructure and applications. What's
changed is the complexity, geo-distribution, and speed of modern instrumentation tools. Synthetic transactions, real user monitoring
IT environments. The amount of collected metrics exploded with (RUM), and distributed tracing also fall under APM. Each of these
hybrid deployments and microservice architecture running on giga- subcategories has its own set of functionalities, features, and inter-
bits-per-second networks. Siloed solutions showing partial views of faces as value propositions. Focusing on the unique value proposi-
what is going on are ineffective, yet still generate tons of data that tion aspects would yield quicker results, and would better reduce
needs to be examined. But when you are in a downtime situation risk to market, than would developing everything from scratch.
losing revenue, every minute counts. A survey from 2017-2018 Particularly with regard to the storage tier (where performance and
revealed that about one-quarter of respondents estimated a loss scalability is crucial), it would be better to adopt a platform already
between $300K-400K per hour of downtime. proven to be time-series performant and efficient.
Dealing with an incident goes through phases that could be summarized COMPETITIVE ADVANTAGE VIA SERVICE EXCELLENCE
as: detection, investigation or diagnosis, and remediation. Reducing In one way or another, all companies promise to be more reliable,
detection-to-remediation time has a direct impact on revenue and cheaper, and faster in order to win business. And we all know that,
credibility to the organization. Therefore, organizations are in search in practice, what is delivered is quite far from the promised land.
of efficiency as well as efficacy. Billions of ad hoc metrics in silos will That has been the status quo: customer complaints versus compa-
not provide the answers in a timely manner. It is necessary to be smart nies’ ability to dodge them. However, times are different now, main-
about the data collected and analyzed. ly for online e-commerce and retail companies. Customers are one
click away from landing on the competition’s website. So, no more
Creating a standard and a guideline for full-stack monitoring
empty promises — online shoppers are non-committal, and will only
organization-wide is a path that many IT Ops and DevOps teams
give you another second of dedicated attention if the previous one
are adopting. But in order to implement such a plan, it is necessary
was satisfactory.
to have in place a scalable platform that can accommodate the re-
quirements from various business units and teams. For instance, it’s Performance optimization for competitive advantage relies on
essential to be able to ingest multiple types of time-stamped data, the ability to observe code, systems, networks, applications, and
numeric and non-numeric, as well as provide configurable precision transactions all in the context of customer experience. Purpose-built
and retention policies. time-series platforms empower developers to implement application
environments with integrated full-stack and business KPI monitoring,
One key insight to avoid excessive monitoring data is to use critical
intelligent alerting, and workflows.
business indicators as red alerts, and the rest of the stack data for
quick diagnosis and recovery. Symptoms of a malfunctioning or When starting with a business goal, DevOps, developers, and
underperforming application environment should be detectable on performance engineers adopt time series as a means to an end,
all layers — infrastructure, network, and application — but not alerted tailoring unique measurements to achieve performance bench-
on all outliers. That would lead to a saturation of responses. On the marks to translate into competitive advantage. By taking a
other hand, any degradation of business key performance indicators strategic hands-on approach to instrumentation, and defining and
(KPIs) should be acted on immediately, with access to the full-stack monitoring critical metrics, small players are entering and even
monitoring data for correlation and quick diagnosis. By generating dominating saturated markets.
more intelligent alerts, Ops and DevOps can handle the increasing
complexity and speed of modern IT environments.
“MySQL is not intended for time series data… I can testify it is like
pounding nails with a screwdriver. It’s definitely not what you
want to do in any relational database.”
Bringing Time Series our processes and resource utilization in our quest for
optimization, centralization of time series data also cuts
PARTNER SPOTLIGHT
InfluxData
Act in Time
Category Time Series Data Platform New Release Quarterly release cycles Open Source? Yes
Graphs
03. Comparing data to build knowledge
graphs
BY MAX DE MARZI
GRAPH DATABASE EXPERT
Graph databases are predominantly known for two things: fiercely to each other by relationships. These relationships are typed
finding the missing relationships in your data and finding the and directed.
relationships that shouldn't exist at all. These two basic concepts
A user node DRIVES a car node, at least today, but in the near
make up the basis for two of the most popular use cases, which are
future, the car node may be the one doing the driving. A node can
recommendations and fraud detection.
have zero or millions of relationships of one type or thousands
In recommendations, we are looking for a person you should know of types to a single other node or to every node in the graph; it's
because you both know some of the same people, a product you completely dynamic.
should buy because shoppers with similar behavior have bought
Since there are no columns in graph databases, each node can
that product, a movie you should watch because people who share a
have completely different properties from another node even if
similar taste in movies liked that movie, and so on.
they are both the same kind of node. One user node may have
In fraud, we are looking at how a back-office insurance agent is just a username and an email, while the other may also have their
friends with Mark, who is married to Isabelle, who is a cousin of height, weight, date of birth, etc. One may have a string value for the
the person involved in the insurance claim, or complex fraud rings nickname property, and the other may have an array of strings.
in money laundering schemes, or over-connected synthetic data
Lastly the relationships themselves may have properties, typically
that looks legitimate at first glance but forms long chains of shared
storing data like the strength of the relationships, when the
attributes when analyzed by its relationships.
relationship came to be, and if it ended. They are very different from
There are many other use cases, but before we go there, we need to traditional relational databases and must not be treated the same way.
understand what a graph database is and what it is not.
For starters, you cannot talk to them using SQL. Remember, SQL
Graph databases do not have tables, columns, rows, or foreign keys. stands for Structured Query Language, where the schema of the
All they have is these small objects floating around in space, holding data is painstakingly argued over by DBAs in some dark corner of the
office. Is this name property a varchar(32) or varchar(48), or is it an "keep going," making recursive and variable path queries a breeze.
nvarchar(48)? Is this weight property a smallint, decimal, real, or Since every node knows exactly what it is connected to, you don't
do we fall back to varchar(32)? How do the different tables connect? have to spell out the joins in a horrifically long and error-prone SQL
Which foreign keys point which ways? How many join tables do we query. Being able to simply ask how these two things are connected,
need? Is it "one to many" or "many to many?" regardless of what is between them and the number of hops it takes,
is magic.
Those questions don't happen in graph databases since properties
and relationships are dynamic. Dealing with mixed data types across Magic is exactly what large enterprises need to make sense of
multiple databases is a huge pain. Before starting any master data their data. Often due to legacy systems, mergers, and acquisitions,
management or data reconciliation project, the DBAs must design a enterprises end up with many siloed databases, each with their own
monster of an entity relationship diagram (ERD) the size of an entire version of the truth.
wall with foreign keys crisscrossing tables in a giant hairball. Few
But what happens when the data in the billing database and the
folks outside the DBA team can understand just what is going on.
operational database don't match? What do you do when you have
But when you show a graph representation of the data to business low confidence in your data? How do you make decisions using
stakeholders, they begin to understand their data. It's how they analytics and machine learning when you don't even know the data
often think of their business in terms they understand, not database the queries and models were built on are true?
tech jargon. Engineers and business can now talk about data in
To deal with these problems, companies are importing these siloed
a language they both understand. Well, almost. To query graph
datasets into multiple graphs and comparing them to one another.
databases like Neo4j, Memgraph, RedisGraph, SAP HANA, and others,
They are looking for data missing from one system that exists in
we still need to use dynamic data languages like Cypher.
another and data incorrectly related in one system versus the others.
Graph databases are being used to solve problems in a way that They are giving data seen multiple times a high confidence score, and
have never been solved before, not because people who use them sending data disputed amongst systems to analysts for reconciliation.
are any smarter, but because they are applying a different tool to
The goal of these types of projects is twofold: One goal is to prevent
the problem. If you hit a screw with a hammer hard enough over
and automate fixes to the source systems, and the other is to build
and over, it will eventually go into the wood, but sometimes turning
the newly popular knowledge graph, which then becomes the
it with a screwdriver will be a better solution. Relational databases
golden record of data and a solid foundation for analytics, enterprise
and graph databases are very different tools; you can think of Cypher
data APIs, and machine learning projects.
like a screwdriver in a land of SQL hammers.
QUICK VIEW
BY JO STICHBURY
FREELANCE TECHNICAL WRITER
In this article, we describe how data scientists in one of the world's To determine regional food security, the VAM unit gathers various data
largest humanitarian organizations are using a graph database, Agens- including economic analysis, thematic analysis, and geospatial analysis.
Graph, to analyze rich datasets and determine food security within However, another strand of their work is to determine the characteris-
vulnerable regions. tics of an area by carrying out surveys through face-to-face or over-
the-phone interviews. A single survey may have over 400 questions,
Introduction and hundreds of thousands of local households and food suppliers are
In the most extreme situations, such as war, flood, hurricane, earth- interviewed each year.
quake, or fire, there are many vulnerable people left in need of critical
nutrition to prevent serious health problems or even death. As part of The surveys are carefully designed and highly structured, but the
the United Nations, the World Food Programme (WFP) assists such peo- amount of data is large, diverse, and extremely complex because:
ple around the world. Every day, too many men, women, and children
• It is impossible to create one single universal standard that
lack a daily nutritious meal, according to the WFP --- which estimates
encompasses all possible health, livelihood, and socioeconomic
that 821 million people still go to bed on an empty stomach each night,
variables for the many different regions at risk.
and that one in three members of the global population suffer from
some form of malnutrition. • A multitude of relationships exists among the variables them-
selves. As Gaurav Singhal, the lead data scientist at UN WFP,
Assisting 91.4 million people in around 83 countries each year, the WFP
puts it, "Food is not just something you eat. In rural areas, food
has, on any given day, 5,000 trucks, 20 ships and 92 planes on the move.
is something you grow, food is something you transport and
The logistics alone are significant, and it is essential that the WFP are
something you sell."
prepared to deliver the most effective response to every crisis. The Vul-
nerability Analysis and Mapping (VAM) unit within the WFP consists of 200 Diverse Data Demystified
food security analysts based in more than 70 countries such as Somalia, Let's consider a specific example of how the VAM unit works with the
South Sudan, and Yemen. Their mission is to provide needed help in the data they collect to infer new information and determine the most at-
most timely and cost-effective way possible. Many factors are behind food risk areas and people. This example is taken, with kind permission, from
poverty. Not only does the team need to understand the level of food inse- Gaurav Singhal's presentation given at the Postgres Conference Silicon
curity in a particular region, but why that insecurity occurs. Valley in 2018.
The VAM unit needed to consolidate their data about food consumption
and expenditure. This data was collected by surveying different regions
about diverse food groups, different units of measurement, varying cur-
rencies, and multiple food sources (such as whether food is purchased
from a market, grown at home, or whether it comes via government dis-
tribution). How to collate this vital data into a single, useful repository?
In the era of Big Data, many organizations are finding that traditional
RDBMS cannot keep pace with the variety of data formats and their
rapidly changing nature. As just one example of the NoSQL database
revolution, graph databases are increasingly proving themselves as a
powerful technology in the data management community today. Gart-
ner predicted in 2015 that at least 70% of leading companies would pilot
a graph database project by the end of 2018.
E X AMPLE DATA
AgensGraph is a multi-model graph database based on PostgreSQL that Even in this simple example, it is clear that the graph database queries
allows developers to integrate the relational and graph data model and inference allowed the UN WFP to find a way to:
into one database. SQL and Cypher can be used in the same query, • Identify the households with the most pressing malnutrition issues
which allows developers to build upon the power of a graph database • Prioritize distribution of items that aligned with food consump-
alongside a legacy SQL environment for data management. PostgreSQL tion pattern analysis
users can easily migrate relational data into the graph with support from • Plan appropriate nutrition education programs for long-term
PostgreSQL extensions. AgensGraph also supports features such as ACID sustainability
transactions, MVCC, triggers, and JSON, and provides a browser to visu-
If you're interested in seeing AgensGraph in more detail, here is a video
alize the graph contents, forming an intuitive model of the real world.
that illustrates how how the WFP work with the database.
COMPARE COUCHBASE
VS. MONGODB
TM
couchbase.com/CBvsMongo
NoSQL for Highly Couchbase is also the only database that combines the best of
NoSQL with the power and familiarity of SQL in a single elegant
Scalable Business-Critical platform spanning from any cloud to the edge. We give developers
all the tools to develop web, mobile, and IoT applications with
Applications in Any Cloud unmatched agility. And our comprehensive SQL-compatible query
language provides ANSI joins, nest, unnest, group, sort, and other
data manipulation functions.
Make no mistake, the requirements of today’s business-critical With our cloud-native architecture, businesses can easily deploy
applications far exceed the intended capability of traditional to any cloud – public, private, or multi-cloud. Developers can build
relational databases. Modern applications must process a growing and test applications in the exact environment where they’ll be
variety, velocity, and volume of data. And they have to rapidly deployed. And our Kubernetes operator makes it simple to run and
adapt to constantly evolving technologies, run in distributed manage Couchbase as a service in any cloud.
PARTNER SPOTLIGHT
BY MONICA RATHBUN
MICROSOFT MVP FOR DATA PLATFORM AND MICROSOFT CERTIFIED SOLUTIONS EXPERT
SQL Server database performance tuning and monitoring tradition- use SET SHOWPLAN_ALL prior to executing your code. You have two plan
ally falls to the Database Administrator (DBA), but in reality, good options: Estimated Plan (CTRL +LM) and Actual (CTRL + M). Estimated
performance starts as applications are being developed. If you are a plans will give you what the query optimizer will likely do and what the
developer, it is critical to know how your application affects database estimated rows will be without actually executing the T-SQL or batch at
performance. You must also take steps to continually monitor and tune compile time. In almost every case, the "shape" of your execution plan
performance. Knowing what tools are available is a good way to start. (the query operators that the optimizer chooses) will be the same in an
actual and estimated plan. In performance tuning, getting an estimated
Execution Plans plan is a good place to start; however, we will focus on the actual execu-
The most important element in understanding database performance is tion plan. The actual execution plan will depict exactly which operators
the query execution plan. The execution plan contains all the informa- were utilized to execute the query.
tion that SQL Server's Query Optimizer uses to execute the query. The
Query Optimizer is responsible for road-mapping the most efficient plan
to execute the provided query, using the lowest resource consumption.
This execution plan is a best guess based on the column statistics that Figure 1
SQL Server maintains on your data. Note: it is very important to keep the
USE AdventureWorks2016CTP3;
statistics of the database tables and indexes up to date, in order to be able
GO
to create optimal execution plans. Once the execution plan is generated,
SET SHOWPLAN_ALL ON;
it contains runtime information, usage metrics, and warnings such as GO
missing indexes, implicit conversions, and tempdb spills (think of this as
your query paging to the file system). Getting a look into what your code is Here is a query that looks at all discounted products no longer active
doing when it reaches the database is vital to performance tuning. with a unit price greater than 100. One of the key things in the plan to
look at is the query cost. The CPU, IO, and memory consumption are
To view execution plans in SQL Server, you must be granted SHOW- some of the factors that SQL Server uses in determining the cost of the
PLAN permissions for all databases in that query. Once you have the query. The cost is an abstract value. You can use this number to compare
proper permission, it's easy to access the execution plan. In SQL Server between two queries --- but because it's an estimate, it means you can't
Management Studio, click on the icon highlighted in red in Figure 1, or count on it as an accurate measure of performance. Still, it is important
as a pointer to what may need to be tuned or changed if the cost is high pages that have to be read from memory or disk. The lower the number,
relative to the overall query. the faster the read.
Before running the query, take note of the two SET statements. These
are two settings you'll want to begin using. SET STATISTICS IO displays
statistics on the amount of page activity generated by the query. It gives
you important details, such as page logical reads, physical reads, scans,
read aheads, and lob reads, both physical and logical. SET STATISTICS
TIME displays the amount of time needed to parse, compile, and execute
each statement in the query. The output shows the time in milliseconds
for each operation to complete. This allows you to really see, in num-
bers, the differences one code change can make.
USE AdventureWorks2016CTP3;
GO Figure 3
SET STATISTICS IO ON
GO You will also notice in the execution plan SQL Server told us we are miss-
SET STATISTICS TIME ON; ing an index, and it gives us the actual index script we need to create. If
GO
you right-click on the missing index and choose "Index Details," it will
SELECT Name, [Description],[UnitPrice],[UnitPriceDis-
open a new window with the script you need to run. Be sure to name
count]
your index with proper naming conventions before you run it.
FROM [Production].[Product] p
INNER JOIN [Production].[ProductDescription] pd USE [AdventureWorks2016CTP3]
ON p.ProductID = pd.[ProductDescriptionID] GO
INNER JOIN [Sales].[SalesOrderDetail] s CREATE NONCLUSTERED INDEX IDX_UnitPrice_UnitPriceDis-
ON p.[ProductID]=s.ProductID count
WHERE SellEndDate is not NULL ON [Sales].[SalesOrderDetail] ([UnitPrice],[UnitPrice-
AND UnitPrice>100.00 Discount])
AND UnitPriceDiscount<>0 INCLUDE ([ProductID])
ORDER BY [Name],[UnitPrice] GO
GO
Here is the improved plan:
Notice the 85% cost against SalesOrderDetailTable. By adding a Miss-
ing Index to this, the plan tells us we can estimate an 83% performance
gain. In addition, note the SCAN on the Product.ProductID. For a query
returning larger set rows, we always want the index to SEEK to the value,
not SCAN. There are times when SQL Server will decide to do a scan if the
table is small enough rather than try to seek an index. SEEK means that
instead of reading the entire index, SQL Server is only reading the pages
it needs to return the rows meeting your predicate (such as the WHERE
clause, or what is being filtered out).
Figure 2
Now let's take a look closer at what SET STATISTICS IO and TIME gave
us. Looking at the output, we see the table had 1266 logical reads and a
CPU time of 31 ms. We will use these numbers to compare performance
improvements. These numbers are important because there are 8,000 Figure 4
Now, looking at the plans, you can see the difference. We have added a are put in place --- so you can directly track those impacts. It can also
SEEK on Product.ProductID, and we can see the plan is now using our help you identify and tune top resource-consuming queries. Hundreds,
new index for SalesOrderDetail. Its cost is only 29%. This has effective- thousands, and even millions of queries are processed by a server in
ly given us a more efficient plan. about a second. Digging through all of them to identify any issues can
be very difficult. Query Store helps with that by doing the analysis for
After you run the execution plan, you have the ability to save it by just
you. In the past, to perform this kind of analysis, you had to access SQL
right-clicking in the execution plan area and choosing Save Execution
Server's caches in real time to capture the data you needed, which made
Plan as shown below in Figure 5.
after-the-fact tuning impossible. Fixing those performance issues after
identifying them will depend on the issue, but Query Store makes is
easy to find them.
Allowing insight as to what code could benefit from tuning, it tracks the
number of times a query is called as well as which queries are waiting on
server resources. It provides a way to analyze usage patterns and identify
high resource consuming queries. It allows you, as developers, to monitor
code you have written and to address any issues that may arise.
Figure 5 First you need to enable the feature. Use the latest version of SQL Server
Management Studio to see the most recent Query Store options and
Additionally, SQL Server gives you the option to compare different
analysis available. If you are working on a lower version, you will not see
execution plans. You'll notice in Figure 5 there is a "Compare Show-
all the reports listed.
plan" option. This allows you to compare the active plan with a saved
plan. The top plan is the active query and the lower will be the saved USING THE GUI
query chosen for comparison. This information can be very useful and In SQL Server Management Studio Object Explorer, right-click on the da-
should be added to your testing check lists. When comparing the plans, tabase, click "Properties," and then select "Query Store." In Operational
Management Studio highlights similar query operators by shading them Mode, choose "Read Write." To get started, change the Query Store
with the same color. The best place to start looking for differences is in Capture Mode to "Auto" (in SQL Server 2016 and 2017 the default for this
the properties pane, which you can see on the right in Figure 6 --- where is "All"), which reduces capture overhead on your server.
there's a "not-equal" sign, you can easily see the difference in costing
these two plans.
Figure 6
QUERY STORE
Next let's look at Query Store, a feature that was introduced in SQL Serv-
er 2016 and iterates on execution plans for the developer. Query Store
captures execution plan history and runtime statistics, and it retains Figure 7
them for review and trending. It is a great tool for finding regression over
a time period (last hour, day, week, and so on), as changes to your code You can also enable using T-SQL:
USE [master] tion. Clicking on each bubble will show you their plan in the bottom part
GO of the screen. If you find better plans available for a query --- in cases
ALTER DATABASE [AdventureWorks2016CTP3] SET QUERY_STORE such as parameter sniffing --- you can "force" a plan, which will require
(OPERATION_MODE = READ_WRITE, QUERY_CAPTURE_MODE = AUTO)
the Optimizer to use the selected plan.
GO
As queries start to run, Query Store will start to keep plans and analyze
them for overall query performance.
Figure 9
Your devops
will love
to run Redis
Your developers
will love
to build with Redis
Redis Enterprise
Organizations typically use Redis Enterprise because of its shared-
nothing clustering architecture enabling horizontal and vertical
scaling with the press of a button. In addition, Redis Enterprise can
be deployed in a geo-distributed architecture, preserving data
consistency, by leveraging Conflict-free Replicated Data Types (CRDT).
PARTNER SPOTLIGHT
Category In-memory database platform New Release A major release each year, and Open Source? Yes
minor releases every few months
Whitepages chose Redis Enterprise to handle time-sensitive • Shared-nothing cluster architecture with always-on availability and
customer transactions and maintain sub-hundred millisecond effortless scaling.
latency of its application and provide a consistent end-user
• High throughput by running multiple Redis instances on multiple cores.
experience of its digital identity verification services for
businesses and consumers. With Redis on Flash, Whitepages
saves up to 70% in infrastructure costs while delivering single
Notable Customers
digit latency.
• Dell • TD Bank Website redislabs.com
As Whitepages expands its identity dataset beyond North America, • Staples • Home Depot
• Intuit • Mastercard Twitter @redislabs
it increasingly relies on Redis on Flash, and enterprise-grade Redis
from Redis Labs, to keep operational costs low and performance • Vodafone • Atlassian Blog redislabs.com/blog/
high. redislabs.com/customers/whitepages/ • Microsoft
QUICK VIEW
Cross-Platform
01. Azure commands can be used
to deploy everything from Virtual
Machines (VMs), Database Servers, and
Deployments in Azure
Virtual Networks and Services.
BY KELLYN POT’VIN-GORMAN
DATA PLATFORM ARCHITECT, MICROSOFT
When you think deployment automation, you may think cloud, been deployed. Several scripts in different languages are provided
but do you think BASH shell scripting with automation in Azure? along with the templates — PowerShell, Ruby, .NET, and a tab
Most may not even realize it's a common thing. I've been work- titled CLI. The CLI portion is just a BASH script.
ing heavily in this area for the last year and it's been incredibly
valuable and the cost savings are evident, as any cloud migration
can be. The customers I work with, mostly in Higher Education, are
working hard to embrace the cloud, but they may not have the ex-
pertise, the resources, or the time to gain the technical knowledge If you have any doubts about this, the dead give-away is that when
needed to deploy to the cloud. Many in their organizations feel you click on the CLI tab in the template output, the first line in the
that if they could just get the infrastructure deployed, they could script sets the shell:
easily get up to speed to manage the resources, but the new skills
they imagine they'll have to add to their team often creates an
overwhelming roadblock.
highly dependent on complex JSON and they require the user to 5. Create the VM with the appropriate OS and database installa-
learn this along with all the other areas of the Azure cloud. As I re- tion chosen as part of our values that were entered earlier in
searched how to make the most of what I already knew, I learned the interactive part of the script.
Azure has an extensive library of AZ commands that can be used 6. Create the script to:
from the command line. PowerShell has its own built-in Azure
–– start the listener.
commands, but you can also use PowerShell scripts with the AZ
–– start the instance.
commands in a manner similar to what I will demonstrate in this
–– create the database.
article's examples.
–– check the status of the database.
The format for these commands is: –– log information on the VM about our database environment.
az <command> --<parameter> <value> --<parameter> 7. Once the script is written to disk, run a remote session to the
<parameter/template file> VM and execute the script to perform the actions.
There are two reasons you would move to the AZ commands vs. This may sound like a lot to perform, but if you use a set scripting
the JSON from the templates: format, the rest is rather simple. Most of the script can be recycled
for future deployments of other VMs, too. BASH grants us the abil-
1. Dynamic environment variable generation from the ity to perform this type of work and many DBAs may have already
script down. been doing this at some level, just not in the cloud.
2. Less code to manage.
The new skill that will need to be added for the Oracle DBA is
The JSON in the templates export every minute detail of the de- learning the AZ commands. If you're a PowerShell DBA, then
ployment. It is an excellent artifact, but less pliable or manageable running through the steps for the AZ commands and database
for deployments, granting me more power using the AZ com- creation will be necessary. Luckily, there is impressive reference
mands with little to no JSON template/parameter files. material and regular new feature releases to help make this easier
for all of us.
Azure commands can be used to deploy everything from Virtual
Machines (VMs), Database Servers, and Virtual Networks and
Step by Step
Services. When paired with shell scripts, (BASH or PowerShell) the
All my scripts begin the same way: they have a section of com-
automation can incorporate logical deployments of objects and
ments describing the script, along with its purpose. The format
data. With the addition of Azure DevOps CLI, advanced analytics
of the steps in the script then proceeds the same way each time.
features such as Azure Data Factory pipelines and dataflows can
There's a very good reason for this:
also be included.
1. Ease of readability.
An Oracle VM on Azure
2. Level of maturity.
To help demonstrate this, we'll look at a use case for deploying an
Oracle VM in Azure. Although this is a cross-platform deployment I won't include all the scripts here, just the main areas that are of
of a Linux VM and Oracle on Microsoft Azure, it's a simple task importance. The full script can be found, along with other resource
using BASH scripting and AZ commands. deployments, in my GitHub repository.
Our script will perform the following: To begin on our Azure scripting journey, I need to capture the
following values to populate the environment variables as we
1. Designate the arguments that need to be input to fulfill the
requirements for the script to run successfully. move forward:
2. Set the values for the environment variables for the user's usage() { echo "usage: $0 -g <groupname> -s <sku> -v
deployment. <version> -o <oraname> -sz <size> -s -l <zone> -gn <gdb>
3. Execute the commands to deploy the resource group (sim- -sp <syspassword> -stp <systempassword> -pp <pdbpass-
plified management of resources by containing them to a word> " 1>&2; exit 1; }
single grouping).
declare groupname=""
4. Create the virtual network as part of the VM on which the
declare sku=""
database will reside.
• SKU: Used to call values from the Azure catalog for Azure # Create the latest version of Oracle VM installations
SKU to be use for the sizing of the VM. By querying the Azure availavlbe and push to a file
catalog, our script has longevity vs. static values set in the az vm image list --offer Oracle --all --publisher Oracle
>db.lst
script.
• Version: Also used as part of the calls to the Azure Catalog. if [[ -z "$sku" ]]; then
echo "Here's the installation version, from
This will be for the database version.
12c through 18c available for Oracle: "
• Oraname: Name for the Oracle VM server.
cat db.lst | grep sku | awk '{print $2}'| tr -d
• Size: From the Azure catalog, size of the VM server. \"\,
echo "Enter the version you'd like to in-
• Adminuser: This is the VM administrator's name.
stall, the numbering convention must be exact, feel free
• Zone: Location Zone in Azure (when choosing a zone, I like to copy from the list and past here:"
to list out the available zones for the user). read sku
[[ "${sku:?}" ]]
• Gdb: Global Database Name.
fi
able from the catalog until all the values are entered and, if the
user answers the questions, the script proceeds, if not, it exits, Create the Oracle VM
deploying nothing. There are three steps to building the Virtual Machine that will
house my Oracle database.
Deploying to Azure
1. Create the VM.
As part of my steps, there is one environment variable (urn, for the
2. Open the ports required to gain remote access to the VM.
value of the image) that is set dynamically from the db.lst work
3. Attach the disk to be used for the database.
file after parsing the $sku and $version values. I also set my logfile
that will track all the output I write to it as part of my deployment. Note that in each of my AZ commands, I reference the variables
After these two steps are complete, it's time to create our group that were set earlier in my script.
that will contain all the resources in Azure.
Now it's time to create our script to perform the work on the VM to
az group create --name $groupname --location $zone build out the database:
Notice in the next step that I set the group and location as default # Get IP Address and log into VM
My script requires that I know the IP address for remotely logging tnsnames.ora"
into my VM with SSH (secure shell). I capture this using the "az echo " (ADDRESS = (PROTIOCOL = TCP)(HOST = $vmh)
(PORT = 1521)) >> $ORACLE_HOME/network/admin/tnsnames.
network" command. You'll also note that I take advantage of the
ora"
very valuable grep/awk/tr(im) BASH commands. Once I'm finished,
echo " ) >> $ORACLE_HOME/network/admin/tnsnames.ora"
I need to use the concatenate (cat) utility to write to a new script
echo " (CONNECT_DATA = >> $ORACLE_HOME/network/admin/
(mkdb.sh) for everything inside the EOM to EOM sections in the next tnsnames.ora"
section of my script. We start with: echo " (SERVICE_NAME = pdb1) >> $ORACLE_HOME/net-
work/admiin/tnsnames.ora"
• sudo (switch user domain owner) to the Oracle user. echo " ) >> $ORACLE_HOME/network/admin/tnsnames.ora"
• mkdir (make the directory) for the datafiles. echo " ) >> $ORACLE_HOME/network/admin/tnsnames.ora"
echo " >> $ORACLE_HOME/network/admin/tnsnames.ora"
• export the oracle home (oh).
echo " CDB1 = >> $ORACLE_HOME/network/admin/tnsnames.ora"
• start the listener.
dbca -silent \
   -createDatabase \
   -templateName General_Purpose.dbc \
   -gdbname $gdb \
   -sid cdb1 \
   -responseFile NO_VALUE \
   -characterSet AL32UTF8 \
   -sysPassword $syspassword \
   -systemPassword $systempassword \
   -createAsContainerDatabase true \
   -numberOfPDBs 1 \
   -pdbName pdb1 \
   -pdbAdminPassword $pdbpassword \
   -databaseType MULTIPURPOSE \
   -automaticMemoryManagement false \
   -storageType FS \
   -datafileDestination /home/oracle/data \
   -ignorePreReqs

export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/$oh/db_Home1
export vmh=$vmip

# Create the TNSNAMES entry
echo "PDB1 = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "  (DESCRIPTION = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "    (ADDRESS_LIST = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "      (ADDRESS = (PROTOCOL = TCP)(HOST = $vmh)(PORT = 1521))" >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "    )" >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "    (CONNECT_DATA = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "      (SERVICE_NAME = pdb1)" >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "    )" >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "  )" >> $ORACLE_HOME/network/admin/tnsnames.ora
echo " " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "CDB1 = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "  (DESCRIPTION = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "    (ADDRESS_LIST = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "      (ADDRESS = (PROTOCOL = TCP)(HOST = $vmh)(PORT = 1521))" >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "    )" >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "    (CONNECT_DATA = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "      (SERVICE_NAME = cdb1)" >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "    )" >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "  )" >> $ORACLE_HOME/network/admin/tnsnames.ora

Using netca (the Oracle Network Configuration Assistant) from the command line to perform this same task is on my list of enhancements.

Once it does write to the file, I need to stop and restart the listener, so this is what I add as steps to finish up my mkdb.sh creation script.

# third part of dynamic script build, listener reload and db final check
lsnrctl stop
lsnrctl start

To check everything, I log into the database with SQL*Plus, do a quick count of the objects, and check the container names before and after switching to the PDB:

# Log into new database to verify creation
sqlplus / as sysdba <<EOF
spool ./ck_db.txt
connect system@cdb1
select count(*) from dba_objects;
sho con_name;
alter session set container=pdb1;
sho con_name;
connect / as sysdba
select count(*) from dba_objects;
spool off
exit
EOF
EOM

After all of this, you'll see an End of File (EOF) and End of Message (EOM) to close out all the writes to secondary files and scripts as part of this process.

The last part is to execute it all from my deployment server to the VM, which is done with the last AZ command, az vm run-command:

# Run script created dynamically in the section above this one.
az vm run-command invoke \
   --name $oraname \
   --command-id RunShellScript \
   --scripts "./mkdb.sh"

Once completed, this script can be run; the VM creation, along with the secondary script creations, doesn't take very long, but the database creation that's part of the secondary script does take some time. You will get a prompt with the percentage completed.

When I executed the script, I entered the following information: zone=EastUS and oraname=ora122db8.

Note, the $oraname variable is used to populate the name for all the resources, making the naming convention dynamic and simplified. The user can now use traditional tools to manage the database, add enhanced security at the database level, or begin development. Scaling up is easy via the portal and requires little training. If the organization desires to migrate existing Oracle data into the Azure deployment, pre-existing Oracle migration tools can be used to make easy work of it, as well.

With a little ingenuity, the scripting skills you already have with BASH or PowerShell can be used to deploy to Azure.
Customer-Facing Applications

There are databases for transactions, and there are data warehouses for analytics. Not long ago, analytics was used to derive actionable insights for improving operational efficiency and mitigating risks. Today, it's used to improve customer experience and customer engagement – to provide customers with actionable insights.

However, when it comes to customer-facing applications and analytics, databases offer too little and data warehouses offer too much. A database is great for browsing products, adding them to a cart and completing a purchase. These queries return only a few rows. A query to recommend products based on active carts, recent purchases and current inventory, on the other hand, is too difficult for a database. It needs to perform aggregates on many rows. Nor is it practical for a data warehouse: it will return a small number of rows, and many columns per row.

The solution is a hybrid transactional/analytical database with row and columnar storage, streaming change-data-capture to synchronize them, and intelligent query routing to use the right storage for the right query: columnar storage for analytics (e.g., find the top soon-to-be-sold-out products) and row storage for transactions (e.g., look up those five products). This is the architecture of MariaDB Platform.
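To make the contrast concrete, here is an illustrative pair of queries issued through the mariadb command-line client; the table names are hypothetical, and the point is that the SQL itself does not change, since the platform's query routing decides whether row or columnar storage serves each statement:

# Illustrative only -- table names (products, cart_items, inventory) are hypothetical.
# Transactional lookup: a handful of rows, served well by row storage.
mariadb -e "SELECT * FROM products WHERE product_id IN (101, 102, 103, 104, 105);"

# Analytical query: aggregates over many rows, served well by columnar storage.
mariadb -e "
  SELECT p.product_id,
         SUM(c.quantity)      AS reserved_in_carts,
         MIN(i.stock_on_hand) AS stock_on_hand
  FROM   cart_items c
  JOIN   products   p USING (product_id)
  JOIN   inventory  i USING (product_id)
  GROUP  BY p.product_id
  HAVING SUM(c.quantity) >= MIN(i.stock_on_hand)   -- more demand in carts than stock left
  ORDER  BY reserved_in_carts DESC
  LIMIT  10;"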
WRITTEN BY MARIADB
PARTNER SPOTLIGHT

MariaDB Platform
MariaDB Platform is the enterprise open source database for data-driven, customer-facing applications and hybrid transactional/analytical workloads.

Category: DBMS | New Release: MariaDB Platform X3, January 2019 | Open Source? Yes
Executive Insights on the Current and Future State of Databases

BY TOM SMITH
RESEARCH ANALYST, DEVADA

To learn about the current and future state of databases, we spoke with and received insights from 19 IT professionals. Here's what they shared with us:

…like Spark and Kinesis have perfected the collection of data, there are still several challenges in the analysis and operationalization of the data.

…databases, it appears that more clients are looking to find the right tool to solve the particular business problem they are working to solve. There's also been increased adoption of database-as-a-service solutions as more companies move to the cloud and can offload many administrative and operational functions.

5. The most frequently mentioned use cases are in financial services and retail, and revolve around forecasting, risk mitigation, and improving customer experience. Customer journey and hyper-personalization efforts in particular are being spearheaded by marketing across a number of industries and have specific data needs. Classical use cases like risk management, fraud detection, and inventory management are being rebuilt with built-in real-time analytics.

6. The most common issues with databases revolve around scale and knowledge. The number one issue is around scale, performance, and cost efficiency. You need to be able to design the database and IT stack to cope with the explosion of data. Data volumes are only going up. However, you must also be able to access and analyze the data. It's not very valuable unless you are gaining insight from it.

Data management can be difficult if you don't have a knowledgeable staff to handle regular database patches and updates, disaster recovery tests, and to implement and follow established processes and procedures.

7. Given the vast number of vertical-specific solutions available, the future of databases is fluid with IoT, artificial intelligence and machine learning, real-time computing at the edge, quantum computing, and the automation of it all. IoT is now producing unseen volumes and sets of data. Companies will use real-time analytics and enhanced BI reporting to build a data-driven culture. Leading database vendors are automating common DBA tasks like performance tuning, monitoring, disaster recovery, high availability, low latency, and auto-scaling based on historical workloads.

8. The biggest concerns regarding databases today are around security and complexity. Security of data at rest continues to be a major issue, as exemplified by breaches at Equifax, Stack Overflow, and several other targets. We still have many companies that pay very little attention to database security. Poor database management practices and procedures are revealed with data breaches and cyber hacks.

The complexity and the sheer amount of data that's in the open is a concern. There are so many options to consider, it's hard to understand the subtle differences and determine the optimal solution for the business problem you are trying to solve. This is exacerbated by vendors who misrepresent what their products can do at scale. Every vendor should ship a utility to help customers verify the capability of the vendor.

9. To be more successful with database projects, developers need to understand the concepts of the databases they are working with as well as the foundational elements of AI/ML. Application developers need to have a rudimentary understanding of how data is being stored and accessed, as well as how the database is going to help you achieve the goal of the application. Do some upfront work with the database before you get into the specifics of how the application and database will work together. Be able to determine the best tool for the job.

Understand how to interact with databases. Understanding database architectures, database design, access paths/optimizations, and even how data is arranged on disk or in memory is useful, particularly if you are a backend developer.

It's also beneficial to understand basic data modeling techniques, normalization versus denormalization, SQL, when to use foreign keys, execution plans, prepared statements, the different types of joins available, data obfuscation, and encryption.
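For developers who want a concrete starting point with two of those concepts (execution plans, and bind variables, which behave like prepared statements), here is a small sketch using the SQL*Plus client that appears elsewhere in this guide; the HR sample schema and the hr/hr credentials are assumptions, not something the interviewees supplied:

# Illustrative only -- assumes Oracle's HR sample schema and local hr/hr credentials;
# substitute the client and schema you actually use.
sqlplus -s hr/hr <<EOF
-- Ask the optimizer how it plans to execute a two-table join
EXPLAIN PLAN FOR
  SELECT e.last_name, d.department_name
  FROM   employees e
  JOIN   departments d ON d.department_id = e.department_id
  WHERE  e.hire_date > DATE '2018-01-01';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Bind variables behave like prepared statements: parsed once, reused with new values
VARIABLE dept NUMBER
EXEC :dept := 60
SELECT COUNT(*) FROM employees WHERE department_id = :dept;
EOF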
Here are the contributors of insight, knowledge, and experience:

• Ragu Chakravarthi, SVP and Chief Product Officer, Actian
• Joe Moser, Head of Product, Crate.io
• Brian Rauscher, Director of Support, Cybera
• Sanjay Challa, Director of Product Management, Datical
• OJ Ngo, CTO, DH2i
• Anders Wallgren, CTO, Electric Cloud
• Johnson Noel, Senior Solutions Architect, Hazelcast
• Adam Zegelin, SVP Engineering, Instaclustr
• Daniel Raskin, CMO, Kinetica
• James Corcoran, CTO of Enterprise Solutions, Kx
• Neeraja Rentachintala, V.P. of Product Management, MapR
• Mat Keep, Senior Director of Product & Solutions, MongoDB
• Philip Rathle, V.P. of Products, and Matt Casters, Chief Solution Architect, Neo4j
• Ariff Kassam, V.P. of Products, NuoDB
• Dhruba Borthakur, Co-founder and CTO, Rockset
• Erik Gfesser, Principal Architect, SPR
• Lucas Vogel, Owner, Vogel Endpoint Systems
• Neil Barton, CTO, WhereScape

TOM SMITH is a Research Analyst at Devada who excels at gathering insights from analytics—both quantitative and qualitative—to drive business results. His passion is sharing information of value to help people succeed. In his spare time, you can find him either eating at Chipotle or working out at the gym. LinkedIn - Twitter
The arguments about including the database in DevOps are now over, and discussions have moved on from whether to include it to how to introduce it.

The latest Accelerate State of DevOps Report, for example, talks about database version control and the automation of database changes alongside the application, and concludes: when teams follow these practices, database changes don't slow them down or cause problems when they perform code deployments.

Redgate's 2019 State of Database DevOps Report also revealed that 83% of organizations which have already adopted DevOps think it will take less than a year to automate and deploy database changes – and 45% say it will take less than six months.

They're doing so because it's been recognized that database DevOps stops the problems of database deployments and increases the speed of delivery of database changes.

That's the Good News
But then there's the elephant in the room I mentioned in the title of this piece: data privacy. A wave of data breaches and concerns over how personal data is used have prompted new data protection regulations to be introduced everywhere.

Since the enforcement of the GDPR in the EU, countries as far apart as New Zealand and China have enacted similar legislation, and the California Consumer Privacy Act is the first of many state laws to be introduced in the US.

What's important isn't the new legislation – it's the common strands that are emerging. One of those is that new regulations require personal data to be protected throughout.

The problem is that the State of Database DevOps Report I mentioned earlier also revealed that 65% of respondents use a copy of their production database in development and test environments. The same database that contains the data that needs to be protected.

It's Not All Bad News
Rather than being a blocker to database DevOps, the requirement to protect data provides a blueprint for introducing a database DevOps process that helps companies comply with data privacy legislation. There are four natural steps.

Introduce standardized team-based development, using database version control to maintain a single source of truth for the code, encourage collaboration, and create an auditable record of who changed what, when.

Automate deployments by using a continuous integration process to trigger an automated build when changes are checked into version control. This tests the changes and flags up any errors, so a stable build is always available.
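As one illustration of that automated build step, the sketch below shows the kind of shell step a CI server might run when database changes are checked in; the repository URL, server name, and migrations folder are assumptions rather than details from either report:

# Hypothetical CI build step: rebuild a disposable database from the scripts held in
# version control and fail fast if any change script errors.
set -e
git clone https://example.com/team/database-changes.git
cd database-changes
sqlcmd -S ci-sql01 -d master -Q "CREATE DATABASE ci_build;"     # assumed build server
for script in migrations/*.sql; do
    sqlcmd -S ci-sql01 -d ci_build -b -i "$script"              # -b: abort on the first SQL error
done
sqlcmd -S ci-sql01 -d master -Q "DROP DATABASE ci_build;"
echo "Database build succeeded; a stable build is ready for deployment."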
Monitor availability as well as performance by managing permissions, ensuring data is available and identifiable, and having a record of which servers and what data is being managed.

Protect and preserve data by introducing a data masking tool to pseudonymize and anonymize data in database copies outside production.

In this way, you can streamline processes, automate repetitive tasks, minimize errors, and release value to end users faster. All while still being compliant with new data privacy regulations.

WRITTEN BY MARY ROBBINS
PRODUCT MARKETING MANAGER, REDGATE
Company | Product | Category | Availability | Website
Actian | Actian NoSQL | Object-oriented | Free solution | actian.com/data-management/versant-nosql-object-database/
Actian | Actian X | RDBMS | Free solution | actian.com/data-management/actian-x-hybrid-rdbms/
ActiveRecord | ActiveRecord | ORM | Included in Rails | guides.rubyonrails.org/active_record_basics.html
Amazon Web Services | DynamoDB | KV, document, DBaaS | Free tier available | aws.amazon.com/dynamodb
Amazon Web Services | SimpleDB | Column store | Free tier available | aws.amazon.com/simpledb
Apache Foundation | Apache Cassandra | KV, wide column | Open source | cassandra.apache.org
Apache Foundation | Apache Ignite | In-memory, Hadoop, data grid | Open source | ignite.apache.org
Atlassian | ActiveObjects | ORM | Included in Jira & Confluence | developer.atlassian.com/server/framework/atlassian-sdk/active-objects
Delphix | Delphix | Dynamic data platform | Demo available by request | delphix.com
Embarcadero | InterBase | RDBMS | Available by request | embarcadero.com/products/interbase
EnterpriseDB | EDB Postgres Platform | RDBMS | Open source | enterprisedb.com/products/edb-postgres-platform
Another day, another buzzword surfaces in the data management space with promises of changing the way things are done. Although in this case—it could be with good reason.

DataOps, or data operations, refers to the next step in the evolution of DevOps teams, where data scientists, processes, and tools are increasingly integrated into the development process at data-focused enterprises. This makes it easier for companies to leverage real-time data to serve up delightful customer experiences and make better decisions at a moment's notice.

By unlocking the true power of DataOps, companies can build modern, data-rich applications in less time while operating with more agility and baking more automation into their processes to shorten the development lifecycle.

Companies that successfully implement an agile approach to data management are four times more likely to enjoy business outcomes that exceed shareholder expectations.

So, while it's still a relatively new concept, it comes as no surprise that one study found that 73% of organizations were planning to invest in DataOps during 2018. As the promise of DataOps is realized, we expect more and more companies to invest in the space in 2019 and beyond.

What is DataOps?
The names sound similar and the two practices are related. But DataOps is not just DevOps with data. Still, just as DevOps transformed the way modern applications are developed, DataOps promises to transform the way organizations use data and run data analytics, speeding up data-driven processes considerably.

Just as the agile approach to software development comes with a shiny Agile Manifesto, DataOps has its own manifesto, too. The DataOps Manifesto places emphasis on:

• Individuals and interactions over processes and tools
• Working analytics over comprehensive documentation
• Customer collaboration over contract negotiation
• Experimentation, iteration, and feedback over extensive upfront design
• Cross-functional ownership of operations over siloed responsibilities

The DataOps Manifesto also features a set of 18 core principles, including “continually satisfy your customer,” “value working analytics,” “embrace change,” and “it's a team sport.” Followed well, these practices promise to bring the same efficiency and engagement to data teams (and, by extension, entire organizations) that DevOps brings to software development teams.

But how, specifically, do companies stand to benefit from adopting a DataOps mindset? Here are several of the benefits DataOps brings to application development:

• Real-time insights
• Enhanced collaboration across the enterprise
• Improved agility and faster response times
• Faster bug fixes
• Better decisions and fewer costly missteps
• Higher quality products
• Increased employee engagement
• Stronger user experiences

The benefits of DataOps speak for themselves. Much like DevOps teams have enabled software companies to ship better products faster and with greater frequency, DataOps teams bring the same level of efficiency and data-driven decision-making to their organizations.

How to Achieve DataOps
Change is never easy, especially when employees have been doing things one way forever.

To launch DataOps at your organization, you first need to get your team to buy in to the new way of working. Beyond that, implementing DataOps at your organization requires the right suite of tools as well as an underlying active everywhere database that ensures contextual data is always available in real time—even when teams are globally distributed and applications are hosted in hybrid cloud environments.

While it might take some time to get your team to buy in to DataOps and master the associated new workflows, whatever time and resources you invest to make that happen will undoubtedly return significant dividends down the road.

Thanks to DataOps, your company will be able to move faster, make better decisions, and ship better products while impressing your customers and inspiring your employees.

A new buzzword—yes. But one that is perhaps buzzworthy.
Company | Product | Category | Availability | Website
IBM | IBM DB2 | RDBMS | Available by request | ibm.com/analytics/us/en/db2
IBM | Informix | RDBMS, transactional | Free tier available | ibm.com/analytics/us/en/technology/informix
InterSystems | Cache | Object-oriented, relational | N/A | intersystems.com/products/cache/#technology
JOOQ | JOOQ | ORM for Java | Open-source version available | jooq.org
MariaDB | ClustrixDB | NewSQL, transactional | N/A | mariadb.com/products/clustrixdb
MarkLogic | MarkLogic | Transactional | Free tier available | marklogic.com/product/marklogic-database-overview
Micro Focus | Vertica Enterprise | RDBMS, column store | Free tier available | vertica.com/product/on-premise
Microsoft | Entity Framework | ORM | Part of .NET framework | docs.microsoft.com/en-us/ef
NuoDB | NuoDB | RDBMS | - | -
OpenText | OpenText Gupta SQLBase | RDBMS | Available by request | opentext.com/products-and-solutions/products/specialty-technologies/opentext-gupta-development-tools-databases/opentext-gupta-sqlbase
Oracle | - | ORM | Free solution | oracle.com/technetwork/database/application-development/index-099369.html
Oracle | TopLink | ORM | Free solution | oracle.com/technetwork/middleware/toplink/overview/index.html
OrientDB | OrientDB | Document, graph, KV | Open-source version available | orientechnologies.com
SAP | SAP HANA Platform | In-memory, column-oriented RDBMS | Free tier available | sap.com/products/hana.html
ScaleOut | ScaleOut StateServer | In-memory, Hadoop, data grid | 30-day trial | scaleoutsoftware.com/products
Software AG | Adabas | Stream processing | Available by request | adabas.com
Splice Machine | Splice Machine | NewSQL, RDBMS | Open-source version available | splicemachine.com/product/features
Teradata | Aster Database | Specialist analytics | Open-source version available | teradata.com/Resources/Brochures/Aster-Database-Performance-and-Scalability
Diving Deeper Into Databases

Twitter
@SQLEspresso   @GFritchey   @KimberlyLTripp   @BrentO

Zones
…for Hadoop, R, and data visualization technologies. Not only that, but we also give you advice from data science experts on how to understand and present that data.

Cloud   dzone.com/cloud
The Cloud Zone covers the host of providers and utilities that make cloud computing possible and push the limits (and savings) with which we can deploy, store, and host applications in a flexible, elastic manner. The Cloud Zone focuses on PaaS, infrastructures, security, scalability, and hosting servers.

Refcardz
DevOps for Databases
Download this new Refcard to get started with Database Release Automation and eliminate bottlenecks. Learn the key best practices that your DevOps database solution should meet in order for you to get the most out of your investment.

Books
Database System Concepts
If you're just getting started with databases, check out this book to learn all about database management.

Podcasts
SQL Data Partners
Learn about new and familiar topics in the worlds of SQL Server and professional development.

Visit the Database Zone
Managing the growing amount of data effectively is an ongoing concern. From handling event and streaming data to finding the best use cases for NoSQL databases, there's plenty to unpack in the database community.