[go: up one dir, main page]

0% found this document useful (0 votes)
97 views44 pages

Dzone Researchguide Databases

databases research guide
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
97 views44 pages

Dzone Researchguide Databases

databases research guide
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

                 


                             

               
   
       
DZONE.COM/GUIDES

Dear Reader, Table of Contents


3 Executive Summary
Welcome to DZone’s latest Guide (and the last one ever!) on BY KARA PHELPS

Databases: Evolving Solutions and Toolsets. We’ve come a long way in 4 Key Research Findings
BY JORDAN BAKER
the world of databases. Hierarchical databases entered the playing
field first in the late 1960s, paving the way for network and then 7 The Multiple Facets of Time Series: From
Operations Optimization to Business Strategy
relational databases in the 1970s, which quickly gained traction. BY DANIELLA PONTES

Entity-relationship databases came soon after, followed by semantic,


12 Handling Dynamic Connected Data in Graphs
object-oriented, and object-relational databases. Today, semi- BY MAX DE MARZI

structured XML databases still take the stage. 14 AgensGraph: A Graph DB Helping to Solve
World Hunger
Now, we can’t function without databases. 82% of people from BY JO STICHBURY

our DZone Guide to Databases survey have 10 or more years of 18 Performance Tuning and Monitoring Traditionally
Falls to the Database Administrator
experience with databases, showing just how much they have
BY MONICA RATHBUN
permeated developers’ everyday lives. However, developers are
24 Cross-Platform Deployments in Azure With BASH
still not containerizing their databases as much as you might think. BY KELLYN POT’VIN-GORMAN

Only 10% of our survey respondents say they have containerized


32 Executive Insights on the State of Databases
all of their databases — down from 13% in 2018. Still, 23% say they BY TOM SMITH

expect containers will transform database technology over the next 36 Databases Solutions Directory
five years, so it will be interesting to see how this number changes in
43 Diving Deeper Into Databases
coming years.

Also according to our survey, 46% of our survey respondents use


MySQL in non-production environments, with PostgreSQL coming
in second at 44%. However, 39% said they personally most enjoy
DZone is...
working with PostgreSQL and only 30% prefer working with MySQL.
BUSINESS & PRODUCT MARKETING EDITORIAL
But perhaps the most significant rivalry is between SQL and NoSQL. Matt Tormollen Susan Wall Mike Gates
CEO CMO Content Team Lead
SQL databases are considered relational databases, whereas NoSQL Terry Waters Aaron Tull Kara Phelps
Interim General Manager Dir. of Demand Gen. Editorial Project Manager
databases are non-relational or distributed databases. The many Jesse Davis
Waynette Tubbs Jordan Baker
EVP, Technology Publications Associate
differences between the two make it very interesting to explore why Dir. of Marketing Comm.
Kellet Atkinson
Media Product Manager Colin Bish Tom Smith
developers choose one over the other. You’ll be able to see which one Member Marketing Spec. Research Analyst

Andre Lee-Moye
developers prefer and which one suits their needs more. Suha Shim
Content Coordinator
Acquisition Marketing Mgr.
SALES
Kendra Williams Lauren Ferrell
Cathy Traugot
DZone’s 2019 Guide to Databases: Evolving Solutions and Toolsets Sr. Director of Media Sales Content Marketing Mgr.
Content Coordinator

Chris Brumfield Lindsay Smith


dives into more data like this as well as database performance, graph Sales Manager Content Coordinator

databases, and handling dynamic data in both SQL and graphs. We’ll Jim Dyer Sarah Sinning
Sr. Account Executive Staff Writer
also look into time series data and Azure on Linux with databases. Tevano Green
Sr. Account Executive
PRODUCTION
Brett Sayre Chris Smith
Thanks for reading, and we hope you enjoy! Account Executive Director of Production
Alex Crafts Billy Davis
Key Account Manager Production Coordinator
Craig London Naomi Kromer
Key Account Manager Sr. Campaign Specialist

WRITTEN BY LAUREN FERRELL Jordan Scales Michaela Licari


Sales Development Rep. Campaign Specialist
CONTENT COORDINATOR, DEVADA

THE DZONE GUIDE TO DATABASES PAGE 2 OF 44


DZONE.COM/GUIDES

it with a specialized time-series database, compared to just


12% of respondents who claimed to do so with their time-series

Executive data when asked the same question last year. Respondents who
persist their time-series data with a relational database fell from
64% to 56% this year. Respondents using a non-time-series-

Summary
specific NoSQL database for this data also declined — from 19%
to 16% this year.

IMPLICATIONS
Adoption of specialized time-series databases appears to be
BY KARA PHELPS EDITORIAL PROJECT MANAGER, PUBLICATIONS, DEVADA
gaining momentum among organizations that rely on time-series
At the foundation of software development, databases are data. In a corresponding way, the use of SQL and NoSQL databases
essential building blocks. New developers often learn database for time-series data also seems to have fallen over the past year.
technology first. Applications rely on databases to deliver data
from an ever-increasing array of sources — securely, at scale, RECOMMENDATIONS
If you work with time-series data, it may be worth it to look into
with little to no latency. To prepare for the DZone’s 2019 Guide
how specialized time-series databases might fit your particular
to Databases, we surveyed 639 tech professionals to learn how
use case, if you haven’t already done so. For an introduction or
they’re using database technology every day, and how they
a refresher, check out “What the Heck Is Time-Series Data (And
expect databases to change in the near future. Let’s dig into some
Why Do I Need a Time-Series Database)?” We’re also featuring an
key results.
article on implementing time-series databases later in this guide.

SQL Maintains Royalty Status


The Future Is in Stream Processing/Real-Time
DATA
Analytics
98% of survey respondents said they use at least some SQL
DATA
on a regular basis. 31% of respondents said they use SQL
In a new question this year, survey takers were asked to choose
only. Just 2% reported using NoSQL only. The majority of which technology or trend they believe will have the biggest
respondents (65%) reported using a blend of SQL and NoSQL impact on database technology within the next five years. 11%
in their typical projects. chose data protection and the rising consumer demand for
privacy; 18% chose blockchain; 23% chose containers; the largest
IMPLICATIONS group (30%) chose stream processing/real-time analytics.
SQL databases continue to outpace NoSQL in terms of adoption.
When asked why their resources are arranged as they are, 27% IMPLICATIONS
of survey respondents reported that it best suits the current All of these trends are poised to change the face of database
knowledge of developers at their organization. It’s possible that technology in the near future. Many tech professionals with
developer teams simply continue to go with what they know. knowledge of the space believe that stream processing (also
known as real-time analytics or streaming analytics) will lead
RECOMMENDATIONS the most transformation.
Your SQL skills aren’t falling out of favor anytime soon. SQL
will keep its place in a developer’s toolbox for the foreseeable RECOMMENDATIONS
future. The large user base also means that any organization in Stream processing is invaluable when insights need to be
need of SQL database assistance will easily find knowledgeable extracted from large volumes of data in real time. The amount
developers to bring onboard. of data in the world continues to grow exponentially, and the
modern enterprise demands any new information to be acted
The Rise of Time Series upon instantaneously. It makes sense that stream processing
DATA is becoming an essential part of database management. If
When asked how their company persists time-series data, 22% of you’re interested in the concept, take a look at our Refcard on
respondents who actually use time-series data said they persist Understanding Stream Processing.

THE DZONE GUIDE TO DATABASES PAGE 3 OF 44


DZONE.COM/GUIDES

Key Research
• Respondents reported using four main programming lan-
guage ecosystems

–– 79% said Java

Findings
–– 74% reported client-side JavaScript
–– 41% work with Node.js
–– 41% told us the use the Python ecosystem

• Despite the above ecosystem usage rates, Java proved the


most popular primary programming language by far, with
BY JORDAN BAKER PUBLICATIONS ASSOCIATE, DEVADA
55% of respondents using Java in this capacity.

SQL vs. NoSQL: The Battle Continues


The choice between SQL and NoSQL databases is one of the most
divisive questions in the field of database engineering at the mo-
ment. In fact, over half (65%) reported using a mixture of SQL and
Demographics
NoSQL databases in their projects. The most popular database con-
For this year's DZone Database survey, we received 639 respons-
figuration among all respondents turned on out to be using mostly
es with a 62% completion percentage. Based on these answers,
SQL with some NoSQL (44%), with another 31% of survey takers
we compiled some basic demographic information about the
reporting to use only SQL. And, while plenty of developers are work-
respondents.
ing with both SQL and NoSQL databases in their code, only 2% of
• 82% of respondents have 10 or more years of experience respondents reported using NoSQL only. Unlike in our survey from
2018, this year we included an option to report an even division of
• Respondents work in three main roles:
resources between SQL and NoSQL databases. Among respondents,
–– 35% are developers/engineers
11% claimed to divide their resources in this way. When asked why
–– 23% work as architects
they use databases in the way they do, most respondents said it
–– 22% are developer team leads
was either to best organize data to suit their needs (42%) or to

• 61% of survey takers work for enterprise-level organizations best suit the current knowledge of developers at their organization
(27%). This last point could prove crucial in understanding why SQL
–– 23% for organizations sized 100-999
databases continue to hold an edge over NoSQL databases, at least
–– 20% for organizations sized 1,000-9,999
in terms of adoption rates. The older and more established of the
–– 18% for organizations sized 10,000+
two technologies, SQL databases could very well bring with them a
• Respondents work on three main types of software. larger user base and thus simplify the recruiting and onboarding of
new developer talent within an organization.
–– 69% develop web applications
–– 60% are developing enterprise business applications
Looking at the numbers we gathered around specific SQL and
–– 24% develop native mobile apps
NoSQL databases, we continue to see that SQL databases have

SURVEY RESPONSES

Within the next five years, what technology/trend do you think What security protocols do you implement to protect the data
will most transform database technology as we know it? in the databases you are responsable for?


 

 
  
  
 
  




  
 

  
 
     

THE DZONE GUIDE TO DATABASES PAGE 4 OF 44


DZONE.COM/GUIDES

a slight advantage in terms of adoption ratings. When we asked ment systems, let's turn to what respondents told us about their
respondents what database management system they use in actual data storage and management practices. In keeping with
production, the six most popular databases reported included four the popularity of SQL databases in professional settings delineated
SQL and two NoSQL databases. The adoption rates of these six above, we found that an overwhelming majority of respondents
databases are as follows: use relational data storage models to persist their data. In fact, this
model of data persistence has proved far and away the most pop-
• PostgreSQL: 49% ular means of data persistence over the past several years. In our
• MySQL: 46% 2018 database survey, we found that 88% of respondents reported
using relational data persistence models, with key-value coming in
• Oracle: 43%
as the second most popular persistence model with a 57% percent
• Microsoft SQL Server: 40%
adoption rating. In this year's survey, relational persistence held
• MongoDB: 29% steady, with 89% of respondents using this model, whereas key-val-

• Redis: 25%
ue persistence models fell to a 53% adoption rating.

As you can see, the top four are all SQL databases. Interestingly, Looking at the number of persistent storage models that respondents

PostgreSQL and MySQL swapped spots from last year's survey, with reported using in their projects, we find some interesting year-over-

PostgreSQL increasing from its 37% adoption rate in our 2018 sur- year fluctuations as well. Over the past two years, the use of per-

vey, and MySQL falling from its 49% adoption rate last year. When sistent storage models proved the most popular with survey takers,

we examine databases used in a non-production environment, we with 37% of respondents reporting to use two persistent storage

see much of the same. MySQL (46%), PostgreSQL (44%), MS SQL models in each year's survey. But, the number of survey takers using

Server (33%) and Oracle (31%) were the four most used, followed by either one or three persistent storage models varied considerable

MongoDB (30%) and Redis (22%). between 2018 and 2019. In our 2018 survey, 20% of respondents
reported using one persistent storage model; this year, that number
While SQL databases seem to have a decided edge in current rose to 30%. Conversely, we found that the percentage of survey
adoption rates among respondents, NoSQL databases, specifically takers using three persistent storage models in their projects fell from
MongoDB, make up ground in other categories. When we asked sur- 24% in 2018 to 13% in 2019. Thus, while persistent storage models are
vey takers to tell us which database management system they most
largely remaining relational in their set up, in number they seem to be
enjoy working with, 39% reported PostgreSQL, 30% said MySQL,
drifting more to one or two per project.
and 30% told us MongoDB. And, of the 17% of respondents looking
to adopt a new DBMS in the next year, MongoDB and PostgreSQL Unlike the use of relational data models, the use of time-series data
tied for their leading candidates (19% each). saw some fluctuation. In 2018, 81% of respondents told us their
company collected some kind of time-series data. In this year's
Database Management and Data Storage database survey, this number fell to 67%. Among those survey
Now that we've discussed the place of SQL and NoSQL databases takers who do use time-series data, the percentage of them who use
in the field, and the popularity of particular database manage- either SQL or NoSQL databases for their time-series data also fell. In

SURVEY RESPONSES

Which database management systems do you, personally, Which persisent storage models do your application use?
most enjoy working with?


 

 

 
 
 

 


 

   




           

THE DZONE GUIDE TO DATABASES PAGE 5 OF 44


DZONE.COM/GUIDES

2018, 64% of respondents who worked with time-series data used Despite this low adoption rate, however, when we asked survey tak-
a relational database, which dropped to 56% in this year's survey; ers to tell us what technology or trend they thought will transform
similarly, those who used NoSQL databases for their time-series database technology over the next five years, 23% said containers.
data fell from 19% to 16%. While the adoption rates for these two Interestingly, containers proved the second most popular response
widely popular models of data storage dropped among time-series to this question on potentially transformative technologies, beat
users, the instance of respondents using a specialized time-series out only by stream processing/real-time analytics (30%).
database grew from 12% in 2018 to 22% in 2019.
SECURIT Y
Popular Tools and Database Security While the world of database technology is seeing a greater interest
DATABA SE TOOL S in security arise in recent years, possibly due to several high-profile
Having addressed the fluctuations and trends in the types of data breaches, 26% of respondents reported that their organization is
storage models above, let's now take a moment to examine the planning on adding new security protocols in the near future. Ad-
tools used to host these databases and DBMS, namely servers (and ditionally, the year-over-year trends for the usage rates of security
where they're located) and containers. protocols either stayed stagnant or fell. In 2018, 91% of respondents
told us they use authentication to protect the data in their database;
Despite the constant advances being made in cloud computing,
in 2019, 88% reported to use authentication. Similarly, in last year's
the adoption rates of cloud, hybrid, and on-premise solutions
survey, 37% of respondents told us that they sanitized their user in-
seemed to remain rather static year-over-year. When we asked
puts, and in 2019 this number stayed fairly steady, with 38% claim-
survey takers where the data they work with typically resides, 46%
ing to sanitize user inputs. What was potentially alarming, however,
told us on-premise (compared to 47% in 2018), 30% host their
was the drop in respondents using encryption to protect their data.
data in the cloud (compared to 32% in 2018), and 20% use a hybrid
In 2018, 62% of survey takers reported that their organization used
hosting environment (compared to 19% in 2018). Additionally, we
encryption to secure the data they stored in their database; but, this
saw little to no yearly fluctuation in the percentage of databases
year, that number fell to 49%.
reported to be containerized. Given that discussing the percentage
of respondents who containerize a certain percentage of their
databases is difficult to convey in prose, here's a quick breakdown
of this year-over-year trend:

Popular Container
2018 2019
Strategies

0% of databases
53% of respondents 52% of respondents
are containerized

100% of databases
13% of respondents 10% of respondents
are containerized

11-25% of databas-
es are container- 10% of respondents 9% of respondents
ized

SURVEY RESPONSES

Where does the data that you're working with typically reside? How is your time split between working with SQL and NoSQL
databases?

 
 

 
 
   
 

 
  
 

THE DZONE GUIDE TO DATABASES PAGE 6 OF 44


DZONE.COM/GUIDES

The Multiple Facets of Time Series:


From Operations Optimization to
Business Strategy

BY DANIELLA PONTES
SENIOR PRODUCT MARKETING MANAGER AT INFLUXDATA

Time-series data seems to be everywhere lately. Today’s data-driven directions and points back to them. Preemptiveness is the name of
society demands optimization — we all want to be on top of what we the game for IT Ops.
consume and spend, and we want to make the most out of what we
Business visionaries have also contracted the “nothing-to-waste bug”
do and buy.
— but with a sense of opportunity turned on. New entrants who bet on
Lower performance and downtime (of any nature) is not merely seen performance improvements are winning big in commoditized markets.
as an inconvenience, but directly translates to waste: whether of Entire new markets are also being created around collecting measure-
resources, revenue opportunities, brain power, working hours, or a ments. The sensor-ification of our world (and of ourselves) is driven by
perfect sunny day. It all boils down to the only thing that levels us: the our increasing dependence on data to make sound decisions.
unrecoverable loss of time.
The Rise of Purpose-Built Time-Series Databases
Consumers and professionals share the same realization that things Time series is a direct byproduct of the current demand for optimiza-
must be constantly improved because there is no time to waste. tion. As the saying goes, “one can only improve what has been mea-
Application developers know that visitors will not spare another sured.” IT Ops, DevOps, and data-driven professionals are taking this
second on a sluggish website or on an outdated and clumsy ap- wisdom to the limit. Real-time monitoring allows them to act on fresh
plication. Ensuring flawless continuous integration and delivery is data to avoid missing profit opportunities or loss of revenue. A subtle
becoming a priority for DevOps teams and application managers. event not caught by large sampling intervals could mean a disaster
in the making — the root cause of a sequence of degradations and
As for IT operations teams, they know that problematic infrastruc-
eventual downtime.
ture and networks will consume their days with complaints, alerts,
and diagnosing and remediating problems. As the clock ticks, the Real-time monitoring means very fine-grained sampling and obser-
compounding impact of arising issues creates dissatisfaction in all vation of events as they occur. Some latency-sensitive industries,

THE DZONE GUIDE TO DATABASES PAGE 7 OF 44


DZONE.COM/GUIDES

such as finance and gaming, run on nanosecond technologies. Think order to handle the volume of data being generated, collected, trans-
billions of measurements a day. formed, and observed.

“Time series” is everywhere, and it’s generating tons of data. But


There Is a Place for Time Series in Every Organization
what constitutes time-series data, and why should it be treated any
Time-stamped data is on the rise because performance improve-
differently than other types of data?
ments, observable via time series, is what we are willing to pay for

“Time series” are sets of time-stamped measurement data of the — as professionals as well as consumers, no matter the context. The

same monitored object, collected over time. The samples don’t need for flawless performance and increased dependence on data

replace previous ones. They accumulate. This already differen- for visibility permeates all layers and relations.

tiates time series from other types of records, which are usually
Quick interactions and precise actions are the expectation. We use
updated. Time-stamped measurements are stored in series,
observation and prediction modeling to increase our odds to get it
queried in large scans, and plotted in dashboards for visualization
right. But collecting measurements randomly without a clear goal
of behavior over time.
may produce the opposite results: an inability to act objectively.
Finding patterns in an apparently uncorrelated, complex, and
confusing data pool is like a treasure hunt — we know the answer
is in the data, but we cannot see it without the right tools. We want
to extract meaning from time-stamped data using the lens of time,

“Time series” are sets of time- and time-series databases have become the channel to tap into
such data.

stamped measurement data Businesses are already embracing time series beyond the usual
use cases — seeing it as a tool for differentiation and taking a more
of the same monitored object, strategic approach. Purpose-built time-series platforms have been
adopted as part of the following business models:

collected over time. • As a central full-stack monitoring platform for IT Ops aimed
at creating a common practice and format for collecting and
monitoring time-series data across the organization. This
becomes especially important for companies that have mul-
tiple sites, that go through M&As, and who have to deal with
Depending on the type of measurement, the data is kept in raw multiple development and operation centers.
form or aggregated for long-term storage and modeling purposes.
• As a performant time-series storage engine supporting
Data processing can be done upfront by applying real-time stream
applications and solutions that extract their value from time-
processing for alerting. Cross-analytics with other measurements
stamped data. Examples are: resource utilization monitoring
for more insightful indicators is important to consider — it provides
and planning solutions, performance monitoring solutions
additional observation dimensions. Data volume should not be un-
and services, and the whole IoT industry.
derestimated, either. It will continue to grow, and some sets of data
will require long-term retention. Therefore, when seeking time-series
• As part of a competitive strategy to differentiate from estab-
solutions, you should be looking for a platform that can ingest, pro-
lished players. In this case, time series must be integrated into
cess in real time, query, and store time-stamped data at scale.
various layers. It must provide visibility into critical trans-

How time series are collected — and what you do with them — de- actions, including third-party services. From infrastructure

pends on the use case. Keep in mind that they comprise buckets of to business indicators and internal processes, all must be

measurements with which you can devise your unique monitoring optimized to achieve SLAs and SLOs.

recipe for success. Purpose-built design yet again tips the scale
IT OPERATIONS CENTRALIZED MONITORING
toward multiple data types, and therefore, providing more options to
Full-stack monitoring is one of the most common use cases for
compose your monitoring strategy.
time series. IT Ops and DevOps have long been collecting time
In short, time-series databases must be performant databases in series to monitor their infrastructure and applications. What's

THE DZONE GUIDE TO DATABASES PAGE 8 OF 44


DZONE.COM/GUIDES

changed is the complexity, geo-distribution, and speed of modern instrumentation tools. Synthetic transactions, real user monitoring
IT environments. The amount of collected metrics exploded with (RUM), and distributed tracing also fall under APM. Each of these
hybrid deployments and microservice architecture running on giga- subcategories has its own set of functionalities, features, and inter-
bits-per-second networks. Siloed solutions showing partial views of faces as value propositions. Focusing on the unique value proposi-
what is going on are ineffective, yet still generate tons of data that tion aspects would yield quicker results, and would better reduce
needs to be examined. But when you are in a downtime situation risk to market, than would developing everything from scratch.
losing revenue, every minute counts. A survey from 2017-2018 Particularly with regard to the storage tier (where performance and
revealed that about one-quarter of respondents estimated a loss scalability is crucial), it would be better to adopt a platform already
between $300K-400K per hour of downtime. proven to be time-series performant and efficient.

Dealing with an incident goes through phases that could be summarized COMPETITIVE ADVANTAGE VIA SERVICE EXCELLENCE
as: detection, investigation or diagnosis, and remediation. Reducing In one way or another, all companies promise to be more reliable,
detection-to-remediation time has a direct impact on revenue and cheaper, and faster in order to win business. And we all know that,
credibility to the organization. Therefore, organizations are in search in practice, what is delivered is quite far from the promised land.
of efficiency as well as efficacy. Billions of ad hoc metrics in silos will That has been the status quo: customer complaints versus compa-
not provide the answers in a timely manner. It is necessary to be smart nies’ ability to dodge them. However, times are different now, main-
about the data collected and analyzed. ly for online e-commerce and retail companies. Customers are one
click away from landing on the competition’s website. So, no more
Creating a standard and a guideline for full-stack monitoring
empty promises — online shoppers are non-committal, and will only
organization-wide is a path that many IT Ops and DevOps teams
give you another second of dedicated attention if the previous one
are adopting. But in order to implement such a plan, it is necessary
was satisfactory.
to have in place a scalable platform that can accommodate the re-
quirements from various business units and teams. For instance, it’s Performance optimization for competitive advantage relies on
essential to be able to ingest multiple types of time-stamped data, the ability to observe code, systems, networks, applications, and
numeric and non-numeric, as well as provide configurable precision transactions all in the context of customer experience. Purpose-built
and retention policies. time-series platforms empower developers to implement application
environments with integrated full-stack and business KPI monitoring,
One key insight to avoid excessive monitoring data is to use critical
intelligent alerting, and workflows.
business indicators as red alerts, and the rest of the stack data for
quick diagnosis and recovery. Symptoms of a malfunctioning or When starting with a business goal, DevOps, developers, and
underperforming application environment should be detectable on performance engineers adopt time series as a means to an end,
all layers — infrastructure, network, and application — but not alerted tailoring unique measurements to achieve performance bench-
on all outliers. That would lead to a saturation of responses. On the marks to translate into competitive advantage. By taking a
other hand, any degradation of business key performance indicators strategic hands-on approach to instrumentation, and defining and
(KPIs) should be acted on immediately, with access to the full-stack monitoring critical metrics, small players are entering and even
monitoring data for correlation and quick diagnosis. By generating dominating saturated markets.
more intelligent alerts, Ops and DevOps can handle the increasing
complexity and speed of modern IT environments.

TIME-SERIES DATABA SES A S CRITIC AL COMPONENTS


Performant time-series stores are also the choice of solution archi- DANIELLA PONTES is part of the product marketing
team at InfluxData. She started her career in
tects who need a scalable time-series database to handle storage
telecommunications, wireless technology, and global
and processing of time-stamped data. By doing so, they can apply Internet service provisioning. As security became a major concern for
their resources on competitive functionalities and features of the enterprises, she worked on enterprise policy management, SaaS, and
data encryption solutions. Prior to joining InfluxData, she spent some
complete solution.
years living in Japan, Germany, and Brazil. Having worked in various
market segments, from embedded smart antenna technology to
Application performance monitoring (APM) and load testing
Internet security and e-commerce doing product management,
solutions are two good examples of solutions that benefit from partnerships, marketing, and business development, she has a broad
integrating purpose-built time series as their storage tier. APM experience working cross-functionally and with customers and
partners. LinkedIn
grew to encompass other sets of solutions beyond the early code

THE DZONE GUIDE TO DATABASES PAGE 9 OF 44


May 2019 DZONE.COM/GUIDES

Why Time Series matters


for metrics, real-time, and
sensor data

DOWNLOAD THE E-BOOK

“MySQL is not intended for time series data… I can testify it is like
pounding nails with a screwdriver. It’s definitely not what you
want to do in any relational database.”

John Burk, Senior Software Developer

THE DZONE GUIDE TO DATABASES PAGE 10 OF 44


DZONE.COM/GUIDES

We already know that business indicators and full-stack


measurements should come to the same place in order
to be effective and fully utilized. As we ban waste from

Bringing Time Series our processes and resource utilization in our quest for
optimization, centralization of time series data also cuts

Into the Strategy


the intrinsic inefficiencies and inefficacies of gathering
information pieces from multiple siloed sources.

One can say that success is a measure of outperforming


competitors. Therefore, time series planning must move
from the operations tool-box to become an important aspect
Big brother culture is here to stay. Everything is up to of business strategy. For that, an integrated view within the
observation justified by performance, efficiency and efficacy whole organization is necessary. Bringing all together will
improvements. Infrastructure, flows, applications, processes, allow compounding insights and aggregating improvements
services, activities, natural resources...the list grows endlessly. that can be translated into unique competitive advantages.
If you can think of it, it exists as an observable entity.
Therefore, anything that has a purpose to anyone is in line for
time series monitoring.

If anything is qualified to time series, the question is: How


to manage such a vast set of different measurements in an
WRITTEN BY DANIELLA PONTES
effective and efficient way? SENIOR PRODUCT MARKETING MANAGER AT INFLUXDATA.

PARTNER SPOTLIGHT

InfluxData
Act in Time

Category Time Series Data Platform New Release Quarterly release cycles Open Source? Yes

Case Study Strengths


Coupa Software needed to create a custom DevOps • Developer happiness
Monitoring solution for their leading spend • Faster Time to Awesome
management cloud platform. With InfluxData they
• Ease of scaleout and deployment
moved from pure data collection to predictive analytics
and achieved a consistent track record of delivering • Centralized and integrated full stack monitoring

close to 100% uptime SLA across 13 major product


releases and 5 major product module offerings, as well
as solving their data accessibility, aggregation, and
retention challenges. Operational metrics are collected
via Telegraf, stored in InfluxDB, and analyzed by
Kapacitor. They use Grafana for visualization and have Notable Customers
created a custom alerting framework. This has become Website influxdata.com
• Coupa
the foundation to the path of building a system that is • PayPal Twitter @InfluxDB
self-healing and can provide predictive analytics key to • Wayfair Blog influxdata.com/blog
accurate forecasting.

THE DZONE GUIDE TO DATABASES PAGE 11 OF 44


DZONE.COM/GUIDES

Handling Dynamic QUICK VIEW

01. How are graph databases different?

Connected Data in 02. What are the advantages of graph


databases?

Graphs
03. Comparing data to build knowledge
graphs

BY MAX DE MARZI
GRAPH DATABASE EXPERT

Graph databases are predominantly known for two things: fiercely to each other by relationships. These relationships are typed
finding the missing relationships in your data and finding the and directed.
relationships that shouldn't exist at all. These two basic concepts
A user node DRIVES a car node, at least today, but in the near
make up the basis for two of the most popular use cases, which are
future, the car node may be the one doing the driving. A node can
recommendations and fraud detection.
have zero or millions of relationships of one type or thousands
In recommendations, we are looking for a person you should know of types to a single other node or to every node in the graph; it's
because you both know some of the same people, a product you completely dynamic.
should buy because shoppers with similar behavior have bought
Since there are no columns in graph databases, each node can
that product, a movie you should watch because people who share a
have completely different properties from another node even if
similar taste in movies liked that movie, and so on.
they are both the same kind of node. One user node may have
In fraud, we are looking at how a back-office insurance agent is just a username and an email, while the other may also have their
friends with Mark, who is married to Isabelle, who is a cousin of height, weight, date of birth, etc. One may have a string value for the
the person involved in the insurance claim, or complex fraud rings nickname property, and the other may have an array of strings.
in money laundering schemes, or over-connected synthetic data
Lastly the relationships themselves may have properties, typically
that looks legitimate at first glance but forms long chains of shared
storing data like the strength of the relationships, when the
attributes when analyzed by its relationships.
relationship came to be, and if it ended. They are very different from
There are many other use cases, but before we go there, we need to traditional relational databases and must not be treated the same way.
understand what a graph database is and what it is not.
For starters, you cannot talk to them using SQL. Remember, SQL
Graph databases do not have tables, columns, rows, or foreign keys. stands for Structured Query Language, where the schema of the
All they have is these small objects floating around in space, holding data is painstakingly argued over by DBAs in some dark corner of the

THE DZONE GUIDE TO DATABASES PAGE 12 OF 44


DZONE.COM/GUIDES

office. Is this name property a varchar(32) or varchar(48), or is it an "keep going," making recursive and variable path queries a breeze.
nvarchar(48)? Is this weight property a smallint, decimal, real, or Since every node knows exactly what it is connected to, you don't
do we fall back to varchar(32)? How do the different tables connect? have to spell out the joins in a horrifically long and error-prone SQL
Which foreign keys point which ways? How many join tables do we query. Being able to simply ask how these two things are connected,
need? Is it "one to many" or "many to many?" regardless of what is between them and the number of hops it takes,
is magic.
Those questions don't happen in graph databases since properties
and relationships are dynamic. Dealing with mixed data types across Magic is exactly what large enterprises need to make sense of
multiple databases is a huge pain. Before starting any master data their data. Often due to legacy systems, mergers, and acquisitions,
management or data reconciliation project, the DBAs must design a enterprises end up with many siloed databases, each with their own
monster of an entity relationship diagram (ERD) the size of an entire version of the truth.
wall with foreign keys crisscrossing tables in a giant hairball. Few
But what happens when the data in the billing database and the
folks outside the DBA team can understand just what is going on.
operational database don't match? What do you do when you have
But when you show a graph representation of the data to business low confidence in your data? How do you make decisions using
stakeholders, they begin to understand their data. It's how they analytics and machine learning when you don't even know the data
often think of their business in terms they understand, not database the queries and models were built on are true?
tech jargon. Engineers and business can now talk about data in
To deal with these problems, companies are importing these siloed
a language they both understand. Well, almost. To query graph
datasets into multiple graphs and comparing them to one another.
databases like Neo4j, Memgraph, RedisGraph, SAP HANA, and others,
They are looking for data missing from one system that exists in
we still need to use dynamic data languages like Cypher.
another and data incorrectly related in one system versus the others.
Graph databases are being used to solve problems in a way that They are giving data seen multiple times a high confidence score, and
have never been solved before, not because people who use them sending data disputed amongst systems to analysts for reconciliation.
are any smarter, but because they are applying a different tool to
The goal of these types of projects is twofold: One goal is to prevent
the problem. If you hit a screw with a hammer hard enough over
and automate fixes to the source systems, and the other is to build
and over, it will eventually go into the wood, but sometimes turning
the newly popular knowledge graph, which then becomes the
it with a screwdriver will be a better solution. Relational databases
golden record of data and a solid foundation for analytics, enterprise
and graph databases are very different tools; you can think of Cypher
data APIs, and machine learning projects.
like a screwdriver in a land of SQL hammers.

If all you use is relational databases, you owe it to yourself to try


The important characteristics to remember are that every node
graph databases. It's like only using languages like Java and C++
"knows" what it is connected to; the "joins" are created just once
when some problems are better suited to languages like Python
on the creation of the relationship; and the relationships of each
and Ruby. The flexibility and expressiveness of graph databases
individual node are grouped by type, which allows queries to touch
has made it the fastest growing category in popularity these last
very small amounts of data to get the answer.
few years.
Imagine a user node with 200 friends, 500 posts, 3000 likes, etc.
So, before you spin up yet another relational database to try to
Getting the 200 friends of that user node will cost 200 traversals
handle the ever-increasing complexity of information, try a graph
(following a pointer from a node to a relationship to another node).
database, and you will find new ways to understand your data and
Regardless of how many user nodes exist; regardless of how many
add a powerful tool to your toolbelt.
friendship relationships exist; regardless of how many posts, likes,
or any other data exists; the query time is constant. The combined
advantages of a dynamic data query language —- "pre-joined"
MAX DE MARZI is a graph database expert. His graph
relationships, "relationship-type grouping" per node, and dynamic power level is over 9000, but he is terrible at just about
properties -- are not available in other types of databases. everything else. Over the past six years, he has written
150 blog posts on graph use cases on his blog at maxdemarzi.com
Using Cypher, you are able to build a single query that can find where he waxes poetically and obsessively over these collections of
points and lines. If you have a question about graphs, don’t be shy…
paths of varying lengths while traversing many different types of
connect. LinkedIn Twitter
nodes and relationships. The * ("star") character in Cypher means

THE DZONE GUIDE TO DATABASES PAGE 13 OF 44


DZONE.COM/GUIDES

QUICK VIEW

01. The World Food Programme

AgensGraph: A Graph analyzes the many factors behind food


poverty in order to deliver timely and
cost-effective help to the vulnerable.

DB Helping to Solve 02. A graph database, AgensGraph,


is used by the WFP to map multiple
relationships within a repository of
survey data.

World Hunger 03. AgensGraph is a multi-model graph


DB based on PostgreSQL that allows
developers to integrate relational and
graph data models.

BY JO STICHBURY
FREELANCE TECHNICAL WRITER

In this article, we describe how data scientists in one of the world's To determine regional food security, the VAM unit gathers various data
largest humanitarian organizations are using a graph database, Agens- including economic analysis, thematic analysis, and geospatial analysis.
Graph, to analyze rich datasets and determine food security within However, another strand of their work is to determine the characteris-
vulnerable regions. tics of an area by carrying out surveys through face-to-face or over-
the-phone interviews. A single survey may have over 400 questions,
Introduction and hundreds of thousands of local households and food suppliers are
In the most extreme situations, such as war, flood, hurricane, earth- interviewed each year.
quake, or fire, there are many vulnerable people left in need of critical
nutrition to prevent serious health problems or even death. As part of The surveys are carefully designed and highly structured, but the

the United Nations, the World Food Programme (WFP) assists such peo- amount of data is large, diverse, and extremely complex because:

ple around the world. Every day, too many men, women, and children
• It is impossible to create one single universal standard that
lack a daily nutritious meal, according to the WFP --- which estimates
encompasses all possible health, livelihood, and socioeconomic
that  821 million people still go to bed on an empty stomach each night,
variables for the many different regions at risk.
and that one in three members of the global population suffer from
some form of malnutrition. • A multitude of relationships exists among the variables them-
selves. As Gaurav Singhal, the lead data scientist at UN WFP,
Assisting 91.4 million people in around 83 countries each year, the WFP
puts it, "Food is not just something you eat. In rural areas, food
has, on any given day, 5,000 trucks, 20 ships and 92 planes on the move.
is something you grow, food is something you transport and
The logistics alone are significant, and it is essential that the WFP are
something you sell."
prepared to deliver the most effective response to every crisis. The Vul-
nerability Analysis and Mapping (VAM) unit within the WFP consists of 200 Diverse Data Demystified
food security analysts based in more than 70 countries such as Somalia, Let's consider a specific example of how the VAM unit works with the
South Sudan, and Yemen. Their mission is to provide needed help in the data they collect to infer new information and determine the most at-
most timely and cost-effective way possible. Many factors are behind food risk areas and people. This example is taken, with kind permission, from
poverty. Not only does the team need to understand the level of food inse- Gaurav Singhal's presentation given at the Postgres Conference Silicon
curity in a particular region, but why that insecurity occurs. Valley in 2018.

THE DZONE GUIDE TO DATABASES PAGE 14 OF 44


DZONE.COM/GUIDES

The VAM unit needed to consolidate their data about food consumption
and expenditure. This data was collected by surveying different regions
about diverse food groups, different units of measurement, varying cur-
rencies, and multiple food sources (such as whether food is purchased
from a market, grown at home, or whether it comes via government dis-
tribution). How to collate this vital data into a single, useful repository?

The team found a solution by using a graph database to record the


relationships between the variables, infer facts, and organize the data.
Working with AgensGraph from Bitnine Global, the VAM data scien-
tists took the complex dataset and made sense of the information it Defined Edge Types and Properties
contained. Singhal says in the presentation that "...through the use of
inferred facts, we were able to easily aggregate and coalesce highly
disaggregated raw data along multiple dimensions."

In the era of Big Data, many organizations are finding that traditional
RDBMS cannot keep pace with the variety of data formats and their
rapidly changing nature. As just one example of the NoSQL database
revolution, graph databases are increasingly proving themselves as a
powerful technology in the data management community today. Gart-
ner predicted in 2015 that at least 70% of leading companies would pilot
a graph database project by the end of 2018.
E X AMPLE DATA
AgensGraph is a multi-model graph database based on PostgreSQL that Even in this simple example, it is clear that the graph database queries

allows developers to integrate the relational and graph data model and inference allowed the UN WFP to find a way to:

into one database. SQL and Cypher can be used in the same query, • Identify the households with the most pressing malnutrition issues
which allows developers to build upon the power of a graph database • Prioritize distribution of items that aligned with food consump-
alongside a legacy SQL environment for data management. PostgreSQL tion pattern analysis
users can easily migrate relational data into the graph with support from • Plan appropriate nutrition education programs for long-term
PostgreSQL extensions. AgensGraph also supports features such as ACID sustainability
transactions, MVCC, triggers, and JSON, and provides a browser to visu-
If you're interested in seeing AgensGraph in more detail, here is a video
alize the graph contents, forming an intuitive model of the real world.
that illustrates how how the WFP work with the database.

One of the examples given in the presentation is a study of the dietary


For Further Information
habits of a population receiving WFP cash handouts to buy food. The
The priority of the WFP is "Zero Hunger," which is a pledge to end hun-
graph database can be queried to establish the families that are buying
ger, achieve food security, improve nutrition, and promote sustainable
and consuming non-essential sugary food 7 days per week, but only
agriculture by 2030. Despite progress so far, there is still a long way to
consuming essential Vitamin A-containing food once per week. By es-
go. To achieve this goal, governments, citizens, civilian organizations,
tablishing the regions and characteristics of families lacking education
and the private sector must collaborate to invest, innovate, and create
about the importance of Vitamin A, the WFP can focus their efforts to
lasting solutions. To accelerate the achievement, they work with various
improve information. They may reduce cash handouts and instead intro-
technologies, of which Bitnine's AgensGraph is one. AgensGraph Com-
duce vouchers to be exchanged for those foodstuffs that contain Vitamin munity Edition is licensed under AGPL 3.0, while the Enterprise Edition
A, for example, in certain areas. is licensed under a commercial license.

The following graphic illustrates the concept modeling behind this


JO STICHBURY is a technology writer and podcaster
example: with over 20 years’ experience in the software
industry, including 8 years of low-level mobile
• A household, which consumes a food group (in this case, meat) development. Jo typically writes about machine intelligence, high
performance computing, electric and driverless vehicles, and
• A food group (meat), and the nutrition it contains
renewable energy. She podcasts about developer marketing at
• A household that expends money upon such a food group developermarketingpodcast.com. LinkedIn Twitter

THE DZONE GUIDE TO DATABASES PAGE 15 OF 44


DZONE.COM/GUIDES

COMPARE COUCHBASE
VS. MONGODB
TM

couchbase.com/CBvsMongo

THE DZONE GUIDE TO DATABASES PAGE 16 OF 44


DZONE.COM/GUIDES

And as a cloud-native distributed database, we accelerate the


innovation of enterprise-ready applications.

NoSQL for Highly Couchbase is also the only database that combines the best of
NoSQL with the power and familiarity of SQL in a single elegant
Scalable Business-Critical platform spanning from any cloud to the edge. We give developers
all the tools to develop web, mobile, and IoT applications with

Applications in Any Cloud unmatched agility. And our comprehensive SQL-compatible query
language provides ANSI joins, nest, unnest, group, sort, and other
data manipulation functions.

Make no mistake, the requirements of today’s business-critical With our cloud-native architecture, businesses can easily deploy
applications far exceed the intended capability of traditional to any cloud – public, private, or multi-cloud. Developers can build

relational databases. Modern applications must process a growing and test applications in the exact environment where they’ll be

variety, velocity, and volume of data. And they have to rapidly deployed. And our Kubernetes operator makes it simple to run and

adapt to constantly evolving technologies, run in distributed manage Couchbase as a service in any cloud.

environments, and deliver submillisecond response times to


With no need to disrupt your development or operational
millions of global users.
environments, you can be confident your applications will deliver
uncompromised performance and availability at any scale and any
That’s why Couchbase – unlike other NoSQL databases – offers
location – now and in the future.
robust capabilities for business-critical enterprise applications on a
highly scalable and available NoSQL platform. Our open standards
WRITTEN BY RAVI MAYURAM
simplify the transition from mainframe and relational databases. SVP OF ENGINEERING & CTO, COUCHBASE

PARTNER SPOTLIGHT

Couchbase NoSQL Database


Enterprise-class, multi-cloud NoSQL database

Category NoSQL Database New Release Annual Open Source? Yes

Case Study Strengths


Couchbase's mission is to be the platform that • Built for change at scale – Support millions of interactions and easily
accelerates application innovation. To make this respond to ever-changing business requirements
possible, Couchbase created an enterprise-class, • Memory-first speed – Rich data access, in-memory replication,
multi-cloud NoSQL database architected on top of 99.999% availability
an open source foundation. Couchbase is the only
• Security across the stack – Encrypt, audit, protect, and secure your
database that combines the best of NoSQL with the
data anywhere
power and familiarity of SQL, all in a single elegant
• Cloud-native, available anywhere – Cross datacenter replication (XDCR)
platform spanning from any cloud to the edge.
in any cloud
Couchbase has become pervasive in our everyday • Fully synced and offline-first – Real-time data sync even when users
lives; our customers include industry leaders are offline
Amadeus, AT&T, BD (Becton, Dickinson and Company),
Notable Customers
Carrefour, Comcast, Disney, DreamWorks Animation, Website couchbase.com
• Amadeus • Tesco
eBay, Marriott, Neiman Marcus, Tesco, Tommy Hilfiger,
• Comcast • United Twitter @couchbase
United, Verizon, Wells Fargo, as well as hundreds of
other household names. For more information, visit • LinkedIn
Blog blog.couchbase.com
couchbase.com.

THE DZONE GUIDE TO DATABASES PAGE 17 OF 44


DZONE.COM/GUIDES

Performance Tuning and QUICK VIEW

01. Understanding Your Role

Monitoring Traditionally Falls to 02. Discover Execution Plans

the Database Administrator 03. Leverage Query Store

BY MONICA RATHBUN
MICROSOFT MVP FOR DATA PLATFORM AND MICROSOFT CERTIFIED SOLUTIONS EXPERT

SQL Server database performance tuning and monitoring tradition- use SET SHOWPLAN_ALL prior to executing your code. You have two plan
ally falls to the Database Administrator (DBA), but in reality, good options: Estimated Plan (CTRL +LM) and Actual (CTRL + M). Estimated
performance starts as applications are being developed. If you are a plans will give you what the query optimizer will likely do and what the
developer, it is critical to know how your application affects database estimated rows will be without actually executing the T-SQL or batch at
performance. You must also take steps to continually monitor and tune compile time. In almost every case, the "shape" of your execution plan
performance. Knowing what tools are available is a good way to start. (the query operators that the optimizer chooses) will be the same in an
actual and estimated plan. In performance tuning, getting an estimated

Execution Plans plan is a good place to start; however, we will focus on the actual execu-

The most important element in understanding database performance is tion plan. The actual execution plan will depict exactly which operators

the query execution plan. The execution plan contains all the informa- were utilized to execute the query.

tion that SQL Server's Query Optimizer uses to execute the query. The
Query Optimizer is responsible for road-mapping the most efficient plan
to execute the provided query, using the lowest resource consumption.
This execution plan is a best guess based on the column statistics that Figure 1

SQL Server maintains on your data. Note: it is very important to keep the
USE AdventureWorks2016CTP3;
statistics of the database tables and indexes up to date, in order to be able
GO
to create optimal execution plans. Once the execution plan is generated,
SET SHOWPLAN_ALL ON;
it contains runtime information, usage metrics, and warnings such as GO
missing indexes, implicit conversions, and tempdb spills (think of this as
your query paging to the file system). Getting a look into what your code is Here is a query that looks at all discounted products no longer active

doing when it reaches the database is vital to performance tuning. with a unit price greater than 100. One of the key things in the plan to
look at is the query cost. The CPU, IO, and memory consumption are
To view execution plans in SQL Server, you must be granted SHOW- some of the factors that SQL Server uses in determining the cost of the
PLAN permissions for all databases in that query. Once you have the query. The cost is an abstract value. You can use this number to compare
proper permission, it's easy to access the execution plan. In SQL Server between two queries --- but because it's an estimate, it means you can't
Management Studio, click on the icon highlighted in red in Figure 1, or count on it as an accurate measure of performance. Still, it is important

THE DZONE GUIDE TO DATABASES PAGE 18 OF 44


DZONE.COM/GUIDES

as a pointer to what may need to be tuned or changed if the cost is high pages that have to be read from memory or disk. The lower the number,
relative to the overall query. the faster the read.

Before running the query, take note of the two SET statements. These
are two settings you'll want to begin using. SET STATISTICS IO displays
statistics on the amount of page activity generated by the query. It gives
you important details, such as page logical reads, physical reads, scans,
read aheads, and lob reads, both physical and logical. SET STATISTICS
TIME displays the amount of time needed to parse, compile, and execute

each statement in the query. The output shows the time in milliseconds
for each operation to complete. This allows you to really see, in num-
bers, the differences one code change can make.

USE AdventureWorks2016CTP3;
GO Figure 3
SET STATISTICS IO ON
GO You will also notice in the execution plan SQL Server told us we are miss-
SET STATISTICS TIME ON; ing an index, and it gives us the actual index script we need to create. If
GO
you right-click on the missing index and choose "Index Details," it will
SELECT Name, [Description],[UnitPrice],[UnitPriceDis-
open a new window with the script you need to run. Be sure to name
count]
your index with proper naming conventions before you run it.
FROM [Production].[Product] p
INNER JOIN [Production].[ProductDescription] pd USE [AdventureWorks2016CTP3]
ON p.ProductID = pd.[ProductDescriptionID] GO
INNER JOIN [Sales].[SalesOrderDetail] s CREATE NONCLUSTERED INDEX IDX_UnitPrice_UnitPriceDis-
ON p.[ProductID]=s.ProductID count
WHERE SellEndDate is not NULL ON [Sales].[SalesOrderDetail] ([UnitPrice],[UnitPrice-
AND UnitPrice>100.00 Discount])
AND UnitPriceDiscount<>0 INCLUDE ([ProductID])
ORDER BY [Name],[UnitPrice] GO
GO
Here is the improved plan:
Notice the 85% cost against SalesOrderDetailTable. By adding a Miss-
ing Index to this, the plan tells us we can estimate an 83% performance
gain. In addition, note the SCAN on the Product.ProductID. For a query
returning larger set rows, we always want the index to SEEK to the value,
not SCAN. There are times when SQL Server will decide to do a scan if the
table is small enough rather than try to seek an index. SEEK means that
instead of reading the entire index, SQL Server is only reading the pages
it needs to return the rows meeting your predicate (such as the WHERE
clause, or what is being filtered out).

Figure 2

Now let's take a look closer at what SET STATISTICS IO and TIME gave
us. Looking at the output, we see the table had 1266 logical reads and a
CPU time of 31 ms. We will use these numbers to compare performance
improvements. These numbers are important because there are 8,000 Figure 4

THE DZONE GUIDE TO DATABASES PAGE 19 OF 44


DZONE.COM/GUIDES

Now, looking at the plans, you can see the difference. We have added a are put in place --- so you can directly track those impacts. It can also
SEEK on Product.ProductID, and we can see the plan is now using our help you identify and tune top resource-consuming queries. Hundreds,
new index for SalesOrderDetail. Its cost is only 29%. This has effective- thousands, and even millions of queries are processed by a server in
ly given us a more efficient plan. about a second. Digging through all of them to identify any issues can
be very difficult. Query Store helps with that by doing the analysis for
After you run the execution plan, you have the ability to save it by just
you. In the past, to perform this kind of analysis, you had to access SQL
right-clicking in the execution plan area and choosing Save Execution
Server's caches in real time to capture the data you needed, which made
Plan as shown below in Figure 5.
after-the-fact tuning impossible. Fixing those performance issues after
identifying them will depend on the issue, but Query Store makes is
easy to find them.

Allowing insight as to what code could benefit from tuning, it tracks the
number of times a query is called as well as which queries are waiting on
server resources. It provides a way to analyze usage patterns and identify
high resource consuming queries. It allows you, as developers, to monitor
code you have written and to address any issues that may arise.

Let's look at an example of how a simple code change can regress a


query and how Query Store can help you identify it.

Figure 5 First you need to enable the feature. Use the latest version of SQL Server
Management Studio to see the most recent Query Store options and
Additionally, SQL Server gives you the option to compare different
analysis available. If you are working on a lower version, you will not see
execution plans. You'll notice in Figure 5 there is a "Compare Show-
all the reports listed.
plan" option. This allows you to compare the active plan with a saved
plan. The top plan is the active query and the lower will be the saved USING THE GUI
query chosen for comparison. This information can be very useful and In SQL Server Management Studio Object Explorer, right-click on the da-
should be added to your testing check lists. When comparing the plans, tabase, click "Properties," and then select "Query Store." In Operational
Management Studio highlights similar query operators by shading them Mode, choose "Read Write." To get started, change the Query Store
with the same color. The best place to start looking for differences is in Capture Mode to "Auto" (in SQL Server 2016 and 2017 the default for this
the properties pane, which you can see on the right in Figure 6 --- where is "All"), which reduces capture overhead on your server.
there's a "not-equal" sign, you can easily see the difference in costing
these two plans.

Figure 6

QUERY STORE
Next let's look at Query Store, a feature that was introduced in SQL Serv-
er 2016 and iterates on execution plans for the developer. Query Store
captures execution plan history and runtime statistics, and it retains Figure 7
them for review and trending. It is a great tool for finding regression over
a time period (last hour, day, week, and so on), as changes to your code You can also enable using T-SQL:

THE DZONE GUIDE TO DATABASES PAGE 20 OF 44


DZONE.COM/GUIDES

USE [master] tion. Clicking on each bubble will show you their plan in the bottom part
GO of the screen. If you find better plans available for a query --- in cases
ALTER DATABASE [AdventureWorks2016CTP3] SET QUERY_STORE such as parameter sniffing --- you can "force" a plan, which will require
(OPERATION_MODE = READ_WRITE, QUERY_CAPTURE_MODE = AUTO)
the Optimizer to use the selected plan.
GO

As queries start to run, Query Store will start to keep plans and analyze
them for overall query performance.

Figure 9

Another report I really like to view is Top Resource Consuming Queries.


It's ideal for seeing the changes in a plan after index creation, new hard-
Figure 8 ware, and new application versions. A good use case for this is when
you upgrade to a newer version of SQL Server and you need to test your
Let's look at the same query we used earlier and see its regression. Here
code for performance regressions. You can simply enable Query Store,
we are looking at the Regressed Queries report, based upon the dura-
run your queries in the lower compatibility mode, and then run them
tion metric. In this case, regression means performance has degraded
again in the new upgraded compatibility mode. Query Store will clearly
within a period of time (over the last week, day, hour, or so on). The que-
identify regression directly related to the upgrade.
ry now runs worse than before. Navigating through the window, you will
find buttons to view the query, force and unforce a query plan, options
Summary
for grid and chart formats, and a way to compare selected plans.
Don't just leave the performance tuning of your code to your SQL
Query Store lets you examine regressed queries bases on various metrics. Server Database Administrators. As a developer, it is imperative that
you continually monitor your code for performance regression and
• Duration improvements. Using tools like execution plans and Query Store that are
• CPU time free and readily available is a great way to accomplish that. I advise you
• Logical reads to read more on both topics. There are a lot of great resources available,
• Logical writes so be sure to check them out.
• Physical reads
• CLR time
• DOP
• Memory consumption
• Row counts MONICA RATHBUN lives in Virginia and is a Microsoft MVP
• Log memory used for Data Platform and Microsoft Certified Solutions Expert.
She has nearly two decades of experience working with
• TempDB memory used
a wide variety of database platforms with a focus on SQL Server
• Wait times and the Microsoft Data Platform. She is a frequent speaker at IT
industry conferences on topics including performance tuning and
Once again, if you are on a lower version of Query Store, some of these configuration management. She is the Leader of the Hampton Roads
may not be available to you. SQL Server User Group and a Mid‐Atlantic PASS Regional Mentor.
She is passionate about SQL Server and the SQL Server community,
Note there are two execution plans indicated on the right chart for the doing anything she can to give back. Monica can always be found on
Twitter (@sqlespresso) handing out helpful tips. You can find Monica
query --- plan IDs 783 and 791. We can clearly see in the chart that plan
blogging at sqlespresso.com. LinkedIn Twitter
791 is the better-performing plan based on the duration of each execu-

THE DZONE GUIDE TO DATABASES PAGE 21 OF 44


DZONE.COM/GUIDES

Your devops
will love
to run Redis

Your developers
will love
to build with Redis

You will love Redis Enterprise

GET STARTED FREE


THE DZONE GUIDE TO DATABASES PAGE 22 OF 44
DZONE.COM/GUIDES

Redis Enterprise takes the simplicity, elegance, and performance of


open-source Redis and transforms it into a highly available, linearly-
scalable, database with remarkable ease of use around clustering.

Redis Enterprise
Organizations typically use Redis Enterprise because of its shared-
nothing clustering architecture enabling horizontal and vertical
scaling with the press of a button. In addition, Redis Enterprise can
be deployed in a geo-distributed architecture, preserving data
consistency, by leveraging Conflict-free Replicated Data Types (CRDT).

Customers managing massive data sets can also take advantage


Today's applications deal with two kinds of data: transactional of Redis on Flash and Intel’s persistent memory technology that
data and instant data. Transactional data is held in a traditional preserves performance while providing significant cost savings.
disk-based database; instant data resides in memory to provide
the best possible application performance. Over the years, Redis,
In addition, Redis Enterprise provides a collection of modules that

consistently voted the most-loved database, has become the de


turns Redis into a multi-model database. With Redis Enterprise,

facto standard for instant data.


organizations can use Graph, Streams, Search, JSON, Time Series
and also serve TensorFlow AI models, all from within the same
Redis has commonly been used for caching, but due to its flexible database. Redis Enterprise can be deployed on-premise, on a
data structures — namely strings, lists, sets, sorted sets, hashes, cloud platform of your choice, and in hybrid and multi-cloud
geo, bitmaps, hyperloglogs, and streams — more and more deployment architectures.
organizations are using Redis for an increasingly broad set of use
cases, including pub-sub messaging, asynchronous job queues,
WRITTEN BY MADHUKAR KUMAR
and session management. VP TECHNICAL AND PRODUCT MARKETING, REDISLABS

PARTNER SPOTLIGHT

In-Memory Database Platform


A major release each year, and minor releases every few months

Category In-memory database platform New Release A major release each year, and Open Source? Yes
minor releases every few months

Case Study Strengths


Whitepages’s innovative people search tools are powered by more • High-performance with sub-millisecond latency, supporting over a
than five billion and growing global identity records. On average, million writes per second.
the proprietary Identity GraphTM solution makes 150,000 to • Redis Enterprise modules adds multi-model capabilities for manag-
200,000 calls per second to the three-terabyte database and can ing different data models like Graph, Time Series, Search, JSON and
even surpass this number during peak hours. Streams.

Whitepages chose Redis Enterprise to handle time-sensitive • Shared-nothing cluster architecture with always-on availability and
customer transactions and maintain sub-hundred millisecond effortless scaling.
latency of its application and provide a consistent end-user
• High throughput by running multiple Redis instances on multiple cores.
experience of its digital identity verification services for
businesses and consumers. With Redis on Flash, Whitepages
saves up to 70% in infrastructure costs while delivering single
Notable Customers
digit latency.
• Dell • TD Bank Website redislabs.com
As Whitepages expands its identity dataset beyond North America, • Staples • Home Depot
• Intuit • Mastercard Twitter @redislabs
it increasingly relies on Redis on Flash, and enterprise-grade Redis
from Redis Labs, to keep operational costs low and performance • Vodafone • Atlassian Blog redislabs.com/blog/
high. redislabs.com/customers/whitepages/ • Microsoft

THE DZONE GUIDE TO DATABASES PAGE 23 OF 44


DZONE.COM/GUIDES

QUICK VIEW

Cross-Platform
01. Azure commands can be used
to deploy everything from Virtual
Machines (VMs), Database Servers, and

Deployments in Azure
Virtual Networks and Services.

02. We'll look at a use case for


deploying an Oracle VM in Azure.

With BASH 03. Deployments can be performed at


an enterprise level without demanding
intense re-education of technical
personnel.

BY KELLYN POT’VIN-GORMAN
DATA PLATFORM ARCHITECT, MICROSOFT

When you think deployment automation, you may think cloud, been deployed. Several scripts in different languages are provided
but do you think BASH shell scripting with automation in Azure? along with the templates — PowerShell, Ruby, .NET, and a tab
Most may not even realize it's a common thing. I've been work- titled CLI. The CLI portion is just a BASH script.
ing heavily in this area for the last year and it's been incredibly
valuable and the cost savings are evident, as any cloud migration
can be. The customers I work with, mostly in Higher Education, are
working hard to embrace the cloud, but they may not have the ex-
pertise, the resources, or the time to gain the technical knowledge If you have any doubts about this, the dead give-away is that when
needed to deploy to the cloud. Many in their organizations feel you click on the CLI tab in the template output, the first line in the
that if they could just get the infrastructure deployed, they could script sets the shell:
easily get up to speed to manage the resources, but the new skills
they imagine they'll have to add to their team often creates an
overwhelming roadblock.

I felt a similar pressure this last year when I started at Microsoft


Why the tab is titled CLI, (Command Line Interface) instead of
and knew I would need to get up to speed very quickly. I'd worked
BASH is beyond me. The PowerShell tab is titled correctly, but I
in the Oracle and AWS clouds previously but had less experience
can only guess that when they designed the portal, the scripting
with Azure. I was significantly surprised when I logged into the
language they used with the CLI hadn't been set yet and the tab
Azure portal and discovered, to reverse engineer what was built
title was left over from an earlier release decision.
in the Azure portal, there were templates, also referred to as ARM
templates, which could be used to recreate much of what had My personal challenge with the ARM templates is that they are

THE DZONE GUIDE TO DATABASES PAGE 24 OF 44


DZONE.COM/GUIDES

highly dependent on complex JSON and they require the user to 5. Create the VM with the appropriate OS and database installa-
learn this along with all the other areas of the Azure cloud. As I re- tion chosen as part of our values that were entered earlier in
searched how to make the most of what I already knew, I learned the interactive part of the script.

Azure has an extensive library of AZ commands that can be used 6. Create the script to:
from the command line. PowerShell has its own built-in Azure
–– start the listener.
commands, but you can also use PowerShell scripts with the AZ
–– start the instance.
commands in a manner similar to what I will demonstrate in this
–– create the database.
article's examples.
–– check the status of the database.
The format for these commands is: –– log information on the VM about our database environment.

az <command> --<parameter> <value> --<parameter> 7. Once the script is written to disk, run a remote session to the
<parameter/template file> VM and execute the script to perform the actions.

There are two reasons you would move to the AZ commands vs. This may sound like a lot to perform, but if you use a set scripting

the JSON from the templates: format, the rest is rather simple. Most of the script can be recycled
for future deployments of other VMs, too. BASH grants us the abil-
1. Dynamic environment variable generation from the ity to perform this type of work and many DBAs may have already
script down. been doing this at some level, just not in the cloud.
2. Less code to manage.
The new skill that will need to be added for the Oracle DBA is
The JSON in the templates export every minute detail of the de- learning the AZ commands. If you're a PowerShell DBA, then
ployment. It is an excellent artifact, but less pliable or manageable running through the steps for the AZ commands and database
for deployments, granting me more power using the AZ com- creation will be necessary. Luckily, there is impressive reference
mands with little to no JSON template/parameter files. material and regular new feature releases to help make this easier
for all of us.
Azure commands can be used to deploy everything from Virtual
Machines (VMs), Database Servers, and Virtual Networks and
Step by Step
Services. When paired with shell scripts, (BASH or PowerShell) the
All my scripts begin the same way: they have a section of com-
automation can incorporate logical deployments of objects and
ments describing the script, along with its purpose. The format
data. With the addition of Azure DevOps CLI, advanced analytics
of the steps in the script then proceeds the same way each time.
features such as Azure Data Factory pipelines and dataflows can
There's a very good reason for this:
also be included.

1. Ease of readability.
An Oracle VM on Azure
2. Level of maturity.
To help demonstrate this, we'll look at a use case for deploying an
Oracle VM in Azure. Although this is a cross-platform deployment I won't include all the scripts here, just the main areas that are of
of a Linux VM and Oracle on Microsoft Azure, it's a simple task importance. The full script can be found, along with other resource
using BASH scripting and AZ commands. deployments, in my GitHub repository.

Our script will perform the following: To begin on our Azure scripting journey, I need to capture the
following values to populate the environment variables as we
1. Designate the arguments that need to be input to fulfill the
requirements for the script to run successfully. move forward:

2. Set the values for the environment variables for the user's usage() { echo "usage: $0 -g <groupname> -s <sku> -v
deployment. <version> -o <oraname> -sz <size> -s -l <zone> -gn <gdb>
3. Execute the commands to deploy the resource group (sim- -sp <syspassword> -stp <systempassword> -pp <pdbpass-
plified management of resources by containing them to a word> " 1>&2; exit 1; }
single grouping).
declare groupname=""
4. Create the virtual network as part of the VM on which the
declare sku=""
database will reside.

THE DZONE GUIDE TO DATABASES PAGE 25 OF 44


DZONE.COM/GUIDES

declare version="" Ask Questions, Make Suggestions, or Give Options


declare orname=""
Depending on the type of value that needs to be specified, I then
decalre size=""
request the values for each environment variable, first asking the
decalre adminuser=""
user for the information with an example or posing the choices from
declare zone=""
declare gdb="" the Azure catalog (any statement inside the quotes, after the ECHO):
declare syspassword=""
shift $((OPTIND-1))
declare systempasword=""
declare pdbpassword="" if [[ -z "$groupname" ]]; then
echo "Wheat is the name for the resource
The values above correspond to the following: group to create the deployment in? Example: ORA_GRP "
echo "Enter your resource groupname:"
• Groupname: Used for the Resource Group in Azure that read groupname
will group all the resources that we'll deploy for ease of [[ "${groupname:?}" ]]
management. fi

• SKU: Used to call values from the Azure catalog for Azure # Create the latest version of Oracle VM installations
SKU to be use for the sizing of the VM. By querying the Azure availavlbe and push to a file

catalog, our script has longevity vs. static values set in the az vm image list --offer Oracle --all --publisher Oracle
>db.lst
script.

• Version: Also used as part of the calls to the Azure Catalog. if [[ -z "$sku" ]]; then
echo "Here's the installation version, from
This will be for the database version.
12c through 18c available for Oracle: "
• Oraname: Name for the Oracle VM server.
cat db.lst | grep sku | awk '{print $2}'| tr -d
• Size: From the Azure catalog, size of the VM server. \"\,
echo "Enter the version you'd like to in-
• Adminuser: This is the VM administrator's name.
stall, the numbering convention must be exact, feel free
• Zone: Location Zone in Azure (when choosing a zone, I like to copy from the list and past here:"
to list out the available zones for the user). read sku
[[ "${sku:?}" ]]
• Gdb: Global Database Name.
fi

• SysPassword: This is the password for the Oracle SYS User.


For the example above, note the following command:
• SystemPassword: Password for the Oracle SYSTEM User.
az vm image list -offer Oracle --all --publisher Oracle
• PDB Password: Admin Password for the Pluggable Data- >db.lst
base. The versions available in the Azure catalog have the
option for multi-tenant, so I'm capturing the value to be This command queries the catalog and pulls the VM list for any
used if needed. Oracle images, filtered by those published by Oracle, and writes
them to a file. The output of this file is formatted in JSON and
I then ask my script to run through each one of these variables to
looks something like this:
get the values from the user, but I do this as a part of the prompts,
like the first two, demonstrated below: [
{
while getopts ":g:s:v:o:sz:a:l:gn:sp:stp:pp:" arg; do "offer": "Oracle-Database-Ee",
case "${arg}" in "publisher": "Oracle",
g) "sku": 12.1.0.2",
groupname=${OPTARG} "urn": "Oracle:Oracle-Databse-
;; Ee:12.1.0.2:12.1.20170202",
s) "version": "12.1.20170202"
sku=${OPTARG} },
;; {

THE DZONE GUIDE TO DATABASES PAGE 26 OF 44


DZONE.COM/GUIDES

"offer": "Oracle-Database-Ee", # Create a Resource Group, this must be a unique tenant


"publisher": "Oracle", and choose the location zone to deploy to:
"sku": "12.2.01", az group create --name $groupname --location $zone
"urn": "Oracle:Oracle-Database-Ee:
#Make this resource group and zone defalut
12.2.0.1:12.2.20180725",
az configure -- defaults group=$groupname location+$zone
"version": "12.2.20180725"
},
# Create Oracle VM
az vm create \
This output might look complex to work with, but with BASH, I
--name $oraname \
can include utilities like AWK and GREP to search for patterns and
--image $urn:latest \
format strings from the file to pass information to my script about --admin-username $adminuser \
SKUs, versions, and other pertinent information. You can see this --generate-ssh-keys
in the snippet from the script, just a few lines later:
az vm open-port \
cat db.lst | grep sku | awk '{print \$2}' | tr -d \"\, --name $oraname \
--port 22 \
Before I capture the list to the screen, I print to the screen, and --priority 330

(echo) that I'm going to be displaying the installation versions


# Create Storage Disks to use with database
from 12c through 18c that are available from Oracle.
az vm disk attach --vm-name $oraname \
-- caching ReadWrite \
I then ask the user to choose one and type it in. The value is then
--name $oraname"dsk" \
used to populate the value for the SKU.
--sku $size \

I continue to ask questions and offer examples or options avail- -- new

able from the catalog until all the values are entered and, if the
user answers the questions, the script proceeds, if not, it exits, Create the Oracle VM
deploying nothing. There are three steps to building the Virtual Machine that will
house my Oracle database.
Deploying to Azure
1. Create the VM.
As part of my steps, there is one environment variable (urn, for the
2. Open the ports required to gain remote access to the VM.
value of the image) that is set dynamically from the db.lst work
3. Attach the disk to be used for the database.
file after parsing the $sku and $version values. I also set my logfile
that will track all the output I write to it as part of my deployment. Note that in each of my AZ commands, I reference the variables
After these two steps are complete, it's time to create our group that were set earlier in my script.
that will contain all the resources in Azure.
Now it's time to create our script to perform the work on the VM to
az group create --name $groupname --location $zone build out the database:

Notice in the next step that I set the group and location as default # Get IP Address and log into VM

for two reasons:


export vmip+$(az network public-ip list | grep ipAddress
| awk '{print $2}' | tr -d \"\,)
1. Less coding in subsequent commands, as this information
is required. # Build Dynamic Oracle script, including install build
2. Less concerns about forgetting to enter this information and for DBCA
impacting the deployment. cat <<EOM > mkdb.sh
sudo su - oracle
# Get Correct URN alue from sku and version entered: mkdir /home/poracle/data/
urn=$(cat db.lst | grep $version:$sku | grep urn | awk export oh+$(ls /u01/app/oracle/product)
'{print $2}' | tr -d \"\,)
export loffile=./$jobrun.txt #Start Listener

# Build Steps lsncrtl start

THE DZONE GUIDE TO DATABASES PAGE 27 OF 44


DZONE.COM/GUIDES

My script requires that I know the IP address for remotely logging tnsnames.ora"

into my VM with SSH (secure shell). I capture this using the "az echo " (ADDRESS = (PROTIOCOL = TCP)(HOST = $vmh)
(PORT = 1521)) >> $ORACLE_HOME/network/admin/tnsnames.
network" command. You'll also note that I take advantage of the
ora"
very valuable grep/awk/tr(im) BASH commands. Once I'm finished,
echo " ) >> $ORACLE_HOME/network/admin/tnsnames.ora"
I need to use the concatenate (cat) utility to write to a new script
echo " (CONNECT_DATA = >> $ORACLE_HOME/network/admin/
(mkdb.sh) for everything inside the EOM to EOM sections in the next tnsnames.ora"
section of my script. We start with: echo " (SERVICE_NAME = pdb1) >> $ORACLE_HOME/net-
work/admiin/tnsnames.ora"
• sudo (switch user domain owner) to the Oracle user. echo " ) >> $ORACLE_HOME/network/admin/tnsnames.ora"
• mkdir (make the directory) for the datafiles. echo " ) >> $ORACLE_HOME/network/admin/tnsnames.ora"
echo " >> $ORACLE_HOME/network/admin/tnsnames.ora"
• export the oracle home (oh).
echo " CDB1 = >> $ORACLE_HOME/network/admin/tnsnames.ora"
• start the listener.
echo " (DESCRIPTION = >> $ORACLE_HOME/network/tnsnames.
ora:
dbca -silent \
echo " (ADDRESS_LIST = >> $ORACLE_HOME/network/admin/
-createDatabase \
tnsnames.ora"
-templateName General_Purpose.dbc \
echo " (ADDRESS = (PROTOCOL = TCP)(HOST = $vmh)
-gdbname $gdb \
(PORT = 1521)) >> $ORACLE_HOME/network/admin/tnsnames.
-sid cdb1 \
ora"
-responseFile NO_VALUE \
echo " ) >> $ORACLE_HOME/network/admin/tnsnames.ora"
-characterSet AL32UTDF8 \
echo " (CONNECT_DATA = >> $ORACLE_HOME/network/admin/
-sysPassword $syspassword \
tnsnames.ora"
-systemPassword $systempassword \
echo " (SERVICE_NAME = cdb1) >> $ORACLE_HOME/net-
-createAsContainerDatabse true \
work/admin/tnsnames.ora"
-numberOf{DBs 1 \
echo " ) >> $ORACLE_HOME/network/admin/tnsnames.ora"
-pdbName pdb1 \
echo " ) >> $ORACLE_HOME/network/admin/tnsnames.ora"
-pdbAdminPassword $pdbpassword \
-databaseType MULTIPURPOSE \
# third part of dynamic script build, listener reload
and db final check
-automaticMemoryManagement false \
-staorageType FS \
lsnrctl stop
-datafileDestination /home/oracle/data \
lsnrctl start
-ignorePreReqs

echo "Data creation done, Part II Complete"


In the above step in my script, I set the ORACLE_BASE and the
echo "Part III, configure TNS and validate empty database" ORACLE_HOME using the values that are called as part of my base

script. The next part appears to be simple but is quite tricky. I


The second part of the script is to use the Oracle Database Config- am required to write to a file as part of the script I'm writing to. I
uration Assistant (DBCA) to create the database. I then let the user ended up using an echo to a file and transferring this file over to
know that these steps are complete. Keep in mind that right now the VM as part of the final script, but it took me a bit to figure out
my script is just "writing" this information to another file to be how to accomplish this. Is there a better way? I'm not sure, but
executed afterwards. right now it works, and I have figured out how to use the NETCA

export ORACLE_BASE=/u01/app/oracle
(Oracle Network Configuration Assistant) from the command line
export ORACLE_HOME=/u01/app/oracle/product/$oh/db_Home1 to perform this same task on my list of enhancements.
export vmh=$vmip
Once it does write to the file, I need to stop and restart the listener,
#Create the TNSNAMES entry so this is what I add as steps to finish up my mkdb.sh creation script.

echo " PDB1 = >> $ORACLE_HOME/network/admin/tnsnames. # Log into new database to verify creation
ora" sql plus connect as /sysdba <<EOF
echo " (DESCRIPTION = >> $ORACLE_HOME/network/admin/ spool ./ck_db.txt
tnsnames.ora"
echo " (ADDRESS_LIST = >> $ORACLE_HOME/network/admin/ connect system@cdb1

THE DZONE GUIDE TO DATABASES PAGE 28 OF 44


DZONE.COM/GUIDES

select count(*) from dba_objects; When I executed the script, I entered the following information:
zone=EastUS and oraname=ora122db8.
sho con_name;
alter session set container=pdb1;
Note, the $oraname variable is used to populate the name for
sho con_name;
all the resources, making the naming convention dynamic and
connect / as sysdba
select count(*) from dba_objects; simplified. The user now can use traditional tools to manage the
spool off database, add enhanced security at the database level, or begin
exit development. Scaling up is easy via the portal and requires little
EOF training. If the organization desires to migrate existing Oracle data
EOM
into the Azure deployment, pre-existing Oracle migration tools can

To check everything, I log into the database with SQLPlus, do a be used to make easy work of it, as well.

count on database objects, switch over to the pluggable database


(PDB) and check it, logging to a text file and then exiting out of the
SQLPlus session.

After all of this, you'll see an End of File (EOF) and End of Message
(EOM) to close out all the writes to secondary files and scripts as
part of this process.
With a little ingenuity,
The last part is to execute it all from my deployment server to the
VM, which is done with the last AZ command, az vm run-command: the scripting skills you
# Run script created dynamically in the section above this
one. already have with BASH
az vm run-command invoke \
-- name $oraname \
-- command-id RubShellScript \
or PowerShell can be
--scripts "./mkdb.sh"
used to deploy to Azure.
Once completed, this script can be run and the VM, along with the
secondary script creations, don't take very long, but the database
creation that's part of the secondary script does take some time.
You will get a prompt with the percentage completed:

Copying database files


What I've demonstrated here is that, with a little ingenuity, the
1% complete
scripting skills you already have with BASH or PowerShell can be
Once everything is done, the following resources are created in the used to deploy to Azure. Incorporated with the cloud's CLI utility,
cloud, and can be viewed in the Azure Portal: environments can be prepped, dynamically deployed, managed,
and verified — even cross-platform. These deployments can be
performed at an enterprise level without demanding intense
re-education of technical personnel. Embrace this hybrid auto-
mation by incorporating the new with the old, while lessening the
learning curve of the Azure cloud.

KELLYN POT’VIN-GORMAN is a Data Platform Architect at


Microsoft. LinkedIn Twitter

THE DZONE GUIDE TO DATABASES PAGE 29 OF 44


DZONE.COM/GUIDES

Analytics and Transactions

UNITED
with MariaDB Platform

Download MariaDB to test drive the latest features

Mariadb.com/downloads

THE DZONE GUIDE TO DATABASES PAGE 30 OF 44


DZONE.COM/GUIDES

a small number of rows using indexes, and read most columns


in a row. A data warehouse is great for performing a market
basket analysis. These queries need most rows, but a small
Bringing Real-Time number of columns per row.

Analytics to Customer- What if an eCommerce application wanted to improve customer


engagement by alerting visitors to “soon-to-be-sold-out”

Facing Applications products based on active carts, recent purchases and current
inventory. This query is too difficult for a database. It needs to
perform aggregates on many rows. Nor is it practical for a data
warehouse. It will return a small number of rows, and many
columns per row.
There are databases for transactions, and there are data
warehouses for analytics. Not long ago, analytics was used The solution is a hybrid transactional/analytical database with
to derive the actionable insights to improving operational row and columnar storage, streaming change-data-capture
efficiency and mitigating risks. Today, it’s used to improve to synchronize them and intelligent query routing to use the
customer experience and customer engagement – to provide right storage for the right query. To use columnar storage for
customers with actionable insights. analytics (e.g., find the top soon-to-be-sold-out products) and
to use row storage for transactions( e.g., look up those five
However, when it comes to customer-facing applications and
products). This is the architecture of MariaDB Platform.
analytics, databases offer too little and data warehouses offer
too much. A database is great for browsing products, adding
them to a cart and completing a purchase. These queries return WRITTEN BY MARIADB

PARTNER SPOTLIGHT

MariaDB Platform
MariaDB Platform is the enterprise open source database for data-driven, customer-facing applications
and hybrid transactional/analytical workloads.

Category DBMS New Release MariaDB Platform X3, Open Source? Yes
January 2019

Case Study Strengths


QBerg, a leading market research institute, collecting • NoSQL – JSON for flexibility and transparent sharding for scalability

one million price observations every day to provide


• Streaming integration – Apache Kafka and Apache Spark connectors
its customers with pricing information on consumer
goods. The database must support transactional • Temporal – Go back and query data at a previous point in time

workloads (e.g., read and write current data) as well


• Real-time analytics – columnar storage and parallel processing
as analytical workloads (e.g, analyze historical data).
However, QBerg outgrew the analytical capabilities of • HTAP – streaming change-data-capture and workload-based query

its database, unable to grow beyond four months of routing


historical data without exceeding performance SLAs.
With MariaDB Platform, QBerg was able to provide
Notable Customers
its customers with more powerful, flexible and faster Website mariadb.com
• Deutsche • ServiceNow
self-service analytics, using row storage for reading
Bank • Verizon Twitter @mariadb
and writing current data and distributed, columnar
• Nasdaq • Walgreens
storage for real-time analytics on years’ worth of Blog mariadb.com/resources/blog
historical data.

THE DZONE GUIDE TO DATABASES PAGE 31 OF 44


DZONE.COM/GUIDES

QUICK VIEW

01. The keys to database effectiveness


are security, speed, scale, and

Executive Insights on
performance.

02. High-speed ingestion and queries


are the best way to manage and

the State of Databases


benefit from the onslaught on data
from an increasing number of sources.

03. The future of databases is fluid


with IoT, artificial intelligence and
machine learning, real-time computing
at the edge, quantum computing, and
the automation of it all.

BY TOM SMITH
RESEACRCH ANALYST, DEVADA

To learn about the current and future state of databases, we spoke like Spark and Kinesis have perfected the collection of data, there
with and received insights from 19 IT professionals. Here’s what are still several challenges in the analysis and operationalization of
they shared with us: the data.

3. The most significant changes to the database landscape over the


Key Findings
past couple of years have been the increased support and offerings
1. The holy grails of database effectiveness are security, speed,
from cloud vendors and the explosion of choices from vendors.
scale, and performance. These four features were more frequently
There has been a massive transition to fully managed database
mentioned than compliance, accessibility, availability, ease of use,
services in the cloud, which gives developers the ability to work
and real-time data. Each database you deploy has to meet the
with data to support both real-time transactional apps and deep
needs of your use case, be it speed, scale, security, or performance.
analytics by using a single platform that minimizes data movement
Clients are always looking for performance and security is a major
and allows them to extract value faster.
concern when moving from on-premise to the cloud.
New kinds of databases have led to the development of an evolving
2. In our interviews, our respondents did not come to a consensus
set of tools. This expansion of solutions and toolsets has resulted
on the best way to get a handle on the exploding volume of data
in a real shift toward matching the right database tool for the right
from an ever-increasing number of sources. High-speed ingestion
database job, and the number of databases that teams use has
was the only suggestion mentioned more than once. It’s incredibly
increased dramatically as a result.
important to ingest, store, and present data for querying. When peo-
ple struggle, they’ve only figured out one piece of the puzzle. They 4. When respondents were asked if they were seeing increased
may be able to ingest quickly, but they’re not able to analyze and adoption in a database persistence model, graphs were the most
get insights. It’s all about being able to capture the data and then common answer, followed by SQL/relational, document, time-se-
create value from it at the same time. While data ingestion solutions ries, and columnar databases. Given the growth of purpose-fit

THE DZONE GUIDE TO DATABASES PAGE 32 OF 44


DZONE.COM/GUIDES

databases, it appears that more clients are looking to find the right 9. To be more successful with database projects, developers need to
tool to solve the particular business problem they are working to understand the concepts of the databases they are working with as
solve. There’s also been increased adoption of database-as-a-service well as the foundational elements of AI/ML. Application developers
solutions as more companies move to the cloud and they can offload need to have a rudimentary understanding of how data is being
many administrative and operational functions. stored and accessed, as well as how the database is going to help you
achieve the goal of the application. Do some upfront work with the
5. The most frequently mentioned use cases are in financial services
database before you get into the specifics of how the application and
and retail, and revolve around forecasting, risk mitigation, and
database will work together. Be able to determine the best tool for
improving customer experience. Customer journey and hyper-per- the job.
sonalization efforts in particular are being spearheaded by marketing
across a number of industries and have specific data needs. Classical Understand how to interact with databases. Understanding database
use cases like risk management, fraud detection, and inventory man- architectures, database design, access paths/optimizations, and even

agement are being rebuilt with built-in real-time analytics. how data is arranged on disk on in-memory Is useful, particularly if
you are a backend developer.
6. The most common issues with databases revolve around scale and
knowledge. The number one issue is around scale, performance, and It’s also beneficial to understand basic data modeling techniques,

cost efficiency. You need to be able to design the database and IT normalization versus denormalization, SQL, when to use foreign

stack to cope with the explosion of data. Data volumes are only going keys, execut on plans, prepared statements, different types of joins

up. However, you must also be able to access and analyze the data. available, data obfuscation, and encryption.

It’s not very valuable unless you are gaining insight from it.
Here are the contributors of insight, knowledge, and experience:

Data management can be difficult if you don’t have a knowledge- • Ragu Chakravarthi, SVP and Chief Product Officer, Actian
able staff to handle regular database patches and updates, disaster • Joe Moser, Head of Product, Crate.io
recovery tests, and to implement and follow established processes
• Brian Rauscher, Director of Support, Cybera
and procedures.
• Sanjay Challa, Director of Product Management, Dactical

7. Given the vast number of vertical-specific solutions available, the • OJ Ngo, CTO, DH2i

future of databases is fluid with IoT, artificial intelligence and machine • Anders Wallgren, CTO, Electric Cloud
learning, real-time computing at the edge, quantum computing, and • Johnson Noel, Senior Solutions Architect, Hazelcast
the automation of it all. IoT is now producing unseen volumes and • Adam Zegelin, SVP Engineering, Instaclustr
sets of data. Companies will use real-time analytics and enhanced BI • Daniel Raskin, CMO, Kinetica
reporting to build a data-driven culture. Leading database vendors
• James Corcoran, CTO of Enterprise Solutions, Kx
are automating common DBA tasks like performance tuning, monitor-
• Neeraja Rentachintala, V.P. of Product Management, MapR
ing, disaster recovery, high availability, low latency, and auto-scaling
• Mat Keep, Senior Director of Product & Solutions, MongoDB
based on historical workloads.
• Philip Rathle, V.P. of Products and Matt Casters, Chief Solution
8. The biggest concerns regarding databases today are around secu- Architect, Neo4j
rity and complexity. Security of data at rest continues to be a major • Ariff Kassam, V.P Products, NuoDB
issue, as exemplified breaches at Equifax, StackOverflow, and several • Dhruba Borthakur, co-founder and CTO, Rockset
other targets. We still have many companies that pay very little atten- • Erik Gfesser, Principal Architect, SPR
tion to database security. Poor database management practices and • Lucas Vogel, Owner, Vogel Endpoint Systems
procedures are revealed with data breaches and cyber hacks.
• Neil Barton, CTO, WhereScape

The complexity and the sheer amount of data that’s in the open is a
concern. There are so many options to consider, it’s hard to under-
TOM SMITH is a Research Analyst at Devada who excels
stand the subtle differences and determine the optimal solution for at gathering insights from analytics—both quantitative
the business problem you are trying to solve. This is exacerbated by and qualitative—to drive business results. His passion

vendors who misrepresent what their products can do at scale. Every is sharing information of value to help people succeed. In his spare
time, you can find him either eating at Chipotle or working out at the
vendor should ship a utility to help customers verify the capability of
gym. LinkedIn - Twitter
the vendor.

THE DZONE GUIDE TO DATABASES PAGE 33 OF 44


DZONE.COM/GUIDES

Deliver value quicker while keeping your data safe

Redgate helps IT teams balance the need to


deliver software faster with the need to protect
and preserve business critical data.

You and your business benefit from a DevOps


approach to database development, while staying
compliant and minimizing risk.

Find out more at


www.red-gate.com/DevOps
THE DZONE GUIDE TO DATABASES PAGE 34 OF 44
DZONE.COM/GUIDES

Database DevOps and the Elephant in the Room

The arguments about including the database in DevOps are now It’s Not All Bad News
over and discussions have moved on from whether to include it to Rather than being a blocker to database DevOps, the requirement
how to introduce it. to protect data provides a blueprint for introducing a database
DevOps process that helps companies comply with data privacy
The latest Accelerate State of DevOps Report, for example, talks
legislation. There are four natural steps.
about database version control and the automation of database
changes alongside the application, and concludes: When teams Introduce standardized team-based development, using
follow these practices database changes don’t slow them down, or database version control to maintain a single source of truth for
cause problems when they perform code deployments. the code, encourage collaboration, and create an auditable record
of who changed what, when.
Redgate’s 2019 State of Database DevOps Report also revealed
that 83% of organizations which have already adopted DevOps
think it will take less than a year to automate and deploy database
changes – and 45% say it will take less than six months. The requirement to protect data provides
They’re doing so because it’s been recognized that database a blueprint for introducing a database
DevOps stops the problems of database deployments and
increases the speed of delivery of database changes.
DevOps process that helps companies
comply with data privacy legislation
That’s the Good News
But then there’s the elephant in the room I mentioned in the
title of this piece. Data privacy. A wave of data breaches and
concerns over how personal data is used have prompted new data Automate deployments by using a continuous integration
protection regulations to be introduced everywhere. process to trigger an automated build when changes are checked
into version control. This tests the changes and flags up any errors,
Since the enforcement of the GDPR in the EU, countries as far apart so a stable build is always available.
as New Zealand and China have enacted similar legislation, and
the California Consumer Privacy Act is the first of many state laws Monitor availability as well as performance by managing

to be introduced in the US. permissions, ensuring data is available and identifiable, and
having a record of which servers and what data is being managed.
What’s important isn’t the new legislation – it’s the common
strands that are emerging. One of those is that new regulations Protect and preserve data by introducing a data masking tool to

require personal data to be protected throughout the pseudonymize and anonymize data in database copies outside

development process. production.

The problem is that the State of Database DevOps Report I In this way, you can streamline processes, automate repetitive
tasks, minimize errors, and release value to end users faster. All
mentioned earlier also revealed that 65% of respondents use
while still being compliant with new data privacy regulations.
a copy of their production database in development and test
environments. The same database that contains the data that
WRITTEN BY MARY ROBBINS
needs to be protected. PRODUCT MARKETING MANAGER, REDGATE

THE DZONE GUIDE TO DATABASES PAGE 35 OF 44


DZONE.COM/GUIDES

Databases Solutions Directory

Company Product Product Type Free Trial Website

actian.com/data-management/
Actian Actian NoSQL Object-oriented Free solution
versant-nosql-object-database/

actian.com/data-management/
Actian Actian X RDBMS Free solution
actian-x-hybrid-rdbms/

guides.rubyonrails.org/active_
ActiveRecord ActiveRecord ORM Included in Rails
record_basics.html

Aerospike Aerospike Server In-memory, KV Open source aerospike.com

Altibase Altibase HDB In-memory, NewSQL Open source altibase.com

Amazon Web
DynamoDB KV, document, DBaaS Free tier available aws.amazon.com/dynamodb
Services

Amazon Web
SimpleDB Column store Free tier available aws.amazon.com/simpledb
Services

Apache Foundation Apache Cassandra KV, wide column Open source cassandra.apache.org

Apache Foundation Apache HBase Wide column Open source hbase.apache.org

In-memory, Hadoop,
Apache Foundation Apache Ignite Open source ignite.apache.org
data grid

Apache Foundation Apache OpenJPA ORM Open source openjpa.apache.org

THE DZONE GUIDE TO DATABASES PAGE 36 OF 44


DZONE.COM/GUIDES

Company Product Product Type Free Trial Website

Included in iOS & developer.apple.com/


Apple Core Data ORM
macOS documentation/coredata

ArangoDB ArangoDB Graph, document, KV Open source arangodb.com

developer.atlassian.com/server/
Included in Jira &
Atlassian ActiveObjects ORM framework/atlassian-sdk/active-
Confluence
objects

CakeDC CakePHP ORM Open source cakephp.org

Cambridge Data discovery & Demo available by


Anzo cambridgesemantics.com/product
Semantics integration request

Canonical Storm ORM Open source storm.canonical.com

KV, document, data


Couchbase Couchbase Server Open source couchbase.com
caching

CUBRID CUBRID RDBMS Open source cubrid.org

DataStax DataStax Distributed cloud DB N/A datastax.com

Application release Demo available by


Datical Datical datical.com
automation for DBs request

Database Release Continuous delivery Demo available by dbmaestro.com/products/


DBmaestro
Automation for DBs request database-release-automation

Demo available by
Delphix Delphix Dynamic data plaform delphix.com
request

Eclipse Foundation EclipseLink ORM Open source eclipse.org/eclipselink

Available by embarcadero.com/products/
Embarcadero InterBase RDBMS
request interbase

enterprisedb.com/products/edb-
EnterpriseDB EDB Postgres Platform RDBMS Open source
postgres-platform

NewSQL, KV direct Available by


FairCom c-treeACE faircom.com/products/c-treeace
access request

Firebird Firebird RDBMS Open source firebirdsql.org

$300 credit over 12


Google BigTable Column store cloud.google.com/bigtable
months

THE DZONE GUIDE TO DATABASES PAGE 37 OF 44


DZONE.COM/GUIDES

NoSQL With
No Equal
Production-Ready Cassandra,
100% Open Source Compatible

Get the DataStax Distribution


of Apache Cassandra™

THE DZONE GUIDE TO DATABASES PAGE 38 OF 44


DZONE.COM/GUIDES

How DataOps Is Changing Application Development

Another day, another buzzword that surfaces in the data management efficiency, and engagement to data teams (and, by extension, entire
space with promises of changing the way things are done. organizations) that DevOps brings to software development teams.

Although in this case—it could be with good reason. But how, specifically, do companies stand to benefit from adopting a
DataOps mindset?
DataOps, or data operations, refers to the next step in the evolution
of DevOps teams, where data scientists, processes, and tools are Here are several of the benefits DataOps brings to application development:
increasingly integrated into the development process at data-focused
• Real-time insights
enterprises. This makes it easier for companies to leverage real-time data
• Enhanced collaboration across the enterprise
to serve up delightful customer experiences and make better decisions at
• Improved agility and faster response times
a moment’s notice.
• Faster bug fixes
By unlocking the true power of DataOps, companies can build modern, data- • Better decisions and avoid costly missteps
rich applications in less time while operating with more agility and baking • Higher quality products
more automation into their processes to shorten the development lifecycle. • Increased employee engagement
• Stronger user experiences
Companies that successfully implement an agile approach to data
management are four times more likely to enjoy business outcomes that The benefits of DataOps speak for themselves. Much like DevOps teams
exceed shareholder expectations. have enabled software companies to ship better products faster and with
greater frequency, DataOps teams bring the same level of efficiency and
So, while it’s still a relatively new concept, it comes as no surprise that
data-driven decision-making to their organizations.
one study found that 73% of organizations were planning to invest in
DataOps during 2018. As the promise of DataOps is realized, we expect
How to Achieve DataOps
more and more companies to invest in the space in 2019 and beyond.
Change is never easy, especially when employees have been doing things
What is DataOps? one way forever.
The names sound similar and the two practices are related. But DataOps
To launch DataOps at your organization, you first need to get your team to
is not just DevOps with data. Still, just as DevOps transformed the way
buy in to the new way of working. Beyond that, implementing DataOps at
modern applications are developed, DataOps promises to transform the
your organization requires the right suite of tools as well as an underlying
way organizations use data and run data analytics, speeding up data-
active everywhere database that ensures contextual data is always
driven processes considerably.
available in real time—even when teams are globally distributed and
Just as the agile approach to software development comes with a shiny applications are hosted in hybrid cloud environments.
Agile Manifesto, DataOps has its own manifesto, too. The DataOps
While it might take some time to get your team to buy in to DataOps and
Manifesto places emphasis on:
master the associated new workflows, whatever time and resources you
• Individuals and interactions over processes and tools invest to make that happen will undoubtedly return significant dividends
• Working analytics over comprehensive documentation down the road.
• Customer collaboration over contract negotiation
• Experimentation, iteration, and feedback over extensive upfront design Thanks to DataOps, your company will be able to move faster, make
• Cross-functional ownership of operations over siloed responsibilities better decisions, and ship better products while impressing your
customers and inspiring your employees.
The DataOps Manifesto also features a set of 18 core principles, including
“continually satisfy your customer,” “value working analytics,” “embrace A new buzzword—yes. But one that is perhaps buzzworthy.
change,” and “it’s a team sport.”

What are the Benefits of DataOps?


WRITTEN BY MATT KENNEDY
By now, you know that DataOps promises to bring the same speed, SENIOR DIRECTOR, CLOUD SOLUTIONS, DATASTAX

THE DZONE GUIDE TO DATABASES PAGE 39 OF 44


DZONE.COM/GUIDES

Company Product Product Type Free Trial Website

$300 credit over 12


Google Cloud Spaner RDBMS, NewSQL cloud.google.com/spanner
months

GridGain In-Memory In-memory, Hadoop, gridgain.com/products/gridgain-


Gridgain Free tier available
Computing data grid products

Hazelcast IMDG Hazelcast In-memory, data grid Open source hazelcast.com/products

Hibernating Rhinos RavenDB Document Free tier available ravendb.net

Available by
IBM IBM DB2 RDBMS ibm.com/analytics/us/en/db2
request

ibm.com/analytics/us/en/
IBM Informix RDBMS, transactional Free tier available
technology/informix

Stream processing &


InfluxData InfluxEnterprise 14 days influxdata.com/products
analytics

Object-oriented, intersystems.com/products/
InterSystems Cache N/A
relational cache/#technology

Open-source
JOOQ JOOQ ORM for Java jooq.org
version available

NewSQL,
MariaDB ClustrixDB N/A mariadb.com/products/clustrixdb
transactional

RDBMS, document, mariadb.com/products/mariadb-


MariaDB MariaDB Platform Open source
MySQL family platform-transactional

marklogic.com/product/
MarkLogic MarkLogic Transactional Free tier available
marklogic-database-overview

MemSQL MemSQL In-memory, NewSQL Free tier available memsql.com/product

Micro Focus Vertica Enterprise RDBMS, column store Free tier available vertica.com/product/on-premise

Part of .NET
Microsoft Entity Framework ORM docs.microsoft.com/en-us/ef
framework

180-day preview microsoft.com/en-us/sql-server/


Microsoft SQL Server 2019 RDBMS
available default.aspx

MongoDB MongoDB Document Open source mongodb.com

THE DZONE GUIDE TO DATABASES PAGE 40 OF 44


DZONE.COM/GUIDES

Company Product Product Type Free Trial Website

MyBatis MyBatis ORM Open source blog.mybatis.org/p/products.html

Neo4j Neo4j Graph Free tier available neo4j.com

Nhibernate Nhibernate ORM for .NET Open source nhibernate.info

NuoDB Nhibernate NewSQL Free tier available nuodb.com

opentext.com/products-and-
solutions/products/specialty-
Available by
OpenText NuoDB RDBMS technologies/opentext-gupta-
request
development-tools-databases/
opentext-gupta-sqlbase

oracle.com/technetwork/
OpenText Gupta
Oracle ORM Free solution database/application-
SQLBase
development/index-099369.html

Oracle JDBC RDBMS Open source mysql.com

Oracle Oracle Database RDBMS, document Free tier available oracle.com/database/index.html

oracle.com/technetwork/
Oracle Toplink ORM Free solution middleware/toplink/overview/
index.html

Open-source
OrientDB OrientDB Document, graph, KV orientechnologies.com
version available

OrmLite OrmLite ORM Open source ormlite.com

Percona Server for Open-source percona.com/software/mysql-


Percona RDBMS, MySQL family
MySQL versions available database/percona-server

Pivotal Gemfire In-memory, data grid Free solution pivotal.io/pivotal-gemfire

PostgreSQL PostgreSQL RDBMS Open source postgresql.org

RavenDB RavenDB NoSQL, document Free tier available ravendb.net

THE DZONE GUIDE TO DATABASES PAGE 41 OF 44


Company Product Product Type Free Trial Website

CI, source control,


Red Gate Software SQL Toolkit 14 days red-gate.com/products
change mgmt for DBs

Red Hat Hibernate ORM Open source hibernate.org

In-memory, KV, data


Redis Labs Redis Open source redis.io
caching

In-memory, KV, data redislabs.com/redis-enterprise/


Redis Labs Redis Cloud Free tier available
caching, DBaaS essentials

Basho Riak Document, KV Open source riak.com/products

In-memory, column-
SAP SAP HANA Platform Free tier available sap.com/products/hana.html
oriented RDBMS

In-memory, Hadoop,
ScaleOut ScaleOut StateServer 30 days scaleoutsoftware.com/products
data grid

Available by
Software AG Adabas Stream processing adabas.com
request

Database Performance DB performance solarwinds.com/database-


Solarwinds 14 days
Analyzer tuning performance-analyzer

Open-source splicemachine.com/product/
Splice Machine Splice Machine NewSQL, RDBMS
version available features

SQLite SQLite RDBMS Open source sqlite.org

Studio 3T Studio 3T IDE for MongoDB Free tier available studio3t.com

SymmetricDS SymmetricDS DB replication Open source symmetricds.org

teradata.com/Resources/
Open-source
Teradata Aster Database Specialist analytics Brochures/Aster-Database-
version available
Performance-and-Scalability

VoltDB VoltDB In-memory, NewSQL 30 days voltdb.com

THE DZONE GUIDE TO DATABASES PAGE 42 OF 44


Zones
Database dzone.com/database
The Database Zone is DZone’s portal for following the news and
trends of the database ecosystems, which include relational (SQL) and
nonrelational (NoSQL) solutions such as MySQL, PostgreSQL, SQL Server,
NuoDB, Neo4j, MongoDB, CouchDB, Cassandra, and many others.

Diving Deeper Big Data dzone.com/big-data


The Big Data/Analytics Zone is a prime resource and community for Big
Data professionals of all types. We’re on top of all the best tips and news

Into Databases for Hadoop, R, and data visualization technologies. Not only that, but we
also give you advice from data science experts on how to understand and
present that data.

Cloud dzone.com/cloud
The Cloud Zone covers the host of providers and utilities that make cloud
computing possible and push the limits (and savings) with which we can
deploy, store, and host applications in a flexible, elastic manner. The
Cloud Zone focuses on PaaS, infrastructures, security, scalability, and
hosting servers.

Twitter Refcardz
DevOps for Databases
@SQLEspresso @GFritchey Download this new Refcard to get started with Database Release
Automation and eliminate bottlenecks. Learn the key best practices that
your DevOps database solution should meet in order for you to get the
@KimberlyLTripp @BrentO
most out of your investment.

@MMarie @DBArgenis Building Microservices With Redis


This Refcard will teach you about types of data processing in
microservices, using Redis Streams for interservices communication,
@catherinew @cmfinlan optimizing microservices, and more.

Working With Time Series Data


@victoria_holt @erinstellato As more connected devices are implemented and data is expected to be
collected and processed in real-time, the ability to handle time series
data has become more and more important. Learn about the basics of
time series data, how it can be used across industries, and using open
source tools to manage it.

Books Podcasts
Database System Concepts SQL Data Partners
If you’re just getting started with databases, check out this book to learn all Learn about new and familiar topics in the worlds of SQL Server and
about database management. professional development.

Readings in Database Systems Voice of the DBA


Learn about the fundamentals of recent innovations and research in the Get a look at databases and the world from a data professional’s
databases world. perspective.

Fundamentals of Database Systems DBAle


Get clear explanations of database theories, designs, and systems. Get clear explanations of database theories, designs, and systems.

THE DZONE GUIDE TO DATABASES PAGE 43 OF 44


DZONE.COM/GUIDES

VISIT THE

Database Zone
Managing the growing amount of data effectively is an ongoing concern. From
handling event and streaming data to finding the best use cases for NoSQL
databases, there’s plenty to unpack in the database community.

Keep a pulse on the industry with topics such as:

• Best practices in optimization • Database performance and monitoring


•  Handling event and streaming data • Advancements in database technology

Visit the Zone

TUTORIALS CASE STUDIES BEST PRACTICES CODE SNIPPETS

THE DZONE GUIDE TO DATABASES PAGE 44 OF 44

You might also like