Oracle Berkeley DB Data Store: A Use-Case Based Tutorial
Oracle Berkeley DB Data Store: A Use-Case Based Tutorial
Data Store
A Use-Case Based Tutorial
1
Part I: Data Store (DS)
• Overview
• Creating and removing databases
• Getting and putting data
• Case Studies
• Telecom use case
• Customer account management
2
Berkeley DB Data Store
3
Blindingly Fast
• Platform Description
• 2.33 GHz Intel Core 2 Duo
• 2 GB 667 MHz DDR2 SDRAM
• Mac OSX 10.4.10
• Window manager and apps all up and running
• Insert 1,000,000 8-byte keys with 25-byte data
• 180,000 inserts/second
• Retrieve same keys/data
• 750,000 gets/second
• Bulk get of same key/data pairs
• 2,500,000 pairs/second
4
Easy to Use
5
Main Program
• main(argc, argv)
• int argc;
• char *argv[];
• {
• DB *dbp;
• int do_n, rm_db, skip_put;
• do_n = DEFAULT_N;
• rm_db = 0;
• skip_put = 0;
6
Create/Open Database
• static void
• create_database(dbpp)
• DB **dbpp;
• {
• DB *dbp;
•
• /* Create database handle. */
• assert(db_create(&dbp, NULL, 0) == 0);
•
• /* Open/Create btree database. */
• assert(dbp->open(dbp,
• NULL, DBNAME, NULL, DB_BTREE, DB_CREATE, 0644) == 0);
•
• *dbpp = dbp;
•
}
7
Put Data
• static void
• do_put(dbp, n)
• DB *dbp;
• int n;
• {
• char key[KEYLEN];
• DBT datadbt, keydbt;
• int i;
8
Get Data
1. static void
2. do_get(dbp)
3. DB *dbp;
4. {
5. DBC *dbc;
6. DBT keydbt, datadbt;
7. int n, ret;
9
Close and Remove Database
• static void
• close_database(dbp, remove)
• DB *dbp;
• int remove;
• {
• assert(dbp->close(dbp, 0) == 0);
• if (remove) {
• assert(db_create(&dbp, NULL, 0) == 0);
• assert(dbp->remove(dbp,
• DBNAME, NULL, 0) == 0);
• }
• }
10
APIs in many Languages
• C/C++
• Java
• Python
• Perl
• Tcl
• Ruby
• Etc.
11
Berkeley DB Product Family
APIs for C, C++, Java, Perl, Python, PHP, Ruby, Tcl, Eiffel, etc.
12
Part I: Data Store
• Overview
• Creating and removing databases
• Getting and putting data
• Case Studies
• Telecom Use Case
• Creating/using an environment
• Filling/draining a queue
• Btrees: storing different record types in a single database
• Cursors
• Customer account management
13
Telecommunications Use Case
14
Berkeley DB in Telecom
Archive, Inventory, Event and
Performance Records
Unsolicited events
Drain queue
CSV export
15
BDB Access Methods
• DB_QUEUE/DB_BTREE:
• Berkeley DB supports a variety of different indexing
mechanisms.
• Applications select the appropriate mechanism for the data
management task at hand.
• For example:
• DB_QUEUE: FIFO order, fixed size data, numerical keys
• DB_BTREE: Clustered lookup, variable length keys/data
• Other index types (access methods)
• Hash: truly large unordered data
• Recno: data with no natural key; numerical keys assigned
16
Databases & Environments
17
Creating Environments
18
Environment Example
/* Create handle. */
ret = db_env_create(&dbenv, 0);
… error_handling …
ret = dbenv->set_cachesize(dbenv, 1, 0, 1);
… error_handling …
/* Other configuration here. */
/* Create and open the environment. */
flags = DB_CREATE |DB_INIT_MPOOL;
ret = dbenv->open(dbenv, HOME, flags, 0);
… error_handling …
19
Filling and Draining a Queue
20
Creating a Queue Database
21
Creating a Queue
/* Create a database handle. */
ret = db_create(&dbp, dbenv, 0);
… error_handling …
22
Enqueuing Data
23
Draining the Queue
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
if (ret != 0)
… error_handling …
24
Btrees: Different record types
in a single database
• One btree per device
• Each btree contains four record types
• Archive
• Inventory
• Event
• Performance
• Not necessary to have N databases for N
record types.
25
Record Types
• Different records (with different fields/sizes):
struct archive { struct event {
int32_t field1; int64_t field1;
… other fields here … other fields here
}; };
• Assumptions:
• Every record has a timestamp
• May need to encode types of different records
26
Data Design 1
27
Allowing Duplicates
int create_database (DB_ENV *dbenv, DB *dbpp, char *name)
{
DB *dbp;
int ret;
*dbpp = dbp;
return (ret);
}
28
Inserting a Record (design 1)
int insert_archive_record(DB *dbp, timestamp_t ts, char *buf)
{
DBT key, data;
struct archive a;
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
key->data = &ts;
key->size = sizeof(ts);
a->type = ARCHIVE_TYPE;
BUFFER_TO_ARCHIVE(buf, a);/
data->data = &a;
data->size = sizeof(a);
29
Data Design 2
30
Inserting a Record (design 2)
int insert_archive_record(DB *dbp, timestamp_t ts, char *buf)
{
DBT key, data;
RECTYPE type;
struct archive a;
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
type = ARCHIVE_TYPE;
key->data = &type;
key->size = sizeof(type);
a->timestamp = ts;
BUFFER_TO_ARCHIVE(buf, a);
data->data = &a;
data->size = sizeof(a);
31
Data Design 3
32
Inserting a Record (design 3)
int insert_archive_record(DB *dbp, timestamp_t ts, char *buf)
{
DBT key, data;
struct archive a;
struct akey {
timestamp_t ts;
RECTYE type;
} archive_key;
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
archive_key.ts = ts;
archive_key.type = ARCHIVE_TYPE;
key->data = &archive_key;
key->size = sizeof(archive_key);
BUFFER_TO_ARCHIVE(buf, a);
data->data = &a;
data->size = sizeof(a);
return (dbp->put(dbp, &key, &data, 0));
}
33
Data Design 4
34
Multiple Database in a File
int create_database (DB_ENV *dbenv, DB *db_array, char **names)
{
DB *dbp;
int i, ret;
db_array[i] = dbp;
}
if (ret != 0)
/* Close all non-Null entries in db_array. */
return (ret);
35
Inserting a Record (design 4)
int insert_record(DB *db_array, timestamp_t ts, RECTYPE type, char *buf, size_t buflen)
{
DBT key, data;
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
key->data = &ts;
key->size = sizeof(ts);
data->data = buf;
data->size = buflen;
if (type == ARCHIVE_TYPE)
ret = put_rec(db_array[ARCHIVE_NDX], &key, &data);
else if (type == EVENT_TYPE)
ret = put_rec(db_array[EVENT_NDX], &key, &data);
else if (type == INVENTORY_TYPE)
ret = put_rec(db_array[INVENTORY_NDX], &key, &data);
else if (type == PERF_TYPE)
ret = put_rec(db_array[PERF_NDX], &key, &data);
else /* Signal invalid record type */
return (ret);
}
36
Inserting a Record (cont)
int put_rec(DB *dbp, DBT *key, DBT *data)
{
return (dbp->put(dbp, NULL, &key, &data, 0));
}
37
Data Design:Summary
38
Exporting to CSV
39
Iteration with cursors
40
Iteration (code)
int archive_to_csv(DB *dbp)
{
DBC dbc;
DBT key, data;
struct archive *a;
int ret;
41
Cleaning out the Database
42
Removing a Database
43
Truncating a Database
44
Part I: Data Store
• Overview
• Creating and removing databases
• Getting and putting data
• Case Studies
• Telecom Use Case
• Customer account management
• Secondary indexes
• Cursor-based manipulation
• Configuration and tuning
45
Customer Account
Management
Orders
Create order
Warehouse
Fulfill orders
Add Customer
View last week’s calls
46
The Schema
Add new Get by ID
Update Customer Data Get by name
Get by ID
Catalog Data Get by name
Get by time
Append
Call log Get by customer
47
Secondary Indexes
48
Selecting keys and indexes
Database Primary Key Secondary Duplicates?
Key
Customer Customer ID Name No
49
Secondaries internally
Customer Primary Database Customer Secondary
Key Data Key Data
01928374 IBM; Hawthorne, NY … Dell 28910283
… … … …
50
Secondaries Programmatically
51
Customer Secondary (1)
52
Customer Secondary (2)
rec = data.data;
result.data = rec.name;
result.size = sizeof(rec.name);
return (0);
}
53
Customer Secondary (3)
54
Representing Orders
• What is an order?
• An order number
• A customer
• A date/time
• A collection of items
• How do we represent orders?
• Key by order number with items in a single data item
• Key by order number with duplicates for items
• Key by order number/sequence number
55
Orders: Multiple Items in Data
typedef int order_id; /* Primary key */
struct order_info {
int customer_id;
time_t date;
float total;
time_t shipdate;
int nitems; /* Number of items to follow. */
char[1] order_items; /* Must martial all items into here */
};
Struct order_item {
int item_id;
int nitems; /* per-item price stored elsewhere */
float total;
};
56
Orders: Multiple Items in Data
-- Secondaries
• Secondary on customer_id
• Just like what we did on customer
• Create a callback that extracts the customer_id
field from the data field.
• Associate secondary with primary.
• Secondary on items
• Need to return multiple secondary keys for a single
primary record.
57
Secondaries: Return multiple
keys per key/data pair
int item_callback(DB *dbp, DBT *key, DBT *data, DBT *result)
{
struct order_info rec;
DBT **dbta;
int i;
order = data.data;
result.flags = DB_DBT_MULTIPLE | DB_DBT_APPMALLOC;
result.size = order->nitems;
result.data = calloc(sizeof(DBT), order->nitems);
if (result.data == NULL)
return (ENOMEM);
dbta = result.data;
for (i = 0; i < order->nitems; i++) {
dbta[i].size = sizeof(order_item);
dbta[I].data = order->order_items[i];
}
return (0);
}
58
Secondaries: Example of
multiple keys per key/data pair
Order Primary Database Order/Item Secondary
Key Data Key Data
9876543 19283746;3/17/08;138.50;3/18/09;3 0003 9876543
0003;2;130.00;0019;1;8.50 5439087
8769887 19283877;3/16/08;1024.00;3/16/0810 0019 9876543
0092;10;1024.00
0092 8769887
5430987 90867654;3/16/08;564.98;;5 5430987
0003;3;195.00;0092;1;102.40;
9902 5430987
9902;1;267.58
… …
… …
59
Orders: Duplicates for Items
struct order_item {
int item_id;
int nitems; /* per-item price stored elsewhere */
float total;
};
60
Orders: Items as duplicates
Example
Order Primary Database
Key Data
9876543 19283746;3/17/08;138.50;3/18/09
0003;2;130.00
0019;1;8.50
8769887 19283877;3/16/08;1024.00;3/16/0810
0092;10;1024.00
5430987 90867654;3/16/08;564.98;;5
0003;3;195.00
0092;1;102.40
9902;1;267.58
61
Orders: Duplicates for Items
Inserting an order
int insert_order(int order_id, struct orderer_info *oi;
struct order_items **oip)
{
DBT keydbt, datadbt;
int I;
keydbt.data = &order_id;
keydbt.size = sizeof(order_id);
datadbt.data = oi;
datadbt.size = sizeof(order_info);
dbp->put(dbp, NULL, &keydbt, &datadbt, DB_NODUPDATA);
for (i = 0; i < oi->nitems; i++) {
datadbt.data = *oip++;
datadbt.size = sizeof(order_item);
dbp->put(dbp, NULL, &keydbt, &datadbt, 0);
}
}
62
Orders: Duplicates for Items
Trade-offs
• Must configure database for duplicates.
• - No automatic secondaries (must hand-
code).
• + Easy to add/remove items from order.
63
Orders: Key by order_id/seqno
struct pkey { /* Primary key */
int order_id;
int seqno;
};
struct order_info {
int customer_id;
time_t date;
float total;
time_t shipdate;
int nitems; /* Number of items to follow. */
};
struct order_item {
int item_id;
int nitems; /* per-item price stored elsewhere */
float total;
};
64
Orders: key by order/seqno
Example
Order Primary Database Order/Item Secondary
Key Data Key Data
9876543/00 19283746;3/17/08;138.50;
0003 9876543/01
3/18/09
9876543/01 0003;2;130.00 0003 5439087/01
65
Orders: Key by order/seqno
Inserting an order
int insert_order(int order_id, struct orderer_info *oi;
struct order_items **oip)
{
DBT keydbt, datadbt;
struct pkey;
int i;
pkey.order_id = order_id;
pkey.seqno = 0;
keydbt.data = &pkey;
keydbt.size = sizeof(pkey);
datadbt.data = oi;
datadbt.size = sizeof(order_info);
dbp->put(dbp, NULL, &keydbt, &datadbt, DB_NODUPDATA);
for (i = 0; i < oi->nitems; i++) {
pkey.seqno = i + 1;
datadbt.data = *oip++;
datadbt.size = sizeof(order_item);
dbp->put(dbp, NULL, &keydbt, &datadbt, 0);
}
}
66
Order Trade-offs
• ID Key w/multiple items in data
• One get/put per order
• No duplicates means automatic secondary support
• Add/delete/update item is somewhat messier
• ID key w/duplicate data items
• Easy to add/delete/update order items
• Must implement secondaries manually
• Where do you place per-order info (CID, data) (first dup?)
• Key is ID/sequence number
• Easy to add/delete/update order items
• Can support secondaries automatically
• Where do you place per-order info (data item 0?)
67
End of Part I
68