PreviousProgrammer's Guide (9.1 revision 1) Next

Creating a Data File

Show this topic in Library frames

Btrieve gives developers tremendous flexibility in optimizing database applications. In providing such flexibility, Btrieve exposes a great deal of the inner workings of the MicroKernel. If you are new to Btrieve, the Create (14) operation may appear quite complex to you, but you do not need all of the features this operation provides to get started. This section highlights the basic requirements by stepping you through the creation of a simple, transaction-based data file. For simplification where necessary, this section uses C interface terminology.

Data Layout

This section uses an example data file that stores employee records. The application will retrieve employee information by providing either a unique employee ID or the employee's last name. Because more than one employee can have the same last name, the database will allow duplicate values for the last name. Based on these criteria, the data layout for the file is as follows:

Information in Record
Data Type
Key/Index Characteristics
Last name
25 character string
Duplicatable
First name
25 character string
None
Middle initial
1 character string
None
Employee ID
4 byte integer
Unique
Phone number
13 character string
None
Pay rate per month
4 byte float
None

Now that the basic data layout is established, you can begin applying Btrieve terminology and requirements. This includes determining information about the key structure and file structure before you actually create the file. You must work out these details in advance, because the Create (14) operation creates the file, index, and key information all at once. The following sections discuss the issues to consider in working out these details.

Key Attributes

First, determine any special functionality for the keys. The MicroKernel supports a variety of key attributes you can assign, as shown in the following table.

Table 5-1 Key Attributes 
Constant1
Description
EXTTYPE_KEY
Extended Data Type. Stores a Btrieve data type other than string or unsigned binary. Use this attribute, rather than the standard binary data type. This key attribute can accommodate the standard binary and string data types, plus many others.
BIN
Standard BINARY Data Type. Supported for historical reasons. Stores an unsigned binary number. Default data type is string.
DUP
Linked Duplicates. Allows duplicate values, which are linked by pointers from the index page to the data page. For more information, refer to Duplicatable Keys.
REPEAT_DUPS_KEY
Repeating Duplicates. Allows duplicate values, which are stored on both the index page and the data page. For more information, refer to Duplicatable Keys .
MOD
Modifiable. Allows the key value to be modified after the record is inserted.
SEG
Segmented. Specifies that this key has a segment that follows the current key segment.
NUL
Null Key (All Segments). Excludes any records from the index if all segments of the key contain a specified null value. (You assign the null value when you create the file.)
MANUAL_KEY
Null Key (Any Segment). Excludes any records from the index if any segment in the key contains a specified null value. (You assign the null value when you create the file.)
DESC_KEY
Descending Sort Order. Orders the key values in descending order (highest to lowest). Default is ascending order (lowest to highest).
NOCASE_KEY
Case Insensitive. Sorts string values without distinguishing upper and lower case letters. Do not use if the key has an alternate collating sequence (ACS). In the case of a Null Indicator segment, this attribute is overloaded to indicate that non-zero null values should be treated distinctly.
ALT
Alternate Collating Sequence. Uses an ACS to sort string keys differently from the standard ASCII collating sequence. Different keys can use different ACSs. You can specify the default ACS (the first one defined in the file), a numbered ACS defined in the file, or a named ACS defined in the COLLATE.CFG system file.
NUMBERED_ACS
NAMED_ACS
1For simplicity, these constants, defined in btrconst.h, are consistent with the C interface. Some interfaces may use other names or no constants at all. For bit masks, hexadecimal, and decimal equivalents for the key attributes, refer to the API Programmer's Reference.

You assign these key attributes for each key you define. Each key has its own key specification. If the key has multiple segments, you have to provide the specification for each segment. Some of these attributes can have different values for different segments within the same key. Using the previous example, the keys are the last name and the employee ID. Both keys use extended types; the last name is a string and the employee ID is an integer. Both are modifiable, but only the last name is duplicatable. In addition, the last name is case insensitive.

Regarding the data type you assign to a key, Btrieve does not validate that the records you input adhere to the data types defined for the keys. For example, you could define a TIMESTAMP key in a file, but store a character string there or define a date key and store a value for February 30. Your Btrieve application would work fine, but an ODBC application that tries to access the same data might fail, because the byte format could be different and the algorithms used to generate the timestamp value could be different. For complete descriptions of the data types, refer to the SQL Engine Reference.

File Attributes

Next, determine any special functionality for the file. The MicroKernel supports a variety of file attributes you can assign, as follows:

Table 5-2 File Attributes 
Constant1
Description
VAR_RECS
Variable Length Records. Use in files that contain variable length records.
BLANK_TRUNC
Blank Truncation. Conserves disk space by dropping any trailing blanks in the variable-length portion of the record. Applicable only to files that allow variable-length records and that do not use data compression. For more information, refer to Blank Truncation .
PRE_ALLOC
Page Preallocation. Reserves contiguous disk space for use by the file as it is populated. Can speed up file operations if a file occupies a contiguous area on the disk. The increase in speed is most noticeable on very large files. For more information, refer to Page Preallocation .
DATA_COMP
Data Compression. Compresses records before inserting or updating them and uncompresses records when retrieving them. For more information, refer to Data Compression .
KEY_ONLY
Key-Only File. Includes only one key, and the entire record is stored with the key, so no data pages are required. Key-only files are useful when your records contain a single key and that key takes up most of each record. For more information, refer to Key-Only Files .
BALANCED_KEYS
Index Balancing. Rotates values from full index pages onto index pages that have space available. Index balancing enhances performance during read operations, but may require extra time during write operations. For more information, refer to Index Balancing .
FREE_10
FREE_20
FREE_30
Free Space Threshold. Sets the threshold percentage for reusing disk space made available by deletions of variable length records, thus eliminating the need to reorganize files and reducing the fragmentation of variable-length records across several pages.

A larger Free Space Threshold reduces fragmentation of the variable-length portion of records which increases performance. However, it requires more disk space. If higher performance is desired, increase the Free Space Threshold to 30 percent.
DUP_PTRS
Reserve Duplicate Pointers. Preallocates pointer space for linked duplicatable keys added in the future. If no duplicate pointers are available for creating a linked-duplicatable key, the MicroKernel creates a repeating-duplicatable key.
INCLUDE_SYSTEM_DATA
System Data. Includes system data upon file creation, which allows the MicroKernel to perform transaction logging on the file. This is useful in files that do not contain a unique key.
NO_INCLUDE_SYSTEM_DATA
SPECIFY_KEY_NUMS
Key Number. Allows you to assign a specific number to a key, rather than letting the MicroKernel assign numbers automatically. Some applications may require a specific key number.
VATS_SUPPORT
Variable-tail Allocation Tables (VATs). Uses VATs (arrays of pointers to the variable-length portion of the record) to accelerate random access and to limit the size of the compression buffer used during data compression. For more information, refer to Variable-tail Allocation Tables .
1For simplicity, these constants, defined in btrconst.h, are consistent with the C interface. Some interfaces may use other names or no constants at all. For bit masks, hexadecimal, and decimal equivalents for the file attributes, refer to the API Programmer's Reference.

The example data file does not use any of these file attributes, because the records are fixed-length records of small size.

For definitions of file attributes, refer to File Types . For more information about specifying file attributes during the Create operation, refer to the Pervasive.SQL API Programmer's Reference.

Creating File and Key Specification Structures

When you use the Create operation, you pass in a data buffer that contains file and key specification structures. The following structure uses the example employee data file.

Table 5-3 Sample Data Buffer for File and Key Specifications 
Description
Data Type1
Byte #
Example Value2
File Specification
     
Logical Fixed Record Length. (Size of all fields combined: 25 + 25 + 1 + 4 + 13 + 4). For instructions, refer to Calculating the Logical Record Length .3
Short Int
0, 1
72
Page Size. The 4,096 size works best for most files, but if you want to fine-tune this, refer to Choosing a Page Size for more information.
Short Int
2, 3
4,096
Number of Keys. (Number of keys in the file: 2)
Short Int
4, 5
2
Reserved. (Not used during a Create operation.)
Reserved
6- 9
0
File Flags. Specifies the file attributes. The example file does not use any.
Short Int
10, 11
0
Number of Extra Pointers. Sets the number of duplicate pointers to reserve for future key additions. Used if the file attributes specify Reserve Duplicate Pointers.
Byte
12
0
Reserved. (Not used during a Create operation.)
Reserved
13
0
Preallocated Pages. Sets the number of pages to preallocate. Used if the file attributes specify Page Preallocation.
Short Int
14, 15
0
Key Specification for Key 0 (Last Name)
     
Key Position. Provides the position of the first byte of the key within the record. The first byte int the record is 1.
Short Int
16, 17
1
Key Length. Specifies the length of the key, in bytes.
Short Int
18, 19
25
Key Flags. Specifies the key attributes.
Short Int
20, 21
EXTTYPE_KEY + NOCASE_KEY + DUP + MOD
Not Used for a Create.
Byte
22-25
0
Extended Key Type. Used if the key flags specify Use Extended Key Type. Specifies one of the extended data types.
Byte
26
ZSTRING
Null Value (legacy nulls only). Used if the key flags specify Null Key (All Segments) or Null Key (Any Segment). Specifies an exclusion value for the key. See Null Value for more conceptual information on legacy nulls and true nulls.
Byte
27
0
Not Used for a Create.
Byte
28, 29
0
Manually Assigned Key Number. Used if the file attributes specify Key Number. Assigns a key number.
Byte
30
0
ACS Number. Used if the key flags specify Use Default ACS, Use Numbered ACS in File, or Use Named ACS. Specifies the ACS number to use.
Byte
31
0
Key Specification for Key 1 (Employee ID)
     
Key Position. (Employee ID starts at first byte after Middle Initial.)
Short Int
32, 33
52
Key Length.
Short Int
34, 35
4
Key Flags.
Short Int
36, 37
EXTTYPE_KEY + MOD
Not Used for a Create.
Byte
38-41
0
Extended Key Type.
Byte
42
INTEGER
Null Value.
Byte
43
0
Not Used for a Create.
Byte
44, 45
0
Manually Assigned Key Number.
Byte
46
0
ACS Number.
Byte
47
0
1Unless specified otherwise, all data types are unsigned.
2For simplification, the non-numeric example values are for C applications.
3For files with variable-length records, the logical record length refers only to the fixed-length portion of the record.

Calling the Create Operation

The Create operation (14) requires the following values:

In C, the API call would be as follows:

Example 5-1 Create Operation
/* define the data buffer structures */ 
typedef struct 
{ 
 BTI_SINT recLength; 
 BTI_SINT pageSize; 
 BTI_SINT indexCount; 
 BTI_CHAR reserved[4]; 
 BTI_SINT flags; 
 BTI_BYTE dupPointers; 
 BTI_BYTE notUsed; 
 BTI_SINT allocations; 
} FILE_SPECS; 
typedef struct 
{ 
 BTI_SINT position; 
 BTI_SINT length; 
 BTI_SINT flags; 
 BTI_CHAR reserved[4]; 
 BTI_CHAR type; 
 BTI_CHAR null; 
 BTI_CHAR notUsed[2]; 
 BTI_BYTE manualKeyNumber; 
 BTI_BYTE acsNumber; 
} KEY_SPECS; 
typedef struct 
{ 
 FILE_SPECS fileSpecs; 
 KEY_SPECS keySpecs[2]; 
} FILE_CREATE_BUF; 
/* populate the data buffer */ 
FILE_CREATE_BUF databug; 
memset (databuf, 0, size of (databuf)); /* 
initialize databuf */ 
dataBuf.recLength = 72; 
dataBuf.pageSize = 4096; 
dataBuf.indexCount = 2; 
dataBuf.keySpecs[0].position = 1; 
dataBuf.keySpecs[0].length = 25; 
dataBuf.keySpecs[0].flags = EXTTYPE_KEY + 
NOCASE_KEY + DUP + MOD; 
dataBuf.keySpecs[0].type = ZSTRING; 
dataBuf.keySpecs[1].position = 52; 
dataBuf.keySpecs[1].length = 4; 
dataBuf.keySpecs[1].flags = EXTTYPE_KEY; 
dataBuf.keySpecs[1].type = INTEGER; 
/* create the file */ 
strcpy((BTI_CHAR *)keyBuf, 
"c:\\sample\\sample2.mkd"); 
dataLen = sizeof(dataBuf); 
status = BTRV(B_CREATE, posBlock, &dataBuf, 
&dataLen, keyBuf, 0); 

Create Index Operation

If you create files with pre-defined keys, the indexes are populated with each insert, update or delete. This is necessary for most database files. However, there is a class of database files that are fully populated before being read. These include temporary sort files and fully populated files that are delivered as part of a product.

For these files, it may be faster to build the keys with Create Index (31) after the records are written. The file should be created with no index defined so that inserting records can be accomplished faster. Then the Create Index operation will sort the keys and build the indexes in a more efficient manner.

Indexes created this way are also more efficient and provide faster access. There is no need for the MicroKernel to leave empty space at the end of each page during a Create Index (31) because the index pages are loaded in key order. Each page is filled to nearly 100%. In contrast, when an Insert (2) or Update (3) operation fills an index page, the page is split in half, with half of the records being copied to a new page. This process causes the average key page to be about 50 to 65 percent full. If index balancing is used, it may be 65 to 75 percent full.

Another advantage of the Create Index (31) operation is that all the new key pages created are located close together at the end of the file. For large files, this means less distance for the read head to cover between reads.

This technique may not be faster when files are small, such as a few thousand records. The file would also need larger key fields to benefit. Moreover, if all key pages can fit into the Btrieve cache, this method shows no improvement in speed. But if only a small percentage of the key pages are in cache at any one time, this method saves a lot of extra key page writes. A file containing a million records can be built in hours instead of days with this technique. The greater the number of key pages in the file, the faster it is to build indexes with Create Index (31) than it is one record insert at a time.

In summary, the critical thing to avoid in loading a Btrieve file from scratch is not enough cache memory to hold all the key pages. If this is the case, use Create (14) to create the file without indexes and use Create Index (31) when all the data records have been inserted.

See Pervasive.SQL API Programmer's Reference for detailed information on these operations.


Chapter contents
Publication contents

Prev topic: Understanding MicroKernel Data Files
Next topic: Calculating the Logical Record Length