Programmer's Guide (9.1 revision 1)

Estimating File Size


You can estimate the number of pages, and therefore the number of bytes, required to store a file. However, the formulas only approximate file size, because the MicroKernel manipulates pages dynamically.


Note
The following discussion and the formulas for determining file size do not apply to files that use data compression, because the record length for those files depends on the number of repeating characters in each record.

While the formulas are based on the maximum storage required, they assume that only one task is updating or inserting records into the file at a time. File size increases if more than one task updates or inserts records into the file during simultaneous concurrent transactions.

The formulas also assume that no records have yet been deleted from the file. You can delete any number of records, and the file remains the same size, because the MicroKernel does not deallocate the pages that the deleted records occupied. Instead, it reuses those pages for newly inserted records before allocating new pages.

If the final outcome of your calculations contains a fractional value, round it up to the next whole number.

To estimate the size of your file, perform the following steps:

  1. Calculate the number of data pages, using the following formula (use Page size - 8 in this formula for an 8.x file format):

    Number of data pages = Number of records /
    ((Page size - 6) / Physical record length)

    To find the physical record length, refer to Table 5-4.

  2. Calculate the number of index pages for each defined key using one of the following formulas.

    For each key that does not allow duplicates or that allows repeating-duplicatable keys:

    Number of index pages =
    (Number of records /
    ((Page size - 12) / (Key length + 8))) * 2

    For each key that allows linked-duplicatable keys:

    Number of index pages =
    (Number of unique key values /
    ((Page size - 12) / (Key length + 12))) * 2

    The B-tree index structure guarantees at least 50 percent usage of the index pages. Therefore, the index page calculations multiply the minimum number of index pages required by 2 to account for the maximum size.

  3. If your file contains variable-length records, calculate the number of variable pages in the file and add that number to the sum from the preceding steps. To do so, use the following formula:

    Number of variable pages =
    (Total number of records in the file) /
    (Average number of records whose variable-length
    portion fits on a single page)

    Note
    You can gain only a very rough estimate of the number of variable pages due to the difficulty in estimating the average number of records whose variable-length portion fits on the same page.

  4. To the sum obtained in the preceding steps, add the following:

    1 page for each alternate collating sequence page used (if any)

    1 page for a referential integrity (RI) page if the file has RI constraints

    This new sum represents the estimated total number of logical pages that the file will contain.

  5. Calculate the number of Page Allocation Table (PAT) pages, and add that number to the estimated number of logical pages from the preceding step. (For more information about PAT pages, refer to Page Preallocation.)

    Every file has a minimum of two PAT pages; however, to calculate the number of PAT pages in a file, use one of the following formulas:

    For pre-8.x file formats:

    Number of PAT pages =
    ((Sum of pages in Steps 1 through 3) * 4) /
    ((Page size - 8 bytes for overhead) * 2)

    For 8.x or later file formats:

    Number of PAT pages =
    ((Sum of pages in Steps 1 through 3) * 6) /
    ((Page size - 20 bytes for overhead) * 2)

  6. To the sum obtained in the preceding step, add 2 pages for the FCR pages. If you are using 8.x or later file format, also add 2 pages for the DIR (directory) pages.

    When using 8.x or later file formats, the page sizes for FCR, DIR, and PAT pages are different from the normal page sizes for data, key, and variable pages, which affects the file size. See the following table for the correct sizes of the special pages with regard to the normal page size you are using:

    | Normal page | FCR page | DIR page | PAT page | # of entries in a PAT page |
    |-------------|----------|----------|----------|----------------------------|
    | 512         | 2048     | 2048     | 2048     | 320                        |
    | 1024        | 2048     | 2048     | 2048     | 320                        |
    | 1536        | 3072     | 3072     | 3072     | 480                        |
    | 2048        | 4096     | 4096     | 4096     | 640                        |
    | 2560        | 5120     | 5120     | 5120     | 800                        |
    | 3072        | 6144     | 6144     | 6144     | 960                        |
    | 3584        | 7168     | 7168     | 7168     | 1120                       |
    | 4096        | 8192     | 8192     | 8192     | 1280                       |

  7. Finally, add the estimated size of the pool of unused pages in the file. The MicroKernel uses the pool for shadow paging. To calculate the size of the pool, use the following formula:

    Size of the pool of unused pages = (Number of keys + 1)

    This formula applies if tasks execute Insert, Update, and Delete operations only outside transactions. If tasks are executing these operations inside transactions, multiply the non-transactional figure determined by the formula by the average number of Insert, Update, and Delete operations expected in the transactions. Similarly, you must further increase the estimated size of the pool of unused pages if tasks are executing simultaneous concurrent transactions.

  8. Having calculated the number of pages that the file needs, use the following formula to calculate the maximum number of bytes required to store the file:

    File size in bytes = Total file pages * Page size
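
The procedure above can be sketched in code. The following Python example is only an illustration of the arithmetic, not part of the MicroKernel or any Pervasive API: the names Key, estimate_total_pages, and estimate_file_bytes are hypothetical. It assumes that each page count is rounded up as it is computed, that the 8-byte data page overhead stated for 8.x files also applies to later formats, and that all Insert, Update, and Delete operations run outside transactions.

```python
import math
from dataclasses import dataclass


@dataclass
class Key:
    """One index key (hypothetical helper type for this sketch)."""
    key_length: int                    # key length in bytes
    linked_duplicatable: bool = False  # True for linked-duplicatable keys
    unique_values: int = 0             # used only when linked_duplicatable


def estimate_total_pages(records, page_size, physical_record_length, keys,
                         avg_var_records_per_page=None, acs_pages=0,
                         has_ri=False, v8_or_later=True):
    """Estimate total file pages, rounding each page count up (assumption)."""
    # Data pages: the guide says to use "Page size - 8" for an 8.x format.
    # Assumption: the same overhead applies to later formats.
    overhead = 8 if v8_or_later else 6
    data = math.ceil(records / ((page_size - overhead) / physical_record_length))

    # Index pages, doubled because B-tree pages may be only 50 percent full.
    index = 0
    for key in keys:
        if key.linked_duplicatable:
            entries, entry_size = key.unique_values, key.key_length + 12
        else:
            entries, entry_size = records, key.key_length + 8
        index += math.ceil(entries / ((page_size - 12) / entry_size) * 2)

    # Variable pages (only for files with variable-length records).
    variable = 0
    if avg_var_records_per_page:
        variable = math.ceil(records / avg_var_records_per_page)

    base = data + index + variable  # data + index + variable pages
    logical = base + acs_pages + (1 if has_ri else 0)

    # PAT pages, with the stated minimum of two per file.
    if v8_or_later:
        pat = base * 6 / ((page_size - 20) * 2)
    else:
        pat = base * 4 / ((page_size - 8) * 2)
    pat = max(2, math.ceil(pat))

    # FCR pages, plus DIR pages for 8.x or later formats.
    special = 2 + (2 if v8_or_later else 0)

    # Pool of unused pages, non-transactional case only.
    pool = len(keys) + 1

    return logical + pat + special + pool


def estimate_file_bytes(total_pages, page_size):
    """File size in bytes = Total file pages * Page size."""
    return total_pages * page_size


# Example: 10,000 fixed-length records, 4096-byte pages, 100-byte physical
# records, and one 4-byte key that does not allow duplicates.
pages = estimate_total_pages(10_000, 4096, 100, [Key(key_length=4)])
print(pages, estimate_file_bytes(pages, 4096))
```

The sketch multiplies every page by the normal page size, as the final formula does. It does not model the larger FCR, DIR, and PAT page sizes used by 8.x or later file formats, nor the extra pool pages needed inside transactions, so treat its result as a rough estimate.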
    
