/****************************************************************************** * README * * COPYRIGHT (c) 1995, 1996, 1997 by David Van Wagner ALL RIGHTS RESERVED * * http://alumni.cse.ucsc.edu/~davevw/ * davevw@alumni.cse.ucsc.edu *****************************************************************************/ 1. What is BtreeMM? BtreeMM is a C++ class that implements a B-Tree data structure from a memory mapped file. It can be used to implement a database. 1.1 What is a B-Tree data structure? A data structure describes the relationship between data, not the contents of data. Data can be simple elements such as numbers or characters, or be any complex combination of simple elements such as strings, collections of similar data, and graphics. A B-Tree data structure is a balanced tree representation of data capable of containing multiple values at a single node. 1.1.1 What is a tree? The computer tree data structure is a parent, child non-cyclic graph where each child has exactly one parent (except the root which has no parent) and each parent can have one or more children. Each element of the tree is called a node whether it is a root, child, and/or parent. 1.1.1.1 What is a balanced tree? A balanced tree is where the leaf nodes (children which are not parents) are all on the same level (same distance from the root). 1.1.1.2 What are the advantages of a balanced tree? A balanced tree guarantees O(log(n)) performance, where n is the number of nodes (root, children, parents) in the tree, meaning that the time to find a node (note: tree is self-sorted or otherwise indexed) is logrithmically related to the number of nodes in the tree. In English, the more nodes you add does NOT increase search time proportionally, which would slow, but search time increases at a much, much slower rate. A full balanced tree of height 31 where all non-leaf nodes have exactly two children (this is a full binary tree) contains over 4 billion nodes, but at most only 31 comparisons are required to find a specific node. An efficient balanced tree is implemented with search, insert, and delete operations all at O(log(n)) performance. 1.1.2 Can you show me an example of a B-Tree? Here is a B-Tree of order 3 (at most 3 children) containing the numbers 1,2,4,5,6,7. Note that 1 and 2 are in the same node, and 4 and 6 are in the same root node. 4 , 6 1,2 5 7 Adding 3 to the tree causes the 1,2 node to overflow, so it is split and the median is added to its parent recursively. 4 2 6 1 3 5 7 1.1.3 What are the advantages of a B-Tree over a balanced binary tree? Inserts are simple as the root of the tree can split and a new root parent is created. Searches are almost as simple as a standard binary tree. Deletes are efficient (even though the algorithm is not simple). Disk based B-Trees can require fewer disk accesses when there are more values in the same node. 1.1.4 What are the algorithms for search, insert, and delete of a B-Tree? See "Book Title" by author. 1.2 What is a memory mapped file? It is a file that the operating system maps into memory. In other words, the storage on disk is accessable via pointers without doing implicit reads/writes to and from disk. The data can be accessed just like any other memory via pointers. The operating system swaps portions of the storage between real memory and paging space as necessary and writes changes back to disk as necessary. Another way of looking at is it can be permanent virtual memory. Memory mapped files can also be shared with other processes. 1.2.1 What are some disadvantages of a memory mapped file? The address of the memory is not guaranteed to be at a constant location from one execution of the program to another, nor even when the size of the memory mapped file is changed. Thus, data structures containing pointers into a memory mapped file become difficult to manage since the pointers may need to change each time the file is remapped. One way around it is to fixup the pointers each time the file is remapped but is expensive because this is an O(n) operation. Another way is to NOT use pointers, but use offsets from the beginning of the file instead. 2. How can I use BtreeMM? You may use BtreeMM in your personal or commercial product (giving credit to me, especially with your copyrights) free of charge to utilize the efficiency of a B-Tree algorithm on disk based files. See the public interfaces (APIs) in btreemm.h, heapmm.h, and theapmm.h. Using with Linux requires a version 1.3.xx kernel that supports memory mapped files. I have been using 1.3.53 a lot on my ThinkPad 701C. 3. What other software have you developed? Commercial software for my employers: Various Microsoft Windows projects for a software consulting company UNIX based Hospital Surgical Information System DOS based Biligual Word Processor Publication: Scrolling BASIC Editor, Compute!'s Gazzette, 1988 (Commodore 64). Freeware (not released 'cause lazy) Problem Tracking Tool v.3 (UNIX shell scripts + Informix database + sccs for revision history) ADG Language (interpreted UNIX text file database-like processing) Windows Password (simple dialog box requiring password to use system) Reverse Colors for Windows (doubles number of color sets available) Pyramid for Windows (*FULL FEATURED solitare card game) Pyramid for DOS Uncrash TSR for DOS (CTRL-ESC exits current running program) Vote Perot '92 DOS TSR (splashes message every hour) DOS TSR development system (screen read/write, timing, keyboard traps) *FULL FEATURED = includes resource allocation leaks AND nifty features C++ classes: C version of memory (not disk) B-Tree for long ints. C++ template class of memory (not disk) B-Tree. Memory mapped heap classes (included in BtreeMM). 4. Others companies make B-Tree classes. Why are you doing it? Mostly just to do it. It has been an excercise in learning C++ (still learning!...) and my first real attempt to provide freeware via the Internet, especially to try to make my contributions to the Linux community. Thanks Linus Torvalds! Free software is so free! I am interested in both the freedom and the cost. The learning required is a bonus! I got started with it when I wrote a Problem Tracking Tool in my spare time for use at work in the interest of moving from paper to on-line information storage and retreival. I rewrote it into a new version with niftier search capabilities but as a result slowed it down tremendously making it cumbersome to use. I hoped to replace a lot of the shell scripts, and the use of costly tools such as Informix SQL, and proprietary sccs (well Linux users don't have it). Implementing an efficient key/value database and using rcs (or CVS) was the strategy. Time, interest, and determination will affect the outcome. I want to make a fast implementation of B-Tree. I haven't gone the route of container classes, virtual functions, etc. because they are too slow and I'd rather go straight to the problem. I forsee some database classes built on BtreeMM and the memory mapped heap class to test and exercise its use, as well as move closer to my original goal of an efficient, well featured Problem Tracking Tool. 5. What is the status of BtreeMM? As of Fri Mar 1 06:43:27 PST 1996: TO DO PRIORITY (percent done): [ ] single user class alpha release [ 10] document [100] create working heap class [100] create templatized heap class with pointer and reference objects [ ] improve, add to heap API (any object on heap w/ tHeapMMptr & ref) [ ] port btree template class to use heap [ ] request info, find similar free packages on internet [ ] create license agreement or put under GNU or BSD-style license [ ] find alpha/beta testers on internet [ 50] upload to home web page [ ] port test suites from single user memory source [ ] real application single-user alpha release [ ] apply bug fixes, enhancements as needed [ ] real application multi-user alpha release (single user release) [ ] implement multi-user with locking [ ] implement locks via class constructor/destructor with static count [ ] to successfully support locking on top of locking [ ] invalidate memory mapped object once lock is lost or enforce lock [ ] until object destroyed? [ prefer automatic locking ] [ ] real application multi-user beta release [ ] upload to anonymous FTP site (lookup.com?) [ ] create automated test suites for heap and btree [ ] announcement to comp.os.linux.announce [ ] announcement to database programming newsgroup [ ] real application multi-user release [ ] announcement to comp.os.linux.announce [ ] announcement to database programming newsgroup [ ] add raw disk support (partition w/o file system) [ ] real application multi-user maintenance releases [ ] announcement to comp.os.linux.announce [ ] announcement to database programming newsgroup