·
Indexing mechanisms used to speed
up access to desired data. Index
files are typically much smaller than the original file
·
Two basic kinds of indices:
1) Ordered
indices: search keys are stored in
sorted order
2) Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.
·
An
index is a small table having only two columns.
Index
structre
Search Key
|
Pointer to a data file record
|
·
An index structure is usually defined on a single
Attribute of a Relation called the search Key.
·
An Index takes as input a search Key value and returns
the address of the record(S). the
second column contains a set of pointers holding the address of the disk block
where that particular key value can be found.
·
The Search Key values stored in the index are sorted
and a binary search can be done on the index.
·
Onle a small part of the records of a relation have
to be inspected. A appropriate indexs can speed up query processing passing
from minutes to seconds.
·
The
only minor disadvantage of using index is that it takes up a little more space
than the main table. Additionally, index needs to be updated periodically for
insertion or deletion of records in the main table. However, the advantages are
so huge that these disadvantages can be considered negligible.
Types of Index
Primary
Index
In primary index, there is a one-to-one relationship between the entries in the index table and the records in the main table. A Primary index forces a sequential file Organization on the Data file. Usually search key
In primary index, there is a one-to-one relationship between the entries in the index table and the records in the main table. A Primary index forces a sequential file Organization on the Data file. Usually search key
Primary index can be of two types:
Dense index: the number of entries in the index table is the same as the number of entries in the main table. In other words, each and every record in the main table has an entry in the index. This makes searching faster but requires more space to store index records itself.
Dense index: the number of entries in the index table is the same as the number of entries in the main table. In other words, each and every record in the main table has an entry in the index. This makes searching faster but requires more space to store index records itself.
·
To locate a record with searchkey value K
Given a search key K, the Index is
scanned and when K is found the associated Pointer to the data file record is
followed and the record is read in main memory.
Sparse or Non-Dense Index:
For large tables the Dense Primary Index itself begins to grow in size. To keep the size of the index smaller, instead of pointing to each and every record in the main table, the index points to the records in the main table in a gap. See the following example.
For large tables the Dense Primary Index itself begins to grow in size. To keep the size of the index smaller, instead of pointing to each and every record in the main table, the index points to the records in the main table in a gap. See the following example.
·
To locate a record with searchkey value K
·
Find index record with largest searchkey value < K
·
Search file sequentially starting at the record to which the index record points
As you can see, the data blocks have
been divided in to several blocks, each containing a fixed number of records
(in our case 10). The pointer in the index table points to the first record of
each data block, which is known as the Anchor Record for its important
function. If you are searching for roll 14, the index is first searched to find
out the highest entry which is smaller than or equal to 14. We have 11. The
pointer leads us to roll 11 where a short sequential search is made to find out
roll 14.
Clustering
Index
•
Clustering Indexes are used when the ordering index
is not
a field where each value is unique.
•
An
entry in the clustering index is composed of a SINGLE entry for each distinct value
in the clustering field and its respective file block pointer.
sometimes we are asked to create an index on a
non-unique key, such as Dept-id. There could be several employees in each
department. Here we use a clustering index, where all employees belonging to
the same Dept-id are considered to be within a single cluster, and the index
pointers point to the cluster as a whole.
Let us explain this diagram. The
disk blocks contain a fixed number of records (in this case 4 each). The index
contains entries for 5 separate departments. The pointers of these entries
point to the anchor record of the block where the first of the Dept-id in the
cluster can be found. The blocks themselves may point to the anchor record of
the next block in case a cluster overflows a block size. This can be done using
a special pointer at the end of each block (comparable to the next pointer of
the linked list organization).
The previous scheme might become a
little confusing because one disk block might be shared by records belonging to
different cluster. A better scheme could be to use separate disk blocks for
separate clusters. This has been explained in the next page.
In this scheme, as you can see, we
have used separate disk block for the clusters. The pointers, like before, have
pointed to the anchor record of the block where the first of the cluster
entries would be found. The block pointers only come into action when a cluster
overflows the block size, as for Dept-id 2. This scheme takes more space in the
memory and the disk, but the organization in much better and cleaner looking.
Secondary Indices
- If
the search key of a secondary index is not a candidate key, it is not
enough to point to just the first record with each search-key value
because the remaining records with the same search-key value could be
anywhere in the file. Therefore, a secondary index must contain pointers
to all the records.
·
Index
record points to a bucket that contains pointers to all the actual records with
that particular search-key value.
·
Secondary
indices have to be dense