Monday, 16 January 2017

Concept of Indexing


·         Indexing mechanisms used to speed up access to desired data. Index files are typically much smaller than the original file
·         Two basic kinds of indices:
1) Ordered indices:  search keys are stored in sorted order
2) Hash indices:  search keys are distributed uniformly across “buckets” using a      “hash function”.
·         An index is a small table having only two columns.
Index structre
Search Key
Pointer to a data file record
·         An index structure is usually defined on a single Attribute of a Relation called the search Key.
·         An Index takes as input a search Key value and returns the address of the record(S). the second column contains a set of pointers holding the address of the disk block where that particular key value can be found.
·         The Search Key values stored in the index are sorted and a binary search can be done on the index.
·         Onle a small part of the records of a relation have to be inspected. A appropriate indexs can speed up query processing passing from minutes to seconds.
·         The only minor disadvantage of using index is that it takes up a little more space than the main table. Additionally, index needs to be updated periodically for insertion or deletion of records in the main table. However, the advantages are so huge that these disadvantages can be considered negligible.
 Types of Index

                          
Primary Index
In primary index, there is a one-to-one relationship between the entries in the index table and the records in the main table. A Primary index forces a sequential file Organization on the Data file. Usually search key
 Primary index can be of two types:

Dense  index: the number of entries in the index table is the same as the number of entries in the main table. In other words, each and every record in the main table has an entry in the index.
This makes searching faster but requires more space to store index records itself.
·         To locate a record with search­key value K
Given a search key K, the Index is scanned and when K is found the associated Pointer to the data file record is followed and the record is read in main memory.


Sparse or Non-Dense Index:
For large tables the Dense Primary Index itself begins to grow in size. To keep the size of the index smaller, instead of pointing to each and every record in the main table, the index points to the records in the main table in a gap. See the following example.

·         To locate a record with search­key value K
·         Find index record with largest search­key value < K
·         Search file sequentially starting at the record to which the index record points


As you can see, the data blocks have been divided in to several blocks, each containing a fixed number of records (in our case 10). The pointer in the index table points to the first record of each data block, which is known as the Anchor Record for its important function. If you are searching for roll 14, the index is first searched to find out the highest entry which is smaller than or equal to 14. We have 11. The pointer leads us to roll 11 where a short sequential search is made to find out roll 14.


Clustering Index
         Clustering Indexes are used when the ordering index is not a field where each value is unique.
         An entry in the clustering index is composed of a SINGLE entry for each distinct value in the clustering field and its respective file block pointer.
sometimes  we are asked to create an index on a non-unique key, such as Dept-id. There could be several employees in each department. Here we use a clustering index, where all employees belonging to the same Dept-id are considered to be within a single cluster, and the index pointers point to the cluster as a whole.

Let us explain this diagram. The disk blocks contain a fixed number of records (in this case 4 each). The index contains entries for 5 separate departments. The pointers of these entries point to the anchor record of the block where the first of the Dept-id in the cluster can be found. The blocks themselves may point to the anchor record of the next block in case a cluster overflows a block size. This can be done using a special pointer at the end of each block (comparable to the next pointer of the linked list organization).
The previous scheme might become a little confusing because one disk block might be shared by records belonging to different cluster. A better scheme could be to use separate disk blocks for separate clusters. This has been explained in the next page.

In this scheme, as you can see, we have used separate disk block for the clusters. The pointers, like before, have pointed to the anchor record of the block where the first of the cluster entries would be found. The block pointers only come into action when a cluster overflows the block size, as for Dept-id 2. This scheme takes more space in the memory and the disk, but the organization in much better and cleaner looking.

Secondary Indices

  1. If the search key of a secondary index is not a candidate key, it is not enough to point to just the first record with each search-key value because the remaining records with the same search-key value could be anywhere in the file. Therefore, a secondary index must contain pointers to all the records.

·         Index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
·         Secondary indices have to be dense

No comments:

Post a Comment