Kısmi-eşleme erişimi ve buna uygun dosya yapıları

Efficient file structures for partial-match retrieval

Tez No: 21728
Yazar: OĞUZHAN ÖZTAŞ
Danışmanlar: DOÇ. DR. MİTHAT UYSAL
Tez Türü: Yüksek Lisans
Konular: Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol, Computer Engineering and Computer Science and Control
Anahtar Kelimeler: Belirtilmemiş.
Yıl: 1992
Dil: Türkçe
Üniversite: İstanbul Teknik Üniversitesi
Enstitü: Fen Bilimleri Enstitüsü
Ana Bilim Dalı: Belirtilmemiş.
Bilim Dalı: Belirtilmemiş.
Sayfa Sayısı: 94

Özet

ÖZET Kısmi-Esleme erişini çeşitli anaçlar için ihtiyaç duyulan bir erisin yöntemidir, özellikle ofis otomasyonunda çok kullanılır. Bu erisin üzerine çeşitli yöntemler vardır. Bunlardan bazılarını su şekilde sıralayabiliriz ; 1.- indekslenmis Tanımlayıcı Dosyalar 2.- Tersine Çevrilmiş Dosyalar 3.- tnza Dosyaları 4.- Genisletilebilir Hash (Ext. Hashing) 5.- Tanımlayıcı ve Hash Biz burada bu yapılardan her birini genel hatlarıyla anlatıp, örneklerle daha iyi anlaşılmalarını sağladıktan sonra bu teknikleri genel ve basit bir yapı üzerinde birbiri ile mukayese ederek aralarında en iyi sonuç vereni saptamaya çalıştık. Sinülasyon deneyleri kısmındaki tablo değerleri ve grafik üzerindeki gösterinden de anlaşılacağı gibi indekslenmiş tanımlayıcı dosya yapısı bizce diğer yapılar içerisinde en iyi performansı sağlamıştır.

Özet (Çeviri)

S U H M A R Y EFFICIENT FILE STRUCTURES FOR PARTIAL-HATCH RETRIEVAL Partial-match retrieval is a problem involving searching on secondary keys. A number of approaches have been suggested. One way to tackle the problem is to search the entire file in response to each query, using a string or a regular expression based pattern matching algorithm. Although regular expressions provide considerable flexibility in posing queries, particularly those concerned with text, the requirement of having to search through the entire file to answer each query limits this technique to files that are not too large, unless queries can be batched or special hardware is available. A number of alternative schemes have been suggested to reduce the amount of information that needs to be searched to answer a query. One such class of schemes associates with each record a short representative code word. Searches can then be performed over the shorter code word file to determine which records are potential answers to a given query (or, equivalently, to exclude records that cannot possible be answers). Another common approach is to construct an inverted index list for each field. A query can be answered by intersecting the index lists for each specified field. Some of the methods for partial-match retrieval are shown in the following case: 1.- Extendible Hashing 2.- Indexed Descriptor Files 3.- Signature Files - vi i -1. -Review of Extendible Hashing and Its Appication to Partial Hatch Retrieval The file is contained in a number of pages ( or buckets). There are two kinds of pages. Leaf pages contain the records themselves and directory pages contain the directory. The directory is organized as a linear array of m=2'1 entries where d is called the global depth of the directory. m is always a power of two and changes (doubled or halved) in response to the changes in the volume of the file. Each entry in the directory contains a pointer to a leaf page. Each leaf page has a header that contains its local depth d*. For a leaf page with local depth d'» there are 2lA~A' ' directory entries pointing to it. Fig. 1 shows a file organization when d=3. Fig. 1-- An extendible hashing file with global depth=3. - viii -Associated with the file is a hashing function h:KEY - >D where KEY is the domain of the primary key of the file, D={0, 1, 2, 3,..., 2c-l >, and 2*,“”c is the maximum allowable size of the directory. To locate or insert a record with primary key value V, we calculate the pseudokey V as the first d bits of h(V). V is then used as an index into the directory. The indexed directory entry contains a pointer to the leaf page where the record will be found or inserted. When inserting a record into a full leaf page with local depth d', d'< d, we split the bucket into two buckets, distribute the records between the two buckets according to their pseudokey values, and then change the pointers in the appropriate directory entries. However, when inserting a record into a full leaf page with local depth d'=d, we double the size of the directory; global depth d is then increased by one. Each directory entry becomes two complement directory entries with identical pointer values. Now the overflow leaf page can be split similarly as before. Lloyd and Ramamohanarao extended this single-key file organization to handle partial-match retrievals in the following way. Let n be the number of fields in the record structure of the file. With each field of the record structure, associate a hashing function ht:Ft - >B4, 1=1, 2,..., n where F* is the key space of the ith field and Bi is the pseudokey space of a certain length nu. The index of a record Cri, ra,..., r“) into the directory is assembled using the pseudokeys hı(rı), h2(r2),..., h”(r“) and the choice vector (..., i3, iz,ii) If i4=p in the choice vector, then the jth (rightmost) bit in the index should come from the pth pseudokey h_,(r»). The choice vector has important consequences on the performance of the file. It dynamically adapts to the current state of the system according to the distribution of occurrences of each field in the record structure in the file operations performed thus far. This is done by deferring the calculation of ij until the directory size grows to 2j. The procedure, called Minimal Marginal Increase (MMI), used to calculate the choice vector, and the theory behind it are discussed in C73. - ix -The operations of insertion, searching, bucket splitting, directory doubling, etc., are then similar to the single-key case. Example 1: Let the global depth = 5, record R = (vx, v2, v3, v*) and the choice vector = (..., 4, 2, 2, 3, 2). Let h(Vi) = a”... aiao, h(vz) = b»... bzbib©, h(v3) = cm... CiCo, and h(v«) = dm... dido uhere am... aia0, bm... bib©, Cm... CiCo, and dm... dido are binary numbers with a sufficiently large number m of digits. Then the address of record E = d©babıc0bo. 2.- Indexed Descriptor Files Pfaltz et al. applied the technique of disjoint coding to a file structure called the indexed descriptor file. The idea behind the technique is to speed up retrieval of records by encoding information about the records in an efficient manner. Basically, the infor mation in a single record is represented by a descriptor word and the descriptor words of all records stored in one bucket (on disk) are bitwise OR'd together to form a bucket descriptor D». The directory of an indexed descriptor file is just the collection of all bucket descriptors. A descriptor D, of a record r = (ri, r2|...» r*) where râ?D4 for l. There are k functions H4:D«. - >?1, 2,...,m4} for ISiik. In a descriptor D«- of a record r = (ri, ra,..., r», ), the H±(r4)th bit in F* is set to 1 (the remaining w4-l bits are set to 0). There are exactly k bits set to 1 in a record descriptor. A descriptor D^ for a query q = (Ai = a», A2 = az,..., A* = a*) is a bit string of w bits with the H4(at)th bit in F4 set to 1 if a* = * for l

Benzer Tezler

Tez No
660472
Türkiye'de su hakkı
The right to water in Turkey
YILDIZ AKEL ÜNAL
Doktora
Türkçe
2021
Hukuk Galatasaray Üniversitesi
Kamu Hukuku Ana Bilim Dalı
PROF. DR. ERDOĞAN BÜLBÜL
Tez No
832104
Ebeveynlerin dijital ebeveynlik yeterliliklerine yönelik mobil bir öneri sistemi geliştirilmesi ve değerlendirilmesi
Development and evaluation of a mobile recommendation system for parents' digital parenting competences
YILDIZ ÖZAYDIN AYDOĞDU
Doktora
Türkçe
2023
Eğitim ve Öğretim Gazi Üniversitesi
Bilgisayar ve Öğretim Teknolojileri Eğitimi Ana Bilim Dalı
PROF. DR. SİBEL SOMYÜREK
Tez No
827860
Derin öğrenme ve büyük veri analitiği yöntemleriKullanarak Covid-19 yayılımının ileriye dönük tahmini
Forecasting the spread of covid-19 using deep learning and big data analytics methods
CYLAS KIGANDA
Yüksek Lisans
İngilizce
2023
Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol Gazi Üniversitesi
Bilgisayar Bilimleri Ana Bilim Dalı
PROF. DR. MUHAMMET ALİ AKCAYOL
Tez No
885989
Vergilendirmeden kaynaklanan uyuşmazlıklarda yatırım tahkimi
Investment arbitration in tax-related investment disputes
ALEYNA KALENDER
Yüksek Lisans
Türkçe
2024
Hukuk Galatasaray Üniversitesi
Özel Hukuk Ana Bilim Dalı
DR. ÖĞR. ÜYESİ BALCA ÇELENER
Tez No
421257
Exploring the potential of online platforms in product design: A study on integrating craftsmen into systems of design
Ürün tasarımında çevrimiçi platformların potansiyeli: Zanaatkarların tasarım sistemlerine entegrasyonu üzerine bir çalışma
BARIŞ GÜMÜŞTAŞ
Yüksek Lisans
İngilizce
2015
Endüstri Ürünleri Tasarımı İstanbul Teknik Üniversitesi
Endüstri Ürünleri Tasarımı Ana Bilim Dalı
DOÇ. DR. ŞEBNEM TİMUR ÖĞÜT

Geri Dön