V.42 bis sıkıştırma yönteminin gerçekleşme ve başarım incelenmesi
Implementation of V.42 bis compression procedure and performance results
- Thesis No: 39358
- Advisor: PROF. DR. A. EMRE HARMANCI
- Thesis Type: Master's
- Subjects: Electrical and Electronics Engineering
- Keywords: Not specified.
- Year: 1992
- Language: Turkish
- University: İstanbul Teknik Üniversitesi
- Institute: Fen Bilimleri Enstitüsü
- Department: Not specified.
- Discipline: Not specified.
- Page Count: 49
Abstract
ABSTRACT In recent years, many algorithms have been developed and applied with the aim of increasing the amount of data held in storage systems and transmitted per unit time over communication lines. In particular, the growing number of computer systems and the accompanying spread of computer networks demonstrate how important data compression is, in terms of both speed and economy, for data transfer. An algorithm to be used in systems where data transmission takes place in real time, such as communication systems, must be independent of the statistical properties and type of the data, must be fast, and must not impose an additional program burden on the user; in short, it must be transparent. For this purpose, CCITT has proposed the V.42bis data compression algorithm, based on the Lempel-Ziv algorithm, for use in the modems that allow computer systems to communicate over telephone lines. In this thesis, data compression algorithms are surveyed, the V.42bis algorithm is implemented in software, and its performance is measured on files of various types. The algorithm is also compared, in terms of compression performance, with the widely used LZW (Lempel-Ziv-Welch) algorithm and with adaptive Huffman coding.
Abstract (Translation)
SUMMARY

The recent development of computer systems has led to large-scale information transfer and the use of massive information storage systems. The increase in the number of computer users and database applications requires additional storage systems and communication facilities. Concurrent with this growth, problems arise for economic reasons. Instead of adding extra storage devices and expanding existing communication facilities, we can reduce the amount of stored or transmitted data by means of data compression. When data compression is used in storage systems, overall program execution time may be reduced, because the reduction in storage leads to fewer disk-access attempts. Although encoding and decoding data require additional program instructions to be executed, the execution time of a group of program instructions is normally less than the time required to access and transfer data from storage devices, so overall program execution time may still be reduced. In the transmission of data, compression provides several benefits besides the cost savings associated with sending less data over the telephone network. Compression can reduce the probability of transmission errors, since fewer characters are transmitted; moreover, by converting text represented in a code such as ASCII into a different code, compression algorithms can provide a degree of data encryption for security. For data communication, the transfer of compressed data over a network increases the effective rate of information transfer, though the actual data transfer rate per second remains the same. Data compression can be implemented in hardware, software or firmware. Nowadays, data compression for communication is implemented in modems. Figure 1 shows an expanded view of the modern high-speed modem and where it fits into a communication system.
The DTE (e.g., a computer terminal) exchanges data with the modem over a standard interchange circuit (the EIA RS-232D). The signal converter provides the modulation and demodulation of the data signals. Between the interchange circuits and the signal converter, the control function illustrated in Figure 1 provides the overall control and coordination between the data compression and error control functions. This encompasses a number of specific functions, including establishment of an initial handshake and negotiation of special parameters that require agreement by both ends of the link. The error control function provides reliable data transfer over the telephone network, which is subject to many error-producing phenomena. The data compression function performs the compression itself; since this function is very sensitive to errors, highly reliable error control is necessary. These are all standardized by CCITT (International Telegraph and Telephone Consultative Committee): error correction is covered by Recommendation V.42 and data compression by Recommendation V.42bis.

[Figure 1. A modem link using data compression and error control]

Data stored on disks and tapes or transferred over communication links in computer systems generally contains significant redundancy. Data compression is a technique that reduces these redundancies to lessen the amount of stored or transmitted data. Compression algorithms are classified as static compression algorithms and adaptive compression algorithms. Static algorithms never adapt dynamically to data variation and require prior knowledge of the data to be compressed, whereas adaptive ones require no prior knowledge of the data and adapt to data variation during operation. If compression is to be automatic and transparent (as in communication), adaptive algorithms are much more efficient than static ones.
STATIC COMPRESSION ALGORITHMS

Null Suppression. Null or blank suppression is one of the earliest static data compression techniques developed. As the name implies, null suppression scans a data stream for repeated blanks or nulls and replaces them with a special ordered pair of characters: the first indicates that compression has occurred, the second gives the number of repeated null or blank characters.

Bit Mapping. This technique operates effectively on data containing a large proportion of a specific character, such as blank or null. A bit map character, prepended to the encoded string, indicates the presence or absence of the special character at each position and thereby reduces the size of the data stream.

Run Length. Run length encoding is a data compression method that physically reduces any type of repeating character sequence. As in null suppression, run length encoding requires a special character indicating that compression has occurred. This character is followed by one instance of the repeated character and a count character giving the number of times the repeated character occurs in the sequence.

Half-Byte Packing. This method gives good compression in situations where the data consists of characters sharing the same half byte. For instance, the ASCII and EBCDIC codes have this property for numeric characters, which share the same first half byte. In the compressed format, the first character indicates the number of characters compressed, followed by the numerics packed two to a character.

Diatomic Encoding. Diatomic encoding performs compression by assigning a special character to each frequently used character pair in the data stream. It requires a table of frequently used character pairs, prepared in advance from prior knowledge of the data.

Pattern Substitution. Here, a special character code is substituted for a predefined character pattern.
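The run length scheme described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the flag byte value, the minimum run length and the flag + character + count layout are all assumptions, and the sketch assumes the flag byte does not occur in the raw data.

```python
FLAG = 0x1D  # illustrative flag byte marking a compressed run

def rle_encode(data: bytes, min_run: int = 4) -> bytes:
    """Replace runs of >= min_run identical bytes with FLAG + byte + count."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        # Extend the run; the one-byte count field caps a run at 255.
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        run = j - i
        if run >= min_run:
            out += bytes([FLAG, data[i], run])   # compressed triple
        else:
            out += data[i:j]                     # too short: copy literally
        i = j
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == FLAG:
            out += bytes([data[i + 1]]) * data[i + 2]  # expand the triple
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

A run of 50 identical bytes thus collapses to the three-byte triple, while short runs are passed through unchanged.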
This method is highly advantageous when source programs and other files containing known repeating patterns are transmitted. It requires a pattern table holding the frequently used patterns and their corresponding codes.

Relative Encoding. This compression technique is effective when the original data stream contains sequences of runs that vary only slightly from each other, or when the run sequences can be broken into patterns relative to each other. Only the differences in the data stream are sent, instead of the whole data.

Huffman Coding. Huffman coding is a technique that can reduce the average length of the codes representing the characters of an alphabet. Its encoding procedure uses the frequency of occurrence of the input symbols. The method converts fixed-size input data into variable-length codes by means of a coding tree. A translation table is needed to store the symbols and their codewords; for n-bit symbols this table has 2^n entries. Such a table requires little memory but limits the degree of compression. If the data differ from the prior knowledge, this variation leads to poor compression performance. In other words, the translation table is formed at the beginning of compression according to a statistical analysis of the data, and since compression is done through this table, performance decreases if the symbol distribution differs from the expected one.

ADAPTIVE COMPRESSION ALGORITHMS

Adaptive Huffman Coding. Compression is performed as in Huffman coding, but the translation table is formed adaptively from a limited portion of the input data. The problem of variation in the input symbols' distribution still exists as in Huffman coding, since preparing the table from an initial portion in no way guarantees the frequency distribution of the remaining input data. To remove these difficulties, truly adaptive Huffman coding is used. This algorithm maintains a count field alongside the original compression table.
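The static Huffman construction described above can be sketched as follows: repeatedly merge the two least frequent subtrees, so that frequent symbols end up near the root with short codewords. The symbol set and frequencies below are illustrative, not taken from the thesis.

```python
import heapq

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix code: frequent symbols get shorter codewords."""
    # Each heap entry: (frequency, unique tie-breaker, {symbol: partial codeword})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        # Prefix '0' to the left subtree's codewords, '1' to the right's
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
# "a", the most frequent symbol, receives a single-bit codeword
```

In a real coder the same tree would also be stored or transmitted so the decoder can invert the mapping; this is the translation table the text refers to.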
This field is continuously updated and used to resequence the entries in the table. The field is updated after a character in the data stream is matched with an entry; a comparison of count fields is then performed and, based on its result, the character and its count value are repositioned in the table. This technique ensures that whenever the composition of the data changes, the compression table changes in tandem, providing the most efficient statistical compression possible.

Lempel-Ziv Compression Algorithm. The Lempel-Ziv algorithm does not need to know anything about the statistics of the data being compressed, and encoding is done in a single pass over the data. The algorithm achieves compression via string matching: it matches the current portion of the data string against earlier portions. If an appropriate match is found, the position and length of that match are used as the codeword. If no match is found, the uncoded data symbols are used as the codeword. A one-bit flag indicates whether or not a match was found.

Lempel-Ziv-Welch Compression Algorithm (LZW). Like the original Lempel-Ziv algorithm, LZW achieves compression by matching the current portion of the data string against earlier portions. However, unlike the original algorithm, LZW parses the previous data string into substrings, assigns a codeword to each substring, and allows only matches that are found in this set of substrings. A dictionary is used to store the substrings and their corresponding codes.

V.42bis Compression Algorithm. The V.42bis compression algorithm makes use of the LZW algorithm for string matching and dictionary building. As a difference, an independent tree is constructed for each set of strings beginning with the same octet. In this way 256 different trees are defined in the dictionary, each tree having an octet as its root node. V.42bis has two operation modes.
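The LZW parsing just described can be sketched as a minimal byte-oriented encoder. The 256-entry initial dictionary mirrors the single-octet roots mentioned above; for clarity the output is a list of integer codes rather than a packed bit stream, and the dictionary is allowed to grow without bound (dictionary management is discussed later in the text).

```python
def lzw_encode(data: bytes) -> list[int]:
    """LZW: emit the code of the longest dictionary match, then extend the dictionary."""
    # Initial dictionary: one entry per possible octet
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    out = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                 # keep extending the match
        else:
            out.append(dictionary[current])     # emit the longest match found
            dictionary[candidate] = next_code   # new substring gets the next codeword
            next_code += 1
            current = bytes([byte])             # restart matching at this octet
    if current:
        out.append(dictionary[current])
    return out

codes = lzw_encode(b"abababab")
# The repeated "ab" pattern is captured by codes >= 256: [97, 98, 256, 258, 98]
```

Note how the third and fourth output codes refer to substrings ("ab", "aba") that were added to the dictionary earlier in the same pass; this is the self-building dictionary that V.42bis inherits.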
In the "transparent mode" the incoming and outgoing octets are the same, although the trees are still formed from these octets. In the "compressed mode" the codewords assigned to the nodes of the trees are used as the outgoing bit string. Each node corresponds to an incoming octet string, and the length of the codeword depends on the dictionary size (minimum 9 bits). Initially, each tree in the dictionary consists only of the root node corresponding to the first octet of an incoming string. V.42bis is initialized in transparent mode, and the string matching and dictionary updating procedure is applied to the incoming strings as described for the LZW algorithm. Codeword assignment is based on an up counter, named C1, whose value ranges from 259 to s-1, where s is the dictionary size. This counter contains the codeword to be assigned to the next new string. Its preset value is 259, since the first three codewords are reserved for control codes and the next 256 for the ASCII characters. As long as the dictionary is not full, codeword assignment to a new incoming string (i.e., adding a node) is performed by incrementing the current value of C1. When the dictionary becomes full, infrequently used strings are deleted so that their codewords can be reused, as described under dictionary management schemes below. During operation a compressibility test is performed periodically. If this test shows that compressed mode would increase performance, the operation mode may be changed; when in compressed mode, a change back to transparent mode is likewise possible based on the results of the compressibility test. In V.42bis the maximum string length may be set between 6 and 250. A short string length may be preferred, since it decreases the run time of the algorithm, but it may also degrade compression performance.

Dictionary Management Schemes. To improve compression performance, different dictionary management schemes have been developed.
Essentially, there are three different schemes in the literature. The first is the scheme described in the LZW algorithm: once the dictionary fills, no further additions are made and the compression performance is monitored; if it drops below a threshold, the dictionary is reset to its initial condition. The second is the LRU method, in which a linked list orders the nodes by recency of use; the least recently used node is removed and its codeword is assigned to the new string. This scheme incurs computational overhead and requires additional memory. The third is the SWAP method, in which two dictionaries are maintained: when the primary dictionary fills, insertions are made into the second one while encoding continues based on the contents of the primary; the roles of the dictionaries are swapped when the second one fills, and the old dictionary is reset. V.42bis uses a different scheme. When the C1 counter reaches its maximum value (i.e., when the dictionary is full) the recovery procedure searches for a new value of C1, starting with the preset value 259. If codeword 259 is assigned to a leaf node, C1 remains unchanged; otherwise the nodes corresponding to the next codewords are examined successively and the codeword of the first leaf node found is assigned to C1. That leaf node is detached from its parent and deleted. The procedure is repeated until the character stream ends.

V.42bis Compression Performance on Different Data Types. As expected, the performance for text-type data is much higher than for machine code (50%-70% compared to 30%-35%). Increasing the string length and dictionary size also affects performance on text-type data much more positively than on other types of data. A dictionary size of 1K-2K entries may be considered optimal, especially if the data type is not known in advance.
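The recovery search described above can be sketched with a toy node table. This is only an illustration of the leaf-reuse idea: the node structure is an assumption, and the real V.42bis procedure operates on the dictionary trees with additional bookkeeping not shown here.

```python
FIRST_CODE = 259  # codewords 0-2 are control codes, 3-258 the single octets

def recover_codeword(nodes: dict[int, dict]) -> int:
    """Find the first leaf at or after FIRST_CODE, detach it, return its codeword.

    `nodes` maps codeword -> {"parent": codeword or None, "children": set of codewords}.
    """
    code = FIRST_CODE
    while nodes[code]["children"]:      # an interior node: try the next codeword
        code += 1
    leaf = nodes.pop(code)              # delete the leaf node...
    parent = leaf["parent"]
    if parent is not None:
        nodes[parent]["children"].discard(code)  # ...and detach it from its parent
    return code                         # this codeword is reused for the new string
```

For example, with node 259 having child 260 and 260 being a leaf, the search skips 259, removes 260 and returns it; 259 then becomes a leaf and is itself a candidate on a later pass, matching the "first leaf node found" rule in the text.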
Performance Comparison of the Algorithms. The compression performances of LZW, adaptive Huffman and V.42bis were compared on different data types. The UNIX utility pack(1) was used to derive the performance of adaptive Huffman coding. As this utility performs compression with a constant dictionary size, the results obtained for adaptive Huffman coding are the lowest. For dictionary sizes smaller than 4K, V.42bis gives better performance than LZW or Huffman coding. It is also interesting to observe that the performance reached by V.42bis with 1K-2K dictionary sizes is matched by LZW only with a much larger dictionary.
Similar Theses
- Veri sıkıştırma: Sözlük kullanan sıkıştırma algoritmaları için yazılım ve donanım önerileri; V.42bis sıkıştırma algoritmasını gerçek zamanda işleyebilen donanım tasarımı
No title translation
RİFAT ÇÖLKESEN
PhD
Turkish
1995
Elektrik ve Elektronik Mühendisliği, Çukurova Üniversitesi, Elektrik-Elektronik Mühendisliği Ana Bilim Dalı
DOÇ. DR. BÜLENT ÖRENCİK
- Karadeniz'deki (Sinop) ağ kafeslerde yetiştirilen gökkuşağı alabalığının (Oncorhynchus mykiss) gelişme ve yem değerlendirmesine farklı yemleme düzeylerinin etkileri
No title translation
SERAP USTAOĞLU
Master's
Turkish
1996
Su Ürünleri, Ondokuz Mayıs Üniversitesi, Su Ürünleri Yetiştiriciliği Ana Bilim Dalı
DOÇ. DR. RECEP BİRCAN
- [N,N´-bis(5-metoksi-salisiliden)-etilendiamin] ligandı kullanılarak zeytin yağından bakır, nikel ve demir ekstraksiyonu ve tayini
Extraction and determination of copper, nickel and iron from olive oil using the [N,N´-bis(5-methoxy-salicylidene)-ethylenediamine] ligand
HASAN YALÇIN ERGİN
Master's
Turkish
2010
Kimya, Balıkesir Üniversitesi, Kimya Ana Bilim Dalı
YRD. DOÇ. DR. SEMA BAĞDAT YAŞAR
- X-ışınları kırınımı ile C22H10N4O2, C36H42O3, C14H11N2O2Cl, C16H16N2O3 ve C22H16N4O organik moleküllerinin kristal yapılarının incelenmesi
Investigation of crystal structure of C22H10N4O2, C36H42O3, C14H11N2O2Cl, C16H16N2O3 and C22H16N4O organic molecules by X-ray diffraction
ÖZLEM DEVECİ
Master's
Turkish
2006
Fizik ve Fizik Mühendisliği, Ondokuz Mayıs Üniversitesi, Fizik Ana Bilim Dalı
YRD. DOÇ. DR. ŞAMİL IŞIK
- N-(Bis(2,4-dimetoksibenzil)karbamotiyoil)-(R)-benzamid (R: o-, m- ve p-nitro) bileşiklerinin sentezi, karakterizasyonu ve yapısal özelliklerinin Teorik ve deneysel olarak incelenmesi
Synthesis, characterization, and theoretical and experimental investigation of the structural properties of N-(bis(2,4-dimethoxybenzyl)carbamothioyl)-(R)-benzamide (R: o-, m- and p-nitro) compounds
BİRDAL ARSLAN
Master's
Turkish
2022
Kimya, Mersin Üniversitesi, Nanoteknoloji ve İleri Malzemeler Ana Bilim Dalı
DR. ÖĞR. ÜYESİ GÜN BİNZET