Hybrid Wavelet-Fractal Codec

1.0 INTRODUCTION
Current compression standards are based on the discrete cosine transform (DCT), scalar quantization, block-based motion prediction/compensation, and variable-length Huffman coding. The MPEG-4 standard is based on next-generation techniques such as subband coding, model-based coding, and knowledge-based coding. While wavelet transforms for streaming-video data compression have been tried by several organizations, their performance has not yet demonstrated the full extent of the expected major technological improvement.

Objective
The objective of this project is to conduct research into several promising hybrid wavelet-fractal transforms (DWFT) for an advanced compression/decompression (codec) algorithm for streaming video over the Internet.
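As a point of reference for the wavelet work that follows, the sketch below (in Python) shows the DCT-plus-scalar-quantization core of the conventional pipeline described above; the 8x8 block size, the flat quantizer step, and the function names are illustrative assumptions rather than the details of any particular standard. After quantization most coefficients become zero and are cheap to entropy-code.

import numpy as np

def dct2(block: np.ndarray) -> np.ndarray:
    """Orthonormal 2-D DCT-II of a square block (direct, unoptimized)."""
    n = block.shape[0]
    c = np.array([np.sqrt(1.0 / n)] + [np.sqrt(2.0 / n)] * (n - 1))
    coeffs = np.zeros_like(block, dtype=float)
    for u in range(n):
        for v in range(n):
            basis = (np.cos((2 * np.arange(n) + 1)[:, None] * u * np.pi / (2 * n))
                     * np.cos((2 * np.arange(n) + 1)[None, :] * v * np.pi / (2 * n)))
            coeffs[u, v] = c[u] * c[v] * np.sum(block * basis)
    return coeffs

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # level-shifted pixels
q_step = 16                                                    # illustrative flat scalar quantizer
quantized = np.round(dct2(block) / q_step).astype(int)         # most entries become 0
print(quantized)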

2.0 COMPARISON OF CODEC STANDARDS
Streaming video (and audio) across networks is an effort that is attracting many participants. A key characteristic of both the commercial products and the research demonstrators is the diversity in technological infrastructure, such as the networks, protocols, and compression standards supported. All the commercial video products are optimized for low-bandwidth modem or ISDN connections and are not designed to scale to higher-bandwidth networks; the video needs to be pre-encoded with the target audience in mind. The commercial products have either adopted or developed their own proprietary standards, embraced the currently accepted standards (e.g. MPEG), or implemented a combination of the two. Compatibility between the commercial products has been limited because of these proprietary standards.

The list below outlines the main characteristics of codec standards:

Web Codecs
- Sorenson Video - high-quality WWW video
- RealVideo G2 with SVT - main video codec for RealVideo
- H.261 - low-quality videoconferencing
- H.263 (also known as I263) - medium-quality videoconferencing
- MPEG-4 - high-quality WWW video

CD-ROM/DVD-ROM/Kiosk/Presentation Codecs
- Cinepak - medium-quality CD-ROM video
- Sorenson Video - high-quality CD-ROM video
- Eidos Escape - high-quality CD-ROM video; requires high datarates
- Indeo 3 - medium-quality CD-ROM video
- Indeo Video Interactive (4, 5) - high-quality CD-ROM video
- Apple Video - very fast, but low-quality
- MPEG-1 - high-quality CD-ROM video; requires special hardware or a fast computer
- MPEG-2 - high-quality DVD-ROM video; requires special hardware
- Apple Animation - from HD, allows lossless fullscreen playback on very high-end systems

Hardware/Capture Codecs
- Media 100 - codec allows files to be used without capture hardware
- VideoVision Studio - codec allows files to be used without capture hardware
- Avid Media Composer - codec allows files to be used without capture hardware
- Truevision - codec allows files to be used without capture hardware
- DV - format where digitizing is done by the camera
- Apple Component Video - for capture on systems without JPEG hardware

Editing/Storage/Special-purpose Codecs
- Motion-JPEG (MJPEG) - general-purpose video editing & storage
- Apple Animation - lossless storage
- Photo-JPEG - used at 100% as a storage/transfer format

Audio Codecs
- G.723 - standards-based speech for videoconferencing
- IMA - 4:1 compression (usually for CD-ROM)
- MPEG Layer III audio - high-quality WWW music
- QDesign Music Codec - high-quality WWW music at low datarates
- Qualcomm PureVoice - speech at 14.4 modem datarates
- RealAudio - several inter-related codecs for WWW audio
- Windows Media Audio - high-quality WWW music

Research on Internet codecs has broadly taken two directions: discrete cosine transform (DCT) based and non-DCT based. DCT-based video delivery, except for MPEG-2, possesses no inherent scalability.

To achieve adaptivity, various operations can be applied to the compressed data stream to reduce its bit rate. Amongst these operations is transcoding, the conversion of one compression standard to another. The beauty of the DCT-based approach is that it is compatible with current and imminent draft compression standards. Furthermore, it allows re-use of existing audio and video archives without explicitly re-coding them to cater for all possible formats. Non-DCT-based compression techniques, e.g. layered, subband, and wavelet coding, are intrinsically scalable; this is their great attraction. Unfortunately, although several codecs exist, they are still experimental in nature and often suffer from performance problems. In addition, existing movie libraries would need to be re-coded. The research reviewed to date broadly falls into two categories: one group is developing scalable video codecs, mainly using subband coding; the other group is looking at scalable video in the context of QoS.

There is consensus in the research community that the key to efficient delivery of continuous media over heterogeneous networks is dynamic bandwidth adaptation.

DCT-based Filtering Methods
H.261, H.263, MPEG-1 and MPEG-2 are all motion-compensated DCT-based schemes. Of the filtering methods that can be applied to DCT-based compressed video, the frame-dropping filters and the hierarchical splitting filter provide in-line adaptive services; all other filter mechanisms provide in-line translative services.

MPEG-1 Scalability
Temporal scalability is possible in MPEG-1 by dropping B frames and possibly P frames. B-frames can have references to past and future frames but are not used as references themselves, so it is possible to decode a sequence at a lower temporal resolution by simply skipping B frames. However, because B frames are the most efficiently compressed, only small savings are obtained by omitting them. After the B frames, P frames can also be dropped, again with relatively small savings. This will leave a stream of I-frames only. A sketch of this filter follows.
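A minimal sketch of the B-then-P frame-dropping filter just described; the GOP pattern and frame sizes below are illustrative assumptions, not measurements.

def drop_frames(frames, target_bits):
    """Drop B frames first, then P frames, until the GOP fits the budget."""
    for droppable in ("B", "P"):
        total = sum(size for _, size in frames)
        if total <= target_bits:
            break
        frames = [(t, s) for t, s in frames if t != droppable]
    return frames

gop = [("I", 80000), ("B", 10000), ("B", 10000), ("P", 30000),
       ("B", 10000), ("B", 10000), ("P", 30000)]
print(drop_frames(gop, 150000))  # B frames go first; P frames only if still over budget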

MPEG-2 Scalability
Of the standard codecs, scalability is only addressed in MPEG-2. Three techniques, namely spatial scalability, data partitioning and SNR scalability, can be used. Spatial scalability refers to an approach where the original picture is first decomposed into several lower spatial resolutions. Each resolution is encoded with its own motion-compensated coding loop. Blocks in the higher-resolution layers can be predicted either using motion-compensated temporal prediction or from spatially interpolated blocks of a lower-resolution layer. Spatial scalability has attracted considerable interest for its potential application to HDTV transmission. Data partitioning splits encoded data into two bit streams: one containing the more important basic data (e.g. low-frequency DCT coefficients and motion vectors) and one containing the less important data. A viewable picture of lower visual quality can be decoded from the more important stream. This technique is used for transmitting MPEG-2 over ATM networks; less important packets are discarded first if congestion occurs. SNR scalability allows for encoding of a base layer and an enhancement layer at the same spatial resolution (frame size).

The base layer contains coarsely quantized DCT coefficients. The enhancement layer carries information from which finely quantized DCT coefficients can be obtained. SNR scalability is similar to data partitioning and can be used for transmitting MPEG-2 over ATM. There are a number of problems associated with MPEG-2 scalability:
- A spatial scalability layer increases hardware costs by around 30%.
- There is a loss in picture quality of ~0.5 dB for multi-layer scalable MPEG-2 compared with single-layer coding at the same bit rate.
- Data partitioning and SNR scalability can cause a drift problem and should only be used if the loss of the higher bit-rate layer lasts for only a few seconds, or if I-frames are sent more frequently in the low bit-rate layer.

Typically, a combination of spatial and SNR scalability is applied to create a 3-layer coding of the video signal:
- the base layer provides the initial resolution;
- an additional spatial enhancement layer allows for upsampling, and hence an increase in the frame size of the base layer;
- an SNR enhancement layer provides an increase in the visual quality of the base + spatial enhancement layers of the video.

A sketch of the SNR base/enhancement split follows.
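A minimal sketch of SNR scalability as described above: the base layer carries coarsely quantized coefficients, and the enhancement layer carries the refinement down to a finer step. The coefficient values and the two quantizer step sizes are illustrative assumptions.

import numpy as np

coeffs = np.random.randn(8, 8) * 50          # stand-in for DCT coefficients
coarse_q, fine_q = 32, 8                     # assumed base / enhancement step sizes

base = np.round(coeffs / coarse_q).astype(int)            # base-layer symbols
residual = coeffs - base * coarse_q                        # what the base layer misses
enhancement = np.round(residual / fine_q).astype(int)      # refinement symbols

# Decoder: base-only gives a coarse picture; base + enhancement refines it.
coarse_rec = base * coarse_q
full_rec = coarse_rec + enhancement * fine_q
print(np.abs(coeffs - coarse_rec).max(), np.abs(coeffs - full_rec).max())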

3.0 COMPRESSION OF DATA
Compression of data is a process of reducing redundant information by either a lossless or a lossy process. Methods of compression include run-length encoding (RLE), which encodes a count of the same repeated item; statistical coding, which assigns short codes to the most frequently occurring events; and dictionary-based coding. Many methods rely on adaptive means to modify the applied method based on the characteristics of the particular file being compressed. In Shannon-Fano coding, n symbols of known probability are repeatedly divided into subsets of roughly equal probability, working from the top down. Huffman coding is similar to Shannon-Fano, but instead of top-down, Huffman builds a code tree from the bottom up. An RLE sketch follows.
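A minimal run-length encoding sketch; the (count, value) pair representation is an illustrative choice rather than any particular file format.

def rle_encode(data: bytes) -> list[tuple[int, int]]:
    runs = []
    for b in data:
        if runs and runs[-1][1] == b:
            runs[-1] = (runs[-1][0] + 1, b)  # extend the current run
        else:
            runs.append((1, b))              # start a new run
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    return b"".join(bytes([value]) * count for count, value in runs)

data = b"\x00" * 90 + b"\xff" * 10
encoded = rle_encode(data)
assert rle_decode(encoded) == data
print(encoded)  # [(90, 0), (10, 255)] -- 100 bytes collapse to 2 runs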

Image Compression: Segmentation & Edge Detection
The edges of an image contain important information about it: they tell where objects are and indicate their shape, size, and texture. Edge detection is therefore the first step in segmentation. Image segmentation, a field of image analysis, is used to group pixels into regions to determine an image's composition. The simplest and quickest edge detector determines the maximum value from a series of pixel subtractions (see the sketch below). Gradient and second-order-derivative operators provide edge contours and localization. Edge detection in color images depends on luminance discontinuity; full color information allows segmentation in which the chrominance information can be coded at extremely high compression (1000:1). In contrast to widespread coding techniques like MPEG-1 and MPEG-2, which work with rectangular blocks and suffer from artifacts such as blocking, second-generation techniques concentrate on objects instead of blocks. The temporal stability of segmentation is of major importance in predictive coding and tracking. Focus and motion measurements are taken from high-frequency data and edges, while intensity measurements are taken from low-frequency data and object interiors.
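A minimal sketch of the pixel-subtraction edge detector mentioned above: horizontal and vertical differences, with the maximum absolute difference taken as edge strength. The threshold value is an illustrative assumption.

import numpy as np

def edge_map(image: np.ndarray, threshold: float = 30.0) -> np.ndarray:
    dx = np.abs(np.diff(image, axis=1))            # horizontal pixel subtractions
    dy = np.abs(np.diff(image, axis=0))            # vertical pixel subtractions
    strength = np.maximum(dx[:-1, :], dy[:, :-1])  # max of the two differences
    return strength > threshold                    # binary edge mask

image = np.zeros((64, 64))
image[:, 32:] = 255.0                              # a vertical step edge
print(np.argwhere(edge_map(image))[:3])            # edges found at column 31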

Moving foreground can be segmented from stationary foreground and from moving or stationary background. Typically, the foreground contains the important information, so the background can be transmitted less frequently. Integration of cues improves accuracy in segmenting complex scenes and amounts to a form of sensor fusion.

Huffman Coding
In 1952, D. Huffman published an optimized variable-length coding technique in which the length of the encoded character is inversely proportional to that character's frequency. Huffman codes are similar to Morse code in that frequently used letters are assigned short codes. In 1977, Lempel and Ziv took the next step, with dictionary coding of strings of characters (the LZ family, later refined as LZW). A Huffman sketch appears at the end of this section.

Arithmetic Coding
Arithmetic coding takes the complete data stream and outputs one specific code word as a floating-point number between 0 and 1.

Vector Quantization
Vector quantization, like JPEG, breaks an image into blocks, or vectors, of n x n pixels.
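A minimal sketch of Huffman's bottom-up code construction described above: the two least-frequent subtrees are merged repeatedly until one tree remains. The symbol frequencies are illustrative.

import heapq
from itertools import count

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    tiebreak = count()  # avoids comparing subtrees when frequencies tie
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least-frequent subtrees...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))  # ...merge
    codes = {}
    def assign(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"
        else:
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
    assign(heap[0][2], "")
    return codes

print(huffman_codes({"e": 12, "t": 9, "a": 8, "z": 1}))  # rare symbols get longer codes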

4.0 DISCRETE WAVELET TRANSFORMS
Wavelet theory is a relatively new branch of applied mathematics that has found application in many areas, including acoustics, seismic analysis, crystallography, quantum mechanics, and image compression (see the Wavelet Organization). Discrete wavelet transforms are like DCTs in that they decompose an image into coefficients assigned to basis functions. The DCT is limited to cosine functions, which are computationally intensive; wavelets use a wider range of simpler functions, resulting in less complex calculations. The basic compression idea is that the DWT is computed and the resulting coefficients are compared with a threshold value: coefficients below the threshold are set to zero, so the information is packed into a smaller number of coefficients, and the surviving non-zero coefficients can then be encoded losslessly. High compression is possible with no noticeable difference in quality. A sketch of the threshold step follows.
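A minimal sketch of the transform-and-threshold step, using one level of a 2-D Haar transform (an averaging variant chosen for readability); the test image and the threshold value are illustrative assumptions.

import numpy as np

def haar2d(x: np.ndarray):
    """One level of a 2-D Haar transform: returns LL, LH, HL, HH subbands."""
    a, b = x[0::2, :], x[1::2, :]          # pair rows
    lo, hi = (a + b) / 2, (a - b) / 2      # vertical average / detail
    ll, hl = (lo[:, 0::2] + lo[:, 1::2]) / 2, (lo[:, 0::2] - lo[:, 1::2]) / 2
    lh, hh = (hi[:, 0::2] + hi[:, 1::2]) / 2, (hi[:, 0::2] - hi[:, 1::2]) / 2
    return ll, lh, hl, hh

image = np.outer(np.linspace(0, 255, 64), np.ones(64))  # smooth test image
subbands = haar2d(image)
threshold = 4.0
kept = [np.where(np.abs(s) > threshold, s, 0.0) for s in subbands]
nonzero = sum(int(np.count_nonzero(s)) for s in kept)
print(f"{nonzero} of {image.size} coefficients survive thresholding")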

Video compression is essentially a three-dimensional reduction scheme for a sequence of images, with intraframe and interframe coding. The intraframe coding reduces correlation between pixels, while the interframe coding exploits the difference between successive frames in a sequence. Wavelet-based subband coding for intraframe coding produces better visual quality and higher compression ratios. The first commercial 10-bit proprietary wavelet codec was released for capture cards in March 1998. The majority of scalable video codecs are based on subband coding techniques, of which the most widely used is the wavelet transform. VDONet and Vxtreme use wavelet codecs. There is also a lot of work going on in research organisations looking at the application of wavelet and subband coding techniques to scalable video codecs. The MPEG-4 standard is directly related to this content-based scalable video codec approach.

Research Projects on Scalable Video Codecs
Below are descriptions of some of the major research projects currently investigating the problem of adaptive video scaling.

Lancaster Filter System
Software known as the Lancaster Filter System (LFS) exists to demonstrate some of the filtering mechanisms. The LFS is implemented in C, with two interfaces written in Tcl/Tk. One interface allows explicit control of a specific filter's operations; the other is a mock client that incorporates a modified, network-aware Berkeley MPEG-1 software decoder. The filter system relies on an associated underlying QoS protocol suite; one component, the Continuous Media Protocol (CMP), provides an application-level framing service. The LFS operates on MPEG-1 compressed video streams and transcodes MJPEG to MPEG I-frames only. Filter demonstrations are also available on the Web. The filtering concept approaches the idea of intelligent network agents: each network node has a large repository of operations that can be deployed in the most appropriate place and combined when necessary.

Manual intervention is minimised. Support for multicasting is not stated. Transcoding, frame-interleaving and intra-frame mixing filters, however, have only been implemented as file-based entities, i.e. they provide source-based services. Consequently, the implementation's efficiency may not be optimal, as the processing-time constraints applicable to network-based entities no longer apply. A possible view of these filter types is as format conversion and presentation rather than bandwidth adaptation. Intra-coded-only translation is another limitation of several filter types: Colour to Monochrome, DC Colour, Low Pass Re-quantisation and transcoding are only applicable to I-frames.

Wavelet Strategic Research Programme
The Wavelet Strategic Research Programme introduced a highly scalable video compression system for very low bit-rate videoconferencing and telephony applications, around 10-30 kbps. They incorporate a high degree of video scalability into the codec by combining the layered/progressive coding strategy with the concept of embedded resolution block coding. With scalable algorithms, only one original compressed video bit stream is generated. Different subsets of the bit stream can then be selected at the decoder to support a multitude of display specifications such as bit rate, quality level, spatial resolution, frame rate, decoding hardware complexity, and end-to-end coding delay.

The proposed video codec also allows precise bit-rate control at both the encoder and decoder, and this can be achieved independently of the other video scaling parameters. Such a scheme is very useful for both constant and variable bit-rate transmission over mobile communication channels, as well as for video distribution over heterogeneous multicast networks. Simulations demonstrated comparable objective and subjective performance to the ITU-T H.263 video coding standard, while providing both multirate and multiresolution video scalability. Other groups are studying a control scheme for a rate-scalable video codec. They are investigating a wavelet-based video codec in which motion compensation is used to reduce temporal redundancy. The prediction-error frames are encoded using an embedded zerotree wavelet (EZW) approach, which allows data-rate scalability. Since motion compensation is used in the algorithm, the quality of the decoded video may decay due to the propagation of errors in the temporal domain; an adaptive motion compensation scheme has been developed to address this problem. They show that by using a control scheme the quality of the decoded video can be maintained at any data rate. In addition, they have done considerable research on other DCT, wavelet and fractal compression techniques optimized for video compression. Because many aspects of visual perception are best understood in the frequency domain, and because visual perception is spatio-temporally local, the use of joint spatiotemporal/spatiotemporal-frequency representations is a promising approach.
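The EZW coder mentioned above owes its rate scalability to an embedded bit stream that can be truncated at any point. The sketch below illustrates only that embedding property, using plain bitplane coding of integer coefficients (no zerotree symbols); the coefficient values and the number of planes are illustrative.

import numpy as np

def bitplane_encode(coeffs: np.ndarray, num_planes: int = 8):
    """Yield one bitplane at a time, most significant first."""
    signs = np.signbit(coeffs)
    mags = np.abs(coeffs).astype(int)
    for p in range(num_planes - 1, -1, -1):
        yield p, signs, (mags >> p) & 1          # one more refinement pass

def decode_first(planes, keep: int, shape):
    """Decode only the first `keep` planes -- a lower-rate reconstruction."""
    mags = np.zeros(shape, dtype=int)
    signs = np.zeros(shape, dtype=bool)
    for p, s, bits in list(planes)[:keep]:
        mags |= bits << p
        signs = s
    return np.where(signs, -mags, mags)

coeffs = np.array([[-37, 5], [18, -2]])
for keep in (2, 4, 8):                            # truncate the embedded stream
    print(keep, decode_first(bitplane_encode(coeffs), keep, coeffs.shape))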

Motion-Compensated DWT
In recent years, a fundamental goal of image sequence compression has been to reduce the bit rate for transmission while retaining image quality. Compression is achieved through reductions in the spatial and temporal dimensions, and motion-compensated discrete cosine transform (MCDCT) coding schemes have been successfully utilized. The basic idea of motion-compensated coding is to use the corresponding content of the previously decoded frame as the prediction of the current frame (a block-matching sketch follows below). Using adaptive motion compensation in conjunction with the DWT offers an opportunity to optimize a wavelet codec expressly for hypervideo object tracking.

Wavelet Products Available
Compression Engine 1.2 for Windows NT/95 from Compression Engines produces a Wavelet Image Format (WIF) file and is now available. This program will allow professionals from a wide variety of industries to utilize WIF technology, and it is useful for evaluating the potential for including the WIF compression engine directly in software applications.
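Returning to the motion-compensation idea described above, here is a minimal full-search block-matching sketch: each block of the current frame is predicted by the best-matching block of the previously decoded frame. The block size, search range and SAD criterion are common choices but are assumptions here, not details of any product mentioned.

import numpy as np

def best_match(prev: np.ndarray, block: np.ndarray, by: int, bx: int, search: int = 4):
    """Return the motion vector minimizing the sum of absolute differences."""
    n = block.shape[0]
    best = (0, 0)
    best_sad = np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= prev.shape[0] - n and 0 <= x <= prev.shape[1] - n:
                sad = np.abs(prev[y:y + n, x:x + n] - block).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best, best_sad

prev = np.zeros((32, 32)); prev[8:16, 8:16] = 255.0   # an object in the previous frame
cur = np.zeros((32, 32)); cur[8:16, 10:18] = 255.0    # ...moved 2 pixels right
mv, sad = best_match(prev, cur[8:16, 10:18], 8, 10)
print(mv, sad)   # (0, -2): the encoder points back to where the object was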

5.0 FRACTALS
Fractal compression is radically different: it stores instructions, or formulas, for creating the image. Various research groups are investigating the application of fractal compression to scalable video. Iterated Systems have developed a commercial product which has been implemented within Progressive Networks' RealVideo product. A sketch of the core fractal matching step appears at the end of this section.

Image Segmentation and Object-based Video Coding
A number of research groups are investigating the application of image segmentation to video compression. The approaches involve extracting important subsets of the image content of each frame and delivering only the most important, e.g. object boundaries and moving objects. Object-based coding can achieve very high data compression rates while maintaining an acceptable visual quality in the decoded images. However, object-based coders are computationally intensive; to be viable as a real-time process, an object-based coder would need to have the image segmentation algorithm implemented as a VLSI array.
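A minimal sketch of the fractal matching step mentioned at the start of this section: each small "range" block is stored as a formula (domain position, scale, offset) referring to a contracted copy of a larger "domain" block from the same image. The block sizes, the least-squares fit and the exhaustive search are illustrative assumptions.

import numpy as np

def fit(domain: np.ndarray, rng: np.ndarray):
    """Least-squares scale s and offset o so that s*domain + o ~ range."""
    d, r = domain.ravel(), rng.ravel()
    var = ((d - d.mean()) ** 2).sum()
    s = 0.0 if var == 0 else ((d - d.mean()) * (r - r.mean())).sum() / var
    o = r.mean() - s * d.mean()
    err = ((s * d + o - r) ** 2).sum()
    return s, o, err

def encode_range(image: np.ndarray, ry: int, rx: int, n: int = 4):
    """Find the best 2n x 2n domain block (shrunk to n x n) for one range block."""
    rng = image[ry:ry + n, rx:rx + n]
    best = None
    for dy in range(0, image.shape[0] - 2 * n + 1, n):
        for dx in range(0, image.shape[1] - 2 * n + 1, n):
            dom = image[dy:dy + 2 * n, dx:dx + 2 * n]
            shrunk = dom.reshape(n, 2, n, 2).mean(axis=(1, 3))  # 2x2 averaging
            s, o, err = fit(shrunk, rng)
            if best is None or err < best[0]:
                best = (err, (dy, dx), s, o)
    return best  # (error, domain position, scale, offset) = the stored "formula"

image = np.add.outer(np.arange(16.0), np.arange(16.0))  # smooth, self-similar test image
print(encode_range(image, 4, 8))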

6.0 HYBRID WAVELET-FRACTAL TRANSFORMS (DWFT)
Wavelet Image Compression
Subband decomposition, using a local wavelet quadrature filter, transforms images from their real viewing space to an equivalent new spectral variable space, where the local image redundancies become apparent for extraction. A wavelet subband compression algorithm consists of two major components: the image-data decorrelating transform and the data-symbol entropy coding. The most efficient order for encoding entries is the embedded zerotree ordering. The most important advantage of wavelet compression over other systems is that its gains are reflected in two geometric aspects: space and scale. Fractal compression is related to wavelet compression through the process of destination-reference matching of tree branches.

Hybrid combinations of wavelet-fractal compression are based on these geometric properties. Viewing the zerotree as a vector tree of the quadtree gives the following algorithm:
1. Run a wavelet subband coding to the desired image quality.
2. Add a fractal code to each subband branch end.
3. Improve the fractal representation by merging fractal codes.
4. Slice the image data based on the data-resolution priorities.
5. Entropy-pack the residual image as the last slice if lossless compression is required.

Another hybrid wavelet-fractal scheme uses the (Haar) wavelet transform to simplify fractal compression: it first transforms an image into four quarter-size subband images, then compresses the low-low image using the standard fractal method and compresses the other three subband images by matching reference blocks from the low-low image (see the sketch below). Temporal redundancy is unique to video compression, resulting from fractal similarity on the time axis. Unfortunately, a three-dimensional fractal image compression algorithm is unrealistic given today's computing capabilities.
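A minimal sketch of the second scheme: one Haar level splits the image into LL, LH, HL and HH subbands; the LL band would be fractal-coded as in Section 5.0, and here each detail-band block is coded only as a reference to its best-matching LL block. The block size and the plain SAD match are illustrative assumptions.

import numpy as np

def haar_level(x):
    a, b = x[0::2, :], x[1::2, :]
    lo, hi = (a + b) / 2, (a - b) / 2
    return ((lo[:, 0::2] + lo[:, 1::2]) / 2,   # LL
            (hi[:, 0::2] + hi[:, 1::2]) / 2,   # LH
            (lo[:, 0::2] - lo[:, 1::2]) / 2,   # HL
            (hi[:, 0::2] - hi[:, 1::2]) / 2)   # HH

def match_from_ll(ll, band, n=4):
    """For each n x n block of a detail band, record the closest LL block."""
    refs = {}
    for by in range(0, band.shape[0], n):
        for bx in range(0, band.shape[1], n):
            blk = band[by:by + n, bx:bx + n]
            best = min(((np.abs(ll[y:y + n, x:x + n] - blk).sum(), (y, x))
                        for y in range(0, ll.shape[0] - n + 1, n)
                        for x in range(0, ll.shape[1] - n + 1, n)),
                       key=lambda t: t[0])
            refs[(by, bx)] = best[1]           # store only a reference, not pixels
    return refs

image = np.random.rand(16, 16) * 255
ll, lh, hl, hh = haar_level(image)
print(match_from_ll(ll, lh))                   # LH blocks coded as pointers into LL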

7.0 CONCLUSIONS
Streaming video (and audio) across networks is attracting many participants and, as outlined in Section 2.0, both the commercial products and the research demonstrators are highly diverse in the networks, protocols and compression standards they support. The commercial products are optimised for low-bandwidth modem or ISDN connections, do not scale to higher-bandwidth networks, require video to be pre-encoded with the target audience in mind, and interoperate poorly because of their proprietary standards.

However, recent products such as Sun's MediaFramework API and Microsoft's NetShow have been designed so that new codecs can easily be incorporated into their frameworks. H.263 and MPEG-4 are going to become the de facto standards for video delivery over low bandwidths, while broadband standards such as MPEG-1 and MPEG-2, which are useful for many types of broadcast and CD-ROM applications, are unsuitable for the Internet. Although MPEG-2 has had scalability enhancements, these will not be exploitable until reasonably priced hardware encoders and decoders supporting scalable MPEG-2 become available. Codecs designed for the Internet require greater bandwidth scalability, lower computational complexity, greater resilience to network losses, and lower encode/decode latency for interactive applications. These requirements imply codecs designed specifically for the diversity and heterogeneity of Internet delivery. The research on Internet codecs has broadly taken two directions, DCT based and non-DCT based; DCT-based video delivery, except for MPEG-2, possesses no inherent scalability.

To achieve adaptivity, various operations can be applied to the (semi-)compressed data stream to reduce its bit rate, transcoding among them. The beauty of the DCT-based approach is its compatibility with current and imminent draft compression standards: existing audio and video archives can be re-used without explicit re-coding for every possible format, and existing viewers maintain their currency. Non-DCT-based compression techniques (layered, subband, wavelet, etc.) are intrinsically scalable, which is their great attraction, but the available codecs are still experimental in nature, often suffer from performance problems, and would require existing movie libraries to be re-coded, by no means a trivial task. The research projects reviewed here broadly fall into two categories: one group is developing scalable video codecs, mainly using subband coding, while the other is looking at scalable video in the context of QoS. There is consensus in the research community that the key to efficient delivery of continuous media over heterogeneous networks is dynamic bandwidth adaptation. The hybrid wavelet-fractal scheme described in Section 6.0, which uses the Haar wavelet transform to simplify fractal compression of the subband images, combines the intrinsic scalability of the wavelet approach with the high compression of fractal coding. Temporal redundancy, unique to video compression and resulting from fractal similarity on the time axis, remains unexploited, since a three-dimensional fractal image compression algorithm is unrealistic given today's computing capabilities.