Hybrid Wavelet-Fractal Codec
1.0 INTRODUCTION
Current compression standards are based on the discrete cosine transform (DCT), scalar quantization, block-based motion prediction/compensation, and variable-length Huffman coding. The MPEG-4 standard is based on next-generation techniques such as subband coding, model-based coding, and knowledge-based coding. While wavelet transforms for compressing streaming video have been tried by several organizations, their performance has not yet demonstrated the full extent of the expected major technological improvement.

Objective
The objective of this project is to conduct research into several promising hybrid wavelet-fractal transforms (DWFT) for an advanced compression/decompression (codec) algorithm for streaming video over the Internet.
2.0 COMPARISON OF CODEC STANDARDS
Streaming video (and audio) across networks is an effort that is attracting many participants. A key characteristic of both the commercial products and the research demonstrators is the diversity in technological infrastructure: the networks, protocols, and compression standards supported. All the commercial video products are optimized for low-bandwidth modem or ISDN connections and are not designed to scale to higher-bandwidth networks; the video needs to be pre-encoded with the target audience in mind. The commercial products have either adopted or developed their own proprietary standards, embraced the currently accepted standards (e.g. MPEG), or implemented a combination of the two. Compatibility between the commercial products has been limited because of these proprietary standards.
The list below outlines the main characteristics of codec standards:

Web Codecs
Sorenson Video - high-quality WWW video
RealVideo G2 with SVT - main video codec for RealVideo
H.261 - low-quality videoconferencing
H.263 (also known as I263) - medium-quality videoconferencing
MPEG-4 - high-quality WWW video

CD-ROM/DVD-ROM/Kiosk/Presentation Codecs
Cinepak - medium-quality CD-ROM video
Sorenson Video - high-quality CD-ROM video
Eidos Escape - high-quality CD-ROM video; requires high data rates
Indeo 3 - medium-quality CD-ROM video
Indeo Video Interactive (4, 5) - high-quality CD-ROM video
Apple Video - very fast, but low quality
MPEG-1 - high-quality CD-ROM video; requires special hardware or a fast computer
MPEG-2 - high-quality DVD-ROM video; requires special hardware
Apple Animation - from HD, allows lossless fullscreen playback on very high-end systems

Hardware/Capture Codecs
Media 100 - codec allows files to be used without capture hardware
VideoVision Studio - codec allows files to be used without capture hardware
Avid Media Composer - codec allows files to be used without capture hardware
Truevision - codec allows files to be used without capture hardware
DV - format where digitizing is done by the camera
Apple Component Video - for capture on systems without JPEG hardware

Editing/Storage/Special-purpose Codecs
Motion-JPEG (MJPEG) - general-purpose video editing & storage
Apple Animation - lossless storage
Photo-JPEG - used at 100% quality as a storage/transfer format

Audio Codecs
G.723 - standards-based speech for videoconferencing
IMA - 4:1 compression (usually for CD-ROM)
MPEG Layer III audio - high-quality WWW music
QDesign Music Codec - high-quality WWW music at low data rates
Qualcomm PureVoice - speech at 14.4 modem data rates
RealAudio - several inter-related codecs for WWW audio
Windows Media Audio - high-quality WWW music

Research on Internet codecs has broadly taken two directions: discrete cosine transform (DCT) based and non-DCT based. DCT-based video delivery, except for MPEG-2, possesses no inherent scalability.
To achieve adaptivity, various operations can be applied to the compressed data stream to reduce its bit rate. Amongst these operations is transcoding, the conversion of one compression standard to another. The beauty of the DCT-based approach is that it is compatible with current and imminent draft compression standards; furthermore, it allows re-use of existing audio and video archives without explicitly re-coding them to cater for all possible formats. Non-DCT-based compression techniques, e.g. layered, subband and wavelet coding, are intrinsically scalable, and this is their great attraction. Unfortunately, although several codecs exist, they are still experimental in nature and often suffer from performance problems; in addition, existing movie libraries would need to be re-coded. The research reviewed to date broadly falls into two categories: one group is developing scalable video codecs, mainly using subband coding; the other is looking at scalable video in the context of QoS.
There is consensus in the research community that the key to efficient delivery of continuous media over heterogeneous networks is dynamic bandwidth adaptation.

DCT-based Filtering Methods
H.261, H.263, MPEG-1 and MPEG-2 are all motion-compensated, DCT-based schemes. Of the filtering methods that can be applied to DCT-based compressed video, frame-dropping filters and the hierarchical splitting filter provide in-line adaptive services; all other filter mechanisms provide in-line translative services.
MPEG-1 Scalability
Temporal scalability is possible in MPEG-1 by dropping B frames and possibly P frames. B frames can reference past and future frames but are never used as references themselves, so a sequence can be decoded at a lower temporal resolution by simply skipping them. However, because B frames are the most efficiently compressed, only small savings are obtained by omitting them. After the B frames, P frames can also be dropped, again with relatively small savings, leaving a stream of I frames only.
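As a minimal illustration, the dropping rule can be sketched as follows; frame types are represented as single characters, since a real filter would parse picture headers from the bit stream:

```python
# Minimal sketch of MPEG-1 temporal scaling by frame dropping.

def drop_frames(gop, drop_b=True, drop_p=False):
    kept = []
    for frame_type in gop:
        if frame_type == 'B' and (drop_b or drop_p):
            continue            # B frames are never reference frames: safe to skip
        if frame_type == 'P' and drop_p:
            continue            # dropping P frames leaves an I-frame-only stream
        kept.append(frame_type)
    return kept

gop = list("IBBPBBPBBPBB")       # a typical MPEG-1 group of pictures
print(drop_frames(gop))                  # ['I', 'P', 'P', 'P']
print(drop_frames(gop, drop_p=True))     # ['I']
```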
MPEG-2 Scalability
Of the standard codecs, scalability is only addressed in MPEG-2. Three techniques, namely spatial scalability, data partitioning and SNR scalability, can be used.
Spatial scalability refers to an approach where the
original picture is first decomposed into several lower spatial resolutions.
Each resolution is encoded with its own motion-compensated coding loop. Blocks
in the higher-resolution layers can be predicted either using motion-compensated
temporal prediction or from spatially interpolated blocks of a lower resolution
layer. Spatial scalability has attracted considerable interest for its potential application to HDTV transmission. Data partitioning splits encoded data
into two bit streams - one containing the more important basic data (e.g. low
frequency DCT coefficients and motion vectors) and one containing the less
important data. A viewable picture of lower visual quality can be decoded from
the more important stream. This technique is used for transmitting MPEG2 over
ATM networks. Less important packets are discarded first if congestion occurs.
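A minimal sketch of the idea follows; the breakpoint index and the stream layout are illustrative stand-ins, not the actual MPEG-2 syntax:

```python
# Minimal sketch of data partitioning: the first `breakpoint` coefficients
# of each zig-zag-scanned DCT block (plus the motion vector) go to the
# high-priority stream; the rest go to the stream the network drops first.

def partition_block(zigzag_coeffs, motion_vector, breakpoint=8):
    high_priority = {"mv": motion_vector, "coeffs": zigzag_coeffs[:breakpoint]}
    low_priority = {"coeffs": zigzag_coeffs[breakpoint:]}
    return high_priority, low_priority

coeffs = [312, -45, 22, 9, -4, 3, 1, 1, 0, 0, 1, 0]   # zig-zag order: low frequencies first
hp, lp = partition_block(coeffs, motion_vector=(2, -1))
# Under congestion the network discards lp first; hp alone still decodes
# to a viewable picture of lower visual quality.
```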
SNR scalability allows encoding of a base layer and an enhancement layer at the same spatial resolution (frame size). The base layer contains coarsely quantized DCT coefficients; the enhancement layer carries information from which finely quantized DCT coefficients can be obtained. SNR scalability is similar to data partitioning and can also be used for transmitting MPEG-2 over ATM (a minimal two-layer quantization sketch follows the lists below).

There are a number of problems associated with MPEG-2 scalability:
- A spatial scalability layer increases hardware costs by around 30%.
- There is a loss in picture quality of ~0.5 dB for multi-layer scalable MPEG-2 compared with single-layer coding at the same bit rate.
- Data partitioning and SNR scalability can cause a drift problem, and should only be used if the loss of the higher bit-rate layer lasts for only a few seconds or if I frames are sent more frequently in the low bit-rate layer.

Typically a combination of spatial and SNR scalability is applied to create three-layer coding of the video signal:
- the base layer provides the initial resolution;
- an additional spatial enhancement layer allows for upsampling, and hence an increase in frame size, of the base layer;
- an SNR enhancement layer provides an increase in visual quality of the base plus spatial enhancement layers.
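The following is a minimal sketch of the SNR-scalability idea as two-layer quantization; the step sizes are illustrative assumptions, not MPEG-2 quantizer values:

```python
import numpy as np

# The base layer quantizes DCT coefficients coarsely; the enhancement
# layer encodes the residual with a finer step.

def snr_layers(dct_coeffs, q_base=16, q_enh=4):
    base = np.round(dct_coeffs / q_base)          # coarse base layer
    residual = dct_coeffs - base * q_base         # what the base layer missed
    enhancement = np.round(residual / q_enh)      # finer refinement
    return base, enhancement

def reconstruct(base, enhancement=None, q_base=16, q_enh=4):
    out = base * q_base                           # base-only decode
    if enhancement is not None:                   # add the enhancement layer
        out = out + enhancement * q_enh
    return out

coeffs = np.array([312.0, -45.0, 22.0, 9.0, -4.0, 3.0])
b, e = snr_layers(coeffs)
print(reconstruct(b))        # coarse picture from the base layer alone
print(reconstruct(b, e))     # much closer to the original with the enhancement
```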
3.0 COMPRESSION OF DATA
Compression of data is a process of reducing redundant information by either a lossless or a lossy process. Methods of compression include run-length encoding (RLE), which encodes a numerical count of the same repeated item; statistical coding, which assigns short codes to the most frequently occurring events; and dictionary-based coding. Many methods rely on adaptive means to modify the applied method based on the characteristics of the particular file being compressed. Shannon-Fano coding divides n symbols of known probability into successive subsets of roughly equal probability; Huffman coding is similar, but instead of working from the top down, Huffman builds the code tree from the bottom up.
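A minimal run-length encoder and decoder, for illustration only:

```python
# Each run of a repeated item is stored as a (count, item) pair. Lossless,
# and effective only when the data actually contains long runs.

def rle_encode(data):
    runs = []
    for item in data:
        if runs and runs[-1][1] == item:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, item])    # start a new run
    return runs

def rle_decode(runs):
    return [item for count, item in runs for _ in range(count)]

pixels = list("AAAABBBCCDAA")
encoded = rle_encode(pixels)          # [[4,'A'], [3,'B'], [2,'C'], [1,'D'], [2,'A']]
assert rle_decode(encoded) == pixels
```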
Image Compression: Segmentation and Edge Detection
The edges of an image contain important information about the image: where the objects are, and their shape, size, and texture. Edge detection is the first step in segmentation.
Image segmentation, a field of image analysis, is used to group pixels into regions to determine an image's composition. The simplest and quickest edge detector determines the maximum value from a series of pixel subtractions. Gradient and second-order derivative operators produce contours and localize edges. Edge detection in color images depends on luminance discontinuity.
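A minimal sketch of such a neighbour-difference detector, assuming a grayscale image held in a NumPy array:

```python
import numpy as np

# For each pixel, take the maximum absolute difference against its
# horizontal and vertical neighbours.

def edge_strength(img):
    img = img.astype(np.int32)
    dx = np.abs(np.diff(img, axis=1))           # horizontal neighbour differences
    dy = np.abs(np.diff(img, axis=0))           # vertical neighbour differences
    out = np.zeros_like(img)
    out[:, :-1] = np.maximum(out[:, :-1], dx)
    out[:-1, :] = np.maximum(out[:-1, :], dy)
    return out

img = np.array([[10, 10, 200, 200],
                [10, 10, 200, 200],
                [10, 10, 200, 200]], dtype=np.uint8)
print(edge_strength(img))   # large values along the vertical boundary
```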
Full color information can support segmentation, with coding of the chrominance information achievable at extremely high compression (1000:1). In contrast to widespread coding techniques like MPEG-1 and MPEG-2, which work with rectangular blocks and exhibit artifacts such as blocking, second-generation techniques concentrate on objects instead of blocks. The temporal stability of segmentation is of major importance in predictive coding and tracking. Focus and motion measurements are taken from high-frequency data and edges, while intensity measurements are taken from low-frequency data and object interiors.
Moving foreground can be segmented from stationary foreground and from moving or stationary background. Typically, the foreground contains the important information, so the background can be transmitted less frequently. Integrating multiple cues improves accuracy in segmenting complex scenes and amounts to a form of sensor fusion.

Huffman Coding
In 1952, D. Huffman published an optimized variable-length coding technique. The length of the encoded character is inversely proportional to that character's frequency; Huffman codes are similar to Morse code in that frequently used letters are assigned short codes. In 1977, Lempel and Ziv took the next step with dictionary coding of strings of characters, later refined as LZW.
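A minimal sketch of bottom-up Huffman code construction (a toy illustration, not a production entropy coder):

```python
import heapq
from collections import Counter

# Build the code tree bottom-up by repeatedly merging the two least
# frequent nodes, so frequent symbols end near the root with short codes.

def huffman_codes(text):
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)        # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, [f1 + f2, count, merged])
        count += 1
    return heap[0][2]

codes = huffman_codes("this is an example of huffman coding")
# Frequent symbols (e.g. ' ') receive shorter codes than rare ones (e.g. 'x').
print(sorted(codes.items(), key=lambda kv: len(kv[1])))
```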
Arithmetic Coding
Arithmetic coding takes the complete data stream and outputs one specific code word as a floating-point number between 0 and 1.

Vector Quantization
Vector quantization, like JPEG, breaks an image into blocks, or vectors, of n x n pixels.
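A minimal sketch of the block-matching step; the tiny fixed codebook is an illustrative assumption, since real systems train one (e.g. with k-means):

```python
import numpy as np

# Each 2x2 pixel block is replaced by the index of its nearest codebook vector.

codebook = np.array([[0, 0, 0, 0],          # flat dark
                     [255, 255, 255, 255],  # flat bright
                     [0, 255, 0, 255],      # vertical edge
                     [0, 0, 255, 255]])     # horizontal edge

def vq_encode(img, block=2):
    h, w = img.shape
    indices = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            vec = img[y:y+block, x:x+block].reshape(-1)
            dists = ((codebook - vec) ** 2).sum(axis=1)   # nearest neighbour
            indices.append(int(dists.argmin()))
    return indices

img = np.array([[0, 0, 250, 255],
                [0, 0, 250, 255]], dtype=np.float64)
print(vq_encode(img))   # [0, 1] -> one dark block, one bright block
```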
4.0 DISCRETE WAVELET TRANSFORMS
Wavelet theory is a new form of applied mathematics. The technology has found application in many areas, including acoustics, seismic analysis, crystallography, quantum mechanics, and image compression (see the Wavelet Organization). Discrete wavelet transforms are like DCTs in that they decompose an image into coefficients assigned to basis functions. The DCT is limited to cosine functions, which are computationally intensive; wavelets use a wider range of simpler functions, resulting in less complex calculations. The basic compression idea is that the DWT is computed and the resulting coefficients are compared with a threshold value, the compression resulting from packing the information into a smaller number of coefficients. Encoding the surviving non-zero coefficients exactly keeps that stage lossless, and high compression is possible with no noticeable difference in quality.
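A minimal sketch of one Haar decomposition level followed by thresholding; the threshold value is an illustrative assumption:

```python
import numpy as np

# Averages and differences along rows then columns give four subbands
# (LL, LH, HL, HH); small coefficients are then zeroed.

def haar2d(img):
    img = img.astype(np.float64)
    lo = (img[:, 0::2] + img[:, 1::2]) / 2      # row averages
    hi = (img[:, 0::2] - img[:, 1::2]) / 2      # row differences
    rows = np.hstack([lo, hi])
    lo = (rows[0::2, :] + rows[1::2, :]) / 2    # repeat on columns
    hi = (rows[0::2, :] - rows[1::2, :]) / 2
    return np.vstack([lo, hi])                  # [[LL, LH], [HL, HH]]

def threshold(coeffs, t=4.0):
    return np.where(np.abs(coeffs) < t, 0.0, coeffs)   # zero small coefficients

img = np.arange(64).reshape(8, 8)
coeffs = threshold(haar2d(img))
print(np.count_nonzero(coeffs), "non-zero of", coeffs.size)   # energy packs into few coefficients
```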
Video compression is essentially a three-dimensional reduction scheme for a sequence of images, with intraframe and interframe coding.
The intraframe coding reduces correlation between pixels, while the interframe coding encodes the difference between successive frames in a sequence. Wavelet-based subband coding for intraframe coding produces better visual quality and higher compression ratios. The first commercial 10-bit proprietary wavelet codec was released for capture cards in March 1998. The majority of scalable video codecs are based on subband coding techniques, of which the most widely used is the wavelet transform; VDONet and Vxtreme use wavelet codecs. Many research organisations are also investigating the application of wavelet and subband coding techniques to scalable video codecs. The MPEG-4 standard is directly related to this content-based scalable video codec approach.
Research Projects on Scalable Video Codecs
Below are descriptions of some of the major research projects currently investigating the problem of adaptive video scaling.

Lancaster Filter System
Software known as the Lancaster Filter System (LFS) exists to demonstrate some of the filtering mechanisms. The LFS is implemented in C, with two interfaces written in Tcl/Tk: one allows explicit control of a specific filter's operations; the other is a mock client incorporating a modified, network-aware Berkeley MPEG-1 software decoder. The filter system relies on an associated underlying QoS protocol suite, one component of which, the Continuous Media Protocol (CMP), provides an application-level framing service. The LFS operates on MPEG-1 compressed video streams and transcodes MJPEG to MPEG I-frames only. Filter demonstrations are also available on the Web. The filtering concept approaches the idea of intelligent network agents: each network node has a large repository of operations that can be deployed in the most appropriate place and combined when necessary.
Manual intervention is minimised. Support for multicasting is not stated. Transcoding, frame interleaving and intra-frame mixing filters, however, have only been implemented as file-based entities, i.e. they provide source-based services. Consequently, the implementation's efficiency may not be optimal, as the processing-time constraints applicable to network-based entities are no longer critical. A reasonable view of these filter types is as format conversion and presentation tools rather than bandwidth adaptation. Intra-coded-only translation is another limitation of several filter types: Colour to Monochrome, DC Colour, Low Pass, Re-quantisation and transcoding are only applicable to I-frames.

Wavelet Strategic Research Programme
The Wavelet Strategic Research Programme introduced a highly scalable
video compression system for very low bit rate videoconferencing and telephony
applications around 10-30 Kbps. They incorporate a high degree of video
scalability into the codec by combining the layered/progressive coding strategy
with the concept of embedded resolution block coding. With scalable algorithms,
only one original compressed video bit stream is generated. Different subsets of
the bit stream can then be selected at the decoder to support a multitude of
display specifications such as bit rate, quality level, spatial resolution,
frame rate, decoding hardware complexity, and end-to-end coding delay.
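The selection step can be sketched minimally as prefix truncation of an embedded stream; the function name and byte-budget arithmetic are illustrative assumptions:

```python
# With embedded coding, one encoded stream is produced and each decoder
# simply truncates it at the byte budget its bandwidth allows; more
# bytes yield progressively higher quality.

def select_substream(embedded_stream: bytes, target_kbps: float, duration_s: float) -> bytes:
    budget = int(target_kbps * 1000 / 8 * duration_s)   # bytes for this client
    return embedded_stream[:budget]                     # prefix property of embedded coding

stream = bytes(range(256)) * 100                 # stand-in for an embedded bit stream
low = select_substream(stream, 10, 1.0)          # ~10 kbps client
high = select_substream(stream, 30, 1.0)         # ~30 kbps client
assert high[:len(low)] == low                    # the low-rate stream is a prefix of the high-rate one
```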
The
proposed video codec also allows precise bit rate control at both the encoder
and decoder, and this can be achieved independently of the other video scaling
parameters. Such a scheme is very useful for both constant and variable bit rate
transmission over mobile communication channels, as well as video distribution
over heterogeneous multicast networks. Simulations demonstrated comparable
objective and subjective performance when compared to the ITU-T H.263 video
coding standard, while providing both multirate and multiresolution video
scalability.

Groups are studying a control scheme for a rate-scalable video codec. They are investigating a wavelet-based video codec with motion compensation used to reduce temporal redundancy. The prediction error frames are
encoded using an embedded zerotree wavelet (EZW) approach which allows data rate
scalability. Since motion compensation is used in the algorithm, the quality of
the decoded video may decay due to the propagation of errors in the temporal
domain. An adaptive motion compensation scheme has been developed to address
this problem. They show that by using a control scheme the quality of the
decoded video can be maintained at any data rate. In addition, they have done considerable research on other DCT, wavelet and fractal compression techniques optimized for video compression. Because many aspects of visual perception are best understood in the frequency domain, and because visual perception is spatio-temporally local, the use of joint spatiotemporal/spatiotemporal-frequency representations is a promising approach.

Motion Compensation DWT
In recent years, a fundamental goal of image sequence compression has been to reduce the bit rate for transmission while retaining image quality. Compression is achieved through reductions in the spatial and temporal dimensions. Motion-compensated discrete cosine transform (MCDCT) coding schemes have been successfully utilized.
The basic idea of motion-compensated coding is to use the corresponding content of the previous decoded frame as the prediction of the current frame; a minimal block-matching sketch follows below. Using adaptive motion compensation in conjunction with the DWT offers an opportunity to optimize a wavelet codec expressly for hypervideo object tracking.
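A minimal block-matching sketch, assuming grayscale frames whose dimensions are multiples of the block size; exhaustive search over a small window stands in for the faster searches real codecs use:

```python
import numpy as np

def best_match(prev, block, y, x, search=4):
    """Find the motion vector minimising the sum of absolute differences."""
    n = block.shape[0]
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= prev.shape[0] - n and 0 <= xx <= prev.shape[1] - n:
                sad = np.abs(prev[yy:yy+n, xx:xx+n].astype(int) - block.astype(int)).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
    return best_mv

def motion_compensate(prev, curr, n=8):
    """Return per-block motion vectors and the prediction-error (residual) frame."""
    vectors = {}
    residual = np.zeros(curr.shape, dtype=int)
    for y in range(0, curr.shape[0], n):
        for x in range(0, curr.shape[1], n):
            block = curr[y:y+n, x:x+n]
            dy, dx = best_match(prev, block, y, x)
            vectors[(y, x)] = (dy, dx)
            # Only the (usually small) residual needs to be transform-coded.
            residual[y:y+n, x:x+n] = block.astype(int) - prev[y+dy:y+dy+n, x+dx:x+dx+n].astype(int)
    return vectors, residual

prev = np.zeros((16, 16), dtype=np.uint8); prev[5:9, 5:9] = 200
curr = np.zeros((16, 16), dtype=np.uint8); curr[4:8, 4:8] = 200   # object moved up-left
mv, res = motion_compensate(prev, curr)
print(mv[(0, 0)], int(np.abs(res[0:8, 0:8]).sum()))   # (1, 1) and a zero residual block
```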
Wavelet Products Available
Compression Engine 1.2 for Windows NT/95, from Compression Engines, produces a Wavelet Image Format (WIF) file and is now available. This program allows professionals from a wide variety of industries to utilize WIF technology, and is useful for evaluating the potential for including the WIF compression engine directly in software applications.
5.0 FRACTALS
Fractal compression is radically different. It stores instructions
or formulas for creating the image. Various research groups are investigating
the application of fractal compression to scalable video. Iterated Systems have
developed a commercial product which has been implemented within Progressive
Network's RealVideo product.

Image Segmentation and Object-based Video Coding
A number of research groups are investigating the application of image segmentation to video compression. The approaches involve extracting important subsets of the image content of each frame and delivering only the most important, e.g. object boundaries and moving objects. Object-based coding can achieve very high data compression rates while maintaining acceptable visual quality in the decoded images. However, object-based coders are computationally intensive; to be viable as a real-time process, an object-based coder would need the image segmentation algorithm implemented as a VLSI array.
6.0 HYBRID WAVELET-FRACTAL TRANSFORMS (DWFT)
Wavelet Image Compression
Subband decomposition, using a local wavelet quadrature filter, transforms images from their real viewing space to an equivalent new spectral variable space. The local image redundancies then become apparent for extraction. A wavelet subband compression algorithm consists of two major components: the image-data decorrelating transform and the data-symbol entropy coding. The most efficient order for encoding entries is the embedded zero-tree order. The most important advantage of wavelet compression over other systems is that it is a variable manipulation whose gains are reflected in two geometric aspects. Fractal compression is related to wavelet compression through the process of destination-reference matching of tree branches.
Hybrid combinations of wavelet-fractal compression are based on geometric properties. Viewing the zero tree as a vector tree of the quadtree gives the following algorithm (a toy walk-through in code follows the list):
1. Run a wavelet subband coding to the desired image quality.
2. Add a fractal code to each subband branch end.
3. Improve the fractal representation by merging fractal codes.
4. Slice the image data based on the data-resolution priorities.
5. Entropy-pack the residual image as the last slice if lossless compression is required.
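A toy, self-contained walk-through of these five steps; every function here is a drastic simplification chosen for illustration (the 2x2 Haar level, the (mean, contrast) fractal code, and zlib as the entropy packer are all assumptions, not the algorithm from the literature):

```python
import zlib
import numpy as np

def wavelet_subband_code(img, threshold=4.0):
    # Step 1: one 2-D Haar level, then zero small coefficients.
    lo = (img[:, 0::2] + img[:, 1::2]) / 2
    hi = (img[:, 0::2] - img[:, 1::2]) / 2
    rows = np.hstack([lo, hi])
    lo = (rows[0::2] + rows[1::2]) / 2
    hi = (rows[0::2] - rows[1::2]) / 2
    coeffs = np.vstack([lo, hi])
    return np.where(np.abs(coeffs) < threshold, 0.0, coeffs)

def fractal_codes_for_branch_ends(coeffs, n=2):
    # Steps 2-3: attach a (mean, contrast) code to each n x n leaf block,
    # then merge blocks that share a code into a single entry.
    merged = {}
    for y in range(0, coeffs.shape[0], n):
        for x in range(0, coeffs.shape[1], n):
            blk = coeffs[y:y+n, x:x+n]
            code = (round(float(blk.mean()), 1), round(float(blk.std()), 1))
            merged.setdefault(code, []).append((y, x))
    return merged

def slice_and_pack(coeffs, codes):
    # Step 4: the coarse (LL) slice is sent first, detail slices later;
    # step 5: the residual slice is entropy-packed losslessly with zlib.
    h, w = coeffs.shape
    residual = coeffs.copy()
    residual[:h // 2, :w // 2] = 0
    return [coeffs[:h // 2, :w // 2].tobytes(),
            repr(codes).encode(),
            zlib.compress(residual.tobytes())]

img = np.arange(64, dtype=float).reshape(8, 8)
coeffs = wavelet_subband_code(img)
slices = slice_and_pack(coeffs, fractal_codes_for_branch_ends(coeffs))
```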
Another hybrid wavelet-fractal scheme uses wavelet transforms (Haar) to simplify fractal compression: it first transforms an image into four quarter-size subband images using the wavelet transform, then compresses the low-low image using a standard fractal method and compresses the other three subband images by matching reference blocks from the low-low image (see the sketch below). Temporal redundancy is unique to video compression, resulting from fractal similarity on the time axis. Unfortunately, a three-dimensional fractal image compression algorithm is unrealistic given today's computer capabilities.
7.0 CONCLUSIONS
As noted in Section 2.0, the commercial streaming products are optimised for low-bandwidth modem or ISDN connections, must be pre-encoded with the target audience in mind, and remain fragmented by proprietary standards.
However, recent products such as Sun's MediaFramework API and Microsoft's NetShow have been designed to enable new and varied codecs to be easily incorporated into their frameworks. H.263 and MPEG-4 are going to become the de facto standards for video delivery over low bandwidths, while broadband standards such as MPEG-1 and MPEG-2, useful for many types of broadcast and CD-ROM applications, are unsuitable for the Internet. Although MPEG-2 has had scalability enhancements, these will not be exploitable until reasonably priced hardware encoders and decoders supporting scalable MPEG-2 become available. Codecs designed for the Internet require greater bandwidth scalability, lower computational complexity, greater resilience to network losses, and lower encode/decode latency for interactive applications. These requirements imply codecs designed specifically for the diversity and heterogeneity of Internet delivery.
diversity and heterogeneity of Internet delivery. The research on Internet
codecs has broadly taken two directions. DCT based and non-DCT based. DCT based
video delivery, except for MPEG 2, possesses no inherent scalability.
To achieve
adaptivity various operations can be applied to the (semi) compressed data
stream to reduce its bit rate. Amongst these operations is transcoding, the
conversion of one compression standard to another. The beauty of the DCT based
approach is that it is compatible with current and imminent draft compression
standards. Furthermore it allows re-use of existing audio and video archives
without explicitly re-coding them to cater for all possible formats. Existing
viewers also maintain their currency. Non-DCT based compression techniques, e.g.
layered, sub-band, wavelet etc., are intrinsically scalable. This is their great
attraction. Unfortunately although several CODECs exist, they are still
experimental in nature and often suffer from performance problems. In addition,
existing movie libraries would need to be re-coded, by no means a trivial task.
The research projects reviewed in this chapter broadly fall into two categories,
one group is developing scalable video CODECs mainly using sub band coding. The
other group is looking at scalable video in the context of QoS. There is
consensus in the research community that the key to efficient delivery of
continuous media over heterogeneous networks is dynamic bandwidth adaption. A
hybrid wavelet-fractal scheme uses wavelet transforms (Haar) to simplify fractal
compression, by first transforming an image into four quarter-size subband
images using wavelet transform and then compresses the low-low image using
standard fractal method and compresses the other three subband images by
matching reference blocks from the low-low image. Temporal reduncy is unque to
video compression resulting from fractal similarity on the time axis.
Unfortunately, 3-dimensional fractal image compression algorithm is unrealistic
on today's computer capabilities.