Archive-name: mpeg-faq/part2
Last-modified: 1996/06/02
Version: v 4.1 96/06/02
Posting-Frequency: bimonthly

perceptual audio codecs. If you need more information about the
Noise-to-Mask-Ratio (NMR) technology, feel free to contact nmr@iis.fhg.de.

Q: O.K., back to these listening tests. Come on, tell me some results.

A: Well, for details you should study one of those AES papers or MPEG
documents listed above. The main result is that for low bitrates (64 kbps
per channel or below), Layer-3 always scored significantly better than
Layer-2. Another important conclusion is the draft recommendation of the
task group TG 10/2 within the ITU-R. It recommends the use of low bitrate
audio coding schemes for digital sound-broadcasting applications
(doc. BS.1115).

Q: Very interesting! Tell me more about this recommendation!

A: The task group TG 10/2 concluded its work in October 93. The draft
recommendation defines three fields of broadcast applications:

- distribution and contribution links (20 kHz bandwidth, no audible
  impairments with up to 5 cascaded codecs)
  Recommendation: Layer-2 with 180 kbps per channel
- emission (20 kHz bandwidth)
  Recommendation: Layer-2 with 128 kbps per channel
- commentary links (15 kHz bandwidth)
  Recommendation: Layer-3 with 60 kbps for monophonic and 120 kbps for
  stereophonic signals

Q: I see. Medium bitrates - Layer-2, low bitrates - Layer-3. What about a
bitrate of 96 kbps per channel that seems to be "somewhere in between" the
Layer-2 and Layer-3 domains?

A: Interesting question. In fact, a total bitrate of 192 kbps for stereo
music is useful for real applications, e.g. emission via satellite
channels. The ITU-R required that emission codecs should score at least
4.0 on the CCIR impairment scale, even for the most critical material. At
128 kbps per channel, Dolby's AC-2, Layer-2 and Layer-3 all fulfilled this
requirement. Layer-2 got the recommendation mainly because of its
"commonality with the distribution and contribution application". Further
tests for emission were performed at 192 kbps joint-stereo coding. Layer-3
clearly met the requirements; Layer-2 fulfilled them only marginally, with
doubts remaining during further tests with cascaded codecs in 1993. In the
end, the task group decided to pronounce no recommendation for emission at
192 kbps.

Q: Someone told me that in the ITU-R tests, there was some trouble with
Layer-3, specifically on male voice in the German language. Still, Layer-3
got the recommendation for "commentary links". Can you explain that?

A: Yes. For commentary links, the quality requirements for speech were to
be equivalent to 14-bit linear PCM, and for music, some perceptible
impairments were to be tolerated. In the test in 1992, Layer-3 was the
only codec that fulfilled these requirements (e.g. overall monophonic,
Layer-3 scored 3.6 in contrast to Layer-2 at 2.05 - and for male German
speech, Layer-3 scored 4.4 in contrast to Layer-2 at 2.4). Further tests
were performed in 1993 using headphones. They showed that MPEG-1 Layer-3
with monophonic speech (the test item is German male voice) at 60 kbps did
not fully meet the quality requirements. The ITU decided to recommend
Layer-3 and to include a temporary footnote that will be removed as soon
as an improved Layer-3 codec fulfills the requirements completely, i.e.
even with that well-known critical male German speech item (for many other
speech items, Layer-3 has no trouble at all).
Q: O.K., a Layer-2 codec at low bitrates may sound poor today, but
couldn't that be improved in the future? I guess you just told me before
that the encoder is not fixed in the standard.

A: Good thinking! As the sound quality mainly depends on the encoder
implementation, it is true that there is no such thing as a "Layer-N"
quality. So we definitely only know the performance of the reference
codecs used during the international tests. Who knows what will happen in
the future? What we do know now is: today, in MPEG-1 and MPEG-2, Layer-3
provides the best sound quality at low bitrates, far better than Layer-2.
Tomorrow, both Layers may improve. Layer-2 has been designed as a
trade-off between quality and complexity, so the bitstream format allows
only limited innovations. In contrast, even the current reference
Layer-3 codec does not exploit all of the powerful mechanisms inside the
Layer-3 bitstream format.

Q: What other topics do I have to keep in mind? Tell me about the
complexity of Layer-3.

A: O.K. First, we have to distinguish between decoder and encoder, as the
workload is distributed asymmetrically between them, i.e. the encoder
needs much more computation power than the decoder. For a stereo
Layer-3 decoder, you may either use a DSP (e.g. one DSP56002 from
Motorola) or an "ASIC", like the mask-programmed DSP chip MAS 3503 C from
Intermetall, ITT. Some rough requirements are:

  computation power   around 12 MIPS
  Data ROM            2.5 Kwords
  Data RAM            4.5 Kwords
  Program ROM         2 to 4 Kwords
  word length         at least 20 bit

Intermetall (ITT) estimated an overhead of around 30 % chip area for
adding the necessary Layer-3 modules to a Layer-2 decoder. So you need
not worry too much about decoder complexity. For a stereo Layer-3 encoder
achieving reference quality, our current real-time implementations use
two DSP32C (AT&T) and one DSP56002. With the advent of the 21060 (Analog
Devices), even a single-chip stereo encoder comes into view.

Q: Quality, complexity - what about the codec delay?

A: Well, the standard gives some figures of the theoretical minimum
delay:

  Layer-1: 19 ms (<50 ms)
  Layer-2: 35 ms (100 ms)
  Layer-3: 59 ms (150 ms)

The practical values are significantly above that. As they depend on the
implementation, exact figures are hard to give. So the figures in
brackets are just rough thumb values - real codecs may show significantly
higher values.

Q: For some applications, a very short delay is of critical importance:
e.g. in a feedback link, a reporter can only talk intelligibly if the
overall delay is below around 10 ms. Here, do I have to forget about MPEG
audio at all?

A: Not necessarily. In this application, broadcasters may use "N-1"
switches in the studio to overcome this problem - or they may use
equipment with appropriate echo-cancellers. But with many applications,
these delay figures are small enough to present no extra problem. At
least, if one can accept a Layer-2 delay, one can most likely also accept
the higher Layer-3 delay.

Q: Someone told me that, with Layer-3, the codec delay would depend on
the actual audio signal, varying over time. Is this really true?

A: No. The codec delay does not depend on the audio signal. With all
Layers, the delay depends on the actual implementation used in a specific
codec, so different codecs may have different delays. Furthermore, the
delay depends on the actual sample rate and bitrate of your codec.

Q: All in all, you sound as if anybody should use Layer-3 for low
bitrates.
Why on earth do some vendors still offer only Layer-2 equipment for these
applications?

A: Well, maybe because they started to design and develop their systems
rather early, e.g. in 1990. As Layer-2 is identical with MUSICAM, it has
been available since the summer of 1990, at the latest. In that year,
Layer-3 development started, and it was successfully finished at the end
of 1991. So, for a certain time, vendors could only exploit the already
existing part of the new MPEG standard. Now the situation has changed.
All Layers are available, the standard is completed, and new systems may
capitalize on the full features of MPEG audio.

4. Products

Q: What are the main fields of application for Layer-3?

A: Simply put: all applications that need high-quality sound at very low
bitrates to store or transmit music signals. Some examples are:

- high-quality music links via ISDN phone lines (basic rate)
- sound broadcasting via low bitrate satellite channels
- music distribution in computer networks with low demands for channel
  bandwidth and memory capacity
- music memories for solid state recorders based on ROM chips

Q: What kind of Layer-3 products are already available?

A: An increasing number of applications benefit from the advanced
features of MPEG audio Layer-3. Here is a list of companies that
currently sell Layer-3 products. For further information, please contact
these companies directly.

Layer-3 Codecs for Telecommunication:

- AETA, 361 Avenue du Gal de Gaulle (*)
  F-92140 Clamart, France
  Fax: +33-1-4136-1213 (Mr. Fric)
  (*) products announced for 1995
- Dialog 4 System Engineering GmbH, Monreposstr. 57
  D-71634 Ludwigsburg, Germany
  Fax: +49-7141-22667 (Mr. Burkhardtsmaier)
- PKI Philips Kommunikations Industrie, Thurn-und-Taxis-Str. 14
  D-90411 Nuernberg, Germany
  Fax: +49-911-526-3795 (Mr. Konrad)
- Telos Systems, 2101 Superior Avenue
  Cleveland, OH 44114, USA
  Fax: +1-216-241-4103 (Mr. Church)

Speech Announcement Systems:

- Meister Electronic GmbH, Koelner Str. 37
  D-51149 Koeln, Germany
  Fax: +49-2203-1701-30 (Mr. Seifert)

PC Cards (Hardware and/or Software):

- Dialog 4 System Engineering GmbH, Monreposstr. 57
  D-71634 Ludwigsburg, Germany
  Fax: +49-7141-22667 (Mr. Burkhardtsmaier)
- Proton Data, Marrensdamm 12 b
  D-24944 Flensburg, Germany
  Fax: +49-461-38169 (Mr. Nissen)

Layer-3-Decoder-Chips:

- ITT Intermetall GmbH, Hans-Bunte-Str. 19
  D-79108 Freiburg, Germany
  Fax: +49-761-517-2395 (Mrs. Mayer)

Layer-3 Shareware Encoder/Decoder:

- Mailbox System Nuernberg (MSN), Innerer Kleinreuther Weg 21
  D-90408 Nuernberg, Germany
  Fax: +49-911-9933661 (Mr. Hanft)

Shareware (version 1.50) is available for:

- IBM PCs or compatibles with MS-DOS: L3ENC.EXE and L3DEC.EXE should work
  on practically any PC with a 386-type CPU or better. For the encoder, a
  486DX33 or better is recommended. On a 486DX2/66 the current shareware
  decoder performs in 1:3 real-time, and the shareware encoder in 1:14
  real-time (with stereo signals sampled at 44.1 kHz).
- Sun workstations: On a SPARCstation 10, the decoder works in real time,
  the encoder performs in 1:5 real-time.

For more information, refer to chapter 6.

5. Support by Fraunhofer-IIS

Q: I understand that Fraunhofer-IIS has been the main developer of MPEG
audio Layer-3. What can they do for me?

A: The Fraunhofer-IIS focuses on applied research. Its engineers have
profound expertise in real-time implementations of signal-processing
algorithms, especially of Layer-3.
The IIS may support a specific Layer-3 application in various ways:

- detailed information
- technical consulting
- advanced C sources for encoder and decoder
- training-on-the-job
- research and development projects on a contract basis.

For more information, feel free to contact:

- Fraunhofer-IIS, Weichselgarten 3
  D-91058 Erlangen, Germany
  Fax: +49-9131-776-399 (Mr. Popp)

Q: What are the latest audio demonstrations disclosed by Fraunhofer-IIS?

A: At the Tonmeistertagung 11.94 in Karlsruhe, Germany, the IIS
demonstrated:

- real-time Layer-3 decoder software (mono, 32 kHz fs) including sound
  output on a ProAudioSpectrum, running on a 486DX2/66
- playback of Layer-3 stereo files from a CD-ROM that has been produced
  by Intermetall and contains Layer-3 data of up to 15 h of stereo music
  (among others, all Beethoven symphonies); the decoder is a small board
  that is connected to the parallel printer port. It mainly carries 3
  chips: a PLD as data interface, the MAS 3503 C stereo decoder chip, and
  the ASCO digital-analog converter. The board has two cinch connectors
  that allow a very simple connection to the usual stereo amplifier.
- music-from-silicon demonstration, using standard 1 Mbyte EPROMs to
  store 1.5 minutes of CD-like quality stereo music
- music link (with around 6 kHz bandwidth) via V.34 modem at 28.8 kbps
  and one analog phone line

6. Shareware Information

The Layer 3 shareware is copyright Fraunhofer-IIS 1994, 1995. The
shareware packages are available:

- via anonymous ftp from fhginfo.fhg.de (153.96.1.4)
  You may download our Layer-3 audio software package from the directory
  /pub/layer3. You will find the following files:

  For IBM PCs:
    l3v150d1.txt          a short description of the files found in
                          l3v150.zip
    l3v150d1.zip          encoder, decoder and documentation
    l3v150d2.txt          a short description of the files found in
                          l3v150n.zip
    l3v150d2.zip          sample bitstreams

  For SUN workstations:
    l3v150.sun.txt        short description of the files found in
                          l3v100.sun.tar.gz
    l3v150.sun.tar.gz     encoder, decoder and documentation
    l3v150bit.sun.txt     short description of the files found in
                          l3v150bit.sun.tar.gz
    l3v150bit.sun.tar.gz  sample bitstreams

- via direct modem download (up to 14,400 bps)
  Modem telephone number: +49 911 9933662  Name: FHG
  Packet switching network: (0) 262 45 9110 10290  Name: FHG
  (For the telephone number, replace "+" with your appropriate
  international dial prefix, e.g. "011" for the USA.) Follow the menus
  as desired.

- via shipment of diskettes (only including registration)
  You may order a diskette directly from:
    Mailbox System Nuernberg (MSN)
    Hanft & Hartmann
    Innerer Kleinreuther Weg 21
    D-90408 Nuernberg, Germany
  Please note: MSN will only ship a diskette if the registration fee is
  paid in advance. The registration fee is 85 Deutsche Mark (about 50
  US$) (plus sales tax, if applicable) for one copy of the package. The
  preferred method of payment is via credit card. Currently, MSN accepts
  VISA and Master Card / Eurocard / Access credit cards. For details see
  the file REGISTER.TXT found in the shareware package.
  You may reach MSN also via Internet: msn@iis.fhg.de
  or via Fax: +49 911 9933661
  or via BBS: +49 911 9933662  Name: FHG
  or via X25: 0262 45 9110 10290  Name: FHG
  (e.g. in the USA, please replace "+" with "011")

- via email
  You may also get our shareware by a direct request to msn@iis.fhg.de.
  In this case, the shareware is split into about 30 small uuencoded
  parts...
SOFTWARE: MPEG Audio Layer 3 Shareware Codec and Windows Realtime Player

----------------------------------------------------------------
MPEG Audio Codec and Windows REALTIME Player from Fraunhofer IIS
----------------------------------------------------------------

Fraunhofer IIS announces l3enc/l3dec V2.00 and WinPlay3 V1.00. For high
quality audio compression, the shareware l3enc/l3dec V2.00 package is
available for Linux, SUN, NeXT and DOS on
<URL:ftp://ftp.fhg.de/pub/layer3>
Versions for SGI and HP will follow soon. The shareware package for DOS
<URL:ftp://ftp.fhg.de/pub/layer3/l3v200d1.zip>
includes a demo version of WinPlay3, a Windows MPEG Audio Layer 3
realtime player. With MPEG Audio Layer 3 you can get 12:1 compression
with CD-like quality. Instead of 12 MByte / minute (stereo, 44.1 kHz) you
only need about 1 MByte / minute! More information can be found on
<URL:ftp://ftp.fhg.de/pub/layer3/MPEG_Audio_L3_FAQ.html>
or contact <URL:mailto:layer3@iis.fhg.de>

(The modem, diskette and email distribution channels are the same as
listed in chapter 6 above.)

Harald Popp
Audio & Multimedia ("Music is the *BEST*" - F. Zappa)
Fraunhofer-IIS-A, Weichselgarten 3, D-91058 Erlangen, Germany
Phone: +49-9131-776-340  Fax: +49-9131-776-399
email: popp@iis.fhg.de
P.S.: Look out for planetoid #3834!

-------------------------------------------------------------------------------
~Subject: What is MPEG-1+ ?

This was a little mail-talk between harti@harti.de (Stefan Hartmann) and
hgordon@system.xingtech.com.

Q: What is MPEG-1+ ?

It's MPEG-1 at MPEG-2 (CCIR) resolution. It will maybe be used for TV
set-top boxes for broadcasting or video-on-demand projects to enhance the
picture quality.

Q: I see. Is this a new standard ?

No. MPEG-1 allows frame definitions up to 4000x4000 pixels, but that is
usually not used.

Q: So what's different ?

I understand that the effective resolution is approximately 550 x 480.
Typical datarates are 3.5 Mbps - 5.5 Mbps (sports programming and perhaps
movies are higher).

Q: Is the video quality lower than with real MPEG-2 movies ?

The quality is better than cable TV, and in my area, we don't have cable.
They de-interlace and compress the full frames. My understanding is that
this is about 5%-10% less efficient than taking advantage of MPEG-2
interfield motion vectors.
Q: If the fields are deinterlaced, do you see interlace artifacts, i.e. a
moving object in one field has already moved further in one direction
than in the other field ?

Probably the TV receiver also outputs the picture interlaced again to the
TV set, so this does not produce the interlace artifact you see on PCs
with live video windows displaying both fields....

Q: Can you record this on a VCR ? Does the SAT receiver have a video
output, so you can record movies to tape ?

You should be able to record to tape, though they may have some record
blocking hardware which has to be overcome with video stabilizing
hardware.

Q: What kind of realtime encoders do they use at the broadcast station ?

CLI (Compression Labs) is the manufacturer, using C-Cube chipsets (10
CL-4000's per MPEG-1+ encoder).

Q: Is there any written info about this MPEG-1 Plus technology available
on the net ?

Not that I'm aware. Maybe C-Cube has a Web site. [So it's up to you, dear
reader, to find more and to tell me where it is ;o) ]

Frank Gadegast, phade@powerweb.de

-------------------------------------------------------------------------------
~Subject: What is MPEG-2?

MPEG-2 FAQ version 3.7 (May 11, 1995) by Chad Fogg (cfogg@chromatic.com)

The MPEG (Moving Pictures Experts Group) committee began its life in late
1988 by the hand of Leonardo Chiariglione and Hiroshi Yasuda with the
immediate goal of standardizing video and audio for compact discs. Over
the next few years, participation amassed from international technical
experts in the areas of Video, Audio, and Systems, reaching over 200
participants by 1992. By the end of the third year (1990), a syntax
emerged which, when applied to code SIF video and compact disc audio
sample rates at a combined coded bitrate of 1.5 Mbit/sec, approximated
the perceptual quality of consumer video tape (VHS). After demonstrations
proved that the syntax was generic enough to be applied to bit rates and
sample rates far higher than the original primary target application, a
second phase (MPEG-2) was initiated within the committee to define a
syntax for efficient representation of broadcast video. Efficient
representation of interlaced (broadcast) video signals was more
challenging than the progressive (non-interlaced) signals coded by
MPEG-1. Similarly, MPEG-1 audio was capable of directly representing only
two channels of sound. MPEG-2 would introduce a scheme to decorrelate
multichannel discrete surround-sound audio. The need for a third phase
(MPEG-3) was anticipated in 1991 for High Definition Television, although
it was discovered by late 1992 that the MPEG-2 syntax simply scaled with
the bit rate, obviating the third phase. MPEG-4 was launched in late 1992
to explore the requirements of a more diverse set of applications, while
finding a more efficient means of coding low bit rate/low sample rate
video and audio signals. Today, MPEG (video and systems) is the exclusive
syntax of the United States Grand Alliance HDTV specification, the
European Digital Video Broadcasting Group, and the high density compact
disc (led by rivals Sony/Philips and Toshiba).

What is MPEG video syntax ?

MPEG video syntax provides an efficient way to represent image sequences
in the form of more compact coded data. The language of the coded bits is
the syntax. For example, a few tokens can represent an entire block of 64
samples. MPEG also describes a decoding (reconstruction) process where
the coded bits are mapped from the compact representation into the
original, raw format of the image sequence. For example, a flag in the
coded bitstream signals whether the following bits are to be decoded with
a DCT algorithm or with a prediction algorithm. The algorithms comprising
the decoding process are regulated by the semantics defined by MPEG.
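Parsing this kind of syntax is mechanical: a decoder reads the stream a
few bits at a time and branches on what it finds. Here is a minimal,
hypothetical bit reader in C (the names are made up for illustration and
are not taken from the MPEG reference software):

    #include <stdio.h>

    typedef struct {
        const unsigned char *buf;
        unsigned long pos;                  /* current bit position */
    } BitReader;

    /* Read the next n bits, most significant bit first (n <= 24). */
    static unsigned get_bits(BitReader *br, int n)
    {
        unsigned v = 0;
        while (n--) {
            v = (v << 1) |
                ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1);
            br->pos++;
        }
        return v;
    }

    int main(void)
    {
        unsigned char stream[] = { 0xB3, 0x40 };   /* arbitrary bytes */
        BitReader br = { stream, 0 };
        /* e.g. a 1-bit flag selecting between two decoding paths */
        if (get_bits(&br, 1))
            printf("flag set: take one decoding path\n");
        else
            printf("flag clear: take the other path\n");
        return 0;
    }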
This syntax can be applied to exploit common video characteristics such
as spatial redundancy, temporal redundancy, uniform motion, spatial
masking, etc.

MPEG Myths

A brief summary of common myths.

1. Compression ratios over 100:1

Articles in the press and marketing literature often claim that MPEG can
achieve high-quality video at compression ratios over 100:1. These
figures usually include the oversampling factors in the source video. In
reality, the coded sample rate specified in an MPEG image sequence is
usually not much larger than 30 times the specified bit rate.
Pre-compression through subsampling is chiefly responsible for the
3-digit ratios of all video coding methods, including those of the
non-MPEG variety.

2. MPEG-1 is 352x240

Both MPEG-1 and MPEG-2 video syntax can be applied at a wide range of
bitrates and sample rates. The MPEG-1 that most people are familiar with
has parameters of 30 SIF pictures (352 pixels x 240 lines) per second and
a bitrate less than 1.86 Mbit/sec----a combination known as "Constrained
Parameters Bitstreams". This popular interoperability point is promoted
by Compact Disc Video (White Book). In fact, it is syntactically possible
to encode picture dimensions as high as 4095 x 4095 and bitrates up to
100 Mbit/sec. With the advent of the MPEG-2 specification, the most
popular combinations have coagulated into Levels, which are described
later in this text. The two most common are affectionately known as SIF
(e.g. 352 pixels x 240 lines x 30 frames/sec), or Low Level, and CCIR 601
(e.g. 720 pixels/line x 480 lines x 30 frames/sec), or Main Level.

3. Motion compensation displaces macroblocks from previous pictures

Macroblock predictions are formed out of arbitrary 16x16 pixel (or 16x8
in MPEG-2) areas from previously reconstructed pictures. There are no
boundaries which limit the location of a macroblock prediction within the
previous picture, other than the edges of the picture.

4. Display picture size is the same as the coded picture size

In MPEG, the display picture size and frame rate may differ from the size
(resolution) and frame rate encoded into the bitstream. For example, a
regular pattern of pictures in a source image sequence may be dropped
(decimated), and then each picture may itself be filtered and subsampled
prior to encoding. Upon reconstruction, the picture may be interpolated
and upsampled back to the source size and frame rate. In fact, the three
fundamental rates (Source Rate, Coded Rate, and Display Rate) may differ
by several parameters. The MPEG syntax can separately describe Coded and
Display Rates through sequence headers, but the Source Rate is known only
by the encoder.

5. Picture coding types (I, P, B) all consist of the same macroblock
types

All macroblocks within an I picture must be coded Intra (like a baseline
JPEG picture). However, macroblocks within a P picture may be coded
either as Intra or as Non-intra (temporally predicted from a previously
reconstructed picture). Finally, macroblocks within a B picture can be
independently selected as either Intra, Forward predicted, Backward
predicted, or both forward and backward (Interpolated) predicted. The
macroblock header contains an element, called macroblock_type, which can
flip these modes on and off like switches. macroblock_type is possibly
the single most powerful element in the whole of video syntax. Picture
types (I, P, and B) merely enable macroblock modes by widening the scope
of the semantics. The component switches are:

1. Intra or Non-intra
2. Forward temporally predicted (motion_forward)
3. Backward temporally predicted (motion_backward)
   (2+3 in combination represent "Interpolated")
4. conditional replenishment (macroblock_pattern)
5. adaptation in quantization (macroblock_quantizer)
6. temporally predicted without motion compensation
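For illustration, a decoder might carry these switches around as a plain
C struct. This is a minimal sketch; the struct and field names are
hypothetical, not from the standard or any reference decoder:

    #include <stdbool.h>
    #include <stdio.h>

    /* One decoded macroblock_type: each semantic switch as a flag. */
    typedef struct {
        bool intra;              /* 1. Intra or Non-intra              */
        bool motion_forward;     /* 2. forward temporal prediction     */
        bool motion_backward;    /* 3. backward temporal prediction    */
        bool macroblock_pattern; /* 4. conditional replenishment       */
        bool macroblock_quant;   /* 5. new quantizer scale transmitted */
    } MacroblockType;

    /* "Interpolated" mode is simply switches 2 and 3 in combination. */
    static bool is_interpolated(const MacroblockType *mb)
    {
        return mb->motion_forward && mb->motion_backward;
    }

    int main(void)
    {
        MacroblockType mb = { .motion_forward = true,
                              .motion_backward = true };
        printf("interpolated: %d\n", is_interpolated(&mb));
        return 0;
    }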
The first 5 switches are mostly orthogonal (the 6th is derived from the
1st and 2nd in P pictures, and does not exist in B pictures). Some
switches are non-applicable in the presence of others. For example, in an
Intra macroblock, all 6 blocks by definition contain DCT data, therefore
there is no need to signal either the macroblock_pattern or any of the
temporal prediction switches. Likewise, when there is no coded prediction
error information in a Non-intra macroblock, the macroblock_quantizer
signal would have no meaning.

6. Sequence structure is fixed to a specific I, P, B frame pattern

A sequence may consist of almost any pattern of I, P, and B pictures
(there are a few minor semantic restrictions on their placement). It is
common in industrial practice to have a fixed pattern (e.g.
IBBPBBPBBPBBPBB); however, more advanced encoders will attempt to
optimize the placement of the three picture types according to local
sequence characteristics in the context of more global characteristics.
Each picture type carries a penalty when coupled with the statistics of a
particular picture (temporal masking, occlusion, motion activity, etc.).
The variable length codes of the macroblock_type switch provide a direct
clue, but it is the full scope of semantics of each picture type that
spells out the costs and benefits. For example, if the image sequence
changes little from frame to frame, it is sensible to code more B
pictures than P. Since B pictures by definition are never fed back into
the prediction loop (i.e. not used as prediction for future pictures),
bits spent on the picture are wasted in a sense (B pictures are like
temporal spackle). Application requirements also govern picture type
placement: random access points, mismatch/drift reduction, channel
hopping, program indexing, and error recovery & concealment.

The 6 Steps to Claiming Bogusly High Compression Ratios

MPEG video is often quoted as achieving compression ratios over 100:1,
when in reality the sweet spot rests between 8:1 and 30:1. Here's how the
fabled greater-than-100:1 reduction ratio is derived for the popular
Compact Disc Video (White Book) bitrate of 1.15 Mbit/sec.

Step 1. Start with the oversampled rate

Most MPEG video sources originate at a higher sample rate than the
"target" sample rate encoded into the final MPEG bitstream. The most
popular studio signal, known canonically as D-1 or CCIR 601 digital
video, is coded at 270 Mbit/sec. The constant, 270 Mbit/sec, can be
derived as follows:

  Luminance (Y): 858 samples/line x 525 lines/frame x 30 frames/sec
                 x 10 bits/sample ~= 135 Mbit/sec
  R-Y (Cr):      429 samples/line x 525 lines/frame x 30 frames/sec
                 x 10 bits/sample ~= 68 Mbit/sec
  B-Y (Cb):      429 samples/line x 525 lines/frame x 30 frames/sec
                 x 10 bits/sample ~= 68 Mbit/sec
  Total:         27 million samples/sec x 10 bits/sample = 270 Mbit/sec

So, our compression ratio is: 270/1.15... an amazing 235:1 !!
Step 2. Include blanking intervals

Only 720 out of the 858 luminance samples per line contain active picture
information. In fact, the debate over the true number of active samples
is the cause of many hair-pulling cat-fights at TV engineering seminars
and conventions, so it is safer to say that the number lies somewhere
between 704 and 720. Likewise, only 480 of the 525 lines contain active
picture information. Again, the actual number is somewhere between 480
and 496. For the purposes of MPEG-1's and MPEG-2's famous conformance
points (Constrained Parameters Bitstreams and Main Level, respectively),
the number shall be 704 samples x 480 lines for luminance, and 352
samples x 480 lines for each of the two chrominance pictures. Recomputing
the source rate, we arrive at:

  (luminance)   704 samples/line x 480 lines x 30 fps x 10 bits/sample
                ~= 104 Mbit/sec
  (chrominance) 2 components x 352 samples/line x 480 lines x 30 fps
                x 10 bits/sample ~= 104 Mbit/sec
  Total:        ~207 Mbit/sec

The ratio (207/1.15) is now only 180:1.

Step 3. Include higher bits/sample

The MPEG sample precision is 8 bits. Studio equipment often quantizes
samples with 10 bits of accuracy. The 2-bit improvement to the dynamic
range is considered useful for suppressing noise in multi-generation
video. The ratio is now only 180 * (8/10), or 144:1.

Step 4. Include the higher chroma ratio

The famous CCIR 601 studio signal represents the chroma signals (Cb, Cr)
with half the horizontal sample density of the luminance signal, but with
full vertical resolution. This particular ratio of subsampled components
is known as 4:2:2. However, MPEG-1 and MPEG-2 Main Profile specify the
exclusive use of the 4:2:0 format, deemed sufficient for consumer
applications, where both chrominance signals have exactly half the
horizontal and vertical resolution of luminance (the MPEG Studio Profile,
however, centers around the 4:2:2 macroblock structure). Seen from the
perspective of pixels being comprised of samples from multiple
components, the 4:2:2 signal can be expressed as having an average of 2
samples per pixel (1 for Y, 0.5 for Cb, and 0.5 for Cr). Thanks to the
additional reduction in the vertical direction (resulting in a 352 x 240
chrominance frame), the 4:2:0 signal has, in effect, an average of 1.5
samples per pixel (1 for Y, and 0.25 for Cb and Cr each). Our source
video bit rate may now be recomputed as:

  720 pixels x 480 lines x 30 fps x 8 bits/sample x 1.5 samples/pixel
  = 124 Mbit/sec

... and the ratio is now 108:1.

Step 5. Include the pre-subsampled image size

As a final act of pre-compression, the CCIR 601 frame is converted to the
SIF frame by a subsampling of 2:1 in both the horizontal and vertical
directions... or 4:1 overall. Quality horizontal subsampling can be
achieved by the application of a simple FIR filter (7 or 4 taps, for
example), and vertical subsampling by either dropping every other field
(in effect, dropping every other line) or again by an FIR filter
(regulated by an interfield motion detection algorithm). Our rate now
becomes:

  352 pixels x 240 lines x 30 fps x 8 bits/sample x 1.5 samples/pixel
  ~= 30 Mbit/sec !!

... and the ratio is now only 26:1. Thus, the true A/B comparison should
be between the source sequence at the 30 Mbit/sec stage (the actual
specified sample rate in the MPEG bitstream) and the reconstructed
sequence produced from the 1.15 Mbit/sec coded bitstream.

Step 6. Don't forget the 3:2 pulldown

A majority of high-end programs originates from film. Most of the movies
encoded onto Compact Disc Video were captured and reproduced at 24
frames/sec. So, in such an image sequence, 6 out of the 30 frames every
second are in fact redundant and need not be coded into the MPEG
bitstream, leading to the shocking discovery that the actual source bit
rate has really been 24 Mbit/sec all along, and the compression ratio a
mere 21:1 !!!

Even at this seemingly modest ratio, discrepancies will appear between
the 24 Mbit/sec source sequence and the reconstructed sequence. Only
conservative ratios in the neighborhood of 8:1 have demonstrated true
transparency for sequences with complex spatial-temporal characteristics
(i.e. rapid, divergent motion, sharp edges, textures, etc.). However, if
the video is carefully encoded by means of pre-processing and intelligent
distribution of bits, higher ratios can be made to appear at least
artifact-free.
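The arithmetic of the six steps is easy to reproduce. Here is a small C
program that walks the same chain of reductions (note that the roundings
in the text above differ slightly at steps 2 and 3; e.g. step 2 computes
to about 203 Mbit/sec rather than 207):

    #include <stdio.h>

    static void show(const char *step, double rate)
    {
        printf("%-20s %6.1f Mbit/s  ratio %5.1f:1\n",
               step, rate / 1e6, rate / 1.15e6);  /* vs. White Book */
    }

    int main(void)
    {
        /* Step 1: full D-1 rate: 858 + 2*429 samples/line,
           525 lines/frame, 30 frames/s, 10 bits/sample. */
        show("1: oversampled D-1", (858 + 2 * 429) * 525.0 * 30 * 10);

        /* Step 2: active picture only (704x480 luma, 2 x 352x480
           chroma). */
        show("2: minus blanking", (704 * 480 + 2 * 352 * 480) * 30.0 * 10);

        /* Step 3: 8 bits/sample instead of 10. */
        show("3: 8 bits/sample", (704 * 480 + 2 * 352 * 480) * 30.0 * 8);

        /* Step 4: 4:2:0 -> 1.5 samples/pixel (the text uses the
           720-pixel width here). */
        show("4: 4:2:0 chroma", 720 * 480 * 30.0 * 8 * 1.5);

        /* Step 5: SIF, 2:1 subsampling in both directions. */
        show("5: SIF", 352 * 240 * 30.0 * 8 * 1.5);

        /* Step 6: 3:2 pulldown, only 24 of 30 frames are unique. */
        show("6: film source", 352 * 240 * 24.0 * 8 * 1.5);
        return 0;
    }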
What are the parts of the MPEG document?

The MPEG-1 specification (official title: ISO/IEC 11172 Information
technology - Coding of moving pictures and associated audio for digital
storage media at up to about 1.5 Mbit/s, Copyright 1993) consists of five
parts. Each document is a part of the ISO/IEC number 11172. The first
three parts reached International Standard in 1993. Part 4 reached IS in
1994. In mid-1995, Part 5 will go IS.

Part 1---Systems: The first part of the MPEG standard has two primary
purposes: (1) a syntax for transporting packets of audio and video
bitstreams over digital channels and storage media (DSM), and (2) a
syntax for synchronizing video and audio streams.

Part 2---Video: describes syntax (header and bitstream elements) and
semantics (algorithms telling what to do with the bits). Video breaks the
image sequence into a series of nested layers, each containing a finer
granularity of sample clusters (sequence, picture, slice, macroblock,
block, sample/coefficient). At each layer, algorithms are made available
which can be used in combination to achieve efficient compression. The
syntax also provides a number of different means for assisting decoders
in synchronization, random access, buffer regulation, and error recovery.
The highest layer, sequence, defines the frame rate and picture pixel
dimensions for the encoded image sequence.

Part 3---Audio: describes syntax and semantics for three classes of
compression methods. Known as Layers I, II, and III, the classes trade
increased syntax and coding complexity for improved coding efficiency at
lower bitrates. Layer II is the industrial favorite, applied almost
exclusively in satellite broadcasting (Hughes DSS) and compact disc video
(White Book). Layer I has similarities in terms of complexity,
efficiency, and syntax to the Sony MiniDisc and the Philips Digital
Compact Cassette (DCC). Layer III has found a home in ISDN, satellite,
and Internet audio applications. The sweet spots for the three layers are
384 kbit/sec (DCC), 224 kbit/sec (CD Video, DSS), and 128 kbit/sec
(ISDN/Internet), respectively.

Part 4---Conformance: (circa 1992) defines the meaning of MPEG
conformance for all three parts (Systems, Video, and Audio), and provides
two sets of test guidelines for determining compliance in bitstreams and
decoders. MPEG does not directly address encoder compliance.

Part 5---Software Simulation: contains an example ANSI C language
software encoder and compliant decoder for video and audio.
An example systems codec is also provided which can multiplex and
demultiplex separate video and audio elementary streams contained in
computer data files.

As of March 1995, the MPEG-2 volume consists of a total of 9 parts under
ISO/IEC 13818. Part 2 was jointly developed with the ITU-T, where it is
known as recommendation H.262. The full title is: Information
Technology--Generic Coding of Moving Pictures and Associated Audio,
ISO/IEC 13818. The first five parts are organized in the same fashion as
MPEG-1 (Systems, Video, Audio, Conformance, and Software). The four
additional parts are listed below:

Part 6---Digital Storage Medium Command and Control (DSM-CC): provides a
syntax for controlling VCR-style playback and random access of bitstreams
encoded onto digital storage media such as compact disc. Playback
commands include Still frame, Fast Forward, Advance, Goto.

Part 7---Non-Backwards Compatible Audio (NBC): addresses the need for a
new syntax to efficiently decorrelate discrete multichannel surround
sound audio. By contrast, MPEG-2 audio (13818-3) attempts to code the
surround channels as ancillary data to the MPEG-1 backwards-compatible
Left and Right channels. This allows existing MPEG-1 decoders to parse
and decode only the two primary channels while ignoring the side channels
(parse to /dev/null). This is analogous to the Base Layer concept in
MPEG-2 scalable video. NBC candidates include non-compatible syntaxes
such as Dolby AC-3. The final document is not expected until 1996.

Part 8---10-bit video extension: introduced in late 1994, this extension
to the video part (13818-2) describes the syntax and semantics for coded
representation of video with 10 bits of sample precision. The primary
application is studio video (distribution, editing, archiving). Methods
have been investigated by Kodak and Tektronix which employ spatial
scalability, where the 8-bit signal becomes the Base Layer, and the 2-bit
differential signal is coded as an Enhancement Layer. The final document
is not expected until 1997 or 1998. [Part 8 will be withdrawn]

Part 9---Real-time Interface (RTI): defines a syntax for video-on-demand
control signals between set-top boxes and head-end servers.

What is the evolution of an MPEG/ISO document?

In chronological order:

  Abbr.  ISO/Committee notation            Author's notation
  -----  --------------------------------  ---------------------------
  -      Problem (unofficial first stage)  barroom witticism or dare
  NI     New work Item                     Napkin Item
  NP     New Proposal                      Need Permission
  WD     Working Draft                     We're Drunk
  CD     Committee Draft                   Calendar Deadlock
  DIS    Draft International Standard      Doesn't Include Substance
  IS     International Standard            Induced patent Statements

Introductory paper to MPEG?

Didier Le Gall, "MPEG: A Video Compression Standard for Multimedia
Applications," Communications of the ACM, April 1991, Vol. 34, No. 4,
pp. 47-58.

MPEG in periodicals?

The following journals and conferences have been known to contain
information relating to MPEG:

  IEEE Transactions on Consumer Electronics
  IEEE Transactions on Broadcasting
  IEEE Transactions on Circuits and Systems for Video Technology
  Advanced Electronic Imaging
  Electronic Engineering Times (EE Times)
  IEEE Int'l Conference on Acoustics, Speech, and Signal Processing
    (ICASSP)
  International Broadcasting Convention (IBC)
  Society of Motion Pictures and Television Engineers Journal (SMPTE)
  SPIE conference on Visual Communications and Image Processing

MPEG Book?

Several MPEG books are under development. An MPEG book will be produced
by the same team behind the JPEG book, Joan Mitchell and Bill
Pennebaker... along with Didier Le Gall. It is expected to be a tutorial
on MPEG-1 video and some MPEG-2 video (Van Nostrand Reinhold, 1995). A
book in the Japanese language has already been published (ISBN:
4-7561-0247-6); the title is MPEG, by ASCII publishing. Keith Jack's
second edition of Video Demystified, to be published in August 1995, will
feature a large chapter on MPEG video. Information:
ftp://ftp.pub.netcom/pub/kj/kjack/

MPEG is a DCT based scheme?

The DCT and Huffman algorithms receive the most press coverage (e.g.
"MPEG is a DCT based scheme with Huffman coding"), but are in fact less
significant when compared to the variety of coding modes signaled to the
decoder as context-dependent side information. The MPEG-1 and MPEG-2 IDCT
has the same definition as in H.261, H.263, and JPEG.

What are constant and variable bitrate streams?

Constant bitrate streams are buffer regulated to allow continuous
transfer of coded data across a constant rate channel without causing an
overflow or underflow of a buffer on the receiving end. It is the
responsibility of the encoder's Rate Control stage to generate bitstreams
which prevent buffer overflow and underflow. Constant bit rate encoding
can be modeled as a reservoir: variable sized coded pictures flow into
the bit reservoir, but the reservoir is drained at a constant rate into
the communications channel. The most challenging aspect of a constant
rate encoder is, yes, to maintain a constant channel rate (without
overflowing or underflowing a buffer of a fixed depth) while maintaining
constant perceptual picture quality.

In the simplest form, variable rate bitstreams do not obey any buffer
rules, but will maintain constant picture quality. Constant picture
quality is easiest to achieve by holding the macroblock quantizer step
size constant (e.g. level 16 of 31). In its most advanced form, a
variable bitrate stream may be more difficult to generate than a constant
bitrate stream. In advanced variable bitrate streams, the instantaneous
(piece-wise) bit rate may be controlled by factors such as: 1. local
activity measured against activity over large time intervals (e.g. the
full span of a movie), or 2. instantaneous bandwidth availability of a
communications channel.
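The reservoir ("leaky bucket") rule can be sketched in a few lines of C.
This toy simulation tracks decoder buffer fullness with invented picture
sizes and an arbitrary buffer depth; the real VBV model is defined
normatively in the standard, this only illustrates the fill/drain idea:

    #include <stdio.h>

    int main(void)
    {
        const double channel_rate = 1.15e6;  /* bits/s into the decoder */
        const double frame_rate   = 30.0;
        const double vbv_size     = 327680;  /* 40 KB buffer, arbitrary */
        /* Hypothetical coded picture sizes in bits (I, B, B, P, ...). */
        const double pic[6] = { 150000, 8000, 9000, 60000, 7000, 8500 };
        double fullness = vbv_size * 0.8;    /* assumed start fullness  */

        for (int i = 0; i < 6; i++) {
            fullness += channel_rate / frame_rate;   /* channel fills   */
            if (fullness > vbv_size) fullness = vbv_size;
            fullness -= pic[i];                      /* decoder drains  */
            printf("picture %d: buffer %8.0f bits%s\n", i, fullness,
                   fullness < 0 ? "  <-- underflow!" : "");
        }
        return 0;
    }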
Summary of bitstream types:

  constant-rate:
    fixed-rate communications channels like the original Compact Disc,
    digital video tape, single channel-per-carrier broadcast signals,
    hard disk storage.

  simple variable-rate:
    software decoders where the bitstream buffer (VBV) is the storage
    medium itself (very large); the macroblock quantization scale is
    typically held constant over a large number of macroblocks.

  complex variable-rate:
    statistical multiplexing (multiple-channel-per-carrier broadcast
    signals); compact discs and hard disks where the servo mechanisms
    can be controlled to increase or decrease the channel delivery rate;
    networked video where the overall channel rate is constant but
    demand is variably shared by multiple users; bitstreams which
    achieve average rates over very long time averages.

What is statistical multiplexing ?

A progressive explanation: In the simplest coded bitstream, a PCM (Pulse
Code Modulated) digital signal, all samples have an equal number of bits.
Bit distribution in a PCM image sequence is therefore not only uniform
within a picture (bits distributed along zero dimensions), but is also
uniform across the full sequence of pictures.

Audio coding algorithms such as MPEG-1's Layers I and II are capable of
distributing bits over a one-dimensional space, spanned by a frame. In
Layer II, for example, an audio channel coded at a bitrate of 128
kbit/sec and a sample rate of 44.1 kHz will have frames (which consist of
1152 subband coefficients each) coded with approximately 3344 bits. Some
subbands will receive more bits than others.
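That frame arithmetic is easy to check: a Layer II frame covers 1152 PCM
samples, so at 44.1 kHz it spans about 26.1 ms. A one-line check in C:

    #include <stdio.h>

    int main(void)
    {
        const double samples_per_frame = 1152;
        const double sample_rate = 44100;    /* Hz */
        const double bitrate = 128000;       /* bits/s for this channel */

        double frame_sec  = samples_per_frame / sample_rate;
        double frame_bits = bitrate * frame_sec;
        printf("frame length %.2f ms, %.0f bits/frame\n",
               frame_sec * 1e3, frame_bits);  /* ~26.12 ms, ~3344 bits */
        return 0;
    }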
In block-based still image compression methods which employ 2-D transform
coding, bits are distributed over a two-dimensional space (horizontal and
vertical) within the block. Further, blocks throughout the picture may
contain a varying number of bits as a result, for example, of adaptive
quantization. Background sky may contain an average of only 50 bits per
block, whereas complex areas containing flowers or text may contain more
than 200 bits per block. In the typical adaptive quantization scheme,
more bits are allocated to perceptually more complex areas of the
picture. The quantization stepsizes can be selected against an overall
picture normalization constant, to achieve a target bit rate for the
whole picture. An encoder which generates coded image sequences comprised
of independently coded still pictures, such as Motion JPEG video or MPEG
Intra picture sequences, will typically generate coded pictures of equal
bit size.

MPEG non-intra coding introduces the concept of the distribution of bits
across multiple pictures, augmenting the distribution space to 3
dimensions. Bits are now allocated to more complex pictures in the image
sequence, normalized by the target bit size of the group of pictures,
while at a lower layer, bits within a picture are still distributed
according to the more complex areas within the picture. Yet in most
applications, especially those of the constant bitrate class, a
restriction is placed on the encoder which guarantees that after a period
of time, e.g. 0.25 seconds, the coded bitstream achieves a constant rate
(in MPEG, the Video Buffering Verifier regulates the variable-to-constant
rate mapping). The mapping of an inherently variable bitrate coded signal
to a constant rate allows consistent delivery of the program over a
fixed-rate communications channel.

Statistical multiplexing takes the bit distribution model to 4
dimensions: horizontal, vertical, temporal, and the program axis. The 4th
dimension is enabled by the practice of multiplexing multiple programs
(each, for example, with respective video and audio bitstreams) on a
common data carrier. In the Hughes DSS system, a single data carrier is
modulated with a payload capacity of 23 Mbit/sec, but a typical program
will be transported at an average bit rate of 6 Mbit/sec. In the 4-D
model, bits may be distributed according to the relative complexity of
each program against the complexities of the other programs on the common
data carrier. For example, a program undergoing a rapid scene change will
be assigned the highest bit allocation priority, whereas the program with
a near-motionless scene will receive the lowest priority, or fewest bits.
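A sketch of that 4th dimension: dividing a fixed carrier payload among
programs in proportion to some per-program complexity measure. The
complexity numbers below are invented purely for illustration; real
multiplexers use far more elaborate allocation strategies:

    #include <stdio.h>

    int main(void)
    {
        const double payload = 23e6;  /* bits/s (23 Mbit/s, DSS-like)  */
        /* Hypothetical per-program complexity estimates for one frame
           interval (e.g. scene change = high, static scene = low).    */
        const double complexity[4] = { 9.0, 2.5, 1.0, 3.5 };
        double total = 0;
        for (int i = 0; i < 4; i++) total += complexity[i];

        for (int i = 0; i < 4; i++)
            printf("program %d: %.1f Mbit/s\n",
                   i, payload * complexity[i] / total / 1e6);
        return 0;
    }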
How does MPEG achieve compression?

Here are some typical statistical conditions addressed by specific syntax
and semantic tools:

1. Spatial correlation: transform coding with the 8x8 DCT.

2. Human visual response---less acuity for higher spatial frequencies:
lossy scalar quantization of the DCT coefficients.

3. Correlation across wide areas of the picture: prediction of the DC
coefficient in the 8x8 DCT block.

4. Statistically more likely coded bitstream elements/tokens: variable
length coding of macroblock_address_increment, macroblock_type,
coded_block_pattern, motion vector prediction error magnitude, and DC
coefficient prediction error magnitude.

5. Quantized blocks with a sparse matrix of quantized DCT coefficients:
end_of_block token (variable length symbol).

6. Spatial masking: macroblock quantization scale factor.

7. Local coding adapted to overall picture perception (content dependent
coding): macroblock quantization scale factor.

8. Adaptation to local picture characteristics: block based coding,
macroblock_type, adaptive quantization.

9. Constant stepsizes in adaptive quantization: a new quantization scale
factor is signaled only by special macroblock_type codes (the adaptive
quantization scale is not transmitted by default).

10. Temporal redundancy: forward and backward macroblock_type and motion
vectors at macroblock (16x16) granularity.

11. Perceptual coding of the macroblock temporal prediction error:
adaptive quantization and quantization of DCT transform coefficients
(same mechanism as for Intra blocks).

12. Low quantized macroblock prediction error: the absence of prediction
error for the macroblock may be signaled within macroblock_type. This is
the macroblock_pattern switch.

13. Finer granularity coding of the macroblock prediction error: each of
the blocks within a macroblock may be coded or not coded. Selective
on/off coding of each block is achieved with the separate
coded_block_pattern variable-length symbol, which is present in the
macroblock only if the macroblock_pattern switch has been set.

14. Uniform motion vector fields (smooth optical flow fields): prediction
of motion vectors.

15. Occlusion: forwards or backwards temporal prediction in B pictures.
Example: an object becomes temporarily obscured by another object within
an image sequence. As a result, there may be an area of samples in a
previous picture (forward reference/prediction picture) which has similar
energy to a macroblock in the current picture (thus it is a good
prediction), but no areas within a future picture (backward reference)
are similar enough. Therefore only forwards prediction would be selected
by the macroblock_type of the current macroblock. Likewise, a good
prediction may only be found in a future picture, but not in the past. In
most cases, the object, or correlation area, will be present in both
forward and backward references. macroblock_type can select the best of
the three combinations.

16. Sub-sample temporal prediction accuracy: bi-linearly interpolated
(filtered) "half-pel" block predictions. Real-world motion displacements
of objects (correlation areas) from picture to picture do not fall on
integer pel boundaries, but on fractional positions. Half-pel
interpolation attempts to extract the true object to within one order of
approximation, often improving compression efficiency by at least 1 dB
(see the sketch after this list).

17. Limited motion activity in P pictures: skipped macroblocks. Used when
the motion vector is zero for both the horizontal and vertical vector
components, and no quantized prediction error for the current macroblock
is present. Skipped macroblocks are the most desirable element in the
bitstream since they consume no bits, except for a slight increase in the
bits of the next non-skipped macroblock.

18. Co-planar motion within B pictures: skipped macroblocks. Used when
the motion vector is the same as the previous macroblock's, and no
quantized prediction error for the current macroblock is present.
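As promised in item 16, here is a sketch of bilinear half-pel prediction
for a single sample. The rounding follows the usual MPEG convention; this
is illustrative, not a conformance-grade implementation:

    #include <stdio.h>

    /* ref is the reference picture, (x2, y2) the motion-displaced
       position in half-pel units. */
    static int half_pel(const unsigned char *ref, int stride,
                        int x2, int y2)
    {
        int x = x2 >> 1, y = y2 >> 1;      /* integer part   */
        int hx = x2 & 1, hy = y2 & 1;      /* half-pel flags */
        const unsigned char *p = ref + y * stride + x;

        if (!hx && !hy) return p[0];                        /* full pel */
        if (hx && !hy)  return (p[0] + p[1] + 1) >> 1;      /* horiz.   */
        if (!hx && hy)  return (p[0] + p[stride] + 1) >> 1; /* vert.    */
        return (p[0] + p[1] + p[stride] + p[stride + 1] + 2) >> 2;
    }

    int main(void)
    {
        unsigned char ref[2 * 2] = { 10, 20, 30, 40 };
        /* displacement (0.5, 0.5): average of all four neighbours */
        printf("%d\n", half_pel(ref, 2, 1, 1));   /* prints 25 */
        return 0;
    }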
What is the difference between MPEG-1 and MPEG-2 syntax?

Section D.9 of ISO/IEC 13818-2 is an informative piece of text describing
the differences between MPEG-1 and MPEG-2 video syntax. The following is
a little more informal.

Sequence layer: MPEG-2 can represent interlaced or progressive video
sequences, whereas MPEG-1 is strictly meant for progressive sequences,
since the target application was Compact Disc video coded at 1.2
Mbit/sec. MPEG-2 changed the meaning behind the aspect_ratio_information
variable, while significantly reducing the number of defined aspect
ratios in the table. In MPEG-2, aspect_ratio_information refers to the
overall display aspect ratio (e.g. 4:3, 16:9), whereas in MPEG-1, the
ratio refers to the particular pixel. The reduction in the entries of the
aspect ratio table also helps interoperability by limiting the number of
possible modes to a practical set, much like frame_rate_code limits the
number of display frame rates that can be represented. Optional picture
header variables called display_horizontal_size and display_vertical_size
can be used to code unusual display sizes.

frame_rate_code in MPEG-2 refers to the intended display rate, whereas in
MPEG-1 it referred to the coded frame rate. In film source video, there
are often 24 coded frames per second. Prior to bitstream coding, a good
encoder will eliminate the redundant 6 frames or 12 fields from a 30
frame/sec video signal which encapsulates an inherently 24 frame/sec
video source. The MPEG decoder or display device will then repeat frames
or fields to recreate or synthesize the 30 frame/sec display rate. In
MPEG-1, the decoder could only infer the intended frame rate, or derive
it based on the Systems layer time stamps. MPEG-2 provides specific
picture header variables called repeat_first_field and top_field_first
which explicitly signal which frames or fields are to be repeated, and
how many times. To address the concern of software decoders which may
operate at rates lower than or different from the common television
rates, two new variables in MPEG-2 called frame_rate_extension_d and
frame_rate_extension_n can be combined with frame_rate_code to specify a
much wider variety of display frame rates. However, in the current set of
defined profiles and levels, these two variables are not allowed to
change the value specified by frame_rate_code. Future extensions or
Profiles of MPEG may enable them.

In interlaced sequences, the coded macroblock height (mb_height) of a
picture must be a multiple of 32 pixels, while the width, as in MPEG-1,
is a coded multiple of 16 pixels. A discrepancy between the coded width
and height of a picture and the variables horizontal_size and
vertical_size, respectively, occurs when either variable is not an
integer multiple of macroblocks. All pixels must be coded within
macroblocks, since there cannot be such a thing as a fractional
macroblock. Never intended for display, these overhang pixels or lines
exist along the right and bottom edges of the coded picture. The sample
values within these trims can be arbitrary, but they can affect the
values of samples within the current picture, and especially future coded
pictures. In the current picture, pixels which reside within the same 8x8
block as the overhang pixels are affected by the ripples of DCT
quantization error. In future coded pictures, their energy can propagate
anywhere within an image sequence as a result of motion compensated
prediction. An encoder should fill in values which are easy to code, and
should probably avoid creating motion vectors which would cause the
Motion Compensated Prediction stage to extract samples from these areas.
The application should probably select horizontal_size and vertical_size
values that are already multiples of 16 (or 32 in the vertical case of
interlaced sequences) to begin with.
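The coded-size arithmetic can be sketched in a few lines of C, under the
rules just described (1920x1080 is simply a convenient illustration):

    #include <stdio.h>

    int main(void)
    {
        /* Example: a 1920x1080 interlaced sequence. */
        int horizontal_size = 1920, vertical_size = 1080;
        int interlaced = 1;

        int mb_width  = (horizontal_size + 15) / 16; /* round up to 16 */
        int v_align   = interlaced ? 32 : 16;        /* 32 interlaced  */
        int mb_height = (vertical_size + v_align - 1) / v_align
                        * (v_align / 16);

        printf("coded size %dx%d (%dx%d macroblocks)\n",
               mb_width * 16, mb_height * 16, mb_width, mb_height);
        /* prints: coded size 1920x1088 (120x68 macroblocks) --
           8 overhang lines never intended for display. */
        return 0;
    }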
Group of Pictures: The Group of Pictures layer is optional in MPEG-2. It
is a header useful only for establishing a SMPTE time code or for
indicating that certain B pictures at the beginning of an edited sequence
comprise a broken_link. This occurs when a B picture requires prediction
from a forward reference frame (previous in time to the current picture)
that has been removed from the bitstream by an editing process. In
MPEG-1, the Group of Pictures header is mandatory, and must follow a
sequence header.

Picture layer: In MPEG-2, a frame may be coded progressively or
interlaced, signaled by the progressive_frame variable. Interlaced frames
(progressive_frame==0) may be coded either as a frame picture
(picture_structure==frame) or as two separately coded field pictures
(picture_structure==top_field or picture_structure==bottom_field).
Progressive frames are a logical choice for video material which
originated from film, where all pixels are integrated or captured at the
same time instant. Most electronic cameras today capture pictures in two
separate stages: a top field, consisting of all the odd lines of the
picture, is captured in one time instant, followed by a bottom field of
all the even lines. Frame pictures provide the option of coding each
macroblock locally as either field or frame. An encoder may choose field
pictures to save memory storage or to reduce the end-to-end
encoder-decoder delay by one field period.

There is no longer such a thing as D pictures in MPEG-2 syntax. However,
Main Profile @ Main Level MPEG-2 decoders, for example, are still
required to decode D pictures at Main Level (e.g. 720x480x30 Hz). The
usefulness of D pictures, a concept from the year 1990, had evaporated by
the time MPEG-2 solidified in 1993.

repeat_first_field was introduced in MPEG-2 to signal that a field or
frame from the current frame is to be repeated for purposes of frame rate
conversion (as in the 30 Hz display vs. 24 Hz coded example above). On
average, in a 24 frame/sec coded sequence, every other coded frame would
signal the repeat_first_field flag. Thus the 24 frame/sec (or 48
field/sec) coded sequence would become a 30 frame/sec (60 field/sec)
display sequence. This process has been known for decades as 3:2
pulldown. Most movies seen on NTSC displays since the advent of
television have been displayed this way. Only within the past decade has
it become possible to interpolate motion to create 30 truly unique frames
from the original 24. Since the repeat_first_field flag is independently
determined in every frame structured picture, the actual pattern can be
irregular (it doesn't literally have to be every other frame). An
irregularity would occur during a scene cut, for example.
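The regular 3:2 pulldown pattern is easy to express in code. A sketch,
assuming the idealized every-other-frame pattern described above (real
encoders may deviate, e.g. at scene cuts):

    #include <stdio.h>

    /* Map 24 coded frames (one second of film) onto 60 display fields
       by setting repeat_first_field on every other coded frame. */
    int main(void)
    {
        int fields = 0;
        for (int frame = 0; frame < 24; frame++) {
            int repeat_first_field = frame & 1;  /* every other frame */
            fields += 2 + repeat_first_field;    /* 2 or 3 fields     */
        }
        printf("24 coded frames -> %d display fields\n", fields); /* 60 */
        return 0;
    }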
Slice: To aid implementations which break the decoding process into
parallel operations along horizontal strips within the same picture,
MPEG-2 introduced a general semantic requirement that every macroblock
row must start and end with at least one slice. Since a slice commences
with a start code, it can be identified by inexpensively parsing through
the bitstream along byte boundaries. Before, an implementation might have
had to parse all the variable length tokens between each slice (thereby
completing a significant stage of the decoding process in advance) to
know the exact position of each macroblock within the bitstream. In
MPEG-1, it was possible to code a picture with only a single slice.
Naturally, the mandatory slice-per-macroblock-row restriction also
facilitates error recovery.

MPEG-2 also added the concept of the slice_id. This optional 6-bit
element signals which picture a particular slice belongs to. In badly
mangled bitstreams, the location of the picture headers could become
garbled. slice_id allows a decoder to place a slice in the proper
location within a sequence. Other elements in the slice header, such as
slice_vertical_position, and the macroblock_address_increment of the
first macroblock in the slice, uniquely identify the exact macroblock
position of the slice within the picture. Thus, within a window of 64
pictures, a lost slice can find its way home.

Macroblock: motion vectors are now always represented on a half-pel grid.
The usefulness of an integer-pel grid (an option in MPEG-1) diminished
with practice, and intrinsic half-pel accuracy encourages encoders to
exploit the significant coding gain which half-pel interpolation offers.

In both MPEG-1 and MPEG-2, the dynamic range of motion vectors is
specified on a picture basis. A set of pictures corresponding to a rapid
motion scene may need a motion vector range of up to +/- 64 integer
pixels. A slower moving interval of pictures may need only a +/- 16
range. Due to the syntax by which motion vectors are signaled in a
bitstream, pictures with little motion would suffer unnecessary bit
overhead in describing motion vectors in a coordinate system established
for a much wider range. MPEG-1's f_code picture header element prescribed
a radius shared by horizontal and vertical motion vector components
alike. It later became practice in industry to use a greater horizontal
search range (motion vector radius) than vertical, since motion tends to
be more prominent across the screen than up or down. Secondly, a decoder
has a limited frame buffer size in which to store both the current
picture under decoding and the set of pictures (forward, backward) used
for prediction (reference) by subsequent pictures. A decoder can write
over the pixels of the oldest reference picture as soon as it is no
longer needed by subsequent pictures for prediction. A restricted
vertical motion vector range creates a sliding window, which starts at
the top of the reference picture and moves down as the macroblocks in the
current picture are decoded in raster order. The moment a strip of pixels
passes outside this window, it has ended its life in the MPEG decoding
loop. As a result of all this, MPEG-2 split the range into separate
horizontal and vertical specifiers (f_code[][0] for horizontal, and
f_code[][1] for vertical), and placed greater restrictions on the maximum
vertical range than on the horizontal range. In Main Level frame
pictures, this range is [-128, +127.5] vertically and [-1024, +1023.5]
horizontally. In field pictures, the vertical range is restricted to
[-64, +63.5].
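The range associated with an f_code value f follows a simple rule
(assuming the standard motion-vector decoding arithmetic): in full-pel
terms it is [-8*2^(f-1), +8*2^(f-1) - 0.5]. A quick check in C:

    #include <stdio.h>

    int main(void)
    {
        for (int f = 1; f <= 9; f++) {
            int r = 8 << (f - 1);          /* 8 * 2^(f-1) full pels */
            printf("f_code %d: [%5d, %+8.1f]\n", f, -r, r - 0.5);
        }
        return 0;
    }
    /* f_code 5 gives [-128, +127.5] and f_code 8 gives
       [-1024, +1023.5], matching the Main Level frame-picture
       limits quoted above. */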
Macroblock stuffing is now illegal in MPEG-2. The original intent behind
stuffing in MPEG-1 was to provide a means for finer rate control
adjustment at the macroblock layer. Since no self-respecting encoder
would waste bits on such an element (it does not contribute to the
refinement of the reconstructed video signal), and since this unlimited
loop of stuffing variable length codes represents a significant headache
for hardware implementations which have a fixed window of time in which
to parse and decode a macroblock in a pipeline, the element was
eliminated from the MPEG-2 syntax in January 1993. Some feel that
macroblock stuffing was beneficial since it permitted macroblocks to be
coded along byte boundaries. A good compromise could have been a limited
number of stuffs per macroblock. If stuffing is needed for purposes of
rate control, an encoder can pad extra zero bytes before the start code
of the next slice. If stuffing is required in the last row of macroblocks
of the picture, the picture start code of the next picture can be padded
with an arbitrary number of bytes. If the picture happens to be the last
in the sequence, the sequence_end_code can be stuffed with zero bytes.

The dct_type flag in both Intra and non-Intra coded macroblocks of frame
structured pictures signals whether the reconstructed samples output by
the IDCT stage shall be organized in field or frame order. This flag
provides an encoder with a sort of poor man's motion_type, adapting to
the interparity (i.e. interfield) characteristics of the macroblock
without signaling a need for motion vectors via the macroblock_type
variable. dct_type plays an essential role in Intra frame pictures by
organizing lines of a common parity together when there is significant
interfield motion within the macroblock. This increases the decorrelation
efficiency of the DCT stage. For non-intra macroblocks, dct_type
organizes the lines of the macroblock prediction error (16 lines of
luminance, 8 lines of chrominance). In combination with motion_type, the
meaning is:

  dct_type  motion_format  interpretation
  --------  -------------  ----------------------------------------
  frame     Intra          coded block data is frame correlated
  field     Intra          coded block data is more strongly
                           correlated along lines of opposite parity