A streaming media and video terms glossary that contains definitions of acronyms, technologies and techniques. The definitions are related to live streaming, broadcasting and video hosting.
These video terms are relevant for both new techniques and legacy methods, which still have ramifications today when handling older media. The emphasis is on online video applications, although a few terms have roots in older methodology and processes. The glossary will be continuously updated as the industry evolves. Links to learn more and to relevant articles will be added over time as well.
2:3 Pull Down (aka: Three-two Pulldown)
A process used to convert material from film to interlaced NTSC display rates, from 24 to 29.97 frames per second. This is done by duplicating fields, 2 from one frame and then 3 from the next frame, or vice versa.
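As a rough illustration of the cadence described above, the Python sketch below spreads four film frames (labelled A to D) across ten fields, which pair up into five interlaced video frames. The function name and frame labels are made up for this example, and it ignores field parity and the slight slowdown needed to reach exactly 29.97 frames per second.

```python
def pulldown_2_3(film_frames):
    """Repeat each film frame as 2 fields, then 3 fields, alternating."""
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 2 if index % 2 == 0 else 3
        fields.extend([frame] * repeats)
    # Pair consecutive fields into interlaced video frames.
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]


video_frames = pulldown_2_3(["A", "B", "C", "D"])
print(video_frames)
```

Four film frames become five video frames, which is the same 4:5 ratio that takes 24 frames per second up to roughly 30.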
4K
Refers to the resolution of the video content. 4K can mean a variety of different actual resolutions, although all are over 7 million pixels in their display. Example resolutions include 4,096 x 2,160 and 3,840 x 2,160, with the latter being the recommended resolution at IBM Cloud Video.
608 Captions (aka: line 21 captions, EIA-608, CEA-608)
These captions contain white text against a black box that surrounds the text. They appear on top of video content and support four caption tracks.
708 Captions (aka: CEA-708)
These captions were designed with digital distribution of content in mind. They are more flexible than the older 608 captions, allowing for more caption tracks, more character types and the ability to modify the appearance.
AAC (aka: Advanced Audio Coding)
This audio coding format is lossy, featuring compression that does impact the audio quality. It offers better compression and increased sample frequency when compared to MP3.
AC-3 (aka: Audio Codec 3, Advanced Codec 3, Acoustic Coder 3)
A Dolby Digital audio format found on many home media releases. Dolby Digital is a lossy format, featuring compression that will impact audio quality. The technology is capable of utilizing up to six different channels of sound. The most common surround experience is a 5.1 presentation.
Adaptive Streaming (aka: Adaptive Bitrate Streaming)
This streaming approach offers multiple streams of the same content at varying qualities. These streams are served inside the same video player and often differ based on bitrate and resolution. Ideally the player should serve the viewer the bitrate most appropriate to their setup, based on qualifications like download speed. See our guide to learn more about adaptive streaming.
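The selection step can be sketched in a few lines of Python. This is only an illustration: the rendition ladder, the `pick_rendition` name and the 0.8 safety factor are assumptions for the example, and real players also weigh factors such as buffer level and screen size.

```python
# Hypothetical rendition ladder: (label, bitrate in kbps), lowest first.
RENDITIONS = [("240p", 400), ("480p", 1200), ("720p", 2500), ("1080p", 5000)]


def pick_rendition(download_kbps, safety_factor=0.8):
    """Return the highest rendition whose bitrate fits the usable bandwidth."""
    usable = download_kbps * safety_factor
    chosen = RENDITIONS[0]  # fall back to the lowest quality
    for rendition in RENDITIONS:
        if rendition[1] <= usable:
            chosen = rendition
    return chosen


print(pick_rendition(4000))
print(pick_rendition(300))
```

A viewer measured at 4,000 kbps would be given the 720p stream here, while a viewer at 300 kbps falls back to the lowest rendition.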
Aspect Ratio
This relates to the width and height of a video, expressed as a ratio. The most common aspect ratios for video are 4:3 and 16:9. These are sometimes expressed as 1.33:1 (4:3 or “full screen”, which came from fitting older TV sets) and 1.78:1 (16:9 or widescreen). For film, other common aspect ratios include 1.85:1 and 2.35:1 (CinemaScope, TohoScope and other cinematic formats).
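To show how the two notations relate, here is a small Python sketch that reduces a pixel resolution to its ratio and its decimal form. The function name is chosen for this example only.

```python
from math import gcd


def aspect_ratio(width, height):
    """Return the reduced ratio (e.g. '16:9') and its decimal form (e.g. 1.78)."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}", round(width / height, 2)


print(aspect_ratio(1920, 1080))
print(aspect_ratio(640, 480))
```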
B-frames (aka: bi-directional Predicted Frames)
These frames follow another frame and only contain part of the image in a video. B-frames look backward and forward to a previous or later p-frame or keyframe (i-frame) and only contain new information not already presented.
B-roll
This is supplemental footage that offers additional options for editors when creating a final cut of a video. This can be audience shots, different angles and more. It is often used to liven up video presentations. For example, a presentation at a trade show might be enhanced by inserting b-roll footage of the booth to show activity. It is also commonly used in interviews as content to cut away to. The term originates from traditional film, where editors used a roll “A” and roll “B” of identical footage to cut from.
Bandwidth
In relation to video, bandwidth describes either an internet connection speed or a form of consumption in relation to web hosting. As a speed, it is a point of reference for an internet connection, which matters for streaming because a viewer has to have enough bandwidth in order to watch. For web hosting, bandwidth is a measure of how much data is consumed.
Bit Rate (aka: data rate or bitrate)
The amount of data per unit of time. For streaming, this is in the context of video and audio content and is usually measured per second, expressed in kilobits per second (kbps) or megabits per second (Mbps).
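Because bit rate is data per unit of time, it can be turned into an approximate data volume. The Python sketch below assumes a constant bit rate and decimal units (1 kilobit = 1,000 bits, 1 megabyte = 1,000,000 bytes); the function name is illustrative.

```python
def stream_size_megabytes(bitrate_kbps, duration_seconds):
    """Approximate data volume of a constant-bitrate stream, in megabytes."""
    bits = bitrate_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000


# One hour of video at 2,500 kbps.
print(stream_size_megabytes(2500, 3600))
```

Under these assumptions, an hour at 2,500 kbps works out to 1,125 megabytes.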
Bounce Lighting
A technique that involves bouncing light off a reflective surface onto the subject. This is done in order to achieve a softer, less harsh lighting effect as opposed to shining the light directly on the subject. It can also give the subject a more natural, even look.
Buffering
Video streaming involves sending chunks of video data to an end user. The video player builds a buffer of chunks that have not yet been viewed, which lets the viewer keep watching from the buffer if a chunk is lost or delayed. Ideally the missing chunk is received before the buffer is emptied, causing no disruption in viewing. However, a viewer’s connection speed can be poor enough that the chunk does not arrive before the buffer is empty. If this occurs, playback stops, and the player will generally show a buffering message while it waits for the missing chunk and attempts to rebuild the buffer.
CDN (aka: Content Delivery Network)
These are large networks of servers that have copies of data, pulled from an origin server, and are often geographically diverse in their location. The end user pulls the needed resources from the server that is closest to them, which is called an edge server. This process is done to decrease any delays that might be caused due to server proximity to the end user, as larger physical distances will result in longer delays, and ideally avoid congestion issues. Due to the resource intensive process of video streaming, most streaming platforms utilize a CDN. Read more on content delivery networks.
Closed Captions
This is text information usually combined with video content. It transcribes the dialogue being spoken and also describes events that might be audio only. For example, a door closing off screen might create a loud noise that the captions can specify. This is how they differ from subtitles and why they are aimed at assisting the hearing impaired.
CRTP (aka: Compressed Real Time Transport Protocol)
This is a compressed form of RTP. It was designed to reduce the size of the headers for the IP, UDP (User Datagram Protocol) and RTP. For best performance, it needs to work with a fast and dependable network or can experience long delays and packet loss.
Deinterlacing
Deinterlacing filters combine the two alternating fields found in interlaced video to form a clean shot in a progressive video. Without deinterlacing, the interlaced content will often display motion with a line-like appearance. Read more on deinterlacing for streaming.
Depth of Field (aka: DOF)
This relates to the nearest and furthest objects in view that appear to be in focus. As a result, a deep depth of field will showcase nearly everything inside the frame in sharp focus. A shallow depth of field, on the other hand, will only have a narrow range of focus inside the video. For example, an interview that has the individual in focus but the background out of focus would be a shallow depth of field.
Digital Zoom
Unlike an actual optical lens change, this process gives the appearance of zooming in by cropping the image to a smaller portion of the available video frame. It maintains the same aspect ratio, but reduces the quality of the image to achieve this effect.
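The cropping step can be sketched as a simple calculation. In this Python example, the function name and the choice to crop around the center of the frame are assumptions for illustration; the cropped region keeps the original aspect ratio and would then be scaled back up to the full frame size, which is where the quality loss comes from.

```python
def digital_zoom_crop(width, height, zoom):
    """Return (left, top, crop_width, crop_height) for a centered crop."""
    crop_w = round(width / zoom)
    crop_h = round(height / zoom)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, crop_w, crop_h


# A 2x digital zoom on a 1920x1080 frame keeps only a 960x540 region.
print(digital_zoom_crop(1920, 1080, 2))
```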
ECDN (aka: Enterprise Content Delivery Network)
Generally an on-premise solution that empowers scaling video assets around a central location. This can include a school or office and reduces strain on the internal connection. For example, rather than having to send 100 high definition live streams to one office and greatly taxing the available download speed, ECDN would facilitate being sent one version and then distributing that to reduce strain on the network. Learn more about ECDN and monitoring network performance.
Embedded Player
This is a media player that is enclosed in a web source, which can range dramatically from being seen in an HTML document on a website to a post on a forum. Players will vary based on appearance, features and available end user controls. An iframe embed, which can be used to embed a variety of content, is one of the most common methods of embedding a video player.
Encoding
Takes source content and converts it into a digital format. Often used in the context of encoders, which can be software or hardware based, that take live video sources and convert that content to be live streamed in a digital format. Although the term is often used interchangeably with transcoding, encoding by definition takes an analog source and digitizes that content.
H.264 (aka: MPEG-4 Part 10, Advanced Video Coding, MPEG-4 AVC)
A video compression technology, commonly referred to as a codec, that is defined in Part 10 of the MPEG-4 specification. H.264 video is most commonly stored in the MP4 container format.
HD (aka: High Definition)
Refers to the resolution of the video content. Typically this relates to 1080p, 1080i and 720p resolution content. Each of these has a different pixel count, with 1080p being 1920×1080 (2,073,600 pixels), 1080i being 1920×1080 delivered as alternating 1920×540 fields (1,036,800 pixels per field) and 720p being 1280×720 (921,600 pixels). The difference between 1080i and 1080p is the type of content, with 1080i being interlaced and 1080p being progressive. This is why 1080i has far fewer pixels at any one moment, as only half of the image’s lines are transmitted at a time.
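The pixel counts quoted above can be reproduced with a little arithmetic. In this Python sketch (names are illustrative), an interlaced format is counted as one field at a time, meaning half of the frame’s lines.

```python
# (width, height, interlaced) for each common HD format.
FORMATS = {
    "1080p": (1920, 1080, False),
    "1080i": (1920, 1080, True),
    "720p": (1280, 720, False),
}


def pixels_per_pass(width, height, interlaced):
    """Pixels sent per frame (progressive) or per field (interlaced)."""
    lines = height // 2 if interlaced else height
    return width * lines


for name, spec in FORMATS.items():
    print(name, pixels_per_pass(*spec))
```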
HDS (aka: HTTP Dynamic Streaming)
Adobe’s HTTP Dynamic Streaming is an HTTP-based technology for adaptive streaming. It segments the video content into smaller video chunks, allowing switching between bit rates when viewing.
HLS (aka: HTTP Live Streaming)
Apple’s HTTP Live Streaming is an adaptive streaming technology. It functions by breaking down the stream into smaller MPEG-2 TS files. These files vary by bitrate and often resolution, and ideally are served to the viewer based on the criteria of their setup, such as download speed.
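In HLS, the available bitrates are advertised to the player in a master playlist, a plain text file that lists each variant stream. The Python sketch below builds a minimal one. The bandwidth values, resolutions and playlist file names are hypothetical, and a production playlist would usually carry more attributes, such as codecs.

```python
def master_playlist(variants):
    """Build a minimal HLS master playlist from (bandwidth, resolution, uri)."""
    lines = ["#EXTM3U"]
    for bandwidth_bps, resolution, uri in variants:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth_bps},RESOLUTION={resolution}"
        )
        lines.append(uri)
    return "\n".join(lines) + "\n"


# Hypothetical variants: bandwidth is in bits per second.
playlist = master_playlist([
    (800000, "640x360", "low/index.m3u8"),
    (2500000, "1280x720", "mid/index.m3u8"),
    (5000000, "1920x1080", "high/index.m3u8"),
])
print(playlist)
```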
HTML5 Video Player
Usually this term relates to a tag compatible with HTML5 that plays video content. This is an in-browser solution, done through the HTML5 <video> element. However, the term sometimes just refers to technology that works in browsers that support HTML5 rather than to the specific element. For more details and compatibility, read our HTML5 Video Player vs. Flash article.
Interlaced Video
A technique used for television video formats, such as NTSC and PAL, in which each full frame of video actually consists of alternating lines taken from two separate fields captured at slightly different times. The two fields are then interlaced or interleaved into the alternating odd and even lines of the full video frame. When displayed on television equipment, the alternating fields are displayed in sequence, depending on the field dominance of the source material.
IP Camera (aka: Internet Protocol Camera)
A digital camera that can both send and receive data via the Internet or computer network. These cameras are designed to support a limited number of users that could connect directly to the camera to view. They are RTSP (Real Time Streaming Protocol) based, and for that reason are not largely supported by broadcasting platforms without using special encoders.
Jump Cut
A jarring transition from scene to scene, most often related to something that should have appeared sequential. For example, a man can be videotaped walking from left to right but suddenly jumps in the frame to a position he wasn’t seen walking to. It can be used artistically, but also has a reputation for being a sign of a less polished production.
Keyframe (aka: i-frame, Intra Frame)
This is the full frame of the image in a video. Subsequent frames only contain the information that has changed between frames. This process is done to compress the video content. Read more on keyframes and video compression.
Key Frame Interval (aka: Keyframe Interval)
Set inside the encoder or when the video is being encoded, the key frame interval controls how often a keyframe is created in the video. The keyframe is a full frame of the image. Other frames will generally only contain the information that has changed.
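Encoders commonly express the key frame interval either in seconds or in frames, and converting between the two only needs the frame rate. The Python sketch below uses illustrative function names and assumes a constant frame rate with keyframes placed at fixed positions.

```python
def keyframe_interval_frames(frame_rate, interval_seconds):
    """Convert a key frame interval in seconds to a number of frames."""
    return round(frame_rate * interval_seconds)


def keyframe_indices(total_frames, interval_frames):
    """Frame numbers that would be encoded as keyframes."""
    return list(range(0, total_frames, interval_frames))


interval = keyframe_interval_frames(30, 2)
print(interval)
print(keyframe_indices(150, interval))
```

At 30 frames per second with a 2 second interval, every 60th frame is a keyframe and the frames in between only carry changes.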
Live Streaming
Relates to media content being delivered live over the Internet. The process involves a source (video camera, screen captured content, etc), an encoder to digitize the feed (Teradek VidiU, Telestream Wirecast, etc), and a platform such as Ustream or another provider that will typically take the feed and publish it over a CDN (Content Delivery Network). Content that is live streamed will typically have a delay on the order of seconds compared to the source.
Lossless
Lossless encoding is any compression scheme, especially for audio and video data, that uses a nondestructive method that retains all the original information. Consequently, lossless compression does not degrade sound or video quality, meaning the original data can be completely reconstructed from the compressed data.
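The defining property, that the original can be rebuilt exactly, is easy to demonstrate. This Python sketch uses the standard library’s zlib module, which is a general-purpose lossless compressor rather than an audio or video codec; the sample data is made up.

```python
import zlib

# Highly repetitive sample data compresses well.
original = b"frame-data " * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)
```

The restored bytes match the original exactly, which is what separates a lossless scheme from a lossy one.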
Lossy
Lossy encoding is any compression scheme, especially for audio and video data, that removes some of the original information in order to significantly reduce the size of the compressed data. Lossy image and audio compression schemes such as JPEG and MP3 try to eliminate information in subtle ways so that the change is barely perceptible, and sound or video quality is not seriously degraded.
MPEG-DASH (aka: Dynamic Adaptive Streaming over HTTP)
An adaptive bitrate streaming technology. It uses both the encoded audio and video streams along with manifest files that identify the streams. The process involves breaking down the video stream into small files delivered over HTTP. These files allow playback to switch from one bitrate to another.
MPEG-TS (aka: Transport Stream, MTS, TS)
A container format that hosts packetized elementary streams for transmitting MPEG video muxed with other streams. It can also have separate streams for video, audio and closed captions. It’s commonly used for digital television and streaming across networks, including the internet.
Optical Zoom
Depends upon the lens’ ability to change the focal length, creating an image that appears either closer to or further away from the subject. This is often achieved through extending the lens, making it physically closer to the subject as well, although the effect really comes from shifting the internal arrangement of the lens. This is in contrast to digital zooming, which simulates zooming in by cropping the image.
P-frames (aka: Predictive Frames, Predicted Frames)
The p-frame follows another frame and only contains part of the image in a video. P-frames look backwards to a previous p-frame or keyframe for redundancies.
Program Stream (aka: PS)
These streams are optimized for efficient storage. They contain elementary streams without an error detection or correction process, and they assume the decoder has access to the entire stream for synchronization purposes. Consequently, program streams are often found in physical media formats, such as DVDs or Blu-rays.
Progressive Video
A video track that consists of complete frames without interlaced fields. Each individual frame is a coherent image at a single moment in time. This means a video could be paused and the entire image could be seen. All streaming files are progressive, and this should not be confused with the process of keyframes and p or b frames.
Reverse Telecine (aka: Inverse Telecine, IVTC)
This is a process used to reverse the effect of 3:2 pull down. This is achieved by removing the extra fields that were inserted to stretch 24 frame per second film to 29.97 frames per second interlaced video.
RTMP (aka: Real Time Messaging Protocol)
A TCP-based protocol that allows for low-latency communication. In the context of video, it allows for delivering live and on demand media content that can be viewed over Adobe Flash applications, although the source can be modified for other playback methods.
RTP (aka: Real Time Transport Protocol)
A network protocol designed to deliver video and audio content over IP networks that runs on top of UDP. The components of RTP include a sequence number, a payload identification, frame indication, source identification, and intramedia synchronization.
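These components map onto the 12-byte fixed header that begins each RTP packet. The Python sketch below unpacks that header with the standard struct module. The sample packet is constructed by hand for the example, and the mapping of “frame indication” to the marker bit and “intramedia synchronization” to the timestamp is this example’s reading of the list above.

```python
import struct


def parse_rtp_header(packet):
    """Unpack the 12-byte fixed RTP header (network byte order)."""
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,            # top 2 bits of the first byte
        "marker": bool(second >> 7),      # often flags the end of a frame
        "payload_type": second & 0x7F,    # payload identification
        "sequence_number": sequence,      # detects loss and reordering
        "timestamp": timestamp,           # used for synchronization
        "ssrc": ssrc,                     # source identification
    }


# Hand-built sample header: version 2, payload type 96, sequence 1234.
sample = struct.pack("!BBHII", 0x80, 96, 1234, 90000, 0xDEADBEEF)
header = parse_rtp_header(sample)
print(header)
```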
RTSP (aka: Real Time Streaming Protocol)
A method for streaming video content through controlling media sessions between end points. This protocol uses port 554. Using this method, data is often sent via RTP. RTSP is a common technology found in IP cameras. However, some encoders, like Wirecast, can actually take the IP camera feed and deliver it in an RTMP format.
SD-CDN (aka: Software Defined CDN)
A Software Defined Content Delivery Network utilizes several CDNs in order to optimize viewer performance. This involves multiple checks to determine performance on a per viewer basis, from buffering to more critical checks like 404, and sending traffic to the optimal network within the connected CDNs with the goal of improved video delivery and uptime. Learn more about scaling video delivery with SD-CDN.
Silverlight
Microsoft’s Silverlight is both a video playback solution and an authoring environment. The user interface and description language is Extensible Application Markup Language (XAML). The technology is natively compatible with the Windows Media format.
Smooth Streaming (aka: IIS Smooth Streaming)
Microsoft’s Smooth Streaming for Silverlight is an adaptive bitrate technology. It’s a hybrid media delivery method that is based on HTTP progressive download. The downloads are sent in a series of small video chunks. Like other adaptive technology, Smooth Streaming offers multiple encoded bitrates of the same content that can then be served to a viewer based on their setup.
SSO (aka: Single Sign-On)
A shared session and user authentication service. It permits users to use the same login credentials, such as the same username/email and password, across multiple applications. Identity management services based around SSO include Okta, OneLogin, Google Apps for Work and more. In reference to video, this technology is often used to create a secure, internal video solution for enterprises.
Streaming Video (aka: Streaming Media)
Refers to video and/or audio content that can be played directly over the Internet. Unlike progressive download, an alternative method, the content does not need to be downloaded onto the device first in order to be viewed or heard. It allows for the end user to begin watching as additional content is constantly being transmitted to them.
Transcoding
The process of transcoding involves converting one digital video type into another format. This is often done to make a file compatible with a particular service. This process is different from encoding, as transcoding involves converting a format that is already digital while encoding relates to converting an analog source to a digital format. Despite this, the terms are often used interchangeably.
UDP (aka: User Datagram Protocol)
A connectionless transport protocol commonly used to transmit or receive audio or video over a network. Among real-time protocols, RTMP (Real Time Messaging Protocol) is based on TCP (Transmission Control Protocol), which led to the creation of RTMFP (Real Time Media Flow Protocol), which is based on UDP.
Video Compression
This process uses codecs to present video content in a less resource intensive format. Due to the high data rate of uncompressed video, most video content is compressed. Compression techniques can feature overt processes such as image compression or sophisticated techniques such as inter-frame compression, which looks for redundancies between different frames in the video and only presents changes via delta frames from a keyframe point.
Video Encoding
A process to reduce the size of video data, often with audio data included, through the use of a compression scheme. This compression can be for the purpose of storage, known as program stream (PS), or for the purpose of transmission, known as transport stream (TS).
Video Scaling (aka: Trans-sizing)
A process to either reduce or enlarge an image or video sequence by squeezing or stretching the entire image to a smaller or larger image resolution. While this sometimes can just involve a resolution change, it can also involve changing the aspect ratio, like converting a 4:3 image to a “widescreen” 16:9 image.
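When the aspect ratio should be preserved rather than stretched, the target size can be worked out from a single scale factor. The Python sketch below is an illustration with an invented function name; it fits the source inside a bounding box and leaves any padding or cropping to later steps.

```python
def scale_to_fit(width, height, max_width, max_height):
    """Scale a resolution to fit inside a box while keeping its aspect ratio."""
    factor = min(max_width / width, max_height / height)
    return round(width * factor), round(height * factor)


print(scale_to_fit(1920, 1080, 1280, 720))  # 16:9 source into a 16:9 box
print(scale_to_fit(640, 480, 1280, 720))    # 4:3 source into a 16:9 box
```

The 4:3 source only reaches 960 pixels wide inside the 1280×720 box, so filling the full widescreen frame would require stretching, cropping or padding.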
VOD (aka: Video On Demand)
VOD refers to content that can be viewed on demand by an end user. The term is commonly used to differentiate it from live content, as VODs are previously recorded. That said, previously recorded content can also be presented in a way that is not on demand, such as televised programming that does not give the end user control over what they watch.
WebVTT
Sometimes abbreviated as simply VTT, which is the file format, WebVTT is an abbreviation for Web Video Text Tracks. It is a file format that is used for both subtitles and captions, and a W3C standard. It contains a timestamp for each caption, allowing it to be associated with VOD content.
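A WebVTT file is plain text: a `WEBVTT` header line followed by cues, each with a start and end timestamp and the caption text. The Python sketch below writes a minimal file body. The function names and the sample cues are invented for the example, and optional features such as cue identifiers and styling are left out.

```python
def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_webvtt(cues):
    """Build WebVTT text from (start_seconds, end_seconds, text) cues."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        blocks.append(f"{format_timestamp(start)} --> {format_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"


vtt = build_webvtt([
    (1.0, 4.0, "[door slams]"),
    (4.5, 7.25, "Who's there?"),
])
print(vtt)
```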
Have any questions on this technology and practices as they relate to the IBM Cloud Video and Ustream platforms? Please visit our Support Center for compatibility questions regarding the items found in this video terms glossary. If you are looking for some tips on executing these terms, check out these 5 Pro Tips for Video Production.
Most of these terms are applicable to plans from both IBM Cloud Video and also Ustream. Certain terms relate to a specific offering, for example SSO is relevant to Streaming Manager for Enterprise. For more details and to learn which solution might be best suited for you, please contact sales to learn more.