Quick Start Guide
Install
To use ffmpegio, the package must be installed in Python, and the FFmpeg binaries must be at a location where ffmpegio can find them.
Install the full ffmpegio package via pip:
pip install ffmpegio
If numpy.ndarray data I/O is not needed, instead use
pip install ffmpegio-core
If FFmpeg is not installed on your system, please follow the instructions on the Installation page.
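Before installing FFmpeg, it can help to check whether its binaries are already reachable. The following is a minimal sketch that only searches the system PATH with the standard library. `find_ffmpeg_on_path` is a hypothetical helper, not part of ffmpegio, and ffmpegio may also look in locations that this check does not cover.

```python
import shutil


def find_ffmpeg_on_path():
    """Return the paths of ffmpeg and ffprobe found on PATH (None if missing)."""
    return {name: shutil.which(name) for name in ("ffmpeg", "ffprobe")}


found = find_ffmpeg_on_path()
for name, path in found.items():
    print(f"{name}: {path or 'not found on PATH'}")
```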
Features
FFmpeg can read and write virtually any multimedia file, and ffmpegio uses FFmpeg’s prowess to perform media I/O (and other) operations in Python. It offers two basic modes of operation: block read/write and stream read/write. For read operations, it can output data either as a NumPy array or as plain bytes. The NumPy mode is enabled by default if NumPy is available on the system. Another feature of ffmpegio is reporting the properties of media files using FFprobe.
Media Probe
To process a media file, you first need to know what’s in it. Within the FFmpeg ecosystem, this task is handled by ffprobe. ffmpegio’s ffmpegio.probe module wraps ffprobe with 5 basic functions:
>>> import ffmpegio
>>> from pprint import pprint
>>> url = 'mytestvideo.mpg'
>>> format_info = ffmpegio.probe.format_basic(url)
>>> pprint(format_info)
{'duration': 66.403256,
'filename': 'mytestvideo.mpg',
'format_name': 'mpegts',
'nb_streams': 2,
'start_time': 0.0}
>>> stream_info = ffmpegio.probe.streams_basic(url)
>>> pprint(stream_info)
[{'codec_name': 'mp2', 'codec_type': 'audio', 'index': 0},
{'codec_name': 'h264', 'codec_type': 'video', 'index': 1}]
>>> vst_info = ffmpegio.probe.video_streams_basic(url)
>>> pprint(vst_info)
[{'codec_name': 'h264',
'display_aspect_ratio': Fraction(22, 15),
'duration': 66.39972222222222,
'frame_rate': Fraction(15000, 1001),
'height': 240,
'index': 1,
'pix_fmt': 'yuv420p',
'sample_aspect_ratio': Fraction(1, 1),
'start_time': 0.0,
'width': 352}]
>>> ast_info = ffmpegio.probe.audio_streams_basic(url)
>>> pprint(ast_info)
[{'channel_layout': 'stereo',
'channels': 2,
'codec_name': 'mp2',
'duration': 66.40325555555556,
'index': 0,
'nb_samples': 2928384,
'sample_fmt': 'fltp',
'sample_rate': 44100,
'start_time': 0.0}]
To obtain the complete ffprobe output, use ffmpegio.probe.full_details(), and to obtain specific format or stream fields, use ffmpegio.probe.query().
For more information on probe, see Media Probe Function References.
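The probe results are plain Python containers, so they can be post-processed directly. As a sketch, the snippet below estimates the number of video frames from a dict copied from the video_streams_basic() output shown above. `estimate_frame_count` is a hypothetical helper, and the estimate assumes a constant frame rate.

```python
from fractions import Fraction

# Sample entry copied from the video_streams_basic() output shown above.
vst_info = [
    {
        "codec_name": "h264",
        "duration": 66.39972222222222,
        "frame_rate": Fraction(15000, 1001),
        "height": 240,
        "index": 1,
        "width": 352,
    }
]


def estimate_frame_count(stream):
    """Estimate the number of frames as duration x frame rate, rounded."""
    return round(stream["duration"] * stream["frame_rate"])


video = vst_info[0]
n_frames = estimate_frame_count(video)
print(f"stream #{video['index']}: {video['width']}x{video['height']}, ~{n_frames} frames")
```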
Block Read/Write
Suppose you need to analyze short audio data in mytestfile.wav. You can read all its samples with
>>> fs, x = ffmpegio.audio.read('mytestfile.wav')
It returns the sampling rate fs and the numpy.ndarray x.
The audio data is always represented by a 2-D array, each column of which represents an audio channel. So, a 2-second stereo recording at 8000 samples/second yields an x.shape of (16000, 2). Also, the sample format is preserved: if the samples in the wav file are 16-bit, x is of numpy.int16 dtype.
Now, suppose you’ve processed this audio data and produced the 8000-sample 1-D array y at a reduced sampling rate of 4000 samples/second. You want to save this new audio data as a FLAC file. To do so, run:
>>> ffmpegio.audio.write('myoutput.flac', 4000, y)
There are video counterparts to these two functions:
>>> fs, F = ffmpegio.video.read('mytestvideo.mp4')
>>> ffmpegio.video.write('myoutput.avi', fs, F)
Let’s suppose mytestvideo.mp4 is 10 seconds long, containing a yuv420p-encoded color video stream with a frame size of 640x480 pixels and a frame rate of 29.97 (30000/1001) frames/second. Then, video.read() returns a 2-element tuple: the first element fs is the frame rate as a fractions.Fraction, and the second element F contains all the frames of the video in a numpy.ndarray with shape (299, 480, 640, 3).
Because the video is in color, each pixel is represented in 24-bit RGB, thus F.dtype is numpy.uint8. The video write is the reciprocal of the read operation.
For image (or single video frame) I/O, there is a pair of functions as well:
>>> I = ffmpegio.image.read('myimage.png')
>>> ffmpegio.image.write('myoutput.bmp', I)
The image data I is like the video frame data, but without the leading dimension.
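The array shapes described in this section can be summarized with a small sketch. The helper names are hypothetical and only reproduce the arithmetic of the examples above. Truncating the frame count is an assumption that happens to match the (299, 480, 640, 3) example; the actual count depends on the file.

```python
from fractions import Fraction


def audio_shape(duration_s, sample_rate, channels):
    """Shape of audio data: (samples, channels)."""
    return (int(duration_s * sample_rate), channels)


def video_shape(duration_s, frame_rate, height, width, components=3):
    """Shape of video data: (frames, height, width, components)."""
    return (int(duration_s * frame_rate), height, width, components)


def image_shape(height, width, components=3):
    """Image data uses the video frame layout without the leading dimension."""
    return (height, width, components)


print(audio_shape(2, 8000, 2))                           # (16000, 2)
print(video_shape(10, Fraction(30000, 1001), 480, 640))  # (299, 480, 640, 3)
print(image_shape(480, 640))                             # (480, 640, 3)
```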
Stream Read/Write
Block read/write is simple and convenient for a short file, but it quickly becomes slow and inefficient as the data size grows; this is especially true for video. To enable on-demand data retrieval, ffmpegio offers stream read/write operations. They mimic Python’s familiar file I/O with ffmpegio.open():
>>> with ffmpegio.open('mytestvideo.mp4', 'rv') as f: # opens the first video stream
...     print(f.rate) # frame rate fraction in frames/second
...     F = f.read() # read the first frame
...     F = f.read(5) # read the next 5 frames at once
Another example, which uses read and write streams simultaneously:
>>> with ffmpegio.open('mytestvideo.mp4', 'rv', blocksize=100) as f, \
...      ffmpegio.open('myoutput.avi', 'wv', f.rate) as g:
...     for frames in f: # iterates over all frames, 100 frames at a time
...         output = my_processor(frames) # function to process data
...         g.write(output) # send the processed frames to 'myoutput.avi'
By default, ffmpegio.open() opens the first media stream available to read. However, the operation mode can be specified via the second argument, mode. The above example opens the mytestvideo.mp4 file in 'rv' or “read video” mode and myoutput.avi in 'wv' or “write video” mode. The file reader object f is an iterable object, which returns the next set of frames (the number set by the blocksize argument). For more, see ffmpegio.open().
Specify Read Time Range
For both block and stream read operations, you can specify the time range to read data from. There are four options available:
Name | Description |
---|---|
ss | Start time in seconds |
t | Duration in seconds |
to | End time in seconds (ignored if t is also specified) |
Note that it is also possible to specify these timing options for the input (i.e., using the options ss_in, t_in, and to_in). The input options, especially ss_in, may run faster but are potentially less accurate. See the FFmpeg documentation for the explanation.
>>> url = 'myvideo.mp4'
>>> # read only the first 1 second
>>> fs, F = ffmpegio.video.read(url, t=1.0)
>>> # read from the 1.2-second mark to the 2.5-second mark
>>> fs, F = ffmpegio.video.read(url, ss=1.2, to=2.5)
To specify the range by frame numbers for video or by sample numbers for audio, you must convert the units to seconds using the rate reported by ffmpegio.probe. For example:
>>> # get frame rate of the (first) video stream
>>> info = ffmpegio.probe.video_streams_basic('myvideo.mp4')
>>> fs = info[0]['frame_rate']
>>> # read 30 frames starting from the 11th frame (remember Python uses 0-based index)
>>> with ffmpegio.open('myvideo.mp4', 'rv', ss=10/fs, t=30/fs) as f:
...     frame = f.read()
...     # do your thing with the frame data
Likewise, for an audio input stream:
>>> # get sampling rate of the (first) audio stream
>>> info = ffmpegio.probe.audio_streams_basic('myaudio.wav')
>>> fs = info[0]['sample_rate']
>>> #read first 10000 audio samples
>>> fs, x = ffmpegio.audio.read('myaudio.wav', t=10000/fs)
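The unit conversion used in these examples can be wrapped in a small helper. This is only a sketch: `frame_range_to_seconds` is a hypothetical name, and the same arithmetic applies to audio when the sampling rate is passed instead of the frame rate.

```python
from fractions import Fraction


def frame_range_to_seconds(start_frame, n_frames, frame_rate):
    """Convert a 0-based start frame and a frame count to ss/t values in seconds."""
    rate = Fraction(frame_rate)
    return {"ss": float(start_frame / rate), "t": float(n_frames / rate)}


fs = Fraction(30000, 1001)  # e.g. the 'frame_rate' value reported by the probe
opts = frame_range_to_seconds(10, 30, fs)
print(opts)
```

The resulting dict could then be unpacked as keyword options, for instance ffmpegio.video.read(url, **opts), assuming the options accept floating-point seconds as in the examples above.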
Specify Output Frame/Sample Size
FFmpeg lets you change the video size or the number of audio channels via the output options s and ac, respectively, without setting up a filtergraph. For example,
>>> # auto-scale video frame
>>> fs, F = ffmpegio.video.read('myvideo.mp4', t=1.0) # natively 320x240
>>> F.shape
(30, 240, 320, 3)
>>> # halve the size
>>> width = 160
>>> height = 120
>>> _, G = ffmpegio.video.read('myvideo.mp4', t=1.0, s=(width,height))
>>> G.shape
(29, 120, 160, 3)
>>> # auto-convert to mono
>>> fs, x = ffmpegio.audio.read('myaudio.wav') # natively stereo
>>> _, y = ffmpegio.audio.read('myaudio.wav', ac=1) # to mono
>>> x.shape
(44100, 2)
>>> y.shape
(44100, 1)
To customize the conversion configuration, use the vf output option with the scale filter, or the af output option with the channelmap, pan, or another channel-mixing filter.
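As an illustration, the filter descriptions for such custom conversions are ordinary FFmpeg filter strings. The sketch below builds a scale filter that keeps the aspect ratio and a pan filter that mixes stereo down to mono. The helper names are hypothetical, and how ffmpegio forwards these strings through vf and af is assumed from the description above.

```python
def scale_filter(width, height=-1):
    """Build an FFmpeg scale filter string; -1 keeps the aspect ratio."""
    return f"scale={width}:{height}"


def stereo_to_mono_pan(left_gain=0.5, right_gain=0.5):
    """Build an FFmpeg pan filter string that mixes two channels into one."""
    return f"pan=mono|c0={left_gain}*c0+{right_gain}*c1"


vf = scale_filter(160)
af = stereo_to_mono_pan()
print(vf)  # scale=160:-1
print(af)  # pan=mono|c0=0.5*c0+0.5*c1
```

These strings would then be passed as the vf and af options, e.g. ffmpegio.video.read('myvideo.mp4', vf=vf).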
Specify Sample Formats
FFmpeg can also convert the formats of video pixels and sound samples on the fly. This feature is enabled in ffmpegio via the output options pix_fmt for video and sample_fmt for audio.
pix_fmt | Description |
---|---|
gray | grayscale |
ya8 | grayscale with transparent alpha channel |
rgb24 | RGB |
rgba | RGB with transparent alpha channel |

sample_fmt | Description | min | max |
---|---|---|---|
u8 | unsigned 8-bit integer | 0 | 255 |
s16 | signed 16-bit integer | -32768 | 32767 |
s32 | signed 32-bit integer | -2147483648 | 2147483647 |
flt | single-precision floating point | -1.0 | 1.0 |
dbl | double-precision floating point | -1.0 | 1.0 |
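When writing code that depends on these formats, the two tables above can be kept as lookup dictionaries. The following sketch is not part of ffmpegio. The component counts are inferred from the format descriptions (for example, ya8 is taken to carry one gray and one alpha component), and `frame_shape` is a hypothetical helper.

```python
# Components per pixel for the pix_fmt values listed above.
PIX_FMT_COMPONENTS = {"gray": 1, "ya8": 2, "rgb24": 3, "rgba": 4}

# (min, max) sample values for the sample_fmt values listed above.
SAMPLE_FMT_RANGE = {
    "u8": (0, 255),
    "s16": (-32768, 32767),
    "s32": (-2147483648, 2147483647),
    "flt": (-1.0, 1.0),
    "dbl": (-1.0, 1.0),
}


def frame_shape(height, width, pix_fmt):
    """Expected shape of one frame: (height, width, components)."""
    return (height, width, PIX_FMT_COMPONENTS[pix_fmt])


print(frame_shape(100, 396, "rgba"))   # (100, 396, 4)
print(frame_shape(100, 396, "rgb24"))  # (100, 396, 3)
```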
For example,
>>> # auto-convert video frames to grayscale
>>> fs, RGB = ffmpegio.video.read('myvideo.mp4', t=1.0) # natively rgb24
>>> _, GRAY = ffmpegio.video.read('myvideo.mp4', t=1.0, pix_fmt='gray')
>>> RGB.shape
(29, 640, 480, 3)
>>> GRAY.shape
(29, 640, 480, 1)
>>> # auto-convert PNG image to remove transparency with white background
>>> RGBA = ffmpegio.image.read('myimage.png') # natively rgba with transparency
>>> RGB = ffmpegio.image.read('myimage.png', pix_fmt='rgb24', fill_color='white')
>>> RGBA.shape
(100, 396, 4)
>>> RGB.shape
(100, 396, 3)
>>> # auto-convert audio samples to double precision
>>> fs, x = ffmpegio.audio.read('myaudio.wav') # natively s16
>>> _, y = ffmpegio.audio.read('myaudio.wav', sample_fmt='dbl')
>>> x.max()
2324
>>> y.max()
0.0709228515625
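The two maxima above are consistent with dividing the 16-bit integer value by 2**15. The sketch below reproduces that arithmetic; the scaling rule is inferred from these numbers rather than stated by ffmpegio, and `s16_to_float` is a hypothetical helper.

```python
def s16_to_float(sample):
    """Scale a signed 16-bit sample value to floating point by dividing by 2**15."""
    return sample / 32768.0


print(s16_to_float(2324))    # 0.0709228515625
print(s16_to_float(-32768))  # -1.0
```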
Note that when converting from an image with an alpha channel (FFmpeg does not support alpha channel in video input), the background color may be specified with the fill_color option (which defaults to 'white').
See the FFmpeg color specification for the list of predefined color names.
[Figure: images returned by each of the following calls]
ffmpegio.image.read('ffmpeg-logo.png')
ffmpegio.image.read('ffmpeg-logo.png', pix_fmt='rgb24', fill_color='linen')
ffmpegio.image.read('ffmpeg-logo.png', pix_fmt='ya8')
ffmpegio.image.read('ffmpeg-logo.png', pix_fmt='gray', fill_color='#F0F0F0')
Progress Callback
FFmpeg has the -progress option, which sends program-friendly progress information to a URL. ffmpegio takes advantage of this option to let the user monitor the transcoding progress with a callback, which can be set with the progress argument of all media operations. The callback function must have the following signature:
progress_callback(status:dict, done:bool) -> None|bool
The status dict contains information similar to what FFmpeg displays on the console. The second argument done is True only on the last progress call. Here is an example of the status dict:
{'bitrate': '61.9kbits/s',
'drop_frames': 0,
'dup_frames': 0,
'fps': 336.18,
'frame': 1014,
'out_time': '00:00:33.877914',
'out_time_ms': 33877914,
'out_time_us': 33877914,
'speed': '11.2x',
'stream_0_0_q': 29.0,
'total_size': 262192}
While FFmpeg does not report percent progress, it is possible to compute it from frame or out_time if you know the total number of output frames or the output duration, respectively.
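A minimal sketch of that computation follows. It uses the out_time_us field (microseconds, as in the sample status dict above) rather than parsing the out_time string. The helper names, the total of 2028 frames, and the use of 66.403256 seconds as the output duration are assumptions made for illustration.

```python
def percent_from_out_time(status, total_duration_s):
    """Estimate percent progress from 'out_time_us' (microseconds)."""
    elapsed_s = status["out_time_us"] / 1e6
    return min(100.0, 100.0 * elapsed_s / total_duration_s)


def percent_from_frames(status, total_frames):
    """Estimate percent progress from the 'frame' counter."""
    return min(100.0, 100.0 * status["frame"] / total_frames)


status = {"frame": 1014, "out_time_us": 33877914}  # fields from the sample above
by_time = percent_from_out_time(status, 66.403256)
by_frame = percent_from_frames(status, 2028)  # hypothetical frame total
print(f"{by_time:.1f}%")   # 51.0%
print(f"{by_frame:.1f}%")  # 50.0%
```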
If an FFmpeg media stream object is created by ffmpegio.open() with a progress callback argument, the callback function can terminate the FFmpeg execution by returning True. This feature is useful for GUI programming.
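One way to wire such a cancellation into a GUI is to share a flag between the callback and the event handler of a "Cancel" button. The following is a sketch under that assumption: `make_progress_callback` is a hypothetical factory, and threading.Event merely stands in for whatever state the GUI toolkit provides.

```python
import threading


def make_progress_callback(cancel_event):
    """Create a callback matching progress_callback(status, done) -> None|bool."""

    def progress_callback(status, done):
        if done:
            print("finished")
        # Returning True requests termination; returning None lets FFmpeg continue.
        return True if cancel_event.is_set() else None

    return progress_callback


cancel = threading.Event()  # a "Cancel" button handler would call cancel.set()
callback = make_progress_callback(cancel)

first = callback({"frame": 1014}, False)   # not cancelled yet: returns None
cancel.set()
second = callback({"frame": 1100}, False)  # cancelled: returns True
```

The callback would then be passed as the progress argument of ffmpegio.open(), as described above.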