The purpose of this note is to estimate how many CPU cores of an AWS g4dn.4xlarge machine are required to transcode, in real time, an encoded 1920x1080p@60fps video with the software codec x264 to a lower bitrate, a lower frame rate and/or a lower resolution. Our transcoding is in fact both trans-rating and trans-scaling.
Testing methodology
AWS machine: g4dn.4xlarge
Each frame contains 15 slices
The input is a 1920x1080p@60fps video, originally encoded at 19Mbps (recall that we are testing transcoding, so the input must already be encoded)
The encoder is libx264, invoked via ffmpeg (version 5.0-full_build-www.gyan.dev)
We check the following transcoding cases:
1. Transcoding of 1920x1080p@60fps@19Mbps to a lower bitrate of 12Mbps, with an IDR every second (i.e. GOP size = 60)
2. Transcoding and trans-scaling from 1920x1080p@60fps@19Mbps to 1280×720@60fps at a lower bitrate of 8Mbps, with an IDR every second
3. Transcoding from 1920x1080p@60fps@19Mbps to 1920x1080p@30fps at a lower bitrate of 8Mbps, with an IDR every second (i.e. GOP size = 30)
4. Transcoding and trans-scaling from 1920x1080p@60fps@19Mbps to 1280×720@30fps at a lower bitrate of 5Mbps, with an IDR every second
Scene: taken from the game SWBF2, containing fast zooming and flashes.
Case 1: transcoding at the same resolution but a lower bitrate of 12Mbps
ffmpeg -y -i swbf.ts -vsync 0 -c:v libx264 -x264opts aud=1:bframes=0 -profile high -b:v 12M -preset veryfast -g 60 -keyint_min 60 -sc_threshold 0 -slices 15 swbf_1080p_60fps_veryfast_12M.ts
Case 2: transcoding to a lower resolution of 1280×720 and a lower bitrate of 8Mbps
ffmpeg -y -i swbf.ts -vsync 0 -s 1280x720 -sws_flags lanczos -c:v libx264 -x264opts aud=1:bframes=0 -profile high -b:v 8M -preset veryfast -g 60 -keyint_min 60 -sc_threshold 0 -slices 15 swbf_720p_8M.ts
Case 3: transcoding at the same resolution, with a lower bitrate of 8Mbps and a lower frame rate of 30fps (after decoding, every second frame is discarded with -filter:v decimate=cycle=2):
ffmpeg -y -i swbf.ts -filter:v decimate=cycle=2 -vsync 0 -c:v libx264 -x264opts aud=1:bframes=0 -profile high -b:v 8M -preset veryfast -g 30 -keyint_min 30 -sc_threshold 0 -slices 15 swbf_1080p_30fps_veryfast.ts
Case 4: transcoding with lower resolution, lower bitrate and lower frame rate (30fps; after decoding, every second frame is discarded):
ffmpeg -y -i swbf.ts -filter:v decimate=cycle=2 -s 1280x720 -sws_flags lanczos -vsync 0 -c:v libx264 -x264opts aud=1:bframes=0 -profile high -b:v 5M -preset veryfast -g 30 -keyint_min 30 -sc_threshold 0 -slices 15 swbf_720p_30fps_veryfast.ts
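The four command lines differ only in bitrate, GOP length, the optional scaler and the optional frame-dropping filter. As a minimal sketch, they could be assembled from a small parameter table. The names `CASES` and `build_command` are hypothetical helpers introduced for illustration; the ffmpeg options are the ones used in the commands above, and the bitrates follow the list of transcoding cases (12M, 8M, 8M, 5M).

```python
import shlex

# Per-case output parameters, taken from the list of transcoding cases.
# "size" of None means: keep the input resolution.
CASES = {
    1: {"size": None, "fps": 60, "bitrate": "12M",
        "out": "swbf_1080p_60fps_veryfast_12M.ts"},
    2: {"size": "1280x720", "fps": 60, "bitrate": "8M",
        "out": "swbf_720p_8M.ts"},
    3: {"size": None, "fps": 30, "bitrate": "8M",
        "out": "swbf_1080p_30fps_veryfast.ts"},
    4: {"size": "1280x720", "fps": 30, "bitrate": "5M",
        "out": "swbf_720p_30fps_veryfast.ts"},
}


def build_command(case, src="swbf.ts", source_fps=60, slices=15):
    """Return the ffmpeg argument list for one transcoding case."""
    p = CASES[case]
    cmd = ["ffmpeg", "-y", "-i", src]
    if p["fps"] != source_fps:
        # decimate=cycle=2 drops one frame out of every two, so this
        # sketch only supports halving the frame rate.
        if p["fps"] * 2 != source_fps:
            raise ValueError("only frame-rate halving is supported")
        cmd += ["-filter:v", "decimate=cycle=2"]
    if p["size"] is not None:
        cmd += ["-s", p["size"], "-sws_flags", "lanczos"]
    gop = str(p["fps"])  # one IDR per second: GOP size equals output fps
    cmd += ["-vsync", "0", "-c:v", "libx264",
            "-x264opts", "aud=1:bframes=0", "-profile", "high",
            "-b:v", p["bitrate"], "-preset", "veryfast",
            "-g", gop, "-keyint_min", gop, "-sc_threshold", "0",
            "-slices", str(slices), p["out"]]
    return cmd


if __name__ == "__main__":
    for case in sorted(CASES):
        print(shlex.join(build_command(case)))
```

The returned list can be passed directly to `subprocess.run`. For case 2 the scaler options come before `-vsync 0` rather than after it, which is only a difference in the order of output options.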
Results
CPU usage was found to be 100%.
Transcoding of 1920x1080p@60fps@19Mbps to a lower bitrate 12Mbps, GOP size = 60 frames, 15 slices per frame
# cores | Encoding Speed (fps) |
1 | 16 |
2 | 21 |
3 | 38 |
4 | 42 |
5 | 58 |
6 | 63 |
7 | 80 |
Conclusion: 6 cores reach 63fps, only just above real time, so for safety 7 CPU cores are sufficient for 60fps re-encoding.
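To make the safety margin explicit, the measured speeds can be compared with the 60fps real-time target. The following is a small sketch that only reuses the numbers from the table above; the function name `headroom` is introduced here for illustration.

```python
TARGET_FPS = 60

# Encoding speed (fps) per core count for the 1080p60 @ 12Mbps case,
# copied from the table above.
case1_fps = {1: 16, 2: 21, 3: 38, 4: 42, 5: 58, 6: 63, 7: 80}


def headroom(measured_fps, target_fps):
    """Fractional margin over real time; negative means too slow."""
    return measured_fps / target_fps - 1.0


if __name__ == "__main__":
    for cores, fps in sorted(case1_fps.items()):
        status = "real-time" if fps >= TARGET_FPS else "too slow"
        margin = headroom(fps, TARGET_FPS)
        print(f"{cores} cores: {fps} fps, headroom {margin:+.0%} ({status})")
```

With these figures, 6 cores exceed real time by about 5%, while 7 cores leave about 33% of headroom.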
Transcoding to 1280x720p@60fps, bitrate 8Mbps, GOP size = 60 frames, 15 slices per frame
# cores | Encoding Speed (fps), with -sws_flags lanczos scaling |
1 | 26 |
2 | 32 |
3 | 59 |
4 | 65 |
Conclusion: for safety, 4 CPU cores are sufficient to sustain 60fps re-encoding.
Transcoding to 1920x1080p@30fps, bitrate 8Mbps, GOP size = 30 frames, 15 slices per frame
# cores | Encoding Speed (fps), with -filter:v decimate=cycle=2 |
1 | 13 |
2 | 17 |
3 | 30 |
4 | 34 |
3 cores reach exactly 30fps with no margin, so for safety 4 CPU cores are sufficient to sustain 30fps re-encoding.
Transcoding to 1280x720p@30fps, bitrate 5Mbps, GOP size = 30 frames, 15 slices per frame
# cores | Encoding Speed (fps) |
1 | 20 |
2 | 25 |
3 | 40 |
3 cores already reach 40fps; for safety, 4 CPU cores are sufficient to sustain 30fps re-encoding of the reduced-resolution (720p) video.
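The four tables can be summarized by finding, for each case, the smallest measured core count that reaches the target frame rate. The sketch below uses only the figures reported above. The helper `min_cores` and its `strict` flag (require a speed strictly above the target, not merely equal to it) are assumptions made for illustration.

```python
# (target fps, {cores: measured fps}) for each case, copied from the tables.
measured = {
    "1080p60 @ 12Mbps": (60, {1: 16, 2: 21, 3: 38, 4: 42, 5: 58, 6: 63, 7: 80}),
    "720p60 @ 8Mbps": (60, {1: 26, 2: 32, 3: 59, 4: 65}),
    "1080p30 @ 8Mbps": (30, {1: 13, 2: 17, 3: 30, 4: 34}),
    "720p30 @ 5Mbps": (30, {1: 20, 2: 25, 3: 40}),
}


def min_cores(speeds, target_fps, strict=False):
    """Smallest measured core count whose speed reaches the target.

    With strict=True the speed must exceed the target, so a run that
    only just matches real time is not counted. Returns None if no
    measured core count qualifies.
    """
    for cores in sorted(speeds):
        fps = speeds[cores]
        if fps > target_fps or (not strict and fps == target_fps):
            return cores
    return None


if __name__ == "__main__":
    for name, (target, speeds) in measured.items():
        print(name, min_cores(speeds, target), min_cores(speeds, target, strict=True))
```

The stated recommendations (7, 4, 4 and 4 cores) are equal to or one core above the strict minimum, which is where the safety margin comes from.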