The purpose of this note is to estimate the number of CPU cores of an AWS machine (g4dn.4xlarge) required to transcode, in real time, an encoded 1920x1080p@60fps video with the software codec x264 to a lower bitrate and/or a lower frame rate and/or a lower resolution. In other words, our transcoding combines trans-rating and trans-scaling.

 

Testing methodology

AWS machine:  g4dn.4xlarge

Each frame contains 15 slices

Input data is 1920x1080p@60fps, originally encoded at 19Mbps (recall that we are testing transcoding, so the input must already be encoded)

The encoder is libx264, invoked via ffmpeg (version 5.0-full_build-www.gyan.dev)

We check the following transcoding cases:

1. Transcoding of 1920x1080p@60fps@19Mbps to a lower bitrate of 12Mbps, one IDR per second (i.e. GOP size = 60)

2. Transcoding and trans-scaling from 1920x1080p@60fps@19Mbps to 1280×720@60fps at a lower bitrate of 8Mbps, one IDR per second (GOP size = 60)

3. Transcoding from 1920x1080p@60fps@19Mbps to 1920x1080p@30fps at a lower bitrate of 8Mbps, one IDR per second (GOP size = 30)

4. Transcoding and trans-scaling from 1920x1080p@60fps@19Mbps to 1280×720@30fps at a lower bitrate of 5Mbps, one IDR per second (GOP size = 30)

 

Scene: taken from the game SWBF2, containing fast zooming and flashes.

 

Case 1: transcoding with same resolution but lower bitrate 12Mbps

ffmpeg -y -i swbf.ts  -vsync 0   -c:v libx264 -x264opts aud=1:bframes=0 -profile high -b:v 12M -preset veryfast -g 60 -keyint_min 60  -sc_threshold 0 -slices 15  swbf_1080p_60fps_veryfast_12M.ts

 

Case 2: transcoding with lower resolution 1280×720 and lower bitrate 8Mbps

ffmpeg -y -i swbf.ts  -vsync 0  -s 1280x720 -sws_flags lanczos  -c:v libx264 -x264opts aud=1:bframes=0 -profile high -b:v 8M -preset veryfast -g 60 -keyint_min 60  -sc_threshold 0 -slices 15  swbf_720p_8M.ts

 

Case 3: transcoding with same resolution, lower bitrate 8Mbps and lower frame rate 30fps (after decoding, every second frame is discarded with -filter:v decimate=cycle=2):

ffmpeg -y  -i swbf.ts  -filter:v decimate=cycle=2  -vsync 0  -c:v libx264 -x264opts aud=1:bframes=0 -profile high -b:v 8M -preset veryfast -g 30 -keyint_min 30  -sc_threshold 0 -slices 15  swbf_1080p_30fps_veryfast.ts

 

Case 4: transcoding with lower resolution, lower bitrate and lower frame rate (30fps; after decoding, every second frame is discarded):

ffmpeg -y -i swbf.ts -filter:v decimate=cycle=2 -s 1280x720 -sws_flags lanczos -vsync 0 -c:v libx264 -x264opts aud=1:bframes=0 -profile high -b:v 5M -preset veryfast -g 30 -keyint_min 30 -sc_threshold 0 -slices 15 swbf_720p_30fps_veryfast.ts
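The four commands above differ only in scaling, frame decimation, bitrate and GOP size. A minimal Python sketch that generates them from a small table is given here. The names CASES and build_command are hypothetical helpers introduced for illustration, not part of ffmpeg. The bitrates follow the list of test cases (5M for Case 4), and the GOP size is derived as the output frame rate times a one-second IDR interval.

```python
# Hypothetical helper: rebuilds the ffmpeg command lines of the four cases.
# Only options that already appear in the commands above are used.

SOURCE = "swbf.ts"
IDR_INTERVAL_S = 1  # one IDR per second, as in all four cases

# size=None means "keep 1920x1080"; fps is the output frame rate.
CASES = {
    1: {"size": None, "fps": 60, "bitrate": "12M",
        "out": "swbf_1080p_60fps_veryfast_12M.ts"},
    2: {"size": "1280x720", "fps": 60, "bitrate": "8M",
        "out": "swbf_720p_8M.ts"},
    3: {"size": None, "fps": 30, "bitrate": "8M",
        "out": "swbf_1080p_30fps_veryfast.ts"},
    4: {"size": "1280x720", "fps": 30, "bitrate": "5M",
        "out": "swbf_720p_30fps_veryfast.ts"},
}


def build_command(case_id, source=SOURCE):
    """Return the ffmpeg argument list for one of the four test cases."""
    case = CASES[case_id]
    gop = case["fps"] * IDR_INTERVAL_S  # 60 or 30 frames between IDRs
    cmd = ["ffmpeg", "-y", "-i", source]
    if case["fps"] == 30:
        # Halve the 60fps input by dropping one frame out of every two.
        cmd += ["-filter:v", "decimate=cycle=2"]
    if case["size"] is not None:
        cmd += ["-s", case["size"], "-sws_flags", "lanczos"]
    cmd += [
        "-vsync", "0",
        "-c:v", "libx264",
        "-x264opts", "aud=1:bframes=0",
        "-profile", "high",
        "-b:v", case["bitrate"],
        "-preset", "veryfast",
        "-g", str(gop),
        "-keyint_min", str(gop),
        "-sc_threshold", "0",
        "-slices", "15",
        case["out"],
    ]
    return cmd


if __name__ == "__main__":
    for case_id in CASES:
        print(" ".join(build_command(case_id)))
```

Returning an argument list rather than a single string makes it possible to pass the result directly to subprocess.run without shell quoting.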

 

 

Results

The measured CPU usage was 100%.

Transcoding of 1920x1080p@60fps@19Mbps  to a lower bitrate 12Mbps, GOP size = 60 frames, 15 slices per frame

# cores Encoding Speed (fps)

1 16
2 21
3 38
4 42
5 58
6 63
7 80

Conclusion: 6 cores reach 63 fps, which leaves little margin; for safety, 7 CPU cores are recommended for 60fps re-encoding.
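To see how throughput scales with core count for this case, the following Python sketch computes the speedup over a single core and the gain from each added core. The figures are copied from the table above; the variable names are illustrative only.

```python
# Measured encoding speed (fps) for 1080p60 at 12Mbps, keyed by core count.
fps_by_cores = {1: 16, 2: 21, 3: 38, 4: 42, 5: 58, 6: 63, 7: 80}

base = fps_by_cores[1]

# Speedup relative to one core, and parallel efficiency (speedup per core).
speedup = {n: fps / base for n, fps in fps_by_cores.items()}
efficiency = {n: speedup[n] / n for n in fps_by_cores}

# Extra fps obtained by going from n-1 to n cores.
gains = [fps_by_cores[n] - fps_by_cores[n - 1] for n in range(2, 8)]

for n in sorted(fps_by_cores):
    print(f"{n} cores: {fps_by_cores[n]:3d} fps, "
          f"speedup {speedup[n]:.2f}x, efficiency {efficiency[n]:.0%}")
print("gain per added core (fps):", gains)
```

With these figures, 7 cores give a 5x speedup over one core, and the gain per added core alternates between small steps (4 to 5 fps) and large steps (16 to 17 fps).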

 

 

 

 Transcoding to 1280x720p@60fps, bitrate 8Mbps, GOP size = 60 frames, 15 slices per frame

# cores Encoding Speed (fps), with -sws_flags lanczos scaling

1 26
2 32
3 59
4 65

Conclusion: 3 cores fall just short (59 fps), so 4 CPU cores are needed to sustain 60fps re-encoding.

 

 

 

 

 Transcoding to 1920x1080p@30fps, bitrate 8Mbps, GOP size = 30 frames, 15 slices per frame

# cores Encoding Speed (fps), with -filter:v decimate=cycle=2

1 13 
2 17 
3 30
4 34

Conclusion: 3 cores reach exactly 30 fps with no margin; for safety, 4 CPU cores are sufficient to sustain 30fps re-encoding.

 

 

 

 

Transcoding to 1280x720p@30fps, bitrate 5Mbps, GOP size = 30 frames, 15 slices per frame

# cores Encoding Speed (fps)
1 20 
2 25
3 40

Conclusion: 3 cores already reach 40 fps; for safety, 4 CPU cores are sufficient to sustain 30fps re-encoding of the reduced-resolution (720p) video.
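The four result tables can be summarised with a small Python sketch that looks up the smallest measured core count meeting the target frame rate. The fps figures are copied from the tables above; the function min_cores and its headroom parameter are assumptions added for illustration and are not the rule used to reach the conclusions in this note.

```python
# Target output frame rate and measured fps per core count for each case.
MEASURED = {
    "1080p60 @ 12Mbps": (60, {1: 16, 2: 21, 3: 38, 4: 42, 5: 58, 6: 63, 7: 80}),
    "720p60 @ 8Mbps": (60, {1: 26, 2: 32, 3: 59, 4: 65}),
    "1080p30 @ 8Mbps": (30, {1: 13, 2: 17, 3: 30, 4: 34}),
    "720p30 @ 5Mbps": (30, {1: 20, 2: 25, 3: 40}),
}


def min_cores(target_fps, fps_by_cores, headroom=0.0):
    """Smallest measured core count whose fps >= target_fps * (1 + headroom).

    Returns None when no measured configuration is fast enough.
    """
    needed = target_fps * (1.0 + headroom)
    for cores in sorted(fps_by_cores):
        if fps_by_cores[cores] >= needed:
            return cores
    return None


if __name__ == "__main__":
    for name, (target, table) in MEASURED.items():
        print(f"{name}: {min_cores(target, table)} cores "
              f"(with 10% headroom: {min_cores(target, table, 0.10)})")
```

Without headroom this returns 6, 4, 3 and 3 cores for the four cases; the recommendations in this note are equal to or one core above those values.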
