
How do I load balance across multiple GPUs when using NVIDIA CUDA hardware-accelerated encoding and decoding?

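Starting with version 4.6.0, Wowza Streaming Engine load balances across multiple GPUs when it uses NVIDIA CUDA hardware to accelerate encoding and decoding. Wowza also provides an API so that you can implement load balancing that follows your own requirements and logic.

This article describes both.

Note: This feature requires Wowza Streaming Engine™ 4.6.0 or later.

At run time, Wowza Transcoder calls the following ITranscoderVideoLoadBalancer interface: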

```
public interface ITranscoderVideoLoadBalancer
{
	public abstract void init(IServer server, TranscoderContextServer transcoderContextServer);
	public abstract void onHardwareInspection(TranscoderContextServer transcoderContextServer);
	public abstract void onTranscoderSessionCreate(LiveStreamTranscoder liveStreamTranscoder);
	public abstract void onTranscoderSessionInit(LiveStreamTranscoder liveStreamTranscoder);
	public abstract void onTranscoderSessionDestroy(LiveStreamTranscoder liveStreamTranscoder);
	public abstract void onTranscoderSessionLoadBalance(LiveStreamTranscoder liveStreamTranscoder);
}
```

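Where:

init is called when the server starts.
onHardwareInspection is called when the Transcoder has just started and is detecting hardware acceleration resources, such as graphics cards with GPUs.
onTranscoderSessionCreate is called when a transcoder session is created.
onTranscoderSessionInit is called when a transcoder session has finished initializing and the transcoder template has been read.
onTranscoderSessionDestroy is called when a transcoder session is destroyed.
onTranscoderSessionLoadBalance is called while a transcoder session's decoder, scaler, and encoder are being initialized.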

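Configuring an implementation class of the ITranscoderVideoLoadBalancer interface

To use the ITranscoderVideoLoadBalancer interface, do the following:

Create a class that extends TranscoderVideoLoadBalancerBase (which implements the ITranscoderVideoLoadBalancer interface described above), and then override the interface methods. For more information, see Example class - TranscoderVideoLoadBalancerCUDASimple.
Then add a server-level property to the [install-dir]/conf/Server.xml file that specifies the fully qualified name of the implementation class: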
```
<Property>
	<Name>transcoderVideoLoadBalancerClass</Name>
	<Value>[custom-class-path]</Value>
</Property>
			
```
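Where [custom-class-path] is the fully qualified name of your implementation class. For example, to use the TranscoderVideoLoadBalancerCUDASimple implementation class that is built in to Wowza, configure the property as follows: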
```
<Property>
	<Name>transcoderVideoLoadBalancerClass</Name>
	<Value>com.wowza.wms.transcoder.model.TranscoderVideoLoadBalancerCUDASimple</Value>
</Property>
			
```
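(Optional) If your GPUs differ in performance, you can also use the transcoderVideoLoadBalancerCUDASimpleGPUWeights property to give each GPU a different weight. For details, see Load balancing across multiple GPUs with different performance.

Example class - TranscoderVideoLoadBalancerCUDASimple

Starting with Wowza Streaming Engine 4.5.0.01, TranscoderVideoLoadBalancerCUDASimple is built in to Wowza, so you can use it without doing any development work. Its load-balancing mechanism assigns all of the work for each individual transcoder session to a single GPU. In other words, the work within one transcoder session never moves back and forth between GPUs.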

package com.wowza.wms.transcoder.model;

import java.util.*;

import com.wowza.util.*;
import com.wowza.wms.application.*;
import com.wowza.wms.logging.*;
import com.wowza.wms.media.model.*;
import com.wowza.wms.server.*;

public class TranscoderVideoLoadBalancerCUDASimple extends TranscoderVideoLoadBalancerBase
{
	private static final Class<TranscoderVideoLoadBalancerCUDASimple> CLASS = TranscoderVideoLoadBalancerCUDASimple.class;
	private static final String CLASSNAME = "TranscoderVideoLoadBalancerCUDASimple";
	
	public static final int DEFAULT_GPU_WEIGHT_SCALE = 1;
	public static final int DEFAULT_WEIGHT_FACTOR_ENCODE = 5;
	public static final int DEFAULT_WEIGHT_FACTOR_DECODE = 1;
	public static final int DEFAULT_WEIGHT_FACTOR_SCALE = 1;
	
	public static final int LOAD_MAG = 1000;
	
	public static final String PROPNAME_TRANSCODER_SESSION = "TranscoderVideoLoadBalancerCUDASimpleSessionInfo";
	
	class SessionInfo
	{
		private int gpuid = 0;
		private long load = 0;
		
		public SessionInfo(int gpuid, long load)
		{
			this.gpuid = gpuid;
			this.load = load;
		}
	}
	
	class GPUInfo
	{
		private int gpuid = 0;
		private long currentLoad = 0;
		private int weight = 0;
		
		private int getWeight()
		{
			return this.weight;
		}
		
		private long getUnWeightedLoad()
		{
			return currentLoad;
		}
		
		private long getWeightedLoad()
		{
			long load = 0;
			if (weight > 0)
				load = (currentLoad*gpuWeightScale)/weight;
			return load;
		}
	}
	
	private Object lock = new Object();
	private TranscoderContextServer transcoderContextServer = null;
	private boolean available = false;
	private int countGPU = 0;
	private int gpuWeightScale = DEFAULT_GPU_WEIGHT_SCALE;
	private int[] gpuWeights = null;
	private int weightFactorEncode = DEFAULT_WEIGHT_FACTOR_ENCODE;
	private int weightFactorDecode = DEFAULT_WEIGHT_FACTOR_DECODE;
	private int weightFactorScale = DEFAULT_WEIGHT_FACTOR_SCALE;
	private GPUInfo[] gpuInfos = null;

	@Override
	public void init(IServer server, TranscoderContextServer transcoderContextServer)
	{
		this.transcoderContextServer = transcoderContextServer;
		
		WMSProperties props = server.getProperties();
		
		this.weightFactorEncode = props.getPropertyInt("transcoderVideoLoadBalancerCUDASimpleWeightFactorEncode", this.weightFactorEncode);
		this.weightFactorDecode = props.getPropertyInt("transcoderVideoLoadBalancerCUDASimpleWeightFactorDecode", this.weightFactorDecode);
		this.weightFactorScale = props.getPropertyInt("transcoderVideoLoadBalancerCUDASimpleWeightFactorScale", this.weightFactorScale);

		String weightsStr = props.getPropertyStr("transcoderVideoLoadBalancerCUDASimpleGPUWeights", null);
		if (weightsStr != null)
		{
			String[] values = weightsStr.split(",");
			int maxWeight = 0;
			this.gpuWeights = new int[values.length];
			for(int i=0;i<values.length;i++)
			{
				String value = values[i].trim();
				if (value.length() <= 0)
				{
					this.gpuWeights[i] = -1;
					continue;
				}

				int weight = -1;
				try
				{
					weight = Integer.parseInt(value);
					if (weight < 0)
						weight = 0;
				}
				catch(Exception e)
				{
					// Not a valid integer: leave weight at -1 so it falls back to the largest weight below
				}
				
				this.gpuWeights[i] = weight;
				if (weight > maxWeight)
					maxWeight = weight;
			}
			
			this.gpuWeightScale = maxWeight;
			for(int i=0;i<this.gpuWeights.length;i++)
			{
				if (this.gpuWeights[i] < 0)
					this.gpuWeights[i] = this.gpuWeightScale;
			}
		}
		
		WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".init: weightFactorEncode:"+weightFactorEncode+" weightFactorDecode:"+weightFactorDecode+" weightFactorScale:"+weightFactorScale);
	}

	@Override
	public void onTranscoderSessionCreate(LiveStreamTranscoder liveStreamTranscoder)
	{
	}

	@Override
	public void onTranscoderSessionInit(LiveStreamTranscoder liveStreamTranscoder)
	{
	}

	@Override
	public void onTranscoderSessionDestroy(LiveStreamTranscoder liveStreamTranscoder)
	{		
		if (this.countGPU > 1)
		{
			WMSProperties props = liveStreamTranscoder.getProperties();
			Object sessionInfoObj = props.get(PROPNAME_TRANSCODER_SESSION);
			if (sessionInfoObj != null && sessionInfoObj instanceof SessionInfo)
			{
				SessionInfo sessionInfo = (SessionInfo)sessionInfoObj;
				
				if (sessionInfo.gpuid < gpuInfos.length)
				{
					long releasedLoad;
					synchronized(this.lock)
					{
						// Capture the load before clearing it so the log line below reports the real value
						releasedLoad = sessionInfo.load;
						gpuInfos[sessionInfo.gpuid].currentLoad -= releasedLoad;
						sessionInfo.load = 0;
					}
					
					WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onTranscoderSessionDestroy["+liveStreamTranscoder.getContextStr()+"]: Removing GPU session: gpuid:"+sessionInfo.gpuid+" load:"+releasedLoad);
				}
			}
		}
	}
	
	@Override
	public void onHardwareInspection(TranscoderContextServer transcoderContextServer)
	{		
		//{"infoCUDA":{"availabe":true,"availableFlags":65651,"countGPU":1,"driverVersion":368.81,"cudaVersion":8000,"isCUDAOldH264WindowsAvailable":false,"gpuInfo":[{"name":"GeForce GTX 960M","versionMajor":5,"versionMinor":0,"clockRate":1097500,"multiprocessorCount":5,"totalMemory":2147483648,"coreCount":640,"isCUDANVCUVIDAvailable":true,"isCUDAH264EncodeAvailable":true,"isCUDAH265EncodeAvailable":false,"getCUDANVENCVersion":5}]},"infoQuickSync":{"availabe":true,"availableFlags":537,"versionMajor":1,"versionMinor":19,"isQuickSyncH264EncodeAvailable":true,"isQuickSyncH265EncodeAvailable":true,"isQuickSyncVP8EncodeAvailable":false,"isQuickSyncVP9EncodeAvailable":false,"isQuickSyncH264DecodeAvailable":true,"isQuickSyncH265DecodeAvailable":false,"isQuickSyncMP2DecodeAvailable":true,"isQuickSyncVP8DecodeAvailable":false,"isQuickSyncVP9DecodeAvailable":false},"infoVAAPI":{"available":false},"infoX264":{"available":false},"infoX265":{"available":false}}
		
		boolean available = false;
		int countGPU = 0;

		String jsonStr = transcoderContextServer.getHardwareInfoJSON();
		if (jsonStr != null)
		{
			try
			{
				JSON jsonData = new JSON(jsonStr);
								
				if (jsonData != null)
				{
					Map<String, Object> entries = jsonData.getEntrys();
					
					Map<String, Object> infoCUDA = (Map<String, Object>)entries.get("infoCUDA");
					if (infoCUDA != null)
					{
						
						Object availableObj = infoCUDA.get("availabe");
						if (availableObj != null && availableObj instanceof Boolean)
						{
							available = ((Boolean)availableObj).booleanValue();
						}
						
						if (available)
						{
							Object countGPUObj = infoCUDA.get("countGPU");
							if (countGPUObj != null && countGPUObj instanceof Integer)
							{
								countGPU = ((Integer)countGPUObj).intValue();
							}
						}
					}
				}
			}
			catch(Exception e)
			{
				WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onHardwareInspection: Parsing JSON: ", e);
			}
		}
				
		WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onHardwareInspection: CUDA available:"+available+" countGPU:"+countGPU);

		synchronized(lock)
		{
			this.available = available;
			this.countGPU = countGPU;
			
			if (this.countGPU > 1)
			{
				this.gpuInfos = new GPUInfo[this.countGPU];
				for(int i=0;i<this.gpuInfos.length;i++)
				{
					this.gpuInfos[i] = new GPUInfo();
					
					this.gpuInfos[i].gpuid = i;
					
					if (this.gpuWeights != null && i < this.gpuWeights.length)
						this.gpuInfos[i].weight = this.gpuWeights[i];
					else
						this.gpuInfos[i].weight = gpuWeightScale;
				}
			}
		}
	}
	
	@Override
	public void onTranscoderSessionLoadBalance(LiveStreamTranscoder liveStreamTranscoder)
	{		
		try
		{
			while(true)
			{
				if (this.gpuInfos == null)
					break;
				
				TranscoderStream transcoderStream = liveStreamTranscoder.getTranscodingStream();
				if (transcoderStream == null)
					break;
				
				TranscoderSession transcoderSession = liveStreamTranscoder.getTranscodingSession();
				if (transcoderSession == null)
					break;
				
				TranscoderSessionVideo transcoderSessionVideo = transcoderSession.getSessionVideo();
				if (transcoderSessionVideo == null)
					break;
				
				MediaCodecInfoVideo codecInfoVideo = null;
				if (transcoderSessionVideo.getCodecInfo() != null)
					codecInfoVideo = transcoderSession.getSessionVideo().getCodecInfo();

				long loadDecode = 0;
				long loadScale = 0;
				long loadEncode = 0;
				
				boolean isScalerCUDA = false;
				
				TranscoderStreamSourceVideo transcoderStreamSourceVideo = null;
				TranscoderStreamScaler transcoderStreamScaler = null;
				
				TranscoderStreamSource transcoderStreamSource = transcoderStream.getSource();
				if (transcoderStreamSource != null)
				{
					transcoderStreamSourceVideo = transcoderStreamSource.getVideo();
					if (transcoderStreamSourceVideo != null && codecInfoVideo != null && (transcoderStreamSourceVideo.isImplementationNVCUVID() || transcoderStreamSourceVideo.isImplementationCUDA()))
					{
						loadDecode = codecInfoVideo.getFrameWidth() * codecInfoVideo.getFrameHeight();
					}
					else
						transcoderStreamSourceVideo = null;
				}
				
				transcoderStreamScaler = transcoderStream.getScaler();
				if (transcoderStreamScaler != null)
				{
					isScalerCUDA = transcoderStreamScaler.isImplementationCUDA();
				}
				
				List<TranscoderStreamDestination> destinations = transcoderStream.getDestinations();
				if (destinations == null)
					break;
				
				for(TranscoderStreamDestination destination : destinations)
				{
					if (!destination.isEnable())
						continue;

					TranscoderStreamDestinationVideo destinationVideo = destination.getVideo();
					
					if (destinationVideo == null)
						continue;
					
					if (destinationVideo.isPassThrough() || destinationVideo.isDisable())
						continue;
					
					TranscoderVideoFrameSizeHolder frameSizeHolder = destinationVideo.getFrameSizeHolder();
					if (frameSizeHolder == null)
						continue;
					
					if (isScalerCUDA)
						loadScale += frameSizeHolder.getActualWidth() * frameSizeHolder.getActualHeight();
					
					if (destinationVideo.isImplementationNVENC() || destinationVideo.isImplementationCUDA())
						loadEncode += frameSizeHolder.getActualWidth() * frameSizeHolder.getActualHeight();
				}
				
				long totalLoad = (loadDecode*weightFactorDecode) + (loadScale*weightFactorScale) + (loadEncode*weightFactorEncode);
				if (totalLoad <= 0)
					break;
				
				totalLoad /= LOAD_MAG;
				
				if (totalLoad <= 0)
					totalLoad = 1;
				
				int gpuid = -1;
				
				synchronized(lock)
				{
					long leastLoad = Long.MAX_VALUE;
					
					for(int i=0;i<gpuInfos.length;i++)
					{
						if (gpuInfos[i].getWeightedLoad() < leastLoad)
						{
							leastLoad = gpuInfos[i].getWeightedLoad();
							gpuid = i;
						}
					}
					
					if (gpuid >= 0)
						gpuInfos[gpuid].currentLoad += totalLoad;
				}

				WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onTranscoderSessionLoadBalance["+liveStreamTranscoder.getContextStr()+"]: gpuid:"+gpuid+" load:"+totalLoad+" [decode:"+loadDecode+" scale:"+loadScale+" encode:"+loadEncode+"]");

				if (gpuid >= 0)
				{
					liveStreamTranscoder.getProperties().put(PROPNAME_TRANSCODER_SESSION, new SessionInfo(gpuid, totalLoad));

					if (transcoderStreamSourceVideo != null)
						transcoderStreamSourceVideo.setGPUID(gpuid);
					
					if (transcoderStreamScaler != null && isScalerCUDA)
						transcoderStreamScaler.setGPUID(gpuid);
					
					for(TranscoderStreamDestination destination : destinations)
					{
						if (!destination.isEnable())
							continue;

						TranscoderStreamDestinationVideo destinationVideo = destination.getVideo();
						
						if (destinationVideo == null)
							continue;
						
						if (destinationVideo.isPassThrough() || destinationVideo.isDisable())
							continue;
																		
						if (destinationVideo.isImplementationNVENC() || destinationVideo.isImplementationCUDA())
							destinationVideo.setGPUID(gpuid);
					}
				}
				break;
			}
		}
		catch(Exception e)
		{
			WMSLoggerFactory.getLogger(CLASS).info(CLASSNAME+".onTranscoderSessionLoadBalance: ", e);
		}
	}	
}
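
To make the load arithmetic in onTranscoderSessionLoadBalance concrete, here is a minimal, self-contained sketch that does not depend on the Wowza API. The SessionLoadEstimate class and its estimate method are hypothetical names introduced for illustration. The weight factors and LOAD_MAG mirror the defaults in the class above, and the sketch assumes that decoding, scaling, and every encode all run on the GPU.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the Wowza API) that mirrors the load arithmetic shown above.
final class SessionLoadEstimate
{
	static final int WEIGHT_FACTOR_DECODE = 1;
	static final int WEIGHT_FACTOR_SCALE = 1;
	static final int WEIGHT_FACTOR_ENCODE = 5;
	static final int LOAD_MAG = 1000;

	// source and each rendition are {width, height} pairs.
	// Assumption: decoder, scaler, and all encoders use the GPU.
	static long estimate(int[] source, List<int[]> renditions)
	{
		long loadDecode = (long)source[0] * source[1];
		long loadScale = 0;
		long loadEncode = 0;
		for (int[] rendition : renditions)
		{
			long pixels = (long)rendition[0] * rendition[1];
			loadScale += pixels;
			loadEncode += pixels;
		}

		long totalLoad = loadDecode*WEIGHT_FACTOR_DECODE + loadScale*WEIGHT_FACTOR_SCALE + loadEncode*WEIGHT_FACTOR_ENCODE;
		if (totalLoad <= 0)
			return 0; // nothing to place on a GPU

		// Scale down, but never let a real session round to zero load
		return Math.max(totalLoad / LOAD_MAG, 1);
	}

	public static void main(String[] args)
	{
		List<int[]> renditions = Arrays.asList(new int[]{1280, 720}, new int[]{640, 360});
		System.out.println(estimate(new int[]{1920, 1080}, renditions));
	}
}
```

For a 1920x1080 source with 1280x720 and 640x360 renditions, the sketch yields (2,073,600 x 1 + 1,152,000 x 1 + 1,152,000 x 5) / 1000 = 8985, so encoding dominates the estimate under the default factors.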

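Load balancing across multiple GPUs with different performance

The built-in TranscoderVideoLoadBalancerCUDASimple class supports load balancing across GPUs with different performance. You can give each GPU its own performance weight (also called a load weight, because a faster GPU can take on more of the load). The weights are configured in the transcoderVideoLoadBalancerCUDASimpleGPUWeights property.

The property value is a comma-separated list with one weight per GPU. We recommend setting the weight of the best-performing GPU to 100 and setting each slower GPU to the corresponding percentage of that performance.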
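
The parsing rules for this list, as implemented in the init method of the example class, can be sketched without the Wowza API. GpuWeightParser is a hypothetical helper name. The behavior it reproduces (negative values are clamped to 0, and blank or non-numeric entries fall back to the largest weight in the list) is read from the source of TranscoderVideoLoadBalancerCUDASimple.

```java
// Hypothetical helper (not part of the Wowza API) that mirrors the weight parsing in init.
final class GpuWeightParser
{
	static int[] parse(String weightsStr)
	{
		String[] values = weightsStr.split(",");
		int[] weights = new int[values.length];
		int maxWeight = 0;

		for (int i = 0; i < values.length; i++)
		{
			int weight = -1; // -1 marks an entry that was blank or could not be parsed
			String value = values[i].trim();
			if (!value.isEmpty())
			{
				try
				{
					// Negative weights are clamped to 0
					weight = Math.max(Integer.parseInt(value), 0);
				}
				catch (NumberFormatException e)
				{
					// keep -1 so the entry receives the default below
				}
			}
			weights[i] = weight;
			maxWeight = Math.max(maxWeight, weight);
		}

		// Unspecified entries default to the largest weight in the list
		for (int i = 0; i < weights.length; i++)
		{
			if (weights[i] < 0)
				weights[i] = maxWeight;
		}
		return weights;
	}

	public static void main(String[] args)
	{
		System.out.println(java.util.Arrays.toString(parse("100,,50")));
	}
}
```

In the example class, a GPU whose index lies beyond the end of the list also receives the largest weight, so a list shorter than the number of GPUs treats the remaining GPUs as full-speed devices.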

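The order of the weights in the list matches the order in which the GPUs are listed in the Wowza Streaming Engine startup log. For example, if your server has an M5000 card (index 0) and an M2000 card (index 1), you can configure the weights in transcoderVideoLoadBalancerCUDASimpleGPUWeights as follows:

```
<Property>
	<Name>transcoderVideoLoadBalancerCUDASimpleGPUWeights</Name>
	<Value>100,66</Value>
</Property>
```

This indicates that the M2000 card has only 66% of the performance of the M5000 card.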
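
How these weights steer sessions can be illustrated with a small simulation of the least-weighted-load rule used by onTranscoderSessionLoadBalance. WeightedGpuPicker and its methods are hypothetical names, not part of the Wowza API, and the simulation assumes every session carries the same load of 1000 units.

```java
// Hypothetical simulation (not part of the Wowza API) of the least-weighted-load rule.
final class WeightedGpuPicker
{
	private final int[] weights;
	private final long[] currentLoad;
	private final int weightScale; // largest weight, as in the example class

	WeightedGpuPicker(int[] weights)
	{
		this.weights = weights.clone();
		this.currentLoad = new long[weights.length];
		int max = 0;
		for (int w : weights)
			max = Math.max(max, w);
		this.weightScale = max;
	}

	// Load normalized by weight: a slower GPU looks "fuller" for the same raw load
	synchronized long weightedLoad(int gpuid)
	{
		if (weights[gpuid] <= 0)
			return 0;
		return (currentLoad[gpuid] * weightScale) / weights[gpuid];
	}

	// Places a session on the GPU with the smallest weighted load; ties go to the lowest index
	synchronized int assign(long load)
	{
		int gpuid = -1;
		long leastLoad = Long.MAX_VALUE;
		for (int i = 0; i < weights.length; i++)
		{
			long candidate = weightedLoad(i);
			if (candidate < leastLoad)
			{
				leastLoad = candidate;
				gpuid = i;
			}
		}
		if (gpuid >= 0)
			currentLoad[gpuid] += load;
		return gpuid;
	}

	// Returns the load to the pool when a session ends
	synchronized void release(int gpuid, long load)
	{
		currentLoad[gpuid] -= load;
	}

	synchronized long load(int gpuid)
	{
		return currentLoad[gpuid];
	}

	public static void main(String[] args)
	{
		WeightedGpuPicker picker = new WeightedGpuPicker(new int[]{100, 66});
		for (int i = 0; i < 10; i++)
			System.out.println("session " + i + " -> GPU " + picker.assign(1000));
	}
}
```

With weights 100 and 66, ten equal sessions are placed 6 on GPU 0 and 4 on GPU 1, close to the 100:66 ratio. In this formula, as in the example class, a GPU with weight 0 always reports a weighted load of 0, so setting a weight to 0 does not exclude that GPU from selection.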