Multiplayer Guide

Designing multiplayer-safe GenAI systems with ElysGenAI.

Core Principle

Audio processing is CLIENT-SIDE ONLY

Never replicate raw audio over the network. Only replicate text results and metadata.


Architecture Rules

What Runs on Client

  • Audio capture (microphone)
  • Speech-to-text processing
  • LLM generation (for player assistance)
  • Voice activity detection
  • Audio buffering

What Runs on Server

  • Text processing (from STT results)
  • Intent parsing
  • Command validation
  • Game state changes
  • NPC dialogue generation (LLM)

Never Replicate

  • Raw audio data (TArray<float>)
  • Audio buffers
  • Model inference results (internal)
  • Backend instances

Always Replicate

  • Transcribed text (FString)
  • Parsed intents
  • Command confirmations
  • NPC responses

Network Mode Check

Check network mode before processing audio:

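void UERP_STTComponent::BeginPlay()
{
    Super::BeginPlay();

    // A dedicated server has no microphone and no local player
    if (GetNetMode() == NM_DedicatedServer)
    {
        UE_LOG(LogElysSTT, Warning,
            TEXT("STT on dedicated server - disabled"));
        return;
    }

    // Only process audio for the locally controlled owner. AActor has no
    // IsLocallyControlled(), so check the pawn or controller type explicitly.
    const APawn* OwnerPawn = Cast<APawn>(GetOwner());
    const AController* OwnerController = Cast<AController>(GetOwner());
    const bool bIsLocal =
        (OwnerPawn && OwnerPawn->IsLocallyControlled()) ||
        (OwnerController && OwnerController->IsLocalController());

    if (bIsLocal)
    {
        StartListening();
    }
}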

Network Modes:

  • NM_Standalone: Single player - audio capture enabled
  • NM_ListenServer: Host + client - audio capture on locally controlled only
  • NM_DedicatedServer: Server only - NO audio capture
  • NM_Client: Connected client - audio capture on locally controlled only
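
The table above can be condensed into a single predicate. The following is an engine-independent sketch in standard C++, so it compiles outside Unreal. The `NetMode` enum only mirrors the names of Unreal's `ENetMode` values, and `ShouldCaptureAudio` is a hypothetical helper, not part of ElysGenAI.

```cpp
// Stand-in for Unreal's ENetMode so the sketch compiles outside the engine.
enum class NetMode { Standalone, DedicatedServer, ListenServer, Client };

// Hypothetical helper: true when this machine should open the microphone
// for an actor, given the net mode and whether the actor is locally controlled.
constexpr bool ShouldCaptureAudio(NetMode Mode, bool bLocallyControlled)
{
    switch (Mode)
    {
    case NetMode::DedicatedServer:
        return false;               // Server only: never capture audio.
    case NetMode::Standalone:
        return true;                // Single player: capture enabled.
    case NetMode::ListenServer:
    case NetMode::Client:
        return bLocallyControlled;  // Only the locally controlled player.
    }
    return false;
}
```

Keeping the decision in one pure function makes it easy to check each row of the table without starting a multiplayer session.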

Voice Command Pattern

Complete example showing client-side processing with server validation:

// PlayerController.h
UCLASS()
class AMyPlayerController : public APlayerController
{
    GENERATED_BODY()

protected:
    virtual void BeginPlay() override;

    UPROPERTY(VisibleAnywhere)
    UERP_STTComponent* STTComponent;

    UFUNCTION(Server, Reliable, WithValidation)
    void ServerExecuteVoiceCommand(const FString& Command, float Confidence);

    UFUNCTION()
    void OnTranscriptionReceived(const FERP_STTResult& Result);
};

// PlayerController.cpp
void AMyPlayerController::BeginPlay()
{
    Super::BeginPlay();

    // Only on owning client
    if (IsLocalController() && STTComponent)
    {
        STTComponent->OnTranscriptionComplete.AddDynamic(
            this, &AMyPlayerController::OnTranscriptionReceived);
        STTComponent->StartListening();
    }
}

void AMyPlayerController::OnTranscriptionReceived(const FERP_STTResult& Result)
{
    // Client-side: forward only confident transcriptions
    if (Result.Confidence > 0.7f)
    {
        // Send text to server
        ServerExecuteVoiceCommand(Result.TranscribedText, Result.Confidence);
    }
}

// Server RPC
void AMyPlayerController::ServerExecuteVoiceCommand_Implementation(
    const FString& Command, float Confidence)
{
    // Server-side: execute on the controlled pawn.
    // AMyCharacter is a placeholder for your own pawn class that declares
    // PerformAttack() and ReloadWeapon(); APawn itself has neither.
    AMyCharacter* MyCharacter = Cast<AMyCharacter>(GetPawn());
    if (!MyCharacter)
    {
        return;
    }

    if (Command.Contains(TEXT("attack")))
    {
        MyCharacter->PerformAttack();
    }
    else if (Command.Contains(TEXT("reload")))
    {
        MyCharacter->ReloadWeapon();
    }
}

// WithValidation expects a function named <RPC>_Validate
bool AMyPlayerController::ServerExecuteVoiceCommand_Validate(
    const FString& Command, float Confidence)
{
    // Prevent abuse: check confidence and length
    return Confidence > 0.5f && Command.Len() < 256;
}

Flow: Microphone → Audio Capture (client) → STT (client) → RPC Text → Server Validation → Execute


Bandwidth Efficiency

Text vs Audio:

  • Raw audio: ~1.9 MB/minute (16 kHz mono 16-bit PCM: 16,000 samples/s × 2 bytes × 60 s)
  • Transcribed text: ~2 KB/minute (typical speech)
  • Reduction: roughly 1,000x less data
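
The raw-audio figure follows directly from the sample format, so it can be checked with a few lines of arithmetic. This is a small standard C++ sketch; the helper name `PcmBytesPerMinute` is illustrative, and the 2 KB/minute text volume is taken from the list above as an assumption rather than a measurement.

```cpp
#include <cstdint>

// Bytes per minute of uncompressed PCM audio.
constexpr std::uint64_t PcmBytesPerMinute(std::uint32_t SampleRateHz,
                                          std::uint32_t Channels,
                                          std::uint32_t BitsPerSample)
{
    return static_cast<std::uint64_t>(SampleRateHz) * Channels
        * (BitsPerSample / 8) * 60;
}

// 16,000 samples/s * 1 channel * 2 bytes * 60 s = 1,920,000 bytes per minute.
constexpr std::uint64_t RawAudioBytesPerMinute = PcmBytesPerMinute(16000, 1, 16);

// Assumed transcript volume: about 2 KB per minute of speech.
constexpr std::uint64_t TextBytesPerMinute = 2000;

// 1,920,000 / 2,000 = 960, i.e. about three orders of magnitude.
constexpr std::uint64_t ReductionFactor =
    RawAudioBytesPerMinute / TextBytesPerMinute;
```

The exact ratio depends on how much players talk, but the order of magnitude holds for any realistic speech rate.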

Best Practices:

  • Always send text, never audio
  • Compress text with FArchive if sending frequently
  • Batch multiple commands if possible
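
One way to batch commands is to queue transcripts for a short window and send them together. The sketch below is plain standard C++ with a hypothetical `FCommandBatcher` class; the window length and the idea of passing the clock in as a parameter are assumptions made for illustration, not ElysGenAI behavior.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical batcher: collects transcripts and releases them as one list
// once the batch window has elapsed, so they can be sent in a single RPC.
class FCommandBatcher
{
public:
    explicit FCommandBatcher(double InWindowSeconds)
        : WindowSeconds(InWindowSeconds) {}

    // Queue a command. NowSeconds comes from the caller's clock.
    void Add(std::string Command, double NowSeconds)
    {
        if (Pending.empty())
        {
            WindowStart = NowSeconds;  // First command opens a new window.
        }
        Pending.push_back(std::move(Command));
    }

    // Returns the queued commands if the window has elapsed,
    // otherwise an empty list.
    std::vector<std::string> FlushIfDue(double NowSeconds)
    {
        if (Pending.empty() || NowSeconds - WindowStart < WindowSeconds)
        {
            return {};
        }
        std::vector<std::string> Batch = std::move(Pending);
        Pending.clear();  // Leave the moved-from vector in a known empty state.
        return Batch;
    }

private:
    double WindowSeconds;
    double WindowStart = 0.0;
    std::vector<std::string> Pending;
};
```

Batching trades latency for fewer packets, so a short window suits gameplay commands, where a delayed "attack" is worse than an extra RPC.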

Security Considerations

Command Validation

Always validate commands server-side:

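// WithValidation expects a function named <RPC>_Validate.
// Returning false disconnects the calling client, so each check below
// treats a failure as evidence of tampering.
bool AMyPlayerController::ServerExecuteVoiceCommand_Validate(
    const FString& Command, float Confidence)
{
    // Check confidence threshold
    if (Confidence < 0.5f) return false;

    // Check length (prevent spam)
    if (Command.Len() > 256) return false;

    // Check rate limit (LastCommandTime is a float member you declare
    // on AMyPlayerController)
    float TimeSinceLastCommand = GetWorld()->GetTimeSeconds() - LastCommandTime;
    if (TimeSinceLastCommand < 0.5f) return false;

    LastCommandTime = GetWorld()->GetTimeSeconds();
    return true;
}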
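
In Unreal, a failed RPC validation disconnects the client that sent the call. If that is too harsh for a player who simply speaks twice in quick succession, one option is to rate-limit inside the `_Implementation` function and drop the command quietly. The following is a minimal engine-independent sketch in standard C++; `FVoiceCommandRateLimiter` is a hypothetical helper, and the caller supplies the current time (for example, the world time in seconds).

```cpp
// Hypothetical per-player limiter: accepts at most one command
// every MinIntervalSeconds and rejects the rest.
class FVoiceCommandRateLimiter
{
public:
    explicit FVoiceCommandRateLimiter(double InMinIntervalSeconds)
        : MinIntervalSeconds(InMinIntervalSeconds) {}

    // Returns true if the command may run. The timestamp is only
    // updated for accepted commands.
    bool TryAccept(double NowSeconds)
    {
        if (bHasAccepted &&
            NowSeconds - LastAcceptedSeconds < MinIntervalSeconds)
        {
            return false;  // Too soon: drop the command, keep the connection.
        }
        LastAcceptedSeconds = NowSeconds;
        bHasAccepted = true;
        return true;
    }

private:
    double MinIntervalSeconds;
    double LastAcceptedSeconds = 0.0;
    bool bHasAccepted = false;
};
```

A controller would hold one limiter per player and return early from the RPC implementation whenever `TryAccept` returns false.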

Anti-Cheat

  • Never trust client-provided confidence scores for critical actions
  • Validate all commands match expected patterns
  • Rate-limit voice commands
  • Log suspicious activity (e.g., impossible confidence scores, rapid-fire commands)
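
Matching commands against expected patterns can be as simple as an allowlist with exact matching. A substring test such as `Contains("attack")` also fires on "don't attack", so the sketch below normalizes the transcript and accepts only whole-phrase matches. It is standard C++ rather than Unreal code, and `EVoiceIntent`, `NormalizeTranscript`, and `ParseIntent` are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class EVoiceIntent { None, Attack, Reload };

// Strip leading/trailing non-alphanumeric characters and lowercase the rest,
// so "  Attack! " and "attack" compare equal.
inline std::string NormalizeTranscript(const std::string& Text)
{
    const auto IsWordChar = [](unsigned char C) { return std::isalnum(C) != 0; };
    const auto First = std::find_if(Text.begin(), Text.end(), IsWordChar);
    const auto Last = std::find_if(Text.rbegin(), Text.rend(), IsWordChar).base();
    if (First >= Last)
    {
        return {};  // Empty or punctuation-only input.
    }
    std::string Result(First, Last);
    std::transform(Result.begin(), Result.end(), Result.begin(),
        [](unsigned char C) { return static_cast<char>(std::tolower(C)); });
    return Result;
}

// Map a transcript to an intent only when it matches an allowed phrase exactly.
inline EVoiceIntent ParseIntent(const std::string& Transcript)
{
    const std::string Normalized = NormalizeTranscript(Transcript);
    if (Normalized == "attack") return EVoiceIntent::Attack;
    if (Normalized == "reload") return EVoiceIntent::Reload;
    return EVoiceIntent::None;  // Anything else is ignored.
}
```

Exact matching is strict by design. A real game would likely extend the allowlist with synonyms or multi-word phrases while still rejecting anything not on it.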

NPC Dialogue

Server-side LLM for NPC responses:

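// NPC.h (replicated actor)
UCLASS()
class ANPC : public ACharacter
{
    GENERATED_BODY()

    UPROPERTY(VisibleAnywhere)
    UERP_LLMComponent* LLMComponent;

    // Note: a Server RPC only executes when it is called on an actor owned
    // by the calling client's connection. NPCs are normally not client-owned,
    // so route the request through the player's controller or pawn and have
    // it call into the NPC on the server.
    UFUNCTION(Server, Reliable, WithValidation)
    void ServerSendDialogue(const FString& PlayerMessage);

    UFUNCTION(NetMulticast, Reliable)
    void MulticastDisplayResponse(const FString& Response);

    UFUNCTION()
    void OnLLMResponse(const FERP_LLMResult& Result);
};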

// NPC.cpp
void ANPC::BeginPlay()
{
    Super::BeginPlay();

    // Only generate dialogue on server
    if (HasAuthority() && LLMComponent)
    {
        LLMComponent->SetSystemPrompt(TEXT("You are a friendly merchant."));
        LLMComponent->OnGenerationComplete.AddDynamic(
            this, &ANPC::OnLLMResponse);
    }
}

// Required because the RPC is declared WithValidation.
// The 256-character cap is an example value mirroring the voice command limit.
bool ANPC::ServerSendDialogue_Validate(const FString& PlayerMessage)
{
    return PlayerMessage.Len() < 256;
}

void ANPC::ServerSendDialogue_Implementation(const FString& PlayerMessage)
{
    // Generate response on server
    LLMComponent->SendMessage(PlayerMessage);
}

void ANPC::OnLLMResponse(const FERP_LLMResult& Result)
{
    // Broadcast response to all clients
    MulticastDisplayResponse(Result.GeneratedText);
}

void ANPC::MulticastDisplayResponse_Implementation(const FString& Response)
{
    // Display dialogue on all clients
    ShowDialogueBubble(Response);
}

Common Patterns

Pattern 1: Client STT → Server Processing

Use for: Voice commands, multiplayer actions
Flow: Client captures → STT → Send text → Server validates → Execute

Pattern 2: Server LLM → Client Display

Use for: NPC dialogue, quest text
Flow: Server generates → Multicast text → All clients display

Pattern 3: Client STT + Client LLM

Use for: Single-player helpers, local UI
Flow: Client captures → STT → LLM → Display (no network)


Testing Multiplayer

Test in Editor

  1. Enable multiplayer PIE: Editor Preferences → Play → Number of Players = 2
  2. Set Net Mode to "Play as Listen Server"
  3. Test with both client and server windows

Verify Behavior

Check logs in both client and server windows:

// Client log (expected)
LogElysSTT: Started listening
LogElysSTT: Transcription: "attack" (0.85)

// Server log (expected)
LogGame: Received voice command: "attack" (0.85)
LogGame: Executing attack for Player 1

// Dedicated server log (expected - NO audio)
LogElysSTT: Warning: STT on dedicated server - disabled

Next Steps