Helios AI

Real-time long video generation model with 14 billion parameters. High-quality synthesis at 19.5 FPS on a single GPU.

The Next Stage in Generative Video

Helios AI sets a new standard for high-performance generative models. Traditional video synthesis often struggles to maintain coherence over longer durations, and many systems rely on heavy post-processing or complex correction strategies to keep the output stable. Helios AI takes a different path, achieving stability and quality through a purely autoregressive approach that operates in real time.

This model architecture contains 14 billion parameters, providing sufficient capacity to capture the intricate patterns found in natural movement and complex scenes. Instead of adding more layers of correction, the system focuses on the fundamental efficiency of the generation process. This allows for the production of minute-scale videos without the typical drift associated with long-term generation.

Performance metrics show that Helios AI can reach 19.5 frames per second on a single H100 GPU. On specialized hardware like the Ascend NPU, it reaches approximately 10 FPS. This speed is achieved without using standard acceleration methods such as KV-cache, sparse attention, or hidden-state caching. The result is a system that is faster and easier to deploy across various hardware environments.

Architecture and Methodology

Autoregressive Generation Logic

At the core of Helios AI is an autoregressive framework designed to handle frame prediction in a sequential manner. The model processes video data in discrete chunks, typically generating 33 frames at a time. This chunk-based approach provides natural checkpoints for coherence while maintaining high throughput.

By avoiding bidirectional processing for longer segments, the model reduces the computational load significantly. The information flows from previous frames into the current generation block, ensuring that the motion remains consistent with earlier events. This temporal stability is achieved through careful parameter tuning and architectural optimizations during the training phase.
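The chunked, forward-only generation loop described above can be sketched in plain Python. This is a toy stand-in, not the Helios implementation: frames are integers here, and `predict_chunk` is a hypothetical placeholder for what would be a model forward pass conditioned on the generated history.

```python
# Minimal sketch of chunk-based autoregressive generation (illustrative
# only). In the real model each 33-frame chunk would be a latent tensor,
# and predict_chunk would be a conditioned forward pass.

CHUNK_SIZE = 33  # frames generated per autoregressive step

def predict_chunk(history, chunk_size=CHUNK_SIZE):
    """Hypothetical predictor: emits the next chunk_size frames,
    conditioned only on previously generated frames."""
    start = len(history)
    return list(range(start, start + chunk_size))

def generate(total_frames):
    """Generate total_frames frames chunk by chunk; information flows
    strictly forward, from earlier chunks into later ones."""
    assert total_frames % CHUNK_SIZE == 0, "use a multiple of the chunk size"
    frames = []
    while len(frames) < total_frames:
        frames.extend(predict_chunk(frames))
    return frames

video = generate(99)  # 3 chunks of 33 frames
```

Because each chunk sees only earlier frames, the loop never revisits or re-renders past output, which is what keeps throughput high over minute-scale sequences.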

Eliminating Stability Strategies

Traditional methods often require:

  • Self-forcing mechanisms
  • Error-bank corrections
  • Keyframe re-sampling
  • Inverted sampling passes

Helios AI maintains quality without any of these secondary overheads.

Pure Performance Focus

Efficiency achieved without:

  • KV-caching structures
  • Causal masking overhead
  • Progressive noise schedules
  • Quantization techniques

Simple, native operation for maximum compatibility.

The absence of standard acceleration techniques is a deliberate choice. While methods like KV-caching provide benefits in specific contexts, they also add complexity to the inference pipeline and memory management. Helios AI shows that a well-optimized base model can deliver high frame rates through raw architectural efficiency, which simplifies integration into third-party libraries and deployment frameworks.

The 14 billion parameter count allows the model to act as a stronger generator than smaller counterparts, even those that use extensive optimization tricks. The capacity to learn detailed textures and complex physical interactions is substantially higher, leading to videos that feel more stable and less prone to visual artifacts over time.

Model Variants

Model Name        Capabilities                 Primary Focus
Helios-Base       T2V, I2V, V2V, Interactive   Best quality, v-prediction, standard CFG
Helios-Mid        T2V, I2V, V2V, Interactive   Aggressive sampling, intermediate efficiency
Helios-Distilled  T2V, I2V, V2V, Interactive   Max efficiency, x0-prediction, low steps

Note: All variants share the same 14B-parameter architecture; they differ only in the sampling pipelines and schedules used to achieve their performance profiles.

For common tasks such as Image-to-Video conversion, we suggest adjusting parameters like noise sigma values to match the specific characteristics of your input source. The text-to-video foundation provides a robust base for these extended functions.

Installation and Environment

Preparation steps

# 0. Repository cloning
git clone --depth=1 https://github.com/PKU-YuanGroup/Helios.git
cd Helios

# 1. Environment creation
conda create -n helios python=3.11.2
conda activate helios

# 2. PyTorch installation
# Example for CUDA 12.8
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128

# 3. Dependencies installation
bash install.sh

Successful deployment requires meeting certain hardware and software prerequisites. The performance figures cited earlier assume the use of H100 GPU hardware, but the base requirements allow for operation on various modern NVIDIA GPUs and Ascend NPU units. Memory management is a key focus, with optimizations allowing up to four 14B models to fit within 80GB of GPU VRAM simultaneously.
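The claim that four 14B models fit in 80GB of VRAM implies a tight per-parameter budget, which a quick back-of-the-envelope check makes concrete. The figures come from the text above; the arithmetic and its interpretation are ours.

```python
# Back-of-the-envelope check of the memory claim: four 14B-parameter
# models sharing 80 GB of VRAM implies roughly 1.4 bytes per parameter,
# i.e. weights must sit below fp16/bf16 precision (2 bytes/param) or be
# partially offloaded. Numbers from the document; arithmetic is ours.

params_per_model = 14e9
models = 4
vram_bytes = 80e9

bytes_per_param = vram_bytes / (models * params_per_model)
print(f"{bytes_per_param:.2f} bytes/param")  # → 1.43 bytes/param
```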

The installation script handles the core library setup, including support for Diffusers, vLLM, and SGLang. These integrations provide different entry points depending on whether your focus is on research experimentation or production-scale inference.

Inference and Operation

Operational parameters

Operation is based on an autoregressive pipeline that produces 33 frames per chunk. For best results, the target frame count should be a multiple of this chunk size, so that the generated video aligns exactly with the model's prediction cycles.

Approx. Duration    Frame Count
~60s                1452 frames
~30s                726 frames
~11s                264 frames
~4s                 99 frames

Durations assume 24 FPS playback. At 16 FPS playback, the same frame counts yield proportionally longer videos.
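A small helper makes the chunk-alignment and duration arithmetic explicit. This is a utility we sketch here for convenience, not part of the Helios codebase.

```python
# Pick a frame count aligned to the 33-frame chunk size and convert it
# to playback duration. Illustrative utility, not a Helios API.

CHUNK_SIZE = 33

def aligned_frames(target_frames):
    """Round target_frames up to the nearest multiple of the chunk size."""
    chunks = -(-target_frames // CHUNK_SIZE)  # ceiling division
    return chunks * CHUNK_SIZE

def duration_seconds(frames, fps=24):
    """Playback length for a given frame count and frame rate."""
    return frames / fps

print(aligned_frames(700))             # → 726 (22 chunks)
print(duration_seconds(1452, fps=24))  # → 60.5
print(duration_seconds(1452, fps=16))  # → 90.75
```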

Running the model is straightforward using the provided shell scripts. There are dedicated entrance points for different model variants and task types. Whether you are starting from a text prompt or an existing image, the command structure remains consistent.

cd scripts/inference
bash helios-base_t2v.sh
# Or for distilled performance
bash helios-distilled_t2v.sh

Training Strategy

The training process for Helios AI follows a progressive three-stage pipeline. Each stage focuses on a specific aspect of the model performance, from foundation adaptation to final efficiency tuning.

1

Architectural Adaptation

Application of Unified History Injection and Multi-Term Memory Patchification to convert bidirectional kernels into autoregressive generators.

2

Token Compression

Introducing the Pyramid Unified Predictor Corrector to reduce noisy tokens and decrease the overall computational burden during generation.

3

Adversarial Hierarchical Distillation

Reducing sampling steps from 50 to 3 while removing the dependence on classifier-free guidance for maximum inference speed.

The training pipeline also incorporates dynamic shifting for all timestep-dependent operations. This ensures that the noise schedule remains perfectly matched to the latent size of the video, leading to consistent quality across different resolutions and aspect ratios.
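The document does not specify Helios's exact shift rule. For illustration only, below is a common form of resolution-dependent timestep shifting used in flow-matching pipelines; both `shift_timestep` and the square-root interpolation in `dynamic_shift` are our assumptions, not the Helios implementation.

```python
import math

# Illustrative timestep shifting (not the Helios formula). The shift
# factor grows with the latent token count, pushing sampling toward
# higher-noise timesteps for larger videos.

def shift_timestep(t, shift):
    """Map t in [0, 1] with a multiplicative shift; identity at shift=1."""
    return shift * t / (1.0 + (shift - 1.0) * t)

def dynamic_shift(num_tokens, base_tokens=256, base_shift=1.0, max_shift=3.0):
    """Hypothetical rule: interpolate the shift from the latent token count."""
    ratio = num_tokens / base_tokens
    return min(max_shift, base_shift * math.sqrt(ratio))

s = dynamic_shift(num_tokens=1024)   # larger latent → larger shift
t_shifted = shift_timestep(0.5, s)
```

Whatever the exact rule, the effect described in the text is the same: the noise schedule tracks the latent size, so quality stays consistent across resolutions and aspect ratios.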

Ecosystem and Optimization

The project maintains broad compatibility with the existing AI infrastructure. By supporting Context Parallelism, Helios AI can operate across multiple GPUs using techniques such as Ulysses Attention, Ring Attention, and Unified Attention. This allows users with large-scale hardware clusters to maximize their generation speeds further.

Memory consumption is another critical area where Helios AI provides significant advantages. Through the implementation of group offloading, the VRAM requirements can be reduced to approximately 6GB for certain tasks. This makes the 14B model accessible even to users without specialized high-memory hardware.
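The group-offloading idea can be illustrated without any GPU at all: only one group of layers is "resident" at a time, so peak memory scales with the largest group rather than the full model. The classes and names below are ours for illustration; they are not Diffusers or Helios APIs.

```python
# Plain-Python simulation of group offloading: each layer group is moved
# to the GPU only for its own forward pass, then released, so the peak
# resident footprint equals one group rather than the whole 14B model.

class LayerGroup:
    def __init__(self, name, vram_gb):
        self.name, self.vram_gb, self.device = name, vram_gb, "cpu"

def run_with_offload(groups):
    """Track peak 'VRAM' while cycling groups through the device."""
    peak = 0.0
    for g in groups:
        g.device = "cuda"            # prefetch this group only
        peak = max(peak, g.vram_gb)  # resident footprint right now
        # ... this group's forward pass would run here ...
        g.device = "cpu"             # release before the next group
    return peak

groups = [LayerGroup(f"block_{i}", vram_gb=6.0) for i in range(8)]
print(run_with_offload(groups))  # → 6.0 (peak ≈ one group, not 8 × 6 GB)
```

The trade-off, as with any offloading scheme, is extra host-device transfer time in exchange for the much smaller peak footprint.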

"Helios AI can reach up to 20.89 FPS on optimized single H100 hardware, demonstrating the high ceiling of this architecture."

Benchmark Methodology

To evaluate the performance and quality of Helios AI, we established a specialized testing framework called HeliosBench. This benchmark focuses specifically on real-time, long-duration video generation across different hardware configurations. Unlike traditional benchmarks that focus on short, isolated segments, HeliosBench tests the temporal consistency of the model over hundreds or thousands of frames.

Quality Evaluation

Assessment of textural detail and motion coherence using standardized metrics. The model is tested for drift—the tendency of objects to deform or disappear over time. Helios AI maintains a significantly lower drift score than comparable 1.3B parameter models.
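The precise drift metric used by HeliosBench is not given in the text. One simple way such a score could be computed is the average deviation of each frame from a reference frame, so that identity-preserving videos score low; the function below is our hypothetical sketch, with frames as flat lists of pixel values.

```python
# Hypothetical drift score (not the HeliosBench metric): mean absolute
# per-pixel deviation of every later frame from the first frame. Objects
# that deform or disappear over time push this score up.

def drift_score(frames):
    """Average per-frame mean absolute deviation from frame 0."""
    ref = frames[0]
    total = 0.0
    for frame in frames[1:]:
        total += sum(abs(a - b) for a, b in zip(frame, ref)) / len(ref)
    return total / (len(frames) - 1)

stable   = [[0.5, 0.5], [0.5, 0.6], [0.5, 0.5]]  # small wobble
drifting = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]  # content drifts away

assert drift_score(stable) < drift_score(drifting)
```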

Throughput Stress Testing

Measuring the frames per second sustained over long generation sessions. Testing includes various batch sizes and context parallelism configurations. The sustained throughput of 19.5 FPS is a hallmark of the architectural integrity of the system.

The methodology includes testing on diverse hardware including the NVIDIA H100, A100, and the Huawei Ascend 910B NPU. By providing a broad hardware profile, we ensure that developers can predict the behavior of the model in their specific production environments. The absence of heavy caching layers makes the performance deterministic and reliable.

Context parallelism support is verified through Ulysses and Ring Attention tests. These tests confirm that the model can scale across 2, 4, or 8 GPUs linearly, allowing for even higher throughput when resources are available. This scaling capability is vital for large-scale enterprise deployments.
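Under the linear scaling described above, aggregate throughput is a simple multiple of the single-GPU figure. The calculation below is the idealized best case; real deployments will lose some throughput to interconnect overheads.

```python
# Idealized aggregate throughput under perfectly linear context-parallel
# scaling. 19.5 FPS is the sustained single-H100 figure from the text;
# the linearity assumption is the best case, not a guarantee.

SINGLE_GPU_FPS = 19.5

def scaled_fps(num_gpus, single_gpu_fps=SINGLE_GPU_FPS):
    """Best-case aggregate FPS assuming linear scaling."""
    assert num_gpus in (1, 2, 4, 8), "configurations verified in the text"
    return single_gpu_fps * num_gpus

print(scaled_fps(4))  # → 78.0
print(scaled_fps(8))  # → 156.0
```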


Sustainable Open Source Progress

The development of Helios AI represents an important milestone for efficient generative AI. By prioritizing raw architectural performance over superficial acceleration tricks, the model provides a transparent and robust tool for the wider machine learning community. This commitment to simple, powerful logic ensures that the system remains adaptable as the underlying hardware and software ecosystems grow.

We invite developers, researchers, and hobbyists to explore the capabilities of this 14B parameter system. The modular nature of our repository makes it easy to experiment with new sampling schedules, architectural tweaks, and specialized training data. Together, we can continue to advance the state of real-time generative video.

Notice: This is an unofficial technical showcase. It does not represent the official views of the researchers.

Scientific Research led by PKU-YuanGroup. Distributed under Apache 2.0.

A Documentation Study for 14B Parameter Real-Time Systems.
