
Accelerate AI and Graphics Performance

To transform their businesses with generative AI, enterprises need to deploy compute resources at greater scale. ASUS offers multiple NVIDIA L40S server systems that provide faster time to AI deployment through quicker GPU availability, along with better performance per dollar and powerful computing performance.

ASUS is a select NVIDIA OVX server system provider and an experienced, trusted AI-solutions provider, with the knowledge and capabilities to bridge technology gaps and deliver optimized solutions to customers.

Top 3 Reasons to Choose
ASUS L40S Server Systems

  • Faster Deployment

    Short lead time

  • Better Price-Performance

    2X better performance than A100

  • Higher Performance

    Powerful AI & Graphics


NVIDIA L40S

The NVIDIA L40S GPU, based on the Ada Lovelace architecture, is the most powerful universal GPU for the data center, delivering breakthrough multi-workload acceleration for large language model (LLM) inference and training, graphics, and video applications.
  • Fine Tuning LLM

    4hrs

    GPT-175B 860M Tokens

  • LLM Inference

    1.1X

    Performance vs. HGX A100

  • AI Inference

    1.5X

    Performance vs. A100 80GB SXM

NVIDIA L40S Specifications

Spec                    L40S                                A100 80GB SXM
Best For                Universal GPU for Gen AI            Highest Perf Multi-Node AI
GPU Architecture        NVIDIA Ada Lovelace                 NVIDIA Ampere
FP64                    N/A                                 9.7 TFLOPS
FP32                    91.6 TFLOPS                         19.5 TFLOPS
RT Core                 212 TFLOPS                          N/A
TF32 Tensor Core        366 TFLOPS                          312 TFLOPS
FP16/BF16 Tensor Core   733 TFLOPS                          624 TFLOPS
FP8 Tensor Core         1466 TFLOPS                         N/A
INT8 Tensor Core        1466 TOPS                           1248 TOPS
GPU Memory              48 GB GDDR6                         80 GB HBM2e
GPU Memory Bandwidth    864 GB/s                            2039 GB/s
L2 Cache                96 MB                               40 MB
Media Engines           3 NVENC (+AV1), 3 NVDEC, 4 NVJPEG   0 NVENC, 5 NVDEC, 5 NVJPEG
Power                   Up to 350 W                         Up to 400 W
Form Factor             2-slot FHFL                         8-way HGX
Interconnect            PCIe Gen4 x16: 64 GB/s              PCIe Gen4 x16: 64 GB/s
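For a quick sense of how the two GPUs compare on paper, the ratios can be computed directly from the spec table above. A minimal Python sketch (datasheet peak numbers only, not delivered performance):

```python
# Peak-throughput ratios, L40S vs. A100 80GB SXM, taken from the
# spec table above (values in TFLOPS; datasheet peaks, not delivered perf).
l40s = {"FP32": 91.6, "TF32 Tensor": 366.0, "FP16/BF16 Tensor": 733.0}
a100 = {"FP32": 19.5, "TF32 Tensor": 312.0, "FP16/BF16 Tensor": 624.0}

for key in l40s:
    ratio = l40s[key] / a100[key]
    print(f"{key}: L40S is {ratio:.2f}x A100")
```

Note that the L40S's largest paper advantage is in FP32 and FP8 (which the A100 lacks entirely), while the A100 retains more than double the memory capacity and bandwidth.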

NVIDIA L40S for LLM Training

A great solution for fine-tuning, training small models, and small- to mid-scale training with up to 4K GPUs.
Fine-Tuning Existing Models (Time to Train 860M Tokens)
Expected speedup with TE/FP8, relative to HGX A100:

Model                     HGX A100   L40S   HGX H100
GPT-40B LoRA (8 GPU)      12 hrs     1.7x   4.4x
GPT-175B LoRA (64 GPU)    6 hrs      1.6x   4.3x

Training Small Models (Time to Train 10B Tokens)
Expected speedup with TE/FP8, relative to HGX A100:

Model                     HGX A100   L40S   HGX H100
GPT-7B (8 GPU)            12 hrs     1.7x   4.4x
GPT-13B (8 GPU)           6 hrs      1.6x   4.3x

Training Foundation Models (Time to Train 300B Tokens)
Expected speedup with TE/FP8, relative to HGX A100:

Model                     HGX A100   L40S   HGX H100
GPT-175B (256 GPU)        64 hrs     1.4x   4.5x
GPT-175B (1K GPU)         16 hrs     1.3x   4.6x
GPT-175B (4K GPU)         4 hrs      1.2x   4.1x
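Since the tables quote speedups relative to the HGX A100 baseline, the absolute time on a faster system is simply the baseline divided by the speedup. A small sketch, using the foundation-model numbers above:

```python
def estimated_hours(baseline_hours: float, speedup: float) -> float:
    """Time-to-train on a system whose speedup is quoted relative
    to the HGX A100 baseline: baseline / speedup."""
    return baseline_hours / speedup

# GPT-175B, 300B tokens, 256 GPUs: 64 hrs on HGX A100 (from the table above)
print(round(estimated_hours(64, 1.4), 1))  # L40S
print(round(estimated_hours(64, 4.5), 1))  # HGX H100
```

So at 256 GPUs, a 1.4x speedup over a 64-hour baseline corresponds to roughly 45.7 hours on L40S.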

Products for your solution

ESC8000-E11

8 GPUs, 4U, Dual-socket 4th Gen Intel Xeon Scalable CPUs

ESC4000-E11

4 GPUs, 2U, Dual-socket 4th Gen Intel Xeon Scalable CPUs

ESC4000-E10

4 GPUs, 2U, Dual-socket 3rd Gen Intel Xeon Scalable CPUs

ESC8000A-E12

8 GPUs, 4U, Dual-socket AMD EPYC 9004 CPUs, PCIe 5.0 switch solution

ESC8000A-E11

8 GPUs, 4U, Dual-socket AMD EPYC 7003 CPUs

ESC4000A-E12

4 GPUs, 2U, Single-socket AMD EPYC 9004 CPU

ESC4000A-E11

4 GPUs, 2U, Single-socket AMD EPYC 7003 CPU