
Accelerate AI and Graphics Performance

To transform their businesses with generative AI, enterprises need to deploy compute resources at greater scale. ASUS offers multiple NVIDIA L40S server systems that provide faster time to AI deployment through quicker GPU availability, along with better performance per dollar and powerful computing performance.

ASUS is a select NVIDIA OVX server system provider and an experienced, trusted AI-solutions provider, with the knowledge and capabilities to bridge technology gaps and deliver optimized solutions to customers.

Top 3 Reasons to Choose
ASUS L40S Server Systems

  • Faster Deployment

    Short lead time

  • Better Price-Performance

    2X better performance than A100

  • Higher Performance

    Powerful AI & Graphics


NVIDIA L40S

The NVIDIA L40S GPU, based on the Ada Lovelace architecture, is the most powerful universal GPU for the data center, delivering breakthrough multi-workload acceleration for large language model (LLM) inference and training, graphics, and video applications.
  • Fine Tuning LLM

    4hrs

    GPT-175B 860M Tokens

  • LLM Inference

    1.1X

    Performance vs. HGX A100

  • AI Inference

    1.5X

    Performance vs. A100 80GB SXM

NVIDIA L40S Specifications

Spec                    L40S                                A100 80GB SXM
Best For                Universal GPU for Gen AI            Highest Perf Multi-Node AI
GPU Architecture        NVIDIA Ada Lovelace                 NVIDIA Ampere
FP64                    N/A                                 9.7 TFLOPS
FP32                    91.6 TFLOPS                         19.5 TFLOPS
RT Core                 212 TFLOPS                          N/A
TF32 Tensor Core        366 TFLOPS                          312 TFLOPS
FP16/BF16 Tensor Core   733 TFLOPS                          624 TFLOPS
FP8 Tensor Core         1466 TFLOPS                         N/A
INT8 Tensor Core        1466 TOPS                           1248 TOPS
GPU Memory              48 GB GDDR6                         80 GB HBM2e
GPU Memory Bandwidth    864 GB/s                            2039 GB/s
L2 Cache                96 MB                               40 MB
Media Engines           3 NVENC (+AV1), 3 NVDEC, 4 NVJPEG   0 NVENC, 5 NVDEC, 5 NVJPEG
Power                   Up to 350 W                         Up to 400 W
Form Factor             2-slot FHFL                         8-way HGX
Interconnect            PCIe Gen4 x16: 64 GB/s              PCIe Gen4 x16: 64 GB/s
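For a quick sense of how the two GPUs compare on paper, the ratios can be computed directly from the spec table above. A minimal Python sketch (datasheet peak numbers only, not delivered performance):

```python
# Peak-throughput ratios, L40S vs. A100 80GB SXM, taken from the
# spec table above (values in TFLOPS; datasheet peaks, not delivered perf).
l40s = {"FP32": 91.6, "TF32 Tensor": 366.0, "FP16/BF16 Tensor": 733.0}
a100 = {"FP32": 19.5, "TF32 Tensor": 312.0, "FP16/BF16 Tensor": 624.0}

for key in l40s:
    ratio = l40s[key] / a100[key]
    print(f"{key}: L40S is {ratio:.2f}x A100")
```

Note that the L40S's largest paper advantage is in FP32 and FP8 (which the A100 lacks entirely), while the A100 retains more than double the memory capacity and bandwidth.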

NVIDIA L40S for LLM Training

A great solution for fine-tuning, training small models, and small- to mid-scale training with up to 4K GPUs.
Fine-Tuning Existing Models (Time to Train 860M Tokens)
Expected speedup with TE/FP8, relative to HGX A100:

Model                     HGX A100   L40S   HGX H100
GPT-40B LoRA (8 GPU)      12 hrs     1.7x   4.4x
GPT-175B LoRA (64 GPU)    6 hrs      1.6x   4.3x

Training Small Models (Time to Train 10B Tokens)
Expected speedup with TE/FP8, relative to HGX A100:

Model                     HGX A100   L40S   HGX H100
GPT-7B (8 GPU)            12 hrs     1.7x   4.4x
GPT-13B (8 GPU)           6 hrs      1.6x   4.3x

Training Foundation Models (Time to Train 300B Tokens)
Expected speedup with TE/FP8, relative to HGX A100:

Model                     HGX A100   L40S   HGX H100
GPT-175B (256 GPU)        64 hrs     1.4x   4.5x
GPT-175B (1K GPU)         16 hrs     1.3x   4.6x
GPT-175B (4K GPU)         4 hrs      1.2x   4.1x
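Since the tables quote speedups relative to the HGX A100 baseline, the absolute time on a faster system is simply the baseline divided by the speedup. A small sketch, using the foundation-model numbers above:

```python
def estimated_hours(baseline_hours: float, speedup: float) -> float:
    """Time-to-train on a system whose speedup is quoted relative
    to the HGX A100 baseline: baseline / speedup."""
    return baseline_hours / speedup

# GPT-175B, 300B tokens, 256 GPUs: 64 hrs on HGX A100 (from the table above)
print(round(estimated_hours(64, 1.4), 1))  # L40S
print(round(estimated_hours(64, 4.5), 1))  # HGX H100
```

So at 256 GPUs, a 1.4x speedup over a 64-hour baseline corresponds to roughly 45.7 hours on L40S.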

Products for your solution

ESC8000-E11

8 GPUs, 4U, Dual-socket 4th Gen Intel Xeon Scalable CPUs

ESC4000-E11

4 GPUs, 2U, Dual-socket 4th Gen Intel Xeon Scalable CPUs

ESC4000-E10

4 GPUs, 2U, Dual-socket 3rd Gen Intel Xeon Scalable CPUs

ESC8000A-E12

8 GPUs, 4U, Dual-socket AMD EPYC 9004 CPUs, PCIe 5.0 switch solution

ESC8000A-E11

8 GPUs, 4U, Dual-socket AMD EPYC 7003 CPUs

ESC4000A-E12

4 GPUs, 2U, Single-socket AMD EPYC 9004 CPU

ESC4000A-E11

4 GPUs, 2U, Single-socket AMD EPYC 7003 CPU