Building the Data Platform for Deep Learning AI
"We're at the cusp of a major shift in computing. The intersection of AI and accelerated computing is set to redefine the future."
Jensen Huang
Co-Founder, President and CEO of Nvidia Corporation
Deep learning is happening today, but it relies on a collection of tools and technologies largely designed in an era that predates modern artificial intelligence.
Current Infrastructure Complexity
[Figure: architecture diagram of a current data pipeline assembled from many separate cloud services.]
Actors and sources shown: Fleet & Hardware Managers, Labeling Teams and/or Partners, Test Vehicles, Customer Facility, DataLogger (removable media), Developer, Engineers.
Stages shown:
01 Near Real-time Ingest
02 Fleet Monitoring & Alerting
03 Ingest to Cloud
04 Data Quality Checks & Sensor Extraction
05 Automated & Manual Image Labeling
06 Scene Detection
07 Analytics Datasets & Query Tools
08 Orchestration, Observability, Traceability
09 CICD
10 Visualization & Debugging
11 User-Facing Tools & Applications
12 Simulation, Reprocessing & KPI Computation
13 High Level Metadata
Services shown: IoT Core, Kinesis Data Firehose, S3, Redshift, Managed Grafana, SNS, SageMaker, Lambda, Athena, Glue Data Catalog, OpenSearch, API Gateway, Batch, EMR, Direct Connect, X-Ray, Neptune, CloudWatch, MWAA (Airflow), Step Functions, CodePipeline, CodeBuild, EKS, FSx, NICE DCV, EC2, Amplify, CloudFront, QuickSight, DynamoDB.
Traditional data platforms deliver on the promise of simplicity by combining compute, storage and database, but none of them is designed for Deep Learning and SuperPOD-scale supercomputing.
How Can We Build The Infrastructure
for the Future of AI-Powered Discovery?
The VAST Data Platform is a transformative data infrastructure offering, unifying storage, database and compute engine services in a scalable system that was built from the ground up for the future of AI.

The VAST Data Platform introduces integrated capability that allows customers to store, catalog and compute on unstructured data from anywhere, at any scale.

By breaking decades of tradeoffs, each aspect of the VAST Data Platform is engineered to make infrastructure ultimately simple.

Data Platform

DataStore
The Unstructured Data Foundation for Deep Learning, Breaking the Tradeoff Between Performance & Capacity

DataSpace
Edge-to-Cloud Global Namespace, Breaking the Tradeoff Between Performance & Consistency

DataBase
Apply Structure to Unstructured Data At Any Scale, Breaking the Tradeoffs Between Transaction Systems & Deep Analytics

DataEngine
The Compute Engine of the VAST Data Platform, Bringing Insights to Life By Adding Functions & Triggers to Data
[Figure: platform stack, listed top to bottom.]
Developer Tools
AI Workflows, Frameworks and Pretrained Models: Data Capture, Data Prep, Model Training, Model Serving
Dev (AI Lifecycle Integration), Control Plane (Resource Pooling, Policy Engine), Cluster Engine (Workload Orchestration, GPU Fractioning)
Data Science and AI Development and Deployment Tools/Accelerators
Data Platform: DataStore, DataBase, DataEngine, DataSpace (Edge, Core, Cloud)
x86 and GPU-enabled servers + K8s
High-Speed Compute and Storage Connectivity
Side label: Zero Trust Approach
Best Deep Learning Platform
The Only Enterprise Platform Certified for NVIDIA Cloud Partners (NCP)
Built on “the architecture of the future”