San Francisco-based startup Parallel Domain has introduced Data Lab, an API that lets its customers generate synthetic datasets. Built on generative AI, Data Lab gives machine-learning engineers control over dynamic virtual worlds so they can simulate a wide range of scenarios.
To access the API, customers simply need to install it from GitHub and begin writing Python code to generate datasets, according to Kevin McNamara, Founder and CEO of Parallel Domain.
Data Lab offers engineers the capability to generate objects that were previously unavailable in the startup’s asset library. By utilizing 3D simulation, the API establishes a foundation where engineers can layer real-world elements onto the virtual world through straightforward prompts. Whether it’s training models to navigate a highway with obstacles or identifying humans disguised in inflatable dinosaur outfits, the possibilities are vast.
The primary objective is to give autonomy, drone, and robotics companies more control and efficiency in building large datasets, so they can train their models faster and more thoroughly. McNamara says iteration time now depends on how quickly ML engineers can work out what they need and translate it into API calls and code.
Parallel Domain’s customers include major original equipment manufacturers (OEMs) building advanced driver assistance systems (ADAS), as well as autonomous driving companies. Previously, it took the startup weeks or months to create datasets based on a customer’s specific parameters. With the self-serve API, customers can generate new datasets in near real-time.
On a larger scale, Data Lab has the potential to speed up the scaling of autonomous driving systems. McNamara says that in tests of certain autonomous vehicle models on stroller data, models trained on synthetic datasets performed better than those trained on real-world datasets.
Although Parallel Domain is not using OpenAI APIs like ChatGPT, the startup is building components of its technology on foundation models that have been open-sourced in recent years. The team has developed custom technology stacks that label objects as they are generated, and it uses models such as Stable Diffusion to fine-tune its own versions of foundation models that generate images and content from text input.
Initially launched for internal use and beta testing with trusted customers in May, Parallel Domain’s synthetic data generation engine, Reactor, is now accessible to customers through the Data Lab API. This development is expected to shape the startup’s business model, shifting towards a software-as-a-service (SaaS) approach where customers can subscribe to platform access and pay based on their usage.
Furthermore, the API holds potential for Parallel Domain to expand into various industries where computer vision-enabled technology drives efficiency, such as agriculture, retail, and manufacturing. The startup aims to become the go-to platform for training AI models in any domain that requires the perception of the world through sensor-driven systems.