Investing in Datavolo to Help Enterprises Harness Their Unstructured Data for Generative AI Applications

Vibhor Rastogi

Head of AI Investing, Citi Ventures

Cagla Kaymaz

Principal, Citi Ventures

Olivia Zhang

Assistant Vice President, Citi Ventures

Since it hit the mainstream in late 2022, generative artificial intelligence (GenAI) has quickly become a transformational technology. McKinsey predicts GenAI could add the equivalent of $2.6 trillion to $4.4 trillion annually to the global economy, contributing an additional $200 billion to $340 billion annually in the banking sector alone. In this “gold rush” spirit, organizations are rapidly assessing how GenAI can boost productivity in their businesses and create customer value — both increasing revenues and reducing costs.

However, they face a major challenge: unstructured data. Simply defined, unstructured data is any information that does not fit neatly into the rows and columns of a traditional database (e.g., documents, emails, PDFs, images, audio and video files). This type of data makes up 80-90% of all new enterprise data and is richer and deeper than structured data. That makes it critical to unlocking GenAI’s full potential through large language model (LLM) customization, but it also makes it much harder to manage. Current data extraction tools offer few solutions: Most handle only structured data, support a limited set of file types and/or cannot meet enterprise requirements for compliance, security and scale. This leaves enterprises with only one option: processing their unstructured data in-house, which means building complex algorithms and running expensive, time-consuming processes.

That's why we're excited to invest in Datavolo, the leader in multimodal data pipelines for AI. Founded in 2023, Datavolo enables enterprises to extract, clean, standardize, enrich and distribute relevant information from their unstructured and structured data alike in a secure, simple and scalable manner.

Leveraging Apache NiFi — an open-source tool purpose-built at the National Security Agency (NSA) to ingest unstructured data at scale — Datavolo offers a containerized, no-code service that helps enterprises quickly build and deploy flexible data pipelines that can handle different types of files, destinations, transformations and AI models. Datavolo also provides APIs to support advanced retrieval-augmented generation (RAG), embedding model integrations and vector databases, and has centralized security, governance and observability capabilities — all of which are top-of-mind for enterprises looking to build GenAI applications.

Furthermore, Datavolo has set forth an impressive, comprehensive roadmap to add capabilities to its already robust solution. First, it plans to create tools for building LLM-based apps, including new data extraction and transformation processors. The company is also looking to use AI to enhance data engineering tasks, including enabling data engineers to use natural language to create dataflows and to build, understand and manipulate script transforms.

Our confidence in Datavolo ultimately stems from its founders’ long history as visionaries in the data and analytics space. Through his first startup Onyara and in collaboration with the NSA, CEO Joe Witt spearheaded the project that would become Apache NiFi, which began at the NSA in 2006 and has since been adopted by thousands of agencies and companies around the world. After Onyara’s acquisition in 2015, Joe went on to become Corporate Vice President of Engineering for the data-in-motion portfolio at Cloudera before founding Datavolo. And COO Luke Roquet has been a senior sales and marketing executive in the data and analytics space for well over a decade, working for innovative companies such as Hortonworks, Unravel Data, AWS and Cloudera.

Given Datavolo’s leadership position in the GenAI unstructured data and analytics space and its expert, deeply experienced founding team, we’re pleased to invest in the company’s Series A funding round, joining General Catalyst, Human Capital, Rob Bearden and MVP Ventures. We look forward to supporting Datavolo as it helps enterprises unlock the full potential of GenAI. Congratulations to Joe, Luke and the entire Datavolo team!

For more information, email Vibhor Rastogi at vibhor.rastogi@citi.com, Cagla Kaymaz at cagla.kaymaz@citi.com or Olivia Zhang at olivia.zhang@citi.com.

To see Citi Ventures’ full portfolio of companies, click here.