Organizations that are struggling to build data products in a controlled and repeatable fashion may want to keep an eye on Zhamak Dehghani’s startup Nextdata, which is currently developing software that will automate much of the integration work when building data products as part of a data mesh.
Data mesh is one of the hottest concepts in big data, reflecting both the struggles and the aspirations that companies are experiencing as they try to embed data, analytics, and AI into everything they build. By giving independent teams of data product developers governed, self-service access to enterprise data, a data mesh can help unleash data creativity and productivity while avoiding data chaos.
Dehghani spearheaded the data mesh movement while working at the consulting firm Thoughtworks North America, which she left last year to create her stealth startup. In January, Dehghani announced that the startup, Nextdata, was developing a new class of product for building and managing data products as part of a data mesh.
While the company is still in the design and prototyping phase, the main pillars of the product have already been set and early testers are starting to get their hands dirty with it. Last week, Dehghani sat down with Datanami via Zoom to discuss what the new product is going to look like and how it will benefit organizations embarking upon a data mesh journey.
According to Dehghani, the main thrust of the offering is to serve as a middleware layer that enables developers to build and deploy data products in a simple yet governed manner. By enabling a data mesh architecture, the Nextdata offering will help automate tasks that developers currently have to handle on their own, while providing a higher-level abstraction for developers to write to.
“We want to create this middleware, almost a logical layer of developer experience and data containers,” says Dehghani, who was a 2022 Datanami Person to Watch. “These logical containers sit on top of this very fragmented technology, and provide a data product-centric way of building and managing and sharing and connecting and discovering data products.”
Specifically, the offering will include development and runtime components. The development side will include drivers or APIs that let users incorporate popular technologies that they’re already using, like Spark and Python, into the data products they’re building. For the runtime, it will lean on Docker to encapsulate the code and data into containers that can run anywhere.
“We call it a data mesh OS [operating system], because it’s like middleware sitting on top of foundational technology,” she says. “We have a very specific containerization system and approach to what constitutes data as a product, or codifying it through the spec, and providing a set of build tools around it to manage its lifecycle.”
Users will be able to deploy their data products to any infrastructure that supports Docker containers, including cloud or on-prem systems. There will also be mechanisms for bubbling up metadata to the data catalog that customers are using, Dehghani says.
“The way we want this to work is that every data product provides APIs real time, APIs about itself to make it self-discoverable,” she says. “So we provide real-time runtime information. ‘This is my address. This who I am. This is the data that I provide.’ All of this information that makes somebody that access this data product get access to it, understand it, trust it, use it.”
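As a rough illustration of the self-describing idea Dehghani outlines, the sketch below models a data product that can report its own address, identity, and outputs, and push that description into a catalog. It is not Nextdata's API: the class names, fields, the `publish_to_catalog` helper, and the example address are all hypothetical, chosen only to make the concept concrete.

```python
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class OutputPort:
    """One dataset a data product exposes (hypothetical structure)."""
    name: str
    format: str


@dataclass
class DataProduct:
    """A data product that can describe itself at runtime (illustrative only)."""
    name: str      # "This is who I am"
    owner: str     # team accountable for the product
    address: str   # "This is my address"
    outputs: list[OutputPort] = field(default_factory=list)  # "the data that I provide"

    def describe(self) -> dict:
        """Return the metadata a consumer or catalog would need to find and use the product."""
        return {
            "name": self.name,
            "owner": self.owner,
            "address": self.address,
            "outputs": [asdict(port) for port in self.outputs],
        }


def publish_to_catalog(catalog: dict, product: DataProduct) -> None:
    """Bubble the product's self-description up into a catalog (here, a plain dict)."""
    catalog[product.name] = product.describe()


# Usage: register a hypothetical "orders" product and look it up by name.
orders = DataProduct(
    name="orders",
    owner="sales-team",
    address="https://example.invalid/data-products/orders",  # placeholder address
    outputs=[OutputPort(name="daily_orders", format="parquet")],
)
catalog: dict = {}
publish_to_catalog(catalog, orders)
print(catalog["orders"]["address"])
```

In a real deployment the description would presumably be served over an API by the running product rather than copied into a dictionary; the dictionary stands in for whatever data catalog a customer already uses.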
Each data product created under the Nextdata mesh will also be associated with a policy that states how the data generated by the product can be used. The software will provide that policy framework for responsibly governing the data product and the data that it’s generating. This is an important aspect of data product management that often gets overlooked in the haste to build new products and get them into people’s hands.
It’s “code, data, and policy,” all rolled up together, she says. In the same way that peer-to-peer APIs and microservices freed application developers to work more efficiently, but at the expense of security or privacy, Nextdata hopes to create a new abstraction layer for data products that also enables developer freedom, but not at the expense of privacy or security.
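One way to picture “code, data, and policy” traveling together is a single object that holds a transformation, the rows it operates on, and a usage policy that is checked on every read. The sketch below is a minimal illustration under that assumption. `UsagePolicy`, `GovernedDataProduct`, `drop_email`, and the purpose strings are hypothetical names, not anything Nextdata has published.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UsagePolicy:
    """States the purposes for which the product's data may be used (hypothetical)."""
    allowed_purposes: frozenset[str]

    def permits(self, purpose: str) -> bool:
        return purpose in self.allowed_purposes


@dataclass
class GovernedDataProduct:
    """Bundles code, data, and policy into one unit (illustrative only)."""
    name: str
    transform: Callable[[list[dict]], list[dict]]  # the code
    source_rows: list[dict]                        # the data
    policy: UsagePolicy                            # the policy

    def read(self, purpose: str) -> list[dict]:
        """Serve the transformed data only if the stated purpose is allowed."""
        if not self.policy.permits(purpose):
            raise PermissionError(
                f"{self.name}: purpose {purpose!r} is not allowed by policy"
            )
        return self.transform(self.source_rows)


def drop_email(rows: list[dict]) -> list[dict]:
    """Example transformation: strip the email field before sharing."""
    return [{k: v for k, v in row.items() if k != "email"} for row in rows]


# Usage: an analytics read succeeds; any purpose not listed would raise PermissionError.
customers = GovernedDataProduct(
    name="customers",
    transform=drop_email,
    source_rows=[{"id": 1, "email": "a@example.invalid", "region": "EU"}],
    policy=UsagePolicy(frozenset({"analytics"})),
)
print(customers.read("analytics"))
```

Because the policy check lives inside the product rather than in each consumer, a developer cannot ship the code and data without the governance rule coming along, which is the trade-off the article describes: freedom to build, but not at the expense of privacy or security.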
“At the moment that you’ve created this data product, you’re responsible for all of this attributes of it,” Dehghani says. “It could be an ML model sitting in it, or it could be a simple SQL script sitting in it. It doesn’t matter what it is. But you have to think about long-term ownership of it, the evolution of it. You have autonomy with responsibility.”
Some elements of Nextdata’s product will be open source, such as the drivers and APIs. Nextdata does not want to get into the business of building point-to-point integrations, an area that Dehghani hopes will take off organically as more people start using the software.
Nextdata is targeting companies that have already started down the data mesh journey, but perhaps aren’t getting as much traction as they had hoped. To that end, the company requires somebody who will be a “champion” for data decentralization and data mesh inside the organization, Dehghani says.
“If this is a big data team coming to us and saying ‘Oh we want a mesh but we’re in a centralized team.’ Yeah, we could give them a tool, but that’s not going to solve their problem,” she says. “So the belief in this decentralization of responsibly and data sharing and somewhere on the journey to make that happen, some partnership with their business units to make that happen.”
Dehghani and her Nextdata co-founder, CTO Raghotham Murthy, have designed and implemented data meshes for Thoughtworks. They’ve done it enough times to realize there’s a repeatable aspect to what they’re building, and with Nextdata, they’re taking a chance at defining that at the infrastructure level.
“Not everyone has to reinvent the wheel,” she says. “If every [company developing a data mesh] needs an army of solution developers and people like Thoughtworks or consultants to make it possible, we don’t have a product here.”