Richard Mallah is the Director of Advanced Analytics for Cambridge Semantics, The Smart Data Company, where he leads the company's capabilities in unstructured data analysis and harmonization, distributed warehousing, and analytics scale-out. He is also an advisor and board member to startups in emerging technologies and artificial intelligence.
icrunchdata News talks to analytics leaders about what they are working on today, their careers in technology, and their interests outside of data. We recently spoke to Richard about what is happening at Cambridge Semantics, his interest in advising startups, and what he enjoys doing outside the office.
Richard, thanks for speaking with us…
There have actually been quite a few big milestones in that time. We have so many more colleagues, so many more clients… and we’ve earned a large number of industry awards, all since then. On the product front, we’ve really come into our own on unstructured analytics, the text analytics side of the house, even beating competition like Watson, and leading that effort has been a great experience. We also entered the financial services vertical, and my prior experience there helped orient us in that space initially.
I’d say the biggest shift in that time was in how we were perceived, going from “semantic knowledge graphs??” early on to “semantic knowledge graphs!!” today. These days, because people really get the extreme flexibility that semantic knowledge graphs provide, there are so many more value-added advanced analytics niches we’ve been able to move into.
Currently, we’re all really excited about the Anzo Smart Data Lake platform that’s been in the works for a while: essentially a big data store where you can perform ad hoc queries at massive scale across completely disparate types of sources, with true ease.
That’s a really good question. Generally, when people think of Big Data, they think of data that has tons and tons of rows, but usually in relatively few tables. That’s actually pretty simple data, simple in structure. In real-world enterprises, the data that most people care about is much more complex: messier, dirtier, wider, with multiple extensive and overlapping schemas.
Smart data handles all of that. It’s semantically described data: more of the “smarts” that make everything work live not in some application somewhere, but in the data itself. Hence “smart data”. This enables more knowledge to be declarative and shared across the organization rather than procedural and locked up in narrow flows. We started using the term a few years ago, but it has really caught on in the past year or two.
By smartening up the data itself with our tools, i.e. semantically describing the data, its structure, and some basic constraints all together and having that description flow alongside the data, and also because in a semantic context metadata is data and data is metadata, tons of new possibilities open up for flexibly exploring, visualizing, querying, and analyzing all of that complex, ragged data.
That is, all of that can then be done by subject matter experts, without the need for programmers or data scientists. And of course some of the dimensions in that ragged data can be big data themselves, and that’s where something like the Smart Data Lake comes in.
So we’re working full steam ahead on the next version of that Smart Data Lake. Enterprises will be able to manage and query across arbitrary numbers of smart big data datasets. In navigating those in the data catalog, we’ll present a dataset summary for each, where Anzo automatically figures out what concepts, properties, and whatnot are important, and can auto-visualize generalizations of what’s going on in the data, or what should stand out.
We’re also excited about the upcoming Anzo Unstructured Smart Data Lake, where our unstructured analytics themselves go truly distributed, so we’ll be able to pull out lots of quality insights, entities, events, and relationships from text at millions of documents per minute.
We’re also working on some really exciting machine learning approaches to building and analyzing knowledge graphs.
Sure. OpportunitySpace facilitated the wildly popular pop-up plaza called ReSurfaced in downtown Louisville, Kentucky. The mayor there has been great about embracing experimentation, including alternative forms of development. They have an excess of parking lots in their downtown, so both government officials and private citizens look for ways to diversify the landscape. After a trip to Memphis exposed the client, a private citizen, to a pop-up beer garden, they returned home to Louisville wanting to replicate that.
The OpportunitySpace platform helped the client a lot in navigating the site selection process and in securing the right site, on which they then hosted “ReSurfaced: A Pop-Up Plaza on Main.” That ran for a couple of months last year as a beer garden, outdoor garden, and event space. The project drew consistent crowds and was really successful at showing the economic potential of unused spaces. In June this year, ReSurfaced came back with a bourbon-themed pop-up to coincide with Louisville’s month-long celebration of the state’s famous liquor.
Oh, absolutely. So, so many. Newer AI and advanced analytics techniques are changing work paradigms all over the place.
One great example is custom systems and custom application development. At Cambridge Semantics, this sector is really our biggest competitor, and we’re winning against it all over the place. When you think about it, most business software has so much in common, and what differs is just what part of the world, or rather what types of things, you care about. So why not just let people describe how they think of the world, and from that auto-generate business apps, auto-annotate documents, auto-blend data sources, auto-write ETL jobs, assemble workflows, etc.? That’s essentially what our software does… much more efficient than the traditional firehose of billed hours!
More broadly, though, I am concerned about what more automation like this can mean for employment and the economy, and that’s one of the issues about AI that I help wrestle with in my volunteer work at the Boston-based nonprofit Future of Life Institute.
It’s nowhere near the Valley in size, but Boston’s startup scene is actually pretty strong. Boston has always been strong in tech, particularly in B2B and biotech, but recently there’s been an acceleration in entrepreneurship here. VCs are even moving from their traditional home in Waltham to Cambridge to be close to a new generation of startups. In terms of feeder schools, of course MIT, Harvard, and lots of other local schools are great feeders, but we also have lots of great folks around from Stanford, Cornell, Columbia, Carnegie Mellon, etc.
For startups that really consider themselves “startups”, there’s a strong, concentrated sense of community, so it is an ideal place for first-time entrepreneurs to experiment and try new ideas. Resources like the Cambridge Innovation Center, BostonTechGuide.com, and even just Meetups help entrepreneurs quickly make connections across town.
At this point, Cambridge Semantics is a pretty established software company and has likely outgrown the term ‘startup’. As for a couple of young emerging companies to call out, in early-stage tech I’m excited about the marketing content planning SaaS play MarketMuse (on whose board, full disclosure, I sit). Life sciences is of course big in Boston, and in early-stage healthcare I’m really excited about WaveGuide Corp., which makes a cellphone-sized NMR machine.
Yeah, so, when I was growing up, my dad spoke French with his siblings, and over time I was able to get the gist of what they were talking about without ever explicitly learning the language. I was fascinated by how unstructured the unfamiliar language seemed to me at first, and how I was able to find enough in the patterns to understand it to the depth I wanted to.
Spanish I learned in school, I learned to read Hebrew in Hebrew School, and I studied Japanese in college. Korean and Mandarin came from pretty brief self-study and tutoring. Though I can’t really claim to be fluent in any of those, understanding the differences between languages has been valuable in my study of and subsequent work in computational linguistics. As a result, today Anzo Unstructured understands a hundred languages!
Interesting question, Todd! I’ve always been into the sciences and engineering, so the question then is how close to technology I’m allowed to come!
I love engineering and making things, and I almost went for a mix of different kinds of engineering in college. I also love the sciences of all kinds, the natural sciences, the hard sciences. There are a lot of really interesting developments these days in cognitive science, neuroscience, physics, and genetics. I’d probably be some kind of interdisciplinary scientist.
It may sound cheesy, but I’m thankful for everything that’s happened to me and that I’ve made happen. Those experiences make me who I am today. That said, and assuming stock tips and the like are ruled out, I might recommend founding a startup with a business-minded cofounder sooner rather than later, to take advantage of the dot-com bust of the time.
Well, thank you, Todd. This has been one of the more fun interviews I’ve done!