Surendra Reddy, PARC; Cirrus Shakeri, SAP; Heinz Ulrich Roggenkemper, SAP; Hartmut Vogler, SAP; Jens Doerpmund, SAP
Graph analytics is a crucial element in extracting insights from Big Data because it helps discover hidden relationships and connect the dots. A graph, meaning a network of nodes and the relationships between them, treats the links between objects as equally important as the objects themselves. Social networks and supply chains are the obvious examples, but a graph can describe any network of objects, such as customers, products, purchase orders, customer support calls, and product inventory.
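To make the idea of nodes and relationships concrete, here is a minimal Python sketch of a graph of business objects and a breadth-first search that "connects the dots" between two of them. The object names and the tiny dataset are hypothetical and purely illustrative; this is not PARC's algorithm or data model, only one simple way to represent such a network.

```python
from collections import defaultdict, deque

# Hypothetical business objects; every identifier here is illustrative.
EDGES = [
    ("customer:alice", "order:1001"),
    ("order:1001", "product:router"),
    ("customer:bob", "order:1002"),
    ("order:1002", "product:router"),
    ("customer:bob", "support_call:77"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_graph(EDGES)
path = shortest_path(graph, "customer:alice", "support_call:77")
print(" -> ".join(path))
```

In this toy data, Alice is linked to a support call she never made because she and Bob ordered the same product. A relationship like that is hard to see in separate tables but becomes explicit once the links are stored as first-class data.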
PARC has invented a set of machine learning and reasoning algorithms for analyzing large graphs in real time. High dimensionality and a rich tapestry of relationships in datasets call for highly scalable algorithms. After four months of exploration with Hadoop + Hive, native Map/Reduce, R/MR, and Mahout under different execution environments (multi-core, multi-threaded, and parallel computation), we found the optimal solution by integrating PARC’s reasoning and insight discovery with SAP HANA. Automated reasoning requires multiple iterations of algorithmic runs that go back and forth between graph analytics and HANA’s analytics.
PARC researchers have been exploring graph analytics, egocentric collaborative filtering, automated reasoning, graph-based clustering, Bayesian Networks (BN), Probabilistic Graphical Models (PGM), scalable machine learning, and contextual intelligence to stay at the forefront of Big Data research. PARC’s main goal is to reduce or eliminate the need for complex ETL processes and to introduce automated machine learning that lets business people explore the data directly and discover insights with less need for data science expertise.
SAP HANA is a fast, massively parallel, ACID-compliant database platform for both analytical and transactional data processing. Both transactions and analytics are supported within the in-memory columnar engine, and all data processing and calculations take place in memory. HANA provides business and predictive libraries (e.g. for planning, text processing, and spatial analytics) that can be called from within a rich stored-procedure language. What is unique about HANA is that it enables customers to perform complex analytical processing directly on top of the OLTP data structures, thus eliminating redundant data transfer and storage. Via HANA Live, customers have access to a large number of non-materialized, easily consumable business views for real-time reporting and application development.
HANA’s real-time response combined with PARC’s fast graph reasoning algorithms helped us generate qualitatively superior output, including clusters with higher modularity and rapid discovery of hidden patterns and insights. But what is really exciting for us is the qualitatively new kind of solutions that we are building on this co-innovation. The match between PARC’s graph analytics and SAP HANA’s analytics is unique in that it turns the speed of computation into new ways of solving problems. For example, we can simulate the spread of diseases, optimize when and where vaccinations should be done, analyze viral marketing, determine the next best action, optimize supply chains with up-to-the-second transactions, and detect fraud with input data in real time. Without HANA, PARC’s algorithms would require the development of a lower-level data processing platform that is equally fast.
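For readers unfamiliar with the term, modularity measures how much denser the links inside clusters are than would be expected if edges were placed at random. The sketch below assumes the standard Newman definition for an undirected, unweighted graph with no self-loops or duplicate edges; the post does not say which exact measure the team used, and the function and variable names are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition.

    edges: list of (node, node) pairs, undirected, no self-loops or duplicates.
    community_of: dict mapping each node to its community label.
    Uses Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where
    L_c is the number of edges inside c, d_c the total degree of its nodes,
    and m the total number of edges.
    """
    m = len(edges)
    intra_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] += 1
        degree_sum[cb] += 1
        if ca == cb:
            intra_edges[ca] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge (c, d).
EDGES = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in "abcdef"}

q_two = modularity(EDGES, two_clusters)
q_one = modularity(EDGES, one_cluster)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the graph into its two triangles scores 5/14 (about 0.357), while lumping every node into one cluster scores 0, which is the sense in which a clustering with higher modularity is the better one.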
With the combination of HANA and PARC’s graph analytics (HiperGraph), we can finally deliver on the promise of a closed feedback loop in the enterprise, where transactions are analyzed in real time to provide the error signal for real-time decision making and corrective actions. With HANA + HiperGraph, the intelligence that is implicit in large volumes of structured and unstructured data, drawn from many kinds of sources inside or outside the enterprise, can be delivered to users in the form of smart business applications. While HANA provides the unified computing platform for data processing, its combination with graph analytics adds the capability of ‘connecting the dots’ (literally, via nodes and edges) and thus generates intelligence from the data that is bigger than the sum of its parts. For example, a business application can be built that acts as an Intelligent Assistant to enterprise employees by connecting their daily work to similar projects or colleagues that they would otherwise not know about.
Ultimately, combining PARC’s graph analytics with SAP HANA’s analytics results in a superior customer experience through personalization and real-time interaction with information. For example, the combination of HANA and graph analytics can provide real-time, interactive purchase recommendations for retail customers, where their feedback triggers re-computation of the recommendations on the fly. Today’s Big Data analytics relies on a labor-intensive approach in which scarce data scientists analyze the data and extract insights from it. With the combination of HANA and PARC’s graph analytics, Big Data analytics can be put in the hands of every employee in the enterprise, enabling them to make data-driven decisions.
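The feedback loop behind such recommendations can be sketched in a few lines of Python. This is a deliberately simple co-purchase heuristic over a hypothetical in-memory dataset, not the HANA or HiperGraph implementation; the names `PURCHASES`, `recommend`, and `rejected` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical purchase history: customer -> set of products bought.
PURCHASES = {
    "alice": {"router", "cable"},
    "bob": {"router", "switch", "cable"},
    "carol": {"router", "switch"},
    "dave": {"cable", "modem"},
}


def recommend(purchases, customer, rejected=frozenset(), top_n=2):
    """Rank products bought by customers who share purchases with `customer`.

    Each candidate product is weighted by how many products its buyer has in
    common with `customer`. Products the customer owns or has rejected are
    never suggested. Ties are broken alphabetically.
    """
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue
        for item in items - owned - rejected:
            scores[item] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


first = recommend(PURCHASES, "alice")
# The customer dismisses the top suggestion; recompute immediately.
second = recommend(PURCHASES, "alice", rejected={first[0]})
print(first, second)
```

Each piece of feedback simply becomes another input to the next computation, so the quality of the interaction depends on how quickly the scores can be recomputed over the full, current transaction data.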
For more details about the PARC-SAP co-innovation in the domain of Big Data, please see the white paper at http://www.parc.com/publication/3475/parc-and-sap-co-innovation.html. SAP and PARC are now entering a new phase of our partnership in order to bring this co-innovation to market. In the coming months, we will provide more details on the technology and product roadmap. Stay tuned! Visit http://www.parc.com/services/focus-area/bigdata/ or follow us at @SAPInMemory and @PARCinc for updates.
ABOUT THE AUTHORS
Surendra Reddy is the Chief Technology Officer (CTO), Cloud and Big Data Futures, leading High Performance Analytics Research and Innovation at PARC. He provides leadership in driving Big Data platform innovations, IP commercialization strategy, and strategic go-to-market (GTM) alliances at PARC. Surendra is also leading the PARC Graph Analytics research on SAP HANA in collaboration with SAP.
Cirrus Shakeri is a Senior Director of the HANA Platform Strategic Projects at SAP focusing on bringing the value of Big Data to everyone in the enterprise via semantic search, recommendation systems, and intelligent business assistants. Cirrus’ mission at SAP is to help advance HANA into an artificial intelligence platform that turns everyone in the enterprise into a superhero with special powers!
Heinz Ulrich Roggenkemper serves as an Executive Vice President for Development of SAP Labs.
Hartmut Vogler is a Development Architect in the HANA Platform Strategic Projects team at SAP. With SAP since 1999, he has worked in different research and innovation teams on a wide variety of topics and is now focusing on turning HANA into the new intelligent application platform for SAP. Hartmut holds more than 20 US and international patents.
Jens Doerpmund is a Senior Director and member of the “Business Suite on HANA” team. After spending more than 15 years on topics related to BI and Data Warehousing, he is currently focusing on graph analytics and machine learning techniques to provide real-time business insights from applications running on HANA.