Sunday, January 31, 2016

Oracle NoSQL: A Primer

Saptarshi Biswas

1.       NoSQL: What is it?
Evolution of NoSQL

The popularity of BigData and NoSQL is confirmed by the fact that Oracle Corporation, one of the biggest database giants, not only released its BigData solution some 5 years back but has kept enhancing it with newer releases and has come up with Oracle Big Data Appliance as part of its “Engineered Systems” offerings. Here I would like to provide an outline of the Oracle-BigData combination. But before that, we must know what “NoSQL” and “BigData” are.

Around 1994, when I started my career as an Oracle DBA, RDBMS was the THING to know and SQL was the MANTRA for DBAs and developers. Now we have crossed 2012 without any of the transformative events predicted by some “magicians”, and now I’m hearing NO SQL! No SQL? So is this the transformation predicted by the soothsayers? Does it mean that SQL is dead? At least for me, the term “NoSQL” is still more a question than an answer. Though it is generally applied to a number of recent “non-relational” databases like Cassandra, MongoDB, Neo4j etc., the term “NoSQL” came into existence in the late 90s as the name of an open-source relational database called “Strozzi NoSQL”. Although that database followed the relational model, it did not use SQL for data manipulation. So we can assume that the name hinted at the fact that the database doesn’t use SQL as a query language.
But even before “Strozzi NoSQL”, “MultiValue databases” came into existence at TRW in 1965. Then Lotus Domino was released in 1989. These could be treated as predecessors of “Strozzi NoSQL” and hence of NoSQL.
After “Strozzi NoSQL”, some of the major projects along the same line are:
● The graph database Neo4j was started in 2000.
● Google BigTable was started in 2004; the paper was published in 2006.
● CouchDB was started in 2005.
● The research paper on Amazon Dynamo was released in 2007.
● The document database MongoDB was started in 2007 as part of an open-source cloud computing stack, with its first standalone release in 2009.
● Facebook open-sourced the Cassandra project in 2008.
● Project Voldemort was started in 2008.
● The term NoSQL was reintroduced in early 2009.
In reality, when the owners of both technology and business found that they were entering a world of Polyglot Persistence, where enterprises and even individual applications use multiple technologies for data management, architects felt that RDBMS and SQL alone were not enough. Hence NoSQL came not as a replacement for the RDBMS but to complement it.
So a NoSQL database environment could be defined as a non-relational and largely distributed database system capable of supporting rapid, ad-hoc association and analysis of extremely high-volume, disparate data types.
Due to their distributed nature and the volumes they handle, NoSQL databases are sometimes referred to as cloud databases, Big Data databases and a myriad of other terms. NoSQL databases have become the first alternative to relational databases, with scalability, availability, and fault tolerance being the key deciding factors. They go well beyond the more widely understood legacy relational databases in satisfying the needs of today’s business applications. A very flexible and schema-less data model, horizontal scalability, distributed architectures, and the use of languages and interfaces that are “not only” SQL typically characterize this technology.
As mentioned, NoSQL databases are able to run on a large cluster, and that is the prime cause behind the demand for them. As data volume increases, it becomes harder and more expensive to scale up, that is, to buy a bigger server to run the database on. A more appealing option is to scale out, that is, to run the database on a cluster of servers. Aggregate orientation fits well with scaling out because the aggregate (a group of related data that is read and written as a single unit) is a natural unit to use for distribution.
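
To make the scale-out idea a little more concrete, below is a minimal Python sketch of how whole aggregates could be spread over a cluster by hashing their keys. Everything in it is an assumption made for illustration: the node names, the sample order records and the helper names `node_for` and `place` are hypothetical, and real NoSQL products use more elaborate partitioning schemes than a simple modulo.

```python
import hashlib


def node_for(aggregate_key: str, nodes: list[str]) -> str:
    """Pick the node that owns an aggregate by hashing the aggregate's key."""
    digest = hashlib.md5(aggregate_key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def place(aggregates: dict[str, dict], nodes: list[str]) -> dict[str, dict[str, dict]]:
    """Distribute whole aggregates over the cluster, one owner node per aggregate."""
    cluster: dict[str, dict[str, dict]] = {node: {} for node in nodes}
    for key, aggregate in aggregates.items():
        # The aggregate is never split: the order and its lines travel together.
        cluster[node_for(key, nodes)][key] = aggregate
    return cluster


# Hypothetical sample data: each order, with its lines, is one aggregate.
orders = {
    "order:1001": {"customer": "C1", "lines": [{"sku": "A", "qty": 2}]},
    "order:1002": {"customer": "C2", "lines": [{"sku": "B", "qty": 1}]},
    "order:1003": {"customer": "C1", "lines": [{"sku": "C", "qty": 5}]},
    "order:1004": {"customer": "C3", "lines": [{"sku": "A", "qty": 1}]},
}
nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster members
cluster = place(orders, nodes)

for node, held in cluster.items():
    print(node, sorted(held))
```

Because every aggregate lives on exactly one node, reading or writing an order touches a single server, which is why the aggregate works well as the unit of distribution.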

2.       Use Cases
Before moving on to more technical details, let’s look at where NoSQL appears in our everyday e-world. As described above, NoSQL is a half-brother of the new jargon BigData. So we can say that wherever there is BigData, NoSQL is usually there too.

The most common use of BigData is pricing optimization in the retail domain.
To answer more demanding questions about consumers and markets in shorter time frames, pricing teams need data. But retail inventory, pricing and POS data were spread across multiple systems and multiple formats. Business users had to put all that data together to understand inventory and pricing points across all stores. Specific questions like “were higher priced items selling in certain markets?” or “should inventory be re-allocated or price optimized based upon geography?” needed answers. So some retailers moved to a Hadoop infrastructure and migrated all the required data into a centralized HDFS. New code was developed to process the daily data files and extract them for presentation through a client-designed Pricing Portal. As a result, file preparation that used to require overnight processing now completes in minutes each day, enabling the pricing team to deliver dynamic pricing analytics that react quickly to changing market conditions.
Below are other verticals where BigData, and hence NoSQL, has huge use and a future:
·         Web applications (click-through capture)
·         Online retail
·         Sensor/statistics/network capture (factory automation for example)
·         Backup services for mobile devices
·         Scalable authentication
·         Personalization
·         Social Networks

3.       General Architecture of Enterprise NoSQL Solutions

In fact, it is practically impossible to come up with a general, standard or single reference architecture for NoSQL solutions. So below I have tried to list the stages of the corporate data life cycle and to compare the processes followed by traditional data warehousing and business intelligence with those of BigData.
Broadly, the stages of the corporate data life cycle are:
1.       Acquire: Acquiring data from different source systems, including legacy systems, third-party tools, RSS feeds, etc.
2.       Integrate / Organize: Before the Big Data days, the next major task after acquiring data was to “integrate”, that is, to relate or correlate the acquired data in order to arrive at meaningful business information.
But for Big Data, due to its size and nature, the term “organize” is used instead, as it is not really possible to relate or correlate that huge volume of data.
3.       Analyze: Analyze the integrated/organized data to generate business insights.

Now, keeping BigData and NoSQL in mind, we can relate the above-mentioned phases as below (refer to Figure 1):
Figure 1

The data models are also different in the BigData and NoSQL areas, in order to embrace the huge amounts of data collected from a number of sources. But for now that is out of scope here.
Below is a list of some of the market-leading NoSQL databases with the data models used by each (refer to Figure 2):

Figure 2

4.       Oracle NoSQL

As mentioned earlier, the popularity of BigData and NoSQL is confirmed by the fact that Oracle Corporation, one of the biggest database giants, not only released its BigData solution some 5 years back but has kept enhancing it with newer releases and has come up with Oracle Big Data Appliance as part of its “Engineered Systems” offerings.
In the current discussion I will keep Oracle Big Data Appliance out of scope. Instead, I will share the view and design of BigData from Oracle Corporation’s point of view.

Figure 3
(Source: Oracle, “Big Data for the Enterprise”, An Oracle White Paper, June 2013)

From the above architecture we will consider only the NoSQL component here and see how Oracle Corporation manages it.
Oracle NoSQL Database is a key-value database. At its heart is Oracle Berkeley DB. The enterprise-class key-value store is connected to the distributed Berkeley DB through an intelligent driver. The driver keeps track of the underlying storage topology, shards the data and knows where data can be placed with the lowest latency. Pictorially, it can be represented as below (Figure 4):

Figure 4
(Source: Oracle, “Big Data for the Enterprise”, An Oracle White Paper, June 2013)

Oracle NoSQL Database provides not only low-latency data capture but also fast querying of that data. For this, Oracle NoSQL uses key lookup.
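
The role of the driver described above can be sketched in a few lines of Python. This is only a toy model written for this post and not the Oracle NoSQL Database API: the class names `StorageNode` and `SketchDriver`, the latency figures and the sample record are all assumptions. It shows a driver that knows the topology (which replicas form which shard), hashes a key to a shard, and serves a key lookup from the replica with the lowest latency.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """One storage node; 'latency_ms' is an assumed, pre-measured figure."""
    name: str
    latency_ms: float
    data: dict = field(default_factory=dict)


class SketchDriver:
    """Toy client-side driver: knows the topology and routes by key."""

    def __init__(self, shards: list[list[StorageNode]]):
        # Topology: each shard is a list of replica nodes holding the same keys.
        self.shards = shards

    def _shard_for(self, key: str) -> list[StorageNode]:
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key: str, value: dict) -> None:
        # Simplification: write synchronously to every replica of the shard.
        for node in self._shard_for(key):
            node.data[key] = value

    def get(self, key: str):
        # Key lookup: go straight to the owning shard, then to its nearest replica.
        nearest = min(self._shard_for(key), key=lambda node: node.latency_ms)
        return nearest.data.get(key)


# Hypothetical topology: two shards with two replicas each.
topology = [
    [StorageNode("sn1", 2.0), StorageNode("sn2", 9.0)],
    [StorageNode("sn3", 4.0), StorageNode("sn4", 1.5)],
]
driver = SketchDriver(topology)
driver.put("user:42", {"name": "Asha"})
print(driver.get("user:42"))
```

Since the key alone decides which shard holds the record, a lookup never has to scan the cluster, and this is what keeps both capture and querying fast.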

Oracle Corporation provides Oracle Big Data Connectors, which tightly integrate the big data environment with Oracle Database, so that all the data can be analyzed together with high performance. For example, browsing data collected from browser history and data from point-of-sale applications can be stored in Hadoop; an external table created in Oracle Database can then access that data from Hadoop, making a holistic analysis possible.
The figure below illustrates the idea:

Figure 5
(Source: Oracle, “Big Data for the Enterprise”, An Oracle White Paper, June 2013)
In the whole process Oracle Big Data Connectors play a key role. The connectors comprise the following components:
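
As a rough illustration of the “holistic analysis” idea described above, the Python sketch below joins browsing data with point-of-sale data in a single SQL query. It is a stand-in only and makes several assumptions: SQLite plays the part of Oracle Database, an in-memory CSV string plays the part of a file stored in Hadoop, and the table names, column names and figures are invented. A real external table reads the file where it lives instead of copying it into the database as done here.

```python
import csv
import io
import sqlite3

# Stand-in for a browsing-history file kept in Hadoop; the contents are made up.
BROWSING_CSV = """customer_id,page_views
C1,12
C2,3
C3,7
"""


def load_browsing(conn: sqlite3.Connection, text: str) -> None:
    """Expose the file contents as a table (a real external table would not copy)."""
    conn.execute("CREATE TABLE browsing (customer_id TEXT, page_views INTEGER)")
    rows = [(r["customer_id"], int(r["page_views"]))
            for r in csv.DictReader(io.StringIO(text))]
    conn.executemany("INSERT INTO browsing VALUES (?, ?)", rows)


def load_pos(conn: sqlite3.Connection) -> None:
    """Point-of-sale rows that would already live in the relational database."""
    conn.execute("CREATE TABLE pos_sales (customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO pos_sales VALUES (?, ?)",
                     [("C1", 40.0), ("C1", 10.0), ("C3", 25.0)])


def combined_view(conn: sqlite3.Connection) -> list[tuple]:
    """One query over both sources: page views next to total spend per customer."""
    return conn.execute(
        """
        SELECT b.customer_id, b.page_views, COALESCE(SUM(p.amount), 0.0) AS total_spent
        FROM browsing b
        LEFT JOIN pos_sales p ON p.customer_id = b.customer_id
        GROUP BY b.customer_id, b.page_views
        ORDER BY b.customer_id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_browsing(conn, BROWSING_CSV)
load_pos(conn)
report = combined_view(conn)
for row in report:
    print(row)
```
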

a)      Oracle Loader for Hadoop (OLH): uses Hadoop MapReduce processing to create optimized data sets for efficient loading and analysis in Oracle Database 11g and later.
b)      Oracle SQL Connector for Hadoop Distributed File System (HDFS): a high-speed connector for accessing data on HDFS directly from Oracle Database.
c)       Oracle Data Integrator Application Adapter for Hadoop: simplifies data integration between Hadoop and an Oracle Database.
d)      Oracle R Connector for Hadoop: an R package that provides transparent access to Hadoop and to data stored in HDFS.

Finally, the BigData solution from Oracle Corporation looks like this:

Figure 6

That’s all for now. Next time I will be back with notes on the procedures for connecting Oracle and BigData.
