Tuesday, February 15, 2011

Some questions related to Enterprise Architecture


 What is the difference between an application database and a data warehouse?


Design goals for application/OLTP schemas and DW/BI schemas are different.
Application schemas are designed to store the data generated by transactions.
For BI/DW it is more important to understand and categorize the patterns in those transactions.
Thus:

In a data warehouse, data is stored in at least two types of schemas:
  1. Target schemas
  2. Source schemas
In an OLTP or general application database there is no source/target split.


In DW/BI schemas, denormalized tables are used to support fast and complex SQL for reports.
OLTP or application schemas have normalized tables.


Below are the major concepts behind data warehouse schemas:

There is a variety of ways of arranging schema objects in the schema models designed for data warehousing.
One data warehouse schema model is a star schema.
However, there are other schema models that are commonly used for data warehouses. The most prevalent of these schema models is the third normal form (3NF) schema. Additionally, some data warehouse schemas are neither star schemas nor 3NF schemas, but instead share characteristics of both schemas; these are referred to as hybrid schema models.


Star Schemas
The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The center of the star consists of a large fact table and the points of the star are the dimension tables.
(a)
A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.
A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The cost-based optimizer recognizes star queries and generates efficient execution plans for them.
A typical fact table contains keys and measures. A star join is a primary key to foreign key join of the dimension tables to a fact table.
The main advantages of star schemas are that they:
  • Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
  • Provide highly optimized performance for typical star queries.
  • Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contain dimension tables.
Star schemas are used for both simple data marts and very large data warehouses.
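
As a rough illustration, here is a minimal star schema sketch with a hypothetical sales fact table and two dimension tables, followed by a typical star query (all table and column names are made up for the example):

-- Hypothetical dimension tables: small lookup tables, one row per member
CREATE TABLE dim_time     (time_id NUMBER PRIMARY KEY, calendar_year NUMBER);
CREATE TABLE dim_customer (cust_id NUMBER PRIMARY KEY, cust_city VARCHAR2(40));

-- Hypothetical fact table: foreign keys to the dimensions plus measures
CREATE TABLE sales_fact (
  time_id     NUMBER REFERENCES dim_time (time_id),
  cust_id     NUMBER REFERENCES dim_customer (cust_id),
  amount_sold NUMBER(10,2)
);

-- A typical star query: the fact table is joined to each dimension,
-- but the dimensions are never joined to each other
SELECT t.calendar_year,
       c.cust_city,
       SUM(f.amount_sold) AS total_sales
FROM   sales_fact f, dim_time t, dim_customer c
WHERE  f.time_id = t.time_id
AND    f.cust_id = c.cust_id
GROUP BY t.calendar_year, c.cust_city;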

(b)
The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.
Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table.
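
Continuing the hypothetical example above, snowflaking the customer dimension might look like the sketch below, where the geography attributes move into their own normalized table:

-- Snowflaked customer dimension: the country data is normalized out
CREATE TABLE dim_country (country_id NUMBER PRIMARY KEY, country_name VARCHAR2(40));
CREATE TABLE dim_customer_sf (
  cust_id    NUMBER PRIMARY KEY,
  cust_name  VARCHAR2(60),
  country_id NUMBER REFERENCES dim_country (country_id)
);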


Q. Is there a tool to trace queries, like Profiler for SQL Server?


There are a number of ways to trace a query in Oracle:
1. SET AUTOTRACE ON
2. TKPROF
3. Enterprise Manager (EM)
You can also enable SQL trace at different levels using event 10046; level 12 is the most detailed (it adds both bind values and wait events).
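A quick sketch of switching the extended trace on and off for the current session:

-- Enable extended SQL trace (event 10046) for this session:
-- level 4 adds bind values, level 8 adds wait events, level 12 adds both
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

-- ... run the statements you want to trace ...

-- Disable the trace again
ALTER SESSION SET EVENTS '10046 trace name context off';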

Brief descriptions of the methods mentioned above follow:

1.  AUTOTRACE in SQL*Plus: it gives quick execution plans and statistics.
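
A minimal SQL*Plus session might look like the following (the query is reconstructed from the plan shown below, so the exact statement is an assumption; displaying the statistics section additionally requires the PLUSTRACE role):

-- Enable autotrace for the session
SET AUTOTRACE ON

-- A query of the same shape as the one traced below
-- (reconstructed from the plan; the real column list is not shown)
SELECT *
FROM   customers
WHERE  ROWNUM < 10;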

After the query is executed, SQL*Plus displays the results followed by an analysis such as:
9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2308127121

-------------------------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |     9 |   585 |     2   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY      |           |       |       |            |          |       |       |
|   2 |   PARTITION HASH ALL|           |     9 |   585 |     2   (0)| 00:00:01 |     1 |    16 |
|   3 |    TABLE ACCESS FULL| CUSTOMERS |     9 |   585 |     2   (0)| 00:00:01 |     1 |    16 |
-------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)


2.  TKPROF: the TKPROF program converts Oracle trace files into a more readable form. If you have a problem query you can use TKPROF to get more information. To get the most out of the utility you must enable timed statistics, either by setting the TIMED_STATISTICS init.ora parameter or by issuing the following command:
ALTER SYSTEM SET TIMED_STATISTICS = TRUE;
If a suitable plan table is not present, one can be created by running the following as the SYS user:
@ORACLE_HOME\rdbms\admin\utlxplan.sql
CREATE PUBLIC SYNONYM PLAN_TABLE FOR SYS.PLAN_TABLE;
GRANT SELECT, INSERT, UPDATE, DELETE ON SYS.PLAN_TABLE TO PUBLIC;
With this done we can trace a statement:
ALTER SESSION SET SQL_TRACE = TRUE;

SELECT COUNT(*)
FROM   dual;

ALTER SESSION SET SQL_TRACE = FALSE;
The resulting trace file will be located in the USER_DUMP_DEST directory. It can then be interpreted using TKPROF at the command prompt as follows:
TKPROF <trace_file> <output_file> explain=user/password@service table=sys.plan_table
The resulting output file contains the following type of information for all SQL statements processed, including the ALTER SESSION commands:
********************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
********************************************************************************

 SELECT COUNT(*)
 FROM   dual

call    count    cpu elapsed    disk   query current     rows
------- -----  ----- ------- ------- ------- -------  -------
Parse       1   0.02    0.02       0       0       0        0
Execute     1   0.00    0.00       0       0       0        0
Fetch       2   0.00    0.00       0       1       4        1
------- -----  ----- ------- ------- ------- -------  -------
total       4   0.02    0.02       0       1       4        1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 121 

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS FULL DUAL

3. You can use Oracle Enterprise Manager to monitor active sessions, along with the queries being executed, their execution plans, locks, some statistics, and even a progress bar for longer tasks.
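
If EM is not at hand, a rough command-line equivalent of its session screen is a query against the dynamic performance views, for example (a sketch; it requires SELECT privileges on the V$ views):

-- Active sessions and the SQL they are currently executing
SELECT s.sid,
       s.serial#,
       s.username,
       q.sql_text
FROM   v$session s,
       v$sql     q
WHERE  q.sql_id       = s.sql_id
AND    q.child_number = s.sql_child_number
AND    s.status       = 'ACTIVE'
AND    s.username IS NOT NULL;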



Q. How do you handle the Master Data area?


Assumption: here we are talking about the master data of an enterprise that extracts data from different sources and maintains a DW/BI environment.
 In Master Data Management there are two major sides to look at:
·         The technology to profile, consolidate and synchronize the master data across the enterprise
·         The applications to manage, cleanse, and enrich the structured and unstructured master data
In order to successfully manage the master data, there must be a Master Data Management (MDM) system with the characteristics below:
·         A flexible, extensible and open data model to hold the master data and all needed attributes (both structured and unstructured).
·         The data model must be application neutral, yet support OLTP workloads and directly connected applications.
·         A metadata management capability for items such as business entity matrixed relationships and hierarchies.
·         A source system management capability to fully cross-reference business objects and to satisfy seemingly conflicting data ownership requirements.
·         A data quality function that can find and eliminate duplicate data while ensuring correct data attribute survivorship.
·         A data quality interface to assist with preventing new errors from entering the system even when data entry is outside the MDM application itself.
·         A continuing data cleansing function to keep the data up to date.
·         An internal triggering mechanism to create and deploy change information to all connected systems.
·         A comprehensive data security system to control and monitor data access, update rights, and maintain change history.
·         A user interface to support casual users and data stewards.
·         A data migration management capability to ensure consistency as data moves across the real-time enterprise.
·         A business intelligence structure to support profiling, compliance, and business performance indicators.
·         A single platform to manage all master data objects in order to prevent the proliferation of new silos of information on top of the existing fragmentation problem.
·         An analytical foundation for directly analyzing master data.
·         A highly available and scalable platform for mission critical data access under heavy mixed workloads.

And that MDM should have the processes described below:
  • Profile the master data. Understand all possible sources and the current state of data quality in each source.
  • Consolidate the master data into a central repository and link it to all participating applications.
  • Govern the master data. Clean it up, deduplicate it, and enrich it with information from third-party systems. Manage it according to business rules (a small deduplication sketch follows this list).
  • Share it. Synchronize the central master data with enterprise business processes and the connected applications. Ensure that data stays in sync across the IT landscape.
  • Leverage the fact that a single version of the truth exists for all master data objects by supporting business intelligence systems and reporting.
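
As a small illustration of the deduplication step, a hypothetical customer master table could be checked for duplicates like this (the table and column names are assumptions made for the example):

-- Flag all but the most recently updated row for each name/e-mail pair
SELECT cust_id
FROM  (SELECT cust_id,
              ROW_NUMBER() OVER (PARTITION BY LOWER(cust_name), LOWER(email)
                                 ORDER BY last_updated DESC) AS rn
       FROM   customer_master)
WHERE  rn > 1;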


Q2. What is the use of the control file? What does it (control file) contain?
What is a control file:
The control file of the database is a binary file that contains a great deal of database information. Among other things, it records the database name and the names and locations of the datafiles and redo log files. Oracle cannot function without valid control files.
Because the control file is so important, Oracle allows you to maintain duplicate copies of it. When you have more than one control file, you are said to be multiplexing your control files. It is good practice to put these copies on different disks so that the loss of a single disk does not take the control file with it.
How can I see where my control files are:
SQL> show parameters control

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time        integer     7
control_files                        string      /home/oracle/db/dbs/control1.c
                                                 tl, /home/oracle/db/dbs/contro
                                                 l2.ctl, /home/oracle/db/dbs/co
                                                 ntrol3.ctl

How can I see the contents of my control file:
[oracle@dctxvm55 ~]$ strings /home/oracle/db/dbs/control1.ctl
}|{z
&MDB1
h-K,
)DB1
)DB1
LVG,
LVG,
LVG,
LVG,
LVG,
LVG,
LVG,
LVG,
LVG,_
LVG,_
LVG,B
/home/oracle/db/dbs/redo1.log
/home/oracle/db/dbs/redo2.log
/home/oracle/db/dbs/redo3.log
/home/oracle/db/dbs/system.dbf
/home/oracle/db/dbs/undotbs1.dbf
/home/oracle/db/dbs/sysaux.dbf
/home/oracle/db/dbs/temp01.dbf
/home/oracle/db/dbs/soe.dbf
/home/oracle/db/dbs/soeindex.dbf
/home/oracle/db/dbs/tp_traceanalyzer1
/home/oracle/db/dbs/tmp_traceanalyzer
/home/oracle/db/dbs/redo1.log
/home/oracle/db/dbs/redo2.log
/home/oracle/db/dbs/redo3.log
/home/oracle/db/dbs/system.dbf
/home/oracle/db/dbs/undotbs1.dbf
/home/oracle/db/dbs/sysaux.dbf
/home/oracle/db/dbs/temp01.dbf
/home/oracle/db/dbs/soe.dbf
/home/oracle/db/dbs/soeindex.dbf
/home/oracle/db/dbs/tp_traceanalyzer1
/home/oracle/db/dbs/tmp_traceanalyzer
SYSTEM
UNDOTBS1
SYSAUX
TEMP
SOEINDEX
TP_TRACEANALYZER1
TMP_TRACEANALYZER
SYSTEM
UNDOTBS1
SYSAUX
TEMP
SOEINDEX
TP_TRACEANALYZER1
TMP_TRACEANALYZER
)dg
/*T]"
?*^.*
G*z0-
X*(64
+^*F
Creating Initial Control Files
The control files of an Oracle Database are initially created when the CREATE DATABASE statement is issued.
The names of the control files are specified by the CONTROL_FILES parameter in the initialization parameter file used during database creation. The filenames specified in CONTROL_FILES should be fully specified and are operating system specific. The following is an example of a CONTROL_FILES initialization parameter:
CONTROL_FILES = (/u01/oracle/prod/control01.ctl,
                 /u02/oracle/prod/control02.ctl,
                 /u03/oracle/prod/control03.ctl)
You can subsequently change the value of the CONTROL_FILES initialization parameter to add more control files or to change the names or locations of existing control files.
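For example, when the instance uses an spfile, an additional copy can be registered like this (the paths are placeholders; the change only takes effect after the next restart):

-- Only updates the spfile; restart the instance for it to take effect
ALTER SYSTEM SET control_files =
  '/u01/oracle/prod/control01.ctl',
  '/u02/oracle/prod/control02.ctl',
  '/u03/oracle/prod/control03.ctl',
  '/u04/oracle/prod/control04.ctl'
  SCOPE = SPFILE;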
Creating Additional Copies, Renaming, and Relocating Control Files
You can create an additional control file copy for multiplexing by copying an existing control file to a new location and adding the file name to the list of control files. Similarly, you rename an existing control file by copying the file to its new name or location, and changing the file name in the control file list. In both cases, to guarantee that control files do not change during the procedure, shut down the database before copying the control file.
To add a multiplexed copy of the current control file or to rename a control file:
  1. Shut down the database.
  2. Copy an existing control file to a new location, using operating system commands.
  3. Edit the CONTROL_FILES parameter in the database initialization parameter file to add the new control file name, or to change the existing control filename.
  4. Restart the database.
Creating New Control Files
Case: When to Create New Control Files
It is necessary for you to create new control files in the following situations:
  • All control files for the database have been permanently damaged and you do not have a control file backup.
  • You want to change the database name.
For example, you would change a database name if it conflicted with another database name in a distributed environment.
  • The compatibility level is set to a value that is earlier than 10.2.0, and you must make a change to an area of database configuration that relates to any of the following parameters from the CREATE DATABASE or CREATE CONTROLFILE commands: MAXLOGFILES, MAXLOGMEMBERS, MAXLOGHISTORY, and MAXINSTANCES. If compatibility is 10.2.0 or later, you do not have to create new control files when you make such a change; the control files automatically expand, if necessary, to accommodate the new configuration information.
For example, assume that when you created the database or re-created the control files, you set MAXLOGFILES to 3. Suppose that now you want to add a fourth redo log file group to the database with the ALTER DATABASE command. If compatibility is set to 10.2.0 or later, you can do so and the control files automatically expand to accommodate the new log file information. However, with compatibility set earlier than 10.2.0, your ALTER DATABASE command would generate an error, and you would have to first create new control files.
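The statement in that example would look something like the following (the file names and size are placeholders):

-- Add a fourth redo log group with two members; with compatibility
-- below 10.2.0 this would fail once MAXLOGFILES is exceeded
ALTER DATABASE ADD LOGFILE GROUP 4
  ('/u01/oracle/prod/redo04_01.log',
   '/u01/oracle/prod/redo04_02.log') SIZE 50M;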
For information on compatibility level, see "About The COMPATIBLE Initialization Parameter".
The CREATE CONTROLFILE Statement
You can create a new control file for a database using the CREATE CONTROLFILE statement. The following statement creates a new control file for the prod database (a database that formerly used a different database name):
CREATE CONTROLFILE
   SET DATABASE prod
   LOGFILE GROUP 1 ('/u01/oracle/prod/redo01_01.log',
                    '/u01/oracle/prod/redo01_02.log'),
           GROUP 2 ('/u01/oracle/prod/redo02_01.log',
                    '/u01/oracle/prod/redo02_02.log'),
           GROUP 3 ('/u01/oracle/prod/redo03_01.log',
                    '/u01/oracle/prod/redo03_02.log')
   RESETLOGS
   DATAFILE '/u01/oracle/prod/system01.dbf' SIZE 3M,
            '/u01/oracle/prod/rbs01.dbs' SIZE 5M,
            '/u01/oracle/prod/users01.dbs' SIZE 5M,
            '/u01/oracle/prod/temp01.dbs' SIZE 5M
   MAXLOGFILES 50
   MAXLOGMEMBERS 3
   MAXLOGHISTORY 400
   MAXDATAFILES 200
   MAXINSTANCES 6
   ARCHIVELOG;

