Oracle Enterprise Data Quality Interview Questions

Which processor do you use to exclude the duplicate records?

First, identify the duplicates with the “Duplicate Check” processor, providing the attributes on which you want to detect duplicates.
Then take only the records from this processor's “Non-Duplicated” output port, which eliminates the duplicates from the data stream.
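
EDQ processors are configured in the user interface rather than in code, so the following Python sketch only mirrors the logic described above. It assumes that the “Non-Duplicated” port carries only records whose key values occur exactly once; the function name `split_duplicates` and the sample records are hypothetical.

```python
from collections import Counter

def split_duplicates(records, key_attributes):
    """Split records into (non_duplicated, duplicated) lists by key attributes."""
    def key(rec):
        return tuple(rec[a] for a in key_attributes)

    # Count how many records share each key.
    counts = Counter(key(r) for r in records)
    non_duplicated = [r for r in records if counts[key(r)] == 1]
    duplicated = [r for r in records if counts[key(r)] > 1]
    return non_duplicated, duplicated

# Hypothetical sample data: records 1 and 3 share the same email.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},
]
non_dup, dup = split_duplicates(records, ["email"])
```

Only `non_dup` would continue downstream in this sketch, which corresponds to reading from the “Non-Duplicated” port.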

Which Processor is used to eliminate Duplicates?

To eliminate duplicates, we can use the “Group and Merge” processor, which has three sub-processors: Input, Group, and Merge.
Add the attributes to be considered in this data stream to the Input sub-processor.
Add the attribute(s) on which duplicates should be eliminated to the Group sub-processor.
In the Merge sub-processor, select the relevant merge function; by default it is “Most Common Value”.
Use the Merged output as the de-duplicated records.
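
The steps above can be sketched in Python as a grouping followed by a per-attribute merge. This is an illustration of the idea only, not EDQ's implementation: `group_and_merge` is a hypothetical name, and the tie-breaking behaviour (first value seen wins) is an assumption of this sketch.

```python
from collections import Counter, defaultdict

def group_and_merge(records, group_attributes):
    """Group records by key attributes and merge each group into one record,
    taking the most common value of every attribute."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[a] for a in group_attributes)].append(rec)

    merged = []
    for members in groups.values():
        row = {}
        for attr in members[0]:
            # most_common(1) returns [(value, count)] for the top value.
            row[attr] = Counter(m[attr] for m in members).most_common(1)[0][0]
        merged.append(row)
    return merged

# Hypothetical sample data: customer 1 appears three times.
records = [
    {"customer_id": 1, "city": "Paris"},
    {"customer_id": 1, "city": "Lyon"},
    {"customer_id": 1, "city": "Paris"},
    {"customer_id": 2, "city": "Rome"},
]
merged = group_and_merge(records, ["customer_id"])
```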

What is the difference between “Lookup and Return” and “Lookup Check” Processors?

Lookup and Return performs a lookup against the reference data or lookup and brings back the return attribute(s), which can be used to add new attribute(s) to the data stream or to update existing ones.
Lookup Check performs a lookup against the reference data or lookup only to check whether the attribute values exist there. It does not bring back the return attributes, even if the reference data defines them.
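
The difference can be illustrated with a small Python sketch in which a dictionary stands in for the reference data. The names `COUNTRY_CODES`, `lookup_and_return`, and `lookup_check` are hypothetical and only mimic the behaviour described above.

```python
# A dictionary standing in for reference data: lookup column -> return column.
COUNTRY_CODES = {"United States of America": "US", "India": "IN"}

def lookup_and_return(record, reference, key_attr, new_attr):
    """Return a copy of the record enriched with the value found in reference
    (None when the key is not present)."""
    enriched = dict(record)
    enriched[new_attr] = reference.get(record[key_attr])
    return enriched

def lookup_check(record, reference, key_attr):
    """Only report whether the key exists in reference; nothing is returned
    from the reference data itself."""
    return record[key_attr] in reference

record = {"name": "Acme", "country": "India"}
enriched = lookup_and_return(record, COUNTRY_CODES, "country", "country_code")
exists = lookup_check(record, COUNTRY_CODES, "country")
```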

How do you convert a Date attribute to a different format? For example, MM/DD/YYYY HH:MM:SS to DD/MM/YY

If the attribute that contains the date is of the STRING data type, first convert it to a date using the “Convert String to Date” processor, and then use the “Convert Date to String” processor, providing the desired output format in the “Options” of that processor.
If the attribute that contains the date is of the DATE data type, convert it to a string using the “Convert Date to String” processor, providing the desired output format in the “Options” of that processor. If required, you can convert it back to DATE afterwards.
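
The same parse-then-format sequence can be sketched in Python with the standard `datetime` module. The format strings below are Python's equivalents of the example formats in the question; they are not EDQ option values, and `reformat_date` is a hypothetical helper.

```python
from datetime import datetime

def reformat_date(value,
                  input_format="%m/%d/%Y %H:%M:%S",
                  output_format="%d/%m/%y"):
    """Parse a date string (string -> date), then render it in a new
    format (date -> string)."""
    parsed = datetime.strptime(value, input_format)
    return parsed.strftime(output_format)

converted = reformat_date("02/13/2025 14:30:00")  # "13/02/25"
```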

How do we add a unique Row-Identifier to each record in EDQ?

To generate a unique row identifier, you can use the “Add Message Id” processor. It adds a Number attribute that assigns a sequential number to each record.
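
As a rough analogy in Python, sequential numbering of records can be sketched as follows. The attribute name `row_id` and the starting value of 1 are assumptions of this sketch, not EDQ defaults.

```python
def add_row_ids(records, start=1, attr="row_id"):
    """Return new records, each with a sequential number under `attr`."""
    return [{attr: i, **rec} for i, rec in enumerate(records, start=start)]

numbered = add_row_ids([{"name": "Ann"}, {"name": "Bob"}])
```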

What is the main purpose of Lookup and Return?

Lookup and Return is one of the main processors used in EDQ for data enrichment. It takes one or more attributes as input and returns one or more attributes as output, as defined in the reference data.

If you have multiple files/sources to read data from, how are you going to bring all the data together in one stream?

First, create a snapshot of each file and add a Reader processor for each one. Then use the Merge processor to bring all the files together.
Note: The files must be in the same format to be brought together in the Merge processor. Alternatively, you can selectively choose a few columns from each file in the Merge processor.
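
Conceptually, merging several sources into one stream while keeping only selected columns can be sketched in Python as below. The function `merge_streams` and the sample data are hypothetical, and filling a missing column with `None` is an assumption of this sketch rather than documented EDQ behaviour.

```python
def merge_streams(streams, columns):
    """Concatenate several record streams, keeping only the chosen columns."""
    merged = []
    for stream in streams:
        for rec in stream:
            # Missing columns are filled with None in this sketch.
            merged.append({c: rec.get(c) for c in columns})
    return merged

# Two hypothetical sources with partly different columns.
file_a = [{"id": 1, "name": "Ann", "phone": "555-0100"}]
file_b = [{"id": 2, "name": "Bob", "fax": "555-0199"}]
combined = merge_streams([file_a, file_b], ["id", "name"])
```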


How will you identify and eliminate duplicates in EDQ?

To only identify duplicates, we can use the Duplicate Check processor, passing one or more attributes on which duplicates need to be identified.
To eliminate or merge these duplicates, we can use the Group and Merge processor, passing one or more attributes on which duplicates need to be merged.

What is the difference between Reference data and Lookup?

Reference data is an object that you create explicitly, with its data and a definition of which columns to look up and which to return. It holds both the data and the definition and is more static, i.e. the data does not change dynamically.
A lookup is something you create on top of staged data, defining which columns to look up and which to return. The data here is dynamic: every time the staged data is refreshed, the lookup works on the refreshed data.

After cleansing the data in EDQ, how will you pass the data to the downstream system or external system?

This can be done in several ways. Two of the most popular methods are:
Export the final cleansed staged data as a file (.txt, .xls, etc.).
Write the cleansed data to a staging table in a schema outside EDQ. To do so, you need a data store pointing to that table beforehand.

What are the types of external sources from which you can import data into EDQ?

EDQ can import from different types of sources, such as text files (.txt, .csv, .dsv, etc.), Excel files (.xls), and databases such as Oracle, DB2, PostgreSQL, MySQL, Microsoft SQL Server, Sybase, etc.

What are the objects you create in EDQ to import files or from the database?

First, we need to create a data store pointing to a file or database, and then create and run a snapshot (staged data) to import the data. In the case of a file, you can either give a local path or, if the file is on a server, provide the server credentials and the path to the file.

What is the Staged data?

Staged data is where you store intermediate or final results within your EDQ space. It is like an EDQ table that stores the processed data from your processes.

What is the difference between Staged data and Reference data?

Staged data is used to store the data being processed, or the final data after processing, and is considered working data.
Reference data is data that you look up using values from the working data in order to get the other corresponding values from the same reference record.
Example: Suppose your working data contains the country “United States of America” and you need the country code. You look up the reference data, where you have already stored all countries and their corresponding codes, by country name and get the country code back.

Name some of the commonly used processors

Reader
Writer
Lookup and Return
Logic Check
Duplicate check
Group and Merge
Merge

Critical Data Quality Challenges

Data used for decision-making and analytics has to be fully trustworthy. However, in real life, data rarely comes clean. It contains missing values, duplicate entries, misspelled words, non-standardized names, and various other forms of questionable data. Making critical decisions with such data results in operational inefficiencies, loss of goodwill among customers, faulty market readings, and audit and compliance lapses.

Essential Data Quality Capabilities

Ever since there have been databases and applications, there have been data quality problems. Unfortunately, not all of those problems are created equal, and neither are the solutions that address them. Some of the largest differences are driven by the data type, or domain, of the data in question. The most common data domains in data quality are customer data (or, more generally, party data, including suppliers, employees, etc.) and product data. Oracle Enterprise Data Quality products recognize these differences and provide purpose-built capabilities to address each. Quick to deploy and easy to use, Oracle Enterprise Data Quality products bring the ability to enhance the quality of data to all stakeholders in any data management initiative.


February 13, 2025
