Explain the various connections available in Talend.
Connections define whether data is output, processed, or sequenced logically between components. The main connection types are:
- Row-based: Types include Main, Lookup, Filter, Rejects, ErrorRejects, Uniques/Duplicates, Output, and Multiple Input/Output.
- Iterate: Used to perform a loop, for example over the files contained in a directory.
- Trigger: Used to create a dependency between Jobs or subjobs that are triggered one after another. The two general categories are subjob triggers and component triggers.
- Link: It is used to transfer the table schema into the ELT component.
Give some advantages of using Talend.
Talend Open Studio automates tasks and offers faster development and deployment.
Talend has everything you might need to meet today’s market needs as well as future ones.
It is free and backed by a large online community of professionals and learners who share information, experiences, and questions.
How is Talend related to a code generator?
Talend is called a code generator because it provides a user-friendly graphical interface where components are simply dragged and dropped to design a Job. When the Job is saved, Talend Studio automatically translates it into a Java class in which the begin, main, and end parts of each component drive the control flow.
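As a rough illustration of the begin, main, and end parts, the sketch below mimics how the code for one component might be organized. It is a hand-written simplification, not actual Talend output; the class name `DemoJob` and the method `runSubjob` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written simplification; real generated Talend code is far longer.
class DemoJob {
    // Runs one hypothetical subjob: read rows, uppercase them, collect the output.
    static List<String> runSubjob(List<String> input) {
        // "begin" part: runs once before the first row (initialize resources, counters).
        List<String> output = new ArrayList<>();
        int rowCount = 0;

        // "main" part: runs once for every row flowing through the component.
        for (String row : input) {
            output.add(row.toUpperCase());
            rowCount++;
        }

        // "end" part: runs once after the last row (release resources, report counters).
        System.out.println("rows processed: " + rowCount);
        return output;
    }

    public static void main(String[] args) {
        System.out.println(runSubjob(List.of("a", "b")));
    }
}
```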
What is tMap?
tMap is an advanced component that is integrated as a plugin into Talend Studio. It transforms and routes data from one or more sources to one or more destinations.
What schemas are supported by Talend?
The following schemas are supported:
Generic schema: It is not tied to any particular source and can be used as a shareable resource across different data sources.
Fixed schema: Read-only schemas that come predefined with some components.
Repository schema: A reusable schema; any change made to it is reflected in all the Jobs that use it.
What are the operations of tMap?
tMap performs the following operations:
- Data transformation on any field
- Data multiplexing and demultiplexing
- Field concatenation and interchange
- Rejection of data
- Filtering of fields using constraints
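The operations listed above can be pictured in plain Java. The following is only a sketch of what a tMap-style mapping does conceptually, not code produced by Talend; the `Person` and `Result` types, the flow name `row1`, and the age-based constraint are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TMapSketch {
    // Hypothetical input row, standing in for a tMap input flow such as row1.
    record Person(String firstName, String lastName, int age) {}

    // Two hypothetical output flows: accepted rows and rejected rows.
    record Result(List<String> adults, List<String> rejects) {}

    static Result map(List<Person> input) {
        List<String> adults = new ArrayList<>();
        List<String> rejects = new ArrayList<>();
        for (Person row1 : input) {
            // Field concatenation, as one might write in an output expression.
            String fullName = row1.firstName() + " " + row1.lastName();
            // Constraint-style filter: rows that fail it go to the reject flow.
            if (row1.age() >= 18) {
                adults.add(fullName.toUpperCase()); // transformation on a field
            } else {
                rejects.add(fullName);
            }
        }
        return new Result(adults, rejects);
    }

    public static void main(String[] args) {
        Result r = map(List.of(
            new Person("Ada", "Lovelace", 36),
            new Person("Tim", "Young", 12)));
        System.out.println(r.adults() + " / " + r.rejects());
    }
}
```

Routing one input into two outputs is the demultiplexing case; feeding several inputs into one output would be the reverse.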
What are routines?
Routines are reusable pieces of code that can be used to optimize data processing with custom logic. They also extend the features of Talend Studio and improve Job capacity. There are two kinds of routines: system routines and user routines.
- System routine: Read-only code that can be called directly inside any Job.
- User routine: A custom routine that users create either from scratch or by adapting an existing one.
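A user routine is essentially a Java class with public static methods. The sketch below shows what one might look like; the class name `MyStringRoutines` and the method `trimOrDefault` are hypothetical, and the package declaration that Talend Studio normally adds is left out so the example runs on its own.

```java
// Hypothetical user routine; in Talend Studio it would sit with the other routines.
class MyStringRoutines {
    /** Trims the input and returns the fallback when the result is null or empty. */
    public static String trimOrDefault(String value, String fallback) {
        if (value == null) {
            return fallback;
        }
        String trimmed = value.trim();
        return trimmed.isEmpty() ? fallback : trimmed;
    }

    public static void main(String[] args) {
        System.out.println(trimOrDefault("  Talend ", "n/a"));
        System.out.println(trimOrDefault("   ", "n/a"));
    }
}
```

Inside a Job, such a method could then be called from an expression, for example `MyStringRoutines.trimOrDefault(row1.name, "n/a")`, assuming an input flow named `row1` with a `name` column.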
Discuss the use of the Expression Editor in Talend.
The Expression Editor lets you view and edit expressions such as input, output, and constraint statements. It provides a dedicated view for writing any function or transformation, so the expressions needed for data transformation can be written easily.
What is the difference between ETL and ELT?
ETL or Extraction, Transformation, and Load is the age-old concept that involves the extraction of data from external sources, transforming it to make it fit for use as per business and operational needs, then loading it into the end target data warehouse or target database. This is a very valid approach as long as there are multiple databases and source systems involved in the whole process. The data is transported from one place to another, so it is often advisable to do all the transformation-related work in a separate specialized engine.
ELT, on the other hand, is the process where the extracted data is first loaded into the target system, and transformations are then performed on top of it. It is the better approach when the target system is efficient and robust enough to handle all the transformations. Analytical databases such as Google BigQuery and Amazon Redshift are often used with ELT because they are powerful enough to run the transformations themselves.
What is a subjob? How is data sent from the parent Job to the child Job?
A subjob is a single component, or several components joined by a data flow. A Job has at least one subjob. To pass a value from the parent Job to the child Job, use context variables.
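To make the parent-to-child hand-off concrete, the sketch below imitates a child Job that starts from its default context and applies overrides supplied by the parent. It assumes the `--context_param name=value` argument form used when launching exported Jobs; the class `ContextParamSketch` and the method `buildChildContext` are hypothetical and not part of Talend.

```java
import java.util.HashMap;
import java.util.Map;

class ContextParamSketch {
    // Child side: copy the defaults, then apply the overrides passed by the parent.
    static Map<String, String> buildChildContext(Map<String, String> defaults, String[] args) {
        Map<String, String> context = new HashMap<>(defaults);
        for (int i = 0; i < args.length - 1; i++) {
            if ("--context_param".equals(args[i])) {
                // Split on the first '=' only, so values may contain '=' themselves.
                String[] kv = args[i + 1].split("=", 2);
                if (kv.length == 2) {
                    context.put(kv[0], kv[1]);
                }
            }
        }
        return context;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("inputDir", "/tmp/in", "batchSize", "100");
        String[] fromParent = {"--context_param", "batchSize=500"};
        System.out.println(buildChildContext(defaults, fromParent));
    }
}
```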
How can you improve the performance of a Talend Job that has a complex design?
To improve the performance of a Talend Job, you can do the following:
- Remove redundant fields/columns using the tFilterColumns component
- Remove unwanted data/records using the tFilterRow component
- Use a SELECT query to retrieve only the required data from the database
- Use database bulk components
- Use Talend ELT components when needed
- Split the Talend Job into smaller subjobs
Explain the tMap component and list the different functions that can be performed with it.
tMap is one of the essential components and forms a core part of the “processing” family. Its main use is to map input data to output data. The main functions that can be performed with tMap include:
- Applying transformation rules to any type of field
- Adding or removing columns
- Rejecting data
- Filtering input and output data using constraints
- Concatenating and interchanging data
- Multiplexing and demultiplexing data
Define the use of ‘Outline View’ in Talend Open Studio.
Outline View in Talend Open Studio lets you keep track of the return values available in a component. It also lists user-defined values configured in a tSetGlobalVar component.
Explain tDenormalizeSortedRow. Also, can we use Binary or ASCII transfer mode when creating an SFTP connection?
tDenormalizeSortedRow belongs to the processing family. It synthesizes a sorted input flow in order to save memory: the sorted input rows are combined into groups, and the distinct values within each group are joined with an item separator. No, transfer modes cannot be used when creating an SFTP connection. SFTP is an extension to SSH and therefore does not support any kind of transfer mode.
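The memory saving of tDenormalizeSortedRow comes from the input already being sorted: only the current group has to be held at any time. The sketch below illustrates that idea in plain Java; it is not the component's implementation, the `Row` type and the `key;values` output format are assumptions, and for brevity it joins every value rather than only distinct ones.

```java
import java.util.ArrayList;
import java.util.List;

class DenormalizeSortedSketch {
    // Hypothetical two-column row: a grouping key and the value to denormalize.
    record Row(String key, String value) {}

    // Input must already be sorted by key, so only the current group is buffered.
    static List<String> denormalize(List<Row> sortedRows, String separator) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        StringBuilder joined = new StringBuilder();
        for (Row row : sortedRows) {
            if (currentKey != null && !currentKey.equals(row.key())) {
                out.add(currentKey + ";" + joined); // key changed: emit finished group
                joined.setLength(0);
            }
            if (joined.length() > 0) {
                joined.append(separator);
            }
            joined.append(row.value());
            currentKey = row.key();
        }
        if (currentKey != null) {
            out.add(currentKey + ";" + joined); // emit the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("A", "1"), new Row("A", "2"), new Row("B", "3"));
        System.out.println(denormalize(rows, ","));
    }
}
```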
Explain error handling in Talend.
Errors can be handled in the following ways:
- Rely on the exception-throwing mechanism; a thrown exception appears as a red stack trace in the Run view.
- Every component and subjob returns a code that can drive additional processing. The OK/Error links can be used to redirect an error to an error handling routine.
- The best and most trusted way to handle an error is to define an error handling subjob that is called whenever an error occurs.
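The idea of a dedicated error handling subjob can be sketched in plain Java as follows. This only illustrates the control flow (run the main subjob, and on failure call the handler and return a non-zero code); the `Subjob` interface and `runWithHandler` method are hypothetical names, not Talend APIs.

```java
import java.util.function.Consumer;

class ErrorHandlingSketch {
    // Hypothetical stand-in for a subjob that may fail.
    interface Subjob {
        void run() throws Exception;
    }

    // Runs the main subjob; on failure, hands the exception to the error handling subjob.
    static int runWithHandler(Subjob main, Consumer<Exception> errorSubjob) {
        try {
            main.run();
            return 0; // success code, comparable to following an "OK" link
        } catch (Exception e) {
            errorSubjob.accept(e); // comparable to following an "Error" link
            return 1; // failure code
        }
    }

    public static void main(String[] args) {
        int code = runWithHandler(
            () -> { throw new IllegalStateException("file not found"); },
            e -> System.err.println("error subjob called: " + e.getMessage()));
        System.out.println("return code: " + code);
    }
}
```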
What Is Talend?
Talend is an open source software integration platform/vendor.
This company provides various integration software and services for big data, cloud storage, data integration, data management, master data management, data quality, data preparation, and enterprise applications.
However, Talend’s first product, Talend Open Studio for Data Integration, is what is most commonly referred to as Talend.
What is a ‘Component’ in Talend?
A component is a functional piece that performs a single operation in Talend. Everything you see on the palette is a graphical representation of a component, and you can use components with a simple drag and drop. Behind the scenes, a component is a snippet of Java code that is generated as part of a Job (which is basically a Java class). This Java code is automatically compiled by Talend when the Job is saved.
What is a scheduler?
A scheduler is software that selects processes from the queue and loads them into memory for execution. Talend Open Studio does not provide a built-in scheduler.
Explain the usage of tContextLoad.
tContextLoad belongs to the ‘Misc’ family of components. This component helps in modifying the values of the active context on the fly. Basically, it is used to load a context from a flow. It sends warnings if the parameters defined in the input are not defined in the context and also if the context is not initialized in the incoming data.
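The behavior described for tContextLoad can be approximated with the sketch below: key/value pairs from an incoming flow overwrite matching context variables, and warnings are collected for mismatches in either direction. It is an illustration only; the class and method names are hypothetical, and skipping keys that are not defined in the context is a choice made for this sketch, not a statement about the component.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ContextLoadSketch {
    // Applies incoming key/value pairs to the active context and returns the warnings raised.
    static List<String> load(Map<String, String> context, Map<String, String> incoming) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            if (context.containsKey(entry.getKey())) {
                context.put(entry.getKey(), entry.getValue()); // overwrite on the fly
            } else {
                warnings.add("not defined in context: " + entry.getKey());
            }
        }
        for (String key : context.keySet()) {
            if (!incoming.containsKey(key)) {
                warnings.add("not set by input: " + key);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> context = new HashMap<>(Map.of("host", "localhost", "port", "5432"));
        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("host", "db.example.com");
        incoming.put("user", "etl");
        System.out.println(load(context, incoming));
        System.out.println(context.get("host"));
    }
}
```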
Describe a Job Design in Talend.
A Job is the basic executable unit of anything built with Talend. Technically, it is a single Java class that defines the working and scope of the information available, designed through a graphical representation. It implements the data flow by translating business needs into code, routines, and programs.
What is the process of joining two input columns in the tMap configuration window?
- Dragging a column from the main input table to a column in another input table
- Right-clicking one column in the input table and selecting “Join”
- Selecting two columns in two distinct input tables, right-clicking, and selecting “Join”
- Selecting two columns in two distinct input tables and dragging them to the output table
In Talend, how do you add a shape to a Business Model?
- Click and place it from the palette
- Drag it from the repository
- Click in the quick access toolbar
- Drag and drop it from the palette
While saving the changes to a tMap configuration, sometimes Talend asks you for confirmation to propagate changes. Why?
- Because your changes affect the output schema and the source component should have a matching schema
- Because your changes affect the output schema and the target component should have a matching schema [Ans]
- Because your changes affect an input schema and the related source component should have a matching schema
- Because your changes have not been saved yet
How can you run multiple Jobs in parallel within Talend?
Because Talend is a Java code generator, Jobs and subjobs can be executed in multiple threads to reduce the overall runtime. There are three ways to achieve parallel execution in Talend Data Integration:
- Multithreading
- tParallelize component
- Automatic parallelization
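Since the generated code is Java, the multithreading option can be pictured with standard Java concurrency. The sketch below runs several independent subjobs on a thread pool and waits for all of them; it uses only `java.util.concurrent` and is not how Talend itself implements any of the options above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSubjobsSketch {
    // Runs every subjob on its own thread and returns the results in submission order.
    static List<String> runAll(List<Callable<String>> subjobs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subjobs.size()));
        try {
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(subjobs)) {
                results.add(future.get()); // blocks until that subjob has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for subjobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a subjob failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> subjobs = List.<Callable<String>>of(
            () -> "subjob A done",
            () -> "subjob B done");
        System.out.println(runAll(subjobs));
    }
}
```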
Differentiate between “insert or update” and “update or insert”.
- insert or update: In this action, first Talend tries to insert a record, but if a record with a matching primary key already exists, then it updates that record.
- update or insert: In this action, Talend first tries to update a record with a matching primary key, but if there is none, then the record is inserted.
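The difference between the two actions is only the order of the attempts, which the sketch below makes visible by recording each attempted operation. It uses an in-memory `Map` as a stand-in for a database table keyed by primary key; the class and method names are hypothetical, and no real database calls are made.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UpsertSketch {
    // "Insert or update": try the insert first, fall back to update on a key clash.
    static List<String> insertOrUpdate(Map<Integer, String> table, int id, String value) {
        List<String> attempts = new ArrayList<>();
        attempts.add("insert");
        if (table.putIfAbsent(id, value) != null) { // key already present: insert fails
            attempts.add("update");
            table.put(id, value);
        }
        return attempts;
    }

    // "Update or insert": try the update first, fall back to insert if no row matched.
    static List<String> updateOrInsert(Map<Integer, String> table, int id, String value) {
        List<String> attempts = new ArrayList<>();
        attempts.add("update");
        if (table.replace(id, value) == null) { // no existing row: update touches nothing
            attempts.add("insert");
            table.put(id, value);
        }
        return attempts;
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        System.out.println(insertOrUpdate(table, 1, "a"));
        System.out.println(insertOrUpdate(table, 1, "b"));
        System.out.println(updateOrInsert(table, 2, "c"));
        System.out.println(updateOrInsert(table, 2, "d"));
    }
}
```

Both actions leave the table in the same final state. As a general expectation rather than a measured result, the first suits loads made up mostly of new records and the second suits loads made up mostly of existing ones, because fewer second attempts are needed.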
Can we use ASCII or Binary Transfer mode in SFTP connection?
No, the transfer modes can’t be used in SFTP connections. SFTP doesn’t support any kind of transfer modes as it is an extension to SSH and assumes an underlying secure channel.