Oracle-Developer.com
| Navigation:
Discussion Forums (Get expert advice) |
Publications |
Scripts |
About Us | Links |
Building a Dynamic Oracle ETL Procedure
By V.J.
Jain, July 2007 (varun.jain@varunjaininc.com)
Varun Jain, Inc. (an Oracle Partner)
|
If you are on Oracle 9i or higher, you have external tables and pipelined table functions available to meet your ETL needs. But in order to utilize these tools in a dynamic ETL environment, you need to design PL/SQL procedures that can support the loading of files dynamically. This article will provide you with one design that has proven to be very robust and scalable.
In order to understand the procedure design, you should have a basic understanding of how external tables and pipelined table functions are used. You can find more information and detailed examples at http://www.oracle-developer.com/oracle_etl.html. The result of both tools is the ability to select data from a file and insert it into a relational table.
Capturing File Information
The ETL procedure should accept the file name or path as an input parameter. If you need to capture any information from the file path into the relational table, then it is a good idea to use the file path as an input. For example if I had sales data that was uploaded from 3 different stores in 3 different incoming directories and I wanted to load all stores into the same table, I would probably want to include a column to identify the store in my table. However, if the file format and file name did not include the store information, I would need to capture the store information from the file path.
STORE A: /incoming/sales_data/storeA/20070720.dat STORE B: /incoming/sales_data/storeB/20070720.dat STORE C: /incoming/sales_data/storeC/20070720.dat
We would pass the entire path into the ETL procedure and the procedure would extract the store using a SQL query in the PL/SQL such as
We could then extract the file name using a query such as
If we want, we can get the file size as well. First create the following function…
And now, inside the PL/SQL procedure, we can capture the file size…
Now that we have the file name, we are ready to dynamically redefine the external table’s data source.
Redefine the External Table
From our original definition, the external table SAMPLE_EXT is pointing to a file called “sample.csv”. In order to load the different files from our incoming directories based on the input parameter, we will need to alter the table appropriately. After we have successfully stored the file name from the input parameter, we are ready to define our external table to point to the current data file.
Now that our external table is pointing at the file from our input parameter, we are ready to begin our insert from the external table to our relational table.
Insert the data
Since the external table is pointing at our new file, all we need to do is create the insert statement. We can pass any additional variables from the statement.
If there were some additional values defined in the destination table, we could include the variables in our SELECT. For example if we added a column, LOAD_DATE to the end of the relational table, SAMPLE_TAB, then we would just add SYSDATE at the end of the select fields as such…
If we need to capture the number of rows inserted, we can do so using (prior to the commit)
Setup for the demonstration
Here are the set up steps for the demonstration
File: july19.csv saved to /usr/tmp ID, BEGIN_DATE, FIRST_NAME, LAST_NAME, STATUS 1, 19-JUL-07, John, Adams, Active 2, 19-JUL-07, Tyler, Howell, Active 3, 19-JUL-07, Jim, Lopez, Active 4, 19-JUL-07, Carlos, White, Inactive 5, 19-JUL-07, Scott, Tiger, Active
File: july20.csv saved to /usr/tmp ID, BEGIN_DATE, FIRST_NAME, LAST_NAME, STATUS 1, 20-JUL-07, John, Adams, Active 2, 20-JUL-07, Tyler, Howell, Active 3, 20-JUL-07, Jim, Lopez, Active 4, 20-JUL-07, Carlos, White, Inactive 5, 20-JUL-07, Scott, Tiger, Active
External and matching relational table definition
Now that the objects are in place, we are ready to define our procedure.
The Final Procedure Including each of the steps above, we can create a PL/SQL procedure that loads any file from our incoming directory (DAT_DIR) based on the input parameter. This gives us a powerful tool for loading files in a large scale, dynamic environment. Based on the information above, the PL/SQL code might be…
SQL> CREATE OR REPLACE FUNCTION SAMPLE_ETL_FNC( P_FILE_PATH VARCHAR2 ) RETURN NUMBER AS 2 l_filename varchar2(100); 3 l_rowcount number; 4 l_sqlerrm varchar(2000); 5 l_filesize number; 6 BEGIN 7 8 DBMS_OUTPUT.PUT_LINE(to_char(sysdate, 'YYYY-MON-DD HH:MI:SS') || ' - BEGIN ETL PROCEDURE - Path : ' || p_file_path); 9 10 --Flexible to allow for file name or file path 11 if instr(p_file_path, '/') > 0 then 12 select substr(p_file_path, instr(p_file_path,'/', -1)+1, 13 length(p_file_path)-instr(p_file_path,'/', -1)) 14 INTO l_filename 15 from dual; 16 else 17 l_filename := p_file_path; 18 end if; 19 20 --Get file size if required 21 --l_filesize := flength ('DAT_DIR',l_filename); 22 DBMS_OUTPUT.PUT_LINE('Filename: ' || l_filename); 23 24 BEGIN 25 --Alter the Log and Bad file names for debugging 26 --Can modify the bad and log file names if required 27 DBMS_OUTPUT.PUT_LINE('ALTER EXTERNAL TABLE DATA SOURCE'); 28 execute immediate 'alter table SAMPLE_EXT location('''||l_filename||''')'; 29 --Prepare Insert SQL 30 DBMS_OUTPUT.PUT_LINE('BEGIN INSERT FROM EXTERNAL TO DATA TABLE'); 31 execute immediate 'Insert into sample_tab select * from sample_ext A'; 32 l_rowcount := SQL%rowcount; 33 DBMS_OUTPUT.PUT_LINE(to_char(sysdate,'DD-MON-YYYY HH:MI:SS AM') || ': INSERT SUCCESSFUL... IN SERTED ' || l_rowcount || ' ROWS'); 34 COMMIT; 35 36 RETURN 0; 37 exception 38 when others then 39 DBMS_OUTPUT.PUT_LINE('INSERT FAILED'); 40 l_sqlerrm:=sqlerrm; 41 DBMS_OUTPUT.PUT_LINE(l_sqlerrm); 42 ROLLBACK; 43 commit; 44 RETURN 1; 45 END; 46 47 exception 48 when others then 49 DBMS_OUTPUT.PUT_LINE(SQLERRM); 50 RETURN 1; 51 END; 52 /
Function created.
Test the Oracle ETL function by placing the sample files described above in the DAT_DIR folder.
SQL> select * from sample_tab 2 /
no rows selected
SQL> set serveroutput on SQL> declare 2 l_return number; 3 begin 4 l_return := SAMPLE_ETL_FNC('july19.csv'); 5 dbms_output.put_line('RETURN STATUS: ' || l_return); 6 end; 7 / 2007-JUL-24 04:49:11 - BEGIN ETL PROCEDURE - Path: july19.csv Filename: july19.csv ALTER EXTERNAL TABLE DATA SOURCE BEGIN INSERT FROM EXTERNAL TO DATA TABLE 24-JUL-2007 04:49:11 PM: INSERT SUCCESSFUL... INSERTED 5 ROWS RETURN STATUS: 0
PL/SQL procedure successfully completed.
SQL> select * from sample_tab 2 /
ID BEGIN_DATE FIRST_NAME LAST_NAME STATUS ---------- ---------- ------------------------------ ------------------------------ -------- 1 19-JUL-07 John Adams Active 2 19-JUL-07 Tyler Howell Active 3 19-JUL-07 Jim Lopez Active 4 19-JUL-07 Carlos White Inactive 5 19-JUL-07 Scott Tiger Active
5 rows selected.
SQL> declare 2 l_return number; 3 begin 4 l_return := SAMPLE_ETL_FNC('july20.csv'); 5 dbms_output.put_line('RETURN STATUS: ' || l_return); 6 end; 7 / 2007-JUL-24 04:50:16 - BEGIN ETL PROCEDURE - Path: july20.csv Filename: july20.csv ALTER EXTERNAL TABLE DATA SOURCE BEGIN INSERT FROM EXTERNAL TO DATA TABLE 24-JUL-2007 04:50:16 PM: INSERT SUCCESSFUL... INSERTED 5 ROWS RETURN STATUS: 0
PL/SQL procedure successfully completed.
SQL> select * from sample_tab 2 /
ID BEGIN_DATE FIRST_NAME LAST_NAME STATUS ---------- ---------- ------------------------------ ------------------------------ -------- 1 19-JUL-07 John Adams Active 2 19-JUL-07 Tyler Howell Active 3 19-JUL-07 Jim Lopez Active 4 19-JUL-07 Carlos White Inactive 5 19-JUL-07 Scott Tiger Active 1 20-JUL-07 John Adams Active 2 20-JUL-07 Tyler Howell Active 3 20-JUL-07 Jim Lopez Active 4 20-JUL-07 Carlos White Inactive 5 20-JUL-07 Scott Tiger Active
10 rows selected.
|
Sponsored by Varun Jain, Inc. Oracle Applications and Database Consulting
Copyright ©2007 Oracle-Developer.com