Oracle-Developer.com

 

Navigation: Discussion Forums (Get expert advice)  |  Publications  |  Scripts  |  About Us  | Links
 

 

 

Shared on OraFaq.com

 

Building a Dynamic Oracle ETL Procedure

By V.J. Jain, July 2007 (varun.jain@varunjaininc.com)
Varun Jain, Inc. (an Oracle Partner)

 

 

If you are on Oracle 9i or higher, you have external tables and pipelined table functions available to meet your ETL needs.  But in order to utilize these tools in a dynamic ETL environment, you need to design PL/SQL procedures that can support the loading of files dynamically.  This article will provide you with one design that has proven to be very robust and scalable.

 

In order to understand the procedure design, you should have a basic understanding of how external tables and pipelined table functions are used.  You can find more information and detailed examples at http://www.oracle-developer.com/oracle_etl.html.  The result of both tools is the ability to select data from a file and insert it into a relational table. 

 

 

Capturing File Information

 

The ETL procedure should accept the file name or path as an input parameter.  If you need to capture any information from the file path into the relational table, then it is a good idea to use the file path as an input.  For example if I had sales data that was uploaded from 3 different stores in 3 different incoming directories and I wanted to load all stores into the same table, I would probably want to include a column to identify the store in my table.  However, if the file format and file name did not include the store information, I would need to capture the store information from the file path.

 

STORE A: /incoming/sales_data/storeA/20070720.dat

STORE B: /incoming/sales_data/storeB/20070720.dat

STORE C: /incoming/sales_data/storeC/20070720.dat

 

We would pass the entire path into the ETL procedure and the procedure would extract the store using a SQL query in the PL/SQL such as

 

 

SQL> with s as (select '/incoming/sales_data/storeA/20070720.dat' p_file_path from dual)

  2  select substr(p_file_path, instr(p_file_path, '/', 1,3)+1,

  3  instr(p_file_path, '/', 1,4)-instr(p_file_path, '/', 1,3)-1) STORE_NAME from s

  4  /

 

STORE_

------

storeA

 

 

We could then extract the file name using a query such as

 

 

SQL> with s as (select '/incoming/sales_data/storeA/20070720.dat' p_file_path from dual)

  2  select substr(p_file_path, instr(p_file_path,'/', -1)+1,

  3  length(p_file_path)-instr(p_file_path,'/', -1)) FILE_NAME from s

  4  /

 

FILE_NAME

------------

20070720.dat

 

 

 

If we want, we can get the file size as well.  First create the following function…

 

 

CREATE OR REPLACE FUNCTION flength (

   location_in   IN   VARCHAR2,

   file_in       IN   VARCHAR2

)

   RETURN PLS_INTEGER

 

IS

   TYPE fgetattr_t  IS RECORD (

      fexists       BOOLEAN,

      file_length   PLS_INTEGER,

      block_size    PLS_INTEGER

   );

 

 

   fgetattr_rec   fgetattr_t;

BEGIN

   UTL_FILE.fgetattr (

      location         => location_in,

      filename         => file_in,

      fexists          => fgetattr_rec.fexists,

 

      file_length      => fgetattr_rec.file_length,

      block_size       => fgetattr_rec.block_size

   );

   RETURN fgetattr_rec.file_length;

END flength;

 

 

And now, inside the PL/SQL procedure, we can capture the file size…

 

 

declare

l_file_size number;

begin

l_file_size := flength ('DAT_DIR’,l_filename);

end;

 

 

Now that we have the file name, we are ready to dynamically redefine the external table’s data source.

 

 

Redefine the External Table

 

From our original definition, the external table SAMPLE_EXT is pointing to a file called “sample.csv”.  In order to load the different files from our incoming directories based on the input parameter, we will need to alter the table appropriately.  After we have successfully stored the file name from the input parameter, we are ready to define our external table to point to the current data file.

 

 

execute immediate 'alter table SAMPLE_EXT location(''' || p_file_name || ''')';

 

 

Now that our external table is pointing at the file from our input parameter, we are ready to begin our insert from the external table to our relational table. 

 

 

Insert the data

 

Since the external table is pointing at our new file, all we need to do is create the insert statement.  We can pass any additional variables from the statement. 

 

 

Insert into sample_tab

Select * from sample_ext

/

 

 

If there were some additional values defined in the destination table, we could include the variables in our SELECT.  For example if we added a column, LOAD_DATE to the end of the relational table, SAMPLE_TAB, then we would just add SYSDATE at the end of the select fields as such…

 

 

Insert into sample_tab

Select A.*, sysdate from sample_ext A

/

 

 

 

If we need to capture the number of rows inserted, we can do so using (prior to the commit)

 

 

ln_rowcount := SQL%rowcount;

 

 

 

Setup for the demonstration

 

Here are the set up steps for the demonstration

 


Create directory DAT_DIR as ‘/usr/tmp’;

 

 

File: july19.csv saved to /usr/tmp

ID, BEGIN_DATE, FIRST_NAME, LAST_NAME, STATUS

1, 19-JUL-07, John, Adams, Active

2, 19-JUL-07, Tyler, Howell, Active

3, 19-JUL-07, Jim, Lopez, Active

4, 19-JUL-07, Carlos, White, Inactive

5, 19-JUL-07, Scott, Tiger, Active

 

File: july20.csv saved to /usr/tmp

ID, BEGIN_DATE, FIRST_NAME, LAST_NAME, STATUS

1, 20-JUL-07, John, Adams, Active

2, 20-JUL-07, Tyler, Howell, Active

3, 20-JUL-07, Jim, Lopez, Active

4, 20-JUL-07, Carlos, White, Inactive

5, 20-JUL-07, Scott, Tiger, Active

 

 

External and matching relational table definition

 

 

CREATE TABLE SAMPLE_EXT

(

ID NUMBER,

BEGIN_DATE VARCHAR2(10),  

FIRST_NAME VARCHAR2(30),

LAST_NAME VARCHAR2(30),

STATUS VARCHAR2(10)

)                      

 ORGANIZATION EXTERNAL

 (

   TYPE ORACLE_LOADER

   DEFAULT DIRECTORY DAT_DIR

   ACCESS PARAMETERS

   (

            records delimited by newline LOAD WHEN (ID != ‘ID’)

            badfile DAT_DIR:'SAMPLE_EXT%a_%p.bad'

            logfile DAT_DIR:'SAMPLE_EXT%a_%p.log'

fields terminated by ',' optionally enclosed by '"' LRTRIM

            MISSING FIELD VALUES ARE NULL

   )

   LOCATION (‘sample.csv')

 )

 PARALLEL 4

 REJECT LIMIT 1000;

 

CREATE TABLE SAMPLE_TAB

(

ID NUMBER,

BEGIN_DATE VARCHAR2(10),  

FIRST_NAME VARCHAR2(30),

LAST_NAME VARCHAR2(30),

STATUS VARCHAR2(10)

);

 

 

Now that the objects are in place, we are ready to define our procedure. 

 

The Final Procedure

Including each of the steps above, we can create a PL/SQL procedure that loads any file from our incoming directory (DAT_DIR) based on the input parameter.  This gives us a powerful tool for loading files in a large scale, dynamic environment.  Based on the information above, the PL/SQL code might be…

 

SQL> CREATE OR REPLACE FUNCTION SAMPLE_ETL_FNC( P_FILE_PATH VARCHAR2 ) RETURN NUMBER AS

  2  l_filename varchar2(100);

  3  l_rowcount number;

  4  l_sqlerrm varchar(2000);

  5  l_filesize number;

  6  BEGIN

  7 

  8  DBMS_OUTPUT.PUT_LINE(to_char(sysdate, 'YYYY-MON-DD HH:MI:SS') || ' - BEGIN ETL PROCEDURE - Path

: ' || p_file_path);

  9 

 10  --Flexible to allow for file name or file path

 11  if instr(p_file_path, '/') > 0 then

 12   select substr(p_file_path, instr(p_file_path,'/', -1)+1,

 13   length(p_file_path)-instr(p_file_path,'/', -1))

 14   INTO l_filename

 15   from dual;

 16  else

 17   l_filename := p_file_path;

 18  end if;

 19 

 20  --Get file size if required

 21  --l_filesize := flength ('DAT_DIR',l_filename);

 22  DBMS_OUTPUT.PUT_LINE('Filename: ' || l_filename);

 23 

 24  BEGIN

 25   --Alter the Log and Bad file names for debugging

 26   --Can modify the bad and log file names if required

 27   DBMS_OUTPUT.PUT_LINE('ALTER EXTERNAL TABLE DATA SOURCE');

 28    execute immediate 'alter table SAMPLE_EXT location('''||l_filename||''')';

 29   --Prepare Insert SQL

 30   DBMS_OUTPUT.PUT_LINE('BEGIN INSERT FROM EXTERNAL TO DATA TABLE');

 31   execute immediate 'Insert into sample_tab select * from sample_ext A';

 32    l_rowcount := SQL%rowcount;

 33    DBMS_OUTPUT.PUT_LINE(to_char(sysdate,'DD-MON-YYYY HH:MI:SS AM') || ': INSERT SUCCESSFUL... IN

SERTED ' || l_rowcount || ' ROWS');

 34    COMMIT;

 35   

 36    RETURN 0;

 37   exception     

 38    when others then

 39       DBMS_OUTPUT.PUT_LINE('INSERT FAILED');

 40       l_sqlerrm:=sqlerrm;

 41       DBMS_OUTPUT.PUT_LINE(l_sqlerrm);

 42      ROLLBACK;

 43     commit;

 44      RETURN 1;

 45   END;

 46  

 47  exception      

 48    when others then

 49     DBMS_OUTPUT.PUT_LINE(SQLERRM);

 50      RETURN 1;

 51   END;

 52  /

 

Function created.

 

Test the Oracle ETL function by placing the sample files described above in the DAT_DIR folder. 

 

SQL> select * from sample_tab

  2  /

 

no rows selected

 

SQL> set serveroutput on

SQL> declare

  2  l_return number;

  3  begin

  4  l_return := SAMPLE_ETL_FNC('july19.csv');

  5  dbms_output.put_line('RETURN STATUS: ' || l_return);

  6  end;

  7  /

2007-JUL-24 04:49:11 - BEGIN ETL PROCEDURE - Path: july19.csv

Filename: july19.csv

ALTER EXTERNAL TABLE DATA SOURCE

BEGIN INSERT FROM EXTERNAL TO DATA TABLE

24-JUL-2007 04:49:11 PM: INSERT SUCCESSFUL... INSERTED 5 ROWS

RETURN STATUS: 0

 

PL/SQL procedure successfully completed.

 

 

SQL> select * from sample_tab

  2  /

 

        ID BEGIN_DATE FIRST_NAME                     LAST_NAME                      STATUS

---------- ---------- ------------------------------ ------------------------------ --------

         1 19-JUL-07  John                           Adams                          Active

         2 19-JUL-07  Tyler                          Howell                         Active

         3 19-JUL-07  Jim                            Lopez                          Active

         4 19-JUL-07  Carlos                         White                          Inactive

         5 19-JUL-07  Scott                          Tiger                          Active

 

5 rows selected.

 

SQL> declare

  2  l_return number;

  3  begin

  4  l_return := SAMPLE_ETL_FNC('july20.csv');

  5  dbms_output.put_line('RETURN STATUS: ' || l_return);

  6  end;

  7  /

2007-JUL-24 04:50:16 - BEGIN ETL PROCEDURE - Path: july20.csv

Filename: july20.csv

ALTER EXTERNAL TABLE DATA SOURCE

BEGIN INSERT FROM EXTERNAL TO DATA TABLE

24-JUL-2007 04:50:16 PM: INSERT SUCCESSFUL... INSERTED 5 ROWS

RETURN STATUS: 0

 

PL/SQL procedure successfully completed.

 

SQL> select * from sample_tab

  2  /

 

        ID BEGIN_DATE FIRST_NAME                     LAST_NAME                      STATUS

---------- ---------- ------------------------------ ------------------------------ --------

         1 19-JUL-07  John                           Adams                          Active

         2 19-JUL-07  Tyler                          Howell                         Active

         3 19-JUL-07  Jim                            Lopez                          Active

         4 19-JUL-07  Carlos                         White                          Inactive

         5 19-JUL-07  Scott                          Tiger                          Active

         1 20-JUL-07  John                           Adams                          Active

         2 20-JUL-07  Tyler                          Howell                         Active

         3 20-JUL-07  Jim                            Lopez                          Active

         4 20-JUL-07  Carlos                         White                          Inactive

         5 20-JUL-07  Scott                          Tiger                          Active

 

10 rows selected.

 

 

Interested in implementing an Oracle ETL solution? Contact Varun Jain, Inc. for Oracle ETL Consulting Services.


 

 

Sponsored by Varun Jain, Inc. Oracle Applications and Database Consulting

Copyright ©2007 Oracle-Developer.com