View Revisions: Issue #6616
Summary | 0006616: Save column references as numeric values for faster look-up in AbstractDatabaseWriter.getRowData() | ||
---|---|---|---|
Revision | 2024-10-11 14:27 by elong | ||
Description | Background: AbstractDatabaseWriter.getRowData() is used to prepare an array of string values for import into the target table. It does so by looking up source columns by name and copying values from the source array to the target array. Currently, getRowData() searches for the source columns (to copy data from) for every single data row. This adds up to about 1 second of overhead per 500K rows. The problem was originally discovered while working on bulk load, but it affects all load batches. The proposed solution is to save column references as numeric values (source column 1 ==> target column 1) and reuse these numeric references to copy values from source to target, skipping the additional column name look-ups for every row of data after the first. |
||
Revision | 2024-10-11 14:26 by elong | ||
Description | Background: AbstractDatabaseWriter.getRowData() is used to prepare an array of string values for import into the target table. It does so by looking up source columns by name and copying values from the source array to the target array. Currently, getRowData() searches for the source columns (to copy data from) for every single data row. This adds up to about 1 second of overhead per 500K rows. The problem was originally discovered while working on bulk load, but it affects all load batches. The proposed solution is to save column references as numeric values (source column #0000001 ==> target column #0000001) and reuse these numeric references to copy values from source to target, skipping the additional column name look-ups for every row of data after the first. |
||
Revision | 2024-10-11 12:54 by pbelov | ||
Description | Background: AbstractDatabaseWriter.getRowData() is used to prepare an array of string values for import into the target table. It does so by looking up source columns by name and copying values from the source array to the target array. Currently, getRowData() searches for the source columns (to copy data from) for every single data row. This adds up to about 1 second of overhead per 500K rows. The problem was originally discovered while working on bulk load, but it affects all load batches. The proposed solution is to save column references as numeric values (source column 0000001 ==> target column 0000001) and reuse these numeric references to copy values from source to target, skipping the additional column name look-ups for every row of data after the first. |
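
The proposed approach could look roughly like the sketch below. It is not the actual AbstractDatabaseWriter code: the class `ColumnIndexCache`, its fields, and the exact-match name comparison are hypothetical, chosen only to illustrate the idea. The column names are resolved to numeric positions once, on the first row, and every later row is copied by index alone.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch only (hypothetical class, not the real writer):
 * resolves target-to-source column positions once and reuses them per row.
 */
final class ColumnIndexCache {
    private final String[] sourceColumns;
    private final String[] targetColumns;

    // targetToSource[t] holds the source index for target column t,
    // or -1 when the target column has no matching source column.
    private int[] targetToSource;

    ColumnIndexCache(String[] sourceColumns, String[] targetColumns) {
        this.sourceColumns = sourceColumns;
        this.targetColumns = targetColumns;
    }

    /** Performs the name look-ups; intended to run only once per batch. */
    private int[] buildMapping() {
        Map<String, Integer> sourceIndexByName = new HashMap<>();
        for (int s = 0; s < sourceColumns.length; s++) {
            sourceIndexByName.put(sourceColumns[s], s);
        }
        int[] mapping = new int[targetColumns.length];
        for (int t = 0; t < targetColumns.length; t++) {
            // Assumption: names match exactly (case-sensitive).
            Integer s = sourceIndexByName.get(targetColumns[t]);
            mapping[t] = (s == null) ? -1 : s;
        }
        return mapping;
    }

    /** Copies source values into target order using the cached indexes. */
    String[] getRowData(String[] sourceValues) {
        if (targetToSource == null) {
            // First row: pay the look-up cost once.
            targetToSource = buildMapping();
        }
        String[] targetValues = new String[targetToSource.length];
        for (int t = 0; t < targetToSource.length; t++) {
            int s = targetToSource[t];
            if (s >= 0 && s < sourceValues.length) {
                targetValues[t] = sourceValues[s];
            }
            // Unmatched target columns are left as null.
        }
        return targetValues;
    }
}
```

With source columns `id, name, city` and target columns `name, id, missing`, a source row `1, Ann, Oslo` would be copied as `Ann, 1, null`, and only the first call performs any look-ups by name. A real implementation would also need to discard the cached mapping whenever the source or target table changes between batches.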