
ID: 0006616
Project: SymmetricDS
Category: Improvement
View Status: public
Last Update: 2024-10-30 22:00
Reporter: pbelov
Assigned To: pbelov
Priority: normal
Status: resolved
Resolution: fixed
Product Version: 3.15.0
Target Version: 3.16.0
Fixed in Version: 3.16.0
Summary: 0006616: Save column references as numeric values for faster look-up in AbstractDatabaseWriter.getRowData()
Description: Background: AbstractDatabaseWriter.getRowData() prepares an array of string values for import into the target table. It does so by looking up source columns by name and copying their values from the source array to the target array.
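To make the per-row cost concrete, here is a minimal Java sketch of a name-based copy like the one described above. The class `NameLookupRowCopier` and its signature are hypothetical, not the actual `AbstractDatabaseWriter` code; the sketch assumes exact, case-sensitive name matching and leaves target columns without a matching source column as `null`.

```java
/**
 * Hypothetical sketch (not SymmetricDS code): copies one row by looking up
 * every target column name among the source column names on each call.
 */
final class NameLookupRowCopier {

    static String[] getRowData(String[] sourceColumns, String[] sourceValues, String[] targetColumns) {
        String[] targetValues = new String[targetColumns.length];
        for (int t = 0; t < targetColumns.length; t++) {
            // Linear scan over the source names; this whole scan is repeated for every row.
            for (int s = 0; s < sourceColumns.length; s++) {
                // Assumption: exact, case-sensitive match; the real writer may differ.
                if (sourceColumns[s].equals(targetColumns[t])) {
                    targetValues[t] = sourceValues[s];
                    break; // first matching source column wins
                }
            }
            // No match: targetValues[t] stays null.
        }
        return targetValues;
    }
}
```

Because the inner loop runs again on every call, each row pays for up to S × T name comparisons, even though the column names are identical for every row of the batch.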

Currently, getRowData() searches for the source columns to copy data from on every single data row.
This adds about 1 second of overhead per 500K rows. The issue was originally discovered while working on bulk load, but it affects all load batches.
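Assuming the reported overhead is spread evenly across rows, the per-row cost works out to:

```latex
\frac{1\ \text{s}}{500\,000\ \text{rows}} = 2 \times 10^{-6}\ \text{s/row} = 2\ \mu\text{s per row}
```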

Proposed solution: save column references as numeric values (source column 1 ==> target column 1) and reuse these numeric references to copy values from source to target, skipping the column name look-ups for every row of data after the first.
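A minimal Java sketch of the proposed idea, under assumed names: `CachedColumnMapping` is hypothetical and is not the `TableColumnSourceReferences` class added by the fix, whose API is not shown in this issue. Each target column is resolved to a numeric source index once, and every row is then copied by index with no name comparisons. The one-time resolution here uses a `HashMap`, which is a choice of this sketch; a nested scan would also work, since it runs only once per mapping.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: resolve target columns to source positions once,
 * then reuse those numeric references for every row.
 */
final class CachedColumnMapping {

    /** sourceIndex[t] is the source position that feeds target column t, or -1 if none. */
    private final int[] sourceIndex;

    CachedColumnMapping(String[] sourceColumns, String[] targetColumns) {
        // One-time name resolution: source column name -> first position with that name.
        // Assumption: exact, case-sensitive names.
        Map<String, Integer> positions = new HashMap<>();
        for (int s = 0; s < sourceColumns.length; s++) {
            positions.putIfAbsent(sourceColumns[s], s);
        }
        sourceIndex = new int[targetColumns.length];
        for (int t = 0; t < targetColumns.length; t++) {
            sourceIndex[t] = positions.getOrDefault(targetColumns[t], -1);
        }
    }

    /** Per-row copy: one array read per target column, no name look-ups. */
    String[] getRowData(String[] sourceValues) {
        String[] targetValues = new String[sourceIndex.length];
        for (int t = 0; t < sourceIndex.length; t++) {
            int s = sourceIndex[t];
            if (s >= 0) {
                targetValues[t] = sourceValues[s];
            }
        }
        return targetValues;
    }
}
```

Usage: build one mapping per source/target table pair, e.g. `new CachedColumnMapping(sourceColumns, targetColumns)`, then call `getRowData(values)` for each row in the batch; only the constructor touches column names.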
Steps To Reproduce: Unit test AbstractDatabaseWriterTest illustrates this issue.
Specifically, testGetRowData_LotsOfRandomAndFewSkippedColumns() can target the current and the new implementation to capture run times.

Current way:
    rowData = abstractDatabaseWriter.getRowDataOld(csvData, CsvData.ROW_DATA);

New way:
    rowData = abstractDatabaseWriter.getRowDataNew(csvData, CsvData.ROW_DATA);
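For a rough standalone comparison outside the unit test, the following hypothetical harness (`RowDataTiming` is not part of SymmetricDS or AbstractDatabaseWriterTest) times per-row name look-ups against indexes resolved once, on synthetic column names. Single-shot `System.nanoTime()` measurements are sensitive to JIT warm-up and garbage collection, so treat the numbers as indicative only; a benchmark harness such as JMH gives more reliable results.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical timing sketch on synthetic data; results are indicative, not a benchmark. */
final class RowDataTiming {

    /** Per-row name look-up (linear scan per target column). */
    static String[] byName(String[] src, String[] vals, String[] tgt) {
        String[] out = new String[tgt.length];
        for (int t = 0; t < tgt.length; t++) {
            for (int s = 0; s < src.length; s++) {
                if (src[s].equals(tgt[t])) { out[t] = vals[s]; break; }
            }
        }
        return out;
    }

    /** One-time resolution of target columns to source positions (-1 = no match). */
    static int[] resolve(String[] src, String[] tgt) {
        Map<String, Integer> pos = new HashMap<>();
        for (int s = 0; s < src.length; s++) pos.putIfAbsent(src[s], s);
        int[] idx = new int[tgt.length];
        for (int t = 0; t < tgt.length; t++) idx[t] = pos.getOrDefault(tgt[t], -1);
        return idx;
    }

    /** Per-row copy using the resolved indexes. */
    static String[] byIndex(int[] idx, String[] vals) {
        String[] out = new String[idx.length];
        for (int t = 0; t < idx.length; t++) if (idx[t] >= 0) out[t] = vals[idx[t]];
        return out;
    }

    /** Returns {nanosByName, nanosByIndex}; throws if the two strategies disagree. */
    static long[] compare(int columns, int rows) {
        String[] src = new String[columns];
        String[] tgt = new String[columns];
        String[] vals = new String[columns];
        for (int c = 0; c < columns; c++) {
            src[c] = "col" + c;
            tgt[c] = "col" + (columns - 1 - c); // reversed order so look-ups really scan
            vals[c] = "v" + c;
        }
        long t0 = System.nanoTime();
        String[] lastByName = null;
        for (int r = 0; r < rows; r++) lastByName = byName(src, vals, tgt);
        long t1 = System.nanoTime();
        int[] idx = resolve(src, tgt); // resolved once, outside the row loop
        String[] lastByIndex = null;
        for (int r = 0; r < rows; r++) lastByIndex = byIndex(idx, vals);
        long t2 = System.nanoTime();
        if (!Arrays.equals(lastByName, lastByIndex)) throw new IllegalStateException("mismatch");
        return new long[] {t1 - t0, t2 - t1};
    }
}
```

For example, `RowDataTiming.compare(50, 500_000)` returns the two elapsed times in nanoseconds for a 50-column, 500K-row synthetic batch; the resolution step is deliberately counted in the second measurement.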
Additional Information: Given:
S = number of columns in the source table,
T = number of columns in the target table,
N = number of rows in the data load batch.

Current algorithm cost: O(S * T * N)
Proposed algorithm cost: O(S * T) + O(N); for large N, the cost grows linearly in N.
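One way to see where these bounds come from is to count worst-case name comparisons and value copies, assuming each target column is matched by a linear scan over the source names. The proposed per-row term is T · N, which is O(N) for a fixed table width. The values S = T = 50 and N = 500,000 in the second part are hypothetical, chosen only to illustrate the arithmetic; they are operation counts, not measurements.

```latex
\begin{aligned}
C_{\text{current}}(N) &\le \underbrace{S \cdot T}_{\text{comparisons per row}} \cdot N \\
C_{\text{proposed}}(N) &\le \underbrace{S \cdot T}_{\text{one-time resolution}} + \underbrace{T \cdot N}_{\text{index copies}} \\[4pt]
S = T = 50,\ N = 500\,000:\quad C_{\text{current}} &\le 50 \cdot 50 \cdot 500\,000 = 1.25 \times 10^{9} \\
C_{\text{proposed}} &\le 50 \cdot 50 + 50 \cdot 500\,000 = 25\,002\,500
\end{aligned}
```

In this illustration the ratio is roughly 50, i.e. about S: the repeated scan over the source names is what the cached numeric references remove.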
Tags: initial/partial load, performance

Activities

pbelov

2024-10-11 16:06

manager ~0002500

Last edited: 2024-10-29 17:33

Branch: enhancement/6608-rowdata-lookup-columns-faster316

Related Changesets

SymmetricDS: 3.16 649524e5

2024-10-30 21:58:42

pbelov


Committer: GitHub
6616: Save column references as numeric values for faster look-ups in AbstractDatabaseWriter.getRowData (#205)

* New TableColumnSourceReferences class and unit test to store column
lookups in AbstractDatabaseWriter.getRowData()
* AbstractDatabaseWriterTest unit test.
Affected Issues
0006616
add - symmetric-db/src/main/java/org/jumpmind/db/model/TableColumnSourceReferences.java
mod - symmetric-io/src/main/java/org/jumpmind/symmetric/io/data/writer/AbstractDatabaseWriter.java
add - symmetric-io/src/test/java/org/jumpmind/symmetric/io/data/writer/AbstractDatabaseWriterTest.java

Issue History

Date Modified Username Field Change
2024-10-11 12:54 pbelov New Issue
2024-10-11 12:54 pbelov Status new => assigned
2024-10-11 12:54 pbelov Assigned To => pbelov
2024-10-11 12:54 pbelov Tag Attached: initial/partial load
2024-10-11 12:54 pbelov Tag Attached: performance
2024-10-11 14:26 elong Description Updated
2024-10-11 14:27 elong Description Updated
2024-10-11 16:06 pbelov Additional Information Updated
2024-10-11 16:06 pbelov Note Added: 0002500
2024-10-29 17:31 pbelov Status assigned => resolved
2024-10-29 17:31 pbelov Fixed in Version => 3.16.0
2024-10-29 17:32 pbelov Tag Attached: load only
2024-10-29 17:32 pbelov Tag Detached: load only
2024-10-29 17:33 pbelov Note Edited: 0002500
2024-10-30 22:00 pbelov Changeset attached => SymmetricDS 3.16 649524e5