
From time to time, two data sets of similar shape need to be compared for differences.

The Komparator library was written for just such a scenario.

For example, the code below compares two local CSV files that have id, name and age columns.


val source = CsvConnection(Supplier.uri("/source-persons.csv"), Connection.Companion::materializeDefault)
val target = CsvConnection(Supplier.uri("/target-persons.csv"), Connection.Companion::materializeDefault)

Comparison(source, target, this::mapping).compare(this::handleDifferences)

Each connection takes a file URI and a function that materializes the data set from the underlying data context. Here we use the default function, which selects all the data from the respective CSV file.
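
To make the idea of a materializing function more concrete, here is a minimal, self-contained sketch. It does not use Komparator's API: Row, Materializer, SketchConnection, selectAll and adultsOnly are hypothetical names, and the sample rows are made up. It only illustrates how a connection can pair a data source with the function that decides which rows to load.

```kotlin
// Hypothetical stand-ins; these are NOT Komparator or MetaModel types.
typealias Row = Map<String, String>
typealias Materializer = (List<Row>) -> List<Row>

// Mirrors the default behaviour described above: select every row unchanged.
val selectAll: Materializer = { rows -> rows }

// A custom materializer (assumption): keep only rows whose age is at least 18.
val adultsOnly: Materializer = { rows ->
    rows.filter { row -> (row["age"]?.toIntOrNull() ?: 0) >= 18 }
}

// A connection pairs a data source with the function that materializes it.
class SketchConnection(private val rows: List<Row>, private val materialize: Materializer) {
    fun load(): List<Row> = materialize(rows)
}

fun main() {
    // Made-up sample data with the id, name and age columns from the example.
    val rows = listOf(
        mapOf("id" to "1", "name" to "Ann", "age" to "34"),
        mapOf("id" to "2", "name" to "Bob", "age" to "12")
    )
    println(SketchConnection(rows, selectAll).load().size)   // prints 2
    println(SketchConnection(rows, adultsOnly).load().size)  // prints 1
}
```

Passing the function in, rather than hard-coding the selection, is what lets the same connection type serve both the "select everything" case and a narrower query.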

A Comparison takes the two connections and a mapping function that determines how the source columns are mapped to the target columns. Mapping the source columns to the target columns by name can be done with the following sample functions:

private fun mapping(source: DataContext, target: DataContext): Mapping {
    // Take the columns of the first table in each default schema.
    val sourceColumns = source.defaultSchema.tables[0].columns
    val targetColumns = target.defaultSchema.tables[0].columns

    // Pair every source column with the target columns of the same name.
    val items = sourceColumns.flatMap { sourceColumn -> mapColumn(sourceColumn, targetColumns) }

    return Mapping(items)
}

private fun mapColumn(column: org.apache.metamodel.schema.Column, columns: List<org.apache.metamodel.schema.Column>): List<ColumnMapping> {
    return columns
        .filter { otherColumn -> otherColumn.name == column.name }
        // The third argument is true only for the "id" column.
        .map { otherColumn -> ColumnMapping(column, otherColumn, column.name == "id") }
}
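
The name-based matching can be tried without Komparator or MetaModel on the classpath. The sketch below uses simplified, hypothetical types (SketchColumn, SketchColumnMapping, mapByName) in place of the real Column and ColumnMapping classes, and it assumes that the boolean passed to ColumnMapping marks the key column, which is how the sample above appears to use it.

```kotlin
// Hypothetical simplified types; the sample above uses MetaModel's Column
// and Komparator's ColumnMapping instead.
data class SketchColumn(val name: String)
data class SketchColumnMapping(
    val source: SketchColumn,
    val target: SketchColumn,
    val isKey: Boolean // assumption: flags the column used to match rows
)

// Pair each source column with every target column of the same name,
// flagging the "id" column as the key.
fun mapByName(source: List<SketchColumn>, target: List<SketchColumn>): List<SketchColumnMapping> =
    source.flatMap { s ->
        target
            .filter { t -> t.name == s.name }
            .map { t -> SketchColumnMapping(s, t, s.name == "id") }
    }

fun main() {
    val source = listOf(SketchColumn("id"), SketchColumn("name"), SketchColumn("age"))
    val target = listOf(SketchColumn("id"), SketchColumn("name"), SketchColumn("email"))
    // Only id and name exist on both sides, so two mappings are produced.
    mapByName(source, target).forEach(::println)
}
```

Columns that exist on only one side (age and email here) produce no mapping at all, so under this scheme they are silently left out of the comparison.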

Finally, the differences found by the comparison need to be handled. A handler function is supplied to the compare method for this, and the following sample function writes the results to the console.

private fun handleDifferences(differences: List<Difference>) {
    System.out.format("%15s %15s %15s %15s", "---------------", "---------------", "---------------", "---------------")
    System.out.println()
    differences.forEach { diff ->
        System.out.format(
            "%15s %15s %15s %15s",
            diff.first.name,
            diff.first.value.orElse("null"),
            diff.second.value.orElse("null"),
            diff.type)
        System.out.println()
    }
}
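
To see what the report layout looks like without running a real comparison, the handler can be sketched against a stand-in type. SketchDifference, its field names and the "VALUE_MISMATCH" label are hypothetical and are not taken from Komparator; the sample values are made up. The function returns the lines instead of printing them directly, which makes the formatting easy to check.

```kotlin
import java.util.Optional

// Hypothetical stand-in for Komparator's Difference; the field names and
// the type label used in main() are assumptions for illustration only.
data class SketchDifference(
    val column: String,
    val sourceValue: Optional<String>,
    val targetValue: Optional<String>,
    val type: String
)

// Four right-aligned columns, each 15 characters wide.
private const val ROW_FORMAT = "%15s %15s %15s %15s"

// Build the report lines: one rule line, then one line per difference.
fun formatDifferences(differences: List<SketchDifference>): List<String> {
    val rule = "-".repeat(15)
    val header = ROW_FORMAT.format(rule, rule, rule, rule)
    val rows = differences.map { diff ->
        ROW_FORMAT.format(
            diff.column,
            diff.sourceValue.orElse("null"), // a missing value is shown as "null"
            diff.targetValue.orElse("null"),
            diff.type
        )
    }
    return listOf(header) + rows
}

fun main() {
    val differences = listOf(
        SketchDifference("age", Optional.of("34"), Optional.of("35"), "VALUE_MISMATCH"),
        SketchDifference("name", Optional.of("Ann"), Optional.empty(), "VALUE_MISMATCH")
    )
    formatDifferences(differences).forEach(::println)
}
```

Because every field is padded to 15 characters, values longer than that will push the later columns out of alignment; a wider format string may be needed for real data.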