wakasce.blogg.se - Pentaho data integration wiki

Let's take a look at the algorithm that implements this step. When the transformation starts, the step waits for the first row on its input. When the first row arrives (in the example above it comes from the 'Sort rows' step), the 'Unique rows' step inspects it for the keys identified as 'Fields to compare on'. If these field names are not found in the input, the step fails immediately and the whole transformation fails with it. This happens at run time, not in the 'preparing transformation' phase. If the input is something like a Table input step, this means the connection to the database is established and data is fetched, and only when the first row enters 'Unique rows' does everything fall down. Remember this if previous steps perform heavy operations.

Using the field names, the step looks up the array indexes of the 'fields to compare'. For all following rows these fields are extracted by index, not by name, so the row structure for 'Unique rows' must stay the same. If you remember, Kettle shows a warning when rows with different structures are involved. If no key fields are defined for the 'Unique rows' step, the whole row is used for the comparison (that is, all fields are compared one by one). So there are two options: rows are judged unique either by all the data in the row or only by some fields.

Since we have the first row at the beginning of the transformation and some 'compare to' fields defined, the step copies only this one current row to its internal cache and outputs it, because the first row is always unique. When the second row arrives on the input, the step compares its data with the previous row stored in the cache. Depending on the step settings, this comparison is performed on the key fields or on the whole row (as described above). The comparison is delegated to the metadata object. For String data, Java's native compareTo() is called.
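The behaviour described here can be sketched in a few lines of Java. This is only an illustration, not the actual Kettle source: the class `UniqueRowsSketch` and its methods are hypothetical names, rows are assumed to be plain `Object[]` arrays, and the comparison that the real step delegates to its metadata object is reduced to `compareTo()` for Strings and `equals()` for everything else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the 'Unique rows' logic; not the real Kettle classes.
public class UniqueRowsSketch {

    // Resolve the 'fields to compare' to array indexes, once, on the first row.
    static int[] resolveIndexes(List<String> fieldNames, List<String> compareFields) {
        if (compareFields.isEmpty()) {
            // No key fields defined: every field of the row is compared.
            int[] all = new int[fieldNames.size()];
            for (int i = 0; i < all.length; i++) {
                all[i] = i;
            }
            return all;
        }
        int[] idx = new int[compareFields.size()];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = fieldNames.indexOf(compareFields.get(i));
            if (idx[i] < 0) {
                // A missing field is only noticed at run time, when the first row arrives.
                throw new IllegalArgumentException("Field not found in input: " + compareFields.get(i));
            }
        }
        return idx;
    }

    // Compare two rows on the resolved indexes; Strings go through compareTo().
    static boolean sameKey(Object[] a, Object[] b, int[] idx) {
        for (int i : idx) {
            Object x = a[i];
            Object y = b[i];
            if (x == null || y == null) {
                if (x != y) {
                    return false;
                }
            } else if (x instanceof String && y instanceof String) {
                if (((String) x).compareTo((String) y) != 0) {
                    return false;
                }
            } else if (!x.equals(y)) {
                return false;
            }
        }
        return true;
    }

    // Pass through each row whose key differs from the cached previous row.
    static List<Object[]> uniqueRows(List<String> fieldNames, List<Object[]> rows,
                                     List<String> compareFields) {
        List<Object[]> out = new ArrayList<>();
        int[] idx = null;
        Object[] previous = null; // the single-row internal cache
        for (Object[] row : rows) {
            if (previous == null) {
                idx = resolveIndexes(fieldNames, compareFields);
                previous = row.clone();
                out.add(row); // the first row is always unique
            } else if (!sameKey(previous, row, idx)) {
                previous = row.clone();
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("id", "name");
        List<Object[]> rows = Arrays.asList(
                new Object[]{1, "a"}, new Object[]{1, "b"}, new Object[]{2, "b"});
        // Compare on 'id' only: the second row duplicates the first and is dropped.
        for (Object[] r : uniqueRows(names, rows, Arrays.asList("id"))) {
            System.out.println(Arrays.toString(r));
        }
        // No key fields: the whole row is compared, so all three rows pass.
        System.out.println(uniqueRows(names, rows, Collections.<String>emptyList()).size());
    }
}
```

Because only the previous row is cached, the sketch (like the step it imitates) removes duplicates only when equal rows are adjacent, which is why the input normally comes from a sorting step.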