Java Code Examples for org.apache.hive.hcatalog.data.transfer.ReaderContext#numSplits()
The following examples show how to use org.apache.hive.hcatalog.data.transfer.ReaderContext#numSplits().
Example 1
Source File: HCatalogIO.java (from the beam project, Apache License 2.0)
/**
 * Calculates the 'desired' number of splits based on desiredBundleSizeBytes which is passed as
 * a hint to native API. Retrieves the actual splits generated by native API, which could be
 * different from the 'desired' split count calculated using desiredBundleSizeBytes
 */
@Override
public List<BoundedSource<HCatRecord>> split(
    long desiredBundleSizeBytes, PipelineOptions options) throws Exception {
  int desiredSplitCount = 1;
  long estimatedSizeBytes = getEstimatedSizeBytes(options);
  if (desiredBundleSizeBytes > 0 && estimatedSizeBytes > 0) {
    desiredSplitCount = (int) Math.ceil((double) estimatedSizeBytes / desiredBundleSizeBytes);
  }
  ReaderContext readerContext = getReaderContext(desiredSplitCount);
  // process the splits returned by native API
  // this could be different from 'desiredSplitCount' calculated above
  LOG.info(
      "Splitting into bundles of {} bytes: "
          + "estimated size {}, desired split count {}, actual split count {}",
      desiredBundleSizeBytes,
      estimatedSizeBytes,
      desiredSplitCount,
      readerContext.numSplits());
  List<BoundedSource<HCatRecord>> res = new ArrayList<>();
  for (int split = 0; split < readerContext.numSplits(); split++) {
    res.add(new BoundedHCatalogSource(spec.withContext(readerContext).withSplitId(split)));
  }
  return res;
}
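The split-count hint in Example 1 is plain arithmetic: the ceiling of the estimated table size divided by the desired bundle size, falling back to a single split when either value is non-positive. A minimal, self-contained sketch of that calculation (the class and method names here are illustrative, not part of Beam's API):

```java
public class SplitCountHint {

  /**
   * Mirrors the arithmetic in Example 1: ceil(estimatedSizeBytes / desiredBundleSizeBytes),
   * defaulting to one split when either hint is non-positive. The result is only a hint;
   * as the snippet above logs, ReaderContext#numSplits() may report a different actual count.
   */
  static int desiredSplitCount(long estimatedSizeBytes, long desiredBundleSizeBytes) {
    if (desiredBundleSizeBytes > 0 && estimatedSizeBytes > 0) {
      // Cast to double so the division is not truncated before Math.ceil rounds it up.
      return (int) Math.ceil((double) estimatedSizeBytes / desiredBundleSizeBytes);
    }
    return 1;
  }

  public static void main(String[] args) {
    System.out.println(desiredSplitCount(1000, 300)); // ceil(1000/300) = 4
    System.out.println(desiredSplitCount(900, 300));  // exact division = 3
    System.out.println(desiredSplitCount(0, 300));    // no size estimate, default = 1
  }
}
```

Note that the native API is free to ignore this hint, which is why the loop in Example 1 iterates over readerContext.numSplits() rather than over desiredSplitCount.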
Example 2
Source File: HCatalogIOTest.java (from the beam project, Apache License 2.0)
/** Test of Read using SourceTestUtils.readFromSource(..). */
@Test
@NeedsTestData
public void testReadFromSource() throws Exception {
  ReaderContext context = getReaderContext(getConfigPropertiesAsMap(service.getHiveConf()));
  HCatalogIO.Read spec =
      HCatalogIO.read()
          .withConfigProperties(getConfigPropertiesAsMap(service.getHiveConf()))
          .withContext(context)
          .withTable(TEST_TABLE);
  List<String> records = new ArrayList<>();
  for (int i = 0; i < context.numSplits(); i++) {
    BoundedHCatalogSource source = new BoundedHCatalogSource(spec.withSplitId(i));
    for (HCatRecord record : SourceTestUtils.readFromSource(source, OPTIONS)) {
      records.add(record.get(0).toString());
    }
  }
  assertThat(records, containsInAnyOrder(getExpectedRecords(TEST_RECORDS_COUNT).toArray()));
}
Example 3
Source File: PartitionReaderFn.java (from the beam project, Apache License 2.0)
@ProcessElement
public void processElement(ProcessContext c) throws Exception {
  final Read readRequest = c.element().getKey();
  final Integer partitionIndexToRead = c.element().getValue();
  ReaderContext readerContext = getReaderContext(readRequest, partitionIndexToRead);
  for (int i = 0; i < readerContext.numSplits(); i++) {
    HCatReader reader = DataTransferFactory.getHCatReader(readerContext, i);
    Iterator<HCatRecord> hcatIterator = reader.read();
    while (hcatIterator.hasNext()) {
      final HCatRecord record = hcatIterator.next();
      c.output(record);
    }
  }
}