Java Code Examples for org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers#parseTableSpec()

The following examples show how to use org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers#parseTableSpec(). Each example names the source file and the project it was taken from.
Example 1
Source File: BigQueryToParquet.java    From DataflowTemplates with Apache License 2.0
/**
 * Creates ReadSession for schema extraction.
 *
 * @param client BigQueryStorage client used to create ReadSession.
 * @param tableString String that represents table to export from.
 * @param tableReadOptions TableReadOptions that specify any fields in the table to filter on.
 * @return session ReadSession object that contains the schema for the export.
 */
static ReadSession create(
    BigQueryStorageClient client, String tableString, TableReadOptions tableReadOptions) {
  TableReference tableReference = BigQueryHelpers.parseTableSpec(tableString);
  String parentProjectId = "projects/" + tableReference.getProjectId();

  TableReferenceProto.TableReference storageTableRef =
      TableReferenceProto.TableReference.newBuilder()
          .setProjectId(tableReference.getProjectId())
          .setDatasetId(tableReference.getDatasetId())
          .setTableId(tableReference.getTableId())
          .build();

  CreateReadSessionRequest.Builder builder =
      CreateReadSessionRequest.newBuilder()
          .setParent(parentProjectId)
          .setReadOptions(tableReadOptions)
          .setTableReference(storageTableRef);
  try {
    return client.createReadSession(builder.build());
  } catch (InvalidArgumentException iae) {
    LOG.error("Error creating ReadSession: " + iae.getMessage());
    throw new RuntimeException(iae);
  }
}
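
The example above depends on the table string splitting cleanly into project, dataset, and table parts. Beam's BigQueryIO documentation describes the accepted spec as `[project_id]:[dataset_id].[table_id]`, with the project part optional. The following is a minimal stdlib-only sketch of that split, not Beam's implementation: the class name `TableSpecSketch`, its regular expression, and its fields are hypothetical and only illustrate the shape of the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only; this is NOT the code behind BigQueryHelpers.parseTableSpec. */
final class TableSpecSketch {
  // Assumed format: an optional "project:" prefix followed by "dataset.table".
  private static final Pattern SPEC =
      Pattern.compile("^(?:(?<project>[^:]+):)?(?<dataset>[^.:]+)\\.(?<table>[^.:]+)$");

  final String project; // null when the spec omits the project
  final String dataset;
  final String table;

  private TableSpecSketch(String project, String dataset, String table) {
    this.project = project;
    this.dataset = dataset;
    this.table = table;
  }

  /** Splits a table spec into its parts, rejecting strings that do not fit the assumed format. */
  static TableSpecSketch parse(String spec) {
    Matcher m = SPEC.matcher(spec);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a valid table spec: " + spec);
    }
    // A named group that did not participate in the match yields null.
    return new TableSpecSketch(m.group("project"), m.group("dataset"), m.group("table"));
  }

  public static void main(String[] args) {
    TableSpecSketch ref = parse("my-project:my_dataset.my_table");
    System.out.println("projects/" + ref.project); // prints "projects/my-project"
  }
}
```

In this sketch a spec without a project leaves `project` as null. If the real TableReference behaves the same way, code such as `"projects/" + tableReference.getProjectId()` would benefit from a null check before the parent path is built.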
 
Example 2
Source File: KeyByBigQueryTableDestination.java    From gcp-ingestion with Mozilla Public License 2.0
/**
 * Return the appropriate table destination instance for the given document type and other
 * attributes.
 */
public TableDestination getTableDestination(Map<String, String> attributes) {
  attributes = new HashMap<>(attributes);

  // We coerce all docType and namespace names to be snake_case and to remove invalid
  // characters; these transformations MUST match with the transformations applied by the
  // jsonschema-transpiler and mozilla-schema-generator when creating table schemas in BigQuery.
  final String namespace = attributes.get(Attribute.DOCUMENT_NAMESPACE);
  final String docType = attributes.get(Attribute.DOCUMENT_TYPE);
  if (namespace != null) {
    attributes.put(Attribute.DOCUMENT_NAMESPACE, getAndCacheNormalizedName(namespace));
  }
  if (docType != null) {
    attributes.put(Attribute.DOCUMENT_TYPE, getAndCacheNormalizedName(docType));
  }

  // Only letters, numbers, and underscores are allowed in BigQuery dataset and table names,
  // but some doc types and namespaces contain '-', so we convert to '_'; we don't pass all
  // values through getAndCacheBqName to avoid expensive regex operations and polluting the
  // cache of transformed field names.
  // String.replace does a literal replacement, so no regex is compiled per value.
  attributes = Maps.transformValues(attributes, v -> v.replace('-', '_'));

  final String tableSpec = StringSubstitutor.replace(tableSpecTemplate.get(), attributes);

  // Send to error collection if incomplete tableSpec; $ is not a valid char in tableSpecs.
  if (tableSpec.contains("$")) {
    throw new IllegalArgumentException("Element did not contain all the attributes needed to"
        + " fill out variables in the configured BigQuery output template: "
        + tableSpecTemplate.get());
  }

  final TableDestination tableDestination = new TableDestination(tableSpec, null,
      new TimePartitioning().setField(partitioningField.get()),
      new Clustering().setFields(clusteringFields.get()));
  final TableReference ref = BigQueryHelpers.parseTableSpec(tableSpec);
  final DatasetReference datasetRef = new DatasetReference().setProjectId(ref.getProjectId())
      .setDatasetId(ref.getDatasetId());

  if (bqService == null) {
    bqService = BigQueryOptions.newBuilder().setProjectId(ref.getProjectId())
        .setRetrySettings(RETRY_SETTINGS).build().getService();
  }

  // Get and cache a listing of table names for this dataset.
  Set<String> tablesInDataset;
  if (tableListingCache == null) {
    // We need to be very careful about settings for the cache here. We have had significant
    // issues in the past due to exceeding limits on BigQuery API requests; see
    // https://bugzilla.mozilla.org/show_bug.cgi?id=1623000
    tableListingCache = CacheBuilder.newBuilder().expireAfterWrite(Duration.ofMinutes(10))
        .build();
  }
  try {
    tablesInDataset = tableListingCache.get(datasetRef, () -> {
      Set<String> tableSet = new HashSet<>();
      Dataset dataset = bqService.getDataset(ref.getDatasetId());
      if (dataset != null) {
        dataset.list().iterateAll().forEach(t -> {
          tableSet.add(t.getTableId().getTable());
        });
      }
      return tableSet;
    });
  } catch (ExecutionException e) {
    throw new UncheckedExecutionException(e.getCause());
  }

  // Send to error collection if dataset or table doesn't exist so BigQueryIO doesn't throw a
  // pipeline execution exception.
  if (tablesInDataset.isEmpty()) {
    throw new IllegalArgumentException("Resolved destination dataset does not exist or has no "
        + "tables for tableSpec " + tableSpec);
  } else if (!tablesInDataset.contains(ref.getTableId())) {
    throw new IllegalArgumentException("Resolved destination table does not exist: " + tableSpec);
  }

  return tableDestination;
}
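
The template-filling step in the example above (substitute attributes into the table spec template, then reject any result that still contains `$`) can be sketched with the standard library alone. This is an assumption-laden illustration: the class `TableSpecTemplateSketch`, the `${name}` placeholder syntax handled by a hand-written regex, and the attribute keys `namespace` and `doc_type` are hypothetical stand-ins, not the project's StringSubstitutor configuration or its Attribute constants.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only; a stand-in for the substitution step, not the project's code. */
final class TableSpecTemplateSketch {
  // Assumed placeholder syntax: ${name}
  private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

  /** Fills placeholders from attributes; throws if any placeholder is left unresolved. */
  static String fill(String template, Map<String, String> attributes) {
    Matcher m = VARIABLE.matcher(template);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = attributes.get(m.group(1));
      // Unknown variables are left in place; known values get '-' mapped to '_'.
      String replacement = value == null ? m.group() : value.replace('-', '_');
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    String tableSpec = out.toString();
    // A leftover '$' means at least one variable had no matching attribute.
    if (tableSpec.contains("$")) {
      throw new IllegalArgumentException(
          "Element did not contain all the attributes needed for template: " + template);
    }
    return tableSpec;
  }

  public static void main(String[] args) {
    Map<String, String> attributes = new HashMap<>();
    attributes.put("namespace", "my-namespace");
    attributes.put("doc_type", "main");
    // prints "proj:my_namespace_live.main_v1"
    System.out.println(fill("proj:${namespace}_live.${doc_type}_v1", attributes));
  }
}
```

Rejecting incomplete specs before they reach parseTableSpec keeps the failure tied to the individual element, which matches the intent stated in the example's comments about sending such elements to error collection.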
 
Example 3
Source File: FeatureSetSpecToTableSchema.java    From feast with Apache License 2.0
private TableId generateTableId(String specKey) {
  TableDestination tableDestination = BigQuerySinkHelpers.getTableDestination(dataset, specKey);
  TableReference tableReference = BigQueryHelpers.parseTableSpec(tableDestination.getTableSpec());
  return TableId.of(
      tableReference.getProjectId(), tableReference.getDatasetId(), tableReference.getTableId());
}