python - adapt() doesn't work within tf.data.Dataset.map() while building a preprocessing pipeline using Keras preprocessing layers

I'm trying to create a neural network that assigns tasks to employees based on their skills.

The data contains categorical features, which I handle with Keras preprocessing layers inside a preprocessing function. The data isn't too large, so I chose to adapt() the preprocessing layers to the features to build the vocabulary (instead of supplying a pre-built one). But whenever I try to call adapt() on a feature, it throws the error shown after the code below:

import tensorflow as tf
from tensorflow.keras import layers

dataset = tf.data.experimental.make_csv_dataset(csv_path,
                                                label_name=label,
                                                batch_size=128,
                                                num_epochs=1,
                                                shuffle=True,
                                                shuffle_buffer_size=1000)
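
For context, every element that make_csv_dataset yields is a (features, label) pair, where features is a dict mapping column names to batched tensors; that dict is what the preprocess() function below iterates over. A quick way to check the structure:

# Inspect one batch: `features` maps column name -> batch tensor.
for features, label in dataset.take(1):
    for name, column in features.items():
        print(name, column.dtype, column.shape)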

def preprocess(features, label):
    preprocessed_features = {}
    for name, feature in features.items():
        if feature.dtype == 'string':
            encoder = layers.StringLookup(output_mode='one_hot')
            encoder.adapt(feature)  # <- this is the call that fails
            preprocessed_features[name] = encoder(feature)

        elif feature.dtype.is_integer:
            normalizer = layers.Normalization()
            normalizer.adapt(feature)
            preprocessed_features[name] = normalizer(feature)

    return preprocessed_features, label

dataset = dataset.map(preprocess)
OperatorNotAllowedInGraphError: in user code:

    File "C:\Users\User\AppData\Local\Temp\4\ipykernel_7508\1668558205.py", line 8, in preprocess  *
        encoder.adapt(feat[name])
    File "c:\Users\User\Spark\spark_env\Lib\site-packages\keras\src\layers\preprocessing\string_lookup.py", line 368, in adapt  **
        super().adapt(data, steps=steps)
    File "c:\Users\User\Spark\spark_env\Lib\site-packages\keras\src\layers\preprocessing\index_lookup.py", line 582, in adapt
        self.finalize_state()
    File "c:\Users\User\Spark\spark_env\Lib\site-packages\keras\src\layers\preprocessing\index_lookup.py", line 626, in finalize_state
        if self._has_input_vocabulary or tf.equal(self.token_counts.size(), 0):

    OperatorNotAllowedInGraphError: Using a symbolic `tf.Tensor` as a Python `bool` is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See .md#access-to-source-code for more information.

I figured the issue was caused by adapt() being called on a symbolic tensor inside the traced map() function; a symbolic tensor is a graph placeholder that doesn't hold any actual data.
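
A minimal repro of that class of error, independent of Keras (the traceback shows adapt()'s internals branching on a tensor value with a plain Python if, which is exactly what breaks during tracing):

import tensorflow as tf

# autograph=False mimics library internals that AutoGraph doesn't rewrite:
# branching on a symbolic tensor with a Python `if` fails while tracing.
@tf.function(autograph=False)
def branch_on_tensor(x):
    if tf.equal(tf.size(x), 0):  # raises OperatorNotAllowedInGraphError
        return x
    return x

branch_on_tensor(tf.constant(['a', 'b']))

So I took a batch of data and saved the features and labels in memory to pass to adapt():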

# Grab one batch eagerly so adapt() gets concrete data.
[(feat, label)] = dataset.take(1)

def preprocess(features, label):
    preprocessed_features = {}
    for name, feature in features.items():
        if feature.dtype == 'string':
            encoder = layers.StringLookup(output_mode='one_hot')
            encoder.adapt(feat[name])  # adapt on the in-memory batch
            preprocessed_features[name] = encoder(feature)

        elif feature.dtype.is_integer:
            normalizer = layers.Normalization()
            normalizer.adapt(feat[name])
            preprocessed_features[name] = normalizer(feature)

    return preprocessed_features, label

dataset = dataset.map(preprocess)

This code still produced the same error. What am I doing wrong?
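
For comparison, my understanding of the pattern the Keras docs recommend is to adapt each layer once, eagerly, before calling map(), roughly as sketched below (string_columns is a placeholder for my actual list of categorical column names):

# Adapt each StringLookup once, eagerly, before map().
# `string_columns` is a placeholder for the real categorical column names.
encoders = {}
for name in string_columns:
    encoder = layers.StringLookup(output_mode='one_hot')
    # adapt() accepts a tf.data.Dataset; extract just this column.
    # The n=name default avoids Python's late binding in the loop.
    encoder.adapt(dataset.map(lambda features, label, n=name: features[n]))
    encoders[name] = encoder

def preprocess(features, label):
    return ({name: enc(features[name]) for name, enc in encoders.items()},
            label)

dataset = dataset.map(preprocess)

But that means one full pass over the data per categorical column, which is what I was hoping to avoid by adapting inside preprocess().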

And while we're at it: what would be the optimal way, preferably without using pandas, to build input and preprocessing pipelines that can run asynchronously on batches of data when the data is too large to be kept in memory? (Mine isn't, but I might encounter such a situation in the future.)
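
For the asynchrony part, I know map() takes num_parallel_calls and datasets support prefetch(), e.g.:

# Preprocess several batches in parallel and overlap the work with training.
dataset = (dataset
           .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .prefetch(tf.data.AUTOTUNE))

but I'm not sure how adapt(), which needs a full pass over the data, fits into such a pipeline when the data doesn't fit in memory.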

I'm planning to keep the preprocessing model and the training model separate from each other (because according to the TF docs that is the best way to preprocess categorical features).
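
Concretely, by "separate" I mean something like the following sketch, where preprocessing_model and training_model are placeholder names: preprocessing runs inside the tf.data pipeline during training and only gets chained onto the trained model for inference:

# Training: preprocessing stays in the input pipeline.
# `preprocessing_model` and `training_model` are placeholder names.
train_ds = dataset.map(lambda features, label:
                       (preprocessing_model(features), label),
                       num_parallel_calls=tf.data.AUTOTUNE)
training_model.fit(train_ds)

# Inference/export: chain the two so raw features can be fed directly.
inputs = preprocessing_model.input
outputs = training_model(preprocessing_model(inputs))
inference_model = tf.keras.Model(inputs, outputs)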
