python - adapt() doesn't work within tf.data.Dataset.map() while building a preprocessing pipeline using Keras preprocessing layers

I'm trying to create a neural network that assigns tasks to employees based on their skills.

The data contains categorical features, which I handle with Keras preprocessing layers inside a preprocessing function. The data isn't too large, so I chose to adapt() the preprocessing layers to the features to build the vocabulary (instead of supplying a pre-built one). But whenever I try to call adapt() on a feature, it throws the error shown after the code below:

import tensorflow as tf
from tensorflow.keras import layers

dataset = tf.data.experimental.make_csv_dataset(csv_path,
                                                label_name=label,
                                                batch_size=128,
                                                num_epochs=1,
                                                shuffle=True,
                                                shuffle_buffer_size=1000)
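
For context, every element that make_csv_dataset yields is a (features, label) pair, where features is a dict mapping column names to batched tensors; that dict is what the preprocess() function below iterates over. A quick way to check the structure:

# Inspect one batch: `features` maps column name -> batch tensor.
for features, label in dataset.take(1):
    for name, column in features.items():
        print(name, column.dtype, column.shape)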

def preprocess(features, label):
    preprocessed_features = {}
    for name, feature in features.items():
        if feature.dtype == 'string':
            encoder = layers.StringLookup(output_mode='one_hot')
            encoder.adapt(feature)  # <- this is the call that fails
            preprocessed_features[name] = encoder(feature)

        elif feature.dtype.is_integer:
            normalizer = layers.Normalization()
            normalizer.adapt(feature)
            preprocessed_features[name] = normalizer(feature)

    return preprocessed_features, label

dataset = dataset.map(preprocess)
OperatorNotAllowedInGraphError: in user code:

    File "C:\Users\User\AppData\Local\Temp\4\ipykernel_7508\1668558205.py", line 8, in preprocess  *
        encoder.adapt(feat[name])
    File "c:\Users\User\Spark\spark_env\Lib\site-packages\keras\src\layers\preprocessing\string_lookup.py", line 368, in adapt  **
        super().adapt(data, steps=steps)
    File "c:\Users\User\Spark\spark_env\Lib\site-packages\keras\src\layers\preprocessing\index_lookup.py", line 582, in adapt
        self.finalize_state()
    File "c:\Users\User\Spark\spark_env\Lib\site-packages\keras\src\layers\preprocessing\index_lookup.py", line 626, in finalize_state
        if self._has_input_vocabulary or tf.equal(self.token_counts.size(), 0):

    OperatorNotAllowedInGraphError: Using a symbolic `tf.Tensor` as a Python `bool` is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See .md#access-to-source-code for more information.

I figured the issue was caused by adapt() being called on a symbolic tensor inside the traced map() function; a symbolic tensor is a graph placeholder that doesn't hold any actual data.
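
A minimal repro of that class of error, independent of Keras (the traceback shows adapt()'s internals branching on a tensor value with a plain Python if, which is exactly what breaks during tracing):

import tensorflow as tf

# autograph=False mimics library internals that AutoGraph doesn't rewrite:
# branching on a symbolic tensor with a Python `if` fails while tracing.
@tf.function(autograph=False)
def branch_on_tensor(x):
    if tf.equal(tf.size(x), 0):  # raises OperatorNotAllowedInGraphError
        return x
    return x

branch_on_tensor(tf.constant(['a', 'b']))

So I took a batch of data and saved the features and labels in memory to pass to adapt():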

# Grab one batch eagerly so adapt() gets concrete data.
[(feat, label)] = dataset.take(1)

def preprocess(features, label):
    preprocessed_features = {}
    for name, feature in features.items():
        if feature.dtype == 'string':
            encoder = layers.StringLookup(output_mode='one_hot')
            encoder.adapt(feat[name])  # adapt on the in-memory batch
            preprocessed_features[name] = encoder(feature)

        elif feature.dtype.is_integer:
            normalizer = layers.Normalization()
            normalizer.adapt(feat[name])
            preprocessed_features[name] = normalizer(feature)

    return preprocessed_features, label

dataset = dataset.map(preprocess)

This code still produced the same error. What am I doing wrong?
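
For comparison, my understanding of the pattern the Keras docs recommend is to adapt each layer once, eagerly, before calling map(), roughly as sketched below (string_columns is a placeholder for my actual list of categorical column names):

# Adapt each StringLookup once, eagerly, before map().
# `string_columns` is a placeholder for the real categorical column names.
encoders = {}
for name in string_columns:
    encoder = layers.StringLookup(output_mode='one_hot')
    # adapt() accepts a tf.data.Dataset; extract just this column.
    # The n=name default avoids Python's late binding in the loop.
    encoder.adapt(dataset.map(lambda features, label, n=name: features[n]))
    encoders[name] = encoder

def preprocess(features, label):
    return ({name: enc(features[name]) for name, enc in encoders.items()},
            label)

dataset = dataset.map(preprocess)

But that means one full pass over the data per categorical column, which is what I was hoping to avoid by adapting inside preprocess().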

And while we're at it: what would be the optimal way, preferably without using pandas, to build input and preprocessing pipelines that can run asynchronously on batches of data when the data is too large to be kept in memory? (Mine isn't, but I might encounter such a situation in the future.)
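
For the asynchrony part, I know map() takes num_parallel_calls and datasets support prefetch(), e.g.:

# Preprocess several batches in parallel and overlap the work with training.
dataset = (dataset
           .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .prefetch(tf.data.AUTOTUNE))

but I'm not sure how adapt(), which needs a full pass over the data, fits into such a pipeline when the data doesn't fit in memory.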

I'm planning to keep the preprocessing model and the training model separate from each other (because according to the TF docs that is the best way to preprocess categorical features).
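
Concretely, by "separate" I mean something like the following sketch, where preprocessing_model and training_model are placeholder names: preprocessing runs inside the tf.data pipeline during training and only gets chained onto the trained model for inference:

# Training: preprocessing stays in the input pipeline.
# `preprocessing_model` and `training_model` are placeholder names.
train_ds = dataset.map(lambda features, label:
                       (preprocessing_model(features), label),
                       num_parallel_calls=tf.data.AUTOTUNE)
training_model.fit(train_ds)

# Inference/export: chain the two so raw features can be fed directly.
inputs = preprocessing_model.input
outputs = training_model(preprocessing_model(inputs))
inference_model = tf.keras.Model(inputs, outputs)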
