I’m doing sentiment analysis on the IMDB dataset in TensorFlow, and I’m trying to augment the training dataset with the textaugment library, which is described as ‘plug and play’ with TensorFlow. It should be fairly simple, but I’m new to TF and not sure how to go about it. Here is what I have and what I am trying, based on the tutorials on the site.
I tried to use map to augment the training data, but I got an error. The full traceback is in the last code block below.
pip install -q tensorflow-text
pip install -q tf-models-official

import os
import shutil

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text
from official.nlp import optimization  # to create AdamW Optimizer

import matplotlib.pyplot as plt

tf.get_logger().setLevel('ERROR')
#Downloading the IMDB dataset and making the train/validation/test sets
url = 'https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'

dataset = tf.keras.utils.get_file('aclImdb_v1.tar.gz', url,
                                  untar=True, cache_dir='.',
                                  cache_subdir='')

dataset_dir = os.path.join(os.path.dirname(dataset), 'aclImdb')

train_dir = os.path.join(dataset_dir, 'train')

# remove unused folders to make it easier to load the data
remove_dir = os.path.join(train_dir, 'unsup')
shutil.rmtree(remove_dir)

AUTOTUNE = tf.data.AUTOTUNE
batch_size = 32
seed = 42

raw_train_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='training',
    seed=seed)

class_names = raw_train_ds.class_names
train_ds = raw_train_ds.cache().prefetch(buffer_size=AUTOTUNE)

val_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='validation',
    seed=seed)

val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

test_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/test',
    batch_size=batch_size)

test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)

#setting up the textaugment
try:
    import textaugment
except ModuleNotFoundError:
    !pip install textaugment
    import textaugment

from textaugment import EDA
import nltk
nltk.download('stopwords')

# EDA augmenter instance used in the map call below
t = EDA()
This is where I get the error. I tried to map over train_ds and apply a random swap to each element while keeping the class the same:
aug_ds = train_ds.map(
    lambda x, y: (t.random_swap(x), y))
AttributeError                            Traceback (most recent call last)
<ipython-input-24-b4af68cc0677> in <module>()
      1 aug_ds = train_ds.map(
----> 2     lambda x, y: (t.random_swap(x), y))

10 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
    668       except Exception as e:  # pylint:disable=broad-except
    669         if hasattr(e, 'ag_error_metadata'):
--> 670           raise e.ag_error_metadata.to_exception(e)
    671         else:
    672           raise

AttributeError: in user code:

    <ipython-input-24-b4af68cc0677>:2 None  *
        lambda x, y: (t.random_swap(x), y))
    /usr/local/lib/python3.6/dist-packages/textaugment/eda.py:187 random_swap  *
        self.validate(sentence=sentence, n=n)
    /usr/local/lib/python3.6/dist-packages/textaugment/eda.py:74 validate  *
        if not isinstance(kwargs['sentence'].strip(), str) or len(kwargs['sentence'].strip()) == 0:

    AttributeError: 'Tensor' object has no attribute 'strip'
I am also trying to do the same. The error occurs because the textaugment function t.random_swap() is meant to work on Python string objects.
In your code, the function receives a Tensor with dtype=string. At present, tensor objects do not have the same methods as Python strings (such as strip), hence the error.
NB: tensorflow_text has some additional APIs for working with string tensors, although at the moment they are limited to tokenization, checking upper or lower case, and so on. A long-winded workaround is to use the tf.py_function wrapper, but this reduces performance. Cheers, and I hope this helps. In the end I opted not to use textaugment for my use case.
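For illustration, the py_function workaround might look roughly like the sketch below. It is not tested against textaugment itself: the random_swap function here is a hypothetical pure-Python stand-in for t.random_swap, and demo_ds is a tiny made-up dataset standing in for train_ds. Note that train_ds is batched, so the function passed to map receives a whole batch of strings, and each one has to be decoded from bytes to a Python str before a string-based augmenter can use it.

```python
import random

import tensorflow as tf


def random_swap(sentence: str, n: int = 1) -> str:
    """Hypothetical stand-in for EDA().random_swap: swap n random word pairs."""
    words = sentence.split()
    if len(words) < 2:
        return sentence
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_batch(texts):
    # Inside tf.py_function the argument is an eager tensor, so .numpy() works.
    # Each element is a bytes object and must be decoded to a Python str.
    swapped = [random_swap(s.decode("utf-8")) for s in texts.numpy()]
    return tf.constant(swapped, dtype=tf.string)


def tf_augment(texts, labels):
    out = tf.py_function(func=augment_batch, inp=[texts], Tout=tf.string)
    # py_function loses static shape information, so restore it.
    out.set_shape(texts.shape)
    return out, labels


# Tiny in-memory dataset standing in for the batched IMDB train_ds (assumption).
demo_ds = tf.data.Dataset.from_tensor_slices(
    (["a great movie indeed", "truly awful acting"], [1, 0])
).batch(2)

aug_ds = demo_ds.map(tf_augment)
aug_texts, aug_labels = next(iter(aug_ds))
```

With the real library, the call to random_swap inside augment_batch would presumably be replaced by t.random_swap. The wrapped function runs as ordinary Python rather than as graph ops, which is where the performance cost mentioned above comes from.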
NB 2: The tf.strings APIs offer a bit more functionality, such as regex replace, but they are not sophisticated enough for your augmentation use case. It would be helpful to see what others come up with, or whether future updates to TF or textaugment address this.
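To give a sense of what tf.strings can do directly on string tensors inside map, here is a small sketch. The clean_text function and the sample review are made up for illustration; this is text cleaning rather than augmentation, which is roughly the limit of what these ops cover.

```python
import tensorflow as tf


def clean_text(texts, labels):
    # These ops accept string tensors directly, so no py_function is needed.
    texts = tf.strings.lower(texts)
    texts = tf.strings.regex_replace(texts, "<br />", " ")
    texts = tf.strings.strip(texts)
    return texts, labels


# One-element dataset standing in for a batch of IMDB reviews (assumption).
demo_ds = tf.data.Dataset.from_tensor_slices(
    (["  Great film.<br />Loved it  "], [1])
).batch(1)

cleaned, cleaned_labels = next(iter(demo_ds.map(clean_text)))
```

Because these are graph ops, they run inside the tf.data pipeline without dropping back into Python.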
Answered By – TangibleTech
Answer Checked By – Cary Denson (AngularFixing Admin)