Tf-Rec | Tensorflow 2 to utilize GPU Acceleration | Recommender System library
kandi X-RAY | Tf-Rec Summary
Tf-Rec is a python package for building Recommender Systems. It is built on top of Keras and Tensorflow 2 to utilize GPU Acceleration during training.
Top functions reviewed by kandi - BETA
- Preprocess the dataframe
- Cross validation
- Performs implicit feedback prediction
- Fetches the 100k ratings dataset
Tf-Rec Key Features
Tf-Rec Examples and Code Snippets
Community Discussions
Trending Discussions on Tf-Rec
QUESTION
I am trying to follow this guide in order to serialize my input data into the TFRecord format but I keep hitting this error when trying to read it:
InvalidArgumentError: Key: my_key. Can't parse serialized Example.
I am not sure where I'm going wrong. Here is a minimal reproduction of the issue I cannot get past.
Serialise some sample data:
...ANSWER
Answered 2018-Nov-27 at 17:20
tf.FixedLenFeature() is used for reading fixed-size arrays of data, and the shape of the data must be defined beforehand. Updating the parse function to declare the expected shape fixes the error.
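A minimal sketch of the round trip, using the TF2-era tf.io API (the key name my_key and the three-element float payload are assumptions for illustration):

```python
import tensorflow as tf

# Serialize a toy record; "my_key" and its payload are hypothetical.
example = tf.train.Example(features=tf.train.Features(feature={
    "my_key": tf.train.Feature(
        float_list=tf.train.FloatList(value=[1.0, 2.0, 3.0])),
}))
serialized = example.SerializeToString()

# Parsing works once the shape ([3]) is declared on the feature spec.
parsed = tf.io.parse_single_example(serialized, {
    "my_key": tf.io.FixedLenFeature([3], tf.float32),
})
print(parsed["my_key"].numpy())  # the original three floats
```

Without the explicit shape on the FixedLenFeature spec, the parser cannot tell how many elements the serialized value should contain, which is what triggers the "Can't parse serialized Example" error.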
QUESTION
I was trying to save images of different sizes into tf-records. I found that even though the images have different sizes, I can still load them with FixedLenFeature.
By checking the docs on FixedLenFeature and VarLenFeature, I found that the difference seems to be that VarLenFeature returns a sparse tensor.
Could anyone illustrate some situations in which one should use FixedLenFeature or VarLenFeature?
ANSWER
Answered 2018-Apr-17 at 10:47
You can probably load the images because you saved them using the feature type tf.train.BytesList(), so the whole image data is one big byte value inside a list. If I'm right, you're using tf.decode_raw to get the data out of the image you load from the TFRecord.
Regarding example use cases: I use VarLenFeature for saving datasets for an object detection task. There is a variable number of bounding boxes per image (equal to the number of objects in the image), therefore I need another feature, objects_number, to track the number of objects (and bboxes). Each bounding box itself is a list of 4 float coordinates.
I'm using the following code to load it:
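The loading code was stripped from the page; a rough sketch of the pattern described above, written with the TF2 tf.io names rather than the TF1 ones (the feature names objects_number and bboxes, and the box values, are assumptions):

```python
import tensorflow as tf

# Hypothetical record with 2 objects; each box is 4 float coordinates.
example = tf.train.Example(features=tf.train.Features(feature={
    "objects_number": tf.train.Feature(
        int64_list=tf.train.Int64List(value=[2])),
    "bboxes": tf.train.Feature(float_list=tf.train.FloatList(
        value=[0.1, 0.2, 0.5, 0.6, 0.3, 0.3, 0.9, 0.9])),
}))

parsed = tf.io.parse_single_example(example.SerializeToString(), {
    "objects_number": tf.io.FixedLenFeature([], tf.int64),
    "bboxes": tf.io.VarLenFeature(tf.float32),  # variable length -> sparse
})

# VarLenFeature yields a SparseTensor; densify and reshape to (n_boxes, 4).
boxes = tf.reshape(tf.sparse.to_dense(parsed["bboxes"]), [-1, 4])
```

This is exactly the situation where VarLenFeature earns its keep: the number of boxes is unknown at parse time, so a fixed shape cannot be declared up front.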
QUESTION
I am getting used to the new Dataset API and am trying to do some time-series classification. I have a dataset formatted as tf-records with the shape (time_steps x features). I also have a label for each time step, (time_steps x 1).
What I want to do is reformat the dataset to have a rolling window over the time steps, like this: (n x window_size x features), with n being the number of windows, time_steps - window_size (if I use a stride of 1 for the rolling window).
The labels are supposed to be (n x 1), meaning that we take the label of the last time step in each window.
I already know that I can use tf.sliding_window_batch() to create the sliding window for the features. However, the labels get shaped in the same way, (n x window_size x 1), and I do not know how to do this correctly.
How do I do this using the TensorFlow Dataset API? https://www.tensorflow.org/programmers_guide/datasets
Thanks for your help!
...ANSWER
Answered 2018-Jul-27 at 09:49
I couldn't figure out how to do this with the Dataset API, but I figured I might as well do it using numpy. I found this great answer and applied it to my case.
Afterwards it was just a matter of using numpy like so:
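The numpy snippet itself was stripped; a sketch of what the windowing step might look like (the function name and toy shapes are mine):

```python
import numpy as np

def rolling_windows(features, labels, window_size, stride=1):
    """Stack overlapping windows; each window keeps its last step's label."""
    n = (len(features) - window_size) // stride + 1
    starts = [i * stride for i in range(n)]
    windows = np.stack([features[s:s + window_size] for s in starts])
    window_labels = np.array([labels[s + window_size - 1] for s in starts])
    return windows, window_labels  # (n, window_size, features), (n, 1)

# Toy data: 10 time steps, 3 features, one label per step.
x = np.arange(30).reshape(10, 3)
y = np.arange(10).reshape(10, 1)
w, wl = rolling_windows(x, y, window_size=4)
print(w.shape, wl.shape)  # (7, 4, 3) (7, 1)
```

Each window keeps only the label of its final time step, which matches the (n x 1) label shape the question asks for.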
QUESTION
This is a follow-up to these two SO questions:
Select random value from row in a TF.record array, with limits on what the value can be?
The first one mentioned that tfrecords can handle variable-length data using tf.VarLenFeature(). I am still having trouble figuring out how to convert my number array to a tfrecord file, though. Here's what the first 10 rows look like:
...ANSWER
Answered 2018-Dec-22 at 15:50
I assume you want to add a numbers feature and a list feature respectively here.
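A sketch of what that might look like, writing each variable-length row as an Int64List and reading it back (the file path, feature name, and rows are toy assumptions):

```python
import os
import tempfile
import tensorflow as tf

rows = [[1, 5, 9], [2], [7, 7, 7, 7]]  # toy variable-length rows

path = os.path.join(tempfile.mkdtemp(), "numbers.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for row in rows:
        example = tf.train.Example(features=tf.train.Features(feature={
            "numbers": tf.train.Feature(
                int64_list=tf.train.Int64List(value=row)),
        }))
        writer.write(example.SerializeToString())

# Read back: VarLenFeature handles the differing row lengths.
decoded = []
for record in tf.data.TFRecordDataset(path):
    parsed = tf.io.parse_single_example(record, {
        "numbers": tf.io.VarLenFeature(tf.int64),
    })
    decoded.append(tf.sparse.to_dense(parsed["numbers"]).numpy().tolist())
print(decoded)  # [[1, 5, 9], [2], [7, 7, 7, 7]]
```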
QUESTION
I have about 1.7 million observations. Each of them has about 4000 boolean features and 4 floating-point labels/targets. The features are sparse and approximately homogeneously distributed (about 150 of the 4000 boolean values are set to True per observation).
If I store the whole (1700000, 4000) matrix as a raw numpy file (npz format), it takes about 100 MB of disk space. If I load it via np.load(), it takes a few minutes and my RAM usage rises by about 7 GB, which is fine on its own.
The problem is that I have to hand over my boolean values in a feed_dict to a tf.placeholder in order for the tf.data.Dataset to be able to use them. This process takes another 7 GB of RAM. My plan is to collect even more data in the future (might become more than 10 million observations at some point).
Question: So how can I feed the data to my DNN (feed-forward, dense, not convolutional and not recurrent) without creating a bottleneck, and in a way that is native to TensorFlow? I would have thought that this is a pretty standard setting and many people should have that problem – why not? What do I do wrong/differently than people without the problem?
I heard the tfrecord format is well integrated with TensorFlow and is able to load lazily, but I think it is a bad idea to use that format for my feature structure, as it creates one Message per observation and saves the features as a map with the keys of all features as strings per observation.
ANSWER
Answered 2018-Dec-20 at 17:18
I've found a solution, called tf.data.Dataset.from_generator. This basically does the trick:
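A sketch of that approach with toy stand-ins for the real arrays (shapes shrunk from (1700000, 4000) to keep it runnable; requires TF 2.4+ for output_signature):

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for the real sparse boolean matrix and float targets.
features = np.random.rand(100, 40) < 0.04       # ~4% True, like the question
targets = np.random.rand(100, 4).astype(np.float32)

def gen():
    # Yield one observation at a time; the full array is never copied
    # into the graph as a constant, unlike the feed_dict/placeholder route.
    for x, y in zip(features, targets):
        yield x.astype(np.float32), y

dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(40,), dtype=tf.float32),
        tf.TensorSpec(shape=(4,), dtype=tf.float32),
    ),
).batch(32)

first_x, first_y = next(iter(dataset))
print(first_x.shape, first_y.shape)  # (32, 40) (32, 4)
```

The generator keeps the data in its compact numpy form and streams batches on demand, which avoids the second 7 GB copy the question describes.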
QUESTION
I am trying to create data batches for training a 2-class semantic segmentation network. The target segmented image has 2 layers: the first layer has 1 for all pixels of class 1 and 0 otherwise; the second layer has the pixels inverted.
In the dataset I have, the output images are 3-channel RGB images with [255,255,255] and [0,0,0]. The input and output images are stored in tf-record files.
When I was experimenting in numpy, I created a 2-channel binary image with the code below:
...ANSWER
Answered 2018-Oct-08 at 19:08
After going through the tf documentation for some time, I came up with the following solution.
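The answer's code was stripped from the page; in plain numpy, the conversion from the RGB masks described above to the 2-layer target could be sketched as follows (the function name is mine):

```python
import numpy as np

def rgb_mask_to_two_channel(rgb):
    """[255,255,255] pixels -> channel 0 set; all other pixels -> channel 1."""
    foreground = np.all(rgb == 255, axis=-1).astype(np.float32)
    return np.stack([foreground, 1.0 - foreground], axis=-1)

# Toy 2x2 mask: one white (class-1) pixel, three black pixels.
mask = np.zeros((2, 2, 3), dtype=np.uint8)
mask[0, 0] = [255, 255, 255]
two_channel = rgb_mask_to_two_channel(mask)
print(two_channel[..., 0])  # 1 where the pixel was white
```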
QUESTION
I am using Ubuntu 16.04, with a GeForce 1080 GPU with 8 GB of GPU memory.
I have properly created the TF-record files, and I trained the model successfully. However, I still have two problems. I did the following steps; please tell me what I am missing:
I used VOCdevkit and I properly created two files, pascal_train.record and pascal_val.record. Then,
1- From this link, I used the raccoon images; I placed them into the directory models/object_detection/VOCdevkit/VOC2012/JPEGImages (after I deleted the previous images). Then I used the raccoon annotations; I placed them into the directory models/object_detection/VOCdevkit/VOC2012/Annotation (after I deleted the previous ones).
2- I modified models/object_detection/data/pascal_label_map.pbxt and I wrote one class name, which is 'raccoon'.
3- I used ssd_mobilenet_v1_pets.config. I modified it so the number of classes is only one, and I did not train from scratch; I used ssd_mobilenet_v1_coco_11_06_2017/model.ckpt.
Answered 2017-Aug-22 at 21:40
Question 1 - this is just a problem that you'll encounter because of your hardware. Once you get to a point where you'd like to evaluate the model, just stop your training and run your eval command (it seems as though you've successfully evaluated your model, so you know the command). It will provide you some metrics for the most recent model checkpoint. You can iterate through this process until you're comfortable with the performance of your model.
Question 2 - These event files are used as input to Tensorboard. The event files are in binary format and thus are not human-readable. Start a Tensorboard application while your model is training and/or evaluating. To do so, run something like this:
tensorboard --logdir=train:/home/grasp001/abdu-py2/models/object_detection/train1/train,eval:/home/grasp001/abdu-py2/models/object_detection/train1/eval
Once you have Tensorboard running, use your web browser to navigate to localhost:6006 to check out your metrics. You can use this during training as well to monitor loss and other metrics for each step of training.
QUESTION
I am trying to get deterministic behaviour from tf.train.shuffle_batch(). I could, instead, use tf.train.batch(), which works fine (always the same order of elements), but I need to get examples from multiple tf-records, so I am stuck with shuffle_batch().
I am using:
...ANSWER
Answered 2018-Jan-08 at 20:41
Maybe I misunderstood something, but you can collect multiple tf-records in a queue with tf.train.string_input_producer(), then read the examples into tensors and finally use tf.train.batch().
Take a look at CIFAR-10 input.
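The queue-based APIs named here are from TF1 and have since been removed; a TF2-era sketch of reading several tf-record files with a seeded, repeatable shuffle (the file names and record contents are toy assumptions):

```python
import os
import tempfile
import tensorflow as tf

# Write two tiny tf-record files to shuffle across.
tmp = tempfile.mkdtemp()
paths = []
for name, values in [("a.tfrecord", [b"a0", b"a1"]),
                     ("b.tfrecord", [b"b0", b"b1"])]:
    path = os.path.join(tmp, name)
    with tf.io.TFRecordWriter(path) as writer:
        for v in values:
            writer.write(v)
    paths.append(path)

def ordered_records():
    # A fixed seed plus reshuffle_each_iteration=False makes the
    # shuffled order repeatable across runs of the pipeline.
    ds = tf.data.TFRecordDataset(paths).shuffle(
        buffer_size=4, seed=42, reshuffle_each_iteration=False)
    return [r.numpy() for r in ds]

print(ordered_records() == ordered_records())  # True: order is deterministic
```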
QUESTION
As denoted here, youtube-8m tf-records are saved with the format that comes at the end of my question. I wrote some code to extract features, but there is a problem: the code can read all elements in features successfully, but it is not able to read feature_lists. In fact, the example does not include feature_lists at all, and I get an error when I try to access it. How can I read the feature_lists? I attach the data format, my code, and the output:
...ANSWER
Answered 2017-Sep-14 at 12:23
Instead of parsing the record as a plain Example, parse it as a SequenceExample so that the feature_lists become readable.
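The answer's code was stripped from the page; the gist is to use the sequence-aware parser, which returns the context features and the feature_lists separately. A minimal TF2-style sketch (the feature names are assumptions, not the real youtube-8m schema):

```python
import tensorflow as tf

# A toy SequenceExample: one context feature plus one feature_list of frames.
seq = tf.train.SequenceExample(
    context=tf.train.Features(feature={
        "video_id": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b"vid123"])),
    }),
    feature_lists=tf.train.FeatureLists(feature_list={
        "frame_scores": tf.train.FeatureList(feature=[
            tf.train.Feature(float_list=tf.train.FloatList(value=[0.1])),
            tf.train.Feature(float_list=tf.train.FloatList(value=[0.2])),
        ]),
    }))

# parse_single_sequence_example returns (context, feature_lists) as a pair.
context, sequences = tf.io.parse_single_sequence_example(
    seq.SerializeToString(),
    context_features={"video_id": tf.io.FixedLenFeature([], tf.string)},
    sequence_features={
        "frame_scores": tf.io.FixedLenSequenceFeature([1], tf.float32)})
```

A plain parse_single_example only sees the context features, which is why the question's code could read features but errored on feature_lists.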
QUESTION
I'm using slim to convert data into TF-Record format and looking at this example, where the MNIST data-set is being converted. On lines 127 to 128, the image png_string is assigned a label, labels[j].
ANSWER
Answered 2017-Jul-10 at 04:56
Add it similarly to the one you have already declared:
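The snippet that followed was stripped; a hedged sketch of what declaring a label feature alongside an image feature typically looks like when writing an Example (the image/encoded and image/class/label keys follow a common convention but are assumptions here, as are the stand-in values):

```python
import tensorflow as tf

png_string = b"\x89PNG..."  # stand-in for real encoded image bytes
label = 7                   # stand-in for labels[j]

example = tf.train.Example(features=tf.train.Features(feature={
    "image/encoded": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[png_string])),
    # The label feature, declared similarly to the image feature:
    "image/class/label": tf.train.Feature(
        int64_list=tf.train.Int64List(value=[label])),
}))
serialized = example.SerializeToString()
```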
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install Tf-Rec
API Documentation
Support