renderdoc_for_game_data | Access data with labels from GTA5 using renderdoc | Dataset library

by xiaofeng94 | C++ | Version: Current | License: MIT

kandi X-RAY | renderdoc_for_game_data Summary

renderdoc_for_game_data is a C++ library typically used in Artificial Intelligence and Dataset applications. It has no reported bugs or vulnerabilities, carries a permissive MIT license, and has low support activity. You can download it from GitHub.
Access data with labels from GTA5 using renderdoc

Support

renderdoc_for_game_data has a low active ecosystem.
It has 7 stars and 3 forks. There are 2 watchers for this library.
It had no major release in the last 6 months.
There are 0 open issues and 3 closed issues. On average, issues are closed in 4 days. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of renderdoc_for_game_data is current.

Quality

renderdoc_for_game_data has no bugs reported.

Security

renderdoc_for_game_data has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

renderdoc_for_game_data is licensed under the MIT License. This license is permissive.
Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

renderdoc_for_game_data releases are not available. You will need to build from source code and install.

                                                                                  renderdoc_for_game_data Key Features

                                                                                  Access data with labels from GTA5 using renderdoc

                                                                                  renderdoc_for_game_data Examples and Code Snippets

                                                                                  No Code Snippets are available at this moment for renderdoc_for_game_data.
Community Discussions

Trending Discussions on Dataset

- Replacing dataframe value given multiple condition from another dataframe with R
- Does Hub support integrations for MinIO, AWS, and GCP? If so, how does it work?
- Custom Sampler correct use in Pytorch
- C++ what is the best sorting container and approach for large datasets (millions of lines)
- How to create a dataset for tensorflow from a txt file containing paths and labels?
- Converting 0-1 values in dataset with the name of the column if the value of the cell is 1
- How can i get person class and segmentation from MSCOCO dataset?
- R - If column contains a string from vector, append flag into another column
- How to divide a large image dataset into groups of pictures and save them inside subfolders using python?
- Proper way of cleaning csv file

                                                                                  QUESTION

                                                                                  Replacing dataframe value given multiple condition from another dataframe with R
                                                                                  Asked 2022-Apr-14 at 16:16

I have two dataframes that share a matching CellID column: one holds the dates (converted to months) of multiple survey replicates for each grid cell, and the other holds the snow value for each month for the same grid cells. In the first dataframe, I would like to replace each month value with the snow value for that month in that grid cell. Thank you

# months of survey replicates per grid cell
CellID <- c(1,2,3,4,5,6)
sampl1 <- c("oct", "oct", "oct", "nov", NA, NA)
sampl2 <- c("nov", "nov", "jan", NA, NA, NA)
sampl3 <- c("dec", "dec", "jan", NA, NA, NA)
df1 <- data.frame(CellID, sampl1, sampl2, sampl3)
print(df1)

# snow value per month per grid cell
CellID <- c(1,2,3,4,5,6)
oct <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.1)
nov <- c(0.4, 0.5, 0.4, 0.5, 0.6, 0.5)
dec <- c(0.6, 0.7, 0.8, 0.7, 0.6, 0.8)
df2 <- data.frame(CellID, oct, nov, dec)
print(df2)

# desired result
CellID <- c(1,2,3,4,5,6)
sampl1_snow <- c(0.1, 0.1, 0.1, 0.5, NA, NA)
sampl2_snow <- c(0.4, 0.5, 0.9, NA, NA, NA)
sampl3_snow <- c(0.6, 0.7, 1, NA, NA, NA)
df3 <- data.frame(CellID, sampl1_snow, sampl2_snow, sampl3_snow)
print(df3)
                                                                                  

                                                                                  ANSWER

                                                                                  Answered 2022-Apr-14 at 14:50
df3 <- df1
df3[!is.na(df1)] <- df2[!is.na(df1)]
#   CellID sampl1 sampl2 sampl3
# 1      1    0.1    0.4    0.6
# 2      2    0.1    0.5    0.7
# 3      3    0.1    0.4    0.8
# 4      4    0.1   <NA>   <NA>
# 5      5   <NA>   <NA>   <NA>
# 6      6   <NA>   <NA>   <NA>

!is.na(df1) is a logical matrix marking the non-NA cells of df1; indexing both frames with it copies the values of df2 at those positions into df3, leaving the NA positions untouched.
                                                                                  

                                                                                  Source https://stackoverflow.com/questions/71873315

                                                                                  QUESTION

                                                                                  Does Hub support integrations for MinIO, AWS, and GCP? If so, how does it work?
                                                                                  Asked 2022-Mar-19 at 16:28

                                                                                  I was taking a look at Hub—the dataset format for AI—and noticed that hub integrates with GCP and AWS. I was wondering if it also supported integrations with MinIO.

                                                                                  I know that Hub allows you to directly stream datasets from cloud storage to ML workflows but I’m not sure which ML workflows it integrates with.

                                                                                  I would like to use MinIO over S3 since my team has a self-hosted MinIO instance (aka it's free).

                                                                                  ANSWER

                                                                                  Answered 2022-Mar-19 at 16:28

Hub allows you to load data from anywhere. Hub works locally and with Google Cloud, MinIO, AWS, and Activeloop storage (no servers needed!), so it lets you load data and stream datasets directly from cloud storage to ML workflows.

You can find more information about storage authentication in the Hub docs.

Hub can then stream data to PyTorch or TensorFlow with simple dataset integrations, as if the data were local, since you can connect Hub datasets to ML frameworks.
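As a rough sketch of what this looks like in practice (the bucket name, credentials, and endpoint below are placeholders, and the calls assume the Hub 2.x Python API):

import hub

# Point Hub at an S3-compatible MinIO server by overriding the S3 endpoint.
# Every value below is a placeholder for illustration.
ds = hub.load(
    "s3://my-bucket/my-dataset",
    creds={
        "aws_access_key_id": "minio-access-key",
        "aws_secret_access_key": "minio-secret-key",
        "endpoint_url": "http://localhost:9000",  # self-hosted MinIO endpoint
    },
)

# Stream the dataset straight into a PyTorch training loop.
loader = ds.pytorch(num_workers=2, batch_size=4, shuffle=True)
for batch in loader:
    pass  # training step goes here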

                                                                                  Source https://stackoverflow.com/questions/71539946

                                                                                  QUESTION

                                                                                  Custom Sampler correct use in Pytorch
                                                                                  Asked 2022-Mar-17 at 19:22

I have a map-style dataset, which is used for instance segmentation tasks. The dataset is very imbalanced, in the sense that some images have only 10 objects while others have up to 1200.

                                                                                  How can I limit the number of objects per batch?

                                                                                  A minimal reproducible example is:

                                                                                  import math
                                                                                  import torch
                                                                                  import random
                                                                                  import numpy as np
                                                                                  import pandas as pd
                                                                                  from torch.utils.data import Dataset
                                                                                  from torch.utils.data.sampler import BatchSampler
                                                                                  
                                                                                  
                                                                                  np.random.seed(0)
                                                                                  random.seed(0)
                                                                                  torch.manual_seed(0)
                                                                                  
                                                                                  
                                                                                  W = 700
                                                                                  H = 1000
                                                                                  
def collate_fn(batch) -> tuple:
    # transpose a list of (image, target, image_id) samples into
    # parallel tuples (images, targets, image_ids)
    return tuple(zip(*batch))
                                                                                  
                                                                                  class SyntheticDataset(Dataset):
                                                                                      def __init__(self, image_ids):
                                                                                          self.image_ids = torch.tensor(image_ids, dtype=torch.int64)
                                                                                          self.num_classes = 9
                                                                                  
                                                                                      def __len__(self):
                                                                                          return len(self.image_ids)
                                                                                  
                                                                                      def __getitem__(self, idx: int):
                                                                                          """
                                                                                              returns single sample
                                                                                          """
                                                                                          # print("idx: ", idx)
                                                                                  
                                                                                          # deliberately left dangling
                                                                                          # id = self.image_ids[idx].item()
                                                                                          # image_id = self.image_ids[idx]
                                                                                          image_id = torch.as_tensor(idx)
                                                                                          image = torch.randint(0, 255, (H, W))
                                                                                  
                                                                                          num_objects = random.randint(10, 1200)
                                                                                          image = torch.randint(0, 255, (3, H, W))
                                                                                          masks = torch.randint(0, 255, (num_objects, H, W))
                                                                                  
                                                                                          target = {}
                                                                                          target["image_id"] = image_id
                                                                                  
                                                                                          areas = torch.randint(100, 20000, (1, num_objects), dtype=torch.int64)
                                                                                          boxes = torch.randint(100, H * W, (num_objects, 4), dtype=torch.int64)
                                                                                          labels = torch.randint(1, self.num_classes, (1, num_objects), dtype=torch.int64)
                                                                                          iscrowd = torch.zeros(len(labels), dtype=torch.int64)
                                                                                  
                                                                                          target["boxes"] = boxes
                                                                                          target["labels"] = labels
                                                                                          target["area"] = areas
                                                                                          target["iscrowd"] = iscrowd
                                                                                          target["masks"] = masks
                                                                                  
                                                                                          return image, target, image_id
                                                                                  
                                                                                  
                                                                                  class BalancedObjectsSampler(BatchSampler):
                                                                                      """Samples either batch_size images or batches num_objs_per_batch objects.
                                                                                  
                                                                                      Args:
                                                                                          data_source (list): contains tuples of (img_id).
                                                                                          batch_size (int): batch size.
                                                                                          num_objs_per_batch (int): number of objects in a batch.
                                                                                      Return
                                                                                          yields the batch_ids/image_ids/image_indices
                                                                                  
                                                                                      """
                                                                                  
                                                                                      def __init__(self, data_source, batch_size, num_objs_per_batch, drop_last=False):
                                                                                          self.data_source = data_source
                                                                                          self.sampler = data_source
                                                                                          self.batch_size = batch_size
                                                                                          self.drop_last = drop_last
                                                                                          self.num_objs_per_batch = num_objs_per_batch
                                                                                          self.batch_count = math.ceil(len(self.data_source) / self.batch_size)
                                                                                  
                                                                                      def __iter__(self):
                                                                                  
                                                                                          obj_count = 0
                                                                                          batch = []
                                                                                          batches = []
                                                                                          counter = 0
                                                                                          for i, (k, s) in enumerate(self.data_source.iteritems()):
                                                                                              if (
                                                                                                  obj_count <= obj_count + s
                                                                                                  and len(batch) <= self.batch_size - 1
                                                                                                  and obj_count + s <= self.num_objs_per_batch
                                                                                                  and i < len(self.data_source) - 1
                                                                                              ):
                                                                                                  # because of https://pytorch.org/docs/stable/data.html#data-loading-order-and-sampler
                                                                                                  batch.append(i)
                                                                                                  obj_count += s
                                                                                              else:
                                                                                                  batches.append(batch)
                                                                                                  yield batch
                                                                                                  obj_count = 0
                                                                                                  batch = []
                                                                                              counter += 1
                                                                                  
                                                                                  
                                                                                  obj_sums = {}
                                                                                  batch_size = 10
                                                                                  workers = 4
                                                                                  fake_image_ids = np.random.randint(1600000, 1700000, 100)
                                                                                  
                                                                                  # assigning any in-range number objects count to each image
                                                                                  for i, k in enumerate(fake_image_ids):
                                                                                      obj_sums[k] = random.randint(10, 1200)
                                                                                  
                                                                                  obj_counts = pd.Series(obj_sums)
                                                                                  
                                                                                  train_dataset = SyntheticDataset(image_ids=fake_image_ids)
                                                                                  
                                                                                  balanced_sampler = BalancedObjectsSampler(
                                                                                      data_source=obj_counts,
                                                                                      batch_size=batch_size,
                                                                                      num_objs_per_batch=1500,
                                                                                      drop_last=False,
                                                                                  )
                                                                                  
                                                                                  data_loader_sampler = torch.utils.data.DataLoader(
                                                                                      train_dataset,
                                                                                      num_workers=workers,
                                                                                      collate_fn=collate_fn,
                                                                                      sampler=balanced_sampler,
                                                                                  )
                                                                                  
                                                                                  data_loader_iter = torch.utils.data.DataLoader(
                                                                                      train_dataset,
                                                                                      batch_size=batch_size,
                                                                                      shuffle=False,
                                                                                      num_workers=workers,
                                                                                      collate_fn=collate_fn,
                                                                                  )
                                                                                  
                                                                                  

                                                                                  Iterating over the balanced_sampler

                                                                                  for i, bal_batch in enumerate(balanced_sampler):
                                                                                      print(f"batch_{i}: ", bal_batch)
                                                                                  

                                                                                  yields

                                                                                  batch_0:  [0]
                                                                                  batch_1:  [2, 3]
                                                                                  batch_2:  [5]
                                                                                  batch_3:  [7]
                                                                                  batch_4:  [9, 10]
                                                                                  batch_5:  [12, 13, 14, 15]
                                                                                  batch_6:  [17, 18]
                                                                                  batch_7:  [20, 21, 22]
                                                                                  batch_8:  [24, 25]
                                                                                  batch_9:  [27]
                                                                                  batch_10:  [29]
                                                                                  batch_11:  [31]
                                                                                  batch_12:  [33]
                                                                                  batch_13:  [35, 36, 37]
                                                                                  batch_14:  [39, 40]
                                                                                  batch_15:  [42, 43]
                                                                                  batch_16:  [45, 46]
                                                                                  batch_17:  [48, 49, 50]
                                                                                  batch_18:  [52, 53, 54]
                                                                                  batch_19:  [56]
                                                                                  batch_20:  [58, 59]
                                                                                  batch_21:  [61, 62]
                                                                                  batch_22:  [64]
                                                                                  batch_23:  [66]
                                                                                  batch_24:  [68]
                                                                                  batch_25:  [70, 71]
                                                                                  batch_26:  [73]
                                                                                  batch_27:  [75, 76, 77]
                                                                                  batch_28:  [79, 80]
                                                                                  batch_29:  [82, 83, 84, 85, 86, 87]
                                                                                  batch_30:  [89]
                                                                                  batch_31:  [91]
                                                                                  batch_32:  [93, 94]
                                                                                  batch_33:  [96]
                                                                                  batch_34:  [98]
                                                                                  

The values displayed above are image indices, but they could just as well be batch indices or even image ids.

                                                                                  By running

                                                                                  for i, batch in enumerate(data_loader_sampler):
                                                                                      print("__sample__: ", i, len(batch[0]))
                                                                                  

one sees that each batch contains a single sample instead of the expected number:

                                                                                  __sample__:  0 1
                                                                                  __sample__:  1 1
                                                                                  __sample__:  2 1
                                                                                  __sample__:  3 1
                                                                                  __sample__:  4 1
                                                                                  __sample__:  5 1
                                                                                  __sample__:  6 1
                                                                                  __sample__:  7 1
                                                                                  __sample__:  8 1
                                                                                  __sample__:  9 1
                                                                                  __sample__:  10 1
                                                                                  __sample__:  11 1
                                                                                  __sample__:  12 1
                                                                                  __sample__:  13 1
                                                                                  __sample__:  14 1
                                                                                  __sample__:  15 1
                                                                                  __sample__:  16 1
                                                                                  __sample__:  17 1
                                                                                  __sample__:  18 1
                                                                                  __sample__:  19 1
                                                                                  __sample__:  20 1
                                                                                  __sample__:  21 1
                                                                                  __sample__:  22 1
                                                                                  __sample__:  23 1
                                                                                  __sample__:  24 1
                                                                                  __sample__:  25 1
                                                                                  __sample__:  26 1
                                                                                  __sample__:  27 1
                                                                                  __sample__:  28 1
                                                                                  __sample__:  29 1
                                                                                  __sample__:  30 1
                                                                                  __sample__:  31 1
                                                                                  __sample__:  32 1
                                                                                  __sample__:  33 1
                                                                                  __sample__:  34 1
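
A plausible explanation, not spelled out in the thread: DataLoader's sampler argument expects one dataset index per yield, so each index list produced by the BatchSampler is wrapped into a batch of size 1. PyTorch normally takes a batch sampler through the batch_sampler argument instead, which treats every yielded list as one batch of indices. A minimal sketch of that variant, reusing the setup above:

data_loader_sampler = torch.utils.data.DataLoader(
    train_dataset,
    num_workers=workers,
    collate_fn=collate_fn,
    # batch_sampler (rather than sampler) hands each yielded index list
    # to the loader as one batch; it is mutually exclusive with
    # batch_size, shuffle, sampler, and drop_last
    batch_sampler=balanced_sampler,
)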
                                                                                  

                                                                                  What I am really trying to prevent is the following behavior that arises from

                                                                                  for i, batch in enumerate(data_loader_iter):
                                                                                      print("__iter__: ", i, sum([k["masks"].shape[0] for k in batch[1]]))
                                                                                  

                                                                                  which is

                                                                                  __iter__:  0 2510
                                                                                  __iter__:  1 2060
                                                                                  __iter__:  2 2203
                                                                                  __iter__:  3 2815
                                                                                  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
                                                                                  Traceback (most recent call last):
                                                                                    File "/usr/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
                                                                                      obj = _ForkingPickler.dumps(obj)
                                                                                    File "/usr/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
                                                                                      cls(buf, protocol).dump(obj)
                                                                                    File "/blip/venv/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 328, in reduce_storage
                                                                                      fd, size = storage._share_fd_()
                                                                                  RuntimeError: falseINTERNAL ASSERT FAILED at "../aten/src/ATen/MapAllocator.cpp":300, please report a bug to PyTorch. unable to write to file 
                                                                                  Traceback (most recent call last):
                                                                                    File "/blip/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 990, in _try_get_data
                                                                                      data = self._data_queue.get(timeout=timeout)
                                                                                    File "/usr/lib/python3.8/multiprocessing/queues.py", line 107, in get
                                                                                      if not self._poll(timeout):
                                                                                    File "/usr/lib/python3.8/multiprocessing/connection.py", line 257, in poll
                                                                                      return self._poll(timeout)
                                                                                    File "/usr/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
                                                                                      r = wait([self], timeout)
                                                                                    File "/usr/lib/python3.8/multiprocessing/connection.py", line 931, in wait
                                                                                      ready = selector.select(timeout)
                                                                                    File "/usr/lib/python3.8/selectors.py", line 415, in select
                                                                                      fd_event_list = self._selector.poll(timeout)
                                                                                    File "/blip/venv/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
                                                                                      _error_if_any_worker_fails()
                                                                                  RuntimeError: DataLoader worker (pid 431257) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
                                                                                  
                                                                                  The above exception was the direct cause of the following exception:
                                                                                  
                                                                                  Traceback (most recent call last):
                                                                                    File "so.py", line 170, in 
                                                                                      for i, batch in enumerate(data_loader_iter):
                                                                                    File "/blip/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
                                                                                      data = self._next_data()
                                                                                    File "/blip/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
                                                                                      idx, data = self._get_data()
                                                                                    File "/blip/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1152, in _get_data
                                                                                      success, data = self._try_get_data()
                                                                                    File "/blip/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1003, in _try_get_data
                                                                                      raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
                                                                                  RuntimeError: DataLoader worker (pid(s) 431257) exited unexpectedly
                                                                                  
                                                                                  

                                                                                  which invariably happens when the number of objects per batch is greater than ~2500.

An immediate workaround would be to set the batch_size low; I just need a more optimal solution.

                                                                                  ANSWER

                                                                                  Answered 2022-Mar-17 at 19:22

                                                                                  If what you are trying to solve really is:

                                                                                  ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
                                                                                  

You could try resizing the allocated shared memory (run as root, replacing <size> with the desired size in gigabytes; df -h /dev/shm shows the current allocation):

# mount -o remount,size=<size>G /dev/shm
                                                                                  

                                                                                  However, as this is not always possible, one fix to your problem would be

import gc  # used by the gc.collect() calls below

class SyntheticDataset(Dataset):
                                                                                  
                                                                                      def __init__(self, image_ids):
                                                                                          self.image_ids = torch.tensor(image_ids, dtype=torch.int64)
                                                                                          self.num_classes = 9
                                                                                  
                                                                                      def __len__(self):
                                                                                          return len(self.image_ids)
                                                                                  
                                                                                      def __getitem__(self, indices):
                                                                                          worker_info = torch.utils.data.get_worker_info()
                                                                                  
                                                                                          batch = []
                                                                                          for i in indices:
                                                                                              sample = self.get_sample(i)
                                                                                              batch.append(sample)
                                                                                          gc.collect()
                                                                                          return batch
                                                                                  
                                                                                      def get_sample(self, idx: int):
                                                                                  
                                                                                          image_id = torch.as_tensor(idx)
                                                                                          image = torch.randint(0, 255, (H, W))
                                                                                  
                                                                                          num_objects = idx
                                                                                          image = torch.randint(0, 255, (3, H, W))
                                                                                          masks = torch.randint(0, 255, (num_objects, H, W))
                                                                                  
                                                                                          target = {}
                                                                                          target["image_id"] = image_id
                                                                                  
                                                                                          areas = torch.randint(100, 20000, (1, num_objects), dtype=torch.int64)
                                                                                          boxes = torch.randint(100, H * W, (num_objects, 4), dtype=torch.int64)
                                                                                          labels = torch.randint(1, self.num_classes, (1, num_objects), dtype=torch.int64)
                                                                                          iscrowd = torch.zeros(len(labels), dtype=torch.int64)
                                                                                  
                                                                                          target["boxes"] = boxes
                                                                                          target["labels"] = labels
                                                                                          target["area"] = areas
                                                                                          target["iscrowd"] = iscrowd
                                                                                          target["masks"] = masks
                                                                                  
                                                                                          return image, target, image_id
                                                                                  
                                                                                  

                                                                                  and

                                                                                  class BalancedObjectsSampler(BatchSampler):
                                                                                      """Samples either batch_size images or batches num_objs_per_batch objects.
                                                                                  
                                                                                      Args:
                                                                                          data_source (list): contains tuples of (img_id).
                                                                                          batch_size (int): batch size.
                                                                                          num_objs_per_batch (int): number of objects in a batch.
                                                                                      Return
                                                                                          yields the batch_ids/image_ids/image_indices
                                                                                  
                                                                                      """
                                                                                  
                                                                                      def __init__(self, data_source, batch_size, num_objs_per_batch, drop_last=False):
                                                                                          self.data_source = data_source
                                                                                          self.sampler = data_source
                                                                                          self.batch_size = batch_size
                                                                                          self.drop_last = drop_last
                                                                                          self.num_objs_per_batch = num_objs_per_batch
                                                                                          self.batch_count = math.ceil(len(self.data_source) / self.batch_size)
                                                                                  
                                                                                          obj_count = 0
                                                                                          batch = []
                                                                                          batches = []
                                                                                          batches_sums = []
        for i, (k, s) in enumerate(self.data_source.items()):  # k: image id, s: its object count
                                                                                  
                                                                                              if (
                                                                                                  len(batch) < self.batch_size
                                                                                                  and obj_count + s < self.num_objs_per_batch
                                                                                                  and i < len(self.data_source) - 1
                                                                                              ):
                                                                                                  batch.append(s)
                                                                                                  obj_count += s
                                                                                              else:
                                                                                                  batches.append(len(batch))
                                                                                                  batches_sums.append(obj_count)
                                                                                                  obj_count = 0
                                                                                                  batch = []
                                                                                  
                                                                                          self.batches = batches
                                                                                          self.batch_count = len(batches)
                                                                                  
                                                                                      def __iter__(self):
                                                                                          batch = []
                                                                                          img_counts_id = 0
        for idx, (k, s) in enumerate(self.data_source.items()):
            if len(batch) < self.batches[img_counts_id] and idx < len(self.data_source):
                batch.append(k)  # collect image ids until the precomputed batch size is reached
                                                                                              elif len(batch) == self.batches[img_counts_id]:
                                                                                                  gc.collect()
                                                                                                  yield batch
                                                                                                  batch = []
                                                                                                  if img_counts_id < self.batch_count - 1:
                                                                                                      img_counts_id += 1
                                                                                                  else:
                                                                                                      break
                                                                                  
                                                                                          if len(batch) > 0 and not self.drop_last:
                                                                                              yield batch
                                                                                  
    def __len__(self) -> int:
        # __iter__ yields the batches precomputed in __init__, so report that
        # count rather than deriving it from batch_size alone
        return self.batch_count
                                                                                  

Since SyntheticDataset's __getitem__ receives a list of indices, the simplest solution is to iterate over those indices and retrieve a list of samples. You may just have to collate the output differently in order to feed it to your model.
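
A minimal sketch of that pattern (assuming the per-index loading logic is factored into a hypothetical helper _load_sample):

def __getitem__(self, indices):
    # indices is a whole batch of indices, as yielded by the batch sampler
    return [self._load_sample(i) for i in indices]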

For the BalancedObjectsSampler, I calculated the size of each batch within __init__ and used it in __iter__ to assemble the batches.

NOTE: This will still fail if your num_workers > 0, because you are trying to pack at most 1500 objects into a batch and one worker usually loads one batch at a time. Hence, you will have to reassess num_objs_per_batch when considering multiprocessing.
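
For reference, a minimal sketch of wiring the sampler into a DataLoader (the dataset instance and the per-image object counts obj_counts are assumptions here; batch_size=None disables PyTorch's automatic batching, so each list yielded by the sampler reaches __getitem__ intact):

from torch.utils.data import DataLoader

sampler = BalancedObjectsSampler(obj_counts, batch_size=8, num_objs_per_batch=1500)
loader = DataLoader(dataset, sampler=sampler, batch_size=None)

for batch in loader:
    ...  # each batch is whatever __getitem__ returned for the list of indices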

                                                                                  Source https://stackoverflow.com/questions/71500629

                                                                                  QUESTION

                                                                                  C++ what is the best sorting container and approach for large datasets (millions of lines)
                                                                                  Asked 2022-Mar-08 at 11:24

I'm tackling an exercise which is supposed to benchmark exactly the time complexity of such code.

The data I'm handling is made up of pairs of strings like this: hbFvMF,PZLmRb. Each string is present twice in the dataset, once in position 1 and once in position 2, so the first string would point to zvEcqe,hbFvMF, for example, and the list goes on...

                                                                                  example dataset of 50k pairs

I've been able to produce code which has no real trouble sorting these datasets up to 50k pairs, where it takes about 4-5 minutes; 10k pairs get sorted in a matter of seconds.

The problem is that my code is supposed to handle datasets of up to 5 million pairs, so I'm trying to see what more I can do. I will post my two best attempts: the initial one with vectors, which I thought I could upgrade by replacing vector with unordered_map because of its better lookup complexity, but to my surprise there was almost no difference between the two containers when I tested it. I'm not sure whether my approach to the problem or the containers I'm choosing are causing the steep sorting times...

                                                                                  Attempt with vectors:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <utility>
#include <unordered_map>
                                                                                  
                                                                                  using namespace std;
                                                                                  
                                                                                  
void search_bricks_backwards(string resume, vector<pair<string, string>>& vec, vector<string>& vec2) {
                                                                                      int index = 0;
                                                                                      int temp_index = 0;
                                                                                      while (true) {
                                                                                          if (index == vec.size()) {
                                                                                              vec2.insert(vec2.begin(), vec[temp_index].first); 
                                                                                              cout << "end of backward search, exitting..." << endl;
                                                                                              break;
                                                                                  
                                                                                  
                                                                                          }
                                                                                          
                                                                                          if (vec[index].second == resume) {
                                                                                              vec2.insert(vec2.begin(), resume);
                                                                                              
                                                                                  
                                                                                              resume = vec[index].first;
                                                                                              //vec.erase(vec.begin() + index);
                                                                                              temp_index = index;
                                                                                  
                                                                                              index = 0;
                                                                                          }
                                                                                          
                                                                                          index++;
                                                                                      }
                                                                                  
                                                                                  }
                                                                                  
                                                                                  
void search_bricks(string start, vector<pair<string, string>>& vec, vector<string>& vec2) {
                                                                                      int index = 0;
                                                                                      int temp_index = 0;
                                                                                      while (true) {
                                                                                          //cout << "iteration " << index << endl;
                                                                                          if (index == vec.size()) {
                                                                                              vec2.push_back(vec[temp_index].second);
                                                                                              
                                                                                              cout << "all forward bricks sorted" << endl;
                                                                                              break;
                                                                                  
                                                                                  
                                                                                          }
                                                                                          if (vec[index].first == start) {
                                                                                              vec2.push_back(vec[index].first);
                                                                                              
                                                                                              
                                                                                              start = vec[index].second;
                                                                                              //vec.erase(vec.begin() + index);
                                                                                              temp_index = index;
                                                                                              index = 0;
                                                                                              
                                                                                          }
                                                                                          
                                                                                          index++;
                                                                                      }
                                                                                  
                                                                                      search_bricks_backwards(vec[0].first, vec, vec2);
                                                                                  
                                                                                  }
                                                                                  
void search_bricks_recursion(string start, vector<pair<string, string>>& vec, vector<string>& vec2) {
                                                                                      int index = 0;
                                                                                      for (const auto& pair : vec) {
                                                                                          //cout << "iteration " << index << endl;
                                                                                          if (pair.first == start) {
                                                                                              vec2.push_back(start);
                                                                                              cout << "found " << start << " and " << pair.first << endl;
                                                                                              search_bricks(pair.second, vec, vec2);
                                                                                          }
                                                                                          if (index + 1 == vec.size()) {
                                                                                              search_bricks_backwards(start, vec, vec2);
                                                                                              
                                                                                  
                                                                                          }
                                                                                          index++;
                                                                                      }
                                                                                      
                                                                                  }
                                                                                  
template <typename T>
void printVectorElements(vector<T>& vec)
                                                                                  {
                                                                                      for (auto i = 0; i < vec.size(); ++i) {
                                                                                          cout << "(" << vec.at(i).first << ","
                                                                                              << vec.at(i).second << ")" << " ";
                                                                                      }
                                                                                      cout << endl;
                                                                                  }
                                                                                  
vector<string> split(string s, string delimiter) {
    size_t pos_start = 0, pos_end, delim_len = delimiter.length();
    string token;
    vector<string> res;
                                                                                  
                                                                                      while ((pos_end = s.find(delimiter, pos_start)) != string::npos) {
                                                                                          token = s.substr(pos_start, pos_end - pos_start);
                                                                                          pos_start = pos_end + delim_len;
                                                                                          res.push_back(token);
                                                                                      }
                                                                                  
                                                                                      res.push_back(s.substr(pos_start));
                                                                                      return res;
                                                                                  }
                                                                                  
unordered_map<string, string> brick_to_map(string const& s)
{
    unordered_map<string, string> m;
                                                                                  
                                                                                      string key, val;
                                                                                      istringstream iss(s);
                                                                                  
                                                                                      while (getline(getline(iss, key, ',') >> ws, val))
                                                                                          m[key] = val;
                                                                                  
                                                                                      return m;
                                                                                  }
                                                                                  
                                                                                  int main()
                                                                                  {
    vector<pair<string, string>> bricks;

    vector<string> sorted_bricks;
                                                                                  
                                                                                      ifstream inFile;
                                                                                      inFile.open("input-pairs-50K.txt"); //open the input file
                                                                                  
                                                                                      stringstream strStream;
                                                                                      strStream << inFile.rdbuf(); //read the file
                                                                                      string str = strStream.str(); //str holds the content of the file
                                                                                  
                                                                                      //cout << str << endl;
                                                                                      
                                                                                      istringstream iss(str);
                                                                                      
                                                                                      for (string line; getline(iss, line); )
                                                                                      {
                                                                                       
                                                                                          string delimiter = ",";
                                                                                          string s = line;
        vector<string> v = split(s, delimiter);
                                                                                          string s1 = v.at(0);
                                                                                          string s2 = v.at(1);
                                                                                          
                                                                                  
                                                                                          bricks.push_back(make_pair(s1, s2));
                                                                                      }
                                                                                  
                                                                                     
                                                                                      search_bricks(bricks[0].second, bricks, sorted_bricks);
                                                                                      
                                                                                      
                                                                                      //display the results
                                                                                      for (auto i = sorted_bricks.begin(); i != sorted_bricks.end(); ++i)
                                                                                          cout << *i << " ";
                                                                                  
                                                                                  
                                                                                      
                                                                                   
                                                                                  }
                                                                                  

Attempt with unordered_map:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <utility>
#include <algorithm>
#include <unordered_map>
                                                                                  
                                                                                  using namespace std;
                                                                                  
                                                                                  
void search_bricks_backwards(string resume, unordered_map<string, string> brick_map, vector<string>& vec2) {

    typedef unordered_map<string, string>::value_type map_value_type;
                                                                                      while (true) {
                                                                                  
        unordered_map<string, string>::const_iterator got = find_if(brick_map.begin(), brick_map.end(), [&resume](const map_value_type& vt)
                                                                                              { return vt.second == resume; }
                                                                                          );
                                                                                          if (got == brick_map.end()) {
                                                                                              vec2.insert(vec2.begin(), resume); 
                                                                                              cout << "end of backward search, exitting..." << endl;
                                                                                              break;
                                                                                  
                                                                                  
                                                                                          }
                                                                                          //cout << "iteration " << index << endl;
                                                                                          else if (got->second == resume) {
                                                                                              vec2.insert(vec2.begin(), resume);
                                                                                  
                                                                                              
                                                                                              resume = got->first;
                                                                                          
                                                                                          }
                                                                                  
                                                                                         
                                                                                      }
                                                                                  
                                                                                  }
                                                                                  
                                                                                  
void search_bricks(string start, unordered_map<string, string> brick_map, vector<string>& vec2) {

    typedef unordered_map<string, string>::value_type map_value_type;
                                                                                      while (true) {
                                                                                          
                                                                                  
        unordered_map<string, string>::const_iterator got = find_if(brick_map.begin(), brick_map.end(), [&start](const map_value_type& vt)
                                                                                              { return vt.first == start; }
                                                                                          );
                                                                                          if (got == brick_map.end()) {
                                                                                              vec2.push_back(start);
                                                                                  
                                                                                              cout << "all forward bricks sorted" << endl;
                                                                                              
                                                                                              break;
                                                                                          }
                                                                                          else if (got->first == start) {
                                                                                              vec2.push_back(start);
                                                                                  
                                                                                              //cout << "found " << start << " and " << vec[index].first << endl;
                                                                                              start = got->second;
                                                                                              
                                                                                          }
                                                                                      }
    auto it = brick_map.begin();
    search_bricks_backwards(it->first, brick_map, vec2);
}
                                                                                  
                                                                                  
template <typename T>
void printVectorElements(vector<T>& vec)
                                                                                  {
                                                                                      for (auto i = 0; i < vec.size(); ++i) {
                                                                                          cout << "(" << vec.at(i).first << ","
                                                                                              << vec.at(i).second << ")" << " ";
                                                                                      }
                                                                                      cout << endl;
                                                                                  }
                                                                                  
vector<string> split(string s, string delimiter) {
    size_t pos_start = 0, pos_end, delim_len = delimiter.length();
    string token;
    vector<string> res;
                                                                                  
                                                                                      while ((pos_end = s.find(delimiter, pos_start)) != string::npos) {
                                                                                          token = s.substr(pos_start, pos_end - pos_start);
                                                                                          pos_start = pos_end + delim_len;
                                                                                          res.push_back(token);
                                                                                      }
                                                                                  
                                                                                      res.push_back(s.substr(pos_start));
                                                                                      return res;
                                                                                  }
                                                                                  
                                                                                  
                                                                                  int main()
                                                                                  {
    unordered_map<string, string> bricks;

    vector<string> sorted_bricks;
                                                                                  
                                                                                      ifstream inFile;
                                                                                      inFile.open("input-pairs-50K.txt"); //open the input file
                                                                                  
                                                                                      for (string line; getline(inFile, line); )
                                                                                      {
                                                                                  
                                                                                          string delimiter = ",";
                                                                                          string s = line;
        vector<string> v = split(s, delimiter);
                                                                                          string s1 = v.at(0);
                                                                                          string s2 = v.at(1);
                                                                                  
                                                                                  
                                                                                          bricks.insert(make_pair(s1, s2));
                                                                                      }
                                                                                  
                                                                                  
                                                                                      /*for (auto& x : bricks)
                                                                                          std::cout << x.first << "," << x.second << " ";*/
                                                                                  
                                                                                  
                                                                                      auto it = bricks.begin();
                                                                                      search_bricks(it->second, bricks, sorted_bricks);
                                                                                  
                                                                                  
                                                                                      // display results
                                                                                      for (auto i = sorted_bricks.begin(); i != sorted_bricks.end(); ++i)
                                                                                          cout << *i << " ";
                                                                                  
                                                                                  
                                                                                  
                                                                                  
                                                                                  }
                                                                                  

I'm looking to improve the time complexity of my code so that it can process the data more efficiently. If anyone can suggest improvements to my code or to my choice of containers, I'd be very thankful.

                                                                                  ANSWER

                                                                                  Answered 2022-Feb-22 at 07:13

You can use a trie data structure; here's a paper that explains an algorithm for sorting large sets of strings with tries: https://people.eng.unimelb.edu.au/jzobel/fulltext/acsc03sz.pdf

But you have to implement the trie from scratch, because as far as I know there is no trie implementation in the C++ standard library.
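
To illustrate the data structure itself (the question is about C++, but for brevity here is a minimal sketch in Python; the names are our own, and this is not the cache-conscious variant from the paper):

class TrieNode:
    def __init__(self):
        self.children = {}   # maps a character to the next TrieNode
        self.is_end = False  # marks the end of an inserted string

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end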

                                                                                  Source https://stackoverflow.com/questions/71215478

                                                                                  QUESTION

                                                                                  How to create a dataset for tensorflow from a txt file containing paths and labels?
                                                                                  Asked 2022-Feb-09 at 08:09

I'm trying to load the DomainNet dataset into a tensorflow dataset. Each of the domains contains two .txt files for the training and test data respectively, which are structured as follows:

                                                                                  painting/aircraft_carrier/painting_001_000106.jpg 0
                                                                                  painting/aircraft_carrier/painting_001_000060.jpg 0
                                                                                  painting/aircraft_carrier/painting_001_000130.jpg 0
                                                                                  painting/aircraft_carrier/painting_001_000058.jpg 0
                                                                                  painting/aircraft_carrier/painting_001_000093.jpg 0
                                                                                  painting/aircraft_carrier/painting_001_000107.jpg 0
                                                                                  painting/aircraft_carrier/painting_001_000088.jpg 0
                                                                                  painting/aircraft_carrier/painting_001_000014.jpg 0
                                                                                  painting/aircraft_carrier/painting_001_000013.jpg 0
                                                                                  ...
                                                                                  

That is one line per image, containing a relative path and a label. My question is whether there is already some built-in way in tensorflow/keras to load this kind of structure, or whether I have to parse and load the data manually. So far my google-fu has let me down...

                                                                                  ANSWER

                                                                                  Answered 2022-Feb-09 at 08:09

                                                                                  You can use tf.data.TextLineDataset to load and process multiple txt files at a time:

                                                                                  import tensorflow as tf
                                                                                  import matplotlib.pyplot as plt
                                                                                  
                                                                                  with open('data.txt', 'w') as f:
                                                                                    f.write('/content/result_image1.png 0\n')
                                                                                    f.write('/content/result_image2.png 1\n')
                                                                                  
                                                                                  with open('more_data.txt', 'w') as f:
                                                                                    f.write('/content/result_image1.png 1\n')
                                                                                    f.write('/content/result_image2.png 0\n')
                                                                                  
                                                                                  dataset = tf.data.TextLineDataset(['/content/data.txt', '/content/more_data.txt'])
                                                                                  for element in dataset.as_numpy_iterator():
                                                                                    print(element)
                                                                                  
                                                                                  b'/content/result_image1.png 0'
                                                                                  b'/content/result_image2.png 1'
                                                                                  b'/content/result_image1.png 1'
                                                                                  b'/content/result_image2.png 0'
                                                                                  

                                                                                  Process data:

                                                                                  def process(x):
                                                                                    splits = tf.strings.split(x, sep=' ')
                                                                                    image_path, label = splits[0], splits[1]
                                                                                    img = tf.io.read_file(image_path)
                                                                                    img = tf.io.decode_png(img, channels=3)
                                                                                    return  img, tf.strings.to_number(label, out_type=tf.int32)
                                                                                  
                                                                                  dataset = dataset.map(process)
                                                                                  for x, y in dataset.take(1):
                                                                                    print('Label -->', y)
                                                                                    plt.imshow(x.numpy())
                                                                                  
                                                                                  Label --> tf.Tensor(0, shape=(), dtype=int32)
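
From there the usual tf.data steps apply; a minimal sketch (the buffer and batch sizes are arbitrary placeholders, `model` is assumed to be a compiled Keras model, and batching requires the decoded images to share a shape):

dataset = dataset.shuffle(buffer_size=100).batch(32).prefetch(tf.data.AUTOTUNE)
# model.fit(dataset, epochs=5)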
                                                                                  

                                                                                  Source https://stackoverflow.com/questions/71045309

                                                                                  QUESTION

Converting 0-1 values in a dataset to the name of the column if the value of the cell is 1
                                                                                  Asked 2022-Feb-02 at 07:02

I have a csv dataset with 0-1 values for the features of the elements. I want to iterate over each cell and replace the value 1 with the name of its column. There are more than 500 thousand rows and 200 columns, and because the table is exported from another annotation tool which I update often, I want to find a way to do this automatically in Python. This is not the real table, but a sample I was using while trying to write the code. I tried some approaches, but without success. I would really appreciate it if you could share your knowledge with me; it would be a huge help. The final result I want is of the type (abonojnë, token_pos_verb). If you know any method to do this in Excel without the help of Python, that would be even better. Thank you, Brikena

                                                                                  Text,Comment,Role,ParentID,doc_completeness,lemma,MultiWord_Expr,token,pos,punctuation,verb,noun,adjective
                                                                                  abonojnë,,,,,,,1,1,0,1,0,0
                                                                                  çokasin,,,,,,,1,1,0,1,0,1
                                                                                  gërgasin,,,,,,,1,1,0,1,0,0
                                                                                  godasin,,,,,,,1,1,0,1,0,0
                                                                                  përkasin,,,,,,,1,1,1,1,0,0
                                                                                  përdjegin,,,,,,,1,1,0,1,0,0
                                                                                  lakadredhin,,,,,,,1,1,0,1,1,0
                                                                                  përdredhin,,,,,,,1,1,0,1,0,0
                                                                                  spërdredhin,,,,,,,1,1,0,1,0,0
                                                                                  përmbledhin,,,,,,,1,1,0,1,0,0
                                                                                  shpërdredhin,,,,,,,1,1,0,1,0,0
                                                                                  arsejnë,,,,,,,1,1,0,1,1,0
                                                                                  çapëlejnë,,,,,,,1,1,0,1,0,0
                                                                                  

                                                                                  ANSWER

                                                                                  Answered 2022-Jan-31 at 10:08

                                                                                  Using pandas, this is quite easy:

                                                                                  # pip install pandas
                                                                                  import pandas as pd
                                                                                  
                                                                                  # read data (here example with csv, but use "read_excel" for excel)
                                                                                  df = pd.read_csv('input.csv').set_index('Text')
                                                                                  
                                                                                  # reshape and export
                                                                                  (df.mul(df.columns).where(df.eq(1))
                                                                                     .stack().rename('xxx')
                                                                                     .groupby(level=0).apply('_'.join)
                                                                                  ).to_csv('output.csv') # here use "to_excel" for excel format
                                                                                  

                                                                                  output file:

                                                                                  Text,xxx
                                                                                  abonojnë,token_pos_verb
                                                                                  arsejnë,token_pos_verb_noun
                                                                                  godasin,token_pos_verb
                                                                                  gërgasin,token_pos_verb
                                                                                  lakadredhin,token_pos_verb_noun
                                                                                  përdjegin,token_pos_verb
                                                                                  përdredhin,token_pos_verb
                                                                                  përkasin,token_pos_punctuation_verb
                                                                                  përmbledhin,token_pos_verb
                                                                                  shpërdredhin,token_pos_verb
                                                                                  spërdredhin,token_pos_verb
                                                                                  çapëlejnë,token_pos_verb
                                                                                  çokasin,token_pos_verb_adjective
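
To see why the chain works, here is each step on a tiny hypothetical frame (shortened to two columns; names are our own):

import pandas as pd

df = pd.DataFrame({'token': [1, 1], 'verb': [1, 0]},
                  index=pd.Index(['abonojnë', 'çokasin'], name='Text'))

step1 = df.mul(df.columns)     # 1 -> column name, 0 -> empty string
step2 = step1.where(df.eq(1))  # keep names where the value was 1, NaN elsewhere
step3 = step2.stack()          # long format; NaN entries are dropped
result = step3.groupby(level=0).apply('_'.join)
print(result)
# abonojnë    token_verb
# çokasin          token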
                                                                                  

                                                                                  Source https://stackoverflow.com/questions/70923533

                                                                                  QUESTION

How can I get the person class and segmentation from the MSCOCO dataset?
                                                                                  Asked 2022-Jan-06 at 05:04

I want to download only the person class and binary segmentation from the COCO dataset. How can I do it?

                                                                                  ANSWER

                                                                                  Answered 2022-Jan-06 at 05:04

                                                                                  Use pycocotools.

                                                                                • import the libraries
                                                                                • load the COCO annotation JSON file
                                                                                • get the category ID of the person class
                                                                                • get the annotation IDs for a single image, then load the annotations themselves (getAnnIds returns only IDs, so pass them to loadAnns before building masks)
                                                                                • each person has its own annotation, so merge all the per-person masks into one binary mask and save it

                                                                                  import os
                                                                                  from pycocotools.coco import COCO
                                                                                  from PIL import Image

                                                                                  coco = COCO('/home/office/cocoDataset/annotations/instances_train2017.json')
                                                                                  category_ids = coco.getCatIds(catNms=['person'])

                                                                                  # image_id is assumed to come from coco.getImgIds(catIds=category_ids)
                                                                                  annotation_ids = coco.getAnnIds(imgIds=image_id, catIds=category_ids, iscrowd=False)
                                                                                  annotations = coco.loadAnns(annotation_ids)

                                                                                  if annotations:
                                                                                    mask = coco.annToMask(annotations[0])
                                                                                    for annotation in annotations[1:]:
                                                                                      mask |= coco.annToMask(annotation)
                                                                                    mask = mask * 255
                                                                                    im = Image.fromarray(mask)
                                                                                    im.save(os.path.expanduser('~/mask_name.png'))  # expanduser so PIL gets a real path
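
                                                                                  To build a mask for every image that contains a person, rather than a single image_id, you can loop over coco.getImgIds. A minimal self-contained sketch; getImgIds and loadAnns are standard pycocotools calls, while the person_masks output folder is a made-up name:

                                                                                  import os
                                                                                  from pycocotools.coco import COCO
                                                                                  from PIL import Image

                                                                                  coco = COCO('/home/office/cocoDataset/annotations/instances_train2017.json')
                                                                                  category_ids = coco.getCatIds(catNms=['person'])

                                                                                  os.makedirs('person_masks', exist_ok=True)  # hypothetical output folder
                                                                                  for image_id in coco.getImgIds(catIds=category_ids):
                                                                                    annotations = coco.loadAnns(coco.getAnnIds(imgIds=image_id, catIds=category_ids, iscrowd=False))
                                                                                    if not annotations:
                                                                                      continue
                                                                                    # merge the per-person masks of this image into one binary mask
                                                                                    mask = coco.annToMask(annotations[0])
                                                                                    for annotation in annotations[1:]:
                                                                                      mask |= coco.annToMask(annotation)
                                                                                    Image.fromarray(mask * 255).save(os.path.join('person_masks', '%d.png' % image_id))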
                                                                                  

                                                                                  Source https://stackoverflow.com/questions/70531408

                                                                                  QUESTION

                                                                                  R - If column contains a string from vector, append flag into another column
                                                                                  Asked 2021-Dec-16 at 23:33
                                                                                  My Data

                                                                                  I have a vector of words like the one below. This is an oversimplification; my real vector is over 600 words:

                                                                                  myvec <- c("cat", "dog, "bird")
                                                                                  

                                                                                  I have a dataframe with the below structure:

                                                                                  structure(list(id = c(1, 2, 3), onetext= c("cat furry pink british", 
                                                                                  "dog cat fight", "bird cat issues"), cop= c("Little Grey Cat is the nickname given to a kitten of the British Shorthair breed that rose to viral fame on Tumblr through a variety of musical tributes and photoshopped parodies in late September 2014", 
                                                                                  "Dogs have soft fur and tails so do cats Do cats like to chase their tails", 
                                                                                  "A cat and bird can coexist in a home but you will have to take certain measures to ensure that a cat cannot physically get to the bird at any point"
                                                                                  ), text3 = c("On October 4th the first single topic blog devoted to the little grey cat was launched On October 20th Tumblr blogger Torridgristle shared a cutout exploitable image of the cat, which accumulated over 21000 notes in just over three months.", 
                                                                                  "there are many fights going on and this is just an example text", 
                                                                                  "Some cats will not care about a pet bird at all while others will make it its life mission to get at a bird You will need to assess the personalities of your pets and always remain on guard if you allow your bird and cat to interact"
                                                                                  )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                                                                  -3L))
                                                                                  

                                                                                  It looks like the picture below (screenshot not reproduced here).

                                                                                  My issue

                                                                                  For each keyword in my vector myvec, I need to go over the dataset and check the columns onetext, cop, and text3; if I find the keyword in any of those three columns, I need to append the keyword to a new column. The result would look like the image that follows (screenshot not reproduced here):

                                                                                  My original dataset is quite large (the last column is the longest), so doing multiple nested loops (which is what I tried) is not ideal.

                                                                                  EDIT: Note that as long as a word appears at least once in a row, it should be listed, and all matching keywords should be listed.

                                                                                  How could I do this? I'm using tidyverse, so my dataset is actually a tibble.


                                                                                  ANSWER

                                                                                  Answered 2021-Dec-16 at 23:33

                                                                                  Update: if a list is preferred, use str_extract_all:

                                                                                  df %>%  
                                                                                    transmute(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) 
                                                                                  

                                                                                  This gives a tibble with three list-columns (new_colonetext, new_colcop, new_coltext3), one row per input row; each cell holds the keywords matched in that column.

                                                                                  Here is how you could achieve the result:

                                                                                  1. create a regex pattern from the vector
                                                                                  2. use mutate with across to check the needed columns
                                                                                  3. if the desired string is detected, extract it into a new column

                                                                                  myvec <- c("cat", "dog", "bird")

                                                                                  pattern <- paste(myvec, collapse="|")

                                                                                  library(dplyr)
                                                                                  library(tidyr)
                                                                                  library(stringr)   # provides str_detect and str_extract_all

                                                                                  df %>%
                                                                                    mutate(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) %>%
                                                                                    unite(topic, starts_with('new'), na.rm = TRUE, sep = ',')
                                                                                  
                                                                                      id onetext                cop                                                                        text3                                                                              topic                                     
                                                                                                                                                                                                                                                                                                
                                                                                  1     1 cat furry pink british Little Grey Cat is the nickname given to a kitten of the British Shorthai~ On October 4th the first single topic blog devoted to the little grey cat was lau~ "cat,NULL,c(\"cat\", \"cat\")"            
                                                                                  2     2 dog cat fight          Dogs have soft fur and tails so do cats Do cats like to chase their tails  there are many fights going on and this is just an example text                    "c(\"dog\", \"cat\"),c(\"cat\", \"cat\"),~
                                                                                  3     3 bird cat issues        A cat and bird can coexist in a home but you will have to take certain me~ Some cats will not care about a pet bird at all while others will make it its lif~ "c(\"bird\", \"cat\"),c(\"cat\", \"bird\"~                                                                                    
                                                                                  

                                                                                  Source https://stackoverflow.com/questions/70386370

                                                                                  QUESTION

                                                                                  How to divide a large image dataset into groups of pictures and save them inside subfolders using python?
                                                                                  Asked 2021-Dec-08 at 15:13

                                                                                  I have an image dataset that looks like this (screenshot: Dataset).

                                                                                  The timestep of each image is 15 minutes (as you can see, the timestamp is in the filename).

                                                                                  Now I would like to group those images into 3-hour-long sequences and save those sequences inside subfolders, each containing 12 images (= 3 hrs). The result would ideally look like this (screenshot: Sequences).

                                                                                  I have tried using os.walk to loop over the folder where the image dataset is saved, and then created a dataframe with pandas because I thought it would make the files easier to handle, but I think I am totally off target here.

                                                                                  ANSWER

                                                                                  Answered 2021-Dec-08 at 15:10

                                                                                  The timestep of each image is 15 minutes (as you can see, the timestamp is in the filename).

                                                                                  Now I would like to group those images into 3-hour-long sequences and save those sequences inside subfolders that would each contain 12 images (= 3 hrs)

                                                                                  I suggest exploiting the built-in datetime library to get the desired result. For each file you have:

                                                                                  1. get the substring holding the timestamp
                                                                                  2. parse it into a datetime.datetime instance using datetime.datetime.strptime
                                                                                  3. convert that instance into seconds since the epoch using the .timestamp method
                                                                                  4. integer-divide (//) the number of seconds by 10800 (the number of seconds in 3 hours)
                                                                                  5. convert the value you got into a str and use it as the target subfolder name (see the sketch below)
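
                                                                                  A minimal sketch of those five steps, assuming filenames like img_2021-12-08_1530.png; the actual timestamp format in the question's filenames is not shown, so the strptime pattern, the dataset folder name, and copying rather than moving are illustrative assumptions:

                                                                                  import datetime
                                                                                  import shutil
                                                                                  from pathlib import Path

                                                                                  src = Path('dataset')  # hypothetical folder holding the images
                                                                                  for image_path in sorted(src.glob('*.png')):
                                                                                      # 1-2. take the timestamp substring from the filename and parse it
                                                                                      stamp = image_path.stem.split('_', 1)[1]  # e.g. '2021-12-08_1530'
                                                                                      parsed = datetime.datetime.strptime(stamp, '%Y-%m-%d_%H%M')
                                                                                      # 3-4. seconds since the epoch, integer-divided by 10800 (3 hours)
                                                                                      bucket = int(parsed.timestamp()) // 10800
                                                                                      # 5. the bucket number becomes the subfolder name
                                                                                      target = src / str(bucket)
                                                                                      target.mkdir(exist_ok=True)
                                                                                      shutil.copy(image_path, target / image_path.name)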

                                                                                  Source https://stackoverflow.com/questions/70276989

                                                                                  QUESTION

                                                                                  Proper way of cleaning csv file
                                                                                  Asked 2021-Nov-15 at 22:58

                                                                                  I've got a huge CSV file, which looks like this:

                                                                                  1. 02.01.18;"""2,871""";"""2,915""";"""2,871""";"""2,878""";"""+1,66 %""";"""57.554""";"""166.075 EUR""";"""0,044"""
                                                                                  2. 03.01.18;"""2,875""";"""2,965""";"""2,875""";"""2,925""";"""+1,63 %""";"""39.116""";"""114.441 EUR""";"""0,090"""
                                                                                  3. 04.01.18;"""2,915""";"""3,005""";"""2,915""";"""2,988""";"""+2,15 %""";"""58.570""";"""174.168 EUR""";"""0,090"""
                                                                                  

                                                                                  In the end I only want to extract the date and ratio. The dataset should look like this:

                                                                                  1.02.01.18, +1,66 %
                                                                                  2.03.01.18, +1,63 %
                                                                                  3.04.01.18, +2,15 %
                                                                                  

                                                                                  I tried this, and so far it has only given me more trouble:

                                                                                  import pandas as pd
                                                                                  df = pd.read_csv("Dataset.csv", nrows=0)
                                                                                  print(df)
                                                                                  data = []
                                                                                  for response in df:
                                                                                      data.append(
                                                                                         response.split(';')
                                                                                      )
                                                                                  print(data[0])
                                                                                  

                                                                                  Do you know a better way to clean up this dataset?

                                                                                  ANSWER

                                                                                  Answered 2021-Nov-15 at 21:33

                                                                                  You can use a regular expression for this:

                                                                                  import re

                                                                                  # s holds one raw line from the file, e.g. the second entry:
                                                                                  s = '2. 03.01.18;"""2,875""";"""2,965""";"""2,875""";"""2,925""";"""+1,63 %""";"""39.116""";"""114.441 EUR""";"""0,090"""'
                                                                                  regex = re.compile(r'([\d\. ]+).*([+-][\d, %]+)')
                                                                                  date, ratio = regex.match(s).groups()
                                                                                  date = date.replace(' ', '')
                                                                                  

                                                                                  Test:

                                                                                  >>> date
                                                                                  '2.03.01.18'
                                                                                  
                                                                                  >>> ratio
                                                                                  '+1,63 %'
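
                                                                                  Applied to the whole file, a minimal sketch; Dataset.csv is the filename from the question, cleaned.csv is a made-up output name, and lines that don't match the pattern are simply skipped:

                                                                                  import re

                                                                                  regex = re.compile(r'([\d\. ]+).*([+-][\d, %]+)')

                                                                                  with open('Dataset.csv') as src, open('cleaned.csv', 'w') as dst:
                                                                                      for line in src:
                                                                                          match = regex.match(line)
                                                                                          if match is None:
                                                                                              continue  # no date/ratio pair on this line
                                                                                          date, ratio = match.groups()
                                                                                          dst.write('%s, %s\n' % (date.replace(' ', ''), ratio))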
                                                                                  

                                                                                  Source https://stackoverflow.com/questions/69981109

                                                                                  Community Discussions, Code Snippets contain sources that include Stack Exchange Network

                                                                                  Vulnerabilities

                                                                                  No vulnerabilities reported

                                                                                  Install renderdoc_for_game_data

                                                                                  You can download it from GitHub.

                                                                                  Support

                                                                                  For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
                                                                                  CLONE
                                                                                • HTTPS

                                                                                  https://github.com/xiaofeng94/renderdoc_for_game_data.git

                                                                                • CLI

                                                                                  gh repo clone xiaofeng94/renderdoc_for_game_data

                                                                                • sshUrl

                                                                                  git@github.com:xiaofeng94/renderdoc_for_game_data.git



                                                                                  Consider Popular Dataset Libraries

                                                                                  datasets by huggingface
                                                                                  gods by emirpasic
                                                                                  covid19india-react by covid19india
                                                                                  doccano by doccano

                                                                                  Try Top Libraries by xiaofeng94

                                                                                  VL-PLM by xiaofeng94 (Python)
                                                                                  RefineDNet-for-dehazing by xiaofeng94 (Python)
                                                                                  pytorch-img2img-pix2pix by xiaofeng94 (Python)
                                                                                  handwriting_transfer by xiaofeng94 (Python)

                                                                                  Compare Dataset Libraries with Highest Support

                                                                                  xarray by pydata
                                                                                  text by pytorch
                                                                                  mne-python by mne-tools
                                                                                  pymatgen by materialsproject
                                                                                  datasets by huggingface
