bulk | Launching collective tasks in bulk | Architecture library
kandi X-RAY | bulk Summary
Launching collective tasks in bulk
Trending Discussions on bulk
QUESTION
Yet another question about style and good practices. The code that I will show works and does what it should, but I'd like to know: is it acceptable as a solution, or is it just too ugly?
As the question is a little obscure, I will give some concrete points at the end.
So, the use case.
I have a site with items, and there is already functionality for a user to add an item. Now I'd like functionality to add several items at once via a CSV file.
How should it work?
- The user goes to a special upload page.
- The user chooses a CSV file and clicks upload.
- The user is then redirected to a page that shows the content of the CSV file (as a table).
- If it looks right, the user clicks "yes" (a button with the value "confirm_items_upload") and the items from the file are added to the database (if they are valid).
I have already seen examples of bulk upload for Django, and they seem pretty clear. But I could not find an example with an intermediate "verify-confirm" page. So here is how I did it:
- in views.py: a view for the upload-CSV-file page
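The views.py snippet itself was not preserved on this page. Purely as a point of reference for the answer below, a minimal sketch of the upload view the question describes might look like this; every name in it (the form field, the tmp directory, the redirect target) is hypothetical, not the asker's actual code:

    # Hypothetical sketch of the upload view described in the question.
    # Field names, paths, and the redirect target are all assumptions.
    import os
    import time

    from django.conf import settings
    from django.shortcuts import redirect, render

    def upload_csv(request):
        if request.method == "POST" and request.FILES.get("csv_file"):
            uploaded = request.FILES["csv_file"]
            # Timestamp-based filename, as discussed in point d) of the answer.
            filename = "%d.csv" % int(time.time())
            path = os.path.join(settings.MEDIA_ROOT, "tmp", filename)
            with open(path, "wb") as out:
                for chunk in uploaded.chunks():
                    out.write(chunk)
            # Hand the filename to the confirmation view via the session
            # (discussed in point b) of the answer).
            request.session["uploaded_file"] = filename
            return redirect("confirm_items_upload")
        return render(request, "upload.html")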
ANSWER
Answered 2021-May-28 at 09:27
a) Even if it could obviously be better, is this solution acceptable or not at all?
I think it has some problems you want to address, but the general idea of using the filesystem and storing just filenames can be acceptable, depending on how many users you need to serve and what guarantees regarding data consistency and concurrent accesses you want to make.
I would consider the uploaded file temporary data that may be lost on system failure. If you want to provide any guarantees of not losing the data, you want to store it in a database instead of on the filesystem.
b) I pass 'uploaded_file' from one view to another using request.session. Is that good practice? Is there another way to do it without using GET variables?
There are upsides and downsides to using request.session.
Upsides:
- Attackers cannot change the filename and thereby retrieve other users' data. This is also the reason you should not use a GET parameter here: if you did, attackers could simply change that parameter and gain access to other users' files.
- Users can upload a file, go and do other things, and come back later to actually import the file.
Downsides:
- If users end their session, you lose the filename. Also, users cannot upload the file on one device, switch to another device, and then continue the import, since the other device will have a different session.
The last point relates to the leftover-files problem: if you lose the information about which files are still needed, cleaning up becomes harder (although, in theory, you can retrieve which files are still needed from the session store).
If it is a problem that sessions might end or change because users clear their cookies or change devices, you could consider adding the filename to the UserProfile in the database. This way, it is not bound to sessions.
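A minimal sketch of that idea, assuming a one-to-one UserProfile model (the model and field names are assumptions):

    # Sketch: persist the pending upload on a profile model instead of in
    # the session, so it survives session changes. Names are assumptions.
    from django.conf import settings
    from django.db import models

    class UserProfile(models.Model):
        user = models.OneToOneField(settings.AUTH_USER_MODEL,
                                    on_delete=models.CASCADE)
        # Empty string means "no pending CSV upload".
        pending_csv_filename = models.CharField(max_length=255, blank=True,
                                                default="")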
c) At first I wanted to avoid saving the CSV file at all, but I could not figure out how. Reading the whole file into request.session does not seem like a good idea to me. Is there some way to upload the file into memory in Django?
You want to store state. The go-to places for storing state are the database or a session store. You could load the whole CSV file and put it into the database as text. Whether this is acceptable depends on your database's ability to handle large, unstructured data. Traditional databases were not originally built for that; however, most of them can handle small binary files pretty well nowadays. A database can also give you advantages such as ACID guarantees, whereas concurrent writes to the same file on the file system will likely break the file. See this discussion on the DBA Stack Exchange.
Your database likely has documentation on the topic, e.g. there is this page about binary data in postgres.
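A minimal sketch of storing the CSV as text in the database instead of on disk (the model name is an assumption); this also answers point c) above, since the upload is read straight into memory:

    # Sketch: keep the uploaded CSV in the database as text, one pending
    # upload per user. Model and field names are assumptions.
    from django.conf import settings
    from django.db import models

    class PendingCsvUpload(models.Model):
        user = models.OneToOneField(settings.AUTH_USER_MODEL,
                                    on_delete=models.CASCADE)
        content = models.TextField()  # the raw CSV text
        uploaded_at = models.DateTimeField(auto_now_add=True)

    # In the upload view: read the file into memory and persist it.
    # PendingCsvUpload.objects.update_or_create(
    #     user=request.user,
    #     defaults={"content": request.FILES["csv_file"].read().decode("utf-8")},
    # )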
d) If I have to use a tmp file, how should I handle the situation where the user abandons the upload in the middle (for example, they see the confirmation page but do not click "yes" and decide to rewrite the file)? How do I remove the tmp file?
Some ideas:
- Limit the count of uploaded files per user to one by design. Currently, your filename is based on a timestamp. This breaks if two users decide to upload a file simultaneously: they will both get the same timestamp, and the file on disk may be corrupted. If you instead use the user's primary key, this guarantees that you have at most one file per user. If they later upload another file, their old file will be overwritten. If your user count is small enough that you can store one leftover file per user, you don't need additional cleaning. However, if the same user simultaneously uploads two files, this still breaks.
- Use a unique identifier, like a UUID, and delete the old stored file whenever the user uploads a new file. This requires you to still have the old filename, so session storage cannot be used with this. You will still always have the user's last file in the filesystem.
- Use a unique identifier for the filename and set some arbitrary maximum storage duration. Set up a cronjob or similar that regularly goes through the files and deletes all files that have been stored longer than your specified maximum duration (a sketch of such a cleanup job follows below). If a user uploads a file but does not do the actual import soon enough, their data is deleted, and they will have to do the upload again. Here, your code has to handle the case that the file with the stored filename no longer exists (and may even be deleted while you are reading the file).
You probably want to limit your server to one stored file per user so that attackers cannot fill your filesystem.
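For the third idea, a minimal sketch of the cleanup job as a Django management command that a cronjob could run hourly (the tmp directory and the 24-hour cutoff are assumptions):

    # Sketch: delete uploaded files older than a maximum age. Run it from
    # cron, e.g.: 0 * * * * /path/to/manage.py cleanup_uploads
    # The directory and the maximum age are assumptions.
    import os
    import time

    from django.conf import settings
    from django.core.management.base import BaseCommand

    MAX_AGE_SECONDS = 24 * 60 * 60

    class Command(BaseCommand):
        help = "Delete uploaded CSV files older than the maximum age."

        def handle(self, *args, **options):
            tmp_dir = os.path.join(settings.MEDIA_ROOT, "tmp")
            cutoff = time.time() - MAX_AGE_SECONDS
            for name in os.listdir(tmp_dir):
                path = os.path.join(tmp_dir, name)
                if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                    os.remove(path)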
e) A small additional question: what kinds of checks does Django provide for uploaded files? For example, how could I check that the file is at least a text file? Should I do that?
You definitely want to set up some maximum file size for the upload, as described e.g. here. You could limit the allowed file extensions, but that would only be a usability measure: attackers could still send you garbage data with any accepted extension.
Keep in mind: if you only store the CSV as text data that you load and parse every time a certain view is accessed, this can be an easy way for attackers to exhaust your server, giving them an easy DoS attack.
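A minimal sketch of a size check in a Django upload form (the 2 MB limit is an arbitrary assumption):

    # Sketch: reject oversized uploads before doing anything with them.
    # The limit is an arbitrary assumption.
    from django import forms

    MAX_UPLOAD_BYTES = 2 * 1024 * 1024

    class CsvUploadForm(forms.Form):
        csv_file = forms.FileField()

        def clean_csv_file(self):
            f = self.cleaned_data["csv_file"]
            if f.size > MAX_UPLOAD_BYTES:
                raise forms.ValidationError("File too large.")
            return f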
Overall, it depends on what guarantees you want to make, how many users you have and how trustworthy they are. If users might be malicious, you want to keep all possible kinds of data extraction and resource exhaustion attacks in mind. The filesystem will not scale out (at least not as easily as a database).
I know of a similar setup in a project where only a handful of privileged users are allowed to upload files, and we can tolerate deletion of all temporary files on failure: users simply have to re-upload their files. This works fine.
QUESTION
I wrote a Node.js GET API that returns restaurant details. The result is fine, but how do I map it to only the required fields? The response contains a bulk of fields, but I need only two: restaurant name and address. I am learning Node.js.
...ANSWER
Answered 2021-Jun-14 at 07:39
You can use the Array map function to iterate over the array. The map function returns a new array.
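The question is about Node.js, but the idea is language-independent. As an illustration in Python (the field names in the response are assumptions), the equivalent of Array map is a list comprehension:

    # Keep only the two required fields from each restaurant record.
    # The field names are assumptions about the API response.
    restaurants = [
        {"restaurant_name": "Alpha", "address": "1 Main St", "rating": 4.5},
        {"restaurant_name": "Beta", "address": "2 High St", "rating": 4.1},
    ]

    slim = [
        {"name": r["restaurant_name"], "address": r["address"]}
        for r in restaurants
    ]
    print(slim)  # [{'name': 'Alpha', 'address': '1 Main St'}, ...]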
QUESTION
Assume we have a Redis set with hundreds of thousands of elements in it. Since the SMEMBERS command does eager loading, it fetches all of the elements with that one command and consequently takes too much time. I want to know: is there a way to read Redis data in bulks, or maybe as a stream?
ANSWER
Answered 2021-Jun-14 at 12:35
Data from the Redis Set data structure can be read in bulk using the SSCAN command.
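A minimal sketch with the redis-py client (the set name is a placeholder; count is only a hint to the server about batch size):

    # Sketch: stream a large Redis set in batches with SSCAN instead of
    # fetching everything at once with SMEMBERS. "big:set" is a placeholder.
    import redis

    r = redis.Redis()  # assumes a local Redis instance
    for member in r.sscan_iter("big:set", count=1000):
        print(member)  # members arrive incrementally, not all at once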
QUESTION
I am receiving the error "The query references an object that is not supported in distributed processing mode" when using the HASHBYTES() function to hash rows in Synapse Serverless SQL Pool.
The end goal is to parse the json and store it as parquet along with a hash of the json document. The hash will be used in future imports of new snapshots to identify differentials.
Here is a sample query that produces the error:
...ANSWER
Answered 2021-Jan-06 at 11:19
Jason, I'm sorry, HASHBYTES() is not supported against external tables.
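The answer offers no workaround, but one option (an assumption on our part, not something stated in the answer) is to compute the hash outside the SQL pool, in whatever code stages the JSON before it is written to parquet:

    # Sketch of a workaround the answer does not cover: hash the raw JSON
    # document in the staging code instead of with HASHBYTES() in SQL.
    # The file path is a placeholder.
    import hashlib
    import json

    with open("snapshot.json", "rb") as f:
        raw = f.read()

    digest = hashlib.sha256(raw).hexdigest()
    doc = json.loads(raw)  # parse for the parquet conversion step
    # Store digest alongside the parsed document; on future imports,
    # compare digests to identify changed documents.
    print(digest)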
QUESTION
I'm using a BULK INSERT to load delimited .txt files into a staging table with 5 columns. The .txt files can sometimes contain errors and have more/less than 5 fields per line. If this happens, is it possible to detect it and cancel the entire BULK INSERT?
Each table column is of type VARCHAR. This was done because header (H01) and line (L0101, L0102, etc...) rows contain fields with different types. Because of this, setting MAXERRORS = 0 doesn't seem to be working as there are technically no syntax errors. As a result the transaction is committed, the catch block never activates and the rollback doesn't occur. Lines still get inserted into the table incorrectly shifted or bunched.
...ANSWER
Answered 2021-Jun-09 at 16:17
As many have noted before: BULK INSERT is fast, but not very flexible, especially when it comes to column inconsistencies.
When your input might have bad data (and technically, from a SQL standpoint, that is what you are describing), you have to employ one or more of these approaches:
- Pre-process and "clean" the data with an external program first (see the sketch after this list), or
- BULK INSERT into a staging table with one big VARCHAR(MAX) column, and then parse and clean the data yourself with SQL before moving it into tables with your real columns, or
- Use CLR code/tricks to effectively do (1) and/or (2) above, or
- Write an external program to simultaneously clean/pre-process and SqlBulkCopy the data into your SQL Server (replacing BULK INSERT), or
- Use SSIS instead (still pretty hard to deal with bad/variable columns, though)
I have done all of these at one time or another during my career, and they are all somewhat difficult and time-consuming (the work was time-consuming; their run times were pretty good).
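For the first approach, a minimal sketch of an external pre-check in Python (the delimiter is a placeholder; the expected field count of 5 is taken from the question):

    # Sketch: validate the delimited file before BULK INSERT; reject the
    # whole file if any line has the wrong field count. The delimiter and
    # file name are assumptions.
    EXPECTED_FIELDS = 5
    DELIMITER = "|"

    def validate(path):
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                fields = line.rstrip("\r\n").split(DELIMITER)
                if len(fields) != EXPECTED_FIELDS:
                    raise ValueError("line %d has %d fields, expected %d"
                                     % (lineno, len(fields), EXPECTED_FIELDS))

    validate("staging_input.txt")  # run this before issuing the BULK INSERT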
QUESTION
I'm planning to display a ListView of CarServiceEntries. The CarServiceEntry class contains the basic data of a service:
...ANSWER
Answered 2021-Jun-11 at 03:36
SQLite doesn't support inheritance, and I believe that it will be as simple, if not simpler, to utilise the relationships which SQLite and Room support.
Creating multiple tables via room is pretty easy as is creating and handling relationships. So I would suggest taking the typical approach.
Here's an example based upon what I think you are trying to accomplish.
First the CarServiceEntry table (which will later have Expenses and Incomes related to it) :-
QUESTION
My program grabs ~70 pages of 1000 items each from an API and bulk-inserts them into a SQLite database using Sequelize. After looping through a few times, the memory usage of Node goes up to around 1.2 GB and then the program eventually crashes with this error: FATAL ERROR: MarkCompactCollector: young object promotion failed Allocation failed - JavaScript heap out of memory. I've tried using delete on all of the big variables that hold the API response, setting variables to undefined, and then calling global.gc(), but I still see huge amounts of memory usage and eventually it crashes. Would increasing the memory cap of Node.js help? Or would its memory usage just keep increasing until it hits the next cap?
Here's the full output of the error:
...ANSWER
Answered 2021-Jun-10 at 10:01
From the data you've provided, it's impossible to tell why you're running out of memory.
Maybe the working set (i.e. the amount of stuff that you need to keep around at the same time) just happens to be larger than your current heap limit; in that case increasing the limit would help. It's easy to find out by trying it, e.g. with --max-old-space-size=8000 (megabytes).
Maybe there's a memory leak somewhere, either in your own code, or in one of your third-party modules. In other words, maybe you're accidentally keeping objects reachable that you don't really need any more.
If you provide a repro case, then people can investigate and tell you more.
Side notes:
- according to your output, heap memory consumption is growing to ~4 GB; not sure why you think it tops out at 1.2 GB.
- it is never necessary to invoke global.gc() manually; the garbage collector will kick in automatically when memory pressure is high. That said, if something is keeping old objects reachable, then the garbage collector can't do anything.
QUESTION
I usually hear the term "vectorized functions" used in one of two ways:
- In a very high-level language, when the data is passed all at once (or at least in bulk chunks) to a lower-level library that does the calculations in a faster way. An example of this would be Python's use of numpy for array/linear-algebra-related work.
- At the lowest level, when using a specific machine instruction or a procedure that makes heavy use of them (such as YMM, ZMM, XMM register instructions).
However, the term seems to be used quite loosely, and I wanted to know whether there is a third (or even more) sense in which it is used. That third sense would be, for example, simply passing multiple values to a function rather than one (usually done via an array):
...ANSWER
Answered 2021-Jun-10 at 20:43
Vectorized code, in the context you seem to be referring to, normally means "an implementation that happens to make use of Single Instruction Multiple Data (SIMD) hardware instructions".
This can sometimes mean that someone manually wrote a version of a function that is equivalent to the canonical one, but happens to make use of SIMD. More often than not, it's something that the compiler does under the hood as part of its optimization passes.
In a very high-level language when the data is passed all-at-once (or at least, in bulk chunks) to a lower-level library that does the calculations in faster way. An example of this would be python's use of numpy for array/LA-related stuff.
That's simply not correct. The process of handing off a big chunk of data to some block of code that goes through it quickly is not vectorization in and of itself.
You could say "Now that my code uses numpy, it's vectorized" and be sort of correct, but only transitively. A better way to put it would be "Now that my code uses numpy, it runs a lot faster because numpy is vectorized under the hood.". Importantly though, not all fast libraries to which big chunks of data are passed at once are vectorized.
...Code examples...
Since there is no SIMD instruction in sight in either example, neither is vectorized yet. It might be true that the second version is more likely to lead to a vectorized program; if so, we'd say that it is more vectorizable than the first. However, the program is not vectorized until the compiler makes it so.
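The original code examples did not survive this page's extraction. As an illustration of the distinction the answer draws, the following contrasts a plain Python loop with a numpy call whose implementation is vectorized under the hood:

    # The loop is not vectorized; the numpy expression hands the whole
    # array to compiled code that may use SIMD instructions internally.
    import numpy as np

    a = np.arange(1_000_000, dtype=np.float64)
    b = np.arange(1_000_000, dtype=np.float64)

    # Scalar path: one element at a time.
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]

    # Bulk path: same result, but the work happens inside numpy.
    out2 = a + b
    assert np.array_equal(out, out2)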
QUESTION
I am trying to process bulk RNA-seq data using salmon through snakemake in the conda/mamba environment.
I am receiving the following error when running snakemake:
...ANSWER
Answered 2021-Jun-10 at 20:38
I think the Snakefile is OK; SRR3350597_GSM2112330_RA_hip_3_Homo_sapiens_RNA-Seq_1.fastq.gz is simply missing. See your ls output: that file is not in it.
QUESTION
I would like to do a bulk insert using a loop that will cycle through several hundred files. But I can't seem to use a variable as the FROM path. Can I use FROM @PATH3 or is there another way to BULK INSERT many text files? Thank you
...ANSWER
Answered 2021-Jun-10 at 14:06
Try it with dynamic SQL, something like this:
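The snippet itself was lost in this page's extraction. As an illustration of the same idea, building the BULK INSERT statement as a string per file, here is a sketch in Python with pyodbc rather than T-SQL dynamic SQL; the connection string, folder, table name, and delimiters are all assumptions:

    # Sketch: BULK INSERT cannot take a variable in its FROM clause, so
    # build the statement as a string for each file. All names here are
    # assumptions, not from the question.
    import glob
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=Staging;Trusted_Connection=yes;", autocommit=True)
    cur = conn.cursor()

    for path in glob.glob(r"C:\imports\*.txt"):
        # These paths come from a directory listing we control; never
        # interpolate untrusted input into SQL like this.
        cur.execute("BULK INSERT dbo.StagingTable FROM '%s' "
                    "WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\n')"
                    % path.replace("'", "''"))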
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.