Data mining is a process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Popular New Releases in Data Mining
bulk-downloader-for-reddit
Bulk Downloader for Reddit 2.2
pipeline
Pipeline v1.6
striplog
v0.9.2
arxiv-miner
Bug Fixes
Snippext_public
Snippext for Rotom
Popular Libraries in Data Mining
by snap-stanford c++
1835 NOASSERTION
Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.
by aliparlakci python
986 GPL-3.0
Downloads and archives content from reddit
by asaini python
653 MIT
Python Implementation of Apriori Algorithm for finding Frequent sets and Association Rules
by bonzanini python
487
Companion code for the book "Mastering Social Media Mining with Python"
by bartdag python
432 NOASSERTION
A few data mining algorithms in pure python
by chuanconggao python
248 MIT
The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
by 9b python
231 MIT
Interface to manage and centralize Google Alert information
by mayconbordin java
181 Apache-2.0
A collection of algorithms for mining data streams
by ShawnyXiao jupyter notebook
176 MIT
2017 CCF-BDCI "Let AI Be the Judge" competition (preliminary round): 7th/415 (Top 1.68%)
Trending New libraries in Data Mining
by valayDave python
84 MIT
arxiv_miner is a toolkit for mining research papers on CS ArXiv.
by rit-git python
42 BSD-3-Clause
Snippext: Semi-supervised Opinion Mining with Augmented Data
by Alic3C python
40
A collection of write-ups and solutions for Cyber FastTrack Spring 2021.
by juliasilge css
38 CC-BY-4.0
Learn about text mining 📄 with tidy data principles
by sTechLab jupyter notebook
38
A multi-modal Twitter dataset with 7.6M tweets and 25.6M retweets related to voter fraud claims.
by ghidraninja python
27 MIT
The (Python-based) mining software required for the Game Boy mining project.
by chonyy python
22 MIT
🔨 Python implementation of Apriori algorithm, new and simple!
by MohamedHmini python
22
Implements an end-to-end tweets ETL/analysis pipeline.
by sparks-baird python
19 MIT
A materials discovery algorithm geared towards exploring high-performance candidates in new chemical spaces.
Trending Kits in Data Mining
No Trending Kits are available at this moment for Data Mining
Trending Discussions on Data Mining
Unable to install ray[tune] tune-sklearn
Get total no of classes of each subject within a semester using pandas
How to create a frequency table of each subject from a given timetable using pandas?
Regenerate SSAS multidimentional partitions files from the database
Counting repeated pairs in a list
Python KeyError: 0 when i use if elif
React JS floated tag around a component, is position absolute a prudent idea?
Gensim doc2vec's d2v.wv.most_similar() gives not relevant words with high similarity scores
Creating a CSV file from Python Script
Which is the best Data Mining model to extrapolate known values to missing values in a table? (General question)
QUESTION
Unable to install ray[tune] tune-sklearn
Asked 2022-Mar-14 at 20:10
I'm trying to install ray[tune] tune-sklearn on my machine but it keeps failing. I'm using a MacBook Pro 2019 with Big Sur Version 11.6 and Python 3.9.7 (default, Sep 16 2021, 08:50:36) [Clang 10.0.0 ] :: Anaconda, Inc. on darwin. All other packages I've tried have installed fine using either conda install or pip install, except for this one. I'm struggling to find an answer online myself. I was on Python 3.8 but I removed this and installed 3.9 as I thought this was the problem. Apologies in advance, I'm new to data mining and still don't know a great deal yet.
I tried
conda install -c conda-forge -y ray-tune tune-sklearn
But got back this:
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - ray-tune

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.
I also tried
pip install ray[tune] tune-sklearn
But got back
zsh: no matches found: ray[tune]
Any help would be greatly appreciated, thank you.
Update:
I also tried
pip install 'ray[tune]'
And got back
ERROR: Could not find a version that satisfies the requirement ray[tune] (from versions: none)
ERROR: No matching distribution found for ray[tune]
ANSWER
Answered 2022-Mar-14 at 20:10
ray[tune] is a library within the Ray distributed compute project that supports scalable hyperparameter tuning -- not a stand-alone Python package. You should be able to install ray with the proper dependencies using:
pip install "ray[tune]"
After Ray has been installed, you can reference it within your Python project using either:
import ray.tune
or
from ray import tune
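Once installed, you can sanity-check the setup by running a tiny Tune experiment. This is a minimal sketch assuming the Ray 1.x-era API (tune.run, tune.report); the trainable function and its config are made up for illustration:

from ray import tune

def trainable(config):
    # Report a dummy metric derived from the sampled hyperparameter.
    tune.report(score=config["x"] ** 2)

analysis = tune.run(trainable, config={"x": tune.grid_search([1, 2, 3])})
print(analysis.get_best_config(metric="score", mode="min"))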
QUESTION
Get total no of classes of each subject within a semester using pandas
Asked 2022-Mar-06 at 08:58
Time table: columns=hour, rows=weekday, data=subject
[weekday x hour]
           1                      2                      3                 4             5                 6                      7
Name
Monday     Project                Project                Project           Data Science  Embedded Systems  Data Mining            Industrial Psychology
Tuesday    Project                Project                Project           Project       Data Science      Industrial Psychology  Embedded Systems
Wednesday  Data Science           Project                Project           Project       Project           Project                Project
Thursday   Data Mining            Industrial Psychology  Embedded Systems  Data Mining   Project           Project                Project
Friday     Industrial Psychology  Embedded Systems       Data Science      Data Mining   Project           Project                Project
Frequency table rows=weekday, columns=subject, data = subject frequency in the corresponding weekday
[weekday x subject]
Data       Data Mining  Data Science  Embedded Systems  Industrial Psychology  Project
Name
Friday               1             1                 1                      1        3
Monday               1             1                 1                      1        3
Thursday             2             0                 1                      1        3
Tuesday              0             1                 1                      1        4
Wednesday            0             1                 0                      0        6
Code
self.start = datetime(2022, 1, 1)
self.end = datetime(2022, 3, 31)

self.file = 'timetable.csv'
self.sdf = pd.read_csv(self.file, header=0, index_col="Name")
self.subject_frequency = self.sdf.apply(pd.value_counts).fillna(0)
print(self.subject_frequency.to_string())
self.subject_frequency["sum"] = self.subject_frequency.sum(axis=1)

self.p = self.sdf.melt(var_name='Freq', value_name='Data', ignore_index=False).assign(variable=1)\
    .pivot_table('Freq', 'Name', 'Data', fill_value=0, aggfunc='count')
print(self.p.to_string())
Required Table
                       classes ...
Data Mining                 32
Data Science                32
Embedded Systems            32
Industrial Psychology       32
Project                    146
I will be adding more columns later, like current attendance percentage, the percentage drop for each class missed, and the percentage lost for taking leave on Monday, Tuesday, etc., so as to subtract them from the attendance percentage.
The end goal is to analyse which day is safe to take leave on, and to monitor my percentage. If my direction could be better, please advise me.
ANSWER
Answered 2022-Mar-06 at 07:11
select_rows = [date.strftime("%A") for date in pd.bdate_range(self.start, self.end)]
r = self.p.loc[select_rows, :]
print(r.to_string())
print(r.sum())
Please feel free to suggest simpler code; design advice is also appreciated!
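For reference, a self-contained sketch of the same idea with a made-up two-column timetable (the layout mirrors the question, but the data and names here are illustrative):

import pandas as pd

timetable = pd.DataFrame(
    {
        1: ["Project", "Project", "Data Science", "Data Mining", "Project"],
        2: ["Data Science", "Project", "Project", "Project", "Data Mining"],
    },
    index=pd.Index(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"], name="Name"),
)

# Per-weekday subject frequencies (rows=weekday, columns=subject).
freq = (
    timetable.melt(var_name="Hour", value_name="Subject", ignore_index=False)
    .pivot_table("Hour", "Name", "Subject", fill_value=0, aggfunc="count")
)

# One entry per business day in the semester: "Monday", "Tuesday", ...
days = [d.strftime("%A") for d in pd.bdate_range("2022-01-01", "2022-03-31")]
print(freq.loc[days].sum())  # total classes per subject over the whole semester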
QUESTION
How to create a frequency table of each subject from a given timetable using pandas?
Asked 2022-Mar-05 at 16:06
This is a time table: columns=hour, rows=weekday, data=subject [weekday x hour]
           1                      2                      3                 4             5                 6                      7
Name
Monday     Project                Project                Project           Data Science  Embedded Systems  Data Mining            Industrial Psychology
Tuesday    Project                Project                Project           Project       Data Science      Industrial Psychology  Embedded Systems
Wednesday  Data Science           Project                Project           Project       Project           Project                Project
Thursday   Data Mining            Industrial Psychology  Embedded Systems  Data Mining   Project           Project                Project
Friday     Industrial Psychology  Embedded Systems       Data Science      Data Mining   Project           Project                Project
How do you generate a pandas.DataFrame where rows=weekday, columns=subject, and data = subject frequency in the corresponding weekday?
Required table: [weekday x subject]
           Data Mining, Data Science, Embedded Systems, Industrial Psychology, Project
Name
Monday     1 1 1 1 3
Tuesday    ...
Wednesday
Thursday
Friday
My current attempt:
self.file = 'timetable.csv'
self.sdf = pd.read_csv(self.file, header=0, index_col="Name")
print(self.sdf.to_string())
self.subject_frequency = self.sdf.apply(pd.value_counts)
print(self.subject_frequency.to_string())
self.subject_frequency["sum"] = self.subject_frequency.sum(axis=1)
ANSWER
Answered 2022-Mar-05 at 16:06
Use melt to flatten your dataframe, then pivot_table to reshape it:
out = (
    df.melt(var_name='Freq', value_name='Data', ignore_index=False).assign(variable=1)
    .pivot_table('Freq', 'Name', 'Data', fill_value=0, aggfunc='count')
    .loc[df.index]  # sort by original index: Monday > Tuesday > ...
)
Output:
>>> out
Data       Data Mining  Data Science  Embedded Systems  Industrial Psychology  Project
Name
Monday               1             1                 1                      1        3
Tuesday              0             1                 1                      1        4
Wednesday            0             1                 0                      0        6
Thursday             2             0                 1                      1        3
Friday               1             1                 1                      1        3
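A shorter equivalent is possible as well. As a sketch (assuming the same df, with weekdays as the index and one subject per hour column), value_counts can be applied along each row directly:

# Count each row's subjects; subjects absent from a row become NaN, hence fillna.
out = df.apply(pd.Series.value_counts, axis=1).fillna(0).astype(int)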
QUESTION
Regenerate SSAS multidimentional partitions files from the database
Asked 2022-Feb-25 at 05:11
I have an SSAS cube and I want to create the solution with SSDT in Visual Studio. I need to generate the .partitions files of the cube.
When I do New Project -> Import from server (multidimensional and data mining), the project is created but the .partitions files are empty (2 KB).
I tried with VS2019, VS2017 and BIDS2008R2; it's always the same problem.
Any idea about this issue?
ANSWER
Answered 2022-Feb-25 at 05:11
This is an issue when we import from an SSAS database containing custom partitions.
To get the correct partitions, just open the cube (in the Visual Studio solution) and navigate to the partitions tab.
The moment you select the partitions tab, you will notice the "star" symbol in the tab denoting that the project has been updated, and the partitions file is now updated with the latest partitions.
QUESTION
Counting repeated pairs in a list
Asked 2022-Feb-15 at 03:11
I have an assignment that has a data mining element. I need to find which authors collaborate the most across several publication webpages.
I've scraped the webpages and compiled the author text into a list.
My current output looks like this:
for author in list:
    print(author)

## output:
['Author 1', 'Author 2', 'Author 3']
['Author 2', 'Author 4', 'Author 1']
['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4']
etc for ~100 more rows.
My idea is, for each section of the list, to produce another list that contains each of the unique pairs in that list. E.g. the third demo row would give 'Author 1 + Author 5', 'Author 1 + Author 6', 'Author 1 + Author 7', 'Author 1 + Author 4', 'Author 5 + Author 6', 'Author 5 + Author 7', 'Author 5 + Author 4', 'Author 6 + Author 7', 'Author 6 + Author 4', 'Author 7 + Author 4'. Then I'd append these pair lists to one large list and put it through a counter to see which pairs come up the most.
The problem is I'm just not sure how to actually implement that pair matcher, so if anyone has any pointers that would be great. I'm sure it can't be that complicated an answer, but I've been unable to find it. Alternative ideas on how to measure collaboration would be good too.
ANSWER
Answered 2022-Feb-14 at 21:36
You could use a dictionary where the pair is the key and the value is how often it occurs. You'll need to make sure that you always generate the same key for (Author1, Author2) and (Author2, Author1), but you could choose alphabetic ordering to deal with that.
Then you simply increment the number stored for the pair whenever you encounter it.
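A sketch of that approach (the author lists are the demo rows from the question; a collections.Counter is just a dict specialised for counting):

from collections import Counter
from itertools import combinations

papers = [
    ['Author 1', 'Author 2', 'Author 3'],
    ['Author 2', 'Author 4', 'Author 1'],
    ['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4'],
]

pair_counts = Counter()
for authors in papers:
    # combinations over the sorted, de-duplicated list yields each unordered
    # pair exactly once, with a canonical (alphabetical) key order.
    pair_counts.update(combinations(sorted(set(authors)), 2))

print(pair_counts.most_common(3))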
QUESTION
Python KeyError: 0 when i use if elif
Asked 2021-Dec-24 at 12:26
I am using Python to make a simple application for data mining, and I coded it in Google Colab. I use elif in my function; here is the code:
def data_pred(data):
    # split(data)
    X_train, y_train, X_test, y_test = split(data)

    linreg = LinearRegression()
    linreg.fit(X_train, y_train)
    y_preds = linreg.predict(X_test)

    for x in range(17):
        y_test = np.insert(y_test, len(y_test), y_preds[len(y_preds)-1])
        X_test = np.insert(X_test, len(X_test), y_test[len(X_test)-1])
        X_test = np.array(X_test).reshape(X_test.size, 1)
        y_preds = linreg.predict(X_test)

    plt.scatter(X_test, y_test)
    plt.scatter(X_test, y_preds, color='green')
    plt.plot(X_test, y_preds, color="red")
    plt.xlabel("X axis")
    plt.ylabel("Y axis")

    plt.show()

    print("nilai slope/koef/a:", linreg.coef_)
    print("nilai intercept/b :", linreg.intercept_)
    print('Data hasil prediksi :', y_preds)
    print('Data aktual :', y_test)
    print()
    print('MAPE : ', mape(y_test, y_preds))

    if data["Nama Golongan"][0] == "INDUSTRI":
        golongan = data.loc[0:23, "Nama Golongan"]
    elif data["Nama Golongan"][44] == "INSTANSI PEMERINTAH":
        golongan = data.loc[44:67, "Nama Golongan"]
    elif data["Nama Golongan"][88] == "NIAGA KECIL":
        golongan = data.loc[88:111, "Nama Golongan"]
    elif data["Nama Golongan"][132] == "RUMAH MENENGAH":
        golongan = data.loc[132:155, "Nama Golongan"]
    elif data["Nama Golongan"][176] == "RUMAH MEWAH":
        golongan = data.loc[176:119, "Nama Golongan"]
    elif data["Nama Golongan"][220] == "SOSIAL KHUSUS":
        golongan = data.loc[220:243, "Nama Golongan"]
    elif data["Nama Golongan"][264] == "TOTAL PERBULAN":
        golongan = data.loc[264:287, "Nama Golongan"]

    more code...
when I run,
a = this[this['Nama Golongan'] == 'INDUSTRI']
data_pred(a)
I get the plot and the result without error. But when I run this code
b = this[this['Nama Golongan'] == 'INSTANSI PEMERINTAH']
data_pred(b)
I get this
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897         try:
-> 2898             return self._engine.get_loc(casted_key)
   2899         except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
I thought the elif code causes it, but I don't know why. Can anyone tell me why, and how to fix it? Please help me, thanks.
ANSWER
Answered 2021-Dec-24 at 03:27
OK, I finally see the problem. You are extracting a subset of a dataframe and passing it to this function. So data["Nama Golongan"][44] is referring to index label 44, because the indices get carried through with the subset; the second subset has no label 0 where you expect it, hence the KeyError: 0.
Note that data.loc is also label-based, so hard-coded labels will break the same way on a subset. For strictly positional row numbers, which always start at 0, use data.iloc. If you ONLY want the first 24 rows of whatever subset is passed in, you don't need your if sequence at all. Replace the whole thing with this:
golongan = data["Nama Golongan"].iloc[0:24]  # first 24 rows, whatever the index labels are
The first row, when using iloc, is always position 0, regardless of the index labels.
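A minimal sketch of the underlying behaviour (hypothetical data, not from the question):

import pandas as pd

df = pd.DataFrame({"Nama Golongan": ["INDUSTRI"] * 2 + ["INSTANSI PEMERINTAH"] * 2})
b = df[df["Nama Golongan"] == "INSTANSI PEMERINTAH"]   # subset keeps labels 2 and 3

# b["Nama Golongan"][0] would raise KeyError: 0 -- there is no label 0 here.
print(b["Nama Golongan"].iloc[0])                      # positional access works
print(b.reset_index(drop=True)["Nama Golongan"][0])    # or re-number the index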
QUESTION
React JS floated tag around a component, is position absolute a prudent idea?
Asked 2021-Dec-15 at 22:21
I would like to attempt something perhaps complicated: creating the following render (see image below) with React JS. I thought it would be prudent to begin by using position: absolute and repositioning my divs accordingly, but at first glance that looks difficult, considering the number of tags I want floated around the main component, the responsive aspect, and the fact that nudging them pixel by pixel would be an endless task. So I was wondering whether there is a plug-in, or if you have any suggestions for resolving this particular aspect. If you would like to respond, it is fine to do so using basic coloured squares/rectangles; I am looking forward to learning how to apply such a thing, not the specific design.
Today, I have the following, but it would be unmanageable to do this for each tag and hope for the best during responsive resizing.
My current code:
React JS divs:
profile_picture = () => {
  return (
    <div className="profilepicturetechstack">
      <div className="home">
        <div className="frame-1-3">
          <img src="./resources/simon-provost-02-min.jpg" alt="profile_pic"/>
        </div>
        <div className="photo--wrapper--ellipse">
          <p className="text-4">ML/RESEARCH</p>
        </div>
        <p className="text-1">Simon provost</p>
        <p className="text-2">Paris, France</p>
      </div>
      <div className="frame-1-4">
        <p className="text-7">⚙️ Machine Learning</p>
      </div>
      {/*<div className="frame-1-9">
        <p className="text-8">💡 AutoML</p>
      </div>
      <div className="frame-1-5">
        <p className="text-9">⛏ Data Mining</p>
      </div>
      <div className="frame-1-6">
        <p className="text-1-0">🎨 UI.UX</p>
      </div>
      <div className="frame-1-8">
        <p className="text-1-1">🔬 Research</p>
      </div>
      <div className="frame-1-2">
        <img src="" />
      </div>
      <div className="frame-1-7">
        <p className="text-1-3">🌤 MLOps</p>
      </div>*/}
    </div>
  )
}
Associated CSS classes:
.profilepicturetechstack {
  display: flex;
  flex-direction: row;
  justify-content: center;
  padding-right: 10%;
}

.home {
  display: flex;
  position: relative;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  margin-right: 100px;
  border-radius: 13px;
  height: 300px;
  width: 350px;
  background-color: #ffffff;
  box-shadow: 0 40px 30px rgba(25, 25, 46, 0.06);
}

.text-1 {
  text-align: center;
  vertical-align: top;
  font-size: 16px;
  font-family: Roboto, serif;
  color: #25323c;
}

.text-2 {
  text-align: left;
  vertical-align: top;
  font-size: 14px;
  margin-top: -15px;
  font-family: Roboto, serif;
  color: #859fb3;
}

.photo--wrapper--ellipse {
  display: flex;
  justify-content: center;
  align-items: center;
  text-align: center;
  margin-top: -15px;
  width: 96px;
  height: 25px;
  background: linear-gradient(135deg, #FF26B2 0%, #851BD9 80%, #3F0FFF 100%);
  opacity: 0.8;
  box-shadow: 0 5px 20px rgba(250, 118, 96, 0.2);
  border-radius: 66px;
}

.img-3 {
  height: 84px;
  width: 84px;
}

.component-/points-/-m {
  opacity: 0.80;
  border-radius: 66px;
  display: flex;
  flex-direction: row;
  justify-content: flex-start;
  align-items: center;
  padding: 6px 10px;
  gap: 7px;
  background-color: red;
}

.text-4 {
  text-align: center;
  vertical-align: top;
  font-size: 11px;
  font-family: Roboto, serif;
  color: #ffffff;
}

.frame-1-3 {
  height: 120px;
  width: 120px;
}

.frame-1-3 img {
  object-fit: contain;
  border-radius: 62px;
  height: 100%;
  width: 100%;
}

.frame-1-1 {
  border-radius: 25px;
  height: 61px;
  width: 61px;
  background-color: rgba(36, 150, 237, 0.5);
}

.img-6 {
  height: 35px;
  width: 37px;
}

.frame-1-4 {
  display: flex;
  position: absolute;
  flex-direction: row;
  justify-content: flex-start;
  align-items: center;
  padding: 16px 24px;
  gap: 10px;
  right: 5%;
  box-shadow: 0 40px 30px rgba(25, 25, 46, 0.04);
  border-radius: 16px;
  background-color: #ffffff;
}

/* All tag frames share the same card styling. */
.frame-1-9, .frame-1-5, .frame-1-6, .frame-1-8, .frame-1-7 {
  border-radius: 16px;
  display: flex;
  flex-direction: row;
  justify-content: flex-start;
  align-items: center;
  padding: 16px 24px;
  gap: 10px;
  background-color: #ffffff;
}

/* All tag labels share the same text styling. */
.text-7, .text-8, .text-9, .text-1-0, .text-1-1, .text-1-3 {
  text-align: left;
  vertical-align: top;
  font-size: 16px;
  font-family: 'Poppins', serif;
  letter-spacing: 3px;
  color: #5d86a7;
}

.frame-1-2 {
  border-radius: 25px;
  height: 61px;
  width: 61px;
  background-color: #f3eefa;
}

.img-1-2 {
  height: 29px;
  width: 29px;
}
I am open to learning more about tips and best practices; you may discard my code and provide a solution that focuses on the purpose rather than this particular design; that is fine. I am a little befuddled. Many thanks.
ANSWER
Answered 2021-Dec-15 at 22:21
As Ramesh mentioned in the comments, absolute positioning is needed for the list items surrounding the main div.
- Create a container div surrounding the list items with the same width and height dimensions as className home. This ensures that the list items will not be affected by flexbox.
- Remove all flex container rules inside the classNames for the list items. Instead, use position: absolute in order to use the right, left, bottom, and top properties. From there, you can test different values using percentages or pixels to get the placements you wish for. For more on choosing between pixels and percentages, this article helps clarify: https://www.hongkiat.com/blog/css-units/
- As for responsive resizing: use media queries. The !important property gives more weight to the appropriate value needed based on the screen size. For more information on media queries, visit https://css-tricks.com/a-complete-guide-to-css-media-queries/
One of the list items for responsive resizing should look something like this:
.frame-1-6 {
  position: absolute;
  padding: 16px 24px;
  left: -200px;   /* over 900px */
  bottom: 115%;   /* over 900px */
  border: 1px solid black;
  width: 200px;
  box-shadow: 0 40px 30px rgba(25, 25, 46, 0.04);
  border-radius: 16px;
  background-color: #ffffff;
}

@media screen and (max-width: 900px) {
  .frame-1-6 {
    left: -150px !important;  /* under 900px */
    bottom: 100% !important;  /* under 900px */
  }
}
In the live example, I have gone ahead and placed some of your list items in the desired areas in order to showcase how it works.
Live Example: https://jsfiddle.net/t3qry2oa/286/
QUESTION
Gensim doc2vec's d2v.wv.most_similar() gives not relevant words with high similarity scores
Asked 2021-Dec-14 at 20:14
I've got a dataset of job listings with about 150 000 records. I extracted skills from the descriptions using NER with a dictionary of 30 000 skills. Every skill is represented as a unique identifier.
My data example:
  job_title         job_id  skills
1 business manager  4       12 13 873 4811 482 2384 48 293 48
2 java developer    55      48 2838 291 37 484 192 92 485 17 23 299 23...
3 data scientist    21      383 48 587 475 2394 5716 293 585 1923 494 3
Then, I train a doc2vec model on these data, where job titles (their ids, to be precise) are used as tags and the skills as words.
def tagged_document(df):
    for index, row in df.iterrows():
        yield gensim.models.doc2vec.TaggedDocument(row['skills'].split(), [str(row['job_id'])])


data_for_training = list(tagged_document(data[['job_id', 'skills']]))

model_d2v = gensim.models.doc2vec.Doc2Vec(dm=0, dbow_words=1, vector_size=80, min_count=3, epochs=100, window=100000)

model_d2v.build_vocab(data_for_training)

model_d2v.train(data_for_training, total_examples=model_d2v.corpus_count, epochs=model_d2v.epochs)
It works mostly okay, but I have issues with some job titles. I tried to collect more data for them, but I still get unpredictable behavior.
For example, I have the job title "Director Of Commercial Operations", represented by 41 data records having from 11 to 96 skills (mean 32). When I get the most similar words for it (skills in my case) I get the following:
docvec = model_d2v.docvecs[id_]
model_d2v.wv.most_similar(positive=[docvec], topn=5)
capacity utilization 0.5729076266288757
process optimization 0.5405482649803162
goal setting 0.5288119316101074
aeration 0.5124399662017822
supplier relationship management 0.5117508172988892
These are the top 5 skills, and 3 of them look relevant. However, the top one doesn't look too valid, and neither does "aeration". The problem is that none of this job title's records contain these skills at all. It seems like noise in the output, but why does it get one of the highest similarity scores (although generally not high)? Does it mean that the model can't outline very specific skills for this kind of job title? Can the number of "noisy" skills be reduced? Sometimes I see much more relevant skills with a lower similarity score, but it's often lower than 0.5.
One more example of correct behavior with a similar amount of data: BI Analyst, 29 records, number of skills from 4 to 48 (mean 21). The top skills look alright.
business intelligence 0.6986587047576904
business intelligence development 0.6861011981964111
power bi 0.6589289903640747
tableau 0.6500121355056763
qlikview (data analytics software) 0.6307920217514038
business intelligence tools 0.6143202781677246
dimensional modeling 0.6032138466835022
exploratory data analysis 0.6005223989486694
marketing analytics 0.5737696886062622
data mining 0.5734485387802124
data quality 0.5729933977127075
data visualization 0.5691111087799072
microstrategy 0.5566076636314392
business analytics 0.5535123348236084
etl 0.5516749620437622
data modeling 0.5512707233428955
data profiling 0.5495884418487549
ANSWER
Answered 2021-Dec-14 at 20:14
If your gold standard of what the model should report is skills that appeared in the training data, are you sure you don't want a simple count-based solution? For example, just provide a ranked list of the skills that appear most often in Director Of Commercial Operations listings?
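As a rough illustration of that count-based baseline (a sketch assuming the data DataFrame from the question; the exact title string is hypothetical):

from collections import Counter

# Count how often each skill appears across all records for one job title.
subset = data[data['job_title'] == 'director of commercial operations']
skill_counts = Counter(skill for skills in subset['skills'] for skill in skills.split())
print(skill_counts.most_common(5))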
On the other hand, the essence of compressing N job titles, and 30,000 skills, into a smaller (in this case vector_size=80) coordinate-space model is to force some non-intuitive (but perhaps real) relationships to be reflected in the model.
Might there be some real pattern in the model – even if, perhaps, just some idiosyncrasies in the appearance of less-common skills – that makes aeration necessarily slot near those other skills? (Maybe it's a rare skill whose few contextual appearances co-occur with other skills very near 'capacity utilization', meaning that with the tiny amount of data available, and the tiny amount of overall attention given to this skill, there's no better place for it.)
Taking note of whether your 'anomalies' are often in low-frequency skills, or lower-frequency job-ids, might enable a closer look at the data causes, or some disclaiming/filtering of most_similar() results. (The most_similar() method can limit its returned rankings to the more frequent range of the known vocabulary, for cases when the long-tail or rare words, with their rougher vectors, intrude into the higher-quality results from better-represented words. See the restrict_vocab parameter.)
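For instance (a sketch against the model from the question; the cutoff of 5000 is an arbitrary illustrative value):

# Rank candidates only among the 5,000 most frequent skills, keeping
# rare, roughly-trained vectors out of the results.
model_d2v.wv.most_similar(positive=[docvec], topn=5, restrict_vocab=5000)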
That said, tinkering with training parameters may result in rankings that better reflect your intent. A larger min_count might remove more tokens that, lacking sufficient varied examples, mostly just inject noise into the rest of training. A different vector_size, smaller or larger, might better capture the relationships you're looking for. A more aggressive (smaller) sample could discard more high-frequency words that might be starving more interesting, less frequent words of a chance to influence the model.
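For example (illustrative values, not tuned recommendations):

model_d2v = gensim.models.doc2vec.Doc2Vec(
    dm=0, dbow_words=1, vector_size=80,
    min_count=10,   # was 3: drop skills with too few usage examples
    sample=1e-5,    # downsample very frequent skills more aggressively
    epochs=100, window=100000,
)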
Note that with dbow_words=1 & a large window, and records with (perhaps?) dozens of skills each, the words are having a much more neighborly effect on each other, in the model, than the tag<->word correlations. That might be good or bad.
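If the strong word-to-word influence does turn out to hurt, one assumed variant keeps dbow_words=1 (which the doc-vector-to-word comparisons rely on) but shrinks the window, so each skill trains against fewer neighbors:

# Sketch: a smaller window weakens word-word effects relative to the
# tag<->word training, while still learning usable word vectors.
model_d2v = gensim.models.doc2vec.Doc2Vec(dm=0, dbow_words=1, vector_size=80, min_count=3, epochs=100, window=5)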
QUESTION
Creating a CSV file from Python Script
Asked 2021-Nov-30 at 09:42
I am learning data mining from a book and I am trying to write my first script to gather info from Youtube's API and feed it into a new .csv file. For some reason, it isn't working. I tried inputting the script line by line in a CLI, and the script will eventually create an empty .csv file, but the information is never fed in. Here is my code; it's basically copied line by line from the book:
import csv
import json
import requests

api_url = "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=UCJFp8uSYCjXOMnkUyb3CQ3Q&key=AIzaSyDaMzUYRFzDfjMq-bTm38Y_1swWDMfg03E"
api_response = requests.get(api_url)
videos = json.loads(api_response.text)

with open("C:\Users\jacks\Documents\PythonScripts\youtube_videos.csv", "w", encoding="utf-8") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["publishedAt",
                         "title",
                         "description",
                         "thumbnailurl"])
    if videos.get("items") is not None:
        for video in videos.get("items"):
            videos_data_row = [
                video["snippet"]["publishedAt"],
                video["snippet"]["title"],
                video["snippet"]["description"],
                video["snippet"]["thumbnails"]["default"]["url"]
            ]
            csv_writer.writerow(video_data_row)
ANSWER
Answered 2021-Nov-30 at 09:42
I ran your code & the only problem I found was in csv_writer.writerow(video_data_row): you're missing an s. (Note: on Python 3 the Windows path also has to be a raw string, r"C:\...", otherwise the \U in "C:\Users" is read as an invalid escape sequence; that fix is included below as well.) Replace with:
import csv
import json
import requests

api_url = "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=UCJFp8uSYCjXOMnkUyb3CQ3Q&key=AIzaSyDaMzUYRFzDfjMq-bTm38Y_1swWDMfg03E"
api_response = requests.get(api_url)
videos = json.loads(api_response.text)

# Raw string so backslashes in the Windows path aren't treated as escapes;
# newline="" is the csv module's recommended setting to avoid blank rows on Windows.
with open(r"C:\Users\jacks\Documents\PythonScripts\youtube_videos.csv", "w", newline="", encoding="utf-8") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["publishedAt",
                         "title",
                         "description",
                         "thumbnailurl"])
    if videos.get("items") is not None:
        for video in videos.get("items"):
            videos_data_row = [
                video["snippet"]["publishedAt"],
                video["snippet"]["title"],
                video["snippet"]["description"],
                video["snippet"]["thumbnails"]["default"]["url"]
            ]
            csv_writer.writerow(videos_data_row)  # was video_data_row: the missing s
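As an aside, a couple of optional hardening tweaks for the request step (a sketch reusing api_url from the snippet above; not part of the original answer):

api_response = requests.get(api_url, timeout=10)
api_response.raise_for_status()  # fail loudly on HTTP errors instead of writing an empty CSV
videos = api_response.json()     # equivalent to json.loads(api_response.text)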
QUESTION
Which is the best Data Mining model to extrapolate known values to missing values in a table? (General question)
Asked 2021-Oct-27 at 21:21
I am working on a little data mining project (I am still a Data Science student, not a professional). Maybe you can help me choose a proper model for my task.
So, let's say we have a table with three columns and around 4000 rows:
YEAR | COLOR  | NAME
-----|--------|------
1900 | Green  | David
1901 | Yellow | Sarah
1902 | Green  | ???
1902 | Red    | Sarah
…    | …      | …
2020 | Purple | John
Any value in any field can be repeated in the dataset (including Year values).
In the first two columns we don't have missing values, but we only have around 20% of the Name values in the third column. The Name value depends somewhat on the first two columns (not a causal relation).
My goal is to extrapolate the available Name values to the whole table and get a range of occurrences for each name value (for example in a boxplot).
I have imagined a process like this, although I am not very sure it makes sense statistically (any objections and suggestions are appreciated):
For every unknown NAME value, the algorithm randomly chooses one of the already known NAME values. The odds of a particular NAME value being chosen depend on the variables YEAR and COLOR. For instance, if 'David' values tend to be correlated with low Year values AND with 'Green' or 'Purple' values for Color, the algorithm gives 'David' a higher probability of being chosen when the input values for Year and Color are "1900, Purple".
When the above process ends, the number of occurrences of each name is counted.
The above process is applied 30 times and the results for each name are displayed in a boxplot.
However, I don't know which model would be best to implement an idea like this. I have also sketched the process in a simple paint drawing.
What do you think could be a good approach to this task? I appreciate any help.
ANSWER
Answered 2021-Oct-27 at 21:21
I think you have the process down; converting the data may be the first hurdle.
I would look at using from sklearn.preprocessing import OrdinalEncoder to convert the data from categorical to numeric.
You could then use a random number generator to produce a number within the range defined by the encoding, which would randomly select a name.
Loop through this 30 times with a for loop to achieve the result.
It also looks like you will need to provide the ranking values for year and colour prior to building out your code. From here you would just provide bands, for example if year > 1985, etc., within your for loop to specify the names. A sketch along these lines follows below.
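A minimal sketch of that pipeline, assuming a pandas DataFrame df with YEAR, COLOR and NAME columns as in the question (the distance-based weighting, the seed, and the column names I add are illustrative choices, not a definitive implementation):

import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

rng = np.random.default_rng(0)

# Encode the categorical predictors to numeric codes.
enc = OrdinalEncoder()
df[["year_enc", "color_enc"]] = enc.fit_transform(df[["YEAR", "COLOR"]].astype(str))

known = df[df["NAME"].notna()]
missing = df[df["NAME"].isna()]

runs = []
for _ in range(30):
    filled = df["NAME"].copy()
    for idx, row in missing.iterrows():
        # Weight each known record by closeness in the encoded space, so
        # names seen with similar YEAR/COLOR values get higher odds.
        dist = (np.abs(known["year_enc"] - row["year_enc"])
                + (known["color_enc"] != row["color_enc"]).astype(float))
        weights = 1.0 / (1.0 + dist)
        filled.loc[idx] = rng.choice(known["NAME"].to_numpy(),
                                     p=(weights / weights.sum()).to_numpy())
    runs.append(filled.value_counts())

# One row per run, one column per name; requires matplotlib for the plot.
counts = pd.DataFrame(runs).fillna(0)
counts.boxplot(rot=90)

Each of the 30 runs produces one simulated completion of the table, and the boxplot then shows how much each name's count varies across runs.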
Community Discussions contain sources that include Stack Exchange Network
Tutorials and Learning Resources in Data Mining
Tutorials and Learning Resources are not available at this moment for Data Mining