Consumer Goods

Explore all Consumer Goods related open source software, libraries, packages, source code, cloud functions and APIs.

Fast-moving consumer goods (FMCG), also known as consumer packaged goods (CPG), are products that are sold quickly and at a relatively low cost. Examples include non-durable household goods such as packaged foods, beverages, toiletries, candies, cosmetics, over-the-counter drugs, dry goods, and other consumables.

These software components cover functions across the Catalog, Consignment, Drop Ship, Electronic Data Interchange, Supplier Management, and Wholesale areas.

Popular New Releases in Consumer Goods

Smarter.food.selection

Popular Libraries in Consumer Goods

Ropper

by sashs python

1387 BSD-3-Clause

Display information about files in different file formats and find gadgets to build rop chains for different architectures (x86/x86_64, ARM/ARM64, MIPS, PowerPC, SPARC64). For disassembly ropper uses the awesome Capstone Framework.

Online-Food-Ordering-System

by winston-dsouza java

63 MIT

Online Food Ordering And Order Retrieval System

food-delivery-application

by lbrobinho java

feature-factory

by databrickslabs python

33 NOASSERTION

Accelerator to rapidly deploy customized features for your business

order

by vanilophp php

31 MIT

Order Module For Vanilo (Laravel)

Food-Order-App

by ruhulidb-R33-J2EE java

Food Order Application using Firebase, Google Location, Material Design, SQLite Database

TestDriven.Net-Issues

by jcansdale csharp

Issue tracking for TestDriven.Net

tomato-food-delivery

by ShahAnuj2610 javascript

Full Stack Food Delivery App

Mail-Merge-Examples

by SyncfusionExamples csharp

Mail merge data to a Word document in C#, VB.NET without Microsoft Word or interop.

Explore all libraries in Consumer Goods

Trending New libraries in Consumer Goods

tomato-food-delivery

by ShahAnuj2610 javascript

Full Stack Food Delivery App

Microsoft_bot

by ketonium10 javascript

Microsoft Bot

food-at-you

by sujoys10 javascript

let the food come to you

insights-backend

by adrift-hackGT-7 python

4 MIT

Backend API + algorithm built for surfacing actionable insights for a small business given a network of other similar small businesses

Top Authors in Consumer Goods

constructorlabs

1 Libraries

dnikolovv

1 Libraries

zzhou9

1 Libraries

naveenrj98

1 Libraries

onedanshow

1 Libraries

IzzySoft

1 Libraries

Tyler3D

1 Libraries

ShahAnuj2610

1 Libraries

pamdesilva

1 Libraries

jessepearson

1 Libraries

Trending Kits in Consumer Goods

No Trending Kits are available at this moment for Consumer Goods

Trending Discussions on Consumer Goods

Pandas DataFrame if it doesn't contain certain substrings

Repeat Row Values

how to break up data in column value to multiple rows in pandas dataframe

Python Sorting through nested dictionary

R: How can I lower the amount of levels in a dataframe?

R: How can I make a column with lots of categorical levels binary?

QUESTION

Pandas DataFrame if it doesn't contain certain substrings

Asked 2021-Dec-14 at 22:00

I'm creating a program to get an overview of my expenses by category. However, I cannot possibly write down every place where I use my card, so at the end I want every transaction whose 'Category' column does not already contain one of my set category names to be categorized as "Other".

Below is how I tried to do it. I searched for solutions, and some people suggested putting a ~ in front of the condition to make it do the opposite, like regular negation. It isn't working. What's the optimal solution here?

My idea here is that wherever the Category isn't Entertainment, Microinvestments etc, the Category column cell in that row will be set to "Other".

df['Category'] = np.where(~df['Category'].str.contains('Entertainment|Microinvestments|Food|Transport|Transfers|Cash|Bills|Apparel|Consumer goods|Services', case=False), 'Other', df['Category'])

ANSWER

Answered 2021-Dec-14 at 22:00

Try this:

df.loc[~df['Category'].str.contains('Entertainment|Microinvestments|Food|Transport|Transfers|Cash|Bills|Apparel|Consumer goods|Services', case=False), 'Category'] = 'Other'
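One common reason the `~` negation fails is a missing value in the column: `str.contains` may then return a mask that is not strictly boolean. The sketch below is only an illustration under that assumption. The sample frame and the shortened pattern are made up, and `na=False` is an addition that is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the real frame comes from the asker's expenses file.
df = pd.DataFrame(
    {"Category": ["Food", "consumer goods", "Parking fine", None, "Transport"]}
)

# Shortened pattern for illustration; extend it with the full category list.
known_pattern = "Entertainment|Food|Transport|Consumer goods"

# na=False treats missing values as "no match", so the mask stays boolean
# and can be negated safely with ~.
is_known = df["Category"].str.contains(known_pattern, case=False, na=False)

# Overwrite only the rows that did not match any known category.
df.loc[~is_known, "Category"] = "Other"

print(df["Category"].tolist())
```

With this sample, the unmatched row and the missing row both end up as "Other", while the known categories keep their original spelling.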

Source https://stackoverflow.com/questions/70356020

QUESTION

Repeat Row Values

Asked 2021-Nov-12 at 12:59

I have the following data:

with source (Account,AccountNumber,Indentation) as
(
select 'INCOME STATEMENT',1000,0 union all
select 'REVENUE',1100,0 union all
select 'Revenue - Aircon',1110,1 union all
select 'Revenue - Consumer Goods',1120,1 union all
select 'Revenue - Spares',1130,1 union all
select 'Revenue - Accessories',1140,1 union all
select 'Revenue - Sub Stock',1150,1 union all
select 'Revenue - Services',1160,1 union all
select 'Revenue - Other',1170,1 union all
select 'Revenue - Intercompany',1180,1 union all
select 'Revenue - Delivery Charges',1400,1 union all
select 'COST OF SALES',1500,0 union all
select 'COGS - Aircon',1510,1 union all
select 'COGS - Consumer Goods',1520,1 union all
select 'COGS - Spares',1530,1 union all
select 'COGS - Accessories',1540,1 union all
select 'COGS - Sub Stock',1550,1 union all
select 'COGS - Services',1560,1 union all
select 'COGS - Other',1570,1 union all
select 'COGS - Intercompany',1580,1 union all
select 'COS - Sub Stock Stock Adjustments',1610,1 union all
select 'COS - Sub Stock Repairs',1620,1 union all
select 'COS - Consumables & Packing Materials',1810,1 union all
select 'COS - Freight & Delivery',1820,1 union all
select 'COS - Inventory Adj - Stock Count',1910,1 union all
select 'COS - Inv. Adj - Stock Write up / Write down',1920,1 union all
select 'COS - Provision for Obsolete Stock (IS)',1930,1 union all
select 'COS - Inventory Adj - System A/c',1996,1 union all
select 'COS - Purch & Dir. Cost Appl A/c - System A/c',1997,1 union all
select 'GROSS MARGIN',1999,0 union all
select 'OTHER INCOME',2000,0 union all
select 'Admin Fees Received',2100,1 union all
select 'Bad Debt Recovered',2110,1 union all
select 'Discount Received',2120,1 union all
select 'Dividends Received',2130,1 union all
select 'Fixed Assets - NBV on Disposal',2140,1 union all
select 'Fixed Assets - Proceeds on Disposal',2145,1 union all
select 'Rebates Received',2150,1 union all
select 'Rental Income',2160,1 union all
select 'Sundry Income',2170,1 union all
select 'Warranty Income',2180,1 union all
select 'INTEREST RECEIVED',2200,0 union all
select 'Interest Received - Banks',2210,1
)

select
    Account
,   AccountNumber
,   Indentation
from    source;

Using the following script:

-- Appended to the source CTE above, in place of its final select
, s as (
select
    iif(Account like 'Total%',null,iif(Indentation=0,Account,null)) Header
,   iif(Account like 'Total%',null,iif(Indentation=1,Account,null)) SubHeader1
,   *
from    source
)

select
    Header
--, case lag(Header) over (order by AccountNumber) when Header then isnull(Header,lag(Header) over (order by AccountNumber)) else Header end
,   SubHeader1
,   AccountNumber
,   Indentation
from    s

I'm able to split the columns like this:

I need to be able to report the Header Column to look like this:

I tried doing it using LAG(), but it doesn't work, how would I script this?

ANSWER

Answered 2021-Nov-12 at 12:59

This is one option. I created a group for each header and then used it to grab the first row in that group that has Indentation = 0. Tack this on to your source CTE:

,CTE2 AS
(
select
    iif(Account like 'Total%',null,iif(Indentation=0,Account,null)) Header
,   iif(Account like 'Total%',null,iif(Indentation=1,Account,null)) SubHeader1
,   SUM(CASE WHEN Indentation = 0 THEN 1 ELSE 0 END) OVER (ORDER BY AccountNumber) H1
,   *
from    source
)

SELECT T2.Header, t1.SubHeader1, t1.AccountNumber, t1.Indentation
FROM CTE2 t1
CROSS APPLY(SELECT MAX(t3.Header) Header FROM CTE2 t3 WHERE t3.H1 = t1.H1 AND t3.Indentation = 0) T2
ORDER BY t1.AccountNumber
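The grouping idea in this answer (a running count of header rows, then one header per group) can be traced outside SQL as well. The following is a rough Python sketch of the same logic on a handful of the rows from the question. The variable names are illustrative, and it is not a replacement for the query.

```python
from itertools import accumulate

# A few (Account, AccountNumber, Indentation) rows from the question's data.
rows = [
    ("REVENUE", 1100, 0),
    ("Revenue - Aircon", 1110, 1),
    ("Revenue - Consumer Goods", 1120, 1),
    ("COST OF SALES", 1500, 0),
    ("COGS - Aircon", 1510, 1),
]

# Same ordering as ORDER BY AccountNumber.
rows.sort(key=lambda row: row[1])

# Running count of header rows, mirroring
# SUM(CASE WHEN Indentation = 0 THEN 1 ELSE 0 END) OVER (ORDER BY AccountNumber).
group_ids = list(accumulate(1 if indent == 0 else 0 for _, _, indent in rows))

# One header name per group: the row in that group with Indentation = 0.
headers = {
    group: account
    for group, (account, _, indent) in zip(group_ids, rows)
    if indent == 0
}

# Repeat the header on every row of its group; sub-headers stay on detail rows only.
report = [
    (headers.get(group), account if indent == 1 else None, number, indent)
    for group, (account, number, indent) in zip(group_ids, rows)
]

for line in report:
    print(line)
```

Every detail row now carries the header of the nearest preceding row with Indentation = 0, which is the shape the question asks for.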

Source https://stackoverflow.com/questions/69942003

QUESTION

how to break up data in column value to multiple rows in pandas dataframe

Asked 2021-Oct-12 at 01:09

I have a CSV file that has to be converted to a pandas DataFrame. In some rows, the 'name' and 'business' columns hold multiple names and businesses that should each be in a separate row, so they need to be split up while keeping the data from the other columns the same for each row that is split.

Here is the example data:

input:

software	name	business
abc	Andrew Johnson, Steve Martin	Outsourcing/Offshoring, 201-500 employees,Health, Wellness and Fitness, 5001-10,000 employees
xyz	Jack Jones, Rick Paul, Johnny Jones	Banking, 1001-5000 employees,Construction, 51-200 employees,Consumer Goods, 10,001+ employees
def	Tom D., Connie J., Ricky B.	Unspecified, Unspecified, Self-employed

output I need:

software	name	business
abc	Andrew Johnson	Outsourcing/Offshoring, 201-500 employees
abc	Steve Martin	Health, Wellness and Fitness, 5001-10,000 employees
xyz	Jack Jones	Banking, 1001-5000 employees
xyz	Rick Paul	Construction, 51-200 employees
xyz	Johnny Jones	Consumer Goods, 10,001+ employees
def	Tom D	Unspecified
def	Connie J	Unspecified
def	Ricky B	Self-employed

There are additional columns similar to 'name' and 'business' that contain multiple pieces of information that need to be split up just like 'name' and 'business'. Cells that contain multiple pieces of information are in sequence (ordered).

Here's the code I have so far. It creates new rows, but it only splits up the contents of the name column, which leaves the business column and a few other columns that still need to be split up along with it.

import pandas as pd

# Split only the 'name' column into one row per name
name2 = df.name.str.split(',', expand=True).stack()
df = df.join(pd.Series(index=name2.index.droplevel(1), data=name2.values, name='name2'))

records = df.to_dict('records')
for row in records:
    new_segment = {}
    new_segment['name'] = str(row['name2'])
    #df['name'] = str(row['name2'])

    for col, content in new_segment.items():
        row[col] = content

df = pd.DataFrame.from_dict(records)

df = df.drop(columns='name2')

Here's an alternative solution I was trying as well but it gives me an error too:

import glob
import io
from pathlib import Path

import pandas as pd

review_path = r'data/base_data'
review_files = glob.glob(review_path + "/test_data.csv")

review_df_list = []
for review_file in review_files:
    df = pd.read_csv(io.StringIO(review_file), sep = '\t')
    print(df.head())
    df["business"] = (df["business"].str.extractall(r"(?:[\s,]*)(.*?(?:Unspecified|employees|Self-employed))").groupby(level=0).agg(list))
    df["name"] = df["name"].str.split(r"\s*,\s*")
    print(df.explode(["name", "business"]))
    outPutPath = Path('data/base_data/test_data.csv')
    df.to_csv(outPutPath, index=False)

Error Message for alternative solution:

Read:data/base_data/review_base.csv

Success!

Empty DataFrame

Columns: [data/base_data/test_data.csv]

Index: []
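The empty frame in this output, with the file path as its only column name, looks consistent with `io.StringIO` being given the path string rather than the file's contents. The snippet below is a small sketch of that behaviour. The inline `contents` string is a made-up stand-in for the real file.

```python
import io

import pandas as pd

path = "data/base_data/test_data.csv"

# StringIO wraps the *text of the path*, so pandas parses that single line
# as a header row and finds no data rows under it.
df_from_path_text = pd.read_csv(io.StringIO(path), sep="\t")
print(df_from_path_text)

# StringIO is meant for text that is already in memory, such as this
# hypothetical two-line, tab-separated sample.
contents = "software\tname\nabc\tAndrew Johnson, Steve Martin\n"
df_from_contents = pd.read_csv(io.StringIO(contents), sep="\t")
print(df_from_contents)
```

To read an actual file on disk, the path can be passed to `pd.read_csv` directly, without wrapping it in `io.StringIO`.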

ANSWER

Answered 2021-Oct-10 at 23:11

Try:

df["business"] = (
    df["business"]
    .str.extractall(r"(?:[\s,]*)(.*?(?:Unspecified|employees|Self-employed))")
    .groupby(level=0)
    .agg(list)
)
df["name"] = df["name"].str.split(r"\s*,\s*")

print(df.explode(["name", "business"]))

Prints:

  software            name                                             business
0      abc  Andrew Johnson            Outsourcing/Offshoring, 201-500 employees
0      abc    Steve Martin  Health, Wellness and Fitness, 5001-10,000 employees
1      xyz      Jack Jones                         Banking, 1001-5000 employees
1      xyz       Rick Paul                       Construction, 51-200 employees
1      xyz    Johnny Jones                    Consumer Goods, 10,001+ employees
2      def          Tom D.                                          Unspecified
2      def       Connie J.                                          Unspecified
2      def        Ricky B.                                        Self-employed
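Exploding several columns at once only works when the lists in each row have the same length, so it can help to check how one row splits before running the whole frame. Here is a plain-Python sketch that reuses the regular expression from the answer. `split_row` is a hypothetical helper written for this illustration, not a pandas function.

```python
import re

# Same pattern as in the answer: each business entry ends in one of these words.
BUSINESS = re.compile(r"(?:[\s,]*)(.*?(?:Unspecified|employees|Self-employed))")


def split_row(software, name, business):
    """Pair each name with its business entry, in order."""
    names = re.split(r"\s*,\s*", name)
    businesses = BUSINESS.findall(business)
    if len(names) != len(businesses):
        # A mismatch means the pairing (and a multi-column explode) would fail.
        raise ValueError(
            f"{software}: {len(names)} names but {len(businesses)} businesses"
        )
    return [(software, n, b) for n, b in zip(names, businesses)]


rows = split_row(
    "xyz",
    "Jack Jones, Rick Paul, Johnny Jones",
    "Banking, 1001-5000 employees,Construction, 51-200 employees,"
    "Consumer Goods, 10,001+ employees",
)
for row in rows:
    print(row)
```

The same check can be run over every row of the CSV to find the ones whose name and business counts disagree before calling `explode`.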

Source https://stackoverflow.com/questions/69519578

QUESTION

Python Sorting through nested dictionary

Asked 2020-May-29 at 15:30

From multiple tables I'm getting values of the form (Sector, Stock, InvestedValue). I'm using a Python dictionary to store these values: during insert, if the (sector, stock) combination already exists, I add the InvestedValue to the existing entry; otherwise I add a new entry to the dictionary. After all the data is inserted, let's say I end up with a nested dictionary like:

stock_dict = {
    "Financial": {"HDFC Bank": 230.25, "Axis Bank": 70.15, "ICICI Bank": 110.45},
    "Automobile": {"Tata Motors": 135.67},
    "Consumer Goods": {"Avenue Supermarket": 190.45, "Godrej Industries": 120.32}
}

How can I print this nested dictionary in sorted fashion?

1. Get each combination of Sector, Company, InvestedValue, sorted on InvestedValue.
2. Get each combination of Sector, sum(InvestedValue), i.e. the sum of InvestedValue over every company in that sector, again sorted on the sum.

My current approach to these problems is to flatten the nested dictionary into a list of tuples and run sorted on it. For example:

To solve 1:

stock_list = []
for sector in stock_dict:
    for stock in stock_dict[sector]:
        stock_list.append((sector, stock, stock_dict[sector][stock]))
sorted_list = sorted(stock_list, key=lambda stock: stock[2], reverse=True)

To solve 2:

sector_list = []
for sector in stock_dict:
    sector_list.append((sector,sum(stock_dict[sector].values())))
sorted_sector_list = sorted(sector_list, key=lambda sector: sector[1], reverse=True)

Is there a better approach, i.e. sorting directly on the nested dictionary without having to flatten it into a list?

ANSWER

Answered 2020-May-29 at 15:30

If you use pandas, it can convert your dict to a dataframe. You can then unstack the dataframe, which turns it from wide form into long form. In the long format you will have null values for any sector a company does not participate in; drop those with dropna(), reset the index, and sort the values by your desired column. Once this is done, rename your columns to the desired names and select them in the order you want. You can take that same df, group by Sector, and sum the InvestedValue.

import pandas as pd
stock_dict = {
    "Financial": {"HDFC Bank": 230.25, "Axis Bank": 70.15, "ICICI Bank": 110.45},
    "Automobile": {"Tata Motors": 135.67},
    "Consumer Goods": {"Avenue Supermarket": 190.45, "Godrej Industries": 120.32}
}

df = pd.DataFrame.from_dict(stock_dict, orient='index')
df = df.unstack().dropna().reset_index(name='InvestedValue').sort_values(by='InvestedValue', ascending=False)

df.columns = ['Company','Sector','InvestedValue']
df[['Sector','Company','InvestedValue']]

Output

           Sector             Company  InvestedValue
0       Financial           HDFC Bank         230.25
4  Consumer Goods  Avenue Supermarket         190.45
3      Automobile         Tata Motors         135.67
5  Consumer Goods   Godrej Industries         120.32
2       Financial          ICICI Bank         110.45
1       Financial           Axis Bank          70.15

2nd Part

df.groupby('Sector')['InvestedValue'].sum().reset_index().sort_values(by='InvestedValue', ascending=False)

Output

           Sector  InvestedValue
2       Financial         410.85
1  Consumer Goods         310.77
0      Automobile         135.67
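If pulling in pandas is not an option, the two loops from the question can also be folded into generator expressions passed straight to `sorted`. This is only a standard-library sketch: sorting still needs a flat sequence of tuples, but no intermediate list has to be built by hand. The names `by_value` and `by_sector` are illustrative.

```python
stock_dict = {
    "Financial": {"HDFC Bank": 230.25, "Axis Bank": 70.15, "ICICI Bank": 110.45},
    "Automobile": {"Tata Motors": 135.67},
    "Consumer Goods": {"Avenue Supermarket": 190.45, "Godrej Industries": 120.32},
}

# Problem 1: one (sector, company, value) tuple per holding, largest value first.
by_value = sorted(
    (
        (sector, company, value)
        for sector, companies in stock_dict.items()
        for company, value in companies.items()
    ),
    key=lambda row: row[2],
    reverse=True,
)

# Problem 2: total invested value per sector, largest total first.
by_sector = sorted(
    ((sector, sum(companies.values())) for sector, companies in stock_dict.items()),
    key=lambda row: row[1],
    reverse=True,
)

for sector, company, value in by_value:
    print(f"{sector:15} {company:20} {value:8.2f}")
for sector, total in by_sector:
    print(f"{sector:15} {total:8.2f}")
```

The ordering matches the pandas output above: HDFC Bank first among holdings, and Financial first among sectors.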

Source https://stackoverflow.com/questions/62089172

QUESTION

R: How can I lower the amount of levels in a dataframe?

Asked 2020-Apr-30 at 18:31

I have a column in a dataframe with over 40 levels, and I want to reduce it to 4 levels. The important values are "ecommerce", "technology", and "consumer goods"; everything else should fall under "other". How can I make it into 4 levels?

ANSWER

Answered 2020-Apr-30 at 02:33

We can use %in% to check :

df$column_name <- as.character(df$column_name)
df$column_name[!df$column_name %in% c('ecommerce', 'technology', 'consumer goods')] <- 'Other'

If you want to keep the column as factors :

levels(df$column_name) <- c(levels(df$column_name), 'Other')
df$column_name[!df$column_name %in% c('ecommerce', 'technology', 'consumer goods')] <- 'Other'
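For readers doing the same clean-up in pandas rather than R, a rough analogue of the `%in%` approach is sketched below. The sample values are made up, and the snippet illustrates the idea only; it is not a translation of the answer.

```python
import pandas as pd

keep = ["ecommerce", "technology", "consumer goods"]

# Hypothetical categorical column with more levels than we want to keep.
column = pd.Series(
    ["ecommerce", "healthcare", "technology", "finance", "consumer goods"],
    dtype="category",
)

# Work on plain objects so "Other" can be written without first adding it
# as a category, then convert back to a categorical column.
collapsed = (
    column.astype(object)
    .where(column.isin(keep), "Other")
    .astype("category")
)

print(collapsed.tolist())
print(list(collapsed.cat.categories))
```

Converting back to a categorical at the end leaves exactly four categories: the three kept values plus "Other".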

Source https://stackoverflow.com/questions/61514922

QUESTION

R: How can I make a column with lots of categorical levels binary?

Asked 2020-Apr-29 at 22:29

I would like to have "ecommerce", "consumer goods" and "technology" equal 1, and all other industries equal 0. There are a ton of levels within the Industry.Vertical column, how can I make it binary?

A little piece of the dataset:

*Industry.Vertical*      *City..Location*
technology                Andheri
healthcare                Mumbai
luxury label              Mumbai
technology                Chembur
ecommerce                 Bengaluru
food & beverages          New Delhi
ecommerce                 Gurgaon
finance                   Bengaluru
finance                   New Delhi
waste management service  Hyderabad
technology                Bengaluru
agriculture               Nairobi
energy                    New Delhi

ANSWER

Answered 2020-Apr-29 at 04:40

We can use %in% to check if Industry column has any of c("ecommerce", "consumer goods", "technology") values and turn them to integers.

df$new_col <- as.integer(df$Industry %in% c("ecommerce", "consumer goods", "technology"))

which is faster than using ifelse :

df$new_col <- ifelse(df$Industry %in% c("ecommerce", "consumer goods", "technology"), 1, 0)
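The same membership test carries over to pandas almost word for word. This is a sketch for readers working in Python: the frame holds a few rows from the sample above, and `new_col` follows the answer's naming.

```python
import pandas as pd

targets = ["ecommerce", "consumer goods", "technology"]

# A few rows from the question's sample data.
df = pd.DataFrame(
    {
        "Industry.Vertical": ["technology", "healthcare", "ecommerce", "finance"],
        "City..Location": ["Andheri", "Mumbai", "Bengaluru", "New Delhi"],
    }
)

# isin returns a boolean mask; casting it to int gives the 1/0 column.
df["new_col"] = df["Industry.Vertical"].isin(targets).astype(int)

print(df)
```

As in the R version, the boolean result of the membership test is converted to integers directly, with no row-by-row conditional.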