
falcon | Mirror of Apache Falcon

by apache | Java | Version: Current | License: Apache-2.0

kandi X-RAY | falcon Summary

falcon is a Java library typically used in Big Data, Kafka, Spark, and Hadoop applications. falcon has no reported bugs or vulnerabilities, has a build file available, carries a permissive license, and has high support. You can download it from GitHub.
Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on hadoop clusters.

Support

  • falcon has a highly active ecosystem.
  • It has 95 stars, 111 forks, and 25 watchers.
  • It has had no major release in the last 12 months.
  • falcon has no reported issues. There are 14 open pull requests and 0 closed pull requests.
  • It has a positive sentiment in the developer community.
  • The latest version of falcon is current.

Quality

  • falcon has 0 bugs and 0 code smells.

Security

  • falcon has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
  • falcon code analysis shows 0 unresolved vulnerabilities.
  • There are 0 security hotspots that need review.

License

  • falcon is licensed under the Apache-2.0 License. This license is Permissive.
  • Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

  • falcon releases are not available. You will need to build from source and install it yourself.
  • A build file is available, so you can build the component from source.
  • falcon saves you an estimated 155733 person-hours of effort over developing the same functionality from scratch.
  • It has 160373 lines of code, 9231 functions and 1439 files.
  • It has high code complexity, which directly impacts the maintainability of the code.
Top functions reviewed by kandi - BETA

kandi has reviewed falcon and identified the functions below as its most significant. This is intended to give you an instant insight into the functionality falcon implements, and help you decide if it suits your requirements.

  • Executes an entity command.
  • Gets additional properties.
  • Checks the properties for the process.
  • Partitions the given database into partitions.
  • Gets command line options.
  • Executes an instance command.
  • Handles an extension command.
  • Creates a process from a summary box.
  • Gets the cluster summary.
  • Returns a set of date times for the consumer process.

Get all kandi verified functions for this library.

falcon Key Features

Mirror of Apache Falcon

falcon Examples and Code Snippets

See all related Code Snippets

Formatting phone numbers with +1 using pandas.Series.replace
# Note: the final (\d{3}) group captures only three digits, so the last
# digit of an 11-digit number is dropped (visible in the output below).
df['Contact phone number'] = '+' + df['Contact phone number'].dropna().astype(str).str.extract(r'(\d)(\d{3})(\d{3})(\d{3})').apply(list, axis=1).str.join('-')

>>> df
       Company phone number Contact phone number  num_specimen_seen
falcon      +1-541-296-2271       +1-511-296-227                 10
dog         +1-542-296-2271                  NaN                  2
cat         +1-543-296-2271       +1-531-296-227                  3
# Fixed version: a trailing (\d+) keeps all remaining digits.
df['Contact phone number'] = df['Contact phone number'].str.replace(r'^(\d)(\d{3})(\d{3})(\d+)$', r'+1-\1-\2-\3-\4', regex=True)

>>> df['Contact phone number']
falcon    +1-1-511-296-2271
dog                    None
cat       +1-1-531-296-2271
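The corrected pattern can be sanity-checked outside pandas with the stdlib `re` module. This is a minimal sketch, assuming the raw values are bare digit strings such as `'15112962271'`; the function name is illustrative:

```python
import re

# Same capture groups as the str.replace call above:
# one leading digit, two 3-digit groups, then all remaining digits.
PATTERN = re.compile(r'^(\d)(\d{3})(\d{3})(\d+)$')

def format_phone(raw: str) -> str:
    # Non-matching values pass through unchanged, mirroring how
    # str.replace leaves non-matching cells alone.
    return PATTERN.sub(r'+1-\1-\2-\3-\4', raw)

print(format_phone('15112962271'))  # +1-1-511-296-2271
```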

Why does np.select not allow an index above the total length in the choicelist?
df['Emails'] = df['Emails'].explode().groupby(level=0).first()

>>> df
               Emails  num_wings  num_specimen_seen
falcon    j@gmail.com          2                 10
dog     jzp@gmail.com          0                  2
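The `explode().groupby(level=0).first()` chain effectively keeps the first entry of each list-valued cell. The same selection can be sketched in plain Python (the helper name here is illustrative, not part of the answer):

```python
def first_or_none(cell):
    # Mirrors explode + groupby(level=0).first(): keep the first
    # element of a list-valued cell, or None when the list is empty.
    if isinstance(cell, list):
        return cell[0] if cell else None
    return cell

emails = {'falcon': ['j@gmail.com', 'j2@gmail.com'], 'dog': ['jzp@gmail.com']}
print({k: first_or_none(v) for k, v in emails.items()})
# {'falcon': 'j@gmail.com', 'dog': 'jzp@gmail.com'}
```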

Mapping complex JSON to a Pandas DataFrame
def process_json(api_response):

    def get_column_values(df):
        return pd.concat([df, pd.json_normalize(df.pop('columns')).set_axis(df.index)], axis=1)

    def expand_children(df):
        if len(df.index) > 1:
            df['children'] = df['children'].fillna('').apply(lambda x: None if len(x) == 0 else x)
        df_children = df.pop('children').dropna().explode()
        if len(df_children.index) == 0:  # return df if no children to append
            return df.index.names, df
        df_children = pd.json_normalize(df_children, max_level=0).set_axis(df_children.index).set_index('name', append=True)
        df_children = get_column_values(df_children)
        idx_names = list(df_children.index.names)
        idx_names[-1] = idx_names[-1] + '_' + str(len(idx_names))
        df[idx_names[-1]] = None
        return idx_names, pd.concat([df.set_index(idx_names[-1], append=True), df_children], axis=0)

    columns_dict = pd.DataFrame(api_response['meta']['columns']).set_index('key').to_dict(orient='index')  # save column definitions
    df = pd.DataFrame(api_response['data']['attributes']['total']['children']).set_index('name')  # get initial dataframe
    df = get_column_values(df)  # get columns for initial level

    # expand children
    while 'children' in df.columns:
        idx_names, df = expand_children(df)

    # reorder/replace column headers and sort index
    df = (df.loc[:, [x for x in df.columns if x not in columns_dict.keys()] + list(columns_dict.keys())]
          .rename(columns={k: v['display_name'] for k, v in columns_dict.items()})
          .sort_index(na_position='first').reset_index())

    # collapse "name" columns (careful of potential duplicate rows)
    for col in idx_names[::-1]:
        df[idx_names[-1]] = df[idx_names[-1]].fillna(df[col])
    df = df.rename(columns={'name': 'portfolio', idx_names[-1]: 'name'}).drop(columns=idx_names[1:-1])

    return df

process_json(api_response)
54.2 ms ± 7.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

unpack_response(api_response) # iterrows
84.3 ms ± 9.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
                      
pip install --upgrade jsonpath-ng

import json
import jsonpath_ng as jp
import pandas as pd

def unpack_response(r):
    # Create a dataframe from extracted data
    expr = jp.parse('$..children.[*]')
    data = [{'full_path': str(m.full_path), **m.value} for m in expr.find(r)]
    df = pd.json_normalize(data).sort_values('full_path', ignore_index=True)

    # Append a portfolio column
    df['portfolio'] = df.loc[df.full_path.str.contains(r'total\.children\.\[\d+]$'), 'name']
    df['portfolio'].fillna(method='ffill', inplace=True)

    # Deal with columns
    trans = {'columns.' + c['key']: c['display_name'] for c in r['meta']['columns']}
    cols = ['full_path', 'portfolio', 'name', 'entity_id', 'Adjusted Value (No Div, USD)', 'Current Quarter TWR (USD)', 'YTD TWR (USD)', 'TWR Audit Note']
    df = df.rename(columns=trans)[cols]

    return df

# Load the sample data from file
# with open('api_response_2022-02-13.json', 'r') as f:
#     api_response = json.load(f)

# Load the sample data from string
                      api_response = json.loads('{"meta": {"columns": [{"key": "value", "display_name": "Adjusted Value (No Div, USD)", "output_type": "Number", "currency": "USD"}, {"key": "time_weighted_return", "display_name": "Current Quarter TWR (USD)", "output_type": "Percent", "currency": "USD"}, {"key": "time_weighted_return_2", "display_name": "YTD TWR (USD)", "output_type": "Percent", "currency": "USD"}, {"key": "_custom_twr_audit_note_911328", "display_name": "TWR Audit Note", "output_type": "Word"}], "groupings": [{"key": "_custom_name_747205", "display_name": "* Reporting Client Name"}, {"key": "_custom_new_entity_group_453577", "display_name": "NEW Entity Group"}, {"key": "_custom_level_2_624287", "display_name": "* Level 2"}, {"key": "legal_entity", "display_name": "Legal Entity"}]}, "data": {"type": "portfolio_views", "attributes": {"total": {"name": "Total", "columns": {"time_weighted_return": -0.046732301295604683, "time_weighted_return_2": -0.046732301295604683, "_custom_twr_audit_note_911328": null, "value": 23132492.905107163}, "children": [{"name": "Falconer Family", "grouping": "_custom_name_747205", "columns": {"time_weighted_return": -0.046732301295604683, "time_weighted_return_2": -0.046732301295604683, "_custom_twr_audit_note_911328": null, "value": 23132492.905107163}, "children": [{"name": "Wealth Bucket A", "grouping": "_custom_new_entity_group_453577", "columns": {"time_weighted_return": -0.045960317420568164, "time_weighted_return_2": -0.045960317420568164, "_custom_twr_audit_note_911328": null, "value": 13264448.506587159}, "children": [{"name": "Asset Class A", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": 3.434094574039648e-06, "time_weighted_return_2": 3.434094574039648e-06, "_custom_twr_audit_note_911328": null, "value": 3337.99}, "children": [{"entity_id": 10604454, "name": "HUDJ Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": 3.434094574039648e-06, "time_weighted_return_2": 
3.434094574039648e-06, "_custom_twr_audit_note_911328": null, "value": 3337.99}, "children": []}]}, {"name": "Asset Class B", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.025871339096964152, "time_weighted_return_2": -0.025871339096964152, "_custom_twr_audit_note_911328": null, "value": 1017004.7192636987}, "children": [{"entity_id": 10604454, "name": "HUDG Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.025871339096964152, "time_weighted_return_2": -0.025871339096964152, "_custom_twr_audit_note_911328": null, "value": 1017004.7192636987}, "children": []}]}, {"name": "Asset Class C", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.030370376329670656, "time_weighted_return_2": -0.030370376329670656, "_custom_twr_audit_note_911328": null, "value": 231142.67772000004}, "children": [{"entity_id": 10604454, "name": "HKDJ Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.030370376329670656, "time_weighted_return_2": -0.030370376329670656, "_custom_twr_audit_note_911328": null, "value": 231142.67772000004}, "children": []}]}, {"name": "Asset Class D", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.05382756475465478, "time_weighted_return_2": -0.05382756475465478, "_custom_twr_audit_note_911328": null, "value": 9791282.570000006}, "children": [{"entity_id": 10604454, "name": "HUDW Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05382756475465478, "time_weighted_return_2": -0.05382756475465478, "_custom_twr_audit_note_911328": null, "value": 9791282.570000006}, "children": []}]}, {"name": "Asset Class E", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.01351630404081805, "time_weighted_return_2": -0.01351630404081805, "_custom_twr_audit_note_911328": null, "value": 2153366.6396034593}, "children": [{"entity_id": 10604454, "name": "HJDJ Trust", "grouping": "legal_entity", 
"columns": {"time_weighted_return": -0.01351630404081805, "time_weighted_return_2": -0.01351630404081805, "_custom_twr_audit_note_911328": null, "value": 2153366.6396034593}, "children": []}]}, {"name": "Asset Class F", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.002298190175237247, "time_weighted_return_2": -0.002298190175237247, "_custom_twr_audit_note_911328": null, "value": 68313.90999999999}, "children": [{"entity_id": 10604454, "name": "HADJ Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.002298190175237247, "time_weighted_return_2": -0.002298190175237247, "_custom_twr_audit_note_911328": null, "value": 68313.90999999999}, "children": []}]}]}, {"name": "Wealth Bucket B", "grouping": "_custom_new_entity_group_453577", "columns": {"time_weighted_return": -0.04769870075659244, "time_weighted_return_2": -0.04769870075659244, "_custom_twr_audit_note_911328": null, "value": 9868044.398519998}, "children": [{"name": "Asset Class A", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": 2.8632718065191298e-05, "time_weighted_return_2": 2.8632718065191298e-05, "_custom_twr_audit_note_911328": null, "value": 10234.94}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": 2.82679297198829e-05, "time_weighted_return_2": 2.82679297198829e-05, "_custom_twr_audit_note_911328": null, "value": 244.28}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": 4.9373572795108345e-05, "time_weighted_return_2": 4.9373572795108345e-05, "_custom_twr_audit_note_911328": null, "value": 5081.08}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": 6.609603754315074e-06, "time_weighted_return_2": 6.609603754315074e-06, "_custom_twr_audit_note_911328": null, 
"value": 1523.62}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": 1.0999769004760296e-05, "time_weighted_return_2": 1.0999769004760296e-05, "_custom_twr_audit_note_911328": null, "value": 1828.9}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": 6.466673995619843e-06, "time_weighted_return_2": 6.466673995619843e-06, "_custom_twr_audit_note_911328": null, "value": 1557.06}, "children": []}]}, {"name": "Asset Class B", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.024645947842438676, "time_weighted_return_2": -0.024645947842438676, "_custom_twr_audit_note_911328": null, "value": 674052.31962}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.043304004172576405, "time_weighted_return_2": -0.043304004172576405, "_custom_twr_audit_note_911328": null, "value": 52800.96}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.022408434778798836, "time_weighted_return_2": -0.022408434778798836, "_custom_twr_audit_note_911328": null, "value": 599594.11962}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", 
"columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}]}, {"name": "Asset Class C", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.03037038746301135, "time_weighted_return_2": -0.03037038746301135, "_custom_twr_audit_note_911328": null, "value": 114472.69744}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.030370390035505124, "time_weighted_return_2": -0.030370390035505124, "_custom_twr_audit_note_911328": null, "value": 114472.68744000001}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": 0, "time_weighted_return_2": 0, "_custom_twr_audit_note_911328": null, "value": 0.01}, "children": []}]}, {"name": "Asset Class D", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.06604362523792162, "time_weighted_return_2": -0.06604362523792162, "_custom_twr_audit_note_911328": null, "value": 5722529.229999997}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06154960593668424, "time_weighted_return_2": -0.06154960593668424, "_custom_twr_audit_note_911328": null, "value": 1191838.9399999995}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06750460387418267, "time_weighted_return_2": -0.06750460387418267, "_custom_twr_audit_note_911328": null, "value": 4416618.520000002}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 
38190.33}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}]}, {"name": "Asset Class E", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.017118805423322003, "time_weighted_return_2": -0.017118805423322003, "_custom_twr_audit_note_911328": null, "value": 3148495.0914600003}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.015251157805867277, "time_weighted_return_2": -0.015251157805867277, "_custom_twr_audit_note_911328": null, "value": 800493.06146}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.01739609576880241, "time_weighted_return_2": -0.01739609576880241, "_custom_twr_audit_note_911328": null, "value": 2215511.2700000005}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02085132265594647, "time_weighted_return_2": -0.02085132265594647, "_custom_twr_audit_note_911328": null, "value": 44031.21}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02089393244695803, "time_weighted_return_2": -0.02089393244695803, "_custom_twr_audit_note_911328": null, "value": 44394.159999999996}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": 
"legal_entity", "columns": {"time_weighted_return": -0.020607507059866248, "time_weighted_return_2": -0.020607507059866248, "_custom_twr_audit_note_911328": null, "value": 44065.39000000001}, "children": []}]}, {"name": "Asset Class F", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.0014710489231547497, "time_weighted_return_2": -0.0014710489231547497, "_custom_twr_audit_note_911328": null, "value": 198260.12}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.0014477244560456848, "time_weighted_return_2": -0.0014477244560456848, "_custom_twr_audit_note_911328": null, "value": 44612.33}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.001477821083437858, "time_weighted_return_2": -0.001477821083437858, "_custom_twr_audit_note_911328": null, "value": 153647.78999999998}, "children": []}]}]}]}]}}, "included": []}}')
                      
df = unpack_response(api_response)

print(df.iloc[:5, 1:])

print(df.iloc[:10, :3])
                      
expr = jp.parse('$..children.[*]')

# Original iterrows-based version, for comparison.
# Note: DataFrame.append was deprecated and later removed in pandas 2.0;
# this code requires an older pandas.
def unpack_response(r):
    df = pd.DataFrame()
    for _, r1 in pd.json_normalize(r, ['data', 'attributes', 'total', 'children']).iterrows():
        r1['portfolio'] = r1['name']
        df = df.append(r1)
        for _, r2 in pd.json_normalize(r1.children).iterrows():
            df = df.append(r2)
            for _, r3 in pd.json_normalize(r2.children).iterrows():
                df = df.append(r3).append(pd.json_normalize(r3.children))
    df['portfolio'].fillna(method='ffill', inplace=True)
    trans = {'columns.' + c['key']: c['display_name'] for c in r['meta']['columns']}
    cols = ['portfolio', 'name', 'entity_id', 'Adjusted Value (No Div, USD)', 'Current Quarter TWR (USD)', 'YTD TWR (USD)', 'TWR Audit Note']
    df = df.rename(columns=trans)[cols].reset_index(drop=True)
    return df
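Both approaches above walk the same nested `children` structure. The traversal itself can be sketched in plain Python, without pandas or jsonpath-ng; the function and the tiny sample tree here are illustrative, not part of either answer:

```python
def flatten_children(node, path=()):
    """Depth-first walk over nested 'children' lists, yielding
    (path, row) pairs in discovery order."""
    # Keep every field except the nested list itself.
    row = {k: v for k, v in node.items() if k != 'children'}
    yield path + (node['name'],), row
    for child in node.get('children', []):
        yield from flatten_children(child, path + (node['name'],))

tree = {'name': 'Total', 'children': [
    {'name': 'Falconer Family', 'children': [
        {'name': 'Wealth Bucket A', 'children': []}]}]}

paths = [p for p, _ in flatten_children(tree)]
print(paths)
# [('Total',), ('Total', 'Falconer Family'), ('Total', 'Falconer Family', 'Wealth Bucket A')]
```

Each yielded path plays the role of `full_path` in the jsonpath version, and can be used to attach the top-level portfolio name to every descendant row.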
                      
"value": 1523.62}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": 1.0999769004760296e-05, "time_weighted_return_2": 1.0999769004760296e-05, "_custom_twr_audit_note_911328": null, "value": 1828.9}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": 6.466673995619843e-06, "time_weighted_return_2": 6.466673995619843e-06, "_custom_twr_audit_note_911328": null, "value": 1557.06}, "children": []}]}, {"name": "Asset Class B", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.024645947842438676, "time_weighted_return_2": -0.024645947842438676, "_custom_twr_audit_note_911328": null, "value": 674052.31962}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.043304004172576405, "time_weighted_return_2": -0.043304004172576405, "_custom_twr_audit_note_911328": null, "value": 52800.96}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.022408434778798836, "time_weighted_return_2": -0.022408434778798836, "_custom_twr_audit_note_911328": null, "value": 599594.11962}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", 
"columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}]}, {"name": "Asset Class C", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.03037038746301135, "time_weighted_return_2": -0.03037038746301135, "_custom_twr_audit_note_911328": null, "value": 114472.69744}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.030370390035505124, "time_weighted_return_2": -0.030370390035505124, "_custom_twr_audit_note_911328": null, "value": 114472.68744000001}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": 0, "time_weighted_return_2": 0, "_custom_twr_audit_note_911328": null, "value": 0.01}, "children": []}]}, {"name": "Asset Class D", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.06604362523792162, "time_weighted_return_2": -0.06604362523792162, "_custom_twr_audit_note_911328": null, "value": 5722529.229999997}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06154960593668424, "time_weighted_return_2": -0.06154960593668424, "_custom_twr_audit_note_911328": null, "value": 1191838.9399999995}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06750460387418267, "time_weighted_return_2": -0.06750460387418267, "_custom_twr_audit_note_911328": null, "value": 4416618.520000002}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 
38190.33}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}]}, {"name": "Asset Class E", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.017118805423322003, "time_weighted_return_2": -0.017118805423322003, "_custom_twr_audit_note_911328": null, "value": 3148495.0914600003}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.015251157805867277, "time_weighted_return_2": -0.015251157805867277, "_custom_twr_audit_note_911328": null, "value": 800493.06146}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.01739609576880241, "time_weighted_return_2": -0.01739609576880241, "_custom_twr_audit_note_911328": null, "value": 2215511.2700000005}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02085132265594647, "time_weighted_return_2": -0.02085132265594647, "_custom_twr_audit_note_911328": null, "value": 44031.21}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02089393244695803, "time_weighted_return_2": -0.02089393244695803, "_custom_twr_audit_note_911328": null, "value": 44394.159999999996}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": 
"legal_entity", "columns": {"time_weighted_return": -0.020607507059866248, "time_weighted_return_2": -0.020607507059866248, "_custom_twr_audit_note_911328": null, "value": 44065.39000000001}, "children": []}]}, {"name": "Asset Class F", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.0014710489231547497, "time_weighted_return_2": -0.0014710489231547497, "_custom_twr_audit_note_911328": null, "value": 198260.12}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.0014477244560456848, "time_weighted_return_2": -0.0014477244560456848, "_custom_twr_audit_note_911328": null, "value": 44612.33}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.001477821083437858, "time_weighted_return_2": -0.001477821083437858, "_custom_twr_audit_note_911328": null, "value": 153647.78999999998}, "children": []}]}]}]}]}}, "included": []}}')
                      
                      df = unpack_response(api_response)
                      
                      print(df.iloc[:5:,1:])
                      
                      print(df.iloc[:10,:3])
                      
                      expr = jp.parse('$..children.[*]')
                      
                       def unpack_response(r):
                           # Walk the nested 'children' lists level by level, collecting rows.
                           # (pandas 2.0 removed DataFrame.append, so gather the rows in a
                           # list and concatenate once at the end.)
                           rows = []
                           for _, r1 in pd.json_normalize(r, ['data', 'attributes', 'total', 'children']).iterrows():
                               r1['portfolio'] = r1['name']
                               rows.append(r1.to_frame().T)
                               for _, r2 in pd.json_normalize(r1.children).iterrows():
                                   rows.append(r2.to_frame().T)
                                   for _, r3 in pd.json_normalize(r2.children).iterrows():
                                       rows.append(r3.to_frame().T)
                                       rows.append(pd.json_normalize(r3.children))
                           df = pd.concat(rows, ignore_index=True)
                           # Forward-fill the top-level name down to its descendant rows
                           df['portfolio'] = df['portfolio'].ffill()
                           # Map API column keys to their display names, then select/reorder
                           trans = {'columns.' + c['key']: c['display_name'] for c in r['meta']['columns']}
                           cols = ['portfolio', 'name', 'entity_id', 'Adjusted Value (No Div, USD)', 'Current Quarter TWR (USD)', 'YTD TWR (USD)', 'TWR Audit Note']
                           df = df.rename(columns=trans)[cols].reset_index(drop=True)
                           return df
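The nested loop above leans on `pd.json_normalize` with a `record_path`: it drills down through the listed keys, turns each element of the final list into a row, and flattens nested dicts into dotted column names while leaving nested lists (like `children`) intact in a single column. A minimal sketch on a toy payload shaped like the API response (the names here are invented for illustration):

```python
import pandas as pd

# Toy payload mimicking the API response's nesting
data = {'data': {'attributes': {'total': {'children': [
    {'name': 'X', 'columns': {'value': 1.0}, 'children': []},
]}}}}

# record_path drills down to the list whose elements become rows; the
# nested 'columns' dict is flattened to 'columns.value', while the
# 'children' list is kept as-is in a 'children' column
df = pd.json_normalize(data, ['data', 'attributes', 'total', 'children'])
print(df.columns.tolist())
```

This is why the loop can recurse via `r1.children`: each row still carries its raw `children` list for the next level of normalization.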
                      
                       # Install the JSONPath dependency first (run in a shell):
                       #   pip install --upgrade jsonpath-ng
                      
                      import json
                      import jsonpath_ng as jp
                      import pandas as pd
                      
                      def unpack_response(r):
                          # Create a dataframe from extracted data
                          expr = jp.parse('$..children.[*]')
                          data = [{'full_path': str(m.full_path), **m.value} for m in expr.find(r)]
                          df = pd.json_normalize(data).sort_values('full_path', ignore_index=True)
                      
                          # Append a portfolio column
                          df['portfolio'] = df.loc[df.full_path.str.contains(r'total\.children\.\[\d+]$'), 'name']
                           df['portfolio'] = df['portfolio'].ffill()
                      
                          # Deal with columns
                          trans = {'columns.' + c['key']: c['display_name'] for c in r['meta']['columns']}
                          cols = ['full_path', 'portfolio', 'name', 'entity_id', 'Adjusted Value (No Div, USD)', 'Current Quarter TWR (USD)', 'YTD TWR (USD)', 'TWR Audit Note']
                          df = df.rename(columns=trans)[cols]
                      
                          return df
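The jsonpath query `'$..children.[*]'` enumerates every node of the tree along with its full path; sorting by `full_path` keeps parents ahead of their descendants, which is what makes the forward-fill of the portfolio column work. For intuition, the same traversal can be sketched without any dependency (toy data; the structure is the only thing this illustrates):

```python
def walk(node, path=()):
    # Yield (path, child) for every node reached through nested 'children'
    # lists, parents before their descendants -- the same ordering that
    # sorting jsonpath full_path strings produces.
    for i, child in enumerate(node.get('children', [])):
        yield path + (i,), child
        yield from walk(child, path + (i,))

doc = {'children': [
    {'name': 'Falconer Family', 'children': [
        {'name': 'Wealth Bucket A', 'children': []},
    ]},
]}

flat = [(p, n['name']) for p, n in walk(doc)]
# Entries with a path of length 1 play the role of the 'portfolio' rows
```

In `unpack_response` the path-length test is done with the regex `total\.children\.\[\d+]$`, which matches exactly the top-level children of the report.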
                      
                      # Load the sample data from file
                      # with open('api_response_2022-02-13.json', 'r') as f:
                      #     api_response = json.load(f)
                      
                      # Load the sample data from string
                      api_response = json.loads('{"meta": {"columns": [{"key": "value", "display_name": "Adjusted Value (No Div, USD)", "output_type": "Number", "currency": "USD"}, {"key": "time_weighted_return", "display_name": "Current Quarter TWR (USD)", "output_type": "Percent", "currency": "USD"}, {"key": "time_weighted_return_2", "display_name": "YTD TWR (USD)", "output_type": "Percent", "currency": "USD"}, {"key": "_custom_twr_audit_note_911328", "display_name": "TWR Audit Note", "output_type": "Word"}], "groupings": [{"key": "_custom_name_747205", "display_name": "* Reporting Client Name"}, {"key": "_custom_new_entity_group_453577", "display_name": "NEW Entity Group"}, {"key": "_custom_level_2_624287", "display_name": "* Level 2"}, {"key": "legal_entity", "display_name": "Legal Entity"}]}, "data": {"type": "portfolio_views", "attributes": {"total": {"name": "Total", "columns": {"time_weighted_return": -0.046732301295604683, "time_weighted_return_2": -0.046732301295604683, "_custom_twr_audit_note_911328": null, "value": 23132492.905107163}, "children": [{"name": "Falconer Family", "grouping": "_custom_name_747205", "columns": {"time_weighted_return": -0.046732301295604683, "time_weighted_return_2": -0.046732301295604683, "_custom_twr_audit_note_911328": null, "value": 23132492.905107163}, "children": [{"name": "Wealth Bucket A", "grouping": "_custom_new_entity_group_453577", "columns": {"time_weighted_return": -0.045960317420568164, "time_weighted_return_2": -0.045960317420568164, "_custom_twr_audit_note_911328": null, "value": 13264448.506587159}, "children": [{"name": "Asset Class A", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": 3.434094574039648e-06, "time_weighted_return_2": 3.434094574039648e-06, "_custom_twr_audit_note_911328": null, "value": 3337.99}, "children": [{"entity_id": 10604454, "name": "HUDJ Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": 3.434094574039648e-06, "time_weighted_return_2": 
3.434094574039648e-06, "_custom_twr_audit_note_911328": null, "value": 3337.99}, "children": []}]}, {"name": "Asset Class B", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.025871339096964152, "time_weighted_return_2": -0.025871339096964152, "_custom_twr_audit_note_911328": null, "value": 1017004.7192636987}, "children": [{"entity_id": 10604454, "name": "HUDG Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.025871339096964152, "time_weighted_return_2": -0.025871339096964152, "_custom_twr_audit_note_911328": null, "value": 1017004.7192636987}, "children": []}]}, {"name": "Asset Class C", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.030370376329670656, "time_weighted_return_2": -0.030370376329670656, "_custom_twr_audit_note_911328": null, "value": 231142.67772000004}, "children": [{"entity_id": 10604454, "name": "HKDJ Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.030370376329670656, "time_weighted_return_2": -0.030370376329670656, "_custom_twr_audit_note_911328": null, "value": 231142.67772000004}, "children": []}]}, {"name": "Asset Class D", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.05382756475465478, "time_weighted_return_2": -0.05382756475465478, "_custom_twr_audit_note_911328": null, "value": 9791282.570000006}, "children": [{"entity_id": 10604454, "name": "HUDW Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05382756475465478, "time_weighted_return_2": -0.05382756475465478, "_custom_twr_audit_note_911328": null, "value": 9791282.570000006}, "children": []}]}, {"name": "Asset Class E", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.01351630404081805, "time_weighted_return_2": -0.01351630404081805, "_custom_twr_audit_note_911328": null, "value": 2153366.6396034593}, "children": [{"entity_id": 10604454, "name": "HJDJ Trust", "grouping": "legal_entity", 
"columns": {"time_weighted_return": -0.01351630404081805, "time_weighted_return_2": -0.01351630404081805, "_custom_twr_audit_note_911328": null, "value": 2153366.6396034593}, "children": []}]}, {"name": "Asset Class F", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.002298190175237247, "time_weighted_return_2": -0.002298190175237247, "_custom_twr_audit_note_911328": null, "value": 68313.90999999999}, "children": [{"entity_id": 10604454, "name": "HADJ Trust", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.002298190175237247, "time_weighted_return_2": -0.002298190175237247, "_custom_twr_audit_note_911328": null, "value": 68313.90999999999}, "children": []}]}]}, {"name": "Wealth Bucket B", "grouping": "_custom_new_entity_group_453577", "columns": {"time_weighted_return": -0.04769870075659244, "time_weighted_return_2": -0.04769870075659244, "_custom_twr_audit_note_911328": null, "value": 9868044.398519998}, "children": [{"name": "Asset Class A", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": 2.8632718065191298e-05, "time_weighted_return_2": 2.8632718065191298e-05, "_custom_twr_audit_note_911328": null, "value": 10234.94}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": 2.82679297198829e-05, "time_weighted_return_2": 2.82679297198829e-05, "_custom_twr_audit_note_911328": null, "value": 244.28}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": 4.9373572795108345e-05, "time_weighted_return_2": 4.9373572795108345e-05, "_custom_twr_audit_note_911328": null, "value": 5081.08}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": 6.609603754315074e-06, "time_weighted_return_2": 6.609603754315074e-06, "_custom_twr_audit_note_911328": null, 
"value": 1523.62}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": 1.0999769004760296e-05, "time_weighted_return_2": 1.0999769004760296e-05, "_custom_twr_audit_note_911328": null, "value": 1828.9}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": 6.466673995619843e-06, "time_weighted_return_2": 6.466673995619843e-06, "_custom_twr_audit_note_911328": null, "value": 1557.06}, "children": []}]}, {"name": "Asset Class B", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.024645947842438676, "time_weighted_return_2": -0.024645947842438676, "_custom_twr_audit_note_911328": null, "value": 674052.31962}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.043304004172576405, "time_weighted_return_2": -0.043304004172576405, "_custom_twr_audit_note_911328": null, "value": 52800.96}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.022408434778798836, "time_weighted_return_2": -0.022408434778798836, "_custom_twr_audit_note_911328": null, "value": 599594.11962}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", 
"columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}]}, {"name": "Asset Class C", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.03037038746301135, "time_weighted_return_2": -0.03037038746301135, "_custom_twr_audit_note_911328": null, "value": 114472.69744}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.030370390035505124, "time_weighted_return_2": -0.030370390035505124, "_custom_twr_audit_note_911328": null, "value": 114472.68744000001}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": 0, "time_weighted_return_2": 0, "_custom_twr_audit_note_911328": null, "value": 0.01}, "children": []}]}, {"name": "Asset Class D", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.06604362523792162, "time_weighted_return_2": -0.06604362523792162, "_custom_twr_audit_note_911328": null, "value": 5722529.229999997}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06154960593668424, "time_weighted_return_2": -0.06154960593668424, "_custom_twr_audit_note_911328": null, "value": 1191838.9399999995}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06750460387418267, "time_weighted_return_2": -0.06750460387418267, "_custom_twr_audit_note_911328": null, "value": 4416618.520000002}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 
38190.33}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}]}, {"name": "Asset Class E", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.017118805423322003, "time_weighted_return_2": -0.017118805423322003, "_custom_twr_audit_note_911328": null, "value": 3148495.0914600003}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.015251157805867277, "time_weighted_return_2": -0.015251157805867277, "_custom_twr_audit_note_911328": null, "value": 800493.06146}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.01739609576880241, "time_weighted_return_2": -0.01739609576880241, "_custom_twr_audit_note_911328": null, "value": 2215511.2700000005}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02085132265594647, "time_weighted_return_2": -0.02085132265594647, "_custom_twr_audit_note_911328": null, "value": 44031.21}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02089393244695803, "time_weighted_return_2": -0.02089393244695803, "_custom_twr_audit_note_911328": null, "value": 44394.159999999996}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": 
"legal_entity", "columns": {"time_weighted_return": -0.020607507059866248, "time_weighted_return_2": -0.020607507059866248, "_custom_twr_audit_note_911328": null, "value": 44065.39000000001}, "children": []}]}, {"name": "Asset Class F", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.0014710489231547497, "time_weighted_return_2": -0.0014710489231547497, "_custom_twr_audit_note_911328": null, "value": 198260.12}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.0014477244560456848, "time_weighted_return_2": -0.0014477244560456848, "_custom_twr_audit_note_911328": null, "value": 44612.33}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.001477821083437858, "time_weighted_return_2": -0.001477821083437858, "_custom_twr_audit_note_911328": null, "value": 153647.78999999998}, "children": []}]}]}]}]}}, "included": []}}')
                      
                      df = unpack_response(api_response)
                      
                       print(df.iloc[:5, 1:])
                       
                       print(df.iloc[:10, :3])
                      
"value": 1523.62}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": 1.0999769004760296e-05, "time_weighted_return_2": 1.0999769004760296e-05, "_custom_twr_audit_note_911328": null, "value": 1828.9}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": 6.466673995619843e-06, "time_weighted_return_2": 6.466673995619843e-06, "_custom_twr_audit_note_911328": null, "value": 1557.06}, "children": []}]}, {"name": "Asset Class B", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.024645947842438676, "time_weighted_return_2": -0.024645947842438676, "_custom_twr_audit_note_911328": null, "value": 674052.31962}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.043304004172576405, "time_weighted_return_2": -0.043304004172576405, "_custom_twr_audit_note_911328": null, "value": 52800.96}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.022408434778798836, "time_weighted_return_2": -0.022408434778798836, "_custom_twr_audit_note_911328": null, "value": 599594.11962}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", 
"columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}]}, {"name": "Asset Class C", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.03037038746301135, "time_weighted_return_2": -0.03037038746301135, "_custom_twr_audit_note_911328": null, "value": 114472.69744}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.030370390035505124, "time_weighted_return_2": -0.030370390035505124, "_custom_twr_audit_note_911328": null, "value": 114472.68744000001}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": 0, "time_weighted_return_2": 0, "_custom_twr_audit_note_911328": null, "value": 0.01}, "children": []}]}, {"name": "Asset Class D", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.06604362523792162, "time_weighted_return_2": -0.06604362523792162, "_custom_twr_audit_note_911328": null, "value": 5722529.229999997}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06154960593668424, "time_weighted_return_2": -0.06154960593668424, "_custom_twr_audit_note_911328": null, "value": 1191838.9399999995}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06750460387418267, "time_weighted_return_2": -0.06750460387418267, "_custom_twr_audit_note_911328": null, "value": 4416618.520000002}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 
38190.33}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}]}, {"name": "Asset Class E", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.017118805423322003, "time_weighted_return_2": -0.017118805423322003, "_custom_twr_audit_note_911328": null, "value": 3148495.0914600003}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.015251157805867277, "time_weighted_return_2": -0.015251157805867277, "_custom_twr_audit_note_911328": null, "value": 800493.06146}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.01739609576880241, "time_weighted_return_2": -0.01739609576880241, "_custom_twr_audit_note_911328": null, "value": 2215511.2700000005}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02085132265594647, "time_weighted_return_2": -0.02085132265594647, "_custom_twr_audit_note_911328": null, "value": 44031.21}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02089393244695803, "time_weighted_return_2": -0.02089393244695803, "_custom_twr_audit_note_911328": null, "value": 44394.159999999996}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": 
"legal_entity", "columns": {"time_weighted_return": -0.020607507059866248, "time_weighted_return_2": -0.020607507059866248, "_custom_twr_audit_note_911328": null, "value": 44065.39000000001}, "children": []}]}, {"name": "Asset Class F", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.0014710489231547497, "time_weighted_return_2": -0.0014710489231547497, "_custom_twr_audit_note_911328": null, "value": 198260.12}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.0014477244560456848, "time_weighted_return_2": -0.0014477244560456848, "_custom_twr_audit_note_911328": null, "value": 44612.33}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.001477821083437858, "time_weighted_return_2": -0.001477821083437858, "_custom_twr_audit_note_911328": null, "value": 153647.78999999998}, "children": []}]}]}]}]}}, "included": []}}')
                      
                      # unpack_response is defined below
                      df = unpack_response(api_response)

                      print(df.iloc[:5, 1:])   # first 5 rows, skipping the first column

                      print(df.iloc[:10, :3])  # first 10 rows, first 3 columns
                      
                      # Earlier, pandas-only version: walks the fixed three levels of
                      # nesting explicitly. DataFrame.append was removed in pandas 2.0,
                      # so rows are collected in a list and concatenated once at the end.
                      def unpack_response(r):
                          rows = []
                          for _, r1 in pd.json_normalize(r, ['data', 'attributes', 'total', 'children']).iterrows():
                              r1['portfolio'] = r1['name']
                              rows.append(r1.to_frame().T)
                              for _, r2 in pd.json_normalize(r1.children).iterrows():
                                  rows.append(r2.to_frame().T)
                                  for _, r3 in pd.json_normalize(r2.children).iterrows():
                                      rows.append(r3.to_frame().T)
                                      rows.append(pd.json_normalize(r3.children))
                          df = pd.concat(rows, ignore_index=True)
                          df['portfolio'] = df['portfolio'].ffill()
                          trans = {'columns.' + c['key']: c['display_name'] for c in r['meta']['columns']}
                          cols = ['portfolio', 'name', 'entity_id', 'Adjusted Value (No Div, USD)', 'Current Quarter TWR (USD)', 'YTD TWR (USD)', 'TWR Audit Note']
                          df = df.rename(columns=trans)[cols].reset_index(drop=True)
                          return df
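The pandas-only version above relies on `pd.json_normalize` with a record path to turn each nested `children` list into rows. A minimal sketch of that behaviour on a toy payload (the names here are invented for illustration, not taken from the real response):

```python
import pandas as pd

# Toy payload shaped like the API response: dicts nested under a "children" list.
payload = {
    "data": {
        "attributes": {
            "total": {
                "children": [
                    {"name": "Family A", "columns": {"value": 100.0}, "children": []},
                    {"name": "Family B", "columns": {"value": 200.0}, "children": []},
                ]
            }
        }
    }
}

# record_path walks data -> attributes -> total -> children and emits one row
# per list element; nested dicts such as "columns" are flattened with dots.
df = pd.json_normalize(payload, record_path=["data", "attributes", "total", "children"])
print(df[["name", "columns.value"]])
```

The dotted `columns.value` name produced by this flattening is what the rename step later maps back to a display name.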
                      
                      # Requires the jsonpath-ng package:
                      #   pip install --upgrade jsonpath-ng
                      
                      import json
                      import jsonpath_ng as jp
                      import pandas as pd
                      
                      def unpack_response(r):
                          # Create a dataframe from the extracted data: $..children.[*]
                          # matches every element of every "children" list, at any depth
                          expr = jp.parse('$..children.[*]')
                          data = [{'full_path': str(m.full_path), **m.value} for m in expr.find(r)]
                          df = pd.json_normalize(data).sort_values('full_path', ignore_index=True)

                          # Add a portfolio column: rows whose path ends at the top level
                          # (total.children.[n]) hold the portfolio name; forward-fill the rest
                          df['portfolio'] = df.loc[df.full_path.str.contains(r'total\.children\.\[\d+]$'), 'name']
                          df['portfolio'] = df['portfolio'].ffill()  # fillna(method='ffill') is deprecated

                          # Rename the flattened "columns.<key>" fields to their display names
                          # and select the output columns
                          trans = {'columns.' + c['key']: c['display_name'] for c in r['meta']['columns']}
                          cols = ['full_path', 'portfolio', 'name', 'entity_id', 'Adjusted Value (No Div, USD)', 'Current Quarter TWR (USD)', 'YTD TWR (USD)', 'TWR Audit Note']
                          df = df.rename(columns=trans)[cols]

                          return df
                      
                      # Load the sample data from file
                      # with open('api_response_2022-02-13.json', 'r') as f:
                      #     api_response = json.load(f)
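Both versions rebuild readable headers the same way: `meta.columns` in the response maps each raw key to a display name, and since `pd.json_normalize` flattens the nested `columns` dict to `columns.<key>`, a dict comprehension produces the rename table. Sketched in isolation with trimmed-down metadata from the sample response:

```python
import pandas as pd

# Two of the column definitions from the sample response's meta block.
meta = {"columns": [
    {"key": "value", "display_name": "Adjusted Value (No Div, USD)"},
    {"key": "time_weighted_return", "display_name": "Current Quarter TWR (USD)"},
]}

flat = pd.DataFrame({"columns.value": [3337.99],
                     "columns.time_weighted_return": [3.4e-06]})

# Map the flattened "columns.<key>" names to their display names.
trans = {"columns." + c["key"]: c["display_name"] for c in meta["columns"]}
flat = flat.rename(columns=trans)
print(list(flat.columns))
```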
                      
"value": 1523.62}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": 1.0999769004760296e-05, "time_weighted_return_2": 1.0999769004760296e-05, "_custom_twr_audit_note_911328": null, "value": 1828.9}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": 6.466673995619843e-06, "time_weighted_return_2": 6.466673995619843e-06, "_custom_twr_audit_note_911328": null, "value": 1557.06}, "children": []}]}, {"name": "Asset Class B", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.024645947842438676, "time_weighted_return_2": -0.024645947842438676, "_custom_twr_audit_note_911328": null, "value": 674052.31962}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.043304004172576405, "time_weighted_return_2": -0.043304004172576405, "_custom_twr_audit_note_911328": null, "value": 52800.96}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.022408434778798836, "time_weighted_return_2": -0.022408434778798836, "_custom_twr_audit_note_911328": null, "value": 599594.11962}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", 
"columns": {"time_weighted_return": -0.039799855483646174, "time_weighted_return_2": -0.039799855483646174, "_custom_twr_audit_note_911328": null, "value": 7219.08}, "children": []}]}, {"name": "Asset Class C", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.03037038746301135, "time_weighted_return_2": -0.03037038746301135, "_custom_twr_audit_note_911328": null, "value": 114472.69744}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.030370390035505124, "time_weighted_return_2": -0.030370390035505124, "_custom_twr_audit_note_911328": null, "value": 114472.68744000001}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": 0, "time_weighted_return_2": 0, "_custom_twr_audit_note_911328": null, "value": 0.01}, "children": []}]}, {"name": "Asset Class D", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.06604362523792162, "time_weighted_return_2": -0.06604362523792162, "_custom_twr_audit_note_911328": null, "value": 5722529.229999997}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06154960593668424, "time_weighted_return_2": -0.06154960593668424, "_custom_twr_audit_note_911328": null, "value": 1191838.9399999995}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.06750460387418267, "time_weighted_return_2": -0.06750460387418267, "_custom_twr_audit_note_911328": null, "value": 4416618.520000002}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 
38190.33}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.05604507809250081, "time_weighted_return_2": -0.05604507809250081, "_custom_twr_audit_note_911328": null, "value": 37940.72}, "children": []}]}, {"name": "Asset Class E", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.017118805423322003, "time_weighted_return_2": -0.017118805423322003, "_custom_twr_audit_note_911328": null, "value": 3148495.0914600003}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.015251157805867277, "time_weighted_return_2": -0.015251157805867277, "_custom_twr_audit_note_911328": null, "value": 800493.06146}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.01739609576880241, "time_weighted_return_2": -0.01739609576880241, "_custom_twr_audit_note_911328": null, "value": 2215511.2700000005}, "children": []}, {"entity_id": 10598341, "name": "Cht 11th Tr HBO Shirley", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02085132265594647, "time_weighted_return_2": -0.02085132265594647, "_custom_twr_audit_note_911328": null, "value": 44031.21}, "children": []}, {"entity_id": 10598337, "name": "Cht 11th Tr HBO Hannah", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.02089393244695803, "time_weighted_return_2": -0.02089393244695803, "_custom_twr_audit_note_911328": null, "value": 44394.159999999996}, "children": []}, {"entity_id": 10598334, "name": "Cht 11th Tr HBO Lau", "grouping": 
"legal_entity", "columns": {"time_weighted_return": -0.020607507059866248, "time_weighted_return_2": -0.020607507059866248, "_custom_twr_audit_note_911328": null, "value": 44065.39000000001}, "children": []}]}, {"name": "Asset Class F", "grouping": "_custom_level_2_624287", "columns": {"time_weighted_return": -0.0014710489231547497, "time_weighted_return_2": -0.0014710489231547497, "_custom_twr_audit_note_911328": null, "value": 198260.12}, "children": [{"entity_id": 10868778, "name": "2012 Desc Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.0014477244560456848, "time_weighted_return_2": -0.0014477244560456848, "_custom_twr_audit_note_911328": null, "value": 44612.33}, "children": []}, {"entity_id": 10643052, "name": "2013 Irrev Tr HBO Thalia", "grouping": "legal_entity", "columns": {"time_weighted_return": -0.001477821083437858, "time_weighted_return_2": -0.001477821083437858, "_custom_twr_audit_note_911328": null, "value": 153647.78999999998}, "children": []}]}]}]}]}}, "included": []}}')
                      
                      import pandas as pd

                      def unpack_response(r):
                          """Flatten the nested 'children' levels of the response into one frame."""
                          frames = []
                          for _, r1 in pd.json_normalize(r, ['data', 'attributes', 'total', 'children']).iterrows():
                              r1['portfolio'] = r1['name']
                              frames.append(r1.to_frame().T)
                              for _, r2 in pd.json_normalize(r1.children).iterrows():
                                  frames.append(r2.to_frame().T)
                                  for _, r3 in pd.json_normalize(r2.children).iterrows():
                                      frames.append(r3.to_frame().T)
                                      frames.append(pd.json_normalize(r3.children))
                          df = pd.concat(frames, ignore_index=True)
                          # Carry each top-level portfolio name down to its descendant rows
                          df['portfolio'] = df['portfolio'].ffill()
                          # Rename 'columns.<key>' fields to their display names from the metadata
                          trans = {'columns.' + c['key']: c['display_name'] for c in r['meta']['columns']}
                          cols = ['portfolio', 'name', 'entity_id', 'Adjusted Value (No Div, USD)',
                                  'Current Quarter TWR (USD)', 'YTD TWR (USD)', 'TWR Audit Note']
                          return df.rename(columns=trans)[cols].reset_index(drop=True)

                      df = unpack_response(api_response)

                      print(df.iloc[:5, 1:])

                      print(df.iloc[:10, :3])
                      
                      import pandas as pd
                      import json
                      
                      # load the saved API response from file
                      with open('api_response_2022-02-13.json') as f:
                          json_dict = json.load(f)
                      
                      # create a (nested) df out of it, and rename the 'top-level' name field to 'portfolio'
                      packed_df = pd.DataFrame.from_dict(json_dict['data']['attributes']['total']['children'])\
                                  .rename(columns={'name': 'portfolio'})
                      
                      # expand the level-1 'children' (and call their 'name' field 'grand-parent')
                      unpacked_df = packed_df.groupby('portfolio')['children']\
                                    .apply(lambda x: pd.DataFrame(x.values[0])).reset_index()\
                                    .rename(columns={'name': 'grand_parent_name'})
                      
                      # expand the level-2 'children' (and call their 'name' field 'parent')
                      unpacked_df = unpacked_df.groupby(['portfolio', 'grand_parent_name'])['children']\
                                    .apply(lambda x: pd.DataFrame(x.values[0])).reset_index()\
                                    .rename(columns={'name': 'parent_name'})
                      
                      # expand the level-3 'children' (and keep their name as is)
                      unpacked_df = unpacked_df.groupby(['portfolio', 'grand_parent_name', 'parent_name'])['children']\
                                    .apply(lambda x: pd.DataFrame(x.values[0])).reset_index()
                      
                      # expand the column field info from 'dict' to multiple columns
                      unpacked_df = pd.concat([unpacked_df.drop('columns', axis=1), pd.DataFrame(unpacked_df['columns'].tolist())], axis=1)
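The three groupby/apply expansions above can also be collapsed into a single `pd.json_normalize` call by passing a multi-level `record_path`, with `meta` pulling the ancestor names along. A minimal sketch on hypothetical miniature data (not the full API response above):

```python
import pandas as pd

# Sketch: one json_normalize call flattens several nested 'children'
# levels; 'sample' is a hypothetical miniature, not the real response.
sample = {
    "children": [
        {"name": "Bucket A", "children": [
            {"name": "Class A", "children": [
                {"name": "Trust 1", "columns": {"value": 3337.99}},
            ]},
        ]},
    ],
}

df = pd.json_normalize(
    sample,
    record_path=["children", "children", "children"],
    meta=[["children", "name"], ["children", "children", "name"]],
    meta_prefix="lvl_",
)
print(df)
```

The meta columns come out as `lvl_children.name` and `lvl_children.children.name`, holding the level-1 and level-2 ancestor names for each innermost record.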
                      

                      Using a list of models to make predictions over a list of results using lapply in R

                      pred1 <- lapply(seq_along(model_list),
                                      function(x) predict(model_list[[x]],
                                                          newdata = result[[x]],
                                                          type = "response"))
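For readers porting this pattern: the same model-to-data pairing can be sketched in Python with `zip` (the lambdas below are hypothetical stand-ins for fitted model objects):

```python
# Sketch: apply each model to its matching dataset, as the R lapply
# above does. The callables are hypothetical stand-ins for models.
model_list = [lambda x: x + 1, lambda x: x * 2]
result = [10, 20]

pred = [model(data) for model, data in zip(model_list, result)]
print(pred)  # [11, 40]
```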
                      

                      How can I destructure my array of multiple objects

                      {
                          orders?.map((order, idx) =>
                              Object.entries(order).map(
                                  ([key, { description, title, price, cartQuantity }]) => {
                                      if (key !== "_id") {
                                          return (
                                              <tr>
                                                  <td>{key}</td>
                                                  <td>{title}</td>
                                                  <td>{price}</td>
                                                  <td>{cartQuantity}</td>
                                                  <td>{description}</td>
                                                  <td>email</td>
                                                  <td>address</td>
                                              </tr>
                                          );
                                      }
                                  }
                              )
                          )
                      }
                      
                      {orders?.["email"]}
                      {orders?.["_id"]}
                      
                      
                          useEffect(() => {
                          fetch(`https://polar-savannah-40370.herokuapp.com/dashboard/orders`)
                              .then(res => res.json())
                              .then(data => setOrders(data))
                      }, [user.email])
                      

                      How to group rows in pandas without groupby?

                      classes = df['class'].unique()
                      
                      data = {cls: df['class'] == cls for cls in classes}
                      
                      data = {cls: df['class'] == cls for cls in df['class'].unique()}
                      
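A runnable sketch of the mask-dict idea, on a small sample frame (data assumed from the question's bird/mammal example):

```python
import pandas as pd

# Sketch of the mask-dict approach: one boolean mask per class value.
df = pd.DataFrame(
    {"class": ["bird", "bird", "mammal", "mammal", "mammal"],
     "max_speed": [389.0, 24.0, 80.2, 0.0, 58.0]},
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
)

masks = {cls: df["class"] == cls for cls in df["class"].unique()}
birds = df[masks["bird"]]          # rows where class == 'bird'
print(list(birds.index))           # ['falcon', 'parrot']
```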
                      birds = []
                      mammal = []
                      # note: tuple-unpacking each row assumes the frame has exactly three columns
                      for i, (columnclass, _, _) in df.iterrows():
                          if columnclass == "bird":
                              birds.append(i)
                          else:
                              mammal.append(i)
                      print(birds)
                      print(mammal)
                      
                      birds = []
                      mammal = []
                      for i, columnclass in df.iterrows():
                        if columnclass['class'] == 'bird':
                          birds.append(i)
                        else:
                          mammal.append(i) 
                      print(birds)
                      print(mammal)
                      
                      ['falcon', 'parrot']
                      ['lion', 'monkey', 'leopard']
                      
                      group = df.set_index('class')['max_speed'].groupby(level=0).sum()
                      
                      >>> group
                      class
                      bird      413.0
                      mammal    138.2
                      Name: max_speed, dtype: float64
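For comparison, the same per-class totals via a plain groupby; the sample values below are assumptions chosen to reproduce the totals printed above (monkey's speed left missing, since `sum()` skips NaN):

```python
import pandas as pd

# Sketch: equivalent totals with a direct groupby; data is assumed.
df = pd.DataFrame(
    {"class": ["bird", "bird", "mammal", "mammal", "mammal"],
     "max_speed": [389.0, 24.0, 80.2, float("nan"), 58.0]},
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
)

group = df.groupby("class")["max_speed"].sum()
print(group)
```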
                      

                      Can I get the value of the grouped column in groupby apply?

                      print (df.groupby('class').apply(lambda x: x.name))
                      class
                      bird        bird
                      mammal    mammal
                      dtype: object
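An alternative sketch: iterating the GroupBy directly yields `(key, sub-frame)` pairs, so the grouped value is available without reaching for `.name` inside `apply` (sample frame assumed):

```python
import pandas as pd

# Sketch: each iteration of a GroupBy exposes the group key directly.
df = pd.DataFrame({"class": ["bird", "bird", "mammal"],
                   "max_speed": [389.0, 24.0, 80.2]})

sizes = {name: len(group) for name, group in df.groupby("class")}
print(sizes)  # {'bird': 2, 'mammal': 1}
```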
                      

                      Need To Perform a Merge in Pandas Exactly Like VLOOKUP

                      import pandas as pd
                      
                      odds = pd.DataFrame([
                          ['Patriots',   '-7.0', 'Steelers',      '7', '51.0', '+270', '-340', '1'],
                          ['Packers',    '-6.0', 'Bears',         '6', '48.0', '-286', '+230', '1'],
                          ['Chiefs',     '-1.0', 'Texans',        '1', '40.0', '-115', '-105', '1'],
                          ['Jets',       '-4.0', 'Browns',        '4', '40.0', '+170', '-190', '1'],
                          ['Colts',      '-1.0', 'Bills',         '1', '44.0', '-115', '-105', '1'],
                          ['Dolphins',   '-4.0', 'Football Team', '4', '46.0', '-210', '+175', '1'],
                          ['Panthers',   '-3.0', 'Jaguars',       '3', '41.0', '-150', '+130', '1'],
                          ['Seahawks',   '-4.0', 'Rams',          '4', '42.0', '-185', '+160', '1'],
                          ['Cardinals',  '-2.0', 'Saints',        '2', '49.0', '+120', '-140', '1'],
                          ['Chargers',   '-4.0', 'Lions',         '4', '46.0', '+160', '-180', '1'],
                          ['Buccaneers', '-3.0', 'Titans',        '3', '40.0', '+130', '-150', '1'],
                          ['Bengals',    '-3.0', 'Raiders',       '3', '43.0', '-154', '+130', '1'],
                          ['Broncos',    '-4.0', 'Ravens',        '4', '46.0', '+180', '-220', '1'],
                          ['Cowboys',    '-7.0', 'Giants',        '7', '52.0', '+240', '-300', '1'],
                          ['Eagles',     '-3.0', 'Falcons',       '3', '55.0', '-188', '+150', '1'],
                          ['Vikings',    '-2.0', '49ers',         '2', '42.0', '-142', '+120', '1']],
                          columns=['Favorite', 'Spread', 'Underdog', 'Spread2', 'Total',
                                   'Away Money Line', 'Home Money Line', 'Week'])
                      
                      
                      df = pd.DataFrame([
                              ['Devonte Freemon', 'RB', '2015-09-14 00:00:00', '1', 'Falcons'],
                              ['Antonion Brownn', 'WR', '2015-09-10 00:00:00', '1', 'Steelers'],
                              ['Adrian Peterson', 'RB', '2015-09-14 00:00:00', '1', 'Vikings'],
                              ['Cam Newton', 'QB', '2015-09-14 00:00:00', '1', 'Panthers']],
                              columns = ['Name','Position','Date','Week','Tm'])
                              
                      
                      df['Week'] = df['Week'].astype(int)
                      odds['Week'] = odds['Week'].astype(int)
                      odds['Spread'] = odds['Spread'].astype(float)
                      
                      
                      favorite_merged = df.merge(odds.rename(columns={'Favorite':'Tm'})[['Tm','Week','Spread']], how='inner', on=['Tm','Week'])
                      underdog_merged = df.merge(odds.rename(columns={'Underdog':'Tm'})[['Tm','Week','Spread']], how='inner', on=['Tm','Week'])
                      
                      # Because the spread for the Underdog should be positive
                      underdog_merged['Spread'] = underdog_merged['Spread'] * -1
                      
                      final_df = pd.concat([favorite_merged, underdog_merged]).reset_index(drop=True)
                      
                      print(df)
                                    Name Position                 Date  Week        Tm
                      0  Devonte Freemon       RB  2015-09-14 00:00:00     1   Falcons
                      1  Antonion Brownn       WR  2015-09-10 00:00:00     1  Steelers
                      2  Adrian Peterson       RB  2015-09-14 00:00:00     1   Vikings
                      3       Cam Newton       QB  2015-09-14 00:00:00     1  Panthers
                      
                      print(final_df)
                                    Name Position                 Date  Week        Tm  Spread
                      0  Adrian Peterson       RB  2015-09-14 00:00:00     1   Vikings    -2.0
                      1       Cam Newton       QB  2015-09-14 00:00:00     1  Panthers    -3.0
                      2  Devonte Freemon       RB  2015-09-14 00:00:00     1   Falcons     3.0
                      3  Antonion Brownn       WR  2015-09-10 00:00:00     1  Steelers     7.0
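An alternative sketch for the same VLOOKUP-style join: melt `Favorite`/`Underdog` into a single team column, flip the spread sign for underdogs, and merge once. Only a two-row subset of the data above is used, to keep it short:

```python
import pandas as pd

# Alternative sketch: one merge after melting the odds table to long form.
odds = pd.DataFrame(
    [["Eagles", -3.0, "Falcons", 1], ["Patriots", -7.0, "Steelers", 1]],
    columns=["Favorite", "Spread", "Underdog", "Week"],
)
df = pd.DataFrame(
    [["Devonte Freemon", 1, "Falcons"], ["Antonion Brownn", 1, "Steelers"]],
    columns=["Name", "Week", "Tm"],
)

long_odds = odds.melt(
    id_vars=["Week", "Spread"], value_vars=["Favorite", "Underdog"],
    var_name="Role", value_name="Tm",
)
# Spread is quoted from the favorite's side; negate it for underdogs
long_odds.loc[long_odds["Role"] == "Underdog", "Spread"] *= -1

final_df = df.merge(long_odds[["Tm", "Week", "Spread"]], on=["Tm", "Week"])
print(final_df)
```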
                      
                      0  Adrian Peterson       RB  2015-09-14 00:00:00     1   Vikings    -2.0
                      1       Cam Newton       QB  2015-09-14 00:00:00     1  Panthers    -3.0
                      2  Devonte Freemon       RB  2015-09-14 00:00:00     1   Falcons     3.0
                      3  Antonion Brownn       WR  2015-09-10 00:00:00     1  Steelers     7.0
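
                      An equivalent sketch (not from the original answer): instead of two separate merges plus a sign flip, you can first build one long team-to-signed-spread lookup table and merge once. Minimal made-up frames mirroring the ones above:

```python
import pandas as pd

# Small sample frames in the shape of `odds` and `df` above
odds = pd.DataFrame({
    'Favorite': ['Eagles', 'Vikings'],
    'Underdog': ['Falcons', '49ers'],
    'Spread': [-3.0, -2.0],
    'Week': [1, 1],
})
df = pd.DataFrame({
    'Name': ['Devonte Freemon', 'Adrian Peterson'],
    'Tm': ['Falcons', 'Vikings'],
    'Week': [1, 1],
})

# One long lookup: favorites keep the negative spread,
# underdogs get its positive mirror.
fav = odds.rename(columns={'Favorite': 'Tm'})[['Tm', 'Week', 'Spread']]
dog = (odds.rename(columns={'Underdog': 'Tm'})[['Tm', 'Week', 'Spread']]
           .assign(Spread=lambda d: -d['Spread']))
lookup = pd.concat([fav, dog], ignore_index=True)

# Single merge instead of favorite_merged/underdog_merged + concat
final_df = df.merge(lookup, on=['Tm', 'Week'], how='inner')
# Falcons row gets +3.0 (underdog), Vikings row gets -2.0 (favorite)
```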
                      

                      R: split-apply-combine for geographic distance

                      LatLong2Cart = function(latitude, longitude, REarth = 6371) {
                        # cos()/sin() expect radians, so convert the degree inputs first
                        lat = latitude * pi / 180
                        long = longitude * pi / 180
                        tibble(
                          x = REarth * cos(lat) * cos(long),
                          y = REarth * cos(lat) * sin(long),
                          z = REarth * sin(lat))
                      }
                      
                      library(tidyverse)
                      somewhere = somewhere %>% as_tibble()
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude))
                      
                      # A tibble: 100 x 12
                         state   geoid ansicode city                lsad  funcstat latitude longitude designation      x      y      z
                         <fct>   <int>    <int> <chr>               <chr> <chr>       <dbl>     <dbl> <chr>        <dbl>  <dbl>  <dbl>
                       1 or    4120100  2410344 donald              25    a            45.2    -123.  city        -1972.   644.  6024.
                       2 pa    4280962  2390453 warminster heights  57    s            40.2     -75.1 cdp         -4815. -1564.  3867.
                       3 ca     668028  2411793 san juan capistrano 25    a            33.5    -118.  city          485. -3096.  5547.
                       4 pa    4243944  1214759 littlestown         21    a            39.7     -77.1 borough       350.  2894.  5665.
                       5 nj    3460600   885360 port republic       25    a            39.5     -74.5 city        -1008. -1329.  6149.
                       6 tx    4871948  2412035 taylor              25    a            30.6     -97.4 city        -4237.   160. -4755.
                       7 ks    2046000   485621 merriam             25    a            39.0     -94.7 city         1435.  -686.  6169.
                       8 nc    3747695  2403359 northlakes          57    s            35.8     -81.4 cdp         -2066.  -670. -5990.
                       9 co     839965  2412812 julesburg           43    a            41.0    -102.  town         1010.  6223.  -915.
                      10 ia    1909910  2583481 california junction 57    s            41.6     -96.0 cdp           840.  4718. -4198.
                      # ... with 90 more rows
                      
                      calcDist = function(data, key){
                        if(!all(c("x", "y", "z") %in% names(data))) {
                          stop("data must contain the variables x, y and z!")
                        }
                        key = enquo(key)
                        n = nrow(data)
                        dist = array(data = as.double(0), dim = n * n)
                        x = data$x
                        y = data$y
                        z = data$z
                        keys = data %>% pull(!!key)
                        # pairwise straight-line (chord) distances, stored column-major
                        for(i in 1:n){
                          for(j in 1:n){
                            dist[i + (j - 1) * n] = sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2 + (z[i] - z[j])^2)
                          }
                        }
                        tibble(
                          expand.grid(factor(keys), factor(keys)),
                          dist = dist
                          )
                      }
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        calcDist(city)
                      
                      # A tibble: 10,000 x 3
                         Var1                Var2     dist
                         <fct>               <fct>   <dbl>
                       1 donald              donald     0 
                       2 warminster heights  donald  4197.
                       3 san juan capistrano donald  4500.
                       4 littlestown         donald  3253.
                       5 port republic       donald  2200.
                       6 taylor              donald 11025.
                       7 merriam             donald  3660.
                       8 northlakes          donald 12085.
                       9 julesburg           donald  9390.
                      10 california junction donald 11358.
                      # ... with 9,990 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        nest_by(state) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 400 x 4
                      # Groups:   state [40]
                         state Var1                      Var2                        dist
                         <fct> <fct>                     <fct>                      <dbl>
                       1 ak    ester                     ester                         0 
                       2 ak    unalaska                  ester                      9245.
                       3 ak    ester                     unalaska                   9245.
                       4 ak    unalaska                  unalaska                      0 
                       5 al    heath                     heath                         0 
                       6 al    huntsville                heath                     12597.
                       7 al    heath                     huntsville                12597.
                       8 al    huntsville                huntsville                    0 
                       9 ar    stamps                    stamps                        0 
                      10 az    orange grove mobile manor orange grove mobile manor     0 
                      # ... with 390 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        mutate(
                          geo_block = substr(geoid, 1, 1),
                          ansi_block = substr(ansicode, 1, 1)) %>% 
                        nest_by(geo_block, ansi_block) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 1,716 x 5
                      # Groups:   geo_block, ansi_block [13]
                         geo_block ansi_block Var1                Var2                  dist
                         <chr>     <chr>      <fct>               <fct>                <dbl>
                       1 1         2          california junction california junction     0 
                       2 1         2          pleasantville       california junction 10051.
                       3 1         2          willacoochee        california junction 11545.
                       4 1         2          effingham           california junction 11735.
                       5 1         2          heath               california junction  4097.
                       6 1         2          middle amana        california junction  7618.
                       7 1         2          new baden           california junction 12720.
                       8 1         2          hannahs mill        california junction 11681.
                       9 1         2          germantown hills    california junction  5097.
                      10 1         2          la fontaine         california junction 11397.
                      # ... with 1,706 more rows
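
                      The same split-apply-combine idea translates to pandas/NumPy. This is a sketch with made-up coordinates (not the `somewhere` data above): NumPy broadcasting replaces the double loop, and the distances are the same straight-line chord distances the R code computes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the spirit of `somewhere`
somewhere = pd.DataFrame({
    'state': ['ak', 'ak', 'al', 'al'],
    'city': ['ester', 'unalaska', 'heath', 'huntsville'],
    'latitude': [64.8, 53.9, 31.3, 34.7],
    'longitude': [-148.0, -166.5, -86.5, -86.6],
})

R_EARTH = 6371.0  # km

def lat_long_to_cart(lat_deg, lon_deg):
    """Degrees -> 3-D Cartesian coordinates on a sphere of radius R_EARTH."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.column_stack([
        R_EARTH * np.cos(lat) * np.cos(lon),
        R_EARTH * np.cos(lat) * np.sin(lon),
        R_EARTH * np.sin(lat),
    ])

def pairwise_dist(group):
    """All city-vs-city chord distances within one group, as a long table."""
    xyz = lat_long_to_cart(group['latitude'].values, group['longitude'].values)
    # (n, 1, 3) - (1, n, 3) broadcasts to an (n, n, 3) difference array
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    i, j = np.meshgrid(range(len(group)), range(len(group)), indexing='ij')
    return pd.DataFrame({
        'Var1': group['city'].values[i.ravel()],
        'Var2': group['city'].values[j.ravel()],
        'dist': d.ravel(),
    })

# split by state, apply the distance kernel, combine the results
out = (somewhere.groupby('state', group_keys=True)
                .apply(pairwise_dist)
                .reset_index(level=0)
                .reset_index(drop=True))
```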
                      
                      LatLong2Cart = function(latitude, longitude, REarth = 6371) { 
                        tibble(
                          x = REarth * cos(latitude) * cos(longitude),
                          y = REarth * cos(latitude) * sin(longitude),
                          z = REarth *sin(latitude))
                      }
                      
                      library(tidyverse)
                      somewhere = somewhere %>% as_tibble()
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude))
                      
                      # A tibble: 100 x 12
                         state   geoid ansicode city                lsad  funcstat latitude longitude designation      x      y      z
                         <fct>   <int>    <int> <chr>               <chr> <chr>       <dbl>     <dbl> <chr>        <dbl>  <dbl>  <dbl>
                       1 or    4120100  2410344 donald              25    a            45.2    -123.  city        -1972.   644.  6024.
                       2 pa    4280962  2390453 warminster heights  57    s            40.2     -75.1 cdp         -4815. -1564.  3867.
                       3 ca     668028  2411793 san juan capistrano 25    a            33.5    -118.  city          485. -3096.  5547.
                       4 pa    4243944  1214759 littlestown         21    a            39.7     -77.1 borough       350.  2894.  5665.
                       5 nj    3460600   885360 port republic       25    a            39.5     -74.5 city        -1008. -1329.  6149.
                       6 tx    4871948  2412035 taylor              25    a            30.6     -97.4 city        -4237.   160. -4755.
                       7 ks    2046000   485621 merriam             25    a            39.0     -94.7 city         1435.  -686.  6169.
                       8 nc    3747695  2403359 northlakes          57    s            35.8     -81.4 cdp         -2066.  -670. -5990.
                       9 co     839965  2412812 julesburg           43    a            41.0    -102.  town         1010.  6223.  -915.
                      10 ia    1909910  2583481 california junction 57    s            41.6     -96.0 cdp           840.  4718. -4198.
                      # ... with 90 more rows
                      
                      calcDist = function(data, key){
                        if(!all(c("x", "y", "z") %in% names(data))) {
                          stop("date must contain the variables x, y and z!")
                        }
                        key=enquo(key)
                        n = nrow(data)
                        dist = array(data = as.double(0), dim=n*n)
                        x = data$x
                        y = data$y
                        z = data$z
                        keys = data %>% pull(!!key)
                        for(i in 1:n){
                          for(j in 1:n){
                            dist[i+(j-1)*n] = sqrt((x[i]-x[j])^2+(y[i]-y[j])^2+(z[i]-z[j])^2)
                          }
                        }
                        tibble(
                          expand.grid(factor(keys), factor(keys)),
                          dist = dist
                          )
                      }
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        calcDist(city)
                      
                      # A tibble: 10,000 x 3
                         Var1                Var2     dist
                         <fct>               <fct>   <dbl>
                       1 donald              donald     0 
                       2 warminster heights  donald  4197.
                       3 san juan capistrano donald  4500.
                       4 littlestown         donald  3253.
                       5 port republic       donald  2200.
                       6 taylor              donald 11025.
                       7 merriam             donald  3660.
                       8 northlakes          donald 12085.
                       9 julesburg           donald  9390.
                      10 california junction donald 11358.
                      # ... with 9,990 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        nest_by(state) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 400 x 4
                      # Groups:   state [40]
                         state Var1                      Var2                        dist
                         <fct> <fct>                     <fct>                      <dbl>
                       1 ak    ester                     ester                         0 
                       2 ak    unalaska                  ester                      9245.
                       3 ak    ester                     unalaska                   9245.
                       4 ak    unalaska                  unalaska                      0 
                       5 al    heath                     heath                         0 
                       6 al    huntsville                heath                     12597.
                       7 al    heath                     huntsville                12597.
                       8 al    huntsville                huntsville                    0 
                       9 ar    stamps                    stamps                        0 
                      10 az    orange grove mobile manor orange grove mobile manor     0 
                      # ... with 390 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        mutate(
                          geo_block = substr(geoid, 1, 1),
                          ansi_block = substr(ansicode, 1, 1)) %>% 
                        nest_by(geo_block, ansi_block) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 1,716 x 5
                      # Groups:   geo_block, ansi_block [13]
                         geo_block ansi_block Var1                Var2                  dist
                         <chr>     <chr>      <fct>               <fct>                <dbl>
                       1 1         2          california junction california junction     0 
                       2 1         2          pleasantville       california junction 10051.
                       3 1         2          willacoochee        california junction 11545.
                       4 1         2          effingham           california junction 11735.
                       5 1         2          heath               california junction  4097.
                       6 1         2          middle amana        california junction  7618.
                       7 1         2          new baden           california junction 12720.
                       8 1         2          hannahs mill        california junction 11681.
                       9 1         2          germantown hills    california junction  5097.
                      10 1         2          la fontaine         california junction 11397.
                      # ... with 1,706 more rows
                      
                      LatLong2Cart = function(latitude, longitude, REarth = 6371) { 
                        tibble(
                          x = REarth * cos(latitude) * cos(longitude),
                          y = REarth * cos(latitude) * sin(longitude),
                          z = REarth *sin(latitude))
                      }
                      
                      library(tidyverse)
                      somewhere = somewhere %>% as_tibble()
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude))
                      
                      # A tibble: 100 x 12
                         state   geoid ansicode city                lsad  funcstat latitude longitude designation      x      y      z
                         <fct>   <int>    <int> <chr>               <chr> <chr>       <dbl>     <dbl> <chr>        <dbl>  <dbl>  <dbl>
                       1 or    4120100  2410344 donald              25    a            45.2    -123.  city        -1972.   644.  6024.
                       2 pa    4280962  2390453 warminster heights  57    s            40.2     -75.1 cdp         -4815. -1564.  3867.
                       3 ca     668028  2411793 san juan capistrano 25    a            33.5    -118.  city          485. -3096.  5547.
                       4 pa    4243944  1214759 littlestown         21    a            39.7     -77.1 borough       350.  2894.  5665.
                       5 nj    3460600   885360 port republic       25    a            39.5     -74.5 city        -1008. -1329.  6149.
                       6 tx    4871948  2412035 taylor              25    a            30.6     -97.4 city        -4237.   160. -4755.
                       7 ks    2046000   485621 merriam             25    a            39.0     -94.7 city         1435.  -686.  6169.
                       8 nc    3747695  2403359 northlakes          57    s            35.8     -81.4 cdp         -2066.  -670. -5990.
                       9 co     839965  2412812 julesburg           43    a            41.0    -102.  town         1010.  6223.  -915.
                      10 ia    1909910  2583481 california junction 57    s            41.6     -96.0 cdp           840.  4718. -4198.
                      # ... with 90 more rows
                      
                      calcDist = function(data, key){
                        if(!all(c("x", "y", "z") %in% names(data))) {
                          stop("date must contain the variables x, y and z!")
                        }
                        key=enquo(key)
                        n = nrow(data)
                        dist = array(data = as.double(0), dim=n*n)
                        x = data$x
                        y = data$y
                        z = data$z
                        keys = data %>% pull(!!key)
                        for(i in 1:n){
                          for(j in 1:n){
                            dist[i+(j-1)*n] = sqrt((x[i]-x[j])^2+(y[i]-y[j])^2+(z[i]-z[j])^2)
                          }
                        }
                        tibble(
                          expand.grid(factor(keys), factor(keys)),
                          dist = dist
                          )
                      }
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        calcDist(city)
                      
                      # A tibble: 10,000 x 3
                         Var1                Var2     dist
                         <fct>               <fct>   <dbl>
                       1 donald              donald     0 
                       2 warminster heights  donald  4197.
                       3 san juan capistrano donald  4500.
                       4 littlestown         donald  3253.
                       5 port republic       donald  2200.
                       6 taylor              donald 11025.
                       7 merriam             donald  3660.
                       8 northlakes          donald 12085.
                       9 julesburg           donald  9390.
                      10 california junction donald 11358.
                      # ... with 9,990 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        nest_by(state) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 400 x 4
                      # Groups:   state [40]
                         state Var1                      Var2                        dist
                         <fct> <fct>                     <fct>                      <dbl>
                       1 ak    ester                     ester                         0 
                       2 ak    unalaska                  ester                      9245.
                       3 ak    ester                     unalaska                   9245.
                       4 ak    unalaska                  unalaska                      0 
                       5 al    heath                     heath                         0 
                       6 al    huntsville                heath                     12597.
                       7 al    heath                     huntsville                12597.
                       8 al    huntsville                huntsville                    0 
                       9 ar    stamps                    stamps                        0 
                      10 az    orange grove mobile manor orange grove mobile manor     0 
                      # ... with 390 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        mutate(
                          geo_block = substr(geoid, 1, 1),
                          ansi_block = substr(ansicode, 1, 1)) %>% 
                        nest_by(geo_block, ansi_block) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 1,716 x 5
                      # Groups:   geo_block, ansi_block [13]
                         geo_block ansi_block Var1                Var2                  dist
                         <chr>     <chr>      <fct>               <fct>                <dbl>
                       1 1         2          california junction california junction     0 
                       2 1         2          pleasantville       california junction 10051.
                       3 1         2          willacoochee        california junction 11545.
                       4 1         2          effingham           california junction 11735.
                       5 1         2          heath               california junction  4097.
                       6 1         2          middle amana        california junction  7618.
                       7 1         2          new baden           california junction 12720.
                       8 1         2          hannahs mill        california junction 11681.
                       9 1         2          germantown hills    california junction  5097.
                      10 1         2          la fontaine         california junction 11397.
                      # ... with 1,706 more rows
                      
                      LatLong2Cart = function(latitude, longitude, REarth = 6371) { 
                        tibble(
                          x = REarth * cos(latitude) * cos(longitude),
                          y = REarth * cos(latitude) * sin(longitude),
                          z = REarth *sin(latitude))
                      }
                      
                      library(tidyverse)
                      somewhere = somewhere %>% as_tibble()
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude))
                      
                      # A tibble: 100 x 12
                         state   geoid ansicode city                lsad  funcstat latitude longitude designation      x      y      z
                         <fct>   <int>    <int> <chr>               <chr> <chr>       <dbl>     <dbl> <chr>        <dbl>  <dbl>  <dbl>
                       1 or    4120100  2410344 donald              25    a            45.2    -123.  city        -1972.   644.  6024.
                       2 pa    4280962  2390453 warminster heights  57    s            40.2     -75.1 cdp         -4815. -1564.  3867.
                       3 ca     668028  2411793 san juan capistrano 25    a            33.5    -118.  city          485. -3096.  5547.
                       4 pa    4243944  1214759 littlestown         21    a            39.7     -77.1 borough       350.  2894.  5665.
                       5 nj    3460600   885360 port republic       25    a            39.5     -74.5 city        -1008. -1329.  6149.
                       6 tx    4871948  2412035 taylor              25    a            30.6     -97.4 city        -4237.   160. -4755.
                       7 ks    2046000   485621 merriam             25    a            39.0     -94.7 city         1435.  -686.  6169.
                       8 nc    3747695  2403359 northlakes          57    s            35.8     -81.4 cdp         -2066.  -670. -5990.
                       9 co     839965  2412812 julesburg           43    a            41.0    -102.  town         1010.  6223.  -915.
                      10 ia    1909910  2583481 california junction 57    s            41.6     -96.0 cdp           840.  4718. -4198.
                      # ... with 90 more rows
                      
                      calcDist = function(data, key){
                        if(!all(c("x", "y", "z") %in% names(data))) {
                          stop("date must contain the variables x, y and z!")
                        }
                        key=enquo(key)
                        n = nrow(data)
                        dist = array(data = as.double(0), dim=n*n)
                        x = data$x
                        y = data$y
                        z = data$z
                        keys = data %>% pull(!!key)
                        for(i in 1:n){
                          for(j in 1:n){
                            dist[i+(j-1)*n] = sqrt((x[i]-x[j])^2+(y[i]-y[j])^2+(z[i]-z[j])^2)
                          }
                        }
                        tibble(
                          expand.grid(factor(keys), factor(keys)),
                          dist = dist
                          )
                      }
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        calcDist(city)
                      
                      # A tibble: 10,000 x 3
                         Var1                Var2     dist
                         <fct>               <fct>   <dbl>
                       1 donald              donald     0 
                       2 warminster heights  donald  4197.
                       3 san juan capistrano donald  4500.
                       4 littlestown         donald  3253.
                       5 port republic       donald  2200.
                       6 taylor              donald 11025.
                       7 merriam             donald  3660.
                       8 northlakes          donald 12085.
                       9 julesburg           donald  9390.
                      10 california junction donald 11358.
                      # ... with 9,990 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        nest_by(state) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 400 x 4
                      # Groups:   state [40]
                         state Var1                      Var2                        dist
                         <fct> <fct>                     <fct>                      <dbl>
                       1 ak    ester                     ester                         0 
                       2 ak    unalaska                  ester                      9245.
                       3 ak    ester                     unalaska                   9245.
                       4 ak    unalaska                  unalaska                      0 
                       5 al    heath                     heath                         0 
                       6 al    huntsville                heath                     12597.
                       7 al    heath                     huntsville                12597.
                       8 al    huntsville                huntsville                    0 
                       9 ar    stamps                    stamps                        0 
                      10 az    orange grove mobile manor orange grove mobile manor     0 
                      # ... with 390 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        mutate(
                          geo_block = substr(geoid, 1, 1),
                          ansi_block = substr(ansicode, 1, 1)) %>% 
                        nest_by(geo_block, ansi_block) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 1,716 x 5
                      # Groups:   geo_block, ansi_block [13]
                         geo_block ansi_block Var1                Var2                  dist
                         <chr>     <chr>      <fct>               <fct>                <dbl>
                       1 1         2          california junction california junction     0 
                       2 1         2          pleasantville       california junction 10051.
                       3 1         2          willacoochee        california junction 11545.
                       4 1         2          effingham           california junction 11735.
                       5 1         2          heath               california junction  4097.
                       6 1         2          middle amana        california junction  7618.
                       7 1         2          new baden           california junction 12720.
                       8 1         2          hannahs mill        california junction 11681.
                       9 1         2          germantown hills    california junction  5097.
                      10 1         2          la fontaine         california junction 11397.
                      # ... with 1,706 more rows
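One caveat about the conversion feeding these distances: R's `cos()` and `sin()` expect radians, while the `latitude`/`longitude` columns here look like degrees, so `LatLong2Cart` as written wraps the angles around the circle and the reported distances are distorted. A minimal sketch of a degree-aware variant (the `* pi / 180` conversion is the only change; this assumes the input columns really are in degrees):

```r
library(tibble)

# Convert latitude/longitude given in DEGREES to Cartesian coordinates (km).
# R's trig functions take radians, so convert first.
LatLong2Cart = function(latitude, longitude, REarth = 6371) {
  lat = latitude * pi / 180
  lon = longitude * pi / 180
  tibble(
    x = REarth * cos(lat) * cos(lon),
    y = REarth * cos(lat) * sin(lon),
    z = REarth * sin(lat))
}
```

With this version a sanity check behaves as expected: a point at latitude 0, longitude 0 lands at `(6371, 0, 0)`, and antipodal points come out `2 * REarth` apart as chord distances. Note that `calcDist` then returns straight-line (chord) distances through the Earth, not great-circle distances along the surface.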
                      
                      LatLong2Cart = function(latitude, longitude, REarth = 6371) { 
                        tibble(
                          x = REarth * cos(latitude) * cos(longitude),
                          y = REarth * cos(latitude) * sin(longitude),
                          z = REarth *sin(latitude))
                      }
                      
                      library(tidyverse)
                      somewhere = somewhere %>% as_tibble()
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude))
                      
                      # A tibble: 100 x 12
                         state   geoid ansicode city                lsad  funcstat latitude longitude designation      x      y      z
                         <fct>   <int>    <int> <chr>               <chr> <chr>       <dbl>     <dbl> <chr>        <dbl>  <dbl>  <dbl>
                       1 or    4120100  2410344 donald              25    a            45.2    -123.  city        -1972.   644.  6024.
                       2 pa    4280962  2390453 warminster heights  57    s            40.2     -75.1 cdp         -4815. -1564.  3867.
                       3 ca     668028  2411793 san juan capistrano 25    a            33.5    -118.  city          485. -3096.  5547.
                       4 pa    4243944  1214759 littlestown         21    a            39.7     -77.1 borough       350.  2894.  5665.
                       5 nj    3460600   885360 port republic       25    a            39.5     -74.5 city        -1008. -1329.  6149.
                       6 tx    4871948  2412035 taylor              25    a            30.6     -97.4 city        -4237.   160. -4755.
                       7 ks    2046000   485621 merriam             25    a            39.0     -94.7 city         1435.  -686.  6169.
                       8 nc    3747695  2403359 northlakes          57    s            35.8     -81.4 cdp         -2066.  -670. -5990.
                       9 co     839965  2412812 julesburg           43    a            41.0    -102.  town         1010.  6223.  -915.
                      10 ia    1909910  2583481 california junction 57    s            41.6     -96.0 cdp           840.  4718. -4198.
                      # ... with 90 more rows
                      
                      calcDist = function(data, key){
                        if(!all(c("x", "y", "z") %in% names(data))) {
                          stop("date must contain the variables x, y and z!")
                        }
                        key=enquo(key)
                        n = nrow(data)
                        dist = array(data = as.double(0), dim=n*n)
                        x = data$x
                        y = data$y
                        z = data$z
                        keys = data %>% pull(!!key)
                        for(i in 1:n){
                          for(j in 1:n){
                            dist[i+(j-1)*n] = sqrt((x[i]-x[j])^2+(y[i]-y[j])^2+(z[i]-z[j])^2)
                          }
                        }
                        tibble(
                          expand.grid(factor(keys), factor(keys)),
                          dist = dist
                          )
                      }
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        calcDist(city)
                      
                      # A tibble: 10,000 x 3
                         Var1                Var2     dist
                         <fct>               <fct>   <dbl>
                       1 donald              donald     0 
                       2 warminster heights  donald  4197.
                       3 san juan capistrano donald  4500.
                       4 littlestown         donald  3253.
                       5 port republic       donald  2200.
                       6 taylor              donald 11025.
                       7 merriam             donald  3660.
                       8 northlakes          donald 12085.
                       9 julesburg           donald  9390.
                      10 california junction donald 11358.
                      # ... with 9,990 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        nest_by(state) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 400 x 4
                      # Groups:   state [40]
                         state Var1                      Var2                        dist
                         <fct> <fct>                     <fct>                      <dbl>
                       1 ak    ester                     ester                         0 
                       2 ak    unalaska                  ester                      9245.
                       3 ak    ester                     unalaska                   9245.
                       4 ak    unalaska                  unalaska                      0 
                       5 al    heath                     heath                         0 
                       6 al    huntsville                heath                     12597.
                       7 al    heath                     huntsville                12597.
                       8 al    huntsville                huntsville                    0 
                       9 ar    stamps                    stamps                        0 
                      10 az    orange grove mobile manor orange grove mobile manor     0 
                      # ... with 390 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        mutate(
                          geo_block = substr(geoid, 1, 1),
                          ansi_block = substr(ansicode, 1, 1)) %>% 
                        nest_by(geo_block, ansi_block) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 1,716 x 5
                      # Groups:   geo_block, ansi_block [13]
                         geo_block ansi_block Var1                Var2                  dist
                         <chr>     <chr>      <fct>               <fct>                <dbl>
                       1 1         2          california junction california junction     0 
                       2 1         2          pleasantville       california junction 10051.
                       3 1         2          willacoochee        california junction 11545.
                       4 1         2          effingham           california junction 11735.
                       5 1         2          heath               california junction  4097.
                       6 1         2          middle amana        california junction  7618.
                       7 1         2          new baden           california junction 12720.
                       8 1         2          hannahs mill        california junction 11681.
                       9 1         2          germantown hills    california junction  5097.
                      10 1         2          la fontaine         california junction 11397.
                      # ... with 1,706 more rows
                      
                      LatLong2Cart = function(latitude, longitude, REarth = 6371) { 
                        tibble(
                          x = REarth * cos(latitude) * cos(longitude),
                          y = REarth * cos(latitude) * sin(longitude),
                          z = REarth *sin(latitude))
                      }
                      
                      library(tidyverse)
                      somewhere = somewhere %>% as_tibble()
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude))
                      
                      # A tibble: 100 x 12
                         state   geoid ansicode city                lsad  funcstat latitude longitude designation      x      y      z
                         <fct>   <int>    <int> <chr>               <chr> <chr>       <dbl>     <dbl> <chr>        <dbl>  <dbl>  <dbl>
                       1 or    4120100  2410344 donald              25    a            45.2    -123.  city        -1972.   644.  6024.
                       2 pa    4280962  2390453 warminster heights  57    s            40.2     -75.1 cdp         -4815. -1564.  3867.
                       3 ca     668028  2411793 san juan capistrano 25    a            33.5    -118.  city          485. -3096.  5547.
                       4 pa    4243944  1214759 littlestown         21    a            39.7     -77.1 borough       350.  2894.  5665.
                       5 nj    3460600   885360 port republic       25    a            39.5     -74.5 city        -1008. -1329.  6149.
                       6 tx    4871948  2412035 taylor              25    a            30.6     -97.4 city        -4237.   160. -4755.
                       7 ks    2046000   485621 merriam             25    a            39.0     -94.7 city         1435.  -686.  6169.
                       8 nc    3747695  2403359 northlakes          57    s            35.8     -81.4 cdp         -2066.  -670. -5990.
                       9 co     839965  2412812 julesburg           43    a            41.0    -102.  town         1010.  6223.  -915.
                      10 ia    1909910  2583481 california junction 57    s            41.6     -96.0 cdp           840.  4718. -4198.
                      # ... with 90 more rows
                      
                      calcDist = function(data, key){
                        if(!all(c("x", "y", "z") %in% names(data))) {
                          stop("date must contain the variables x, y and z!")
                        }
                        key=enquo(key)
                        n = nrow(data)
                        dist = array(data = as.double(0), dim=n*n)
                        x = data$x
                        y = data$y
                        z = data$z
                        keys = data %>% pull(!!key)
                        for(i in 1:n){
                          for(j in 1:n){
                            dist[i+(j-1)*n] = sqrt((x[i]-x[j])^2+(y[i]-y[j])^2+(z[i]-z[j])^2)
                          }
                        }
                        tibble(
                          expand.grid(factor(keys), factor(keys)),
                          dist = dist
                          )
                      }
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        calcDist(city)
                      
                      # A tibble: 10,000 x 3
                         Var1                Var2     dist
                         <fct>               <fct>   <dbl>
                       1 donald              donald     0 
                       2 warminster heights  donald  4197.
                       3 san juan capistrano donald  4500.
                       4 littlestown         donald  3253.
                       5 port republic       donald  2200.
                       6 taylor              donald 11025.
                       7 merriam             donald  3660.
                       8 northlakes          donald 12085.
                       9 julesburg           donald  9390.
                      10 california junction donald 11358.
                      # ... with 9,990 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        nest_by(state) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 400 x 4
                      # Groups:   state [40]
                         state Var1                      Var2                        dist
                         <fct> <fct>                     <fct>                      <dbl>
                       1 ak    ester                     ester                         0 
                       2 ak    unalaska                  ester                      9245.
                       3 ak    ester                     unalaska                   9245.
                       4 ak    unalaska                  unalaska                      0 
                       5 al    heath                     heath                         0 
                       6 al    huntsville                heath                     12597.
                       7 al    heath                     huntsville                12597.
                       8 al    huntsville                huntsville                    0 
                       9 ar    stamps                    stamps                        0 
                      10 az    orange grove mobile manor orange grove mobile manor     0 
                      # ... with 390 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        mutate(
                          geo_block = substr(geoid, 1, 1),
                          ansi_block = substr(ansicode, 1, 1)) %>% 
                        nest_by(geo_block, ansi_block) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 1,716 x 5
                      # Groups:   geo_block, ansi_block [13]
                         geo_block ansi_block Var1                Var2                  dist
                         <chr>     <chr>      <fct>               <fct>                <dbl>
                       1 1         2          california junction california junction     0 
                       2 1         2          pleasantville       california junction 10051.
                       3 1         2          willacoochee        california junction 11545.
                       4 1         2          effingham           california junction 11735.
                       5 1         2          heath               california junction  4097.
                       6 1         2          middle amana        california junction  7618.
                       7 1         2          new baden           california junction 12720.
                       8 1         2          hannahs mill        california junction 11681.
                       9 1         2          germantown hills    california junction  5097.
                      10 1         2          la fontaine         california junction 11397.
                      # ... with 1,706 more rows
                      
                      LatLong2Cart = function(latitude, longitude, REarth = 6371) { 
                        tibble(
                          x = REarth * cos(latitude) * cos(longitude),
                          y = REarth * cos(latitude) * sin(longitude),
                          z = REarth *sin(latitude))
                      }
                      
                      library(tidyverse)
                      somewhere = somewhere %>% as_tibble()
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude))
                      
                      # A tibble: 100 x 12
                         state   geoid ansicode city                lsad  funcstat latitude longitude designation      x      y      z
                         <fct>   <int>    <int> <chr>               <chr> <chr>       <dbl>     <dbl> <chr>        <dbl>  <dbl>  <dbl>
                       1 or    4120100  2410344 donald              25    a            45.2    -123.  city        -1972.   644.  6024.
                       2 pa    4280962  2390453 warminster heights  57    s            40.2     -75.1 cdp         -4815. -1564.  3867.
                       3 ca     668028  2411793 san juan capistrano 25    a            33.5    -118.  city          485. -3096.  5547.
                       4 pa    4243944  1214759 littlestown         21    a            39.7     -77.1 borough       350.  2894.  5665.
                       5 nj    3460600   885360 port republic       25    a            39.5     -74.5 city        -1008. -1329.  6149.
                       6 tx    4871948  2412035 taylor              25    a            30.6     -97.4 city        -4237.   160. -4755.
                       7 ks    2046000   485621 merriam             25    a            39.0     -94.7 city         1435.  -686.  6169.
                       8 nc    3747695  2403359 northlakes          57    s            35.8     -81.4 cdp         -2066.  -670. -5990.
                       9 co     839965  2412812 julesburg           43    a            41.0    -102.  town         1010.  6223.  -915.
                      10 ia    1909910  2583481 california junction 57    s            41.6     -96.0 cdp           840.  4718. -4198.
                      # ... with 90 more rows
                      
                      calcDist = function(data, key){
                        if(!all(c("x", "y", "z") %in% names(data))) {
                          stop("date must contain the variables x, y and z!")
                        }
                        key=enquo(key)
                        n = nrow(data)
                        dist = array(data = as.double(0), dim=n*n)
                        x = data$x
                        y = data$y
                        z = data$z
                        keys = data %>% pull(!!key)
                        for(i in 1:n){
                          for(j in 1:n){
                            dist[i+(j-1)*n] = sqrt((x[i]-x[j])^2+(y[i]-y[j])^2+(z[i]-z[j])^2)
                          }
                        }
                        tibble(
                          expand.grid(factor(keys), factor(keys)),
                          dist = dist
                          )
                      }
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        calcDist(city)
                      
                      # A tibble: 10,000 x 3
                         Var1                Var2     dist
                         <fct>               <fct>   <dbl>
                       1 donald              donald     0 
                       2 warminster heights  donald  4197.
                       3 san juan capistrano donald  4500.
                       4 littlestown         donald  3253.
                       5 port republic       donald  2200.
                       6 taylor              donald 11025.
                       7 merriam             donald  3660.
                       8 northlakes          donald 12085.
                       9 julesburg           donald  9390.
                      10 california junction donald 11358.
                      # ... with 9,990 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        nest_by(state) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 400 x 4
                      # Groups:   state [40]
                         state Var1                      Var2                        dist
                         <fct> <fct>                     <fct>                      <dbl>
                       1 ak    ester                     ester                         0 
                       2 ak    unalaska                  ester                      9245.
                       3 ak    ester                     unalaska                   9245.
                       4 ak    unalaska                  unalaska                      0 
                       5 al    heath                     heath                         0 
                       6 al    huntsville                heath                     12597.
                       7 al    heath                     huntsville                12597.
                       8 al    huntsville                huntsville                    0 
                       9 ar    stamps                    stamps                        0 
                      10 az    orange grove mobile manor orange grove mobile manor     0 
                      # ... with 390 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        mutate(
                          geo_block = substr(geoid, 1, 1),
                          ansi_block = substr(ansicode, 1, 1)) %>% 
                        nest_by(geo_block, ansi_block) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 1,716 x 5
                      # Groups:   geo_block, ansi_block [13]
                         geo_block ansi_block Var1                Var2                  dist
                         <chr>     <chr>      <fct>               <fct>                <dbl>
                       1 1         2          california junction california junction     0 
                       2 1         2          pleasantville       california junction 10051.
                       3 1         2          willacoochee        california junction 11545.
                       4 1         2          effingham           california junction 11735.
                       5 1         2          heath               california junction  4097.
                       6 1         2          middle amana        california junction  7618.
                       7 1         2          new baden           california junction 12720.
                       8 1         2          hannahs mill        california junction 11681.
                       9 1         2          germantown hills    california junction  5097.
                      10 1         2          la fontaine         california junction 11397.
                      # ... with 1,706 more rows
                      
                      LatLong2Cart = function(latitude, longitude, REarth = 6371) { 
                        tibble(
                          x = REarth * cos(latitude) * cos(longitude),
                          y = REarth * cos(latitude) * sin(longitude),
                          z = REarth *sin(latitude))
                      }
                      
                      library(tidyverse)
                      somewhere = somewhere %>% as_tibble()
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude))
                      
                      # A tibble: 100 x 12
                         state   geoid ansicode city                lsad  funcstat latitude longitude designation      x      y      z
                         <fct>   <int>    <int> <chr>               <chr> <chr>       <dbl>     <dbl> <chr>        <dbl>  <dbl>  <dbl>
                       1 or    4120100  2410344 donald              25    a            45.2    -123.  city        -1972.   644.  6024.
                       2 pa    4280962  2390453 warminster heights  57    s            40.2     -75.1 cdp         -4815. -1564.  3867.
                       3 ca     668028  2411793 san juan capistrano 25    a            33.5    -118.  city          485. -3096.  5547.
                       4 pa    4243944  1214759 littlestown         21    a            39.7     -77.1 borough       350.  2894.  5665.
                       5 nj    3460600   885360 port republic       25    a            39.5     -74.5 city        -1008. -1329.  6149.
                       6 tx    4871948  2412035 taylor              25    a            30.6     -97.4 city        -4237.   160. -4755.
                       7 ks    2046000   485621 merriam             25    a            39.0     -94.7 city         1435.  -686.  6169.
                       8 nc    3747695  2403359 northlakes          57    s            35.8     -81.4 cdp         -2066.  -670. -5990.
                       9 co     839965  2412812 julesburg           43    a            41.0    -102.  town         1010.  6223.  -915.
                      10 ia    1909910  2583481 california junction 57    s            41.6     -96.0 cdp           840.  4718. -4198.
                      # ... with 90 more rows
                      
                      calcDist = function(data, key){
                        if(!all(c("x", "y", "z") %in% names(data))) {
                          stop("date must contain the variables x, y and z!")
                        }
                        key=enquo(key)
                        n = nrow(data)
                        dist = array(data = as.double(0), dim=n*n)
                        x = data$x
                        y = data$y
                        z = data$z
                        keys = data %>% pull(!!key)
                        for(i in 1:n){
                          for(j in 1:n){
                            dist[i+(j-1)*n] = sqrt((x[i]-x[j])^2+(y[i]-y[j])^2+(z[i]-z[j])^2)
                          }
                        }
                        tibble(
                          expand.grid(factor(keys), factor(keys)),
                          dist = dist
                          )
                      }
                      
                      somewhere %>%  
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        calcDist(city)
                      
                      # A tibble: 10,000 x 3
                         Var1                Var2     dist
                         <fct>               <fct>   <dbl>
                       1 donald              donald     0 
                       2 warminster heights  donald  4197.
                       3 san juan capistrano donald  4500.
                       4 littlestown         donald  3253.
                       5 port republic       donald  2200.
                       6 taylor              donald 11025.
                       7 merriam             donald  3660.
                       8 northlakes          donald 12085.
                       9 julesburg           donald  9390.
                      10 california junction donald 11358.
                      # ... with 9,990 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        nest_by(state) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 400 x 4
                      # Groups:   state [40]
                         state Var1                      Var2                        dist
                         <fct> <fct>                     <fct>                      <dbl>
                       1 ak    ester                     ester                         0 
                       2 ak    unalaska                  ester                      9245.
                       3 ak    ester                     unalaska                   9245.
                       4 ak    unalaska                  unalaska                      0 
                       5 al    heath                     heath                         0 
                       6 al    huntsville                heath                     12597.
                       7 al    heath                     huntsville                12597.
                       8 al    huntsville                huntsville                    0 
                       9 ar    stamps                    stamps                        0 
                      10 az    orange grove mobile manor orange grove mobile manor     0 
                      # ... with 390 more rows
                      
                      somewhere %>% 
                        mutate(LatLong2Cart(latitude, longitude)) %>% 
                        mutate(
                          geo_block = substr(geoid, 1, 1),
                          ansi_block = substr(ansicode, 1, 1)) %>% 
                        nest_by(geo_block, ansi_block) %>% 
                        mutate(dist = list(calcDist(data, city))) %>% 
                        select(-data) %>% 
                        unnest(dist)
                      
                      # A tibble: 1,716 x 5
                      # Groups:   geo_block, ansi_block [13]
                         geo_block ansi_block Var1                Var2                  dist
                         <chr>     <chr>      <fct>               <fct>                <dbl>
                       1 1         2          california junction california junction     0 
                       2 1         2          pleasantville       california junction 10051.
                       3 1         2          willacoochee        california junction 11545.
                       4 1         2          effingham           california junction 11735.
                       5 1         2          heath               california junction  4097.
                       6 1         2          middle amana        california junction  7618.
                       7 1         2          new baden           california junction 12720.
                       8 1         2          hannahs mill        california junction 11681.
                       9 1         2          germantown hills    california junction  5097.
                      10 1         2          la fontaine         california junction 11397.
                      # ... with 1,706 more rows
                      
                      
                      library("dplyr")
                      
                      dd <- somewhere %>%
                        as_tibble() %>%
                        mutate(geo_block = as.factor(as.integer(substr(geoid, 1L, 1L))),
                               ansi_block = as.factor(as.integer(substr(ansicode, 1L, 1L)))) %>%
                        select(geo_block, ansi_block, city, longitude, latitude)
                      dd
                      # # A tibble: 100 × 5
                      #    geo_block ansi_block city                longitude latitude
                      #    <fct>     <fct>      <chr>                   <dbl>    <dbl>
                      #  1 4         2          donald                 -123.      45.2
                      #  2 4         2          warminster heights      -75.1     40.2
                      #  3 6         2          san juan capistrano    -118.      33.5
                      #  4 4         1          littlestown             -77.1     39.7
                      #  5 3         8          port republic           -74.5     39.5
                      #  6 4         2          taylor                  -97.4     30.6
                      #  7 2         4          merriam                 -94.7     39.0
                      #  8 3         2          northlakes              -81.4     35.8
                      #  9 8         2          julesburg              -102.      41.0
                      # 10 1         2          california junction     -96.0     41.6
                      # # … with 90 more rows
                      
                      find_nearest <- function(data, dist, 
                                               coordvar = c("longitude", "latitude"), 
                                               idvar = "city") {
                        m <- match(coordvar, names(data), 0L)
                        n <- nrow(data)
                         if (n < 2L) {
                           ## Fewer than two points: no neighbour exists, so return
                           ## length-n vectors of NA (empty when n == 0)
                           argmin <- NA_integer_[n]
                           distance <- NA_real_[n]
                        } else {
                          ## Compute distance matrix
                          D <- dist(as.matrix(data[m]))
                          ## Extract minimum distances
                          diag(D) <- Inf # want off-diagonal distances
                          argmin <- apply(D, 2L, which.min)
                          distance <- D[cbind(argmin, seq_len(n))]
                        }
                        ## Return focal point data, nearest neighbour ID, distance
                        r1 <- data[-m]
                        r2 <- data[argmin, idvar, drop = FALSE]
                        names(r2) <- paste0(idvar, "_nearest")
                        data.frame(r1, r2, distance, row.names = NULL, stringsAsFactors = FALSE)
                      }
                      
                      dist_vel <- function(x) {
                        geosphere::distm(x, fun = geosphere::distVincentyEllipsoid)
                      }
                      
                      res <- find_nearest(dd, dist = dist_vel)
                      head(res, 10L)
                      #    geo_block ansi_block                city         city_nearest  distance
                      # 1          4          2              donald              outlook 246536.39
                      # 2          4          2  warminster heights       milford square  38512.79
                      # 3          6          2 san juan capistrano palos verdes estates  77722.35
                      # 4          4          1         littlestown          lower allen  55792.53
                      # 5          3          8       port republic           rio grande  66935.70
                      # 6          4          2              taylor            kingsland  98997.90
                      # 7          2          4             merriam              leawood  13620.87
                      # 8          3          2          northlakes          stony point  30813.46
                      # 9          8          2           julesburg                sunol  46037.81
                      # 10         1          2 california junction              kennard  19857.41
                      
                      dd %>%
                        group_by(geo_block, ansi_block) %>%
                        group_modify(~find_nearest(., dist = dist_vel))
                      # # A tibble: 100 × 5
                      # # Groups:   geo_block, ansi_block [13]
                      #    geo_block ansi_block city                city_nearest  distance
                      #    <fct>     <fct>      <chr>               <chr>            <dbl>
                      #  1 1         2          california junction gray            89610.
                      #  2 1         2          pleasantville       middle amana   122974.
                      #  3 1         2          willacoochee        meigs          104116.
                      #  4 1         2          effingham           hindsboro       72160.
                      #  5 1         2          heath               dawson         198052.
                      #  6 1         2          middle amana        pleasantville  122974.
                      #  7 1         2          new baden           huey            37147.
                      #  8 1         2          hannahs mill        dawson         129599.
                      #  9 1         2          germantown hills    hindsboro      165140.
                      # 10 1         2          la fontaine         edgewood        63384.
                      # # … with 90 more rows
                      
                      find_nearest_by <- function(data, by, ...) {
                        do.call(rbind, base::by(data, by, find_nearest, ...))
                      }
                      
                      res <- find_nearest_by(dd, by = dd[c("geo_block", "ansi_block")], dist = dist_vel)
                      head(res, 10L)
                      #    geo_block ansi_block                city city_nearest   distance
                      # 1          3          1         grand forks         <NA>         NA
                      # 2          4          1         littlestown  martinsburg  122718.95
                      # 3          4          1         martinsburg  littlestown  122718.95
                      # 4          4          1            mitchell  martinsburg 1671365.58
                      # 5          5          1            bayfield         <NA>         NA
                      # 6          1          2 california junction         gray   89609.71
                      # 7          1          2       pleasantville middle amana  122974.32
                      # 8          1          2        willacoochee        meigs  104116.21
                      # 9          1          2           effingham    hindsboro   72160.43
                      # 10         1          2               heath       dawson  198051.76
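Note that groups containing a single city (grand forks, bayfield above) have no neighbour, so `find_nearest_by()` returns `NA` for `city_nearest` and `distance`. If those rows should be excluded afterwards, filtering on the distance column is enough; a sketch on toy data mimicking the result shape:

```r
# Toy result with one singleton group (NA neighbour), as find_nearest_by()
# would return it; drop the rows that have no neighbour.
res_toy <- data.frame(
  city         = c("grand forks", "littlestown", "martinsburg"),
  city_nearest = c(NA, "martinsburg", "littlestown"),
  distance     = c(NA, 122718.95, 122718.95),
  stringsAsFactors = FALSE
)
res_kept <- res_toy[!is.na(res_toy$distance), ]
```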
                      
                      fn <- function(dist) find_nearest(dd, dist = dist)
                      
                      library("geosphere")
                      dist_geo <- function(x) distm(x, fun = distGeo)
                      dist_cos <- function(x) distm(x, fun = distCosine)
                      dist_hav <- function(x) distm(x, fun = distHaversine)
                      dist_vsp <- function(x) distm(x, fun = distVincentySphere)
                      dist_vel <- function(x) distm(x, fun = distVincentyEllipsoid)
                      dist_mee <- function(x) distm(x, fun = distMeeus)
                      
                      microbenchmark::microbenchmark(
                        fn(dist_geo),
                        fn(dist_cos),
                        fn(dist_hav),
                        fn(dist_vsp),
                        fn(dist_vel),
                        fn(dist_mee),
                        times = 1000L
                      )
                      # Unit: milliseconds
                      #          expr        min         lq       mean     median         uq       max neval
                      #  fn(dist_geo)   6.143276   6.291737   6.718329   6.362257   6.459345  45.91131  1000
                      #  fn(dist_cos)   4.239236   4.399977   4.918079   4.461804   4.572033  45.70233  1000
                      #  fn(dist_hav)   4.005331   4.156067   4.641016   4.210721   4.307542  41.91619  1000
                      #  fn(dist_vsp)   3.827227   3.979829   4.446428   4.033621   4.123924  44.29160  1000
                      #  fn(dist_vel) 129.712069 132.549638 135.006170 133.935479 135.248135 174.88874  1000
                      #  fn(dist_mee)   3.716814   3.830999   4.234231   3.883582   3.962712  42.12947  1000
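The benchmark shows the spherical formulas (Haversine, Meeus, spherical Vincenty) running roughly 30x faster than the Vincenty ellipsoid; whether the accuracy trade-off matters depends on the application. For reference, the great-circle formula behind `distHaversine` is small enough to write in base R — a sketch, assuming the same default sphere radius geosphere uses (r = 6378137 m):

```r
# Minimal base-R Haversine great-circle distance in metres between two
# lon/lat points given in degrees; a sketch, not a replacement for geosphere.
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}
```

One degree of longitude along the equator comes out to about 111319.5 m, i.e. r * pi / 180.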
                      
                      library("dplyr")
                      
                      dd <- somewhere %>%
                        as_tibble() %>%
                        mutate(geo_block = as.factor(as.integer(substr(geoid, 1L, 1L))),
                               ansi_block = as.factor(as.integer(substr(ansicode, 1L, 1L)))) %>%
                        select(geo_block, ansi_block, city, longitude, latitude)
                      dd
                      # # A tibble: 100 × 5
                      #    geo_block ansi_block city                longitude latitude
                      #    <fct>     <fct>      <chr>                   <dbl>    <dbl>
                      #  1 4         2          donald                 -123.      45.2
                      #  2 4         2          warminster heights      -75.1     40.2
                      #  3 6         2          san juan capistrano    -118.      33.5
                      #  4 4         1          littlestown             -77.1     39.7
                      #  5 3         8          port republic           -74.5     39.5
                      #  6 4         2          taylor                  -97.4     30.6
                      #  7 2         4          merriam                 -94.7     39.0
                      #  8 3         2          northlakes              -81.4     35.8
                      #  9 8         2          julesburg              -102.      41.0
                      # 10 1         2          california junction     -96.0     41.6
                      # # … with 90 more rows
                      
                      find_nearest <- function(data, dist, 
                                               coordvar = c("longitude", "latitude"), 
                                               idvar = "city") {
                        m <- match(coordvar, names(data), 0L)
                        n <- nrow(data)
                        if (n < 2L) {
                          argmin <- NA_integer_[n]
                          distance <- NA_real_[n]
                        } else {
                          ## Compute distance matrix
                          D <- dist(as.matrix(data[m]))
                          ## Extract minimum distances
                          diag(D) <- Inf # want off-diagonal distances
                          argmin <- apply(D, 2L, which.min)
                          distance <- D[cbind(argmin, seq_len(n))]
                        }
                        ## Return focal point data, nearest neighbour ID, distance
                        r1 <- data[-m]
                        r2 <- data[argmin, idvar, drop = FALSE]
                        names(r2) <- paste0(idvar, "_nearest")
                        data.frame(r1, r2, distance, row.names = NULL, stringsAsFactors = FALSE)
                      }
                      
                      dist_vel <- function(x) {
                        geosphere::distm(x, fun = geosphere::distVincentyEllipsoid)
                      }
                      
                      res <- find_nearest(dd, dist = dist_vel)
                      head(res, 10L)
                      #    geo_block ansi_block                city         city_nearest  distance
                      # 1          4          2              donald              outlook 246536.39
                      # 2          4          2  warminster heights       milford square  38512.79
                      # 3          6          2 san juan capistrano palos verdes estates  77722.35
                      # 4          4          1         littlestown          lower allen  55792.53
                      # 5          3          8       port republic           rio grande  66935.70
                      # 6          4          2              taylor            kingsland  98997.90
                      # 7          2          4             merriam              leawood  13620.87
                      # 8          3          2          northlakes          stony point  30813.46
                      # 9          8          2           julesburg                sunol  46037.81
                      # 10         1          2 california junction              kennard  19857.41
                      
                      dd %>%
                        group_by(geo_block, ansi_block) %>%
                        group_modify(~find_nearest(., dist = dist_vel))
                      # # A tibble: 100 × 5
                      # # Groups:   geo_block, ansi_block [13]
                      #    geo_block ansi_block city                city_nearest  distance
                      #    <fct>     <fct>      <chr>               <chr>            <dbl>
                      #  1 1         2          california junction gray            89610.
                      #  2 1         2          pleasantville       middle amana   122974.
                      #  3 1         2          willacoochee        meigs          104116.
                      #  4 1         2          effingham           hindsboro       72160.
                      #  5 1         2          heath               dawson         198052.
                      #  6 1         2          middle amana        pleasantville  122974.
                      #  7 1         2          new baden           huey            37147.
                      #  8 1         2          hannahs mill        dawson         129599.
                      #  9 1         2          germantown hills    hindsboro      165140.
                      # 10 1         2          la fontaine         edgewood        63384.
                      # # … with 90 more rows
                      
                      find_nearest_by <- function(data, by, ...) {
                        do.call(rbind, base::by(data, by, find_nearest, ...))
                      }
                      
                      res <- find_nearest_by(dd, by = dd[c("geo_block", "ansi_block")], dist = dist_vel)
                      head(res, 10L)
                      #    geo_block ansi_block                city city_nearest   distance
                      # 1          3          1         grand forks         <NA>         NA
                      # 2          4          1         littlestown  martinsburg  122718.95
                      # 3          4          1         martinsburg  littlestown  122718.95
                      # 4          4          1            mitchell  martinsburg 1671365.58
                      # 5          5          1            bayfield         <NA>         NA
                      # 6          1          2 california junction         gray   89609.71
                      # 7          1          2       pleasantville middle amana  122974.32
                      # 8          1          2        willacoochee        meigs  104116.21
                      # 9          1          2           effingham    hindsboro   72160.43
                      # 10         1          2               heath       dawson  198051.76
                      
                      fn <- function(dist) find_nearest(dd, dist = dist)
                      
                      library("geosphere")
                      dist_geo <- function(x) distm(x, fun = distGeo)
                      dist_cos <- function(x) distm(x, fun = distCosine)
                      dist_hav <- function(x) distm(x, fun = distHaversine)
                      dist_vsp <- function(x) distm(x, fun = distVincentySphere)
                      dist_vel <- function(x) distm(x, fun = distVincentyEllipsoid)
                      dist_mee <- function(x) distm(x, fun = distMeeus)
                      
                      microbenchmark::microbenchmark(
                        fn(dist_geo),
                        fn(dist_cos),
                        fn(dist_hav),
                        fn(dist_vsp),
                        fn(dist_vel),
                        fn(dist_mee),
                        times = 1000L
                      )
                      # Unit: milliseconds
                      #          expr        min         lq       mean     median         uq       max neval
                      #  fn(dist_geo)   6.143276   6.291737   6.718329   6.362257   6.459345  45.91131  1000
                      #  fn(dist_cos)   4.239236   4.399977   4.918079   4.461804   4.572033  45.70233  1000
                      #  fn(dist_hav)   4.005331   4.156067   4.641016   4.210721   4.307542  41.91619  1000
                      #  fn(dist_vsp)   3.827227   3.979829   4.446428   4.033621   4.123924  44.29160  1000
                      #  fn(dist_vel) 129.712069 132.549638 135.006170 133.935479 135.248135 174.88874  1000
                      #  fn(dist_mee)   3.716814   3.830999   4.234231   3.883582   3.962712  42.12947  1000
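
The benchmark shows that the ellipsoidal `distVincentyEllipsoid` is roughly 30× slower than the spherical formulas, while the spherical results agree with each other to well under a percent. If spherical accuracy is acceptable, the haversine distance matrix can even be computed in vectorised base R, with no geosphere dependency at all. This is a minimal sketch (the function name `dist_hav_base` and the use of `outer` are my own; the radius 6378137 m matches geosphere's default):

```r
## Base-R haversine distance matrix in metres, assuming a spherical Earth.
## x: two-column matrix of (longitude, latitude) in decimal degrees.
dist_hav_base <- function(x, r = 6378137) {
  p <- x * pi / 180                         # degrees -> radians
  lon <- p[, 1L]
  lat <- p[, 2L]
  dlat <- outer(lat, lat, "-") / 2          # half latitude differences
  dlon <- outer(lon, lon, "-") / 2          # half longitude differences
  a <- sin(dlat)^2 + outer(cos(lat), cos(lat)) * sin(dlon)^2
  2 * r * asin(pmin(sqrt(a), 1))            # clamp guards rounding error
}

## Quick sanity check: one degree of latitude at the equator
xy <- rbind(c(0, 0), c(0, 1))
dist_hav_base(xy)[1L, 2L]  # about 111319.5 m
```

Such a function plugs straight into `find_nearest(dd[-(1:2)], dist = dist_hav_base)` in place of the `distm` wrappers, since it only requires a coordinate matrix in and a square distance matrix out.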
                      
                      # 7          1          2       pleasantville middle amana  122974.32
                      # 8          1          2        willacoochee        meigs  104116.21
                      # 9          1          2           effingham    hindsboro   72160.43
                      # 10         1          2               heath       dawson  198051.76
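                       As a sanity check (a sketch, assuming the `dd`, `res`, `find_nearest`, and `dist_vel` objects defined above, with dplyr attached), the `base::by` result should carry the same nearest neighbour per city as the `group_modify` pipeline, once both are ordered by city:

                       ```r
                       # Compare the base::by result (res) against dplyr::group_modify.
                       # Both should contain one row per city, including the all-NA rows
                       # produced by singleton groups.
                       res_gm <- dd %>%
                         group_by(geo_block, ansi_block) %>%
                         group_modify(~ find_nearest(., dist = dist_vel)) %>%
                         ungroup()

                       a <- res[order(res$city), ]
                       b <- res_gm[order(res_gm$city), ]
                       identical(a$city_nearest, b$city_nearest)
                       ```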
                      
                      fn <- function(dist) find_nearest(dd, dist = dist)
                      
                      library("geosphere")
                      dist_geo <- function(x) distm(x, fun = distGeo)
                      dist_cos <- function(x) distm(x, fun = distCosine)
                      dist_hav <- function(x) distm(x, fun = distHaversine)
                      dist_vsp <- function(x) distm(x, fun = distVincentySphere)
                      dist_vel <- function(x) distm(x, fun = distVincentyEllipsoid)
                      dist_mee <- function(x) distm(x, fun = distMeeus)
                      
                      microbenchmark::microbenchmark(
                        fn(dist_geo),
                        fn(dist_cos),
                        fn(dist_hav),
                        fn(dist_vsp),
                        fn(dist_vel),
                        fn(dist_mee),
                        times = 1000L
                      )
                      # Unit: milliseconds
                      #          expr        min         lq       mean     median         uq       max neval
                      #  fn(dist_geo)   6.143276   6.291737   6.718329   6.362257   6.459345  45.91131  1000
                      #  fn(dist_cos)   4.239236   4.399977   4.918079   4.461804   4.572033  45.70233  1000
                      #  fn(dist_hav)   4.005331   4.156067   4.641016   4.210721   4.307542  41.91619  1000
                      #  fn(dist_vsp)   3.827227   3.979829   4.446428   4.033621   4.123924  44.29160  1000
                      #  fn(dist_vel) 129.712069 132.549638 135.006170 133.935479 135.248135 174.88874  1000
                      #  fn(dist_mee)   3.716814   3.830999   4.234231   3.883582   3.962712  42.12947  1000
                      
                      
                      library(spatialrisk)
                      library(data.table)
                      library(optiRum)
                      library(sf)
                      #> Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1
                      library(s2)
                      
                       # Create data.table
                       # (note: here x holds latitude and y holds longitude)
                       s_dt = data.table::data.table(city = somewhere$city,
                                                     x = somewhere$latitude,
                                                     y = somewhere$longitude)
                      
                      # Cross join two data tables
                      coordinates_dt <- optiRum::CJ.dt(s_dt, s_dt)
                      distance_m <- coordinates_dt[, dist_m := spatialrisk::haversine(y, x, i.y, i.x)][
                        dist_m > 0, .SD[which.min(dist_m)], by = .(city, x, y)]
                      head(distance_m)
                      #>                   city        x          y               i.city      i.x
                      #> 1:  warminster heights 40.18837  -75.08409               shiloh 39.46242
                      #> 2: san juan capistrano 33.50089 -117.65439 palos verdes estates 33.77427
                      #> 3:         littlestown 39.74517  -77.08921          lower allen 40.22675
                      #> 4:       port republic 39.53480  -74.47610           rio grande 39.01905
                      #> 5:              taylor 30.57326  -97.42712               aurora 33.05594
                      #> 6:             merriam 39.01761  -94.69396              leawood 38.90726
                      #>           i.y    dist_m
                      #> 1:  -75.29244 31059.909
                      #> 2: -118.42575 87051.444
                      #> 3:  -76.90277 24005.689
                      #> 4:  -74.87787 47227.822
                      #> 5:  -97.50961 37074.461
                      #> 6:  -94.62524  7714.126
                      
                      s_sf <- sf::st_as_sf(s_dt, coords = c("x", "y"))
                      cities <- sf::st_as_sfc(s2::as_s2_geography(s_sf))
                      s_dt$city_nearest <-  s_dt$city[sf::st_nearest_feature(cities)]
                      
                      
                       method_1 <- function(){
                         # note: `:=` updates coordinates_dt by reference on every call
                         coordinates_dt[, dist_m := spatialrisk::haversine(y, x, i.y, i.x)][
                           dist_m > 0, .SD[which.min(dist_m)], by = .(city, x, y)]
                       }
                      
                      method_2 <- function(){
                        s_dt$city[sf::st_nearest_feature(cities)]
                      }
                      
                      microbenchmark::microbenchmark(
                        method_1(), 
                        method_2(), 
                        times = 100
                      )
                      #> Unit: milliseconds
                      #>        expr        min         lq       mean     median         uq       max
                      #>  method_1()   5.385391   5.652444   6.234329   5.772923   6.003445  11.60981
                      #>  method_2() 182.730850 188.408202 203.348667 199.049937 211.682795 303.14904
                      #>  neval
                      #>    100
                      #>    100
                      
                      
                      `[.dist` <- function(x, i, j, drop = TRUE) {
                        class(x) <- NULL
                        p <- length(x)
                        n <- as.integer(round(0.5 * (1 + sqrt(1 + 8 * p)))) # p = n * (n - 1) / 2
                        
                        ## Column extraction
                        if (missing(i) && !missing(j) && is.integer(j) && length(j) == 1L && !is.na(j) && j >= 1L && j <= n) {
                          if (j == 1L) {
                            return(c(0, x[seq_len(n - 1L)]))
                          }
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- rep.int(j - 1L, j - 1L)
                          jj <- 1L:(j - 1L)
                          if (j < n) {
                            ii <- c(ii, j:(n - 1L))
                            jj <- c(jj, rep.int(j, n - j))
                          }
                          kk <- ii + round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(n)
                          res[-j] <- x[kk]
                          nms <- attr(x, "Labels")
                          if (drop) {
                            names(res) <- nms
                          } else {
                            dim(res) <- c(n, 1L)
                            dimnames(res) <- list(nms, nms[j])
                          }
                          return(res)
                        }
                        
                        ## Element extraction with matrix indices
                        if (missing(j) && !missing(i) && is.matrix(i) && dim(i)[2L] == 2L && is.integer(i) && !anyNA(i) && all(i >= 1L & i <= n)) {
                          m <- dim(i)[1L]
                          ## Subset off-diagonal entries
                          d <- i[, 1L] == i[, 2L]
                          i <- i[!d, , drop = FALSE]
                          ## Transpose upper triangular entries
                          u <- i[, 2L] > i[, 1L]
                          i[u, 1:2] <- i[u, 2:1]
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- i[, 1L] - 1L
                          jj <- i[, 2L]
                          kk <- ii + (jj > 1L) * round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(m)
                          res[!d] <- x[kk] 
                          return(res)
                        }
                      
                        ## Fall back on coercion for any other subset operation
                        as.matrix(x)[i, j, drop = drop]
                      }
                      
                      n <- 6L
                      do <- dist(seq_len(n))
                      dm <- unname(as.matrix(do))
                      ij <- cbind(sample(6L), sample(6L))
                      identical(do[, 4L], dm[, 4L]) # TRUE
                      identical(do[ij], dm[ij]) # TRUE
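                       To make the index arithmetic in `[.dist` concrete: the lower-triangle element (i, j), i > j, of an n-point distance object sits at packed position k = (i - 1) + (2(n - 1) - j)(j - 1)/2, column-major. A minimal base-R sketch, using dist(1:n), whose (i, j) entry is simply |i - j|:

                       ```r
                       # Verify the packed-index formula against dist(1:n).
                       n <- 5L
                       D <- unclass(dist(seq_len(n)))
                       k <- function(i, j, n) (i - 1L) + round(0.5 * (2L * (n - 1L) - j) * (j - 1L))
                       for (j in 1:(n - 1L)) {
                         for (i in (j + 1L):n) {
                           stopifnot(D[k(i, j, n)] == abs(i - j))
                         }
                       }
                       ```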
                      
                      find_nearest2 <- function(data, dist, coordvar, idvar) {
                        m <- match(coordvar, names(data), 0L)
                        n <- nrow(data)
                        if (n < 2L) {
                           argmin <- NA_integer_[n]  # n = 0 gives integer(0); n = 1 gives NA
                           distance <- NA_real_[n]
                        } else {
                          ## Compute distance matrix
                          D <- dist(data[m])
                          ## Extract minimum off-diagonal distances
                          patch.which.min <- function(x, i) {
                            x[i] <- Inf
                            which.min(x)
                          }
                          argmin <- integer(n)
                          index <- seq_len(n)
                          for (j in index) {
                            argmin[j] <- forceAndCall(2L, patch.which.min, D[, j], j)
                          }
                          distance <- D[cbind(argmin, index)]
                        }
                        ## Return focal point data, nearest neighbour ID, distance
                        r1 <- data[-m]
                        r2 <- data[argmin, idvar, drop = FALSE]
                        names(r2) <- paste0(idvar, "_nearest")
                        data.frame(r1, r2, distance, row.names = NULL, stringsAsFactors = FALSE)
                      }
                      
                      code <- '#include <Rcpp.h>
                      using namespace Rcpp;
                      
                      double distanceHaversine(double latf, double lonf, double latt, double lont, double tolerance) {
                        double d;
                        double dlat = latt - latf;
                        double dlon =  lont - lonf;
                        d = (sin(dlat * 0.5) * sin(dlat * 0.5)) + (cos(latf) * cos(latt)) * (sin(dlon * 0.5) * sin(dlon * 0.5));
                         // floating-point error can push d slightly above 1; clamp within tolerance
                         if(d > 1 && d <= tolerance){
                           d = 1;
                         }
                         return 2 * atan2(sqrt(d), sqrt(1 - d)) * 6378137.0; // WGS84 equatorial radius (m)
                      }
                      
                      double toRadians(double deg){
                        return deg * 0.01745329251;  // PI / 180;
                      }
                      
                      // [[Rcpp::export]]
                      NumericVector calc_dist(Rcpp::NumericVector lat, 
                                              Rcpp::NumericVector lon, 
                                              double tolerance = 10000000000.0) {
                        std::size_t nlat = lat.size();
                        std::size_t nlon = lon.size();
                        if (nlat != nlon) throw std::range_error("lat and lon different lengths");
                        if (nlat < 2) throw std::range_error("Need at least 2 points");
                        std::size_t size = nlat * (nlat - 1) / 2;
                        NumericVector ans(size);
                        std::size_t k = 0;
                        double latf;
                        double latt;
                        double lonf;
                        double lont;
                        
                        for (std::size_t j = 0; j < (nlat-1); j++) {
                          for (std::size_t i = j + 1; i < nlat; i++) {
                            latf = toRadians(lat[i]);
                            lonf = toRadians(lon[i]);
                            latt = toRadians(lat[j]);
                            lont = toRadians(lon[j]);
                            ans[k++] = distanceHaversine(latf, lonf, latt, lont, tolerance);
                          }
                        }
                        
                        return ans;
                      }
                      '
                      Rcpp::sourceCpp(code = code)
                      
                      rx <- function(n) {
                        data.frame(id = seq_len(n), lon = rnorm(n), lat = rnorm(n))
                      }
                      dist_hav <- function(x) {
                        geosphere::distm(x, fun = geosphere::distHaversine)
                      }
                       dist_dww <- function(x) {
                         ## wrap calc_dist() output as a 'dist' object; calc_dist takes (lat, lon)
                         res <- calc_dist(x[, 2L], x[, 1L])
                        attr(res, "class") <- "dist"
                        attr(res, "Size") <- nrow(x)
                        attr(res, "Diag") <- FALSE
                        attr(res, "Upper") <- FALSE
                        attr(res, "call") <- match.call()
                        res
                      }
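                       A hedged consistency check (a sketch, assuming `calc_dist` was compiled above and that `rx`, `dist_hav`, and `dist_dww` are as defined): both `dist_dww` and `geosphere::distHaversine` use r = 6378137 m, so the packed lower-triangle entries should agree to floating-point tolerance:

                       ```r
                       set.seed(1)
                       x <- rx(10L)
                       m <- as.matrix(x[c("lon", "lat")])
                       all.equal(
                         as.vector(unclass(dist_dww(m))),               # packed lower triangle
                         as.matrix(dist_hav(m))[lower.tri(diag(10L))]   # same column-major order
                       )
                       ```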
                      
                      fn2 <- function(data, dist) {
                        find_nearest2(data, dist = dist, coordvar = c("lon", "lat"), idvar = "id")
                      }
                      
                      x1 <- rx(100L)
                      microbenchmark::microbenchmark(
                        fn2(x1, dist_hav), 
                        fn2(x1, dist_dww), 
                        times = 1000L
                      )
                      # Unit: microseconds
                      #               expr      min       lq     mean   median       uq       max neval
                      #  fn2(x1, dist_hav) 3768.310 3886.452 4680.300 3977.492 4131.796 34461.361  1000
                      #  fn2(x1, dist_dww)  930.044  992.241 1128.272 1017.005 1045.746  7006.326  1000
                      
                      x2 <- rx(20000L)
                      microbenchmark::microbenchmark(
                        fn2(x2, dist_hav),
                        fn2(x2, dist_dww),
                        times = 100L
                      )
                      # Unit: seconds
                      #               expr      min       lq     mean   median       uq      max neval
                      #  fn2(x2, dist_hav) 29.60596 30.04249 30.29052 30.14016 30.45054 31.53976   100
                      #  fn2(x2, dist_dww) 18.71327 19.01204 19.12311 19.09058 19.26680 19.62273   100
                      
                      x3 <- rx(40000L)
                      microbenchmark::microbenchmark(
                        # fn2(x3, dist_hav), # runs out of memory
                        fn2(x3, dist_dww),
                        times = 10L
                      )
                      # Unit: seconds
                      #               expr      min      lq     mean  median       uq      max neval
                      #  fn2(x3, dist_dww) 104.8912 105.762 109.1512 109.653 112.2543 112.9265    10
                      
                      `[.dist` <- function(x, i, j, drop = TRUE) {
                        class(x) <- NULL
                        p <- length(x)
                        n <- as.integer(round(0.5 * (1 + sqrt(1 + 8 * p)))) # p = n * (n - 1) / 2
                        
                        ## Column extraction
                        if (missing(i) && !missing(j) && is.integer(j) && length(j) == 1L && !is.na(j) && j >= 1L && j <= n) {
                          if (j == 1L) {
                            return(c(0, x[seq_len(n - 1L)]))
                          }
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- rep.int(j - 1L, j - 1L)
                          jj <- 1L:(j - 1L)
                          if (j < n) {
                            ii <- c(ii, j:(n - 1L))
                            jj <- c(jj, rep.int(j, n - j))
                          }
                          kk <- ii + round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(n)
                          res[-j] <- x[kk]
                          nms <- attr(x, "Labels")
                          if (drop) {
                            names(res) <- nms
                          } else {
                            dim(res) <- c(n, 1L)
                            dimnames(res) <- list(nms, nms[j])
                          }
                          return(res)
                        }
                        
                        ## Element extraction with matrix indices
                        if (missing(j) && !missing(i) && is.matrix(i) && dim(i)[2L] == 2L && is.integer(i) && !anyNA(i) && all(i >= 1L & i <= n)) {
                          m <- dim(i)[1L]
                          ## Subset off-diagonal entries
                          d <- i[, 1L] == i[, 2L]
                          i <- i[!d, , drop = FALSE]
                          ## Transpose upper triangular entries
                          u <- i[, 2L] > i[, 1L]
                          i[u, 1:2] <- i[u, 2:1]
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- i[, 1L] - 1L
                          jj <- i[, 2L]
                          kk <- ii + (jj > 1L) * round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(m)
                          res[!d] <- x[kk] 
                          return(res)
                        }
                      
                        ## Fall back on coercion for any other subset operation
                        as.matrix(x)[i, j, drop = drop]
                      }
                      
                      n <- 6L
                      do <- dist(seq_len(n))
                      dm <- unname(as.matrix(do))
                      ij <- cbind(sample(6L), sample(6L))
                      identical(do[, 4L], dm[, 4L]) # TRUE
                      identical(do[ij], dm[ij]) # TRUE
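The matrix-index path can also be exercised on diagonal and upper-triangle entries, which the method zeroes and transposes respectively (a small check in the same spirit as the two above, reusing `do`, `dm` from this example):

```r
ij2 <- cbind(c(1L, 2L, 5L), c(1L, 6L, 3L))  # diagonal, upper, lower entries
identical(do[ij2], dm[ij2]) # TRUE
```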
                      
                      find_nearest2 <- function(data, dist, coordvar, idvar) {
                        m <- match(coordvar, names(data), 0L)
                        n <- nrow(data)
                        if (n < 2L) {
                          argmin <- NA_integer_[n]
                          distance <- NA_real_[n]
                        } else {
                          ## Compute distance matrix
                          D <- dist(data[m])
                          ## Extract minimum off-diagonal distances
                          patch.which.min <- function(x, i) {
                            x[i] <- Inf
                            which.min(x)
                          }
                          argmin <- integer(n)
                          index <- seq_len(n)
                          for (j in index) {
                            argmin[j] <- forceAndCall(2L, patch.which.min, D[, j], j)
                          }
                          distance <- D[cbind(argmin, index)]
                        }
                        ## Return focal point data, nearest neighbour ID, distance
                        r1 <- data[-m]
                        r2 <- data[argmin, idvar, drop = FALSE]
                        names(r2) <- paste0(idvar, "_nearest")
                        data.frame(r1, r2, distance, row.names = NULL, stringsAsFactors = FALSE)
                      }
                      
                      code <- '#include <Rcpp.h>
                      using namespace Rcpp;
                      
                      double distanceHaversine(double latf, double lonf, double latt, double lont, double tolerance) {
                        double d;
                        double dlat = latt - latf;
                        double dlon =  lont - lonf;
                        d = (sin(dlat * 0.5) * sin(dlat * 0.5)) + (cos(latf) * cos(latt)) * (sin(dlon * 0.5) * sin(dlon * 0.5));
                        if(d > 1 && d <= tolerance){
                          d = 1;
                        }
                        return 2 * atan2(sqrt(d), sqrt(1 - d)) * 6378137.0;
                      }
                      
                      double toRadians(double deg){
                        return deg * 0.01745329251;  // PI / 180;
                      }
                      
                      // [[Rcpp::export]]
                      NumericVector calc_dist(Rcpp::NumericVector lat, 
                                              Rcpp::NumericVector lon, 
                                              double tolerance = 10000000000.0) {
                        std::size_t nlat = lat.size();
                        std::size_t nlon = lon.size();
                        if (nlat != nlon) throw std::range_error("lat and lon different lengths");
                        if (nlat < 2) throw std::range_error("Need at least 2 points");
                        std::size_t size = nlat * (nlat - 1) / 2;
                        NumericVector ans(size);
                        std::size_t k = 0;
                        double latf;
                        double latt;
                        double lonf;
                        double lont;
                        
                        for (std::size_t j = 0; j < (nlat-1); j++) {
                          for (std::size_t i = j + 1; i < nlat; i++) {
                            latf = toRadians(lat[i]);
                            lonf = toRadians(lon[i]);
                            latt = toRadians(lat[j]);
                            lont = toRadians(lon[j]);
                            ans[k++] = distanceHaversine(latf, lonf, latt, lont, tolerance);
                          }
                        }
                        
                        return ans;
                      }
                      '
                      Rcpp::sourceCpp(code = code)
                      
                      rx <- function(n) {
                        data.frame(id = seq_len(n), lon = rnorm(n), lat = rnorm(n))
                      }
                      dist_hav <- function(x) {
                        geosphere::distm(x, fun = geosphere::distHaversine)
                      }
                      dist_dww <- function(x) {
                        res <- calc_dist(x[, 2L], x[, 1L])
                        attr(res, "class") <- "dist"
                        attr(res, "Size") <- nrow(x)
                        attr(res, "Diag") <- FALSE
                        attr(res, "Upper") <- FALSE
                        attr(res, "call") <- match.call()
                        res
                      }
                      
                      fn2 <- function(data, dist) {
                        find_nearest2(data, dist = dist, coordvar = c("lon", "lat"), idvar = "id")
                      }
                      
                      x1 <- rx(100L)
                      microbenchmark::microbenchmark(
                        fn2(x1, dist_hav), 
                        fn2(x1, dist_dww), 
                        times = 1000L
                      )
                      # Unit: microseconds
                      #               expr      min       lq     mean   median       uq       max neval
                      #  fn2(x1, dist_hav) 3768.310 3886.452 4680.300 3977.492 4131.796 34461.361  1000
                      #  fn2(x1, dist_dww)  930.044  992.241 1128.272 1017.005 1045.746  7006.326  1000
                      
                      x2 <- rx(20000L)
                      microbenchmark::microbenchmark(
                        fn2(x2, dist_hav),
                        fn2(x2, dist_dww),
                        times = 100L
                      )
                      # Unit: seconds
                      #               expr      min       lq     mean   median       uq      max neval
                      #  fn2(x2, dist_hav) 29.60596 30.04249 30.29052 30.14016 30.45054 31.53976   100
                      #  fn2(x2, dist_dww) 18.71327 19.01204 19.12311 19.09058 19.26680 19.62273   100
                      
                      x3 <- rx(40000L)
                      microbenchmark::microbenchmark(
                        # fn2(x3, dist_hav), # runs out of memory
                        fn2(x3, dist_dww),
                        times = 10L
                      )
                      # Unit: seconds
                      #               expr      min      lq     mean  median       uq      max neval
                      #  fn2(x3, dist_dww) 104.8912 105.762 109.1512 109.653 112.2543 112.9265    10
                      
                      `[.dist` <- function(x, i, j, drop = TRUE) {
                        class(x) <- NULL
                        p <- length(x)
                        n <- as.integer(round(0.5 * (1 + sqrt(1 + 8 * p)))) # p = n * (n - 1) / 2
                        
                        ## Column extraction
                        if (missing(i) && !missing(j) && is.integer(j) && length(j) == 1L && !is.na(j) && j >= 1L && j <= n) {
                          if (j == 1L) {
                            return(c(0, x[seq_len(n - 1L)]))
                          }
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- rep.int(j - 1L, j - 1L)
                          jj <- 1L:(j - 1L)
                          if (j < n) {
                            ii <- c(ii, j:(n - 1L))
                            jj <- c(jj, rep.int(j, n - j))
                          }
                          kk <- ii + round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(n)
                          res[-j] <- x[kk]
                          nms <- attr(x, "Labels")
                          if (drop) {
                            names(res) <- nms
                          } else {
                            dim(res) <- c(n, 1L)
                            dimnames(res) <- list(nms, nms[j])
                          }
                          return(res)
                        }
                        
                        ## Element extraction with matrix indices
                        if (missing(j) && !missing(i) && is.matrix(i) && dim(i)[2L] == 2L && is.integer(i) && !anyNA(i) && all(i >= 1L & i <= n)) {
                          m <- dim(i)[1L]
                          ## Subset off-diagonal entries
                          d <- i[, 1L] == i[, 2L]
                          i <- i[!d, , drop = FALSE]
                          ## Transpose upper triangular entries
                          u <- i[, 2L] > i[, 1L]
                          i[u, 1:2] <- i[u, 2:1]
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- i[, 1L] - 1L
                          jj <- i[, 2L]
                          kk <- ii + (jj > 1L) * round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(m)
                          res[!d] <- x[kk] 
                          return(res)
                        }
                      
                        ## Fall back on coercion for any other subset operation
                        as.matrix(x)[i, j, drop = drop]
                      }
                      
                      n <- 6L
                      do <- dist(seq_len(n))
                      dm <- unname(as.matrix(do))
                      ij <- cbind(sample(6L), sample(6L))
                      identical(do[, 4L], dm[, 4L]) # TRUE
                      identical(do[ij], dm[ij]) # TRUE
                      
                      find_nearest2 <- function(data, dist, coordvar, idvar) {
                        m <- match(coordvar, names(data), 0L)
                        n <- nrow(data)
                        if (n < 2L) {
                          argmin <- NA_integer_[n]
                          distance <- NA_real_[n]
                        } else {
                          ## Compute distance matrix
                          D <- dist(data[m])
                          ## Extract minimum off-diagonal distances
                          patch.which.min <- function(x, i) {
                            x[i] <- Inf
                            which.min(x)
                          }
                          argmin <- integer(n)
                          index <- seq_len(n)
                          for (j in index) {
                            argmin[j] <- forceAndCall(2L, patch.which.min, D[, j], j)
                          }
                          distance <- D[cbind(argmin, index)]
                        }
                        ## Return focal point data, nearest neighbour ID, distance
                        r1 <- data[-m]
                        r2 <- data[argmin, idvar, drop = FALSE]
                        names(r2) <- paste0(idvar, "_nearest")
                        data.frame(r1, r2, distance, row.names = NULL, stringsAsFactors = FALSE)
                      }
                      
                      code <- '#include <Rcpp.h>
                      using namespace Rcpp;
                      
                      double distanceHaversine(double latf, double lonf, double latt, double lont, double tolerance) {
                        double d;
                        double dlat = latt - latf;
                        double dlon =  lont - lonf;
                        d = (sin(dlat * 0.5) * sin(dlat * 0.5)) + (cos(latf) * cos(latt)) * (sin(dlon * 0.5) * sin(dlon * 0.5));
                        if(d > 1 && d <= tolerance){
                          d = 1;
                        }
                        return 2 * atan2(sqrt(d), sqrt(1 - d)) * 6378137.0;
                      }
                      
                      double toRadians(double deg){
                        return deg * 0.01745329251;  // PI / 180;
                      }
                      
                      // [[Rcpp::export]]
                      NumericVector calc_dist(Rcpp::NumericVector lat, 
                                              Rcpp::NumericVector lon, 
                                              double tolerance = 10000000000.0) {
                        std::size_t nlat = lat.size();
                        std::size_t nlon = lon.size();
                        if (nlat != nlon) throw std::range_error("lat and lon different lengths");
                        if (nlat < 2) throw std::range_error("Need at least 2 points");
                        std::size_t size = nlat * (nlat - 1) / 2;
                        NumericVector ans(size);
                        std::size_t k = 0;
                        double latf;
                        double latt;
                        double lonf;
                        double lont;
                        
                        for (std::size_t j = 0; j < (nlat-1); j++) {
                          for (std::size_t i = j + 1; i < nlat; i++) {
                            latf = toRadians(lat[i]);
                            lonf = toRadians(lon[i]);
                            latt = toRadians(lat[j]);
                            lont = toRadians(lon[j]);
                            ans[k++] = distanceHaversine(latf, lonf, latt, lont, tolerance);
                          }
                        }
                        
                        return ans;
                      }
                      '
                      Rcpp::sourceCpp(code = code)
                      
                      rx <- function(n) {
                        data.frame(id = seq_len(n), lon = rnorm(n), lat = rnorm(n))
                      }
                      dist_hav <- function(x) {
                        geosphere::distm(x, fun = geosphere::distHaversine)
                      }
                      dist_dww <- function(x) {
                        res <- calc_dist(x[, 2L], x[, 1L])
                        attr(res, "class") <- "dist"
                        attr(res, "Size") <- nrow(x)
                        attr(res, "Diag") <- FALSE
                        attr(res, "Upper") <- FALSE
                        attr(res, "call") <- match.call()
                        res
                      }
                      
                      fn2 <- function(data, dist) {
                        find_nearest2(data, dist = dist, coordvar = c("lon", "lat"), idvar = "id")
                      }
                      
                      x1 <- rx(100L)
                      microbenchmark::microbenchmark(
                        fn2(x1, dist_hav), 
                        fn2(x1, dist_dww), 
                        times = 1000L
                      )
                      # Unit: microseconds
                      #               expr      min       lq     mean   median       uq       max neval
                      #  fn2(x1, dist_hav) 3768.310 3886.452 4680.300 3977.492 4131.796 34461.361  1000
                      #  fn2(x1, dist_dww)  930.044  992.241 1128.272 1017.005 1045.746  7006.326  1000
                      
                      x2 <- rx(20000L)
                      microbenchmark::microbenchmark(
                        fn2(x2, dist_hav),
                        fn2(x2, dist_dww),
                        times = 100L
                      )
                      # Unit: seconds
                      #               expr      min       lq     mean   median       uq      max neval
                      #  fn2(x2, dist_hav) 29.60596 30.04249 30.29052 30.14016 30.45054 31.53976   100
                      #  fn2(x2, dist_dww) 18.71327 19.01204 19.12311 19.09058 19.26680 19.62273   100
                      
                      x3 <- rx(40000L)
                      microbenchmark::microbenchmark(
                        # fn2(x3, dist_hav), # runs out of memory
                        fn2(x3, dist_dww),
                        times = 10L
                      )
                      # Unit: seconds
                      #               expr      min      lq     mean  median       uq      max neval
                      #  fn2(x3, dist_dww) 104.8912 105.762 109.1512 109.653 112.2543 112.9265    10
                      
                      `[.dist` <- function(x, i, j, drop = TRUE) {
                        class(x) <- NULL
                        p <- length(x)
                        n <- as.integer(round(0.5 * (1 + sqrt(1 + 8 * p)))) # p = n * (n - 1) / 2
                        
                        ## Column extraction
                        if (missing(i) && !missing(j) && is.integer(j) && length(j) == 1L && !is.na(j) && j >= 1L && j <= n) {
                          if (j == 1L) {
                            return(c(0, x[seq_len(n - 1L)]))
                          }
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- rep.int(j - 1L, j - 1L)
                          jj <- 1L:(j - 1L)
                          if (j < n) {
                            ii <- c(ii, j:(n - 1L))
                            jj <- c(jj, rep.int(j, n - j))
                          }
                          kk <- ii + round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(n)
                          res[-j] <- x[kk]
                          nms <- attr(x, "Labels")
                          if (drop) {
                            names(res) <- nms
                          } else {
                            dim(res) <- c(n, 1L)
                            dimnames(res) <- list(nms, nms[j])
                          }
                          return(res)
                        }
                        
                        ## Element extraction with matrix indices
                        if (missing(j) && !missing(i) && is.matrix(i) && dim(i)[2L] == 2L && is.integer(i) && !anyNA(i) && all(i >= 1L & i <= n)) {
                          m <- dim(i)[1L]
                          ## Subset off-diagonal entries
                          d <- i[, 1L] == i[, 2L]
                          i <- i[!d, , drop = FALSE]
                          ## Transpose upper triangular entries
                          u <- i[, 2L] > i[, 1L]
                          i[u, 1:2] <- i[u, 2:1]
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- i[, 1L] - 1L
                          jj <- i[, 2L]
                          kk <- ii + (jj > 1L) * round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(m)
                          res[!d] <- x[kk] 
                          return(res)
                        }
                      
                        ## Fall back on coercion for any other subset operation
                        as.matrix(x)[i, j, drop = drop]
                      }
                      
                      n <- 6L
                      do <- dist(seq_len(n))
                      dm <- unname(as.matrix(do))
                      ij <- cbind(sample(6L), sample(6L))
                      identical(do[, 4L], dm[, 4L]) # TRUE
                      identical(do[ij], dm[ij]) # TRUE
                      
                      find_nearest2 <- function(data, dist, coordvar, idvar) {
                        m <- match(coordvar, names(data), 0L)
                        n <- nrow(data)
                        if (n < 2L) {
                          argmin <- NA_integer_[n]
                          distance <- NA_real_[n]
                        } else {
                          ## Compute distance matrix
                          D <- dist(data[m])
                          ## Extract minimum off-diagonal distances
                          patch.which.min <- function(x, i) {
                            x[i] <- Inf
                            which.min(x)
                          }
                          argmin <- integer(n)
                          index <- seq_len(n)
                          for (j in index) {
                            argmin[j] <- forceAndCall(2L, patch.which.min, D[, j], j)
                          }
                          distance <- D[cbind(argmin, index)]
                        }
                        ## Return focal point data, nearest neighbour ID, distance
                        r1 <- data[-m]
                        r2 <- data[argmin, idvar, drop = FALSE]
                        names(r2) <- paste0(idvar, "_nearest")
                        data.frame(r1, r2, distance, row.names = NULL, stringsAsFactors = FALSE)
                      }
                      
                      code <- '#include <Rcpp.h>
                      using namespace Rcpp;
                      
                      double distanceHaversine(double latf, double lonf, double latt, double lont, double tolerance) {
                        double d;
                        double dlat = latt - latf;
                        double dlon =  lont - lonf;
                        d = (sin(dlat * 0.5) * sin(dlat * 0.5)) + (cos(latf) * cos(latt)) * (sin(dlon * 0.5) * sin(dlon * 0.5));
                        if(d > 1 && d <= tolerance){
                          d = 1;
                        }
                        return 2 * atan2(sqrt(d), sqrt(1 - d)) * 6378137.0;
                      }
                      
                      double toRadians(double deg){
                        return deg * 0.01745329251;  // PI / 180;
                      }
                      
                      // [[Rcpp::export]]
                      NumericVector calc_dist(Rcpp::NumericVector lat, 
                                              Rcpp::NumericVector lon, 
                                              double tolerance = 10000000000.0) {
                        std::size_t nlat = lat.size();
                        std::size_t nlon = lon.size();
                        if (nlat != nlon) throw std::range_error("lat and lon different lengths");
                        if (nlat < 2) throw std::range_error("Need at least 2 points");
                        std::size_t size = nlat * (nlat - 1) / 2;
                        NumericVector ans(size);
                        std::size_t k = 0;
                        double latf;
                        double latt;
                        double lonf;
                        double lont;
                        
                        for (std::size_t j = 0; j < (nlat-1); j++) {
                          for (std::size_t i = j + 1; i < nlat; i++) {
                            latf = toRadians(lat[i]);
                            lonf = toRadians(lon[i]);
                            latt = toRadians(lat[j]);
                            lont = toRadians(lon[j]);
                            ans[k++] = distanceHaversine(latf, lonf, latt, lont, tolerance);
                          }
                        }
                        
                        return ans;
                      }
                      '
                      Rcpp::sourceCpp(code = code)
                      
                      rx <- function(n) {
                        data.frame(id = seq_len(n), lon = rnorm(n), lat = rnorm(n))
                      }
                      dist_hav <- function(x) {
                        geosphere::distm(x, fun = geosphere::distHaversine)
                      }
                      dist_dww <- function(x) {
                        res <- calc_dist(x[, 2L], x[, 1L])
                        attr(res, "class") <- "dist"
                        attr(res, "Size") <- nrow(x)
                        attr(res, "Diag") <- FALSE
                        attr(res, "Upper") <- FALSE
                        attr(res, "call") <- match.call()
                        res
                      }
                      
                      fn2 <- function(data, dist) {
                        find_nearest2(data, dist = dist, coordvar = c("lon", "lat"), idvar = "id")
                      }
                      
                      x1 <- rx(100L)
                      microbenchmark::microbenchmark(
                        fn2(x1, dist_hav), 
                        fn2(x1, dist_dww), 
                        times = 1000L
                      )
                      # Unit: microseconds
                      #               expr      min       lq     mean   median       uq       max neval
                      #  fn2(x1, dist_hav) 3768.310 3886.452 4680.300 3977.492 4131.796 34461.361  1000
                      #  fn2(x1, dist_dww)  930.044  992.241 1128.272 1017.005 1045.746  7006.326  1000
                      
                      x2 <- rx(20000L)
                      microbenchmark::microbenchmark(
                        fn2(x2, dist_hav),
                        fn2(x2, dist_dww),
                        times = 100L
                      )
                      # Unit: seconds
                      #               expr      min       lq     mean   median       uq      max neval
                      #  fn2(x2, dist_hav) 29.60596 30.04249 30.29052 30.14016 30.45054 31.53976   100
                      #  fn2(x2, dist_dww) 18.71327 19.01204 19.12311 19.09058 19.26680 19.62273   100
                      
                      x3 <- rx(40000L)
                      microbenchmark::microbenchmark(
                        # fn2(x3, dist_hav), # runs out of memory
                        fn2(x3, dist_dww),
                        times = 10L
                      )
                      # Unit: seconds
                      #               expr      min      lq     mean  median       uq      max neval
                      #  fn2(x3, dist_dww) 104.8912 105.762 109.1512 109.653 112.2543 112.9265    10
                      
                      `[.dist` <- function(x, i, j, drop = TRUE) {
                        class(x) <- NULL
                        p <- length(x)
                        n <- as.integer(round(0.5 * (1 + sqrt(1 + 8 * p)))) # p = n * (n - 1) / 2
                        
                        ## Column extraction
                        if (missing(i) && !missing(j) && is.integer(j) && length(j) == 1L && !is.na(j) && j >= 1L && j <= n) {
                          if (j == 1L) {
                            return(c(0, x[seq_len(n - 1L)]))
                          }
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- rep.int(j - 1L, j - 1L)
                          jj <- 1L:(j - 1L)
                          if (j < n) {
                            ii <- c(ii, j:(n - 1L))
                            jj <- c(jj, rep.int(j, n - j))
                          }
                          kk <- ii + round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(n)
                          res[-j] <- x[kk]
                          nms <- attr(x, "Labels")
                          if (drop) {
                            names(res) <- nms
                          } else {
                            dim(res) <- c(n, 1L)
                            dimnames(res) <- list(nms, nms[j])
                          }
                          return(res)
                        }
                        
                        ## Element extraction with matrix indices
                        if (missing(j) && !missing(i) && is.matrix(i) && dim(i)[2L] == 2L && is.integer(i) && !anyNA(i) && all(i >= 1L & i <= n)) {
                          m <- dim(i)[1L]
                          ## Subset off-diagonal entries
                          d <- i[, 1L] == i[, 2L]
                          i <- i[!d, , drop = FALSE]
                          ## Transpose upper triangular entries
                          u <- i[, 2L] > i[, 1L]
                          i[u, 1:2] <- i[u, 2:1]
                          ## Convert 2-ary index of 'D' to 1-ary index of 'D[lower.tri(D)]'
                          ii <- i[, 1L] - 1L
                          jj <- i[, 2L]
                          kk <- ii + (jj > 1L) * round(0.5 * (2L * (n - 1L) - jj) * (jj - 1L))
                          ## Extract
                          res <- double(m)
                          res[!d] <- x[kk] 
                          return(res)
                        }
                      
                        ## Fall back on coercion for any other subset operation
                        as.matrix(x)[i, j, drop = drop]
                      }
                      
                      n <- 6L
                      do <- dist(seq_len(n))
                      dm <- unname(as.matrix(do))
                      ij <- cbind(sample(6L), sample(6L))
                      identical(do[, 4L], dm[, 4L]) # TRUE
                      identical(do[ij], dm[ij]) # TRUE
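The 2-ary-to-1-ary index arithmetic in `[.dist` is easy to get wrong; a small Python sketch (helper name is ours, assuming NumPy) verifies the same formula against an explicit distance matrix:

```python
import numpy as np

# Mirrors the index arithmetic in the R code above:
# k = (i - 1) + (2*(n - 1) - j)*(j - 1)/2 maps 1-based matrix indices (i, j),
# i > j, to the 1-based position in the condensed lower-triangle vector.
def condensed_index(i, j, n):
    return (i - 1) + (2 * (n - 1) - j) * (j - 1) // 2

n = 6
pts = np.arange(1, n + 1, dtype=float)
D = np.abs(pts[:, None] - pts[None, :])                   # full distance matrix
# dist() stores the lower triangle column by column
v = np.concatenate([D[j + 1:, j] for j in range(n - 1)])
ok = all(v[condensed_index(i, j, n) - 1] == D[i - 1, j - 1]
         for j in range(1, n + 1) for i in range(j + 1, n + 1))
```

`ok` being true confirms that every off-diagonal entry of the matrix is found at the position the formula predicts.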
                      
                      find_nearest2 <- function(data, dist, coordvar, idvar) {
                        m <- match(coordvar, names(data), 0L)
                        n <- nrow(data)
                        if (n < 2L) {
                          ## NA_integer_[n] yields integer(0) when n == 0 and a single NA when n == 1
                          argmin <- NA_integer_[n]
                          distance <- NA_real_[n]
                        } else {
                          ## Compute distance matrix
                          D <- dist(data[m])
                          ## Extract minimum off-diagonal distances
                          patch.which.min <- function(x, i) {
                            x[i] <- Inf
                            which.min(x)
                          }
                          argmin <- integer(n)
                          index <- seq_len(n)
                          for (j in index) {
                            argmin[j] <- forceAndCall(2L, patch.which.min, D[, j], j)
                          }
                          distance <- D[cbind(argmin, index)]
                        }
                        ## Return focal point data, nearest neighbour ID, distance
                        r1 <- data[-m]
                        r2 <- data[argmin, idvar, drop = FALSE]
                        names(r2) <- paste0(idvar, "_nearest")
                        data.frame(r1, r2, distance, row.names = NULL, stringsAsFactors = FALSE)
                      }
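`find_nearest2` excludes self-matches by patching each column's own entry before taking the minimum; the same diagonal-masking idea can be sketched in NumPy (hypothetical helper name, plain Euclidean distance rather than the haversine used later):

```python
import numpy as np

def nearest_neighbours(coords):
    """For each row of coords, return the index of the nearest *other*
    row and the distance to it, by masking the diagonal with +inf
    (the same trick as patch.which.min in the R code)."""
    diff = coords[:, None, :] - coords[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))   # full Euclidean distance matrix
    np.fill_diagonal(D, np.inf)             # exclude self-matches
    argmin = D.argmin(axis=1)
    return argmin, D[np.arange(len(coords)), argmin]

coords = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
idx, dmin = nearest_neighbours(coords)
# points 0 and 1 are each other's nearest neighbours
```

Note this sketch builds the full n-by-n matrix, so it has the same quadratic memory cost the R code works around with the `[.dist` method.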
                      
                      code <- '#include <Rcpp.h>
                      using namespace Rcpp;
                      
                      double distanceHaversine(double latf, double lonf, double latt, double lont, double tolerance) {
                        double d;
                        double dlat = latt - latf;
                        double dlon =  lont - lonf;
                        d = (sin(dlat * 0.5) * sin(dlat * 0.5)) + (cos(latf) * cos(latt)) * (sin(dlon * 0.5) * sin(dlon * 0.5));
                        if(d > 1 && d <= tolerance){
                          d = 1;
                        }
                        return 2 * atan2(sqrt(d), sqrt(1 - d)) * 6378137.0;
                      }
                      
                      double toRadians(double deg){
                        return deg * 0.01745329251;  // PI / 180;
                      }
                      
                      // [[Rcpp::export]]
                      NumericVector calc_dist(Rcpp::NumericVector lat, 
                                              Rcpp::NumericVector lon, 
                                              double tolerance = 10000000000.0) {
                        std::size_t nlat = lat.size();
                        std::size_t nlon = lon.size();
                        if (nlat != nlon) throw std::range_error("lat and lon different lengths");
                        if (nlat < 2) throw std::range_error("Need at least 2 points");
                        std::size_t size = nlat * (nlat - 1) / 2;
                        NumericVector ans(size);
                        std::size_t k = 0;
                        double latf;
                        double latt;
                        double lonf;
                        double lont;
                        
                        for (std::size_t j = 0; j < (nlat-1); j++) {
                          for (std::size_t i = j + 1; i < nlat; i++) {
                            latf = toRadians(lat[i]);
                            lonf = toRadians(lon[i]);
                            latt = toRadians(lat[j]);
                            lont = toRadians(lon[j]);
                            ans[k++] = distanceHaversine(latf, lonf, latt, lont, tolerance);
                          }
                        }
                        
                        return ans;
                      }
                      '
                      Rcpp::sourceCpp(code = code)
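As a sanity check on the formula embedded in the C++ string, here is a direct Python transcription of the haversine (same 6378137 m radius; the function name is ours, not part of the snippet):

```python
import math

R = 6378137.0  # radius used in the C++ code, metres

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; inputs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    d = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * R * math.atan2(math.sqrt(d), math.sqrt(1 - d))

# antipodal points on the equator are half a circumference apart: pi * R
half = haversine(0.0, 0.0, 0.0, 180.0)
```

The `tolerance` clamp in the C++ version guards against `d` creeping slightly above 1 from floating-point error, which would make `sqrt(1 - d)` NaN; the transcription omits it for brevity.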
                      
                      rx <- function(n) {
                        data.frame(id = seq_len(n), lon = rnorm(n), lat = rnorm(n))
                      }
                      dist_hav <- function(x) {
                        geosphere::distm(x, fun = geosphere::distHaversine)
                      }
                      dist_dww <- function(x) {
                        res <- calc_dist(x[, 2L], x[, 1L])
                        attr(res, "class") <- "dist"
                        attr(res, "Size") <- nrow(x)
                        attr(res, "Diag") <- FALSE
                        attr(res, "Upper") <- FALSE
                        attr(res, "call") <- match.call()
                        res
                      }
                      
                      fn2 <- function(data, dist) {
                        find_nearest2(data, dist = dist, coordvar = c("lon", "lat"), idvar = "id")
                      }
                      
                      x1 <- rx(100L)
                      microbenchmark::microbenchmark(
                        fn2(x1, dist_hav), 
                        fn2(x1, dist_dww), 
                        times = 1000L
                      )
                      # Unit: microseconds
                      #               expr      min       lq     mean   median       uq       max neval
                      #  fn2(x1, dist_hav) 3768.310 3886.452 4680.300 3977.492 4131.796 34461.361  1000
                      #  fn2(x1, dist_dww)  930.044  992.241 1128.272 1017.005 1045.746  7006.326  1000
                      
                      x2 <- rx(20000L)
                      microbenchmark::microbenchmark(
                        fn2(x2, dist_hav),
                        fn2(x2, dist_dww),
                        times = 100L
                      )
                      # Unit: seconds
                      #               expr      min       lq     mean   median       uq      max neval
                      #  fn2(x2, dist_hav) 29.60596 30.04249 30.29052 30.14016 30.45054 31.53976   100
                      #  fn2(x2, dist_dww) 18.71327 19.01204 19.12311 19.09058 19.26680 19.62273   100
                      
                      x3 <- rx(40000L)
                      microbenchmark::microbenchmark(
                        # fn2(x3, dist_hav), # runs out of memory
                        fn2(x3, dist_dww),
                        times = 10L
                      )
                      # Unit: seconds
                      #               expr      min      lq     mean  median       uq      max neval
                      #  fn2(x3, dist_dww) 104.8912 105.762 109.1512 109.653 112.2543 112.9265    10
                      

                      How to add a percentage computation in pandas result

                      #sampledata.txt
                      df = pd.DataFrame(data={'col1': ['alpha', 'bravo', 'charlie', 'delta', 'echo','lima', 'falcon', 'echo', 'charlie', 'romeo', 'falcon'],
                                              'col2': [1, 3, 1, 2, 5, 6, 3, 8, 10, 12, 5],
                                              'col3': ['54,00.01', '500,000.00', '27,722.29 ($250.45)', '11 ($10)', '143,299.00 ($101)', '45.00181 ($38.9)', '0.1234', '145,300 ($125.01)', '252,336,733.383 ($492.06)', '980', '9.19'],
                                              'col4': ['ABC DSW2S', 'ACDEF', 'DGAS-CAS', 'SWSDSASS-CCSSW', 'ACS34S1', 'FGF5GGD-DDD', 'DSS2SFS3', 'ACS34S1', 'DGAS-CAS', 'ASDS SSSS SDSD', 'DSS2SFS3']})
                      
                      df['within_brackets'] = df['col3'].str.extract(r'.*\((.*)\).*')  # extract what's inside the brackets
                      df['within_brackets'].replace(r'\$', '', regex=True, inplace=True)
                      df['col3'] = df['col3'].str.replace(r"(\s*\(.*\))|,", "", regex=True)  # keep what's outside the brackets
                      df.rename(columns={'col4': 'col5', 'within_brackets': 'col4'}, inplace=True)
                      df[['col3', 'col4']] = df[['col3', 'col4']].astype(float)
                      
                      df = df.groupby(['col1', 'col5']).agg(col2 = pd.NamedAgg(column="col2", aggfunc="sum"),
                                                            col3 = pd.NamedAgg(column="col3", aggfunc="sum"),
                                                            col4 = pd.NamedAgg(column="col4", aggfunc="sum"),
                                                            col6 = pd.NamedAgg(column="col2", aggfunc=pd.Series.pct_change)).reset_index()
                      df['col6'].fillna(0, inplace=True)
                      # print df here to inspect the intermediate result
                      df['col6'] = df['col6'].apply(lambda x: f"{str(round(x[-1], 4) * 100)}%" if isinstance(x, np.ndarray) else f"{round(x, 4) * 100}%")
                      df = df[['col1', 'col2', 'col3', 'col4', 'col5', 'col6']]
                      df.sort_values(by=['col2'], ascending=False, inplace=True)
                      print(df)
                      
                            col1  col2          col3    col4            col5    col6
                      4     echo    13  2.885990e+05  226.01         ACS34S1   60.0%
                      7    romeo    12  9.800000e+02    0.00  ASDS SSSS SDSD      0%
                      2  charlie    11  2.523645e+08  742.51        DGAS-CAS  900.0%
                      5   falcon     8  9.313400e+00    0.00        DSS2SFS3  66.67%
                      6     lima     6  4.500181e+01   38.90     FGF5GGD-DDD      0%
                      1    bravo     3  5.000000e+05    0.00           ACDEF      0%
                      3    delta     2  1.100000e+01   10.00  SWSDSASS-CCSSW      0%
                      0    alpha     1  5.400010e+03    0.00       ABC DSW2S      0%
                      
                      df = df.groupby(['col1', 'col5']).agg(col2 = pd.NamedAgg(column="col2", aggfunc="sum"),
                                                            col3 = pd.NamedAgg(column="col3", aggfunc="sum"),
                                                            col4 = pd.NamedAgg(column="col4", aggfunc="sum"),
                                                            col6 = pd.NamedAgg(column="col2", aggfunc=pd.Series.pct_change)).reset_index()
                      df['col6'].fillna(0, inplace=True)
                      # print df here to inspect the intermediate result
                      df['col6'] = df['col6'].apply(lambda x: f"{str(round(x[-1], 4) * 100)}%" if isinstance(x, np.ndarray) else f"{round(x, 4) * 100}%")
                      df['col4'] =  '($' + df['col4'].astype(str) + ')'
                      df = df[['col1', 'col2', 'col3', 'col4', 'col5', 'col6']]
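The trickiest step above is the `pct_change` aggregation over each group's `col2` values; a stripped-down sketch of just that step on toy data (all names illustrative, not the question's columns):

```python
import pandas as pd

df = pd.DataFrame({'grp': ['a', 'a', 'b', 'b'], 'val': [5, 8, 12, 12]})

def last_pct(s):
    # percentage step between the last two values within the group
    return s.pct_change().iloc[-1] if len(s) > 1 else 0.0

pct = (df.groupby('grp')['val'].apply(last_pct).fillna(0) * 100).round(2)
out = pct.astype(str) + '%'
```

Group `a` goes 5 to 8, a 60% step; group `b` is flat at 0%. The answer above gets the same effect with `pd.NamedAgg` plus a post-hoc lambda that unpacks the array `pct_change` returns.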
                      
                      
                      def compute_percentage(row):
                          # 'dl' holds the raw sampledata.txt rows split into fields (defined earlier in the original answer)
                          vl = [float(parts[1]) for parts in dl if parts[0] == row['col1']]
                          i = round(100. * (vl[-1]-vl[0])/vl[0] if vl[0] != 0 else 0, 2)
                          if float(int(i)) == i:
                              i = int(i)
                          return str(i) + '%'
                      
                      df['col6'] = df.apply(compute_percentage, axis=1)
                      
                            col1  col2          col3       col4            col5    col6
                      4     echo    13  2.885990e+05  ($226.01)         ACS34S1     60%
                      7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD      0%
                      2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS    900%
                      5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3  66.67%
                      6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD      0%
                      1    bravo     3  5.000000e+05     ($0.0)           ACDEF      0%
                      3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW      0%
                      0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S      0%
                      
                      


                      Community Discussions

                      Trending Discussions on falcon
                      • Formatting Phone number with +1 with pandas.Series.replace
                      • Why does np.select not allow me to put in index above total length into choicelist?
                      • Mapping complex JSON to Pandas Dataframe
                      • Using a list of models to make predictions over a list of results using lapply in R
                      • How can I destructure my array of multiple objects
                      • How to group rows in pandas without groupby?
                      • Can I get the value of the grouped column in groupby apply?
                      • Need To Perform a Merge in Pandas Exactly Like VLOOKUP
                      • R: split-apply-combine for geographic distance
                      • How to add a percentage computation in pandas result

                      QUESTION

                      Formatting Phone number with +1 with pandas.Series.replace

                      Asked 2022-Mar-17 at 17:47

                      I can't find a solution online and I know this should be easy but I can't figure out what is wrong with my regex:

                      here is my code:

                      df = pd.DataFrame({'Company phone number': ['+1-541-296-2271', '+1-542-296-2271', '+1-543-296-2271'],
                                         'Contact phone number': ['15112962271', None,'15312962271'],
                                         'num_specimen_seen': [10, 2,3]},
                                        index=['falcon', 'dog','cat'])
                      
                      df['Contact phone number'] = df['Contact phone number'].str.replace('^\d{11}$', r'\+1-\d{3}-\d{3}-\d{4}')
                      

                      desired output of df['Contact phone number']:

                      falcon    +1-511-296-2271
                      dog       None
                      cat       +1-531-296-2271
                      

                      It is always 11 digits with no spaces or special characters. Thanks!

                      ANSWER

                      Answered 2022-Mar-17 at 17:34

                      You can use .str.extract, convert each row of results to a list, and then use .str.join (and of course concatenate a + at the beginning):

                      df['Contact phone number'] = '+' + df['Contact phone number'].dropna().astype(str).str.extract(r'(\d)(\d{3})(\d{3})(\d{4})').apply(list, axis=1).str.join('-')
                      

Output:

                      >>> df
                             Company phone number Contact phone number  num_specimen_seen
                      falcon      +1-541-296-2271      +1-511-296-2271                 10
                      dog         +1-542-296-2271                  NaN                  2
                      cat         +1-543-296-2271      +1-531-296-2271                  3
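An alternative sketch that avoids the extract/join round-trip: a single `str.replace` with capture groups (our suggestion, not the accepted answer; same toy data as the question):

```python
import pandas as pd

df = pd.DataFrame({'Contact phone number': ['15112962271', None, '15312962271']},
                  index=['falcon', 'dog', 'cat'])

# Capture the four chunks of the 11-digit number and rebuild with separators;
# rows that do not match (including NaN) pass through untouched.
df['Contact phone number'] = df['Contact phone number'].str.replace(
    r'^(\d)(\d{3})(\d{3})(\d{4})$', r'+\1-\2-\3-\4', regex=True)
```

This keeps the operation vectorised and makes the digit grouping explicit in one pattern.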
                      

                      Source https://stackoverflow.com/questions/71516642

                      Community Discussions, Code Snippets contain sources that include Stack Exchange Network

                      Vulnerabilities

                      No vulnerabilities reported

                      Install falcon

                      You can download it from GitHub.
                      You can use falcon like any standard Java library. Please include the jar files in your classpath. You can also use any IDE to run and debug the falcon component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.

                      Support

                      You can find the documentation on Apache Falcon website.

                      • © 2022 Open Weaver Inc.