areweprivateyet | The crawler/analysis component of Are We Private Yet? | Crawler library

by ghostery | Java | Version: Current | License: Non-SPDX
kandi X-RAY | areweprivateyet Summary

areweprivateyet is a Java library typically used in Automation and Crawler applications. It has no reported bugs or vulnerabilities, but it has low support. However, its build file is not available and it carries a Non-SPDX license. You can download it from GitHub.
It is the crawler/analysis component of Are We Private Yet?

Support

  • areweprivateyet has a low active ecosystem.
  • It has 40 star(s) with 8 fork(s). There are 27 watchers for this library.
  • It had no major release in the last 12 months.
  • There is 1 open issue and 0 closed issues. On average, issues are closed in 1,750 days. There are no pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of areweprivateyet is current.

Quality

  • areweprivateyet has 0 bugs and 0 code smells.

Security

  • areweprivateyet has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
  • areweprivateyet code analysis shows 0 unresolved vulnerabilities.
  • There are 0 security hotspots that need review.

License

  • areweprivateyet has a Non-SPDX License.
  • A Non-SPDX license may be an open-source license that is not SPDX-compliant, or a non-open-source license; review it closely before use.

Reuse

  • areweprivateyet releases are not available. You will need to build from source code and install.
  • areweprivateyet has no build file. You will need to create the build yourself to build the component from source.
  • Installation instructions are not available. Examples and code snippets are available.
Top functions reviewed by kandi - BETA

kandi has reviewed areweprivateyet and discovered the below as its top functions. This is intended to give you an instant insight into the functionality areweprivateyet implements, and to help you decide if it suits your requirements.

  • Creates the analysis sheet.
  • Creates the public suffix table.
  • Creates the spreadsheet content.
  • Creates the top pages.
  • Gets the Guava domain from a URL (sketched below).
  • Reads profiles.
  • Clears the provided profiles.
  • Gets the named driver profile.
  • Waits for the browser to load.
  • Gets the domain of a URL.
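
The two domain helpers almost certainly lean on Guava's public-suffix support (guava-14.0.jar ships in the project's lib directory). Below is a minimal sketch of what "Get Guava domain from URL" could look like; the class and method names are hypothetical, not the project's actual API.

```
import com.google.common.net.InternetDomainName;
import java.net.URI;

public class DomainExample {
    // Hypothetical helper: reduce a full URL to its registrable
    // (public suffix + 1) domain using Guava's public-suffix list.
    public static String guavaDomain(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return null;
            }
            InternetDomainName name = InternetDomainName.from(host);
            // topPrivateDomain() is only defined for hosts under a public suffix.
            return name.isUnderPublicSuffix() ? name.topPrivateDomain().toString() : host;
        } catch (Exception e) {
            return null; // malformed URL
        }
    }

    public static void main(String[] args) {
        System.out.println(guavaDomain("http://tracker.example.co.uk/pixel.gif")); // example.co.uk
    }
}
```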


areweprivateyet Key Features

The crawler/analysis component of Are We Private Yet?

areweprivateyet Examples and Code Snippets

This project is dedicated to the automated recreation of Stanford CIS's [Tracking the Trackers: Self-Help Tools](http://cyberlaw.stanford.edu/blog/2011/09/tracking-trackers-self-help-tools) study.
This project also extends certain feature counts, such as total bandwidth, redirects, and local storage.

This project is separated broadly into three areas:
- Firefox setup for the project
- [FourthParty](http://fourthparty.info) extension
- Crawler and Analysis utilities

The crawler is based on Selenium through the Firefox WebDriver interface. Each run, while using an existing profile
as a base, actually takes place in a separate anonymous Firefox profile; as such, the last thing the crawler does is
copy the FourthParty SQLite database into a path defined at runtime, as sketched below.

All required libraries for the test are stored in the lib directory.
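
For illustration, that final hand-off can be approximated with the standard java.nio file APIs. A minimal sketch, assuming a hypothetical profile location and the awby_path property described under "Executing the crawl" below:

```
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyFourthPartyDb {
    public static void main(String[] args) throws Exception {
        // The real Crawler reads its output location from -Dawby_path=... at runtime.
        Path outputDir = Paths.get(System.getProperty("awby_path"));
        // Hypothetical location of the anonymous profile's FourthParty database.
        Path source = Paths.get("/tmp/anonymous-profile/fourthparty.sqlite");
        // The copy is named fourthparty-profileName.sqlite, matching the README.
        Path target = outputDir.resolve("fourthparty-ghostery.sqlite");
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```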
                      
Results
=======
The results of several months' worth of crawling are available on the project's website: http://www.areweprivateyet.com/
                      
Setup
=====

To set up the tool you will need Firefox (10 and up),
[our FourthParty build](https://github.com/ghostery/fourthparty), and the AreWeBetterYet analysis utilities.
We currently test the following extensions in addition to the baseline:
- Ghostery
- DoNotTrackMe
- Disconnect
- Adblock Plus with [Fanboy Adblock list](http://www.fanboy.co.nz/fanboy-adblock.txt)
- Adblock Plus with [EasyList](https://easylist-downloads.adblockplus.org/easylist.txt)
- TrackerBlock
- Firefox with third-party cookies disabled

Firefox profiles need to be set up prior to running the crawler from the analysis utilities. The profiles must be named
as follows: __"ghostery", "dntme", "abp-fanboy", "abp-easylist", "trackerblock", "disconnect", "cookies-blocked"__.
Each profile needs to contain the FourthParty install (though this could be force-installed on the profile
through Selenium within the crawler) and the extension that is being tested. You are also responsible for configuring
the extension the way you want to test it: if you just install Adblock Plus with no filter lists, the results will
obviously differ from a profile that does contain EasyPrivacy or any other list.

You may add your own extensions by modifying the utilities array, or you may request that we add your extension for
future testing by emailing us at <areweprivateyet@ghostery.com>. Please remember that baseline is always the first profile to be
executed.
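
As a sketch of how one of these named profiles can be picked up through the Selenium 2.x API bundled in lib (the project's actual wiring may differ), something like the following works:

```
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;
import org.openqa.selenium.firefox.internal.ProfilesIni;

public class NamedProfileExample {
    public static void main(String[] args) {
        // Look up a pre-built Firefox profile by name, e.g. "ghostery" or "abp-easylist".
        ProfilesIni profiles = new ProfilesIni();
        FirefoxProfile profile = profiles.getProfile("ghostery");
        // The driver launches an anonymous copy of the profile, as described above.
        WebDriver driver = new FirefoxDriver(profile);
        driver.get("http://www.example.com/");
        driver.quit();
    }
}
```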
                      
                      
Executing the crawl
===================

The Crawler is a simple Selenium-based utility that uses top500.list (you may substitute it with your own) to
load and then navigate to each website in the list. To execute it, you may load the project into your IDE of choice and
simply run the Crawler class. Alternatively, you may use the build provided and run it with your local installation of
Java:

```
java -Dawby_path=/output_path/ -classpath "apache-mime4j-0.6.jar:lib/bsh-1.3.0.jar:lib/cglib-nodep-2.1_3.jar:lib/commons-codec-1.6.jar:lib/commons-collections-3.2.1.jar:lib/commons-exec-1.1.jar:lib/commons-io-2.2.jar:lib/commons-jxpath-1.3.jar:lib/commons-lang3-3.1.jar:lib/commons-logging-1.1.1.jar:lib/cssparser-0.9.8.jar:lib/dom4j-1.6.1.jar:lib/guava-14.0.jar:lib/hamcrest-core-1.3.jar:lib/hamcrest-library-1.3.jar:lib/htmlunit-2.11.jar:lib/htmlunit-core-js-2.11.jar:lib/httpclient-4.2.1.jar:lib/httpcore-4.2.1.jar:lib/httpmime-4.2.1.jar:lib/ini4j-0.5.2.jar:lib/jcommander-1.29.jar:lib/jetty-websocket-8.1.8.jar:lib/jna-3.4.0.jar:lib/jna-platform-3.4.0.jar:lib/json-20080701.jar:lib/junit-dep-4.11.jar:lib/log4j-1.2.13.jar:lib/nekohtml-1.9.17.jar:lib/netty-3.5.7.Final.jar:lib/operadriver-1.2.jar:lib/phantomjsdriver-1.0.1.jar:lib/poi-3.9-20121203.jar:lib/protobuf-java-2.4.1.jar:lib/sac-1.3.jar:lib/selenium-java-2.31.0.jar:lib/serializer-2.7.1.jar:lib/sqlite-jdbc-3.7.2.jar:lib/stax-api-1.0.1.jar:lib/testng-6.8.jar:lib/xalan-2.7.1.jar:lib/xercesImpl-2.10.0.jar:lib/xml-apis-1.4.01.jar:lib/xmlbeans-2.3.0.jar:."  com.evidon.areweprivateyet.Crawler
```

awby_path is the local setting for the location of the top500.list file, as well as the input and output folder that will
be used. This value is used in the Crawler and Aggregator classes.

After each extension's crawl is completed, the Crawler copies the FourthParty SQLite database to the output directory
(awby_path) to be used by the Aggregator utility later on. The file name of the copied FourthParty database is
fourthparty-profileName.sqlite.

Crawling may be done in any order and at any time prior to running the analysis utilities. You may also use another
automation tool to produce the FourthParty output.
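
Stripped to its essentials, the crawl loop amounts to the following sketch (only top500.list and awby_path come from this README; the rest is hypothetical, not the project's actual Crawler class):

```
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class MiniCrawler {
    public static void main(String[] args) throws Exception {
        // Same system property the real Crawler uses for its input/output folder.
        String base = System.getProperty("awby_path");
        List<String> sites = Files.readAllLines(Paths.get(base, "top500.list"), StandardCharsets.UTF_8);
        WebDriver driver = new FirefoxDriver();
        for (String site : sites) {
            driver.get("http://" + site); // navigate to each site in the list
        }
        driver.quit();
    }
}
```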
                      
                      
Running analysis utilities
==========================

The Aggregator class is designed to query and collect information from multiple FourthParty databases
into a human-readable Excel spreadsheet, as well as to produce JSON output.
To run it, either execute it from your IDE or use the following command:

```
java -Dawby_path=/output_path/ -classpath "apache-mime4j-0.6.jar:lib/bsh-1.3.0.jar:lib/cglib-nodep-2.1_3.jar:lib/commons-codec-1.6.jar:lib/commons-collections-3.2.1.jar:lib/commons-exec-1.1.jar:lib/commons-io-2.2.jar:lib/commons-jxpath-1.3.jar:lib/commons-lang3-3.1.jar:lib/commons-logging-1.1.1.jar:lib/cssparser-0.9.8.jar:lib/dom4j-1.6.1.jar:lib/guava-14.0.jar:lib/hamcrest-core-1.3.jar:lib/hamcrest-library-1.3.jar:lib/htmlunit-2.11.jar:lib/htmlunit-core-js-2.11.jar:lib/httpclient-4.2.1.jar:lib/httpcore-4.2.1.jar:lib/httpmime-4.2.1.jar:lib/ini4j-0.5.2.jar:lib/jcommander-1.29.jar:lib/jetty-websocket-8.1.8.jar:lib/jna-3.4.0.jar:lib/jna-platform-3.4.0.jar:lib/json-20080701.jar:lib/junit-dep-4.11.jar:lib/log4j-1.2.13.jar:lib/nekohtml-1.9.17.jar:lib/netty-3.5.7.Final.jar:lib/operadriver-1.2.jar:lib/phantomjsdriver-1.0.1.jar:lib/poi-3.9-20121203.jar:lib/protobuf-java-2.4.1.jar:lib/sac-1.3.jar:lib/selenium-java-2.31.0.jar:lib/serializer-2.7.1.jar:lib/sqlite-jdbc-3.7.2.jar:lib/stax-api-1.0.1.jar:lib/testng-6.8.jar:lib/xalan-2.7.1.jar:lib/xercesImpl-2.10.0.jar:lib/xml-apis-1.4.01.jar:lib/xmlbeans-2.3.0.jar:."  com.evidon.areweprivateyet.Aggregator
```

Using the databases in the input folder, the Aggregator collects results and outputs a final file named analysis.xls.
This should reproduce the aforementioned study.
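
The spreadsheet is written with Apache POI (poi-3.9-20121203.jar is on the classpath above). A minimal sketch of producing an analysis.xls in that style, with entirely hypothetical sheet and cell contents:

```
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.FileOutputStream;

public class MiniAggregator {
    public static void main(String[] args) throws Exception {
        Workbook wb = new HSSFWorkbook();          // HSSF writes the legacy .xls format
        Sheet sheet = wb.createSheet("analysis");
        Row header = sheet.createRow(0);
        header.createCell(0).setCellValue("profile");
        header.createCell(1).setCellValue("third-party requests"); // hypothetical metric
        Row row = sheet.createRow(1);
        row.createCell(0).setCellValue("ghostery");
        row.createCell(1).setCellValue(42);
        try (FileOutputStream out = new FileOutputStream("analysis.xls")) {
            wb.write(out);
        }
    }
}
```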
                      
                      
License
=======
AreWePrivateYet uses the Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0

License information is stored in the LICENSE file.


How to file an issue
====================
You may file an issue using GitHub's own issue tracker: https://github.com/ghostery/areweprivateyet/issues


How to submit a fix/pull-request
================================
You may fork the project and modify it at will. Your changes may be submitted back to us via a GitHub pull
request.


Community Discussions

Trending Discussions on Crawler
• How to test form submission with wrong values using Symfony crawler component and PHPUnit?
• Setting proxies when crawling websites with Python
• Can't Successfully Run AWS Glue Job That Reads From DynamoDB
• Why does scrapy_splash CrawlSpider take the same amount of time as scrapy with Selenium?
• How can I send Dynamic website content to scrapy with the html content generated by selenium browser?
• How to set class variable through __init__ in Python?
• headless chrome on docker M1 error - unable to discover open window in chrome
• How do I pass in arguments non-interactive into a bash file that uses "read"?
• Scrapy crawls duplicate data
• AWS Glue Crawler sends all data to Glue Catalog and Athena without Glue Job

QUESTION

How to test form submission with wrong values using Symfony crawler component and PHPUnit?

Asked 2022-Apr-05 at 11:18

When you're using the app through the browser, you send a bad value, the system checks for errors in the form, and if something goes wrong (it does in this case), it redirects with a default error message written below the incriminated field.

This is the behaviour I am trying to assert with my test case, but I came across an \InvalidArgumentException I was not expecting.

I am using the symfony/phpunit-bridge with phpunit/phpunit v8.5.23 and symfony/dom-crawler v5.3.7. Here's a sample of what it looks like:

public function testPayloadNotRespectingFieldLimits(): void
{
    $client = static::createClient();

    /** @var SomeRepository $repo */
    $repo = self::getContainer()->get(SomeRepository::class);
    $countEntries = $repo->count([]);

    $crawler = $client->request(
        'GET',
        '/route/to/form/add'
    );
    $this->assertResponseIsSuccessful(); // Goes ok.

    $form = $crawler->filter('[type=submit]')->form(); // It does retrieve my form node.

    // This is where it's not working.
    $form->setValues([
        'some[name]' => 'Someokvalue',
        'some[color]' => 'SomeNOTOKValue', // A ChoiceType with limited values, where 'SomeNOTOKValue' does not belong. This line throws an \InvalidArgumentException.
    ]);

    // What I'd like to assert after this
    $client->submit($form);
    $this->assertResponseRedirects();
    $this->assertEquals($countEntries, $repo->count([]));
}
                      

Here's the exception message I get:

InvalidArgumentException: Input "some[color]" cannot take "SomeNOTOKValue" as a value (possible values: "red", "pink", "purple", "white").
vendor/symfony/dom-crawler/Field/ChoiceFormField.php:140
vendor/symfony/dom-crawler/FormFieldRegistry.php:113
vendor/symfony/dom-crawler/Form.php:75
                      

The ColorChoiceType tested here is pretty standard:

public function configureOptions(OptionsResolver $resolver): void
{
    $resolver->setDefaults([
        'choices' => ColorEnumType::getChoices(),
        'multiple' => false,
    ]);
}

What I can do is wrap the line where it sets the wrong value in a try-catch block. It would then indeed submit the form and proceed to the next assertion. The issue here is that the form was considered submitted and valid; it forced an appropriate value onto the color field (the first choice of the enum set). This is not what I get when I try this in my browser (cf. the intro).

// ...
/** @var SomeRepository $repo */
$repo = self::getContainer()->get(SomeRepository::class);
$countEntries = $repo->count([]); // Gives 0.
// ...
try {
    $form->setValues([
        'some[name]' => 'Someokvalue',
        'some[color]' => 'SomeNOTOKValue',
    ]);
} catch (\InvalidArgumentException $e) {}

$client->submit($form); // Now it submits the form.
$this->assertResponseRedirects(); // Ok.
$this->assertEquals($countEntries, $repo->count([])); // Failed asserting that 1 matches expected 0. !!

How can I mimic the browser behaviour in my test case and make assertions on it?

ANSWER

Answered 2022-Apr-05 at 11:17

It seems that you can disable validation on the DomCrawler\Form component, as described in the official documentation.

Doing this, it now works as expected:

$form = $crawler->filter('[type=submit]')->form()->disableValidation();
$form->setValues([
    'some[name]' => 'Someokvalue',
    'some[color]' => 'SomeNOTOKValue',
]);
$client->submit($form);

$this->assertEquals($countEntries, $repo->count([])); // Now passes.

Source: https://stackoverflow.com/questions/71565750

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install areweprivateyet

You can download it from GitHub.
You can use areweprivateyet like any standard Java library. Please include the jar files in your classpath. You can also use any IDE, and you can run and debug the areweprivateyet component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.

Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
