scws | chinese segment - scws-1

 by yiyangest | Shell | Version: Current | License: Non-SPDX

kandi X-RAY | scws Summary

scws is a Shell library. It has no reported bugs or vulnerabilities, and it has low support. However, scws has a Non-SPDX license. You can download it from GitHub.

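scws-1.x.x README (written by hightman) homepage: $id: readme,v 1.10 2012/03/20 07:01:29 hightman exp $.

scws is short for Simple Chinese Words Segmentation. It is a mechanical Chinese word segmentation engine built on a word-frequency dictionary, and it splits a run of Chinese text into words with mostly correct results. Words are the basic unit of meaning in Chinese, but unlike English, written Chinese does not separate words with spaces, so fast and accurate segmentation has long been a hard problem in Chinese text processing.

The method itself contains little that is new. It uses a self-collected word-frequency dictionary, supplemented by rule-based recognition of proper nouns, personal names, place names, numbers and dates. In small-scale tests the accuracy was roughly 90% to 95%, which is enough for uses such as small search engines and keyword extraction. The first prototype was released at the end of 2005.

Online segmentation demo: GBK: UTF-8: Traditional Chinese:

Note: installation follows the usual GNU software procedure; run ./configure --help to see the available options. The three most commonly used options are: --prefix= --disable-mmap --enable-developer

/usr/local/scws/bin/gen_scws_dict -c gbk -i etc/dict_chs_gbk.txt -o /usr/local/scws/etc/dict_chs_gbk.xdb

This command takes a while to run and produces a usable xdb file in /usr/local/scws/etc/. If you need UTF-8 encoding, convert dict_chs_gbk.txt to UTF-8 first and then run gen_scws_dict on the converted file.

Note: since 1.0.1, scws releases no longer include the dictionary text file. Dictionaries in xdb format are published directly on the homepage; see the homepage for downloads.

This description was written by hightman on 2007.06.08. Web address:

The scws library has no external dependencies. The code aims to be concise and efficient, and the organization of the segmentation dictionary has been optimized. Besides segmentation, because the dictionary uses the author's own xdb and xtree structures, the library functions can also be used for xdb and xtree data access (to be described separately; no documentation yet).

struct scws_zchar { int start; int end; };

Note: xdict_t and rule_t are pointers to the dictionary and the rule set respectively; check whether they are NULL to tell whether loading succeeded or failed.

· scws result set. The number of results scws returns per call varies; segmentation of the current text is finished only when the returned result is NULL. The result is a singly linked list.
typedef struct scws_result *scws_res_t;
struct scws_result { int off; float idf; unsigned char len; char attr[3]; scws_res_t next; };

· scws high-frequency keyword statistics set, or "top word set" for short. This is the structure returned by the statistics calls in scws, and it is also a singly linked list.
typedef struct scws_topword *scws_top_t;
struct scws_topword { char *word; float weight; short times; char attr[2]; scws_top_t next; };

· scws_t scws_new();
Description: allocates and initializes the scws_st object used by the scws family of operations. The function allocates the object, initializes it, and returns a pointer to it. Release the object by calling scws_free().
Return value: the initialized scws_st * (that is, scws_t) handle.
Errors: returns NULL if memory is insufficient.

· scws_t scws_fork(scws_t p);
Description: creates a branch of an existing scws_st object. The branch can be used independently for segmentation in a thread, but it shares the dictionary and rule-set resources with its parent. It must also be released with scws_free(). Loading a dictionary or rule set on the branch acts directly on the parent object. If the parent is released first, segmenting with the branch causes a memory error.
Return value: the cloned branch scws_st * (scws_t) handle.
Errors: returns NULL if memory is insufficient.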
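
Since each batch of segmentation results is described above as a singly linked list of scws_result nodes, a minimal C sketch of walking such a list may help. The struct is redeclared from the README text only so that the example compiles on its own; in real use the definition would come from the scws headers. The helpers count_results and copy_word are hypothetical names, not part of scws, and the sketch assumes that off is a byte offset into the input text and len is the word length in bytes, which this excerpt does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Redeclared from the README text so the sketch is self-contained;
 * real code would take this definition from the scws headers. */
typedef struct scws_result *scws_res_t;
struct scws_result {
    int off;            /* assumed: byte offset of the word in the text */
    float idf;
    unsigned char len;  /* assumed: length of the word in bytes */
    char attr[3];
    scws_res_t next;    /* next node, or NULL at the end of the list */
};

/* Hypothetical helper: count the nodes in one result list. */
size_t count_results(const struct scws_result *head)
{
    size_t n = 0;
    for (const struct scws_result *cur = head; cur != NULL; cur = cur->next)
        n++;
    return n;
}

/* Hypothetical helper: copy the word described by r out of text into buf
 * as a NUL-terminated string. Returns the word length in bytes, or -1 if
 * buf is too small to hold the word plus the terminator. */
int copy_word(const char *text, const struct scws_result *r,
              char *buf, size_t bufsize)
{
    assert(text != NULL && r != NULL && buf != NULL);
    if ((size_t)r->len + 1 > bufsize)
        return -1;
    memcpy(buf, text + r->off, r->len);
    buf[r->len] = '\0';
    return (int)r->len;
}
```

A caller would apply these helpers to every list the library hands back and stop once a NULL result signals that the text is exhausted. The library calls that submit text and fetch each batch are not shown in this excerpt, so they are left out of the sketch.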

            kandi-support Support

              scws has a low active ecosystem.
              It has 1 star and 0 forks. There is 1 watcher for this library.
              It had no major release in the last 6 months.
              scws has no issues reported. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of scws is current.

            kandi-Quality Quality

              scws has no bugs reported.

            kandi-Security Security

              scws has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              scws has a Non-SPDX License.
              Non-SPDX licenses can be open source with a non SPDX compliant license, or non open source licenses, and you need to review them closely before use.

            kandi-Reuse Reuse

              scws releases are not available. You will need to build from source code and install.
              Installation instructions are not available. Examples and code snippets are available.

            scws Key Features

            No Key Features are available at this moment for scws.

            scws Examples and Code Snippets

            No Code Snippets are available at this moment for scws.

            Community Discussions

            No Community Discussions are available at this moment for scws. Refer to the Stack Overflow page for discussions.

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install scws

            You can download it from GitHub.

            Support

            For new features, suggestions and bugs, create an issue on GitHub. If you have questions, search for them or ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/yiyangest/scws.git

          • CLI

            gh repo clone yiyangest/scws

          • SSH

            git@github.com:yiyangest/scws.git
