A set of utilities for generating quality scores for MediaWiki revisions
Project description
Revision Scoring
A generic, machine learning-based revision scoring system designed to help automate critical wiki-work — for example, vandalism detection and removal. This library powers ORES.
Example
Using a scorer_model to score a revision::
import mwapi
from revscoring import Model
from revscoring.extractors.api.extractor import Extractor
with open("models/enwiki.damaging.linear_svc.model") as f:
scorer_model = Model.load(f)
extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
user_agent="revscoring demo"))
feature_values = list(extractor.extract(123456789, scorer_model.features))
print(scorer_model.score(feature_values))
{'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
Installation
The easiest way to install is via the Python package installer (pip).
pip install revscoring
You may find that some of the dependencies fail to compile (namely
scipy
, numpy
and sklearn
). In that case, you'll need to install some
dependencies in your operating system.
Ubuntu & Debian:
- Run
sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev enchant
- Run
sudo apt-get install aspell-ar aspell-bn aspell-el aspell-id aspell-is aspell-pl aspell-ro aspell-sv aspell-ta aspell-uk myspell-cs myspell-de-at myspell-de-ch myspell-de-de myspell-es myspell-et myspell-fa myspell-fr myspell-he myspell-hr myspell-hu myspell-lv myspell-nb myspell-nl myspell-pt-pt myspell-pt-br myspell-ru myspell-hr hunspell-bs hunspell-ca hunspell-en-au hunspell-en-us hunspell-en-gb hunspell-eu hunspell-gl hunspell-it hunspell-hi hunspell-sr hunspell-vi voikko-fi
MacOS:
Using Homebrew and pip, installing revscoring
and enchant
can be accomplished
as follows::
brew install aspell --with-all-languages
brew install enchant
pip install --no-binary pyenchant revscoring
Adding languages in aspell (MacOS only)
cd /tmp
wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
cd aspell-pt-0.50-2
./configure
make
sudo make install
Caveats:
The differences between the aspell
and myspell
dictionaries can cause
some of the tests to fail
Finally, in order to make use of language features, you'll need to download some NLTK data. The following command will get the necessary corpora.
python -m nltk.downloader omw sentiwordnet stopwords wordnet
You'll also need to install enchant-compatible dictionaries of the languages you'd like to use. We recommend the following:
- languages.arabic: aspell-ar
- languages.basque: hunspell-eu
- languages.bengali: aspell-bn
- languages.bosnian: hunspell-bs
- languages.catalan: myspell-ca
- languages.czech: myspell-cs
- languages.croatian: myspell-hr
- languages.dutch: myspell-nl
- languages.english: myspell-en-us myspell-en-gb myspell-en-au
- languages.estonian: myspell-et
- languages.finnish: voikko-fi
- languages.french: myspell-fr
- languages.galician: hunspell-gl
- languages.german: myspell-de-at myspell-de-ch myspell-de-de
- languages.greek: aspell-el
- languages.hebrew: myspell-he
- languages.hindi: aspell-hi
- languages.hungarian: myspell-hu
- languages.icelandic: aspell-is
- languages.indonesian: aspell-id
- languages.italian: myspell-it
- languages.latvian: myspell-lv
- languages.norwegian: myspell-nb
- languages.persian: myspell-fa
- languages.polish: aspell-pl
- languages.portuguese: myspell-pt-pt myspell-pt-br
- languages.serbian: hunspell-sr
- languages.spanish: myspell-es
- languages.swedish: aspell-sv
- languages.tamil: aspell-ta
- languages.russian: myspell-ru
- languages.ukrainian: aspell-uk
- languages.vietnamese: hunspell-vi
Development
To contribute, ensure to install the dependencies:
$ pip install -r requirements.txt
Install necessary NLTK data:
python -m nltk.downloader omw sentiwordnet stopwords wordnet
Running tests
Make sure you install test dependencies:
$ pip install -r test-requirements.txt
Then run:
$ pytest . -vv
Reporting bugs
To report a bug, please use Phabricator
Authors
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for revscoring-2.11.13-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a27ede868393892a491bd9e589ba4866f9a2c459b3d3dd19fbd68c898efe8c5f |
|
MD5 | 8046ed079a74989ca028b406928a4263 |
|
BLAKE2b-256 | b54a5852c461a723e75d5f79efcd1344194867bbc40da54aa15aba6523c76379 |