Create a training dataset quickly by annotating web pages with a Chrome extension. The tool lets you annotate the HTML contents of a web page using a Chrome extension and a Flask web application.
The annotator consists of 2 components:
- A Chrome extension: annotates HTML tags on a given web page.
- A Flask app: stores the annotated HTML tags in SQLite.
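
To make the storage side concrete, here is a minimal sketch of the kind of write the Flask app performs when an annotation arrives. The table name `annotations`, its columns, and the `save_annotation` helper are hypothetical and chosen only for illustration; the schema actually used in the repo may differ.

```python
import sqlite3


def save_annotation(conn, page_url, tag_html, label):
    """Insert one annotated tag into a hypothetical `annotations` table."""
    # Hypothetical schema: not taken from the repo.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotations ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "page_url TEXT NOT NULL, "
        "tag_html TEXT NOT NULL, "
        "label TEXT NOT NULL)"
    )
    # Parameterized query, so the HTML snippet is stored verbatim and safely.
    cur = conn.execute(
        "INSERT INTO annotations (page_url, tag_html, label) VALUES (?, ?, ?)",
        (page_url, tag_html, label),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory DB, nothing written to disk
    row_id = save_annotation(
        conn,
        "https://example.com/post",
        "<time>2020-01-01</time>",
        "publication_date",
    )
    print("stored row", row_id)
    conn.close()
```

Storing the raw tag alongside the page URL and a label keeps each row self-describing, which is convenient when the rows are later turned into training examples.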
Requires Python 3.6 or above.
Running Flask app
- Clone the GitHub repo: `git clone https://github.com/sachinkalsi/html_tag_annotator.git`
- Install the dependencies: `pip3 install -r flask_app/requirements.txt`
- Run `python3 flask_app/app.py` to start the server.
- The Flask server should be running on port `5000`. Check `http://localhost:5000/` to verify.
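
The verification step can also be scripted. The following is a small sketch using only the standard library; the `server_is_up` helper is a hypothetical name, and it only checks that something answers HTTP at the URL, not that it is this particular Flask app.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:5000/", timeout=3.0):
    """Return True if any HTTP response comes back from `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (e.g. 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    print("Flask server reachable:", server_is_up())
```

If this prints `False`, confirm that `python3 flask_app/app.py` is still running and that nothing else is occupying port 5000.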
Installing Chrome Extension
- Go to `chrome://extensions/` in the URL bar
- Click on the `Load unpacked` button & choose the `chrome_extension` folder
- Make sure the Flask server is running on port `5000`
- Create the DB file if it has not been created already (`python3 utils/create_db_file.py`)
- In Chrome, go to the URL of the page you want to annotate
- Press capital `S` to start annotation
- Once started, hover the mouse over the web page & click on the tag that needs annotation (in the following demo, it is the publication date)
- Once selected, click on the `Save` button
- Press capital `S` to stop annotation.
- See the `how_to_use.ipynb` notebook to learn how to read the stored annotated data
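
The notebook is the reference for reading the data. As a schema-agnostic alternative, the sketch below dumps every table of an SQLite file into Python dictionaries. The `dump_sqlite` helper is hypothetical, and the location of the DB file is not stated here, so pass whatever path `create_db_file.py` produced.

```python
import sqlite3


def dump_sqlite(db_path):
    """Return {table_name: [row_dict, ...]} for every user table in the file."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows become name-addressable
    try:
        tables = [
            row["name"]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        result = {}
        for name in tables:
            # Quote the identifier in case the table name has unusual characters.
            quoted = '"' + name.replace('"', '""') + '"'
            rows = conn.execute("SELECT * FROM " + quoted)
            result[name] = [dict(row) for row in rows]
        return result
    finally:
        conn.close()
```

Because the function discovers table and column names at run time, it makes no assumption about how the annotations are laid out; inspect the returned dictionary first, then write a narrower query once the schema is known.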
Watch the following YouTube playlist videos to learn more about usage and installation:
Playlist link: https://www.youtube.com/playlist?list=PLfSv7CK7EjD2XmStXvZthQjGn1DAhfOaK
Installation link: https://youtu.be/MtQ1glIuzZ8