Dev.to Machine Learning2h ago|Research & PapersProducts & Services

Building an Open Bilingual Q&A Dataset for Swedish Construction Law

The author built an open, bilingual, legally-grounded Q&A dataset for the Swedish construction industry, covering topics like permits, taxes, trades, legal issues, and regulations. The dataset contains 503 question-answer pairs in both Swedish and English, with each answer citing relevant Swedish statutes or authority guidance.

đź’ˇ

Why it matters

This dataset provides a valuable resource for training AI models on Swedish construction law and regulations, which can help automate legal research and question-answering in this domain.

Key Points

  • 1Developed a 503-entry Q&A dataset covering Swedish construction law and regulations
  • 2Dataset is bilingual (Swedish and English) and released under CC BY 4.0 license
  • 3Answers are 30-150 words long and cite specific Swedish legal sources
  • 4Designed dataset for use in training language models on Swedish legal/construction domain
  • 5Released in multiple formats (JSON, JSONL, CSV) to ease integration

Details

The author identified a lack of open, legally-grounded training data for Swedish construction-related topics, which are often fragmented across various government websites and PDFs. To address this, they built a dataset of 503 question-answer pairs covering 39 categories, including permits, taxes, trades, legal issues, regulations, and cost/dispute resolution. Each answer is 30-150 words long and cites the relevant Swedish statute or authority guidance, such as PBL, BBR, or Skatteverket. The dataset is bilingual, with both Swedish and English versions provided, and is released under a CC BY 4.0 license. The author shares design choices, such as preserving Swedish legal terminology in the English set and embedding citations directly in the answer text. The dataset is available via Hugging Face and as a pip-installable Python package, with support for filtering by category and iterating over the data for language model fine-tuning.

Like
Save
Read original
Cached
Comments
?

No comments yet

Be the first to comment

AI Curator - Daily AI News Curation

AI Curator

Your AI news assistant

Ask me anything about AI

I can help you understand AI news, trends, and technologies