
How to Gloss Arabic Sentences Using Python

<h2>How to Gloss Arabic Sentences Using Python</h2>

<h3>Step 1: Create a CSV File</h3>
<p>
First, create a CSV file with four columns:
</p>
<ul>
<li><strong>Column 1</strong>: Label it <code>"arabic_word"</code>. This column should contain the Arabic words you want to gloss.</li>
<li><strong>Column 2</strong>: Label it <code>"english_word"</code>. This column should contain the English equivalents of the Arabic words, which will be printed in the first line of the output.</li>
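<li><strong>Column 3</strong>: Label it <code>"glosses"</code>. This column should contain the gloss of each Arabic word, that is, its morpheme-by-morpheme labels (for example, "child-PL" for a plural noun). The glosses are printed on the second line of the output, below the English words.</li>
<li><strong>Column 4</strong>: Label it <code>"translation"</code>. This column should contain the English translation of each word. The script joins these word translations, in sentence order, to form the translation line of each sentence.</li>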
</ul>
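<p>You can use Excel or any spreadsheet tool to create the CSV file. After filling in all the data, save it in <strong>CSV format</strong> (.csv extension). If your tool offers a UTF-8 CSV option, choose it so that the Arabic text is preserved when the file is read in Python.</p>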
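<p>As a rough illustration of the layout described above, the sketch below builds a tiny CSV with the four column headers and reads it back using Python's standard <code>csv</code> module. The three sample rows and their glosses are made-up placeholders, not data from this tutorial; replace them with your own entries.</p>

```python
import csv
import io

# The four column headers the glossing script expects.
FIELDNAMES = ["arabic_word", "english_word", "glosses", "translation"]

# Hypothetical sample rows, for illustration only.
SAMPLE_ROWS = [
    {"arabic_word": "الولد", "english_word": "boy",
     "glosses": "DEF-boy", "translation": "the boy"},
    {"arabic_word": "يقرأ", "english_word": "read",
     "glosses": "3SG.M-read", "translation": "reads"},
    {"arabic_word": "كتابا", "english_word": "book",
     "glosses": "book-ACC.INDF", "translation": "a book"},
]


def build_csv_text(rows):
    """Return CSV text: a header row followed by one line per word."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_csv_text(SAMPLE_ROWS)

# Read the text back to confirm the header and the row count.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(len(parsed), "rows with columns:", list(parsed[0].keys()))
```

<p>To save the result to disk, write <code>csv_text</code> to a file opened with <code>encoding="utf-8"</code> and <code>newline=""</code>.</p>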

<h3>Step 2: Copy the Path of Your CSV File</h3>
<p>Once the CSV file is created, copy the <strong>file path</strong> of the location where it is saved on your computer. You will need to replace <code>'PATH TO YOUR FILE.csv'</code> in the code with this path.</p>
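<p>A pasted path can carry stray spaces or surrounding quotation marks, which some systems add when copying a path. The sketch below is one possible way to tidy and check the path before loading the file; the helper names <code>clean_csv_path</code> and <code>check_csv_path</code> are hypothetical and are not part of the script in this tutorial.</p>

```python
from pathlib import Path


def clean_csv_path(raw_path):
    """Strip whitespace and surrounding quotes from a pasted path."""
    cleaned = raw_path.strip().strip('"').strip("'")
    return Path(cleaned)


def check_csv_path(raw_path):
    """Return the cleaned Path, or raise a readable error if it is unusable."""
    path = clean_csv_path(raw_path)
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No file found at: {path}")
    return path


# The placeholder from the tutorial, pasted with extra quotes and spaces.
print(clean_csv_path(' "PATH TO YOUR FILE.csv" '))
```

<p>On Windows, prefixing the path string with <code>r</code> (a raw string) keeps the backslashes in the path from being read as escape characters.</p>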

<h3>Step 3: Run the Python Code</h3>
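<p>Use the Python code below to gloss the Arabic sentences and save the output in a Word document. The script requires the <code>pandas</code> and <code>python-docx</code> packages. Before running it, also replace <code>'PATH TO YOUR FOLDER'</code> in <code>output_path</code> with the folder where the Word document should be saved.</p>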

<pre>
<code># Requires the pandas and python-docx packages
import pandas as pd
from docx import Document

# Define the path to your CSV file
file_path = 'PATH TO YOUR FILE.csv'

# Load the CSV file into a pandas DataFrame
df = pd.read_csv(file_path)

# Display a confirmation message
print("CSV file loaded successfully.")

#------------ The Glossing Code ------------

# Save multiple sentences to a Word document with glosses padded into columns
def save_sentences_to_word(sentences_data):
    # Create a new Word document
    doc = Document()

    # Each item holds the English words, the glosses, and the translation
    for english_words, glosses_list, translation in sentences_data:
        # First line: English words, each padded to 15 characters
        english_line = " ".join([word.ljust(15) for word in english_words])
        doc.add_paragraph(english_line)

        # Second line: glosses, padded the same way so they sit below the words
        # (the padding lines up exactly only in a monospaced font)
        gloss_line = " ".join([gloss.ljust(15) for gloss in glosses_list])
        doc.add_paragraph(gloss_line)

        # Third line: the translation in quotation marks
        doc.add_paragraph(f'"{translation}"')

    # Save the document
    output_path = 'PATH TO YOUR FOLDER/sentences_glossing_output.docx'
    doc.save(output_path)
    print(f"Glossing saved to {output_path}")

# Main logic for processing multiple sentences
def process_multiple_sentences_glossing(df):
    # Ask the user how many sentences they want to gloss
    try:
        num_sentences = int(input("How many sentences would you like to gloss? "))
    except ValueError:
        print("Please enter a whole number.")
        return

    # Validate the number of sentences
    if num_sentences < 1:
        print("Please enter at least one sentence.")
        return

    sentences_data = []  # To store data for all sentences

    # Loop through each sentence
    for s in range(num_sentences):
        print(f"\nProcessing sentence {s + 1} of {num_sentences}:")

        # Ask the user for the full sentence in Arabic (space-separated words)
        arabic_sentence = input("Please enter the full Arabic sentence: ")

        # Split the sentence into individual words
        arabic_words = arabic_sentence.split()

        # Lists for the English words, glosses, and translations of this sentence
        english_words = []
        glosses_list = []
        translations_list = []

        # Look up each Arabic word and collect its entries
        for arabic_word in arabic_words:
            # Select the rows where 'arabic_word' matches the input exactly
            result = df[df['arabic_word'] == arabic_word]

            # Check if the word is found
            if not result.empty:
                row = result.iloc[0]  # Use the first match if there are several
                english_words.append(str(row['english_word']))
                glosses_list.append(str(row['glosses']))
                translations_list.append(str(row['translation']))
            else:
                print(f"Word '{arabic_word}' not found in the database.")
                return  # Stop without saving if any word is not found

        # Join the word translations into a full-sentence translation
        full_translation = " ".join(translations_list)

        # Add the current sentence's data to the list
        sentences_data.append((english_words, glosses_list, full_translation))

    # Save all sentences to the Word document
    save_sentences_to_word(sentences_data)

# Run the multiple sentence glossing function
process_multiple_sentences_glossing(df)
</code>
</pre>
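<p>The script above pads every word and gloss to a fixed width of 15 characters, so a gloss longer than that pushes the following columns out of line. As a possible alternative, the sketch below pads each word and gloss pair to the width of the longer of the two. The function name <code>align_gloss_lines</code>, the <code>gap</code> parameter, and the sample words are assumptions made for illustration only.</p>

```python
def align_gloss_lines(words, glosses, gap=2):
    """Pad each word/gloss pair to a shared column width; return both lines."""
    if len(words) != len(glosses):
        raise ValueError("words and glosses must have the same length")

    word_cells = []
    gloss_cells = []
    for word, gloss in zip(words, glosses):
        # Each column is as wide as its longer entry, plus a small gap.
        width = max(len(word), len(gloss)) + gap
        word_cells.append(word.ljust(width))
        gloss_cells.append(gloss.ljust(width))

    # Trailing padding after the last column is not needed.
    return "".join(word_cells).rstrip(), "".join(gloss_cells).rstrip()


# Hypothetical sample data, for illustration only.
word_line, gloss_line = align_gloss_lines(
    ["boy", "read", "book"],
    ["DEF-boy", "3SG.M-read", "book-ACC.INDF"],
)
print(word_line)
print(gloss_line)
```

<p>As with the fixed-width version, the columns line up only when the text is shown in a monospaced font, so the two lines may still look uneven in a Word document that uses a proportional font.</p>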

<h3>Summary of Steps:</h3>
<ol>
<li><strong>Create a CSV file</strong>:
<ul>
<li>Column 1: Arabic words (<code>"arabic_word"</code>)</li>
<li>Column 2: English equivalents (<code>"english_word"</code>)</li>
<li>Column 3: Glosses (<code>"glosses"</code>)</li>
<li>Column 4: Translations (<code>"translation"</code>)</li>
</ul>
</li>
<li><strong>Copy the file path</strong> of the CSV file and replace <code>'PATH TO YOUR FILE.csv'</code> in the code with that path.</li>
<li><strong>Run the Python code</strong>:
<ul>
<li>Enter how many sentences you want to gloss.</li>
<li>Enter each sentence in Arabic when prompted.</li>
</ul>
</li>
<li><strong>Output:</strong> The glossed sentences are saved to a Word document at the location you set in <code>output_path</code>.</li>
</ol>

د. محمد صويلح عيضه الزايدي
