د. محمد صويلح عيضه الزايدي

Associate Professor

Faculty member, Department of English

Languages and Linguistics
Aug 2
<h2>How to Gloss Arabic Sentences Using Python</h2>

<h3>Step 1: Create a CSV File</h3>
<p>
    First, create a CSV file with four columns:
</p>
<ul>
    <li><strong>Column 1</strong>: Label it <code>"arabic_word"</code>. This column should contain the Arabic words you want to gloss.</li>
    <li><strong>Column 2</strong>: Label it <code>"english_word"</code>. This column should contain the English equivalents of the Arabic words, which will be printed in the first line of the output.</li>
    <li><strong>Column 3</strong>: Label it <code>"glosses"</code>. This column should contain the gloss of each Arabic word, that is, its morpheme-by-morpheme annotation (for example, "child-PL" for a plural noun).</li>
    <li><strong>Column 4</strong>: Label it <code>"translation"</code>. This column should contain the English translation of each word. The code joins these word translations to build the translation line of the sentence.</li>
</ul>
<p>You can use Excel or any spreadsheet tool to create the CSV file. After filling in the data, save it in <strong>CSV format</strong> (.csv extension). In Excel, choose the <strong>CSV UTF-8</strong> option so that the Arabic characters are stored correctly.</p>
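<p>As a rough illustration of the layout described above, the sketch below writes a two-row CSV with Python's standard <code>csv</code> module. The sample words, their glosses, and the file name <code>sample_lexicon.csv</code> are made-up placeholders rather than data from this post, and writing the file as UTF-8 is an assumption that keeps the Arabic characters intact.</p>

```python
import csv
import os
import tempfile

# The four column labels described in Step 1
COLUMNS = ["arabic_word", "english_word", "glosses", "translation"]

# Hypothetical sample rows, for illustration only
SAMPLE_ROWS = [
    {"arabic_word": "الأطفال", "english_word": "children",
     "glosses": "child-PL", "translation": "the children"},
    {"arabic_word": "يلعبون", "english_word": "play",
     "glosses": "play-3PL.M", "translation": "are playing"},
]


def write_sample_csv(path, rows=SAMPLE_ROWS):
    """Write rows to path as a UTF-8 CSV with the four expected columns."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "sample_lexicon.csv")
    print("Wrote", write_sample_csv(target))
```

<p>A file produced this way has the same shape as one exported from a spreadsheet, so it can serve as a small test input before you build a full word list.</p>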

<h3>Step 2: Copy the Path of Your CSV File</h3>
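<p>Once the CSV file is created, copy the <strong>file path</strong> where it is saved on your computer. You will need to replace <code>'PATH TO YOUR FILE.csv'</code> in the code with this path. Also replace <code>'PATH TO YOUR FOLDER/sentences_glossing_output.docx'</code> with the location where the Word document should be saved.</p>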
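<p>A mistyped path is an easy mistake at this step. The helper below is a minimal sketch, not part of the original script, that checks a path before it is handed to pandas; the function name <code>check_csv_path</code> is hypothetical.</p>

```python
from pathlib import Path


def check_csv_path(raw_path):
    """Return a Path for raw_path, or raise if it is not an existing .csv file."""
    path = Path(raw_path).expanduser()
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No file found at: {path}")
    return path
```

<p>On Windows, pasting the path as a raw string (for example <code>r'C:\data\glosses.csv'</code>, a made-up location) keeps Python from treating the backslashes as escape characters.</p>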

<h3>Step 3: Run the Python Code</h3>
<p>Use the Python code below to gloss the Arabic sentences and save the output in a Word document. The code requires the <code>pandas</code> and <code>python-docx</code> packages.</p>
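<p>The script imports <code>pandas</code> and <code>docx</code> (the import name of the <code>python-docx</code> package). The following sketch, which is an addition and not part of the original code, reports which of the two still needs to be installed; the function name <code>missing_packages</code> is hypothetical.</p>

```python
import importlib.util

# Import name -> name of the pip package that provides it
REQUIRED = {"pandas": "pandas", "docx": "python-docx"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("pandas and python-docx are available.")
```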

<pre>
<code># Step 1: Import pandas for handling CSV
import pandas as pd

# Define the path to your CSV file
file_path = 'PATH TO YOUR FILE.csv'

# Load the CSV file into a pandas DataFrame
df = pd.read_csv(file_path)

# Display a confirmation message
print("CSV file loaded successfully.")

#------------ The Glossing Code ------------
from docx import Document

# Function to save multiple sentences to a Word document with aligned glosses
def save_sentences_to_word(sentences_data):
    # Create a new Word document
    doc = Document()

    # Loop through each sentence data (English words, glosses, translation)
    for sentence_data in sentences_data:
        english_words, glosses_list, translation = sentence_data

        # Pad each English word to 15 characters; str() guards against non-text cells
        english_line = " ".join(str(word).ljust(15) for word in english_words)
        # A monospaced font is needed for space padding to line up in Word
        doc.add_paragraph().add_run(english_line).font.name = 'Courier New'

        # Pad the glosses the same way so they sit below the English words
        gloss_line = " ".join(str(gloss).ljust(15) for gloss in glosses_list)
        doc.add_paragraph().add_run(gloss_line).font.name = 'Courier New'

        # Add the translation on a new line
        doc.add_paragraph(f'"{translation}"')

    # Save the document
    output_path = 'PATH TO YOUR FOLDER/sentences_glossing_output.docx'
    doc.save(output_path)
    print(f"Glossing saved to {output_path}")

# Main logic for processing multiple sentences
def process_multiple_sentences_glossing(df):
    # Ask the user how many sentences they want to gloss
    try:
        num_sentences = int(input("How many sentences would you like to gloss? "))
    except ValueError:
        print("Please enter a whole number.")
        return

    # Validate the number of sentences
    if num_sentences &lt; 1:
        print("Please enter at least one sentence.")
        return

    sentences_data = []  # To store data for all sentences

    # Loop through each sentence
    for s in range(num_sentences):
        print(f"\nProcessing sentence {s + 1} of {num_sentences}:")

        # Ask the user for the full sentence in Arabic (space-separated words)
        arabic_sentence = input("Please enter the full Arabic sentence: ")

        # Split the sentence into individual words
        arabic_words = arabic_sentence.split()

        # Initialize lists to store the English words, glosses, and translations for each sentence
        english_words = []
        glosses_list = []
        translations_list = []

        # Loop to collect each Arabic word and retrieve its gloss
        for arabic_word in arabic_words:
            # Search for the row where 'arabic_word' matches the input
            result = df[df['arabic_word'] == arabic_word]

            # Check if the word is found
            if not result.empty:
                english_words.append(result.iloc[0]['english_word'])
                glosses_list.append(result.iloc[0]['glosses'])
                translations_list.append(result.iloc[0]['translation'])
            else:
                print(f"Word '{arabic_word}' not found in the database.")
                return  # Stop without saving anything if a word is missing

        # Join translations for a full sentence
        full_translation = " ".join(translations_list)

        # Add the current sentence's data to the list
        sentences_data.append((english_words, glosses_list, full_translation))

    # Save all sentences to the Word document
    save_sentences_to_word(sentences_data)

# Run the multiple sentence glossing function
process_multiple_sentences_glossing(df)
</code>
</pre>
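<p>The script pads every word to a fixed width of 15 characters, so any word or gloss longer than that pushes the following columns out of line. One possible alternative, shown here only as a sketch and not as the method used above, sizes each column to the longer of the word and its gloss. The function name <code>format_gloss_lines</code> and the <code>gap</code> parameter are hypothetical, and the alignment still assumes a monospaced font.</p>

```python
def format_gloss_lines(english_words, glosses, gap=2):
    """Return (word_line, gloss_line) with each column wide enough for both items."""
    if len(english_words) != len(glosses):
        raise ValueError("Each word needs exactly one gloss.")

    # Column width = longer of the word and its gloss, plus a gap between columns
    widths = [
        max(len(str(word)), len(str(gloss))) + gap
        for word, gloss in zip(english_words, glosses)
    ]
    word_line = "".join(
        str(word).ljust(width) for word, width in zip(english_words, widths)
    ).rstrip()
    gloss_line = "".join(
        str(gloss).ljust(width) for gloss, width in zip(glosses, widths)
    ).rstrip()
    return word_line, gloss_line


if __name__ == "__main__":
    # Hypothetical sample words and glosses
    for line in format_gloss_lines(["children", "play"], ["child-PL", "play-3PL.M"]):
        print(line)
```

<p>The two returned strings can be passed to <code>doc.add_paragraph()</code> in place of the fixed-width lines.</p>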

<h3>Summary of Steps:</h3>
<ol>
    <li><strong>Create a CSV file</strong>:
        <ul>
            <li>Column 1: Arabic words (<code>"arabic_word"</code>)</li>
            <li>Column 2: English equivalents (<code>"english_word"</code>)</li>
            <li>Column 3: Glosses (<code>"glosses"</code>)</li>
            <li>Column 4: Translations (<code>"translation"</code>)</li>
        </ul>
    </li>
    <li><strong>Copy the file path</strong> of the CSV file and replace <code>'PATH TO YOUR FILE.csv'</code> in the code with that path.</li>
    <li><strong>Run the Python code</strong>:
        <ul>
            <li>Enter how many sentences you want to gloss.</li>
            <li>Enter each sentence in Arabic when prompted, with the words separated by spaces and spelled exactly as they appear in the CSV file.</li>
        </ul>
    </li>
    <li><strong>Output:</strong> The glossed sentences will be saved to a Word document at the location you set in <code>output_path</code>.</li>
</ol>
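<p>To try the lookup logic without typing sentences at a prompt, the steps above can be sketched with the standard library alone. This is a simplified stand-in for the pandas version, under the assumption that the CSV has the four columns from Step 1; the function names <code>load_lexicon</code> and <code>gloss_sentence</code> and the sample rows are hypothetical.</p>

```python
import csv
import io

REQUIRED_COLUMNS = ("arabic_word", "english_word", "glosses", "translation")


def load_lexicon(handle):
    """Read CSV rows from an open text file into a dict keyed by Arabic word."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    lexicon = {}
    for row in reader:
        # Keep the first row for each word, as result.iloc[0] does in the script
        lexicon.setdefault((row["arabic_word"] or "").strip(), row)
    return lexicon


def gloss_sentence(lexicon, sentence):
    """Return ((english_words, glosses, translation), []) or (None, unknown_words)."""
    words = sentence.split()
    unknown = [word for word in words if word not in lexicon]
    if unknown:
        return None, unknown
    rows = [lexicon[word] for word in words]
    result = (
        [row["english_word"] for row in rows],
        [row["glosses"] for row in rows],
        " ".join(row["translation"] for row in rows),
    )
    return result, []


if __name__ == "__main__":
    # Hypothetical two-word lexicon standing in for a real CSV file
    sample_csv = (
        "arabic_word,english_word,glosses,translation\n"
        "الأطفال,children,child-PL,the children\n"
        "يلعبون,play,play-3PL.M,are playing\n"
    )
    lexicon = load_lexicon(io.StringIO(sample_csv))
    print(gloss_sentence(lexicon, "الأطفال يلعبون"))
```

<p>Returning the list of unknown words, instead of stopping at the first one, lets you add every missing entry to the CSV in one pass.</p>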