Workflow Summary of an AI Report-Grading System Built on WordPress, R2, and a Python Worker

Summary of what the project does and how it works:

1. Students submit their reports, such as exp1, exp2, exp3, and report, on the assignment submission page.

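2. The instructor uploads a roster file on the roster upload page. It must follow the template: 1) the file name has the form "classID_rosterName.xlsx", for example "245_成绩名单.xlsx"; 2) the student ID must be in column 2 and the name in column 3.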

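3. After the roster file is uploaded, the backend uploads it to R2 and renders the roster on the current page as a list with checkboxes; "Grade" and "Download" buttons appear at the bottom.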

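4. Clicking "Grade" does the following. The WordPress backend (osaka server) creates a task, writes the relevant data to wpdb (WordPress's MySQL database), mainly the task status set to pending and the selected student IDs (any extension features would add their data here), and then returns. The front-end page can then send AJAX requests to check grading progress; these query the database, and the progress information is updated by the Python worker. The main work is done by the Python worker (phoenix server): it polls the database status and data in a while loop, and when it detects a pending task it processes it. It automatically downloads the roster file and the report files of the specified student IDs from R2, then scores and comments on each report through the grade_document function (which calls the Gemini API). During grading it keeps updating the status information in wpdb (such as how many reports have been graded so far) and records each score at the corresponding position in the Excel roster file (currently set in code; this should later be configurable from the front-end page). Finally it uploads the graded student report files (named marked_<original file name>) and the roster file with the recorded scores back to R2.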

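5. Clicking "Download" packs the graded files of the students selected on the page into a zip archive (done in the WordPress backend) and downloads them from R2 to the machine running the browser.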
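
The roster convention from step 2 can be checked before any grading starts. The following is a minimal sketch, not the project's actual code: the helper names are hypothetical, the class ID is assumed to be numeric (as in "245"), and the rows are assumed to arrive as tuples of cell values, for example from a spreadsheet reader.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple


class RosterName(NamedTuple):
    class_id: str
    title: str


# Assumed pattern: "<numeric class ID>_<roster name>.xlsx", e.g. "245_成绩名单.xlsx".
ROSTER_RE = re.compile(r"^(?P<class_id>\d+)_(?P<title>.+)\.xlsx$")

# Per the template: student ID in column 2, name in column 3 (1-based).
STUDENT_ID_COL = 2
NAME_COL = 3


def parse_roster_filename(filename: str) -> RosterName:
    """Split a roster file name into class ID and roster name."""
    match = ROSTER_RE.match(filename)
    if match is None:
        raise ValueError(f"roster file name does not match the template: {filename!r}")
    return RosterName(match["class_id"], match["title"])


def extract_students(rows: Iterable[tuple]) -> List[Tuple[str, str]]:
    """Return (student_id, name) pairs, skipping header and blank rows."""
    students = []
    for row in rows:
        if len(row) < NAME_COL:
            continue
        sid, name = row[STUDENT_ID_COL - 1], row[NAME_COL - 1]
        # Spreadsheet readers may return numeric IDs as floats.
        if isinstance(sid, float) and sid.is_integer():
            sid = int(sid)
        sid = "" if sid is None else str(sid).strip()
        if not sid.isdigit():
            continue  # header text or an empty cell
        students.append((sid, "" if name is None else str(name).strip()))
    return students
```

Rejecting a badly named roster at upload time keeps a malformed file from ever reaching the worker.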

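Code deployment: 1. In WordPress's WPCode, a PHP snippet (the main workflow logic) and an HTML snippet (just two lines, providing only the container for the page elements). 2. The Python worker (fetches the files to be graded, grades them, and uploads the results).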

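Features to improve or add: 1. Enter the class ID, the score position (for example, report in column "N"), and the prompt on the page. 2. After an exception, the Excel file is left without the recorded scores (the workbook.save call never gets a chance to run). 3. Use Cloudflare Pages or Workers to implement the upload page and functionality for student submissions, including checks on the file name (e.g. 2024021238_exp3), file type (.docx), and file size (no more than 3M), and compile and display the list of students who have not submitted. 4. Change the third parameter of grade_document() into a composite tool object so that it supports various AI APIs and their accompanying parameters such as the prompt. 5. Put the code and content under git, and use GitHub for content management and automated publishing.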
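
Item 4 could look roughly like the sketch below. The real signature of grade_document is not shown in these notes, so the first two parameters here (report text and student ID) are assumptions, and GradingTool and fake_backend are hypothetical names. The idea is only that the third argument bundles the backend call with its prompt and options.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class GradingTool:
    """Hypothetical composite object passed as the third argument."""
    name: str                                        # e.g. "gemini"
    prompt: str                                      # grading instructions
    call: Callable[[str, str], Tuple[float, str]]    # (prompt, text) -> (score, comment)
    options: Dict[str, object] = field(default_factory=dict)


def grade_document(report_text: str, student_id: str, tool: GradingTool) -> dict:
    """Grade one report with whichever backend the tool wraps."""
    score, comment = tool.call(tool.prompt, report_text)
    return {"student_id": student_id, "score": score,
            "comment": comment, "tool": tool.name}


def fake_backend(prompt: str, text: str) -> Tuple[float, str]:
    """Stand-in for a real AI API call; scores by word count only."""
    words = len(text.split())
    return float(min(100, words * 10)), f"{words} words reviewed"


tool = GradingTool(name="fake", prompt="Grade this lab report.", call=fake_backend)
result = grade_document("method results discussion", "2024021238", tool)
```

Swapping in another AI API then means building a different GradingTool, with no change to the worker loop.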

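Observations worth recording: 1. After starting the Python worker, nothing was printed, yet the selected files to be graded had been downloaded under tmp/. Investigation showed that another Python worker was still running in the background: the task had been intercepted by the resident nohup worker, which downloaded the files but could not call the AI because of a quota limit. The fix, of course, is to kill the duplicate worker process. 2. In the nohup worker's log, ordinary output is not written in real time while errors and other exception output are, so events in the log do not necessarily appear in chronological order. The cause is that ordinary output and error output go through two separate buffers: the former reaches disk only once its buffer fills, while the latter is written immediately. The remedy, especially while debugging, is to add flush and timestamps. 3. PHP serialization, the form in which the records are stored in wpdb, and the corresponding handling in Python still need to be sorted out.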
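
For observation 2, a small logging helper that adds a timestamp and flushes on every call is usually enough. This is a sketch; the function name log is hypothetical.

```python
import sys
from datetime import datetime


def log(message: str, stream=None) -> None:
    """Print a timestamped line and flush it immediately."""
    if stream is None:
        stream = sys.stdout  # looked up at call time so redirection still works
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    print(f"[{stamp}] {message}", file=stream, flush=True)


log("worker started")
log("task failed", stream=sys.stderr)
```

Starting the worker with python -u, or with the PYTHONUNBUFFERED environment variable set, also disables stdout buffering without touching the code. The timestamps remain useful either way, since they let out-of-order lines be sorted afterwards.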

The following is an AI summary:
This is an excellent, comprehensive summary of your project. It’s clear you have a deep understanding of the entire system, from the front-end user experience to the back-end processes and potential pitfalls.

I’ve taken your points and organized them into a structured project summary. I’ve also added some professional terminology and filled in a few technical details to make it a polished, ready-to-share project brief.


Project Summary: Automated AI Grading System

1. Project Overview

This project delivers a complete, full-stack solution for automating the academic grading process. It integrates a WordPress-based front end for user interaction with a custom Python worker for AI-powered backend processing. The system leverages Cloudflare R2 for scalable file storage and the Google Gemini API for automated grading.

2. Core System Workflow & Functionality

The system is designed around two primary user personas: students and instructors.

Student Workflow:

  • Submission: Students submit reports (e.g., exp1, exp2, exp3, report) through a dedicated assignment submission page. The files are stored on the server and are eventually managed by the system.

Instructor Workflow:

  • Student Roster Upload: Instructors upload an Excel roster file following a specific naming convention (e.g., "245_成绩名单.xlsx"). The file must contain student IDs in the second column and names in the third.
  • Roster Display: Upon upload, the backend stores the file in R2 and renders the list of students on the front end, complete with checkboxes for selection. “Grade” and “Download” buttons appear below the list.
  • Grading & Recording:
    • Clicking the “Grade” button triggers an asynchronous process. The WordPress backend records a new grading task in the wpdb database, setting its status to pending and including the selected student IDs.
    • A separate, persistent Python Worker continuously polls the wpdb for pending tasks.
    • When a task is detected, the worker begins processing:
      1. It downloads the roster file and the specified student reports from R2.
      2. It invokes the grade_document function, which uses the Gemini API to generate a score and a written review.
      3. The worker updates the student’s score in the local Excel roster file and updates the task status in the wpdb, providing progress feedback (e.g., “1/20 graded”).
      4. Finally, it uploads the newly graded student report (renamed with a “marked_” prefix) and the updated roster file back to R2.
  • Download:
    • Clicking the “Download” button on the front end initiates a backend process to download all selected, graded reports from R2.
    • The WordPress backend dynamically creates a .zip archive of these files and streams it to the user’s browser, allowing for bulk download of graded work.
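
The worker's polling loop described under “Grading & Recording” can be sketched as follows. This is an illustration under stated assumptions, not the deployed worker: sqlite3 stands in for the wpdb MySQL database so the example is self-contained, the grading_tasks table and its columns are hypothetical, and the student IDs are stored as JSON here rather than in PHP's serialized form.

```python
import json
import sqlite3
import time


def create_task_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical task table (stand-in for the wpdb table)."""
    conn.execute(
        "CREATE TABLE grading_tasks ("
        "id INTEGER PRIMARY KEY, status TEXT, student_ids TEXT, "
        "done INTEGER DEFAULT 0, total INTEGER DEFAULT 0)"
    )


def process_one_pending(conn: sqlite3.Connection, grade) -> bool:
    """Process the oldest pending task; return False if there is none."""
    row = conn.execute(
        "SELECT id, student_ids FROM grading_tasks "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    task_id, raw_ids = row
    student_ids = json.loads(raw_ids)
    conn.execute(
        "UPDATE grading_tasks SET status = 'processing', total = ? WHERE id = ?",
        (len(student_ids), task_id),
    )
    conn.commit()
    for count, student_id in enumerate(student_ids, start=1):
        grade(student_id)  # download, grade, and upload one report
        # Commit progress after every report so the AJAX poll sees it.
        conn.execute("UPDATE grading_tasks SET done = ? WHERE id = ?", (count, task_id))
        conn.commit()
    conn.execute("UPDATE grading_tasks SET status = 'completed' WHERE id = ?", (task_id,))
    conn.commit()
    return True


def run_forever(conn: sqlite3.Connection, grade, interval: float = 5.0) -> None:
    """The while-loop poller: sleep only when no task is waiting."""
    while True:
        if not process_one_pending(conn, grade):
            time.sleep(interval)
```

Committing after each report is what makes progress messages such as “1/20 graded” visible to the front end while the task is still running.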

3. Technical Architecture & Deployment

  • Front-end & Backend Communication: The WordPress site serves as the user interface and task dispatcher, implemented as a PHP snippet in WPCode. It does not call the Python Worker directly: the two sides exchange work through task records in wpdb and files in R2. The worker handles the resource-intensive tasks, which decouples the user experience from the grading workload.
  • Codebase & Deployment:
    • WordPress Logic: Implemented via WPCode PHP and HTML snippets.
    • Python Worker: A standalone script deployed on a separate server (Phoenix Server), run as a background process using nohup to ensure continuity.

4. Noteworthy Discoveries & Challenges

This project uncovered several key technical challenges and insights during development:

  • Output Buffering & Logging: Discrepancies were observed in log output from the nohup worker. Standard output (stdout) was subject to buffering and delayed logging, while standard error (stderr) was written to the log file immediately. This led to non-chronological event logging, especially in a mixed-output environment. Solution: Explicitly flushing the output with flush=True in the Python script to ensure real-time logging.
  • Duplicate Worker Processes: A key issue was task-queue contention. If more than one nohup worker is running, they all poll the database and compete for tasks. In the observed case, a forgotten background worker picked up the task and downloaded the files, but failed on the Gemini API call due to a quota limit, while the worker being watched in the foreground printed nothing because the task had already been claimed. Solution: Killing all duplicate worker processes and ensuring only one instance is running.
  • Inter-language Data Exchange: A subtle but critical issue arose with data exchange between PHP and Python. The PHP serialize function’s output format required specific handling (e.g., UTF-8 encoding) when being deserialized by Python’s phpserialize library. Solution: Correctly encoding the PHP-serialized string before passing it to Python. The author's notes mark this area as still needing further clarification.
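
To show why the encoding matters in the data-exchange point above: PHP's serialize format records string lengths in bytes, not characters, so the payload has to be handled as bytes. The sketch below is a hand-written parser for a small subset of the format (null, bool, int, float, string, array). It is not the phpserialize library and is meant only to illustrate the format; the sample payload is made up.

```python
def php_unserialize(data: bytes):
    """Parse a subset of PHP's serialize() output from UTF-8 bytes."""
    value, pos = _parse(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after serialized value")
    return value


def _parse(data: bytes, pos: int):
    tag = data[pos:pos + 1]
    if tag == b"N":                                  # N;
        return None, pos + 2
    if tag in (b"i", b"b", b"d"):                    # i:42;  b:1;  d:0.5;
        end = data.index(b";", pos)
        raw = data[pos + 2:end].decode("ascii")
        if tag == b"i":
            return int(raw), end + 1
        if tag == b"b":
            return raw == "1", end + 1
        return float(raw), end + 1
    if tag == b"s":                                  # s:<byte length>:"<bytes>";
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon])
        start = colon + 2                            # skip the colon and opening quote
        end = start + length                         # length counts bytes
        return data[start:end].decode("utf-8"), end + 2
    if tag == b"a":                                  # a:<count>:{<key><value>...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                              # skip the colon and opening brace
        result = {}
        for _ in range(count):
            key, pos = _parse(data, pos)
            value, pos = _parse(data, pos)
            result[key] = value
        return result, pos + 1                       # skip the closing brace
    raise ValueError(f"unsupported type tag {tag!r} at offset {pos}")


payload = 'a:2:{i:0;s:10:"2024021238";s:4:"name";s:6:"张三";}'.encode("utf-8")
task = php_unserialize(payload)
```

Note that "张三" is two characters but six bytes in UTF-8, which is why its length field reads 6. Slicing a decoded str by that length would run past the end of the string.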

5. Future Improvements

Scalability: For future expansion, migrate from a single worker to a message queue-based system to enable parallel processing and handle a higher volume of grading tasks.

Configuration Flexibility: Enhance the user interface to allow instructors to define key parameters, such as the class ID, the target column for scores, and the grading prompt for the Gemini API, eliminating the need to hardcode these values.
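
A minimal sketch of how such form input might be validated before a task is created. The field names class_id, score_column, and prompt are assumptions, as is the choice to accept the score column as a spreadsheet letter such as "N".

```python
from dataclasses import dataclass


def column_letter_to_index(letter: str) -> int:
    """Convert a spreadsheet column letter to a 1-based index (A=1, N=14, AA=27)."""
    letter = letter.strip().upper()
    if not (letter.isascii() and letter.isalpha()):
        raise ValueError(f"invalid column letter: {letter!r}")
    index = 0
    for ch in letter:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


@dataclass(frozen=True)
class GradingConfig:
    class_id: str
    score_column: int   # 1-based column index in the roster
    prompt: str


def config_from_form(form: dict) -> GradingConfig:
    """Build a config from submitted form fields (hypothetical field names)."""
    class_id = str(form.get("class_id", "")).strip()
    prompt = str(form.get("prompt", "")).strip()
    if not class_id:
        raise ValueError("class_id is required")
    if not prompt:
        raise ValueError("prompt is required")
    column = column_letter_to_index(str(form.get("score_column", "")))
    return GradingConfig(class_id, column, prompt)


config = config_from_form(
    {"class_id": "245", "score_column": "n", "prompt": "Grade this report."}
)
```

The resulting values could be stored with the task record so the worker reads them instead of using hardcoded constants.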

Robust Error Handling: Implement more comprehensive error handling to ensure scores are recorded even if a subsequent file upload or API call fails.
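
One way to make sure the roster is always saved is to wrap the grading loop in try/finally. The sketch below takes the grading, recording, and saving steps as callables so that it stays independent of any particular spreadsheet library; all four parameter names are hypothetical.

```python
from typing import Callable, Dict, Iterable


def grade_batch(
    student_ids: Iterable[str],
    grade: Callable[[str], float],
    record: Callable[[str, float], None],
    save: Callable[[], None],
) -> Dict[str, str]:
    """Grade every student; always save, and return failures by student ID."""
    failed: Dict[str, str] = {}
    try:
        for student_id in student_ids:
            try:
                record(student_id, grade(student_id))
            except Exception as exc:  # one bad report must not stop the batch
                failed[student_id] = str(exc)
    finally:
        # Runs even if the loop is interrupted, so recorded scores reach disk.
        save()
    return failed
```

In the worker, save would wrap the existing workbook.save call, and the returned dictionary could be written to the task record so the instructor sees which reports need a retry.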

Enhanced Student Submission: Use a robust front-end solution (e.g., Cloudflare Pages or Workers) to validate student submissions against predefined rules (e.g., filename conventions, file types, and size limits) before they are sent to the grading system. This would also allow for automated tracking of who has and has not submitted.
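
The submission rules could be checked along these lines. This is a sketch with assumptions made explicit: the student ID is taken to be 10 digits (as in 2024021238), the assignment name is taken to be exp followed by a number or the word report, and the 3M limit is read as 3 MiB.

```python
import re
from typing import Iterable, List

# Assumed pattern, e.g. "2024021238_exp3.docx" or "2024021238_report.docx".
SUBMISSION_RE = re.compile(
    r"^(?P<student_id>\d{10})_(?P<assignment>exp\d+|report)\.docx$", re.IGNORECASE
)
MAX_BYTES = 3 * 1024 * 1024  # "3M" interpreted as 3 MiB


def validate_submission(filename: str, size_bytes: int) -> List[str]:
    """Return a list of rule violations; an empty list means the file is accepted."""
    errors = []
    if not filename.lower().endswith(".docx"):
        errors.append("file type must be .docx")
    if SUBMISSION_RE.match(filename) is None:
        errors.append("file name must look like 2024021238_exp3.docx")
    if size_bytes > MAX_BYTES:
        errors.append("file must not exceed 3 MB")
    return errors


def missing_students(
    roster_ids: Iterable[str], filenames: Iterable[str], assignment: str
) -> List[str]:
    """List roster IDs with no valid submission for the given assignment."""
    submitted = set()
    for name in filenames:
        match = SUBMISSION_RE.match(name)
        if match and match["assignment"].lower() == assignment.lower():
            submitted.add(match["student_id"])
    return [sid for sid in roster_ids if sid not in submitted]
```

The same checks would need to be repeated on the server side, since front-end validation alone can be bypassed.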
